hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

embedded subtitles and how to remove them

Thu Aug 19, 2021 12:30 am

Hi.

This is a subject that makes me mad. I mean, why the heck do people keep embedding subtitles in video files?

The only reason I can find to explain it is that they are stupid. Well, things are as they are.

So I started to figure out how I could get rid of them, because most of the time they are awful: awful translation, awful timings, awful everything. And yet you must watch them there, ruining a good movie and a good experience.

This is how I finally got rid of them, and I just wanted to share it with anyone interested. I know that opensubtitles.org is about the opposite thing, i.e. how to ADD subtitles. But you can't add before you remove. Well, in fact you could just put a new subtitle over the old one, but then they are often too hard to read. So, how can you remove embedded subtitles? By the way, "embedded subtitles" are those that are hardcoded in the video. You can't hide them or edit them in any way, since they are not text but part of the image.

I don't know, maybe there is some tool out there to do this, but I don't know of one, and since I am a DIY person I just wrote a program to do it. But I didn't start from scratch. I based it on MPlayer, the player I use. It's free software and has a lot of filters. One of these filters is called "delogo". As the name suggests, it can be used to remove from the image the logos that indicate the TV channel and the like. Well, they are not really removed, since you can't recover the part of the image that's hidden by them. This filter just blurs that area so they are not visible.
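To make the idea concrete, here is a simplified sketch of filling a rectangle from its surroundings. It is not MPlayer's delogo code, just one simple way to do it: on a single 8-bit grayscale plane (an assumption; real video frames have several planes), each pixel inside the rectangle becomes the average of a horizontal and a vertical linear interpolation between the pixels just outside it. The function name `delogo_fill` is hypothetical, and the soft edge that the real filter blends in is left out.

```c
#include <stdint.h>

/* Simplified sketch (not MPlayer's implementation): overwrite the w x h
 * rectangle whose top-left corner is (x, y) in an 8-bit grayscale frame
 * that is `width` pixels wide, using only the pixels just outside it.
 * Each inner pixel becomes the average of a horizontal interpolation
 * (left/right neighbours) and a vertical one (top/bottom neighbours).
 * The caller must keep a one-pixel margin: x >= 1, y >= 1,
 * x + w < width and y + h < frame height. */
void delogo_fill(uint8_t *frame, int width, int x, int y, int w, int h)
{
    for (int row = y; row < y + h; row++) {
        for (int col = x; col < x + w; col++) {
            int left   = frame[row * width + (x - 1)];
            int right  = frame[row * width + (x + w)];
            int top    = frame[(y - 1) * width + col];
            int bottom = frame[(y + h) * width + col];
            int dx = col - x + 1;   /* 1..w: distance from the left border */
            int dy = row - y + 1;   /* 1..h: distance from the top border */
            int horiz = (left * (w + 1 - dx) + right * dx) / (w + 1);
            int vert  = (top * (h + 1 - dy) + bottom * dy) / (h + 1);
            frame[row * width + col] = (uint8_t)((horiz + vert) / 2);
        }
    }
}
```

Only pixels outside the rectangle are read, so the order in which the inner pixels are overwritten does not matter. Whatever was hidden under the rectangle is simply lost, which matches the remark above that logos are not really removed.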

So this is what I use. Why not use it to hide unwanted subtitles? This is what I did for some time: you have a fixed rectangle blurring the image. But then, one day, I discovered (and this is the key to the whole thing) that you can tell the filter to change this shape, and when to do it. You just need to know the timings and the text. The right way to do this is OCR, but I figured out a simpler method. Where can you find the text and timings? In an SRT subtitle file. So what you need to remove some embedded subtitles is just the same thing you need to add them. What my program does is simply convert the SRT into a simple format which is the input to the delogo filter.
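The timing half of that conversion is plain arithmetic. The Elisp posted later in the thread relies on an `srt-string-to-ms` function from an SRT library that is not shown; the hypothetical C helpers below do the equivalent for an SRT timeline such as `00:01:00,500 --> 00:01:03,200`, then truncate the start and round the end up to whole seconds the same way the Elisp does: `ms / 1000` for the start, `(ms + 990) / 1000` for the end.

```c
#include <stdio.h>

/* Hypothetical helper: parse an SRT timeline
 * "HH:MM:SS,mmm --> HH:MM:SS,mmm" into two millisecond values.
 * Returns 1 on success, 0 if the line does not have that shape. */
int parse_srt_timeline(const char *line, long *start_ms, long *end_ms)
{
    int h1, m1, s1, f1, h2, m2, s2, f2;
    if (sscanf(line, "%d:%d:%d,%d --> %d:%d:%d,%d",
               &h1, &m1, &s1, &f1, &h2, &m2, &s2, &f2) != 8)
        return 0;
    *start_ms = ((h1 * 60L + m1) * 60L + s1) * 1000L + f1;
    *end_ms   = ((h2 * 60L + m2) * 60L + s2) * 1000L + f2;
    return 1;
}

/* Start times are truncated to whole seconds... */
long start_seconds(long ms) { return ms / 1000; }

/* ...and end times rounded up, with the same +990 as the Elisp. */
long end_seconds(long ms) { return (ms + 990) / 1000; }
```

Because the Elisp adds 990 rather than 999, an end time only a few milliseconds past a whole second rounds down: 63,005 ms still gives 63 seconds.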

I wrote it in ELISP. It should not be difficult to translate to another programming language. But for now this is what I have and what works for me. To run it you'll need a working copy of EMACS and the code I wrote.

The only drawback of this method is that you need to have the SRT that matches the embedded subtitles. My experience is that this is usually very easy to find but sometimes it is not available and then you have to tweak some SRT by hand.

If anyone is interested, just let me know and I'll post the code and try to help.

suadnovic
Posts: 281
Joined: Tue Aug 19, 2014 7:41 pm

Re: embedded subtitles and how to remove them

Thu Aug 19, 2021 11:06 am

Congratulations, you have certainly made a breakthrough with hardcoded subtitles. Could you please post a video to illustrate it? Thank you.

hector

Re: embedded subtitles and how to remove them

Thu Aug 19, 2021 10:07 pm

Certainly. I recorded some lightweight videos (5-9 MBytes each). I speak Spanish, so they are in Spanish. Sorry. Anyway, they are intended to show the final result without diving into the complexity, something in the mood of "then and now", "before and after".

Could I just attach them here? I suppose there is some limit on the size of attachments.

Besides, my internet connection is not very good at the moment, and I'll have to find the right time to do it.

hector

Re: embedded subtitles and how to remove them

Fri Aug 20, 2021 9:39 am

Trying to attach 12 MBytes... Nope. I think attachments are mainly meant for still images. It says "unrecognized extension".

Then I don't know. I have sometimes used one of those free services to upload files, but now I can't remember which one.

SmallBrother
Site Admin
Posts: 3724
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: embedded subtitles and how to remove them

Fri Aug 20, 2021 10:17 am

No, it's not possible to attach video files to forum posts. But you can use any file upload/sharing site and post the link here. Obviously, only for the purpose of this topic, do not upload full movies :)
I would recommend https://www.filedropper.com/

As for your program, I think it's a great idea. I have been subtitling 'over' hardcoded subs and indeed, there is always some problem and never a good solution. Although it would involve making videos just to make subs, your method seems like the best and a very good solution. I am curious to see a demo.
Nowadays a VPN is a must for everyone. A VPN allows you safe surfing and protects you against spying governments and companies.
I advise AirVPN - from € 2,75 per month. Click the below banner for more info.



hector

Re: embedded subtitles and how to remove them

Fri Aug 20, 2021 12:47 pm

Thanks.

Here you are: http://www.filedropper.com/removingembeddedsubtitles

Sorry for my bad English and pronunciation :) But I think it serves the purpose, which is to show how it works.

suadnovic

Re: embedded subtitles and how to remove them

Sun Aug 22, 2021 6:37 am

I remember that there is an image program that patches damaged parts of an image using undamaged parts of their surroundings. Just as an idea, maybe you need something similar, but for video, to patch the blurred area.
And a question: is it possible to add some other subtitle in your blurred area?
P.S. BTW, which Glenn Ford movie is in the example?

SmallBrother

Re: embedded subtitles and how to remove them

Sun Aug 22, 2021 12:30 pm

Wow. I think the result looks pretty decent and I think the whole idea is brilliant. So much better than fighting with wrong timings, fiddling with color codes and/or text positionings and similar dodgy 'solutions'. Like this, the subtitler can just make the subtitles as they should be.

Subsequently, the video needs to be 'de-subbed' (blurred), using an SRT-formatted file with the text and timings of the hardcoded subs to be blurred. Coincidentally, in the Dutch section of the forum there is a discussion going on about how to rip hardcoded subs as an SRT file. To cut it short, it was said that a combination of VideoSubFinder and ABBYY FineReader does that job pretty well. See https://www.videoconverterfactory.com/t ... itles.html

So ideally, a subtitle for a video with hardcoded subs should go together with an SRT file to de-sub the hard coded subs.
hector wrote: Sorry for my bad English and pronunciation :) But I think it serves the purpose, which is to show how it works.
It does. And don't worry about your English. It's 100x better than my Spanish :)

hector

Re: embedded subtitles and how to remove them

Sun Aug 22, 2021 12:53 pm

As I said, the matching SRT is available... let's say 9 times out of 10. But yes, sometimes it's nowhere to be found. And yes, I was wondering how to get the timings and text of hardcoded subs. Right now I'm doing it by hand, which takes too much time. It would be great to have those tools.

Ideally you should avoid the SRT step. I mean, you really don't need the text, just the extent of it, so you could extract the coordinates directly from the image, and just ignore the text.
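A hypothetical sketch of that image-based route, which is not what the program in this thread does: scan a band at the bottom of a grayscale frame and return the bounding box of pixels brighter than a threshold. It assumes bright (for example white) subtitle text on a darker picture; any bright scenery inside the band would inflate the box, so in practice it would need more filtering. `subtitle_bbox` and `struct box` are made-up names.

```c
#include <stdint.h>

struct box { int x, y, w, h; };

/* Hypothetical sketch: bounding box of the pixels brighter than
 * `threshold` in rows [band_top, height) of an 8-bit grayscale frame
 * stored row by row. Returns 1 and fills *out if any pixel qualifies,
 * 0 otherwise. */
int subtitle_bbox(const uint8_t *frame, int width, int height,
                  int band_top, uint8_t threshold, struct box *out)
{
    int x0 = width, y0 = height, x1 = -1, y1 = -1;

    for (int y = band_top; y < height; y++)
        for (int x = 0; x < width; x++)
            if (frame[y * width + x] > threshold) {
                if (x < x0) x0 = x;
                if (x > x1) x1 = x;
                if (y < y0) y0 = y;
                if (y > y1) y1 = y;
            }

    if (x1 < 0)
        return 0;                 /* nothing bright in the band */
    out->x = x0;
    out->y = y0;
    out->w = x1 - x0 + 1;
    out->h = y1 - y0 + 1;
    return 1;
}
```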

But I followed a simpler path, without dealing with images or OCR at all. I just get the timing from the SRT and the coordinates are calculated from text, which is not exact but works well most of the time.
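For the text-based route, the rectangle follows from a few font metrics. The sketch below reproduces the geometry of the Elisp posted later in the thread (`default-rectangles` and `prepare-set-rectangle`): the height grows by one baseline-to-baseline distance per extra line, the top edge is measured up from the "bottom line of subs" (which, going by the formula, acts as the baseline of the lowest line), and the rectangle is centred using the measured text width. The struct and function names are made up for illustration.

```c
/* Hypothetical sketch of the rectangle geometry used by the Elisp.
 * lines: number of text lines; text_width: widest line in pixels;
 * ascent/descent: height above/below the baseline of one text line;
 * interline: baseline-to-baseline distance; bottom: "bottom line of subs". */
struct rect { int x, y, w, h; };

struct rect text_rect(int lines, int text_width, int image_width,
                      int ascent, int descent, int interline, int bottom)
{
    int extra = lines - 1;                        /* lines above the last one */
    struct rect r;

    r.x = (image_width - text_width) / 2;         /* centred horizontally */
    r.y = bottom - (extra * interline + ascent);  /* top edge */
    r.w = text_width;
    r.h = extra * interline + ascent + descent;
    return r;
}
```

With a 640-pixel-wide image, a 560-pixel-wide line and assumed metrics (ascent 30, descent 8, bottom line 410), a one-line sub gives x=40, y=380, w=560, h=38, which is the rectangle in the "60 40:380:560:38" example further down.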

@suadnovic
That's a great idea too and I've thought about it. It would be great to apply the blurring just to each letter. But this delogo filter deals only with rectangles, not complicated shapes.

Of course you can. If you find a way to encode this, you can treat the video like any other video. I say "if you find a way" because I don't know how: I just use it in real time while watching the film, not for re-encoding. But of course, with MPlayer you can then apply your subs over this image.

And the film is The Gazebo, 1959.

hector

Re: embedded subtitles and how to remove them

Wed Sep 01, 2021 8:51 am

But words and thoughts are not very useful if they are not brought to life. So here is the piece of code that does the trick. It seems you can't attach text files here, so I simply paste it here. Feel free to ask for help to make it work, but if you use MS Windows just don't bother, because I probably won't know the answer. Perhaps it's not very good style, but it works :)

As I said, you need EMACS and some expertise to make it work.

Basically, it converts an SRT file into the data that the delogo filter needs. That data is like a SUB file, but instead of text you have the coordinates of the rectangle that delogo blurs, like this:

60 40:380:560:38

This means: at time 60 (seconds), put a blurred rectangle of width 560 and height 38 with its top-left corner at 40,380.
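Writing such lines is just string formatting. My understanding of the MPlayer documentation is that the delogo filter's file option reads one rectangle per line: a timestamp in seconds followed by x:y:w:h and an optional fifth value t, the thickness of the fuzzy edge. The Elisp emits a fifth value taken from `prepare-border`, presumably for that purpose, and writes `0:0:0:0` to switch the rectangle off. The helper name `format_delogo_line` is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: format one line of the delogo rectangle file,
 * "<seconds> x:y:w:h", or "<seconds> x:y:w:h:t" when t >= 0.
 * Returns what snprintf returns (length of the full line). */
int format_delogo_line(char *buf, size_t n, long seconds,
                       int x, int y, int w, int h, int t)
{
    if (t >= 0)
        return snprintf(buf, n, "%ld %d:%d:%d:%d:%d\n",
                        seconds, x, y, w, h, t);
    return snprintf(buf, n, "%ld %d:%d:%d:%d\n", seconds, x, y, w, h);
}
```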

Using "code" just throws away all linefeeds so I'll use "quote" instead. It seems there is no way to keep text formatting here.
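;; we need this buffer to get the output of calc_string_width
(defvar string-width-output (get-buffer-create "string-width-output"))
;; Params 0: height, 1:depth, 2:interline,
;; 3:bottom line, 4: font size
(defvar sub-textline-params (list 0 0 0 0 0))
(defvar image-width 0)
(defvar default-rectangles (make-list 4 0)) ; precalculated rectangles for subs with 1 to 4 lines
(defconst prepare-border 8)
;; srt--timeline-regex and srt-string-to-ms come from an SRT helper
;; library that is not included in this post
(defvar timeline (cdr (assoc 'srt srt--timeline-regex)))
;; state of the previous sub: (end-time lines width); lines = -1 means
;; that no rectangle is being shown at the moment
(defvar prev-sub (list 0 -1 0))
;; output buffer, created by prepare-save-data
(defvar delogo-buffer nil)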

;; get params from file if it exists or else read them from minibuffer
;; the file must contain a LISP list in the form
;; (ascent descent interline bottom "size-in-points")
;; The double quotes are necessary because we pass this argument
;; as a string to calc_string_width, so we don't need a float-to-string
;; conversion
(if (file-readable-p "prepare_delogo_params")
    (let ((params-buffer (find-file-noselect "prepare_delogo_params")))
      (setq sub-textline-params (read params-buffer))
      (kill-buffer params-buffer))
  (setf (car sub-textline-params)
        (read-from-minibuffer "Text line height in pixels: " nil nil 'read))
  (setf (nth 1 sub-textline-params)
        (read-from-minibuffer "Text line depth in pixels: " nil nil 'read))
  (setf (nth 2 sub-textline-params)
        (read-from-minibuffer "Distance from baseline to baseline in pixels: " nil nil 'read))
  (setf (nth 3 sub-textline-params)
        (read-from-minibuffer "Bottom line of subs in pixels: " nil nil 'read))
  (setf (nth 4 sub-textline-params)
        (read-from-minibuffer "Font size: ")))

(setq image-width
      (read-from-minibuffer "Width of the image in pixels: " nil nil 'read))

;; entry i describes a sub with i+1 lines as (x y width height border);
;; x and width are overwritten later by prepare-set-rectangle
(dotimes (i 4)
  (setf (nth i default-rectangles)
        (list 32
              (- (nth 3 sub-textline-params)
                 (+ (* i (nth 2 sub-textline-params))
                    (car sub-textline-params)))
              (- image-width 64)
              (+ (* i (nth 2 sub-textline-params))
                 (car sub-textline-params)
                 (cadr sub-textline-params))
              prepare-border)))

(defun prepare-set-rectangle (start lines width)
  "Insert a delogo line at time START for a sub with LINES+1 lines
and a text WIDTH in pixels, centring the rectangle horizontally."
  (let ((r (nth lines default-rectangles)))
    (setf (car r) (/ (- image-width width) 2))
    (setf (nth 2 r) width)
    (insert (number-to-string start) " ")
    (dolist (c r)
      (insert (number-to-string c) ":"))
    ;; remove the trailing colon
    (delete-char -1)
    (insert "\n")))

(defun prepare-sub-to-delogo ()
  "Convert the sub timings to seconds for delogo input and calculate
the rectangle coordinates for MPlayer delogo input."
  (while (re-search-forward timeline nil t)
    (let* ((start (/ (srt-string-to-ms (match-string 1)) 1000))
           (end (/ (+ (srt-string-to-ms (match-string 2)) 990) 1000))
           (pos (match-end 0))
           (content (split-string (buffer-substring-no-properties
                                   (1+ (point))
                                   (progn (search-forward "\n\n")
                                          (- (point) 2)))
                                  "\n"))
           ;; we use this to index default-rectangles, so it is the
           ;; number of lines minus 1
           (lines (1- (length content)))
           (rect (nth lines default-rectangles))
           (max-width 0)
           next-start)

      ;; peek next sub start
      (save-match-data
        (if (re-search-forward timeline nil t)
            (setq next-start (/ (srt-string-to-ms (match-string 1)) 1000))
          (setq next-start (+ start 120))))

      ;; here we call an external program to calculate the width of the
      ;; delogo rectangle. It takes the string as its 1st parameter and
      ;; the size in points as the second
      (with-current-buffer string-width-output
        (dolist (line content)
          (call-process "calc_string_width" nil '(t nil) nil
                        "--"
                        line
                        (nth 4 sub-textline-params))
          (setq max-width (max max-width
                               (string-to-number
                                (delete-and-extract-region
                                 (line-beginning-position -1)
                                 (point)))))))
      (with-current-buffer delogo-buffer
        (cond
         ;; 2nd area is smaller
         ((or (and (= lines (cadr prev-sub))
                   (< max-width (nth 2 prev-sub)))
              (and (< lines (cadr prev-sub))
                   (< max-width (nth 2 prev-sub))))
          (setq start (1+ start)))
         ;; 2nd area is different
         ((and (> (cadr prev-sub) -1)
               (or (< lines (cadr prev-sub))
                   (< max-width (nth 2 prev-sub))))
          ;; insert one second of enclosing rectangle
          (let ((extra-width (max max-width (nth 2 prev-sub)))
                (extra-lines (max lines (cadr prev-sub))))
            (prepare-set-rectangle start extra-lines extra-width)
            (setq start (1+ start)))))
        (prepare-set-rectangle start lines max-width)

        (setf (cadr prev-sub) lines)
        (if (< end (1- next-start))
            (progn
              (insert (number-to-string end)
                      " 0:0:0:0\n")
              (setf (cadr prev-sub) -1))
          (setcar prev-sub end)
          (setf (nth 1 prev-sub) lines)
          (setf (nth 2 prev-sub) max-width)))
      (goto-char pos))))

(defun prepare-save-data ()
  "Convert the whole buffer to delogo data and save it.
Run it in the SRT buffer with point at the beginning."
  (interactive)
  (setq delogo-buffer (get-buffer-create "delogo-data"))
  (prepare-sub-to-delogo)
  (set-buffer delogo-buffer)
  (set-visited-file-name "delogo_data")
  (save-some-buffers)
  (kill-buffer delogo-buffer)
  ;; get rid of some things needed for prepare-delogo-input
  ;; perhaps this is not the place to do it because that means
  ;; you can call this command only once. Well, then do it so :-)
  (kill-buffer string-width-output))
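For readers who don't use Emacs, here is a simplified C sketch of the main loop above. It keeps the "peek at the next sub" rule: a rectangle is switched off with `0:0:0:0` at the end of a sub only if the next sub starts more than one second later (for the last sub, the Elisp assumes a next start 120 seconds later); otherwise it stays until the next rectangle replaces it. It leaves out the one-second enclosing rectangle the Elisp inserts when the shape shrinks, and expects times and rectangles to be precomputed. `struct cue` and `write_delogo` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical: one subtitle, times already in whole seconds and
 * rectangle already computed. */
struct cue { long start_s, end_s; int x, y, w, h; };

/* Simplified version of the loop in prepare-sub-to-delogo: write one
 * delogo line per cue, and switch the rectangle off at the cue's end
 * only when the next cue starts more than one second later.
 * The one-second enclosing rectangle of the Elisp is not reproduced. */
void write_delogo(FILE *out, const struct cue *c, int n)
{
    for (int i = 0; i < n; i++) {
        long next_start = (i + 1 < n) ? c[i + 1].start_s : c[i].start_s + 120;

        fprintf(out, "%ld %d:%d:%d:%d\n",
                c[i].start_s, c[i].x, c[i].y, c[i].w, c[i].h);
        if (c[i].end_s < next_start - 1)
            fprintf(out, "%ld 0:0:0:0\n", c[i].end_s);
    }
}
```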
And here is the C program that calculates the width in pixels of a text string (the Elisp calls it as calc_string_width):
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>
#include <getopt.h>
#include <math.h>
#include <ft2build.h>
#include FT_FREETYPE_H

#include "utf8.h"   /* provides u8_toucs(); not included in this post */

FT_Library ftl;

char *font = "/usr/share/fonts/truetype/ttf-bitstream-vera/Vera.ttf";
float size;

typedef struct {
    unsigned int *ptr;
    unsigned int size;
    unsigned int allocated;
} string_t;

string_t in_string_ucs;

void
cleanup(void)
{
    if (in_string_ucs.ptr != NULL) {
        free(in_string_ucs.ptr);
    }
}

void
usage(void)
{
    fprintf(stderr, "Usage: calc_string_width [-f font] string size\n");
}

int
main (int argc, char **argv)
{
    FT_Error err;
    FT_Face face;
    FT_GlyphSlot slot;
    unsigned int i;
    unsigned int total_width = 0;
    int o;
    char *endptr;
    FT_F26Dot6 ft_size;

    setlocale (LC_ALL, "");

    /* '+' stops option parsing at the first non-option argument */
    const char *optstring = "+f:";

    while (-1 != (o = getopt (argc, argv, optstring))) {
        switch (o) {
        case 'f':
            font = optarg;
            break;
        default:
            usage();
            exit(1);
        }
    }

    /* two arguments must remain after the options: string and size */
    if (argc - optind < 2) {
        usage();
        exit(1);
    }

    // convert the UTF-8 input string to UCS-4 (one code point per element)
    const char *in_string = argv[optind];
    in_string_ucs.allocated = 4 * strlen(in_string) + 4;
    in_string_ucs.ptr = (unsigned int *)malloc(in_string_ucs.allocated);
    if (in_string_ucs.ptr == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(1);
    }
    in_string_ucs.size = u8_toucs(in_string_ucs.ptr, in_string_ucs.allocated / 4,
                                  in_string, strlen(in_string));

    // get font size (in points)
    size = strtof(argv[optind + 1], &endptr);
    if (size == 0 || endptr == argv[optind + 1]) {
        usage();
        cleanup();
        exit(1);
    }
    ft_size = floor(size * 64);   /* FreeType sizes are in 1/64 of a point */

    err = FT_Init_FreeType(&ftl);
    if (err) {
        fprintf(stderr, "Something went wrong initializing FreeType\n");
        cleanup();
        exit(1);
    }
    err = FT_New_Face(ftl, font, 0, &face);
    if (err) {
        fprintf(stderr, "Something went wrong loading the font\n");
        cleanup();
        exit(1);
    }
    /* height 0 means "same as width"; 96 dpi in both directions */
    err = FT_Set_Char_Size(face, ft_size, 0, 96, 0);
    if (err) {
        fprintf(stderr, "Something went wrong setting the size\n");
        cleanup();
        exit(1);
    }

    slot = face->glyph;

    for (i = 0; i < in_string_ucs.size; ++i) {
        uint32_t c = in_string_ucs.ptr[i];
        unsigned int adv;

        err = FT_Load_Char(face, c, FT_LOAD_MONOCHROME);
        if (err) {
            fprintf(stderr, "Something went wrong loading character %u\n", i);
            cleanup();
            exit(1);
        }
        /* the advance is in 26.6 fixed point, i.e. 1/64 pixel */
        adv = slot->advance.x / 64;
        /* debug output on stderr; the Elisp discards stderr */
        fprintf(stderr, "Char \"%lc\": %u\n", (wint_t)c, adv);
        total_width += adv;
    }
    printf("%u\n", total_width);

    FT_Done_Face(face);
    FT_Done_FreeType(ftl);
    cleanup();
    return 0;
}
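`utf8.h` is not part of the post or of the C standard library; `u8_toucs` seems to come from a small third-party UTF-8 helper. If that is not at hand, a minimal decoder like this hypothetical `utf8_to_ucs4` could fill the same role (UTF-8 bytes in, one code point per array element out). It only does basic checks: invalid or truncated sequences become U+FFFD, and overlong forms and surrogates are not rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for u8_toucs(): decode `len` bytes of UTF-8 from
 * `src` into at most `cap` code points in `dest`; returns the count.
 * Bad lead or continuation bytes become U+FFFD; a sequence cut off at
 * the end of the input becomes U+FFFD and stops decoding. */
size_t utf8_to_ucs4(uint32_t *dest, size_t cap, const char *src, size_t len)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0, n = 0;

    while (i < len && n < cap) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t extra, k;

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { dest[n++] = 0xFFFD; i++; continue; }   /* stray byte */

        if (i + extra >= len) {        /* truncated sequence */
            dest[n++] = 0xFFFD;
            break;
        }
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                break;                 /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (k <= extra) {
            dest[n++] = 0xFFFD;
            i++;
            continue;
        }
        dest[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

In the program above, the call would become something like `utf8_to_ucs4(ptr, in_string_ucs.allocated / 4, in_string, strlen(in_string))`, with `ptr` declared as `uint32_t *` instead of `unsigned int *` (the two are usually, but not necessarily, the same type).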


SmallBrother

Re: embedded subtitles and how to remove them

Thu Sep 02, 2021 10:11 am

hector wrote: Using "code" just throws away all linefeeds so I'll use "quote" instead. It seems there is no way to keep text formatting here.
I'm afraid this isn't really the right way either, because on line 8 I see a smiley with sunglasses, which is supposed to be

8)
The "code" tag should not mess with the exact characters, including line breaks. That's the whole point of that tag. But I tried and you are right. I guess the latest forum update was a downgrade to some degree ;-)

So maybe it's an idea to upload a .txt file to www.filedropper.com and post the link here.

hector

Re: embedded subtitles and how to remove them

Fri Sep 10, 2021 10:17 am

Yes. When I saw him I thought, "Hey, what's this guy doing in my code? I don't remember him. Bugs don't usually wear sunglasses" :lol:

There don't seem to be many people interested anyway :)

lxs602
Posts: 2
Joined: Wed Nov 24, 2021 5:30 pm

Re: embedded subtitles and how to remove them

Wed Nov 24, 2021 5:53 pm

This looks like it might be useful.

Why not open a project on a website for coding projects, such as Github, and put a link here?

I think it would gain more interest.

L

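Translation practice (with the help of the dictionary), as I'm studying Spanish:

¿Por qué no pones en un sitio web por código de computadora, por ejemple, Github? (Why don't you put it on a website for computer code, for example Github?)

Podría ser qué hay más atención por tu trabajo. (Maybe there would be more attention for your work.)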

L

hector

Re: embedded subtitles and how to remove them

Fri Nov 26, 2021 1:02 pm

Maybe it's a good idea. But then I'd have to learn how to use Github.

Some years ago I created some projects but now I have forgotten the little I knew about it.
