Subtitle Time Stamp Groups

Posted: Sun May 29, 2016 8:14 am
by oss
Hi,

when https://github.com/Ivshti/node-subtitles-grouping was released I knew it was a good feature, but I never really got into it. Its developer contacted me a couple of days ago, so I looked into it again, and it turns out to be a nice idea.

An easy way to explain what is going on: imagine you have 4 subtitles (they can be in 1 language or in several, the language is not important here) for 1 movie (imdbid). You download the 1st subtitle and it is out of sync. It would be good to know which of the other subtitles you don't have to download, because they will be out of sync too.

And here comes Subtitle Time Stamp grouping. The algo builds a heatmap of each subtitle's timestamps, compares it with the heatmaps of the others and then, with a given threshold, either puts the subtitle into an existing group or creates a new group.
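To make the idea concrete, here is a minimal Python sketch of heatmap-based grouping. It is not the code from node-subtitles-grouping: the bucket size, the similarity measure, the threshold value and the greedy "compare with the first member of each group" strategy are all assumptions chosen for illustration, as are the function names.

```python
import re
from collections import Counter

# Matches the start time of an SRT cue line, e.g. "00:01:02,500 --> 00:01:05,000".
CUE_START = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->")

def heatmap(srt_text, bucket_ms=1000):
    """Count how many cues start in each time bucket (bucket size is an assumption)."""
    buckets = Counter()
    for h, m, s, ms in CUE_START.findall(srt_text):
        start = ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)
        buckets[start // bucket_ms] += 1
    return buckets

def similarity(a, b):
    """Share of cues that land in the same buckets in both heatmaps (0.0 to 1.0)."""
    total = max(sum(a.values()), sum(b.values()))
    if total == 0:
        return 0.0
    shared = sum((a & b).values())  # Counter & Counter keeps the minimum counts
    return shared / total

def group_subtitles(subtitles, threshold=0.8):
    """Greedy grouping: join the first group whose representative is similar
    enough, otherwise start a new group. `subtitles` maps an id to SRT text."""
    groups = []  # list of (representative_heatmap, [ids])
    for sub_id, text in subtitles.items():
        hm = heatmap(text)
        for rep, members in groups:
            if similarity(hm, rep) >= threshold:
                members.append(sub_id)
                break
        else:
            groups.append((hm, [sub_id]))
    return [members for _, members in groups]
```

Two subtitles with the same timing but different text end up in one group, while a copy shifted by a few seconds lands in a group of its own.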

I ran some tests, and the results are quite interesting.

The database is still being populated with data, so it will take some time, but you can already see 2 new fields in the xml-rpc results, for example:

Code:

[SubTSGroup] => 2
[SubTSGroupHash] => 041ec2cdae88105ccce1e1f9f4eebc85

where:
SubTSGroup is the number of the TIME STAMP group that the IDSubtitleFile belongs to for the given movie
SubTSGroupHash is something like md5($SubTSGroup . "|" . $IDMovieIMDB) - so it identifies the group system wide, not just within one movie.
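For anyone who wants to play with the hash idea on the client side, here is a small Python sketch of the formula above. The post only says the hash is "something like" this, so the exact input format (for example how the IMDB id is padded) is an assumption, and this function should not be expected to reproduce the hash values returned by the server. The function name is made up for the example.

```python
import hashlib

def sub_ts_group_hash(sub_ts_group, id_movie_imdb):
    """md5 of "<group>|<imdb id>", following the formula in the post.
    The exact server-side input format is an assumption."""
    key = f"{sub_ts_group}|{id_movie_imdb}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```

Because the movie id is part of the input, group 2 of one movie and group 2 of another movie get different hashes, which is what makes the value usable across the whole system.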

This could be used in many different areas (if it proves to be right).

For example, you can check xml-rpc SearchSubtitles() right now with this request:

Code:

'imdbid' => '49778'

results:
http://pastebin.com/kmGz5Zrb

There you can see SubTSGroup followed by a number. So why not download a couple of subtitles and actually check whether the SUBTITLE TIMESTAMPS are similar within each group (and not similar across groups)?

For now it works only for srt files; support for other formats will be added later.
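As a rough illustration of how a client could use the new field, the Python sketch below sorts SearchSubtitles() result rows by SubTSGroup and lists the subtitles that share a group with one already known to be out of sync. It assumes the results have been decoded into a list of dicts carrying the IDSubtitleFile and SubTSGroup fields; the helper names are hypothetical, and rows without a group yet (the database is still being filled) are ignored.

```python
from collections import defaultdict

def by_group(results):
    """Map each SubTSGroup value to the IDSubtitleFile values in that group."""
    groups = defaultdict(list)
    for row in results:
        groups[row.get("SubTSGroup")].append(row["IDSubtitleFile"])
    return dict(groups)

def likely_out_of_sync(results, bad_id):
    """Other subtitles in the same group as one known to be out of sync.
    Rows with no SubTSGroup are skipped, since nothing is known about them."""
    for group, ids in by_group(results).items():
        if group is not None and bad_id in ids:
            return [i for i in ids if i != bad_id]
    return []
```

Calling likely_out_of_sync(results, id_of_the_bad_subtitle) gives the list of files that are probably not worth downloading for that release.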

Re: Subtitle Time Stamp Groups

Posted: Mon May 30, 2016 5:24 pm
by vankasteelj
Oh yes I've been told that this grouping thing was awesome but never took the time to look into it. Great news if it can improve automatic matching!

Re: Subtitle Time Stamp Groups

Posted: Tue May 31, 2016 2:42 pm
by IvoGeorgiev
hello, developer here

vtt would be simple to add: it's almost the same as srt and, if I'm not mistaken, the system may already support vtt at the parsing level

I think there may be a lot of room for improvement in the algo itself: as @oss discovered, "hearing impaired" subtitles currently end up in separate groups, which is a problem

-----

A few words of explanation: grouping subtitles by how they're synced is beneficial, because we can cross-reference the groups with moviehash matches and find the "right" group
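A possible way to do that cross-referencing, sketched in Python: count how many moviehash-matched results fall into each group and take the group with the most. This is only an illustration, not what the site does. It assumes each result row is a dict with a SubTSGroup field and a MatchedBy field whose value is "moviehash" for hash matches; the function name is hypothetical.

```python
from collections import Counter

def best_group(results):
    """Return the SubTSGroup with the most moviehash matches, or None.
    Assumes rows look like {"SubTSGroup": ..., "MatchedBy": ...}."""
    votes = Counter(
        row["SubTSGroup"]
        for row in results
        if row.get("MatchedBy") == "moviehash"
        and row.get("SubTSGroup") is not None
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Subtitles from the winning group, in any language, would then be the first candidates to offer for that video file.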

Re: Subtitle Time Stamp Groups

Posted: Tue May 31, 2016 4:37 pm
by oss
yes vtt is simple, but we don't have vtt subtitles uploaded yet.

for hearing impaired, well, they can, but don't have to, end up in a different group; they will be right on the border of the threshold.

Re: Subtitle Time Stamp Groups

Posted: Tue May 31, 2016 6:11 pm
by vankasteelj
you could easily make your srt into vtt with a script, but you probably know that, so I might misunderstand
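For reference, such a script can be very short. The Python sketch below covers only the basic case: it adds the WEBVTT header and switches the millisecond separator from a comma to a dot on timing lines. It does not touch styling tags or positioning, and the function name is made up for the example.

```python
import re

# SRT writes "00:00:01,000", WebVTT writes "00:00:01.000".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")

def srt_to_vtt(srt_text):
    """Convert basic SRT text to WebVTT text."""
    # Drop a leading byte order mark and normalise line endings.
    lines = srt_text.lstrip("\ufeff").replace("\r\n", "\n").split("\n")
    # Only rewrite timing lines, so commas in the dialogue stay untouched.
    out = [TIMING.sub(r"\1.\2", line) if "-->" in line else line for line in lines]
    return "WEBVTT\n\n" + "\n".join(out)
```

The numeric cue counters from the srt file are kept, since WebVTT allows an identifier line before each cue.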

Re: Subtitle Time Stamp Groups

Posted: Wed Jun 15, 2016 7:01 am
by oss
sure. But this is more about "output support" (download as vtt).