After https://github.com/Ivshti/node-subtitles-grouping was released, I knew it was a good feature, but I never really got into it. A couple of days ago its developer contacted me, so I looked into it again, and it turns out to be a nice idea.
To put it simply: imagine you have 4 subtitles for one movie (IMDb ID). They can all be in one language or in several; the language does not matter here. You download the first subtitle and it is out of sync. It would be useful to know which of the remaining subtitles you do not need to download, because they will be out of sync too.
This is where Subtitle Time Stamp grouping comes in. The algorithm builds a heatmap of each subtitle's timestamps and compares it with the others; based on a given threshold, it either puts the subtitle into an existing group or creates a new group for it.
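The post does not spell out the exact algorithm, so here is only a minimal Python sketch of the general idea, not the code from node-subtitles-grouping. It assumes each subtitle has already been reduced to a list of cue start times in seconds, builds the "heatmap" as a count of start times per 1-second bin, compares heatmaps with cosine similarity, and uses a threshold of 0.5. BIN_SECONDS, THRESHOLD and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical parameters; the real grouping code may use different values.
BIN_SECONDS = 1.0
THRESHOLD = 0.5

def heatmap(start_times):
    """Count cue start times per time bin (a sparse 1-D heatmap)."""
    return Counter(int(t // BIN_SECONDS) for t in start_times)

def similarity(a, b):
    """Cosine similarity between two sparse heatmaps (0.0 to 1.0)."""
    dot = sum(count * b[bin_no] for bin_no, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def group_subtitles(subtitles):
    """Put each subtitle into the first group whose representative is similar enough.

    subtitles: dict mapping a subtitle ID to a list of cue start times in seconds.
    Returns a dict mapping each subtitle ID to a group number starting at 1.
    """
    representatives = []  # (group number, heatmap of the group's first subtitle)
    groups = {}
    for sub_id, start_times in subtitles.items():
        hm = heatmap(start_times)
        for group_no, rep in representatives:
            if similarity(hm, rep) >= THRESHOLD:
                groups[sub_id] = group_no
                break
        else:
            # Not similar to any existing group above the threshold: start a new one.
            group_no = len(representatives) + 1
            representatives.append((group_no, hm))
            groups[sub_id] = group_no
    return groups

subs = {
    "a": [1.2, 5.0, 9.7, 14.1],
    "b": [1.3, 5.1, 9.8, 14.0],   # same timing as "a"
    "c": [3.2, 7.0, 11.7, 16.1],  # shifted by 2 seconds
}
print(group_subtitles(subs))  # {'a': 1, 'b': 1, 'c': 2}
```

Comparing each subtitle only against the first member of each group keeps this to a single pass; the real implementation may compare against every member, smooth the bins, or use a different similarity measure.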
I ran some tests, and the results are quite interesting.
The database is now being populated, so this will take some time, but the XML-RPC results already contain 2 new fields, for example:
[SubTSGroup] => 2
[SubTSGroupHash] => 041ec2cdae88105ccce1e1f9f4eebc85
SubTSGroup is the timestamp group, within the given movie, that this subtitle file (IDSubtitleFile) belongs to.
SubTSGroupHash is something like md5($SubTSGroup . "|" . $IDMovieIMDB), so it identifies a group system-wide, not just within one movie.
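Since the post only says "something like", the following Python sketch is a hypothetical reconstruction of that formula and is not guaranteed to reproduce the hash in the example above. It shows why mixing the movie ID into the hash makes the value usable system-wide: the same group number in two different movies produces two different hashes. The movie ID 12345 is an arbitrary illustrative value.

```python
import hashlib

def sub_ts_group_hash(sub_ts_group, id_movie_imdb):
    """Hypothetical SubTSGroupHash: md5 hex digest of "<group>|<IMDb ID>"."""
    key = f"{sub_ts_group}|{id_movie_imdb}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()

# Group 2 of two different movies gets two different, 32-character hashes.
print(sub_ts_group_hash(2, 49778))
print(sub_ts_group_hash(2, 12345))
```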
This could be used in many different areas (if it proves to work well).
For example, you can already call XML-RPC SearchSubtitles() with this request:
'imdbid' => '49778'
http://pastebin.com/kmGz5Zrb
There you can see SubTSGroup followed by a number. So why not download a couple of subtitles and check whether the subtitle timestamps are similar within each group (and different from those in other groups)?
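For the scenario described at the start (skip subtitles that will be out of sync anyway), a client could use the new fields roughly like this Python sketch. It assumes SearchSubtitles() rows are dicts carrying the IDSubtitleFile and SubTSGroupHash fields shown above; the function name and the sample rows are hypothetical. It keeps one subtitle per timestamp group and drops every group already known to be out of sync.

```python
def download_candidates(results, out_of_sync_groups=()):
    """Return IDSubtitleFile values worth trying, at most one per timestamp group.

    results: SearchSubtitles() rows as dicts with 'IDSubtitleFile' and 'SubTSGroupHash'.
    out_of_sync_groups: SubTSGroupHash values of subtitles already found to be out of sync.
    """
    skip = set(out_of_sync_groups)
    seen = set()
    candidates = []
    for row in results:
        group = row.get("SubTSGroupHash")
        if not group:
            # No grouping data yet (database still populating, or not an SRT file),
            # so nothing can be ruled out: keep it.
            candidates.append(row["IDSubtitleFile"])
            continue
        if group in skip or group in seen:
            continue  # same timing as a subtitle already tried or rejected
        seen.add(group)
        candidates.append(row["IDSubtitleFile"])
    return candidates

rows = [
    {"IDSubtitleFile": "101", "SubTSGroupHash": "aaa"},
    {"IDSubtitleFile": "102", "SubTSGroupHash": "aaa"},
    {"IDSubtitleFile": "103", "SubTSGroupHash": "bbb"},
    {"IDSubtitleFile": "104", "SubTSGroupHash": ""},
]
print(download_candidates(rows))                            # ['101', '103', '104']
print(download_candidates(rows, out_of_sync_groups={"aaa"}))  # ['103', '104']
```

If the first download (101) turns out to be out of sync, passing its group hash back in rules out 102 as well, which is exactly the saving the grouping is meant to provide.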
For now this works only for SRT files; support for other formats will be added later.
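Grouping by timestamps starts with reading the cue times out of the file. For SRT this is simple, because every cue has a timing line such as "00:01:02,345 --> 00:01:04,000". A minimal Python sketch that extracts cue start times in seconds follows; the function and pattern names are hypothetical, and accepting "." as well as "," before the milliseconds is an assumption to tolerate slightly malformed files.

```python
import re

# Matches the start time on an SRT timing line, e.g. "00:01:02,345 --> 00:01:04,000".
SRT_START = re.compile(r"^\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->", re.MULTILINE)

def srt_start_times(srt_text):
    """Return the start time of every cue in an SRT file, in seconds."""
    times = []
    for h, m, s, ms in SRT_START.findall(srt_text):
        times.append(int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000)
    return times

sample = """1
00:00:01,250 --> 00:00:03,000
Hello.

2
00:01:02,500 --> 00:01:04,000
Goodbye.
"""
print(srt_start_times(sample))  # [1.25, 62.5]
```

Start times like these are the kind of input a timestamp heatmap would be built from; other formats would only need their own parser producing the same list.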