
Hashes change due to advertisements

Posted: Thu Mar 30, 2017 5:43 pm
by amaarten
Hey,

For some time now, opensubtitles.org has been adding advertisements to subtitles.
This changes the hash of the subtitles.
My tool relies on these hashes to match downloaded subtitles and subtitles on the server.
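
To illustrate the problem, here is a minimal Python sketch. It assumes the hash is a plain MD5 of the raw subtitle bytes (the 32-character hex value used in this thread suggests that, but it is an assumption), and the subtitle text and advertisement cue are made up for the example.

```python
import hashlib


def subtitle_hash(data: bytes) -> str:
    """Return the hex MD5 digest of the raw subtitle bytes (assumed hash scheme)."""
    return hashlib.md5(data).hexdigest()


# Made-up subtitle content, for illustration only.
original = b"1\n00:00:01,000 --> 00:00:02,000\nHello\n"

# The same file with a hypothetical advertisement cue appended.
with_ad = original + b"\n2\n00:00:03,000 --> 00:00:04,000\nSome advertisement\n"

# Any change to the bytes yields a different digest, so the lookup by hash fails.
hashes_match = subtitle_hash(original) == subtitle_hash(with_ad)
print(hashes_match)
```
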

My question(s) are:
  • Is there another way to match subtitles?
  • I would propose to add a parsable comment in the header of the downloaded subtitles containing the original hash.
    Adding something like:

    Code:

    ;;ORIGINALHASH=d41d8cd98f00b204e9800998ecf8427e
Any ideas?

Many thanks!

Re: Hashes change due to advertisements

Posted: Tue Apr 04, 2017 4:26 pm
by oss
Hi

We can add the original subtitle hash in an HTTP header. Would that help you?

Re: Hashes change due to advertisements

Posted: Tue Apr 04, 2017 11:16 pm
by amaarten
When a subtitle is downloaded, the original hash (i.e. without ads) is already known through your XML-RPC interface, so your suggestion wouldn't help.
My question is about the hash of an already downloaded file, with ads.
The hash of that file will not match any file on the server, even more so if the advert changes.

That's why I suggest adding the hash of the subtitle (= the index in your database) as text in the subtitle itself.
One option is to prepend something computer-parsable, like ;;ORIGINALHASH=d41d8cd98f00b204e9800998ecf8427e=ORIGINALHASH
A problem with this approach is that it might confuse some video players.
VLC 3.0 on Linux does play srt files with the above line prepended.

Another possibility is to put this computer parsable hash at the end of the file at a time offset at or beyond the end of the movie. This way, the subtitle still has a valid syntax.

Re: Hashes change due to advertisements

Posted: Wed Apr 05, 2017 1:09 am
by oss
We keep track of all subtitle hashes, so each one is linked to the original file.

When you download a subtitle, your app already knows the original subtitle hash from the search results, so you can store it alongside the subtitle file and compare against it later. That will work.

We will not add anything more to the subtitles themselves, because it could break compatibility and lead to bigger problems.

Adding metadata at 0:00:00 -> 0:00:00 might also not be valid, I think.

Re: Hashes change due to advertisements

Posted: Wed Apr 05, 2017 8:02 am
by amaarten
For a recently downloaded file you can keep track of the relation. Sure.

But when we scan a new directory with videos and subtitles (including ads), the mapping will not work.

Yeah, adding `0:00:00 -> 0:00:00` might not be valid.
That's why I suggest adding something like

Code:

09:59:58,000 -> 09:59:59,000 ORIGINALHASH=d41d8cd98f00b204e9800998ecf8427e
at the end of the file.

Video players will never see that line since it is beyond the length of the movie.
That way programs can simply apply a regex to the last chunk of a file.
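
A rough Python sketch of that regex idea, assuming the marker looks exactly like the line proposed above. The `ORIGINALHASH=` marker is only a proposal in this thread, not something opensubtitles.org emits, and the function names and the 512-byte tail size are made up for illustration.

```python
import os
import re
from typing import Optional

# Hypothetical marker from the proposal above: ORIGINALHASH= followed by 32 hex digits.
HASH_RE = re.compile(r"ORIGINALHASH=([0-9a-f]{32})")


def extract_original_hash(text: str) -> Optional[str]:
    """Return the last ORIGINALHASH value found in the text, or None."""
    matches = HASH_RE.findall(text)
    return matches[-1] if matches else None


def original_hash_from_file(path: str, tail_bytes: int = 512) -> Optional[str]:
    """Read only the last chunk of the file and search it for the marker."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))
        # Ignore decoding errors: the chunk may start in the middle of a character.
        tail = f.read().decode("utf-8", errors="ignore")
    return extract_original_hash(tail)
```

Calling `original_hash_from_file("movie.srt")` would then return the embedded hash, or `None` for files without the marker.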

Re: Hashes change due to advertisements

Posted: Thu Apr 06, 2017 4:32 am
by oss
Hi

Did you try using http://trac.opensubtitles.org/projects/ ... eckSubHash

Maybe this would help you.

Adding the original hash to the subtitle contents creates more subtitle hashes, which I want to avoid (and there are not only SRT files out there...).

Another thing: in your app you could keep track of the subtitles you have already downloaded (so you got the original hash from the search results), and then ignore those...
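
The bookkeeping described above could look roughly like this in Python. It is only a sketch: the JSON index file and the function names are hypothetical, and it assumes the hash is an MD5 of the file contents. The index maps the hash of the file as saved on disk (with ads) to the original hash from the search results.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def file_md5(path: Path) -> str:
    """Hex MD5 of the file as it is stored on disk (ads included)."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def remember_download(index_path: Path, subtitle_path: Path, original_hash: str) -> None:
    """Record that this local file corresponds to the given original hash."""
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    index[file_md5(subtitle_path)] = original_hash
    index_path.write_text(json.dumps(index, indent=2))


def lookup_original_hash(index_path: Path, subtitle_path: Path) -> Optional[str]:
    """Return the original hash for a previously recorded file, or None."""
    if not index_path.exists():
        return None
    index = json.loads(index_path.read_text())
    return index.get(file_md5(subtitle_path))
```

As noted earlier in the thread, this only covers subtitles the application downloaded itself; files found while scanning a new directory will not be in the index.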