eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

SubHash not matching gzip'd sub

Tue Sep 07, 2010 11:47 pm

Hello.

Today I found an error in the XML-RPC interface when searching for subtitles.

In the Sub dataset there's a SubHash (the md5sum of the subtitle), plus both GZIP and ZIP versions of the file to download.

SolEol has always downloaded the GZIP version because it's the smallest and easiest to manage. ZIP files contain a directory and an NFO file, which makes it harder to put files in their places.

As of some time ago, the SubHash no longer matches the GZIP subtitle, but it does match the subtitle inside the ZIP file. I believe this is incorrect.

For example:
subtitle 3538948
URL: http://www.opensubtitles.org/en/downloa ... 2161815.gz
Reported Subhash: b7a82a41c90693e49493527d14f9728c
Real Subhash: a08873deb9cd1137dca7760e6b7265ff
Reported Size: 25221
Real Size: 25303

The ZIP version http://www.opensubtitles.org/en/download/sub/3538948 matches both in SubHash and in size.
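
For anyone who wants to reproduce this kind of check, here is a minimal sketch in Python. It decompresses a gzip payload in memory and compares its md5 and size with the reported values. The function name `check_gzip_payload` and the sample data are made up for illustration; they are not part of the OpenSubtitles API or of SolEol.

```python
import gzip
import hashlib


def check_gzip_payload(gz_bytes, reported_hash, reported_size):
    """Decompress a gzip payload and compare it with the reported metadata."""
    data = gzip.decompress(gz_bytes)
    real_hash = hashlib.md5(data).hexdigest()
    return {
        "real_hash": real_hash,
        "real_size": len(data),
        "hash_matches": real_hash == reported_hash.lower(),
        "size_matches": len(data) == reported_size,
    }


# Made-up stand-in data: an "original" subtitle and a copy with one extra
# line added at the start and at the end, as seen in the GZ version.
original = b"1\n00:00:01,000 --> 00:00:02,000\nHello\n"
modified = b"extra line\n" + original + b"extra line\n"

result = check_gzip_payload(
    gzip.compress(modified),
    reported_hash=hashlib.md5(original).hexdigest(),
    reported_size=len(original),
)
print(result["hash_matches"], result["size_matches"])  # prints: False False
```

With a real download you would pass the bytes of the .gz file and the SubHash and size from the search response instead of the sample data.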

Comparing the two makes it clear the subtitles are different: the GZ version has an extra line at the beginning (subtitle line 2) and an extra line at the end (subtitle 326), so the md5sums can't match.

This seems to happen for all subtitles.

What solution is there for this? I can only see one of the following:

-Discontinue Gzip support (I'd rather this didn't happen)
-Match the zipped subtitle to the gzipped subtitle (or vice versa)

In the meantime SolEol (and, I assume, other clients like it) is re-downloading subtitles when it shouldn't, adding to the load on the servers unnecessarily.
http://eduo.info/
[url=http://eduo.info/soleol/]OpenSubtitles from your desktop: SolEol for Mac/Windows/Linux[/url]
[url=http://forums.plexapp.com/index.php?showtopic=325&st=0&p=2480&#entry2480]My current episode processing work flow[/url].

oss
Site Admin
Posts: 5890
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Re: SubHash not matching gzip'd sub

Wed Sep 08, 2010 8:44 am

eduo,

Thanks for the post. I have corrected this, temporarily. It was caused by a test of mine: for sub and srt subtitles I was putting a signature at the top and bottom.

- Do you have to check the md5 while downloading?
- Isn't it enough to try to decompress the gz, and if that succeeds, everything is OK?
- For already downloaded subtitles: only here do I see a problem, because if I put something in the subtitles (signature, ad), the md5 will be different, so you cannot tell which subtitles you already have.
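
The "try to decompress the gz" check from the second question could look roughly like the sketch below. It relies on the gzip format carrying its own CRC and length, so a corrupted or truncated download fails to decompress. The helper name `is_valid_gzip` is hypothetical. Note that this only confirms the transfer was intact; it says nothing about whether the content matches the SubHash.

```python
import gzip
import zlib


def is_valid_gzip(gz_bytes):
    """Return True if the payload decompresses cleanly as gzip."""
    try:
        gzip.decompress(gz_bytes)
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad header or CRC mismatch),
        # EOFError covers truncated streams.
        return False
    return True


good = gzip.compress(b"1\n00:00:01,000 --> 00:00:02,000\nHello\n")
print(is_valid_gzip(good))        # intact payload
print(is_valid_gzip(good[:-4]))   # truncated payload
print(is_valid_gzip(b"not gzip")) # not a gzip stream at all
```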

From my side there shouldn't be any problem if a user downloads "signed" subtitles and then tries to upload the same subtitles back: I have the md5 of these "signed" subtitles in the db. Also, if he changes these subs so that the md5 no longer matches and tries to upload them as anonymous, this shouldn't be possible either.

So the only drawback I see here is with subtitles that are already downloaded: the program cannot use the md5 to check whether they are the same.

eduo

Re: SubHash not matching gzip'd sub

Wed Sep 08, 2010 10:29 pm

I'm sorry I wasn't clear enough.

The actual uncompressed gzips don't match the SubHash in the XML response to SearchSubtitles.
I don't check the md5 while downloading. I check the md5 of existing subtitles. If that md5 appears in the SearchSubtitles results then no new subtitle is downloaded. If it doesn't, the subtitle is downloaded. This saves download counters and bandwidth. It means you can drop your whole series folder and check for subs, and only the subs you don't have will be downloaded.

The only situation where I check MD5 is when I'm checking existing subtitles. But it's a very common situation as the idea is to drop your whole movies or series directory and get whatever is missing.
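
To make the check concrete, here is a rough Python sketch of the idea (not SolEol's actual code). It hashes the subtitle files already on disk and keeps only the search results whose SubHash is not among them. The function names, the choice of file extensions, and the shape of `search_results` (a list of dicts with a "SubHash" key) are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def local_subtitle_hashes(folder):
    """Return the md5 hex digests of all .srt/.sub files under folder."""
    hashes = set()
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".srt", ".sub"}:
            hashes.add(hashlib.md5(path.read_bytes()).hexdigest())
    return hashes


def subtitles_to_download(search_results, known_hashes):
    """Keep only search results whose SubHash is not already on disk."""
    return [r for r in search_results if r["SubHash"].lower() not in known_hashes]


# Assumed result shape, using the two hashes from the example above.
results = [
    {"SubHash": "b7a82a41c90693e49493527d14f9728c"},
    {"SubHash": "a08873deb9cd1137dca7760e6b7265ff"},
]
have = {"a08873deb9cd1137dca7760e6b7265ff"}
missing = subtitles_to_download(results, have)
```

If the reported SubHash never matches what ends up on disk, `missing` always contains every result, which is exactly the unnecessary re-download described above.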

Does what you fixed mean that the SubHash will match the gzipped sub, or that the subs go back to the way they were?

oss

Re: SubHash not matching gzip'd sub

Thu Sep 09, 2010 6:39 am

Hi Eduo,

I understand what the problem is; you described it well in your previous post. I am just thinking about how to resolve this issue.

When I put a signature into subtitles, of course the md5 will be different from the original subtitles. What is worse, I don't know this md5 when searching subtitles; I only know it when downloading them (after the download, to be precise).

I would not like to drop support for signed subtitles, so maybe the best idea/workaround is:
- turn signed subtitles on
- when downloading, update the md5
- so the next time these subtitles are searched for, the right md5 is returned, until a different signature is served.
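
The workaround in the list above can be sketched as a toy in-memory model in Python. This is only an illustration of the idea, not the site's actual code: the class `SubtitleStore`, its method names, and the way the signature is wrapped around the content are all assumptions.

```python
import hashlib


class SubtitleStore:
    """Toy in-memory stand-in for the subtitle database (hypothetical)."""

    def __init__(self):
        self._content = {}  # subtitle id -> original bytes
        self._hash = {}     # subtitle id -> md5 currently reported by search

    def add(self, sub_id, content):
        self._content[sub_id] = content
        self._hash[sub_id] = hashlib.md5(content).hexdigest()

    def search_hash(self, sub_id):
        """md5 that a search would report for this subtitle."""
        return self._hash[sub_id]

    def download(self, sub_id, signature=b""):
        """Serve the subtitle with a signature at top and bottom,
        then store the md5 of what was actually served."""
        served = signature + self._content[sub_id] + signature
        self._hash[sub_id] = hashlib.md5(served).hexdigest()
        return served


store = SubtitleStore()
store.add(1, b"1\n00:00:01,000 --> 00:00:02,000\nHello\n")
before = store.search_hash(1)
served = store.download(1, signature=b"signed\n")
after = store.search_hash(1)
```

In this model `after` equals the md5 of the served bytes, so a later search agrees with the signed copy on disk. A search made before the first signed download still reports the old md5, and copies downloaded under an earlier signature stop matching once a different signature is served.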

For now, zip and gzip have the same md5 (I put it back as it was before).
