eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

API returning weird results.

Sun Oct 10, 2010 3:27 pm

Hello.

Today, after quite a while, I used the API to get subs for the past two weeks of episodes. The results were VERY weird:

http://grab.by/6NaG

As you can see, it's returning all sorts of incorrect movies matched against the hash/bytesize.

I went to see the subtitle pages and they are, indeed, mismatched, for example:

http://www.opensubtitles.org/en/subtitl ... -dragon-en
http://www.opensubtitles.org/en/subtitl ... rd-kind-es
http://www.opensubtitles.org/en/subtitl ... -dragon-es
http://www.opensubtitles.org/en/subtitl ... e-neige-en

In all these cases it seems the original subtitle mapping is correct (the subtitle belongs to the right movie), but someone has uploaded a match against an incorrect hash. This means the situation affects only API users (rendering all API clients virtually useless) but doesn't affect web users.

Further, since a hash match is viral, once these incorrect matches are made, other clients propagate them to their own subtitles and subtitle names, so we get dozens of subs and sub names mismatched against the incorrect movie.

Is there any way to check whether these mismatches all belong to the same user(s)/client(s)? They may be a programming error in a player or a plugin somewhere that uploads all subs in a folder as matching a movie in that folder, even if they shouldn't match.

One way to filter all these would be to check the IMDb IDs of the subs and see if any given sub is assigned to two different IMDb IDs (it could be assigned to different movie names and different movie hashes, obviously, but never to two different IMDb IDs). Clients can't do this themselves, though: since there's no TV support, there's no way to check the ID against the episode's ID.
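A server-side version of that IMDb-ID check could be as simple as grouping by subtitle. This is a minimal sketch, assuming the data has already been reduced to (subtitle ID, IMDb ID) pairs; the function name `find_conflicting_subs` is hypothetical and not part of the OpenSubtitles API.

```python
from collections import defaultdict


def find_conflicting_subs(matches):
    """Return {subtitle_id: [imdb_ids]} for subtitles tied to 2+ IMDb IDs.

    `matches` is an iterable of (subtitle_id, imdb_id) pairs. One subtitle may
    legitimately appear under many hashes or movie names, but it should never
    point at two different IMDb IDs.
    """
    imdb_ids_per_sub = defaultdict(set)
    for sub_id, imdb_id in matches:
        imdb_ids_per_sub[sub_id].add(imdb_id)
    # Keep only the subtitles that point at more than one title.
    return {sub_id: sorted(ids)
            for sub_id, ids in imdb_ids_per_sub.items()
            if len(ids) > 1}
```

For example, `find_conflicting_subs([("101", "1111111"), ("101", "1111111"), ("102", "2222222"), ("102", "3333333")])` returns `{"102": ["2222222", "3333333"]}` (the IDs are made up). Each flagged subtitle is a candidate for a bad hash match.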

Any ideas? It may be controllable server side, but I don't know the extent of the infection(*).

(*)"Infection" as in "it's an error that's propagating rapidly". Not because it's related to a virus, worm or anything like it.
http://eduo.info/
OpenSubtitles from your desktop: SolEol for Mac/Windows/Linux (http://eduo.info/soleol/)
My current episode processing work flow (http://forums.plexapp.com/index.php?showtopic=325&st=0&p=2480&#entry2480).

oss
Site Admin
Posts: 5890
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Re: API returning weird results.

Sun Oct 10, 2010 5:48 pm

Hi,

Thanks for the quick report. It really looks horrible. The worst part is that I am leaving for India in a few hours. I am taking my computer (I didn't want to, but what can I do...).

Anyway, do you know when this might have started?

Next, please try to find as many results as possible and send me pages like the ones you sent, so I can look up where the problem is. The best would be results sharing one hash, so I can match the date, user agent, or IP...

I have now switched the API database from slave to master; please check it. I am caching results, so please allow up to 24 hours for the cache to expire.

Also, imdb.com changed their layout; that's the cause of the problem with numbers in titles, but I hope this will be fixed soon (when the new IMDb Perl module is released).

Thanks again.

eduo

Re: API returning weird results.

Sun Oct 10, 2010 6:06 pm

This is the original XML response:

http://snipt.net/eduo/spurious-searchsubtitles-response

It contains hashes and bytesizes.
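For anyone re-checking those hashes locally: as far as I know, the moviehash is the standard OpenSubtitles hash, i.e. the file size plus the sum of the 64-bit little-endian words in the first and last 64 KiB of the file, modulo 2^64, printed as 16 hex digits. A minimal Python sketch (the function names are my own):

```python
import os
import struct

CHUNK = 64 * 1024           # bytes read from each end of the file
MASK = 0xFFFFFFFFFFFFFFFF   # keep the running sum within 64 bits


def hash_from_chunks(size, head, tail):
    """Size plus every little-endian uint64 in the first and last 64 KiB."""
    h = size
    for chunk in (head, tail):
        for (word,) in struct.iter_unpack("<Q", chunk):
            h = (h + word) & MASK
    return "%016x" % h


def movie_hash(path):
    """Return (moviehash, moviebytesize) for a local video file."""
    size = os.path.getsize(path)
    if size < 2 * CHUNK:
        raise ValueError("file is too small to hash")
    with open(path, "rb") as f:
        head = f.read(CHUNK)
        f.seek(size - CHUNK)
        tail = f.read(CHUNK)
    return hash_from_chunks(size, head, tail), size
```

Because the hash only samples the two ends of the file, a (hash, bytesize) pair is what identifies a release, which is why a single wrong association spreads to every client that looks up that pair.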

I can't tell when it happened, but it doesn't seem to have been going on for more than a few days. Episodes from a week ago don't show as many weird results.

oss

Re: API returning weird results.

Tue Oct 12, 2010 6:32 pm

Thanks eduo, this needs more investigation. I did some, but I still don't have a clue what is really happening.

What is important for the site is whether these weird hashes are still being generated (i.e. the wrong results keep increasing), or whether it was a few-day issue that is not continuing anymore.

After checking the DB, maybe it would be good to delete all "bad" hashes. I mean, for one moviehash we have several subtitles, and sometimes several IMDb IDs, so let's say there are 10 IMDb IDs together, 7 of them the same and 1, 1, 1 different: those 1, 1, 1 are wrong.
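The "7 same, 1, 1, 1 different" rule is a majority vote per moviehash. A minimal Python sketch, assuming the IMDb IDs for one hash are available as a list; the `min_share` threshold (a clear majority of more than half) is my assumption, since the rule above doesn't say what happens without one, and the function name is hypothetical.

```python
from collections import Counter


def outlier_imdb_ids(imdb_ids, min_share=0.5):
    """Return the IMDb IDs on one moviehash that disagree with the majority.

    If the most common ID holds more than `min_share` of the entries, every
    other ID is treated as a bad match. Without a clear majority nothing is
    flagged, since there is no reliable reference to compare against.
    """
    counts = Counter(imdb_ids)
    total = sum(counts.values())
    if total == 0:
        return set()
    winner, votes = counts.most_common(1)[0]
    if votes / total <= min_share:
        return set()
    return set(counts) - {winner}
```

With the example above, `outlier_imdb_ids(["A"] * 7 + ["B", "C", "D"])` flags `{"B", "C", "D"}`, while a 2-2 split flags nothing and would need a human to look at it.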

Before making any change to the database, I need to know this. If you have some moviehashes to check, please send them; it will help me a lot.

eduo

Re: API returning weird results.

Tue Oct 12, 2010 6:43 pm

Something else I noticed: in the ones I downloaded, the sub timing is all wrong after the OS spam. The first couple of subs are timed correctly, then the OS spam appears and all the sub timings after that are broken.

All the subs I've downloaded this week I've had to delete and redownload by hand from other sites.

oss

Re: API returning weird results.

Tue Oct 12, 2010 6:51 pm

This is interesting. By "OS spam" you mean the advertisement in subtitles, right?

- I set this up as experimental, and, for example, SolEol shouldn't be affected
- when you download subtitles from the website, the advertisement is not there

If you can, please mail me any subtitle that has the advertisement and is desynchronized after it, so I can investigate what's going on.

Thanks.

eduo

Re: API returning weird results.

Tue Oct 12, 2010 7:54 pm

Yeah. I meant "spam" for either the "Downloaded from OpenSubtitles" or the "Best Played with OpenSubtitles MKV player" lines. I consider any advertising in subs to be strictly spam if it happens during the main feature or in between subtitles (I'm OK with spamvertising in the final credits, although I'd rather not have it; different discussion for a different time).

The problem is that the way they've been inserted breaks one of the fundamental rules of SRT and SUB subtitling: subtitles should *never*, *ever* overlap.

3
00:00:09,920 --> 00:00:12,321
Built-up resentment,
money issues,

4
00:00:14,000 --> 00:00:19,000
Downloaded from http://www.opensubtitles.org

5
00:00:12,322 --> 00:00:14,223
Met a younger lid.

6
00:00:14,224 --> 00:00:15,491
Huh?
Mm-hmm.

3
00:00:09,920 --> 00:00:12,321
Resentimiento,
problemas de dinero,

4
00:00:14,000 --> 00:00:19,000
Descargado desde http://www.opensubtitles.org

5
00:00:12,322 --> 00:00:14,223
conocieron a una tapa más joven.

SolEol downloads the .gz version from the XML response, not the zip file, if that's any help.

I usually watch subs after converting the file and subs into an iTunes MP4. This means the SRTs get translated into XML TTXT files, and since subtitles should never, ever overlap, the converter adds the spam's duration to the rest of the subs (MP4 subtitles work differently from SRTs: instead of a "start time" and "end time" they have an "elapsed since last sub" and a "duration").
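To show why everything after the ad drifts, here is a simplified model of that kind of conversion. It is an assumption about how a converter might behave, not the code of any particular tool: each cue becomes a (gap since previous sample, duration) pair, and since a negative gap can't be stored it is clamped to zero, pushing every later cue back.

```python
def to_relative(cues):
    """(start_ms, end_ms) cues -> (gap_ms, duration_ms) samples.

    `gap_ms` is the time since the previous sample ended. A negative gap
    cannot be stored, so it is clamped to 0 and later cues drift.
    """
    samples = []
    cursor = 0  # end time of the previous sample
    for start, end in cues:
        gap = max(0, start - cursor)
        duration = end - start
        samples.append((gap, duration))
        cursor += gap + duration
    return samples


def to_absolute(samples):
    """(gap_ms, duration_ms) samples -> the (start_ms, end_ms) a player shows."""
    cues, cursor = [], 0
    for gap, duration in samples:
        start = cursor + gap
        cues.append((start, start + duration))
        cursor = start + duration
    return cues
```

Feeding it cues 3 to 6 of the English excerpt above moves cue 5 from 00:00:12,322 to 00:00:19,000, right after the ad ends, and cue 6 follows it, which matches the "timing is all bad after the spam" symptom.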

Other players do strange things too. My DVR/media center crashes; XBMC sometimes overlaps them and sometimes eats up the subs of what's actually being said, showing only the spam.

The SRT format doesn't allow for overlapping times on subs. I don't think it's good practice to create them on purpose, expecting players to support something the format doesn't allow.
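Overlaps like this are easy to catch before a file is served. A minimal Python sketch that scans SRT text and reports every cue starting before an earlier cue has finished; `find_overlaps` is a hypothetical helper, and it only looks at cue numbers and timing lines.

```python
import re

# One SRT cue header: the cue number line followed by its timing line.
CUE = re.compile(
    r"^(\d+)[ \t]*\r?\n"
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})[ \t]*-->[ \t]*"
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})",
    re.MULTILINE)


def _ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def find_overlaps(srt_text):
    """List (cue_number, latest_end_ms, start_ms) for every cue that starts
    before some earlier cue in the file has finished."""
    overlaps = []
    latest_end = None
    for m in CUE.finditer(srt_text):
        number = int(m.group(1))
        start = _ms(*m.group(2, 3, 4, 5))
        end = _ms(*m.group(6, 7, 8, 9))
        if latest_end is not None and start < latest_end:
            overlaps.append((number, latest_end, start))
        latest_end = end if latest_end is None else max(latest_end, end)
    return overlaps
```

Run on the English excerpt above, it flags cues 5 and 6, both of which start before the ad (cue 4) ends at 00:00:19,000. Tracking the latest end seen so far, rather than only the previous cue's end, is what catches cue 6 as well.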

I'm concerned, as this significantly degrades OpenSubtitles' quality. Is this going to be the setup going forward? If that's the case, then sadly I can't continue developing for OpenSubtitles. The new version of SolEol converts to iPod/iPhone/iPad, and this would mean that functionality couldn't be included.

I can think of two ways to avoid this:
1.-Search for a 5-second gap in the first or last 5 minutes. If one exists, put the spamtitle in it (you'd need to renumber the subtitles, but that's not a problem).
2.-Search for one-line subtitles in the first or last 5 minutes and tack the spamtitle on as a second line.

I think either of these would be more acceptable for everyone. If you can't do either, then add the spamtitle as an additional subtitle at the very end, 5~10 seconds after the last sub of the movie.
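Option 1 plus the end-of-file fallback could be sketched like this. Assumptions, all mine: cues are time-sorted (start_ms, end_ms, text) tuples, the end of the last subtitle stands in for the movie length, and the ad lasts 4 s with 0.5 s of padding. Option 2 (appending a second line) is left out for brevity, and `place_ad` is a hypothetical name.

```python
WINDOW_MS = 5 * 60 * 1000  # "first or last 5 minutes"
MIN_GAP_MS = 5000          # the 5-second gap from option 1
AD_MS = 4000               # assumed ad duration
MARGIN_MS = 500            # assumed padding after the preceding cue


def place_ad(cues, ad_text):
    """Return a copy of `cues` with a non-overlapping ad cue inserted.

    `cues` is a time-sorted list of (start_ms, end_ms, text). SRT numbers
    are simply list position + 1 when the file is written back out.
    """
    last_end = cues[-1][1] if cues else 0
    prev_end = 0
    for i, (start, end, _text) in enumerate(cues):
        gap_in_window = (start <= WINDOW_MS
                         or prev_end >= last_end - WINDOW_MS)
        if gap_in_window and start - prev_end >= MIN_GAP_MS:
            ad_start = prev_end + MARGIN_MS
            return cues[:i] + [(ad_start, ad_start + AD_MS, ad_text)] + cues[i:]
        prev_end = max(prev_end, end)
    # Fallback: no suitable gap, so append it 5 s after the last subtitle.
    ad_start = last_end + 5000
    return cues + [(ad_start, ad_start + AD_MS, ad_text)]
```

Since the ad needs a 5 s gap but only occupies 4.5 s of it (padding included), it always ends at least 0.5 s before the next cue starts, so nothing overlaps and no later timing has to move.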

On the other subject: I tried some other files. Things don't appear to be as bad as last time, but maybe that's because they're more recent.

This is a screenshot of results: http://grab.by/6Ptt

This is the XML with the hashes: http://snipt.net/eduo/bad-searchsubtitles-2

oss

Re: API returning weird results.

Tue Oct 12, 2010 8:11 pm

"downloaded the timing of the sub is all bad": OK, so you meant they are overlapping.

The initial (and still current) idea was to put the advertisement in subtitles as you mentioned, so that it never overlaps, but I have a glitch somewhere. I will check it, hopefully tomorrow, because I also think it isn't good.

As I said, I disabled adding the advertisement for SolEol, as for a few other user agents, but it still gets added for some reason, so I will debug that.

Thanks for the report.

eduo

Re: API returning weird results.

Tue Oct 12, 2010 8:54 pm

How weird. I'm getting them. The download is made over HTTP; I have no idea whether the link you send determines whether the ads are included. Is the URL different for the two versions? As I mentioned, I don't think ads are too bad if they are inserted in a less intrusive way. I'm assuming you have to store the subtitle hash both before and after the change.

I said "the timing is all bad" because that's the final effect. Subs after the ad end up off by four seconds or so (the duration of the ad) because, to avoid overlapping, they get shifted (on the iPhone they do get shifted, which is where I saw it and why I reported it after the third file with errors). I'm sorry for the inaccuracy. When I got home I could see the actual error in the SRT. You are right: the problem is the overlapping subs, which in turn cause problems with some players and when converted to other formats.

On another note: there was a thread in the forum about the main web page showing code instead of valid HTML for validated users, but I see you've fixed that already.

oss

Re: API returning weird results.

Wed Dec 29, 2010 9:31 am

Do you still get any advertisements in SolEol?

The link for subtitles with and without advertisement is the same; it depends on various things (VIP membership, user agent, and so on...).

eduo

Re: API returning weird results.

Fri Dec 31, 2010 1:26 pm

Hi.

I'm not getting them right now. At one point I was getting them only in the .gz version and not in the zip version, but now I'm not getting them anywhere.

For the record, it's OK if advertisements are added. The problem was that their timings overlapped other subtitles and some programs got confused.

Maybe it would make sense to standardize credits and try to find a way for everyone to agree. My proposal would be:

A credits section 5 seconds after the last subtitle in the movie: two subtitle blocks, with two lines each, displayed for 2 seconds each:

1.-
Original Sub capture and Original Sub Sync
Original Releaser Site

2.-
Translator or Translation Team and Translation Sync
Distributor Site

I know these could be changed by anyone but every credit version could be modified anyway, so trying to get a common format wouldn't be that bad.
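That layout is easy to generate mechanically. A minimal Python sketch, assuming cues are held as time-sorted (start_ms, end_ms, text) tuples; `append_credits`, `to_srt` and `fmt` are illustrative names, and the blocks are placed back to back so they never overlap each other or the movie's subs.

```python
def fmt(ms):
    """Milliseconds -> SRT timestamp HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def append_credits(cues, blocks, delay_ms=5000, block_ms=2000):
    """Append credit blocks after the last cue, one after another."""
    start = (cues[-1][1] if cues else 0) + delay_ms
    out = list(cues)
    for lines in blocks:
        out.append((start, start + block_ms, "\n".join(lines)))
        start += block_ms  # next block starts where this one ends
    return out


def to_srt(cues):
    """Serialize (start_ms, end_ms, text) cues as numbered SRT blocks."""
    return "\n".join(f"{n}\n{fmt(a)} --> {fmt(b)}\n{text}\n"
                     for n, (a, b, text) in enumerate(cues, 1))
```

Usage with the two proposed blocks: `to_srt(append_credits(cues, [["Original Sub capture and Original Sub Sync", "Original Releaser Site"], ["Translator or Translation Team and Translation Sync", "Distributor Site"]]))`.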

oss

Re: API returning weird results.

Fri Jan 14, 2011 9:18 am

eduo, I enabled the "advertisement" again. I checked the code and now it should work correctly; if you find any problems, please let me know. It shouldn't overlap. It is still disabled for SolEol; please let me know if you get it in SolEol...

Also, for credits we can use something more standardized. There is going to be a new format based on SRT, so I will check whether they have some ideas, so that we will be compatible...

In the meantime, you can post examples here in CODE blocks.

eduo

Re: API returning weird results.

Sun Jan 16, 2011 12:55 am

Hi there.

Thanks for your effort. The new routine seems to be OK. You can enable it for SolEol as well; it should work. While I don't like advertisements, current subs are full of them from other sites, so they might as well include yours. :P

For the future, my proposal was to use credits blocks, as outlined below. I don't think it's a bad proposal, but it'd need support from the subtitle makers. Whatever you decide, if properly documented, may start being adopted if some sites take a hard stance with it. I believe it's a good compromise between having credits and not polluting the subs.

How it could be done:

1.-Decide on a policy. I vote for mine, obviously :D
Two blocks of non-overlapping credit lines as subs: one for the original creators/host, another for the repository/distributor. Standard tags for each type of line (AUTHORED BY/RIPPED BY, HOSTED BY, TRANSLATED BY, DISTRIBUTED BY). This means OpenSubtitles would always have a line reserved in the second block. I would then systematically delete every credit in a subtitle that doesn't follow this convention. The credits block comes 5 seconds after the last subtitle in the movie and/or at least 20 seconds before the first subtitle in the movie. Exception: the credits block can be at the end of the feature (when the credits start rolling), even if there's a post-credit scene (problem: this can't be done programmatically :-| ).

Example: -Dexter 05x12 - The Big One.es.srt
RIPPED BY honeybuny
HOSTED BY addic7ed
---
TRANSLATED BY oscarmalo, kerensky, ilse
DISTRIBUTED BY OpenSubtitles

Optionally, a third block could have notes, comments, subtitle version (release name and edit version from wiki-based site), etc. Standard rules apply (37-40 chars per line, 4 seconds each block).

As usual, it makes no sense to put actual links in subtitles, as they're not clickable. But it's not forbidden, either.

2.-Actively advertise these rules for credits: on the homepage, blog, upload pages, and Trac wiki.

3.-Systematically remove all credits that don't conform to these rules (a lot can be done programmatically to this end) and religiously keep all credits that conform to them (except for the "DISTRIBUTED BY" one). Ask other sites in #2 above to do the same.

4.-OPTIONAL: Ask developers to calculate the md5sum hash from the sub removing these credit lines (md5sum always implies loading the whole file, so it really doesn't make it any different for the program), to avoid uploading identical subs with different credits (I know you can compare internally, but it would save time :)
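Steps 3 and 4 lend themselves to automation. A minimal Python sketch, assuming the tag list from step 1 and a 40-character line limit (the upper end of the 37-40 rule); a cue counts as credits only when every one of its lines starts with a standard tag, and cue numbers are excluded from the hash so renumbering doesn't change it. The function names are hypothetical.

```python
import hashlib

# Standard tags from step 1; 40 is the upper end of the 37-40 char rule.
CREDIT_TAGS = ("AUTHORED BY", "RIPPED BY", "HOSTED BY",
               "TRANSLATED BY", "DISTRIBUTED BY")
MAX_LINE = 40


def credit_line_problems(line):
    """Return the reasons a credit line breaks the convention (empty = OK)."""
    problems = []
    if not line.startswith(CREDIT_TAGS):
        problems.append("unknown tag")
    if len(line) > MAX_LINE:
        problems.append("longer than %d characters" % MAX_LINE)
    return problems


def content_md5(srt_text):
    """MD5 of an SRT with pure credit cues and cue numbers left out."""
    kept = []
    for block in srt_text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.split("\n")
        text = lines[2:]  # skip the cue number and the timing line
        if text and all(line.startswith(CREDIT_TAGS) for line in text):
            continue  # a credits-only cue does not count as content
        kept.append("\n".join(lines[1:]))  # timing + text, not the number
    return hashlib.md5("\n\n".join(kept).encode("utf-8")).hexdigest()
```

With this, two uploads that differ only in their credit blocks produce the same `content_md5`, and `credit_line_problems("Downloaded from http://www.opensubtitles.org")` reports both an unknown tag and an over-long line.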

As for the new format for subs, I believe it's a fantastic idea. I assume you'll settle on SRT as the format (others are better, but unmanageable).

For this you could standardize subtitle formats as well:
-Remove all tags from subtitles. HTML or Bracketed.
-Use only Unicode (UTF-8)
-Standard hearing-impaired symbols: #music# for music, *sound* for sounds, [words] for remote speech (telephone, P.A.), (whisper) for whispers. This can't be enforced, but can be mentioned in the ratings page.
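The first two rules can be applied automatically on upload. A minimal Python sketch, with two assumptions of mine: "bracketed" tags are read as curly-brace tags such as {\an8} (square brackets stay, since they are proposed above for remote speech), and non-UTF-8 files are guessed to be Windows-1252, where a real tool would need proper encoding detection. `normalize_subtitle` is a hypothetical name.

```python
import re

# HTML-style tags (<i>, </font>, ...) and curly-brace tags ({\an8}, {y:i}).
TAG = re.compile(r"</?[A-Za-z][^>]*>|\{[^}]*\}")


def normalize_subtitle(raw, fallback="cp1252"):
    """Decode subtitle bytes, strip formatting tags, unify line endings.

    UTF-8 (with or without BOM) is tried first; anything else is assumed
    to be in `fallback`, which is only a guess. Write the result back
    with .encode("utf-8").
    """
    try:
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        text = raw.decode(fallback, errors="replace")
    text = text.replace("\r\n", "\n")
    return TAG.sub("", text)
```

The tag pattern requires a letter right after `<`, so SRT timing lines (`-->`) pass through untouched.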

oss

Re: API returning weird results.

Tue Jan 18, 2011 5:53 am

Hi Eduo,

I enabled it for SolEol, so you should receive signatures there. Please let me know if everything is OK (timing, pauses...).

For credits, it is a nice idea. I would really like to have this on a website that is a wiki, so the credits get there automatically: when 3 users are translating a subtitle, they will be listed as translators in the right order (possibly with percentages). The problem is that opensubtitles.org doesn't allow wiki editing for now.

The next problem I see is multiple languages: when I am watching Czech subtitles and I see "ripped", "translated", "hosted"... those are just English words, which shouldn't be there.

So maybe it should really be like metadata, put between "[]", so that players in the future will support (and translate) it into the right language. Or just put translated sentences there, but again, we need to know that these are metadata.

The MD5 should be calculated without the metadata, and optionally stored in the metadata itself: [MD5: ...]
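A minimal Python sketch of that idea, assuming (my choice) that a metadata line is a whole line of the form `[KEY: value]` with an upper-case key, so ordinary bracketed dialogue like `[on phone]` isn't mistaken for it. The hash covers only the non-metadata lines, so adding or editing credits never invalidates it. All names are hypothetical.

```python
import hashlib
import re

# Assumed metadata syntax: a whole line like "[KEY: value]" with an
# upper-case key, so ordinary bracketed dialogue is not mistaken for it.
META = re.compile(r"^\[([A-Z][A-Z0-9 _-]*): (.*)\]$")


def split_metadata(text):
    """Separate metadata lines from the subtitle body."""
    meta, body = {}, []
    for line in text.splitlines():
        m = META.match(line.strip())
        if m:
            meta[m.group(1)] = m.group(2)
        else:
            body.append(line)
    return meta, "\n".join(body)


def body_md5(text):
    return hashlib.md5(split_metadata(text)[1].encode("utf-8")).hexdigest()


def with_md5(text):
    """Append (or refresh) an [MD5: ...] line computed without metadata."""
    kept = [line for line in text.splitlines()
            if not line.strip().startswith("[MD5:")]
    return "\n".join(kept + ["[MD5: %s]" % body_md5(text)])


def md5_is_valid(text):
    return split_metadata(text)[0].get("MD5") == body_md5(text)
```

`with_md5` is idempotent, and `md5_is_valid` stays true after extra metadata lines are added but fails as soon as a subtitle line itself changes.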

Anyway, this would be a nice future; for now, let's just stick with what we've got.

For the new subtitle format, google for WebVTT and HTML5. It was compatible with SRT, but then they changed it (google for WebSRT); it is a real pity that it is not compatible with SRT anymore...

sethyx
Posts: 1
Joined: Sat Feb 05, 2011 7:00 pm

Re: API returning weird results.

Sat Feb 05, 2011 7:12 pm

Hi,

my post is not related to the previous entries, but the title reflects my problem as well.

My program keeps getting a boolean (0) reply when searching for a specific subtitle, so I tried the search manually via the XML-RPC debugger, and the result is a bit weird.

If I search with the sublanguageid "all", the above error happens, but if it is "eng", the search works as it should.

Below are 2 screenshots from the debugger; the only thing I changed is the sublanguageid.

http://files.sethyx.info/xmlrpc-all.png
http://files.sethyx.info/xmlrpc-eng.png

moviehash: ca7e5e2a10036fa5
moviebytesize: 368252117

(BTW, it is a Supernatural episode, S06E12.)

Any suggestions why the API doesn't return search results if "all" is specified as langid?
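Whatever the server-side cause, a client can at least treat that boolean reply as "no results" instead of failing. A minimal Python sketch using the standard `xmlrpc.client` module; the endpoint URL and the helper names are assumptions, and `token` is the session token obtained when logging in to the API.

```python
import xmlrpc.client

# Endpoint of the XML-RPC API as I remember it; treat it as an assumption.
API_URL = "http://api.opensubtitles.org/xml-rpc"


def make_proxy(url=API_URL):
    # Creating the proxy does not contact the server yet.
    return xmlrpc.client.ServerProxy(url)


def search_by_hash(proxy, token, moviehash, bytesize, languages="all"):
    """Call SearchSubtitles and always hand back a list of result dicts.

    When nothing matches, `data` comes back as a boolean (as in the reply
    described above) instead of an array; that case is mapped to [] here.
    """
    query = {
        "sublanguageid": languages,
        "moviehash": moviehash,
        # XML-RPC integers are 32-bit, so sizes over 2 GiB must go as text.
        "moviebytesize": str(bytesize),
    }
    response = proxy.SearchSubtitles(token, [query])
    return response.get("data") or []
```

`search_by_hash(make_proxy(), token, "ca7e5e2a10036fa5", 368252117)` mirrors the failing call, and adding `languages="eng"` mirrors the working one.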

Thx.
sethyx
