eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

Problem with "Detect Language" method

Thu Jul 24, 2008 2:13 am

Hello.

I am having problems with the DetectLanguage method. I suspect my Gzip + Base64 encoding is not right, but I have no way to know. When I try decoding it myself, it works.

This is my test file:

http://eduo.info/hdp/DetectLanguage-out-TEST.xml

It contains the full base64 gzipped contents. Every file I throw at the method comes up as "jpn".

I need to know what OpenSubtitles is seeing, i.e. what the decoded contents of the files look like on the server side. Otherwise it will be impossible to know what's going on (text encodings, endianness, etc.).

Please reply. Lately all forum posts are ignored.
http://eduo.info/
[url=http://eduo.info/soleol/]OpenSubtitles from your desktop: SolEol for Mac/Windows/Linux[/url]
[url=http://forums.plexapp.com/index.php?showtopic=325&st=0&p=2480&#entry2480]My current episode processing work flow[/url].

oss
Site Admin
Posts: 5890
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Thu Jul 24, 2008 8:25 am

Sorry for the late replies, I have a really bad connection here :(

OK, I looked at your problem. I cannot gzdecode the base64-decoded data. What do you use for gzipping? Anyway, I prepared a test file for you, so launch:

http://web1.opensubtitles.org/addons/xml-rpc-demo.php

You can pass your data as a GET/POST parameter (named "string", as the snippet shows). This snippet tells you more:

Code:

function DetectLanguage($host, $uri, $options) {
    // Use the caller-supplied string if present, otherwise a built-in English sample.
    $string = isset($_REQUEST['string'])
        ? $_REQUEST['string']
        : base64_encode(gzcompress('this is just small test, should english be detected, you understand me'));

    // Print the input and the result of each decoding stage.
    echo "Input string: $string\n";
    echo "base64_decode: " . base64_decode($string) . "\n";
    echo "gzuncompress: " . gzuncompress(base64_decode($string)) . "\n";

    // Send the still-encoded string to the DetectLanguage method.
    return xu_rpc_http_concise(
        array(
            'method'  => "DetectLanguage",
            'args'    => array("token", array($string)),
            'host'    => $host,
            'uri'     => $uri,
            'port'    => 80,
            'options' => $options
        )
    );
}

Of course this is only for you, for testing, so let me know if you find an issue.

eduo

Thu Jul 24, 2008 9:55 am

Fantastic. Thanks a lot. This should help troubleshooting. It's obvious I'm messing up the transfer somehow.

I'll re-test as soon as I get home and let you know. If I understand the snippet you've posted, I should be able to pass in the base64 string and get back the result of base64-decoding it, followed by the result of decompressing that (if it is valid compressed data).
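
For anyone comparing their own routines against the demo, here is a minimal client-side sketch of the same round trip in Python. It assumes the payload is zlib-compressed and then base64-encoded, which is what PHP's gzcompress/gzuncompress pair in the snippet works with, and it assumes the text is UTF-8 (the thread does not say which text encoding the server expects). The function names are made up for illustration.

```python
import base64
import zlib


def encode_payload(text: str, encoding: str = "utf-8") -> str:
    """Compress with zlib (the format PHP's gzcompress emits), then base64-encode."""
    raw = text.encode(encoding)  # assumption: UTF-8 unless told otherwise
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_payload(payload: str, encoding: str = "utf-8") -> str:
    """Reverse the two stages the demo page prints: base64 decode, then decompress."""
    compressed = base64.b64decode(payload)
    return zlib.decompress(compressed).decode(encoding)


sample = "this is just small test, should english be detected, you understand me"
payload = encode_payload(sample)
print(payload)                  # what would be passed as the "string" parameter
print(decode_payload(payload))  # should print the sample text again
```

If decode_payload raises zlib.error on a string produced by your own routines, the compressed container is probably not the one the demo expects.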

This is nice, I can compare my routines with yours. I personally believe I probably screwed up the endianness of the binary files. I'll verify.

Thanks a bunch.

eduo

Thu Jul 24, 2008 9:59 am

Two quick manual checks show that "Japanese" and "Middlefrisian" are very common when analyzing incorrect strings. I was getting "Japanese" and another guy in the forums was getting "Middlefrisian", so this points at an error on our end and our routines.
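
One possible cause of such "incorrect strings" (an assumption, the thread never pins down the actual bug) is sending a different compressed container than the server decodes. gzip files start with the bytes 1f 8b, while zlib streams start with a two-byte header whose low nibble is 8 and whose 16-bit value is a multiple of 31. A rough sketch that tells them apart, with a hypothetical function name:

```python
import base64
import gzip
import zlib


def identify_container(payload_b64: str) -> str:
    """Guess the compressed container of a base64 payload from its first bytes."""
    data = base64.b64decode(payload_b64)
    if data[:2] == b"\x1f\x8b":
        return "gzip"  # RFC 1952 magic number
    if (
        len(data) >= 2
        and (data[0] & 0x0F) == 8                  # compression method: deflate
        and (data[0] >> 4) <= 7                    # window size field in valid range
        and ((data[0] << 8) | data[1]) % 31 == 0   # zlib header check bits
    ):
        return "zlib"  # RFC 1950 header, what gzuncompress expects
    return "unknown"


text = b"plain text"
print(identify_container(base64.b64encode(zlib.compress(text)).decode("ascii")))
print(identify_container(base64.b64encode(gzip.compress(text)).decode("ascii")))
print(identify_container(base64.b64encode(text).decode("ascii")))
```

This only inspects the header, so a "zlib" answer does not prove the rest of the stream is intact.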

I don't have access to my code here, so I'll check it back when I get home in a few hours.

A good thing about the DetectLanguage method is that it works as a testing ground for the UploadSubtitle method. Since the base64/gzip routines are a place where there might be problems, it helps a lot as it doesn't really upload anything and thus can't screw up OpenSubtitles' database.

Are you doing any special filtering in DetectLanguage? How does it ignore the subtitle codes and focus only on the text itself?

oss

Thu Jul 24, 2008 7:23 pm

Just report any problems you find. I use standard textcat; just have a look at PHP textcat, the link is here: http://trac.opensubtitles.org/projects/ ... ctLanguage

As for special filtering, I don't use any, because I believe numbers are not compared at all, and special chars like "'!@#$%^&*()" and so on are not considered either, but maybe I am wrong :)
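
To illustrate why digits and punctuation may not matter, here is a small sketch of the n-gram ranking approach (Cavnar and Trenkle) that textcat-style detectors are based on. It is not the OpenSubtitles or PHP textcat code; the function names, the n-gram length and the profile size are assumptions chosen for the example.

```python
import re
from collections import Counter


def ngram_profile(text: str, max_n: int = 3, top: int = 300) -> list:
    """Return the most frequent character n-grams, most frequent first."""
    # Tokens are runs of letters (with inner apostrophes); digits and
    # punctuation only act as separators and never reach the profile.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    counts = Counter()
    for token in tokens:
        padded = f"_{token}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_profile: list, lang_profile: list) -> int:
    """Sum of rank differences; n-grams missing from lang_profile get a maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else penalty
        for i, gram in enumerate(doc_profile)
    )


cue = "1\n00:00:01,000 --> 00:00:02,000\nHello!"
print(ngram_profile(cue) == ngram_profile("hello"))
```

Under this tokenization an SRT cue number and timecode contribute nothing, so the profile is the same as for the bare dialogue text. The language whose stored profile gives the smallest out_of_place distance would be the detected one.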

eduo

Mon Jul 28, 2008 10:03 am

I've finally made this work (it's taken me a while, considering I've only been able to put in 10 to 20 minutes a day), and the output now seems to match what it should be.

The DetectLanguage functionality is interesting in that it is a test ground for the format the subtitles are uploaded in. It helps getting the gz/base64 right before uploading anything that might pollute the database.
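
As a rough illustration of that dry run, the sketch below builds (but does not send) an XML-RPC request shaped like the call in oss's snippet: a token followed by an array of encoded strings. The "token" placeholder is copied from that snippet, the UTF-8 encoding and the helper name are assumptions, and nothing here contacts the server.

```python
import base64
import xmlrpc.client
import zlib


def build_detect_language_request(texts, token="token"):
    """Serialize a DetectLanguage call: (token, [base64(zlib(text)), ...])."""
    payloads = [
        base64.b64encode(zlib.compress(t.encode("utf-8"))).decode("ascii")
        for t in texts
    ]
    return xmlrpc.client.dumps((token, payloads), methodname="DetectLanguage")


request_xml = build_detect_language_request(["this is just small test"])
print(request_xml)

# Parse the XML back to confirm the payload survives serialization.
params, method = xmlrpc.client.loads(request_xml)
print(method, zlib.decompress(base64.b64decode(params[1][0])).decode("utf-8"))
```

Inspecting the generated XML before sending it is a cheap way to confirm the base64 text is not being re-encoded or truncated on the way out.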

Two questions:
-Will the "unknown" language be supported in UploadSubtitles?


-Is there an XMLRPC way of uploading unmatched subtitles? (I have dozens of unmatched subtitles and I guess users who start using the programs might be in the same situation).

oss

Thu Jul 31, 2008 5:37 am

-Will the "unknown" language be supported in UploadSubtitles?

-Is there an XMLRPC way of uploading unmatched subtitles? (I have dozens of unmatched subtitles and I guess users who start using the programs might be in the same situation).

The unknown language should be supported, but I don't know if it is a good idea.

Unmatched subtitles: you mean subtitles with no movie file. XMLRPC was made for uploading matched subtitles, and because of that I am trying to avoid adding fulltext searching there, but it seems that in some cases it would be good. I have to think about uploading unmatched subs, maybe later, there are more important things to do :)

eduo

Thu Jul 31, 2008 9:13 am

For me the point of the "unknown" language is that I will believe the DetectLanguage results, whatever they say. It would save a communication step anyway.

Unmatched subtitles are, indeed, subtitles without a movie file. I guess they are being uploaded manually at the moment. It would be interesting to be able to upload them through the XMLRPC interface. These would be matched to an IMDB ID, just not to a movie file.

Fulltext searching can be done if people want to anyway, the web pages can be used in the simplexml mode for that.
