gagaman
Posts: 2
Joined: Wed Nov 07, 2007 12:03 am

Subtitles + Unicode

Wed Nov 07, 2007 12:07 am

Hi, I was wondering if there is a way to upload and download subtitles in Unicode (I've got some Icelandic and Swedish subtitles). Neither Oscar nor Subdownloader lets me. Thanks in advance.

oss
Site Admin
Posts: 5889
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Wed Nov 07, 2007 10:55 am

Make sure those subtitles are UTF-8 encoded (not UTF-16 or UTF-32). If you still cannot upload them, try uploading them via the web page. If that does not work either, send me the subtitles and I will take a look at them.

gagaman
Posts: 2
Joined: Wed Nov 07, 2007 12:03 am

Sun Nov 11, 2007 6:25 am

Hi, thanks for the answer, but now I've run into another problem: there's only one video player that can decode UTF-8 (that I know of) and it's VideoLAN. Most of the people who are going to download these subtitles won't be able to read them. Should I upload them anyway?

oss
Site Admin
Posts: 5889
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Tue Nov 13, 2007 12:57 am

Nope, use a standard code page then, not UTF-8. We are using cp1250 for Slovak/Czech subtitles, so if you use some other code page, that's OK... maybe later I can code the download so that it serves UTF-8 :)

TZOTZIOY
Posts: 25
Joined: Mon Dec 18, 2006 10:26 am
Location: Athens (the original one)
Contact: ICQ Website

UTF-8 should be forced down our throats like milk&cereal

Thu Nov 15, 2007 1:39 am

gagaman wrote: "there's only one video player that can decode UTF-8 (that I know of) and it's VideoLAN."

VSFilter.dll (VobSub, the Win2k/XP version) displays UTF-8 encoded subtitles fine. Also, at least the versions of ffdshow that I have seen have no trouble with UTF-8. These are player-independent, so use VobSub or ffdshow and all (Windows) players that use DirectShow filters will work correctly.

I upload all my subtitles in UTF-8 encoding.
--
Just an earthbound misfit, I.

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

Tue Feb 19, 2008 9:28 pm

All current major players support UTF-8, even on a platform as crappy as Windows still is (UTF-8 support has been kludged into Windows, and that's where it may give trouble).

But I really believe everything should be moved to UTF-8. I convert all my subtitles (and all text files) to UTF-8 using iconv and I couldn't be happier, as they work EVERYWHERE.
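For readers without iconv at hand, the same kind of conversion can be sketched in Python. This is only an illustration, not the poster's actual workflow: the helper names `to_utf8` and `convert_file` are made up here, and the source code page has to be known in advance (roughly what `iconv -f CP1252 -t UTF-8` assumes).

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode subtitle bytes from a known legacy code page to UTF-8."""
    # Strict decoding raises UnicodeDecodeError when the bytes are not valid
    # in the stated code page. It cannot catch every wrong guess, because
    # most single-byte code pages accept almost any byte sequence.
    text = data.decode(source_encoding)
    return text.encode("utf-8")


def convert_file(src_path: str, dst_path: str, source_encoding: str) -> None:
    """File-level wrapper: read raw bytes, convert, write UTF-8 bytes."""
    with open(src_path, "rb") as src:
        converted = to_utf8(src.read(), source_encoding)
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

For example, `convert_file("movie.is.srt", "movie.is.utf8.srt", "cp1252")` would convert an Icelandic subtitle file saved as cp1252; both file names are placeholders.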

oss
Site Admin
Posts: 5889
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Thu Apr 17, 2008 2:20 pm

For downloading: it is sometimes hard to detect the code page of the source subtitles, so converting them to UTF-8 would be tricky. We would also have a problem detecting identical subtitles, because the MD5 of a Unicode file will not be the same as the MD5 of, for example, a cp1251 file...
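The hashing concern is easy to reproduce. The following is a small sketch, with an arbitrary Cyrillic sample string chosen here only because cp1251 can represent it; it shows that the same text gives different MD5 values once it is stored in two encodings.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of raw file bytes."""
    return hashlib.md5(data).hexdigest()


# Illustrative sample text (an assumption, not taken from any real subtitle).
sample = "Привет, мир"

as_cp1251 = sample.encode("cp1251")  # one byte per Cyrillic letter
as_utf8 = sample.encode("utf-8")     # two bytes per Cyrillic letter

# The text is identical, but the bytes (and therefore the hashes) are not.
same_text = as_cp1251.decode("cp1251") == as_utf8.decode("utf-8")
same_hash = md5_hex(as_cp1251) == md5_hex(as_utf8)
print(same_text, same_hash)
```

A duplicate check that compares only file hashes would therefore treat a converted copy as a new subtitle.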

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

Thu Apr 17, 2008 3:14 pm

Ah, I can see your point. The last argument is actually pretty solid. Programs wouldn't report correct hash checks at the beginning.

It's a shame, but I can see it's probably unsolvable unless multiple hashes are stored for non-UTF-8 files.
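One way to picture the "multiple hashes" idea is sketched below. It is purely hypothetical and not how the site works: the function names `encoding_hashes` and `is_known` are invented for this illustration, and it ignores other things that change a hash, such as a byte order mark or different line endings.

```python
import hashlib
from typing import Dict, Iterable


def encoding_hashes(text: str, encodings: Iterable[str]) -> Dict[str, str]:
    """Map each encoding that can represent `text` to the MD5 of its bytes."""
    hashes: Dict[str, str] = {}
    for enc in encodings:
        try:
            data = text.encode(enc)
        except UnicodeEncodeError:
            continue  # this code page cannot represent the text; skip it
        hashes[enc] = hashlib.md5(data).hexdigest()
    return hashes


def is_known(data: bytes, stored: Dict[str, str]) -> bool:
    """True if the file's MD5 matches any of the stored per-encoding hashes."""
    return hashlib.md5(data).hexdigest() in stored.values()


# Illustrative record for one subtitle text and a few candidate encodings.
text = "Smörgåsbord"
stored = encoding_hashes(text, ["utf-8", "cp1252", "cp1251"])
```

Here cp1251 is skipped because it has no "ö" or "å", so the record holds one hash for UTF-8 and one for cp1252, and a copy of the file in either encoding would be recognised as the same subtitle.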

wahoo
Posts: 1
Joined: Thu May 15, 2008 8:56 pm

Thu May 15, 2008 9:00 pm

Hi,

Have you looked at http://chardet.feedparser.org/? It is pretty good at detecting code pages.

Another option would be to use the "UnicodeDammit" class from BeautifulSoup http://www.crummy.com/software/BeautifulSoup/, which will automatically call chardet if it's available and then convert just about anything you hand it to UTF-8.

I have been working with subtitles in lots of languages from opensubtitles, and I have to say it would be so much better if everything were in one encoding, and only Unicode can handle any language...
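A rough sketch of the detection step might look like the code below. It assumes the third-party `chardet` package and its `chardet.detect()` function, and falls back to a fixed candidate list when the package is not installed. The function name `decode_subtitle`, the fallback list, and the choice to try strict UTF-8 first are assumptions of this example, not something from the posts above.

```python
from typing import Sequence, Tuple

try:
    import chardet  # third-party, optional in this sketch
except ImportError:
    chardet = None


def decode_subtitle(
    data: bytes,
    fallbacks: Sequence[str] = ("cp1250", "cp1252"),
) -> Tuple[str, str]:
    """Return (text, encoding_used) for raw subtitle bytes."""
    # Valid UTF-8 is rarely anything else, so try it strictly first.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    candidates = []
    if chardet is not None:
        # chardet.detect returns a dict with "encoding" and "confidence";
        # "encoding" may be None when it cannot make a guess.
        guess = chardet.detect(data).get("encoding")
        if guess:
            candidates.append(guess)
    candidates.extend(fallbacks)

    for enc in candidates:
        try:
            return data.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown codec name; try the next one

    # latin-1 maps every byte to a character, so this never fails,
    # although the result may be mojibake.
    return data.decode("latin-1"), "latin-1"
```

Detection is a statistical guess, so short files or files mixing languages can still be misidentified; a result from this function is best treated as a suggestion to confirm rather than a fact.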
