ivanhoe
Posts: 38
Joined: Mon Nov 06, 2006 8:55 pm
Location: Brno

character encoding

Wed Dec 17, 2008 6:44 pm

I know most subtitles are in win-1250, but I've just downloaded two subtitle files whose encoding I can only guess. It's not UTF-8 or ISO-8859-2 (Czech subtitles: http://www.opensubtitles.org/en/subtitl ... st-fall-cs, http://www.opensubtitles.org/en/subtitl ... st-fall-cs).

Can you add a routine that shows the encoding (if that is possible)?

oss
Site Admin
Posts: 5882
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Wed Dec 17, 2008 11:27 pm

I will try to do that and show it online. Maybe iconv or some PHP function can come into play.

oss
Site Admin
Posts: 5882
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

[PHP] Detecting character set encoding

Fri Dec 19, 2008 6:39 am

Hello everybody,

Maybe one of the readers knows how to properly detect the encoding of a subtitle file in PHP (or Perl; I have not searched for it yet, but PHP is preferred). I tried a lot, but I don't know how to do it. mb_detect_encoding is not working as I expect.

Thanks for any help.

Cougar_
Posts: 19
Joined: Fri May 23, 2008 9:18 pm

Re: [PHP] Detecting character set encoding

Fri Dec 19, 2008 8:17 am

You can't; a solution that works reliably doesn't exist.
Try the charset autodetection in a web browser: simply drag and drop subtitles with a .txt extension and you will see how hopeless it is.

I think the best solution is to detect the language of the subtitles and simply assume that they are encoded using the Windows charset for that language.
I have not yet seen Polish subtitles encoded with the ISO charset used by Linux, so I think it is the same in other languages: one standard per language.
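
A minimal sketch of that "one charset per language" idea, written in Python for brevity (the same logic could be ported to PHP with iconv). The mapping table, the function name `decode_subtitle`, and the fallback to cp1252 are all assumptions made for illustration; the table is neither complete nor authoritative.

```python
from typing import Tuple

# Hypothetical table: subtitle language code -> Windows code page that is
# commonly used for that language. Extend or correct it as needed.
DEFAULT_WINDOWS_CODEPAGE = {
    "cs": "cp1250",  # Czech
    "pl": "cp1250",  # Polish
    "sk": "cp1250",  # Slovak
    "hu": "cp1250",  # Hungarian
    "ru": "cp1251",  # Russian
    "en": "cp1252",  # English
    "de": "cp1252",  # German
    "fr": "cp1252",  # French
    "el": "cp1253",  # Greek
    "tr": "cp1254",  # Turkish
}


def decode_subtitle(raw: bytes, language: str) -> Tuple[str, str]:
    """Return (decoded_text, encoding_used) for a raw subtitle file."""
    # UTF-8 is strict, so non-ASCII text in a legacy 8-bit code page
    # very rarely happens to be valid UTF-8. Try it first.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Otherwise assume the Windows code page for the language
    # (assumed fallback: cp1252 for languages missing from the table).
    encoding = DEFAULT_WINDOWS_CODEPAGE.get(language, "cp1252")
    return raw.decode(encoding, errors="replace"), encoding
```

Usage would look like `text, enc = decode_subtitle(raw_bytes, "cs")`. This does not detect anything about damaged files; it only applies the per-language assumption.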

The subtitles that ivanhoe linked to are encoded in Windows-1250, but they are simply damaged: someone wrongly set the encoding in Windows or in an editor or ..., and guess what happened ;)

oss
Site Admin
Posts: 5882
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Mon Dec 22, 2008 9:12 am

Cougar,

I know it is quite a problem. I expect there will be some slight errors, but that's OK.

Anyway, I found some regexes for that:
http://lachy.id.au/dev/2005/11/encoding ... ons-source

but it seems I will give up on this issue...
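
For reference, a regex-based check of the kind mentioned above can at least tell whether a byte string is well-formed UTF-8. The sketch below is in Python and is not necessarily what the linked page uses; the names `UTF8_PATTERN` and `looks_like_utf8` are made up for this example.

```python
import re

# Byte-level pattern for well-formed UTF-8. Overlong forms, surrogates
# and code points above U+10FFFF are rejected.
UTF8_PATTERN = re.compile(
    rb"""\A(?:
        [\x00-\x7F]                        # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # 2-byte sequence
      | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte, straight
      | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, planes 1 to 3
      | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4 to 15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, plane 16
    )*\Z""",
    re.VERBOSE,
)


def looks_like_utf8(raw: bytes) -> bool:
    """True if every byte of raw belongs to a well-formed UTF-8 sequence."""
    return UTF8_PATTERN.match(raw) is not None
```

This only answers "UTF-8 or not". It cannot separate win-1250 from ISO-8859-2, because both are single-byte encodings and the pattern has nothing to validate against.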

The last chance is calling some external program for that (Linux/FreeBSD).
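
A rough sketch of the external-program route, assuming the standard `file` utility is installed and supports `--mime-encoding`. The wrapper name `detect_with_file` is hypothetical. The guess that comes back tends to be coarse for 8-bit text, so treat it as a hint rather than an answer.

```python
import shutil
import subprocess
from typing import Optional


def detect_with_file(path: str) -> Optional[str]:
    """Ask the `file` utility for its encoding guess; None if unavailable."""
    # Assumption: `file` is on PATH (typical on Linux and FreeBSD).
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "-b", "--mime-encoding", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    # The guess is printed as a single token on standard output.
    return result.stdout.strip() or None
```

Passing the arguments as a list avoids shell quoting problems with odd subtitle file names.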

chats_cassy1
Posts: 1
Joined: Mon Jan 12, 2009 10:55 am

Re: character encoding

Mon Jan 12, 2009 10:57 am

Hey, the subtitles don't match the audio in the movie. What should I do? How do I know which subtitles are the right ones for the print I have?
