character encoding

Posted: Wed Dec 17, 2008 6:44 pm
by ivanhoe
I know most subtitles are in win-1250. But I've just downloaded two subtitle files whose encoding I can only guess at ... it's not UTF-8 or ISO-8859-2 (Czech subtitles - http://www.opensubtitles.org/en/subtitl ... st-fall-cs, http://www.opensubtitles.org/en/subtitl ... st-fall-cs)

Could you add some routine that just shows the encoding (if that is possible)?

Posted: Wed Dec 17, 2008 11:27 pm
by oss
I will try to do that and show it online. Maybe iconv or some PHP function can come into play.

[PHP] Detecting character set encoding

Posted: Fri Dec 19, 2008 6:39 am
by oss
Hello everybody,

Maybe one of the readers knows how to properly detect the encoding of a subtitle file in PHP (or Perl - I have not searched for it yet, but PHP is preferred). I have tried a lot, but I don't know how to do it. mb_detect_encoding is not working as I expect.

Thanks for any help.

Re: [PHP] Detecting character set encoding

Posted: Fri Dec 19, 2008 8:17 am
by Cougar_
You can't; a solution that works with good accuracy doesn't exist.
Try the charset autodetection in a web browser - simply drag and drop subtitles with a .txt extension and you will see how hopeless it is.

I think the best solution is to detect the language of the subtitles and simply assume that they are encoded in the Windows charset for that language.
I have not yet seen Polish subtitles encoded with the ISO charset used by Linux, so I think it is the same in other languages - one standard per language.
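
A rough sketch of that idea in PHP, in case it helps. The language codes, the table and the function name subtitleToUtf8 are made up for illustration, and it assumes PHP 7+ with the mbstring and iconv extensions. It only trusts the language; it does not really detect anything.

```php
<?php
// Illustrative table only: language code => assumed Windows code page.
const WINDOWS_CHARSET_BY_LANGUAGE = [
    'cs' => 'WINDOWS-1250', // Czech
    'pl' => 'WINDOWS-1250', // Polish
    'hu' => 'WINDOWS-1250', // Hungarian
    'ru' => 'WINDOWS-1251', // Russian
    'en' => 'WINDOWS-1252', // English
    'el' => 'WINDOWS-1253', // Greek
    'tr' => 'WINDOWS-1254', // Turkish
];

// Hypothetical helper: returns the subtitle text as UTF-8.
function subtitleToUtf8(string $bytes, string $language): string
{
    // Text that already validates as UTF-8 is returned unchanged.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    if (!isset(WINDOWS_CHARSET_BY_LANGUAGE[$language])) {
        throw new InvalidArgumentException("No charset assumed for language: $language");
    }
    // Otherwise assume the Windows code page for the language and convert.
    $converted = iconv(WINDOWS_CHARSET_BY_LANGUAGE[$language], 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException('Conversion failed; the file may be damaged.');
    }
    return $converted;
}
```

Usage would be something like subtitleToUtf8(file_get_contents('movie.srt'), 'cs'), where movie.srt is just a placeholder name. A file that was already damaged by a wrong conversion will still come out damaged.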

The subtitles that ivanhoe linked to are encoded in Windows-1250, but they are simply damaged - someone wrongly set the encoding in Windows or an editor or ... and guess what happened ;)

Posted: Mon Dec 22, 2008 9:12 am
by oss
Cougar,

I know it is quite a problem. I expect there will be some slight errors, but that's OK.

Anyway, I found some regexes for that:
http://lachy.id.au/dev/2005/11/encoding ... ons-source
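
To give an idea of the regex approach (this is only a sketch for Czech, not the regexes from the page above): š, ž, ť and their capitals sit at different byte values in Windows-1250 and ISO-8859-2, so counting those bytes gives a hint. The function name is hypothetical, and the ISO-8859-2 bytes also mean ©, «, ®, ą, », ľ in Windows-1250, so it can guess wrong.

```php
<?php
// Hypothetical heuristic for Czech text in a single-byte encoding.
// Returns 'WINDOWS-1250', 'ISO-8859-2', or null when the counts tie.
function guessCzechSingleByteCharset(string $bytes): ?string
{
    // Š Ť Ž š ť ž as encoded in Windows-1250 (C1 control codes in ISO-8859-2).
    $win = preg_match_all('/[\x8A\x8D\x8E\x9A\x9D\x9E]/', $bytes);
    // Š Ť Ž š ť ž as encoded in ISO-8859-2.
    $iso = preg_match_all('/[\xA9\xAB\xAE\xB9\xBB\xBE]/', $bytes);

    if ($win === $iso) {
        return null; // no evidence either way (for example plain ASCII)
    }
    return $win > $iso ? 'WINDOWS-1250' : 'ISO-8859-2';
}
```

It makes sense to check for valid UTF-8 first (for example with mb_check_encoding) and only fall back to this for files that fail that check.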

but it seems I will give up on this issue...

The last chance is calling some external program for that (Linux/FreeBSD).
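
One possible shape for that, as a sketch only: it assumes a Unix-like system with a `file` utility on the PATH that supports --mime-encoding, and the function name is made up. As far as I know, `file` is good at spotting UTF-8 and plain ASCII but tends to answer iso-8859-1 or unknown-8bit for other single-byte text, so it would not tell Windows-1250 from ISO-8859-2.

```php
<?php
// Hypothetical wrapper around the `file` utility.
// Returns the reported charset name, or null when nothing useful came back.
function detectCharsetWithFile(string $path): ?string
{
    $output = [];
    $status = 0;
    // escapeshellarg() keeps odd file names from breaking the command line.
    exec('file -b --mime-encoding ' . escapeshellarg($path), $output, $status);

    if ($status !== 0 || !isset($output[0])) {
        return null;
    }
    $charset = trim($output[0]);
    // These two answers mean `file` could not name an encoding.
    return in_array($charset, ['binary', 'unknown-8bit'], true) ? null : $charset;
}
```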

Re: character encoding

Posted: Mon Jan 12, 2009 10:57 am
by chats_cassy1
Hey, the subtitles don't match the audio in the movie. What should I do? How do I know which subtitles are the right ones for the print I have?