Hello everybody,
Maybe one of the readers knows how to properly detect the encoding of a subtitle file in PHP (or Perl; I have not searched for that yet, but PHP is preferred). I have tried a lot of things, but I can't work out how to do it. mb_detect_encoding does not work as I expect.
Thanks for any help.
You can't; a solution that works with good accuracy doesn't exist.
Try the charset autodetection in a web browser: simply drag and drop subtitles with a .txt extension into it and you will see how hopeless it is.
I think the best solution is to detect the language of the subtitles and simply assume that they are encoded using the Windows charset for that language.
I have not yet seen Polish subtitles encoded with the ISO charset used on Linux, so I think it is the same in other languages: one standard per language.
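To make the idea concrete, here is a rough sketch. It is written in Python rather than PHP, only to show the logic, which should port over easily. The language-to-charset table is my own illustration and far from complete, and the function name is made up. I have also added one step that is not part of the suggestion above: a strict UTF-8 check first, because UTF-8 is the one encoding that can be verified fairly reliably. The language itself has to come from somewhere else (for example the site's metadata for the subtitle file).

```python
# Illustrative table only: language code -> usual Windows code page.
# These entries are assumptions for the sketch, not a complete list.
WINDOWS_CODEPAGE = {
    "pl": "cp1250",  # Polish (Central European)
    "cs": "cp1250",  # Czech
    "ru": "cp1251",  # Russian (Cyrillic)
    "en": "cp1252",  # English (Western European)
    "de": "cp1252",  # German
    "el": "cp1253",  # Greek
    "tr": "cp1254",  # Turkish
}


def decode_subtitles(raw: bytes, language: str) -> str:
    """Decode subtitle bytes, given the language they are known to be in."""
    # Extra step (my assumption): accept the file as UTF-8 if it decodes
    # strictly. "utf-8-sig" also strips a leading BOM if there is one.
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        pass

    # Otherwise assume the Windows code page for that language,
    # falling back to cp1252 for languages missing from the table.
    codepage = WINDOWS_CODEPAGE.get(language, "cp1252")

    # errors="replace" substitutes U+FFFD for bytes that are undefined
    # in the chosen code page instead of raising an exception.
    return raw.decode(codepage, errors="replace")
```

Usage would be something like `decode_subtitles(open("movie.srt", "rb").read(), "pl")`, where the file name is just an example. Note that nothing here detects the encoding; it only applies the "one standard per language" assumption.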
The subtitles that ivanhoe linked to are encoded in Windows-1250, but they are simply damaged: someone set the wrong encoding in Windows or in an editor or .., and guess what happened.
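I don't know exactly how that particular file got damaged, so take this only as an illustration of one common way it happens (again in Python, with a made-up function name): if Polish text is saved with a charset that lacks the Polish letters, the editor replaces them with question marks, and no detection can bring them back afterwards.

```python
def save_with_wrong_charset(text: str) -> bytes:
    """Simulate saving text as Windows-1252, which lacks most Polish letters.

    Characters that cp1252 cannot represent are replaced with "?".
    """
    return text.encode("cp1252", errors="replace")


original = "zażółć"
damaged_bytes = save_with_wrong_charset(original)

# Reading the file back as Windows-1250 decodes without any error,
# but the replaced letters are gone for good.
recovered = damaged_bytes.decode("cp1250")
print(recovered)  # prints: za?ó??
```

Only the "ó" survives here because it happens to exist in both charsets at the same byte value.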