inconsistent language codes

Posted: Wed Jan 25, 2017 1:50 pm
by hector
Hi.
I am using language codes to manage subtitle files from OS. And I've noticed some inconsistency.

Surprisingly, you are using ISO 639-2T in some cases and ISO 639-2B in others. More specifically, you use
2B codes:
  • dut - Dutch
  • fre - French
  • ger - German
but a 2T code:
  • ell - Greek
I think you usually use 2B with Greek being an exception. It would be great if you could be consistent and either use 2T (nld, fra, deu) for all (which I think would be preferable) or 2B for all (gre for Greek).

You could say I'm too picky, but with some experience in programming I know inconsistencies can cause a lot of trouble in the long run.

Re: inconsistent language codes

Posted: Thu Jan 26, 2017 12:07 pm
by hector
See the Wikipedia article on ISO 639-2.
It says the code "scc" (which OS is using too) is deprecated. I don't know how difficult it would be to change this now, but I think it would be a good thing to switch to 639-2T or even 639-3 instead of following 639-2B only partially.

Re: inconsistent language codes

Posted: Mon Jan 30, 2017 7:19 am
by oss
Well, changing them would again cause problems in programs that already use our language code list.

Those codes change over time. One member told us that for Greek we should use ell, etc. In the beginning we used gre, I think (or gr).

Re: inconsistent language codes

Posted: Mon Jan 30, 2017 10:07 pm
by hector
If you were using "gre" for Greek, why did you change it?

Again, you can use the codes you like. You could even forget about standards and use your own. But then the problem comes when you interact with the outside world. That's why standards were invented.

I extract the language code from the filename. If I use 639-2B, Greek is not recognised. And if I choose 639-2T, then all the other languages are not recognised. "scc" is not recognised because it is deprecated. You should use "srp" instead.

Well, I can work around this, but it would be much easier and simpler if you followed the standard.

The same for "pob", "che" and some other non-standard codes. But those are needed because ISO 639 does not support country specification. You should consider IETF language tags but that's already been discussed somewhere else.
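For anyone hitting the same problem, here is a minimal sketch of the workaround in Python. It assumes file names of the form name.code.srt and covers only the codes mentioned in this thread. The names B_TO_T and language_from_filename are made up for illustration and are not part of any OpenSubtitles API.

```python
from pathlib import Path

# ISO 639-2B codes mapped to their 639-2T equivalents.
# Only the codes mentioned in this thread; the standard defines more pairs.
B_TO_T = {
    "dut": "nld",  # Dutch
    "fre": "fra",  # French
    "ger": "deu",  # German
    "gre": "ell",  # Greek
    "scc": "srp",  # Serbian ("scc" is deprecated)
}


def language_from_filename(filename):
    """Return a 639-2T code from a name like 'Movie.2016.ger.srt', or None.

    Assumption: the language code is the last dot-separated token
    before the file extension.
    """
    parts = Path(filename).name.split(".")
    if len(parts) < 3:
        return None  # no room for a language token
    code = parts[-2].lower()
    if len(code) != 3 or not code.isalpha():
        return None  # not shaped like a three-letter code
    # Translate 2B codes to 2T; anything else passes through unchanged.
    return B_TO_T.get(code, code)


if __name__ == "__main__":
    for name in ("Movie.2016.ger.srt", "Movie.2016.ell.srt", "Movie.scc.srt"):
        print(name, "->", language_from_filename(name))
```

Normalising everything to 2T means "ell" passes through unchanged while "ger", "fre" and "dut" get translated, so the mixed list can be handled in one place. Custom codes such as "pob" would pass through untouched and still need their own handling.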

Re: inconsistent language codes

Posted: Tue Jan 31, 2017 11:27 am
by oss
Yes, that's why we have this here: https://www.opensubtitles.org/addons/ex ... guages.php

Changing to "gre" would cause problems in applications that are working but no longer actively developed. As you found out, we are also using custom codes.

We are not going to change Greek to gre, sorry. It would cause a mess in all existing applications.

Re: inconsistent language codes

Posted: Tue Jan 31, 2017 2:22 pm
by hector
Well, the mess is already there, and it is caused by mixing two different standards ad lib.
I hope you can be more standard in the future.
Thanks for the list, anyway.

Re: inconsistent language codes

Posted: Wed Feb 01, 2017 7:38 am
by oss
Basically, I could fix it, but it would produce more mess (we would have to accept both codes), and I think it is more systematic not to mix them.

I know what you mean about being more standards-friendly, but once I added POB and some Chinese codes, the list was already outside the standard, so basically I could use any codes... because to implement it, one has to download "our codes" anyway.

Re: inconsistent language codes

Posted: Tue Feb 07, 2017 5:25 pm
by hector
oss wrote: "some Chinese stuff is already out of standard"
Speaking of consistency and language codes... the problem with "zhe" is that you are allowing bilingual subtitles. I think it's okay. But then (being consistent) you should do the same for other languages. Why not Spanish/English or POR/POB or whatever? Perhaps it is because nobody requested it. And perhaps it is because most video players support showing two subtitles at the same time.

So again, I think bilingual subtitles are a waste of space and resources. And you could get rid of "zhe", "spe", "dee", "gee"... Oh, gee! :)