NomadaPT
Posts: 20
Joined: Mon Dec 22, 2008 3:12 am

Problems with the addition of new lines in the srt files

Fri Aug 03, 2012 1:06 am

Recently OpenSubtitles started adding two new lines to downloaded subtitles, one at the beginning (Subtitles downloaded from http://www.OpenSubtitles.org) and another at the end (Download Movie Subtitles Searcher from http://www.OpenSubtitles.org), but I see some problems resulting from these (I suppose automatic) additions:

First, if a subtitle has a line with an early time stamp, in the first minute for instance, the added first line replaces the (first) line of the subtitle.

Second, if the srt file is encoded in Unicode (usually UTF-8), the edit changes the encoding to ANSI and every character outside the ASCII table is replaced. For instance, a Portuguese sentence like "Quarenta dias farão um furor tão grande no céu" becomes "Quarenta dias farÃ£o um furor tÃ£o grande no cÃ©u". The same happens with subtitles I've uploaded in Hebrew encoded in UTF-8; in that case you get plain gibberish ("אף אחד לא מלמד אותךמה ×–×” אומר להיות אימא טובה." instead of "אם זה האימא שהילד שלי היה רוצה שאני אהיה"). Of course, this last problem can be solved by reopening the srt file and changing the encoding back... but how many people know how to do that?
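For anyone who wants to reproduce the effect, here is a minimal Python sketch. It assumes (this is a guess, not something the site has confirmed) that the ad-insertion step reads the UTF-8 bytes as Windows-1252, which is what "ANSI" means on a Western European Windows system.

```python
# Illustrative only: assumes the UTF-8 bytes were decoded as Windows-1252.
original = "Quarenta dias farão um furor tão grande no céu"

utf8_bytes = original.encode("utf-8")

# Decoding with the wrong code page turns each accented letter into two characters.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# As long as no bytes were dropped along the way, the damage can be undone
# by reversing the two steps.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

The round trip only works while every byte survives. Hebrew text is worse off, because some of its UTF-8 bytes have no character assigned in Windows-1252 at all.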

scooby007
Site Admin
Posts: 839
Joined: Thu Mar 05, 2009 10:49 pm
Location: Scandalous

Re: Problems with the addition of new lines in the srt files

Sat Aug 04, 2012 11:31 pm

Interesting... The lines don't appear to logged-in users, but I can see how they'd be a problem for logged-out users downloading subtitles. I'll forward this topic to the "admin homepage" section where oss can have a look at it and he'll respond to you here. Thanks for the report.

NomadaPT
Posts: 20
Joined: Mon Dec 22, 2008 3:12 am

Re: Problems with the addition of new lines in the srt files

Sun Aug 05, 2012 4:28 am

Omerta wrote: "UTF-8 sucks in a hundred ways, by the way. ANSI has all the characters that UTF has, but their number in the character table is different.
You CAN save your UTF in ANSI if you want; the only thing you have to do is check the 'replace UTF characters' button in your text editor."
Omerta, I totally disagree. To be honest, I don't see the point of using ANSI and constantly switching the code page of the character set when Unicode gives you one single, invariable code chart that covers almost all of the world's scripts. It doesn't matter which ANSI code page your OS uses, because you always get the same result. By analogy, it's like opening a PDF file: it doesn't matter what system you have, because the formatting is always the same, whether in China or in Canada.

Scooby007, thank you very much for your attention to the subject.

srtpal
Posts: 59
Joined: Sun Jun 21, 2009 5:28 pm

Re: Problems with the addition of new lines in the srt files

Sun Aug 05, 2012 7:50 pm

Omerta wrote: "ANSI has all the characters that UTF has but their number in the character table is different."
I, too, disagree quite strongly. ANSI does not have all the characters of UTF-8. ANSI is limited to a single code page at a time (slightly less than 256 characters), so if the subtitles need more than one page (and mine usually do because I use ♫ to indicate to the hard-of-hearing people that the text is being sung), Unicode, whether 16-bit or 8-bit, is the only way to go.
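A quick way to check this claim is to try encoding a sung line in a few encodings. This is only a sketch; the sample line and the helper name `fits` are made up for illustration.

```python
# Hypothetical sample line using the music symbol mentioned above.
line = "♫ La la la ♫"

def fits(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for enc in ("cp1252", "cp1250", "utf-8", "utf-16"):
    print(enc, fits(line, enc))
```

The two Windows code pages reject the line, while both Unicode encodings accept it.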

Unfortunately, OS rejects 16-bit Unicode, so for me the only way to go is UTF-8. This is unfortunate because I use two players, VLC and TotalMedia Theatre 5. While VLC supports both 16-bit Unicode and UTF-8, TM Theatre 5 supports 16-bit Unicode but not UTF-8, which it treats as ANSI, displaying all the non-ASCII characters that are part of UTF-8 as ANSI graphics. Additionally, I have not found a way to set the code page in either VLC or TM Theatre 5, and I am certainly not going to change my system code page every time I need to watch a movie with subtitles on. So, I prepare all my subtitles in UTF-8 for the upload to OS, then convert them to 16-bit Unicode for my personal use.
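The conversion step can also be scripted. The sketch below is not the tool used in this post; it assumes that "16-bit Unicode" means UTF-16 with a byte order mark, and the function name and file names are hypothetical.

```python
def utf8_to_utf16(src_path, dst_path):
    """Re-encode a UTF-8 subtitle file as UTF-16 (with BOM), text unchanged."""
    # "utf-8-sig" accepts files with or without a UTF-8 byte order mark.
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    # Python's "utf-16" codec writes a BOM so players can detect the byte order.
    with open(dst_path, "w", encoding="utf-16", newline="") as dst:
        dst.write(text)

# Example call (hypothetical file names):
# utf8_to_utf16("movie.utf8.srt", "movie.utf16.srt")
```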

There is a good reason why Unicode was invented and why the Internet standards have accepted UTF-8 as the one encoding all Internet protocols must support in this century (they used ISO/IEC 8859-1 as the default in the last century).

I edit all my subtitles in Notepad++ as a text file, as Notepad++ allows me to convert between any code page and UTF-8, then clean everything up by running it through my own srtpal before uploading it here.

By the way, it would be nice if OS displayed the code page for all subtitles, so people could easily convert them to whatever their system needs. Better yet, if it standardized on UTF-8 and rejected subtitles in any other code page. People would then know that everything they download is UTF-8 and, if they so desired, could then convert it to whatever else they want and need.

srtpal
Posts: 59
Joined: Sun Jun 21, 2009 5:28 pm

Re: Problems with the addition of new lines in the srt files

Sun Aug 05, 2012 8:29 pm

Quote: "Still, average people use letters to read, not calligraphy."
Believe it or not, average people of the world do not use the plain Roman alphabet as is used by English speakers. Considering mere numbers, the average person uses the Chinese script (which cannot be fit into a single code page) or Devanagari (the script of Indic languages) or some derivative of Devanagari. Some 25-30% of people in Europe use the Cyrillic alphabet and most of the rest of Europe uses a modified Roman alphabet (i.e., modified by the use of diacritics, but different languages use different diacritics). And in Greece they use an entirely different alphabet but have to use the Roman alphabet for certain names and such.

A long time ago, at university in Slovakia, I was watching an Arab student who was taking his notes in Arabic, writing from right to left, but wrote any names and any Latin words (this was an anatomy class) left to right in the Roman alphabet. So, pardon me if I cannot think of someone from the UK, or even the rest of Europe and the US, Canada and Australia, as an average person.

So, when I post English subtitles for a Czech movie (as seems to be the majority of my subs for some reason), I need to use the unmodified Roman alphabet for the English text and a modified Roman alphabet for the names of the characters. Now I cannot even start imagining the difficulty one would be faced with if he was making Chinese subtitles for a Slovak movie, or Hebrew subtitles for a German movie, etc.

I’d say the average person on this planet needs more than can be fit on a single code page.

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology

Re: Problems with the addition of new lines in the srt files

Sun Aug 05, 2012 9:42 pm

I hadn't noticed that the Open Subtitles "ads" were changing the encoding of the subtitles, but it may explain some issues I'd found in some subs and some reports I used to get from SolEol users.

I assume something has been done, as the "ads" are now coming in English in Spanish subs. May be unrelated, but it seems like too much of a coincidence.

As for encodings themselves: Everything but UTF is plain evil. It's true that for a long time non-UTF (which is way more than ANSI covers) was so common that a lot of software still doesn't recognize UTF but, in reality, it's in everyone's best interest to see that the shift to UTF is done.

At the very least the subtitles should *always* be stored as UTF in OpenSubtitles and, if anything, an option upon download could convert them (converting UTF to other charsets is always easier than trying to figure out what charset non-UTF has and then converting to UTF). Without pushing for UTF this craziness we've endured for four decades will never stop.

That is, really, the real problem with non-UTF: There's nothing in the file that says what the encoding is. So software can't figure out easily what to use (the best character encoding "guesser" there is has been released by Mozilla and even that fails a LOT)(*).
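To show why guessing is hard, here is a deliberately simple sketch. It is not Mozilla's detector: it only checks for a byte order mark, then tries strict UTF-8, and otherwise falls back to a code page the caller has to supply. The function name and the default fallback are assumptions made for illustration.

```python
import codecs

def decode_subtitle(raw, fallback="cp1252"):
    """Return (text, encoding_used) for raw subtitle bytes.

    A BOM or a clean UTF-8 decode is decent evidence. Anything else is a
    guess, because legacy code pages carry no label inside the file.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order.
        return raw.decode("utf-16"), "utf-16"
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # No way to tell cp1250 from cp1252 from cp1254 here: we just guess.
        return raw.decode(fallback, errors="replace"), fallback
```

The last branch is the weak spot: the bytes decode without complaint in nearly any single-byte code page, so a wrong fallback silently produces the wrong letters.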

Developers whose players don't support UTF should be berated and mocked publicly. UTF is almost 30 years old already, for Christ's sake ( http://unicode.org/history/ ). Having to convert to ANSI should be taken as the retrograde, embarrassing, unnecessary step that it is.

This is an excellent resource explaining both the history of character set encodings and codepages, as well as UTF. Great summary.
http://www.joelonsoftware.com/articles/Unicode.html

(*)Here you can read a fantastic paper about Mozilla's character encoding guessing which, ironically, has character set errors: http://www-archive.mozilla.org/projects ... ction.html
http://eduo.info/
[url=http://eduo.info/soleol/]OpenSubtitles from your desktop: SolEol for Mac/Windows/Linux[/url]
[url=http://forums.plexapp.com/index.php?showtopic=325&st=0&p=2480&#entry2480]My current episode processing work flow[/url].

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology

Re: Problems with the addition of new lines in the srt files

Sun Aug 05, 2012 9:49 pm

scooby007 wrote: "Interesting... The lines don't appear to logged-in users, but I can see how they'd be a problem for logged-out users downloading subtitles. I'll forward this topic to the "admin homepage" section where oss can have a look at it and he'll respond to you here. Thanks for the report."
I just noticed your mention that it doesn't appear for logged-in users. For API users the lines show for all, regardless of authentication. I imagine that it's a way to compensate for not using the web but thought the mention was worth it.

For a while one of the two links provided in the API (a ZIP and a GZip) didn't include "ads" (I believe it was the GZip) but I can't recall when this was changed.

NomadaPT
Posts: 20
Joined: Mon Dec 22, 2008 3:12 am

Re: Problems with the addition of new lines in the srt files

Mon Aug 06, 2012 1:30 am

Being a Western European, I know it is hard for some people to understand that the Latin alphabet, despite being spread worldwide, is not used worldwide. The ANSI code page 1252 used by the Western European languages doesn't even cover the majority of the European languages that themselves use the Latin alphabet. Drawing a line east of the German and Italian languages, with ANSI you need to switch code page three times within Europe (1250 for Central and Eastern European Latin, 1254 for Turkish, and 1257 for Lithuanian and Latvian).
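A small Python sketch makes the point concrete: the same bytes come out as different letters depending on which of those code pages is assumed. The three byte values are picked arbitrarily for illustration.

```python
# The same three bytes, as they might sit in a subtitle file with no encoding label.
raw = bytes([0xE8, 0xF0, 0xFE])

# The Windows code pages named above, plus 1252 for comparison.
code_pages = ("cp1250", "cp1252", "cp1254", "cp1257")

decoded = {cp: raw.decode(cp) for cp in code_pages}
for cp, text in decoded.items():
    print(cp, text)

# In UTF-8 each letter has exactly one byte sequence, whatever the OS code page.
print("č".encode("utf-8"))
```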
Eduo is right, everything but Unicode is the past. A language like English doesn't use any special characters or diacritics, and the remaining Western European languages use a small set of them, all covered by code page 1252 (not by plain ASCII), but that is the exception, not the rule.

Thank you for your opinions, guys. I had started to feel like a moron for using Unicode.

srtpal
Posts: 59
Joined: Sun Jun 21, 2009 5:28 pm

Re: Problems with the addition of new lines in the srt files

Mon Aug 06, 2012 2:01 am

Quote: "I didn't say anything to prove you wrong, did I?"
Yes, you did. Your remark about average people, as if only the 6% of humans who understand English were people worth considering, was smug to say the least.
Quote: "So I really thank you for your academic enlightenment"
Except there is nothing academic about the 94% of the world that the American standard (ANSI = American National Standards Institute) does not help. Unicode, and UTF-8 in particular, has been the standard way of encoding text since the beginning of this century. If your player does not support it, inform its manufacturer that you are not willing to pay for something as outdated as their software (or hardware).

oss
Site Admin
Posts: 5893
Joined: Sat Feb 25, 2006 11:26 pm

Re: Problems with the addition of new lines in the srt files

Mon Aug 06, 2012 5:57 am

Thanks for the posts, nice. Anyway, for testing I need the URL of subtitles where this mess appears, or where ads are inserted in the "wrong" places (replacing subtitle contents, for example). Of course the ads are inserted automatically.

NomadaPT
Posts: 20
Joined: Mon Dec 22, 2008 3:12 am

Re: Problems with the addition of new lines in the srt files

Mon Aug 06, 2012 10:39 am

Thank you oss.

This subtitle in particular brought the issue to my attention: http://www.opensubtitles.org/en/subtitl ... lt-love-pt

The first lines in the original file are:

Code:

1
00:00:41,766 --> 00:00:46,440
ואהבת E DEVERÁS AMAR

2
00:01:48,480 --> 00:01:52,520
"Salmos de Recuperação Espiritual"

3
00:02:08,320 --> 00:02:12,040
Um salmo de David. "Bem-aventurado é aquele que atende ao pobre,
and in a downloaded subtitle altered by the ads:

Code:

1
00:00:01,000 --> 00:00:04,074
Subtitles downloaded from www.OpenSubtitles.org

2
00:01:48,480 --> 00:01:52,520
"Salmos de Recuperação Espiritual"

3
00:02:08,320 --> 00:02:12,040
Um salmo de David. "Bem-aventurado é aquele que atende ao pobre,
Apparently the encoding issue is already solved, thank you again for that.

SimplyTheBOSS
Site Admin
Posts: 1326
Joined: Mon Feb 01, 2010 3:02 pm
Location: Finland

Re: Problems with the addition of new lines in the srt files

Mon Aug 06, 2012 1:58 pm

This debate reminds me of my school days: "my father's car is better" :)
Personally I don't like special characters (♫ etc) in subtitles

oss
Site Admin
Posts: 5893
Joined: Sat Feb 25, 2006 11:26 pm

Re: Problems with the addition of new lines in the srt files

Mon Aug 06, 2012 2:07 pm

NomadaPT wrote: "Thank you oss.

This subtitle in particular brought the issue to my attention: http://www.opensubtitles.org/en/subtitl ... lt-love-pt

Apparently the encoding issue is already solved, thank you again for that."
I didn't change anything. I will look into what's going on, because exactly this shouldn't happen.

subshare
Posts: 9
Joined: Fri Jan 07, 2011 4:57 am

Re: Problems with the addition of new lines in the srt files

Mon Aug 06, 2012 2:39 pm

Hello, I encountered the same issue today. I just created a subtitle, uploaded it and then downloaded it for a check. I was logged in all the time.

The result is:

ORIGINAL:

Code:

1
00:00:04,512 --> 00:00:06,578
<i>It was on the day of the May Dance,</i>

2
00:00:06,976 --> 00:00:09,734
<i>that Tess's father encountered the parson</i>

DOWNLOAD:

Code:

1
00:00:01,000 --> 00:00:04,074
Subtitles downloaded from www.OpenSubtitles.org

2
00:00:06,976 --> 00:00:09,734
<i>that Tess's father encountered the parson</i>
So, being logged in or out in conjunction with getting the first line f***ed up seems to be a random behavior. I didn't have the problem when I downloaded it a second time.

But either way, this is unacceptable and renders Opensubtitles virtually useless. As a user I now have to check every time I download from Opensubtitles, if the first line is missing/replaced. Hence, in order to avoid this, I prefer getting subtitles from other sources, e.g. subscene. Not to mention that it's obtrusive adding additional lines. As a frequent uploader I expect my files to stay unmodified.

I wish all subtitles to stay in the original state, how they were uploaded. If any lines are added at all, then please only at the end of the subtitle, as the very last line. The interference with the first subtitle line is just not acceptable, no matter what.
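To illustrate the request, here is a rough Python sketch of how a notice could be added as the very last cue without touching anything already in the file. It says nothing about how OpenSubtitles actually inserts its lines; the function name, the gap, and the duration are all hypothetical, and it assumes a well-formed srt with numbered cues.

```python
import re

# Matches one SRT timestamp such as 00:00:09,734
TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")

def to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def to_stamp(total_ms):
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"

def append_notice(srt_text, notice, gap_ms=1000, duration_ms=3000):
    """Return srt_text with `notice` added as a new final cue.

    Existing cues are never edited, replaced, or renumbered.
    Line endings are normalised to "\n" in the result.
    """
    text = srt_text.replace("\r\n", "\n").strip()
    blocks = re.split(r"\n\s*\n", text)          # cues are separated by blank lines
    last_lines = blocks[-1].split("\n")
    last_index = int(last_lines[0])
    end_ms = to_ms(*TIME.findall(last_lines[1])[1])  # second stamp is the end time
    start_ms = end_ms + gap_ms
    cue = (f"{last_index + 1}\n"
           f"{to_stamp(start_ms)} --> {to_stamp(start_ms + duration_ms)}\n"
           f"{notice}")
    return "\n\n".join(blocks + [cue]) + "\n"
```

Because the new cue starts after the last existing end time, it cannot overlap or displace any line of the film.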

Opensubtitles was the best place for getting subtitles so far. One reason for that was the fact that they didn't add additional lines as allsubs.org does, for instance. As a user, I avoid sites that automatically modify the subtitles. That's why I don't like allsubs.org. And that's why I preferred opensubtitles.org (also because of the site organization, of course).

Uploaders who improve and fix subtitles are annoyed when they download such subtitles where lines have been automatically added. The first thing I always do is to remove those additional lines in order to keep the subtitles clean from things that don't belong to the film.

After having uploaded over 800 subtitles, I am sad to see that opensubtitles messes around with the subtitles. If this issue doesn't get resolved properly, I'll focus on subscene to upload stuff. And I think the same goes for many other Gold Member uploaders.

NomadaPT
Posts: 20
Joined: Mon Dec 22, 2008 3:12 am

Re: Problems with the addition of new lines in the srt files

Mon Aug 06, 2012 3:15 pm

oss wrote: "I didn't change anything. I will look into what's going on, because exactly this shouldn't happen."
THAT is weird... because right now I don't see any change in the encoding, and when I started this topic the changes were there exactly the way I posted them. But again, the encoding was only part of the issue.

Now:

Code:

1
00:00:01,000 --> 00:00:04,074
Subtitles downloaded from www.OpenSubtitles.org

2
00:00:20,960 --> 00:00:23,850
אם זה האימא שהילד שלי היה רוצה שאני אהיה

3
00:00:24,040 --> 00:00:26,486
או האימא שאני חושבת שהוא צריך.
