Problems with the addition of new lines in the srt files


Re: Problems with the addition of new lines in the srt files

Postby NomadaPT » Mon Aug 06, 2012 9:41 pm

srtpal wrote: Wait a minute. It actually replaces the first subtitle? If so, that is BAD! I don’t mind too much if it inserts the message BEFORE the first subtitle and renumbers the rest, but replacing the first subtitle would make me think twice about my continuous contribution to OS.


As I said at the beginning, it only happens when the subtitle has a very early line, before the first minute, and that isn't common at all.
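For context, an SRT file is a series of blocks (a cue number, a `start --> end` timing line, then the text) separated by blank lines. A minimal sketch of the behaviour srtpal said he could live with, inserting the site message before the first subtitle and renumbering the rest instead of replacing it, might look like this. The function names, the three-second duration and the simple parser (no handling of malformed files) are assumptions for illustration, not the site's actual code. When the first subtitle starts very early, the message is shortened so it ends when that subtitle starts, and skipped entirely if there is no room.

```python
import re
from dataclasses import dataclass

TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def to_ms(parts) -> int:
    h, m, s, ms = map(int, parts)
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def fmt(ms: int) -> str:
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def parse_srt(data: str) -> list[Cue]:
    data = data.replace("\r\n", "\n").strip()
    if not data:
        return []
    cues = []
    for block in re.split(r"\n\s*\n", data):
        lines = block.split("\n")
        # lines[0] is the old cue number; it is discarded and regenerated on output
        start, end = TIME_RE.findall(lines[1])[:2]
        cues.append(Cue(to_ms(start), to_ms(end), "\n".join(lines[2:])))
    return cues


def write_srt(cues: list[Cue]) -> str:
    # Number cues from 1 so an inserted cue never duplicates an existing number
    return "\n\n".join(
        f"{i}\n{fmt(c.start_ms)} --> {fmt(c.end_ms)}\n{c.text}"
        for i, c in enumerate(cues, start=1)
    ) + "\n"


def prepend_message(data: str, message: str, duration_ms: int = 3000) -> str:
    cues = parse_srt(data)
    # End the message no later than the moment the first real subtitle starts
    end = min(duration_ms, cues[0].start_ms) if cues else duration_ms
    if end <= 0:
        return write_srt(cues)  # first subtitle starts at 0: no room, keep the cues as they are
    return write_srt([Cue(0, end, message)] + cues)
```

The key design point is that the original first cue is kept and shifted to number 2, never overwritten.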

----------------------------------------------------------

@oss:

I REALLY don't know how or why, but since I started this topic the encoding change has never happened again, and I've tried several computers and operating systems. My first thought was some kind of glitch on my PC; however, the problem wasn't initially detected by me but by a non-member friend, and then tested by myself, which is why I created this topic. Normally, when I download a subtitle I do it through Media Player Classic, and the ads never appear (logged in or not), so until recently I never noticed any change in the subtitles or in the encoding.

Since that part of the problem doesn't occur anymore, I ask you to forget it and focus on the replacement of the first line.

------------------------------------------------------

About encoding in general, I firmly believe that Unicode, particularly UTF-8, is the only way to go, and I'm going to keep encoding all my subs that way, BUT... that is a personal choice, a personal belief that the old ANSI code pages are obsolete. I don't criticize anyone who would rather use them, especially those with English as their native tongue. There's no reason to turn this issue into a war about text encoding: at present, choosing between ANSI and Unicode (where subtitles are concerned) is still a personal choice. In the not-too-distant future I see only one way, and that's Unicode.
NomadaPT
 
Posts: 20
Joined: Mon Dec 22, 2008 3:12 am

Re: Problems with the addition of new lines in the srt files

Postby srtpal » Mon Aug 06, 2012 10:16 pm

Well, I just tested it. I downloaded one of my subs using SubDownloader, another from the web without logging in, and in both cases the subs came exactly as I had uploaded them. No ads.

So, I’m happy and will continue to contribute.
srtpal
 
Posts: 59
Joined: Sun Jun 21, 2009 5:28 pm

Re: Problems with the addition of new lines in the srt files

Postby jcdr » Tue Aug 07, 2012 11:59 am

Oss, I too agree 200% on the need to accept UTF-16 for OpenSubtitles. After all, it is a variable-length code, so it should not take a lot more space, if that is the reason for not accepting it?

More than 50% of Windows software has moved from ANSI to Unicode, including subtitle and video software, and the web (XHTML, Java, PHP...). Favoring ANSI on a subtitle website in 2012 is indeed OK for English-speaking users, but it will keep putting off many Asian users, who are the main potential audience.
jcdr
Moderator
 
Posts: 367
Joined: Sun Apr 08, 2012 9:49 am

Re: Problems with the addition of new lines in the srt files

Postby oss » Tue Aug 07, 2012 12:53 pm

OS accepts UTF-8; I ran into some problems with UTF-16 while parsing the files and so on. Is there really a need for UTF-16 support? Isn't UTF-8 enough?
oss
Site Admin
 
Posts: 2208
Joined: Sat Feb 25, 2006 11:26 pm

Re: Problems with the addition of new lines in the srt files

Postby NomadaPT » Tue Aug 07, 2012 2:02 pm

oss wrote: OS accepts UTF-8; I ran into some problems with UTF-16 while parsing the files and so on. Is there really a need for UTF-16 support? Isn't UTF-8 enough?


Well... personally, at least for the near future, I believe so. UTF-8 is the common encoding across the net; however, Windows, Java and other programs all use UTF-16, as srtpal showed when he reported that TotalMedia Theatre 5 only supports 16-bit Unicode.

More info: http://unicode.org/faq/utf_bom.html
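Since oss mentions trouble parsing UTF-16, one common approach (an assumption here, not a description of how OS actually reads uploads) is to sniff the byte order mark covered in that FAQ, fall back to BOM-less UTF-8, and finally to a legacy "ANSI" code page. The helper name and the cp1252 fallback are illustrative choices. A limitation worth stating: UTF-16 saved without a BOM is not detected by this sketch.

```python
import codecs

# UTF-32 LE must be checked before UTF-16 LE, because its BOM begins with the UTF-16 LE BOM
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_subtitle(raw: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical upload-side helper: turn subtitle bytes into text."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            # These codecs read the BOM, pick the byte order, and strip it from the result
            return raw.decode(encoding)
    try:
        return raw.decode("utf-8")  # BOM-less UTF-8 is very common
    except UnicodeDecodeError:
        # Assume a legacy code page; "replace" avoids failing on its few undefined bytes
        return raw.decode(fallback, errors="replace")
```

Once decoded, every file can be handled as plain text internally, whatever encoding it arrived in.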
NomadaPT
 
Posts: 20
Joined: Mon Dec 22, 2008 3:12 am

Re: Problems with the addition of new lines in the srt files

Postby scooby007 » Tue Aug 07, 2012 2:25 pm

Thanks for the laugh, guys! So much to cover; I'll try my best. I don't agree with eduo that UTF should be the only way to go; I'd have to agree with STB that both should be available. When I used to edit subtitles, I always used ANSI and changed ♫ to # to represent music or songs. It's not difficult using "Edit -> Replace all".

Regarding what srtpal said, it's not that easy to get all standalone player manufacturers to support UTF. Even if they did, it would still be difficult, since most programs that incorporate subtitles into the AVI file for you to burn to disc don't recognize UTF, and some even refuse to do the job. You not only have to change the DVD players to accept them, but also the programs I just mentioned. Until all these changes are made systematically, ANSI should be made available and supported. I see where you guys are coming from, but both should be available to accommodate the many users who need ANSI to burn discs for standalone players. The most popular programs that merge a subtitle and an AVI file for disc don't support UTF. If I'm watching a movie on my PC, I have no problems with UTF, but to watch it on the box, I will definitely need to convert the SRT to ANSI.
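The conversion scooby007 describes (UTF-8 in, ANSI out, ♫ turned into #) can be scripted. This is a minimal sketch assuming "ANSI" means the Western European code page cp1252; the function names and the replacement table are illustrative, not taken from any existing tool.

```python
# Music symbols cp1252 lacks, mapped to the "#" stand-in mentioned in the post
REPLACEMENTS = {"\u266b": "#", "\u266a": "#"}  # ♫ and ♪


def to_ansi(text: str, codepage: str = "cp1252") -> bytes:
    for symbol, substitute in REPLACEMENTS.items():
        text = text.replace(symbol, substitute)
    # Any other character the code page lacks becomes "?" instead of raising an error
    return text.encode(codepage, errors="replace")


def convert_file_to_ansi(src: str, dst: str, codepage: str = "cp1252") -> None:
    # "utf-8-sig" also strips a UTF-8 BOM if present; newline="" keeps CRLF endings as-is
    with open(src, encoding="utf-8-sig", newline="") as f:
        text = f.read()
    with open(dst, "wb") as f:
        f.write(to_ansi(text, codepage))
```

Keeping the original line endings matters here, since some standalone players and burning tools are stricter about them than PC players.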

If certain languages aren't supported by ANSI, then by all means use whatever format suits your needs, but don't force your way of thinking onto others who still benefit from the old format (and many do). I think current world affairs show that this doesn't work. I also agree with subshare that the advertisement should appear at the end of the subtitle (if it has to stay); otherwise, as I warned, the site will lose many users and, more importantly, many contributors of good subtitles. Without them, the site won't work, nor will it be able to compete with other sites.

@oss I think UTF-8 is enough for now, but as NomadaPT points out, we should be able to accept UTF-16 in the future, as many programs are beginning to use this format. If I were the owner of this site, I would work toward gradually bringing it in and making it part of OpenSubtitles.
scooby007
Site Admin
 
Posts: 248
Joined: Thu Mar 05, 2009 10:49 pm
Location: Scandalous

Re: Problems with the addition of new lines in the srt files

Postby oss » Tue Aug 07, 2012 5:06 pm

I understand. I will add UTF-16 subtitle support to my to-do list.
oss
Site Admin
 
Posts: 2208
Joined: Sat Feb 25, 2006 11:26 pm

Re: Problems with the addition of new lines in the srt files

Postby jcdr » Tue Aug 07, 2012 10:37 pm

Very good. That leaves the issue of adding/replacing lines in the file.
As inserting a line at the very beginning of the file seems to be a problem, maybe for now it would be enough to add one only as the last line?

Anyway, I personally think this does OS more harm than good. Many users were driven away from allsubs because of this. Like many others, the first thing I always do is remove "alien" lines from the subtitle file. At least if the line is added only at the end, it won't be the first thing you see and it will be less intrusive.
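A minimal sketch of jcdr's suggestion: append the site message as a new last cue instead of inserting it at the top. The message text, the one-second gap and the three-second duration are arbitrary assumptions, and the numbering step assumes the existing cues are numbered 1..n.

```python
import re

TIMING_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})",
    re.MULTILINE,
)


def to_ms(parts) -> int:
    h, m, s, ms = map(int, parts)
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def fmt(ms: int) -> str:
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def append_message(srt: str, message: str, gap_ms: int = 1000, duration_ms: int = 3000) -> str:
    timings = TIMING_RE.findall(srt)
    # Use the latest end time, in case the cues are not stored in time order
    last_end = max((to_ms(t[4:]) for t in timings), default=0)
    start = last_end + gap_ms
    # Assumes existing cues are numbered 1..n, so the new one gets n + 1
    cue = f"{len(timings) + 1}\n{fmt(start)} --> {fmt(start + duration_ms)}\n{message}\n"
    body = srt.rstrip("\r\n")
    return (body + "\n\n" if body else "") + cue
```

Because nothing before the last cue is touched, the user's own numbering and timing stay exactly as uploaded.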
jcdr
Moderator
 
Posts: 367
Joined: Sun Apr 08, 2012 9:49 am

Re: Problems with the addition of new lines in the srt files

Postby arcchancellor » Wed Aug 08, 2012 6:50 am

jcdr wrote: Very good. That leaves the issue of adding/replacing lines in the file.
As inserting a line at the very beginning of the file seems to be a problem, maybe for now it would be enough to add one only as the last line?


I agree.
Many of my friends have told me that they are annoyed by this first item and delete it straight away, while the last item doesn't bother them; they see it as a credit and a thank-you to OS.
Honestly, I don't know anybody who likes this first inserted item.
It is a way of annoying users permanently, and that's not good.
"I don't believe in God. I just believe in Billy Wilder" - Fernando Trueba
arcchancellor
Moderator
 
Posts: 69
Joined: Sat Apr 03, 2010 12:56 pm
Location: Ankh-Mopork

Re: Problems with the addition of new lines in the srt files

Postby srtpal » Wed Aug 08, 2012 9:56 am

jcdr wrote: Oss, I too agree 200% on the need to accept UTF-16 for OpenSubtitles. After all, it is a variable-length code, so it should not take a lot more space, if that is the reason for not accepting it?

While I agree that UTF-16 is desirable, I have to say that it is not a variable length code. UTF-16 is the same as UCS-2. It uses exactly two bytes for each character.

UTF-8 is the one with a variable length. Because of its variable length, most Roman-based alphabets produce a considerably smaller file in UTF-8 than in UTF-16. On the other hand, non-Roman characters tend to produce a bloated UTF-8 file and take up less space in UTF-16. For example, Chinese characters will convert into about 4 bytes per character in UTF-8 but, like all others, only 2 bytes in UTF-16.

So, both have their advantages and disadvantages when it comes to file size. Plus, as I mentioned, TotalMedia Theatre 5 does not understand UTF-8 but does understand UCS-2 (aka UTF-16).
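The size trade-off is easy to check by encoding sample lines and counting bytes. In this sketch the sample sentences are made up, and the "utf-16-le" codec is used so the two-byte BOM isn't counted. One detail worth knowing: UTF-16 is not strictly fixed-width, because characters beyond U+FFFF (such as the rare CJK ideograph U+20000) take a four-byte surrogate pair; that is the difference between UTF-16 and the older UCS-2.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for text, both counted without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "English": "Where are you going?",
    "Portuguese": "Onde é que vais?",
    "Chinese": "你要去哪里？",
    "CJK Ext. B": "\U00020000",  # outside the BMP: a surrogate pair in UTF-16
}

for name, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name:<11} chars={len(text):>2}  utf-8={utf8:>3}  utf-16={utf16:>3}")
```

For the English sentence this gives (20, 40) and for the Chinese one (18, 12), while U+20000 takes four bytes in both encodings.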
srtpal
 
Posts: 59
Joined: Sun Jun 21, 2009 5:28 pm

Re: Problems with the addition of new lines in the srt files

Postby jcdr » Wed Aug 08, 2012 2:11 pm

srtpal wrote:
jcdr wrote: Oss, I too agree 200% on the need to accept UTF-16 for OpenSubtitles. After all, it is a variable-length code, so it should not take a lot more space, if that is the reason for not accepting it?

While I agree that UTF-16 is desirable, I have to say that it is not a variable length code. UTF-16 is the same as UCS-2. It uses exactly two bytes for each character.


Ah, my apologies. So if it takes twice the space for the 90% of files that contain no non-Roman letters, then UTF-16 compatibility does not seem that important after all, except for compatibility with a few programs such as Theatre 5, and even then, the majority of files, which will remain ANSI- or UTF-8-encoded, will need converting in Notepad.

EDIT: What about having the preview feature in UTF-8 instead of ANSI? That would avoid previews full of ��� for non-English files.
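The ��� in those previews is the Unicode replacement character (U+FFFD). A small sketch shows that a single fixed choice only moves the problem around: UTF-8 bytes shown as ANSI turn into mojibake, and ANSI bytes shown as UTF-8 turn into �. So the preview would presumably need to decode each file with its own detected encoding. The function name is illustrative.

```python
def preview(raw: bytes, encoding: str) -> str:
    # errors="replace" turns undecodable bytes into U+FFFD, displayed as �
    return raw.decode(encoding, errors="replace")


utf8_bytes = "Ação".encode("utf-8")   # 41 C3 A7 C3 A3 6F
ansi_bytes = "Ação".encode("cp1252")  # 41 E7 E3 6F

print(preview(utf8_bytes, "utf-8"))   # correct
print(preview(utf8_bytes, "cp1252"))  # UTF-8 read as ANSI: mojibake
print(preview(ansi_bytes, "utf-8"))   # ANSI read as UTF-8: replacement characters
```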
jcdr
Moderator
 
Posts: 367
Joined: Sun Apr 08, 2012 9:49 am

Re: Problems with the addition of new lines in the srt files

Postby NomadaPT » Wed Aug 08, 2012 2:46 pm

jcdr wrote: So if it takes twice the space for the 90% of files that contain no non-Roman letters, then UTF-16 compatibility does not seem that important after all...


100% correct, which is why UTF-8 is preferred on the net. Furthermore... if in the future the majority of software migrates to UTF-16, that encoding may be needed, but the more I think about that particular detail (use of space), the less sure I am that O.S. needs to support UTF-16. After all, conversion from UTF-8 to UTF-16 (little-endian or big-endian), or to ANSI (when the Latin alphabet is concerned), is fast (almost automatic in some programs, even built in) and lossless (srtpal can confirm this).
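The "lossless" part can be checked with a round trip: decode, re-encode, and convert back. This is a minimal sketch with a made-up Portuguese sample. Note that the "utf-16-le" and "utf-16-be" codecs write no BOM (Python's plain "utf-16" adds one), and that conversion to ANSI is only lossless when every character exists in the code page: in strict mode, Chinese text sent to cp1252 raises UnicodeEncodeError instead of being silently damaged.

```python
def convert(raw: bytes, src: str, dst: str) -> bytes:
    """Re-encode subtitle bytes; strict mode raises instead of dropping characters."""
    return raw.decode(src).encode(dst)


text = "Legendas em português: ação, ñ, ü"
utf8 = text.encode("utf-8")

for target in ("utf-16-le", "utf-16-be", "cp1252"):
    converted = convert(utf8, "utf-8", target)
    # Converting back must reproduce the original UTF-8 bytes exactly
    assert convert(converted, target, "utf-8") == utf8
    print(f"{target:<10} {len(converted)} bytes")
```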

The real problem will be the non-Latin alphabets, mainly the major ones of the Far East (Chinese, Japanese and Hangul), covered (partially) in the Supplementary Ideographic Plane. To be honest, I'm not so sure that subtitles in scripts covered by the Supplementary Multilingual Plane will suddenly appear, or even in many of the scripts of the Basic Multilingual Plane... I mean, how many subtitles do you think O.S. is going to have in Cherokee (for instance)?
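Unicode is split into 17 planes of 65,536 code points each, so a character's plane is simply its code point shifted right by 16 bits. A quick sketch makes the distinction concrete: everyday Chinese and Japanese ideographs, all modern Hangul syllables and the Cherokee block sit in plane 0 (the BMP), while the Supplementary Ideographic Plane (plane 2) holds the rarer CJK extension ideographs. The sample characters are arbitrary picks.

```python
def plane(ch: str) -> int:
    """Unicode plane of a single character: 0 = BMP, 1 = SMP, 2 = SIP, ..."""
    return ord(ch) >> 16  # each plane spans 0x10000 code points


for label, ch in [
    ("Chinese", "\u4e2d"),        # 中
    ("Hangul", "\ud55c"),         # 한 (a syllable, not a surrogate)
    ("Cherokee", "\u13a0"),
    ("Gothic", "\U00010330"),
    ("CJK Ext. B", "\U00020000"),
]:
    print(f"{label:<11} U+{ord(ch):04X} -> plane {plane(ch)}")
```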
Last edited by NomadaPT on Wed Aug 08, 2012 3:05 pm, edited 1 time in total.
NomadaPT
 
Posts: 20
Joined: Mon Dec 22, 2008 3:12 am

Re: Problems with the addition of new lines in the srt files

Postby srtpal » Wed Aug 08, 2012 3:03 pm

jcdr wrote: ...except for compatibility with a few programs such as Theatre 5, and even then, the majority of files, which will remain ANSI- or UTF-8-encoded, will need converting in Notepad.

Not just software: languages too. For Chinese, UTF-16 is much better because the file is much smaller than in UTF-8. And not just Chinese: Indic languages, too. So, if you consider the size of China and India, UTF-16 would be a very useful addition.

Simply put, in UTF-8 plain 7-bit ASCII takes one byte per character, and anything above it takes more than one byte: at least two, and the higher a character is in the Unicode numbering scheme, the more bytes it takes.

Please see this table. Only characters 0-127 (including control characters) are expressed with one byte. Characters 128-2047 take two bytes. Anything above that takes three or more bytes (up to six in the original design, four under the current standard; common Chinese characters take three, not the four I mistakenly mentioned earlier).

So, for anything numbered 2048 to 65535, UTF-16 is more compact than UTF-8 (above that, both use four bytes). We are talking about billions of people who are better off with UTF-16: pretty much all of Asia, with the exception of the Middle East and the Asian part of the former Soviet Union, plus Vietnam and Mongolia (which use the Latin and Cyrillic alphabets, respectively).
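The byte counts in that table can be written as two tiny functions and cross-checked against Python's own encoder. The boundaries below follow RFC 3629, which caps UTF-8 at four bytes; the sample characters are arbitrary.

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed to encode one code point in UTF-8 (RFC 3629 caps this at four)."""
    if code_point < 0x80:      # 0-127: 7-bit ASCII, including control characters
        return 1
    if code_point < 0x800:     # 128-2047: e.g. accented Latin, Greek, Cyrillic, Hebrew, Arabic
        return 2
    if code_point < 0x10000:   # 2048-65535: rest of the BMP, e.g. Devanagari, CJK, Hangul
        return 3
    return 4                   # supplementary planes, U+10000 and up


def utf16_length(code_point: int) -> int:
    """Bytes needed in UTF-16: two inside the BMP, four (a surrogate pair) beyond it."""
    return 2 if code_point < 0x10000 else 4


for ch in "Aé\u0416\u0905\u4e2d\U00020000":
    cp = ord(ch)
    print(f"U+{cp:04X}: UTF-8 {utf8_length(cp)} bytes, UTF-16 {utf16_length(cp)} bytes")
```

UTF-16 wins only in the three-byte band (2048 to 65535), which is exactly where the CJK and Indic scripts fall.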
srtpal
 
Posts: 59
Joined: Sun Jun 21, 2009 5:28 pm

Re: Problems with the addition of new lines in the srt files

Postby NomadaPT » Wed Aug 08, 2012 9:29 pm

Just for information, since it may be useful: Sourceforge developed a small program to convert between code pages (ANSI to UTF-8/16/32, UTF-* to ANSI/MAC/DOS, and so on).

I've tried it and it works well.

It's open source, and I wonder (please don't curse me) whether the code could be included in O.S., allowing the downloader to choose the final format... it's just a thought, but it would be a way to end the discussion about the encoding(s) of the files and which formats to support.

Here's the link: http://sourceforge.net/projects/cp-converter/


PHP, which OpenSubtitles uses, has this functionality built in. It could be made into an option for downloads: store the best format internally (UTF-8 or UTF-16) and convert on the fly to the user's preferred format. I don't think it makes sense to have multiple copies of the same subtitle in different encodings for a movie.
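The "store once, convert on download" idea is sketched below in Python to keep all examples in one language; it is not OpenSubtitles' code, and the option names in the table are hypothetical. Legacy code pages get "?" for characters they cannot hold, while the Unicode targets are strict because they can represent everything.

```python
# Hypothetical download options; the site's real option names are not known here
DOWNLOAD_ENCODINGS = {
    "utf-8": "utf-8",
    "utf-8-bom": "utf-8-sig",  # UTF-8 with a byte order mark
    "utf-16": "utf-16",        # Python adds a BOM in the machine's byte order
    "ansi-western": "cp1252",
}


def render_download(stored_utf8: bytes, choice: str) -> bytes:
    """Keep one UTF-8 copy on the server and re-encode it per download request."""
    codec = DOWNLOAD_ENCODINGS[choice]
    text = stored_utf8.decode("utf-8")
    # Legacy code pages can't hold every character; "?" keeps the download usable
    errors = "replace" if codec.startswith("cp") else "strict"
    return text.encode(codec, errors=errors)
```

Converting per request keeps a single canonical file per subtitle, which matches the concern about storing duplicates in several encodings.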

cp-converter wasn't developed by Sourceforge, by the way. It's hosted on Sourceforge, but it's from sleeveroller. On every Unix machine out there, "iconv" does the same thing and is easily scriptable (and the built-in editors of all operating systems can do it as well, as far as I know).
NomadaPT
 
Posts: 20
Joined: Mon Dec 22, 2008 3:12 am

Re: Problems with the addition of new lines in the srt files

Postby subshare » Thu Aug 09, 2012 7:46 am

Hello, it's me again.

In the meantime, it seems the problem has been fixed: the first line isn't replaced anymore. I tried it today. However, if your first line is early in the file, it overlaps with the line added by opensubtitles.org. This may lead to problems with some players: some will show the following line nevertheless, others give priority to the very first line. Anyway, I know for sure it leads to problems when trying to mux it into an MKV file. Before doing so, you have to fix the overlap or MKVmerge will give an error.
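Overlaps like this can be detected, and optionally trimmed, before muxing. A minimal sketch, assuming the cues are stored in time order with standard `HH:MM:SS,mmm` timestamps; the function names are illustrative, and trimming the earlier cue is only one possible fix (shifting or dropping the inserted line are others).

```python
import re

TIMING_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})",
    re.MULTILINE,
)


def to_ms(parts) -> int:
    h, m, s, ms = map(int, parts)
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def fmt(ms: int) -> str:
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def find_overlaps(srt: str) -> list[tuple[int, int]]:
    """Return 1-based positions (k, k + 1) where cue k ends after cue k + 1 starts."""
    spans = [(to_ms(t[:4]), to_ms(t[4:])) for t in TIMING_RE.findall(srt)]
    # Only consecutive cues are compared, which suffices for a file in time order
    return [(k + 1, k + 2) for k in range(len(spans) - 1) if spans[k][1] > spans[k + 1][0]]


def trim_overlaps(srt: str) -> str:
    """Shorten each cue so it ends no later than the next cue starts."""
    matches = list(TIMING_RE.finditer(srt))
    starts = [to_ms(m.groups()[:4]) for m in matches]
    pieces, pos = [], 0
    for k, m in enumerate(matches):
        end = to_ms(m.groups()[4:])
        if k + 1 < len(matches):
            end = min(end, starts[k + 1])
        # Copy everything before this timing line unchanged, then the rewritten timing
        pieces.append(srt[pos:m.start()])
        pieces.append(f"{fmt(starts[k])} --> {fmt(end)}")
        pos = m.end()
    pieces.append(srt[pos:])
    return "".join(pieces)
```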

As for the encoding, I think UTF-8 is the way to go. It's universal and covers all types of languages. In order to convert between different text encodings, you can simply use a text editor that is capable of doing this.

As an Ubuntu user, I simply use gedit, which is preinstalled and handles everything. It can save in all sorts of codepages.
http://projects.gnome.org/gedit/

For Windows, I used Editpad Lite (freeware).
http://www.editpadlite.com/
It's the best Notepad replacement I know. It also handles all encodings and can convert between them. On top of that, you can force Editpad Lite to interpret a text file with another codepage. This is very useful if, for instance, you extract SRT files from an MKV and then find out that they were saved with the wrong codepage.
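The same "interpret with another codepage" repair can be done in a script. This sketch assumes a Russian subtitle saved in Windows-1251 was displayed as Windows-1252; the function name is illustrative. Encoding the garbled text back with the wrong code page recovers the original bytes, which are then decoded with the right one. It only works when the wrong decode was reversible; if a reader already replaced bytes with �, that information is gone, and decoding the raw file bytes with the correct code page is the safer route.

```python
def reinterpret(text: str, wrong: str, right: str) -> str:
    """Undo a wrong decode: recover the original bytes, then decode with the right code page."""
    return text.encode(wrong).decode(right)


original = "Привет"                                       # Russian text saved as cp1251
garbled = original.encode("cp1251").decode("cp1252")    # what a cp1252 viewer shows
repaired = reinterpret(garbled, wrong="cp1252", right="cp1251")
print(garbled, "->", repaired)
```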

I'm glad the first line issue got fixed. I can live with an extra line that promotes Opensubtitles. After all, it's still the best subtitles website. I found that Subscene is quite a mess sometimes, because the film data is not directly retrieved from IMDb, but titles are manually added by users. Thus, quite a few movies have wrong data (wrong IMDb number or wrong year).

Opensubtitles still has the best way to organize the movies/subtitles. Keep up the good work. I'll gladly continue to upload.
subshare
 
Posts: 9
Joined: Fri Jan 07, 2011 4:57 am
