SmallBrother
Site Admin
Posts: 3724
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: Char Encoding

Wed Dec 07, 2016 11:47 pm

Anyway, I've never understood the meaning of "release". A movie is a movie. It could have some different versions or cuts (then it should be: "directors_cut") but that's all. I don't understand all the fuss about "DVD", "BluRay", "Remastered", "1080p", "XVid", "x264"... or anything concerning audio.
Of course, a director's cut and a theatrical version are significantly different; they have different 'cuts'. And h264, AAC, 1080p or whatever just refers to the video or audio codec being used, or the output quality, and is just some additional info.

But the full release name refers to the exact video file. Imagine someone (let's say "ReLeASeR") takes a Blu-ray, but cuts off the first 17 seconds, to get rid of some nagging for example. He calls it "The.Shining.1980.BRRip.1080p.h264.AAC.ReLeASeR.mp4".
Then comes SmallBrother, who thinks that nagging at the start should be included, and besides, he uses a VHS tape as source and the reel is running a little bit slow. He calls his video file "The.Shining.1980.VHS.72p.SmallBrother.mp4".
Both movies are exactly the same: no other cuts, no lost or additional scenes. But subtitles synced to the ReLeASeR version can NOT be used for SmallBrother's version. If ReLeASeR just called it "The Shining" and SmallBrother did the same, it would be confusing.
And that's also the answer to your next question:
What if you download a subtitle, unpack it and then you use it weeks or months later. How do you rate or comment it? I mean, how can you obtain the URL from the file name?
You cannot 'just' obtain the URL from the file name. But it would be easy to search. If there are 10 subtitles for The Shining, but all have unique release names, it's totally clear. Just search for "The.Shining.1980.VHS.72p.SmallBrother" and only that one will pop up.
For a real example, search for "The Shining" ( http://www.opensubtitles.org/en/search/ ... movie-3630 )
You will get six results.
Now search for "The.Shining.1980.DVDRip.DivX5" ( http://www.opensubtitles.org/en/search2 ... DRip+DivX5 ) and you will get the one you used and want to rate.
Most remarkably when you changed the forum design.
Ugh-ugh... I didn't change the forum design.
But now you got me going. To be honest, I think the forum is not so bad, but ask around what is my opinion about socalled responsive design. "Responsive" to 17 different kinds of smartphones, but ignoring the fact that desktop screens exist. Or let me rephrase that:
ask arou
nd what
is my op
inion ab
out soca
lled res
ponsive
design

The next web site I make will detect a smartphone and respond with a blue screen saying "Turn off your phone, chill out, relax, enjoy the view, or read a book." If I ever make another web site.

Oops, off-topic.
Nowadays a VPN is a must for everyone. A VPN allows you safe surfing and protects you against spying governments and companies.
I advise AirVPN - from € 2,75 per month. Click the below banner for more info.

hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Char encoding and new design in forum

Thu Dec 08, 2016 12:12 pm

I meant you, administrators. I don't know what person took the decision or is responsible for this. In Spanish we have "tu" (you, singular) and "vosotros" (you, plural).

Yeah, speaking of "off topic"... What to do? You are replying to a post about some topic but then you refer to something "off topic". Sometimes this happens. Or perhaps most of the time. Sometimes I just change the subject so it reflects what's in the post. Yes, I know this has nothing to do with "Char encoding".

About responsive design... it must be a joke. This page takes 30 seconds to load on my computer. Well, I know this is the world we live in. Take it or leave it. In Spanish we say:

Para mañana, lentejas (lentils),
si quieres las comes, si no, las dejas.

And just so as not to be completely off-topic, I think I've already given my point of view about this subject. I think every file should be in UTF-8. I don't meddle in your (of you, OS admins) private policy. If you want to keep files as users upload them, that's all right. It's up to you. But give me a good (web) interface where I can get UTF-8. Now I have to do detection and conversion manually. At least now, I have a shell script.
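
For readers without such a script, here is a minimal sketch of the detect-and-convert step, in Python rather than shell. It is not the script mentioned above (that one isn't shown in this thread); the name `to_utf8` and the candidate list are assumptions for illustration. It tries strict UTF-8 first and only then falls back to legacy code pages, so the order matters and the guess can be wrong for files that are not Western European.

```python
# Sketch only: the candidate list and its order are assumptions.
# "utf-8-sig" accepts UTF-8 with or without a leading BOM.
# "latin-1" accepts any byte sequence, so it acts as a last resort.
CANDIDATES = ("utf-8-sig", "cp1252", "latin-1")


def to_utf8(raw, candidates=CANDIDATES):
    """Return (utf8_bytes, name_of_encoding_that_decoded_cleanly)."""
    for name in candidates:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
        return text.encode("utf-8"), name
    raise ValueError("none of the candidate encodings fit")


if __name__ == "__main__":
    converted, guessed = to_utf8("Para mañana, lentejas".encode("cp1252"))
    print(guessed, converted.decode("utf-8"))
```

A decode that succeeds only proves the bytes are legal in that encoding, not that the encoding is the right one, which is why detection is never 100% accurate.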

SmallBrother
Site Admin
Posts: 3724
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: Char encoding and poetry

Thu Dec 08, 2016 3:24 pm

I meant you, administrators. I don't know what person took the decision or is responsible for this. In Spanish we have "tu" (you, singular) and "vosotros" (you, plural).
Don't worry, I just found a way to vent my opinion ;-)
And just so as not to be completely off-topic, I think I've already given my point of view about this subject. I think every file should be in UTF-8. I don't meddle in your (of you, OS admins) private policy. If you want to keep files as users upload them, that's all right. It's up to you. But give me a good (web) interface where I can get UTF-8. Now I have to do detection and conversion manually. At least now, I have a shell script.
I think every file should be UTF8 AND software should support it and have it as default.
BUT this last one is not the case - and that's why I doubt if every file 'should' be UTF8.
I recall uploading my Dutch subs in CP1252 and being convinced that would be the best.
Later I thought UTF8 would be better and did some in UTF8.
Then back to CP1252 (sorry).
Now it's a mess. I think with some subs I mentioned the character encoding as additional info. Or at least I thought about it.

But okay, speaking of lentejas and whether to eat them or not:
Maybe oss can get that character encoding detection working for all users - although it would never be 100% accurate.
Another option would be to add an input field - but I think that would make the mess bigger instead of smaller.

hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: Char Encoding

Thu Dec 08, 2016 4:04 pm

Well, on topic again :)
There is nothing completely perfect and suitable for all needs. UTF-8 has some drawbacks compared to, let's say, cp1252. It is more complex to handle and takes more space when stored. Besides, English (ASCII) is favoured over other languages like Russian or Arabic, whose text can take roughly twice the space it would in a dedicated single-byte encoding like KOI8. But in the age of terabyte disks and 8-core processors I think you can overlook those details. The benefits of having a universal encoding are worth it.
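
To put a number on the storage point, here is a small sketch that compares byte counts for the same text in a single-byte legacy encoding and in UTF-8. The sample strings and the helper name `size_report` are made up for illustration; `koi8_r` is the standard Python codec name for KOI8-R.

```python
# Illustrative samples only: (text, legacy single-byte encoding).
SAMPLES = {
    "English": ("The Shining", "ascii"),
    "Russian": ("Привет, мир", "koi8_r"),
}


def size_report(samples=SAMPLES):
    """Map each label to (bytes in legacy encoding, bytes in UTF-8)."""
    return {
        label: (len(text.encode(legacy)), len(text.encode("utf-8")))
        for label, (text, legacy) in samples.items()
    }


if __name__ == "__main__":
    for label, (legacy_size, utf8_size) in size_report().items():
        print(f"{label}: legacy={legacy_size} bytes, utf-8={utf8_size} bytes")
```

The English line is the same size either way (11 bytes), while the Russian line grows from 11 to 20 bytes: each Cyrillic letter takes two bytes in UTF-8, but the comma and the space still take one.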

Another option would be adding it to search form.

And just another option would be to eradicate all the non-ASCII languages in the world. But I think I'm freaking out here. Just joking :roll: Esperanto is not ASCII-compliant.

SmallBrother
Site Admin
Posts: 3724
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: Char Encoding

Thu Dec 08, 2016 5:01 pm

Another option would be adding it to search form.
Huh?
You mean using the detected character encoding and being able to search, using that as a filter?
I would say you (plural, general) would prefer to have a subtitle in cp1252 or even an unknown set, rather than having no subtitle at all.

Besides, in terms of programming it, I think it's easier to just mention the character set.
Esperanto is not ASCII-compliant.
If everybody would speak Esperanto, we wouldn't need subtitles ;-)

rcombs
Posts: 4
Joined: Fri Jan 26, 2018 11:19 pm

Re: Char Encoding

Sat Jan 27, 2018 1:33 am

I have some strong opinions on character encoding, but let's see if I can articulate a proposal on how to improve the situation here.

Basically, I'd like if all subtitle downloads could be UTF-8. This could be accomplished in a few ways.

- The simplest is to require UTF-8 on all uploads. This is very simple to enforce, and could be coupled with a help page on rejection indicating how to configure your software to save UTF-8.
- If rejecting legacy encodings is out of the question, the server could attempt to detect character encoding at upload-time based on the text and the specified language, and convert to UTF-8 either at that point or at download-time.
- The OpenSubtitles Uploader application could detect non-UTF-8, and convert from the system's legacy codepage in that case. This is probably a worthwhile endeavor even without any of the other suggestions here.
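
The first option really can be a few lines. The following is a sketch of what such an upload check might look like, not a description of how the site works; `is_valid_utf8` and `check_upload` are hypothetical names, and the rejection message is invented.

```python
def is_valid_utf8(raw):
    """True if the bytes decode as strict UTF-8 (plain ASCII also passes)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_upload(raw):
    """Hypothetical upload hook: return (accepted, message_for_uploader)."""
    if is_valid_utf8(raw):
        return True, "ok"
    return False, "Please save the file as UTF-8 and upload it again."


if __name__ == "__main__":
    print(check_upload("mañana".encode("utf-8")))
    print(check_upload("mañana".encode("cp1252")))
```

Text in a legacy code page that contains accented letters almost never happens to be valid UTF-8, which is what makes this check practical. A pure-ASCII file passes, which is harmless because ASCII is a subset of UTF-8.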

Ultimately, I'd love for clients never to have to do their own character encoding detection (or, less importantly, conversion).

SmallBrother
Site Admin
Posts: 3724
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: Char Encoding

Sat Jan 27, 2018 12:16 pm

True, there are definitely some very good reasons to use UTF-8 encoding. But there are also good reasons not to. Some software simply doesn't support it, and if they do, sometimes the 'local' character set is default. Some tech-users might be able to recognize things and know what to do, but many others will just be upset by text not showing properly.

My personal opinion about what would be the best to use is changing about every month ;-)
But I am convinced it is not a good idea to *enforce* one way or another.

rcombs
Posts: 4
Joined: Fri Jan 26, 2018 11:19 pm

Re: Char Encoding

Sat Jan 27, 2018 12:41 pm

What subtitle display software doesn’t support UTF-8, or fails to detect it?

And mismatching defaults are a problem with legacy encodings as well, whereas UTF-8 has the benefit of consistency and easy detection.

Mazrim Taim
Posts: 122
Joined: Tue Mar 31, 2015 10:04 am

Re: Char Encoding

Mon Jan 29, 2018 12:45 am

I have some strong opinions on character encoding, but let's see if I can articulate a proposal on how to improve the situation here.

Basically, I'd like if all subtitle downloads could be UTF-8.
This could be accomplished in a few ways.

- The simplest is to require UTF-8 on all uploads. This is very simple to enforce, and could be coupled with a help page on rejection indicating how to configure your software to save UTF-8.
- If rejecting legacy encodings is out of the question, the server could attempt to detect character encoding at upload-time based on the text and the specified language, and convert to UTF-8 either at that point or at download-time.
- The OpenSubtitles Uploader application could detect non-UTF-8, and convert from the system's legacy codepage in that case. This is probably a worthwhile endeavor even without any of the other suggestions here.
(...)

What subtitle display software doesn’t support UTF-8, or fails to detect it? (...)



If your proposal comes to a vote, I'll go for nay. :wink:

First of all because of what Smallbrother mentioned. It's never a good idea to enforce a standard if a standard isn't really a standard.
A standard is only a standard if that standard is a universal standard. As in: applicable everywhere.

Video Players on a PC usually accept everything ranging from ANSI to Unicode and UTF-8 to SSA/ASS. Existing formats can easily be modified/updated; new formats can be added through updates. So no problems for people watching video on a PC.

A lot of people however, myself included, prefer to watch video using a TV and/or a peripheral device connected to a TV, as seen below.

Image



These devices run embedded software. Updating the firmware (via cable connection or via usb) is not an easy process for most.
That is assuming manufacturers even provide updates. Usually only in the first year after a product hits the market.
After that all support is dropped.


As a result, most low end or old(er) devices don't accept all existing formats.
This wouldn't be a problem if everybody would buy new devices at least once in 5 years.

Problem is: why would I, or why should anyone else?
I have a laptop from 2009 and it's running just fine. Capable of handling nearly everything I throw at it. So why replace it?

The same applies to my HDD/DVD-recorder. Also from 2009. Can't handle anything other than .mpg- and .avi-files.
Yes, it's annoying it can't handle .mkv and h264. However, I can still find XviD/avi releases for nearly everything I want to watch.
So again, why replace it with a younger model?



A lot of people feel this way. They wait for A) a device to break down, or B) must-have new features that make replacing it worthwhile.
The added ability to play other video file formats just isn't enough (for me). So it's A) until B) comes along.

The same applies to subtitle files. The HDD/DVD recorder that I mentioned can only handle ANSI.
Use anything else (including UTF-8) and subs aren't displayed.

This happens to a lot of people who prefer to watch video using a TV or a peripheral device connected to a TV: no subs displayed.
And they start wondering why: https://forum.opensubtitles.org/viewtop ... 178#p37178


Which is precisely why I upload my subs in ANSI only. ANSI is truly universal. TVs, PC software and devices connected to a TV all recognize ANSI (even old devices) and are able to handle it.

You'd be surprised how many devices (still) out there, won't accept anything else -
Media Player M102 only supports subtitle files in ANSI encoding format

So no, I prefer OpenSubtitles the way it is now. No restrictions on .srt-format. Let's keep it that way. :wink:

rcombs
Posts: 4
Joined: Fri Jan 26, 2018 11:19 pm

Re: Char Encoding

Mon Jan 29, 2018 12:52 am

I'll point out that consistent UTF-8 can be converted to anything your device supports trivially, while legacy encodings require detection (which is much more complex) to support with modern software and devices.
You already have to convert anything with an encoding other than your device's supported one anyway; if all new files were consistent, this would be simpler.
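
As a rough illustration of that conversion step, here is a sketch that turns a known-UTF-8 subtitle into a legacy code page for an old player. The function name `convert_for_device` and the default target `cp1252` are assumptions; characters the target cannot represent are replaced with `?` and reported back so nothing is lost silently.

```python
def convert_for_device(utf8_bytes, target="cp1252"):
    """Convert UTF-8 bytes to `target`; return (converted_bytes, lost_chars)."""
    # "utf-8-sig" drops a leading BOM if the file has one.
    text = utf8_bytes.decode("utf-8-sig")
    lost = set()
    for ch in text:
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            lost.add(ch)  # the target code page has no slot for this character
    return text.encode(target, errors="replace"), sorted(lost)


if __name__ == "__main__":
    data, lost = convert_for_device("Para mañana, lentejas".encode("utf-8"))
    print(data, lost)
```

No guessing is involved here because the source encoding is known up front; that is the asymmetry being argued for.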

SmallBrother
Site Admin
Posts: 3724
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: Char Encoding

Mon Jan 29, 2018 6:52 pm

I understand what you are saying and you definitely have a point. This solution would maybe work for you and for Mazrim_Taim and some others. But reality is that many/most people are non-tech's. They will just see subs not working. They don't know it's because of UTF-8, they don't know converting the file is the solution, they don't know how to convert it. In other words: it would solve a (small) problem for a few and create a (big) problem for many...

But what would help is an input field where the uploader could state the character set used.

Mazrim Taim
Posts: 122
Joined: Tue Mar 31, 2015 10:04 am

Re: Char Encoding

Mon Jan 29, 2018 11:32 pm

(...) reality is that many/most people are non-tech's. They will just see subs not working. They don't know it's because of UTF-8, they don't know converting the file is the solution, they don't know how to convert it. (...)
This!


But what would help is an input field where the uploader could state the character set used.
Or maybe "the system" can do the converting and all the downloader has to do is select the desired encoding.
Something like this could pop up after clicking 'download' -

Image


Another member here, mentioned this in 2016:
Most encodings are starting to slowly go away and move towards generalization of utf8 or similar "broad" standards.
It would be nice if we had a truly universal standard, but unfortunately we're not there yet. It may take another 10 years for 'outdated' devices to eventually be replaced. Problem is that by then there'll most likely be a new 'standard' and UTF-8 will have taken ANSI's place. :wink:

rcombs
Posts: 4
Joined: Fri Jan 26, 2018 11:19 pm

Re: Char Encoding

Mon Jan 29, 2018 11:48 pm

Or maybe "the system" can do the converting and all the downloader has to do is select the desired encoding.
Something like this could pop up after clicking 'download' -
Sure, or a user could set a preference (with the default being UTF-8). Point being, the internal storage format would be consistent, so you could rely on downloads always being what you expect, rather than having to detect at download- or playback-time.
It would be nice if we had a truly universal standard, but unfortunately we're not there yet. It may take another 10 years for 'outdated' devices to eventually be replaced. Problem is that by then there'll most likely be a new 'standard' and UTF-8 will have taken ANSI's place. :wink:
That's very unlikely. Legacy encodings were replaced with Unicode in order to move to a single consistent standard. Unicode was designed to be expandable to contain far more code points than there exist characters in any language, and has broad support amongst all well-designed software and hardware from the past couple decades. There's no drive to replace it with anything else (unlike with legacy encodings, which needed to be replaced in order for devices from different countries to interact).

noembryo
Posts: 79
Joined: Thu Nov 16, 2017 12:07 am

Re: Char Encoding

Tue Mar 27, 2018 3:14 pm

Just saw this thread and I seize the opportunity to advertise my app Subber (;o) that has the option to select the encoding to use when saving the subtitles found (UTF-8 is the default)
[Attachment: SubberPrefsP2.png]
It can use UTF-8 with a BOM (byte order mark), which my "smart" TV needs (it will not work with a plain UTF-8 encoded file).
It can also convert the encoding of any srt/txt file already downloaded (using drag&drop).
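
For anyone who wants to do the BOM part by hand, here is a minimal sketch in Python. It is not Subber's code; `add_bom` is a hypothetical helper. The UTF-8 BOM is the three bytes EF BB BF at the start of the file, available in the standard library as `codecs.BOM_UTF8`.

```python
import codecs


def add_bom(raw):
    """Return UTF-8 bytes that start with a BOM, without doubling an existing one."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw
    raw.decode("utf-8")  # raises UnicodeDecodeError if the input is not UTF-8
    return codecs.BOM_UTF8 + raw


if __name__ == "__main__":
    print(add_bom("mañana".encode("utf-8")))
```

When starting from text rather than bytes, `text.encode("utf-8-sig")` gives the same result in one step.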

Any more suggestions for features are welcomed.
