Jolmer87
Posts: 3
Joined: Sun Dec 29, 2013 5:15 am

How trustworthy are the subs on opensubtitles.org?

Sun Dec 29, 2013 5:21 am

Hello people,


How trustworthy are the subtitles at www.opensubtitles.org? How do I know if the translation is correct?
Actually, I believe the translation is correct, but some explanation of how subtitles are checked would be great.
I am going to watch a lot of English-language movies with Spanish subtitles because I want to improve my Spanish.

How good are the Spanish subtitles for English-language movies here on opensubtitles.org? I recently downloaded a Spanish subtitle for the movie The Hangover (part 1).

Kind Regards,

Jolmer de Jong

SmallBrother
Site Admin
Posts: 3726
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: How trustworthy are the subs on opensubtitles.org?

Sun Dec 29, 2013 10:33 am

Not every single subtitle is checked line by line. Subs are not checked and approved before they become available for download. For quality control we partially rely on user reports and ratings. As a result, the quality can range anywhere from very bad (like machine translations) to excellent.

To choose the best version, you should preferably pick subs from admins or trusted users, and/or subs marked as featured (by us) or with a high rating (by users). With some knowledge of the subbed language you can use the preview to check the first few lines yourself.
Do not just blindly pick the subs with the most downloads, as this is often simply the oldest version, which if anything is more likely to be of lower quality.
Nowadays a VPN is a must for everyone. A VPN allows you safe surfing and protects you against spying governments and companies.
I advise AirVPN - from € 2,75 per month. Click the below banner for more info.

Jolmer87
Posts: 3
Joined: Sun Dec 29, 2013 5:15 am

Re: How trustworthy are the subs on opensubtitles.org?

Mon Dec 30, 2013 5:58 pm

Thank you for your answer.

For example, how can I know that this subtitle is of good quality?

"http://www.opensubtitles.org/en/subtitl ... part-ii-es"

SimplyTheBOSS
Site Admin
Posts: 1326
Joined: Mon Feb 01, 2010 3:02 pm
Location: Finland

Re: How trustworthy are the subs on opensubtitles.org?

Mon Dec 30, 2013 6:36 pm

The only way is to watch the movie with the subtitles.

Jolmer87
Posts: 3
Joined: Sun Dec 29, 2013 5:15 am

Re: How trustworthy are the subs on opensubtitles.org?

Mon Dec 30, 2013 6:51 pm

Yeah, but I mean, can I tell anything from this link? "http://www.opensubtitles.org/en/subtitl ... part-ii-es" The uploader of this subtitle is a 'Silver member'; does that say anything about the trustworthiness of the subtitles?

SimplyTheBOSS
Site Admin
Posts: 1326
Joined: Mon Feb 01, 2010 3:02 pm
Location: Finland

Re: How trustworthy are the subs on opensubtitles.org?

Mon Dec 30, 2013 7:00 pm

The only thing it says is that he has uploaded more than 51 subtitles, that's all.

SmallBrother
Site Admin
Posts: 3726
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: How trustworthy are the subs on opensubtitles.org?

Tue Dec 31, 2013 12:16 pm

Jolmer87 wrote:
Yeah, but I mean, can I see anything at this link? "http://www.opensubtitles.org/en/subtitl ... part-ii-es" That the uploader of this subtitle is a 'Silver member', does that say anything about the trustworthiness of the subtitles?

As SimplyTheBOSS said, the only conclusion you can draw from a user being Silver/Gold/Platinum is the number of subs they have uploaded. Only the Admin and Trusted groups say something about quality (although even this is not fully reliable).

In your case, looking at the result list for The Hangover Part II (http://www.opensubtitles.org/en/search/ ... ovie-66967), there is one upload by a Trusted user, Spaceboy74 (http://www.opensubtitles.org/en/subtitl ... part-ii-es). There I see written "Subs por http://www.argenteam.net", which seems to be a serious team of translators. Looking at this user's profile page (http://www.opensubtitles.org/en/profile/iduser-969697), there are some positive comments. So yeah, I would go for this one.

See also the post viewtopic.php?f=8&t=14555#p28688 where something similar is discussed.

hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworthy are the subs on opensubtitles.org?

Tue Sep 23, 2014 1:04 pm

I think this is the right thread to share my thoughts about how opensubtitles works and how I use it.

As a programmer I usually deal with low-level tasks, but I think it is good to become a philosopher once in a while.

So, here goes my high-level view of subtitling. The process that I go through as a user is:

- I get a movie somewhere. Usually the Internet.
- If it doesn't have subtitles (most likely) I search for them.
- If I can't find them, I make them.

I say "make" because I really "make" them by hand. I usually watch old films, many of which aren't available on DVD, so this is the only way to make subtitles publicly available. This way I've already made three or four files.

When I search for subtitles, I like to do it the old way: I search by title and language, not by hash. Searching by hash is sometimes not useful because the video file is not registered. My point here is: I usually find subtitles of very poor quality.

Quality: this is the key. Reading the posts of developers, I can see that rating is left completely to the users. I don't think there is any selection performed by the web site.

If you want to give a good user experience, I think THERE SHOULD BE SOME MEANS OF SUBTITLE QUALITY CONTROL. I think it would be nice not to find 80% of the subtitles having typos, bad timings, etc. I'm not talking about translation matters, grammatical constructs and all that. I'm talking about basic things like OCR errors. I think they are easy to detect automatically. The thing is that the more straw, the more difficult it is to get the grain.

What do you think?

SmallBrother
Site Admin
Posts: 3726
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: How trustworthy are the subs on opensubtitles.org?

Fri Oct 03, 2014 9:47 am

Not all admins have the same opinion, but I'll give you mine. In general, I agree, hector. And I think of course everyone would love better quality subs.

The only problem is: how to achieve this? First of all, there is a practical problem: time. In the Dutch section an average of around 50 subtitles are posted every day. Checking and correcting the very basics of the uploads (correct movie name, episode, etc.) is already some work. With a preview it is then relatively easy to detect disasters like machine translations or 'total chaos subs', with stuff like lines of 100 characters, three or four lines, extremely faulty punctuation, etc. But going a bit deeper will take at least a couple of minutes per subtitle, meaning a couple of hours per day. And that's for the Dutch section; the English section has many more uploads every day. And to REALLY check subtitles, a couple of minutes is really not enough. And that's only the linguistic part. Then there is stuff like duplicates, false or missing credits, etc. And the most forgotten, but at least as important: what about timings that are completely off? These are difficult, or at least time-consuming, to check... Unfortunately ;) there is more in life than only subtitles, so realistically admins have an impossible job. This is where we need the help of users. But in reality users hardly ever rate or comment on subtitles, or only vaguely ("these subtitles suck").

Then, even if it were possible to detect the level of quality, there is another problem: what to do with subtitles with mistakes? Which subtitles are bad enough to delete and which are not perfect, but good enough to stay? With 99% of the subtitles something is wrong, one way or another. Deleting those would make opensubtitles.org quite empty.

So I think this is why the situation is as it is. Only extreme and easy cases of 'bad' are deleted. And to choose good subs, you will need to go for Trusted users or admins as uploaders, maybe rated by multiple users as 'good' and/or flagged as 'featured' and made by known good translators/subtitlers.

So I agree: quality control. But how...?

hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworthy are the subs on opensubtitles.org?

Sat Oct 04, 2014 2:40 am

Thanks for your reply.

I think user rating is a good thing and it would be more helpful if we actually used it. I'll take it into account from now on.

I understand that there is more in life than only subtitles. But that's what computers are for, aren't they? To help us accomplish cumbersome tasks. And that's why I posted it in the developers section.

How? On the practical side, there are some basic things that can be easily detected. For example, OCR errors. It would be easy to search for wrong words; see programs like ispell or aspell. The sequence "l'" (lowercase letter L) instead of "I'" (first person pronoun with apostrophe) can be found in a good deal of subtitle files. For that you need to know the language, of course.
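The lowercase-L check described above could look roughly like the following Python sketch. The pattern list and the name `find_ocr_suspects` are illustrative assumptions, not something opensubtitles.org is known to run, and the patterns only make sense for English text.

```python
import re

# Hypothetical patterns for the English lowercase-l / capital-I confusion.
# Each one looks for a lowercase "l" standing where the pronoun "I" belongs.
OCR_PATTERNS = [
    re.compile(r"\bl'(?:m|ll|ve|d)\b"),   # l'm, l'll, l've, l'd
    re.compile(r"(?<![\w'])l(?![\w'])"),  # a lone "l" used as a word
]

def find_ocr_suspects(text):
    """Return (line_number, line) pairs that match any suspect pattern."""
    suspects = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in OCR_PATTERNS):
            suspects.append((number, line))
    return suspects
```

A spell checker such as aspell would catch far more than these two patterns; the point is only that the most common OCR slip is cheap to flag.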

But I think this is more of a political matter. As you say, the problem is what to do. I think it would be nice just to have some quality indicator assigned by an algorithm. Perhaps every file could obtain a grade based on several factors like bad words, line length, badly formed subtitles in general, and so forth. It's up to you and your imagination. There are some rules which are universally accepted.
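As a rough illustration of such a grade, here is a minimal Python sketch that assumes SubRip (.srt) input. The two limits, the single OCR pattern and the function names are assumptions chosen for the example; real conventions vary by language and site.

```python
import re

MAX_LINE_LENGTH = 42   # assumed limit; conventions vary
MAX_LINES_PER_CUE = 2  # assumed limit
LONE_L = re.compile(r"(?<![\w'])l(?![\w'])")  # lowercase "l" used as a word

def split_cues(srt_text):
    """Split SubRip text into cues, each a list of text lines."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # lines[0] is the counter, lines[1] the timing line, the rest is text.
        if len(lines) >= 3 and "-->" in lines[1]:
            cues.append(lines[2:])
    return cues

def grade(srt_text):
    """Return the fraction of cues (0.0 to 1.0) that pass every check."""
    cues = split_cues(srt_text)
    if not cues:
        return 0.0
    passed = 0
    for text_lines in cues:
        too_many_lines = len(text_lines) > MAX_LINES_PER_CUE
        too_long = any(len(line) > MAX_LINE_LENGTH for line in text_lines)
        ocr_suspect = any(LONE_L.search(line) for line in text_lines)
        if not (too_many_lines or too_long or ocr_suspect):
            passed += 1
    return passed / len(cues)
```

A grade like this would only be an indicator shown next to a file, not a verdict; it says nothing about translation quality or timing.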

The quality of a translation is more difficult for a program to rate. But some chaotic grammatical constructs can also be detected.

The same can be applied to timings, though this is understandably far more difficult than detecting bad words. But automatic speech recognition is already at hand. Well, that's only dreaming. By the time you can detect bad timings, you probably won't need to write the subtitles yourself.

hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworthy are the subs on opensubtitles.org?

Sat Jun 13, 2015 6:37 pm

Hi.
Here I am, again. This is some critique. It is meant to be constructive. But this depends on administrators' criteria.

I'm thankful for this site, really. But I think there are some awful things. One of them is what I once called "jungle", i.e. mess. This is only one example. I think you can extrapolate to almost any film.

The other day I watched "The Princess Bride" with some subtitles. Very good film. And very good subtitles. But it occurred to me to do a simple exercise. I downloaded nearly all the English subtitles for this film (about 15). How can that be? 15 different versions for the same film? Let's see.

So, I took them and did a little analysis. First of all, I "normalised" them, i.e. I converted them all to UTF-8 with "\r\n" line endings, so I could compare them.
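A normalisation step of this kind might be sketched in Python as follows. The fallback encoding (Windows-1252) and the function name `normalise` are assumptions for the example; the post does not say which tool or source encodings were involved.

```python
def normalise(raw: bytes) -> bytes:
    """Decode subtitle bytes, then re-encode as UTF-8 with CRLF line ends."""
    # Assumption: files are UTF-8 (possibly with a BOM) or, failing that,
    # Windows-1252. Real uploads may use other encodings.
    try:
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        text = raw.decode("cp1252", errors="replace")
    # Collapse every line-ending style to "\n", then expand to "\r\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", "\r\n").encode("utf-8")
```

Once every file has passed through the same function, a plain byte-level comparison or diff is meaningful.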

And here is the result:

OS-ID     Reference     Differences
-------   -----------   -------------------------------------
3094223   (base)        No OCR errors. Good quality
3147563   3094223       - 0.2 s
3557938   3557939       + HI annotations
3557939   3094223       + song lyrics
4067445   3557938       + lots of OCR errors
4097613   3557938       + OCR errors
4162684   3557938       + 1.2 s
4237611   4067445       - carriage returns ("\r\n" --> "\n")
4408005   (not given)   + ~0.4 s
4703061   (not given)   - ~2.4 s
4703608   (not given)   + ~7.2 s
5558914   (not given)   - ~9.4 s

With this you can trace more or less the derivation paths.

You can say that basically there are 3 different versions:
  • 3094223: First subtitle. Very good quality.
  • 3557939: The same with some differences in transcription. Song lyrics added.
  • 3557938: Same as 3557939 with HI annotations.
And that's it. The rest is derived by merely adding or subtracting some fixed time. But there are some remarkable, funny things about this. For example, you can see that 4067445 is newer than 3557938 and yet it is the same file with a lot of OCR errors ("l" instead of "I") added. Don't you think it's funny?
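Whether one file is just another file shifted by a fixed time can be checked mechanically. The Python sketch below assumes SubRip timing lines and compares cue start times; the function names and the 50 ms tolerance are assumptions made for the example.

```python
import re
from statistics import median

# Matches the start time of a SubRip timing line, e.g. "00:01:02,500 -->".
TIMING = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->")

def start_times(srt_text):
    """Return the start time of every cue, in milliseconds."""
    return [
        ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)
        for h, m, s, ms in TIMING.findall(srt_text)
    ]

def fixed_offset(srt_a, srt_b, tolerance_ms=50):
    """Return B's shift relative to A in ms, or None if it is not constant."""
    a, b = start_times(srt_a), start_times(srt_b)
    if not a or len(a) != len(b):
        return None  # different cue counts: not a plain re-timing
    deltas = [tb - ta for ta, tb in zip(a, b)]
    shift = median(deltas)
    if all(abs(delta - shift) <= tolerance_ms for delta in deltas):
        return shift
    return None
```

A result such as 1200 would correspond to the "+ 1.2 s" kind of entry in the list; files with added or removed cues need a diff of the text instead.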

Even more fun: 4237611 is the same file as 4067445 but with "\n" (UNIX end of line) instead of "\r\n" (DOS end of line).
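Duplicates that differ only in line endings are the easiest case to catch at upload time. A minimal Python sketch, with hypothetical function names and placeholder file names, hashes each file after unifying its line endings:

```python
import hashlib

def fingerprint(raw: bytes) -> str:
    """Hash subtitle bytes so that only the line-ending style is ignored."""
    unified = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.sha256(unified).hexdigest()

def group_duplicates(files):
    """Take {name: bytes}; return lists of names that share a fingerprint."""
    groups = {}
    for name, raw in files.items():
        groups.setdefault(fingerprint(raw), []).append(name)
    return [names for names in groups.values() if len(names) > 1]

# Placeholder contents: a DOS-style file and the same file with UNIX endings.
uploads = {
    "dos.srt": b"1\r\n00:00:01,000 --> 00:00:02,000\r\nHi\r\n",
    "unix.srt": b"1\n00:00:01,000 --> 00:00:02,000\nHi\n",
    "other.srt": b"1\n00:00:01,000 --> 00:00:02,000\nBye\n",
}
print(group_duplicates(uploads))
```

This only finds byte-identical text; encoding differences or shifted timings would need the normalisation and offset checks first.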

My conclusion is that you have 80% of your storage space wasted. But even worse, when you have to choose one subtitle you find that many of them are crap (I'm talking about OCR errors). I think I'm justified in using this word.

I'm not saying that you should delete owned subtitles without the uploader's permission. But many of these are anonymous.

I'd like to know why people keep uploading versions which add nothing but confusion. Moreover, sometimes they are clearly worse than the existing subtitles.

hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworthy are the subs on opensubtitles.org?

Sun Jun 14, 2015 8:11 pm

I guess the key is just to stick to "owned" subtitles and forget about anonymous ones.

SmallBrother
Site Admin
Posts: 3726
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: How trustworthy are the subs on opensubtitles.org?

Tue Jun 16, 2015 12:07 pm

See also what I wrote here, which, by the way, belongs more here than there.

But anyway - you surely have a point. But what is the solution? How long did it take you to analyze your example?
I can only repeat my conclusion: we need the help of users. With rating, comments, reports.

There are two million registered users and we have 300,000 titles.
The solution would be for every single user to take one movie in one language and analyze it like you did. But that's mathematics, not reality.

hector wrote:
I guess the key is just to stick to "owned" subtitles and forget about anonymous ones.

Not necessarily, but yes, in general 'owned' subs are a bit better or a lot better.

hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworthy are the subs on opensubtitles.org?

Tue Jun 16, 2015 1:15 pm

SmallBrother wrote:
How long did it take you to analyze your example?

About half an hour. I used diff, a very simple but very useful program. A good deal of the time went just to normalising the files to UTF-8.
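The post used the diff program; a comparable check can be sketched with Python's standard difflib module. The helper names are hypothetical, and the sketch compares only the dialogue lines so that re-timed copies still count as the same text.

```python
import difflib

def text_lines(srt_text):
    """Keep only dialogue lines: drop counters, timing lines and blanks."""
    kept = []
    for line in srt_text.replace("\r\n", "\n").split("\n"):
        stripped = line.strip()
        # Caveat: a dialogue line made only of digits is dropped as well.
        if stripped and not stripped.isdigit() and "-->" not in stripped:
            kept.append(stripped)
    return kept

def similarity(srt_a, srt_b):
    """Return a 0.0 to 1.0 ratio of matching dialogue lines."""
    matcher = difflib.SequenceMatcher(
        None, text_lines(srt_a), text_lines(srt_b), autojunk=False
    )
    return matcher.ratio()
```

A ratio close to 1.0 suggests that one file was derived from the other; deciding which one is the original still needs the upload dates or a human.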

I understand you can't do this manually for every title present. But isn't there some kind of tool to do it automatically? At least for very basic things like OCR errors or line length. I'm not talking about grammatical analysis.

SmallBrother
Site Admin
Posts: 3726
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: How trustworthy are the subs on opensubtitles.org?

Tue Jun 16, 2015 3:05 pm

hector wrote:
But isn't there some kind of tool to do it automatically? At least for very basic things like OCR errors or line length. I'm not talking about grammatical analysis.

Any subtitle software will instantly show lines that are too long. OCR error detection is a bit more sophisticated, and often it would still need a human to be sure.

But even then... when do you reject a subtitle with OCR errors? 1 OCR error? 10+ errors? 50? Same for line length: where to draw the line? And why go 'extreme' on OCR errors and line length, but leave, for example, spelling mistakes or subs with lines of 40 CPS? I mean, a lowercase L versus a capital i is invisible with some fonts (don't get me wrong, it is still an error). But spelling or grammar mistakes are much more disturbing, and lines with a CPS of 40 are unreadable.
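For reference, CPS (characters per second) is the number of characters in a cue divided by the time it stays on screen. A minimal Python sketch for SubRip files follows; it counts every character including spaces and punctuation, which is an assumption, since tools differ on what they count.

```python
import re

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)

def to_seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def cps_per_cue(srt_text):
    """Return characters per second for each cue, in file order."""
    rates = []
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        match = TIMING.search(lines[1]) if len(lines) >= 3 else None
        if not match:
            continue  # not a well-formed cue
        fields = match.groups()
        duration = to_seconds(*fields[4:]) - to_seconds(*fields[:4])
        chars = sum(len(line) for line in lines[2:])
        if duration > 0:
            rates.append(chars / duration)
    return rates
```

Where to put the threshold remains the judgment call discussed above; the code only produces the numbers.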

Another thing to consider is the fact that we are not only dealing with subs (a bunch of digital data), but also with users (people like you and me). What to do if someone uploads a subtitle and it has whatever kind and number of mistakes? Maybe he/she has spent 20 hours on making the subs, with all passion and love and wanting to share it. And imagine it's the only one available, and many downloaders are happy with it, even though the quality is so-so. Maybe it can be a source for relatively easy improvements.

My idea is that it would take a human to consider all aspects together and then decide: delete or not delete. What I can do as an admin is have a look at the preview; this will give an indication of whether the subs are at least bearable or kind of useless.

But I understand your point. Quality is an issue and of course everybody would love to have only quality subs for every existing movie.
I don't have an answer...