Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworty are the subs on

Tue Jun 16, 2015 4:34 pm

Yes, like many things in life, there isn't a simple answer to this question.

I know there are many cases in which it's difficult to make a decision. Yes, if it is the only subtitle available, I think you can relax your rules. It's always better to have a subtitle with OCR errors than no subtitle at all. But there are also some clear cases in which deletion is the only reasonable action. For example, with this film, "The princess bride", several uploads add nothing to the existing subtitles. No, sorry: they add a lot of errors. It is not just that they aren't better; they were clearly worse than the subtitles that already existed when they were uploaded. This is a clear case in which the upload should have been rejected, or the file should be deleted now. Don't worry, I'll report such cases when I find them.

You know, the end-of-line convention is one of those funny, incomprehensible things. It is an example of how difficult it is to reach an agreement between different people. It is like a modern Tower of Babel, over something as simple as representing the end of a line. If you try to find some universal format you could say: "ASCII text". And then you discover that even there, 3 different groups of people came up with 3 different ways of representing end-of-line ('\n', '\r\n' and '\r'). There is a subtitle which is the same as another, older one, only with a different end-of-line convention. Why not another upload with '\r' (Macintosh) end-of-line? You can't deny this is absurd.
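To illustrate how such duplicates could be caught, here is a minimal Python sketch. It assumes that two uploads count as "the same" when their bytes are identical after end-of-line normalization; the function names and the sample subtitle are made up for illustration and are not part of the site's software.

```python
import hashlib


def normalize_eol(data: bytes) -> bytes:
    """Convert CRLF (Windows) and lone CR (classic Mac) line endings to LF."""
    # Replace CRLF first so that its CR is not turned into a second LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def content_fingerprint(data: bytes) -> str:
    """Hash the file contents after end-of-line normalization."""
    return hashlib.sha256(normalize_eol(data)).hexdigest()


# Hypothetical sample cue, stored with the three end-of-line conventions.
unix_sub = b"1\n00:00:01,000 --> 00:00:02,000\nAs you wish.\n"
dos_sub = unix_sub.replace(b"\n", b"\r\n")
mac_sub = unix_sub.replace(b"\n", b"\r")

fingerprints = {content_fingerprint(s) for s in (unix_sub, dos_sub, mac_sub)}
print(len(fingerprints))  # all three variants share one fingerprint
```

A check like this only catches byte-identical text; a re-upload with even one corrected word would get a different fingerprint.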

I still think there should be some rules. I don't know what. But a minimal set of rules that everyone can agree upon.
I still think you should store everything in UTF-8, either by forcing uploads to be UTF-8, by asking uploaders to send encoding information, or by detecting the encoding automatically. Even converting everything to UTF-8 would be nice. But I'm aware that when you have 3 million files it is not so easy. This is why I think it is better to be less permissive with some things, before the mess becomes too unmanageable.

Site Admin
Posts: 3737
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: How trustworty are the subs on

Tue Jun 16, 2015 7:01 pm

Don't worry, it IS already unmanageable - but we manage ;-)
For example, with this film, "The princess bride", several uploads add nothing to existing subtitles. No, sorry. They add a lot of errors. It is not that they aren't better, they are even clearly worse than the existing subtitles when they were uploaded. This is a clear case in which the upload should have been rejected. Or the file should be deleted now. Don't worry I'll report when I find such cases.
I agree, later subs which are worse than previous ones - that shouldn't happen. How CAN it happen? For various reasons, but for example: a subtitle is taken from somewhere, corrected, and then uploaded. Later on, the same subtitle is just uploaded again, without the corrections, under a different release name. The admin taking care of that section has the flu and doesn't notice the similarity, or he has 100 subs uploaded that same day to handle. With more versions arriving, the job gets exponentially more difficult.

Once again, yes please, report such cases.
For those who want to know how, see this topic: viewtopic.php?f=1&t=2595
There is a subtitle which is the same as another older one with a different end-of-line convention. Why not another upload with '\r' (Macintosh) end-of-line? You can't deny this is absurd.
I won't - you are right, it is absurd. But what I wrote before applies here as well.
I still think there should be some rules. I don't know what. But a minimal set of rules that everyone can agree upon.
That would be nice. But if people cannot agree on an end-of-line standard, how can we ever agree on a set of rules about subs?
A while ago we agreed to stop accepting machine translations. That's something.
I still think you should store everything in UTF-8.
There you go. UTF-8 has many advantages - but not only advantages. At least, opinions about that vary. Some players choke on UTF-8 and only accept UTF-16; they might even treat UTF-8 as if it were UTF-16. Subtitle Workshop doesn't support UTF-8. Some people don't even know about the concept of character encoding; they just write text. Or they just use their player with standard settings (CP1252) and wonder why the (UTF-8) subs have all these weird symbols. The list is endless.
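The "weird symbols" effect is easy to reproduce. The following Python sketch uses a made-up Spanish line (not taken from any real subtitle) to show what a player that assumes CP1252 would display for UTF-8 bytes, and that the damage is reversible in this case as long as the bytes themselves were not altered.

```python
# Hypothetical sample line with non-ASCII characters.
text = "¿Cómo estás?"

# The subtitle is stored as UTF-8 ...
utf8_bytes = text.encode("utf-8")

# ... but the player assumes CP1252 and decodes each byte on its own.
misread = utf8_bytes.decode("cp1252")
print(misread)  # Â¿CÃ³mo estÃ¡s?

# Reversing the wrong decoding step restores the original text here.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == text)  # True
```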

Welcome to the internet and to the world of computers, where standards are crucial but mostly absent, ignored or violated.
Nowadays a VPN is a must for everyone - it allows safe surfing and protects against spying governments and companies.
I advise AirVPN - now with a temporary birthday offer, from € 2,20 per month. Click the below banner for more info.


Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworty are the subs on

Tue Jun 16, 2015 7:35 pm

O.K. Let's admit there's more than UTF-8 in this world.

I think a lot of issues in this world of computers and the Internet have to do with interfaces. You want to store in CP1252, BIG5 or whatever? O.K., do it. But give me a well-defined interface. Now, every time I download a file, the first thing I have to do is detect the encoding and convert. It's not much trouble, because usually it is UTF-8, ISO8859-1 or occasionally CP1252. But I'd rather get rid of this task. I remember now that there's some effort in this direction: viewtopic.php?f=8&t=14992

Then you do the conversion at download time. I'd rather do it at upload but it's up to you.
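For readers wondering what the detect-and-convert step looks like in practice, here is a rough Python sketch. It is an assumption about one possible approach, not the site's actual code: the function name `to_utf8` and the fallback order are hypothetical. It tries strict UTF-8 first, then CP1252, then ISO-8859-1.

```python
def to_utf8(raw, fallbacks=("cp1252", "iso-8859-1")):
    """Return (utf8_bytes, guessed_encoding) for a subtitle file's raw bytes."""
    # A UTF-8 byte order mark is a strong hint; drop it from the output.
    if raw.startswith(b"\xef\xbb\xbf"):
        return raw[3:], "utf-8-sig"
    try:
        raw.decode("utf-8")  # strict: raises on any invalid byte sequence
        return raw, "utf-8"
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding).encode("utf-8"), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


converted, guessed = to_utf8(b"caf\xe9")  # 0xE9 alone is not valid UTF-8
print(guessed)  # cp1252
```

Note that ISO-8859-1 maps every byte to some character, so the last fallback never fails. That makes the result a guess rather than a detection, which is why files in other single-byte encodings would be silently mis-converted by this sketch.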

I hope I'm not giving you too much trouble with all this discussion. It's only that I have too much free time. :-)

Site Admin
Posts: 3737
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: How trustworty are the subs on

Tue Jun 16, 2015 9:09 pm

But give me a well-defined interface. Now, every time I download a file, the first thing for me to do is to detect the encoding and convert.
There is a point in, for example, adding an input field for "encoding", but once again, most people don't know about the concept. So what will happen...?

Detecting - yes, possible, but it's not fully accurate. Therefore, conversion at upload OR at download is tricky. You don't want to end up with a messed up file because of detection and conversion errors. That's worse than anything else.
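A small Python illustration of why detection cannot be fully accurate: the same bytes can decode without any error under two different single-byte code pages and give two different texts. The sample word is an arbitrary choice for the example.

```python
# The same five bytes, read under two Windows code pages.
raw = b"sz\xf5l\xf5"

as_cp1252 = raw.decode("cp1252")  # Western European reading
as_cp1250 = raw.decode("cp1250")  # Central European reading

print(as_cp1252)  # szõlõ
print(as_cp1250)  # szőlő (Hungarian for "grape")

# Neither decode raised an error, so validity alone cannot pick the right one;
# a detector has to fall back on language statistics or outside information.
print(as_cp1252 == as_cp1250)  # False
```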
I hope I'm not giving you too much trouble with all this discussion. It's only that I have too much free time. :-)
Yes, you do give some trouble ;-) but it's for a good cause and an interesting discussion.
"Too much free time" ??? - Clean up the subs, 3,331,429 more to go :)


Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworty are the subs on

Wed Jun 17, 2015 11:30 am

Clean up the subs, 3,331,429 more to go
What's this? Is it Svolochi(2006)? I can't. I don't speak nor write Chinese.

Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworty are the subs on

Wed Jun 17, 2015 3:56 pm

Oh, I get it. :) Sorry. I misunderstood that.

Site Admin
Posts: 3737
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: How trustworty are the subs on

Wed Jun 17, 2015 8:26 pm

I don't speak nor write Chinese.
Okay, 3,332,344 subtitles minus 45,991 chinese subs equals 3,286,343 subs to go.


Posts: 23
Joined: Tue Jan 08, 2013 7:07 pm
Location: Turtle Islands

Re: How trustworty are the subs on

Wed Jun 17, 2015 9:53 pm

I don't speak nor write Chinese.
Okay, 3,332,344 subtitles minus 45,991 chinese subs equals 3,286,343 subs to go.
SB you are going to scare hector away *grin*

Site Admin
Posts: 3737
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: How trustworty are the subs on

Wed Jun 17, 2015 10:28 pm

SB you are going to scare hector away *grin*
Just imagine 3,286,343 reports tomorrow...


Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworty are the subs on

Wed Jun 17, 2015 11:03 pm

You don't know what I'm capable of. Don't tempt me :-D

Site Admin
Posts: 3737
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: How trustworty are the subs on

Wed Jun 17, 2015 11:07 pm

I AM tempting you.


Posts: 23
Joined: Tue Jan 08, 2013 7:07 pm
Location: Turtle Islands

Re: How trustworty are the subs on

Thu Jun 18, 2015 1:11 am

Not so sure I want to wake up tomorrow. --> Let SB handle all those 3,286,343 reports made by hector. May the force be with you.

Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworty are the subs on

Thu Jun 18, 2015 5:10 pm

Well. Beginning at the left side (most significant digit) we have "3". And there I stopped. There are only 6 more digits to go, so in a week I'll have it done "green", I mean "grin".

Joking apart, perhaps I was overestimating the importance or relevance of OCR errors. Yes, there are some more relevant considerations, like linguistic issues. But my point is that I think OCR errors are easier to detect than false credits or a bad translation. It's a simple dictionary lookup. Or take the end-of-line issue. Perhaps it isn't very relevant either, because it is not very frequent (a file duplicated with a different EOL convention), but it is VERY EASY to detect automatically. So, my suggestion is to set up some inlet filter. There you could compare the upload against the existing subs and perhaps attach some automatic quality rating that could help decide whether to accept it or not.
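As a rough idea of what the "simple dictionary lookup" could look like, here is a Python sketch that rates a text by the share of words missing from a word list. The tiny `KNOWN_WORDS` set, the function name and the sample lines are all placeholders for illustration; a real filter would need a full dictionary per language and some way to handle names.

```python
import re

# Placeholder word list; a real filter would load a full dictionary.
KNOWN_WORDS = {"accept", "the", "things", "i", "cannot", "change"}


def unknown_word_ratio(text, known_words=KNOWN_WORDS):
    """Return the fraction of words in text that are not in known_words."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    unknown = [w for w in words if w.lower() not in known_words]
    return len(unknown) / len(words)


clean = "accept the things I cannot change"
with_ocr_error = "accept the things l cannot change"  # lowercase L instead of I

print(unknown_word_ratio(clean))           # 0.0
print(unknown_word_ratio(with_ocr_error))  # one unknown word out of six
```

Such a ratio could be one input to an automatic rating, compared between a new upload and the subs already present for the same film.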

I don't quite know the human implications of setting up such a filter. I don't know if people would get angry because their subtitle has been rejected. I simply don't know. I'm not an admin myself. But I think it would improve quality and user experience. And if you define your rules precisely, people would know them in advance.

Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: How trustworty are the subs on

Fri Jun 19, 2015 1:07 pm

So, aren't those OCR errors enough reason to mark a subtitle as bad?

My experience tells me that talking about policy is the best way to make enemies. But it must be done. I'm convinced life is all about policy.

I use a font that makes them visible. Besides, what if I want to do a text search in my subtitles? If I search for "accept the things I cannot change" I'm out of luck :-( Then I should rather search for "accept the things l cannot change". Perhaps we should change the English language to accommodate it to OCR texts :-D

Then I was thinking and noticed that I marked three of them because they have OCR errors but the other one, that I didn't mark as bad, does have some too.

I don't know. I suppose I'm a little picky about these things. But again, OCR errors are very easy to catch. It just took me 5 minutes to correct. Using regular expressions you can easily correct the "l" for "I" error. And then, finally, a little spell checking.
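For example, the "l" for "I" correction could be sketched in Python like this. It is only an approximation under the assumption that the text is English: it replaces a lowercase "l" that stands alone as a word, and the function name is made up for the example.

```python
import re

# A lowercase "l" with a word boundary on both sides, e.g. in "l cannot" or "l'll".
STANDALONE_L = re.compile(r"\bl\b")


def fix_standalone_l(text):
    """Replace a stand-alone lowercase 'l' with the pronoun 'I'."""
    return STANDALONE_L.sub("I", text)


print(fix_standalone_l("accept the things l cannot change"))
# accept the things I cannot change
print(fix_standalone_l("l'll tell all"))
# I'll tell all
```

A pattern this simple would also change a legitimate lone "l" (an abbreviation for litres, say), so the result still needs the spell check and a quick look by a human.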

I'd like to have them marked as bad, especially if there are some error-free ones. I think this is a key point: a subtitle is not "good" or "bad" in itself, but compared to others. Or, if they are not bad, at least I'd like to know that they contain OCR errors.

Site Admin
Posts: 3737
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: How trustworty are the subs on

Fri Jun 19, 2015 2:23 pm

So, aren't those OCR errors enough reason to mark a subtitle as bad?
I think the "bad" mark should be used similarly to an admin's decision to delete or not delete. Different admins have different criteria, but this is how I see it:
For OCR errors, it depends on the kind and number. Three times l instead of I: no problem. But with 300 instances of I=l, 1=l, t=l, nn=m and god knows what else, words split by a space (accep t the things I ca nnot chan ge), two words merged into one (accept thethings I cannotchange), corrections that are not very easy, and other (better) versions available, it becomes very annoying and the case is very clear. For anything in between, the decision bad/not bad (or delete/not delete) is a problem...

The same goes for linguistic errors, too high a CPS ratio, etc.
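For readers unfamiliar with the term: CPS is characters per second, i.e. how much text a viewer must read in the time a cue stays on screen. A minimal Python sketch, assuming CPS is simply the number of characters (line breaks excluded, spaces included) divided by the cue duration; tools differ on what exactly they count, and the function name and sample cue are made up.

```python
def cps(text, start_seconds, end_seconds):
    """Characters per second for one subtitle cue."""
    duration = end_seconds - start_seconds
    if duration <= 0:
        raise ValueError("cue must end after it starts")
    visible = text.replace("\n", "")  # line breaks are not read by the viewer
    return len(visible) / duration


# Hypothetical cue: 12 characters shown from 1.0 s to 3.0 s.
print(cps("As you wish.", 1.0, 3.0))  # 6.0
```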

So, I think a policy might be nice, but it is not so easy. Defining it is hard, and automating it even harder.

Meanwhile, yes, it would be nice if users (and/or uploaders) commented on subs - maybe even better than rating them, because if a subtitle is rated bad or given a "1", it's not clear WHY.

