Hi.
Here I am again. This is a critique, and it is meant to be constructive, although whether it is taken that way depends on the administrators' judgment.
I'm thankful for this site, really. But I think there are some awful things. One of them is what I once called the "jungle", i.e. the mess. This is only one example, but I think you can extrapolate it to almost any film.
The other day I watched "The Princess Bride" with subtitles. Very good film, and very good subtitles. But it occurred to me to try a simple exercise. I downloaded nearly all the English subtitles for this film (about 15). How can that be? 15 different versions for the same film? Let's see.
So I took them and did a little analysis. First of all, I "normalised" them, i.e. I converted them all to UTF-8 with "\r\n" line endings, so that I could compare them.
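A minimal sketch of such a normalisation in Python, for anyone who wants to repeat the exercise. The function name `normalise` and the list of candidate encodings are assumptions made for the example, not necessarily what was used here.

```python
def normalise(raw: bytes, encodings=("utf-8-sig", "cp1252")) -> bytes:
    # Try each candidate encoding in turn (the list is a guess, adjust as needed)
    text = None
    for enc in encodings:
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    if text is None:
        # Last resort: latin-1 maps every byte to a character, so it cannot fail
        text = raw.decode("latin-1")
    # Collapse every line-ending variant to LF first, then expand to CR LF
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", "\r\n").encode("utf-8")
```

After this step, two files that differ only in encoding or end-of-line style become byte-for-byte identical.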
And here is the result:
OS-ID: reference OS-ID, differences
-----------------------------------------------------------
3094223: base. No OCR errors. Good quality
3147563: 3094223 -0.2 s
3557938: 3557939 + HI annotations
3557939: 3094223 + song lyrics
4067445: 3557938 + lots of OCR errors
4097613: 3557938 + OCR errors
4162684: 3557938 + 1.2 s
4237611: 4067445 - carriage returns ("\r\n" --> "\n")
4408005: + ~0.4 s
4703061: - ~2.4 s
4703608: + ~7.2 s
5558914: - ~9.4 s
With this you can trace more or less the derivation paths.
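To illustrate how a fixed time shift between two versions can be detected, here is a rough Python sketch. It assumes the files are in SubRip (.srt) format and have the same number of cues. The function names and the tolerance value are made up for the example, and it is not necessarily how the table above was produced.

```python
import re
from statistics import median

# Captures the start time of an SRT cue line, e.g. "00:01:02,500 --> 00:01:04,000"
CUE_START = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->")

def start_times(srt_text: str) -> list[float]:
    # Start time of every cue, in seconds
    return [
        int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000
        for h, m, s, ms in CUE_START.findall(srt_text)
    ]

def constant_offset(a: str, b: str, tolerance: float = 0.05):
    # Shift in seconds from a to b if every cue moved by the same amount, else None
    ta, tb = start_times(a), start_times(b)
    if not ta or len(ta) != len(tb):
        return None
    diffs = [y - x for x, y in zip(ta, tb)]
    shift = median(diffs)
    if all(abs(d - shift) <= tolerance for d in diffs):
        return round(shift, 3)
    return None
```

A result close to 1.2 for a pair of files would correspond to an entry like "+ 1.2 s" in the table. A result of `None` means the two files are not a simple shifted copy of each other.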
You can say that basically there are 3 different versions:
- 3094223: First subtitle. Very good quality.
- 3557939: The same with some differences in transcription. Song lyrics added.
- 3557938: Same as 3557939 with HI annotations.
And that's it. The rest are derived by merely adding or subtracting a fixed time offset. But there are some remarkable, funny things here. For example, you can see that 4067445 is newer than 3557938 and yet it is the same file with a lot of OCR errors ("l" instead of "I") added. Don't you think that's funny?
Even more fun: 4237611 is the same file as 4067445 but with "\n" (UNIX end of line) instead of "\r\n" (DOS end of line).
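Both cases can be spotted automatically. The following Python sketch is only a heuristic under my own assumptions: the helper names are invented for the example, and the lone "l" test only makes sense for English text.

```python
import re

# A lowercase "l" standing alone as a word (including before an apostrophe,
# as in "l'm") is almost never valid English, so it hints at a misread "I".
LONE_L = re.compile(r"\bl\b")

def count_suspect_l(text: str) -> int:
    # Number of stand-alone lowercase "l" tokens in the text
    return len(LONE_L.findall(text))

def differ_only_in_line_endings(a: bytes, b: bytes) -> bool:
    # True when the files are different as stored but identical once CR LF becomes LF
    def to_lf(raw: bytes) -> bytes:
        return raw.replace(b"\r\n", b"\n")
    return a != b and to_lf(a) == to_lf(b)
```

A high count from `count_suspect_l` compared with the reference file suggests an OCR-damaged copy, and `differ_only_in_line_endings` returning `True` suggests a re-upload that changed nothing but the end-of-line style.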
My conclusion is that about 80% of your storage space is wasted. But even worse, when you have to choose a subtitle you find that many of them are crap (I'm talking about OCR errors). I think I'm entitled to use this word.
I'm not saying that you should delete subtitles that have an owner without the uploader's permission. But many of these are anonymous.
I'd like to know why people keep uploading versions that add nothing but confusion. Moreover, sometimes they are clearly worse than the existing subtitles.