ai.opensubtitles.com
Posts: 18
Joined: Fri Dec 29, 2023 4:21 pm
Location: Planet Earth

Re: MACHINE TRANSLATION NOTICE!

Sat Dec 30, 2023 10:27 am

I have replied to your message. You can now use the service on https://ai.opensubtitles.com/ with your OS.org account. I agree with your arguments against the translation quality. It may not be much better than you expect, but it can still be useful. Transcription, however, is different. Depending on the audio quality and other factors, it can produce quite good results that can serve as a draft for manual fine-tuning. If you test it, keep in mind that English and other widely spoken languages will likely produce better results as audio sources, because there is more training data available for an AI to learn from. Among the transcription models, AWS and Assembly AI give the best quality.

SmallBrother
Site Admin
Posts: 3726
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: MACHINE TRANSLATION NOTICE!

Mon Jan 01, 2024 5:15 pm

So okay, I did some testing.

I had https://www.opensubtitles.org/en/subtit ... olocene-en translated from English into Dutch, using AWS.

Firstly, note that this subtitle is NOT verbatim, but a 'compressed' version that takes technical constraints into account, reading speed being the most important one. I chose it to avoid an 'unfair' bad result, which would be more likely with a verbatim subtitle: that has much more text within the same period of time, so it would already be too fast to begin with. I chose AWS because it is supposedly the better one, beating DeepL.

I expected technical flaws, mainly in reading speed, since Dutch is on average around 20-25% 'longer' than English. And indeed, the average CPS (characters per second, the usual unit for reading speed) went up from 11.9 to 15.4 CPS. That's about 30%, a bit more than expected, but okay. The maximum reading speed, however, went up from 22.2 to 84.4 CPS. That is extreme, very strange and not expected at all.
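For reference, CPS is the number of characters in a cue divided by how long that cue is on screen. Below is a minimal Python sketch of that calculation, assuming every character except the line break counts (subtitle tools differ on whether spaces and line breaks are counted, so the numbers may not match exactly what was measured here). The helper names parse_timecode, cps and pct_increase are hypothetical; the sample cues are the English #101 and Dutch #100 quoted below.

```python
def parse_timecode(tc: str) -> float:
    """Convert an SRT timecode such as '00:10:05,689' to seconds."""
    hms, ms = tc.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(ms) / 1000


def cps(start: str, end: str, lines: list[str]) -> float:
    """Characters per second for one cue.

    Assumption: every character counts (spaces and punctuation included),
    but the line break between the two subtitle lines does not.
    """
    duration = parse_timecode(end) - parse_timecode(start)
    return sum(len(line) for line in lines) / duration


def pct_increase(before: float, after: float) -> float:
    """Relative increase in percent, e.g. average CPS before/after translation."""
    return (after - before) / before * 100


# English cue #101, and the Dutch text that ended up in cue #100's time slot.
english = cps("00:10:05,689", "00:10:09,316",
              ["Let's go, you ready?", "I'll take it easy on you."])
dutch_shifted = cps("00:10:03,028", "00:10:04,311",
                    ["Laten we gaan, ben je er klaar voor?", "Ik zal het rustig aan doen."])

print(f"{english:.1f} CPS -> {dutch_shifted:.1f} CPS")  # 12.4 CPS -> 49.1 CPS
print(f"average: +{pct_increase(11.9, 15.4):.1f}%")      # average: +29.4%
```

Under these assumptions, the Dutch text squeezed into #100's 1.3-second slot runs at about 49 CPS, versus about 12 CPS for the English original. This shows how a one-cue shift can push reading speed far past normal limits, although it is not necessarily the cue behind the 84.4 CPS peak.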

So I looked into what happened.

For some reason, the translation engine merged some lines together, the first time at subtitle #99. After that, all translated text moved up one sequence number. For example, the translation of English #101 ended up in Dutch #100:

101
00:10:05,689 --> 00:10:09,316
Let's go, you ready?
I'll take it easy on you.

became:

100
00:10:03,028 --> 00:10:04,311
Laten we gaan, ben je er klaar voor?
Ik zal het rustig aan doen.

So from here on the subs may be well translated, but they are more than two seconds out of sync. The same thing happens eight more times further on, so things get progressively worse. Sorry to say, but this sync issue simply makes the subs worthless and unusable.
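The post doesn't say how the service puts translated text back into the subtitle. Purely as an illustration, the Python sketch below assumes the translated texts are paired with the original timecodes by position. Once two cues come back merged, every later text lands one cue early, which reproduces the #101 to #100 shift quoted above. The Cue class and both functions are hypothetical, and the timing and text of #99 and #100 are placeholders because they are not quoted here.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: str
    end: str
    text: str


def reassemble_by_position(source: list[Cue], translated: list[str]) -> list[Cue]:
    """Put translated texts back onto the source timecodes purely by position.

    If the translator merged two cues, every later text lands one slot early,
    and zip() silently drops the last cue.
    """
    return [Cue(c.index, c.start, c.end, text) for c, text in zip(source, translated)]


def reassemble_checked(source: list[Cue], translated: list[str]) -> list[Cue]:
    """Same pairing, but refuse to build a subtitle when the cue counts differ."""
    if len(translated) != len(source):
        raise ValueError(f"expected {len(source)} translated cues, got {len(translated)}")
    return reassemble_by_position(source, translated)


source = [
    Cue(99, "00:10:00,000", "00:10:02,500", "<English #99>"),    # hypothetical timing/text
    Cue(100, "00:10:03,028", "00:10:04,311", "<English #100>"),  # text is a placeholder
    Cue(101, "00:10:05,689", "00:10:09,316", "Let's go, you ready?\nI'll take it easy on you."),
]
# #99 and #100 came back merged, so there are only two texts for three cues.
translated = [
    "<Dutch #99> <Dutch #100>",
    "Laten we gaan, ben je er klaar voor?\nIk zal het rustig aan doen.",
]

shifted = reassemble_by_position(source, translated)
print(shifted[1].index, shifted[1].start, shifted[1].text.splitlines()[0])
# 100 00:10:03,028 Laten we gaan, ben je er klaar voor?

try:
    reassemble_checked(source, translated)
except ValueError as err:
    print(err)  # expected 3 translated cues, got 2
```

A count check like this only tells you that a merge happened, not where. To locate it, you would need to translate cue by cue or carry a per-cue ID through the translation step.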

Still, I had a look at the actual translation. To be honest, I expected better, but here too the result is disappointing. A fair amount is quite okay, but many sentences seem to be translated word by word, producing twisted sentences and, for example, a plural verb form with a singular subject. There are also some stray full stops and odd capitalization. And of course there are very literal translations of expressions like "I'd tap that" (I would sleep with her) or "you are a tool" (something like "you are a weirdo", nothing to do with hammers and screwdrivers), but that's normal for machine translation.

Anyway, I did the same using DeepL.

This is much better, but a similar thing still happens. At #810 a line gets merged, so from #811 onwards the subs are out of sync.
Translation-wise it's also better, but still obviously machine-translated.

Since this subtitle was chosen as the source most likely to give good results, I didn't bother testing subs that I expect would give poor results.

Sorry, but I think your service needs some fine-tuning.

Elliot92
Posts: 1
Joined: Sun Feb 04, 2024 4:24 pm

Re: MACHINE TRANSLATION NOTICE!

Sun Feb 04, 2024 4:33 pm

Exciting news! Will the API offer a way to flag problematic subtitles, similar to marking BAD subtitles? Early machine-translated uploads often overshadow quality releases, which is frustrating. Limiting uploads to two subtitles per language per release, and possibly collaborating with sources like addic7ed, could bring positive change.

Solstyx
Posts: 4
Joined: Thu Mar 21, 2024 9:13 pm

Re: MACHINE TRANSLATION NOTICE!

Fri Mar 22, 2024 11:22 pm

Hi, I'm new here and I've spent a week working on my first subtitle. I initially had it translated by GPT-4, then meticulously improved each line in Subtitle Edit. I even watched the episode three times to ensure the subtitles felt natural and made sense within the context of the story, which is fundamental to subtitling.

However, it got removed because it was machine-translated. Out of 700 lines, only about 10% were not good, yet the whole file, including the 630 good lines, had to be removed. It took a lot of effort, but because it was machine-translated, it had to go. This approach is outdated: the technology is advancing rapidly, and it's almost self-defeating not to leverage it.

Therefore, I advocate for a minimum acceptable percentage of accuracy, regardless of how the file was created. The remaining percentage could be improved by the rest of the community. This way, we collaborate to achieve the best subtitles. Currently, the barrier is set too high, discouraging newcomers from even attempting, resulting in no subtitles at all. After all, 60% accuracy is always better than none, right?

Yes, a Google Translate file will never reach 60% and is a pitfall. However, with contemporary AI models, it's a different story. Technology progresses rapidly, and the rules surrounding this issue are outdated.

Solstyx
Posts: 4
Joined: Thu Mar 21, 2024 9:13 pm

Re: MACHINE TRANSLATION NOTICE!

Fri Mar 22, 2024 11:29 pm

As a small addition: unfortunately, the AI service from OS is still below par. I tried it as well and wasted 5 euros on it. It wasn't even 60% accurate; above all, it was out of sync and poorly translated. GPT-4, however, was significantly better. Having cross-checked everything, I'd say it would easily have met the 60% threshold.

I still find it regrettable that it was removed without any consideration, in such a blunt manner. Reading that it was deemed "nonsense" after a week of hard work is disheartening and, in my opinion, needs to be addressed. Otherwise, I believe this community will wither away.

OhItsStefan
Posts: 6
Joined: Sat Jan 06, 2024 7:36 pm

Re: MACHINE TRANSLATION NOTICE!

Sun Mar 24, 2024 6:07 pm

I get your frustration and initially fell into the same pitfalls that come with polishing up machine translations.

The truth, though, is that while AI is advanced enough to get a somewhat decent result, it's just not equipped to deal with the intricacies of subtitling, the more technical side. This results in subtitles that are not good enough for anything other than personal use.

While I do agree that, with the rise of more advanced AI models, the guidelines should be looked at again, until that time it's best to have a strict policy to guard the quality of the subtitles that are uploaded. I think in the long run that will be the most beneficial for both the content that's hosted as well as the community that puts hours into making these subtitles.
