Gautier
Posts: 147
Joined: Fri Jan 01, 2016 6:32 pm
Location: France / België / UK

How to make sense from these rubbish subtitles

Fri Feb 05, 2016 4:13 am

Hello, I just ripped a subtitle off a release. But look at the garbage I'm getting: https://copy.com/TQ8jTLDUIfFZ7yeb/bbc.srt?download=1

Two different extraction methods gave me the same result: each line is built up one word at a time, so every cue repeats the previous one plus one more word. Is there a tool that can make sense of this and collapse these cues into complete lines?

Code:

1
00:00:06,680 --> 00:00:06,920
Britain's

2
00:00:06,920 --> 00:00:07,160
Britain's railway

3
00:00:07,160 --> 00:00:07,520
Britain's railway -

4
00:00:07,520 --> 00:00:07,760
Britain's railway - the

5
00:00:07,760 --> 00:00:07,920
Britain's railway - the oldest

6
00:00:07,920 --> 00:00:08,120
oldest and

7
00:00:08,120 --> 00:00:08,320
oldest and one

8
00:00:08,320 --> 00:00:08,640
oldest and one of

9
00:00:08,640 --> 00:00:08,920
oldest and one of the

10
00:00:08,920 --> 00:00:09,120
oldest and one of the busiest

11
00:00:09,120 --> 00:00:09,600
oldest and one of the busiest in

12
00:00:09,600 --> 00:00:09,760
oldest and one of the busiest in the

Last edited by Gautier on Sat Feb 06, 2016 8:55 pm, edited 1 time in total.

MarcoTC
Subtitles Admin
Posts: 44
Joined: Sat Jun 13, 2015 5:15 am

Re: How to make sense from these rubbish subtitles

Sat Feb 06, 2016 7:07 am

Instead of trying to repair it, maybe you should find a better way to rip it.
What do you use now?

vankasteelj
Posts: 175
Joined: Sun Nov 15, 2015 1:09 am

Re: How to make sense from these rubbish subtitles

Sat Feb 06, 2016 4:28 pm

That would only take a few lines of script. If there is no tool to "collapse" those, I can put together a node.js script pretty easily (but I'd rather not if a tool for it already exists^^). PM me if needed; I'm not sure I'll see an answer here.
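vankasteelj offers to do this in node.js; the same idea can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not an existing tool: it assumes a well-formed SRT file (blocks of index, timing line and text, separated by blank lines), and the names `Cue`, `parse_srt`, `extends`, `collapse_growing_cues` and `format_srt` are hypothetical. Whenever a cue's text is the previous cue's text plus more whole words, the two are folded into one cue that keeps the earlier start time and the later end time.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:06,680 --> 00:00:06,920".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


@dataclass
class Cue:
    start: str  # "HH:MM:SS,mmm"
    end: str
    text: str


def parse_srt(data: str) -> list[Cue]:
    """Split SRT text into cues; the numeric index line is ignored."""
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING.search(line)
            if m:  # everything after the timing line is the cue text
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(m.group(1), m.group(2), text))
                break
    return cues


def extends(longer: str, shorter: str) -> bool:
    """True if `longer` is `shorter` followed by more whole words."""
    if not longer.startswith(shorter):
        return False
    rest = longer[len(shorter):]
    return rest == "" or rest[0].isspace()


def collapse_growing_cues(cues: list[Cue]) -> list[Cue]:
    """Fold each run of word-by-word cues into its final, longest cue."""
    out: list[Cue] = []
    for cue in cues:
        if out and extends(cue.text, out[-1].text):
            # Keep the start of the run, take the newer text and end time.
            out[-1] = Cue(out[-1].start, cue.end, cue.text)
        else:
            out.append(cue)
    return out


def format_srt(cues: list[Cue]) -> str:
    """Write cues back out as SRT, renumbered from 1."""
    return "\n\n".join(
        f"{n}\n{c.start} --> {c.end}\n{c.text}" for n, c in enumerate(cues, 1)
    ) + "\n"
```

On the sample from the first post this should leave two cues: "Britain's railway - the oldest" (00:00:06,680 to 00:00:07,920) and "oldest and one of the busiest in the" (00:00:07,920 to 00:00:09,760). The repeated word "oldest" survives because the second run does not literally extend the first; the sketch leaves such overlaps alone rather than guessing where the original line breaks were. For a whole file, something like `print(format_srt(collapse_growing_cues(parse_srt(open("bbc.srt", encoding="utf-8-sig").read()))))` should do.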

Gautier
Posts: 147
Joined: Fri Jan 01, 2016 6:32 pm
Location: France / België / UK

Re: How to make sense from these rubbish subtitles

Sat Feb 06, 2016 4:47 pm

Hi MarcoTC. The rip is not at fault; the source file itself seems to be like this, so the rip just extracts garbage that is already there. Still... I have no clue how they managed to produce a subtitle file like this one. Some piece of subtitling or encoding software must have screwed up quite badly.

No problem, vankasteelj, I already found another file, which I downloaded directly from the BBC. I just have to resync it now... But it would be nice to find software that can handle this kind of situation. Subtitles from French television suffer from similar duplicates as well. Example:

Code:

2
00:00:03,560 --> 00:00:03,720
"celle qui permet de faire le bonheur autour de toi,

3
00:00:03,720 --> 00:00:07,520
"celle qui permet de faire le bonheur autour de toi,

4
00:00:08,400 --> 00:00:08,480
"la source d'or que tu as cherchée aux 4 coins de la terre,

5
00:00:08,480 --> 00:00:12,280
"la source d'or que tu as cherchée aux 4 coins de la terre,
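Duplicates like these, where back-to-back cues carry exactly the same text, can be merged by keeping the first cue and extending its end time. A minimal Python sketch, assuming well-formed SRT input; `merge_same_text` and its `max_gap_ms` threshold (250 ms here) are my own hypothetical choices, not taken from any existing tool. The gap check is there so that a line legitimately repeated much later in the programme is not swallowed.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:03,560 --> 00:00:03,720".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


@dataclass
class Cue:
    start: str  # "HH:MM:SS,mmm"
    end: str
    text: str


def parse_srt(data: str) -> list[Cue]:
    """Split SRT text into cues (assumes blank lines between blocks)."""
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING.search(line)
            if m:  # everything after the timing line is the cue text
                cues.append(Cue(m.group(1), m.group(2), "\n".join(lines[i + 1:]).strip()))
                break
    return cues


def to_ms(ts: str) -> int:
    """Convert "HH:MM:SS,mmm" to milliseconds."""
    hms, ms = ts.split(",")
    h, m, s = (int(x) for x in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)


def merge_same_text(cues: list[Cue], max_gap_ms: int = 250) -> list[Cue]:
    """Merge back-to-back cues with identical text into one longer cue.

    max_gap_ms is a hypothetical threshold: identical cues further apart
    than this are kept separate.
    """
    out: list[Cue] = []
    for cue in cues:
        prev = out[-1] if out else None
        if prev and cue.text == prev.text and to_ms(cue.start) - to_ms(prev.end) <= max_gap_ms:
            # Keep the first start time and the later of the two end times.
            out[-1] = Cue(prev.start, max(prev.end, cue.end, key=to_ms), prev.text)
        else:
            out.append(cue)
    return out


def format_srt(cues: list[Cue]) -> str:
    """Write cues back out as SRT, renumbered from 1."""
    return "\n\n".join(f"{n}\n{c.start} --> {c.end}\n{c.text}" for n, c in enumerate(cues, 1)) + "\n"
```

On the four cues above, this should yield two cues, 00:00:03,560 --> 00:00:07,520 and 00:00:08,400 --> 00:00:12,280, each carrying its text only once.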

vankasteelj
Posts: 175
Joined: Sun Nov 15, 2015 1:09 am

Re: How to make sense from these rubbish subtitles

Sat Feb 06, 2016 11:17 pm

From a technical point of view, these are really easy problems to solve, in any language. I don't know C#, so I can't do it myself, but a plugin for https://github.com/SubtitleEdit/subtitleedit would be a great idea.

Gautier
Posts: 147
Joined: Fri Jan 01, 2016 6:32 pm
Location: France / België / UK

Re: How to make sense from these rubbish subtitles

Wed Feb 10, 2016 2:26 am

In the meantime, someone contacted me with the solution. Subtitle Edit has the options Tools > Merge lines with same text and Tools > Merge lines with same time codes. Works wonders!
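The "same time codes" case is just as simple in principle: consecutive cues that share identical start and end times are stacked into one cue with several text lines. A rough Python illustration, operating on already-parsed `(start, end, text)` tuples; the function name is hypothetical, and this is not how Subtitle Edit itself implements the feature.

```python
# A cue as (start, end, text), with times written as "HH:MM:SS,mmm".
Cue = tuple[str, str, str]


def merge_same_timecodes(cues: list[Cue]) -> list[Cue]:
    """Join consecutive cues that have exactly the same start and end times.

    Their texts are stacked, in order, as separate lines of a single cue.
    """
    merged: list[Cue] = []
    for start, end, text in cues:
        if merged and merged[-1][:2] == (start, end):
            prev_start, prev_end, prev_text = merged[-1]
            merged[-1] = (prev_start, prev_end, prev_text + "\n" + text)
        else:
            merged.append((start, end, text))
    return merged
```

Only adjacent cues are compared, on the assumption that the file is sorted by time, so identical time codes sit next to each other.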

vankasteelj
Posts: 175
Joined: Sun Nov 15, 2015 1:09 am

Re: How to make sense from these rubbish subtitles

Wed Feb 10, 2016 2:35 am

That's what I thought: it's so easy to code that it seems perfectly normal to me that it already exists^^
