tacman1123
Posts: 1
Joined: Sat Sep 04, 2021 2:34 pm

are subtitle dump files still available?

Sat Sep 04, 2021 2:43 pm

The "last modified" dates on https://dl.opensubtitles.org/addons/export/ are from 2019, and I'm getting 404 errors on https://dl.opensubtitles.org/addons/exp ... nth.txt.gz now.

Are these dumps still valid and current? Is there an alternative besides the API? A raw database dump?

Thanks,

Tac

someperson_12345
Posts: 1
Joined: Mon Sep 06, 2021 1:41 pm

Re: are subtitle dump files still available?

Mon Sep 06, 2021 1:50 pm

The last modified dates on the first link now show "2021-09-06 10:16:31", so it looks like an admin has now updated this. I am downloading now.

If the database is kept up to date this is great, and I deeply appreciate this. The "Open" in OpenSubtitles is still very important and I have a lot of project ideas that I can build upon the raw user-contributed data of this site.

someperson_12345
Posts: 1
Joined: Mon Sep 06, 2021 1:41 pm

Re: are subtitle dump files still available?

Mon Sep 06, 2021 2:24 pm

OK I've downloaded the "subtitles_all.txt.gz" file. It was 278MB gzip compressed and 1.2GB uncompressed. It simply contains the metadata:

IDSubtitle MovieName MovieYear LanguageName ISO639 SubAddDate ImdbID SubFormat SubSumCD MovieReleaseName MovieFPS SeriesSeason SeriesEpisode SeriesIMDBParent MovieKind URL
1 Alien3 1992 English en 2004-10-31 23:54:23 103644 sub 2 Alien.3 11.000 0 0 0 movie http://www.opensubtitles.org/subtitles/1/alien3-en
[...]
8797040 The Flintstones 1994 Ukrainian uk 2021-09-06 10:07:36 109813 srt 1 The Flintstones (1994) 0.000 0 0 0 movie http://www.opensubtitles.org/subtitles/8797040/the-flintstones-uk

The last entry (The Flintstones, shown above) was posted 4 hours ago, so it definitely seems up to date, but there are only 5,372,814 lines (rather than the 5,920,874 that OpenSubtitles.Org claims). I wonder why this is.
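
For anyone who wants to reproduce the line count without unpacking the 1.2GB file, here is a minimal Python sketch. It assumes the dump is tab-separated with a single header line (the thread does not confirm the delimiter), and `count_rows` is a hypothetical helper name, not part of any OpenSubtitles tool.

```python
import csv
import gzip


def count_rows(gz_path, delimiter="\t"):
    """Stream a gzipped delimited dump and return (header, data_row_count, last_row).

    The tab delimiter is an assumption; pass another one if the file differs.
    """
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace", newline="") as fh:
        # QUOTE_NONE: treat quote characters in movie names as plain text.
        reader = csv.reader(fh, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        header = next(reader)
        count = 0
        last_row = None
        for row in reader:
            count += 1
            last_row = row
    return header, count, last_row
```

Calling `count_rows("subtitles_all.txt.gz")` would return the column names, the number of data rows, and the most recent entry. Going by the figures quoted above, the gap to the site's total is 5,920,874 - 5,372,814 = 548,060 entries.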

Also, now that I have the 280MB metadata file, do you know if there is a complete database dump of the actual SRT files available? If each file were 100 kilobytes, the whole set would be about 600GB uncompressed. I have a gigabit internet connection, so at full line rate it would only take me about 1.3 hours to download. On a cheap VPS it costs only a few dollars to transfer that much data, but on Amazon S3 it's more like $15. Maybe torrents are a better way to release it. But I just checked and there isn't a torrent release of the subtitles :/
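
As a rough back-of-the-envelope check of these figures, the sketch below multiplies the site's claimed subtitle count by the assumed 100 kB average file size and divides by an ideal 1 Gbit/s link. The average size and the lossless, fully saturated link are assumptions, so treat the result as a lower bound on download time (hours rather than days), not a measurement.

```python
def estimate_transfer(num_files, avg_bytes_per_file, link_bits_per_second):
    """Return (total_bytes, seconds) for moving num_files over an ideal link."""
    total_bytes = num_files * avg_bytes_per_file
    seconds = total_bytes * 8 / link_bits_per_second
    return total_bytes, seconds


# 5,920,874 subtitles (site's claim), 100 kB each (assumed), 1 Gbit/s link.
total_bytes, seconds = estimate_transfer(5_920_874, 100_000, 1_000_000_000)
print(f"{total_bytes / 1e9:.0f} GB, {seconds / 3600:.1f} hours")
# prints "592 GB, 1.3 hours"
```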

So the question remains, anybody know how I can acquire the entire OpenSubtitles.Org 600GB SRT file database? I am happy to pay to cover the costs involved :)
Last edited by someperson_12345 on Mon Sep 06, 2021 2:26 pm, edited 1 time in total.

oss
Site Admin
Posts: 5230
Joined: Sat Feb 25, 2006 11:26 pm

Re: are subtitle dump files still available?

Tue Sep 07, 2021 9:52 am

Hi

The total shown on the site also counts disabled subtitles, which are not exported (you cannot download them). It is faster to do select...count(*) than select * ... where Enabled = 1...
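
To illustrate why the two numbers differ, here is a small self-contained sketch using Python's built-in `sqlite3` module. The `subtitles` table and its rows are made up for the example; only the `Enabled` column name comes from the post above, and the real OpenSubtitles schema may look quite different.

```python
import sqlite3

# Hypothetical in-memory table; not the real OpenSubtitles schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subtitles (IDSubtitle INTEGER PRIMARY KEY, Enabled INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO subtitles (IDSubtitle, Enabled) VALUES (?, ?)",
    [(1, 1), (2, 0), (3, 1), (4, 1), (5, 0)],
)

# Site-style total: every row, including disabled subtitles.
total = conn.execute("SELECT COUNT(*) FROM subtitles").fetchone()[0]
# Export-style total: only rows that can actually be downloaded.
exported = conn.execute(
    "SELECT COUNT(*) FROM subtitles WHERE Enabled = 1"
).fetchone()[0]

print(total, exported)  # prints "5 3"
conn.close()
```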

For the export, write me an email.

Also, if you go to https://dl.opensubtitles.org/addons/export/ and the folder is empty, it means the export script is running in the background; come back later and you should find the files there...

hector
Posts: 346
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: are subtitle dump files still available?

Sat Sep 11, 2021 9:50 am

May I give my opinion on this subject?

I can't figure out why anyone would want to download the whole database. First of all, there is the problem of synchronisation. Once you've finished downloading, your copy will be outdated the next minute because you don't get the new uploads. Second, you are talking about money. Don't talk about money, talk about resources. Currently the OS servers are sometimes overloaded. Now imagine there are 1000 people like you downloading the whole database. Let alone 10,000 people.

I highly appreciate the "open" in opensubtitles too. But "open" does not necessarily mean "free".

I think downloading 600 GB is a silly thing to do, even if it is cheap in terms of money.

If someone wants to mirror the whole database, it is oss' decision to allow or forbid it. But mirrors are usually kept in sync and are used to distribute load.
