tacman1123
Posts: 3
Joined: Sat Sep 04, 2021 2:34 pm

are subtitle dump files still available?

Sat Sep 04, 2021 2:43 pm

The "last modified" dates on https://dl.opensubtitles.org/addons/export/ are from 2019, and I'm getting 404 errors on https://dl.opensubtitles.org/addons/exp ... nth.txt.gz now.

Are these dumps still valid and current? Is there an alternative besides the API? A raw database dump?

Thanks,

Tac

someperson_12345
Posts: 2
Joined: Mon Sep 06, 2021 1:41 pm

Re: are subtitle dump files still available?

Mon Sep 06, 2021 1:50 pm

The last modified dates on the first link now show "2021-09-06 10:16:31", so it looks like an admin has now updated this. I am downloading now.

If the database is kept up to date this is great, and I deeply appreciate this. The "Open" in OpenSubtitles is still very important and I have a lot of project ideas that I can build upon the raw user-contributed data of this site.

someperson_12345
Posts: 2
Joined: Mon Sep 06, 2021 1:41 pm

Re: are subtitle dump files still available?

Mon Sep 06, 2021 2:24 pm

OK I've downloaded the "subtitles_all.txt.gz" file. It was 278MB gzip compressed and 1.2GB uncompressed. It simply contains the metadata:

IDSubtitle	MovieName	MovieYear	LanguageName	ISO639	SubAddDate	ImdbID	SubFormat	SubSumCD	MovieReleaseName	MovieFPS	SeriesSeason	SeriesEpisode	SeriesIMDBParent	MovieKind	URL
1	Alien3	1992	English	en	2004-10-31 23:54:23	103644	sub	2	Alien.3	11.000	0	0	0	movie	http://www.opensubtitles.org/subtitles/1/alien3-en
[...]
8797040	The Flintstones	1994	Ukrainian	uk	2021-09-06 10:07:36	109813	srt	1	The Flintstones (1994)	0.000	0	0	0	movie	http://www.opensubtitles.org/subtitles/8797040/the-flintstones-uk
The last entry (The Flintstones, shown above) was posted 4 hours ago, so it definitely seems up to date, but there are only 5,372,814 lines (rather than the 5,920,874 that OpenSubtitles.Org claims). I wonder why this is.
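
For anyone who wants to reproduce the line count, here is a minimal sketch of reading the export in Python. It assumes the file is tab-separated UTF-8 text with the header row shown above; the function name `summarize_dump` is made up for illustration, and only the column names come from the export itself.

```python
import csv
import gzip
import io
from collections import Counter


def summarize_dump(fileobj):
    """Count data rows, and rows per language code, in a gzipped export.

    `fileobj` is a binary file-like object (or a path) for the .gz file.
    Tab separation and UTF-8 encoding are assumptions, not confirmed facts.
    """
    per_language = Counter()
    total = 0
    with gzip.open(fileobj, mode="rt", encoding="utf-8",
                   errors="replace", newline="") as text:
        # QUOTE_NONE: treat quote characters in release names as plain text.
        reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            total += 1
            per_language[row["ISO639"]] += 1
    return total, per_language


# Tiny in-memory sample built from the two rows quoted above.
HEADER = ["IDSubtitle", "MovieName", "MovieYear", "LanguageName", "ISO639",
          "SubAddDate", "ImdbID", "SubFormat", "SubSumCD", "MovieReleaseName",
          "MovieFPS", "SeriesSeason", "SeriesEpisode", "SeriesIMDBParent",
          "MovieKind", "URL"]
ROWS = [
    ["1", "Alien3", "1992", "English", "en", "2004-10-31 23:54:23", "103644",
     "sub", "2", "Alien.3", "11.000", "0", "0", "0", "movie",
     "http://www.opensubtitles.org/subtitles/1/alien3-en"],
    ["8797040", "The Flintstones", "1994", "Ukrainian", "uk",
     "2021-09-06 10:07:36", "109813", "srt", "1", "The Flintstones (1994)",
     "0.000", "0", "0", "0", "movie",
     "http://www.opensubtitles.org/subtitles/8797040/the-flintstones-uk"],
]
sample_text = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"
sample_gz = gzip.compress(sample_text.encode("utf-8"))

total, per_language = summarize_dump(io.BytesIO(sample_gz))
print(total, dict(per_language))
```

For the real file, something like `summarize_dump("subtitles_all.txt.gz")` should work, since `gzip.open` also accepts a path. Streaming row by row keeps memory use low even though the file is over 1GB uncompressed.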

Also, now that I have the 280MB metadata, do you know if there is a complete database dump of the actual SRT files available? If each file were 100 kilobytes, it would be about 600GB uncompressed. I have a gigabit internet connection, so it would only take me about 1.3 hours to download. On a cheap VPS it's only a few dollars to transfer that much data, but on Amazon S3 it's like $15. Maybe torrents are a better way to release it. But I just checked and there isn't a torrent release of the subtitles :/
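
As a rough check of the size and transfer-time estimate, here is a back-of-the-envelope sketch. The 100 kB average file size and a fully saturated gigabit link are assumptions; the subtitle count is the site-wide total quoted above.

```python
SUBTITLE_COUNT = 5_920_874        # site-wide total quoted above
AVG_FILE_BYTES = 100_000          # assumed average of 100 kB per subtitle file
LINK_BITS_PER_SECOND = 1e9        # gigabit link, assumed fully saturated


def estimate(count, avg_bytes, bits_per_second):
    """Return (total size in bytes, transfer time in seconds)."""
    total_bytes = count * avg_bytes
    seconds = total_bytes * 8 / bits_per_second
    return total_bytes, seconds


total_bytes, seconds = estimate(SUBTITLE_COUNT, AVG_FILE_BYTES, LINK_BITS_PER_SECOND)
print(f"{total_bytes / 1e9:.0f} GB, {seconds / 3600:.1f} hours")  # 592 GB, 1.3 hours
```

In practice the server side, not the client link, would likely be the bottleneck, so the real transfer would take longer than this lower bound.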

So the question remains: does anybody know how I can acquire the entire OpenSubtitles.Org 600GB SRT file database? I am happy to pay to cover the costs involved :)
Last edited by someperson_12345 on Mon Sep 06, 2021 2:26 pm, edited 1 time in total.

oss
Site Admin
Posts: 5878
Joined: Sat Feb 25, 2006 11:26 pm

Re: are subtitle dump files still available?

Tue Sep 07, 2021 9:52 am

Hi

Disabled subtitles are counted as well, but they are not exported (you cannot download them). It is faster to do select...count(*) instead of select * ... where Enabled = 1...
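
To illustrate why the two numbers differ, here is a small sketch using an in-memory SQLite database. The table name `subtitles` and the sample rows are hypothetical, and SQLite only stands in for whatever database the site actually runs; the `Enabled` column is the one mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the Enabled flag matters for this comparison.
conn.execute(
    "CREATE TABLE subtitles (IDSubtitle INTEGER PRIMARY KEY, Enabled INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO subtitles (IDSubtitle, Enabled) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 0), (4, 1), (5, 0)],
)

# Unfiltered count: includes disabled subtitles (like the site-wide total).
site_total = conn.execute("SELECT COUNT(*) FROM subtitles").fetchone()[0]
# Filtered count: only enabled subtitles (like the rows in the export).
exported = conn.execute(
    "SELECT COUNT(*) FROM subtitles WHERE Enabled = 1"
).fetchone()[0]

print(site_total, exported)  # 5 3
conn.close()
```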

For an export, write me an email.

Also, when you go to https://dl.opensubtitles.org/addons/export/ and the folder is empty, it means the export script is working in the background. Come back later and you should find the files there...

hector
Posts: 370
Joined: Wed Jan 01, 2014 12:27 pm
Location: Spain

Re: are subtitle dump files still available?

Sat Sep 11, 2021 9:50 am

May I give my opinion on this subject?

I can't figure out why anyone would want to download the whole database. First of all, there is the problem of synchronisation. Once you've finished downloading, it will be outdated the next minute because you don't get the new uploads. Second, you are talking about money. Don't talk about money, talk about resources. Currently the OS servers are sometimes overloaded. Now imagine there are 1000 people like you, downloading the whole database. Let alone 10,000 people.

I highly appreciate the "open" in opensubtitles too. But "open" does not necessarily mean "free".

I think downloading 600 GB is a silly thing to do, even if it is cheap in terms of money.

If someone wants to mirror the whole database, it is oss' decision to allow or forbid it. But mirrors are usually kept in sync and are used to distribute load.

drzraf
Posts: 1
Joined: Sat May 06, 2023 2:27 am

Re: are subtitle dump files still available?

Sat May 06, 2023 2:37 am

Dears,

Across the decades, we have all seen dozens (if not hundreds) of projects which, like opensubtitles, carefully collected a whole lot of human history (be it metadata about books, music, discs, covers, games, science, ...). Most of them (some even older than opensubtitles.org) ended up in a big black hole from one day to the next, without any way to resurrect the content.
The reason could have been legal, personal, economic, or technical. A domain name? An accident? A loss of motivation or unexpected pressure.

We definitely don't want opensubtitles to ever end up that way, and the only way to avoid this is (like stackoverflow, freedb and many others do nowadays) to publicly host a dump file (and get it downloaded by archive.org).

@admin: Do it once every month, every 6 months, or even every year, but please do it for the sake of data and heritage preservation.

tacman1123
Posts: 3
Joined: Sat Sep 04, 2021 2:34 pm

Re: are subtitle dump files still available?

Fri Oct 13, 2023 1:51 pm


hector wrote: I can't figure why anyone would want to download the whole database.
Like @drzraf said, for archival reasons. But also for data analysis, even if it's a snapshot (though there are solutions for that, of course). For example, if you're doing research on how language evolves, subtitles are a great way to get dialog. In real life, I'm hearing more active gender-neutral language (using "they" and "their" instead of "he" and "his", or in Spanish, using "todes, amiges" instead of "amigos" or "amigos y amigas").

One might also want to research how race is discussed -- "negro" used to be an acceptable term, then African-American, then "black" and now "Black" (capitalized when written).

These are just examples, but you asked why anyone would want it, and my answer is that subtitles represent dialog and are a rich source of data.
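
As a rough illustration of that kind of analysis, here is a minimal sketch that pulls the dialog lines out of SRT text and counts a few pronouns. The helper names and the sample cues are made up for illustration; a real study would also need the release year from the metadata and much more careful tokenization.

```python
import re
from collections import Counter

# SRT timing line, e.g. "00:00:01,000 --> 00:00:03,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")
TAG = re.compile(r"<[^>]+>")     # simple formatting tags such as <i>...</i>
WORD = re.compile(r"[a-z']+")    # crude English-only tokenizer


def dialog_lines(srt_text):
    """Yield dialog lines, skipping blank lines, cue numbers and timing lines."""
    for line in srt_text.splitlines():
        line = line.strip()
        # Caveat: a dialog line made only of digits is skipped as a cue number.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        yield TAG.sub("", line)


def count_terms(srt_text, terms):
    """Count case-insensitive whole-word occurrences of `terms` in the dialog."""
    wanted = {term.lower() for term in terms}
    counts = Counter()
    for line in dialog_lines(srt_text):
        for word in WORD.findall(line.lower()):
            if word in wanted:
                counts[word] += 1
    return counts


SAMPLE = """1
00:00:01,000 --> 00:00:03,000
Someone left <i>their</i> coat here.

2
00:00:03,500 --> 00:00:05,000
They said he would call.
"""

counts = count_terms(SAMPLE, ["they", "their", "he", "his"])
print(dict(counts))
```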

oss
Site Admin
Posts: 5878
Joined: Sat Feb 25, 2006 11:26 pm

Re: are subtitle dump files still available?

Sat Oct 14, 2023 3:32 am

Just write me a PM, explain why you want the data and which data you want, and I will make a decision. Posting it publicly, well, probably still not ready for that.

tacman1123
Posts: 3
Joined: Sat Sep 04, 2021 2:34 pm

Re: are subtitle dump files still available?

Sat Oct 14, 2023 1:56 pm

oss wrote: just write me PM, explain why you want the data, which data you want and I make decision. To post it publicly, well, probably still not ready for that.
Thanks. My application is just a hobby project, and I don't really need the entire dataset. For my testing purposes, I simply scrape the srt files I need. But if I ever get to the point where larger data would be helpful, I'll PM. Thanks for making this site available!
