plison
Posts: 2
Joined: Tue Oct 06, 2015 4:35 pm

Table dump with full list of attributes

Tue Oct 06, 2015 4:54 pm

Hi all,

I'm doing research on machine translation based on the OPUS parallel corpus of subtitles (see http://opus.lingfil.uu.se), which is itself extracted from the subtitle data from OpenSubtitles.org. I am looking at ways to score the quality of the subtitles in my corpus. To this end, I would like to use some of the attributes in the database, especially the attributes SubRating, UserRank and SubDownloadsCnt, as these may be correlated with the subtitle quality.

Unfortunately, these attributes are not included in the export table dumps currently available at http://dl.opensubtitles.org/addons/export/. It would be great to get a table with the full list of attributes (a single dump is sufficient). Thanks!

SmallBrother
Site Admin
Posts: 3726
Joined: Sun Mar 04, 2012 12:59 pm
Location: Somewhere on this globe

Re: Table dump with full list of attributes

Tue Oct 06, 2015 6:58 pm

Without going into the technical stuff, I just want to emphasize that, of the things you mentioned, only "subtitle rating" and the user rank "Sub Translator" (and to some extent also "Trusted") would correlate directly with the quality of subtitles.

Subtitle rating:
This should be a measure of the subtitle's quality, but keep in mind that users sometimes give a high rating just to express their appreciation rather than to say something about the quality of the subs.

User ranks:
We have a few different user ranks. The bronze/silver/gold/platinum ranks ONLY refer to the number of uploads and have nothing to do with quality. Maybe even the contrary - think about it, how well can subs be made/checked/corrected if the number of uploads is high? Another user may upload only one per month, but it's a good one...

The only ranks that matter are "Trusted" and "Sub Translator". Lately these are given only to users with more or less consistently high-quality uploads. But in the past, "Trusted" was given more easily, so this one should not be weighted too heavily either. "Sub Translator" is relatively new, and should(!) only be given to users with high-quality uploads.

Number of downloads:
It says nothing, or even the opposite of what you would expect. The highest download counts often just belong to the oldest subtitle. On top of that there is a snowball effect, because many users tend to choose the subs that have been downloaded most. Newer subtitles will have far fewer downloads, but the best chance of having been corrected.
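To make the points above concrete, here is a rough sketch of how they could be turned into a scoring heuristic. Everything in it is an assumption for illustration: the `Subtitle` class, the `quality_score` function, the weights, the rank labels as strings, and the 0-10 rating scale are all hypothetical and are not taken from the actual database dump. It uses only the rating and the two quality-related ranks, and deliberately ignores the download count.

```python
from dataclasses import dataclass

# Hypothetical rank labels and weights. "Sub Translator" is weighted higher
# than "Trusted" because "Trusted" was given out more easily in the past.
# Upload-count ranks (bronze/silver/gold/platinum) are intentionally absent.
QUALITY_RANKS = {"sub translator": 1.0, "trusted": 0.5}


@dataclass
class Subtitle:
    sub_rating: float  # SubRating, ASSUMED here to be on a 0-10 scale
    user_rank: str     # UserRank, ASSUMED here to be a plain text label
    downloads: int     # SubDownloadsCnt, kept for reference but not scored


def quality_score(sub: Subtitle,
                  rating_weight: float = 0.6,
                  rank_weight: float = 0.4) -> float:
    """Return a rough quality score between 0 and 1 (illustrative weights)."""
    # Clamp the rating to the assumed 0-10 range, then normalise to 0-1.
    rating_part = min(max(sub.sub_rating, 0.0), 10.0) / 10.0
    # Ranks that only reflect upload volume contribute nothing.
    rank_part = QUALITY_RANKS.get(sub.user_rank.strip().lower(), 0.0)
    # The download count is not used: it mostly reflects age and snowballing.
    return rating_weight * rating_part + rank_weight * rank_part
```

With these made-up weights, an unrated upload from a "gold" user scores 0 no matter how often it was downloaded, while a highly rated upload from a "Sub Translator" scores close to 1. Since ratings can be inflated by appreciation votes, the rating weight is probably worth tuning against a manually checked sample.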

See also viewtopic.php?f=1&t=14224 (About Trusted and Sub Translator ranks)
Something similar about the meaning of the number of downloads, user ranks etc. in relation to subtitle quality was discussed earlier somewhere else on this forum, but I cannot find it anymore... Try the search function ;-)

oss
Site Admin
Posts: 5890
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Re: Table dump with full list of attributes

Wed Oct 07, 2015 8:35 pm

Write me a PM...

plison
Posts: 2
Joined: Tue Oct 06, 2015 4:35 pm

Re: Table dump with full list of attributes

Sat Oct 10, 2015 8:07 pm

Thanks for your suggestions SmallBrother, that was very useful ;-)
