oss
Site Admin
Posts: 5887
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

How to deal with duplicate subs

Fri Aug 22, 2008 4:28 am

According to http://blog.opensubtitles.org/opensubti ... k#comments we should continue here.

Guys, I am reading your messages, thanks for informing me. I think that in this situation you should not delete those subtitles, because it is a waste of time. The uploaded subtitles are, I think, Polish subtitles downloaded from Polish sites, and as you wrote, there are MANY. When you delete a subtitle and someone uploads the same subtitle again, it will be stored again (I think... I can change this). ALLplayer seems like a nice piece of software and I like the automatic uploading feature; for sure there will be more players with an automatic feature, it is a good idea (I had it maybe 2 years ago, and now dreams come true...). We have to think about how to deal with those subtitles.

1. Upload subs only by logged-in users. I think this is an interesting idea, but it is not a final solution. Imagine almost all anonymous users register, and we are in the same situation as now. Also I don't like registering at all (when it is not really necessary), but you know this.

2. Administrators can RELINK subtitles (something like versioning). It should work like this:
- You have a new upload (with or without moviehash) whose subtitle is almost the same as one we already have (no need to have it in the db again). On screen you see this subtitle and all other subtitles for this movie (paired by imdbid) in the same language. You select ONE subtitle (radio button) from this list; the hashes etc. will be relinked to the selected subtitle, and the old subtitle will be deleted and cannot be uploaded again. Should be quite easy. If you have more ideas about what to do, or any questions, please tell me, so that if I code this I can implement everything.
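A minimal sketch of the RELINK step described above, written in Python rather than the site's real code. The dictionaries `movie_links` and `subtitles` and the set `blocked_hashes` are hypothetical stand-ins for database tables, not the actual opensubtitles schema.

```python
def relink(movie_links, subtitles, blocked_hashes, old_id, master_id):
    """Point every moviehash of old_id at master_id, then retire old_id.

    movie_links:    dict moviehash -> subtitle id       (hypothetical table)
    subtitles:      dict subtitle id -> subtitle hash   (hypothetical table)
    blocked_hashes: set of subtitle hashes that may not be uploaded again
    """
    for moviehash, sub_id in movie_links.items():
        if sub_id == old_id:
            movie_links[moviehash] = master_id
    # Delete the old subtitle but remember its hash so it cannot come back.
    blocked_hashes.add(subtitles.pop(old_id))


def can_upload(sub_hash, blocked_hashes):
    """A relinked subtitle must be refused on re-upload."""
    return sub_hash not in blocked_hashes
```

The point of keeping the old subtitle hash is that a later upload of the same file can be recognised and refused without storing the file again.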

Thanks

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

Fri Aug 22, 2008 6:40 pm

OS: How difficult would it be to ignore subtitles that have less than 5% changes to an existing one? I'm assuming it's not a simple job (as it has to compare each uploaded subtitle against the existing ones and compare the diffs between them all) and it may hit the server a lot. I would automatically reject (or put in the "rejected" queue for admins) any subtitle that has less than 15% difference to existing ones and automatically delete any subtitle with less than 5% changes.
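The two thresholds above could be sketched like this in Python. This is only an illustration: it measures "change" with the standard library's `difflib.SequenceMatcher` on whole lines, which is one possible reading of "% changes", and the function name `classify_upload` is made up for the example.

```python
from difflib import SequenceMatcher


def classify_upload(new_text, existing_texts, drop_below=0.05, review_below=0.15):
    """Return 'drop', 'review' or 'accept' for a newly uploaded subtitle.

    The change ratio is 1 - similarity against the closest existing subtitle.
    Assumption: similarity is computed line by line, not character by character.
    """
    new_lines = new_text.splitlines()
    smallest_change = 1.0
    for old_text in existing_texts:
        similarity = SequenceMatcher(None, old_text.splitlines(), new_lines).ratio()
        smallest_change = min(smallest_change, 1.0 - similarity)
    if smallest_change < drop_below:
        return "drop"      # under 5% changed: treat as a duplicate
    if smallest_change < review_below:
        return "review"    # under 15% changed: queue for admins
    return "accept"
```

Every upload is compared against every existing subtitle for the movie, which is where the server load mentioned above would come from.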

I wouldn't allow hash-match uploads by anonymous users and I would seriously re-think anonymous uploads at all.

I would flag movies with more than 3 subtitles per language per moviehash for admins to review, as it's obvious there is crap there (for the same hash and subtitle the most there should be is two subtitles, one for normal and one for hearing-impaired).
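As a rough sketch, the flagging rule is a simple count per (moviehash, language) pair. The record layout below, dicts with `moviehash` and `language` keys, is assumed for the example and is not the real database layout.

```python
from collections import Counter


def groups_to_review(subtitles, limit=3):
    """Return (moviehash, language) pairs that have more than `limit` subtitles."""
    counts = Counter((s["moviehash"], s["language"]) for s in subtitles)
    return sorted(key for key, n in counts.items() if n > limit)
```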

There are several things that can be done, but they imply a change in philosophy, from massively gathering subtitles without regard to quality to putting quality before quantity.
http://eduo.info/
[url=http://eduo.info/soleol/]OpenSubtitles from your desktop: SolEol for Mac/Windows/Linux[/url]
[url=http://forums.plexapp.com/index.php?showtopic=325&st=0&p=2480&#entry2480]My current episode processing work flow[/url].

oss
Site Admin
Posts: 5887
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Mon Aug 25, 2008 8:34 am

Please don't delete subtitles right now. If you delete them, we lose the moviehash->subtitlehash links, and the same subtitles CAN be uploaded again; check http://www.opensubtitles.org/pl/search/ ... ovie-13478

The problem with limiting to XXX subtitles per title/language is that one movie can be released a COUPLE of times: first comes CAM, then SCREENER, then DVDSCR (R5), and DVD last... If I limit it to 10 uploads per title, that would be quite a problem; limiting to 3 subtitles per release/title/subtitle should be the way.

For the diff: maybe it is not a problem (except for server resources), but some people will correct subs and send them to the server; the automatic check will see there are less than 5% changes, and the proper (better) subtitles are lost.

I know it is not possible to filter subtitles, but I want to make something which is really good. First I have to implement this:

Subtitles are on the server, but they cannot be found in the normal search, only through moviehashes. I am working on that right now (versioning, like I wrote before).

ALLPlayer
Posts: 14
Joined: Thu Jul 10, 2008 6:14 pm

Mon Aug 25, 2008 5:00 pm

There are some solutions that we can work on together. First, I would not recommend requiring a login and password for uploading. This may have some legal implications, and anonymous uploading is the right way to populate the servers.

1. We can add a filter that checks how many subs are already there and does not upload if the number is higher than 10. This has limitations and is not our favorite.
2. This is a job on your side. You add a counter and check how many times people tried to upload subs for the same movie. If one sub is in the majority, that means it is the popular and proper one. The rest will be taken off the servers once a week or month.

PS. I have checked the site, where there are more than 700 subs for one movie. It looks strange, so we will download some of them and check what differences make them keep multiplying like that. Maybe there is some bug that makes things harder, or something else. We need to investigate.

PS2. Could we get some help from you to make a Portuguese GUI for ALLPlayer?

oss
Site Admin
Posts: 5887
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Tue Aug 26, 2008 5:26 am

From ALLPLAYER:

We have checked many subs for Hellboy2 and we discovered that they
contain many ads for the services the sub was taken from (napisy.info,
napisy24, hatak, napiprojekt), so each sub was different and got onto
the server. This is, I believe, very Polish, because competition
between sub sites is very strong. In other countries, admins of local
sites don't do this. I propose that your servers compare all the lines
except the first and last two. If they are the same, the sub can be
deleted.
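The proposed comparison might look roughly like this. It reads "first and last two" as two lines at each end, which is an assumption, and the helper names are invented for the sketch. It also assumes the ads replace or add the same number of lines in both files; a longer ad would shift the window and the comparison would fail.

```python
def core_lines(text, skip=2):
    """Non-empty lines of a subtitle with `skip` lines cut from each end."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) <= 2 * skip:
        return lines  # too short to trim safely
    return lines[skip:len(lines) - skip]


def same_apart_from_edges(a, b, skip=2):
    """True when two subtitles differ only in their first/last `skip` lines."""
    return core_lines(a, skip) == core_lines(b, skip)
```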

PS. Soon we will add good support for playing back and sending a sub
that covers 2 CDs. Once this is done, ALLPlayer won't see part one
and part two of the movie as separate ones. That means ALLPlayer
will send fewer subs than now. We have made a calculation: if you
handle the 2 CDs of Hellboy2 as I describe, we will get
50 subs instead of 700...

oss
Site Admin
Posts: 5887
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Tue Aug 26, 2008 5:45 am

Omerta: you are a really great admin, I know. The problem is the following: I am against deleting subtitles for TS, CAM or any _different_ release. It is simple: some people get the movie later (they don't have a good source, or a good internet connection), they search for subtitles, and those subtitles don't exist, because we already deleted them. I personally don't watch CAM or TS, but I watch DVDSCR, R5 and DVDSCR of course. Sometimes even I download a DVDSCR and watch it, even when there is a DVDRIP. So basically, I want to have a subtitle for every release out there (including ones reconverted for PSP or mobile phone). The connection between movie releases should be made using movietimems, which should be sent by programs, but not every program supports this; maybe it should be required information.

For moviehashes: you are basically an uploader, I think, but 90% of users (if not more) are downloaders. They have a movie, play it in some player or run subdownloader/whatever, and want subtitles. That is what the moviehash is for; without moviehash, opensubtitles cannot work.

AllPlayer: for anonymous uploading, I agree with you for now. We will see later; if there are too many wrong uploads we will have to implement this, but not now. For the 1st: this is not a systematic way. You will implement it (allplayer), but what about other players? I think this (filtering, etc.) should be server side, not client side. 2nd: I was thinking about that, maybe it would be OK.

For translations: I have some contacts for Portuguese translators, I can send them to you by mail.

For your 2nd post: comparing subtitles is quite a tricky thing, believe me. I know what you mean about added advertisements, but we have to reckon that there are more countries like Poland, and others do these things too. For Hellboy, even 50 subtitles is enough.

I also looked at Polish subtitles; Polish users really use more different formats than the rest of the world, so we have another problem here: if someone downloads a tmp file and converts it to sub/srt/ssa, there is a new subtitle again.

Overall this problem is quite complicated. I agree with limiting to 3 anonymous uploads per movie; all other anonymous uploads should just be ignored. We can hope real translators will upload subtitles logged in.

Anyway, it is quite a lot of work, so admins: don't delete subtitles right now (it is pointless, because the same subtitles will be uploaded again). I will report here what I do.

oss
Site Admin
Posts: 5887
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Tue Aug 26, 2008 10:09 am

OK, today I made a script which adds moviereleasename from the CD1 subtitle file name, so on the search page you can see more information (before, there was empty space). It should be good information for people.

Also, I will implement this:


if (anonymous_uploader) {
    if (count(idmovie, sublanguage, moviehash, moviebytesize) >= 3) {
        // don't save the subtitle, save only the subtitle hash:
        // get the most popular ENABLED subtitle for idmovie, sublanguage, moviehash, moviebytesize
        // save the hash to the most downloaded SubtitleFileID in subs_subtitle_file_hash
    } else {
        upload subtitle
    }
}
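For readers who want to try the rule out, here is a small Python sketch of the same logic. It is not the site's code: `store` and `hash_links` are hypothetical in-memory stand-ins for the subtitle table and `subs_subtitle_file_hash`, and `key` stands for the (idmovie, sublanguage, moviehash, moviebytesize) tuple.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    file_id: int
    downloads: int
    enabled: bool = True


def handle_upload(store, hash_links, key, sub_hash, new_file_id, anonymous, limit=3):
    """Store the upload, or only link its hash once an anonymous cap is hit.

    store:      dict key -> list[Subtitle]        (hypothetical)
    hash_links: dict subtitle hash -> file id     (hypothetical)
    """
    existing = store.setdefault(key, [])
    if anonymous and len(existing) >= limit:
        enabled = [s for s in existing if s.enabled]
        if enabled:
            # Link the hash to the most downloaded ENABLED subtitle.
            most_popular = max(enabled, key=lambda s: s.downloads)
            hash_links[sub_hash] = most_popular.file_id
        return "hash_only"
    existing.append(Subtitle(new_file_id, downloads=0))
    hash_links[sub_hash] = new_file_id
    return "stored"
```

Registered uploaders are never capped in this sketch, which matches the `if (anonymous_uploader)` guard above.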

Cougar_
Posts: 19
Joined: Fri May 23, 2008 9:18 pm

Tue Aug 26, 2008 2:43 pm

It's a very bad idea to limit uploads of subtitles by anonymous users. Anonymous people create this site, not the registered ones. Before tools like BestPlayer, AllPlayer or my own Zasysacz appeared, this site was worth nothing if someone wanted to download Polish subtitles. Now the situation is getting better day by day. I know there are many bad subtitles and it's a nightmare for moderators to handle this shit, but among this shit we can find useful subtitles. If you limit anonymous uploads, there will be only alpha and wrong versions of subtitles on your server. The process of creating subtitles is incremental, sometimes a few versions per hour.
This is a big problem, especially for the Polish napiprojekt, in which there is only one subtitle per movie, and only registered users can upload a newer version, and only with a moderator's blessing. So by limiting uploads this problem will be yours too: there will be only crap for many movies, like on napiprojekt.

In the best-case scenario, people can only push the upload button. They have more interesting things to do than checking server resources; they don't even know that a server like opensubtitles exists, and they don't even have the skill to upload subtitles manually.

For most users it is no problem when there are many subtitles on the list; they can spend a few seconds finding the correct one by clicking through them one by one and checking. This is only a problem for moderators and the overloaded server.

The only thing you can do is change the subtitle hash algorithm: it should compute the hash without the first and last few lines, and maybe without times/frames, only text.
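A text-only hash of this kind could be sketched as follows. The details are assumptions for the example: it only recognises SubRip (`.srt`) timing lines and MicroDVD (`{start}{end}text`) frame tags, skips two lines at each end, and uses SHA-256 rather than whatever the site uses.

```python
import hashlib
import re

# Assumed formats: SubRip timing lines and MicroDVD frame tags only.
SRT_TIMING = re.compile(
    r"^\d{1,2}:\d{2}:\d{2}[,.]\d{1,3}\s*-->\s*\d{1,2}:\d{2}:\d{2}[,.]\d{1,3}"
)
MICRODVD = re.compile(r"^\{\d+\}\{\d+\}")


def text_only_lines(raw):
    """Dialogue lines with cue numbers, timings and frame tags removed."""
    out = []
    for line in raw.splitlines():
        line = line.strip()
        # Note: isdigit() also drops dialogue that is only digits, e.g. "1984".
        if not line or line.isdigit() or SRT_TIMING.match(line):
            continue
        line = MICRODVD.sub("", line)
        # MicroDVD uses "|" as a line break inside one cue.
        out.extend(part.strip() for part in line.split("|") if part.strip())
    return out


def content_hash(raw, skip=2):
    """Hash of the dialogue only, ignoring `skip` lines at each end."""
    lines = text_only_lines(raw)
    if len(lines) > 2 * skip:
        lines = lines[skip:-skip]
    return hashlib.sha256("\n".join(lines).lower().encode("utf-8")).hexdigest()
```

With this, the same dialogue saved as srt and as sub, with different ad lines at the start and end, produces one hash. It would also give one hash to retimed versions for different releases, so timing differences become invisible to it.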

Cougar_
Posts: 19
Joined: Fri May 23, 2008 9:18 pm

Tue Aug 26, 2008 3:37 pm

I don't know how you can check every subtitle. People upload a few hundred subtitles per day. If you don't have the movie, it will be impossible to check whether a subtitle is correct. You can only check whether a subtitle was translated by a program, and this is a problem too. How do you know if a subtitle is for that movie? Many subtitles don't have any information about the movie, not even the title, only dialogue.
You can't download every movie, every release of a movie, etc.
If someone uploads subtitles from another movie, you won't be able to check this.

oss
Site Admin
Posts: 5887
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Tue Aug 26, 2008 5:37 pm

I was thinking about all this. We can divide programs into "subtitle uploaders" and "subtitle downloaders". We can say an uploader will (should...?) upload with a subtitle uploader, like SubDownloader or some other program; he will not exactly upload subtitles with AllPlayer, but of course he can.

Also we can say there is a high probability that a translator will register on the site and upload that subtitle as a registered user. Nothing changes in this area: if the user is registered, everything will work like before.

I will code that check, and I will also code a script which will check all subtitles uploaded by allplayer and disable them if needed.

Next, I was thinking moderators should take care of subtitles: select one _master_ subtitle (meaning we have really good subtitles for this movie, and there is no need to have any other subtitle), so all other subtitles (with some advertising and so on...) will be added to this master subtitle. How does it sound?

I can also code that 2-3 line removal script. I have to look at those subtitles more closely; if the only difference is that they put some advertisement at the start and at the end (which I quite understand), I can remove it and check.

oss
Site Admin
Posts: 5887
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Subtitle Duplicates progress

Wed Aug 27, 2008 7:09 pm

Hi,

I am working hard to deal with subtitle duplicates. I looked at Levenshtein distance (google for it), but it is quite an expensive operation and we also have some problems with UDFs in MySQL, so I came up with something different (my own algo).
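For reference, the textbook Levenshtein distance can be written as below. This is the standard dynamic-programming version, not the algorithm the site ended up using, and it shows why the operation is expensive: the work grows with len(a) * len(b), which is a lot for two full subtitle files.

```python
def levenshtein(a, b):
    """Minimum number of single-item inserts, deletes or substitutions
    needed to turn sequence a into sequence b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, 1):
        current = [i]
        for j, item_b in enumerate(b, 1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]
```

It works on strings and on lists of lines alike; comparing lists of lines is cheaper than comparing characters, but still quadratic.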

"Problem" is, it removes all timestamps before processing, so it does not depend on the format (srt, sub...). That makes it quite powerful: it should not be possible to upload the same subtitles in sub or srt or another format if they already exist in one of these formats. Also, because of this, it will not be possible to upload different timings (for different releases).

Put this way it sounds like there will be no new uploads at all :), but I suggest making a new group ("uploaders"), so people who belong to it can also upload "duplicate" subtitles. I can automatically add to uploaders every user who uploads more than 10 subtitles; all others (registered/unregistered) must pass this duplicate filter.
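The group rule is simple enough to sketch in a few lines. The function names and the way the upload count is passed in are invented for the example; anonymous users would simply have a count of 0.

```python
def is_uploader(upload_count, threshold=10):
    """Users with more than `threshold` uploads join the "uploaders" group."""
    return upload_count > threshold


def accept_upload(upload_count, is_duplicate, threshold=10):
    """Uploaders bypass the duplicate filter; everyone else must pass it."""
    if is_uploader(upload_count, threshold):
        return True
    return not is_duplicate
```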

What do you think about this idea?
