os Site Admin

Joined: 25 Feb 2006 Posts: 1229
Posted: Fri Aug 22, 2008 4:28 am Post subject: How to deal with duplicate subs
According to http://blog.opensubtitles.org/opensubtitlesorg/opensubtitlesorg-is-back#comments we should continue here.
Guys, I am reading your messages; thanks for informing me. I think that at this stage you should not delete those subtitles, because it is a waste of time. The uploaded subtitles are, I think, Polish subtitles downloaded from Polish sites, and as you wrote, there are MANY of them. When you delete a subtitle and someone uploads the same subtitle again, it will be accepted again (I think... I can change this). ALLPlayer seems like a nice piece of software and I like the automatic uploading feature. For sure there will be more players with an automatic upload feature; it is a good idea (I had it maybe 2 years ago, and now the dream comes true...). We have to think about how to deal with those subtitles.
1. Allow uploads only by logged-in users. I think this is an interesting idea, but it is not a final solution. Imagine almost all anonymous users register, and we are in the same situation as now. Also, I don't like registration at all (when it is not really necessary), but you know this.
2. Administrators can RELINK subtitles (something like versioning). It should work like this:
- You have a new upload (with or without a moviehash) whose subtitle is almost the same as one we already have, so there is no need to have it in the db again. On screen you see this subtitle and all other subtitles for this movie (paired by imdbid) in the same language. You select ONE subtitle from this list (radio button); the hashes etc. are relinked to the selected subtitle, and the old subtitle is deleted and cannot be uploaded again. It should be quite easy. If you have more ideas about what to do, or any questions, please tell me, so that when I code this I can implement everything.
Thanks _________________ Support us
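The relink step described in point 2 could look roughly like the sketch below. This is only an illustration under assumptions: the in-memory `SubtitleStore`, its field names and the one-subtitle-per-moviehash mapping are hypothetical and not the real OpenSubtitles schema.

```python
from dataclasses import dataclass, field


@dataclass
class SubtitleStore:
    """Toy in-memory stand-in for the subtitle database (hypothetical schema)."""
    hash_links: dict = field(default_factory=dict)       # moviehash -> subtitle id
    subtitle_hashes: dict = field(default_factory=dict)  # subtitle id -> hash of the subtitle file
    blocked: set = field(default_factory=set)            # file hashes that may not be uploaded again

    def relink(self, duplicate_id, keep_id):
        """Point every moviehash of the duplicate at the kept subtitle, then drop the duplicate."""
        if duplicate_id == keep_id:
            raise ValueError("cannot relink a subtitle to itself")
        if keep_id not in self.subtitle_hashes:
            raise KeyError(keep_id)
        for moviehash, sub_id in self.hash_links.items():
            if sub_id == duplicate_id:
                # Only values change here, so iterating over the dict stays safe.
                self.hash_links[moviehash] = keep_id
        # Remember the deleted file's hash so the same file is refused later.
        self.blocked.add(self.subtitle_hashes.pop(duplicate_id))

    def can_upload(self, subtitle_hash):
        return subtitle_hash not in self.blocked
```

The point of the block list is that deleting a duplicate no longer loses the moviehash link, and the same file cannot come back through an automatic upload.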
Omerta

Joined: 09 Jul 2007 Posts: 127
Posted: Fri Aug 22, 2008 10:34 am Post subject:
1. There is no way to filter the automated upload process. The uploader gets no feedback about the site's content, so he/she cannot compare the file with the rest of what the database has.
2. It is nonsense that any automated upload should overwrite earlier subs; that is an absolutely manual task. No one can make decisions about the subs except site admins and ops.
3. In case we don't want uploaders to be registered users (btw, I don't understand this), you, os, have a hard job ahead. You must write code that sends subtitles for movies that already have earlier uploads in the very same language to a parking section, where they wait for an admin to allow them to be displayed in the search list. After a short period (about 5-6 hours) they would be deleted automatically.
Subs that are new to the site should get a fixed status immediately.
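A minimal sketch of the parking idea from point 3, assuming a simple in-memory model. The class and field names are hypothetical, and the 6-hour expiry is just one reading of "about 5-6 hours".

```python
from dataclasses import dataclass, field

PARK_SECONDS = 6 * 3600  # assumed value for "about 5-6 hours"


@dataclass
class Upload:
    subtitle_id: int
    movie_id: int
    language: str
    uploaded_at: float  # seconds since epoch


@dataclass
class Moderation:
    visible: list = field(default_factory=list)  # shown in the search list
    parked: list = field(default_factory=list)   # waiting for an admin

    def submit(self, upload):
        """First sub for a movie/language goes live at once; later ones are parked."""
        already_has = any(
            u.movie_id == upload.movie_id and u.language == upload.language
            for u in self.visible
        )
        (self.parked if already_has else self.visible).append(upload)

    def approve(self, subtitle_id):
        """Admin action: move a parked upload into the search list."""
        for u in self.parked:
            if u.subtitle_id == subtitle_id:
                self.parked.remove(u)
                self.visible.append(u)
                return True
        return False

    def expire(self, now):
        """Drop parked uploads older than PARK_SECONDS; return how many were dropped."""
        keep = [u for u in self.parked if now - u.uploaded_at < PARK_SECONDS]
        dropped = len(self.parked) - len(keep)
        self.parked = keep
        return dropped
```

A periodic job would call `expire` with the current time, so nothing stays parked unless an admin approved it first.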
eduo

Joined: 10 Feb 2007 Posts: 441 Location: Information Technology
Posted: Fri Aug 22, 2008 6:40 pm Post subject:
OS: How difficult would it be to ignore subtitles that have less than 5% changes relative to an existing one? I'm assuming it's not a simple job (as it has to compare each uploaded subtitle against the existing ones and compute the diffs between them all) and it may hit the server a lot. I would automatically reject (or put in the "rejected" queue for admins) any subtitle that has less than 15% difference from existing ones and automatically delete any subtitle with less than 5% changes.
I wouldn't allow hash-match uploads by anonymous users, and I would seriously rethink anonymous uploads altogether.
I would flag movies with more than 3 subtitles per language per moviehash for admins to review, as it's obvious there is crap there (for the same hash and language, the most there should be is two subtitles, one normal and one for the hearing-impaired).
There are several things that can be done, but they imply a change in philosophy, from massively gathering subtitles without regard to quality to putting quality before quantity.
_________________
http://eduo.info/
OpenSubtitles from your desktop: SolEol for Mac/Windows/Linux
My current episode processing work flow.
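The 5% / 15% thresholds and the "more than 3 per language per moviehash" flag could be sketched as follows. This is only a rough illustration: it measures difference line by line with Python's standard `difflib`, which is one possible metric and not necessarily what the site would use, and the function names are made up for the example.

```python
import difflib
from collections import Counter

DELETE_BELOW = 0.05  # under 5% difference: drop automatically
REVIEW_BELOW = 0.15  # under 15% difference: put in the admins' "rejected" queue


def difference(new_text, old_text):
    """Line-level difference, computed as 1 minus difflib's similarity ratio."""
    matcher = difflib.SequenceMatcher(
        None, new_text.splitlines(), old_text.splitlines(), autojunk=False
    )
    return 1.0 - matcher.ratio()


def classify(new_text, existing_texts):
    """Return 'delete', 'review' or 'accept' for an uploaded subtitle."""
    if not existing_texts:
        return "accept"
    closest = min(difference(new_text, old) for old in existing_texts)
    if closest < DELETE_BELOW:
        return "delete"
    if closest < REVIEW_BELOW:
        return "review"
    return "accept"


def overloaded(rows, limit=3):
    """rows: (moviehash, language) pairs, one per stored subtitle.

    Returns the pairs that have more than `limit` subtitles.
    """
    counts = Counter(rows)
    return sorted(key for key, n in counts.items() if n > limit)
```

As the post already suspects, the cost grows with the number of existing subtitles per movie, since every upload is compared against all of them.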
Omerta

Joined: 09 Jul 2007 Posts: 127
Posted: Fri Aug 22, 2008 6:45 pm Post subject:
Great idea!
Simply forbid uploading more than 3 anonymous subs per title!
tuszyn
Joined: 28 Oct 2007 Posts: 17
Posted: Sat Aug 23, 2008 4:14 pm Post subject:
I think the idea of limiting anonymous uploads is good. eduo is right, having this many subs is just crap. I know people who often download subs, and they have a hard time downloading anything from OpenSubtitles; frankly, they don't know how to decide which subs to download, there are so many of them. And when you want to get something from the site without any program, it's even worse: it is practically impossible to get the right subs.
I would like the "Release Name" field to be mandatory. I don't want to search through hashes to see which release the sync is for.
Omerta

Joined: 09 Jul 2007 Posts: 127
Posted: Sun Aug 24, 2008 9:25 am Post subject:
I suggest filtering the whole database for duplicated anonymous subs and deleting them where they exceed 3 anonymous uploads per title.
Hope something happens very soon!
tuszyn
Joined: 28 Oct 2007 Posts: 17
Posted: Sun Aug 24, 2008 10:15 am Post subject:
I'm with you, Omerta. But I don't think deleting them will help for very long, although it should be done. Changes need to be made so that we are not back at the same point in a few days.
P.S.: Omerta, I need to rethink my position. Look at this movie: http://www.opensubtitles.org/pl/search/sublanguageid-pol/idmovie-13478. There are now 9 subs that are good, and 8 of them are uploads from anonymous users. For the moment we should limit uploads from anonymous users to 10 per movie, and after some time go to a lower value (let's say 3), so that we don't lose good subs too, and users get some time to register at the site and make new uploads.
Omerta

Joined: 09 Jul 2007 Posts: 127
Posted: Sun Aug 24, 2008 11:09 am Post subject:
Intelligent filtering can solve this issue.
The subs you linked are my work :DDD
I deleted about 40 Hellboy 2 subs to get down to these 8.
tuszyn
Joined: 28 Oct 2007 Posts: 17
Posted: Sun Aug 24, 2008 11:36 am Post subject:
Not quite, Omerta. Yes, you deleted 40 Hellboy 2 subs at that time, but I deleted the rest too, as they were shit as well, plus 60 other new ones, and that is what is left.
os Site Admin

Joined: 25 Feb 2006 Posts: 1229
Posted: Mon Aug 25, 2008 8:34 am Post subject:
Please, don't delete subtitles right now. If you delete them, we lose the moviehash->subtitlehash link, and the same subtitles CAN be uploaded again; check http://www.opensubtitles.org/pl/search/sublanguageid-pol/idmovie-13478
The problem with limiting to XXX subtitles per title/language is that one movie can be released a COUPLE of times: first comes the CAM, then the SCREENER, then the DVDSCR (R5), and the DVD last... If I limit it to 10 uploads per title, that would be quite a problem; limiting to 3 subtitles per release/title/subtitle should be the way.
As for the diff: maybe it is not a problem (except for server resources), but some people will correct subs and send them to the server, the automatic check will see there are less than 5% changes, and the proper (better) subtitles are lost.
I know it is not possible to filter subtitles, but I want to make something that is really good. First I have to implement this:
subtitles are on the server, but they cannot be found in the normal search, only through moviehashes. I am working on that right now (versioning, like I wrote before).
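To show why the limit has to be counted per release and not per title, here is a small sketch. It assumes the bucket key is (movie, release type, language), which is only one reading of "release/title/subtitle", and `ReleaseLimiter` is a hypothetical name.

```python
from collections import defaultdict

MAX_PER_BUCKET = 3  # the limit of 3 proposed in the post


class ReleaseLimiter:
    """Counts accepted uploads per (movie, release, language) bucket."""

    def __init__(self, limit=MAX_PER_BUCKET):
        self.limit = limit
        self.counts = defaultdict(int)

    def try_accept(self, movie_id, release, language):
        """Accept the upload unless its bucket is already full."""
        key = (movie_id, release.upper(), language)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

With this keying, a full DVDSCR bucket does not block the first subtitle for a later DVD release of the same movie, which a plain per-title limit would do.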
Omerta

Joined: 09 Jul 2007 Posts: 127
Posted: Mon Aug 25, 2008 9:57 am Post subject:
"The problem with limiting to XXX subtitles per title/language is that one movie can be released a COUPLE of times: first comes the CAM, then the SCREENER, then the DVDSCR (R5), and the DVD last... If I limit it to 10 uploads per title, that would be quite a problem; limiting to 3 subtitles per release/title/subtitle should be the way."
os, believe me, there is no fuckin need for more than 3-5 anonymous subs per title. If anyone needs a different timing, they should put a request in the sub's comments, and admins will do it. At least, I'm doing it that way.
And when DVD rips are available, there's no need for CAM, TS, screener or R5 subs.
Plz RESTRICT anonymous uploads to providing just a base for further work; admins can do the rest manually.
And hey, the movie hash is a good thing for identifying movies, but personally I never use it, because it was NEVER needed. And I have made tons of uploads so far.
ALLPlayer
Joined: 10 Jul 2008 Posts: 14
Posted: Mon Aug 25, 2008 5:00 pm Post subject:
There are some solutions that we can work on together. First, I would not recommend requiring a login and password for uploading. This may have some legal implications, and anonymous uploading is the right way to populate the servers.
1. We can add a filter that checks how many subs are already there and does not upload if the number is higher than 10. This has limitations and is not our favorite option.
2. This is work on your side. You add a counter and check how many times people have tried to upload the same movie subs. If one sub makes up the majority of attempts, it means it is the popular and proper one. The rest are taken off the servers once a week or once a month.
PS. I have checked the site, and there is a movie with more than 700 subs... It looks strange, so we will download some of them and check what differences make them keep multiplying like that. Maybe there is some bug that makes things harder, or something else; we need to investigate.
PS2. Do you know if we could get some help from you with making a Portuguese GUI for ALLPlayer?
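Option 2 (count upload attempts, keep the most popular sub, clean up the rest periodically) might be sketched like this. Everything here is an assumption for illustration: the `UploadTally` class, identifying a sub by a hash of its file, and keeping only the single most-uploaded one.

```python
from collections import Counter


class UploadTally:
    """Counts how often each subtitle file was uploaded per movie and language."""

    def __init__(self):
        self.attempts = {}  # (movie_id, language) -> Counter of subtitle hashes

    def record(self, movie_id, language, subtitle_hash):
        tally = self.attempts.setdefault((movie_id, language), Counter())
        tally[subtitle_hash] += 1

    def purge_candidates(self, movie_id, language, keep=1):
        """Hashes outside the `keep` most-uploaded ones, i.e. what a weekly
        or monthly cleanup would take off the servers."""
        tally = self.attempts.get((movie_id, language), Counter())
        kept = {h for h, _ in tally.most_common(keep)}
        return sorted(h for h in tally if h not in kept)
```

Note that this only works if identical subs really produce identical hashes; files that differ by a few added lines would each be counted once.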
os Site Admin

Joined: 25 Feb 2006 Posts: 1229
Posted: Tue Aug 26, 2008 5:26 am Post subject:
From ALLPLAYER:
We have checked many subs for Hellboy 2 and discovered that many of them contain ads for the services the sub was taken from (napisy.info, napisy24, hatak, napiprojekt), so each sub was different and got onto the server. I believe this is very Polish, because competition between sub sites is very strong. In other countries the admins of local sites don't do this. I propose that you let your servers compare all the lines except the first and last two. If they are the same, the sub can be deleted.
PS. Soon we will add good support for playback and for sending a sub that covers 2 CDs. Once this is done, ALLPlayer won't see part one and part two of a movie as separate ones. That means ALLPlayer will send fewer subs than now. We have made a calculation: if you do as I'm writing, then with the 2 CDs of Hellboy 2 we will get 50 subs instead of 700...
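The proposed comparison could be sketched as below. It assumes "first and last two" means two lines at each end and that blank lines are ignored; both are guesses, so the counts are parameters, and the function names are made up for the example.

```python
def core_lines(text, skip_head=2, skip_tail=2):
    """Non-blank lines of a subtitle with the edge lines (where ads usually sit) removed."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    end = len(lines) - skip_tail
    return lines[skip_head:end] if end > skip_head else []


def same_ignoring_edges(a, b, skip_head=2, skip_tail=2):
    """True when two subtitles match once the edge lines are dropped."""
    core_a = core_lines(a, skip_head, skip_tail)
    core_b = core_lines(b, skip_head, skip_tail)
    # Refuse to call two subs equal when nothing is left to compare.
    return bool(core_a) and core_a == core_b
```

This catches a sub that differs only by a site's ad at the start or end, but not one that was also re-timed or converted to another format.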
os Site Admin

Joined: 25 Feb 2006 Posts: 1229
Posted: Tue Aug 26, 2008 5:45 am Post subject:
Omerta: you are a really great admin, I know. The problem is the following: I am against deleting subtitles for TS, CAM or any _different_ release. It is simple: some people get a movie later (they don't have a good source, or a really good internet connection), they search for subtitles, and those subtitles don't exist because we have already deleted them. I personally don't watch CAM or TS, but I do watch DVDSCR and R5, of course. Sometimes even I download a DVDSCR and watch it, even when there is a DVDRIP. So basically, I want to have a subtitle for every release out there (even ones reconverted for PSP or mobile phone). The connection between movie releases should be made using movietimems, which should be sent by programs, but not every program supports this; maybe it should be required information.
As for moviehashes: you are basically an uploader, I think, but 90% (if not more) of users are downloaders. They have a movie, play it in some player or run SubDownloader or whatever, and want subtitles. That is what the moviehash is for; without moviehashes opensubtitles cannot work.
AllPlayer: as for anonymous uploading, I agree with you for now. We will see later; if there are too many wrong uploads we will have to implement this, but not now. As for your 1st point: this is not a systematic way. You (ALLPlayer) would implement it, but what about other players? I think this (filtering, etc.) should be server-side, not client-side. 2nd: I was thinking about that; maybe it would be OK.
As for translations: I have some contacts for Portuguese translators; I can send them to you by mail.
As for your 2nd post: comparing subtitles is quite a tricky thing, believe me. I know what you mean about the added advertisements, but we have to take into account that there are more countries like Poland, and others do these things too. For Hellboy, even 50 subtitles is enough.
I also looked at the Polish subtitles: Poles really use many more different formats than the rest of the world, so we have another problem here: if someone downloads a tmp file and converts it to sub/srt/ssa, there is a new subtitle again.
Overall this problem is quite complicated. I agree with limiting to 3 anonymous uploads per movie; all other anonymous uploads should just be ignored. We can hope real translators will upload subtitles while logged in.
Anyway, it is quite a lot of work, so admins, don't delete subtitles right now (it is pointless, because the same subtitles will be uploaded again). I will post here what I do.
Omerta

Joined: 09 Jul 2007 Posts: 127
Posted: Tue Aug 26, 2008 9:14 am Post subject:
Very well :D