oss
Site Admin
Posts: 5891
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Fri Sep 05, 2008 5:20 am

thanks,

I can see from the logs that the site works really well: there was no glitch at all and the slowest query took about 2 seconds (QPS around 300!), so it seems we are in good shape to implement new features.

My top priorities are non-IMDB support and TV series support. But before that I want to improve the search2 algo.

Cougar_
Posts: 19
Joined: Fri May 23, 2008 9:18 pm

Fri Sep 05, 2008 11:03 am

I found one problem: you should make an exception when uploading subtitles. Even if the same subtitles already exist, the upload should be allowed if the format of the existing subtitle is TMP. In this case you should delete the TMP one and let it be replaced with the same subtitles in a better format.

The TMP format is the worst, so it has been banned on many servers: 1 second resolution, and in most cases only a start time ...

I have a bad feeling that your filter algorithm fails in this case (it leaves TMP subs) when I am searching for subtitles.

oss

Fri Sep 05, 2008 11:24 am

Omerta: thanks, and you are welcome. Answer: he is not getting mail. I don't know why; I have to check it. When I run that script from the shell it works, but when it is executed from cron it does not, and I really don't know why (everything worked before). I will debug that.

I agree the TMP format really sucks: it is not supported in many players, and also as you wrote...

I can change the algo so it will NOT search for similar subs among TMP subtitles; I think that should do the job... Or it could search in all formats and, when it finds similar subs in TMP format, disable the TMP one and upload the other format. But I like the first option better...
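
To illustrate the first option, here is a minimal Python sketch, not the site's actual code. The `StoredSub` record, its field names and the format strings are assumptions made up for this example; the only idea taken from the post is that TMP subtitles are left out of the pool searched for similar subs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredSub:
    # Hypothetical record; the real table layout is not described in the thread.
    sub_id: int
    fmt: str       # e.g. "srt", "sub", "tmp"
    language: str


def candidates_for_duplicate_check(stored: List[StoredSub], language: str) -> List[StoredSub]:
    """Return the stored subtitles that the similarity search may look at.

    TMP subtitles are skipped, so an upload in a better format is never
    blocked just because a TMP version of the same subtitles exists.
    """
    return [s for s in stored if s.language == language and s.fmt.lower() != "tmp"]


stored = [
    StoredSub(1, "tmp", "eng"),
    StoredSub(2, "srt", "eng"),
    StoredSub(3, "srt", "cze"),
]
print([s.sub_id for s in candidates_for_duplicate_check(stored, "eng")])
```

With this filter a TMP-only movie has an empty candidate list, so the new upload is never treated as a duplicate.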

oss

Fri Sep 05, 2008 12:27 pm

Omerta, OK, no problem. Just post me the URL where you need the disable button (do you mean to disable ALL requests for some movie, or to disable 1 request by a user?), and also a URL with an example of a zero file.

Now I am working on adding movies which are not in IMDB.

oss

Sun Sep 07, 2008 8:08 am

Omerta, OK, I will code that; it should be no problem.
Tuszyn: the message should be disabled now.

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

Sun Sep 07, 2008 1:24 pm

os: I couldn't help noticing you mentioned you'd be storing the subtitles in the database internally and not as files.

This sounds like a good idea (and a step toward versioning, for example) but doesn't it invalidate the hashes for subtitles? Doesn't this mean every subtitle will need to be uploaded for the server to check against and work out whether it's a new one or an old one?

Does this also mean we will stop seeing tens of subtitles per movie and start seeing only 3 or 4? (normal, deaf, etc.)

Does this also mean OpenSubtitles will standardize on a single subtitle standard and/or that subtitle format will be selectable upon download?

Finally: Does this mean formatting will be stripped out? There is no standard for subtitle formatting in .SRT (only in ASS and SSA). I vote yes, I'm sick of them.
http://eduo.info/
[url=http://eduo.info/soleol/]OpenSubtitles from your desktop: SolEol for Mac/Windows/Linux[/url]
[url=http://forums.plexapp.com/index.php?showtopic=325&st=0&p=2480&#entry2480]My current episode processing work flow[/url].

oss

Mon Sep 08, 2008 12:17 pm

eduo,

I am not storing subtitles in the database. I store only fingerprints of them, so I can compare them later. As for versioning, it should not be a big deal; I am thinking about it, and it should be quite easy.

For hashes - it will not invalidate hashes. For uploading - yes, those "new" subtitles must be uploaded to the site and validated. If they are not valid, I store only the md5 of the subtitles, so next time I look into the table and know this md5 is not valid... There is no other possibility to validate and compare them; they must be uploaded to the site. But this is done only for anonymous users.

I don't know if there will be so many subtitles in the future, but it should work OK; time will tell.

For formats - nothing has changed. Somebody suggested removing TMP subtitles, because that format really sucks.
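
A rough Python sketch of that "remember the md5 of invalid uploads" idea follows. It is only an illustration: the in-memory set stands in for the database table, and `is_valid_subtitle` is a placeholder check invented for the example, since the real validation is not described here.

```python
import hashlib

# Stands in for the table of md5 hashes already known to be invalid.
invalid_md5 = set()


def is_valid_subtitle(data: bytes) -> bool:
    # Placeholder validation (an assumption): accept anything that contains
    # an SRT-style timing arrow. The site's real checks are not shown.
    return b"-->" in data


def handle_upload(data: bytes) -> str:
    digest = hashlib.md5(data).hexdigest()
    if digest in invalid_md5:
        # Seen before and rejected: no need to validate again.
        return "rejected (known invalid)"
    if not is_valid_subtitle(data):
        invalid_md5.add(digest)  # store only the md5, not the file
        return "rejected (invalid)"
    return "accepted"


print(handle_upload(b"not a subtitle"))
print(handle_upload(b"not a subtitle"))
```

The file still has to be uploaded once so it can be hashed and validated, which matches the point above that there is no way around the upload.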

eduo

Mon Sep 08, 2008 1:00 pm

There is no standard for subtitle formatting in .SRT (only in ASS and SSA).
It has a standard: max. 2x40 letters/line, max. 6 sec and min. 1 sec/line, and a bunch of minor things.
Although it is true that you can write a lot of bullshit into an SRT and the file still remains playable until you mess with the timing lines.
I meant "subtitle formatting" as in "styling the subtitles". That kind of format, not that SRT itself has no format. SRT has no actual standard, but there are recommended guidelines (which you've mentioned and, sadly, are mostly ignored).
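
For reference, the guidelines quoted above (at most 2 lines of 40 characters, shown for 1 to 6 seconds) are easy to check mechanically. This is a small Python sketch under those numbers only; the function name and the message texts are made up for the example.

```python
from typing import List


def check_cue(start_s: float, end_s: float, text: str) -> List[str]:
    """Return the guideline violations for one subtitle cue (empty list = fine)."""
    problems = []
    lines = text.split("\n")
    if len(lines) > 2:
        problems.append("more than 2 lines")
    if any(len(line) > 40 for line in lines):
        problems.append("line longer than 40 characters")
    duration = end_s - start_s
    if duration < 1.0:
        problems.append("shown for less than 1 second")
    if duration > 6.0:
        problems.append("shown for more than 6 seconds")
    return problems


print(check_cue(10.0, 13.0, "Hello there."))
print(check_cue(10.0, 10.5, "x" * 41))
```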

SRT is a loose standard, to say the least (wikipedia is blunter: "There is no formal specification of the .SRT file format"). Proof of this is that it'll always "play", no matter what's been done to it. SUB being crap and SRT being a bad attempt at standardizing is the reason ASS and SSA exist. The problem is that SRT is a "human-readable" format, where others aren't.

Styling SRT, though, is a jungle. Each player has invented its own format (from the ridiculous use of HTML tags to the interesting use of symbols to the WEIRD use of smileys).

I personally don't like styling but I can see the need for it (especially in closed-captioning/hearing-impaired subtitles).

OS: I thought I had understood you actually store the subtitle lines in a database. I see that's not the case. I can see how there's no easy solution to the mess of multiple uploaders for similar subtitles.


EDIT: This is what I thought was being implemented: SubLib: http://sublib.sourceforge.net/

The advantage of using a subtitle library is that instead of the subtitle itself what can be stored in the database is a "representation" of the subtitles for the movie.

From this representation any format could be output, if needed. And subtitles that are identical but different only in format could be stored as one. Also, format could be stripped as well as language encodings.

This would also mean registered users could download and edit versions of the subtitle and re-upload them fixed.

How many subtitles should, in reality, exist for each local language? Two? Three? There's no point in having more than one, with ambient sounds as an option on the output, I believe. Too bad this may be a dream.

oss

Tue Sep 09, 2008 4:52 am

eduo, thanks again for the nice post. I never knew anything about SubLib - it is a very interesting project and it would be nice to have it in OpenSubtitles. I agree, only one representation of the subtitles should be saved, and the variations could maybe be produced using methods of that library (that means 1/2 CD versions of a DVDRip, other releases with changed timing, and so on).

The problem is, OpenSubtitles is not designed for that :(
SubLib is also written for Mono, which I don't want to install on the servers. Next thing - the servers will die if I call an external program for each download (it should be cached, though...)

Anyway, it is a really nice library; maybe I will look into it more deeply and start working with it.

Thanks for the information, I should have known about this sooner :)

ALLPlayer
Posts: 14
Joined: Thu Jul 10, 2008 6:14 pm

Tue Sep 09, 2008 9:38 am

Just my 5 cents :). We know that what brings people to any site or product is experience, and this is something we forget a lot. There are iPhones that lack features Nokia phones have, but people still choose them. I would like to apply this analogy to this service: here we are talking in the jargon of admins who code to make sure the servers run smoothly, which is not bad, but we cannot forget about the users, as they are our "bosses".

User experience means easy and instant access (ALLPlayer) to the right sub, even if the format is less known in Greece or in Spain (tmp or mpl2). Some movies have a different checksum: a Director's Cut or another special version of the movie, not to mention the source. That's why it is important to balance the need for new subtitles against the workload of the servers and find a solution that works for both sides. We believe in anonymous upload and instant subs. As ALLPlayer we don't want to have a monopoly on that cool feature; that's why we think that setting the right spec on the client side (any player) could help create some kind of standardization, where many players help populate this great service with subs.

PS. Some of you are not happy with the SRT format. I believe many people use it because it can either be embedded in a DivX container by muxing (we do it with ALL) or just be used as an external subtitle, which 90% of all CE devices support in DVD/DivX players.

oss

Wed Sep 10, 2008 9:32 am

ALLPlayer - the problem now is not server load; the problem is when there are too many subtitles for one movie.

I understand you, and I am also thinking about that. To be precise: for now, if there is a NEW movie (new moviehash) and OLD or SIMILAR (for non anonymous users) subtitles, all that is done is to block those subtitles, so they cannot be uploaded, but the hash is not updated.

Soon I will code it so that the moviehash is stored (so later users will find subtitles for that movie), but the subtitles will not be uploaded.

It is quite problematic, because there are many formats, and if the same format is not in the database, the moviehash cannot be inserted. So maybe I will allow uploading those subtitles, but then we will end up in the same situation (too many subs for one movie...)

It would be nice to store timestamps in a general format (like SubLib), but for that I have to know the FPS of the movie and do some more processing; for now it is not possible :(
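
The FPS dependency can be shown in a few lines of Python. This is only a sketch of the arithmetic for frame-based formats (where timing is a frame number rather than a clock time); the function name is made up for the example.

```python
def frames_to_ms(frame: int, fps: float) -> int:
    """Convert a frame number to milliseconds; impossible without the movie's FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(frame * 1000 / fps)


# The same frame number lands at different times depending on the FPS.
print(frames_to_ms(250, 25.0))
print(frames_to_ms(250, 23.976))
```

At 25 FPS frame 250 is at 10000 ms, at 23.976 FPS it is at 10427 ms, so a wrong FPS guess drifts by almost half a second after only ten seconds of film.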

oss

Fri Sep 12, 2008 5:51 am

omerta: do you mean when you delete all requests for a movie, or a request by a user? Please be specific, because I don't know. You can write the URLs here - where you start, what you click and where you want to return to.

Also, IMDB sorting was fixed - thanks.

For freezing - I am currently running the hashing script, so it is really slow. I also have to make more optimizations; there are too many inserts/updates... and QPS is sometimes around 400.

oss

Duplicate subs algo changed

Wed Sep 17, 2008 10:06 am

Hi all,

I had to change the algo for duplicate subs. The problem was: somebody (anonymous) uploaded subtitles with a moviehash, I did the check as I wrote before, the subtitles were not uploaded and the moviehash was not inserted. That means the moviehash (which is really important for XMLRPC) was lost.

So I made improvements, and the whole duplicate subs check is as simple as this:

Code:

if (anonymous_uploader) {
    if (similar subtitles based on timestamps found for same language) {
        save moviehash
        dont upload
    } else {
        upload
    }
}
I removed the earlier algos (if more than 3 subs for the same movie - there was a problem with TV series...; similar subtitles based on contents - a problem with moviehashes).

So that's it :)
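
For anyone who wants to play with the rule, here is a small runnable Python sketch of the same decision for an anonymous upload. How "similar based on timestamps" is actually measured is not stated in the post, so the Jaccard overlap of cue start times, the 0.9 threshold, and all the names below are assumptions for illustration only.

```python
from typing import Dict, List, Set


def timestamp_similarity(a: Set[int], b: Set[int]) -> float:
    """Jaccard overlap of two sets of cue start times (whole seconds)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def decide_anonymous_upload(
    language: str,
    start_times: Set[int],
    stored: Dict[str, List[Set[int]]],
    threshold: float = 0.9,  # assumed cut-off, not taken from the post
) -> str:
    """Apply the duplicate rule to one anonymous upload.

    `stored` maps a language code to the timestamp sets of subtitles already
    on the site for this movie. Only the same language is compared.
    """
    for known in stored.get(language, []):
        if timestamp_similarity(start_times, known) >= threshold:
            # Similar subs exist: keep the moviehash, skip the file.
            return "save_moviehash_only"
    return "upload"


stored = {"eng": [{1, 5, 9, 14}]}
print(decide_anonymous_upload("eng", {1, 5, 9, 14}, stored))
print(decide_anonymous_upload("eng", {2, 30, 77}, stored))
```

The important part is the first branch: even when the file is refused, the moviehash is kept, so it is no longer lost for XMLRPC lookups.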
