luyi84
Posts: 1
Joined: Sun Oct 20, 2013 7:05 pm

NEW Proposed hash method

Sun Oct 20, 2013 7:51 pm

Hello everyone. I registered this account just to share a crazy idea that might help establish the video - hash - subtitle link and increase the chance of finding the right subtitle file in the database.
Please disregard this post if you think I'm stupid.
I see a problem with computing the hash code over the whole video file. The video stream has many variants depending on codec and quality: 1080p, 720p, 480p, mp4, mkv, etc. When someone makes a working subtitle file, he/she uploads it with the hash code of his/her version of the movie (say, Bluray 1080p mkv). However, a user with a 720p version won't find that subtitle file using his version's hash code (assuming we disable searching by file name), even though the subtitle for the 1080p version would also work for the 720p one. So in my opinion, there is a mismatch in the current method of using the hash code to find subtitles.
Subtitles should be linked to the audio stream instead, since the audio stream has fewer variants. The hash code would be obtained as follows: 1) rip the audio stream from the file, 2) compute the hash code of the audio stream, 3) done.
I hope this helps future development. Please comment and tell me if I'm wrong.
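
The three steps above could be sketched roughly as follows. This is only an illustration, not anything OS implements: it assumes the ffmpeg command-line tool is installed and uses its md5 muxer to hash the first audio stream without re-encoding. The function names are made up for this example, and the hash would only match between two releases whose audio was copied unchanged, not re-encoded.

```python
import subprocess


def audio_md5_command(video_path, ffmpeg="ffmpeg"):
    """Build an ffmpeg command that hashes the first audio stream.

    -map 0:a:0 selects the first audio stream, -c copy avoids re-encoding,
    and -f md5 - prints a line of the form MD5=<hex digest> on stdout.
    """
    return [
        ffmpeg, "-v", "error",
        "-i", video_path,
        "-map", "0:a:0",
        "-c", "copy",
        "-f", "md5", "-",
    ]


def parse_md5_output(text):
    """Extract the hex digest from the md5 muxer output."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("MD5="):
            return line[4:].lower()
    raise ValueError("no MD5= line found in ffmpeg output")


def audio_hash(video_path):
    """Steps 1 and 2 of the proposal: read the audio stream and hash it."""
    completed = subprocess.run(
        audio_md5_command(video_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_md5_output(completed.stdout)
```

Calling audio_hash("movie.mkv") would run ffmpeg once per file, which is noticeably slower than reading a few kilobytes from disk.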

jcdr
Posts: 540
Joined: Sun Apr 08, 2012 9:49 am

Re: NEW Proposed hash method

Tue Oct 29, 2013 12:42 am

Hi luyi84,

Currently, OS uses whole-file hashing, which is fast and easy to implement without any technical knowledge of the codecs or container formats.
Extracting the raw video or audio bitstream from the container would require a demuxing algorithm specific to each container, and there are plenty of them, the best known being avi, mp4, mkv, ogg... Besides the fact that extracting these bitstreams takes much more time than a simple hash, it would add a lot of complexity to the development of third-party player software using the OS API (not to mention the frequent releases needed to follow each container update or new container format), with little added value: it would indeed reduce the number of hashes linked to each subtitle, but at present this is not critical, as hash collisions stay minimal with a 64-bit hash.
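
For comparison, here is a minimal sketch of the file-level hash as it is commonly documented for the OS API: a 64-bit value made of the file size plus the sum of the 64-bit little-endian words in the first and last 64 KiB of the file. The function and constant names are chosen for this example, and the handling of files smaller than one chunk is an assumption of this sketch.

```python
import os
import struct

CHUNK = 64 * 1024            # bytes read from each end of the file
MASK = 0xFFFFFFFFFFFFFFFF    # keep the running sum within 64 bits


def _sum_words(data):
    """Sum a byte string as unsigned 64-bit little-endian words."""
    count = len(data) // 8
    return sum(struct.unpack("<%dQ" % count, data[:count * 8]))


def file_hash(path):
    """Return the hash as 16 lowercase hex digits."""
    size = os.path.getsize(path)
    if size < CHUNK:
        # Assumption of this sketch: reject files shorter than one chunk.
        raise ValueError("file too small to hash")
    with open(path, "rb") as f:
        head = f.read(CHUNK)
        f.seek(size - CHUNK)
        tail = f.read(CHUNK)
    total = (size + _sum_words(head) + _sum_words(tail)) & MASK
    return "%016x" % total
```

Only 128 KiB are read regardless of file size and nothing about the container needs to be parsed, which is where the speed and simplicity come from.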

rednoah
Posts: 84
Joined: Tue Mar 11, 2008 10:02 pm

Re: NEW Proposed hash method

Thu Nov 14, 2013 3:05 pm

AcoustID's fpcalc now works on video files as well, as of v1.0, so you can get an audio hash from videos, and that also takes less than a second (I guess it doesn't look at all the audio, or maybe it just ignores everything after the typical music file length :D). That'd be interesting to look into.

But the current method is much better, most importantly for its simplicity and for being extremely fast.

More important than the hash method, however, is how to filter out people uploading subtitles with the wrong movie hashes... Does OSDB do any sort of automatic ranking? Like giving +1 confidence each time someone calls TryUploadSubtitles for something that has already been uploaded?
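
The "+1 confidence" idea could look something like the sketch below. It is purely illustrative: the class, its methods and the identifiers are hypothetical, and nothing here reflects how OSDB actually stores or counts hashes.

```python
from collections import Counter


class HashConfidence:
    """Count how often a (movie hash, subtitle) pairing has been reported."""

    def __init__(self):
        self._seen = Counter()

    def record(self, movie_hash, subtitle_id):
        """Register one more report of this pairing and return the new count."""
        key = (movie_hash, subtitle_id)
        self._seen[key] += 1
        return self._seen[key]

    def confidence(self, movie_hash, subtitle_id):
        """Return how many times this pairing was reported (0 if never)."""
        return self._seen[(movie_hash, subtitle_id)]
```

A pairing that was uploaded once by mistake would stay at 1, while a correct pairing would keep growing as more users report it.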

oss
Site Admin
Posts: 5884
Joined: Sat Feb 25, 2006 11:26 pm

Re: NEW Proposed hash method

Mon Nov 25, 2013 7:26 am

Yes, we are counting the hashes (seenhash).

As for acoustic hashes and similar: yes, there may already be good hashes; I was looking at phash for a while. The problem is that this approach is hard to implement. Can you imagine implementing this in pure PHP, Perl, Python and so on? Sure, we could use an external C++ library, but...

Ideally we would support as many different hashes as needed, but for that we would need to rewrite the whole application and, above all, move from MySQL to some NoSQL db, which means reprogramming the whole site :)

rednoah
Posts: 84
Joined: Tue Mar 11, 2008 10:02 pm

Re: NEW Proposed hash method

Wed Nov 27, 2013 10:20 am

When I get a subtitle list via hash, can I assume that the results are sorted by seenhash?

Can I assume the first result is the "best" subtitle available? Or should I do my own ranking and not assume any particular order in what I get back from OpenSubtitles?

oss
Site Admin
Posts: 5884
Joined: Sat Feb 25, 2006 11:26 pm

Re: NEW Proposed hash method

Sat Nov 30, 2013 1:53 pm

Nope, no sorting is done for now.

rednoah
Posts: 84
Joined: Tue Mar 11, 2008 10:02 pm

Re: NEW Proposed hash method

Tue Dec 03, 2013 7:32 pm

OK, I'll do my own ranking on the results then :)
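
A client-side ranking along these lines might look like the sketch below. The field names hash_match, seen and downloads are hypothetical placeholders, not the actual keys returned by the API, and the ordering (hash match first, then seen count, then downloads) is just one plausible choice.

```python
def rank_subtitles(results):
    """Return results ordered from most to least trustworthy.

    Each result is a dict; missing fields count as False or 0.
    """
    def score(result):
        return (
            bool(result.get("hash_match", False)),  # matched by hash beats matched by name
            int(result.get("seen", 0)),             # how often this pairing was reported
            int(result.get("downloads", 0)),        # popularity as a tie-breaker
        )

    # sorted() is stable, so equal scores keep the order the server returned.
    return sorted(results, key=score, reverse=True)
```

The first element of the returned list would then be the candidate to download by default.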
