fana
Posts: 17
Joined: Thu Mar 15, 2007 9:05 pm

Create Hashes offline?

Thu Mar 15, 2007 9:11 pm

Hi,

I have many movies with subtitles but no internet connection.
Is it somehow possible to hash the movie and subtitle files offline
and upload the subtitles, when I have an internet connection ?

oss
Site Admin
Posts: 5891
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Thu Mar 15, 2007 10:59 pm

This is an interesting idea for sure. I will ask the programmer, and I think it will be possible. In place of the video file you would supply a hash file, and everything else stays the same; creating that hash file should not be a problem.
It would be named something like:

moviename.hsh :)

OK, I will tell Ivan.

fana
Posts: 17
Joined: Thu Mar 15, 2007 9:05 pm

Mon Mar 19, 2007 12:25 pm

Thanks for your reply.

Since SubDownloader uses Python and I am not afraid of the command line,
could you (or Ivan) tell me how to use the .py files manually in the meantime?

Ok, maybe I can figure it out for myself, but that could take a little bit longer :-))
Last edited by fana on Tue Mar 20, 2007 11:29 pm, edited 1 time in total.

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

Tue Mar 20, 2007 8:24 pm

Fana:

What platform are you working on? I have the code for an executable you could use, but I'd need to know if you can compile it.

fana
Posts: 17
Joined: Thu Mar 15, 2007 9:05 pm

Tue Mar 20, 2007 11:31 pm

Hi eduo,

Thanks for the help.

I am using Ubuntu GNU/Linux (Dapper Drake),
but it is no problem to start a Windows 2000 system in a virtual machine.

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

Tue Mar 20, 2007 11:41 pm

Fana:

Can you try compiling this:

gethash.c

Code:

// gethash.c: despite the .c extension this is C++, so build it with g++.
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iostream>
using namespace std;

// Size of the block that is summed at each end of the file.
static const uint64_t BLOCK = 65536;

// Add up to 64 KiB of 64-bit words, starting at the current read position.
// Words are read in host byte order, so this assumes a little-endian machine.
static uint64_t sum_block(ifstream& f)
{
    uint64_t sum = 0, tmp = 0;
    for (uint64_t i = 0; i < BLOCK / sizeof(tmp) && f.read((char*)&tmp, sizeof(tmp)); i++)
        sum += tmp;
    f.clear();  // a file shorter than 64 KiB leaves the stream in a failed state
    return sum;
}

// 64-bit size, so files larger than 2 GB do not overflow.
uint64_t getfilesize(ifstream& f)
{
    f.seekg(0, ios::end);
    uint64_t filesize = f.tellg();
    f.seekg(0, ios::beg);
    return filesize;
}

// hash = file size + sum of the first 64 KiB + sum of the last 64 KiB (mod 2^64)
uint64_t compute_hash(ifstream& f)
{
    uint64_t fsize = getfilesize(f);
    uint64_t hash = fsize + sum_block(f);
    // Guard against unsigned underflow when the file is shorter than 64 KiB.
    f.seekg((streamoff)(fsize > BLOCK ? fsize - BLOCK : 0), ios::beg);
    hash += sum_block(f);
    return hash;
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " <video file>" << endl;
        return 1;
    }

    // Test file was "/Volumes/video/TV/Lost/Lost S03E01.avi"; hash should be 332c83338820e4f6
    ifstream f(argv[1], ios::in | ios::binary);
    if (!f.is_open()) {
        cerr << "Error opening file" << endl;
        return 1;
    }

    uint64_t fileSize = getfilesize(f);
    uint64_t myhash = compute_hash(f);

    // The printf format for 64-bit values differs between compilers
    // (%I64x on Borland BCC / MS VC++, %llx on gcc); PRIu64 and PRIx64
    // from <cinttypes> expand to whichever the compiler expects.
    // Output: size <TAB> 16-digit hex hash <TAB> file name
    printf("%" PRIu64 "\t%016" PRIx64 "\t%s\n", fileSize, myhash, argv[1]);

    f.close();
    return 0;
}
It should compile cleanly, although I am not sure. I haven't gone over it since it finally produced a hash, so you'll have to compare its output with a hash you already know to be sure it's correct.

It doesn't yet handle wildcards or several files at once, and it isn't set up for use in scripts. But it may get you started.
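
Editor's sketch: on Linux the shell expands wildcards itself, so `./gethash *.avi` reaches the program as one argument per file and a loop over the arguments is enough. The version below is a minimal sketch under stated assumptions, not SubDownloader code: the names `sum_words`, `hash_stream` and `print_hashes` are made up for illustration, and the bytes are decoded explicitly as little-endian (an assumption about what the hash expects) so the result does not depend on the host CPU.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <sstream>  // only needed to feed hash_stream from memory, e.g. when testing
#include <string>

// Sum up to 64 KiB of 64-bit words from the current position.
// Bytes are decoded as little-endian (assumed); a trailing partial word is ignored.
static uint64_t sum_words(std::istream& in)
{
    uint64_t sum = 0;
    unsigned char buf[8];
    for (int i = 0; i < 65536 / 8 && in.read(reinterpret_cast<char*>(buf), 8); ++i) {
        uint64_t word = 0;
        for (int b = 7; b >= 0; --b)
            word = (word << 8) | buf[b];
        sum += word;
    }
    in.clear();  // a short read sets failbit; reset it so seekg works again
    return sum;
}

// Same scheme as gethash.c: size + first 64 KiB + last 64 KiB, modulo 2^64.
uint64_t hash_stream(std::istream& in)
{
    assert(in.good());  // the stream must be open and seekable
    in.seekg(0, std::ios::end);
    const uint64_t size = static_cast<uint64_t>(in.tellg());
    in.seekg(0, std::ios::beg);
    const uint64_t head = sum_words(in);
    in.seekg(static_cast<std::streamoff>(size > 65536 ? size - 65536 : 0), std::ios::beg);
    return size + head + sum_words(in);
}

// Print "hash<TAB>name" for every file named on the command line.
// Returns the number of files that could not be opened.
int print_hashes(int argc, char* argv[])
{
    int failures = 0;
    for (int i = 1; i < argc; ++i) {
        std::ifstream f(argv[i], std::ios::in | std::ios::binary);
        if (!f.is_open()) {
            std::cerr << "Cannot open " << argv[i] << "\n";
            ++failures;
            continue;
        }
        std::printf("%016llx\t%s\n",
                    static_cast<unsigned long long>(hash_stream(f)), argv[i]);
    }
    return failures;
}
```

A `main` that simply returns `print_hashes(argc, argv)` turns this into a script-friendly tool whose output can be redirected to a file and carried to a machine with an internet connection.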

fana
Posts: 17
Joined: Thu Mar 15, 2007 9:05 pm

Wed Mar 21, 2007 12:16 am

OK, it works.
But how do I upload the corresponding subtitles after I hashed all my movie files?

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

Wed Mar 21, 2007 2:38 am

I wanted to know if it worked. For uploading you'll still need to use SubDownloader, I'm afraid.

oss may be able to provide a URL for uploading files easily. As far as I know the web page doesn't allow a subtitle to be associated with a hash (I can understand the reasons: manually entered hashes will usually contain errors).

oss
Site Admin
Posts: 5891
Joined: Sat Feb 25, 2006 11:26 pm
Contact: Website

Wed Mar 21, 2007 1:15 pm

Exactly, entering movie hashes by hand is not a good solution. I will also ask Ivan whether the program could open, instead of the movie itself, just the movie hash in a format like this:

Code:

Moviename.avi opensubtitles_hash
That way you could create hash files offline and so on.
Also, I have to look at the code of the upload page; maybe it is already possible to upload subtitles with hashes and the option is just not visible on the page :)
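
Editor's sketch: one way a `Moviename.avi opensubtitles_hash` line could be written and read back. This is only an illustration under stated assumptions: the hash is taken to be 16 hex digits, file names may contain spaces (so the line is split at the last blank), and `format_hash_line` / `parse_hash_line` are hypothetical names, not part of SubDownloader or the site.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>

// Build "name<space>16-digit-hex-hash".
std::string format_hash_line(const std::string& name, uint64_t hash)
{
    char hex[17];
    const int n = std::snprintf(hex, sizeof hex, "%016llx",
                                static_cast<unsigned long long>(hash));
    assert(n == 16);  // 64 bits always print as 16 zero-padded hex digits
    (void)n;
    return name + " " + hex;
}

// Split at the last blank so that names containing spaces survive.
// Returns false if the line does not end in a 16-digit hex hash.
bool parse_hash_line(const std::string& line, std::string& name, uint64_t& hash)
{
    const std::string::size_type sep = line.find_last_of(" \t");
    if (sep == std::string::npos) return false;
    const std::string::size_type name_end = line.find_last_not_of(" \t", sep);
    if (name_end == std::string::npos) return false;  // nothing before the hash

    const std::string hex = line.substr(sep + 1);
    if (hex.size() != 16) return false;
    for (std::string::size_type i = 0; i < hex.size(); ++i)
        if (!std::isxdigit(static_cast<unsigned char>(hex[i]))) return false;

    name = line.substr(0, name_end + 1);
    hash = std::strtoull(hex.c_str(), nullptr, 16);
    return true;
}
```

Rejecting anything that is not exactly 16 hex digits catches the most obvious copy-and-paste damage, though it cannot tell whether the hash really belongs to the named file.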

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

Wed Mar 21, 2007 1:40 pm

I'd do this:

moviename.avi language *movie_name *movie_hash *sub_hash *imdb-id *imdb-subID

(* means optional)
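
Editor's sketch: the record above, with its two mandatory and five optional fields, could be modelled roughly as follows. The tab delimiter is an assumption (the post does not name one, and file names may contain spaces), empty strings stand for "not given", and `UploadRecord` / `parse_upload_record` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct UploadRecord {
    std::string file_name;    // mandatory
    std::string language;     // mandatory
    std::string movie_name;   // optional from here on; empty means "not given"
    std::string movie_hash;
    std::string sub_hash;
    std::string imdb_id;
    std::string imdb_sub_id;
};

// Parse one tab-separated record. Fields are positional, so optional ones
// can only be left off from the right, matching the "order of importance".
bool parse_upload_record(const std::string& line, UploadRecord& out)
{
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t'))
        fields.push_back(field);

    if (fields.size() < 2 || fields.size() > 7) return false;
    if (fields[0].empty() || fields[1].empty()) return false;

    fields.resize(7);  // pad the missing optional fields with empty strings
    assert(fields.size() == 7);
    out = UploadRecord{fields[0], fields[1], fields[2], fields[3],
                       fields[4], fields[5], fields[6]};
    return true;
}
```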

I wouldn't make the sub_hash mandatory as that's easy for OS to generate after upload, to make sure no duplicates exist. Also if it's not mandatory people could upload subtitles they find around and they could be later matched to a movie (as currently happens).

I would put the moviename as a lot of people search for subs using that (and meta-trackers usually put a link to OS using the name).

If you leave the form like this, it would still be usable by those who upload just the subs, and also by those who want to upload all the correct data.

I've put the fields in order of importance. The further to the right they are, the "less mandatory" they become.

I would rate uploaders according to the amount of data they provide, too. I uploaded the other day over 30 subtitles but the credit went to the original uploader, who hadn't matched them to movie hashes and I had. If you implement this kind of "uploader classes" it will foster healthy competition (you can see there is already a sort of competition for "top uploader").

On the topic of uploads (I insist this is the most important part of OS and the most neglected one): uploading needs to be made as simple as possible, and follow-up rating should be just as simple. Both the GUI tools and the webpage should show users their recently downloaded subs with easy rating tools.

The strength of OS to me resides in three places:

1.- A strong database, using movie hashes as unique IDs against which subtitles are matched.

2.- A rating system that allows a user to select the best subtitle from several similar ones.

3.- The community. This is the most important part. You can expect a ratio of at least 1000 downloaders to 1 uploader (I'm probably being pessimistic), so it's important both to give uploaders incentives (top tens, easy tools, recognition) and to make it easy for downloaders to contribute something (they need not contribute subtitles, but they can contribute ratings and comments).

James
Posts: 39
Joined: Thu Jan 04, 2007 2:59 am

store hashes on the disk

Sat Apr 07, 2007 9:36 pm

I have many, many films without subtitles, and every time I search for new subtitles for this collection I have to wait while hashes are created for all of them.

A "store hashes" option would be nice. The first time, the program calculates the hash and stores it on disk (e.g. in the same directory as the video file, with a .hash extension). On later searches it checks whether a stored hash exists for the video file: if not, it calculates one; if so, it skips generation and uses the stored file.

With ~40 films of 700 MB to 1.4 GB each that I check periodically, this feature would help me a lot.
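
Editor's sketch: the "store hashes" idea described in this post, reduced to a few lines. It is a minimal illustration, not a SubDownloader feature; the `.hash` sidecar suffix follows the post, while the name `cached_hash` and the callback-based design are assumptions made so the hashing routine itself stays out of the example. Note that it never checks whether the video has changed since the sidecar was written.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <functional>
#include <string>

// Return the hash for `video`, reusing "<video>.hash" when it can be read
// and calling `compute` (then writing the sidecar) otherwise.
uint64_t cached_hash(const std::string& video,
                     const std::function<uint64_t(const std::string&)>& compute)
{
    assert(compute);  // a hashing function is required
    const std::string sidecar = video + ".hash";

    std::ifstream in(sidecar.c_str());
    uint64_t hash = 0;
    if (in >> std::hex >> hash)
        return hash;            // cache hit: the video file is never opened

    hash = compute(video);      // cache miss: hash once and remember the result
    std::ofstream out(sidecar.c_str());
    out << std::hex << hash << '\n';
    return hash;
}
```

Passing the hashing routine in as `compute` keeps the caching logic independent of how the hash is produced, so any existing routine can be plugged in.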
Do good things, and good things happen to you

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

Re: store hashes on the disk

Sun Apr 08, 2007 4:17 pm

James wrote:
I have many, many films without subtitles, and every time I search for new subtitles for this collection I have to wait while hashes are created for all of them.

A "store hashes" option would be nice. The first time, the program calculates the hash and stores it on disk (e.g. in the same directory as the video file, with a .hash extension). On later searches it checks whether a stored hash exists for the video file: if not, it calculates one; if so, it skips generation and uses the stored file.

With ~40 films of 700 MB to 1.4 GB each that I check periodically, this feature would help me a lot.

Maybe I'm missing something here, but from what I've seen, the way the hash is computed shouldn't be affected by the size of the file.

I have just run my own hash routine on 100 files ranging from 30 MB to a few of 4 GB. It took about the same time for every one of them: well under one second per file.
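
Editor's sketch: why file size barely matters. The gethash.c routine posted earlier in this thread reads only one 64 KiB block at each end of the file, so the amount of data touched has a fixed upper bound. The helper name `bytes_hashed` is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Upper bound on the bytes the hash routine reads: one block of at most
// 64 KiB from the start of the file and one from the end.
uint64_t bytes_hashed(uint64_t file_size)
{
    const uint64_t kBlock = 65536;
    const uint64_t total = 2 * std::min(file_size, kBlock);
    assert(total <= 2 * kBlock);
    return total;
}
```

By this bound a 30 MB file and a 4 GB file both cost 128 KiB of reads, which fits the observation that every file took about the same time.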

James
Posts: 39
Joined: Thu Jan 04, 2007 2:59 am

hash

Sun Apr 08, 2007 5:23 pm

Well, I don't know how the hashes are calculated, but my HDD always churns for 5-10 seconds while a hash is computed; after that the OSDB search is fast (about 2-5 seconds) with no heavy disk usage. So I guess that when I search every day to see whether somebody has uploaded new subtitles for one of my films, stored hashes would save me time and spare my HDD.

This is unnecessary for those who search subtitles for a few films, but a fine option for me and for other users who have plenty of films and search frequently for new subs.

And I think this isn't hard to implement...

P.S.: I mean SubDownloader should include a "store hash" option...

eduo
Posts: 716
Joined: Sat Feb 10, 2007 1:40 am
Location: Information Technology
Contact: ICQ Website Yahoo Messenger

Re: hash

Mon Apr 09, 2007 4:43 pm

James wrote:
Well, I don't know how the hashes are calculated, but my HDD always churns for 5-10 seconds...
This is unnecessary for those who search subtitles for a few films, but a fine option for me and for other users who have plenty of films and search frequently for new subs.

And I think this isn't hard to implement...

James: I did understand you the first time, and my answer addressed what you asked: storing hashes does not significantly change the time the program finally takes to run. Your HDD activity probably has more to do with waking from sleep (or standby) mode than with generating the hash itself.

To make something clear, I usually run the hashes and check for new subtitles for over 300 files every couple of days (I don't download unless there's need for it, of course), both because I have old tv series for which there are no subtitles yet and because I'm developing my own tools and I use this to test how they perform.

I also thought at first that keeping a list of hashes would be better than computing them every time. The truth is that it doesn't affect the process at all (generating the hash is, according to my numbers, less than 1% of the time for the whole process).

I've tried this both on my local disk and, these last three weeks, on a network disk (where your alleged slowness to generate the hash should be more noticeable) without any actual impact.

Still, the thing is not whether I think it makes sense to add this functionality but whether it makes sense for the users at large (remember this is an open source tool, you can easily contribute code for functionality you think should be added). Now, the whole point of using a hash is that it doesn't matter what format or name the file has, the correct subtitle will be matched. If you store a list of hashes how will the program know if these hashes effectively correspond to the actual files on disk? Once you store a list of hashes separately from the files themselves you lose that connection. You're relying on a reference. If you replaced the original file by a new one with the same name you'd have a mismatching hash list.

If you don't generate the hash list at the time of the query you lose the whole reason for using hashes in the first place: you'd be relying on a list that may or may not refer to the actual video files on disk.
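
Editor's sketch: a common way to reduce (not eliminate) the stale-list risk described here is to store the file's size and modification time next to the hash and recompute whenever either differs. This is a heuristic and an assumption on my part, not something SubDownloader is known to do: a replaced file with identical size and timestamp would still slip through. `CachedHash`, `still_valid` and `current_hash` are hypothetical names; the caller is expected to obtain the current size and mtime from the operating system.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

struct CachedHash {
    uint64_t hash;   // stored movie hash
    uint64_t size;   // file size in bytes when the hash was taken
    int64_t  mtime;  // modification time (seconds since the epoch) at that moment
};

// Heuristic: trust the stored hash only while size and mtime are unchanged.
bool still_valid(const CachedHash& entry, uint64_t size_now, int64_t mtime_now)
{
    return entry.size == size_now && entry.mtime == mtime_now;
}

// Return a usable hash, refreshing the entry when the file looks different.
uint64_t current_hash(CachedHash& entry, uint64_t size_now, int64_t mtime_now,
                      const std::function<uint64_t()>& recompute)
{
    assert(recompute);  // a hashing function is required
    if (!still_valid(entry, size_now, mtime_now))
        entry = CachedHash{recompute(), size_now, mtime_now};
    return entry.hash;
}
```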

Something that I'd do would be to rename the files on disk with the hash and the bytesize, and reorganize them into folders that include the file and the subtitles. This way just reading the filename would inform the hash and bytesize required for the search. But this wouldn't help your problem because I'm convinced the problem is the spinup of the HDD and not the generation of the hash.
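
Editor's sketch: one possible naming scheme for the renaming idea above. The layout `name.<hash>.<size>.ext` is purely an assumption (the post does not specify one), the size in the comment is an illustrative figure, and `tagged_name` / `parse_tagged_name` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>

// "Lost S03E01.avi" -> "Lost S03E01.332c83338820e4f6.734003200.avi"
std::string tagged_name(const std::string& file_name, uint64_t hash, uint64_t size)
{
    char tag[48];
    const int n = std::snprintf(tag, sizeof tag, ".%016llx.%llu",
                                static_cast<unsigned long long>(hash),
                                static_cast<unsigned long long>(size));
    assert(n > 0 && n < static_cast<int>(sizeof tag));
    (void)n;
    const std::string::size_type dot = file_name.rfind('.');
    if (dot == std::string::npos) return file_name + tag;  // no extension
    return file_name.substr(0, dot) + tag + file_name.substr(dot);
}

// Recover hash and size from a name produced by tagged_name (with extension).
bool parse_tagged_name(const std::string& tagged, uint64_t& hash, uint64_t& size)
{
    const std::string::size_type npos = std::string::npos;
    const std::string::size_type ext = tagged.rfind('.');
    if (ext == npos || ext == 0) return false;
    const std::string::size_type size_dot = tagged.rfind('.', ext - 1);
    if (size_dot == npos || size_dot == 0) return false;
    const std::string::size_type hash_dot = tagged.rfind('.', size_dot - 1);
    if (hash_dot == npos) return false;

    const std::string hex = tagged.substr(hash_dot + 1, size_dot - hash_dot - 1);
    const std::string dec = tagged.substr(size_dot + 1, ext - size_dot - 1);
    if (hex.size() != 16 || dec.empty()) return false;

    char* end = nullptr;
    hash = std::strtoull(hex.c_str(), &end, 16);
    if (*end != '\0') return false;
    size = std::strtoull(dec.c_str(), &end, 10);
    return *end == '\0';
}
```

As with any stored hash, the name only records what the file looked like when it was tagged; it says nothing about later changes.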

And for what I'd like, a lot more would need to exist in the backend, including episodes for TV series (stored in IMDb, but not recognized by OS) for the names, etc.

fana
Posts: 17
Joined: Thu Mar 15, 2007 9:05 pm

Mon Apr 09, 2007 9:28 pm

Check your filesystem activity with ProcessMon or FileMon.
Maybe it is another process which gets in the way of Subdownloader.

http://www.microsoft.com/technet/sysint ... nitor.mspx
http://www.microsoft.com/technet/sysint ... lemon.mspx
