ComputeMovieHash in C#

 
OpenSubtitles.org Forum Index // Developing
Yougli



Joined: 17 Feb 2008
Posts: 9

Posted: Tue Nov 25, 2008 4:47 pm    Post subject: ComputeMovieHash in C#
Hi,

I'm using the code provided in the wiki to compute the hash of a movie, but I get an overflow exception when I run it. Any ideas, please?

Code:

        private static byte[] ComputeMovieHash(Stream input)
        {
            long lhash, streamsize;
            streamsize = input.Length;
            lhash = streamsize;
 
            long i = 0;
            byte[] buffer = new byte[sizeof(long)];
            while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
            {
                i++;
                lhash += BitConverter.ToInt64(buffer, 0); // The exception occurs here, at the 3rd iteration with the test avi: breakdance.avi
            }
 
            input.Position = Math.Max(0, streamsize - 65536);
            i = 0;
            while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
            {
                i++;
                lhash += BitConverter.ToInt64(buffer, 0);
            }
            input.Close();
            byte[] result = BitConverter.GetBytes(lhash);
            Array.Reverse(result);
            return result;
        }


Thanks for your help...
stavros_sk



Joined: 25 Sep 2008
Posts: 9

Posted: Wed Nov 26, 2008 8:23 am
Enclose the whole function body in a try-catch, e.g.:


try
{
    // function code
    // ...
    // ...
}
catch (Exception e)
{
    Console.WriteLine(e.ToString());
}


...to find out exactly which line the error occurs on.
Yougli



Joined: 17 Feb 2008
Posts: 9

Posted: Wed Nov 26, 2008 2:28 pm
I know exactly where the exception occurs; I added a comment in the code in my first post.
Sorry if it wasn't clear...

The error occurs at:
Code:

lhash += BitConverter.ToInt64(buffer, 0);

in the first while loop, at the 3rd iteration, for the test AVI 'breakdance.avi'.
Cougar_



Joined: 23 May 2008
Posts: 19

Posted: Wed Nov 26, 2008 6:22 pm
I'm not a C# programmer, so I may be mistaken, but overflow is normal in this piece of code. Yes, the code contains errors, but even without them it is highly likely that an overflow will occur.

First, signed and unsigned arithmetic look similar, but mixing them up is often the cause of errors that are very hard to detect.

In this code, I think every variable that takes part in calculating the hash value should be a 64-bit unsigned integer!
According to MSDN, "long" in C# is 64-bit but signed, not unsigned. So I think this is the first mistake.

Second, BitConverter.ToInt64 converts the data to a SIGNED integer, when it should convert to an unsigned one. Big mistake! Why?
Look:
4 + 0xff = 3 => 0xff is -1 in signed arithmetic (assuming 8-bit variables, of course).

In unsigned arithmetic 4 + 0xff = 0x103, but the variable has only 8 bits and the result requires 9, so an overflow occurs and only the low 8 bits are kept: the result is 0x03 and the most significant bit is lost. That lost bit is what the processor reports by setting the CF (carry) flag, and it is what triggers the overflow exception.

So overflow is normal when calculating the hash. Simply add
4 + 0xff ff ff ff ff ff ff ff and you have an overflow, but in this case you should ignore it, and the value 0x3 is correct as the hash.


You may ask why there is an error when the variable is declared as long. In both cases the result is the same: 0x03. Yes, when you add two variables of the same width the result is the same, but that is a special case. I'm 100 percent sure that the author wrote this piece of code and left it in this form because it works, not because he knew about this special case of signed/unsigned arithmetic. The only difference between the two situations is the moment at which the error occurs, but that pseudo-error doesn't interest us here, so the final result is the same in both cases and in practice this code works great.

When the widths aren't the same, the problems begin:

int64 = int64 + signed byte => 4 + 0xff = 3;

int64 = int64 + unsigned byte => 4 + 0xff = 0x103;


The author of this code probably compiled it with the overflow checking option turned off.
Hmm, I don't know whether C# is that strict about checking integer overflow by default. Personally I doubt it, so you must have enabled such strict checking in the project options. I'm only guessing.

As I mentioned, I'm not a C# programmer, but I think an explicit conversion to a 64-bit integer may help:

lhash = (long)(lhash + BitConverter.ToInt64(buffer, 0)); // The exception

In this case the compiler should know that you know what you are doing, because you explicitly convert the result to long and truncate it, so there shouldn't be any error.

For me this piece of code works great; as I said, it has errors only in theory.
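
The same-width and different-width cases described above can be checked with a small sketch. The class and method names (WraparoundDemo, AddSigned, AddUnsigned) are made up for illustration and are not part of the wiki code; the additions are wrapped in unchecked so that the result does not depend on the project's overflow-checking setting.

```csharp
using System;

public static class WraparoundDemo
{
    // Adds two 64-bit signed values, silently discarding any overflow.
    public static long AddSigned(long a, long b)
    {
        unchecked { return a + b; }
    }

    // Adds two 64-bit unsigned values, silently discarding any carry.
    public static ulong AddUnsigned(ulong a, ulong b)
    {
        unchecked { return a + b; }
    }

    public static void Main()
    {
        // Same width: 0xFFFFFFFFFFFFFFFF is -1 when signed, ulong.MaxValue when unsigned.
        Console.WriteLine(AddSigned(4, -1));               // 3
        Console.WriteLine(AddUnsigned(4, ulong.MaxValue)); // 3

        // Even when the signed addition wraps, the bit pattern matches the unsigned one.
        long wrappedSigned = AddSigned(long.MaxValue, 1);
        ulong wrappedUnsigned = AddUnsigned((ulong)long.MaxValue, 1);
        Console.WriteLine(unchecked((ulong)wrappedSigned) == wrappedUnsigned); // True

        // Different widths: the narrow operand is widened first, and the
        // widening depends on whether it is signed or unsigned.
        sbyte narrowSigned = -1;     // bit pattern 0xff
        byte narrowUnsigned = 0xff;  // bit pattern 0xff
        Console.WriteLine(4L + narrowSigned);   // 3
        Console.WriteLine(4L + narrowUnsigned); // 259 (0x103)
    }
}
```

With operands of the same width, the signed and unsigned sums share one bit pattern, which is why both the long and the ulong variant of the hash can give the same bytes once overflow is ignored.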
Yougli



Joined: 17 Feb 2008
Posts: 9

Posted: Thu Nov 27, 2008 3:44 pm
OK, I tried declaring my variables as ulong instead of long, and the application doesn't throw an exception anymore. But the hash I get is incorrect according to the wiki.
I get 55f61777884dc435 instead of 8e245d9679d31e12.

The code:
Code:

      private static byte[] ComputeMovieHash(Stream input)
      {
            ulong lhash, streamsize;
            streamsize = (ulong) input.Length;
            lhash = streamsize;
 
            long i = 0;
            byte[] buffer = new byte[sizeof(long)];
            while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
            {
                i++;
                lhash = BitConverter.ToUInt64(buffer, 0);
            }
 
            input.Position = (long) Math.Max(0, streamsize - 65536);
            i = 0;
            while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
            {
                i++;
                lhash += BitConverter.ToUInt64(buffer, 0);
            }
            input.Close();
            byte[] result = BitConverter.GetBytes(lhash);
            Array.Reverse(result);
            return result;
      }


Thanks for your help
Cougar_



Joined: 23 May 2008
Posts: 19

Posted: Thu Nov 27, 2008 5:35 pm
You forgot the add operator in the first loop:

lhash = BitConverter.ToUInt64(buffer, 0);

When I added the missing operator and compared the result with the C++ procedure that I use, the returned value was the same.

For the overflow error (if it happens), try:
lhash = (ulong)(lhash + BitConverter.ToUInt64(buffer, 0));


I suggest adding this line before the first loop:
input.Position = 0;
to make sure that the stream is at the beginning.
Yougli



Joined: 17 Feb 2008
Posts: 9

Posted: Fri Nov 28, 2008 5:15 am
Thank you for pointing out that I forgot the add operator.
I tried what you told me, but I'm still getting an overflow exception at the same line, at the 13th iteration this time.

Code:

      private static byte[] ComputeMovieHash(Stream input)
      {
            ulong lhash, streamsize;
            streamsize = (ulong) input.Length;
            lhash = streamsize;
 
            long i = 0;
            byte[] buffer = new byte[sizeof(long)];
            input.Position = 0;
            while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
            {
                i++;
                //lhash += BitConverter.ToUInt64(buffer, 0);
                lhash = (ulong)(lhash + BitConverter.ToUInt64(buffer, 0));
            }
 
            input.Position = (long) Math.Max(0, streamsize - 65536);
            i = 0;
            while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
            {
                i++;
                //lhash += BitConverter.ToUInt64(buffer, 0);
                lhash = (ulong)(lhash + BitConverter.ToUInt64(buffer, 0));
            }
            input.Close();
            byte[] result = BitConverter.GetBytes(lhash);
            Array.Reverse(result);
            return result;
      }
Yougli



Joined: 17 Feb 2008
Posts: 9

Posted: Mon Dec 01, 2008 5:07 pm
Update:
I turned off the overflow check in the project compiler settings, and the code from the wiki now works fine :p
os
Site Admin


Joined: 25 Feb 2006
Posts: 1229

Posted: Tue Dec 02, 2008 7:23 am
OK, I will add this thread to the wiki. Also, don't forget to test BOTH files listed in the wiki, to check that your code gives the same hashes for them.
Yougli



Joined: 17 Feb 2008
Posts: 9

Posted: Tue Dec 02, 2008 12:18 pm
The code works fine for both files :)
Cougar_



Joined: 23 May 2008
Posts: 19

Posted: Wed Dec 03, 2008 2:33 am
Yougli wrote:
Update:
I removed the overflow check in the project compiler settings, and it works fine now with the code submitted in the wiki :p


As I said, overflow is normal in this code, and all you need to do is disable overflow checking.

I think you should first read a book about C# and get to know every aspect of programming in this language.

I had a quick look at Sams Teach Yourself C# in 21 Days by Bradley Jones on http://books.google.com, and guess what I found (I was searching for "how to ignore overflow errors")? :PPP
C# has special keywords, checked and unchecked, that force the compiler to check or to ignore overflow in a particular expression, without the need to turn the overflow checking option on or off globally :P

So all you need to do is change both lines from:
lhash += BitConverter.ToInt64(buffer, 0);

to:

unchecked { lhash += BitConverter.ToInt64(buffer, 0); }
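
To see the difference between the two keywords in isolation, here is a minimal sketch. The names CheckedDemo, TryAddChecked and AddWrapping are hypothetical helpers invented for this example; only the checked and unchecked keywords and OverflowException come from C# itself.

```csharp
using System;

public static class CheckedDemo
{
    // Adds in a checked context: returns false instead of letting
    // the OverflowException escape when the sum does not fit in 64 bits.
    public static bool TryAddChecked(ulong a, ulong b, out ulong sum)
    {
        try
        {
            sum = checked(a + b);
            return true;
        }
        catch (OverflowException)
        {
            sum = 0;
            return false;
        }
    }

    // Adds in an unchecked context: the carry out of bit 63 is dropped.
    public static ulong AddWrapping(ulong a, ulong b)
    {
        unchecked { return a + b; }
    }

    public static void Main()
    {
        ulong sum;
        Console.WriteLine(TryAddChecked(ulong.MaxValue, 4, out sum)); // False
        Console.WriteLine(AddWrapping(ulong.MaxValue, 4));            // 3
    }
}
```

Because both helpers state their context explicitly, they behave the same whatever the overflow-checking option in the project settings is.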


So, first read a book about C# syntax ;PP

And yes, both versions of the code work, but the one that uses ulong is better because it implements the arithmetic operations correctly.

Code:
        private static byte[] ComputeMovieHash(Stream input)
        {
            ulong lhash;
            long streamsize;
            streamsize = input.Length;
            lhash = (ulong)streamsize;
 
            long i = 0;
            byte[] buffer = new byte[sizeof(long)];
            input.Position = 0;
            while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
            {
                i++;
                unchecked { lhash += BitConverter.ToUInt64(buffer, 0); }
            }
 
            input.Position = Math.Max(0, streamsize - 65536);
            i = 0;
            while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
            {
                i++;
                unchecked { lhash += BitConverter.ToUInt64(buffer, 0); }
            }
            byte[] result = BitConverter.GetBytes(lhash);
            Array.Reverse(result);
            return result;
        }


I removed the line that closes the stream from the code. If the stream is opened outside the procedure, it should be closed outside too.
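
As a rough usage sketch of the function above, with the caller owning the stream: the class name MovieHashSketch, the helper ToHexString and the 8-byte in-memory test stream are assumptions made for this example, and the hash routine is simply copied from the ulong version in this post. A real caller would pass a FileStream, for example one obtained from File.OpenRead.

```csharp
using System;
using System.IO;

public static class MovieHashSketch
{
    // Same logic as the ulong version above; the caller keeps ownership of the stream.
    public static byte[] ComputeMovieHash(Stream input)
    {
        long streamsize = input.Length;
        ulong lhash = (ulong)streamsize;

        long i = 0;
        byte[] buffer = new byte[sizeof(long)];
        input.Position = 0;
        while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
        {
            i++;
            unchecked { lhash += BitConverter.ToUInt64(buffer, 0); }
        }

        input.Position = Math.Max(0, streamsize - 65536);
        i = 0;
        while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
        {
            i++;
            unchecked { lhash += BitConverter.ToUInt64(buffer, 0); }
        }
        byte[] result = BitConverter.GetBytes(lhash);
        Array.Reverse(result);
        return result;
    }

    // Hypothetical helper: formats the hash bytes as a lowercase hex string.
    public static string ToHexString(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }

    public static void Main()
    {
        // Tiny stand-in for a movie file: 8 bytes holding the little-endian value 1.
        byte[] data = { 1, 0, 0, 0, 0, 0, 0, 0 };
        using (MemoryStream stream = new MemoryStream(data))
        {
            // size (8) + first block (1) + last block (1, the same bytes read again) = 10
            Console.WriteLine(ToHexString(ComputeMovieHash(stream))); // 000000000000000a
        }
    }
}
```

The using block in the caller closes the stream, which matches the idea that whoever opens the stream should also close it.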
All times are GMT + 2 Hours