ComputeMovieHash in C#

Posted: Tue Nov 25, 2008 4:47 pm
by Yougli
Hi,

I'm using the code provided in the wiki to compute the hash of a movie, but I get an overflow exception when I run it. Any ideas, please?

Code:

    private static byte[] ComputeMovieHash(Stream input)
    {
        long lhash, streamsize;
        streamsize = input.Length;
        lhash = streamsize;

        long i = 0;
        byte[] buffer = new byte[sizeof(long)];
        while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
        {
            i++;
            lhash += BitConverter.ToInt64(buffer, 0); // The exception occurs here, at the 3rd iteration with the test avi: breakdance.avi
        }

        input.Position = Math.Max(0, streamsize - 65536);
        i = 0;
        while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
        {
            i++;
            lhash += BitConverter.ToInt64(buffer, 0);
        }
        input.Close();
        byte[] result = BitConverter.GetBytes(lhash);
        Array.Reverse(result);
        return result;
    }

Thanks for your help...

Posted: Wed Nov 26, 2008 8:23 am
by stavros_sk
Enclose all of the function's code in a try-catch, e.g.:

    try
    {
        // function code
        // ...
        // ...
    }
    catch (Exception e)
    {
        Console.WriteLine(e.ToString());
    }

...to find out exactly which line the error occurs on.

Posted: Wed Nov 26, 2008 2:28 pm
by Yougli
I know exactly where the exception occurs, I added a comment in the code in my first post.
Sorry if it wasn't clear...

The error occurs at:

Code:

lhash += BitConverter.ToInt64(buffer, 0);
in the first while loop, at the 3rd iteration for the test avi 'breakdance.avi'

Posted: Wed Nov 26, 2008 6:22 pm
by Cougar_
I'm not a C# programmer, so I may be mistaken, but overflow is normal in this piece of code. Yes, the code contains errors, but even without them it is highly likely that an overflow will occur.

First, signed and unsigned arithmetic look similar, but mixing them up is often the cause of very hard-to-detect errors.

In this code, I think every variable that takes part in calculating the hash value should be a 64-bit unsigned integer!
According to MSDN, "long" in C# is 64-bit but signed, not unsigned. So I think this is the first mistake.

Second, BitConverter converts the data stream to a SIGNED integer when it should convert to an unsigned one. Big mistake! Why?
Look:
4 + 0xff = 3 => 0xff is -1 in signed arithmetic (assuming 8-bit variables, of course).

In unsigned arithmetic 4 + 0xff = 0x103, but the variable has only 8 bits and the result requires 9, so an overflow occurs and only the low 8 bits are kept: the result is 0x03 and the most significant bit is lost. That lost bit sets the CF flag in the processor, which is what raises the overflow exception.

So overflow is normal when calculating the hash. Simply add
4 + 0xff ff ff ff ff ff ff ff and you get an overflow, but in this case you should ignore it, and the value 0x3 is correct as the hash.
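
The wraparound described above can be reproduced directly. This is only a minimal sketch: the class and method names (WrapDemo, AddWrapping, AddWrapping8) are made up for illustration and are not part of the wiki code.

```csharp
using System;

public static class WrapDemo
{
    // Adds two 64-bit unsigned values and discards any carry out of bit 63.
    public static ulong AddWrapping(ulong a, ulong b)
    {
        unchecked { return a + b; }
    }

    // Same idea with 8-bit values, matching the 4 + 0xff example.
    // byte + byte is computed as int, so the cast back to byte drops the 9th bit.
    public static byte AddWrapping8(byte a, byte b)
    {
        unchecked { return (byte)(a + b); }
    }

    public static void Main()
    {
        Console.WriteLine(AddWrapping8(4, 0xFF).ToString("x2"));            // 03
        Console.WriteLine(AddWrapping(4UL, ulong.MaxValue).ToString("x"));  // 3
    }
}
```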


You may ask why there is an error when the variable is declared as long, since in both cases the result is the same: 0x03. Yes, when you add two variables of the same width, the resulting bit pattern is the same whether you treat them as signed or unsigned, but that is a special case. I'm 100 percent sure the author wrote this piece of code and left it in this form because it works, not because he knew about this special case in signed/unsigned arithmetic. The only difference between the two situations is the moment at which the overflow is reported, and that pseudo-error doesn't interest us here, so the final result is the same in both cases and in practice this code works great.

When the widths are not the same, the problems begin:

int64 = int64 + signed byte => 4 + 0xff = 3;

int64 = int64 + unsigned byte => 4 + 0xff = 0x103;
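
The two mixed-width cases above behave this way in C# because an sbyte is sign-extended and a byte is zero-extended when widened to long. A small sketch, with hypothetical names (WidthDemo, AddSigned, AddUnsigned) chosen only for this illustration:

```csharp
using System;

public static class WidthDemo
{
    // b is sign-extended to 64 bits before the addition (0xff becomes -1).
    public static long AddSigned(long acc, sbyte b)
    {
        return acc + b;
    }

    // b is zero-extended to 64 bits before the addition (0xff stays 255).
    public static long AddUnsigned(long acc, byte b)
    {
        return acc + b;
    }

    public static void Main()
    {
        sbyte s = unchecked((sbyte)0xFF); // bit pattern 0xff read as signed: -1
        byte u = 0xFF;                    // bit pattern 0xff read as unsigned: 255

        Console.WriteLine(AddSigned(4, s));                  // 3
        Console.WriteLine(AddUnsigned(4, u).ToString("x"));  // 103
    }
}
```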


The author of this code probably compiled it with the overflow-checking option turned off.
Hmm, I don't know whether C# is that strict about checking integer overflow by default. Personally I doubt it, so I guess you must have enabled strict checking in the project options.

As I mentioned, I'm not a C# programmer, but I think an explicit conversion to a 64-bit integer may help:

lhash = (long)(lhash + BitConverter.ToInt64(buffer, 0)); // The exception

In this case the compiler should understand that you know what you are doing, because you explicitly convert the result to long and truncate it, so there shouldn't be any error.

For me this piece of code works great; as I said, it has errors only in theory.

Posted: Thu Nov 27, 2008 3:44 pm
by Yougli
Ok, I tried declaring my variables using ulong instead of long, and the application doesn't throw an exception anymore. But the hash I get is incorrect according to the wiki.
I get 55f61777884dc435 instead of 8e245d9679d31e12.

The code:

Code:

    private static byte[] ComputeMovieHash(Stream input)
    {
        ulong lhash, streamsize;
        streamsize = (ulong) input.Length;
        lhash = streamsize;

        long i = 0;
        byte[] buffer = new byte[sizeof(long)];
        while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
        {
            i++;
            lhash = BitConverter.ToUInt64(buffer, 0);
        }

        input.Position = (long) Math.Max(0, streamsize - 65536);
        i = 0;
        while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
        {
            i++;
            lhash += BitConverter.ToUInt64(buffer, 0);
        }
        input.Close();
        byte[] result = BitConverter.GetBytes(lhash);
        Array.Reverse(result);
        return result;
    }

Thanks for your help

Posted: Thu Nov 27, 2008 5:35 pm
by Cougar_
You forgot the add operator in the first loop:

lhash = BitConverter.ToUInt64(buffer, 0);

When I added the missing operator and compared the result with the C++ procedure that I use, the returned value was the same.

For the overflow error (if it happens), try:
lhash = (ulong)(lhash + BitConverter.ToUInt64(buffer, 0));

I suggest adding this line before the first loop:
input.Position = 0;
to make sure that the stream is at the beginning.

Posted: Fri Nov 28, 2008 5:15 am
by Yougli
Thank you for pointing out that I forgot the add operator.
I tried what you told me, but I'm still getting an overflow exception at the same line, at the 13th iteration this time.

Code:

    private static byte[] ComputeMovieHash(Stream input)
    {
        ulong lhash, streamsize;
        streamsize = (ulong) input.Length;
        lhash = streamsize;

        long i = 0;
        byte[] buffer = new byte[sizeof(long)];
        input.Position = 0;
        while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
        {
            i++;
            //lhash += BitConverter.ToUInt64(buffer, 0);
            lhash = (ulong)(lhash + BitConverter.ToUInt64(buffer, 0));
        }

        input.Position = (long) Math.Max(0, streamsize - 65536);
        i = 0;
        while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
        {
            i++;
            //lhash += BitConverter.ToUInt64(buffer, 0);
            lhash = (ulong)(lhash + BitConverter.ToUInt64(buffer, 0));
        }
        input.Close();
        byte[] result = BitConverter.GetBytes(lhash);
        Array.Reverse(result);
        return result;
    }

Posted: Mon Dec 01, 2008 5:07 pm
by Yougli
Update:
I removed the overflow check in the project compiler settings, and it works fine now with the code submitted in the wiki :p

Posted: Tue Dec 02, 2008 7:23 am
by oss
OK, I will add this thread to the wiki. Also, don't forget to test BOTH files listed in the wiki, to check that you get the same hashes as the wiki.

Posted: Tue Dec 02, 2008 12:18 pm
by Yougli
The code works fine for both files :)

Posted: Wed Dec 03, 2008 2:33 am
by Cougar_
Yougli wrote:
Update:
I removed the overflow check in the project compiler settings, and it works fine now with the code submitted in the wiki :p

As I said, overflow is normal in this code, and all you need to do is disable overflow checking.

I think you should first read a book about C# and get to know every aspect of programming in this language.

I had a quick look at Sams Teach Yourself C# in 21 Days by Bradley Jones on http://books.google.com, and guess what I found (I was searching for how to ignore overflow errors)? :PPP
C# has special keywords that force the compiler to check or to ignore overflow in a particular expression: checked/unchecked, with no need to turn the overflow-checking option on or off globally :PP

So all you need to do is change both lines from:
lhash += BitConverter.ToInt64(buffer, 0);

to:

unchecked { lhash += BitConverter.ToInt64(buffer, 0); }
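
To see the difference between the two keywords in isolation, here is a minimal sketch. The names CheckedDemo, AddChecked and AddUnchecked are invented for this example; only the checked/unchecked keywords and OverflowException come from the language and framework.

```csharp
using System;

public static class CheckedDemo
{
    // Throws OverflowException when the sum does not fit in a long,
    // regardless of the project-wide overflow-checking setting.
    public static long AddChecked(long a, long b)
    {
        checked { return a + b; }
    }

    // Silently wraps around, regardless of the project-wide setting.
    public static long AddUnchecked(long a, long b)
    {
        unchecked { return a + b; }
    }

    public static void Main()
    {
        try
        {
            AddChecked(long.MaxValue, 1);
        }
        catch (OverflowException)
        {
            Console.WriteLine("checked: overflow reported");
        }

        // Wraps from the largest to the smallest long value.
        Console.WriteLine(AddUnchecked(long.MaxValue, 1) == long.MinValue); // True
    }
}
```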


So, first read a book about C# syntax ;PP

And yes, both versions work, but the one that uses ulong is better because it implements the arithmetic operations correctly.

Code:

    private static byte[] ComputeMovieHash(Stream input)
    {
        ulong lhash;
        long streamsize;
        streamsize = input.Length;
        lhash = (ulong)streamsize;

        long i = 0;
        byte[] buffer = new byte[sizeof(long)];
        input.Position = 0;
        while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
        {
            i++;
            unchecked { lhash += BitConverter.ToUInt64(buffer, 0); }
        }

        input.Position = Math.Max(0, streamsize - 65536);
        i = 0;
        while (i < 65536 / sizeof(long) && (input.Read(buffer, 0, sizeof(long)) > 0))
        {
            i++;
            unchecked { lhash += BitConverter.ToUInt64(buffer, 0); }
        }
        byte[] result = BitConverter.GetBytes(lhash);
        Array.Reverse(result);
        return result;
    }

I removed the line that closes the stream from the code. If the stream is opened outside the procedure, it should be closed outside it too.
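
With the Close() call gone, the caller is responsible for the stream. A rough sketch of what that might look like follows; the HashUsage class and the ToHex helper are hypothetical, the hash routine is a condensed copy of the version above, and a MemoryStream stands in for a real movie file so the example runs without any file on disk.

```csharp
using System;
using System.IO;

public static class HashUsage
{
    // Condensed copy of the unchecked ulong version above; it does not close the stream.
    public static byte[] ComputeMovieHash(Stream input)
    {
        long streamsize = input.Length;
        ulong lhash = (ulong)streamsize;
        byte[] buffer = new byte[sizeof(long)];

        // First 64 KiB (or less, for short streams).
        input.Position = 0;
        for (int i = 0; i < 65536 / sizeof(long) && input.Read(buffer, 0, sizeof(long)) > 0; i++)
            unchecked { lhash += BitConverter.ToUInt64(buffer, 0); }

        // Last 64 KiB (or less, for short streams).
        input.Position = Math.Max(0, streamsize - 65536);
        for (int i = 0; i < 65536 / sizeof(long) && input.Read(buffer, 0, sizeof(long)) > 0; i++)
            unchecked { lhash += BitConverter.ToUInt64(buffer, 0); }

        byte[] result = BitConverter.GetBytes(lhash);
        Array.Reverse(result);
        return result;
    }

    // Hypothetical helper: formats the hash bytes as lowercase hex.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }

    public static void Main()
    {
        // The caller opens the stream, so the caller disposes it via "using".
        using (MemoryStream stream = new MemoryStream(new byte[] { 1, 0, 0, 0, 0, 0, 0, 0 }))
        {
            Console.WriteLine(ToHex(ComputeMovieHash(stream)));
            Console.WriteLine(stream.CanRead); // the stream is still open here
        }
    }
}
```

For a real file, the MemoryStream would be replaced by something like File.OpenRead with the path to the movie.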

Posted: Wed Mar 07, 2012 11:07 am
by kokoko3k
Post edited.
(nevermind, i was wrong)

Posted: Mon Jun 09, 2014 5:54 pm
by Koko
I'd recommend someone with access to the trac wiki change the code to the following, to address the above issue, and to simplify the code:

Code:

    using System;
    using System.Text;
    using System.IO;

    namespace MovieHasher
    {
        class Program
        {
            private static ulong GetHash(string filepath)
            {
                using (FileStream input = File.OpenRead(filepath))
                {
                    ulong lhash = (ulong)input.Length;
                    byte[] buf = new byte[65536 * 2];
                    input.Read(buf, 0, 65536);
                    input.Position = Math.Max(0, input.Length - 65536);
                    input.Read(buf, 65536, 65536);
                    for (int i = 0; i < 2 * 65536; i += 8)
                        unchecked { lhash += BitConverter.ToUInt64(buf, i); }
                    return lhash;
                }
            }

            static void Main(string[] args)
            {
                ulong moviehash = GetHash(@"C:\test.avi");
                Console.WriteLine("The hash of the movie-file is: {0}", moviehash.ToString("x16"));
            }
        }
    }

This will only work on a little endian machine, like the C and C++ code, for two reasons: BitConverter and the hex conversion.
Hardly anyone will ever want to compile this on a big endian machine, but in case:

Change the loop to

Code:

    for (int i = 0; i < 2 * 65536; )
        unchecked
        {
            //source data is always considered little endian, BitConverter won't correctly convert that on big endian platforms -> convert it manually
            lhash += (ulong)buf[i++] << 0
                   | (ulong)buf[i++] << 8
                   | (ulong)buf[i++] << 16
                   | (ulong)buf[i++] << 24
                   | (ulong)buf[i++] << 32
                   | (ulong)buf[i++] << 40
                   | (ulong)buf[i++] << 48
                   | (ulong)buf[i++] << 56;
        }

and use

Code:

    private static string ToLittleEndianHexadecimal(ulong l)
    {
        StringBuilder hexBuilder = new StringBuilder();
        for (int shift = 56; shift >= 0; shift -= 8)
        {
            hexBuilder.Append((l >> shift & 0xFF).ToString("x2"));
        }
        return hexBuilder.ToString();
    }

to convert the hash to a hex string.
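
As an alternative to the manual shifting, newer .NET versions offer BinaryPrimitives.ReadUInt64LittleEndian, which reads little-endian data regardless of the machine's byte order. The sketch below swaps that call in for the shift expression; it assumes a runtime where System.Buffers.Binary and Span are available (.NET Core 2.1 or later, or the System.Memory package), and the names EndianDemo and SumLittleEndianWords are made up for illustration.

```csharp
using System;
using System.Buffers.Binary;

public static class EndianDemo
{
    // Sums the buffer as consecutive little-endian 64-bit words, ignoring overflow.
    // Trailing bytes that do not fill a whole word are skipped.
    public static ulong SumLittleEndianWords(byte[] buf)
    {
        ulong sum = 0;
        for (int i = 0; i + 8 <= buf.Length; i += 8)
            unchecked { sum += BinaryPrimitives.ReadUInt64LittleEndian(buf.AsSpan(i, 8)); }
        return sum;
    }

    public static void Main()
    {
        byte[] data = { 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0 };
        Console.WriteLine(SumLittleEndianWords(data)); // 3
    }
}
```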