Translation by speaker of c? I've forgotten what little I had.

I’m trying to understand how hash tables work, and this “c” function computes a hash for a name. Can someone translate the “Sum up all the characters in the string” line for me?

I assumed it was summing ASCII numbers for the characters in the string, but I’m getting different answers (I’m followiing the article in AppleScript) for one case where a collision is supposed to happen (but doesn’t in my script) in the example I’m following.

[code]int hash(char *str, int table_size)
{
int sum;

/* Make sure a valid string passed in */
if (str==NULL) return -1;

/* Sum up all the characters in the string */
for( ; *str; str++) sum += *str;

/* Return the sum mod the table size */
return sum % table_size;

}[/code]

Hi Adam,

but that’s exactly what it does - I have added some output to this line so you can see what’s going on:


int main (int argc, const char * argv[]) {

char *str = "0123456789 ABCDEF";
int sum = 0;

for( ; *str; str++) {
	printf("sum = %04d ", sum);
	printf("(Stringpointer str ->%-22s) ", str);
	printf("adding value of *str (= %03d) ",*str);
	sum += *str;
	printf("result: sum = %04d\r",sum);
	}
    return 0;
}

Here the output: (sorry - is there any way to choose a monospaced font in this forum?)

sum = 0000 (Stringpointer str ->0123456789 ABCDEF ) adding value of *str (= 048) result: sum = 0048
sum = 0048 (Stringpointer str ->123456789 ABCDEF ) adding value of *str (= 049) result: sum = 0097
sum = 0097 (Stringpointer str ->23456789 ABCDEF ) adding value of *str (= 050) result: sum = 0147
sum = 0147 (Stringpointer str ->3456789 ABCDEF ) adding value of *str (= 051) result: sum = 0198
sum = 0198 (Stringpointer str ->456789 ABCDEF ) adding value of *str (= 052) result: sum = 0250
sum = 0250 (Stringpointer str ->56789 ABCDEF ) adding value of *str (= 053) result: sum = 0303
sum = 0303 (Stringpointer str ->6789 ABCDEF ) adding value of *str (= 054) result: sum = 0357
sum = 0357 (Stringpointer str ->789 ABCDEF ) adding value of *str (= 055) result: sum = 0412
sum = 0412 (Stringpointer str ->89 ABCDEF ) adding value of *str (= 056) result: sum = 0468
sum = 0468 (Stringpointer str ->9 ABCDEF ) adding value of *str (= 057) result: sum = 0525
sum = 0525 (Stringpointer str → ABCDEF ) adding value of *str (= 032) result: sum = 0557
sum = 0557 (Stringpointer str ->ABCDEF ) adding value of *str (= 065) result: sum = 0622
sum = 0622 (Stringpointer str ->BCDEF ) adding value of *str (= 066) result: sum = 0688
sum = 0688 (Stringpointer str ->CDEF ) adding value of *str (= 067) result: sum = 0755
sum = 0755 (Stringpointer str ->DEF ) adding value of *str (= 068) result: sum = 0823
sum = 0823 (Stringpointer str ->EF ) adding value of *str (= 069) result: sum = 0892
sum = 0892 (Stringpointer str ->F ) adding value of *str (= 070) result: sum = 0962

Where exactly is the point you get different answers?

D.

oh i forgot - here the AppleScript translation:

set str to "0123456789 ABCDEF" as string
set sum to 0
set strlist to every character of str
repeat with thisCharacter in strlist
	set sum to sum + (ASCII number of thisCharacter)
end repeat

get sum

During breakfast I thought a little about your problem. Maybe you have tried this with strings containing non standard ascii characters ( > 128?) I am sure this will give different results …

D.

Iterate over each byte in a (null-terminated) string, adding the value of each to the number stored in variable ‘sum’.

BTW, what you want a hash function for? If you’re thinking of writing a hash table in AppleScript, don’t bother: a vanilla hash function is far too slow to be practical. There are more appropriate algorithms, e.g. the Dictionary object in AppleMods’ Types library uses a binary search to locate items in an ordered list; balanced binary trees and skip lists are also worth investigating.

OTOH, if you’re just interested in learning about algorithms, there’s no shortage of online and dead tree material available, e.g. http://www.nist.gov/dads

HTH

Thanks for the reponses. Dominik, that was exactly the script I had running, except that I found sum to be an AppleScript word so used t for total.

HHAS, thank you too - I was, as you guessed, simply working my way through an AppleScript equivalent of a simple hash table I found in SparkNotes so I’d understand the basic idea better than I do (for me to use something intelligently requires that I understand what’s going on and why at least in part).

In the particular example I was following, the author found that “Steve” and “Notes” both hashed (sum of ASCII numbers mod 12) to 3 and therefore had a collision. I didn’t get that result.

set t to 0
repeat with ch in "Steve"
	set t to t + (ASCII number of ch)
end repeat
t mod 12
--> 3

set t to 0
repeat with ch in "Notes"
	set t to t + (ASCII number of ch)
end repeat
t mod 12
--> 5

Probably a mistake by this author. I tried in C and get absolutely the same results as your AppleScripts:

[Session started at 2006-03-24 16:10:23 +0100.]
sum = 0000 (Stringpointer str ->Steve ) adding value of *str (= 083) result: sum = 0083
sum = 0083 (Stringpointer str ->teve ) adding value of *str (= 116) result: sum = 0199
sum = 0199 (Stringpointer str ->eve ) adding value of *str (= 101) result: sum = 0300
sum = 0300 (Stringpointer str ->ve ) adding value of *str (= 118) result: sum = 0418
sum = 0418 (Stringpointer str ->e ) adding value of *str (= 101) result: sum = 0519
result of sum % 12: 3
hash has exited with status 0.
[Session started at 2006-03-24 16:10:52 +0100.]
sum = 0000 (Stringpointer str ->Notes ) adding value of *str (= 078) result: sum = 0078
sum = 0078 (Stringpointer str ->otes ) adding value of *str (= 111) result: sum = 0189
sum = 0189 (Stringpointer str ->tes ) adding value of *str (= 116) result: sum = 0305
sum = 0305 (Stringpointer str ->es ) adding value of *str (= 101) result: sum = 0406
sum = 0406 (Stringpointer str ->s ) adding value of *str (= 115) result: sum = 0521
result of sum % 12: 5
hash has exited with status 0.

D.

Thanks for the validation - I thought I understood what the “c” was doing, and it appears that I did.

Yeah, understanding how basic algorithms and data structures work is useful, and writing your own is a good way to figure them out even if the resulting implementations aren’t always suitable for everyday use.

If you do need to hash strings for some purpose, your best bet would be to wrap your C-based hash function as a scripting addition. Alternatively, TextCommands includes a hash command that’s quick and easy to use.

HTH

I have TextCommands, but haven’t used it much. I want the hash function so I can play with Baysian filtering which requires training by exposure to texts that lack or contain the elements you want to filter. I’ll study TextCommands more thoroughly. I don’t want to write a spam filter - I just want to understand how Baysian filtering works and test it on something I have available, like emails from a certain person compared to others.

Thanks;

Adam

IIRC, Bayesian filtering is based on building a word frequency table, then toting up the spamminess times frequency of each word, so any key-value data structure should do for collecting the words; it doesn’t have to be a hash table. Example (uses List and Types libraries from AppleMods):

property _Loader : run application (get "LoaderServer")

----------------------------------------------------------------------
-- DEPENDENCIES

property _List : missing value
property _Types : missing value

on __load__(loader)
	set _List to loader's loadLib("List")
	set _Types to loader's loadLib("Types")
end __load__

----------------------------------------------------------------------

__load__(_Loader's makeLoader())

set d to _Types's makeDict()

set str to "I have TextCommands, but haven't used it much. I want the hash function so I can play with Baysian filtering which requires training by exposure to texts that lack or contain the elements you want to filter. I'll study TextCommands more thoroughly. I don't want to write a spam filter - I just want to understand how Baysian filtering works and test it on something I have available, like emails from a certain person compared to others."

repeat with wordRef in (get words of str)
	set theWord to contents of wordRef
	if not d's itemExists(theWord) then
		d's setItem(theWord, 1)
	else
		d's setItem(theWord, (d's getItem(theWord)) + 1)
	end if
end repeat
set lst to _List's recomposeList({d's getKeys(), d's getValues()})
reverse of _List's sortListOfLists(lst, {2})
--> {{"I", 6}, {"to", 5}, {"want", 4}, {"the", 2}, {"TextCommands", 2}, {"it", 2}, {"have", 2}, ...}

HTH

Interesting. Your script compiles properly, but baulks at itemExists(theWord) with the message “Can’t continue itemExists”, so something’s askew with my version of _List’s handler for that.

Tsk. Appears to be a bug in version 0.1.0 of the Types library, sorry 'bout that. The one I’ve got was subsequently 11/04, so I’m guessing it got fixed then but the update never got sent to AM for some reason (forgetfulness, laziness, etc…:). I’ve fired off an update to Gary; meantime you can grab a copy at http://freespace.virgin.net/hamish.sanderson/Types.sit

Ta for the heads-up,

has

I grabbed it, dropped it into ~/Library/Scripts/Library Starter Bundle/ in place of the original (that came with the download of the whole), and that wasn’t it. Still hangs at the same instruction: “if not d’s itemExists(theWord) then”.

Needs to go into /Library/Scripts/ASLibraries or ~/Library/Scripts/ASLibraries

HTH

Thank you, hhas - that was the problem. I had forgotten that the libraries were separate from the starter bundle.