Friday, August 17, 2018

#1 2012-09-04 04:28:57 pm

DJ Bazzie Wazzie
Member
From:: the Netherlands
Registered: 2004-10-20
Posts: 2795
Website

Hash table: A good alternative for records

Hello all,

For educational reasons I will show and explain how hash tables works in it's simplest form. There are other, and in some scenarios faster, solutions to skin this cat but that's not the point of this topic. I would like to keep this topic plain AppleScript and has to work with hash function and the code needs to be easy to understand, any input is welcome that will meet this requirements.

For those who aren't familiar with hash tables, a small explanation. A hash table will create an hash from it's key, mostly a string, where the encrypted value meets an index in an array. When you have a list of key values pairs you replace the part of iterating through the list to lookup the key with a small hash function, resulting in a good performing key value pair list.

Applescript:

script hashTable
   on newInstance()
       script hashTableInstance
           property parent : hashTable
           property size : missing value
           property hashList : missing value
           property keyList : missing value
           property valueList : missing value
       end script
   end newInstance
   
   on init()
       set my size to 0
       set my hashList to {}
       repeat 1000 times
           set end of my hashList to {}
       end repeat
       set my keyList to {}
       set my valueList to {}
       return me
   end init
   
   on initWithKeysAndValues(_keys, _values)
       set my keyList to _keys
       set my valueList to _values
       set my size to count my keyList
       set my hashList to {}
       repeat 1000 times
           set end of my hashList to {}
       end repeat
       repeat with x from 1 to my size
           set end of item hashFunction(item x of my keyList) of my hashList to x
       end repeat
       return me
   end initWithKeyAndValues
   
   on setValueforKey(_key, _value)
       set x to hashFunction(_key)
       repeat with node in item x of my hashList
           if id of item node of my keyList = id of _key then
               set item node of my valueList to _value
               return true
           end if
       end repeat
       set my size to (my size) + 1
       set end of my valueList to _value
       set end of my keyList to _key
       set end of item x of my hashList to my size
   end setValueforKey
   
   on valueForKey(_key)
       set x to hashFunction(_key)
       repeat with node in item x of my hashList
           if id of item node of my keyList = id of _key then
               return item node of my valueList
           end if
       end repeat
       return missing value
   end valueForKey
   
   on keyExists(_key)
       considering case
           return my keyList contains _key
       end considering
   end keyExists
   
   on keys()
       return my keyList
   end keys
   
   on setKeys(_keys)
       set _keys to every text of _keys
       if (count _keys) = (count my keyList) then
           set my keyList to _keys
           --need to re-hash
           set my hashList to {}
           repeat 1000 times
               set end of my hashList to {}
           end repeat
           repeat with x from 1 to my size
               set end of item hashFunction(item x of my keyList) of my hashList to x
           end repeat
       end if
   end setKeys
   
   on valueExists(_value)
       return my valueList contains _value
   end valueExists
   
   on values()
       return my valueList
   end values
   
   on setValues(_values)
       if (count _values) = (count my valueList) then set my valueList to _values
   end setValues
   
   on count
       return my size
   end count

   on hashFunction(_key)
       set _hash to count _key
       --Credits to Shane Stanly here, supports single character keys now.
       repeat with char in (id of _key as list)
           set _hash to _hash + char
       end repeat
       return _hash mod 1000 + 1
   end hashFunction
end script

The methods:
- newInstance() - class method
    create an instance of hashtable
- init() --instance method
    Initialize the instance, all values and size will be set to 0
- initWithKeysAndValues(_keys, _values) --instance method
    Initialize the instance, and all given values are set
- setValueforKey(_key, _value) - instance method
    Insert a key and value into the list; if a key already exists then it will be updated with the value
    Keys are case sensitive

- valueForKey(_key) - instance method
    Return the associated value by key; if a given key doesn't exists it returns missing value
    Keys are case sensitive

- keyExists(_key) - instance method
    Return a boolean whether a key already exists or not; note this method is case sensitive
- keys() instance method
    Return a list with all the keys
- setKeys(_keys) - instance method
    Set new keys for all the values. The keys must be a all of type string and length should match the number of current keys
    Problem with big lists is that changing the keys that the whole table needs to be re-hashed
- valueExists(_value) - instance method
    Return a boolean whether a value is already in the list or not.
- values() - instance method
    Return a list with all the values
- setValues(_values) - instance method
    set the values for all keys. Can be usefull when you have a fixed list with a fixed key table
- count - instance method
    Returns the number of items in the list
- hashFunction(_key) - class method
    The function that returns the associated item in the hash table

Note: Difference between instance method and class method is that class method should be called from the hashtable object immediatly and instance method should be called from the object returned by the newInstance class method.

USAGE:
First we can create an empty list by using:

Applescript:


set dict to hashTable's newInstance()'s init()

Now we've created a instance that is initialized and contains no data.  To create a list with values you can use:

Applescript:

set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})

now to lookup an value you can use:

Applescript:


set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
dict's valueForKey("Second")

Which returns "Green" but keep in mind that the key's are case sensitive. So when I lookup "second" it would return missing value instead of Green. To avoid this behavior you should capitalize or uncapitalize the key string matches and hashes function to make this case insensitive.

To add an value we should use:

Applescript:

set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
dict's setValueforKey("Last", {1, 2, 3, 5, 6, 7, 8})
dict's valueForKey("Last")

As you can see we can insert other object that string only. You can add any kind of basic AppleScript objects you want like string, numbers, integer, reals, records, lists and data.

To update an item we use the same code as inserting one. When an key exists the old associated value will be overwritten, this makes it also impossible to have two of the same keys. This is a feature not a bug big_smile.

Applescript:

set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
dict's setValueforKey("Second", "Yellow")
dict's valueForKey("Second")

when you don't want to overwrite you can use keyExists(_key) method to check if a key exists before you want to set/add the key value pair

Applescript:


set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
if not dict's keyExists("Second") then
   dict's setValueforKey("Second", "Yellow")
end if
dict's valueForKey("Second")

You can also set immediatly the values without rehashing which can be usefull for things like profiles (fixed key and value list)

Applescript:


set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
dict's setValues({"Cyan", "Magenta", "Yellow"})
dict's valueForKey("Second")

The same can be applied for keys but remember that the list of keys can only contain strings and it needs to rehash the key table. So as long as dictionary contains less than 3000 items there is no problem/lag but when it's 10,000 it needs some time to hash.

Applescript:

set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
dict's setKeys({"Color 1", "Color 2", "Color 3"})
dict's valueForKey("Color 3")

At last, and maybe weird is that count command. Some commands can be overwritten with script objects so you can use the count command for the instance:

Applescript:


set dict to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
count of dict
count dict --is also correct
number of dict --can also be used

So I hope this is enough information and having fun with it!

Already some FAQ answers

Why an hash table size of 1000?
The main reason for that is AppleScript seems to be comfortable with arrays containing up to 1000 items. While the lookup for the hash wouldn't have any problem with a list containing 100,000 items, creating them is so slow you wouldn't use this script object. So creating a bigger hash table is only overhead and initialization would take longer.

Shouldn't it be smaller then?
No because the hash function will output an index by it's key value, the larger the hash table the less collisions we have (multiple indexes in the node) and the faster the item will be returned.

Aren't there other/better hash tables?
Definitely! There are plenty different solutions but I wanted it to keep simple as possible so the code speaks for itself. When someone knows the trick he can create more complex hash function resulting in a better hash code. However, since 10.10 ASObjC is supported in AppleScript so I would use an NSDictionary instead. Again this is only for educational purpose.

What's the main advantage of hash tables?
Apart from being an associative array the main advantage of hash tables is that no matter how large a list gets, it should only affect the lookup performance for a tiny bit (read: collisions). When you look up data from a list containing 100 items, a repeat loop will be fast. But when you need to loop through an list of 100,000 items for a couple of times it would be a lot more data handling. So when the list contains 100 or 100,000 the performance difference should be very small.

Does the 'performance' advantage also applies to this hash table?
Keep in mind that AppleScript and it's foundation is all written in C, it's compared to AppleScript a very high performance programming languages. That implies that self written implementations in plain AppleScript is slower than AppleScript's own implementations written in C (not using an AppleEvent). So compared to lists and records this is slow, but when you lookup values by iterating in plain AppleScript this performance advantage does apply also in this simple hash table.

So why should I use an hash table when I can't use it in a way as in other programming languages?
Well like in my first line, this post is for educational purposes. Another thing is that even creating a list containing 10,000 items are slow, you can still lookup an item in a blink of an eye. There are many cases records or other alternatives are better but this hash table can lookup keys for existence and their keys doesn't need to be set at compile time which is required for records. 

Why is it case sensitive?
Some programming languages does support associative arrays with case insensitive keys but I prefer case sensitive and using lower case key names. It's something personal and also in this context quite easier smile otherwise I had to convert the keys to lower or uppercase first in the hash function so that the outcome of "KEY" and "key" are the same.

Edit 1: Shane Stanley suggested to change the loop to support single character keys
Edit 2: Nigel suggested to change the local variable name 'hash' because it collide with satimage users. Therefor I changed the local variable 'hash' into '_hash'. I also changed dictionary into dict
Edit 3: Shane pointed me to a translation error I made. The method initWithKeysAndValues is now spelled correctly
Edit 4: Updated some information because some of it doesn't apply or AppleScript provides better alternatives by itself now.

Last edited by DJ Bazzie Wazzie (2015-02-20 04:07:25 am)

Offline

 

#2 2012-09-04 06:00:06 pm

Adam Bell
Administrator
From:: Nova Scotia, Canada
Registered: 2005-10-04
Posts: 4663

Re: Hash table: A good alternative for records

Very slick. cool Thank you for the thorough entry.


iMac running OS X 10.13.1

Offline

 

#3 2012-09-05 02:35:45 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4621

Re: Hash table: A good alternative for records

Thanks, DJBW.

I haven't had time to look at this in detail (we're expecting a visitor today), but immediately obvious problems are:

'hash' is a command in the Satimage OSAX and causes the script to error.
'size' and 'count' are reserved words in AppleScript and shouldn't be used out of context.
'dictionary' is also a reserved word on my system (I haven't tracked it down) and prevents the script from compiling.

Instances of the complete words 'hash', 'size', and 'dictionary' can be changed to '|hash|', '|size|', and '|dictionary|' respectively if anyone has problems. The 'count' intercept handler can and should be replaced with an ordinary handler — say '|count|()' — to clarify the code and distinguish between the handler and the times 'count' is used normally. The handler could even be dispensed with altogether, since it only executes a three-word line.


NG

Online

 

#4 2012-09-05 03:28:02 am

DJ Bazzie Wazzie
Member
From:: the Netherlands
Registered: 2004-10-20
Posts: 2795
Website

Re: Hash table: A good alternative for records

First of all thanks for the replies Adam and Nigel!

Nigel, I've changed the hash variable name into _hash to avoid collisions with satimage users. I've also changed dictionary to dict because on one of my machines it seems to be problem as well (xmltools osax installed).

For count I leave it like that. I've always considered this being part of overwriting default commands; semi-sub classing. It executes number of, count of and count so it's fully supported IMO. Overwriting preserved keywords is no problem as long as these properties are preserved properties like name, id, length, size and contents for example. You can't use property names like file and weekday and you're correct about that. That only applies for script objects properties and not locals or globals, for those you should never use preserved keywords IMO.

Offline

 

#5 2012-09-05 05:45:54 pm

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5428

Re: Hash table: A good alternative for records

DJ,

Can I suggest another change? Rather than starting like this:

Applescript:

script hashTable
on newInstance()
script hashTableInstance
property parent : hashTable
property size : missing value
property hashList : missing value
property keyList : missing value
property valueList : missing value
end script
end newInstance

You could start with this:

Applescript:

script hashTable
   property size : missing value
   property hashList : missing value
   property keyList : missing value
   property valueList : missing value
   
   on newInstance()
       return me
   end newInstance

By doing it that way, it becomes more inheritable.

For example, suppose I wanted the methods you provided, except sometimes I want to remove your ban on duplicate keys. You're not actually enforcing it in most of your handlers, which means I can use them as is. So if you wrote it as above I could effectively subclass your script something like this:


Applescript:

script hashTableWithDupes
   property parent : hashTable
   
   on setValueforKey(_key, _value)
       set x to hashFunction(_key)
       set my size to (my size) + 1
       set end of my valueList to _value
       set end of my keyList to _key
       set end of item x of my hashList to my size
   end setValueforKey
   
   on valueForKey(_key)
       set resultList to {}
       set x to hashFunction(_key)
       repeat with node in item x of my hashList
           if id of item node of my keyList = id of _key then
               set end of resultList to item node of my valueList
           end if
       end repeat
       return resultList
   end valueForKey
   
end script

And thus:

Applescript:

set dict to hashTableWithDupes's newInstance()'s initWithKeysAndValues({"First", "Second", "Second"}, {"Red", "Green", "Blue"})
dict's valueForKey("Second")
--> {"Green", "Blue"}

Last edited by Shane Stanley (2012-09-05 05:47:52 pm)


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/
latenightsw.com

Offline

 

#6 2012-09-06 01:39:15 am

alastor933
Member
From:: Utrecht, NL
Registered: 2008-09-12
Posts: 545

Re: Hash table: A good alternative for records

Great stuff, DJ, especially for someone who only recently started tinkering with objects and associative lists. Gives me something to chew on, now that evenings in the garden come to an end.

Don't spoil the fun by explaining, but: the list 'hashList' is (part of) a device to quickly find a key/value, or maybe even get them immediately without having to search. That about right?

DJ Bazzie Wazzie wrote:

When an key exists the old associated value will be overwritten, this makes it also impossible to have two of the same keys. This is a feature not a bug.


Same with associative list. I have since found a specific application (that is, applied the device) where duplicate keys must be allowed (*). Uniqueness of keys is not imposed by the device per sé (right?), but from DJ's remark I'm guessing that there are some who feel keys must be unique, and others who don't. I would be grateful for a summary of pros and cons of either approach.

(*) Another scripting aid, this one to support developing Pashua interfaces.

Offline

 

#7 2012-09-06 03:24:24 am

DJ Bazzie Wazzie
Member
From:: the Netherlands
Registered: 2004-10-20
Posts: 2795
Website

Re: Hash table: A good alternative for records

By doing it that way, it becomes more inheritable.


True, but it becomes a singleton and not a real instance.

Applescript:

set theList to {}
set end of theList to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Red", "Green", "Blue"})
set end of theList to hashTable's newInstance()'s initWithKeysAndValues({"First", "Second", "Third"}, {"Cyan", "Magenta", "Cyan"})
valueForKey("Second") of item 1 of theList

When newInstance() returns me (singleton) the code above returns "Magenta" while my original method returns "Green" because there are two instances in the list. You could of course solve this by using a copy command but then you copy the whole script object which is in my opinion is a bit overhead in memory and takes up a lot of time as well.

What I do, your hashTableWithDupes example is great and uses inheritance a lot that way, is copying the newInstance method to the 'sub' script and change the parent property. The script will follow the parenting chain without problems and you only need 1 extra action.

Applescript:


script hashTableWithDupes
   property parent : hashTable
   
   on newInstance()
       script hashTableInstance
           property parent : hashTableWithDupes
           property size : missing value
           property hashList : missing value
           property keyList : missing value
           property valueList : missing value
       end script
   end newInstance
   
   on setValueforKey(_key, _value)
       set x to hashFunction(_key)
       set my size to (my size) + 1
       set end of my valueList to _value
       set end of my keyList to _key
       set end of item x of my hashList to my size
   end setValueforKey
   
   on valueForKey(_key)
       set resultList to {}
       set x to hashFunction(_key)
       repeat with node in item x of my hashList
           if id of item node of my keyList = id of _key then
               set end of resultList to item node of my valueList
           end if
       end repeat
       return resultList
   end valueForKey
end script

I think it's a programmer's perference what you need and want and yours as mine have both their cons and pros. So I hope you don't mind leaving it as it is.

Thank you very much for your input Shane cool

Offline

 

#8 2012-09-06 03:41:10 am

DJ Bazzie Wazzie
Member
From:: the Netherlands
Registered: 2004-10-20
Posts: 2795
Website

Re: Hash table: A good alternative for records

Don't spoil the fun by explaining, but: the list 'hashList' is (part of) a device to quickly find a key/value, or maybe even get them immediately without having to search. That about right?


Correct (partly)! There are collisions which you may find out yourself, don't want to spoil the fun, which needs to be caught. You can play with the hash function and change it, it very easy and straightforward at the moment.

Offline

 

#9 2012-09-06 05:26:40 am

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5428

Re: Hash table: A good alternative for records

Good point about it being a singleton. I guess I just have trouble with the idea of not inheriting properties/ivars.


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/
latenightsw.com

Offline

 

#10 2012-09-06 05:28:14 am

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5428

Re: Hash table: A good alternative for records

alastor933 wrote:

from DJ's remark I'm guessing that there are some who feel keys must be unique, and others who don't.


I think it's more a case that sometimes what you want to do requires one or other approach.


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/
latenightsw.com

Offline

 

#11 2012-09-06 06:20:10 am

DJ Bazzie Wazzie
Member
From:: the Netherlands
Registered: 2004-10-20
Posts: 2795
Website

Re: Hash table: A good alternative for records

Shane Stanley wrote:

Good point about it being a singleton. I guess I just have trouble with the idea of not inheriting properties/ivars.


You're absolutely right and, either way, I think it's not a nice solution but the best we can get. This, or your way, is the closest you can get but far away from full OOP. Once used to OOPs like C++, PHP or (AppleScript)Objective-C I'm getting a little mixed up feelings when I'm back in AS with script objects again. Simplicity isn't always better big_smile

Offline

 

Board footer

Powered by FluxBB

RSS (new topics) RSS (active topics)