Establishing links in Entourage 2008 between messages and contacts

Guys,
This will take the emails in all folders, except those that logically are uninteresting (Junk, Deleted, Sent, …) and establishes links to the sender’s contact information in the Address Book, if an entry exists. I use this weekly as I am constantly gaining new contacts and many times emails have come from them prior to me creating an address book entry.

So, OMM this runs somewhere between 560 and 700 seconds to handle 22000 emails and about 2000 contacts in each an Exchange Contact list and an Address Book (nearly all duplicated entries). It creates a log file to check your timing. Notice that the number of emails is not the determining factor for how long a folder takes it is the number of links per contact and the number of addresses found that belong to contacts in a folder.

I tweaked it to squeeze out max speed without resorting to one final trick… getting this to run in parallel as separate sub-applications of a single application. This may eke out a faster result but will be pushing Entourage to its limits, I believe. One trick used, however, was the ordering of found addresses by occurrence and not in alpha order, as my logic was that I get more than the majority of my emails from a finite number of people and so their related email addresses ought to be checked first. It did speed it up a bit.


-- a derivative script based on one from Jolly Roger <jollyroger@pobox.com> that I found some time ago
--author of this version: John Egan
--version 1.0.0 - revamped how it works... tweaked for speed
--  OMM (early 2009 MacBook with 4 Gig RAM, 2.4 GHz processors) over 22K emails with about 2000 contacts in each of the regular Address Book and Exchange Contacts Address Book, this runs a varying amount of time (averages ~700 seconds, with spikes up to 1000 seconds) due to other processes running and Entourage offline or not looking for emails from 7 accounts every 5 minutes
--  tested with Entourage 2008

property pLogEventsOn : true -- generate a log file, add to an existing one
property pReportContacts : false --  if the contacts should be listed in the log (false as default, set to true to verify operation)

global gFolderList --   the list of folders in Entourage, checked each run for validity
global gLogFolder --  where the log file is stored
global gLogFileName --  the log file name
global gUnixEOL -- use this line feed character in logs, Unix end of line character
global gStartTime --   when the script began
global gSenderMatchList --  sender matches already processed
global gContactIDRecords --  the contacts for a particular email address, note that this spans contacts and address book
global gEmailAddressList --  list of all email addresses in Address Book [check adding/toggling with Contacts in Exchange environment]

set gEmailAddressList to {}
set gContactIDRecords to {}
set gSenderMatchList to {}
set gUnixEOL to ASCII character 10 -- use this line feed character in logs, Unix end of line character
set gLogFileName to "Link Contact to Email Message.log"
set gFolderList to {}
set LogFolderLocation to the path to the desktop --   where log and  archive folders are, default
set LogFolderName to "Entourage Maintenance Log Messages" --  where I store all logs of Entourage maiontenance
set gLogFolder to ((LogFolderLocation as text) & LogFolderName as text) & ":"
set FoldersToSkip to {"Drafts", "Outbox", "Sent Items", "Deleted Items", "Junk E-mail"} --  saves time on wasted efforts, leave off top level folders only with these names in case user uses same names in lower level folders

--  ==========  Initial entry to text file
if pLogEventsOn is true then my LogEntry("Starting up ...", 1) --   the 1 means to start the timer to track how long this took
if pLogEventsOn is true then my LogEntry("contacts followed by ' *' signifies links to multiple records", 0)
--  ==========  Get a list of all of the email addresses in the Address Book
set text item delimiters to ","
tell application "Microsoft Entourage"
	ignoring application responses
		set working offline to true
	end ignoring
	set gEmailAddressList to every email address of every contact as text
	ignoring application responses
		set working offline to false
	end ignoring
end tell
set text item delimiters to ""
if pLogEventsOn is true then my LogEntry("Email addresses placed in a list...", 0)
--  ==========  Get a list of all folders
tell application "Microsoft Entourage" to set TopFolderList to the folders
repeat with nextFolder in TopFolderList
	if (name of nextFolder) as text is not in FoldersToSkip then
		my GetFolderListing(nextFolder)
	end if
end repeat
if pLogEventsOn is true then my LogEntry("Completed folder list creation...", 0)
-- ================ Process the folders 1 by 1 and establish links between emails and contacts
repeat with nextFolder in gFolderList
	if pLogEventsOn is true then my LogEntry(".................  Starting folder: " & the name of nextFolder, 0)
	ProcessFolder(nextFolder)
end repeat
if pLogEventsOn is true then my LogEntry("Finished", 2)

--  Handlers
--  ========== Get the list of all email storage folders in Entourage plus the top level standard ones
on GetFolderListing(someFolder) --a recursive handler, finds nested email folders in Entourage
	set end of gFolderList to someFolder
	tell application "Microsoft Entourage" to set subFolderList to {} & someFolder's folders
	if subFolderList is not {} then -- this folder has subfolders
		repeat with SubFolder in subFolderList -- process each subfolder
			my GetFolderListing(SubFolder)
		end repeat
	end if
end GetFolderListing
--  ==========  Process each item of list of folders
on ProcessFolder(someFolder)
	set senderList to {}
	script AScript
		property alist : senderList
	end script
	tell application "Microsoft Entourage"
		set theMsgCount to count of messages of someFolder
		ignoring application responses
			set working offline to true
		end ignoring
		set AScript's alist to (address of sender of every message of someFolder) as list --  if list is blank returns {}
		ignoring application responses
			set working offline to false
		end ignoring
	end tell
	if pLogEventsOn is true then my LogEntry("Number of msgs: " & theMsgCount, 0)
	if theMsgCount > 0 then --number of messages in the folder
		set AScript's alist to my CutDupesNonContactsAndSort(AScript's alist)
		if pLogEventsOn is true then my LogEntry("Contact email addresses: " & (count of AScript's alist), 0)
		my LinkMessages(someFolder, AScript's alist) --  takes folder and the sender list and scans for matches to Address Book contacts
	end if
	if pLogEventsOn is true then my LogEntry("Completed folder", 0)
end ProcessFolder
--  ===========
on LinkMessages(someFolder, thePassedSenderList) --  thePassedSenderList only contains email addresses associated with contacts
	set ContactsFound to {}
	script AddBook
		property senderMatchList : gSenderMatchList
		property ContactRecs : gContactIDRecords
		property FolderContacts : ContactsFound
		property theSenderList : thePassedSenderList
	end script
	ignoring application responses
		tell application "Microsoft Entourage" to set working offline to true
	end ignoring
	repeat with emailSender in AddBook's theSenderList
		set emailSenderText to emailSender as text
		if emailSenderText is not in AddBook's senderMatchList then -- checks to see if this email address was encountered alreaady
			tell application "Microsoft Entourage"
				set senderContact to find emailSender
				set messageList to (every message of someFolder whose address of sender is emailSender)
				ignoring application responses
					link messageList to senderContact
				end ignoring
			end tell
			set theRecord to {TheSender:"", TheContact:"", TheCount:1, TheDisplayName:""}
			set the end of AddBook's senderMatchList to emailSenderText
			set matchTrack to 0
			repeat with IDItem in senderContact
				if AddBook's FolderContacts does not contain id of IDItem as text then
					set the end of AddBook's FolderContacts to id of IDItem as text
				else if pReportContacts is true then
					set matchTrack to matchTrack + 1
				end if
			end repeat
			if pReportContacts is true then
				set LengthSC to length of senderContact
				if matchTrack ≠ LengthSC then
					if the LengthSC > 1 then
						set DisplayName to ""
						repeat with theContactID in senderContact
							tell application "Microsoft Entourage" to set CurrentDisplayName to display name of theContactID
							if DisplayName ≠ "" and DisplayName ≠ CurrentDisplayName then
								set DisplayName to DisplayName & ", " & CurrentDisplayName
							else if DisplayName = "" then
								set DisplayName to CurrentDisplayName
							end if
						end repeat
						set DisplayName to DisplayName & " *"
					else
						tell application "Microsoft Entourage"
							try
								set DisplayName to display name of senderContact
							on error -- Entourage is offline and contact in Exchange AB only or no display name exists
								set DisplayName to emailSenderText
							end try
						end tell
					end if
					if pLogEventsOn is true and pReportContacts is true then my LogEntry("linked msgs from: " & DisplayName, 0)
				end if
				copy DisplayName to TheDisplayName of theRecord
			end if
			copy emailSenderText to TheSender of theRecord
			copy senderContact to TheContact of theRecord
			set the end of AddBook's ContactRecs to theRecord
		else --  there is a match to the email address meaning its related info is already found and in a record
			set ListItemID to FindSenderAndIncrementRecord(emailSender, emailSenderText)
			set matchTrack to 0
			if class of ListItemID is integer then
				set senderContact to {} & (TheContact of (record ListItemID of AddBook's ContactRecs))
				set DisplayName to (TheDisplayName of (record ListItemID of AddBook's ContactRecs))
			else
				set senderContact to ListItemID
				set DisplayName to emailSenderText
			end if
			tell application "Microsoft Entourage"
				set messageList to (every message of someFolder whose address of sender is emailSender)
				ignoring application responses
					link messageList to senderContact
				end ignoring
			end tell
			repeat with IDItem in senderContact
				if AddBook's FolderContacts does not contain (id of IDItem as text) then
					set the end of AddBook's FolderContacts to id of IDItem as text
				else if pReportContacts is true then
					set matchTrack to matchTrack + 1 --  tracks how many matches were found
				end if
			end repeat
			if pReportContacts is true then
				if matchTrack ≠ length of senderContact then
					if pLogEventsOn is true then my LogEntry("linked msgs from: " & DisplayName, 0)
				end if
			end if
		end if
	end repeat
	ignoring application responses
		tell application "Microsoft Entourage" to set working offline to false
	end ignoring
	set gSenderMatchList to AddBook's senderMatchList
	set gContactIDRecords to AddBook's ContactRecs
end LinkMessages
--  ==========
on CutDupesNonContactsAndSort(thePassedList) --remove duplicates & email addresses not associated with contacts, then sort list
	set theModifiedList to {}
	set theReturnList to {}
	script j
		property EmailList : gEmailAddressList
		property PassedList : thePassedList
		property ModifiedList : theModifiedList
	end script
	ignoring case
		repeat with theItem in j's PassedList
			set theItem to theItem as text
			if j's ModifiedList does not contain theItem then --  cuts out dupes
				if theItem is in j's EmailList then --  cuts out any email address not associated with a contact, reduces sort time
					set the end of j's ModifiedList to theItem
				end if
			end if
		end repeat
	end ignoring
	set theLength to the length of j's ModifiedList
	if theLength > 10 then
		set theReturnList to AdamSort(j's ModifiedList, 1, theLength)
	else
		set theReturnList to bubbleSwapSort(j's ModifiedList)
	end if
	return theReturnList
end CutDupesNonContactsAndSort
--  ==========
on AdamSort(array, leftEnd, rightEnd) -- Hoare's QuickSort Algorithm
	script A
		property L : array
	end script
	set {i, j} to {leftEnd, rightEnd}
	set v to item ((leftEnd + rightEnd) div 2) of A's L -- pivot in the middle
	repeat while (j > i)
		repeat while (item i of A's L < v)
			set i to i + 1
		end repeat
		repeat while (item j of A's L > v)
			set j to j - 1
		end repeat
		if (not i > j) then
			tell (a reference to A's L) to set {item i, item j} to {item j, item i} -- swap
			set {i, j} to {i + 1, j - 1}
		end if
	end repeat
	if (leftEnd < j) then AdamSort(A's L, leftEnd, j)
	if (rightEnd > i) then AdamSort(A's L, i, rightEnd)
	return A's L
end AdamSort
--  ========== bubble sort for when there are 10 or less items in list to sort, seems to be faster than AdamSort at small lists
on bubbleSwapSort(theList)
	script Bscript
		property alist : theList
	end script
	set TheCount to length of Bscript's alist
	if TheCount < 2 then return Bscript's alist
	set swaps to true
	repeat while swaps
		set swaps to false
		repeat with theIndex from 1 to TheCount - 1
			if item theIndex of Bscript's alist > item (theIndex + 1) of Bscript's alist then
				set temp to item theIndex of Bscript's alist
				set item theIndex of Bscript's alist to item (theIndex + 1) of Bscript's alist
				set item (theIndex + 1) of Bscript's alist to temp
				set swaps to true
			end if
		end repeat
		set TheCount to TheCount - 1
	end repeat
	return Bscript's alist
end bubbleSwapSort
--  ========== this will find the sender info in the gContactIDRecords
on FindSenderAndIncrementRecord(emailSender, emailSenderText)
	script G
		property GotRec : gContactIDRecords
	end script
	set theListCount to the length of G's GotRec
	repeat with listItem from 1 to theListCount
		if (TheSender of record listItem of G's GotRec) is equal to emailSenderText then
			set TheCount of record listItem of G's GotRec to ((TheCount of record listItem of G's GotRec) + 1)
			if listItem ≠ 1 then --  shifts the most encountered senders to the start of the list for faster processing, not into alpha order
				repeat with itemPos from listItem to 2 by -2
					set theRecCount to record (itemPos - 1) of G's GotRec
					if (TheCount of record itemPos of G's GotRec) > (TheCount of theRecCount) then
						if itemPos - 2 > 0 and (TheCount of record itemPos of G's GotRec) > (TheCount of record (itemPos - 2) of G's GotRec) then
							set theRecCount to record (itemPos - 2) of G's GotRec
						end if
						-- swaps records if latest record's count is higher than one before it
						tell (a reference to G's GotRec) to set {record itemPos of G's GotRec, theRecCount} to {theRecCount, record itemPos of G's GotRec}
						set listItem to (itemPos - 1)
					else
						exit repeat
					end if
				end repeat
			end if
			set gContactIDRecords to G's GotRec
			return listItem
		else -- no match somehow
			if listItem = theListCount then
				tell application "Microsoft Entourage" to set senderContact to find emailSender
				return senderContact
			end if
		end if
	end repeat
end FindSenderAndIncrementRecord
---==================== Log data to text file ==========================
on LogEntry(someText, timingAct)
	stop log
	--create text entry to log file, append to end of file to maintain long record
	set logFile to (gLogFolder as text) & gLogFileName
	set logRef to 0
	try
		set logRef to open for access file logFile with write permission
		set fileContents to (read logRef)
	end try
	ignoring application responses
		if logRef ≠ 0 then
			if someText is "Starting up ..." then write gUnixEOL starting at eof to logRef
			write FormatDateTime(current date) & ": " & someText & gUnixEOL starting at eof to logRef
			if timingAct is 1 then
				set gStartTime to current date
			else if timingAct is 2 then
				write "Time to complete: " & ((current date) - gStartTime) & gUnixEOL starting at eof to logRef
			end if
			close access logRef
		end if
	end ignoring
	start log
end LogEntry
---================== places date and time in format for time stamp ======
on FormatDateTime(theDate)
	set theDate to theDate as date
	set dd to text -2 thru -1 of ("0" & theDate's day)
	copy theDate to tempDate
	set the month of tempDate to January
	set mm to text -2 thru -1 of ("0" & 1 + (theDate - tempDate + 1314864) div 2629728)
	set yy to text -1 thru -4 of ((year of theDate) as text)
	set hh to time string of theDate
	return (yy & "/" & mm & "/" & dd & " " & hh as text)
end FormatDateTime



I look forward to any feedback or improvements.