Ideas to clean & make faster

I want to create a database of products to check for price drops etc. I have put something together that works but is painfully slow. The included script goes through the first 100 pages, there are 16,635 pages in total so any suggestions to speed things up will be greatly appreciated.

set the fieldNames to {"Date", "Description", "Model", "Internet", "Store Sku", "Store SO Sku", "Price"}

tell application "Database Events"
	set quit delay to 0
	if not (database "Home Depot" exists) then
		if not (exists database (POSIX path of (path to documents folder) & "Databases/Home Depot.dbev")) then
			make new database with properties {name:"Home Depot"}
			save result
		end if
	end if
	
	set AppleScript's text item delimiters to "Email Print "
	repeat with pageNum from 1 to 100
		set pageSource to do shell script "curl [url=http://www.homedepot.com/buy/?page=]http://www.homedepot.com/buy/?page="[/url] & pageNum & ""
		
		tell database "Home Depot"
			repeat with i from 2 to count of text items of pageSource
				set theItem to text item i of pageSource
				set charCount to count of characters of theItem
				set modelStart to (offset of "Model #" in theItem)
				set priceStart to (offset of "$" in theItem)
				set priceEnd to (offset of " / " in (text priceStart thru charCount of theItem))
				
				set this_record to make new record with properties {name:""}
				tell this_record
					make new field with properties {name:"Description", value:(text 1 thru (modelStart - 2) of theItem)}
					make new field with properties {name:"Price", value:(text (priceStart + 1) thru (priceEnd + priceStart - 2) of theItem)}
					make new field with properties {name:"Date", value:current date}
					
					if theItem contains "Internet #" then
						set internetStart to (offset of "Internet #" in theItem)
						make new field with properties {name:"Model", value:(text (modelStart + 8) thru (internetStart - 2) of theItem)}
					else if theItem contains "Store S" then
						set skuStart to (offset of "Store S" in theItem)
						make new field with properties {name:"Model", value:(text (modelStart + 8) thru (skuStart - 2) of theItem)}
					end if
					
					if theItem contains "Internet #" then
						set AppleScript's text item delimiters to "Internet # "
						make new field with properties {name:"Internet", value:((text 1 thru 9 of text item 2) of theItem)}
					end if
					
					if theItem contains "Store SKU #" then
						set AppleScript's text item delimiters to "Store SKU # "
						make new field with properties {name:"Store SKU", value:((text 1 thru 6 of text item 2) of theItem)}
					end if
					
					if theItem contains "Store SO SKU #" then
						set AppleScript's text item delimiters to "Store SO SKU # "
						make new field with properties {name:"Store SO SKU", value:((text 1 thru 6 of text item 2) of theItem)}
					end if
				end tell
				set AppleScript's text item delimiters to "Email Print "
			end repeat
		end tell
	end repeat
	save
	delete every database
end tell
beep 2

set AppleScript's text item delimiters to {""}

Browser: Firefox 9.0.1
Operating System: Mac OS X (10.7)

Thanks for posting this. I’ve been playing with it today as I’m not very familiar with Database Events. I think most of the running time is taken by the “curl” connections, so you’re really pretty much at their mercy.

I did my own version of the script, two runs of which took 211 seconds and 216 seconds respectively on my machine, as opposed to the 256 seconds and 240 seconds taken by yours. (I also know where my script saved the database!) It uses multiple text item delimiters, which are only heeded in Snow Leopard and later.

There’s occasionally an error caused by the remote server sending an HTML page which just says “Your browser didn’t send a complete request in time.” The script then errors because it assumes it’s the requested page and tries to parse it accordingly.

on main()
	set pageFetchTime to (current date)
	set pageSource to (do shell script "curl http://www.homedepot.com/buy/?page=[1-100]")
	
	set dbPath to (path to documents folder as text) & "Databases:Home Depot.dbev"
	
	tell application "Database Events"
		launch
		if not (database "Home Depot" exists) then
			try
				open file dbPath
			on error
				make new database with properties {name:"Home Depot"}
				save result in file dbPath
			end try
		end if
	end tell
	
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "Email Print "
	considering case
		script o
			property textItems : text items 2 thru -1 of pageSource
		end script
		
		repeat with i from 1 to (count o's textItems)
			-- Get only the part of the line containing the data we want.
			set AppleScript's text item delimiters to " / "
			set theItem to text 1 thru text item -3 of paragraph 1 of item i of o's textItems
			
			-- Everything before the (last) " Model # " in the line is the description.
			set AppleScript's text item delimiters to " Model # "
			set |description| to text 1 thru text item -2 of theItem
			set otherStuff to text item -1 of theItem
			
			-- The price is whatever comes after the last " $" in whatever we have left.
			-- (If running on a machine set up to recognise "." decimal points and "," thousands separators, the price could be coerced to number here.)
			set AppleScript's text item delimiters to " $"
			set price to text item -1 of otherStuff
			set otherStuff to text 1 thru text item -2 of otherStuff
			
			-- The model number comes before the first " Internet # " or " Store S" (if either's present) .
			set AppleScript's text item delimiters to {" Internet # ", " Store S"} -- Requires Snow Leopard or later.
			set model to text item 1 of otherStuff
			
			tell application "Database Events"
				tell database "Home Depot"
					set this_record to make new record with properties {name:""}
					tell this_record
						make new field with properties {name:"Description", value:description}
						make new field with properties {name:"Model", value:model}
						make new field with properties {name:"Price", value:price}
						make new field with properties {name:"Date", value:pageFetchTime}
						
						-- Internet, Store SKU, and Store SO SKU numbers are blocks of digits and will be the first 'word' after their respective labels.
						if (otherStuff contains "Internet #") then
							set AppleScript's text item delimiters to "Internet # "
							make new field with properties {name:"Internet", value:word 1 of text item 2 of otherStuff}
						end if
						
						if (otherStuff contains "Store SKU #") then
							set AppleScript's text item delimiters to "Store SKU # "
							make new field with properties {name:"Store SKU", value:word 1 of text item 2 of otherStuff}
						end if
						
						if (otherStuff contains "Store SO SKU #") then
							set AppleScript's text item delimiters to "Store SO SKU # "
							make new field with properties {name:"Store SO SKU", value:word 1 of text item 2 of otherStuff}
						end if
					end tell
				end tell
			end tell
		end repeat
	end considering
	set AppleScript's text item delimiters to astid
	
	tell application "Database Events"
		save database "Home Depot"
		delete every database
		quit
	end tell
	beep 2
end main

main()

Edit 1: A couple of small but important delimiter corrections!
Edit 2: Rerunning with the above corrections revealed an item on one of the Web pages with only five digits in its “Store SO SKU” number. (The script was assuming 6.) I’ve now changed ‘text 1 thru 6 of text item 2’ to ‘word 1 of text item 2’ to allow some flexibility and have done the same for “Internet” and “Store SKU” numbers.
Edit 3: A few of the product descriptions contain " /", which is also used as a delimiter to mark the end of the price. The delimiter must therefore only be used after the description has been extracted.
Edit 4: Incorporated DJ’s suggestion below, using a single shell script to get all the Web data at once.
Edit 5: Cured all known and foreseen issues and added a few comments to clarify the parsing logic.
Edit 6: Cured overlooked foreseeable issue about what happens when " Internet # " or “Store S” don’t appear in the data.

Thanks Nigel !

Now that you are familiar with the script, tell me if you have any ideas for the query. I am trying to catch price drops across different dates. Conceptually, it should find all records with duplicate model #'s and return an alert/value if the most recent price is lower than the last price. Stefan suggested multiple lists with a loop, I am considering using sqlite3.

I’ve made a couple of corrections in my version of the script and further modifications in view of a circumstance brought to light by the corrections!

Just some additional info…

I run simple the command:

In Terminal and i took about 30 seconds to fetch all data. My connection has a ping less than 8ms to the server and running over glass (1gbit up and down Internet connection). So even when you have a proper connection, 16,635 pages will still take hours to complete. Also the site has way to much overhead for your needs, look if the site has an API to use. Most webshops do, sometimes on another URL.

Thanks DJ! That cuts down the running time on my machine to about 192 seconds. I’ve incorporated it into my script above. I’ll leave adayzdone to worry about site APIs. :wink:

Thanks again for all of the time, I learn something new with every post.

Nigel, I didn’t use TID for the price field because the " /" character appears in some of the descriptions.

tell application "Database Events"
	open database "~/Documents/Databases/Home Depot.dbev"
	tell database "Home Depot"
		value of field "Price" of record 124
	end tell
end tell

Yeah. That was the reason for my third edit. It’s not over yet either. A couple of the descriptions have "Model # " in them too. :expressionless: But I’m working on it.

I think I’ve sorted out the remaining issues with my version (post #2 above) now.

See in your other thread: http://macscripter.net/viewtopic.php?pid=147857#p147857

Although I knew that SL & L supported multiple TIDs, I’d never used them. Thanks, Nigel, for illustrating that. I can simplify some of my old scripts.

Wow Nigel, Thanks ! Incorporating your changes, I am working on a SQLite version of the script. I will post when finished. It will probably serve as a good comparison between the two approaches.

Well I’m afraid I’ve had to make another adjustment to my parsing logic. :expressionless: I’ve also been struggling today with spinning beach balls and some fields (“Model” and “Price”) not showing up in some records after two or more runs of the script. I suspect it’s a problem with Database Events rather than with the script.

I’d love to see your SQLite version when you’ve done it. It’ll no doubt be much faster than Database Events ” and possibly less flaky.

Here is a version that works with sqlite3. I think most of the kinks are out of it. The column view in terminal makes them much easier to find.

set loc to space & "~/Documents/Databases/homedepot.db" & space
set head to "sqlite3" & loc & quote
set tail to quote
set newTable to "create table depot" & "(id integer primary key, Date, Description, Model, Internet, 'Store Sku', 'Store SO Sku', Price); "

tell application "Finder"
	if not (exists (path to documents folder as text) & "Databases:homedepot.db") then
		do shell script head & newTable & tail
	end if
end tell

set pageSource to (do shell script "curl http://www.homedepot.com/buy/?page=[235-237]")
set AppleScript's text item delimiters to "Email Print "
considering case
	script o
		property textItems : text items 2 thru -1 of pageSource
	end script
	
	repeat with i from 1 to (count o's textItems)
		set theRecord to {"Date, Description, Model, Price"}
		set recordData to {"datetime()"}
		
		-- Get only the line containing the data we want.
		set theItem to paragraph 1 of item i of o's textItems
		
		-- Everything before the (last) " Model # " in the line is the description.
		set AppleScript's text item delimiters to " Model # "
		set theDescription to (text 1 thru text item -2 of theItem)
		set otherStuff to text item -1 of theItem
		-- ( ' needs to be replaced with '' ) and ( " needs to be replaced with .in ) for shell script
		if theDescription contains "'" then
			set AppleScript's text item delimiters to {"'"}
			set theseItems to the text items of theDescription
			set AppleScript's text item delimiters to "''"
			set theDescription to theseItems as string
		end if
		if theDescription contains "\"" then
			set AppleScript's text item delimiters to {"\""}
			set theseItems to the text items of theDescription
			set AppleScript's text item delimiters to " in."
			set theDescription to theseItems as string
		end if
		set end of recordData to "'" & (theDescription) & "'" -- Description
		
		-- Between that and the first " Internet # " or " Store S" (if either's present) is the model number.
		set AppleScript's text item delimiters to {" Internet # ", " Store S", "\""} -- Requires Snow Leopard or later.
		set theModel to (text item 1 of otherStuff)
		if theModel contains "'" then
			set AppleScript's text item delimiters to {"'"}
			set theseItems to the text items of theModel
			set AppleScript's text item delimiters to "''"
			set theModel to theseItems as string
		end if
		set end of recordData to "'" & (theModel) & "'" -- Model
		
		-- The price is whatever comes after the first "$" in what we have left.
		set AppleScript's text item delimiters to {" $", " /"} -- Requires Snow Leopard or later.
		set end of recordData to "'" & (text item 2 of otherStuff) & "'" -- Price
		
		-- Internet, Store SKU, and Store SO SKU numbers are blocks of digits and will be the first 'word' after their respective labels.
		if (otherStuff contains "Internet #") then
			set AppleScript's text item delimiters to "Internet # "
			set end of theRecord to "Internet"
			set end of recordData to "'" & (word 1 of text item 2 of otherStuff) & "'" -- Internet
		end if
		
		if (otherStuff contains "Store SKU #") then
			set AppleScript's text item delimiters to "Store SKU # "
			set end of theRecord to "'Store Sku'"
			set end of recordData to "'" & (word 1 of text item 2 of otherStuff) & "'"
		end if
		
		if (otherStuff contains "Store SO SKU #") then
			set AppleScript's text item delimiters to "Store SO SKU # "
			set end of theRecord to "'Store SO Sku'"
			set end of recordData to "'" & (word 1 of text item 2 of otherStuff) & "'"
		end if
		
		set AppleScript's text item delimiters to {", "}
		set recordData to recordData as string
		do shell script head & "insert into depot ( " & theRecord & ") values( " & (recordData) & "); " & tail
		
	end repeat
	beep
end considering
set AppleScript's text item delimiters to {""}

Terminal Commands:
sqlite3 ~/Documents/Databases/homedepot.db
.mode column
.headers on
.width 5 20 50 15 10 9 13 10
select * from depot;

Edit 1 : Added a “"” TID to Model record because of an error on HD page 236 (Model # MK- 404CR 4")
Edit 2 : Replaced else if with another if statement in the Description because of an error on HD page 496. (Fabral 2-1/2" UTILITY GA CORRUGATED 2’X 8’)

Still working on parsing but, here is a version that runs much faster as it batches the shell command and loops to next set of pages. CAUTION : This will run until it produces an error !

set batch to 50 -- How many pages will be downloaded at once ?

set loc to space & "~/Documents/Databases/homedepot.db" & space
set head to "sqlite3" & loc & quote
set tail to quote
set newTable to "create table depot" & "(id integer primary key, Date, Description, Model, Internet, SKU, SO_SKU, Price); "

tell application "Finder"
	if not (exists (path to documents folder as text) & "Databases:homedepot.db") then
		do shell script head & newTable & tail
	end if
end tell

-- Find total number of pages
set pageSource to (do shell script "curl [url=http://www.homedepot.com/buy/?page=1)]http://www.homedepot.com/buy/?page=1")[/url]
set AppleScript's text item delimiters to {"<a href=\"?page="}
set totalPages to first word of (text item -1 of pageSource)


repeat with startPage from 1 to totalPages by batch
	if not totalPages - startPage is less than batch then
		set pageSource to (do shell script "curl http://www.homedepot.com/buy/?page=[" & startPage & "-" & startPage + batch - 1 & "]")
	else
		set pageSource to (do shell script "curl http://www.homedepot.com/buy/?page=[" & startPage & "-" & totalPages & "]")
	end if
	
	set AppleScript's text item delimiters to "Email Print "
	considering case
		script o
			property textItems : text items 2 thru -1 of pageSource
			property batchList : {}
		end script
		
		repeat with i from 1 to (count o's textItems)
			set theRecord to {"Date, Description, Model, Price"}
			set recordData to {"datetime()"}
			
			-- Get only the line containing the data we want.
			set theItem to paragraph 1 of item i of o's textItems
			
			-- Everything before the (last) " Model # " in the line is the description.
			set AppleScript's text item delimiters to " Model # "
			set theDescription to (text 1 thru text item -2 of theItem)
			set otherStuff to text item -1 of theItem
			
			-- ( ' needs to be replaced with '' ) and ( " needs to be replaced with .in ) for shell script
			if theDescription contains "'" then
				set AppleScript's text item delimiters to {"'"}
				set theseItems to the text items of theDescription
				set AppleScript's text item delimiters to "''"
				set theDescription to theseItems as string
			end if
			if theDescription contains "\"" then
				set AppleScript's text item delimiters to {"\""}
				set theseItems to the text items of theDescription
				set AppleScript's text item delimiters to " in."
				set theDescription to theseItems as string
			end if
			if theDescription contains "`" then
				set AppleScript's text item delimiters to {"`"}
				set these_items to the text items of theDescription
				set AppleScript's text item delimiters to ""
				set theDescription to these_items as string
			end if
			set end of recordData to "'" & (theDescription) & "'" -- Description
			
			-- Between that and the first " Internet # " or " Store S" (if either's present) is the model number.
			set AppleScript's text item delimiters to {" Internet # ", " Store S", "\"", " $"} -- Requires Snow Leopard or later.
			set theModel to (text item 1 of otherStuff)
			if theModel contains "'" then
				set AppleScript's text item delimiters to {"'"}
				set theseItems to the text items of theModel
				set AppleScript's text item delimiters to "''"
				set theModel to theseItems as string
			end if
			set end of recordData to "'" & (theModel) & "'" -- Model
			
			-- The price is whatever comes after the first "$" in what we have left.
			set AppleScript's text item delimiters to {" $", " /"} -- Requires Snow Leopard or later.
			try
				set end of recordData to "'" & (text item 2 of otherStuff) & "'" -- Price
			on error
				set end of recordData to "'price error'" -- PRICE ERROR!
			end try
			
			-- Internet, Store SKU, and Store SO SKU numbers are blocks of digits and will be the first 'word' after their respective labels.
			if (otherStuff contains "Internet #") then
				set AppleScript's text item delimiters to "Internet # "
				set end of theRecord to "Internet"
				set end of recordData to "'" & (word 1 of text item 2 of otherStuff) & "'" -- Internet
			end if
			
			if (otherStuff contains "Store SKU #") then
				set AppleScript's text item delimiters to "Store SKU # "
				set end of theRecord to "SKU"
				set end of recordData to "'" & (word 1 of text item 2 of otherStuff) & "'" -- Store Sku
			end if
			
			if (otherStuff contains "Store SO SKU #") then
				set AppleScript's text item delimiters to "Store SO SKU # "
				set end of theRecord to "SO_SKU"
				set end of recordData to "'" & (word 1 of text item 2 of otherStuff) & "'" -- Store SO Sku
			end if
			
			-- Build the batch
			set AppleScript's text item delimiters to {", "}
			set recordData to recordData as string
			set end of o's batchList to "insert into depot (" & theRecord & ") values(" & (recordData) & ");"
			
		end repeat
		set AppleScript's text item delimiters to {" "}
		set o's batchList to o's batchList as string
		set AppleScript's text item delimiters to {""}
		do shell script head & o's batchList & space & tail -- Enter the batch into the database
	end considering
end repeat
set AppleScript's text item delimiters to {""}
beep 2

Edit : Removed “theseItems” from the Enter the batch section, thanks Nigel.
Added another if statement to theDescription to remove ` character
Added try statement to catch price errors (and debug later)

Edit 2: Changed the names of the tables to avoid escape problems for double quotes during query
Added " $" to line {" Internet # “, " Store S”, “"”, " $“} – Requires Snow Leopard or later.
This corrected parsing error that occured when neither " Internet # " nor " Store S” were present in otherStuff

You seem to be flying now. :slight_smile:

I couldn’t understand sqlite3’s “man” pages at all, but looking at your scripts, it’s obviously very easy to use.

It’s annoying having to special-case single quotes in the description. It should be possible to use AppleScript’s ‘quoted form’ to quote quotes appropriately for shell scripts, but sqlite3 doesn’t appear to understand the results. I spent ages yesterday trying to solve that! Fortunately, your extra code takes very little time to execute, even hundreds of times.

Using a 100-page text downloaded from the site the other day, and timing only how long it takes the scripts to process it and create a database from scratch, I find that your first sqlite3 script and the one I produced yesterday based on my latest parsing régime both take about 86 seconds and produce files of identical sizes. The Database Events script is slightly faster at 80 seconds, but produces a file that’s nearly four times as large as the other two. Furthermore, as noted in my last post, fields tend to go missing when more records are added to it on subsequent runs. Database Events doesn’t see any records at all in the databases created by the sqlite3 scripts, although the contents of all three files begin with “SQLite format 3”.

The “batch” idea did occur to me, but I ran out of time to pursue it and you were obviously way ahead of me anyway! I’ll play with your batch script today.

Later: Just noticed this:

  1. o’s batchList is a list, so it has ‘items’, not ‘text items’.
  2. set theseItems to (text) items of o’s batchList needlessly creates another list containing the same items. set theseItems to o’s batchList would set theseItems to the same list, or you could set o’s batchList to o’s batchList as string a couple of lines later.

Hey Nigel,
Although I have never used it before, sqlite3 seems like it will be easier to use. Your query in the other thread was brilliant, I just hope there are easier ways to manipulate the data than that.
I started with this post http://www.macscripter.net/viewtopic.php?id=24744 and I just bought “The Definitive Guide to SQLite”.

Good catch on theseItems, late at night I revert to trial and error.

Ah. Of course! Adam’s article. I read it when it came out, but wasn’t ready to think about databases at the time. I can see now where some of the techniques in your scripts came from. :wink:

Here’s my version of the batch script. It’s similar to yours, but uses some text manipulation appended to the page-fetch shell script to pick out the lines containing “Email Print”, keep only the bits after "Email Print ", and substitute “‘’” (two single quotes) for “'” (one). Another difference is that instead of using double quotes in the “sqlite3” commands, it uses the ‘quoted form’ of those bits. This leaves any double quotes in descriptions free to be kept as such within the single-quoted text.

If it weren’t for my page-fetch shell script reducing the page contents as they come in, I’d parse the first page separately for its stock entries instead of fetching it again after getting the number of pages from it.

on main()
	-- How many pages will be downloaded at once? (Not too many. There's a limit to the length of text that can be used in a shell script. )
	set batch to 50
	
	set sqlite3Head to "sqlite3 " & quoted form of POSIX path of ((path to documents folder as text) & "Databases:Home Depot.db") & space
	do shell script (sqlite3Head & "\"create table if not exists depot(id integer primary key, Date, Description, Model, Internet, 'Store Sku', 'Store SO Sku', Price); \"")
	
	set astid to AppleScript's text item delimiters
	set pageSource to (do shell script "curl [url=http://www.homedepot.com/buy/?page=1)]http://www.homedepot.com/buy/?page=1")[/url]
	set AppleScript's text item delimiters to {"<a href=\"?page="}
	set totalPages to (first word of text item -1 of pageSource) as integer
	-- set totalPages to 100 -- Enable this to reduce the number of pages for testing.
	
	script o
		property lineList : missing value
		property batchList : missing value
	end script
	
	repeat with startPage from 1 to totalPages by batch
		set endPage to startPage + batch - 1
		if (endPage comes after totalPages) then set endPage to totalPages
		set pageRange to "[" & startPage & ("-" & endPage & "]")
		set editedText to (do shell script "curl [url=http://www.homedepot.com/buy/?page=]http://www.homedepot.com/buy/?page="[/url] & pageRange & " | grep -o 'Email Print .*' | sed -e s/'Email Print '/''/g -e s/\"'\"/\"''\"/g")
		
		set o's lineList to editedText's paragraphs
		set o's batchList to {}
		
		considering case
			repeat with i from 1 to (count o's lineList)
				set theRecord to {"Date, Description, Model, Price"}
				set recordData to {}
				
				-- Get only the part of the line we want.
				set AppleScript's text item delimiters to " / "
				set theItem to text 1 thru text item -3 of item i of o's lineList
				
				-- Everything before the (last) " Model # " in the line is the description.
				set AppleScript's text item delimiters to " Model # "
				set theDescription to text 1 thru text item -2 of theItem
				set otherStuff to text item -1 of theItem
				
				-- The price is whatever comes after the last " $" in whatever we have left.
				set AppleScript's text item delimiters to " $"
				set thePrice to text item -1 of otherStuff
				set otherStuff to text 1 thru text item -2 of otherStuff
				
				-- The model number comes before the first " Internet # " or " Store S" (if either's present) .
				set AppleScript's text item delimiters to {" Internet # ", " Store S"} -- Requires Snow Leopard or later.
				set theModel to text item 1 of otherStuff
				
				set end of recordData to theDescription
				set end of recordData to theModel
				set end of recordData to thePrice
				
				-- Internet, Store SKU, and Store SO SKU numbers are blocks of digits and will be the first 'word' after their respective labels.
				if (otherStuff contains "Internet #") then
					set AppleScript's text item delimiters to "Internet # "
					set end of theRecord to "Internet"
					set end of recordData to word 1 of text item 2 of otherStuff
				end if
				
				if (otherStuff contains "Store SKU #") then
					set AppleScript's text item delimiters to "Store SKU # "
					set end of theRecord to "'Store SKU'"
					set end of recordData to word 1 of text item 2 of otherStuff
				end if
				
				if (otherStuff contains "Store SO SKU #") then
					set AppleScript's text item delimiters to "Store SO SKU # "
					set end of theRecord to "'Store SO SKU'"
					set end of recordData to word 1 of text item 2 of otherStuff
				end if
				
				set AppleScript's text item delimiters to ", "
				set theRecord to theRecord as text
				set AppleScript's text item delimiters to "', '"
				set end of o's batchList to "insert into depot ( " & theRecord & ") values( datetime(), '" & recordData & "'); "
			end repeat
		end considering
		
		set AppleScript's text item delimiters to ""
		do shell script (sqlite3Head & quoted form of (o's batchList as text))
	end repeat
	
	set AppleScript's text item delimiters to astid
	beep 2
end main

main()

Here’s an attempt at a version of that for sqlite3. I’m afraid my knowledge of text manipulation by shell script only goes so far.

  1. Get sqlite3 to return everything from the database. The default return format is one record per line, the fields in each line delimited with bar characters. The fields are in the same left-to-right order used when they were originally specified in the creation of the table. Field 2 is the date in ISO format; field 3 is the description; field 4 is the model number.
  2. Sort the records on the date fields. However, since the records seem to be returned in creation order (ie. date order), this step is probably unnecessary.
  3. Since at least “Eden Arbors Vienna Arbor” and “Eden Arbors The London Arbor” have the same model number, it’s necessary to take the description into account as well as the model. To this end, perform a stable sort on the description fields. This groups records with equal descriptions without changing their order with respect to each other (ie. sorted on date).
  4. Perform another stable sort on the model fields. Thiss group records with equal model numbers without changing their order with respect to each other (sorted on description, sub sorted on date).
  5. Using vanilla AppleScript, go through a list of the sorted lines. At the end of each run of a particular model number or description, compare the last two prices in the run. If the last one’s less than the one before it, append the relevant details to a results list.
  6. Return the results list.
on main()
	script o
		-- A list of the lines returned by sqlite3 for all the records in the database. Here they're assumed to have come off in date order, so they're just stable-sorted on field 3 (descriptions) and on field 4 (model numbers).
		property recordLines : paragraphs of (do shell script ("sqlite3 " & quoted form of POSIX path of ((path to documents folder as text) & "Databases:Home Depot.db") & " 'select * from depot; ' | sort -st '|' -k 3  | sort -st '|' -k 4"))
		property priceDrops : {}
	end script
	
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "|"
	set i to 1
	set currentDescription to text item 3 of beginning of o's recordLines
	set currentModel to text item 4 of beginning of o's recordLines
	repeat with j from 2 to (count o's recordLines)
		set thisDescription to text item 3 of item j of o's recordLines
		set thisModel to text item 4 of item j of o's recordLines
		if (not ((thisModel is currentModel) and (thisDescription is currentDescription))) then
			-- A different model number or description. Compare the last two prices (if more than one) for the previous model and insert details of any price drop into o's priceDrops.
			if (j - i > 1) then checkPrices(j - 1, o)
			-- Reset for a run of the model number just reached.
			set i to j
			set currentDescription to thisDescription
			set currentModel to thisModel
		end if
	end repeat
	if (j > i) then checkPrices(j, o) -- Check the run in progress at the end of the list.
	set AppleScript's text item delimiters to astid
	
	return o's priceDrops
end main

-- Compare the price in the indexed list in o's recordLines with that in the list before. If it's less, append relevant details to o's priceDrop list. 
on checkPrices(j, o)
	set latestDetails to item j of o's recordLines
	set previousDetails to item (j - 1) of o's recordLines
	set latestPrice to text item -1 of latestDetails
	set previousPrice to text item -1 of previousDetails
	considering numeric strings
		if (latestPrice < previousPrice) then set end of o's priceDrops to {|description|:text item 3 of previousDetails, model:text item 4 of previousDetails, |previous date|:text item 2 of previousDetails, |previous price|:previousPrice, |latest date|:text item 2 of latestDetails, |latest price|:latestPrice}
	end considering
end checkPrices

main()

As always, awesome. Do you have any suggested reading on grep, sed and reg ex?

I gleaned the “grep” and “sed” stuff from their “man” pages. I started working through this Regex tutorial a few months ago before getting sidetracked. It’s brain-mangling, but goes to some length to explain how things work and seems good.

Actually, the only regular expression in the above shell script is the “.*”, which means “any number of instances of any character(s).” Or in context, “'Email Print ’ and everything after it.” The sed command specifies two search-and-replace operations (s///g). The first replaces each instance of "Email Print " with an empty string (ie. deletes it) and the second replaces each instance of a single quote with two single quotes, the single quotes here being presented between escaped double quotes.

Do you want to keep a history of all prices or do you only want the last price and the previous one?

If you want to read about sed, try http://www.grymoire.com/Unix/Sed.html.