Any ideas on how to extract some information from this HTML source?

I am working on a way to automatically place button locations for DVD authoring. Using the slice tool in Photoshop, the artist can generate an HTML file like the one below. I need to extract the left, top, width, and height values for each button and place them into lists. Any ideas on how I can use AppleScript to parse this information?

thanks!!

ryan

button positions

Try something like this:

choose file with prompt "Choose HTML file generated by Photoshop:" without invisibles
set sourceFile to POSIX path of result

set leftList to grepCSS(sourceFile, "left")
set topList to grepCSS(sourceFile, "top")
set widthList to grepCSS(sourceFile, "width")
set heightList to grepCSS(sourceFile, "height")
set buttonList to {}

repeat with i from 1 to (count leftList)
	set end of buttonList to {(item i of leftList) as integer, (item i of topList) as integer, (item i of widthList) as integer, (item i of heightList) as integer}
end repeat

-- Check the result in Script Editor
buttonList

on grepCSS(sourceFile, attribute)
	do shell script "/usr/bin/grep --only-matching '^[[:space:]]\\+" & attribute & ":[0-9]\\+' " & quoted form of sourceFile & " | cut -d : -f 2"
	return paragraphs of result
end grepCSS
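To see what the grep and cut pipeline inside grepCSS is doing, here is a minimal shell sketch you can run in Terminal. The sample CSS is an assumption about what the Photoshop slice output looks like (the real file may differ), and the function name grep_css is made up for illustration. It uses grep -E so the + does not need a backslash; otherwise the idea is the same.

```shell
#!/bin/sh
# Hypothetical sample of the slice CSS; the real Photoshop output may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#button1 {
    position:absolute;
    left:100px;
    top:50px;
    width:200px;
    height:40px;
}
#button2 {
    position:absolute;
    left:100px;
    top:120px;
    width:180px;
    height:40px;
}
EOF

# grep_css FILE ATTRIBUTE (illustrative name, not part of the original script)
# -o keeps only the matched text, e.g. "    left:100";
# cut then splits on ":" and keeps the second field, the number.
grep_css() {
    grep -oE "^[[:space:]]+${2}:[0-9]+" "$1" | cut -d : -f 2
}

lefts=$(grep_css "$sample" left)
tops=$(grep_css "$sample" top)
widths=$(grep_css "$sample" width)
heights=$(grep_css "$sample" height)

# One value per button, in the order the buttons appear in the file.
printf 'left:\n%s\n' "$lefts"
printf 'top:\n%s\n' "$tops"
printf 'width:\n%s\n' "$widths"
printf 'height:\n%s\n' "$heights"

rm -f "$sample"
```

Each variable ends up with one number per line, which is what "paragraphs of result" turns into an AppleScript list.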

Bruce, you are a genius and a gentleman.

thank you so much for your help

you have inspired me to better understand shell scripting and regular expressions.

thank you again!

-ryan

You’re welcome!

Alternatively:

choose file with prompt "Choose HTML file generated by Photoshop:" without invisibles
set sourceFile to quoted form of POSIX path of result

set attributeList to {"left", "top", "width", "height"}
set attributeCount to count attributeList
set styleList to {}
set finalList to {}

repeat with thisItem in attributeList
	do shell script "/usr/bin/grep --only-matching '^[[:space:]]*" & thisItem & ":[0-9]\\+' " & sourceFile & " | cut -d : -f 2"
	set end of styleList to paragraphs of result
end repeat

repeat with i from 1 to (count first item of styleList)
	set intermediateList to {}
	
	repeat with n from 1 to attributeCount
		set end of intermediateList to (item i of item n of styleList) as integer
	end repeat
	
	set end of finalList to intermediateList
end repeat

-- Check the result in Script Editor
finalList

I have a couple of questions about shell scripting. First, where can I learn more?

Second, my DVD authoring application currently generates a text file, and I am using Word via AppleScript to do so. Is there a way to generate a text file with a shell script or AppleScript, without depending on a separate application like Word?

Finally, where can I learn more about regular expressions, so that I can write similar scripts that extract information?

thanks!

-ryan

Hi Ryan,

it’s not far away.
There are many useful tutorials at MacScripter.net/unscripted

Ryan,
AppleScript can read and write plain text files without relying on other applications. In Script Editor, choose Open Dictionary from the File menu and open the dictionary called Standard Additions. In the File Read/Write section you will see how to use the open for access, read, write, and close access commands.
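Since the question also asked about the shell route: a plain text file can be written with nothing more than printf and output redirection. This is only a sketch. The output path and the line format are hypothetical, because the thread does not say what the DVD authoring application expects.

```shell
#!/bin/sh
# Hypothetical output path and button values, for illustration only.
outfile="${TMPDIR:-/tmp}/buttons.txt"

# ">" creates the file, or overwrites it if it already exists.
printf '%s\n' "button1 100 50 200 40" > "$outfile"

# ">>" appends to the end of the existing file.
printf '%s\n' "button2 100 120 180 40" >> "$outfile"

# Show what was written.
cat "$outfile"
```

From AppleScript, lines like these could be run through do shell script, the same way the grep pipeline is run in the scripts earlier in the thread. Remember to use quoted form of for any path or text you pass in.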

If you are familiar with the Terminal at all, you can type “man grep” and there is a nice discussion in the manual page for grep about regular expressions and how to construct them.
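As a small taste of what the grep man page covers, here is a sketch of the pieces used in the patterns from this thread, run against a few made-up lines. It also shows the one difference between the two scripts above: the first requires at least one leading whitespace character, the second allows none. The -E flag is used here so that + can be written without a backslash.

```shell
#!/bin/sh
# Made-up input lines, for illustration only.
lines='  left:100px;
  margin-left:7px;
left:55px;
  top:abc;'

# ^             anchors the match at the start of the line
# [[:space:]]*  zero or more whitespace characters
# [0-9]+        one or more digits
# Matches the first and third lines; "margin-left" fails because of the anchor.
matches=$(printf '%s\n' "$lines" | grep -oE '^[[:space:]]*left:[0-9]+')
printf '%s\n' "$matches"

# With + instead of *, at least one whitespace character is required,
# so the unindented "left:55px;" line no longer counts.
count=$(printf '%s\n' "$lines" | grep -cE '^[[:space:]]+left:[0-9]+')
printf 'indented matches: %s\n' "$count"
```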

Alternatively, if you are not comfortable with the Terminal, I’ve written a program called “Man Handler” that will let you fetch and read man pages in an OS X application (written in AS Studio, of course!). :)

thank you!!