Converting QuarkXPress files to CSS (Part 1)

The emergence of new technologies is opening up great opportunities for AppleScript developers. I personally love coming up with ways to mix these technologies. Any programming or markup language that does not require the final code to be compiled is a good candidate: AppleScript can pull information from various sources and dynamically generate code in that language.

The next couple of articles will focus on using AppleScript to convert a QuarkXPress document to HTML using CSS (Cascading Style Sheets). This topic will span several articles because the process involves several subroutines for converting the QuarkXPress properties of picture boxes, line boxes and text boxes to CSS, then putting all the pieces together. So that you will have a working script as soon as possible, I will work in broad strokes, then refine the process with each successive step. I will try to give more in-depth explanations where I think they are necessary or helpful. I will assume some things are already known to you, so I may gloss over them. Mostly these will be tasks involving scripting the Finder.

I will concentrate first on converting the page items to CSS and building an HTML file to display the results. We will build the form of the page first, then tackle including the content. While it will not be necessary to know CSS for this exercise, it will be helpful. Not to mention the fact that CSS is one of the greatest developments for the internet since Al Gore invented it. If you are unfamiliar with CSS, you can learn the basics by visiting www.w3schools.com/css/ or doing a search on Eric Meyer, arguably the top CSS expert.

Some of the things we will cover in this series will include converting RGB colors to HEX notation, sorting lists of lists, converting QuarkXPress’s page item properties to more web friendly equivalents, and a great routine (even if I do say so myself) for rendering multi-columned text boxes in CSS.

As I mentioned, this series will include a little something for just about everyone. In addition to the routines above, I will also throw in a couple of subroutines for converting colors from RGB to CMYK, and CMYK to RGB. Also, this application includes a script for Adobe Photoshop that will trim an image to fit its picture box in QuarkXPress and save the image for use in our final HTML recreation of the QuarkXPress file. Since the details of scripting Photoshop are outside the scope of this article, I will include the code for this but not explain it.

I must mention a couple of things up front. The first thing you will notice is that I’m using QuarkXPress 5 for this demo. This is because necessity has not yet dictated that I upgrade. Even though I’ve written this application using QuarkXPress 5, you should be able to easily modify the appropriate script parts for QuarkXPress 6. Second, this demonstration will only focus on converting a one-page document to CSS. With the basics you’ll gain from this series, however, you should be able to easily modify the script to convert an entire document. Third, this article will be about AppleScript and only AppleScript. If you are unfamiliar with CSS, you can learn the basics by consulting the resources mentioned above. Fourth, there may very well be bugs in this script, as it has not been fully tested as of this writing. Fifth, this script does not allow for shades (tints) of colors, rotated boxes or diagonal lines. Shades of color and rotated boxes are possible but somewhat complicated. Diagonal lines are not possible (as far as I know) with CSS.

What you will need:
1. QuarkXPress of course
2. A QuarkXPress document that uses text boxes, lines (horizontal or vertical only) and picture boxes.
Also, include at least one picture box with no image, filled with a color of your choice. Don’t use
shades of colors. This exercise does not include any way to handle shades of colors.
3. A folder tree that looks like this: (preferably on your desktop)
qxpToCSS
|_ css
|_ images

I’ve included a link to this script just in case I left anything out in putting the article together. You can get it here: Download the script

First, we will create a script shell for the application. This shell will handle returning the path of the document dropped on it and make sure it is a QuarkXPress file. If the file is not a QuarkXPress file, the script will alert us of this fact and quit.

Script Shell


on open thisDoc
	set idxList to {}
	if class of thisDoc = list then
		set thisDoc to item 1 of thisDoc
	end if
	set thisDoc to thisDoc as string
	
	tell application "Finder"
		if name extension of file thisDoc is not "qxd" then
			activate
			display dialog "Doh! QuarkXPress files only please." buttons {"ok"} default button 1 with icon 0
			return
		end if
	end tell

	-- our subroutines will go here
end open

Next, it will be helpful to define the following properties. These will be set once and not changed during the execution of the script.

Properties


property newScale : 0.5
property basePX : 72
property cssLineBreak : ((return & tab & tab) as string)
property sitePath : "Macintosh HD:users:yourusername:desktop:qxpToCss:"
property libPath : ((sitePath & "css:") as string)
property imgFolderPath : ((sitePath & "images:") as string)
property templatePath : ((sitePath & "templates:") as string)
property hexList : {1, 2, 3, 4, 5, 6, 7, 8, 9, "A", "B", "C", "D", "E", "F"}
property leftOffset : 50
property topOffset : 50

global idxList

Next, we will need to create a simple subroutine for creating the final HTML. I’m starting with this first because I find it helps me move through the development of an application logically if I know what the desired output should look like.

Our parameters for the subroutine have yet to be defined but we will come to these in due course. For right now we’ll just create a subroutine that asks us for the name of our CSS style sheet and the content of the body of our HTML file. The subroutine will return the full HTML.

Subroutine to create HTML:


on makeHTML(theStyleSheet, theBody)
	-- The style sheet is written to the "css" folder (see libPath), so the link points there
	set theHTML to (("<html>" & return ¬
		& tab & "<head>" & return ¬
		& tab & tab & ¬
		"" & return ¬
		& tab & tab & tab & ¬
		"<link rel=\"stylesheet\" type=\"text/css\" href=\"css/" & ¬
		theStyleSheet & "\" title=\"Default\" />" & return & tab & ¬
		"</head>" & return & tab & ¬
		"<body>" & return & tab & tab & theBody & return & tab & "</body>" & return & "</html>") as string)
	return theHTML
end makeHTML

And while we’re at it, go ahead and include the standard routine for writing data to a text file.

Subroutine to Write files:


on write_to_file(this_data, target_file, append_data)
	-- open for access, write and close access are Standard Additions commands,
	-- so no Finder tell block is needed here
	try
		try
			close access file target_file
		end try
		set the open_target_file to ¬
			open for access file target_file with write permission
		if append_data is false then ¬
			set eof of the open_target_file to 0
		write this_data to the open_target_file starting at eof
		close access the open_target_file
		return true
	on error
		try
			close access file target_file
		end try
		return false
	end try
end write_to_file

Next we’ll create some very simple subroutines for opening our QuarkXPress file, closing our file, and retrieving the document size of our file. The first two are simply cosmetic so that the body of our final script isn’t cluttered. The third, docSize(), is important because everything we do with the CSS will be relative to the size and position of our document.

Some basic routines:


on openDoc(doc)
  tell application "QuarkXPress Passport"
    open file doc
  end tell
end openDoc

on closeDoc()
  tell application "QuarkXPress Passport"
    close document 1 saving no
  end tell
end closeDoc

on docSize()
  tell application "QuarkXPress Passport"
    set pW to page width of document 1
    set pW to coerce pW to real
    set pH to page height of document 1
    set pH to coerce pH to real
    return {pW, pH}
  end tell
end docSize

Now unless you have a very large (and expensive) monitor, most likely you will not be able to view an 8.5" x 11" document at full size, so I’ve added a simple routine that reduces all of the measurements from my QuarkXPress file by some scale (set in the property “newScale” above). In the routine below, theMeasurement is the measurement to be scaled and theScale is the decimal equivalent of the scale factor (i.e., 50% = 0.5).

Routine to scale output:


on scaleIt(theMeasurement, theScale)
	return (round ((theMeasurement * theScale) * 72) rounding to nearest) as integer
end scaleIt
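
As a quick check of the arithmetic, here is a minimal sketch that runs scaleIt() on a US Letter page at 50%. The page dimensions are sample values chosen for illustration, and the handler is repeated only so the snippet runs on its own in Script Editor.

```applescript
-- Copy of scaleIt() from above so this snippet is self-contained
on scaleIt(theMeasurement, theScale)
	return (round ((theMeasurement * theScale) * 72) rounding to nearest) as integer
end scaleIt

-- Sample values (assumed): an 8.5" x 11" page shown at 50%
set pW to scaleIt(8.5, 0.5) -- 8.5 * 0.5 * 72 = 306
set pH to scaleIt(11, 0.5) -- 11 * 0.5 * 72 = 396
get {pW, pH} -- {306, 396}
```

So a letter-size page ends up as a 306 x 396 pixel box in the browser.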

Okay, now we’re ready to write the first part of the main script. This part won’t do a whole lot. It will simply take the name of our QuarkXPress file and create a “.css” equivalent for the name of our CSS file, and a “.html” equivalent for the name of our HTML file. Then, it opens the file via the openDoc() routine.

So, let’s modify our script shell from above to look like this:

Get names for output files:


on open thisDoc
	set idxList to {}
	if class of thisDoc = list then
		set thisDoc to item 1 of thisDoc
	end if
	set thisDoc to thisDoc as string
	
	tell application "Finder"
		if name extension of file thisDoc is not "qxd" then
			activate
			display dialog "Doh! QuarkXPress files only please." buttons {"ok"} default button 1 with icon 0
			return
		end if
	end tell
	
	-- Insert the new part here
	
	tell application "Finder"
		set templateName to name of file thisDoc
		if templateName contains ".qxd" then
			set templateName to ((characters 1 thru ((length of templateName) - 4) of templateName) as string)
		end if
		set templateCSSName to ((templateName & ".css") as string)
		set htmlFileName to ((templateName & ".html") as string)
	end tell
	
	openDoc(thisDoc)
	
	-- End the new part
	
	-- our subroutines will go here
	
end open
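
If you want to try the file-name handling on its own, the following is a minimal sketch. The handler name outputNames() and the sample file name are my own inventions for illustration and are not part of the script; it does the same job as the new part of the listing above, without the Finder.

```applescript
-- Hypothetical helper (not part of the original script): derive the CSS and
-- HTML file names from a QuarkXPress file name
on outputNames(docName)
	set baseName to docName
	if (length of docName > 4) and (docName ends with ".qxd") then
		-- drop the last four characters (".qxd")
		set baseName to text 1 thru -5 of docName
	end if
	return {baseName & ".css", baseName & ".html"}
end outputNames

-- "brochure.qxd" is a made-up file name
set {templateCSSName, htmlFileName} to outputNames("brochure.qxd")
get {templateCSSName, htmlFileName} -- {"brochure.css", "brochure.html"}
```

The sketch uses "text 1 thru -5" rather than coercing a list of characters to a string, so the result does not depend on the current text item delimiters.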

Like I said, this doesn’t do anything very exciting, but it’s necessary. The next step, however, is where we’ll get into the meat of the application. We will use our docSize() and scaleIt() routines. These will highlight one of the key features of this application. The docSize() routine returns the page width and height as real numbers, and the rest of the script treats them as inches, so make sure the document’s measurement units are set to inches before you run it (the listing above does not change the units for you). I chose inches because there are 72 pixels per inch at screen resolution. The scaleIt() routine will take our document size, convert it to pixels (72dpi), then scale it by the factor we indicated in the “newScale” property at the top of the script.

So once more, modify the main body of the script to look like this:

Get the document size:


on open thisDoc
	set idxList to {}
	if class of thisDoc = list then
		set thisDoc to item 1 of thisDoc
	end if
	set thisDoc to thisDoc as string
	
	tell application "Finder"
		if name extension of file thisDoc is not "qxd" then
			activate
			display dialog "Doh! QuarkXPress files only please." buttons {"ok"} default button 1 with icon 0
			return
		end if
	end tell
	
	-- Added in the previous step
	
	tell application "Finder"
		set templateName to name of file thisDoc
		if templateName contains ".qxd" then
			set templateName to ((characters 1 thru ((length of templateName) - 4) of templateName) as string)
		end if
		set templateCSSName to ((templateName & ".css") as string)
		set htmlFileName to ((templateName & ".html") as string)
	end tell
	
	openDoc(thisDoc)
	
	-- End of the part added in the previous step; the lines below are new
	
	set {pW, pH} to docSize()
	set pW to scaleIt(pW, newScale)
	set pH to scaleIt(pH, newScale)
	
	set boxProperties to sortPBoxes(1)
	
end open

If you noticed in the modified code above that I included a call to an as yet undefined routine, then good for you for being on the ball. With this step, things are going to get really interesting. The tasks we will perform with this step will be for picture boxes. They will, for the most part, apply to text boxes and line boxes as well, but with some modifications.

The sortPBoxes() routine is in fact four routines. The subroutine makes calls to three other routines listed below. I will explain what each of these does as I list them, starting with sortPBoxes().

sortPBoxes() subroutine:


on sortPBoxes(pageNum)
  tell application "QuarkXPress Passport"
    tell document 1
      set theList to {}
      repeat with i from 1 to (count picture boxes of page pageNum)
        set the end of idxList to i
        set myProps to pBoxProps(pageNum, i) of me
        set myProps to convertProps(myProps) of me
        set the end of theList to myProps
      end repeat
      set firstSort to bubbleSort(2, theList) of me
      set secondSort to bubbleSort(1, firstSort) of me
      return secondSort
    end tell
  end tell
end sortPBoxes

What we’re doing here is listing the properties of all the picture boxes, converting the properties to something HTML/CSS will recognize, and sorting them so that the order in which they appear in our web browser starts in the upper left-hand corner, going left to right, then top to bottom. QuarkXPress indexes page items in the reverse order they were created, so that a picture box with the index 1 is the frontmost picture box. However, people tend to create page items the way they read: left to right and top to bottom. I prefer to index them in this order so I have a standardized way of referencing each picture box.

The first subroutine called by sortPBoxes() is pBoxProps(). Notice that we need to provide, as parameters, pageNum for the page number of the QuarkXPress file, and pBoxNum, which you probably already guessed, indicates the number of the picture box.

This routine fetches the bounds of the picture box, whether or not it contains an image, the frame width, frame color, and the color of the picture box. That’s it. We’re just retrieving the properties here. We will manipulate them in just a moment.

pBoxProps() subroutine:


on pBoxProps(pageNum, pBoxNum)
  tell application "QuarkXPress Passport"
    tell document 1
      tell page pageNum
        set {x1, y1, x2, y2} to (coerce ((bounds of picture box pBoxNum) as list) to list)
        set {x1, y1, x2, y2} to {(coerce x1 to real), (coerce y1 to real), (coerce x2 to real), (coerce y2 to real)}
        if file path of image 1 of picture box pBoxNum is not null then
          set getsImage to true
        else
          set getsImage to false
        end if
        
        set fW to width of frame of picture box pBoxNum
        set fW to (coerce fW to real)
        
        set fc to RGB color value of color of frame of picture box pBoxNum
        
        if name of color of picture box pBoxNum is not "None" then
          set bc to RGB color value of color of picture box pBoxNum
        else
          set bc to {65535, 65535, 65535}
        end if
        return {fW, fc, {x1, y1, x2, y2}, bc, getsImage}
      end tell
    end tell
  end tell
end pBoxProps

The next routine is convertProps(). This is the routine that converts the QuarkXPress properties to CSS properties. There is nothing especially difficult (or possibly exciting) about this. We simply take each of our properties in turn, and treat the bounds pretty much the same as we did our page sizes in scaleIt(). We’re going to multiply each measurement by the basePX property (72 pixels per inch), then multiply that product by the newScale property. The frame width is simply rounded, and the colors are handed to rgbToHex().

convertProps() subroutine:


on convertProps(myProps)
	set fW to item 1 of myProps
	set fW to round fW rounding to nearest
	set item 1 of myProps to fW
	set fc to (item 2 of myProps)
	if fc is not "noValue" then
		set fc to rgbToHex(fW, fc) of me
		set item 2 of myProps to fc
	end if
	set {x1, y1, x2, y2} to item 3 of myProps
	set {x1, y1, x2, y2} to {((x1 * basePX) * newScale), ((y1 * basePX) * newScale), ((x2 * basePX) * newScale), ((y2 * basePX) * newScale)}
	set item 3 of myProps to {x1, y1, x2, y2}
	set bc to item 4 of myProps
	set bc to rgbToHex(fW, bc) of me
	set item 4 of myProps to bc
	return myProps
end convertProps

There is a call to yet another subroutine in this routine. Don’t worry, I’ll keep track of where we are. This new subroutine is rgbToHex(). rgbToHex() requires 2 arguments: fW (frame width) and the RGB values of the color we wish to convert. The reason for providing the frame width is so that when our script encounters a picture box with no frame and a white background, the box will automatically be given a background color of light gray so that we can see it in our web browser. So here is the rgbToHex() routine. I’m going to hold off on explaining this routine so that we don’t get lost. At the end of this article, I will explain this conversion routine in depth.

rgbToHex() subroutine:


on rgbToHex(fW, {r, g, b})
	set {r, g, b} to {round ((r / 65535) * 255) rounding to nearest, round ((g / 65535) * 255) rounding to nearest, round ((b / 65535) * 255) rounding to nearest}
	set newValues to {}
	if {r, g, b} is not {0, 0, 0} then
		if {r, g, b} is not {255, 255, 255} then
			repeat with i from 1 to 3
				set c to item i of {r, g, b}
				if c div 16 > 9 then
					set c1 to item (c div 16) of hexList
				else
					set c1 to c div 16
				end if
				if c mod 16 > 9 then
					set c2 to item (c mod 16) of hexList
				else
					set c2 to c mod 16
				end if
				set the end of newValues to ((c1 & c2) as string)
			end repeat
		else
			if fW < 1 then
				set newValues to {"C", "C", "C"}
			else
				set newValues to {"F", "F", "F"}
			end if
		end if
	else
		if fW < 1 then
			set newValues to {"C", "C", "C"}
		else
			set newValues to {r, g, b}
		end if
	end if
	return ("#" & newValues as string)
end rgbToHex

Going back to our sortPBoxes() routine, we are now at the bubbleSort() subroutine. This is the step that rearranges our picture boxes in the CSS file. It is not complicated but is very useful even beyond the scope of this application. This particular version of bubbleSort() assumes we are passing it a list of lists, where item 3 of each sublist holds the bounds of the box. This is so we can sort by any item within the bounds. In the first case, we will sort by item 2 of the bounds, because item 2 (y1) tells us the left edge of the picture box. Once our list is sorted left-to-right, we will pass the newly sorted list back to bubbleSort(), this time sorting by item 1 of the bounds (x1, the top edge), thus sorting the picture boxes top-to-bottom. Because a bubble sort never swaps equal items, boxes that share the same top edge stay in left-to-right order. Once we have retrieved our box properties, converted them to pixel-based measurements and sorted them, we can return them to the main body of the script.

bubbleSort() subroutine:


on bubbleSort(sortBy, theList)
	set sortList to {}
	repeat (count items of theList) times
		repeat with i from 1 to ((count items of theList) - 1)
			set a to item i of theList
			set b to item (i + 1) of theList
			if item sortBy of item 3 of a > item sortBy of item 3 of b then
				set item (i + 1) of theList to a
				set item i of theList to b
				switchIDS(i)
			end if
		end repeat
	end repeat
	return theList
end bubbleSort
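
bubbleSort() calls a switchIDS() routine that is not listed in this part of the series. The sketch below assumes it simply swaps entries i and i + 1 of the global idxList, so that idxList records which QuarkXPress index ended up in which sorted position; treat that as my guess at its purpose, not the definitive version. The three sample boxes are made up, and only their bounds (item 3) matter here.

```applescript
global idxList

-- Assumed implementation: swap entries i and i + 1 of the global idxList so
-- that it mirrors every swap made by bubbleSort()
on switchIDS(i)
	set a to item i of idxList
	set item i of idxList to item (i + 1) of idxList
	set item (i + 1) of idxList to a
end switchIDS

-- bubbleSort() as listed above
on bubbleSort(sortBy, theList)
	set sortList to {}
	repeat (count items of theList) times
		repeat with i from 1 to ((count items of theList) - 1)
			set a to item i of theList
			set b to item (i + 1) of theList
			if item sortBy of item 3 of a > item sortBy of item 3 of b then
				set item (i + 1) of theList to a
				set item i of theList to b
				switchIDS(i)
			end if
		end repeat
	end repeat
	return theList
end bubbleSort

-- Made-up sample data; bounds are {top, left, bottom, right}
set idxList to {1, 2, 3}
set boxes to {{0, "", {200, 10, 300, 110}}, {0, "", {10, 150, 100, 250}}, {0, "", {10, 10, 100, 110}}}

-- Sort by left edge first, then by top edge
set sorted to bubbleSort(1, bubbleSort(2, boxes))
get idxList -- {3, 2, 1}
```

After both passes the top-left box (originally index 3) comes first, followed by the top-right box and then the box lower down the page.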

So that you can see what we’ve accomplished so far, we’re going to skip the text boxes and line boxes but will come back to them shortly. We need to add some more code to the main body of our script, which I will also explain as we add each line. Once we’ve made these additions, the rest is essentially the same thing with some modifications.

After the sortPBoxes() call in the main body of the script, add the following lines of code:

Initiate the Style Sheet:


closeDoc()
set divLst to {}
set the end of divLst to (("<div class=\"page\">" & return & "</div>") as string)

set myCss to (("body" & tab & "{" & cssLineBreak & ¬
	"background: #BBB;" & cssLineBreak & ¬
	"}" & return & return & ¬
	".page" & cssLineBreak & "{" & cssLineBreak & ¬
	"background: #fff;" & cssLineBreak & ¬
	"position: absolute;" & cssLineBreak & ¬
	"left: " & leftOffset & "px;" & cssLineBreak & ¬
	"top:" & topOffset & "px;" & cssLineBreak & ¬
	"width: " & pW & "px;" & cssLineBreak & ¬
	"height: " & pH & "px;" & cssLineBreak & ¬
	"border: 1px solid #000;" & cssLineBreak & ¬
	"}" & return & return) as string)

We have gotten all of the information we need from the QuarkXPress file, so we can close it with the all-too-obviously named closeDoc() routine.

The next line, “set the end of divLst to…”, adds the HTML code that tells the browser to show a div with the class “page”. You will notice that it has no other attributes defined in the HTML. This is because CSS will tell the browser what the attributes of the “page” class will be. And that is precisely what the next line (really long line) of code does. Actually, it defines the attributes for the “body” tag as well. If you recall, at the very beginning of our script, I asked you to create some properties without explaining them. We are using two of them here. They are topOffset and leftOffset. These simply push our re-created QuarkXPress page out from the top and left of the browser window.

Okay, I promise, we’re getting close. We have just a bit more trickery to pull off, then we will be able to run our script and see some results. But before we can, you’ll need to add this code:

Adding the picture boxes


-- Divs for picture boxes
set picSizes to ""
repeat with pBoxNum from 1 to (length of boxProperties)
	set myProps to item pBoxNum of boxProperties
	set {myStyle, myWidth, myHeight} to boxStyle(pBoxNum, "box", myProps, pW, pH, leftOffset, topOffset) of me
	set myCss to myCss & myStyle as string
	if last item of myProps is not false then
		set imgSrc to (("<img src=\"img" & pBoxNum & ".pct\" width=\"" & myWidth & "\" height=\"" & myHeight & "\" >") as string)
	else
		set imgSrc to ""
	end if
	set the end of divLst to (("<div class=\"" & ("box" & pBoxNum) & "\">" & ¬
		return & imgSrc & return & "</div>") as string)
end repeat

For each picture box, we need to create a CSS class. We do this using the routine below. The first two parameters of this call will be combined to make up the class name (i.e., “.box1”). We will also send the entire list of properties of each picture box to the routine so it can pull the information it needs. Further, we will pass the scaled page width and height (pW, pH) because everything is relative to the page. We will also pass the topOffset and leftOffset to be added to the top and left attributes of each class. So go ahead and add the following subroutine to your script.

The Box Style Sheet


on boxStyle(i, className, myProps, pW, pH, leftOffset, topOffset)
	set myBorder to item 1 of myProps
	set myBorderColor to item 2 of myProps
	set {x1, y1, x2, y2} to item 3 of myProps
	set myWidth to (round (y2 - y1) rounding to nearest)
	set myHeight to (round (x2 - x1) rounding to nearest)
	set myTop to (round x1 rounding to nearest)
	set myLeft to (round y1 rounding to nearest)
	if pW - (myLeft + myWidth) is not 2 then
		set myLeft to myLeft + 2
	end if
	set myBackground to item 4 of myProps
	set myStyle to ¬
		(("." & className & (i) & space & "{" & cssLineBreak & ¬
			"background:" & myBackground & ";" & cssLineBreak & ¬
			"Position:absolute;" & cssLineBreak & ¬
			"left:" & (myLeft + leftOffset) & "px;" & cssLineBreak & ¬
			"top:" & (myTop + topOffset) & "px;" & cssLineBreak & ¬
			"width:" & myWidth & "px;" & cssLineBreak & ¬
			"height:" & myHeight & "px;" & cssLineBreak & ¬
			"border:" & myBorder & "px" & space & "solid" & space & myBorderColor & ";" & cssLineBreak & ¬
			"}" & return & return) as string)
	return {myStyle, myWidth, myHeight}
end boxStyle

I think most of this is pretty self-explanatory, but I do want to point out one thing. I am rounding the top, left, width and height of each box. The reason for this is that your browser is not going to render a decimal measurement, and it keeps our code cleaner. Some minor misalignment may occur, but since we are rounding to nearest, the most that the alignment will ever be off is 1 pixel. In most cases it will not even be noticeable. I have added a hack (lines 9 - 11 of the routine) to accommodate any items that should touch the right edge of the page. The routine returns the entire class style (as a string) and the width and height of the picture box.
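
To make the coordinate naming concrete, here is a minimal sketch of just the geometry part of boxStyle(). The bounds are made-up values that have already been scaled to pixels, and the sketch leaves out the offsets and the right-edge hack. Note that x1 and x2 hold the top and bottom edges while y1 and y2 hold the left and right edges, which is why the width comes from the y values.

```applescript
-- Made-up, already scaled bounds in the order used above: {top, left, bottom, right}
set {x1, y1, x2, y2} to {36.4, 18.2, 180.6, 270.9}

set myTop to round x1 rounding to nearest -- 36
set myLeft to round y1 rounding to nearest -- 18
set myWidth to round (y2 - y1) rounding to nearest -- 252.7 rounds to 253
set myHeight to round (x2 - x1) rounding to nearest -- 144.2 rounds to 144

set cssText to "left:" & myLeft & "px; top:" & myTop & "px; width:" & myWidth & "px; height:" & myHeight & "px;"
get cssText -- "left:18px; top:36px; width:253px; height:144px;"
```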

Picture boxes and text boxes are a little tricky because they each have special but different needs. Picture boxes will sometimes hold images and sometimes not. When they do, we need to include an “img” tag that specifies the name of the image, its width and its height. The width and height were returned by the boxStyle() routine above for this purpose. So that we don’t get a missing image icon when the page is loaded, I don’t want to include this tag unless it is needed. Back when we asked QuarkXPress for the properties of each picture box, you may have noticed that there was a variable named “getsImage” that was set to either true or false based on whether or not the file path for image 1 of each picture box was null. When we passed those properties back from the subroutine, we passed back this variable. It is the last item of the list (myProps). If it is true, we want to include the “img” tag, and if it is false we don’t.

The next step in the script creates the div tag string and adds it to the end of our list of divs. Once all the div tags for the picture boxes have been created, we need only to concatenate them into one long string of HTML, add this to the entire HTML page code, then write it to a file. We have been concatenating our CSS styles in the myCss variable as we stepped through, so we will write that to a different file. Do you remember those really unexciting file names we created in the very beginning of the script? Well, now we will use them to write the HTML to one and the CSS to the other. So go ahead and add the remaining code to your script, compile it as an application and drop a QuarkXPress file on it.

The Finish!


set divData to ""
repeat with j from 1 to (length of divLst)
	set divData to ((divData & return & (item j of divLst)) as string)
end repeat
set theHTML to makeHTML(templateCSSName, divData)

write_to_file(myCss, ((libPath & templateCSSName) as string), false)
write_to_file(theHTML, ((sitePath & htmlFileName) as string), false)

If you’ve entered all the code correctly (and I didn’t forget anything) you should be able to open the resulting HTML file in the browser of your choice (I use Safari) and see the basics of your QuarkXPress file converted to CSS. That’s all for today’s article. I will post the remaining code in a day or two.

How rgbToHex Works

As I promised, I will explain how the rgbToHex() routine works. It is basically all math using the mod and div operators with 16. We are going to be converting a 16 bit color representation to hexadecimal notation.

The very first line of the routine,

Converting 16 bit to 8 bit


set {r, g, b} to {round ((r / 65535) * 255) rounding to nearest, round ((g / 65535) * 255) rounding to nearest, round ((b / 65535) * 255) rounding to nearest}

converts from 16 bit to 8 bit representation. What I’m doing here is figuring out what percentage of 16 bit each color represents, then multiplying 255 (8 bit) by the same percentage.

Once I’ve found the 8 bit value (0 to 255) of each color, I use the div operator to find out how many times 16 divides into it. This gives me the first digit of my hexadecimal representation. The if clauses determine whether the number is greater than 9 so that I can convert it to the letter equivalent by taking the item at that position in the hexList property. For instance, if R div 16 = 11, then the first character of R in hex would be the 11th item in hexList, which is “B”. If R div 16 is 9 or less, I can simply use R div 16 as the first character of R’s hex value.

The next step is to repeat the same step as above only using a modulus of 16. This gives us the remainder after R has been divided by 16. Again, if R mod 16 is greater than 9, we use item R mod 16 of hexList, else the second character in R’s hex value is R mod 16.
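
The two steps above can be tried in isolation with the following minimal sketch. The handler name channelToHex() and the sample color are my own, chosen for illustration; the special cases for black, white and frameless boxes in rgbToHex() are left out.

```applescript
property hexList : {1, 2, 3, 4, 5, 6, 7, 8, 9, "A", "B", "C", "D", "E", "F"}

-- Hypothetical helper: convert one 8 bit channel value (0-255) to two hex characters
on channelToHex(c)
	set d1 to c div 16 -- first hex digit, 0-15
	set d2 to c mod 16 -- second hex digit, 0-15
	if d1 > 9 then set d1 to item d1 of hexList
	if d2 > 9 then set d2 to item d2 of hexList
	return (d1 as string) & (d2 as string)
end channelToHex

-- Made-up 16 bit sample color
set {r, g, b} to {65535, 39321, 0}
set hexColor to "#"
repeat with i from 1 to 3
	set c to item i of {r, g, b}
	-- 16 bit to 8 bit: same proportion of 255 as the original value is of 65535
	set c8 to round ((c / 65535) * 255) rounding to nearest
	set hexColor to hexColor & channelToHex(c8)
end repeat
get hexColor -- "#FF9900"
```

Here 65535 becomes 255 ("FF"), 39321 becomes 153 (153 div 16 = 9 and 153 mod 16 = 9, so "99"), and 0 stays 0 ("00").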

The other code in this routine is to catch specific instances in converting our QuarkXPress file to CSS. So if a picture box has no frame and no image it won’t be visible. By setting the background color of the picture box to “#CCC” (light gray) if there is no frame, we will be sure to see the picture box whether it has an image or not.

In the next part of this article we will cover converting line boxes and text boxes to CSS. These steps are just modified versions of what we’ve done so far.