Safari "Export as PDF..." with endless -repeat until exists-

So I used this code from here to save a single webpage with Safari’s “Export as PDF…”

Safari pdf export

I am looping through a big array of URLs, setting up Safari in Responsive Design Mode to save the PDFs for mobile. I am also scrolling down the pages to load lazy images and dynamically modifying CSS with JavaScript, but I left that out for now.

The problem is that I never get through my URL list: the script mostly gets stuck at the repeat until exists sheet 1 of window 1, right after the click menu item "Export as PDF…"

Here is an example of the SE replies:

tell application "System Events"
click menu item "Export as PDF…" of menu "File" of menu bar 1 of process "Safari"
      --> menu item "Export as PDF…" of menu "File" of menu bar item "File" of menu bar 1 of application process "Safari"
   exists sheet 1 of window 1 of process "Safari"
      --> false
   exists sheet 1 of window 1 of process "Safari"
      --> false
   exists sheet 1 of window 1 of process "Safari"
      --> false

I have bumped up and inserted various delays, to no avail.
Sometimes I also get similar problems with other click menu item and click button commands.

Is there another (new?) way to do this, or to check if a button is indeed clicked or a sheet does exist?

Running through the array, the whole UI processing feels slow and laggy, hence all the delays.

I also tried throwing in some try - on error - end try but it doesn’t help…

# declare variables
global G_url_list, G_url_list_counter, G_save_folder_path

# an array of URL's
set G_url_list to {"https://www.morganscloud.com/2019/03/28/4-vital-anchor-selection-criteria-and-a-review-of-spade/", "https://www.morganscloud.com/2019/03/14/15-steps-to-getting-securely-anchored/", "https://www.morganscloud.com/2017/03/22/come-alongside-docking-made-easy/", "https://www.morganscloud.com/2017/03/24/10-tips-to-make-coming-alongside-docking-easy/"}
# start index
set G_url_list_counter to 1
# folder to save the PDF's
set G_save_folder_path to "~/Desktop/"


# INIT
urlLoop(G_url_list)


# wait for page to finish loading in Safari
# works in macOS Catalina and macOS Big Sur
# may need adjusting for other versions of macOS
on waitPageLoaded()
   tell application "System Events" to repeat until ¬
      exists (buttons of groups of toolbar 1 of window 1 of ¬
         process "Safari" whose name = "Reload this page")
      delay 0.5
   end repeat
end waitPageLoaded


# loop through URL list
# @ array - URL list
on urlLoop(url_arr)
   repeat with i from 1 to (count url_arr)
      set url_item to (item i of url_arr)
      my loadURL(url_item)
      # add to global counter
      set G_url_list_counter to G_url_list_counter + 1
   end repeat
end urlLoop


# make a new tab and load URL
# prepare Safari in Responsive Design Mode
# @ string - URL to load
on loadURL(url_item)
   
   tell application "Safari"
      activate
      tell window 1
         
         set current tab to make new tab with properties {URL:url_item}
         
         my waitPageLoaded()
         delay 2
         
         # enter Responsive Design Mode, only after page is loaded
         tell application "System Events" to tell process "Safari"
            -- click menu item "Show Web Inspector" of menu "Develop" of menu bar 1
            click menu item "Enter Responsive Design Mode" of menu "Develop" of menu bar 1
         end tell
         delay 2
         
      end tell -- window 1 of Safari
   end tell -- Safari app
   
   my scrollPage()
   
end loadURL


# scroll the page to the bottom to load all lazy images
on scrollPage()
   my saveToPDF()
end scrollPage


# save as PDF and close Safari tab
on saveToPDF()
   
   tell application "Safari"
      activate
      
      tell application "System Events" to tell process "Safari"
         
         # export PDF
         click menu item "Export as PDF…" of menu "File" of menu bar 1
         delay 4 -- give the dialog a few seconds to popup
         
         repeat until exists sheet 1 of window 1
            delay 0.5
         end repeat
         
         # define folder with folder shortcut
         keystroke "g" using {command down, shift down}
         
         repeat until exists sheet 1 of sheet 1 of window 1
            delay 0.5
         end repeat
         
         # set path and go
         tell sheet 1 of sheet 1 of window 1
            set value of combo box 1 to G_save_folder_path
            click button "Go"
         end tell
         
         # save
         click button "save" of sheet 1 of window 1
         delay 1 -- give the save process a second
         
         # close tab
         click menu item "Close Tab" of menu "File" of menu bar 1
         delay 1 -- give the close tab a second
         
      end tell -- System Events -- process Safari
      
   end tell -- Safari app
   
end saveToPDF

Having a tell statement on the same line as another tell statement is never a good idea. Especially when the second tell is a multi-line tell block.

I don’t even know how this compiles since a tell line with a “to” option should only be a one-liner.

Here is my version I cleaned up for you.

# declare variables
global G_url_list, G_url_list_counter, G_save_folder_path

# an array of URL's
set G_url_list to {"https://www.morganscloud.com/2019/03/28/4-vital-anchor-selection-criteria-and-a-review-of-spade/", "https://www.morganscloud.com/2019/03/14/15-steps-to-getting-securely-anchored/", "https://www.morganscloud.com/2017/03/22/come-alongside-docking-made-easy/", "https://www.morganscloud.com/2017/03/24/10-tips-to-make-coming-alongside-docking-easy/"}
# start index
set G_url_list_counter to 1
# folder to save the PDF's
set G_save_folder_path to "~/Desktop/"


# INIT
urlLoop(G_url_list)


# wait for page to finish loading in Safari
# works in macOS Catalina and macOS Big Sur
# may need adjusting for other versions of macOS
on waitPageLoaded()
	tell application "System Events"
		repeat until exists (buttons of groups of toolbar 1 of window 1 of ¬
			process "Safari" whose name = "Reload this page")
			delay 0.5
		end repeat
	end tell
end waitPageLoaded


# loop through URL list
# @ array - URL list
on urlLoop(url_arr)
	repeat with i from 1 to (count url_arr)
		set url_item to (item i of url_arr)
		my loadURL(url_item)
		# add to global counter
		set G_url_list_counter to G_url_list_counter + 1
	end repeat
end urlLoop


# make a new tab and load URL
# prepare Safari in Responsive Design Mode
# @ string - URL to load
on loadURL(url_item)
	
	tell application "Safari"
		activate
		tell window 1
			set current tab to make new tab with properties {URL:url_item}
		end tell -- window 1 of Safari
	end tell -- Safari app
	my waitPageLoaded()
	delay 1
	# enter Responsive Design Mode, only after page is loaded
	tell application "System Events"
		tell process "Safari"
			-- click menu item "Show Web Inspector" of menu "Develop" of menu bar 1
			click menu item "Enter Responsive Design Mode" of menu "Develop" of menu bar 1
		end tell
	end tell
	delay 2
	my scrollPage()
	my saveToPDF()
end loadURL


# scroll the page to the bottom to load all lazy images
on scrollPage()
	tell application "System Events"
		tell process "Safari"
			set myScroll to scroll bar 1 of scroll area 1 of group 1 of group 1 of scroll area 2 of tab group 1 of splitter group 1 of window 1
			set value of myScroll to 1.0
		end tell
	end tell
end scrollPage


# save as PDF and close Safari tab
on saveToPDF()
	
	tell application "Safari" to activate
	tell application "System Events"
		tell process "Safari"
			
			# export PDF
			click menu item "Export as PDF…" of menu "File" of menu bar 1
			--delay 1 -- give the dialog a few seconds to popup
			
			repeat until exists sheet 1 of window 1
				delay 0.5
			end repeat
			
			# define folder with folder shortcut
			keystroke "g" using {command down, shift down}
			
			repeat until exists sheet 1 of sheet 1 of window 1
				delay 0.5
			end repeat
			
			# set path and go
			tell sheet 1 of window 1
				tell sheet 1
					set value of text field 1 to G_save_folder_path
					set focused of text field 1 to true
					delay 0.5
					keystroke return
				end tell
				
				# save
				click button "save"
			end tell
			delay 1 -- give the save process a second
			
			# close tab
			click menu item "Close Tab" of menu "File" of menu bar 1
			delay 1 -- give the close tab a second
		end tell
	end tell -- System Events -- process Safari
end saveToPDF

Hi Robert,

Thank you for taking the time looking at my script and explaining to me the issues with the tell statements.

Your corrections did seem to help a lot, I just tried about 30 pages and was only interrupted once in the waitPageLoaded() function, but probably a little glitch while loading the pages bloated with JavaScripts, dynamic fonts, Analytics etc.

The scrollPage() function scrolls through the page with JS, the only solution I could think of to load the nasty lazy images, a Royal PITA… even with JS disabled in Safari the low res images don’t get replaced, -1 for this WP plugin.

I love Safari’s “Export as PDF…” because it can be used in conjunction with the Responsive Design Mode, which is useful for saving a mobile version with CSS @media rules applied.

I also found a good way to alter the CSS some more with JS, some pages have hundreds of comments which are dynamically appended while scrolling.

I will post my code now, I hope it can help somebody else.

Thanks again!

P.S. if there is something more to correct please let me know… :pray:

# declare variables
global G_site_name, G_url_list, G_url_list_counter, G_save_folder_path, G_scroll_speed, G_scroll_y

# website name to remove from title
set G_site_name to " - Attainable Adventure Cruising"

# an array of URL's
set G_url_list to {"https://www.morganscloud.com/2019/03/28/4-vital-anchor-selection-criteria-and-a-review-of-spade/", "https://www.morganscloud.com/2019/03/14/15-steps-to-getting-securely-anchored/", "https://www.morganscloud.com/2017/03/22/come-alongside-docking-made-easy/", "https://www.morganscloud.com/2017/03/24/10-tips-to-make-coming-alongside-docking-easy/"}

# start index
set G_url_list_counter to 1

# folder to save the PDF's
set G_save_folder_path to "~/Desktop/"

# speed scroll in seconds, to catch up and load lazy images
set G_scroll_speed to 0.75 -- seems to work, depending on how many images there are on the page

# vertical scroll step in pixels
-- a small phone (4 inch) has a height of +/- 600px 
-- depends on the viewports offset of lazy loaded images
set G_scroll_y to 300


# INIT
urlLoop(G_url_list)


# wait for page to finish loading in Safari
# works in macOS Catalina and macOS Big Sur
# may need adjusting for other versions of macOS
on waitPageLoaded()
   tell application "System Events"
      repeat until exists (buttons of groups of toolbar 1 of window 1 of ¬
         process "Safari" whose name = "Reload this page")
         delay 0.5
      end repeat
   end tell
end waitPageLoaded


# loop through URL list
# @ array - URL list
on urlLoop(url_arr)
   repeat with i from 1 to (count url_arr)
      set url_item to (item i of url_arr)
      my loadURL(url_item)
      # add to global counter
      set G_url_list_counter to G_url_list_counter + 1
   end repeat
end urlLoop


# make a new tab and load URL
# prepare Safari in Responsive Design Mode
# modify HTML / CSS
# @ string - URL to load
on loadURL(url_item)
   
   # load URL
   tell application "Safari"
      activate
      tell window 1
         set current tab to make new tab with properties {URL:url_item}
      end tell -- window 1
   end tell -- Safari app
   
   # on load complete			
   my waitPageLoaded()
   delay 1
   
   # enter Responsive Design Mode, only after page is loaded
   tell application "System Events"
      tell process "Safari"
         -- click menu item "Show Web Inspector" of menu "Develop" of menu bar 1
         click menu item "Enter Responsive Design Mode" of menu "Develop" of menu bar 1
      end tell -- process Safari
   end tell -- System Events
   delay 2
   
   tell application "Safari"
      tell window 1
         # modify HTML / CSS
         do JavaScript "var styles = '.scriptlesssocialsharing,.wpd-form-wrap{display:none}#wpdcom .wpd-comment .wpd-comment-text{font-size:18px!important}ul{margin:0 0 1.5em 1em}ol{margin:0 0 1.5em 1.2em!important}blockquote{padding:0 0 0 10px!important;font-size:1em!important}.note,.box{padding:0!important}#wpdcom.wpd-layout-2 .wpd-reply.wpd_comment_level-2{margin-left:0!important}.site-main .wp-block-group__inner-container{padding:1em!important}.tip .wp-block-group__inner-container{padding:0!important}';
    				// add to the stylesheet
    				var styleSheet = document.createElement('style');
    				styleSheet.innerText = styles;
    				document.head.appendChild(styleSheet);
    				" in current tab
         delay 1 -- give the page a second to re-render
      end tell -- window 1
   end tell -- Safari app
   
   my scrollPage()
   
end loadURL


# scroll the page to the bottom to load all lazy images
# scroll back to the top of the page
on scrollPage()
   
   tell application "Safari"
      activate
      tell window 1
         
         # loop to scroll
         repeat
            
            # move down and return boolean true when bottom is reached
            # could work also with FN + arrow down
            set isBottom to do JavaScript "
    					var bottomscroll = false;
    					// add 10 more pixels, sometimes we are missing a pixel (probably because of rounding)
    					if ((window.pageYOffset + window.innerHeight + 10) < document.body.clientHeight) {
    						window.scrollBy(0," & G_scroll_y & ");	
    						bottomscroll;
    					} else {
    						bottomscroll = true;
    						bottomscroll;
    					}
    				" in current tab
            
            # scrolling speed
            delay G_scroll_speed
            
            # if we have reached to bottom of the page
            if isBottom then
               # scroll back to top of page (to fix top menu)
               do JavaScript "window.scrollTo(0, 0);" in current tab
               delay 1
               exit repeat -- exit the loop to scroll
            end if
            
         end repeat -- end loop to scroll
         
      end tell -- window 1
   end tell -- Safari app
   
   my saveToPDF()
   
end scrollPage


# get page title and modify
# save as PDF and close Safari tab
on saveToPDF()
   
   tell application "Safari"
      activate
      tell window 1
         
         # get the title
         set page_title_full to name of current tab
         
         # if there is the website's name, get rid of it
         if (page_title_full contains G_site_name) then
            try
               # save their current state
               set oldDelims to AppleScript's text item delimiters
               # declare new delimiters
               set AppleScript's text item delimiters to {G_site_name}
               # do script steps here
               set page_title_short to text item 1 of page_title_full
               set page_title_new to (G_url_list_counter as text) & ". - " & page_title_short
               # restore them
               set AppleScript's text item delimiters to oldDelims
            on error
               # restore them in case something went wrong
               set AppleScript's text item delimiters to oldDelims
            end try
         else
            set page_title_new to (G_url_list_counter as text) & ". - " & page_title_full
         end if
         
         delay 5 -- very conservative here...
         
      end tell -- window 1
   end tell -- Safari app
   
   tell application "Safari"
      activate
      tell application "System Events"
         tell process "Safari"
            
            # export PDF
            click menu item "Export as PDF…" of menu "File" of menu bar 1
            delay 4 -- give the dialog a few seconds to popup
            
            # sheet
            repeat until exists sheet 1 of window 1 -- sheet in export pdf dialog
               delay 0.5
            end repeat
            
            # insert new title text, textfield is already active
            set value of text field 1 of sheet 1 of window 1 to page_title_new & ".pdf"
            
            # define folder with folder shortcut
            keystroke "g" using {command down, shift down}
            
            # another sheet
            repeat until exists sheet 1 of sheet 1 of window 1
               delay 0.5
            end repeat
            
            # set path and go
            tell sheet 1 of window 1
               tell sheet 1
                  set value of combo box 1 to G_save_folder_path
                  click button "Go"
                  delay 0.5
               end tell
               # save
               click button "save"
            end tell
            delay 1 -- give the save process a second
            
            # close tab
            click menu item "Close Tab" of menu "File" of menu bar 1
            delay 1 -- give the close tab a second
            
         end tell -- process Safari
         
      end tell -- System Events
      
   end tell -- Safari app
   
end saveToPDF

Hey Anton,

One minor suggestion. Personally I wouldn’t use UI-Scripting to close the front Tab when it can be done directly.

tell application "Safari"
   tell front window
      close current tab
   end tell
end tell

-Chris

You could use just AppleScript for the “scrollPage” routine instead of JavaScript.

Like so

property G_scroll_speed : 0.75

on scrollPage()
	local myScroll, myScrollArea, maxScroll, mySize, CurrScroll, G_scroll_y
	tell application "System Events"
		tell process "Safari"
			set myScrollArea to scroll area 1 of group 1 of group 1 of scroll area 2 of tab group 1 of splitter group 1 of window 1
			--set mySize to item 2 of ((value of attribute "AXSize" of myScrollArea) as list)
			set G_scroll_y to item 2 of ((value of attribute "AXSize" of myScrollArea) as list)
			set myScroll to scroll bar 1 of myScrollArea
			set value of myScroll to 0
			set maxScroll to item 4 of ((value of attribute "AXFrame" of UI element 1 of myScrollArea) as list)
			if maxScroll < 0 then -- for some reason values greater than 2^16 come back as negative
				set maxScroll to 65536 + maxScroll - (item 2 of ((value of attribute "AXFrame" of UI element 1 of myScrollArea) as list))
			end if
			set maxScroll to maxScroll - G_scroll_y
			set maxScroll to G_scroll_y / maxScroll
			set CurrScroll to 0
			repeat until CurrScroll ≥ 1
				set CurrScroll to CurrScroll + maxScroll
				if CurrScroll > 1 then set CurrScroll to 1
				set value of myScroll to CurrScroll
				delay G_scroll_speed
			end repeat
		end tell
	end tell
end scrollPage

also you can use properties instead of globals like so

# declare variables

property G_scroll_speed : 0.75
property G_url_list : {"https://www.morganscloud.com/2019/03/28/4-vital-anchor-selection-criteria-and-a-review-of-spade/", "https://www.morganscloud.com/2019/03/14/15-steps-to-getting-securely-anchored/", "https://www.morganscloud.com/2017/03/22/come-alongside-docking-made-easy/", "https://www.morganscloud.com/2017/03/24/10-tips-to-make-coming-alongside-docking-easy/"}
property G_url_list_counter : 1
property G_save_folder_path : "~/Desktop/"

Here is also a much cleaner and faster “saveToPDF()” routine

# save as PDF and close Safari tab
on saveToPDF()
	tell application "Safari" to activate
	tell application "System Events"
		tell process "Safari"
			
			# export PDF
			click menu item "Export as PDF…" of menu "File" of menu bar 1
			
			repeat until exists sheet 1 of window 1
				delay 0.5
			end repeat
			
			# define folder with folder shortcut
			keystroke "d" using {command down} -- gets immediately to Desktop folder
			
			click button "save" of sheet 1 of window 1
			
			repeat while exists sheet 1 of window 1
				tell sheet 1 of window 1 -- check if "Replace Existing" dialog appears
					if exists sheet 1 then tell sheet 1 to click button "Replace"
				end tell
				delay 0.5
			end repeat
		end tell
	end tell -- System Events -- process Safari
	
	# close tab
	tell front window of application "Safari" to close current tab
	delay 1 -- give the close tab a second
end saveToPDF

and here is a better init routine using an explicit ‘run’ command

on run
	tell application "System Events"
		if exists menu "Develop" of menu bar 1 of process "Safari" then
			tell me to urlLoop(G_url_list)
		else
			beep
			display alert "You must have the Develop Menu visible to use this script!" giving up after 5
		end if
	end tell
end run

Wow that looks much better indeed!

The scrolling in JS looks a bit simpler, but I will play around with your scroll anyway to learn a bit more.
Thanks again Robert for the other suggestions I will try them out soon. :partying_face:
Also :pray: to Fredrik71 and ccstone, Macscripter seems a cool place

P.S.
Now if I could only load those lazy images without having to scroll…
I tried to ask here: lazySizes.loader.unveil(imgElem) Method does not show all images in page · Issue #981 · aFarkas/lazysizes · GitHub and found this: Method to trigger a lazysizes initiation? · Issue #205 · aFarkas/lazysizes · GitHub but for now no luck :expressionless:


Just as a side note: Safari seems to remember the last chosen save path so actually there is even no need to set that.

I am saving a modified page title with an index, and since the text field is already highlighted all I need to do is set the value and click the save button.

on saveToPDF()
	tell application "Safari" to activate
	tell application "System Events"
		tell process "Safari"
			
			# export PDF
			click menu item "Export as PDF…" of menu "File" of menu bar 1
			
			# wait for the sheet (export PDF dialog)
			repeat until exists sheet 1 of window 1
				delay 0.5
			end repeat
			
			# textfield is already active, filled with the page title
			# optional: insert new title text
			set page_title_new to "my new page title"
			set value of text field 1 of sheet 1 of window 1 to page_title_new & ".pdf"
			
			# save to the last location
			click button "save" of sheet 1 of window 1
			delay 1 -- give the save process a second
			beep
			say "PDF " & G_url_list_counter & " saved"
			
		end tell -- process Safari
	end tell -- System Events
	
	# close tab
	tell front window of application "Safari" to close current tab
	delay 1 -- give the close tab a second
	
end saveToPDF
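
As a side sketch (not what the script above does): the indexed file name could also be built in one JavaScript helper and handed back to AppleScript through do JavaScript. The name buildPdfName and the policy of replacing "/" and ":" are assumptions for illustration; the split on the site name mirrors the "text item 1" logic of the earlier script.

```javascript
// Hypothetical helper: builds "<counter>. - <title>.pdf" from a page title.
function buildPdfName(counter, fullTitle, siteSuffix) {
	let title = fullTitle;
	// keep only the part before the site name, like "text item 1" in the AppleScript version
	const idx = siteSuffix ? fullTitle.indexOf(siteSuffix) : -1;
	if (idx !== -1) {
		title = fullTitle.slice(0, idx);
	}
	// assumption: "/" and ":" are replaced because they act as path separators on macOS
	title = title.replace(/[\/:]/g, '-').trim();
	return counter + '. - ' + title + '.pdf';
}
```

In Safari the second argument would be document.title, and the third the same string as G_site_name.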

The difficulty of simply turning a webpage into a decluttered pdf has been a headache of mine for years. The closest thing I’ve found to a solution is Brett Terpstra’s gather-cli, which produces a markdown file from a webpage, and does a pretty good job of decluttering. Not perfect. It doesn’t work well on certain domains. Also downloading and embedding images can still be a problem when converting to pdf.

Interesting the Gather library, I will check it out later!

Yeah de-cluttering a webpage for print is not easy, when you have an array of mixed websites to automate through I imagine it quickly becomes a nightmare.
What I always try first is Safari’s Reader View, CMD + SHIFT + R I believe. Firefox is even better as you can customize fonts, colors, page width, line height etc. I use it a lot on my rugged 4-inch phone where every pixel counts, although on mobile you don’t get the format controls.
These readers weed out comments sections, which is important in my case.

When you have pages with the same layout / template like a typical Wordpress website it becomes easier, CSS to the rescue to hide and override rules. I did that with JS in my AppleScript earlier, create a style element, fill it with CSS textcontent and append it to the head.
The other day I found a Safari extension that lets you inject CSS (and also JS, but I prefer to do that in my Macscript as I feel it gives more control).
It’s a lot easier to use especially when experimenting to find the rules to override.
https://phili.pe/posts/introducing-page-extender-for-safari/
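
The style injection described above can be sketched as a small function. This is a minimal sketch, not the exact code from the script: injectCss is a hypothetical name, and the document is passed in as a parameter so the function can be tried outside the browser too.

```javascript
// Hypothetical helper: append a <style> element with the given CSS to the page head.
function injectCss(doc, cssText) {
	const styleEl = doc.createElement('style');
	// textContent stores the CSS verbatim
	styleEl.textContent = cssText;
	doc.head.appendChild(styleEl);
	return styleEl;
}

// In Safari (for example through do JavaScript) it would be called as:
// injectCss(document, '.wpd-form-wrap{display:none}');
```

Returning the element makes it possible to remove the override again later with styleEl.remove().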

Before turning to AppleScript, I used the command line tool WKHTMLTOPDF https://wkhtmltopdf.org/
I tried out most of the options, even managed to get the --cookie-jar working as the pages I want to convert to PDF are behind a paywall. WKHTMLTOPDF is a robust solution, you can inject CSS and JS and there are a lot of options but being a headless navigator some things were missing for me, most important a way to set the viewport size and emulate a device.
btw I am rendering PDF’s to read offline on my 4-inch phone and a 15 year old iPad which I use solely for studying.

Safari’s Responsive Design Mode is great because it applies @media CSS queries, responsive layouts and images etc. So does Firefox and I guess Chrome.
Anyway I much prefer the way Safari renders PDF’s with the Save to PDF… it never seems to “cut” images over pages, although you could fix that later removing margins making one long PDF.

My main problem is - WAS because I found a hack - lazy loaded images!! That’s why I needed to scroll, which takes too much time especially to go through hundreds of comments just for an image attachment once in a while.
This Wordpress plugin uses a fork of the lazySizes library, which does have a lazysizes.loader.unveil() method, but for some reason I could not call it in this WP version even though I can console.log(window.lazySizes) the object and its methods…

So what I did was take a good look at the HTML nodes and saw that a <noscript> tag was rendered with a full res image node in its textContent. I then swapped out the 2 nodes with a few lines of JS and bam, all the images are loaded!

I tried loading the pages with JS disabled before which worked loading the lazy images from the <noscript> tag but since the comments have some Ajax pagination I can’t disable JS. Now I still need to implement triggering my myUnveil() hack in case the pagination does occur (> 100 comments it seems)
An approach could be scrolling the webpage only when the pagination is present.
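
A rough sketch of that idea, under stated assumptions: the selector below is made up (the real class or id of the pagination control has to be looked up in the page's HTML), and needsScrolling and isAtBottom are hypothetical helper names. The bottom test is the same comparison as in the scroll loop of the posted script, written as a pure function.

```javascript
// Assumption: placeholder selector, not taken from the actual page.
const PAGINATION_SELECTOR = '.comments-load-more';

// True when a pagination control is present, i.e. scrolling is still needed.
function needsScrolling(doc, selector) {
	return doc.querySelector(selector) !== null;
}

// True when the viewport has reached the page bottom, within a small tolerance in pixels.
function isAtBottom(scrollY, viewportHeight, pageHeight, tolerance) {
	return scrollY + viewportHeight + tolerance >= pageHeight;
}
```

In Safari, needsScrolling(document, PAGINATION_SELECTOR) could be returned from do JavaScript so the AppleScript only calls scrollPage() when it comes back true.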

Very happy to get this project almost finished, thanks Macscripters for the help!

/*
<a href="#">
	 <img class="lazyload" src="blurred.jpg" />
	 <noscript>
		  <img src="sharp.jpg" />
	 </noscript>
</a>
-- becomes:
<a href="#">
	 <img src="sharp.jpg" />
</a>
*/
function myUnveil() {
	var images = document.querySelectorAll('img.lazyload');
	for (let i = 0; i < images.length; i++) {
		let image = images[i];
		// the <noscript> fallback is expected right after the lazy image
		let noscript = image.nextElementSibling;
		// skip images without a <noscript> sibling instead of throwing
		if (!noscript || noscript.tagName !== 'NOSCRIPT') continue;
		// with JS enabled, the <noscript> content is available as plain text
		image.parentNode.innerHTML = noscript.textContent;
	}
}
myUnveil();