getElementsByClassName() and getElementsByTagName() without a browser

Perhaps someone has already tried to do what I am trying to do: get the text of HTML elements without involving any browser.

If I knew how the getElementsByTagName('someTagName') and getElementsByClassName('someClassName') functions are implemented in Safari, I could reproduce them without the browser.

My understanding is that these functions 1) find all parts of the HTML that start and end with the tag (or class) name, and 2) strip the tag itself from the beginning and end of each part.

The simple code I have started below only does the 2nd step.


use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

set theHTML to do shell script "curl 'https://www.google.gr'"
set elementsInnerStrings to my getElementsByTagName(theHTML, "script")

on getElementsByTagName(theHTML as text, theTag as text)
	set ATID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {"<" & theTag & " ", "</" & theTag}
	set tagStrings to text items of theHTML
	set AppleScript's text item delimiters to ATID
	if (tagStrings as list) is {} then return {}
	set tempList to {}
	repeat with tagString in tagStrings
		if tagString does not start with ">" and tagString does not start with "<!" then set end of tempList to contents of tagString
	end repeat
	return tempList
end getElementsByTagName

on getElementsByClassName(theHTML as text, theClass as text)
	-- similar stuff
end getElementsByClassName

My question is: how can I do the same efficiently (for example, without repeat loops)?

Look into using NSXMLDocument.

Thanks for the tip, @technomorph. This really works:


use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set theHTML to do shell script "curl 'https://www.google.com'"
set elementsInnerStrings to my getElementsByTagName(theHTML, "script")

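on getElementsByTagName(theHTML, tagName) -- returns the text content of every <tagName> element
	-- NSXMLDocumentTidyHTML cleans up typical HTML so that it can be parsed as XML
	set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theHTML options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
	if theXMLDoc is missing value then error (theError's localizedDescription() as text)
	-- "//tagName" matches that element anywhere in the document
	set {theMatches, theError} to (theXMLDoc's nodesForXPath:("//" & tagName) |error|:(reference))
	if theMatches is missing value then error (theError's localizedDescription() as text)
	return (theMatches's valueForKey:"stringValue") as list
end getElementsByTagName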
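
The getElementsByClassName stub from the first post could probably be filled in the same way. This is only a sketch: it assumes that NSXMLDocument's XPath support accepts the usual XPath 1.0 idiom for matching one class among several, and the sample HTML is made up for illustration.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

on getElementsByClassName(theHTML, className) -- returns the text content of elements carrying the class
	set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theHTML options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
	if theXMLDoc is missing value then error (theError's localizedDescription() as text)
	-- Pad @class with spaces so that "info" matches class="info big" but not class="information"
	set theXPath to "//*[contains(concat(' ', normalize-space(@class), ' '), ' " & className & " ')]"
	set {theMatches, theError} to (theXMLDoc's nodesForXPath:theXPath |error|:(reference))
	if theMatches is missing value then error (theError's localizedDescription() as text)
	return (theMatches's valueForKey:"stringValue") as list
end getElementsByClassName

-- Hypothetical sample input, not taken from any real page
set sampleHTML to "<html><body><div class=\"info big\">A</div><div class=\"information\">B</div></body></html>"
my getElementsByClassName(sampleHTML, "info")
```

The class name is pasted into the XPath string as-is, so this sketch is only suitable for simple names without quote characters.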

Note that you can also create the NSXMLDocument directly from an NSURL.
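
A minimal sketch of that variant, which skips the curl step by using NSXMLDocument's initWithContentsOfURL:options:error: initializer. The handler name getElementsByTagNameFromURL is made up for illustration, and the download happens synchronously inside the initializer.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

on getElementsByTagNameFromURL(urlString, tagName) -- hypothetical handler name
	set theURL to current application's NSURL's URLWithString:urlString
	-- Loads and tidies the page in one step
	set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
	if theXMLDoc is missing value then error (theError's localizedDescription() as text)
	set {theMatches, theError} to (theXMLDoc's nodesForXPath:("//" & tagName) |error|:(reference))
	if theMatches is missing value then error (theError's localizedDescription() as text)
	return (theMatches's valueForKey:"stringValue") as list
end getElementsByTagNameFromURL

my getElementsByTagNameFromURL("https://www.google.com", "title")
```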

Here are some other examples of XPaths.
Note that the parts in [...] are filters (predicates).

static NSString* PlaylistNameXPath = @".//title";
// DJ K-Tel - Session #104 - 20th November 2021
//
//static NSString* PlaylistNameXPath = @"/html/meta[@property='og:title']";

static NSString* PlaylistTracksXPath = @".//div[@class='info']";

static NSString* PlaylistTracksTitleXPath = @".//div[@class='sub']";
// Who'll Stop The Rain?

static NSString* PlaylistTracksArtistXPath = @".//div[@class='head']";
// Creedence Clearwater Revival
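
The bracketed filters can be tried from AppleScript as well. The sketch below passes a whole XPath to a small helper. The handler name textForXPath and the sample HTML are invented for illustration, and the nesting of the "head" and "sub" divs inside the "info" div is an assumption about the page, not something confirmed in this thread.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

on textForXPath(theHTML, theXPath) -- hypothetical helper: text content of every node matching theXPath
	set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithXMLString:theHTML options:(current application's NSXMLDocumentTidyHTML) |error|:(reference)
	if theXMLDoc is missing value then error (theError's localizedDescription() as text)
	set {theMatches, theError} to (theXMLDoc's nodesForXPath:theXPath |error|:(reference))
	if theMatches is missing value then error (theError's localizedDescription() as text)
	return (theMatches's valueForKey:"stringValue") as list
end textForXPath

-- Assumed page structure, for illustration only
set sampleHTML to "<html><body><div class=\"info\"><div class=\"head\">Creedence Clearwater Revival</div><div class=\"sub\">Who'll Stop The Rain?</div></div></body></html>"
-- [@class='...'] keeps only the div elements whose class attribute equals the given value
set theTitles to my textForXPath(sampleHTML, "//div[@class='info']//div[@class='sub']")
set theArtists to my textForXPath(sampleHTML, "//div[@class='info']//div[@class='head']")
return {theTitles, theArtists}
```

Note that [@class='info'] is an exact match on the whole attribute value, so an element with class="info big" would not be matched by it.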

Quite interesting examples. Many applications are noticeably slow at retrieving objects when filters are applied. One notable example is getting a list of Calendar.app events, which is very slow. I'll think about using ASObjC to improve the speed of some application-related scripts.

Apple events are "expensive".
You may want to look into the EventKit framework (using ASObjC):
https://developer.apple.com/documentation/eventkit?language=objc

I found the same with iTunes and now use ITLibrary (the iTunesLibrary framework).
It is much faster, though it is read-only.
It seems that with EventKit you can create items and modify them.