Can AppleScript get list of BBEdit found sub patterns?

A text file which I’m processing in BBEdit includes the line

“19/12/22 Subscription MON-SAT 02/01/23-31/01/23 (Qty:26) 15/12/22 Credit for 12/12/22-15/12/22 (Qty:4)”

and I want to extract the dates and other details for processing in another app.

The following bit of AppleScript uses a find pattern with 5 sub patterns to successfully identify the required information, but I can’t see how to extract the 5 sub patterns. I’d like to put the found sub patterns into a list. Is this possible with BBEdit scripting?

If BBEdit doesn’t allow this then I guess I’ll have to use its replace verb to surround each sub pattern with unique markers and then extract the edited line for subsequent parsing. But extracting to a list would be much neater I think.

tell application "BBEdit"
	activate
	tell text of front text document
		set results to find "(\\d\\d/\\d\\d/\\d\\d) ([^\\d]+)(\\d\\d/\\d\\d/\\d\\d)-(\\d\\d/\\d\\d/\\d\\d) ([^ ]+)" options {search mode:grep, showing results:false, returning results:true}
	end tell
end tell

Any advice would be much appreciated.

Hey @jbh

I’m not sure if BBEdit is a requirement for you, but if this was me I wouldn’t try and script BBEdit, but instead just access the file directly to get the data you’re looking for. This can be done with a single line of code.

set theDates to do shell script "grep -Eo '[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2}' " & quoted form of POSIX path of "Macintosh HD:Users:YourUser:Desktop:testing.txt"

I know I didn’t match your pattern exactly, so you’ll need to modify it to your needs. Hopefully you can see the basic idea and be able to tweak the pattern for your exact needs.
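For anyone tweaking the pattern, it may be quicker to try the grep part in Terminal first. The snippet below is only a sketch: it feeds the sample line from the question through the same grep options rather than reading a real file, and the variable names are just for illustration.

```shell
# Sample line from the question; in practice grep would read your file instead.
line='19/12/22 Subscription MON-SAT 02/01/23-31/01/23 (Qty:26) 15/12/22 Credit for 12/12/22-15/12/22 (Qty:4)'

# -E enables extended regular expressions; -o prints each match on its own line.
dates=$(printf '%s\n' "$line" | grep -Eo '[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2}')
printf '%s\n' "$dates"
```

Each date comes back on a separate line, which is why the result needs splitting into a list on the AppleScript side.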

Once you’ve got the results of the grep, it will be a string so you’ll want to convert that to a list with the following code.

set AppleScript's text item delimiters to return
set dateList to text items of (theDates as string)
set AppleScript's text item delimiters to {""}
# dateList is your list of dates from the supplied file

One last note on the grep line of code. If grep doesn’t find any matches, it exits with status 1, and do shell script treats any non-zero exit status as an error. So you may want to surround it in a try block like in the example below.

try
     set theDates to do shell script "grep -Eo '[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2}' " & quoted form of POSIX path of "Macintosh HD:Users:YourUser:Desktop:testing.txt"
on error
     set theDates to ""
end try
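The exit status behind that error can be checked directly in Terminal. This is a minimal sketch with made-up input that contains no dates; the `status` variable is just for illustration.

```shell
# Input with no dates, so grep has nothing to print.
status=0
printf 'no dates on this line\n' | grep -Eo '[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2}' || status=$?

# grep reports 0 when it selected something and 1 when it selected nothing.
echo "grep exit status: $status"
```

A status of 1 here is the "no matches" case, which is what the try block above is guarding against.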

I hope this helps

Thank you for that suggestion. The particular line of text which interests me is part of a document which also contains other dates, so the posted BBEdit search pattern targets only the relevant information. I’m fairly comfortable using BBEdit’s pattern playground to do this. If needs be I’ll extract that line of text and parse it using grep as per your suggestion.

But I would dearly like to get the sub patterns directly. They’re already known to BBEdit and it would save me tweaking the grep code.

hey @jbh,
Below is the code that does what I believe you’re looking for in BBEdit.

tell application "BBEdit"
	set documentRef to front text document
	set results to find "(\\d\\d/\\d\\d/\\d\\d) ([^\\d]+)(\\d\\d/\\d\\d/\\d\\d)-(\\d\\d/\\d\\d/\\d\\d) ([^ ]+)" searching in documentRef options {search mode:grep, showing results:false, returning results:true} as record
	
	set dateList to {}
	if found of results then
		set foundMatches to found matches of results
		repeat with aMatch in foundMatches
			set end of dateList to (match_string of aMatch)
		end repeat
	end if	
end tell
dateList

Best of luck!

You can also get the offsets of the found text using the returning results option. This may make it easier to work with the text in between the dates.

tell application "BBEdit"
	find "(\\d\\d/\\d\\d/\\d\\d)" searching in text 1 of text document 1 options {search mode:grep, wrap around:true, returning results:true}
end tell

-- partial result
--> {found:true, found matches:{{result_document:text document "untitled text 8" of application "BBEdit", result_kind:note_kind, start_offset:64, end_offset:72, result_line:3, message:"“19/12/22 Subscription MON-SAT 02/01/23-31/01/23 (Qty:26) 15/12/22 Credit for 12/12/22-15/12/22 (Qty:4)”", …}

Update: If you find the repetitive nature of the regex annoying (as I do), there is an alternative way to express it. The number inside the braces is how many times the preceding item (here \d) must repeat.

"(\\d{2}/\\d{2}/\\d{2})"
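If you want to convince yourself that the two spellings are equivalent, a quick comparison in Terminal is one way. This is a sketch using grep -E rather than BBEdit, so digits are written as [0-9] instead of \d; the variable names are only illustrative.

```shell
line='19/12/22 Subscription MON-SAT 02/01/23-31/01/23 (Qty:26) 15/12/22 Credit for 12/12/22-15/12/22 (Qty:4)'

# Long form: each digit written out.
long=$(printf '%s\n' "$line" | grep -Eo '[0-9][0-9]/[0-9][0-9]/[0-9][0-9]')
# Short form: {2} repeats the preceding [0-9] twice.
short=$(printf '%s\n' "$line" | grep -Eo '[0-9]{2}/[0-9]{2}/[0-9]{2}')

if [ "$long" = "$short" ]; then echo "same matches"; fi
```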

You can use the “grep substitution” command, which returns the contents of capture groups.

tell application "BBEdit"
	activate
	tell text of front text document
		set results to find "(\\d\\d/\\d\\d/\\d\\d) ([^\\d]+)(\\d\\d/\\d\\d/\\d\\d)-(\\d\\d/\\d\\d/\\d\\d) ([^ ]+)" options {search mode:grep, starting at top:true}
	end tell
	set captureList to {grep substitution of "\\1", grep substitution of "\\2", grep substitution of "\\3", grep substitution of "\\4", grep substitution of "\\5"}
end tell
return captureList --> {"19/12/22", "Subscription MON-SAT ", "02/01/23", "31/01/23", "(Qty:26)"}

Reference: this thread

A quick question… do you only need the first part of the line? i.e. the ‘subscription’ but not the ‘credit for’

Find and parse: 
19/12/22 Subscription MON-SAT 02/01/23-31/01/23 (Qty:26)

Ignore: 
15/12/22 Credit for 12/12/22-15/12/22 (Qty:4)

Kia ora @Mockman, @peavine, @Shai1 and thank you all.

Yes I do need to get all the sub patterns, i.e. from the whole line. Sometimes there will be only one set, sometimes two.

When I’ve used the search pattern manually (i.e. not using AppleScript) to edit files with replace, both sets of five sub patterns have been changed, and so I was hoping there might be a way of accessing both of them with AppleScript.

I can get the number of sets (a count of found matches), and thought that it would then be easy to apply peavine’s grep suggestion to each of them. But I am now conceding defeat and throwing it back to those who are wiser.

FWIW My clumsy effort below throws up a couple of 100002 errors in BBEdit’s Search Results window (I haven’t discovered how to decipher the errors) and the resulting list has a repeat from the first matched set instead of getting the sub patterns from the second set.

set searchPattern to "(\\d\\d/\\d\\d/\\d\\d) ([^\\d]+)(\\d\\d/\\d\\d/\\d\\d)-(\\d\\d/\\d\\d/\\d\\d) ([^\\)]+\\))"

tell application "BBEdit"
	tell front document
		set x to find searchPattern searching in text 1 options {search mode:grep, returning results:true, starting at top:true}
	end tell
	set foundMatches to found matches of x
	set n_count to count of foundMatches
	
	set theList to {}
	repeat with j from 1 to n_count
		log "=====" & j & "======"
		set thisText to (match_string of item j of foundMatches)
		log thisText
		set result to find searchPattern searching in thisText options {search mode:grep}
		
		set captureList to {grep substitution of "\\1", grep substitution of "\\2", grep substitution of "\\3", grep substitution of "\\4", grep substitution of "\\5"}
		copy captureList to end of theList
	end repeat
end tell
theList --> {{"19/12/22", " Subscription MON-SAT ", "02/01/23", "31/01/23", "(Qty:26)"}, {"19/12/22", " Subscription MON-SAT ", "02/01/23", "31/01/23", "(Qty:26)"}}

I am getting the impression that BBEdit’s find command only works on document text and not on arbitrary strings, and that this may be what brings about the error messages.

So here is a script that runs the search, creates a temporary document and then cycles through the returned results by setting the temp doc to only the matching text of each result. It then runs the search on that text alone and then captures the subpatterns to a list of lists.

Note that to avoid some of the frustration of re-running each iteration of the script, I begin by setting the front document to the source text. I’d suggest running it on a blank document until it has been edited. FWIW, I tried adding more matching strings to the source document and the corresponding subpatterns were added to the capture list.

set searchPattern to "(\\d{2}/\\d{2}/\\d{2}) ([^\\d]+)(\\d{2}/\\d{2}/\\d{2})-(\\d{2}/\\d{2}/\\d{2}) ([^\\)]+\\))"
set sourceText to "19/12/22 Subscription MON-SAT 02/01/23-31/01/23 (Qty:26) 15/12/22 Credit for 12/12/22-15/12/22 (Qty:4)"

tell application "BBEdit"
	activate
	set text 1 of project window 1 to sourceText
	set returnedResults to find searchPattern searching in text 1 of project window 1 options {search mode:grep, returning results:true, starting at top:true}
	
	set capList to {} -- where the subpatterns will be stored
	set tempWindow to make new text document with properties {contents:sourceText as text}
	
	-- cycle through found matches
	set fmAll to found matches of returnedResults
	
	repeat with fm in fmAll -- each found match
		set text of tempWindow to match_string of fm -- set contents of temporary window to match string
		set ms to find searchPattern searching in text 1 of tempWindow options {search mode:grep, starting at top:true}
		set end of capList to {grep substitution of "\\1", grep substitution of "\\2", grep substitution of "\\3", grep substitution of "\\4", grep substitution of "\\5"}
		
	end repeat
end tell

ms
--> {found:true, found object:characters 1 thru 45 of text document 1 of application "BBEdit", found text:"15/12/22 Credit for 12/12/22-15/12/22 (Qty:4)"}
capList
--> {{"19/12/22", "Subscription MON-SAT ", "02/01/23", "31/01/23", "(Qty:26)"}, {"15/12/22", "Credit for ", "12/12/22", "15/12/22", "(Qty:4)"}}

NB As ms is set within the repeat loop, the result above is for the last time through. Perhaps its found text can be used, obviating the need for the temp document.

Also, the loop find should not have the returning results option turned on.

Thank you! That script does exactly what I want. I’m happy to use the temp document. This has been a great learning experience for me.

Hi.

Another way, without the additional windows would be:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

tell application "BBEdit" to set fullText to text of document 1
set searchPattern to "(\\d{2}/\\d{2}/\\d{2}) ([^\\d]+)(\\d{2}/\\d{2}/\\d{2})-(\\d{2}/\\d{2}/\\d{2}) (\\([^\\)]+\\))"

set fullText to current application's class "NSString"'s stringWithString:(fullText)
set regex to current application's class "NSRegularExpression"'s regularExpressionWithPattern:(searchPattern) options:(0) |error|:(missing value)
set matches to regex's matchesInString:(fullText) options:(0) range:({0, fullText's |length|()})

set output to {}
repeat with thisMatch in matches
	set fullMatch to (fullText's substringWithRange:(thisMatch's range())) as text
	set subMatches to {}
	repeat with i from 1 to (thisMatch's numberOfRanges()) - 1
		set end of subMatches to (fullText's substringWithRange:(thisMatch's rangeAtIndex:(i))) as text
	end repeat
	set end of output to {fullMatch:fullMatch, subMatches:subMatches}
end repeat
return output
-->{{fullMatch:"19/12/22 Subscription MON-SAT 02/01/23-31/01/23 (Qty:26)", subMatches:{"19/12/22", "Subscription MON-SAT ", "02/01/23", "31/01/23", "(Qty:26)"}}, {fullMatch:"15/12/22 Credit for 12/12/22-15/12/22 (Qty:4)", subMatches:{"15/12/22", "Credit for ", "12/12/22", "15/12/22", "(Qty:4)"}}}

You could also use the ‘bbfind’ command line tool (if installed) that comes with BBEdit

set searchPattern to "(\\d{2}/\\d{2}/\\d{2}) \\D+ (\\d{2}/\\d{2}/\\d{2})-(\\d{2}/\\d{2}/\\d{2}) ([^\\)]+\\))"
set sourceText to "19/12/22 Subscription MON-SAT 02/01/23-31/01/23 (Qty:26) 15/12/22 Credit for 12/12/22-15/12/22 (Qty:4)"

set filePath to ((path to desktop folder) as text) & "testing.txt"
try
	set testFile to open for access file filePath with write permission
on error
	return
end try
set eof testFile to 0
write sourceText to testFile
close access testFile

set theDates to do shell script "/usr/local/bin/bbfind -g -x " & quoted form of searchPattern & " " & quoted form of POSIX path of filePath
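If bbfind isn’t installed, the stock grep and sed can probably get you the sub patterns too. The following is only a sketch of that idea, not something from BBEdit: it works on the sample line, writes digits as [0-9] because POSIX extended regex has no \d, and the "|" separator and variable names are arbitrary choices.

```shell
line='19/12/22 Subscription MON-SAT 02/01/23-31/01/23 (Qty:26) 15/12/22 Credit for 12/12/22-15/12/22 (Qty:4)'

# One date, then the full five-group pattern built from it.
d='[0-9]{2}/[0-9]{2}/[0-9]{2}'
pattern="($d) ([^0-9]+)($d)-($d) (\\([^)]+\\))"

# Step 1: grep -o puts each whole match on its own line.
# Step 2: sed re-applies the pattern to each match and joins the five groups with "|".
fields=$(printf '%s\n' "$line" | grep -Eo "$pattern" | sed -E "s#^$pattern\$#\\1|\\2|\\3|\\4|\\5#")
printf '%s\n' "$fields"
```

This should print one line per matched set, with the five sub patterns separated by "|", which can then be split with text item delimiters after a do shell script call.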

Hi, that’s magic Nigel. Thank you…

@robertfern
Thanks for pointing out the bbfind tool.