Extract all numbers between parentheses in a string

I have this dataset:

set theString to "Test 10 (1773)
Test 8 (1600)
Gò Vấp (10062)
Phú Nhuận (898)
Tân Bình (925)
Bình Chánh (78)
Bình Tân (500)
Bình Thạnh (400)
Test 9 (471)
Test 1 (615)
Test 12 (463)
Test 3 (39)
Tân Phú (522)
Thủ Đức (423)
Test 7 (351)
Cần Giờ (9)
Hóc Môn (98)
Test 2 (127)
Test 4 (8)
Test 5 (12)
Test 6 (228)
Test 11 (111)
Củ Chi (112)"

I’m trying to extract only the numbers that are between the parentheses. Does anybody know how to do this?

You may try:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

set theString to "Test 10 (1773)
Test 8 (1600)
Gò Vấp (10062)
Phú Nhuận (898)
Tân Bình (925)
Bình Chánh (78)
Bình Tân (500)
Bình Thạnh (400)
Test 9 (471)
Test 1 (615)
Test 12 (463)
Test 3 (39)
Tân Phú (522)
Thủ Đức (423)
Test 7 (351)
Cần Giờ (9)
Hóc Môn (98)
Test 2 (127)
Test 4 (8)
Test 5 (12)
Test 6 (228)
Test 11 (111)
Củ Chi (112)"

my getNumbersFrom:theString

#===== End of the main code; the handlers follow

on getNumbersFrom:someString
	-- convert the AppleScript string into an NSString, the string class used by ASObjC
	set cocoaString to (current application's NSString's stringWithString:someString)
	-- define the pattern used by the regular expression: digits enclosed in parentheses
	set pattern to "\\(([0-9]+)\\)"
	-- build the regular expression
	set regex to (current application's NSRegularExpression's regularExpressionWithPattern:pattern options:0 |error|:(missing value))
	-- search for the substrings matching the pattern
	-- the range uses the NSString length (UTF-16 units), which may differ from AppleScript's character count
	set matches to (regex's matchesInString:cocoaString options:0 range:{location:0, |length|:(cocoaString's |length|())})
	-- matches is an array of Cocoa match objects, which AppleScript can't use directly
	set newArray to current application's NSMutableArray's new()
	repeat with aMatch in matches
		-- range 1 is the first capture group: the digits without the parentheses
		(newArray's addObject:((cocoaString's substringWithRange:(aMatch's rangeAtIndex:1)) as integer))
	end repeat
	return newArray as list
end getNumbersFrom:

#=====

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Wednesday 8 July 2020 10:35:25

Of course there is also the old-fashioned way:

set theString to "Test 10 (1773)
Test 8 (1600)
Gò Vấp (10062)
Phú Nhuận (898)
Tân Bình (925)
Bình Chánh (78)
Bình Tân (500)
Bình Thạnh (400)
Test 9 (471)
Test 1 (615)
Test 12 (463)
Test 3 (39)
Tân Phú (522)
Thủ Đức (423)
Test 7 (351)
Cần Giờ (9)
Hóc Môn (98)
Test 2 (127)
Test 4 (8)
Test 5 (12)
Test 6 (228)
Test 11 (111)
Củ Chi (112)"

--
set splitted to rest of my decoupe(theString, {"(", ")"})
set theNumbers to {}
repeat with i from 1 to (count splitted) by 2
	set end of theNumbers to (item i of splitted) as integer
end repeat
theNumbers

#===== End of the main code; the handlers follow

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

which works flawlessly as long as there is no anomaly in the original data.
For instance, a single extraneous parenthesis, as in the line:
Test (1 (615)
would break the code.
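
To see why, here is a minimal sketch of what the delimiters do to such a paragraph. The variable names sample and parts are only illustrative; the sample text is the malformed line quoted above.

```applescript
-- Split one malformed paragraph on both kinds of parenthesis
set sample to "Test (1 (615)"
set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, {"(", ")"}}
set parts to text items of sample
set AppleScript's text item delimiters to oTIDs
parts
--> {"Test ", "1 ", "615", ""}
```

A well-formed paragraph yields three items with the number in second place. The stray "(" adds a fourth item, so every position after it is shifted by one and a loop that picks every other item no longer lands on the numbers.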

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Wednesday 8 July 2020 15:11:00

Another suggestion, which returns a list of integers and includes simple error correction.

set theString to "Test 10 (1773)
Test 8 (1600)
Gò Vấp (10062)
Phú Nhuận (898)
Tân Bình (925)
Bình Chánh (78)
Bình Tân (500)
Bình Thạnh (400)
Test 9 (471)
Test 1 (615)
Test 12 (463)
Test 3 (39)
Tân Phú (522)
Thủ Đức (423)
Test 7 (351)
Cần Giờ (9)
Hóc Môn (98)
Test 2 (127)
Test 4 (8)
Test 5 (12)
Test 6 (228)
Test 11 (111)
Củ Chi (112)"

set theString to paragraphs of theString

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"(", ")"}

set numberList to {}

repeat with anItem in theString
	try
		set end of numberList to (text item 2 of anItem) as integer
	on error
		set end of numberList to missing value
	end try
end repeat

set AppleScript's text item delimiters to astid

numberList

With a line modified to “Test (1 (615)”, peavine’s proposal would return 1 instead of 615 for that line:
{1773, 1600, 10062, 898, 925, 78, 500, 400, 471, 1, 463, 39, 522, 423, 351, 9, 98, 127, 8, 12, 228, 111, 112}
Here is a modified version:

set theString to "Test 10 (1773)
Test 8 (1600)
Gò Vấp (10062)
Phú Nhuận (898)
Tân Bình (925)
Bình Chánh (78)
Bình Tân (500)
Bình Thạnh (400)
Test 9 (471)
Test (1 (615)
Test 12 (463)
Test 3 (39)
Tân Phú (522)
Thủ Đức (423)
Test 7 (351)
Cần Giờ (9)
Hóc Môn (98)
Test 2 (127)
Test 4 (8)
Test 5 (12)
Test 6 (228)
Test 11 (111)
Củ Chi (112)"

set theString to paragraphs of theString

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"(", ")"}

set numberList to {}

repeat with anItem in theString
	if (count (get text items of anItem)) = 3 then
		set end of numberList to (text item 2 of anItem) as integer
	else
		set end of numberList to anItem as text
	end if
end repeat

set AppleScript's text item delimiters to astid

numberList
--> {1773, 1600, 10062, 898, 925, 78, 500, 400, 471, "Test (1 (615)", 463, 39, 522, 423, 351, 9, 98, 127, 8, 12, 228, 111, 112}

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Wednesday 8 July 2020 18:01:44

You could make that more efficient by doing a regex search-and-replace:

on getNumbersFrom:someString
	set cocoaString to (current application's NSString's stringWithString:someString)
	set cocoaString to cocoaString's stringByReplacingOccurrencesOfString:"(?m)^.+\\((\\d+).+" withString:"$1" options:(current application's NSRegularExpressionSearch) range:{0, cocoaString's |length|()}
	return paragraphs of (cocoaString as text)
end getNumbersFrom:

Thank you Shane.

It certainly works, but I really don’t understand how it does it.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Thursday 9 July 2020 12:34:14

The (?m) means that ^ and $ should be treated as the beginning and end of each paragraph, not the whole text. The ^.+\( means match all the characters in a paragraph up to, and including, (. The \d+ then matches any following digits, and is in its own parentheses to make it a capture group. The .+ then matches anything after the digits, in this case ).

On the replace side, $1 means capture group 1. Capture group 0 is the whole match, and capture group 1 is the first pattern in parentheses, so the \d+.
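
As a small illustration of the capture group, here is a sketch that applies the same pattern to a single paragraph taken from the data set. The variable names sample and cleaned are only illustrative.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- One paragraph from the data set, as an NSString
set sample to current application's NSString's stringWithString:"Tân Bình (925)"
-- The whole match ("Tân Bình (925)") is replaced by capture group 1 (the digits)
set cleaned to (sample's stringByReplacingOccurrencesOfString:"(?m)^.+\\((\\d+).+" withString:"$1" options:(current application's NSRegularExpressionSearch) range:{0, sample's |length|()}) as text
cleaned
--> "925"
```

Replacing "$1" with "$0" in the template would put the whole match back, leaving the paragraph unchanged.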

Hi. The multi-line mode and ‘at beginning’ symbols, (?m) and ^, appear inert in that pattern; their removal doesn’t affect the outcome.

So it doesn’t. I was over-thinking it.
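
A quick way to check this is to run both patterns on the same text and compare the results. This is only a sketch on two paragraphs from the data set; the variable names are illustrative. The likely reason the prefix is inert is that . never matches a line break, so each match is already confined to one paragraph.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Two paragraphs from the data set, joined by a linefeed
set sample to current application's NSString's stringWithString:("Test 4 (8)" & linefeed & "Test 5 (12)")
set fullRange to {0, sample's |length|()}
-- Shane's original pattern, with multi-line mode and the ^ anchor
set withAnchors to (sample's stringByReplacingOccurrencesOfString:"(?m)^.+\\((\\d+).+" withString:"$1" options:(current application's NSRegularExpressionSearch) range:fullRange) as text
-- The same pattern with (?m) and ^ removed
set withoutAnchors to (sample's stringByReplacingOccurrencesOfString:".+\\((\\d+).+" withString:"$1" options:(current application's NSRegularExpressionSearch) range:fullRange) as text
withAnchors = withoutAnchors
--> true
```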

Thank you both.
I missed that the instruction was treating the string one paragraph at a time.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Thursday 9 July 2020 15:41:31

FWIW, I tested the suggested scripts with Script Geek. The reported results are average times with 10 iterations.

Yvan - post 2 - 38 milliseconds

Yvan - post 3 - 1 millisecond

Peavine - post 4 - 1 millisecond

Shane - post 6 - 2 milliseconds

I revised my script to fix the issue identified by Yvan. If the script cannot determine the number in parentheses, I thought it best simply to return missing value, although this is just a matter of personal preference and the OP can handle this as he/she wants. BTW, I wasn’t certain if it was correct to use the missing value constant, which is defined in the ASLG as:

-- The first two paragraphs are deliberately malformed; both yield missing value
set theString to "Test (10 (1773))
Test (aa)
Test 8 (1600)
Gò Vấp (10062)
Phú Nhuận (898)
Tân Bình (925)
Bình Chánh (78)
Bình Tân (500)
Bình Thạnh (400)
Test 9 (471)
Test 1 (615)
Test 12 (463)
Test 3 (39)
Tân Phú (522)
Thủ Đức (423)
Test 7 (351)
Cần Giờ (9)
Hóc Môn (98)
Test 2 (127)
Test 4 (8)
Test 5 (12)
Test 6 (228)
Test 11 (111)
Củ Chi (112)"

set theString to paragraphs of theString

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"(", ")"}

set numberList to {}

repeat with anItem in theString
	if (count (get text items of anItem)) = 3 then
		try
			set end of numberList to (text item 2 of anItem) as integer
		on error
			set end of numberList to missing value
		end try
	else
		set end of numberList to missing value
	end if
end repeat

set AppleScript's text item delimiters to astid

numberList
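
If missing value is used as the marker, the caller still has to decide what to do with it. As a rough sketch (the short numberList below is made-up sample output, and goodNumbers and badPositions are illustrative names), the valid integers and the positions of the failed paragraphs can be separated like this:

```applescript
-- Sample result: the first two paragraphs could not be processed
set numberList to {missing value, missing value, 1600, 10062}

-- Keep only the entries that really are integers
set goodNumbers to integers of numberList

-- Record which paragraphs failed, so they can be inspected by hand
set badPositions to {}
repeat with i from 1 to (count numberList)
	if item i of numberList is missing value then set end of badPositions to i
end repeat

{goodNumbers, badPositions}
--> {{1600, 10062}, {1, 2}}
```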

The problem is that every script based upon text item delimiters fails, more or less, if there is an oddity in the source data.
Mine in post 3 crashes if there is an extraneous parenthesis in a paragraph.
Yours in post 4 returns a wrong result if the paragraph:
Test 1 (615)
is edited to
Test (1 (615). It returns the value 1, which is not the wanted one.
My edited version of yours “understands” that a paragraph is malformed and returns the odd paragraph itself, allowing us to grab the correct value.

Given the typographical errors that companies like electricity vendors manage to insert in their bills, I wouldn’t bet a cent that an extraneous parenthesis will never surface.
Only scripts using regex are able to return correct values with faulty data.
Shane’s proposal, amended by Marc Anthony, is clearly the correct one.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Thursday 9 July 2020 18:46:43

I disagree that my script fails, more or less, when there is an oddity in the source data. Instead it returns missing value, so that the OP can make whatever adjustments are required. Where does the OP state that the data has anything to do with electricity vendors? I looked but couldn’t find that.

Secondly, my script is easily understood and modified. I suspect that many and perhaps most casual AppleScript users are not skilled with ASObjC and would have no ability to modify Shane’s script to meet their needs. I think this is an important consideration which should not be overlooked.

I tested Shane’s script and it absolutely works well but that doesn’t make it the correct one.

This is a revised version of my script from post 13. If a paragraph of theString cannot be processed, the user is prompted for the correct value.

set theString to "Test (10 (1773))
Test (aa)
Test 8 (1600)
Gò Vấp (10062)
Phú Nhuận (898)
Tân Bình (925)
Bình Chánh (78)
Bình Tân (500)
Bình Thạnh (400)
Test 9 (471)
Test 1 (615)
Test 12 (463)
Test 3 (39)
Tân Phú (522)
Thủ Đức (423)
Test 7 (351)
Cần Giờ (9)
Hóc Môn (98)
Test 2 (127)
Test 4 (8)
Test 5 (12)
Test 6 (228)
Test 11 (111)
Củ Chi (112)"

set theString to paragraphs of theString

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"(", ")"}

set numberList to {}

repeat with anItem in theString
	if (count (get text items of anItem)) = 3 then
		try
			set end of numberList to (text item 2 of anItem) as integer
		on error
			set end of numberList to errorDialog(anItem)
		end try
	else
		set end of numberList to errorDialog(anItem)
	end if
end repeat

set AppleScript's text item delimiters to astid

numberList

on errorDialog(aParagraph)
	display dialog "The following item could not be processed. Please enter the correct value." default answer aParagraph
	set userAnswer to text returned of result
	try
		return userAnswer as integer
	on error
		return userAnswer
	end try
end errorDialog

Did you test before writing that? I was commenting on your script in message #4.
You assumed that your script returns missing value when there is an oddity in the source data, but it may fail to do that.
I have only just discovered the version in message #13, which takes care of the oddity I described.
Between a script which always returns every number enclosed in parentheses and one which may drop some of them, my choice is made.
Two months ago I was unable to write the code posted in message #2.
I learnt a bit and got it to do the job.
Shane proposed a more efficient scheme.
At first I didn’t understand the way it behaves.
After reading the added explanations I think that I understand the beast, and I will try to use the same basis for different cases, like searching for patterns of letters and digits, which I already did with the regex used in message #2.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Friday 10 July 2020 01:38:51

Yvan. After reading your above comment, I tested my script in post 4, my script in post 16, and Shane’s script in post 6. I did not have Shane’s script as modified by Marc Anthony to test, but I will do that if you’ll post the script.

To create an error, I inserted in theString variable the line you suggest in post 5, which is:

Test (1 (615)

My script in post 4 returned the number 1. Shane’s script returned “615”, and my script in post 16 prompted the user for the correct value.

The number 1 is almost certainly not correct. It seems likely that the correct answer is 615, but it’s also possible the correct answer is 1615. Given this uncertainty, IMO, either prompting the user or returning missing value seems a reasonable course of action.

I also tested the scripts by inserting the following in theString variable:

Test (abc)

My script in post 4 returned missing value. Shane’s script returned “Test (abc)”, and my script in post 16 prompted the user. Once again, IMO, prompting the user or returning missing value seems a reasonable course of action.

BTW, the above should under no circumstance be taken as a criticism of Shane’s script, because he obviously could edit his script to do whatever is desired. Also, his suggestion was a simple one to help the discussion, and he could not anticipate that it would be subjected to all of these oddball tests.

Shane’s handler modified as described by Marc Anthony would be:

on getNumbersFrom:someString
	set cocoaString to (current application's NSString's stringWithString:someString)
	set cocoaString to cocoaString's stringByReplacingOccurrencesOfString:".+\\((\\d+).+" withString:"$1" options:(current application's NSRegularExpressionSearch) range:{0, cocoaString's |length|()}
	return paragraphs of (cocoaString as text)
end getNumbersFrom:

The original question asked to extract numerical values enclosed between parentheses.
615 matches the requirement; 1, like 1615, doesn’t.

We were not asked to return something if there is no item matching the request in a given paragraph.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Friday 10 July 2020 15:12:40

Thanks Yvan. I tested Shane’s script as modified by Marc Anthony. The time reported by Script Geek was 2 milliseconds.

With the following paragraph in theString variable:

Test (1 (615)

Shane’s script as amended by Marc Anthony returned (or more accurately included in the returned list) “615”.

And, with the following paragraph in theString variable:

Test (abc)

Shane’s script as amended by Marc Anthony returned (or more accurately included in the returned list) “Test (abc)”.
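
For anyone who wants to reproduce these two cases, here is a minimal sketch that runs the amended pattern on just the two odd paragraphs and then converts the result to integers, using missing value where the conversion fails. The try block and the variable names are illustrative additions, not part of Shane’s handler.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- The two odd paragraphs discussed above
set sample to current application's NSString's stringWithString:("Test (1 (615)" & linefeed & "Test (abc)")
set cleaned to (sample's stringByReplacingOccurrencesOfString:".+\\((\\d+).+" withString:"$1" options:(current application's NSRegularExpressionSearch) range:{0, sample's |length|()}) as text
set cleanedParagraphs to paragraphs of cleaned
--> {"615", "Test (abc)"}

-- Optional post-processing: a paragraph the pattern did not match is left as text,
-- so coercing it to an integer fails and it is replaced by missing value
set numberList to {}
repeat with aParagraph in cleanedParagraphs
	try
		set end of numberList to (contents of aParagraph) as integer
	on error
		set end of numberList to missing value
	end try
end repeat
numberList
--> {615, missing value}
```

The greedy .+ backtracks from the end of the paragraph, so it settles on the last "(" that is followed by digits, which is why 615 and not 1 is captured.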

Anyways, the OP has a lot of good options and can decide on the script that best fits his/her needs.