Extract certain numbers from text

I have an AppleScript variable that contains the following:

I need to parse through the text and return the following list: {“6963”, “6973”, “953489”}

Lots of ways to do this. You can use text item delimiters, shell scripts, scripting additions, etc., etc., etc. Here’s one way:

set x to " Scale Ordering Quantity Weight List Price code per carton  (psi) (lbs.) (each) 
 Series DPG1-2 
 Dial Size: 2\"?Connection: 1/4\" NPT 
 0 ? 15 0615600 40 8.2 $27.50 
 0 ? 30 0615601 40 8.2 93.20 
 0 ? 60 0615602 40 8.2 ?6963? 
 0 ? 100 0615603 40 8.2 ?6973? 
 0 ? 160 0615604 40 8.2 ?953489? 
 0 ? 200 0615605 40 8.2 27.50 
 0 ? 300 0615606 40 8.2 27.50 "

set the_data to {}
repeat with this_row in (paragraphs of x)
	set this_row to (contents of this_row)
	if this_row contains "?" then set end of the_data to text ((offset of "?" in this_row) + 1) thru ((offset of "?" in this_row) - 1) of this_row
end repeat
return the_data
-->{"6963", "6973", "953489"}

Jon

Here is another:

on BreakAtTokens(s, t1, t2)
	--
	--	Assumes t1 and t2 are ONLY used as starting and
	--	ending tokens in the string.
	--
	set oldtids to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to t1
		set s to s's text items
		set AppleScript's text item delimiters to t2
		set s to (s as string)'s text items
	on error m number n from f to t partial result p
		set AppleScript's text item delimiters to oldtids
		error m number n from f to t partial result p
	end try
	set AppleScript's text item delimiters to oldtids
	set a to {}
	repeat with i from 2 to s's length by 2 -- even items
		set a's end to s's item i
	end repeat
	return a
end BreakAtTokens

BreakAtTokens(str, "�", "�") --> {"6963", "6973", "953489"}
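
To see why taking the even-numbered items works, here is a rough step-by-step trace of what the handler does internally. The tag characters in this thread did not survive the encoding, so "<" and ">" are stand-ins (an assumption, not the original tokens), and the variable names sample, step1, step2, step3 and found are made up for the illustration:

```applescript
-- "<" and ">" are stand-in tokens (assumed); substitute the real tag characters.
set sample to "8.2 <6963> and <6973>"
set oldtids to AppleScript's text item delimiters

-- Split on the opening token: every item after the first begins with a tagged value.
set AppleScript's text item delimiters to "<"
set step1 to text items of sample
--> {"8.2 ", "6963> and ", "6973>"}

-- Rejoin using the closing token, so both tokens are now the same character.
set AppleScript's text item delimiters to ">"
set step2 to step1 as text
--> "8.2 >6963> and >6973>"

-- Split again: untagged text lands on odd positions, tagged values on even ones.
set step3 to text items of step2
--> {"8.2 ", "6963", " and ", "6973", ""}

set AppleScript's text item delimiters to oldtids

-- Collect the even items, as the handler's repeat loop does.
set found to {}
repeat with i from 2 to (count step3) by 2
	set end of found to item i of step3
end repeat
found
--> {"6963", "6973"}
```

This only holds if neither token appears anywhere else in the text, which is the assumption stated in the handler's comment.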

It just occurred to me that I didn’t name this handler very well. How about:

on ExtractFromBetweenNonNestedDataDelimitingTokens(s, t1, t2)

:wink:

set quarktext to "0 � 15	0615600	40	�06156526�	�0615600� 
	0 � 30	0615601	40	8.2	�0615601�  
	0 � 60	0615602	40	8.2	�0615602�  
	0 � 100	0615603	40	8.2	�0615603�  
"

set the_data to {}
repeat with this_row in (paragraphs of quarktext)
	set this_row to (contents of this_row)
	if this_row contains "�" then set end of the_data to text ((offset of "�" in this_row) + 1) thru ((offset of "�" in this_row) - 1) of this_row
end repeat

This will only return the following list:
{“06156526”,“0615601”,“0615602”,“0615603”}

I really want the following list:
{“06156526”,“0615600”,“0615601”,“0615602”,“0615603”}
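
The second value on the first row is probably being missed because offset of only reports the first occurrence of a string, and the loop asks for it once per row. A small sketch of that behaviour, using "<" and ">" as stand-ins for the tag characters (an assumption, since the originals were lost to encoding) and a made-up variable named row:

```applescript
-- "<" and ">" are stand-in tokens (assumed); the row holds two tagged values.
set row to "40 <06156526> <0615600>"

set start_pos to offset of "<" in row
--> 4 (the first "<" only)
set end_pos to offset of ">" in row
--> 13 (the first ">" only)

set first_value to text (start_pos + 1) thru (end_pos - 1) of row
--> "06156526"; the second tagged value is never looked at
```

Picking up the remaining values means either searching again in the text after end_pos or splitting the whole text on the tokens.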

Umm, you know my solution will handle that, right?

set quarktext to "0 � 15 0615600 40 �06156526� �0615600�
0 � 30 0615601 40 8.2 �0615601�
0 � 60 0615602 40 8.2 �0615602�
0 � 100 0615603 40 8.2 �0615603�"

BreakAtTokens(quarktext, "�", "�")

--> {"06156526", "0615600", "0615601", "0615602", "0615603"}

Here is one way that Jon’s technique can be modified to suit your purposes:

set a to {}

repeat with i from 1 to (count paragraphs in quarktext)
	set p to paragraph i of quarktext
	
	repeat while (p contains "�")
		set x to offset of "�" in p
		set y to offset of "�" in p
		set a's end to p's text (x + 1) thru (y - 1)
		
		try
			set p to p's text (y + 1) thru -1
		on error
			exit repeat
		end try
	end repeat	
end repeat

a
--> {"06156526", "0615600", "0615601", "0615602", "0615603"}

I did not know that BreakAtTokens did what I wanted. I guess I should have tried it before asking for more help.

If you’re interested, this one catches some errors in the text by just looking for items with trailing tags instead of splitting the text up all at once:

set the_text to "0 � 15   0615600        40        �06156526�        �0615600� 
0 � 30      0615601        40        8.2        �0615601�
0 � 60      0615602        40        8.2        �0615602� 
0 � 100      0615603        40        8.2        �0615603�"

set def_tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"�"}
try
	set delim_text to text items of the_text
	set AppleScript's text item delimiters to {"�"}
	set new_list to {}
	repeat with this_ti in (rest of delim_text) -- skip the first item, empty or
		if this_ti contains "�" then
			set new_ti to first item of (text items of this_ti)
			set end of new_list to new_ti
		else -- error, no trailing tag
			display dialog this_ti
		end if
	end repeat
	set AppleScript's text item delimiters to def_tid
on error err_mess
	set AppleScript's text item delimiters to def_tid
	display dialog err_mess
end try
{new_list, contents of this_ti}