Assign found text in text file to a variable

I have been struggling with this for hours and cannot figure it out. I am trying to find a piece of text within a string; the text is different every time, and I want to assign it to a variable.

The text always begins after some variation of “Message-ID: <” and ends with “>”; the text I want is always between that start and that end. There is normally a significant amount of other text both before the starting marker and after the closing “>”.

Examples:

“more text Message-ID:<1>more text”
I want to return 1 in this case

“more text
Message-ID: <1234567> more text”
I want to return 1234567 in this case

“more text Message-ID:
<1asioduasdoipasu877> more text”
I want to return 1asioduasdoipasu877 in this case

“more text Message-ID:

<asdkjdfkjdfkj@sdfksjfd>more text”
I want to return asdkjdfkjdfkj@sdfksjfd in this case

There are some instances where “Message-ID: <” or some variation of it is found more than once in the text file, but so far the one I need has always been the first instance.

I have been experimenting with AppleScript’s text item delimiters, but I am just guessing and have not guessed right. Any help would be greatly appreciated.

Browser: Safari 605.1.15
Operating System: macOS 10.14

You could try this:

set theText to "
“more text   Message-ID:
    
<asdkjdfkjdfkj@sdfksjfd>more text”
I want to return asdkjdfkjdfkj@sdfksjfd in this case
"

set searched to item 2 of my decoupe(theText, "Message-ID")
set searched to item 2 of my decoupe(searched, {"<", ">"})

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Friday, 3 April 2020 10:17:45


set aText to "“more text Message-ID:<1>more text”
I want to return 1 in this case

“more text 
Message-ID:      <1234567> more text”
I want to return 1234567 in this case

“more text   Message-ID:    
    <1asioduasdoipasu877> more text”
I want to return 1asioduasdoipasu877 in this case

“more text   Message-ID:
    
<asdkjdfkjdfkj@sdfksjfd>more text”
I want to return asdkjdfkjdfkj@sdfksjfd in this case"

set theSubTexts to {}
set countText to count aText
set mOffset to 1

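repeat
	-- Look at the 11 characters starting at mOffset ("Message-ID:" is 11 characters long)
	set a to text mOffset thru (mOffset + 10) of aText
	if a is "Message-ID:" then
		-- Scan forward to the opening "<"
		repeat with i from mOffset + 11 to countText
			if character i of aText is "<" then exit repeat
		end repeat
		-- Scan forward from there to the closing ">"
		repeat with j from i + 1 to countText
			if character j of aText is ">" then exit repeat
		end repeat
		-- Keep the text between the brackets and resume after the ">"
		set end of theSubTexts to text (i + 1) thru (j - 1) of aText
		set mOffset to j + 1
	else
		set mOffset to mOffset + 1
	end if
	-- Stop when too little text remains to hold another match
	if mOffset > countText - 13 then exit repeat
end repeat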

return theSubTexts
--> RESULT:  {"1", "1234567", "1asioduasdoipasu877", "asdkjdfkjdfkj@sdfksjfd"}

If you need to extract only the distinct occurrences, you could use:


set aText to "“more text Message-ID:<1>more text”
I want to return 1 in this case

“more text 
Message-ID:      <1234567> more text”
I want to return 1234567 in this case

“more text   Message-ID:    
    <1asioduasdoipasu877> more text”
I want to return 1asioduasdoipasu877 in this case

“more text   Message-ID:
    
<asdkjdfkjdfkj@sdfksjfd>more text”
I want to return asdkjdfkjdfkj@sdfksjfd in this case

“more text   Message-ID:    
    <1asioduasdoipasu877> more text”
I want to return 1asioduasdoipasu877 in this case

“more text   Message-ID:
    
<asdkjdfkjdfkj@sdfksjfd>more text”
I want to return asdkjdfkjdfkj@sdfksjfd in this case"

set inList to rest of my decoupe(aText, "Message-ID:")
set allStrings to {}
repeat with aString in inList
	set aString to (item 2 of my decoupe(aString, {"<", ">"})) as text
	if aString is not in allStrings then set end of allStrings to aString
end repeat
allStrings

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Friday, 3 April 2020 17:30:46

Yes, your code is better than mine. I’ll add it to my library as a good example of parsing.

Your code works very well if you want only the distinct occurrences. My script returns all the occurrences, so I will leave it here too in case duplicates are needed. Users should understand the difference between the two approaches so that they can choose the one they need.

I would call my script “Parse the text by keywords” and yours “Parse the text by keywords & remove the duplicates”. In most cases, users will want the option that removes duplicates.
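To make the difference between the two approaches concrete, here is a minimal sketch that wraps both behaviours in one handler. The handler name collectIDs, its keepDuplicates parameter, and the sample string are hypothetical and not taken from either script in this thread. It assumes every “Message-ID:” is followed by a bracketed value and that no “>” appears before the “<”.

```applescript
-- Hypothetical handler: returns the text between "<" and ">" after each "Message-ID:".
-- keepDuplicates = true  -> every occurrence, in order
-- keepDuplicates = false -> each distinct value once
on collectIDs(theText, keepDuplicates)
	set foundIDs to {}
	if theText does not contain "Message-ID:" then return foundIDs
	set oldTIDs to AppleScript's text item delimiters
	try
		-- Split on the keyword; the first item is whatever precedes the first match
		set AppleScript's text item delimiters to "Message-ID:"
		set chunks to rest of (text items of theText)
		-- Split each chunk on both brackets; item 2 is the text between them
		set AppleScript's text item delimiters to {"<", ">"}
		repeat with aChunk in chunks
			set parts to text items of (contents of aChunk)
			if (count of parts) > 1 then
				set anID to item 2 of parts
				if keepDuplicates or anID is not in foundIDs then set end of foundIDs to anID
			end if
		end repeat
	on error errMsg number errNum
		-- Restore the delimiters before passing the error on
		set AppleScript's text item delimiters to oldTIDs
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to oldTIDs
	return foundIDs
end collectIDs

set sample to "Message-ID:<1> x Message-ID: <2> y Message-ID:<1> z"
set everyID to collectIDs(sample, true) --> {"1", "2", "1"}
set distinctIDs to collectIDs(sample, false) --> {"1", "2"}
```

Keeping the choice in a single parameter means the parsing logic lives in one place, and the caller decides whether duplicates matter.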

Thank you both! I am trying to read the contents of a file and got it to work by adding line 4 (set aText to "\" " & aText & "\" "), but it fails partway through the script. I am a hack and do not know why some things work and others don’t, so I am sure it is something simple, but I do not know what it is.

Here is a link to a sample file, if you are willing to look and it helps: https://smithgrpnet-my.sharepoint.com/:t:/g/personal/mike_smithgrp_net/Ed1VKPx8Y4tPs6EIcNfM7SgBGFH3OtaHBX5gN3bL0iM5RQ?e=3fNXXw

Thank you very much

set theFile to choose file with prompt "Please select a text file to read:"
set theFile to theFile as string
set aText to read file theFile
set aText to "\" " & aText & "\" "

--set aText to "“more text Message-ID:<1>more text”
--I want to return 1 in this case
--
--“more text 
--Message-ID:      <1234567> more text”
--I want to return 1234567 in this case
--
--“more text   Message-ID:    
--    <1asioduasdoipasu877> more text”
--I want to return 1asioduasdoipasu877 in this case
--
--“more text   Message-ID:
--    
--<asdkjdfkjdfkj@sdfksjfd>more text”
--I want to return asdkjdfkjdfkj@sdfksjfd in this case
--
--“more text   Message-ID:    
--    <1asioduasdoipasu877> more text”
--I want to return 1asioduasdoipasu877 in this case
--
--“more text   Message-ID:
--    
--<asdkjdfkjdfkj@sdfksjfd>more text”
--I want to return asdkjdfkjdfkj@sdfksjfd in this case"

set inList to rest of my decoupe(aText, "Message-ID:")
set allStrings to {}
repeat with aString in inList
	set aString to (item 2 of my decoupe(aString, {"<", ">"})) as text
	if aString is not in allStrings then set end of allStrings to aString
end repeat
allStrings

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====


Your document doesn’t match the structure you described!

The first occurrence is correct:
Message-ID:
<BY5PR20MB2996B08A007F7F64018E24FFAEC70@BY5PR20MB2996.namprd20.prod.outlook.com>

The second one:
X-MS-Exchange-Organization-Network-Message-Id:
0e0751b6-dc27-4a9a-75c7-08d7d7748d7c
X-MS-Exchange-Organization-SCL: -1

isn’t. The characters “<” and “>” are missing.
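The second header is picked up even though it is spelled “Message-Id:” because AppleScript compares text without regard to case by default, and text item delimiters follow the same rule (on Mac OS X 10.5 and later, as far as I know). The sketch below uses a hypothetical two-line sample modelled on the headers quoted above to count the matches with and without a considering case block. The variable names are illustrative only.

```applescript
-- Hypothetical sample: one real Message-ID header and one Exchange header
set sample to "Message-ID: <abc@example.com>" & linefeed & ¬
	"X-MS-Exchange-Organization-Network-Message-Id: 0e0751b6"

set oldTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to "Message-ID:"

-- Default comparison ignores case, so "Message-Id:" is treated as a match too
set looseItems to text items of sample
set looseHits to (count of looseItems) - 1

-- Inside "considering case" only the exact spelling "Message-ID:" splits the text
considering case
	set strictItems to text items of sample
end considering
set strictHits to (count of strictItems) - 1

set AppleScript's text item delimiters to oldTIDs
{looseHits, strictHits}
```

The first count should include the Exchange header and the second should not. Note that considering case does not help if another header happens to end in exactly “Message-ID:”, so checking for the “<” is still needed.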

If this structure is the usual one, replace my original script with:

set theFile to choose file with prompt "Please select a text file to read:"
-- set theFile to theFile as string
set aText to read theFile
set aText to "\" " & aText & "\" "
if aText does not contain "Message-ID:" then error "This file doesn't contain the string “Message-ID:”!"

--set aText to "“more text Message-ID:<1>more text”
--I want to return 1 in this case
--
--“more text 
--Message-ID:      <1234567> more text”
--I want to return 1234567 in this case
--
--“more text   Message-ID:    
--    <1asioduasdoipasu877> more text”
--I want to return 1asioduasdoipasu877 in this case
--
--“more text   Message-ID:
--    
--<asdkjdfkjdfkj@sdfksjfd>more text”
--I want to return asdkjdfkjdfkj@sdfksjfd in this case
--
--“more text   Message-ID:    
--    <1asioduasdoipasu877> more text”
--I want to return 1asioduasdoipasu877 in this case
--
--“more text   Message-ID:
--    
--<asdkjdfkjdfkj@sdfksjfd>more text”
--I want to return asdkjdfkjdfkj@sdfksjfd in this case"

set inList to rest of my decoupe(aText, {"Message-ID:<", "Message-ID:" & linefeed & tab & "<", "Message-ID:" & return & tab & "<"})

set allStrings to {}
repeat with aString in inList
	set aString to (item 1 of my decoupe(aString, {">"})) as text
	if aString is not in allStrings then set end of allStrings to aString
end repeat
allStrings

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

But I cannot guarantee that this handles every variant of the delimiter sequence “Message-ID:”, something, “<”.
That is why I used the original code; I did not expect that your documents might fail to match your description.
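One way to avoid listing every whitespace variant in the delimiter is to split on “Message-ID:” alone and then step over whatever spaces, tabs, and line breaks follow. The sketch below does that and accepts a chunk only when the next character is “<”. The handler name bracketedIDs and the sample text are hypothetical, and the sketch assumes that only whitespace may sit between the colon and the bracket.

```applescript
-- Hypothetical handler: collects IDs where "Message-ID:" is followed by
-- optional whitespace and then "<...>"; anything else is skipped
on bracketedIDs(theText)
	set foundIDs to {}
	if theText does not contain "Message-ID:" then return foundIDs
	set oldTIDs to AppleScript's text item delimiters
	considering case
		set AppleScript's text item delimiters to "Message-ID:"
		set chunks to rest of (text items of theText)
	end considering
	set AppleScript's text item delimiters to oldTIDs
	
	repeat with aChunk in chunks
		set aChunk to contents of aChunk
		set n to count of aChunk
		-- Step over spaces, tabs and line breaks that follow the colon
		set i to 1
		repeat while (i <= n) and ((character i of aChunk) is in {space, tab, return, linefeed})
			set i to i + 1
		end repeat
		-- Accept the chunk only if the next character opens a bracket
		if (i < n) and (character i of aChunk is "<") then
			set tail to text (i + 1) thru -1 of aChunk
			set closeAt to offset of ">" in tail
			-- closeAt is 0 when there is no ">", and 1 when the brackets are empty
			if closeAt > 1 then set end of foundIDs to text 1 thru (closeAt - 1) of tail
		end if
	end repeat
	return foundIDs
end bracketedIDs

set sample to "Message-ID:" & linefeed & tab & "<abc@example.com>" & linefeed & ¬
	"X-Network-Message-Id: 0e07" & linefeed & ¬
	"Message-ID: not bracketed <x>"
bracketedIDs(sample) --> {"abc@example.com"}
```

The unbracketed header and the line with text before the “<” are both ignored, so a file that mixes the two layouts does not raise an error.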

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Friday, 3 April 2020 20:00:46

Thank you!

I only want it to return a result if this criterion is met: “Message-ID:”, then maybe some text (maybe not), followed by “<”, the needed text, and then “>”.

This is the only case in which I need it to return a result, regardless of any other occurrences that come close to matching the above criterion. I apologize if I was not clear initially. I was just trying to show that there could be multiple variations of the text and characters before and after the needed text. Either way, thank you very much; this is greatly appreciated.

Here is a version that should handle all cases.

set theText to "
“more text Message-ID:<1>more text”
I want to return 1 in this case

“more text 
Message-ID:      <1234567> more text”
I want to return 1234567 in this case

“more text   Message-ID:    
    <1asioduasdoipasu877> more text”
I want to return 1asioduasdoipasu877 in this case

“more text   Message-ID:
    
<asdkjdfkjdfkj@sdfksjfd>more text”
I want to return asdkjdfkjdfkj@sdfksjfd in this case

There are some instances where “Message-ID: <“ or some variation of it is found more than once in the text file but so far it has always been the first instance.

“more text   Message-ID:    
    1asioduasdoipasu877@> more text”  <asdkjdfkjπdfkj@sdfksjfd>more text”
I want to return 1asioduasdoipasu877 in this case

“more text   Message-ID:
    
<asdkjdfkjdfkj@sdfksjfd>more text”
I want to return asdkjdfkjdfkj~sdfksjfd in this case

There are some instances where “Message-ID: <“ or some variation of it is found more than once in the text file but so far it has always been the first instance."

set inList to rest of my decoupe(theText, "Message-ID:")
set allStrings to {}
repeat with aString in inList
	if aString contains "<" then
		set subList to rest of my decoupe(aString, "<")
		repeat with bString in subList
			if bString contains ">" then
				set cString to item 1 of my decoupe(bString, ">") as string
				if cString is not in allStrings then set end of allStrings to cString
			end if
		end repeat
		--if aString is not in allStrings then set end of allStrings to aString
	end if
end repeat
allStrings

#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====
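The script above collects every bracketed string it finds. Since the original question said the wanted value has so far always been the first instance, here is a minimal sketch that stops at the first “Message-ID:” chunk containing a “<…>” pair and returns that one value. The handler name firstMessageID and the sample headers are hypothetical. Returning missing value when nothing matches is a design choice of this sketch, not something requested in the thread.

```applescript
-- Hypothetical handler: returns the first "<...>" value found after a
-- "Message-ID:" keyword, or missing value if there is none
on firstMessageID(theText)
	set foundID to missing value
	if theText does not contain "Message-ID:" then return foundID
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "Message-ID:"
	set chunks to rest of (text items of theText)
	set AppleScript's text item delimiters to oldTIDs
	
	repeat with aChunk in chunks
		set aChunk to contents of aChunk
		set openAt to offset of "<" in aChunk
		if (openAt > 0) and (openAt < (count of aChunk)) then
			-- Look for the closing bracket only in the text after the "<"
			set tail to text (openAt + 1) thru -1 of aChunk
			set closeAt to offset of ">" in tail
			if closeAt > 1 then
				set foundID to text 1 thru (closeAt - 1) of tail
				exit repeat
			end if
		end if
	end repeat
	return foundID
end firstMessageID

set sample to "Received: x" & linefeed & ¬
	"Message-ID:" & linefeed & tab & "<first@example.com>" & linefeed & ¬
	"References: <second@example.com>" & linefeed & ¬
	"Message-ID: <third@example.com>"
firstMessageID(sample) --> "first@example.com"
```

Because the loop exits on the first hit, later headers such as References or a repeated Message-ID never reach the result.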

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Saturday, 4 April 2020 11:04:38