Filter text and extract only specific paragraphs

Hello hackers,

I have a tricky/challenging question for you.

Suppose I have a text with the following structure:

But this can be up to hundreds of paragraphs/lines.

What I would like to do is to filter this whole structure and extract only the texts that belong under two specific titles, “Title_B” and “Title_D”.

So from the above, the result should be a unified text of:

All the titles in the structure are the same and always in bold, but the sizes of the texts underneath vary.

So is there any way to filter this structure?

For the moment I am trying to do that with the following:

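-- Return the 1-based offsets of every occurrence of needle in haystack.
on offsets(needle, haystack)
	script go
		-- hay is the unsearched remainder; iFrom is the number of characters already consumed.
		on |λ|(hay, iFrom)
			set i to offset of needle in hay
			if 0 ≠ i then
				set iMatch to iFrom + i
				if i < length of hay then
					-- Search again from the character after the start of this match.
					{iMatch} & |λ|(text (1 + i) thru -1 of hay, iMatch)
				else
					{iMatch}
				end if
			else
				{}
			end if
		end |λ|
	end script
	
	go's |λ|(haystack, 0)
end offsets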

epaminos. I don’t understand your request to “filter this whole structure.” That issue aside, the following script should do what you want.


set theText to paragraphs of (read (choose file))

main(theText)
on main(txt)
	
	script o
		property txtRef : txt
		property extractedText : {}
	end script
	
	repeat with i from 1 to ((count o's txtRef) - 1)
		set aLine to item i of o's txtRef
		if aLine begins with "Title_B" or aLine begins with "Title_D" then
			set end of o's extractedText to (removeWhitespace(item (i + 1) of o's txtRef))
		end if
	end repeat
	
	set ATID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to linefeed
	set extractedText to o's extractedText as text
	set AppleScript's text item delimiters to ATID
	return extractedText
	
end main

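-- Strip leading tabs and spaces from a line.
on removeWhitespace(aLine)
	repeat ((count aLine) - 1) times
		if text 1 of aLine is in {tab, space} then
			set aLine to text 2 thru -1 of aLine
		else
			return aLine
		end if
	end repeat
	-- Also return a value when the loop runs out (for example, empty or one-character lines).
	return aLine
end removeWhitespace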
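
The script above takes the single line that follows each title. If the text under a title can run to several lines, a variant along the following lines might help. This is only a sketch under a few assumptions: every title sits on its own line, every title starts with "Title_", and a wanted title line matches the given name exactly. The handler name extractSections and the sample text are made up for illustration.

```applescript
-- Hypothetical sample text; replace with: paragraphs of (read (choose file))
set sampleText to "Title_A" & linefeed & "alpha" & linefeed & ¬
	"Title_B" & linefeed & "beta one" & linefeed & "beta two" & linefeed & ¬
	"Title_C" & linefeed & "gamma" & linefeed & ¬
	"Title_D" & linefeed & "delta"

extractSections(paragraphs of sampleText, {"Title_B", "Title_D"})
--> "beta one" & linefeed & "beta two" & linefeed & "delta"

-- Collect every non-empty line under the wanted titles, up to the next title.
-- Assumption: any line that begins with "Title_" is a title line.
on extractSections(theLines, wantedTitles)
	set extractedLines to {}
	set keepLines to false
	repeat with aLine in theLines
		set thisLine to contents of aLine
		if thisLine begins with "Title_" then
			-- A title line switches collecting on or off; the title itself is not kept.
			set keepLines to (wantedTitles contains thisLine)
		else if keepLines and (thisLine is not "") then
			set end of extractedLines to thisLine
		end if
	end repeat
	
	-- Join the collected lines into one text, restoring the delimiters afterwards.
	set ATID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to linefeed
	set extractedText to extractedLines as text
	set AppleScript's text item delimiters to ATID
	return extractedText
end extractSections
```

Tracking a single on/off flag keeps this to one pass over the lines. If the title lines can carry trailing spaces or other decoration, the exact-match test would need loosening, for instance with "begins with" as in the script above.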

Amazing! It works perfectly!

Thank you so much peavine!

epaminos. You’re very welcome. I’m glad that worked.