Some regex or other text manipulation help requested

Hi,

I want to move from Slipbox to Yojimbo. Both use only the .rtf and .rtfd formats. (My needs are fairly simple; if those two won’t do, I’ll try Freeform.) I’ll be moving about 3000 notes. The categories (Keywords) in Slipbox are all exported on the last line of the text content. I want to turn these Keywords into Tags for Yojimbo.

I have worked out how to get the line containing all the Keywords from each item; I just need to extract them and create a variable for each one. Here’s the part of my script (derived from here) that gets me the line containing the Keywords.

set theSourceFile to choose file of type {"public.rtf"}

tell application "TextEdit"
	open theSourceFile # REQUIRED
	tell (get front document)
		set searchString to "Keywords: "
		set tc to count paragraphs
		considering case
			repeat with i from 1 to tc
				set thisLine to paragraph i as text
				-- take the line that begins with "Keywords: ", or fall back to the last line
				if (thisLine starts with searchString) or i = tc then
					set keyLine to thisLine
					exit repeat
				end if
			end repeat
		end considering
	end tell
end tell

Here are a few examples of what that returns.

(.rtf 1)
Keywords: Media, Sound and Music, Software Help

(.rtf 2)
Keywords: Access, Personal, Private

(.rtf 3)
Keywords: Hardware, Wishlist

Can you help me manipulate the text so that I can assign those keywords to variables in order to set the tags in Yojimbo? (I have everything worked out except extracting the keywords, of which there are normally 3 to 8 for each exported Slipbox file.) Please note that some of the Keywords consist of more than one word; those words are always separated by a single space. (Yojimbo also accepts multi-word tags.)

For the three examples above, the variables would be something like this:

(item 1)
set key1 to "Media"
set key2 to "Sound and Music"
set key3 to "Software Help"

(item 2)
set key1 to "Access"
set key2 to "Personal"
set key3 to "Private"

(item 3)
set key1 to "Hardware"
set key2 to "Wishlist"

To summarize, I need to remove "Keywords: " from the paragraph, take the string before the first comma, and keep doing that until I get the string after the last occurrence of ", ". I figured on feeding it to grep or perl or something in one or more “do shell script” constructs. I’m not that great with regex, though.

Thanks for your help.

L. Lee
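Since the post mentions do shell script, here is a hedged sketch of that route, using perl instead of grep. It assumes /usr/bin/perl is available (it has shipped with macOS for a long time), and shellSplitKeywords is a hypothetical handler name. The perl expression strips the leading "Keywords: " and turns every ", " into a line break, and AppleScript's paragraphs then splits the result into a list.

```applescript
-- Sketch: split a Slipbox "Keywords:" line with perl via do shell script.
-- Assumes /usr/bin/perl exists; shellSplitKeywords is a hypothetical name.
on shellSplitKeywords(keyLine)
	-- quoted form protects spaces, commas and quotes in the keywords from the shell
	set shellCommand to "printf '%s' " & quoted form of keyLine & ¬
		" | /usr/bin/perl -pe 's/^Keywords: //; s/, /\\n/g'"
	-- each keyword comes back on its own line; paragraphs splits on the line breaks
	return paragraphs of (do shell script shellCommand)
end shellSplitKeywords

set listKeys to shellSplitKeywords("Keywords: Media, Sound and Music, Software Help")
set key1 to item 1 of listKeys -- "Media"
set key2 to item 2 of listKeys -- "Sound and Music"
```

Note that "\\n" in the AppleScript string reaches perl as \n. For a job this small, plain AppleScript text item delimiters avoid the shell round trip entirely.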

Here ya go:

set rawKeys to "Keywords: Media, Sound and Music, Software Help"

set {TIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, ": "}

set listKeys to text item 2 of rawKeys -- drops "Keywords"
set AppleScript's text item delimiters to ", "
set listKeys to text items of listKeys

set AppleScript's text item delimiters to TIDs
listKeys --> {"Media", "Sound and Music", "Software Help"}

Note that this is absolutely dependent on the exact layout of the text you gave us.

If the ‘keywords’ line is always the last paragraph, your script can be much simplified:

set theSourceFile to choose file of type {"public.rtf"}

tell application "TextEdit"
	open theSourceFile # REQUIRED
	tell document 1 -- this is always the file it just opened
		set keyLine to paragraph -1
	end tell
end tell
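Because the split above depends on the exact layout, a slightly more tolerant handler may help. This is a sketch under assumptions: splitKeywords is a hypothetical name, the "Keywords: " prefix and ", " separator are taken from the examples in the original post, and the trailing line-break trim covers the possibility (not confirmed here) that the paragraph TextEdit returns still carries its line break. A line without the prefix gives an empty list instead of an error.

```applescript
-- Sketch: a more tolerant version of the text item delimiters split.
-- splitKeywords is a hypothetical handler name; the "Keywords: " prefix and
-- ", " separator are assumed to match the Slipbox export shown in the thread.
on splitKeywords(keyLine)
	set keyPrefix to "Keywords: "
	-- drop any trailing return (id 13) or linefeed (id 10) the paragraph may carry
	repeat while (keyLine ends with (character id 13)) or (keyLine ends with (character id 10))
		if (length of keyLine) = 1 then return {}
		set keyLine to text 1 thru -2 of keyLine
	end repeat
	-- no usable Keywords line: return an empty list instead of raising an error
	if not (keyLine starts with keyPrefix) then return {}
	if (length of keyLine) is not greater than (length of keyPrefix) then return {}
	set keyText to text ((length of keyPrefix) + 1) thru -1 of keyLine
	-- split on ", " and restore the previous delimiters afterwards
	set {oldTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, ", "}
	set keyList to text items of keyText
	set AppleScript's text item delimiters to oldTIDs
	return keyList
end splitKeywords

-- usage: one list item per keyword, which can also go into separate variables
set listKeys to splitKeywords("Keywords: Access, Personal, Private")
set key1 to item 1 of listKeys -- "Access"
set key2 to item 2 of listKeys -- "Personal"
set key3 to item 3 of listKeys -- "Private"
```

Keeping the keywords in a list rather than key1, key2, … also copes with the varying count (3 to 8) per file.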

Thanks so much! :yum: :+1:

L. Lee

Here, I think I have it (in case anyone else can use it). In this form it’s a droplet application. I haven’t handled dropping anything other than .rtf or .rtfd files (folders, for instance), or an .rtf or .rtfd file that doesn’t end with the list of Keywords described in the original post (for example, “Keywords: Hardware, Personal, Purchase”). Would you change anything? Thanks again.

on open the_items
	my Yojimbo_import(the_items)
	tell application "TextEdit" to quit
end open

on Yojimbo_import(the_items)
	repeat with the_item in the_items
		set the_item to the_item as alias
		set tag_Extracts to my extract_Tags(the_item)
		tell application "Yojimbo"
			set newItem to import the_item
			add tags tag_Extracts to newItem
		end tell
	end repeat
end Yojimbo_import

on run
	set the_items to ((choose file with multiple selections allowed) as list)
	Yojimbo_import(the_items)
end run

on extract_Tags(theSourceFile)
	tell application "TextEdit"
		open theSourceFile # REQUIRED
		tell document 1 -- this is always the file it just opened
			set rawKeys to paragraph -1
			close
		end tell
	end tell
	set {TIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, ": "}
	set listKeys to text item 2 of rawKeys -- drops "Keywords"
	set AppleScript's text item delimiters to ", "
	set listKeys to text items of listKeys
	set AppleScript's text item delimiters to TIDs
	return listKeys
end extract_Tags

Looking good. Tip: the on open handler and choose file will each give you a list of aliases, so you can lose the as alias and as list coercions.
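For the first gap the droplet leaves open (dropping something other than .rtf or .rtfd), a guard along these lines is one option. It is only a sketch: isSlipboxExport is a hypothetical helper, and judging items by file name extension is an assumption about how the Slipbox exports are named.

```applescript
-- Sketch: accept only .rtf files and .rtfd packages from a drop.
-- isSlipboxExport is a hypothetical helper name; the extension check is
-- an assumption about how the Slipbox exports are named.
on isSlipboxExport(anItem)
	set itemPath to POSIX path of anItem
	-- packages such as .rtfd (and ordinary folders) have paths ending in "/"
	if (itemPath ends with "/") and ((length of itemPath) > 1) then
		set itemPath to text 1 thru -2 of itemPath
	end if
	-- "ends with" ignores case by default, so ".RTF" also matches
	return (itemPath ends with ".rtf") or (itemPath ends with ".rtfd")
end isSlipboxExport
```

In Yojimbo_import, the body of the repeat loop could then be wrapped in if isSlipboxExport(the_item) then … end if. For the second gap, extract_Tags could check whether rawKeys starts with "Keywords: " and return {} otherwise, rather than failing on text item 2, and the add tags step could be skipped when tag_Extracts is {}.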
