Request - Find hashtags in a PDF/text file, write them to Finder tags?

I have a use case for Finder tags that I want to exploit.

  1. Read a text/PDF file
  2. Find hashtags in the text of the file
    1. A hashtag is a continuous (no space) string that begins with a “#” and ends with a space or line break/new line
    2. There may be multiple hashtags in a file, or even in a line/row of text.
  3. Write each hashtag as a separate Finder tag to the text/PDF file
    1. The Homebrew terminal command “tag” can be used.

I haven’t had any luck finding a ready-made solution for numbers 1 and 2, above, but they seem so obvious that I have to believe this has already been solved multiple times.

Does anyone have a solution?

Thank you,

John

Here is one method for accomplishing what you specify. Note that this routine treats only a space, line feed, or return as the end of a hash tag, just as you specified.

This method relies heavily on tag routines of Shane Stanley. Thank you Shane.

use scripting additions
use framework "Foundation"

property endChars : space & linefeed & return

set readFile to POSIX path of (choose file of type {"txt", false})
set theText to read readFile as text
set hashTagList to my getHashTags(theText)
my setTags:hashTagList forPath:readFile

on getHashTags(fileText)
	set hashTagsList to {}
	repeat until fileText does not contain "#"
		set poundLoc to offset of "#" in fileText
		set thisHashTag to ""
		repeat with i from poundLoc to (count of fileText)
			set thisChar to item i of fileText
			if endChars does not contain thisChar then
				set thisHashTag to thisHashTag & thisChar
			else
				copy thisHashTag to the end of the hashTagsList
				set fileText to characters (poundLoc + (count of thisHashTag)) thru -1 of fileText as text
				exit repeat
			end if
		end repeat
	end repeat
	return hashTagsList
end getHashTags

-- The following routines are by Shane Stanley

on returnTagsFor:posixPath -- get the tags
	set aURL to current application's |NSURL|'s fileURLWithPath:posixPath -- make URL
	set {theResult, theTags} to aURL's getResourceValue:(reference) forKey:(current application's NSURLTagNamesKey) |error|:(missing value)
	if theTags = missing value then return {} -- because when there are none, it returns missing value
	return theTags as list
end returnTagsFor:

on setTags:tagList forPath:posixPath -- set the tags, replacing any existing
	set aURL to current application's |NSURL|'s fileURLWithPath:posixPath -- make URL
	aURL's setResourceValue:tagList forKey:(current application's NSURLTagNamesKey) |error|:(missing value)
end setTags:forPath:

on addTags:tagList forPath:posixPath -- add to existing tags
	set aURL to current application's |NSURL|'s fileURLWithPath:posixPath -- make URL
	-- get existing tags
	set {theResult, theTags} to aURL's getResourceValue:(reference) forKey:(current application's NSURLTagNamesKey) |error|:(missing value)
	if theTags ≠ missing value then -- add new tags
		set tagList to (theTags as list) & tagList
		set tagList to (current application's NSOrderedSet's orderedSetWithArray:tagList)'s allObjects() -- delete any duplicates
	end if
	aURL's setResourceValue:tagList forKey:(current application's NSURLTagNamesKey) |error|:(missing value)
end addTags:forPath:

Model: Mac Pro (Mid 2010)
AppleScript: 2.7
Browser: Firefox 79.0
Operating System: macOS 10.14

Wow! :smiley: This is really amazing. I was having a bit of a rough week, but this complete solution, simply handed to me, makes me feel happy and grateful.

Thank you!

I did find that property endChars will only respect the first value and will loop indefinitely if there are actually other end characters like linefeed or return in the file. So, if you reorder the endChars values and put “return” as the first check, it will respect only returns and will loop if there is only a space but no return after the hashtag.

I’m going to try to figure it out in the morning… time for bed, now.

Thanks again.

Here are the contents of the text file that was used to test the routine.

Hash Tag Test Document

#HashTag1 this is the first hash tag.
#HashTag2 this is the second hash tag.

The following hash tag is inside and at the end of a paragraph: #HashTag3

The next hash tag #HashTag4 is in the middle of a paragraph.

Thank you for the test data. Based on that, I was able to discover that there was one additional end condition that I should be checking for: the end of the text, with no space, return, or linefeed.

If the last line in the text ends with a hashtag, and there is no space, return, or linefeed after it, the script never ends.

Since I wasn’t sure how to add a check for end of text/file to the endChars variable, I simply appended a return to the theText variable. Not elegant, but it works. :slight_smile:

Thanks again… here is your script with my hack:

use scripting additions
use framework "Foundation"

property endChars : space & linefeed & return

set readFile to POSIX path of (choose file of type {"txt", false})
set theText to read readFile as text
-- The line below hacked in by johncatalano
set theText to theText & return
-- The line above hacked in by johncatalano
set hashTagList to my getHashTags(theText)
my setTags:hashTagList forPath:readFile

on getHashTags(fileText)
	set hashTagsList to {}
	repeat until fileText does not contain "#"
		set poundLoc to offset of "#" in fileText
		set thisHashTag to ""
		repeat with i from poundLoc to (count of fileText)
			set thisChar to item i of fileText
			if endChars does not contain thisChar then
				set thisHashTag to thisHashTag & thisChar
			else
				copy thisHashTag to the end of the hashTagsList
				set fileText to characters (poundLoc + (count of thisHashTag)) thru -1 of fileText as text
				exit repeat
			end if
		end repeat
	end repeat
	return hashTagsList
end getHashTags

-- The following routines are by Shane Stanley

on returnTagsFor:posixPath -- get the tags
	set aURL to current application's |NSURL|'s fileURLWithPath:posixPath -- make URL
	set {theResult, theTags} to aURL's getResourceValue:(reference) forKey:(current application's NSURLTagNamesKey) |error|:(missing value)
	if theTags = missing value then return {} -- because when there are none, it returns missing value
	return theTags as list
end returnTagsFor:

on setTags:tagList forPath:posixPath -- set the tags, replacing any existing
	set aURL to current application's |NSURL|'s fileURLWithPath:posixPath -- make URL
	aURL's setResourceValue:tagList forKey:(current application's NSURLTagNamesKey) |error|:(missing value)
end setTags:forPath:

on addTags:tagList forPath:posixPath -- add to existing tags
	set aURL to current application's |NSURL|'s fileURLWithPath:posixPath -- make URL
	-- get existing tags
	set {theResult, theTags} to aURL's getResourceValue:(reference) forKey:(current application's NSURLTagNamesKey) |error|:(missing value)
	if theTags ≠ missing value then -- add new tags
		set tagList to (theTags as list) & tagList
		set tagList to (current application's NSOrderedSet's orderedSetWithArray:tagList)'s allObjects() -- delete any duplicates
	end if
	aURL's setResourceValue:tagList forKey:(current application's NSURLTagNamesKey) |error|:(missing value)
end addTags:forPath:

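To accommodate the situation where a hash tag is at the end of the entire text, the tag also has to be recorded when the inner loop reaches the last character, not only when it meets one of the endChars. You could replace the if/else block that begins with

if endChars does not contain thisChar then

with

if endChars does not contain thisChar then set thisHashTag to thisHashTag & thisChar
if (endChars contains thisChar) or (i = (count of fileText)) then
	copy thisHashTag to the end of the hashTagsList
	set fileText to text (i + 1) thru -1 of (fileText & space)
	exit repeat
end if

The appended space keeps "text (i + 1) thru -1" valid when the hash tag is the very last thing in the text.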

This does not account for cases where the hash tag is immediately followed by one of the characters “.)]}?!” or a similar terminator. You know your data, so if you need any of these, add them to endChars.
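
If adding punctuation to endChars is too aggressive for your data (it would, for example, cut a tag such as #v1.2 at the period), another option is to trim punctuation only from the end of each tag. The following is just a sketch of that idea, not a tested replacement for the routine above. The trailingPunctuation list, the trimTrailingPunctuation handler, and the tab added to endChars are my own assumptions, so adjust them to your files. It scans the text once and appends a space so that a tag at the very end of the text is still closed.

```applescript
-- Whitespace that ends a hash tag (the tab is an assumption, not in the original routine)
property endChars : space & tab & linefeed & return
-- Characters trimmed from the end of a tag (assumed list, edit to suit your data)
property trailingPunctuation : ".,;:!?)]}"

on getHashTags(fileText)
	set hashTagsList to {}
	set thisHashTag to ""
	-- The appended space guarantees that a tag at the end of the text is closed
	repeat with thisChar in characters of (fileText & space)
		set thisChar to contents of thisChar
		if thisHashTag is "" then
			-- Not inside a tag: wait for the next "#"
			if thisChar is "#" then set thisHashTag to "#"
		else if endChars contains thisChar then
			-- End of the tag: trim punctuation and skip a lone "#"
			set thisHashTag to my trimTrailingPunctuation(thisHashTag)
			if (count of thisHashTag) > 1 then set end of hashTagsList to thisHashTag
			set thisHashTag to ""
		else
			set thisHashTag to thisHashTag & thisChar
		end if
	end repeat
	return hashTagsList
end getHashTags

-- Hypothetical helper: removes trailing punctuation but always leaves the "#"
on trimTrailingPunctuation(theTag)
	repeat while ((count of theTag) > 1) and (trailingPunctuation contains (character -1 of theTag))
		set theTag to text 1 thru -2 of theTag
	end repeat
	return theTag
end trimTrailingPunctuation

getHashTags("See #HashTag3. Also (#HashTag4)")
-- result: {"#HashTag3", "#HashTag4"}
```

Unlike the original routine, this version drops a "#" that stands alone, and punctuation inside a tag is kept. The result can be passed to setTags:forPath: or addTags:forPath: exactly as before.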