Using Sed for RegEx Find/Replace Problems

Is using sed to run regex replacements the best way to go about text cleanup?
What is the best way to get the script to use the selected text in the frontmost window? I’m just not able to get a clear picture of how to achieve this.

tell application "System Events"
    set frontApp to name of first application process whose frontmost is true
    tell application frontApp
        set selectedText to value of text area 1 of scroll area 1 of front window as string
        
        -- Replace ".  " with ". "
        set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/\\.  /\\. /g'"
        
        -- Replace "  " with " "
        set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/  / /g'"
        
        -- Replace "   " with " "
        set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/   / /g'"
        
        -- Replace ",  " with ", "
        set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/,  /, /g'"
        
        -- Replace ":" without a space after with ": "
        set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/:\([^[:space:]]\)/: \\1/g'"
        
        -- Replace ";" without a space after with "; "
        set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/;\([^[:space:]]\)/; \\1/g'"
        
        -- Replace curly quotes with straight quotes
        set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/[“”‘’]/\"/g'"
        
        -- Replace multiple carriage returns with one carriage return
        set selectedText to do shell script "echo " & quoted form of selectedText & " | awk 'BEGIN{RS=\"\";FS=\"\\n\"}{for(i=1;i<=NF;i++)if($i)printf(\"%s%s\",$i,i==NF?\"\":\"\\n\")}'"
        
        -- Replace repeated words with one occurrence of the word
        set selectedText to do shell script "echo " & quoted form of selectedText & " | sed -E 's/(\\b\\w+\\b)(\\s+\\1)+/\\1/g'"
        
        -- Capitalize the first letter after a period and lowercase the rest
        set selectedText to do shell script "echo " & quoted form of selectedText & " | sed -E 's/\\b([a-z])|\\.\\s+(.)/\\U\\1\\L\\2/g'"
        
        -- Update the text in the frontmost window with the modified text
        set value of text area 1 of scroll area 1 of front window to selectedText
    end tell
end tell

Regular text - above the three opening backticks

tell application "System Events"
	-- more code here	
	-- must be on a separate line from the backticks
	
	-- backtick (or grave accent) is the key above the tab and beside the 1 key -- `
	
	-- don't have curly quotes in the text (single or double)
	-- use only straight quotes and apostrophe, e.g. " ", ' '
	
	-- comments begin with two dashes, not an en-dash or em-dash
	-- depending on settings, some apps convert two dashes to something different
end tell

Return to regular text - below the three closing backticks

Within text, put a single backtick before and after words to make text look like code.

Also, start a line with four spaces and it will be treated as code

I’m including an image of this post’s edit window so you can see what went into each line. Note that some of what’s in the post is more general in case anyone else takes a look at it.


To address the other part of your question, I would recommend against sed and awk in this case. Not that they can’t be effective, but it’s sort of a horror show.

I would recommend text item delimiters for this task. This script is simple and the replacements are comprehensible. If you wish to add a character to the replacement list, it is straightforward to do so — along with adding the replacement text. Basically, it cycles through the various unwanted characters and replaces them throughout the text with the corresponding suitable characters. You can use multiple character strings in either list.

The one quirk to the script is determining whether it’s the front document. If you are testing or editing the script, it’s easier when it’s open but then you need to make sure that it’s not modifying its own text, so if it is the front document then it will bring document 2 to the fore. I put the script in the application script folders so I can access it from the script menu and then, when I get a broken script from this site, I can easily call on the fix script to clean it up and then it will process the front document. If you choose another name for the file, adjust the script accordingly.

tell application "Script Editor"
	set badQuotes to {"“", "”", "‘", "’", "`", "–"}
	set goodQuotes to {quote, quote, "'", "'", "'", "--"}
	
	-- if this script is frontmost then work with window 2	
	if name of front document is "quotesfix.scpt" then -- use whatever this script's name is
		set index of window 2 to 1
	end if
	set wt to text of document 1
	--> “. ”. ‘. ’. `. –
	
	repeat with bg from 1 to count of badQuotes
		set AppleScript's text item delimiters to item bg of badQuotes
		set ti to text items of wt
		set AppleScript's text item delimiters to item bg of goodQuotes
		set wt to ti as text
	end repeat
	
	set text of document 1 to wt
	
end tell

For reference, the badQuotes characters are as follows:

“. ”. ‘. ’. `. –
  • left double
  • right double
  • left single
  • right single
  • grave accent (aka backtick)
  • en dash
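
For comparison, the same pair-wise replacement can be sketched in JavaScript, where `split` and `join` play the role of text item delimiters. This is only an illustration: `fixQuotes` is a made-up name, the sample string is invented, and the two lists simply mirror the ones in the AppleScript above.

```javascript
// Parallel lists, mirroring badQuotes and goodQuotes in the AppleScript above.
const badQuotes = ["\u201C", "\u201D", "\u2018", "\u2019", "`", "\u2013"];
const goodQuotes = ['"', '"', "'", "'", "'", "--"];

// Hypothetical helper: split on each unwanted string and rejoin with its
// replacement, which is what swapping text item delimiters does in AppleScript.
function fixQuotes(text) {
  let result = text;
  for (let i = 0; i < badQuotes.length; i++) {
    result = result.split(badQuotes[i]).join(goodQuotes[i]);
  }
  return result;
}

// Invented sample: curly quotes, an en dash, and backticks.
console.log(fixQuotes("\u201Chello\u201D \u2013 it\u2019s `code`"));
```

As in the AppleScript version, adding another replacement means adding one entry to each list at the same position.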

The following will do what you want and will work with many but not every app. I tested it without issue with TextEdit, Script Editor, Script Debugger, IA Writer Classic, macOS Mail, and FSNotes. In my testing, I ran the script by way of the macOS Script Menu (enabled in Script Editor settings).

set the clipboard to ""

tell application "System Events"
	set activeApp to name of first process whose frontmost is true
	tell application process activeApp
		click menu item "Copy" of menu "Edit" of menu bar 1 -- edit as needed
	end tell
end tell
delay 0.2 -- test different values
set selectedText to the clipboard

if selectedText = "" then -- this is just for testing
	display dialog "Text selection not found"
else
	display dialog selectedText
end if

BTW, if you are going to use your script with a particular app, it would be best to write the script to work with that app.

I took your code advice and all worked well until I started adding the code to replace the anomalies in a text document. Could you point me in the direction of how to correct this? Thanks

set the clipboard to ""

tell application "System Events"
	set activeApp to name of first process whose frontmost is true
	tell application process activeApp
		click menu item "Copy" of menu "Edit" of menu bar 1 -- edit as needed
	end tell
end tell
delay 0.2 -- test different values
set selectedText to the clipboard

if selectedText = "" then -- this is just for testing
	display dialog "Text selection not found"
else
	display dialog selectedText
end if

-- Replace ".  " with ". "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/\\.  /\\. /g'"

-- Replace "  " with " "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/  / /g'"

-- Replace "   " with " "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/   / /g'"

-- Replace ",  " with ", "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/,  /, /g'"

-- Replace ":" without a space after with ": "
--set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/:\([^[:space:]]\)/: \\1/g'"

-- Replace ";" without a space after with "; "
--set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/;\([^[:space:]]\)/; \\1/g'"

-- Replace curly quotes with straight quotes
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/[“”‘’]/\"/g'"

-- Replace multiple carriage returns with one carriage return
set selectedText to do shell script "echo " & quoted form of selectedText & " | awk 'BEGIN{RS=\"\";FS=\"\\n\"}{for(i=1;i<=NF;i++)if($i)printf(\"%s%s\",$i,i==NF?\"\":\"\\n\")}'"

-- Replace repeated words with one occurrence of the word
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed -E 's/(\\b\\w+\\b)(\\s+\\1)+/\\1/g'"
display dialog selectedText
-- Capitalize the first letter after a period and lowercase the rest
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed -E 's/\\b([a-z])|\\.\\s+(.)/\\U\\1\\L\\2/g'"

display dialog selectedText
tell application activeApp
	-- Replace the selected text with the modified text
	tell application "System Events"
		keystroke "a" using {command down}
	end tell
	delay 0.1
	keystroke modifiedtext
end tell

Assuming you selected some text in TextEdit document window:

tell application "TextEdit" to activate

set the clipboard to ""
tell application "System Events" to keystroke "c" using command down -- copy
delay 1 -- test different values

set selectedText to the clipboard as Unicode text
if selectedText is "" then
	display dialog "Text selection not found"
	return
end if

-- Replace ".  " with ". "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/\\.  /\\. /g'"

-- Replace "  " with " "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/  / /g'"

-- Replace "   " with " "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/   / /g'"

-- Replace ",  " with ", "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/,  /, /g'"

-- Replace ":" without a space after with ": "
--set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/:\([^[:space:]]\)/: \\1/g'"

-- Replace ";" without a space after with "; "
--set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/;\([^[:space:]]\)/; \\1/g'"

-- Replace curly quotes with straight quotes
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/[“”‘’]/\"/g'"

-- Replace multiple carriage returns with one carriage return
set selectedText to do shell script "echo " & quoted form of selectedText & " | awk 'BEGIN{RS=\"\";FS=\"\\n\"}{for(i=1;i<=NF;i++)if($i)printf(\"%s%s\",$i,i==NF?\"\":\"\\n\")}'"

-- Replace repeated words with one occurrence of the word
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed -E 's/(\\b\\w+\\b)(\\s+\\1)+/\\1/g'"

-- Capitalize the first letter after a period and lowercase the rest
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed -E 's/\\b([a-z])|\\.\\s+(.)/\\U\\1\\L\\2/g'"

set the clipboard to selectedText
delay 1
tell application "System Events" to keystroke "v" using command down -- paste

NOTE: If you run the script from Script Editor or from the Finder (as a compiled .scpt file or as an .app file), the script runner itself will always be the frontmost application process. So you need to determine the name of the process that was FRONTMOST BEFORE RUNNING THE SCRIPT.

In this case instead of

tell application "TextEdit" to activate

you should place following:

on getSecondFrontProcessName()
	set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, quote}
	set secondFrontProcess to text item 4 of (do shell script "lsappinfo visibleProcessList")
	set AppleScript's text item delimiters to {" ", "_"}
	set secondFrontProcess to (text items of secondFrontProcess) as text
	set AppleScript's text item delimiters to ATID
	return secondFrontProcess
end getSecondFrontProcessName

set secondFrontAppProcessName to getSecondFrontProcessName()

tell application "System Events"
	set frontmost of process secondFrontAppProcessName to true
	delay 1
end tell

-- the rest of the code

Frankly, I’d go a completely different way using JavaScript and perhaps a function with a map to do the replacement. Something like

let txt = getSelectedTextSomeHow();
txt = txt.replaceAll(/(\s){2,}/g,"$1"); /* Replace all occurrences of two or more whitespace characters with a single one */
txt = txt.replaceAll(/([:;])(\S)/g,"$1 $2"); /* Add a space after ":" or ";" when it is not followed by one */
txt = txt.replaceAll(/[“”‘’]/g,'"');
txt = txt.replaceAll(/(\b\w+\b)\s+\1/g,"$1");
...

A lot less to type, no processes to start, no worries about escaping, and (imho) easier to read, too. Or, if it absolutely has to be AS, you could use the RE engine of the Foundation framework (NSRegularExpression) via AppleScriptObjC.
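
To make the idea concrete, here is a minimal, self-contained sketch of those replacements as one plain JavaScript function. It is an illustration under assumptions rather than a finished tool: `cleanText` is a made-up name, the sample string is invented, and getting the selection in and out of an app is deliberately left out. Unlike the `\s` pattern above, it collapses only spaces and tabs so that line breaks survive, and it maps curly single quotes to an apostrophe rather than to a double quote as the sed line does.

```javascript
// Hypothetical helper: applies the thread's cleanup rules to a plain string.
function cleanText(text) {
  return text
    // Collapse runs of spaces or tabs into a single space.
    .replace(/[ \t]{2,}/g, " ")
    // Add a space after ":" or ";" when a non-space character follows.
    .replace(/([:;])(\S)/g, "$1 $2")
    // Curly double quotes become straight double quotes.
    .replace(/[\u201C\u201D]/g, '"')
    // Curly single quotes become apostrophes (a choice made for this sketch).
    .replace(/[\u2018\u2019]/g, "'")
    // Collapse immediately repeated words ("is is" -> "is").
    .replace(/\b(\w+)(?:\s+\1\b)+/g, "$1");
}

// Invented sample text containing each kind of anomaly.
const sample = "Note:this  is is a   \u201Ctest\u201D;done.";
console.log(cleanText(sample));
```

Note that the `:` rule will also touch things like URLs and times (`10:30` becomes `10: 30`), so it may need narrowing before being used on real text.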

I welcome any JXA solution, but does your example only replace text in a variable? It is not clear if it replaces the selected text in the window. And what happens if the text is selected in other applications as well as in the one you want? I don’t think the problem is so simple.

The parts that get the selected text and replace it after all the regex stuff should (!) be similar to the AS code. That’s why I didn’t bother to write it down – I was more piqued by the repeated calls to sed and all the shenanigans necessary for that.

Well, I copied your script into the Script Editor, set the JavaScript mode, ran it, but it doesn’t work and throws errors. I know you are an advanced user, so why don’t you post a working solution?

I can see the futility of running regex through sed under AppleScript, but as someone new to AppleScript I’m not sure if this is beyond my skillset. JavaScript sounds like an option, but I don’t have the bandwidth to start learning something new.

Embedding JXA code in the AppleScript solution:

set secondFrontAppProcessName to getSecondFrontProcessName()

tell application "System Events"
	set frontmost of process secondFrontAppProcessName to true
	delay 1
end tell

set the clipboard to ""
tell application "System Events" to keystroke "c" using command down -- copy
delay 1 -- test different values

set selectedText to the clipboard as Unicode text
if selectedText is "" then
	display dialog "Text selection not found"
	return
end if
set the clipboard to runJXApart(selectedText)
delay 1
tell application "System Events" to keystroke "v" using command down -- paste


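on runJXApart(selectedText)
	-- Pass the text as an argument instead of splicing it into the JavaScript
	-- source, so quotes and line breaks in the selection cannot break the script.
	set jsSource to "function run(argv) {
	let txt = argv[0];
	txt = txt.replace(/  +/g, ' '); /* Replace all occurrences of two or more spaces with a single one */
	/* Other replace operations */
	return txt;
}"
	return do shell script "osascript -l JavaScript -e " & quoted form of jsSource & " " & quoted form of selectedText
end runJXApart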


on getSecondFrontProcessName()
	set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, quote}
	set secondFrontProcess to text item 4 of (do shell script "lsappinfo visibleProcessList")
	set AppleScript's text item delimiters to {" ", "_"}
	set secondFrontProcess to (text items of secondFrontProcess) as text
	set AppleScript's text item delimiters to ATID
	return secondFrontProcess
end getSecondFrontProcessName

Thanks for that. Though: it wasn’t JXA code but pure JavaScript that could run anywhere. As is your code which creates an Application object that’s not needed. Nor are the standard additions. No need to use a run function either, probably.
The RE is replacing only ASCII spaces. I’d always use \s to match also tabs etc.
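
A small sketch of the difference, using made-up sample text. Note that `\s` also matches line breaks, so a pattern such as `[^\S\r\n]` (whitespace other than CR or LF) may be preferable when paragraph breaks should be kept; that third variant is an addition here, not something proposed in the thread.

```javascript
// Invented sample: a space-tab-space run, a blank line, and a double space.
const sample = "one \t two\n\nthree  four";

// Only ASCII spaces: the space-tab-space run is left alone.
const spacesOnly = sample.replace(/  +/g, " ");

// Any whitespace: the tab run collapses, but so does the blank line.
const anyWhitespace = sample.replace(/\s{2,}/g, " ");

// Horizontal whitespace only: the tab run collapses, line breaks survive.
const horizontalOnly = sample.replace(/[^\S\r\n]{2,}/g, " ");

console.log(JSON.stringify(spacesOnly));
console.log(JSON.stringify(anyWhitespace));
console.log(JSON.stringify(horizontalOnly));
```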

As to “why do you post untested code”: I have neither the time nor the interest to solve problems that I don’t have. So, I posted only symbolic code showing how one would solve the interesting part of the problem in another language (!). Actually completely agnostic of the environment – this would work in a browser as well as in Drafts, for example.

Yes, creating the ‘Application’ object and including the standard additions is not needed here. I updated the JXA part. As for creating the ‘run’ function, it is my usual approach to returning results. I left it as is so that I can use the return command, which works only inside functions.

Hi, I put together the code and it all checks out in Script Editor. However, when I run the script from the AppleScript menu while in TextEdit, the clipboard always comes back with the message that it is empty. Where should I be looking to figure this out?
Thanks, Jeff

When you run it from script menu, text editor might not be the second open app.

I specifically drew attention to the fact that the search for the second frontmost application only makes sense when running the script “normally” from the Script Editor or as an application from the Finder.

In the case of a “special” script run from the Scripts menu, the desired application is already frontmost. In this case, all lines of code up to set the clipboard to "" and the getSecondFrontProcessName() handler can be omitted from my scripts.

You can keep two versions of the script/app on your Mac: one for “normal” running and one for running from the Scripts menu.


@jpottsx1. I revised your script and it works in my testing with TextEdit, Script Editor, and FSNotes. I ran the script by way of the macOS Script Menu and FastScripts 3. The script may not work if you run it in a script editor, and, depending on your computer, you may need to insert more or longer delays.

set the clipboard to ""

tell application "System Events"
	set activeApp to name of first process whose frontmost is true
	tell application process activeApp
		keystroke "c" using {command down}
	end tell
end tell
delay 0.2
set selectedText to the clipboard

if selectedText = "" then
	display alert "The active app is " & activeApp & ". A text selection not found"
	error number -128
end if

-- Replace ".  " with ". "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/\\.  /\\. /g'"
-- Replace "  " with " "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/  / /g'"
-- Replace "   " with " "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/   / /g'"
-- Replace ",  " with ", "
set selectedText to do shell script "echo " & quoted form of selectedText & " | sed 's/,  /, /g'"
-- other replacements deleted for brevity

set the clipboard to selectedText
delay 0.2 -- probably not necessary

tell application "System Events"
	tell application process activeApp
		keystroke "v" using {command down}
	end tell
end tell


--Running under AppleScript 2.8, MacOS 13.0.1
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

tell application "TextEdit"
    activate
    set selectedText to document 1's text
    --My TextEdit document contains the following text..."Lorem                   ipsum ,dolor “dolor dolor sit” amet,  consectetur ;adipiscing :elit. lowercase CRAS FEUGIAT ,euismod   iaculis." & return & return & "Donec vel:bibendum risus,  in consequat erat. Nam eu molestie dolor. Duis in dignissim neque. Vivamus non est in turpis sagittis efficitur. Praesent molestie" & linefeed & linefeed & linefeed & " erat ut ipsum elementum, nec venenatis mi venenatis. Fusce volutpat quis enim nec sollicitudin. Donec ex odio, volutpat ut laoreet ut, tincidunt id justo. Curabitur blandit enim nisi, a rutrum urna ultricies quis. Donec nec iaculis nisi. Donec dictum mi ac varius blandit. Integer dictum tempor neque, eu eleifend nisi semper sit amet. Phasellus ante nunc, porttitor eu diam ac, lacinia dapibus nibh. Nulla quis auctor arcu, luctus pharetra urna. Ut mauris quam, bibendum sit amet ipsum vitae, iaculis dapibus libero. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Vivamus semper, neque sed aliquam vehicula, sem ipsum rhoncus eros, in iaculis lacus nisi a ligula. Vivamus pulvinar neque sit amet lacinia iaculis. Curabitur imperdiet blandit scelerisque. Ut ut urna vel ante interdum semper. Nullam nec ante tellus. Etiam suscipit eleifend erat, non iaculis nulla efficitur nec. Sed congue ornare consectetur."
end tell

--Add appropriate spaces after these punctuation marks, remove leading spaces.
repeat with thisDelimiter in {{" :", ":"}, {":", ": "}, {" ;", ";"}, {";", "; "}, {" ,", ","}, {",", ", "}, {"“", "\""}, {"”", "\""}}
    set selectedText to (my replaceText(item 1 of thisDelimiter, item 2 of thisDelimiter, selectedText))
end repeat
--Iterate over these duplicates until all repetitions are handled.
repeat with thisDelimiter in {{return & return, return}, {linefeed & linefeed, linefeed}, {space & space, space}}
    repeat while selectedText contains (item 1 of thisDelimiter)
        set selectedText to (my replaceText(item 1 of thisDelimiter, item 2 of thisDelimiter, selectedText))
    end repeat
end repeat
-- Remove all word duplications.
repeat with thisWord in words of selectedText
    set duplicatedWord to thisWord & space & thisWord
    set selectedText to (my replaceText(duplicatedWord, thisWord, selectedText))
end repeat
--Capitalize first word of sentences, lowercase the remainder. 
set AppleScript's text item delimiters to ("." & space)
set textChunks to text items of selectedText
repeat with i from 2 to length of textChunks
    set mungedText to (my uppercase(character 1 of (item i of textChunks))) & (my lowercase(text 2 thru -1 of (item i of textChunks)))
    set item i of textChunks to mungedText
end repeat
set selectedText to textChunks as text
set AppleScript's text item delimiters to ""
set the clipboard to selectedText
return selectedText


on replaceText(searchString, replacementString, sourceText)
    set the sourceString to current application's NSString's stringWithString:sourceText
    set the adjustedString to the sourceString's stringByReplacingOccurrencesOfString:searchString withString:replacementString
    return (adjustedString as text)
end replaceText

on uppercase(sourceText)
    set the sourceString to current application's NSString's stringWithString:sourceText
    set the adjustedString to sourceString's uppercaseString()
    return (adjustedString as text)
end uppercase

on lowercase(sourceText)
    set the sourceString to current application's NSString's stringWithString:sourceText
    set the adjustedString to sourceString's lowercaseString()
    return (adjustedString as text)
end lowercase


Thank you for taking the time to work on my beginner’s script.

I tested it and it works great except for the Smart/Straight quote replacement.

If I have any words within quotation marks I receive the following results "big house" or "little house"".

I swapped out the escape character positions and it works perfectly.

Thank you for your time and effort on this. I truly have so much to learn in the AppleScript world.