I’m using this regex to clean up some body text grabbed from some emails.
set clean_text to do shell script "echo " & quoted form of alertcon & "|sed \"s/[^[:alnum:][:space:]]//g\""
It does a pretty good job within my script, but there’s unwanted space left where an image is removed. How can I alter that regex part so it also removes all carriage returns?
I didn’t know the proper way to describe the newline or empty lines. It’s not carriage returns and the empty space is still there after applying the new regex. I should have said I want to remove empty lines or newlines from the text.
How would I do that instead? That is, remove both carriage returns AND empty lines/newlines?
What’s the difference between a gap resulting from the removal of a picture (which you want to remove) and a gap between paragraphs (which presumably you want to keep)?
I changed the script to remove empty lines. Maybe that works for you, or you’ll reconsider after reading Nigel’s point. You can’t have both!
Here is the changed version, with some demo text, before and after. I have newlines as line endings in Script Debugger; you may get a different result if you use AppleScript Editor with line endings set to carriage return, so please try it with some text from disk.
set alertcon to "This
is some text
that should be cleansed " & return & " and fine."
set clean_text to do shell script "echo " & quoted form of alertcon & "|sed -e 's/[^[:alnum:][:space:]]//g' -e '/^$/ d' |tr -d '\\r'"
# "This
# is some text
# that should be cleansed and fine"
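To see what each stage of that pipeline does without going through AppleScript, here is a minimal shell sketch. The function name clean_text and the sample text are made up for illustration; the sed and tr commands are the ones from the script above. Note that sed works line by line and never sees the newline at the end of a line, so [:space:] only protects spaces, tabs and carriage returns inside a line, and the carriage return survives until tr removes it.

```shell
#!/bin/sh
# clean_text is a hypothetical helper name; the pipeline is the one posted above.
clean_text() {
    printf '%s\n' "$1" |
        sed -e 's/[^[:alnum:][:space:]]//g' -e '/^$/ d' |  # drop punctuation, then empty lines
        tr -d '\r'                                         # drop carriage returns
}

# Demo text: an empty line, some punctuation and one carriage return.
sample=$(printf 'This\n\nis some text!\nthat should be cleansed \r and fine.')
clean_text "$sample"
```

The empty line and the punctuation are gone, and the carriage return is deleted rather than replaced, so two spaces are left where it was.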
I’m not sure what was causing the gap where the picture was in the first place. What I did was write an AppleScript that runs when I get email (via an Apple Mail rule). It takes the sender, subject and part of the body and sends them via iChat to an old cell phone, to get around an issue with the cell phone service provider. I need the text body to be less than 180 characters, so I fixed that as well. Finally, I made an AppleScript that turns on the rule when I’m away from the computer; the rule runs the AppleScript and also uses “Stop evaluating rules”, so everything temporarily goes to the inbox. Then there’s another script to turn off the rule when I get back to the computer and no longer need the script to run. Both are Actions in LaunchBar, so it’s trivial to do.
Anyway, everything was working perfectly until I got emails with inline images in them, which would break the script so that nothing happened. So I experimented with having the script look for attachments and with that regex I found elsewhere, and it solved the problem. But there was a lot of space left over in the body text where the image was removed, which would waste space on my old cell phone with its tiny screen.
In case you’re curious, this is the final AppleScript I patched together, which works perfectly now:
tell application "iChat"
    activate
end tell
tell application "Mail"
    activate
    -- Take the first unread message in the inbox
    set theSelection to (get every message of inbox whose read status is false)
    set theMessage to item 1 of theSelection
    set alertsend to sender of theMessage
    set alertsub to subject of theMessage
    set alertcon to content of theMessage
    -- Keep the body under 180 characters; shorter text is left as it is
    if length of alertcon > 179 then set alertcon to text 1 thru 179 of alertcon
    set alert to alertsend & alertsub & alertcon
    -- For messages with attachments, strip punctuation, empty lines and carriage returns
    if (every mail attachment of theMessage) ≠ {} then
        set clean_text to do shell script "echo " & quoted form of alertcon & "|sed -e 's/[^[:alnum:][:space:]]//g' -e '/^$/ d' |tr -d '\\r'"
        set alert to alertsend & alertsub & clean_text
    end if
    set read status of theMessage to true
    tell application "iChat"
        activate
        send alert to buddy "+18001234567" of service 1
    end tell
end tell
Granted, this is an absolute Frankenstein I patched together, so I’m positive there are many things wrong with the way I’ve done this, but it works perfectly now.
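One thing that may be worth trying: the script trims the body to the character limit before stripping it, so the cleaned text can end up shorter than it needs to be. A rough shell sketch of the other order, cleaning first and trimming afterwards, is below. The function name clean_and_trim is hypothetical, the [:blank:] class is used so that sed removes carriage returns itself, and head -c counts bytes rather than characters, so this assumes mostly plain ASCII text.

```shell
#!/bin/sh
# clean_and_trim is a hypothetical helper name, not part of the posted script.
clean_and_trim() {
    printf '%s\n' "$1" |
        sed -e 's/[^[:alnum:][:blank:]]//g' -e '/^$/ d' |  # strip first
        head -c 179                                        # then keep at most 179 bytes
}

# A 300-digit dummy body stands in for a long email.
long=$(printf '%0300d' 0)
trimmed=$(clean_and_trim "$long")
echo "${#trimmed}"
```

In the AppleScript this would mean running do shell script on the full content of the message and leaving the length check to the shell, or doing the length check on clean_text afterwards.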
Ah. So you weren’t interested in keeping gaps between paragraphs anyway. And it links in with why you were happy for your regex to zap punctuation.
The code I originally posted in post #3 did away with the “tr” command by simply changing [:space:] to [:blank:], which would look like this applied to McUsrII’s second offering:
set clean_text to do shell script "echo " & quoted form of alertcon & "|sed -e 's/[^[:alnum:][:blank:]]//g' -e '/^$/ d'"
Or:
set clean_text to do shell script "echo " & quoted form of alertcon & "|sed 's/[^[:alnum:][:blank:]]//g ; /^$/ d'"
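For anyone wondering why the “tr” step becomes unnecessary: [:blank:] matches only spaces and tabs, so a carriage return is no longer protected and the substitution deletes it along with the punctuation. A small shell sketch comparing the two versions on text with CRLF line endings is below. The function names strip_space and strip_blank and the sample text are made up for illustration; the sed and tr commands are the ones quoted in this thread.

```shell
#!/bin/sh
# strip_space and strip_blank are hypothetical names for the two pipelines.
strip_space() {
    printf '%s\n' "$1" |
        sed -e 's/[^[:alnum:][:space:]]//g' -e '/^$/ d' | tr -d '\r'
}
strip_blank() {
    printf '%s\n' "$1" | sed 's/[^[:alnum:][:blank:]]//g ; /^$/ d'
}

# Three lines with CRLF endings; the middle one holds only a carriage return.
sample=$(printf 'one,\r\n\r\ntwo.')

strip_space "$sample"   # middle line still holds a CR when /^$/ runs, so it is kept
strip_blank "$sample"   # CR is deleted first, the line becomes empty and is removed
```

So besides saving a process, the [:blank:] version also gets rid of lines that contain nothing but a carriage return, which the [:space:] version leaves behind as empty lines.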