Sorting out a string

Noovicio · November 2, 2016, 7:19am

Hello.
I have created a script which takes an email from Outlook, creates a txt file, removes all the irrelevant data and adds the formatted text to a .csv file to be used later with InDesign data merge. The script works fine; however, I’m getting returns (new lines) in unexpected places, and this is messing up the creation of the database. These returns don’t display when opening the email in Outlook, but they show when opening the file in TW in its original file type (.eml), and also when opening it with Thunderbird or the Mail app.
I have tried encoding the eml file differently but didn’t see any results.

I can’t remove the returns programmatically (using my beginner skills) because they appear in the client input, so it’s difficult to pinpoint exactly where the return will be. Let me show you what I mean with an example:

Name:Dr Sara J
Hukkman
Title:Visiting Senior Fellow
Phone:+44 (0)20 7655
7424
email:[always displays correctly]

Some people write the number with no spaces so there is no return, others write as above so one return might happen.

I solved the problem by pausing the script to allow a human to tidy up the file, but it would be great if it could somehow be automated, and hopefully this will provide a bit of a challenge for you experts.

Thanks

Yvan_Koenig · November 2, 2016, 10:06am

If I understand correctly, you wish to get:
Name:Dr Sara J Hukkman
Title:Visiting Senior Fellow
Phone:+44 (0)20 7655 7424
email:[always displays correctly]

If I am right, you may use:

set originalText to "Name:Dr Sara J
Hukkman
Title:Visiting Senior Fellow
Phone:+44 (0)20 7655
7424
email:[always displays correctly]"

set splitted to my recolle(paragraphs of originalText, linefeed) # Now we know which is the paragraph delimiter
set splitted to my decoupe(splitted, linefeed & "Title:")
set nameLine to my recolle(paragraphs of item 1 of splitted, space)
set splitted to "Title:" & item 2 of splitted
set splitted to my decoupe(splitted, linefeed & "Phone:")
set titleLine to my recolle(paragraphs of item 1 of splitted, space)
set splitted to "Phone:" & item 2 of splitted
set splitted to my decoupe(splitted, linefeed & "email:")
set phoneLine to my recolle(paragraphs of item 1 of splitted, space)
set splitted to "email:" & item 2 of splitted
set emailLine to my recolle(paragraphs of splitted, space)
set newtext to my recolle({nameLine, titleLine, phoneLine, emailLine}, linefeed)


#=====

on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

on recolle(l, d)
	local oTIDs, t
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end recolle

#=====

You may also use ASObjC features

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

set originalText to "Name:Dr Sara J
Hukkman
Title:Visiting Senior Fellow
Phone:+44 (0)20 7655
7424
email:[always displays correctly]"

set splitted to my concatList:(paragraphs of originalText) usingString:linefeed # Now we know which is the paragraph delimiter
set splitted to my split:splitted usingString:(linefeed & "Title:")
set nameLine to my concatList:(paragraphs of item 1 of splitted) usingString:space
set splitted to "Title:" & item 2 of splitted
set splitted to my split:splitted usingString:(linefeed & "Phone:")
set titleLine to my concatList:(paragraphs of item 1 of splitted) usingString:space
set splitted to "Phone:" & item 2 of splitted
set splitted to my split:splitted usingString:(linefeed & "email:")
set phoneLine to my concatList:(paragraphs of item 1 of splitted) usingString:space
set splitted to "email:" & item 2 of splitted
set emailLine to my concatList:(paragraphs of splitted) usingString:space
set newtext to my concatList:{nameLine, titleLine, phoneLine, emailLine} usingString:linefeed

#=====

on split:sourceString usingString:d1
	set sourceString to current application's NSString's stringWithString:sourceString
	return (sourceString's componentsSeparatedByString:d1) as list
end split:usingString:

#=====

on concatList:theList usingString:d1
	set anArray to current application's NSArray's arrayWithArray:theList
	return (anArray's componentsJoinedByString:d1) as text
end concatList:usingString:

#=====

As you may see, I chose safety and assumed that any of the four entries may be wrongly formatted in the original text.
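
The repeated split-and-rejoin steps above can also be folded into a loop. This is only a minimal sketch in vanilla AppleScript, assuming the headers always appear in the order given in headerList (a hypothetical variable name) and that every header is present. It reuses the decoupe and recolle handlers unchanged.

```applescript
set originalText to "Name:Dr Sara J
Hukkman
Title:Visiting Senior Fellow
Phone:+44 (0)20 7655
7424
email:[always displays correctly]"

-- Assumed header order. The first header starts the text, so it needs no split.
set headerList to {"Name:", "Title:", "Phone:", "email:"}

-- Normalise every line ending to a linefeed first.
set remainingText to my recolle(paragraphs of originalText, linefeed)
set cleanLines to {}
repeat with i from 2 to (count headerList)
	set nextHeader to item i of headerList
	set pieces to my decoupe(remainingText, linefeed & nextHeader)
	-- Everything before the next header is one entry: rejoin its fragments with spaces.
	set end of cleanLines to my recolle(paragraphs of item 1 of pieces, space)
	set remainingText to nextHeader & item 2 of pieces
end repeat
-- Whatever is left belongs to the last header.
set end of cleanLines to my recolle(paragraphs of remainingText, space)
set newText to my recolle(cleanLines, linefeed)

-- Split text t on delimiter d, restoring the previous delimiters afterwards.
on decoupe(t, d)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

-- Join list l with delimiter d, restoring the previous delimiters afterwards.
on recolle(l, d)
	local oTIDs, t
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d}
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end recolle
```

Like the longer version, this errors if a header is missing, because item 2 of pieces will not exist.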

Yvan KOENIG running Sierra 10.12.1 in French (VALLAURIS, France) mercredi 2 novembre 2016 11:05:48

DJ_Bazzie_Wazzie · November 2, 2016, 10:48am

I can imagine that the order of the fields in such a text-based card can vary (since the fields are identified by name). Also, in this example it doesn’t matter how many linefeeds are used.

set cardString to "Title:Visiting Senior Fellow
Phone:+44 (0)20 7655
7424
Name:Dr Sara J
Hukkman
email:[always displays correctly]"

set cardRecord to {theName:"", theTitle:"", phoneNumber:"", mailAddress:""}

repeat with theLine in paragraphs of cardString
	if theLine begins with "Name:" then
		set buf to (a reference to theName of cardRecord)
		set contents of buf to text of theLine
	else if theLine begins with "Title:" then
		set buf to (a reference to theTitle of cardRecord)
		set contents of buf to text of theLine
	else if theLine begins with "Phone:" then
		set buf to (a reference to phoneNumber of cardRecord)
		set contents of buf to text of theLine
	else if theLine begins with "email:" then
		set buf to (a reference to mailAddress of cardRecord)
		set contents of buf to text of theLine
	else
		set contents of buf to contents of buf & space & theLine
	end if
end repeat

tell AppleScript
	set oldTIDs to text item delimiters
	set text item delimiters to linefeed
	set newCard to (cardRecord as list) as string
	set text item delimiters to oldTIDs
end tell
return newCard

or with AppleScript Toolbox to simplify some tasks:

set cardString to "Title:Visiting Senior Fellow
Phone:+44 (0)20 7655
7424
Name:Dr Sara J
Hukkman
email:[always displays correctly]"

set theLines to paragraphs of cardString

set r to AST find regex "^(Title|Phone|Name|email):" in string theLines
repeat with i from (count theLines) to 2 by -1
	if item i of r = {} then
		set item (i - 1) of theLines to item (i - 1) of theLines & space & item i of theLines
		set item i of theLines to missing value
	end if
end repeat
set theLines to text of theLines

tell AppleScript
	set oldTIDs to text item delimiters
	set text item delimiters to linefeed
	set newCard to theLines as string
	set text item delimiters to oldTIDs
end tell
return newCard

Nigel_Garvey · November 2, 2016, 1:29pm

Here’s another ASObjC approach. It assumes that the “Name:” header’s always first and that the “email:” one’s always present, last, and correct. But I’d feel more comfortable knowing why the spurious line endings were being inserted in the first place.

use AppleScript version "2.4"
use framework "Foundation"

set originalText to "Name:Dr Sara J 
Hukkman
Title:Visiting Senior Fellow
Phone:+44 (0)20 7655
7424
email:[always displays correctly]"

set |⌘| to current application
-- Get the text as NSString.
set originalText to |⌘|'s class "NSString"'s stringWithString:(originalText)
-- Replace all its line endings with linefeeds to ensure that's what they are.
set newlineSet to |⌘|'s class "NSCharacterSet"'s newlineCharacterSet()
set theText to (originalText's componentsSeparatedByCharactersInSet:(newlineSet))'s componentsJoinedByString:(linefeed)
-- Substitute a space for any linefeed (and any preceding spaces) where followed by at least one colon-less line and a header line.
(theText's stringByReplacingOccurrencesOfString:(" *\\n(?=([^:\\n]++\\n)++(Title|Phone|email):)") withString:(" ") options:(|⌘|'s NSRegularExpressionSearch) range:({location:0, |length|:theText's |length|()})) as text

ccstone · November 5, 2016, 4:53am

Hey Noovicio,

Here’s how I usually approach such a problem:

Requires the Satimage.osax (AppleScript Extension) to be installed.

I’ve added some spacing glitches into the info-record.


-------------------------------------------------------------------------------------------

set cardString to "
    
Title:Visiting Senior Fellow
  Phone:+44 (0)20 7655   
7424
Name:Dr Sara J

Hukkman    
email:[always displays correctly]

"
# Change any non-breaking-spaces or tabs into spaces.
set cardString to cng("[[:blank:]]", " ", cardString) of me

# Trim off any leading or trailing vertical whitespace from the entire record.
set cardString to cng("\\A\\s+|\\s+\\Z", "", cardString) of me

# Trim off any leading or trailing whitespace from each line.
set cardString to cng("^\\s+|\\s+$", "", cardString) of me

# Change 2 or more spaces to a single space.
set cardString to cng(" {2,}", " ", cardString) of me

# Change one or more carriage-return or linefeed into a bullet.
set cardString to cng("[\\n\\r]+", "•", cardString) of me

# Restore field-lines in the record.
set cardString to cng("•((?:Title|Phone|Name|email):)", "\\n\\1", cardString) of me

# Remove any remaining bullets.
set cardString to cng("•", " ", cardString) of me

-------------------------------------------------------------------------------------------
--» HANDLERS
-------------------------------------------------------------------------------------------
on cng(_find, _replace, _data)
	change _find into _replace in _data with regexp without case sensitive
end cng
-------------------------------------------------------------------------------------------

The regular expressions are portable, so an ASObjC approach like Nigel’s could be modified to do what I’m doing here.
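
To illustrate the portability point, here is a minimal ASObjC sketch of the same step-by-step cleanup. It is an adaptation under stated assumptions, not the Satimage code itself: the cng handler below is a hypothetical stand-in built on NSString's stringByReplacingOccurrencesOfString:withString:options:range: with NSRegularExpressionSearch. A few patterns are adjusted for the ICU regex flavour that Foundation uses: an inline (?m) flag so that ^ and $ match per line, a lookahead in place of the back-reference, and per-line trimming limited to spaces so that it cannot swallow line endings.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

set cardString to "
    
Title:Visiting Senior Fellow
  Phone:+44 (0)20 7655   
7424
Name:Dr Sara J

Hukkman    
email:[always displays correctly]

"

-- Change tabs and non-breaking spaces into plain spaces.
set cardString to my cng("[\\t\\u00A0]", " ", cardString)
-- Trim leading and trailing spaces from each line ((?m) makes ^ and $ line-based).
set cardString to my cng("(?m)^ +| +$", "", cardString)
-- Trim leading and trailing whitespace from the entire record.
set cardString to my cng("\\A\\s+|\\s+\\z", "", cardString)
-- Change 2 or more spaces to a single space.
set cardString to my cng(" {2,}", " ", cardString)
-- Change each run of carriage returns or linefeeds into a bullet placeholder.
set cardString to my cng("[\\n\\r]+", "•", cardString)
-- Turn a bullet back into a linefeed where a field header follows it.
set cardString to my cng("•(?=(?:Title|Phone|Name|email):)", linefeed, cardString)
-- Any remaining bullet was a stray line break inside a field: make it a space.
set cardString to my cng("•", " ", cardString)
return cardString

-- Hypothetical helper: regex find-and-replace over the whole string.
-- The replacement is treated as a template, so "$" and "\" are special in it.
on cng(findPattern, replaceTemplate, sourceText)
	set sourceString to current application's NSString's stringWithString:sourceText
	set fullRange to {location:0, |length|:sourceString's |length|()}
	set regexOption to current application's NSRegularExpressionSearch
	return (sourceString's stringByReplacingOccurrencesOfString:findPattern withString:replaceTemplate options:regexOption range:fullRange) as text
end cng
```

Keeping one substitution per line preserves the modular style of the original, so an individual step can be changed or dropped without touching the others.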

Users and the Internet always add in little wrinkles that need to be accounted for, so I take this modular approach for easy maintenance and modification.

–
Chris

{ MacBookPro6,1 · 2.66 GHz Intel Core i7 · 8GB RAM · OSX 10.11.6 }
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Noovicio · November 6, 2016, 9:19am

Wow! I went away for a few days and came back to this. Lots to experiment with.
Thanks for your answers.