Find, copy paste name and email

Hi,

Is there a way to have AppleScript go through a folder of CVs/resumes and, for each CV/resume Word doc file, add the following to the top of the document:

Name: [name of the person extracted from CV/resume file name (the file name is the person’s name)]

and underneath add

Email: [email address extracted from the CV/resume (in the CV/resume, the person usually writes Email: or E-mail: followed by their email address)]

So at the top of the CV/resume, it would say for example:

Name: Tom Smith
Email: ts@myemail.com

If so, I’ll start looking into how to get AppleScript to do it. Otherwise, I’ll have to try to figure out another way.

Thanks.

Model: iMac
AppleScript: 2.4
Operating System: Mac OS X (10.10)

Hey Rob,

Is it possible?

Sure.

I’d probably use the textutil command-line program to extract the text from the file.

[format]textutil -noload "/Users/yourUserName/test/Christopher Stone.docx" -stdout -format docx -convert txt -encoding UTF-8[/format]
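For anyone who would rather drive the same command from Python, here is a minimal sketch. It only wraps the textutil invocation shown above; the function names textutil_command and extract_text are made up for illustration, and extract_text will only work on a Mac, where textutil exists.

```python
import subprocess


def textutil_command(docx_path):
    """Build the argument list for the textutil call shown above.

    Passing a list (rather than one shell string) means a path with
    spaces, such as "Christopher Stone.docx", needs no quoting.
    """
    return [
        "textutil", "-noload", docx_path,
        "-stdout", "-format", "docx",
        "-convert", "txt", "-encoding", "UTF-8",
    ]


def extract_text(docx_path):
    """Run textutil (macOS only) and return the document as plain text."""
    result = subprocess.run(
        textutil_command(docx_path),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if textutil fails
    )
    return result.stdout
```

Calling extract_text("/Users/yourUserName/test/Christopher Stone.docx") should return the text that the command prints in Terminal.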

Then I’d use a regular expression (added to AppleScript via the Satimage.osax AppleScript Extension) to extract the email address.

Of course you could use sed, awk, Perl, Ruby, or Python to extract the address, but I find the Satimage.osax easier to use most of the time.
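As a rough illustration of the Python route mentioned above, the sketch below pulls the first thing that looks like an email address out of a block of text. The pattern is deliberately loose and is an assumption, not a full address validator; find_email is a hypothetical helper name.

```python
import re

# Loose "looks like an email address" pattern, matched case-insensitively.
EMAIL_RE = re.compile(
    r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b",
    re.IGNORECASE,
)


def find_email(text):
    """Return the first email-like string in text, or None if there is none."""
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None
```

Because it returns the first match only, a CV that lists a referee's address before the candidate's own would give the wrong result, so the output is worth a visual check.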

It’s easy to get the file name.
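In Python terms, getting the person's name from the file name could look like the sketch below. It assumes, as the original question says, that the file name without its extension is the person's name; name_from_path is a hypothetical helper.

```python
from pathlib import PurePosixPath


def name_from_path(path):
    """Return the file name without its final extension."""
    return PurePosixPath(path).stem
```

Only the last extension is removed, so a file called "Tom Smith.v2.docx" would yield "Tom Smith.v2".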

Emplacing text INSIDE the Word document will require scripting Word, but that should be possible.

What version of Word are you using?

What is the structure of the files on-disk that you want to process?


Chris


{ MacBookPro6,1 · 2.66 GHz Intel Core i7 · 8GB RAM · OSX 10.11.2 }
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Hey Rob,

Here’s an example.

I’m using textutil to convert the Word document to text, and then the Satimage.osax’s regex to find the email address in that text.

I imagine you can find the email address with Word itself, but I’m not going to bother.

All-in-all that was easier than I expected.

Tested with Word 14.5.7 from Office 2011.

-Chris


-------------------------------------------------------------------------------------------
# Auth: Christopher Stone <scriptmeister@thestoneforge.com>
# dCre: 2016/01/03 21:15
# dMod: 2016/01/03 21:39
# Appl: Finder, Microsoft Word
# Task: Emplace Name and Email Address Header in Word Files.
# Osax: Satimage.osax { http://tinyurl.com/dc3soh } » REQUIRED!
# Tags: @Applescript, @Script, @Finder, @Microsoft_Word
-------------------------------------------------------------------------------------------

try
	
	set mySourceFolder to path to downloads folder
	
	tell application "Finder"
		set myFileList to (items of mySourceFolder whose name ends with ".docx") as alias list
		set myNameList to name of items of mySourceFolder whose name ends with ".docx"
	end tell
	
	if myFileList ≠ {} then
		set AppleScript's text item delimiters to ".docx"
		
		set _counter to 0
		
		repeat with theFile in myFileList
			
			set _counter to _counter + 1
			set filePathPosix to quoted form of (POSIX path of theFile)
			set shCMD to "textutil -noload " & filePathPosix & " -stdout -format docx -convert txt -encoding UTF-8"
			set docText to do shell script shCMD
			set theEmailAddress to fnd("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\b", docText, false, true) of me
			set headerText to "Name: " & text item 1 of (item _counter of myNameList) & return & "Email: " & theEmailAddress & return & return
			
			tell application "Microsoft Word"
				open theFile
				
				tell active document
					tell its text object
						insert text headerText at beginning
					end tell
					save
					close
				end tell
				
			end tell
			
		end repeat
		
	end if
	
on error e number n
	stdErr(e, n, true, true) of me
end try

-------------------------------------------------------------------------------------------
--» HANDLERS
-------------------------------------------------------------------------------------------
on stdErr(e, n, beepFlag, ddFlag)
	set e to e & return & return & "Num: " & n
	if beepFlag = true then
		beep
	end if
	if ddFlag = true then
		tell me
			set dDlg to display dialog e with title "ERROR!" buttons {"Cancel", "Copy", "OK"} default button "OK"
		end tell
		if button returned of dDlg = "Copy" then set the clipboard to e
	else
		return e
	end if
end stdErr
-------------------------------------------------------------------------------------------
on fnd(_find, _data, _all, strRslt)
	try
		find text _find in _data all occurrences _all string result strRslt with regexp without case sensitive
	on error
		return false
	end try
end fnd
-------------------------------------------------------------------------------------------
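One edge case worth noting in the script above: when no address matches, fnd returns false, and as far as I can tell the header would then read "Email: false". The sketch below shows one way to handle a missing address when building the header text. It is written in Python purely for illustration; build_header and the "(not found)" placeholder are assumptions, not part of the script.

```python
def build_header(name, email):
    """Build the two header lines followed by a blank line.

    "\r" mirrors AppleScript's `return` constant used in the script.
    A missing address (None, False, or "") becomes a visible placeholder.
    """
    email_text = email if email else "(not found)"
    return f"Name: {name}\rEmail: {email_text}\r\r"
```

A visible placeholder makes it easy to search the processed files afterwards for CVs that need the address added by hand.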

Thanks Christopher,

Am getting a Syntax error when compiling though. Error message: Syntax Error Expected function name, command name or function name but found “error”.

It highlights “error” in this line: on error e number n

Am using Word 2011

"What is the structure of the files on-disk that you want to process?" All the files to process are in a single folder. Is that the structure?

It also seems we’re going to be using a different system, so now I need to find out how to put the CV file name, which is the person’s name (need to exclude the file extension), and the person’s email address, extracted from the CV, on the same row in Excel.

Is this possible?

It’s quite easy to use an email extractor to extract email addresses from multiple files, and it’s quite easy to just copy a file list in Finder and paste it into Excel, but the difficulty is getting the CV file name to match the email address found in the CV on the same row.
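One way to keep each name on the same row as its address is to process the files one at a time and write a CSV file that Excel can open. The Python sketch below is only an outline under stated assumptions: build_rows and write_csv are hypothetical names, the email pattern is a loose one, and get_text stands for whatever function turns a .docx file into plain text (for example, a wrapper around textutil).

```python
import csv
import re
from pathlib import Path

# Loose "looks like an email address" pattern, matched case-insensitively.
EMAIL_RE = re.compile(
    r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b",
    re.IGNORECASE,
)


def build_rows(folder, get_text):
    """Return one (name, email) pair per .docx file in folder.

    get_text is a caller-supplied function: path -> plain text.
    The email is "" when nothing in the text looks like an address.
    """
    rows = []
    for path in sorted(Path(folder).glob("*.docx")):
        match = EMAIL_RE.search(get_text(path))
        rows.append((path.stem, match.group(0) if match else ""))
    return rows


def write_csv(rows, csv_path):
    """Write the rows to a CSV file with a Name/Email header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Name", "Email"])
        writer.writerows(rows)
```

Since the name and the address come from the same file in the same loop iteration, they cannot drift out of step the way two separately pasted lists can.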

After getting the people’s names (by using the CV file names) and email addresses into Excel, I can use Text to Columns to split the name into first, middle and family name, using the spaces between the parts of the name as the delimiter.
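If the split is done in a script rather than with Text to Columns, a sketch like the following could work. It assumes the first word is the first name, the last word is the family name, and anything in between is the middle name, which will be wrong for some names; split_name is a hypothetical helper.

```python
def split_name(full_name):
    """Split a name on spaces into (first, middle, family).

    Assumption: first word = first name, last word = family name,
    everything in between = middle name(s).
    """
    parts = full_name.split()
    if not parts:
        return ("", "", "")
    if len(parts) == 1:
        return (parts[0], "", "")
    return (parts[0], " ".join(parts[1:-1]), parts[-1])
```

Unlike a plain split on spaces, this always yields three columns, so names with and without a middle name line up in the same spreadsheet columns.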

Are you sure that the required OSAX, Satimage.osax, is installed on your machine?

Yvan KOENIG running El Capitan 10.11.2 in French (VALLAURIS, France) Monday 4 January 2016 09:40:24

Hi Yvan,

I believe so. I installed http://www.satimage.fr/software/downloads/Satimage398.pkg

Will double check when I get back. Am out at the moment.

Am still getting the error.

I put DOCX files into a folder called mySourceFolder, which is in my Downloads folder, and compiled the script.

What am I doing wrong?

What did you do after downloading http://www.satimage.fr/software/downloa . age398.pkg?
You were supposed to double-click the package’s icon to install it in the dedicated folder, so that its pathname is:
/Library/ScriptingAdditions/Satimage.osax/

My guess is that you just downloaded the package. Am I wrong?
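A quick way to check the install location without opening Finder is sketched below in Python. The system-wide path is the one given above; the per-user ~/Library/ScriptingAdditions location is an assumption added for completeness, and both function names are hypothetical.

```python
from pathlib import Path


def osax_locations(name="Satimage.osax", home=None):
    """Return the places to look for a scripting addition.

    The first entry is the system-wide folder named above; the second
    (per-user) folder is an assumption and may not apply to every setup.
    """
    home = Path(home) if home else Path.home()
    return [
        Path("/Library/ScriptingAdditions") / name,
        home / "Library" / "ScriptingAdditions" / name,
    ]


def installed_osax_paths(candidates):
    """Return only the candidate paths that actually exist on disk."""
    return [path for path in candidates if path.exists()]
```

An empty list from installed_osax_paths(osax_locations()) would suggest the package was downloaded but never installed.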

Yvan KOENIG running El Capitan 10.11.2 in French (VALLAURIS, France) Monday 4 January 2016 16:16:38

Just checked. Satimage.osax is in the /Library/ScriptingAdditions/ folder

So I guessed wrong.

I have no other idea why you can’t compile the script ccstone posted in his message dated Yesterday 09:47:28 pm.
I just clicked on [Open this Scriplet in your Editor:] and got the script, which compiled flawlessly.

Yvan KOENIG running El Capitan 10.11.2 in French (VALLAURIS, France) Monday 4 January 2016 16:56:21

Hi Yvan,

Could you go through step by step what you do before compiling and running the script, please?

Thanks.

Open this thread on MacScripter.
Navigate to ccstone’s message dated Yesterday 09:47:28 pm. It’s numbered #3.
In the message there is a rectangle surrounding the script’s code.
You should see a line of blue text reading: [Open this Scriplet in your Editor:]
Click on it, as I already wrote.
The script will appear in a Script Editor window.
Finally, click the [Compile] button (the one with a hammer).

This has been the official procedure for years, and it is also the safest and cleanest one.
With it, no useful character gets dropped and no extraneous one gets inserted.

Yvan KOENIG running El Capitan 10.11.2 in French (VALLAURIS, France) Monday 4 January 2016 17:12:35

Thanks, it compiles OK now, but nothing happens to the DOCX files I put in the folder called mySourceFolder, which is in my Downloads folder.

The script appears to do something. It seems to open three Word files, but then stops, and the files in the mySourceFolder haven’t been modified.

What am I missing?

The script is supposed to insert three lines of text at the very beginning of the file.

I don’t use Mer.Soft products so I can’t test.

Insert the two extra instructions flagged # ADDED below:
		repeat with theFile in myFileList
			
			set _counter to _counter + 1
			set filePathPosix to quoted form of (POSIX path of theFile)
			set shCMD to "textutil -noload " & filePathPosix & " -stdout -format docx -convert txt -encoding UTF-8"
			set docText to do shell script shCMD
			
			log docText # ADDED
			set theEmailAddress to fnd("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}\\b", docText, false, true) of me
			set headerText to "Name: " & text item 1 of (item _counter of myNameList) & return & "Email: " & theEmailAddress & return & return
			log ">>>>>>> " & headerText # ADDED
			tell application "Microsoft Word"
				open theFile
				
				tell active document
					tell its text object
						insert text headerText at beginning
					end tell
					save
					close
				end tell
				
			end tell
			
		end repeat

When you run the script, these instructions will display some data in the Events log area at the bottom of the Editor’s window.
If the Events log pane is not displayed, click the small blue square with four horizontal lines at the very bottom of the window.
Looking at this log, you will be able to see whether the script correctly extracts the wanted data.
In my experience, ccstone never posts a script without testing it, so I’m quite sure that if your documents contain the searched-for data, the script will insert it at the beginning of the document.

As you wrote that the script opened three Word documents, it’s clear that it built the wanted headers.

Just a question: are the documents large?
If they are, it may be useful to insert a delay between the instruction: open theFile
and the instruction: tell active document
You may also temporarily force the script to stop after inserting the data in the first .docx file processed.

The code will then look like this:
			tell application "Microsoft Word"
				open theFile
				delay 1 # ADDED
				tell active document
					tell its text object
						insert text headerText at beginning
					end tell
					save
					error number -128 # ADDED
					close
				end tell
				
			end tell

With these tips you should be able to see what is actually being done.

Yvan KOENIG running El Capitan 10.11.2 in French (VALLAURIS, France) Monday 4 January 2016 18:12:42

I seem to recall that a Microsoft Word .docx “file” is actually a .zip package composed of several files inside it – perhaps that is part of the problem?
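That recollection is right in general: a .docx file is a ZIP archive whose main text lives in word/document.xml. Whether that has anything to do with the problem here is not established in this thread, but it does mean the text can be read without Word or textutil. The Python sketch below shows the idea; docx_paragraphs is a hypothetical helper, and it ignores headers, footers, footnotes and tracked changes.

```python
import zipfile
from xml.etree import ElementTree

# Namespace of WordprocessingML elements (w:p = paragraph, w:t = text run).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx_path):
    """Return the text of each paragraph in the document body.

    Word often splits one visible line across several w:t runs, so the
    runs of each paragraph are joined before returning.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ElementTree.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        runs = (node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append("".join(runs))
    return paragraphs
```

Joining the paragraphs with newlines gives plain text that an email search can be run over.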

If I’m reading you correctly, your objective is solely to modify the MS Word file, right?

If so, then I’d highly recommend that you use the Word VBA for this process. It is very powerful, and, IMO, much easier to use on MS docs than AppleScript. Plus, there are a ton of VBA examples out there that you can make use of.

BTW, VBA also has a RegEx engine, if you need it.

If you decide to go this route, and get stuck, let me know and I’ll try to help. Over the years I’ve done lots of VBA work.

Good luck.

Hey JM,

It would be nice if you provided a sample of VBA to find text in the active document with regex.

And how to get the value back out to AppleScript.

-Chris

Hey Rob,

Well…

You did change the source folder in the script. Yes or no?

This line in my script sets the source folder.

set mySourceFolder to path to downloads folder

You have to change that to point to the correct folder on your system. Something like this:

set mySourceFolder to alias ((path to downloads folder as text) & "MyWordFiles Folder:")
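The same point, pointing the script at the subfolder rather than at Downloads itself, can be sketched in Python as below. "MyWordFiles Folder" is just the example name used above, and source_folder and docx_files are hypothetical helpers.

```python
from pathlib import Path


def source_folder(subfolder="MyWordFiles Folder"):
    """Return ~/Downloads/<subfolder>; the subfolder name is an example."""
    return Path.home() / "Downloads" / subfolder


def docx_files(folder):
    """Return the .docx files directly inside folder, sorted by path."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == ".docx"
    )
```

Printing docx_files(source_folder()) before processing anything is a cheap way to confirm that the folder being scanned is the one that actually holds the CVs.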

Do you have a sample file or three you can send me for testing? { macscripter@thestoneforge.com }

-Chris