problem isolating numerical values in textedit

hey,

i’ll admit i don’t really know what i’m doing, but here’s what i’m trying to do:

we’re a screenprint company that works in illustrator. each illustrator file is named by its design number, but when we send out pdfs for proofing to the client, we save the pdf under the ORDER number, which is in a text box within the file.

i worked out an automator action that compresses the .ai file into a .pdf, parses out all the text into a separate file of the same name, and puts both files (pdf and txt) on the desktop. for some reason automator gives these files crazy names like “0skeh42s.pdf”, but the accompanying text file has the same name so they’re easy to match up.

what i’d like to do now is create an applescript that i could either drop these files on, or put inside the automator action (but i’ve just about given up on that part) that will:
-reference the pdf to the txt file
-open the txt file and look for the first 6 digit number between say 150000 and 300000 that doesn’t have any letters attached
-rename the pdf with this number
-delete the text file

so far i’ve gotten this one to work:


on open dropThese
	repeat with oneFile in dropThese
		set thePath to (oneFile) as text
		set tName to name of (info for oneFile)
		if tName does not contain ".txt" then
			set txtPath to ((text 1 thru -4 of thePath) & "txt")
			tell application "TextEdit"
				open file (txtPath)
				set orderText to (every paragraph in text of document 1 where it begins with "20") as text
				if orderText is "" then
					set orderText to (every paragraph in text of document 1 where it begins with "19") as text
				end if
				close document 1
			end tell
			set orderName to (text 1 thru 6 of orderText)
			tell application "Finder"
				set the name of oneFile to (orderName & ".pdf")
				move txtPath to the trash
			end tell
		else
			tell application "Preview"
				open oneFile
			end tell
		end if
	end repeat
end open

but as you can see it’s too restrictive. also, it picks up things like “20%”

this one makes more logical sense, but just hangs the system:


on open dropThese
	repeat with oneFile in dropThese
		set thePath to (oneFile) as text
		set tName to name of (info for oneFile)
		if tName does not contain ".txt" then
			set txtPath to ((text 1 thru -4 of thePath) & "txt")
			tell application "TextEdit"
				open file (txtPath)
				set theLine to 1
				set orderText1 to 1
				repeat
					if orderText1 < 150000 then
						try
							set orderText1 to (text 1 thru 6 of paragraph theLine of document 1)
						on error
							set theLine to (theLine + 1)
						end try
					end if
				end repeat
				set orderText to orderText1 as text
				close document 1
			end tell
			set orderName to (text 1 thru 6 of orderText)
			tell application "Finder"
				set the name of oneFile to (orderName & ".pdf")
				move txtPath to the trash
			end tell
		else
			tell application "Preview"
				open oneFile
			end tell
		end if
	end repeat
end open

finally, if this wasn’t long enough, here are a few examples of the text docs i’d be sifting through:


Thumbnail Size
200944
AW 7.28.09
Screen Print
WHITE
WILFLEX
WHITE
WILFLEX
Brite Org
WILFLEX
Black
WILFLEX
110 BK
110 OR
100%
#SP12971
#SP12971-1 (CU)Hit Somebody Rosman NC
(CU)Hit Somebody Rosman NC FB FB2
LC2 4 in x 3.75 in
12 in x 11 in 50% 35%
1.
1.
2.


Embroidery
FONT(S)
Thumbnail Size
50%
200486
TS 7.23.09
CUS PO: 714
#4624
File Name 4624 grand superior
pupcat
RES09R2_[AP]_F.Type FF+Apq.
C
315 x 129mm 12.4" x 5"
836 CC
836 CC
1090
emerald
1057
dk spice 1128
taupe
1002
white
1002
white
1109
fuschia
______%
Grk Pink
P. Twill
MarlinTeal
P. Twill
BRONZE
P TWILL TAN
P TWILL
White
P. Twill
White
P. Twill
split order evenly b/w colorways
1.
tackle
twill
2.
tackle
twill
3.
tackle
twill
A.
B.
C.


sorry for the long post, but i pasted the text so you can see that sometimes the order number appears in different places, and there may be all sorts of other numbers in the doc, but it will always be

6 characters,
between 150000 and 300000,
on its own line,
and be the first of its kind in the txt (a reorder .ai file may have a text box with older order numbers, but those always appear further down in these text files)…

is the best way i can think to narrow it down.

any help would be much appreciated. thanks guys!

Hi,

the best solution is to find the reason for the random numbering while creating the pdf

try this


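on open dropThese
	repeat with oneFile in dropThese
		set textPath to oneFile as text
		-- only handle the PDFs; the path of the matching text file is derived from the PDF path
		if textPath ends with ".pdf" then
			set textFile to text 1 thru -4 of textPath & "txt"
			-- read the plain text file directly, TextEdit is not needed
			set theText to read file textFile
			set flag to false
			-- go through the text line by line
			repeat with oneParagraph in (get paragraphs of theText)
				try
					-- lines that are not plain numbers fail the coercion and are skipped
					set i to oneParagraph as integer
					-- the first number strictly between 150000 and 300000 is taken as the order number
					if i > 150000 and i < 300000 then
						set flag to true
						exit repeat
					end if
				end try
			end repeat
			if flag then
				tell application "Finder"
					-- rename the PDF to the order number and trash the text file
					set the name of contents of oneFile to ((i as text) & ".pdf")
					move file textFile to trash
				end tell
			else
				display dialog "text doesn't contain a matching number"
			end if
		end if
	end repeat
end open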

hey, thanks for the quick reply but i dont think thats possible…

the weird name of the pdf and txt files comes from automator’s temporary name for the file after pdf compression… it’s a process that’s built into automator and works fine to reduce file size… much quicker than doing this through scripting illustrator.

but that doesnt matter anyway really, because the original file name is not what we need the pdf to be called.

example:
uc berkeley wants a tshirt with the school logo
we would make that file as “SP3378 uc berkeley.ai” on our server
if they want to see it before its mass-printed we make a pdf
that file would be “198432.pdf”, the “198432” being that specific order number.

as for the actual text in the doc, i think that its arranged differently based on a bunch of factors, like when it was typed in the file in illustrator, its location relative to the (0,0) point on the artboard, etc. i’m not sure, but regardless, we can’t go back and somehow tag each order number within each file because there’s literally thousands of design files by now.

does that make sense?

thanks again. im keeping my fingers crossed.

oh, and thinking about it, if i could make a script in illustrator that would

open the file
set the artboard to (0,0)
select a text box within a certain area
copy it as plain text
and save a pdf with that text as the filename

that could also work, because the order number is always in the same general location on the page. i just dont have a clue how to get illustrator to select the box and copy the text.

I cannot imagine that Illustrator is slower than the built-in function and can’t create small PDF files.
Adobe’s PDF creator is the best you can get

ok, but creating the pdf isnt my problem. im trying to batch rename files based on text extracted from within the pdf. im really just looking for some applescript that could isolate what i need from the text doc. we send out about 100 orders a day, and so if the person who does that didn’t have to manually rename each file, it would save us a lot of time.

This is probably possible, but it depends on the structure of the .ai document

No offense, but for commercial use, how about spending some money in exchange for the time (and money) you save,
e.g. at http://www.macfreelancer.com/

because at this point all i really need is a way to tell textedit through applescript to find the first 6 digit number between 150000 and 300000 in a doc where everything is separated by a character return. thats it. i just dont know how to script it to go line by line and weed out everything until it gets to it.

have you tried the solution in post #2?
TextEdit is actually not needed, if the files are plain text files

i could look into the funny naming that automator does when it compresses them, but i dont care what the temp name is since im renaming it right away anyway

as for the text that it extracts from the pdf, thats arranged that way based on layers i believe, and then chronologically. since there are so many files and our templates have changed a few times over the years, its impossible to ensure a consistently clean text file where the order number shows up at line 2 all the time, really.

i realize textedit isnt needed, but frankly, i dont know how to do it without it. i just need a way of picking out the first 6-character combination in the text that is all only numbers, and whose value is 150000 to 300000, and i think i could handle the rest.

That’s exactly what the solution in post #2 does

PS:

here is the extracted code for the “algorithm”


set theText to "Embroidery 
FONT(S) 
Thumbnail Size 
50% 
200486 
TS 7.23.09 
CUS PO: 714 
#4624 
File Name 4624 grand superior 
pupcat 
RES09R2_[AP]_F.Type FF+Apq. 
C 
315 x 129mm 12.4" x 5" 
836 CC "

set flag to false
repeat with oneParagraph in (get paragraphs of theText)
	try
		set i to oneParagraph as integer
		if i > 150000 and i < 300000 then
			set flag to true
			exit repeat
		end if
	end try
end repeat
if flag then
	display dialog i as text
else
	display dialog "text doesn't contain a matching number"
end if
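
If a plain integer coercion ever turns out to accept more than bare digits (for example a line written in exponent notation), a stricter test is possible. This is only a sketch under a few assumptions: the handler name isOrderNumber is made up for illustration, the bounds are treated as inclusive, and the line is assumed to have no leading or trailing spaces (a line with a trailing space would be rejected).

```applescript
-- hypothetical helper: true only for a line that is exactly six digits
-- and whose value lies between 150000 and 300000 (inclusive, by assumption)
on isOrderNumber(oneLine)
	set s to oneLine as text
	if (count of characters of s) is not 6 then return false
	repeat with c in characters of s
		-- reject the line as soon as a non-digit character shows up
		if (contents of c) is not in "0123456789" then return false
	end repeat
	set n to s as integer
	return (n >= 150000 and n <= 300000)
end isOrderNumber
```

Inside the paragraph loop, a test like `if my isOrderNumber(oneParagraph) then` could then stand in for the try block and the range comparison.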

ok honestly, you’re an amazing man. i have no idea what i was doing before, but you’re right, it works absolutely perfectly. really, thank you very much. that was too cool just testing it a second ago. it also runs a lot faster without textedit. any thoughts on what to change to embed it in an automator workflow?

You can embed AppleScripts in Automator with the “Run AppleScript” action

well, sort of…

on run {input, parameters}

(* Your script goes here *)

return input

end run

when i put the script in, nothing happens. no error, just nothing. i’m sure i’m supposed to be passing a variable through or something…

really though if this is a pain, i really didnt think id be able to do this, and im grateful for the help and would be totally fine with having people just run the automator action and then the applescript separately. at this point its just sport.

thanks

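Of course, the whole thing depends on the Automator actions, and on which parameters are passed and how.
Just wrapping the script in the run handler is not sufficient, because an `on open` handler inside it is never called; the files arrive in the `input` parameter instead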
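
A rough sketch of what the “Run AppleScript” action could look like follows. It assumes the preceding action hands over the files as a list of aliases in `input` and that each .txt file sits next to its .pdf; neither is confirmed in this thread. The handler name findOrderNumber is made up for illustration, and the search itself is the loop from post #2.

```applescript
on run {input, parameters}
	-- assumption: input is a list of file aliases from the previous action
	repeat with oneFile in input
		set pdfPath to oneFile as text
		if pdfPath ends with ".pdf" then
			-- the text file is assumed to have the same name and location
			set txtPath to (text 1 thru -4 of pdfPath) & "txt"
			set orderNumber to my findOrderNumber(read file txtPath)
			if orderNumber is not missing value then
				tell application "Finder"
					set the name of contents of oneFile to ((orderNumber as text) & ".pdf")
					move file txtPath to trash
				end tell
			end if
		end if
	end repeat
	return input
end run

-- hypothetical helper: first line that coerces to an integer
-- strictly between 150000 and 300000, or missing value if there is none
on findOrderNumber(theText)
	repeat with oneParagraph in (get paragraphs of theText)
		try
			set i to oneParagraph as integer
			if i > 150000 and i < 300000 then return i
		end try
	end repeat
	return missing value
end findOrderNumber
```

If the previous action passes something other than aliases (POSIX path strings, for instance), the path handling would have to be adapted, so it is worth checking what `input` contains first.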