Loop saga pt. 2

Many thanks, first off, to Dan T for his help.

{Edit by Adam Bell: see this thread for the previous result}

I got the loop working. The idea behind the loop is to have each odd member of an array replace the even member that comes right after it. The final product is supposed to swap characters that XML doesn’t accept for their entities, e.g. & for &amp; and so forth.

The loop works fine when the items in the array are things like {“item1”, “item2”, “item3”, “item4”…}. The replacements go as planned.

** But when I start adding the array members for the entity swap, the replacement fails to take place. **

To do the replacement, I’m using an old UNIX command called sed. Kinda old, but wicked fast. At some point, we may have to work on hundreds of files at once, so we need fast.

The code follows. It takes the contents of file_a, copies them to hold_file1, then hold_file1 and hold_file2 swap about (sed can’t change its input file, only output to another file), then hold_file2 copies itself into end_file.

Best wishes,
Laurel

set test_list to {"&#38", "&", "&#162", "¢", "item1", "item2", "item3", "item4", "item5", "item6", "item7", "item8", "item9", "item10"}

set x to 1
set y to 2


do shell script "cp /Users/kschalk/desktop/file_a  /Users/kschalk/desktop/hold_file1"

--loop to replace members of an array with other members of an array. 
--This loop replaces each even member of the array with the odd member before it.
--This counter number needs to be the same as the number of entity pairs.
repeat while y ≤ (count of items in test_list)
	
	set var1 to item x of test_list as text
	set var2 to item y of test_list as text
	
	--setting the item it will be replaced with (entity)
	--works on hold_file1 and outputs data to hold_file2
	do shell script "sed 's/" & var2 & "/" & var1 & "/g' /Users/kschalk/desktop/hold_file1 > /Users/kschalk/desktop/hold_file2"
	
	--copies hold_file2 to hold_file1
	do shell script "cp /Users/kschalk/desktop/hold_file2 /Users/kschalk/desktop/hold_file1"
	
	--increase the counters
	set x to x + 2
	set y to y + 2
end repeat

I find it hard to believe that you’ll endure a large penalty in time just doing this in plain vanilla AppleScript.

I saved a file called ‘theFile’ on my desktop and put this in it:

I ran this script (with an important proviso that the ‘&’ symbol MUST be first or it will be replaced in the codes):

set exchList to {"&#38", "&", "&#162", "¢", "£", "£", "&#183", "¢", "ø", "ø"} -- this did have codes in it that rendered.
set F to open for access (alias ((path to desktop as text) & "theFile.txt")) with write permission
set tFile to read F
try
	repeat with k from 2 to (count exchList) by 2
		if tFile contains item k of exchList then set tFile to swap(item k of exchList, item (k - 1) of exchList, tFile)
		log tFile
	end repeat
	set eof of F to 0
	write tFile to F
	close access F
on error
	close access F
end try

to swap(toFind, toReplace, theText)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to toFind
	set textItems to theText's text items
	set AppleScript's text item delimiters to toReplace
	tell textItems to set editedText to beginning & toReplace & rest
	set AppleScript's text item delimiters to astid
	return editedText
end swap

This correctly changed all the symbols. I can’t show the result here because the entities get rendered, which is also why I didn’t put AppleScript tags on the script.
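For anyone wondering why the ‘&’ pair has to come first, here is a small sketch of the same ordering problem using sed. The function names and the sample text are made up for illustration, and the entities are written the way they appear in this thread (no trailing semicolon).

```shell
#!/bin/sh
# Hypothetical helpers: both apply the same two replacements, in a different order.

# Ampersand first: only the original '&' characters are encoded.
encode_amp_first() {
    printf '%s\n' "$1" | sed -e 's/&/\&#38/g' -e 's/</\&#60/g'
}

# Ampersand last: the '&' written by the '<' replacement gets encoded again.
encode_amp_last() {
    printf '%s\n' "$1" | sed -e 's/</\&#60/g' -e 's/&/\&#38/g'
}

encode_amp_first 'A & B < C'   # prints: A &#38 B &#60 C
encode_amp_last 'A & B < C'    # prints: A &#38 B &#38#60 C
```

The second result is the double encoding the proviso is warning about: once an entity has been written into the text, a later ‘&’ replacement can no longer tell it apart from a raw ampersand.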

Hi,

Yes, it can be done in plain-vanilla AppleScript. But we may be working in file batches of over a thousand files each. In the past, this has required someone to babysit the computer for 20-40 min, in case server connections went down and our access to the files was cut off.

At least that’s my understanding. I was brought in because I knew the UNIX commands. I think the problem has something to do with AppleScript not understanding Unicode.

Any thoughts on a workaround? I’m doing it this way because I was asked to by my boss.

Best,
Laurel:D

I used to teach Engineering Design in an Engineering School. I used to point out that there were three kinds of decision-making:

  1. Mother Nature Never Sleeps: the laws of nature are never suspended, physics and math rule.
  2. Economics, the dismal master: the value of every part of a design, a process, or a product must exceed its cost.
  3. Politics, with a small ‘p’ …: the law, labor agreements, corporate policy, patent positions, environmental concerns, and office politics (“the boss said so”) are concerns not covered by the first two, but just as important.

Looks like the third one got you. ;)

Good luck.

Just an idea: Maybe the problems have something to do with misinterpretations of sed (‘&’ and maybe other characters):

try (in the terminal):

echo "----&#38----" | sed 's/&#38/&/g'

  • the result is not ‘----&----’ as expected

maybe you can try using perl instead:

echo "----&#38----" | perl -pe 's/&#38/&/g'

here the ‘&’ replacement works: ‘----&----’

D.

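According to sed’s man page, an ampersand (&) in the replacement stands for the text that matched the pattern; put a backslash in front of it to get a literal ampersand.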

Try this instead:

echo "----&#38----" | sed 's/&#38/\&/g'

Gives: ----&----
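Since the replacement strings in the loop come from a list, one option is to escape them before they reach sed. What follows is only a sketch: escape_replacement and replace_literal are made-up names, the search text is assumed to contain no regex metacharacters, and the sample input is invented.

```shell
#!/bin/sh
# Hypothetical helper: put a backslash in front of every '\', '&' and '/'
# so the string is taken literally in the replacement part of s/.../.../
escape_replacement() {
    printf '%s' "$1" | sed -e 's|[\\&/]|\\&|g'
}

# Hypothetical helper: $1 = text to find (assumed to be free of regex
# metacharacters), $2 = literal replacement, $3 = input line.
replace_literal() {
    printf '%s\n' "$3" | sed "s/$1/$(escape_replacement "$2")/g"
}

# Unescaped: the '&' in the replacement is expanded to the matched '<'.
printf '%s\n' 'a < b' | sed 's/</&#60/g'   # prints: a <#60 b

# Escaped: the '&' is kept as a literal ampersand.
replace_literal '<' '&#60' 'a < b'         # prints: a &#60 b
```

In the AppleScript loop, the equivalent would be to escape var1 the same way before building the sed command string.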

I’m not exactly sure what you’re saying the problem is, but there’s one change I’d make that would make the script run far, far faster.

In the case where you have 10 replacements, you copy the file, run one replacement, copy it back, make the next replacement, copy it back, make the next replacement, etc., etc. For 10 replacements you’re running 10 separate sed commands.

You’d be far better off running a single sed command. Change your loop to build a more complex sed command and run that once. For example:

set test_list to {"&", "amp;", "¢", "¢", "item1", "item2", "item3", "item4", "item5", "item6", "item7", "item8", "item9", "item10"}

set sed_commands to ""
repeat with i from 1 to (count test_list) by 2
	set sed_commands to sed_commands & "-e 's/" & item i of test_list & "/" & item (i + 1) of test_list & "/g' "
end repeat

do shell script "sed " & sed_commands & " /Users/kschalk/desktop/hold_file1 > /Users/kschalk/desktop/hold_file2"
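The same idea can be tried directly in the shell before wiring it into AppleScript. This is a rough sketch, not a drop-in: build_script is a made-up name, the pairs are assumptions in the spirit of the list above, and each replacement is passed with its ampersand already backslash-escaped so that sed treats it literally.

```shell
#!/bin/sh
# Hypothetical helper: turn "find replacement" pairs into one sed script,
# one s/find/replacement/g command per line.
build_script() {
    script=''
    while [ "$#" -ge 2 ]; do
        script="${script}s/$1/$2/g
"
        shift 2
    done
    printf '%s' "$script"
}

# The '&' pair comes first so entities written later are not encoded twice.
script=$(build_script '&' '\&#38' '<' '\&#60')

# One sed process applies every replacement in a single pass over the input.
printf '%s\n' 'a & b < c' | sed -e "$script"   # prints: a &#38 b &#60 c
```

With a thousand files and ten pairs, this is one sed run per file instead of ten, plus none of the intermediate copies.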