How can I modify text in a column (TextEdit)?

Hello everyone,

I have a text file of approximately 3500 rows. The first row is the column names. The following 3000 or so rows of the first column start with 2/26/2002 X:00 (where X ranges from 1 to 24) and the last 500 have only 2/27/2002 (and no X:00s).

I want to erase 2/26/2002s from the first column but retain the X:00s. From the rest, I want to erase 2/27/2002s. How can I do this? I just started learning AppleScript but I think my knowledge is still not enough to do this. I will appreciate any help.

Thank you in advance.

Ozgur

Just wanted to detail my problem and show you my temporary solution. Then I will have some follow-up questions:

I have a text file of 3324 rows and 5 columns. The first row is the column headings. Rows from 2 to 3193 look like
2/26/2002 1:00:00 AM,WEST2002,304,-4.98,29
2/26/2002 1:00:00 AM,WEST2002,304,-4.99,84
2/26/2002 3:00:00 AM,WEST2002,304,-4.99,419
2/26/2002 4:00:00 AM,HOUSTON2002,292,9,0

etc, and from 3194 to the end:

2/27/02 HOUSTON2002,292,22,0
2/27/02 HOUSTON2002,292,19,200

What I want to do is to get rid of the 2/26/2002s.

I’m a newbie, and this is what I could do with my very limited knowledge:

set theFile to (choose file)
set i to 2
tell application "TextEdit"
	open theFile
	repeat 3192 times
		repeat 10 times --2/26/2002 and a space is a total of 10 characters
			delete the first character of paragraph i of the front document
		end repeat
		set i to i + 1
	end repeat
	save the front document
	close the front document
end tell

The immediate problems, however, are:

  1. The script stops before the task is finished; i.e., it stops deleting at, say, row 2500.
  2. It is painfully slow (on a 2.4 GHz, 2GB MBP): it takes approximately 0.8 seconds per line (so the whole thing would take 42.5 minutes if the task were completed).
  3. To get the number 3192, I first have to import the file into Excel and find the last row which begins with 2/26/2002.

So, now my questions are:

  1. Why does the script stop? There are no jumps whatsoever in the text file.
  2. Is there a way to do this faster?
  3. How can I get the total number of rows that I want automatically? I need this because I will repeat the task for multiple files of this sort, where the number of rows with the property I want (or don’t want) is different in each file.

I tried things like “get the count of paragraphs” or tried to assign to a variable “count of paragraphs”/“total number of paragraphs” etc. but they didn’t work.

I would appreciate any ideas you can share.

Thank you…

Ozgur

Model: MBP, 2.4 GHz
AppleScript: 2.2.1
Browser: Firefox 3.0.10
Operating System: Mac OS X (10.5)

Hi, Ozgur. Welcome to these fora.

TextEdit’s scripting implementation isn’t brilliant for large-scale editing, as it makes heavy going of updating the document in which the text appears.

If your text file’s just that (a file of ASCII text), it would be easier and faster to edit it directly with vanilla AppleScript rather than involving TextEdit. This should also work for rich text provided there are no style changes within the text you want to cut. If there are, we need to think again:

set textFile to (choose file)

-- Read the text file, assuming the data are ASCII text.
set theText to (read textFile as string)

-- Store the current text item delimiter value.
set astid to AppleScript's text item delimiters
-- Replace all "2/26/2002 "s with empty texts (ie. nothing).
set AppleScript's text item delimiters to "2/26/2002 "
set textItems to theText's text items
set AppleScript's text item delimiters to ""
set newText to textItems as text
-- Replace all "2/27/2002"s ditto.
set AppleScript's text item delimiters to "2/27/2002"
set textItems to newText's text items
set AppleScript's text item delimiters to ""
set newText to textItems as text

-- Work out the path for another file to contain the edited text.
set oldPath to textFile as Unicode text
set AppleScript's text item delimiters to ":"
if (text item -1 of oldPath contains ".") then
	set AppleScript's text item delimiters to "."
	set newPath to text 1 thru text item -2 of oldPath & " (edited)." & text item -1 of oldPath
else
	set newPath to oldPath & " (edited)"
end if
-- Restore the old delimiter value.
set AppleScript's text item delimiters to astid

-- Create the new file and write the edited text to it.
set fRef to (open for access file newPath with write permission)
try
	set eof fRef to 0
	write newText as string to fRef
end try
close access fRef

A slightly abbreviated form of the editing process would be:

-- Store the current text item delimiter value.
set astid to AppleScript's text item delimiters
-- Replace all "2/26/2002 "s with "2/27/2002"s, then all "2/27/2002"s with empty texts.
set AppleScript's text item delimiters to "2/26/2002 "
set textItems to theText's text items
set AppleScript's text item delimiters to "2/27/2002"
-- Join with "2/27/2002", then split on it again, so both dates are now delimiters.
set textItems to text items of (textItems as text)
set AppleScript's text item delimiters to ""
set newText to textItems as text
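
The split-and-rejoin idiom used in the scripts above can also be wrapped in a handler so that each replacement is a single call. This is only a sketch: the handler name `replaceText` and its parameter names are invented for illustration and are not part of AppleScript or of the original scripts.

```applescript
-- Hypothetical helper: replace every occurrence of findText in sourceText.
on replaceText(findText, replacementText, sourceText)
	-- Store the current delimiter value so it can be restored afterwards.
	set astid to AppleScript's text item delimiters
	-- Split the text at every occurrence of findText...
	set AppleScript's text item delimiters to findText
	set textItems to sourceText's text items
	-- ...and rejoin the pieces with replacementText between them.
	set AppleScript's text item delimiters to replacementText
	set editedText to textItems as text
	set AppleScript's text item delimiters to astid
	return editedText
end replaceText

-- Two sample rows in the format discussed in this thread.
set sample to "2/26/2002 1:00:00 AM,HOUSTON2002,292,9,0" & return & "2/27/2002,WEST2002,304,8.1,88"
set sample to replaceText("2/26/2002 ", "", sample)
set sample to replaceText("2/27/2002", "", sample)
```

Because the handler restores the delimiters itself, the calling script doesn't need to keep track of them between replacements.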

Thanks a lot Nigel, this worked perfectly! (40 minutes vs 4 seconds!) I just added a few modifications so that every row has the same number of columns. (Namely, instead of simply deleting the "2/27/2002"s, I replaced them with "unknown"s.)

I have a couple of questions though; if you don’t mind:

I went through each line one by one, testing what it returns with “return” and understood everything but this block:

try
	set eof fRef to 0
	write newText as string to fRef
end try
close access fRef

Here, what does a try - end try block do?

What does set eof fRef to 0 do?

Also, in addition to what has been done, would it be possible to add a column of row numbers? That is, if this were my file:

HOURENDING,ZONE,PARTYID,BIDPRICE,BALMW
2/26/2002 1:00:00 AM,HOUSTON2002,292,9,0
2/26/2002 18:00:00 PM,HOUSTON2002,292,8.5,50
2/26/2002 18:00:00 PM,HOUSTON2002,292,8,100
2/27/2002,WEST2002,304,8.1,88
2/27/2002,WEST2002,304,1.11,89
2/27/2002,WEST2002,304,1.1,176

Would I be able to get:

0,HOURENDING,ZONE,PARTYID,BIDPRICE,BALMW
1,1:00,HOUSTON2002,292,9,0
2,18:00,HOUSTON2002,292,8.5,50
3,18:00,HOUSTON2002,292,8,100
4,unknown,WEST2002,304,8.1,88
5,unknown,WEST2002,304,1.11,89
6,unknown,WEST2002,304,1.1,176

(Don’t tell how to delete AMs and PMs and the redundant :00s - I think now I have enough knowledge to do it myself!)

Thank you again…

Ozgur

Oh, and, the jargon is new to me as well: What does “vanilla AppleScript” mean?

Not at all. :)

A ‘try’ block catches any errors that might occur inside it so that the script doesn’t immediately stop. It’s possible to include an ‘on error’ section with code to be executed if an error occurs:

try
	set fire to trousers
on error errMsg number errNum
	display dialog "Error number " & errNum & " occurred:" & return & errMsg buttons {"Blunder on", "Quit gracefully"} default button 2 with icon caution
	if (button returned of result is "Quit gracefully") then error number -128 -- Stop the script.
end try
beep 2

But in the example you quoted, the idea is simply to make sure that, in the unlikely event of anything going wrong while the access to the file is open, the script will keep going long enough to close it again.
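
One consequence of a bare ‘try’ block is that a failed write passes silently. A possible variation, sketched here under assumptions, closes the file and then passes the error on so that the failure is still reported. The file name and sample text are made up for the demonstration.

```applescript
-- Hypothetical path and text, for illustration only.
set newPath to (path to temporary items as text) & "try demo.txt"
set newText to "sample text"

set fRef to (open for access file newPath with write permission)
try
	set eof fRef to 0
	write newText to fRef
	close access fRef
on error errMsg number errNum
	-- Make sure the file is closed, then re-raise the original error.
	close access fRef
	error errMsg number errNum
end try
```

Whether to swallow the error or re-raise it is a design choice; re-raising is more useful while a script is still being developed.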

‘set eof fRef to 0’ is another precautionary formality. It sets the file’s length to zero to ensure it’s empty before the new data are written. If the file existed already and contained more data than was being written to it now, the old data wouldn’t be completely overwritten; so if you want the file to contain only the new data, it’s best to ditch the old first.

Yes. In this case, it would probably be easiest to add the row numbers after you’ve done the other edits:
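
The effect of leaving the old contents in place can be seen in a small experiment. This is a sketch using a throwaway file in the temporary items folder; the file name is made up for the demonstration.

```applescript
-- Hypothetical demo file in the temporary items folder.
set demoPath to (path to temporary items as text) & "eof demo.txt"

-- First write: empty the file, then write 8 characters.
set fRef to (open for access file demoPath with write permission)
try
	set eof fRef to 0
	write "abcdefgh" to fRef
end try
close access fRef

-- Second write WITHOUT resetting eof: only the first 2 characters are overwritten.
set fRef to (open for access file demoPath with write permission)
try
	write "XY" to fRef
end try
close access fRef

-- The file should now contain "XYcdefgh" rather than just "XY".
set leftoverText to (read file demoPath)
```

Adding ‘set eof fRef to 0’ before the second write would leave only "XY" in the file.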

-- newText complete except for row numbers; original text item delimiter value not yet restored.

set rows to newText's paragraphs
repeat with i from 0 to (count rows)
	set item i of rows to (i as text) & "," & item i of rows
end repeat
set AppleScript's text item delimiters to return
set newText to rows as text

-- Write newText to the file.

“Vanilla AppleScript” means “using only the core AppleScript language and (optionally) any scripting additions (OSAXen) that are part of the standard Mac OS installation”, i.e. not involving scriptable applications, third-party scripting additions, or shell scripts. I don’t think it’s an official term.

The File Read/Write commands used in the script are from the StandardAdditions OSAX; the rest are from the core language.

Thanks again Nigel… The only thing is, I got an error (*) which, I think, is because AppleScript didn’t like “item 0”. So the latest version is

... 
set newText to textItems as text

--Count the number of rows, add a new column with those numbers
set rows to newText's paragraphs
repeat with i from 1 to ((count rows) - 1)
	set item i of rows to ((i - 1) as text) & "," & item i of rows
end repeat
set AppleScript's text item delimiters to return
set newText to rows as text

-- Work out the path for another file to contain the edited text.
set oldPath to textFile as Unicode text
set AppleScript's text item delimiters to ":"
...

which works.

By the way, going back to your first reply, how does the script know that the file it needs to open and write newText is a .txt file? In the if statement, I understand that you assign the new file <old_file’s_name (edited).txt> in the first block and <old_file’s_name (edited)> in the second. Is that how?

Cheers.

(*) When I ran your code first with the original 3500-row file, I got a huge error message. It was something like “couldn’t get item 0 of …” and all those 3500 lines were written. I couldn’t scroll down, couldn’t press (OK-Cancel) because they were not visible, minimizing the window etc. didn’t work, hence I had to force quit AppleScript. I guess this is a bug Apple needs to work on.

I should have seen that! Many apologies. Your fix is good, except that you may not want the ‘- 1’ in the ‘repeat’ line:

repeat with i from 1 to (count rows) 
	set item i of rows to ((i - 1) as text) & "," & item i of rows 
end repeat

It depends on whether or not you have a blank row at the end that you don’t want to number.
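
If it isn’t known in advance whether a file ends with a blank row, another option is to test each row and number only the non-empty ones. A minimal sketch, using made-up sample text with a trailing return:

```applescript
-- Sample text ending with a return, so the last paragraph is empty.
set newText to "HOURENDING,ZONE" & return & "1:00,HOUSTON2002" & return

set rows to newText's paragraphs -- {"HOURENDING,ZONE", "1:00,HOUSTON2002", ""}
repeat with i from 1 to (count rows)
	-- Leave empty rows alone; prefix the others with a zero-based row number.
	if item i of rows is not "" then
		set item i of rows to ((i - 1) as text) & "," & item i of rows
	end if
end repeat

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to return
set newText to rows as text
set AppleScript's text item delimiters to astid
```

Note that this version would also skip any blank rows in the middle of the file while still counting them, which may or may not be what’s wanted.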


Yes. If the file name at the end of ‘oldPath’ contains a dot, a new path is constructed consisting of everything in the path before the last dot, " (edited)" and a dot, and whatever comes after the dot in the original. (The original name extension.) If the original file name doesn’t contain a dot, the new path is formed by simply appending " (edited)" to the end of the old path. The new path refers to another file in the same folder as the original. The ‘open for access’ command creates this file if it doesn’t already exist, or reuses it if it does.

Great, thanks Nigel, this is resolved as well.

I have two more questions at the very end, and I didn’t hesitate posting them here, because at least the first one, I think, is instructive for the new scripters.

I was trying to get rid of the last piece of redundant information in the file; namely the additional 00s, AMs and PMs of the time information. I used your idea of playing with the text item delimiters. There is a bug though, but I cannot find its cause.

Let me first paste here a subset of the text file in case anyone wants to play with it:

HOURENDING,ZONE,PARTYID,BIDPRICE,BALMW
2/26/2002 1:00:00 AM,HOUSTON2002,292,9,0
2/26/2002 3:00:00 AM,HOUSTON2002,292,21,0
2/26/2002 4:00:00 AM,NORTH2002,284,11.99,168
2/26/2002 5:00:00 AM,NORTH2002,306,21,20
2/26/2002 10:00:00 AM,WEST2002,304,-23.99,364
2/26/2002 11:00:00 AM,HOUSTON2002,292,19,0
2/26/2002 12:00:00 PM,HOUSTON2002,292,22.5,0
2/26/2002 1:00:00 PM,NORTH2002,292,12,0
2/26/2002 2:00:00 PM,WEST2002,304,-23.99,395
2/26/2002 3:00:00 PM,HOUSTON2002,292,22,0
2/26/2002 4:00:00 PM,WEST2002,304,-23.99,510
2/26/2002 5:00:00 PM,HOUSTON2002,292,20,0
2/26/2002 6:00:00 PM,WEST2002,304,-23.99,230
2/26/2002 7:00:00 PM,HOUSTON2002,292,22,0
2/26/2002 8:00:00 PM,WEST2002,304,-23.99,361
2/26/2002 9:00:00 PM,HOUSTON2002,292,20,0
2/26/2002 10:00:00 PM,WEST2002,304,-23.99,328
2/26/2002 11:00:00 PM,HOUSTON2002,292,22,0
2/27/2002,WEST2002,304,8.1,88
2/27/2002,WEST2002,304,1.11,89
2/27/2002,WEST2002,304,1.1,176

What we had done so far was to get rid of the dates, for the rows starting with 2/27/2002 add a column, rows of which are filled with the string “unknown”, and then count and write the number of each row. What I am now trying to do is to convert 1:00:00 AM to 1:00, 11:00:00 PM to 23:00 and so on.

I first tried to use two separate loops for AMs and PMs but they didn’t work (I did not get an error message, but the file wasn’t modified at all either):

...
repeat with i from 1 to 11 --Up to 11, because there is no 12:00:00 AM in the data
	set AppleScript's text item delimiters to "i:00:00 AM"
	set textItems to newText's text items
	set AppleScript's text item delimiters to "i:00"
	set newText to textItems as text
end repeat

repeat with i from 1 to 11
	set AppleScript's text item delimiters to "i:00:00 PM"
	set textItems to newText's text items
	set AppleScript's text item delimiters to "(i+12):00"
	set newText to textItems as text
end repeat
...

(12:00:00 PM to be dealt with separately.) Obviously, AppleScript doesn’t substitute the value of i when i is inside quotation marks. Then I used brute force; that is, I wrote four lines of code for each hour separately:

...
set AppleScript's text item delimiters to "1:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "1:00"
set newText to textItems as text

--the other hours, in the natural order, go here

set AppleScript's text item delimiters to "11:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "23:00"
set newText to textItems as text

And the bug is this: It replaces every hour up to 11:00:00 PM correctly, but replaces 11:00:00 PM with 113:00 !!! However, if I place the 4 lines for 11:00:00 PM anywhere before the 4 lines for 1:00:00 AM, the code works just fine.

So my questions are:

  1. Is there a way to avoid brute force and handle this task in a loop?

  2. When the lines for 11:00:00 PM come after those for 1:00:00 AM, why are 11:00:00 PMs replaced with 113:00 and not with 23:00?

(Here is the not-so-efficient code, which works:)

set textFile to (choose file)

-- Read the text file, assuming the data are ASCII text.
set theText to (read textFile as string)

-- Store the current text item delimiter value.
set astid to AppleScript's text item delimiters
-- Replace all "2/26/2002 "s with empty texts (ie. nothing).
set AppleScript's text item delimiters to "2/26/2002 "
set textItems to theText's text items
set AppleScript's text item delimiters to ""
set newText to textItems as text
-- Replace all "2/27/2002"s with "unknown".
set AppleScript's text item delimiters to "2/27/2002"
set textItems to newText's text items
set AppleScript's text item delimiters to "unknown" -- we need this to make the column numbers even, since there is no "hour" information for the rows starting with 2/27/2002
set newText to textItems as text

--Update the time information:

set AppleScript's text item delimiters to "11:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "23:00"
set newText to textItems as text

set AppleScript's text item delimiters to "1:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "1:00"
set newText to textItems as text

set AppleScript's text item delimiters to "2:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "2:00"
set newText to textItems as text

set AppleScript's text item delimiters to "3:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "3:00"
set newText to textItems as text

set AppleScript's text item delimiters to "4:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "4:00"
set newText to textItems as text

set AppleScript's text item delimiters to "5:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "5:00"
set newText to textItems as text

set AppleScript's text item delimiters to "6:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "6:00"
set newText to textItems as text

set AppleScript's text item delimiters to "7:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "7:00"
set newText to textItems as text

set AppleScript's text item delimiters to "8:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "8:00"
set newText to textItems as text

set AppleScript's text item delimiters to "9:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "9:00"
set newText to textItems as text

set AppleScript's text item delimiters to "10:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "10:00"
set newText to textItems as text

set AppleScript's text item delimiters to "11:00:00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to "11:00"
set newText to textItems as text

set AppleScript's text item delimiters to "12:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "12:00"
set newText to textItems as text

set AppleScript's text item delimiters to "1:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "13:00"
set newText to textItems as text

set AppleScript's text item delimiters to "2:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "14:00"
set newText to textItems as text

set AppleScript's text item delimiters to "3:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "15:00"
set newText to textItems as text

set AppleScript's text item delimiters to "4:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "16:00"
set newText to textItems as text

set AppleScript's text item delimiters to "5:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "17:00"
set newText to textItems as text

set AppleScript's text item delimiters to "6:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "18:00"
set newText to textItems as text

set AppleScript's text item delimiters to "7:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "19:00"
set newText to textItems as text

set AppleScript's text item delimiters to "8:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "20:00"
set newText to textItems as text

set AppleScript's text item delimiters to "9:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "21:00"
set newText to textItems as text

set AppleScript's text item delimiters to "10:00:00 PM"
set textItems to newText's text items
set AppleScript's text item delimiters to "22:00"
set newText to textItems as text


--Count the number of rows, add a new column with those numbers
set rows to newText's paragraphs
repeat with i from 1 to ((count rows) - 1)
	set item i of rows to ((i - 1) as text) & "," & item i of rows
end repeat
set AppleScript's text item delimiters to return
set newText to rows as text

-- Work out the path for another file to contain the edited text.
set oldPath to textFile as Unicode text
set AppleScript's text item delimiters to ":"
if (text item -1 of oldPath contains ".") then
	set AppleScript's text item delimiters to "."
	set newPath to text 1 thru text item -2 of oldPath & " (edited)." & text item -1 of oldPath
else
	set newPath to oldPath & " (edited)"
end if
-- Restore the old delimiter value.
set AppleScript's text item delimiters to astid

-- Create the new file and write the edited text to it.
set fRef to (open for access file newPath with write permission)
try
	set eof fRef to 0
	write newText as string to fRef
end try
close access fRef

It’s late here and that’s a new problem. I’ll think about it tomorrow.

The problem actually occurs because “11:00:00 PM” is handled after “1:00:00 PM”, which is a substring of it. All instances of “1:00:00 PM” in the text are changed to “13:00”, so “11:00:00 PM” becomes “113:00”. When the script gets round to handling “11:00:00 PM”, it no longer exists in the text because all instances of it have already been changed to something else.
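
The effect can be reproduced on a single value. In this small sketch, splitting "11:00:00 PM" on "1:00:00 PM" leaves a stray leading "1", which is then joined to "13:00":

```applescript
set astid to AppleScript's text item delimiters

set sample to "11:00:00 PM"
-- "1:00:00 PM" matches the last ten characters of the sample.
set AppleScript's text item delimiters to "1:00:00 PM"
set textItems to sample's text items -- {"1", ""}
set AppleScript's text item delimiters to "13:00"
set wrongOrder to textItems as text -- "113:00"

set AppleScript's text item delimiters to astid
```

Handling the longer strings first avoids this, because by the time "1:00:00 PM" is searched for, every "11:00:00 PM" has already become "23:00".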

Either of these should do it. Both assume there’s no “12:00:00 AM” in the text:

repeat with h in {{":00 AM", ""}, {"12:00:00 PM", "12:00"}, {"11:00:00 PM", "23:00"}, {"10:00:00 PM", "22:00"}, {"9:00:00 PM", "21:00"}, {"8:00:00 PM", "20:00"}, {"7:00:00 PM", "19:00"}, {"6:00:00 PM", "18:00"}, {"5:00:00 PM", "17:00"}, {"4:00:00 PM", "16:00"}, {"3:00:00 PM", "15:00"}, {"2:00:00 PM", "14:00"}, {"1:00:00 PM", "13:00"}}
	set AppleScript's text item delimiters to beginning of h
	set textItems to newText's text items
	set AppleScript's text item delimiters to end of h
	set newText to textItems as text
end repeat

Or:

set AppleScript's text item delimiters to ":00 AM"
set textItems to newText's text items
set AppleScript's text item delimiters to ""
set newText to textItems as text
repeat with i from 12 to 1 by -1
	set AppleScript's text item delimiters to (i as text) & ":00:00 PM"
	set textItems to newText's text items
	set AppleScript's text item delimiters to ((i mod 12 + 12) as text) & ":00"
	set newText to textItems as text
end repeat

Nigel,

Many thanks…