Newbie: Cleaning up some text

I have this horrid chunk of text

“30701034”,“DUMAS,ROBERT A”,“10610 S 48TH ST”,“PHOENIX AZ”,85044-1743,0730, ,20030825,89900,1177114,“WELLS FARGO HOME”,84405,“HARDT,RICHARD B” ,

and I need to extract just the address (10610 S 48th St) and the price (89900), and I need to do it for about 50+ similarly formatted lines of text. Each line of text is separated by a carriage return and each line is filled with different ‘junk.’ Fortunately, the address is always preceded by the “,” string. The price, however, is a little stickier, as the characters surrounding it are always the same but are not unique to that particular location in the line of text. I assumed there was a way to tell AppleScript to look for the “,” string and delete everything before it AND preserve everything after it until the next " appeared, but have failed in my attempts. As for isolating the price, I simply have no idea how to do that.

If someone would be willing to point me in the right direction (since I love problem solving and would hate to make someone else do all the work for me) it would be greatly appreciated.

fs

p.s. the space in about the middle of the line (in that long string of commas) is in exactly the same place for every single line of text.

This is a relatively easy problem to overcome using text item delimiters. The only problem is when some of the items (such as the name) have commas within the value. Still, this should work:
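A minimal sketch of the text item delimiters approach, not necessarily the script Jon posted. It assumes the quotes in the data are plain straight quotes and that the name field always contains exactly one comma, so that a naive split on commas puts the address in item 4 and the price in item 10 of the sample record; real data may need different positions.

```applescript
-- Sample record from the question, with each double quote escaped for AppleScript.
set this_line to "\"30701034\",\"DUMAS,ROBERT A\",\"10610 S 48TH ST\",\"PHOENIX AZ\",85044-1743,0730, ,20030825,89900,1177114"

-- Split the record on commas, then restore the previous delimiters.
set old_atid to AppleScript's text item delimiters
set AppleScript's text item delimiters to {","}
set the_values to text items of this_line
set AppleScript's text item delimiters to old_atid

-- Assumption: the comma inside the name shifts every later field by one,
-- so the address is item 4 and the price is item 10 of the split.
set the_address to text 2 thru -2 of (item 4 of the_values) -- strip the surrounding quotes
set the_price to (item 10 of the_values) as number

{the_address, the_price}
```

The backslashes are only needed because the sample text is typed into the script as a literal string; text read from a file needs no escaping.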

Jon



Do I need all of those backslashes (\) in there to make the script work? If so, that could be a serious pain (since we’re talking about a lot of text here) …though now that I look more closely, I see they’re before a set of double quotes, which can be done in the script.

And now I’m off to see if I can get it to save the file when it’s done.

Thanks again.

fs

No, that line was just to show what the variable all_text would be if you had read the text from a file as in this line:

Jon



First off, thank you, Jon, so very much for all your help. I went to your profile and your homepage and you’re obviously very skilled at this. That you would take your time to help out so many people with such seemingly simple mistakes is a testament to your good nature.

And now down to business. I have tweaked your script a bit as well as combined scripts from Apple’s website to automate my task more, yet I’m now getting the error (as per my ‘Log Error’ subroutine) “Error: -1700. Can’t make file (alias “Adv3:Users:operator:Desktop:test.txt”) into a file.”

And here’s the Frankenstein Script.


property type_list : {}
property extension_list : {"txt"}

on open these_items
	try
		repeat with i from 1 to the count of these_items
			set this_item to item i of these_items
			set the item_info to info for this_item
			if (folder of the item_info is false) and ¬
				(alias of the item_info is false) and ¬
				((the file type of the item_info is in the type_list) or ¬
					the name extension of the item_info is in the extension_list) then
				process_item(this_item)
				set {all_addresses, all_prices} to this_story
				set this_file to (((path to desktop folder) as text) & "Text Processed")
				set this_data to ((current date) as string) & space & "STATUS OK" & return
				set this_file to (((path to desktop folder) as text) & "MY LOG FILE")
				my write_to_file(this_story, this_file, false)
			end if
		end repeat
	on error error_message number error_number
		set this_error to "Error: " & error_number & ". " & ¬
			error_message & return
		set the log_file to ((path to desktop) as text) & "Script Error Log"
		my write_to_file(this_error, log_file, true)
	end try
end open

on process_item(this_item)
	set all_text to read file this_item
	set all_addresses to {}
	set all_prices to {}
	repeat with this_paragraph in (paragraphs of all_text)
		set old_atid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to {","}
		set the_values to text items of (text of this_paragraph)
		set AppleScript's text item delimiters to old_atid
		set end of all_addresses to (text 2 thru -2 of (item 6 of the_values)) -- strip the quotes 
		set end of all_prices to item 24 of the_values as number
	end repeat
end process_item

on write_to_file(this_data, target_file, append_data)
	try
		set the target_file to the target_file as text
		set the open_target_file to ¬
			open for access file target_file with write permission
		if append_data is false then ¬
			set eof of the open_target_file to 0
		write this_data to the open_target_file starting at eof
		close access the open_target_file
		return true
	on error
		try
			close access file target_file
		end try
		return false
	end try
end write_to_file

Thank you, again, for your help.

fs

I already discovered that the lines

            set this_data to ((current date) as string) & space & "STATUS OK" & return 
            set this_file to (((path to desktop folder) as text) & "MY LOG FILE") 

are superfluous. …and now I’ll try and hide my embarrassment at having included them.

fs

I’m guessing that ‘set all_text to read file this_item’ (process_item handler) is causing the problem. What happens if you change that line to:

set all_text to read this_item

– Rob
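A short sketch of why that change can matter, assuming the script runs as a droplet: items handed to an `on open` handler arrive as aliases, and `read` accepts an alias directly, whereas putting `file` in front of an alias asks AppleScript to turn the alias into a file specifier, which is consistent with the -1700 error quoted above. The dialog is only there for illustration.

```applescript
on open these_items
	repeat with i from 1 to (count of these_items)
		set this_item to item i of these_items
		-- this_item is already an alias, so read it as is.
		set all_text to read this_item
		-- If a file specifier is wanted, build it from the path text instead:
		-- set all_text to read file (this_item as text)
		display dialog "Read " & (count of paragraphs of all_text) & " line(s)."
	end repeat
end open
```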

I’ve got it mostly working, but now it’s giving me some trumped up “Variable all_addresses is not defined.”

Thoughts?

fs

There is probably an issue with the scope of the ‘all_addresses’ variable. You might need to either make it global or pass it from one handler to the other.

– Rob
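A minimal sketch of the second option, passing the values back from the handler. Variables set inside a handler are local to it, so the caller only sees what the handler returns. The parsing inside this `process_item` is a placeholder, not the thread's actual logic.

```applescript
on process_item(all_text)
	-- These variables are local to the handler; the caller cannot see them.
	set all_addresses to {}
	set all_prices to {}
	repeat with this_paragraph in (paragraphs of all_text)
		-- Placeholder parsing: a real script would split each record here.
		set end of all_addresses to (contents of this_paragraph)
		set end of all_prices to 0
	end repeat
	-- Hand both lists back to the caller explicitly.
	return {all_addresses, all_prices}
end process_item

-- The caller captures the returned lists in its own variables.
set {all_addresses, all_prices} to process_item("first record" & return & "second record")
```

Declaring `global all_addresses` in both places would also work, but returning the lists keeps each handler self-contained.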

I suspect from gleaning a little more about what you want to do that you don’t actually want a list of values but a string of text including those values. This code will probably be more along the lines of what you are looking for. You can save it as an application and drag the source file on the application icon to kick it off:
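A rough sketch of such a droplet, not necessarily the script Jon posted. It assumes the same field positions as the sample record (address in item 4, price in item 10 after a naive comma split), writes one tab-separated line per record, and uses a hypothetical output file named "Extracted.txt" on the desktop. Records that do not fit the assumed layout are skipped by the `try` block.

```applescript
on open these_items
	set old_atid to AppleScript's text item delimiters
	set the_output to ""
	repeat with i from 1 to (count of these_items)
		-- Dropped items arrive as aliases, which read accepts directly.
		set all_text to read (item i of these_items)
		repeat with this_paragraph in (paragraphs of all_text)
			try
				set AppleScript's text item delimiters to {","}
				set the_values to text items of (contents of this_paragraph)
				set AppleScript's text item delimiters to old_atid
				-- Assumed positions: address is item 4, price is item 10.
				set the_address to text 2 thru -2 of (item 4 of the_values)
				set the_price to item 10 of the_values
				set the_output to the_output & the_address & tab & the_price & return
			on error
				-- A record that does not fit the assumed layout is skipped silently.
				set AppleScript's text item delimiters to old_atid
			end try
		end repeat
	end repeat
	
	-- Hypothetical output location: "Extracted.txt" on the desktop.
	set out_path to ((path to desktop folder) as text) & "Extracted.txt"
	set file_ref to open for access file out_path with write permission
	try
		set eof of file_ref to 0 -- overwrite any earlier contents
		write the_output to file_ref
		close access file_ref
	on error error_message number error_number
		close access file_ref
		error error_message number error_number
	end try
end open
```

Because the `try` block swallows errors, a malformed record disappears from the output without any warning; logging the skipped line in the `on error` branch would make such cases visible.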

Jon



…well, damn your wonderful brilliance! I was in the midst of posting a reply of my current progress and successes (and failures) when I noticed Jon had beaten me to the punch.

I tried your script and it did exactly what I wanted it to do, but I suspect it still has one unsolved problem (that is, until I try to move this information into Excel. Then I expect a whole slew of new problems. Hopefully, though, I can beat that leg on my own.)

The remaining unsolved problem comes from the fact that I forgot to include Zip codes in the script. Actually, the problem really comes from the fact that that space at about the 80th character interrupts the Zip from time to time, and when it does, the script stops. I’ve tried to eliminate the space and I’m sure there’s a way to just say “Hey! Delete Character 80, damn you!” but, alas, I’ve met with much failure in my attempt to accomplish that.

And now I’ve begun to feel guilty at having other people do the hard part for me, so if you could just shove me in the right direction to solving that last problem, I’d greatly appreciate it (even more!)

Again, thank you for all of your wonderful help.

fs

…never mind.

I tweaked your code to get the zip and all works beautifully.

Thank you again!

fs

Hell. That space at about the 80th character is going to be a problem, as the ‘try’ command (as I thought it might) simply passes by anything where that space interferes, and while it’s few and far between, about four records were left out because the space occurred right in the middle of a zip code.

If you wouldn’t mind nudging me in the right direction to solving this, I’d be very appreciative (more so).

fs

Since this script is looking at the text divided by commas, I don’t see why the space is causing you problems. A more likely explanation is that some other text in the record has a comma or two (such as the name field being “Smith, John, Jr.” or something like that). That’s why the text strings are quoted in the first place: to allow commas to act as separators and also to appear in the content. It will be more involved (but doable) to test whether the first character of a value is a quote mark; if so, check whether the last character is one too; if not, combine this value with the next, and so on.

Jon
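A minimal sketch of the quote check described above, under a few assumptions: the data uses plain straight quotes, a closing quote is the very last character of its piece (a trailing space after the quote would need trimming first), and the handler name `split_quoted_record` is made up for illustration. It splits on commas as before, then glues pieces back together whenever a value opens with a quote but has not yet closed.

```applescript
on split_quoted_record(this_line)
	set old_atid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {","}
	set raw_items to text items of this_line
	set AppleScript's text item delimiters to old_atid
	
	set the_values to {}
	set pending to missing value -- holds a quoted value that is still open
	repeat with raw_item in raw_items
		set this_item to contents of raw_item
		if pending is not missing value then
			-- Still inside a quoted value: put back the comma the split removed.
			set pending to pending & "," & this_item
			if this_item ends with "\"" then
				set end of the_values to pending
				set pending to missing value
			end if
		else if (this_item starts with "\"") and not (((length of this_item) > 1) and (this_item ends with "\"")) then
			-- Opening quote with no closing quote yet: start collecting pieces.
			set pending to this_item
		else
			set end of the_values to this_item
		end if
	end repeat
	-- An unterminated quoted value is kept rather than dropped.
	if pending is not missing value then set end of the_values to pending
	return the_values
end split_quoted_record

-- The name keeps its comma, so the address lands in item 3 regardless of
-- how many commas the name contains.
set the_values to split_quoted_record("\"30701034\",\"DUMAS,ROBERT A\",\"10610 S 48TH ST\",89900")
item 3 of the_values
```

With fields rejoined this way, item positions no longer shift when a name such as “Smith, John, Jr.” contains extra commas.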