Delimiter Issue

I am trying to extract data from a tab-delimited file.

The first entry in each line is in the format:

123-1234567-1234567

When using the code below the data extracted is “123” and NOT “123-1234567-1234567” as I would have expected. Is anyone able to tell me why this is? I have searched online extensively and can’t figure it out on my own.

Thanks

tell application "Finder"
set fileRefr to (open for access filePath)
set theText to (read fileRefr)
set textList to paragraphs of theText
close access fileRefr
set theCNT to (count of textList) - 1

set uniqueList to {}
set AppleScript's text item delimiters to " "
repeat with i from 2 to theCNT
	set thisParagraph to first word of item i of textList
	set end of uniqueList to thisParagraph
end repeat
set AppleScript's text item delimiters to ""

end tell

Hi,

A hyphen is a word delimiter. The code

set thisParagraph to first word of item i of textList

returns 123 for your example text.

The most reliable way is to set text item delimiters to a hyphen:


set theText to (read fileRefr)
set textList to paragraphs of theText
set theCNT to (count of textList) - 1

set uniqueList to {}
set TID to text item delimiters
set text item delimiters to "-"
repeat with i from 2 to theCNT
	set end of uniqueList to text item 1 of item i of textList
end repeat
set text item delimiters to TID


The Finder is not needed at all.

Hi Stefan.

Thanks for the reply.

I did wonder if special characters may be the issue.

I had circumvented the problem by asking it to grab the first 3 words. The real issue I am struggling with is that I desperately need to extract (what I would consider to be) the 10th word on the line (assuming that everything between tabs is considered a word). But the preceding words are littered with symbols such as “+” “-” “:” etc. and each line is different. Is there any way to ask AppleScript to ONLY read tabs as delimiters? I am totally stumped here.

This is an example of a line of my data and it is the number “1” I am trying to locate. There is more data after the “1” which I have left out.

123-1234567-1234567 12345677654321 2012-08-20T10:18:13+00:00 2012-08-20T10:18:13+00:00 r2r9zc3t7f9s3kv@email.co.uk John Jenkins 0123 456 7898 6E-ME9H-V7T8 Random Product (Info) 1

Then set the text item delimiters to tab and get the 10th text item.
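As a rough illustration of that suggestion, here is a minimal sketch. The sample lines, the header row, and the names fieldIndex and fieldList are made up for the example; in the real script textList would come from the file and fieldIndex would be 10.

```applescript
-- Hypothetical two-line sample; in the real script textList comes from the file.
set headerLine to "order-id" & tab & "qty"
set dataLine to "123-1234567-1234567" & tab & "1"
set textList to {headerLine, dataLine}
set fieldIndex to 2 -- assumption for this sample; 10 for the data in this thread

set fieldList to {}
set TID to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab
repeat with i from 2 to (count of textList) -- start at 2 to skip the header line
	set thisLine to item i of textList
	-- skip blank or short lines, e.g. an empty last paragraph
	if (count of text items of thisLine) ≥ fieldIndex then
		set end of fieldList to text item fieldIndex of thisLine
	end if
end repeat
set AppleScript's text item delimiters to TID -- always restore the old delimiters

fieldList -- {"1"} for the sample above
```

With tab as the only delimiter, hyphens, colons and plus signs inside a field are left alone, so the first field comes back whole as 123-1234567-1234567.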

In addition to Stefan’s (good) post:

Word delimiters in AppleScript depend on the language settings in System Preferences. To be sure that your script works in every language and on machines with custom word delimiters, it’s wise to replace them with spaces or use a regular expression. If it’s only for your own machine you don’t have to worry about this, of course.

In your case the shell command awk also works perfectly.


every paragraph of (do shell script "cat " & quoted form of POSIX path of filePath & " | awk -F'" & tab & "' '{print $1}'")

Or if you want to skip the first line:


set uniqueList to every paragraph of (do shell script "cat " & quoted form of POSIX path of filePath & " | awk -F'" & tab & "' 'BEGIN{getline}{print $1}'")

EDIT: to print the 10th field: change $1 to $10

I tried your code Stefan but it was still just pulling the first 3 numbers? Perhaps I have screwed something up in my previous code… I have a lot to learn and not much time!

Bazzie, your shellscript worked like a charm!

Thank you both for helping me out!!

Bazzie.

I’ve run into a little problem at the end of my script. I modified your code (below) as I need to compile uniqueList from more than one document.

Once it has run, uniqueList has “{ }” around each batch of data. Is there an easy way to remove the “{ }”s?

set dataList to every paragraph of (do shell script "cat " & quoted form of POSIX path of filePath & " | awk -F'" & tab & "' 'BEGIN{getline}{print $1}'")
set end of uniqueList to dataList

This is what uniqueList ends up looking like…

{{“1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “2”, “1”, “1”, “1”, “1”, “1”}, {“1”, “1”, “1”, “1”, “1”, “1”, “1”}, {“1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”}, {“1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”}, {“1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”}, {“1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”, “1”}}

It seems that you are adding each new list to the end of uniqueList as a single entry instead of adding its items one by one. If you want every item added individually, you should use

set uniqueList to uniqueList & dataList -- instead of: set end of uniqueList to dataList
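To make the difference concrete, here is a small sketch with two made-up batches (batchA and batchB are hypothetical stand-ins for the dataList returned for each document):

```applescript
-- Hypothetical batches standing in for the dataList of two documents.
set batchA to {"1", "1", "2"}
set batchB to {"1", "1"}

-- "set end of" appends each batch as ONE item, so the result is a list of lists.
set nestedList to {}
set end of nestedList to batchA
set end of nestedList to batchB
-- nestedList is now {{"1", "1", "2"}, {"1", "1"}}

-- The & operator joins the items of both lists into one flat list.
set flatList to {}
set flatList to flatList & batchA
set flatList to flatList & batchB
-- flatList is now {"1", "1", "2", "1", "1"}

{count of nestedList, count of flatList} -- {2, 5}
```

Concatenation builds a new list each time, which is fine for a handful of documents like this.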

Legend! Thanks again