I’ve had this problem forever with tab-delimited text files saved from Excel. When I bring the text into AS, some of the cell data has quotation marks around it. Other cell data doesn’t. I can’t find the pattern. Does anyone know what caused some fields to come with quotation marks and others without? (All text is without quotes in Excel. All cells are formatted as General.)
Normally, I just clean the text up but in this situation, I have lots of data with " marks to indicate inches, so I can’t just delete all the "s. Thanks.
Note: I do not have Excel on my system right now. This post is based on my previous experience with Excel generated CSV files.
If Excel generates tab-delimited data files similarly to how it generates CSV data files, it will put extra quotes in whenever one or more of the following occur in a value: a tab, a “newline”, leading or trailing white space (usually a space, tab, or “newline”), a double-quote mark, and possibly a comma. See the Specification section of the Wikipedia entry on CSV for some information about how values in CSV are handled.
If you need absolute generality, you should find or write a tab-delimited data parser to properly parse the file (when you “bring the text into AS”). If you are certain that none of the values have embedded tabs or “newlines” in them, you might be able to get away with a few string substitutions:
[code](* pseudocode *)
leading quote → “” (empty string, i.e. delete the quote)
delimiter & quote → delimiter
newline & quote → newline
quote & quote → quote[/code]
Here is some code that demonstrates the idea:
-- EXAMPLE CODE --
set delimiter to tab -- or "," if wanting to try to handle CSV
set lineBreak to ASCII character 10 -- or return, or return & ASCII charater 10, depending on how the data was written
(* Setup some test data *)
set testData to "Name" & delimiter & "Color" & delimiter & "Size" & lineBreak & "Foo" & delimiter & "Red" & delimiter & "\"1\"\"\"" & lineBreak & "\"2\"\" Baz\"" & delimiter & "Grey" & delimiter & "\"2\"\"\"
Bar" & delimiter & "Blue" & delimiter & "\"1.5\"\"\""
set originalData to testData -- save a copy for later comparison
(* Remove the quotes *)
set testData to dequoteDelimitedText of testData at lineBreak between delimiter
(* View results e.g. in Script Editor's "Result" tab *)
set originalValues to breakupValues of originalData at lineBreak between delimiter
set testValues to breakupValues of testData at lineBreak between delimiter
{originalData:originalData, dequotedData:testData, valuesFromOriginalData:originalValues, valuesFromDequotedData:testValues}
-- THE HANDLERS --
to dequoteDelimitedText of someText at lineDelim between valueDelim
(*
Remove extra double-quotes from delimited data (e.g. Excel CSV or tab delimited exports).
If the data's values have embedded delimiters, this probably will not do what one wants. In such a case, one really should be properly parsing the data.
*)
-- Get rid of opening double-quotes.
if someText starts with quote then set someText to text 2 through -1 of someText
repeat with valueStarter in {valueDelim, lineDelim}
set valueStarter to contents of valueStarter
set someText to replaceText of someText from valueStarter & quote to valueStarter
end repeat
-- Get rid of closing double-quotes.
if someText ends with quote then set someText to text 1 through -2 of someText
repeat with valueEnder in {valueDelim, lineDelim}
set valueEnder to contents of valueEnder
set someText to replaceText of someText from quote & valueEnder to valueEnder
end repeat
-- Get rid of doubled up double-quotes.
set someText to replaceText of someText from quote & quote to quote
return someText
end dequoteDelimitedText
to replaceText of t from s to r
(*
From: http://bbs.applescript.net/viewtopic.php?pid=41257#p41257
Credits: kai, Nigel Garvey
*)
local d
set d to text item delimiters
try
set text item delimiters to s
set t to t's text items
-- The text items will be of the same class (string/unicode text) as the original string.
set text item delimiters to r
-- Using the first text item (beginning) as the first part of the concatentation means we preserve the class of the original string in the edited string.
tell t to set t to beginning & ({""} & rest)
set text item delimiters to d
on error m number n from o partial result r to t
set text item delimiters to d
error m number n from o partial result r to t
end try
t
end replaceText
to breakupValues of someText at lineDelim between valueDelim
set theValues to {}
breakupText of someText between lineDelim
repeat with s in result
set the end of theValues to breakupText of (contents of s) between valueDelim
end repeat
theValues
end breakupValues
to breakupText of someText between theDelimiter
local otid
set otid to text item delimiters
try
set text item delimiters to {theDelimiter}
set someText to text items of someText
set text item delimiters to otid
on error m number n from o partial result r to t
set text item delimiters to otid
error m number n from o partial result r to t
end try
return someText
end breakupText
Model: iBook G4 933
AppleScript: 1.10.7
Browser: Safari 419.3
Operating System: Mac OS X (10.4)