Commas in Tab-Delimited Excel Files

I’ve had this problem forever with tab-delimited text files saved from Excel. When I bring the text into AS, some of the cell data has quotation marks around it. Other cell data doesn’t. I can’t find the pattern. Does anyone know what caused some fields to come with quotation marks and others without? (All text is without quotes in Excel. All cells are formatted as General.)

Normally, I just clean the text up but in this situation, I have lots of data with " marks to indicate inches, so I can’t just delete all the "s. Thanks.

Note: I do not have Excel on my system right now. This post is based on my previous experience with Excel generated CSV files.

If Excel generates tab-delimited data files similarly to how it generates CSV data files, it will put extra quotes in whenever one or more of the following occur in a value: a tab, a “newline”, leading or trailing white space (usually a space, tab, or “newline”), a double-quote mark, and possibly a comma. See the Specification section of the Wikipedia entry on CSV for some information about how values in CSV are handled.

If you need absolute generality, you should find or write a tab-delimited data parser to properly parse the file (when you “bring the text into AS”). If you are certain that none of the values have embedded tabs or “newlines” in them, you might be able to get away with a few string substitutions:

[code](* pseudocode *)
leading quote → “” (empty string, i.e. delete the quote)
delimiter & quote → delimiter
newline & quote → newline

trailing quote → “”
quote & delimiter → delimiter
quote & newline → newline

quote & quote → quote[/code]
Here is some code that demonstrates the idea:

-- EXAMPLE CODE --

set delimiter to tab -- or "," if wanting to try to handle CSV
set lineBreak to ASCII character 10 -- or return, or return & ASCII charater 10, depending on how the data was written

(* Setup some test data *)
set testData to "Name" & delimiter & "Color" & delimiter & "Size" & lineBreak & "Foo" & delimiter & "Red" & delimiter & "\"1\"\"\"" & lineBreak & "\"2\"\" Baz\"" & delimiter & "Grey" & delimiter & "\"2\"\"\"
Bar" & delimiter & "Blue" & delimiter & "\"1.5\"\"\""
set originalData to testData -- save a copy for later comparison

(* Remove the quotes *)
set testData to dequoteDelimitedText of testData at lineBreak between delimiter

(* View results e.g. in Script Editor's "Result" tab *)
set originalValues to breakupValues of originalData at lineBreak between delimiter
set testValues to breakupValues of testData at lineBreak between delimiter
{originalData:originalData, dequotedData:testData, valuesFromOriginalData:originalValues, valuesFromDequotedData:testValues}

-- THE HANDLERS --

to dequoteDelimitedText of someText at lineDelim between valueDelim
    (*
    Remove extra double-quotes from delimited data (e.g. Excel CSV or tab delimited exports).
    If the data's values have embedded delimiters, this probably will not do what one wants. In such a case, one really should be properly parsing the data.
    *)
    
    -- Get rid of opening double-quotes.
    if someText starts with quote then set someText to text 2 through -1 of someText
    repeat with valueStarter in {valueDelim, lineDelim}
        set valueStarter to contents of valueStarter
        set someText to replaceText of someText from valueStarter & quote to valueStarter
    end repeat
    
    -- Get rid of closing double-quotes.
    if someText ends with quote then set someText to text 1 through -2 of someText
    repeat with valueEnder in {valueDelim, lineDelim}
        set valueEnder to contents of valueEnder
        set someText to replaceText of someText from quote & valueEnder to valueEnder
    end repeat
    
    -- Get rid of doubled up double-quotes.
    set someText to replaceText of someText from quote & quote to quote
    
    return someText
end dequoteDelimitedText

to replaceText of t from s to r
    (*
    From: http://bbs.applescript.net/viewtopic.php?pid=41257#p41257
    Credits: kai, Nigel Garvey
    *)
    local d
    set d to text item delimiters
    try
        set text item delimiters to s
        set t to t's text items
        -- The text items will be of the same class (string/unicode text) as the original string.
        set text item delimiters to r
        -- Using the first text item (beginning) as the first part of the concatentation means we preserve the class of the original string in the edited string.
        tell t to set t to beginning & ({""} & rest)
        set text item delimiters to d
    on error m number n from o partial result r to t
        set text item delimiters to d
        error m number n from o partial result r to t
    end try
    t
end replaceText

to breakupValues of someText at lineDelim between valueDelim
    set theValues to {}
    breakupText of someText between lineDelim
    repeat with s in result
        set the end of theValues to breakupText of (contents of s) between valueDelim
    end repeat
    theValues
end breakupValues

to breakupText of someText between theDelimiter
    local otid
    set otid to text item delimiters
    try
        set text item delimiters to {theDelimiter}
        set someText to text items of someText
        set text item delimiters to otid
    on error m number n from o partial result r to t
        set text item delimiters to otid
        error m number n from o partial result r to t
    end try
    return someText
end breakupText

Model: iBook G4 933
AppleScript: 1.10.7
Browser: Safari 419.3
Operating System: Mac OS X (10.4)