Regular Expression Pattern

I need some help with a regex pattern. It’s being used in a shortcut, but it could also be used in ASObjC.

The goal is to convert CSV to JSON. Fields in the CSV may contain commas, and any such fields are enclosed in double quotes (see test shortcut below). The following does what I want, but I’d like to get rid of the action that eliminates consecutive double quotes. More importantly, as an enhancement, I’d like the regex pattern to skip double quotes inside a CSV field. Thanks for any help.

My testing shortcut:

Regex Test Shortcut.shortcut (22.2 KB)

I’d like the regex pattern to work with the following CSV record:

Honda Civic,“ac, abs, “sharp””,“$30,000”

FWIW, the pattern in the testing shortcut will be used in the following shortcut:

Test CSV to JSON.shortcut (24.1 KB)

Honda Civic,“ac, abs, “sharp””,“$30,000”

You can use a quantifier (a repeat count):

(\”){2}

and replace that with a single double quote.
Replace
”

Result
Honda Civic,“ac, abs, “sharp”,“$30,000”

You can also add the condition of a following comma:

(\”){2}\,

Replace
”,

Result
Honda Civic,“ac, abs, “sharp”,“$30,000”
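For anyone who wants to try the two replacements above outside Shortcuts, here is a minimal Python sketch. It assumes the curly quotes shown in the thread are only typographic rendering and that the real record uses straight double quotes; the variable names are made up for illustration.

```python
import re

# Record from the thread, written with straight quotes (assumption: the
# curly quotes in the post are forum rendering, not the real data).
record = 'Honda Civic,"ac, abs, "sharp"","$30,000"'

# Collapse any two consecutive double quotes into one.
collapsed = re.sub(r'("){2}', '"', record)

# Same idea, but only when the pair is directly followed by a comma.
collapsed_before_comma = re.sub(r'("){2},', '",', record)

print(collapsed)
print(collapsed_before_comma)
```

Both calls give the same result for this record, because the only doubled quote happens to sit right before a comma.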

Please consider that, according to the CSV specification (RFC 4180):

If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote.

The record

Honda Civic,“ac, abs, “sharp””,“$30,000”

is not valid because "ac, abs, " is treated as a complete field, so a comma is expected right before sharp. It’s supposed to be:

Honda Civic,"ac, abs, ""sharp""","$30,000"
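As a quick check of the corrected record, here is a small Python sketch that feeds it to the standard library's `csv` reader, which follows the doubled-quote convention by default. It is only an illustration in Python, not something Shortcuts offers.

```python
import csv
import io

# The corrected record from the post above, with doubled inner quotes.
valid_record = 'Honda Civic,"ac, abs, ""sharp""","$30,000"'

# csv.reader expects an iterable of lines, so wrap the string in StringIO.
fields = next(csv.reader(io.StringIO(valid_record)))

print(fields)
```

The reader returns three fields, and the second one contains the inner quotes around sharp as ordinary characters.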


Thanks Stefan. I looked at the CSV standard, and I’ll use that with two exceptions. I’ll use LF instead of CRLF, and I won’t allow an LF in CSV fields.

If I understand correctly, double quotes in a JSON string are escaped with a backslash. So, the following appears to do what I want, but in the back of my mind I get the feeling that I’m missing something. Anyways, in the linked shortcut, I’ve included a proof of sorts.

Regex Test.shortcut (22.5 KB)
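To make the CSV-to-JSON escaping concrete, here is a rough Python sketch of one way to do it with a regex. It is not the pattern from the linked shortcut. It assumes straight quotes, one record per line with no LF inside fields, and well-formed input. `FIELD` and `csv_line_to_json` are hypothetical names.

```python
import json
import re

# Hypothetical field pattern: a quoted field (doubled quotes allowed inside)
# or an unquoted field, each preceded by the line start or a comma.
FIELD = re.compile(r'(?:^|,)(?:"((?:[^"]|"")*)"|([^",]*))')

def csv_line_to_json(line: str) -> str:
    """Convert one LF-free CSV record to a JSON array string."""
    fields = []
    for m in FIELD.finditer(line):
        if m.group(1) is not None:
            # Quoted field: undo the CSV doubling of inner quotes.
            fields.append(m.group(1).replace('""', '"'))
        else:
            fields.append(m.group(2))
    # json.dumps adds the backslash escapes that JSON requires.
    return json.dumps(fields)

json_text = csv_line_to_json('Honda Civic,"ac, abs, ""sharp""","$30,000"')
print(json_text)
```

The doubled CSV quotes become backslash-escaped quotes in the JSON text, which is the conversion described above.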

Technomorph. Thanks for the response. I don’t completely understand your proposal. Could you give a working example?

Also, what I tend to do with these types of problems is take a two-pass approach:

First, find your “split points” and split the string into sections.
Then clean up any extra stuff in each section (the extra quotes, etc.).

Trying to solve it all in one go can get tricky.
This also helps to refocus your regex.

Split on:
"," | ," | ",
Split on:
[a-zA-Z], only if both conditions hold:
preceding condition (can trace all the way back to a comma without finding a quote)
following condition (can trace all the way forward to a comma without finding a quote)
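Here is a rough Python sketch of the two-pass idea. Note that it swaps in a different split rule from the ones listed above: it splits on any comma that is followed by an even number of double quotes, which is a common way to say "this comma is outside a quoted field". It assumes straight quotes and a record that follows the doubled-quote convention; the function names are made up for illustration.

```python
import re

# Pass 1 rule (an assumption, not the split rules listed above): a comma is a
# split point when an even number of double quotes follows it up to the end
# of the line.
SPLIT_POINT = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_fields(line: str) -> list[str]:
    """Pass 1: cut the record at commas that sit outside quoted fields."""
    return SPLIT_POINT.split(line)

def clean_field(field: str) -> str:
    """Pass 2: drop the enclosing quotes and un-double inner quotes."""
    if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
        return field[1:-1].replace('""', '"')
    return field

record = 'Honda Civic,"ac, abs, ""sharp""","$30,000"'
raw_fields = split_fields(record)
clean_fields = [clean_field(f) for f in raw_fields]

print(raw_fields)
print(clean_fields)
```

Keeping the split and the cleanup separate means each regex only has one job, which is the point of the two-pass suggestion.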

Internal double quote escaping:

Honda Civic,“ac, abs, “sharp””,“$30,000”

\“(\w+)\”

Replace

\\\\”$1\\\\”

Result

 Honda Civic,“ac, abs, \”sharp\””,“$30,000”
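The same replacement can be tried in Python as a sketch. It assumes straight quotes, and it uses Python's `\1` where the replacement above uses `$1`; the number of backslashes needed in the replacement string differs between tools, so treat the escaping here as Python-specific.

```python
import re

# Record from the thread, written with straight quotes (assumption).
record = 'Honda Civic,"ac, abs, "sharp"","$30,000"'

# Find a quoted run of word characters and put a backslash before each of
# its two quotes. In the replacement template, \\ is one literal backslash.
escaped = re.sub(r'"(\w+)"', r'\\"\1\\"', record)

print(escaped)
```

One thing to watch: the pattern matches any quoted run of word characters, so a whole field that is a single quoted word (for example "abs" on its own between commas) would be escaped as well.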

technomorph. Thanks for the help.

I found a solution that appears to work.

Regular Expression Test.shortcut (22.1 KB)