Parsing JSON files

Touché :slight_smile:

P.S. My run script…do shell script… solution takes only 30-40, not >100, times as long as your AppleScriptObjC solution here!! :lol:

I use jq to parse weather information in JSON format from Weather Underground.

https://stedolan.github.io/jq/

For example:

set MoonPhase to do shell script "cat " & DirectoryPad & "/AstroDatabase.json | /usr/local/bin/jq -r .moon_phase.phaseofMoon"

When using JavaScript syntax, you can write an entire program on a single line. I think the biggest thrill is using as few resources and as few instructions as possible.

[format]function run(argv){
// get current application
var app = Application.currentApplication();
// load specific osax
var StandardAddition = (app.includeStandardAdditions = true, app);
// define string to display
var str = "Hello World!";
// show the dialog
StandardAddition.displayDialog(str);
};[/format]

Is the same as:

[format]function run(argv){var app = Application.currentApplication();var StandardAddition = (app.includeStandardAdditions = true, app);var str = "Hello World!";StandardAddition.displayDialog(str);};[/format]

Of course, that's irrelevant in most situations. The JSON data is downloaded only once, taking whatever time that requires; then it is converted and evaluated, and the processing starts.

For example, when you have output from a shell command.

In my opinion, it's about using as few resources as possible and writing the most efficient code (regardless of the number of instructions or the number of lines).

What really got my juices flowing in this particular problem is the remarkable similarity between JSON and Applescript value specifications. It seemed almost irresistible simply to do a sed…tr… transform on a well-formed JSON string and run script it into an Applescript object.

Besides the hit in execution speed (which may or may not be relevant in a given usage scenario), the key term is well-formed. If not, the run script…do shell script… approach might well fail.
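A quick illustration of how much the transform depends on the exact shape of the input, as a sketch in plain shell (the sample string is made up). Here the JSON is valid, but a string value contains an escaped quote followed by a colon, and the key-quoting substitution rewrites part of the value as if it were a key:

```shell
#!/bin/sh
# Hypothetical input: valid JSON whose string value contains an escaped
# quote followed by a colon.
json='{"k": "say \"hi\": ok"}'

# Same key-to-|identifier| substitution as the sed approach in this thread.
out=$(printf '%s\n' "$json" |
  sed -En 's/"([^"]+)"[[:space:]]*:/|\1|:/g; H; $ {g; s/[[:cntrl:]]+//g; p;}')

printf '%s\n' "$out"
# prints: {|k|: "say \|hi\|: ok"}   (the string value has been mangled)
```

The key `k` is converted correctly, but `"hi\":` inside the value also matches the pattern, so the text handed to `run script` no longer represents the original data.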

That is the fewest instructions :slight_smile: Of course, with AS you don't know what your instructions are, because they're bundled into what we know as commands.

They used to be (and I think still are) technically different in AppleScript, but it probably won’t affect the usability of the resulting lists.

It could, of course, all be done in the sed code: :slight_smile:

set applescriptValue to run script (do shell script ("echo " & jsonString's quoted form & " | sed -En 's/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; H; $ {g; s/[[:cntrl:]]+//g; p;}'"))
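For anyone who wants to experiment outside AppleScript, here is the same sed program written for a plain POSIX shell. It is a sketch: the sample JSON is invented, and the quotes and backslashes are no longer escaped for `do shell script`. As far as I know it uses only features that both the BSD sed on macOS and GNU sed support.

```shell
#!/bin/sh
# Invented sample JSON, pretty-printed over several lines.
json='{
  "name" : "Lunch Menu",
  "items" : [1, 2]
}'

# 1. On each line, turn a quoted key followed by ':' into an AppleScript
#    |barred| identifier.
# 2. H appends every (edited) line to the hold space.
# 3. On the last line ($), g copies the hold space back, the control
#    characters (the newlines) are deleted and the result is printed;
#    -n suppresses sed's automatic per-line output.
record=$(printf '%s\n' "$json" |
  sed -En 's/"([^"]+)"[[:space:]]*:/|\1|:/g; H; $ {g; s/[[:cntrl:]]+//g; p;}')

printf '%s\n' "$record"
# prints: {  |name|: "Lunch Menu",  |items|: [1, 2]}
```

The square brackets pass through unchanged; `run script` then reads them as a list, as the reconstituted record later in this thread shows.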

Shane Stanley wrote:

Then I'd better take another look at my weather program; I use “do shell script” with the jq parser quite a lot.

I almost forgot that, before jq, I used something else to parse JSON in AppleScript: JSON Helper, an application that parses JSON into regular AppleScript lists and records.
http://www.mousedown.net/mouseware/JSONHelper.html

I switched to the jq parser because, as an AppleScript beginner, I found it much easier. I have rewritten some of my jq code to get the same information with JSON Helper.


tell application "JSON Helper"
	set AstroDataBase to fetch JSON from "https://api.wunderground.com/api/My_Key/astronomy/q/NL/Amsterdam.json"
end tell

set AgeOfMoon to AgeOfMoon of moon_phase of AstroDataBase



This of course is also not what treed needs, but maybe somebody else finds it useful.

I think you’re missing my point. I don’t see how the source of the output makes any difference to the argument that pre-existing libraries are to be preferred over rolling your own code for things like JSON parsing.

It may affect your choice of library – you might prefer to pipe it to python, for example – but that’s a different issue.

Nice sed construct!

A little off topic, but my sense is that sed has one of the higher ratios of [hidden power]/[real-life usage] among tools in a coder’s toolbox. Fully tapping into that power, though, takes a great deal of study and experience.

I must be jaded. I get the biggest thrill from using code I know someone better than me has already debugged.

Hello, two questions:

– can JSON encode Unicode characters?
– can sed handle Unicode characters?

Yvan KOENIG running Sierra 10.12.6 in French (VALLAURIS, France) lundi 11 septembre 2017 16:58:13

Hi Yvan.

It seems so. (See the script below.)

sed on Mac OS can handle Unicode as sequences of bytes, but doesn’t recognise Unicode characters per se. For instance, you can’t use the command y/Д/x/ because sed sees “Д” and “x” as different numbers of characters. But the script below works:
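The byte-versus-character distinction can be seen directly in a shell. A minimal sketch, assuming the script file itself is saved as UTF-8: it counts the bytes in "Д" and then shows that an `s` substitution, which only has to match a byte sequence, handles it without trouble. Whether `y/Д/x/` fails depends on the sed in use (the failure described above is for the macOS sed), so that part is not shown here.

```shell
#!/bin/sh
# "Д" (U+0414) is two bytes in UTF-8, "x" is one. wc -c counts bytes;
# tr strips the padding spaces that the macOS wc adds.
bytes=$(printf '%s' 'Д' | LC_ALL=C wc -c | tr -d ' ')
echo "$bytes"      # prints: 2

# s only needs to find the byte sequence, so it works whether or not
# sed itself understands UTF-8.
replaced=$(printf '%s\n' 'ДaД' | sed 's/Д/x/g')
echo "$replaced"   # prints: xax
```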

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set |⌘| to current application

set unicodeText to "⌘řůД⦿"

set originalRecord to {aString:unicodeText, anArray:{unicodeText}}
set jsonData to |⌘|'s class "NSJSONSerialization"'s dataWithJSONObject:(originalRecord) options:(|⌘|'s NSJSONWritingPrettyPrinted) |error|:(missing value)
set jsonString to (|⌘|'s class "NSString"'s alloc()'s initWithData:(jsonData) encoding:(|⌘|'s NSUTF8StringEncoding)) as text

set reconstitutedRecord to run script (do shell script ("echo " & jsonString's quoted form & " | sed -En 's/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; H; $ {g; s/[[:cntrl:]]+//g; p;}'"))

{jsonString, reconstitutedRecord}
(* -->
{"{
  \"aString\" : \"⌘řůД⦿\",
  \"anArray\" : [
    \"⌘řůД⦿\"
  ]
}", {aString:"⌘řůД⦿", anArray:{"⌘řůД⦿"}}}
*)

Thanks a lot Nigel.

Yvan KOENIG running Sierra 10.12.6 in French (VALLAURIS, France) lundi 11 septembre 2017 19:42:43

Here’s a sed construct that accomplishes the same task and differs only in that it waits until all lines have been collected in the hold space before performing substitutions:


set applescriptValue to run script (do shell script ("echo " & jsonString's quoted form & " | sed -E 'H; $!d; g; s/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; s/[[:cntrl:]]+//g'"))

Any advantages or disadvantages to this approach, or just another way to skin the cat?

I’m not sure. Before posting my version, I ummed and ahhed over a similar approach:

This suppresses the normal line output globally and does an explicit output of the final result instead. Yours …

… suppresses the normal line output individually in all but the last line and lets the final result from there go through in the normal way. The same effect by different means.
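The two output strategies can be compared on a trivial input. In this sketch (the three-line input is arbitrary), both commands collect the lines in the hold space and join them with commas:

```shell
#!/bin/sh
input='a
b
c'

# -n suppresses every automatic print; the explicit p fires only on the last line.
with_n=$(printf '%s\n' "$input" | sed -n 'H; $ {g; s/\n/,/g; p;}')

# $!d deletes (and so never prints) every line but the last; the last
# line's pattern space is then printed automatically.
with_d=$(printf '%s\n' "$input" | sed 'H; $!d; g; s/\n/,/g')

echo "$with_n"   # prints: ,a,b,c
echo "$with_d"   # prints: ,a,b,c
```

The leading comma comes from the first `H`, which appends a newline plus the line to an initially empty hold space. In the JSON versions, that stray newline is one of the control characters removed by `s/[[:cntrl:]]+//g`.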

I eventually went for performing the first edit before storing each line merely on the whim of the moment. Since the first edit potentially shortens each line, that's potentially fewer characters to copy from the pattern space and append to the hold space. On the other hand, that's more times the first edit command has to be invoked! For one's own sanity, it's probably better not to worry too much about such things. :wink:

The only characters sed recognises as line endings are linefeeds, so if the line endings happen to be returns, sed will treat the entire text as one line. In this case, the first ‘s’ command will perform the same global edit whether it's done pre-hold-space or post-hold-space, but the number of characters transferred to and from the hold space will still be fewer if the edit's done before then.
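The line-ending point is easy to check with sed's `=` command, which prints the current line number; `$=` prints it only for the last line, i.e. the number of lines sed read. A small sketch with made-up input:

```shell
#!/bin/sh
# Linefeed-terminated text: sed sees three lines.
lf_lines=$(printf 'one\ntwo\nthree\n' | sed -n '$=')
# Return-terminated text: no linefeeds at all, so sed sees a single line.
cr_lines=$(printf 'one\rtwo\rthree\r' | sed -n '$=')

echo "$lf_lines"   # prints: 3
echo "$cr_lines"   # prints: 1
```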

I need another cup of coffee.

I replicated the sample json string posted previously ( “{ "MenuID":5, "MenuVersion":1, … }”) to 10 times its original size, then ran the following time tests in a shell:


TIMEFORMAT=%R; (time for i in {1..1000}; do
	echo "$largeJsonString" | sed -En 's/"([^"]+)"[[:space:]]*:/|\1|:/g; H; $ {g; s/[[:cntrl:]]+//g; p;}'
done) 2>&1 >/dev/null

# result = 1.582 seconds

TIMEFORMAT=%R; (time for i in {1..1000}; do
	echo "$largeJsonString" | sed -E 'H; $!d; g; s/"([^"]+)"[[:space:]]*:/|\1|:/g; s/[[:cntrl:]]+//g'
done) 2>&1 >/dev/null

# result = 1.544 seconds

The execution times are nearly identical, mine perhaps being ever so slightly faster.

Edit note:

  • I simplified the time commands by removing from my original post the processing of the results through the bc calculator, which was unnecessary.

In the context of do shell script, I’m finding that mine’s faster. :wink: It’s probably of limited relevance from the point of view of determining whether one way’s generally better than the other as the test script’s designed to massage a particular kind of data in a particular way and has only been tested with a particular set of data. Also, of course, it was established back in post #7 that sed isn’t the right tool for parsing JSON data. :slight_smile: But anyway, here’s my test script:


set jsonString to "{ 
    \"MenuID\":5, 
    \"MenuVersion\":1, 
    \"MenuName\":\"Lunch Menu\", 
    \"MenuItems\":[ 
       { 
            \"Name\":\"TUSCANI MEDITERRANEAN CON POLLO\", 
            \"Description\":\"Pasta\", 
            \"PKID\":2, 
            \"ParentID\":1, 
            \"Ingredients\":[ 
               { 
                    \"PKID\":123, 
                    \"IngName\":\"Cheese\", 
                    \"Included\":true, 
                    \"ExtraPrice\":0
               }, 
               { 
                    \"PKID\":124, 
                    \"IngName\":\"Sausage\", 
                    \"Included\":false, 
                    \"ExtraPrice\":0.99
               } 
           ], 
            \"ItemPricing\":[ 
               { 
                    \"PKID\":456, 
                    \"SizeName\":\"Large\", 
                    \"SizePrice\":12.99
               }, 
               { 
                    \"PKID\":678, 
                    \"SizeName\":\"Small\", 
                    \"SizePrice\":14.99
               } 
           ]
       } 
   ]
}"

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to return
-- set jsonString to jsonString's paragraphs as text -- Uncomment to test with return line endings instead of linefeeds.
set AppleScript's text item delimiters to astid

set jsonString2 to jsonString & jsonString
set jsonString4 to jsonString2 & jsonString2
set jsonString10 to jsonString4 & jsonString4 & jsonString2

-- Four versions of the sed code.
set nDequoteFirst to "sed -En 's/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; H; $ {g; s/[[:cntrl:]]+//g; p;}'"
set nHoldFirst to "sed -En 'H; $ {g; s/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; s/[[:cntrl:]]+//g; p;}'"
set dDequoteFirst to "sed -E 's/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; H; $!d; g; s/[[:cntrl:]]+//g'"
set dHoldFirst to "sed -E 'H; $!d; g; s/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; s/[[:cntrl:]]+//g'"

-- Compare the times for 1000 iterations with any two.
compareTimes(jsonString10, nDequoteFirst, dHoldFirst)

on compareTimes(testString, sed1, sed2)
	do shell script "TIMEFORMAT=%R; (time for i in {1..1000}; do
	echo " & testString's quoted form & " | " & sed1 & "
done) 2>&1 >/dev/null

TIMEFORMAT=%R; (time for i in {1..1000}; do
	echo " & testString's quoted form & " | " & sed2 & "
done) 2>&1 >/dev/null"
	return result
end compareTimes
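Before comparing times, it may be worth checking that the four variants really produce the same result. A sketch in plain shell, assuming a cut-down, invented version of the menu JSON and the four sed programs above with their AppleScript-level escaping removed:

```shell
#!/bin/sh
# Hypothetical sample: a cut-down version of the menu JSON above.
json='{
    "MenuID":5,
    "MenuName":"Lunch Menu",
    "MenuItems":[
        { "Name":"Pasta", "Included":true, "ExtraPrice":0.99 }
    ]
}'

# The four sed programs from the test script, unescaped for a plain shell.
nDequoteFirst='s/"([^"]+)"[[:space:]]*:/|\1|:/g; H; $ {g; s/[[:cntrl:]]+//g; p;}'
nHoldFirst='H; $ {g; s/"([^"]+)"[[:space:]]*:/|\1|:/g; s/[[:cntrl:]]+//g; p;}'
dDequoteFirst='s/"([^"]+)"[[:space:]]*:/|\1|:/g; H; $!d; g; s/[[:cntrl:]]+//g'
dHoldFirst='H; $!d; g; s/"([^"]+)"[[:space:]]*:/|\1|:/g; s/[[:cntrl:]]+//g'

# The "n" variants need -n (they print explicitly); the "d" variants don't.
out1=$(printf '%s\n' "$json" | sed -En "$nDequoteFirst")
out2=$(printf '%s\n' "$json" | sed -En "$nHoldFirst")
out3=$(printf '%s\n' "$json" | sed -E "$dDequoteFirst")
out4=$(printf '%s\n' "$json" | sed -E "$dHoldFirst")

if [ "$out1" = "$out2" ] && [ "$out1" = "$out3" ] && [ "$out1" = "$out4" ]; then
    echo "all four agree: $out1"
else
    echo "outputs differ" >&2
fi
```

Agreement on one sample is, of course, no proof that they agree on all inputs, but it rules out comparing the speed of programs that do different things.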

And here’s Shane’s ASObjC one-liner from post #11, unfolded and commented:

use AppleScript version "2.4" -- Mac OS 10.10 (Yosemite) or later.
use framework "Foundation"

-- set jsonString as in the previous scripts.

-- Get an ObjC version of the JSON text.
set jsonNSString to current application's NSString's stringWithString:jsonString
-- Get a data version of that.
set jsonData to jsonNSString's dataUsingEncoding:(current application's NSUTF8StringEncoding)
-- Derive the equivalent ObjC object from the JSON data.
set ASObjCValue to current application's NSJSONSerialization's JSONObjectWithData:jsonData options:0 |error|:(missing value)
-- Assuming the object's an NSDictionary, coerce it to an AppleScript record.
set applescriptValue to ASObjCValue as record

Virtually identical times on my end.

Agreed. I hope we haven’t exceeded Shane’s patience while we were having a little sed fun.

Oh, I’m entertained :slight_smile:

So this:

do shell script ("echo " & jsonString's quoted form & " | sed -E 'H; $!d; g; s/\"([^\"]+)\"[[:space:]]*:/|\\1|:/g; s/[[:cntrl:]]+//g'")

gives the same result with your test code as this:

set jsonString to current application's NSString's stringWithString:jsonString
set jsonString to jsonString's stringByReplacingOccurrencesOfString:"\\\"([^\"]+)\\\"[[:space:]]*:" withString:"|$1|:" options:(current application's NSRegularExpressionSearch) range:{0, jsonString's |length|()}
set jsonString to jsonString's stringByReplacingOccurrencesOfString:"[[:cntrl:]]+" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, jsonString's |length|()}
jsonString as text

When I compare them in Script Geek, running each 1000 times with Nigel's jsonString10, the latter takes about 1.3 seconds (including the time to create jsonString10). The former, with the overhead of do shell script, takes 38+ seconds.

I think I need another cup of coffee.