Facilitated setting of "do shell script" environment variables

Environment variables are user-modifiable shell variables that control certain behaviors of Unix commands and the shells in which they are executed. The current discussion will focus on two especially useful environment variables, PATH and LANG. The environment variables will first be described briefly, then a facilitated way to set those variables in shell scripts executed with AppleScript’s do shell script command will be demonstrated.

The PATH environment variable takes as its value a colon-delimited text string of the full POSIX paths of directories (i.e., folders) to be searched by the shell when it encounters executable files (hereafter referred to as “commands”) specified by name only rather than by full POSIX path. The default value for PATH in the shell invoked by AppleScript’s do shell script command is “/usr/bin:/bin:/usr/sbin:/sbin”, which specifies the following very limited set of four directories in the search path:

/usr/bin
/bin
/usr/sbin
/sbin

In this default case, commands not present in one of those four directories must be specified by full POSIX path in order for the shell to find and execute them. User-defined directories may be added to the search path with bash’s export command. The user-defined directories are inserted in the colon-delimited list in the order they are to be searched. The export command must specify all directories to be included in the search path (including the default directories, which should be specified in virtually all cases). The export command must appear in the script before any commands specified by name only, typically at the beginning of the script. For example, the following export command will add the user’s Desktop and Documents folders to the default search path, with the search order consisting of the user’s Desktop folder, followed by the default folders listed above, and ending with the user’s Documents folder:

export PATH='/Users/[place your home folder name here]/Desktop:/usr/bin:/bin:/usr/sbin:/sbin:/Users/[place your home folder name here]/Documents'
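The search-order behavior described above can be checked with a small shell sketch. It is only an illustration: the scratch directories under /tmp stand in for the Desktop and Documents folders, and the command name hello_demo is made up for the test.

```shell
# Hypothetical scratch directories stand in for Desktop and Documents.
workdir=$(mktemp -d /tmp/pathdemo.XXXXXX)
mkdir "$workdir/first" "$workdir/last"

# Put a command with the same (made-up) name in both directories.
printf '#!/bin/sh\necho "from first"\n' > "$workdir/first/hello_demo"
printf '#!/bin/sh\necho "from last"\n' > "$workdir/last/hello_demo"
chmod +x "$workdir/first/hello_demo" "$workdir/last/hello_demo"

# Same shape as the export above: custom directory, defaults, custom directory.
export PATH="$workdir/first:/usr/bin:/bin:/usr/sbin:/sbin:$workdir/last"

# The shell uses the first match it finds in search order.
found=$(command -v hello_demo)
output=$(hello_demo)
echo "$found"
echo "$output"
```

Because the first directory in the list wins, the copy in the trailing directory is reached only if no earlier directory contains a command of that name.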

The LANG environment variable is one of a group of locale environment variables that specify the language and character set used to interpret text for various shell and command text-processing tasks. As Nigel Garvey showed in a recent post, the default locale for the bash shell invoked by AppleScript’s do shell script command is the Unicode-unaware C locale. By setting the value of LANG to a Unicode-aware locale with bash’s export command, certain text-processing commands, including sed and grep (but unfortunately not awk), will handle multibyte Unicode characters properly. As with the PATH export command, the LANG export command must be placed before any text-processing commands, typically at the beginning of the shell script. For example, the following export command will set LANG to the Unicode-aware American English locale:

export LANG='en_US.UTF-8'

A complete list of valid Unicode-aware locales may be obtained by running the following do shell script command (with thanks to Nigel Garvey):

do shell script "locale -a | egrep '^[^_]+_[^.]+(.UTF-8)?$'"

(Note: Nigel also described two alternative variables to LANG for setting the locale, namely LC_CTYPE and LC_ALL, as well as a special command-line syntax that allows environment variable values to be set for individual commands rather than the script as a whole. Neither of these topics will be discussed further in the current post.)

The above two export commands may be placed at the beginning of a do shell script script to configure the search path and locale for any subsequent commands in the script:

do shell script "" & ¬
	"export PATH='/Users/[place your home folder name here]/Desktop:/usr/bin:/bin:/usr/sbin:/sbin:/Users/[place your home folder name here]/Documents' ; " & ¬
	"export LANG='en_US.UTF-8' ; " & ¬
	"#...remainder of script commands and statements go here...#"

The obvious drawback to this approach is the onerous amount of typing required to hard-code the export commands. An alternative approach will now be described that achieves the same result with a minimum of coding. It consists of the following two steps:

  1. Save the above two environment-setting export commands in a plain text file (not RTF, DOCX, or any other formatted file type!). Be sure to separate the export commands with a semicolon or a linefeed character (but not a carriage return). Save the file to the user’s home folder using a short file name without a file extension. It is not necessary to make the file executable. Saving the file to the home folder allows the file to be referenced with the shell’s concise home folder shortcut, the tilde character “~”. Using a short file name and omitting the (unnecessary) file extension make the reference more concise still. For example, let’s set the file name simply to the letter “e”, my own preference, which signifies an "e"nvironment-setting file in a maximally terse fashion. Any of several applications may be used to create and save the plain text file, including BBEdit, TextWrangler, the nano text editor, TextEdit in plain-text mode, and Microsoft Word’s plain-text saving feature, but the following AppleScript do shell script command will achieve the same result. Remember to replace “[place your home folder name here]” with your home folder name:
do shell script "echo \"export PATH='/Users/[place your home folder name here]/Desktop:/usr/bin:/bin:/usr/sbin:/sbin:/Users/[place your home folder name here]/Documents' ; export LANG='en_US.UTF-8'\" > ~/e"
  2. Source the environment-setting file you just saved at the beginning of any do shell script script in which the custom PATH and LANG values are to be used:
do shell script "source ~/e ; #...remainder of script commands and statements go here...#"

Even better, replace source with its concise equivalent, the period character . (the POSIX spelling of the command, for which source is bash’s synonym), to generate an identical command that is only 5 characters long, namely period-space-tilde-slash-e:

do shell script ". ~/e ; #...remainder of script commands and statements go here...#"

The custom PATH and LANG values will now be applied to any commands following the source command.
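As a possible variation, not something the steps above require, the file could refer to $HOME instead of a hard-coded home folder name, so that nothing needs to be filled in by hand. The sketch below writes to a scratch file under /tmp rather than ~/e so that it can be tried without touching the home folder, and it covers only the PATH line; the LANG export from the post could be added to the file unchanged.

```shell
# Scratch file for the demonstration; the post saves its file as ~/e.
envfile=$(mktemp /tmp/envdemo.XXXXXX)

# The quoted heredoc delimiter stores $HOME literally in the file,
# so it is expanded only when the file is sourced.
cat > "$envfile" <<'EOF'
export PATH="$HOME/Desktop:/usr/bin:/bin:/usr/sbin:/sbin:$HOME/Documents"
EOF

# Source the file into the current shell, as ". ~/e" does in the post.
. "$envfile"

echo "$PATH"
```

Because the expansion happens at source time, the same file should work unchanged for any user account.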

The key to the success of this approach is bash’s source (or . ) command. The source command is in some ways analogous to AppleScript’s run script command in that it executes the text content of a file as a script without requiring the script to be executable. However, its behavior differs importantly from AppleScript’s run script command in that it executes the file’s content in the context of the current bash shell. The result in our case is that the environment variable values are set for the current shell and subsequent commands in the current script. In contrast, run script executes the file’s content in the context of a separate AppleScript instance. In an analogous fashion, bash executable files are executed in a separate child shell launched by the current shell, not in the current shell itself. Thus, executing rather than source-ing an environment-setting file would set the environment variables for the executed file’s child shell only, not for the current shell, and thus would not achieve the desired result.
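The difference between executing and source-ing can be seen in a few lines of shell. This is a minimal sketch under stated assumptions: DEMO_VAR is a made-up variable name, the file is a scratch file under /tmp, and running it with sh stands in for executing it as a command.

```shell
# Scratch file holding a single export; DEMO_VAR is a made-up name.
envfile=$(mktemp /tmp/sourcedemo.XXXXXX)
echo "export DEMO_VAR='set by file'" > "$envfile"

unset DEMO_VAR

# Executing: the file runs in a child shell, so the export is lost
# as soon as that child exits.
sh "$envfile"
after_exec=${DEMO_VAR:-unset}

# Sourcing: the file's lines run in the current shell.
. "$envfile"
after_source=${DEMO_VAR:-unset}

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
```

Only the second form leaves the variable set for the commands that follow, which is why the environment-setting file is sourced rather than run.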

The above techniques will now be applied to a couple of simple examples. First, save the environment-setting plain text file named “e” (without a file extension) to your home folder exactly as described in Step 1 above. This may be done by simply running the previously displayed do shell script command. Remember to replace “[place your home folder name here]” with your home folder name:

do shell script "echo \"export PATH='/Users/[place your home folder name here]/Desktop:/usr/bin:/bin:/usr/sbin:/sbin:/Users/[place your home folder name here]/Documents' ; export LANG='en_US.UTF-8'\" > ~/e"

To demonstrate the setting of the PATH environment variable, save a test executable shell script to the Desktop or the Documents folder. For instance, running the following do shell script command will save a shell script named “SampleScript” to the desktop and make the script executable. When executed, the script will cause the words “It works!” to be spoken:

do shell script "echo \"say 'It works!' ; sleep 3\" > ~/Desktop/SampleScript ; chmod '+x' ~/Desktop/SampleScript"

To demonstrate that the test script works, execute it by referencing its full POSIX path:

do shell script "/Users/[place your home folder name here]/Desktop/SampleScript"
--> Succeeds with the voice message: "It works!"

Now try to execute the test script by referencing it by name alone without first source-ing the environment-setting file. The shell is unable to find the test script:

do shell script "SampleScript"
--> Fails with the error message "sh: SampleScript: command not found"

Now execute the test script by referencing it by name alone after first source-ing the environment-setting file. Since the test script was saved to the desktop, and the desktop was added to the search path by the source command, the shell is now able to find the test script:

do shell script ". ~/e ; SampleScript"
--> Succeeds with the voice message: "It works!"

To demonstrate the setting of the LANG environment variable, run the following do shell script command, which tries to replace the uppercase characters S, H, D, and C in the input string with the multibyte Unicode characters ♠, ♥, ♦, and ♣ by means of a sed command. The sed command is unable to interpret the multibyte Unicode characters properly because the default locale for do shell script is Unicode-unaware:

do shell script "echo 'The suits of a deck of cards are S, H, D, and C.' | sed -E 'y/SHDC/♠♥♦♣/'"
--> Fails with the error message "sed: 1: "y/SHDC/♠♥♦♣/": transform strings are not the same length"

Now run the same sed command after first source-ing the environment-setting file. Since the source command sets the locale to a Unicode-aware locale, sed is now able to interpret the multibyte Unicode characters properly:

do shell script ". ~/e ; echo 'The suits of a deck of cards are S, H, D, and C.' | sed -E 'y/SHDC/♠♥♦♣/'"
--> Succeeds with the return value: "The suits of a deck of cards are ♠, ♥, ♦, and ♣."

Final note: The current discussion demonstrates a facilitated way of setting the PATH and LANG environment variables in shell scripts executed with AppleScript’s do shell script command. The technique may be extended to the setting of any environment variables, or for that matter, the execution of any script configuration commands, simply by including those export and/or configuration commands in the plain text file that is saved to the user’s home folder and source-ed at the beginning of the do shell script scripts.

It’s interesting that you can export presets, although it seems more onerous to me to have to remember their designation and purpose; I would just choose an option to avoid having to type it. The regex was simplified to return only UTF-8 options.

set chosenLang to "LANG=" & ((choose from list (do shell script "locale -a | grep 'UTF-8' ")'s paragraphs) as text)'s quoted form & space
do shell script chosenLang & <main shell call>

Your technique of finding a LANG locale value through an interactive dialog window and applying the result to a single command using command-line syntax eliminates the need to find a locale value manually and allows that value to be applied to a specific command. It does add the costs of a substantial amount of typing and the time required to interact with a dialog window.
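For anyone weighing the two scopes, here is a rough shell sketch of how the per-command form behaves. It assumes nothing beyond standard shell behavior; printenv is just a stand-in for whatever command would really be run, and the starting value C is chosen only because it exists on every system.

```shell
# Start from a known, script-wide value.
export LANG=C

# Prefix form: the assignment applies to this one command only.
per_command=$(LANG=en_US.UTF-8 printenv LANG)

# The shell's own value is untouched afterwards.
after_prefix=$LANG

echo "seen by the command: $per_command"
echo "seen by the script:  $after_prefix"
```

The prefix form therefore has to be repeated on every command that needs the custom value, whereas an exported (or sourced) value covers the rest of the script.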

Your approach made me more aware of certain assumptions I made:

Explicit assumptions:
1) That it would be satisfactory to apply custom environment variable values to the script as a whole, i.e., to every command in a multiline script, rather than selectively to individual commands
2) That the coding cost should be minimal

Implicit assumptions:
1) That the time cost should be minimal
2) That the same custom PATH and LANG values would suffice for most shell scripts (or perhaps a few variants of those custom values if the user is willing to save and remember a small number of environment-setting files in the home folder)

With regard to explicit assumption #2, I suggested saving the environment-setting file in the user’s home folder with a very short file name, in my example simply the letter “e”. Then the source command may be invoked with minimal coding cost with a command as few as 5 characters long, . ~/e in my example, or a bit longer if the user prefers a somewhat more expressive file name than the letter “e”.

With regard to implicit assumption #1, timing tests with ~5000 repetitions of very simple shell commands revealed that the source command requires less than 0.001 seconds (range 0.00046 to 0.00065) to execute on my aging MacBook Pro computer. So the time cost is trivial, less than 10% of the time required to invoke the do shell script command itself.

With regard to implicit assumption #2, users who require many combinations of PATH and LANG custom values for their scripts would have to save different environment-setting files and then remember their names to benefit from the approach I describe. For users like myself who need only one set of PATH and LANG values for virtually all of their shell scripts, it is necessary to remember only a single file name and to source that file over and over again at the start of all shell scripts with a simple and easily remembered source command:

. ~/e

I’d be more interested in modifying the PATH permanently.

What file would I modify to change it?

That depends…

Here is the timing script, which I run as a script via LaunchBar:

use framework "Foundation"
use scripting additions

set nn to |...number of reps goes here...|

set t1 to current application's CFAbsoluteTimeGetCurrent()
repeat nn times
	set x1 to do shell script ":"
	-- or to test the execution time of the source command: set x1 to do shell script ". ~/e ; :"
end repeat
set t1 to ((current application's CFAbsoluteTimeGetCurrent()) - t1) / nn

activate
display dialog "Time per \"do shell script\" call = " & t1 & " seconds."

Here are the results:

For nn = 1000:
command = “:”
→ 0.01492 seconds per call
command = “. ~/e ; :”
→ 0.01564 seconds per call
therefore, for the source command “. ~/e” alone
→ 0.01564 - 0.01492 = 0.00072 seconds per invocation

However, here is where the trickiness of script timing rears its ugly head…

For nn = 1 (average of 10 separate runs with script recompilation between runs to minimize any optimizations by LaunchBar):
command = “:”
→ 0.03604 seconds per call
command = “. ~/e ; :”
→ 0.03927 seconds per call
therefore, for the source command “. ~/e” alone
→ 0.03927 - 0.03604 = 0.00323 seconds per invocation

The execution times I reported previously were with nn = 1000. For nn = 1, where script launcher optimizations are minimized, the execution times do go up about 2.5-fold. But in either case, the execution time of the source command itself remains relatively small, less than 10% of the time required to invoke the “do shell script” command.

I’m not sure LaunchBar is a great host for timing – although if that’s where you ultimately run your scripts, I guess it makes some sense. But if you do something as crude as this and run it as an applet, the time is much lower:

set x to current date
repeat 10000 times
	do shell script ":"
end repeat
set y to current date
display dialog ((y - x) / 10000) as text

The 10000 repeats should make the rounding to the nearest second near irrelevant, and here that actually gives times of around 0.0025. If I change the script to use “. ~/e ; :”, it’s around 0.003. So either a tiny amount of time (0.0005) or 20% longer, depending on your outlook ;) But…

I suspect the real gain is that the file is being cached. Timing any code that involves reading the same file repeatedly is always going to give misleading results. My tests above reflect exactly the same problem.

Hi.

Shouldn’t the CoreFoundation framework be ‘used’ here too, or isn’t it necessary? (I mean formally. I know the script works without it.)

use framework "Foundation"
use framework "CoreFoundation"
use scripting additions

set nn to |...number of reps goes here...|

set t1 to current application's CFAbsoluteTimeGetCurrent() -- CoreFoundation function.
-- etc.

It’s a good question, but the answer is no. CoreFoundation is a special case – Foundation.framework is built on top of it. If you open Foundation.h, you will see “#include <CoreFoundation/CoreFoundation.h>” right at the top.

The practical question, though, is whether AppleScript is reading its .bridgesupport file, too – otherwise you would risk problems with enums and constants. But if you run something like this:

use framework "Foundation"
current application's COREFOUNDATION_CFPLUGINCOM_SEPARATE

You get the correct answer. Which suggests AppleScript is doing the right thing.

Thanks, Shane. I could see in the Xcode documentation that Core Foundation’s various offerings are “bridged seamlessly with the Foundation framework”, but wasn’t sure what that implied for ASObjC scripts. Now I am. :)