Avoiding do shell script command buffer overflow.


Lately I have several times run into situations where I tried to pass more command text to do shell script than it can take. According to Apple’s TN2065, it can take about 262,000 bytes, provided the user’s environment isn’t too big.

There are different ways of overcoming this limitation. Sometimes rewriting parts of your script as a Unix script will do the trick, but that is not feasible in every situation.

Here are some “back of the envelope” calculations to check whether we are within bounds. They are meant for Western languages.

We deal with Unicode text, where each character is represented by a minimum of two and a maximum of four bytes, if I remember correctly. I estimate that characters needing more than two bytes always make up less than 2% of the contents of a given text file.

The length of a text is returned in characters.

If the length of a text multiplied by two and divided by 0.98 is less than 261,000 (that is, up to roughly 127,890 characters), then one should be able to send the text to the do shell script command.

If not, one would have to split the text into chunks, pass the chunks to the do shell script command one at a time, and then assemble the output afterwards.

on okCommandSize(txtCommandBuffer)
	-- Estimate: two bytes per character, padded by 2% for longer characters
	if (length of txtCommandBuffer) * 2 / 0.98 is greater than 261000 then
		return false
	else
		return true
	end if
end okCommandSize
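
The chunking mentioned above could be sketched roughly as follows. This is only a minimal sketch: splitIntoChunks and runInChunks are hypothetical handler names, the chunk size is counted in characters rather than bytes, and tr is merely a stand-in for whatever command is actually being run. Note also that quoted form adds a few bytes of its own, and that cutting at a fixed character position may split a word or a line.

```applescript
-- Hypothetical helper: split theText into pieces of at most maxChars characters.
-- e.g. splitIntoChunks("abcdefghij", 4) returns {"abcd", "efgh", "ij"}
on splitIntoChunks(theText, maxChars)
	set chunkList to {}
	set textLength to length of theText
	set startPos to 1
	repeat while startPos is less than or equal to textLength
		set endPos to startPos + maxChars - 1
		if endPos is greater than textLength then set endPos to textLength
		set end of chunkList to text startPos thru endPos of theText
		set startPos to endPos + 1
	end repeat
	return chunkList
end splitIntoChunks

-- Hypothetical helper: run a stand-in command on each chunk and join the results.
on runInChunks(theText, maxChars)
	set combinedOutput to ""
	repeat with aChunk in splitIntoChunks(theText, maxChars)
		-- "tr a-z A-Z" is only a placeholder for the real command
		set chunkOutput to do shell script "printf %s " & quoted form of (contents of aChunk) & " | tr a-z A-Z"
		set combinedOutput to combinedOutput & chunkOutput
	end repeat
	return combinedOutput
end runInChunks
```

Whether the outputs can simply be concatenated like this depends on the command; it only works when each chunk can be processed independently of the others.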

The text will be passed as UTF8, so if it’s mostly English, most characters will be represented by a single byte.

I didn’t want to answer until I had finished banging my head against the wall. :)

Thanks for your accurate information, Shane.

Well, I guess that leaves us with bigger chunks of text. UTF-8 is also a variable-length character encoding, like UTF-16.

I have come to the conclusion that the number of diacritics is highly specific to the language and can’t be deduced in general.
Neither can the number of octets (bytes) that represent each code point.
So there can’t be a general calculation of a buffer size, as one has to take into account both the average percentage of diacritics and the number of octets needed to represent that percentage.
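
Since an estimate from the character count can’t be made general, one alternative is to measure the actual UTF-8 byte count of the text instead. The following is a minimal sketch, assuming AppleScriptObjC is available (OS X 10.10 or later); utf8ByteCount and okCommandBytes are hypothetical handler names, and 261000 is just the safety margin used earlier in this thread, not a documented limit.

```applescript
use AppleScript version "2.4" -- assumes OS X 10.10 or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper: number of bytes theText occupies when encoded as UTF-8.
on utf8ByteCount(theText)
	set nsText to current application's NSString's stringWithString:theText
	return (nsText's lengthOfBytesUsingEncoding:(current application's NSUTF8StringEncoding)) as integer
end utf8ByteCount

-- Hypothetical check: compare the real byte count against the margin from this thread.
on okCommandBytes(txtCommandBuffer)
	return utf8ByteCount(txtCommandBuffer) is less than 261000
end okCommandBytes

okCommandBytes("echo hello") -- true: the command is 10 bytes long
```

This measures only the command text itself. The size of the user’s environment still counts against the limit, so some headroom is probably wise.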