String extracting method: substr() à la PHP

First off: AppleScript is terribly slow when it comes to processing strings. To process longer text blobs, use a shell script.
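One way to hand the slicing to the shell from within AppleScript is `do shell script`. This is only an illustration under stated assumptions: shell_substr is a hypothetical helper name, positions are 1-based and inclusive as in cut(1), and the text is assumed to be a single line of plain ASCII, since how command-line tools count multibyte characters depends on the locale. Very large blobs are better written to a file first, because command-line arguments are limited in size.

```applescript
-- Hypothetical helper: characters i_from thru i_to (1-based, inclusive) of s_text, via cut(1).
-- Assumes single-line ASCII text; quoted form protects the text from shell interpretation.
to shell_substr(s_text, i_from, i_to)
	return do shell script "printf '%s\\n' " & quoted form of s_text & " | cut -c " & i_from & "-" & i_to
end shell_substr

shell_substr("abcdefghij", 3, 5) -- "cde"
```

Spawning a shell has its own overhead, so this only pays off for long text.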

I found myself writing a method to extract portions of a string.
substrid(text_string, first_character_position, last_character_position)
My goals for it were:
- must always return a string:
  - even if parameters point out of bounds → “”
  - if an empty string is supplied → “”
  - if a parameter is zero → the result depends on the other parameter, but is always a string
- must reverse the resulting string when last_character_position comes before first_character_position
- should an AppleScript error occur (never say never), the returned string contains the error message
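AppleScript's own range references already count negative positions from the end, but they raise an error as soon as a position falls outside the text; the goals above are what let substrid() return a string in those cases instead. A small sketch of the native behaviour (native_range is a hypothetical name, returning missing value where AppleScript refuses the range):

```applescript
-- Hypothetical helper: the plain AppleScript range, or missing value if AppleScript raises an error.
to native_range(s_text, i_from, i_to)
	try
		return text i_from thru i_to of s_text
	on error
		return missing value
	end try
end native_range

native_range("abcdef", 2, 4) -- "bcd"
native_range("abcdef", -3, -1) -- "def": negative positions count from the end
native_range("abcdef", 5, 10) -- missing value: position 10 is out of bounds
```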

Then something drove me to write a method that behaves almost exactly like PHP’s substr() function. The first difference is that a length value must always be passed (AppleScript’s limitation, or mine).
The other difference: PHP’s substr() function returns boolean false on occasion.
My AppleScript ‘clone’ doesn’t; it returns an empty string instead.
In my opinion that is the better return value in an AppleScript context, where
you can’t write:
if returned_string then …
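Because a string cannot serve as an `if` condition on its own, callers of either handler can simply compare the result with "". A minimal sketch; non_empty is a hypothetical helper name:

```applescript
-- Hypothetical helper: true when a substr-style result actually contains text.
to non_empty(returned_string)
	return returned_string is not ""
end non_empty

set returned_string to "" -- stand-in for a result of substrid() or PHP_substr()
if non_empty(returned_string) then
	log "got: " & returned_string
else
	log "nothing extracted"
end if
```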

this is the base script
substrid(text_string, first_character_position, last_character_position)


to substrid(s_text, i_start, i_stop)
	(* core: string from pos to pos of string *)
	--log "substrid 1: " & s_text & ", " & i_start & ", " & i_stop
	try
		if 0 = i_stop then
			if i_start ≤ 0 then
				return ""
			else
				set i_stop to 1
			end if
		end if
		if 0 = i_start then set i_start to 1
		set i_text_length to length of s_text
		if 0 = i_text_length then return ""
		
		--log " " & (i_text_length) & ", " & (i_text_length * -1)
		-- both out of bounds
		if (i_start < i_text_length * -1 and i_stop < i_text_length * -1) or (i_start > i_text_length and i_stop > i_text_length) then return ""
		if i_start > i_text_length then set i_start to i_text_length --return ""
		if i_start < i_text_length * -1 then set i_start to 1 --i_text_length * -1 --return ""-- 
		if i_stop > i_text_length then set i_stop to i_text_length
		if i_stop < i_text_length * -1 then set i_stop to 1 --i_text_length * -1
		-- 'as text' joins the characters with AppleScript's text item delimiters, so pin them to ""
		set old_tids to AppleScript's text item delimiters
		set AppleScript's text item delimiters to ""
		set sReturn to characters i_start thru i_stop of s_text as text
		set AppleScript's text item delimiters to old_tids
		--log "substrid 2: " & s_text & ", " & i_start & ", " & i_stop & " => " & sReturn
		-- convert negative (counted from the end) indices to true 1-based positions
		set i_start_true_pos to i_start
		if i_start < 0 then set i_start_true_pos to i_text_length + i_start + 1
		set i_stop_true_pos to i_stop
		if i_stop < 0 then set i_stop_true_pos to i_text_length + i_stop + 1
		
		-- reverse?
		if i_stop_true_pos < i_start_true_pos then
			
			-- alternative forms that all give the same result
			--return reverse of sReturn's characters as text
			-- or 'super abbreviated'
			--return sReturn's characters's reverse as text
			-- or more 'old school'
			set AppleScript's text item delimiters to ""
			set sReturn to the reverse of the characters of the sReturn as text
			set AppleScript's text item delimiters to old_tids
		end if
		return sReturn
	on error msg number n
		return msg & " (" & n & ") original string:[" & s_text & "]:"
	end try
end substrid
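substrid() builds and reverses its result with `characters … as text`, and that list-to-text coercion joins the characters with AppleScript's text item delimiters. If a calling script has left them set to something other than "", the separator ends up inside the result. A minimal sketch of a delimiter-safe reversal (reverse_text is a hypothetical name):

```applescript
-- Sketch: reverse a string without depending on the caller's text item delimiters.
to reverse_text(s_text)
	set old_tids to AppleScript's text item delimiters
	-- list-to-text coercion joins items with the current delimiters, so use ""
	set AppleScript's text item delimiters to ""
	set s_reversed to (reverse of (characters of s_text)) as text
	set AppleScript's text item delimiters to old_tids
	return s_reversed
end reverse_text

set AppleScript's text item delimiters to "-"
(characters of "abc") as text -- "a-b-c": the delimiter leaks into the result
reverse_text("abc") -- "cba", and the delimiters are still "-" afterwards
set AppleScript's text item delimiters to ""
```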

this is the PHP substr() ‘clone’; it uses the substrid() method above


to PHP_substr(s_text, i_start, i_length)
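	(* variation of PHP's substr(): indices start at 0, not 1; instead of returning false it returns an empty string; i_length is not optional: a negative value stops that many characters before the end (-1 drops the last character), so pass the length of s_text to get the remainder of s_text *)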
	try
		--log "PHP_substr a: " & s_text & ", " & i_start & ", " & i_length
		-- first those that clearly leave an empty string
		-- zero length
		if 0 = i_length then return ""
		-- both negative and start is bigger or same as  length
		if 0 > i_start and 0 > i_length and i_start ≥ i_length then return ""
		set i_text_length to length of s_text
		-- zero text length
		if 0 = i_text_length then return ""
		-- zero start, negative length and length points beyond start
		if 0 = i_start and 0 > i_length and i_length ≤ i_text_length * -1 then return ""
		-- both positive and start points beyond end
		if 0 < i_start and 0 < i_length and i_start ≥ i_text_length then return ""
		-- zero start and negative length
		--if 0 = i_start and 0 < i_length then set i_start to 1
		-- determine real positions and use substrid to return string	
		if 0 < i_start then
			-- positive start
			if 0 < i_length then
				-- positive length
				set i_stop_tmp to i_start + i_length
				set i_start_tmp to i_start + 1
				if 1 > i_start_tmp then set i_start_tmp to 1
			else
				-- negative length
				set i_stop_tmp to i_text_length + i_length
				set i_start_tmp to i_start + 1
				if i_start_tmp > i_stop_tmp then return "" --set i_stop_tmp to 1
			end if
		else if 0 = i_start then
			if 0 < i_length then
				-- positive length
				set i_start_tmp to 1
				set i_stop_tmp to i_start_tmp + i_length - 1
			else
				-- negative length
				set i_stop_tmp to i_text_length + i_length
				if 1 > i_stop_tmp then set i_stop_tmp to 1
				set i_start_tmp to i_start --+ 1
			end if
		else
			-- negative start -> start from back
			if i_start < i_text_length * -1 then return ""
			if 0 < i_length then
				set i_start_tmp to i_text_length + i_start + 1
				set i_stop_tmp to i_start_tmp + i_length - 1
				--log i_start_tmp
				if 1 > i_start_tmp then set i_start_tmp to 1
			else
				set i_stop_tmp to i_text_length + i_length
				if 1 > i_stop_tmp then set i_stop_tmp to 1
				set i_start_tmp to i_start --+ 1
			end if
		end if
		
		-- make sure we don't get a reversed string back
		if i_start_tmp < i_stop_tmp then
			set i_start to i_start_tmp
			set i_stop to i_stop_tmp
		else
			set i_start to i_stop_tmp
			set i_stop to i_start_tmp
		end if
		return substrid(s_text, i_start, i_stop)
	on error msg number n
		return msg & " (" & n & ") original string:[" & s_text & "]:"
	end try
end PHP_substr
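For comparison, the offset rules that PHP 8.0 and later use for substr() can be written down compactly: a start past the end gives "", a negative start counts from the end and is clamped to the beginning, a negative length stops that many characters before the end, and PHP 8 returns "" where older versions returned false. The sketch below follows those rules; it is not the method above and was not run against the 2009 comparison. In particular it clamps a start such as -30 on a 26-character string to 0, where PHP_substr() above returns "". php8_substr is a hypothetical name, and passing missing value as i_length stands in for PHP's omitted (null) length.

```applescript
-- Sketch of PHP 8's substr() offset rules (0-based start; "" instead of false).
-- Pass missing value as i_length to get the rest of the string.
to php8_substr(s_text, i_start, i_length)
	set n to length of s_text
	-- a start beyond the end yields an empty string
	if i_start > n then return ""
	-- a negative start counts from the end, clamped to the beginning
	if i_start < 0 then
		set i_start to n + i_start
		if i_start < 0 then set i_start to 0
	end if
	if i_length is missing value then
		set i_length to n - i_start
	else if i_length < 0 then
		-- a negative length stops that many characters before the end
		set i_length to n - i_start + i_length
		if i_length < 0 then set i_length to 0
	else if i_length > n - i_start then
		set i_length to n - i_start
	end if
	if i_length = 0 then return ""
	-- AppleScript text ranges are 1-based and inclusive
	return text (i_start + 1) thru (i_start + i_length) of s_text
end php8_substr

php8_substr("abcdef", 1, 3) -- "bcd"
php8_substr("abcdef", 0, -1) -- "abcde"
php8_substr("abcdef", -3, -1) -- "de"
php8_substr("abcdef", -2, missing value) -- "ef"
```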

and here’s what I used to test both methods (TextWrangler.app is required to show the results)


(* testing methods *)

(*
	I tested the PHP_substr() method with the following parameters,
	then checked the results against the output of the attached
	PHP script (last in this file) using TextWrangler's
	Search -> Compare Two Front Documents menu.
	I didn't test with special characters.
 *)

set s to "abcdefghijklmnopqrstuvwxyz"

--performTest("0123456", 10)
--performTest(s, 30)
--performTest("", 5)

to performTest(s, i_spectrum)
	set sLog to ""
	repeat with i from -i_spectrum to i_spectrum
		repeat with j from -i_spectrum to i_spectrum
			log "Ai_start: " & i & " Ai_length: " & j
			set sLog to sLog & "Ai_start: " & i & " Ai_length: " & j & return & PHP_substr(s, i, j) & return
		end repeat
		repeat with j from i_spectrum to -i_spectrum by -1
			log "Bi_start: " & i & " Bi_length: " & j
			set sLog to sLog & "Bi_start: " & i & " Bi_length: " & j & return & PHP_substr(s, i, j) & return
		end repeat
	end repeat
	repeat with i from i_spectrum to -i_spectrum by -1
		repeat with j from -i_spectrum to i_spectrum
			log "Ci_start: " & i & " Ci_length: " & j
			set sLog to sLog & "Ci_start: " & i & " Ci_length: " & j & return & PHP_substr(s, i, j) & return
		end repeat
		repeat with j from i_spectrum to -i_spectrum by -1
			log "Di_start: " & i & " Di_length: " & j
			set sLog to sLog & "Di_start: " & i & " Di_length: " & j & return & PHP_substr(s, i, j) & return
		end repeat
	end repeat
	
	tell application "TextWrangler"
		make new window
		tell window 1
			set selection to sLog
		end tell
	end tell
end performTest

--performTestSubstrID("0123456", 10)
--performTestSubstrID(s, 30)
--performTestSubstrID("", 5)

to performTestSubstrID(s, i_spectrum)
	set sLog to ""
	repeat with i from -i_spectrum to i_spectrum
		repeat with j from -i_spectrum to i_spectrum
			log "Ai_start: " & i & " Ai_length: " & j
			set sLog to sLog & "Ai_start: " & i & " Ai_length: " & j & return & substrid(s, i, j) & return
		end repeat
		repeat with j from i_spectrum to -i_spectrum by -1
			log "Bi_start: " & i & " Bi_length: " & j
			set sLog to sLog & "Bi_start: " & i & " Bi_length: " & j & return & substrid(s, i, j) & return
		end repeat
	end repeat
	repeat with i from i_spectrum to -i_spectrum by -1
		repeat with j from -i_spectrum to i_spectrum
			log "Ci_start: " & i & " Ci_length: " & j
			set sLog to sLog & "Ci_start: " & i & " Ci_length: " & j & return & substrid(s, i, j) & return
		end repeat
		repeat with j from i_spectrum to -i_spectrum by -1
			log "Di_start: " & i & " Di_length: " & j
			set sLog to sLog & "Di_start: " & i & " Di_length: " & j & return & substrid(s, i, j) & return
		end repeat
	end repeat
	
	tell application "TextWrangler"
		make new window
		tell window 1
			set selection to sLog
		end tell
	end tell
end performTestSubstrID

(*
#!/usr/bin/php
<?php
/* * * * 
 * * substr output testing
 * *
 * * I wanted a PHP substr() equivalent in AppleScript.
 * * To make sure my AS script matches it as closely as
 * * possible, I wrote this script to output the same
 * * data structure.
 * *
 * * the output of the AppleScript performTest() method
 * * was compared against the output of this PHP performTest() function
 * *
 * * there is one difference: php substr() returns false in some cases
 * * the AS PHP_substr() method simply returns an empty string in those cases
 * * I have no intention of changing that, as I feel it is better this
 * * way and trivial in usage
 * *
 * * version 20090521 (CC) Luke JZ aka SwissalpS
 * * * */
$s = 'abcdefghijklmnopqrstuvwxyz';

//performTest('0123456', 10);
//performTest($s, 30);
//performTest('', 5);

function performTest($s, $iSpectrum) {
	for ($i = -$iSpectrum; $i < $iSpectrum + 1; $i++) {
		for ($j = -$iSpectrum; $j < ($iSpectrum + 1); $j++) {
			echo 'Ai_start: ' . $i
					. ' Ai_length: ' . $j . '
' . substr($s, $i, $j) . '
';
		}
		for ($j = $iSpectrum; $j > -($iSpectrum + 1); $j--){
			echo 'Bi_start: ' . $i
					. ' Bi_length: ' . $j . '
' . substr($s, $i, $j) . '
';
		}
	}
	for ($i = $iSpectrum; $i > -($iSpectrum + 1); $i--){
		for ($j = -($iSpectrum); $j < ($iSpectrum + 1); $j++){
			echo 'Ci_start: ' . $i
					. ' Ci_length: ' . $j . '
' . substr($s, $i, $j) . '
';
		}
		for ($j = $iSpectrum; $j > -($iSpectrum + 1); $j--){
			echo 'Di_start: ' . $i
					. ' Di_length: ' . $j . '
' . substr($s, $i, $j) . '
';
		}
	}
}
/*
$res = substr($s, 22,22);
echo $res;
echo '
';
echo gettype($res);
/* * * *\ substrTest.php 20090521 (CC) Luke JZ aka SwissalpS /* * * */
?>
*)
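The test handlers above need TextWrangler only to display sLog. If you would rather compare plain files with a command-line tool such as diff, a minimal sketch follows; write_log is a hypothetical helper built on the Standard Additions file commands, and the file path is only an example. The PHP side could be captured the same way by redirecting the script's output to a file in Terminal.

```applescript
-- Hypothetical helper: write a log string to a UTF-8 file so it can be compared
-- with the PHP output using standard tools instead of TextWrangler.
to write_log(s_log, posix_path)
	set f to open for access (POSIX file posix_path) with write permission
	try
		-- truncate any previous contents before writing
		set eof f to 0
		write s_log to f as «class utf8»
		close access f
	on error msg number n
		close access f
		error msg number n
	end try
end write_log

write_log("Ai_start: 0 Ai_length: 3" & linefeed & "abc" & linefeed, "/tmp/substr_as.txt")
```

The test handlers join lines with return (CR), while the PHP script echoes the line breaks stored in the .php file (usually LF); if the two differ, a line-based diff flags every line, so either build sLog with linefeed or convert one side first. Also note that diff exits with status 1 when the files differ, which do shell script reports as an error.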

EDIT: minor edit in substrid() to fix backwards returns
NOTE: this code has not been tested beyond what is described above, and only on my current setup: OS X 10.5.x; AS 2.0.1
needless to say: use at your own risk…etc.