Checking a Table for Valid Characters

Suppose you have a table of characters like this:

set tCode to paragraphs of "+++++++
[>+++++++++<-]
>.<+++++
[>++++++<-]
>-.+++++++..+++.
<++++++++
[>>++++<<-]
>>.<<++++
[>------<-]
>.<++++
[>++++++<-]
>.+++.------.
--------.>+."

and want to check that it contains only characters from an approved list:

set codeChars to {">", "<", "+", "-", ".", ",", "[", "]"}

The easy way to do that is to use the allowable code characters list as an AppleScript text item delimiter:

-- Put the program in list form:
set tCode to paragraphs of "+++++++
[>+++++++++<-]
>.<+++++
[>++++++<-]
>-.+++++++..+++.
<++++++++
[>>++++<<-]
>>.<<++++
[>------<-]
>.<++++
[>++++++<-]
>.+++.------.
--------.>+."

set codeChars to {">", "<", "+", "-", ".", ",", "[", "]"}

-- Check that every character in the code is in codeChars.
-- Use the list of code characters as the delimiter. Rather than
-- concatenating all the lines and checking that (which works), it is
-- better to loop so the offending line can be identified.
repeat with k from 1 to count tCode
	set codeItem to item k of tCode
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to codeChars
	set tParts to text items of codeItem -- should all be ""
	set AppleScript's text item delimiters to tid
	-- "tParts as text" reduces to "" if all code text is in codeChars
	if (count characters of (tParts as text)) ≠ 0 then
		display alert "Line " & k & " contains an illegal code character!" as warning
		return
	end if
end repeat

Thanks Adam.

It may not really be an issue anymore, but older versions of AppleScript don’t support multiple text item delimiters; only the first delimiter in the list is used. Since your example is a bash-like table (delimited by line feeds), you could go for a bash solution as well, which is faster with huge lists.

set tCode to "+++++++
[>+++++++++<-]
>.<+++++
[>++++++<-]
>-.+++a++++..+++.
<++++++++
[>>++++<<-]
>>.<<++++
[>------<-]
>.<++++
[>++++++<-]
>.+++.------.
--------.>+."

set codeChars to {">", "<", "+", "-", ".", ",", "[", "]"}
set x to do shell script "tr -d " & quoted form of (codeChars as string) & " <<< " & quoted form of tCode & " | awk 'NF {print NR;exit}'"

if x is not "" then
	display alert "Line " & x & " contains an illegal code character!" as warning
	return
end if
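
For the older-AppleScript case mentioned above, where only a single text item delimiter is honoured, one possible fallback is to strip the allowed characters one at a time. This is only a sketch: stripChars is a hypothetical helper name that does not appear in the thread, and it has not been tried on an old AppleScript version.

```applescript
-- Sketch: remove each allowed character in turn, using one delimiter at a time.
-- stripChars is a hypothetical helper name, not something from this thread.
on stripChars(theText, charList)
	set tid to AppleScript's text item delimiters
	repeat with c in charList
		-- Split on a single character...
		set AppleScript's text item delimiters to (contents of c)
		set parts to text items of theText
		-- ...and rejoin with nothing in between, which deletes that character.
		set AppleScript's text item delimiters to ""
		set theText to parts as text
	end repeat
	set AppleScript's text item delimiters to tid
	return theText
end stripChars

set codeChars to {">", "<", "+", "-", ".", ",", "[", "]"}
set leftover to stripChars(">-.+++a++++..+++.", codeChars)
-- leftover holds whatever characters are not in codeChars;
-- an empty string means the line is clean.
```

Calling it once per paragraph would keep the line number available, at the cost of one split and join per allowed character.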

But when tCode becomes large, I mean really large, Adam’s script will become slow. Assuming it is important to report the line on which the error occurs, and that the input is still a bash-style table, the following code needs neither an AppleScript repeat loop nor bash.

-- Keep the program as one block of text (not a list of paragraphs):
set tCode to "+++++++
[>+++++++++<-]
>.<+++++
[>++++++<-]
>-.+++++++..+++.
<++++++++
[>>++++<<-]
>>.<<++++
[>------<-]
>.<++++
[>++++++<-]
>.+++.------.
--------.>+."

set codeChars to {">", "<", "+", "-", ".", ",", "[", "]"}

set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to codeChars
set tParts to text items of tCode
set AppleScript's text item delimiters to tid

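-- (tParts as string) now holds only the line feeds plus any illegal characters.
-- Taking its paragraphs and rejoining them drops the line feeds.
set remains to paragraphs of (tParts as string) as string
if length of remains is not 0 then
	-- Each earlier line contributes one line feed, so the offset of the
	-- first illegal character is the number of the line it sits on.
	set x to offset of (character 1 of remains) in (tParts as string)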
	display alert "Line " & x & " contains an illegal code character!" as warning
	return
end if

Of course, the code above only works with line feed delimited data, as in bash tables.

Thanks, DJBW. But – your last solution will only find the first bad character, won’t it, so after fixing that, you’d have to run it again (like fsck)?

Hello.

I amused myself with this problem, then realized when I re-read this post that I had amused myself with the wrong problem. :)

If you want to know the position of every illegal character, then I suggest you use your text item delimiters. Go paragraph by paragraph through your input so you can keep track of the lines, skip the items that contain nothing (valid), and store the positions of the items that do contain something. This is elaborate, so I won’t program it here, but then after one run through your file you can at least say which lines contain which errors.

I am not sure if this is such a bad solution.
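
A minimal sketch of that one-pass idea, assuming an AppleScript version that accepts a list of text item delimiters. The short sample program and the badLines name are made up for illustration; each entry pairs a line number with the illegal characters left over on that line.

```applescript
-- Sketch: one pass over the lines, recording every line that has illegal text.
-- The sample program and the name badLines are illustrative only.
set tCode to paragraphs of "+++++++
>.<++@+++
[>+x+<-]y"
set codeChars to {">", "<", "+", "-", ".", ",", "[", "]"}

set tid to AppleScript's text item delimiters
set badLines to {}
repeat with k from 1 to count tCode
	-- Splitting on the allowed characters leaves only illegal text in the items.
	set AppleScript's text item delimiters to codeChars
	set tParts to text items of item k of tCode
	-- Rejoin with an empty delimiter to collect the illegal characters.
	set AppleScript's text item delimiters to ""
	set leftover to tParts as text
	if leftover is not "" then set end of badLines to {k, leftover}
end repeat
set AppleScript's text item delimiters to tid
return badLines
```

This reports every bad line in a single run, so nothing has to be rerun after each fix.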

The wrong problem I solved, by the way, was to exclude search strings that contain other search strings:

I do that by applying the delimiters to the delimiter list itself, coerced to text. The indices of the items that then still contain text can be removed from the list, which saves time if there are many text item delimiters.

The thing with text item delimiters, if some of them contain substrings of others, is to sort them by length, descending. I am pretty sure that AppleScript uses a one-pass string tokenizer internally, so the longest item can’t be cut in two by one of its substrings when the longest is removed first.
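
Sorting a delimiter list by length, longest first, could be sketched as below. The handler name sortByLengthDescending and the sample delimiters are hypothetical, and whether AppleScript actually needs this ordering is the assumption stated above, not something verified here.

```applescript
-- Sketch: insertion sort of a list of strings by length, longest first.
-- sortByLengthDescending is a hypothetical helper name.
on sortByLengthDescending(theList)
	set sorted to {}
	repeat with s in theList
		set str to contents of s
		set placed to false
		repeat with i from 1 to count sorted
			if (length of str) > (length of item i of sorted) then
				-- Insert str just before the first shorter item.
				if i is 1 then
					set sorted to {str} & sorted
				else
					set sorted to (items 1 thru (i - 1) of sorted) & {str} & (items i thru -1 of sorted)
				end if
				set placed to true
				exit repeat
			end if
		end repeat
		-- Nothing shorter was found, so str goes at the end.
		if not placed then set end of sorted to str
	end repeat
	return sorted
end sortByLengthDescending

set myDelims to sortByLengthDescending({"+", "<<", "<", ">>>"})
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to myDelims
set parts to text items of "a<<b<c>>>d"
set AppleScript's text item delimiters to tid
return parts
```

Items of equal length keep their original relative order, which does not matter for this purpose.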

set AppleScript's text item delimiters to ""
-- Put the program in list form:
set tCode to paragraphs of "+++++++
[>+++++++++<-]
>.<++@+++
[>++++++<-]
>-.+++++++..+++.
<++++++++
[>>++++<<-]
>>.<<++++
[>------<-]
>.<++++
[>++++++<-]
>.+++.------.
----.>+.----"

set errL to {}
set codeChars to {">", "<", "+", "-", ".", ",", "[", "]"}
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to codeChars
set tCodeText to every text item of (tCode as text)
set AppleScript's text item delimiters to tid
try
	set probe to first character of (tCodeText as text)
on error
	return {} # no further testing necessary!
end try
set itC to count items of tCode
repeat with k from 1 to itC
	set AppleScript's text item delimiters to codeChars
	set material to text items of item k of tCode
	set AppleScript's text item delimiters to tid
	-- Every non-empty text item is a run of one or more illegal characters.
	repeat with i from 1 to count material
		if item i of material is not "" then
			-- i is the text item index; it equals the character position
			-- for the first illegal run on the line.
			set end of errL to {k, i}
			log item i of material
		end if
	end repeat
end repeat
return errL

That depends, Adam. If you want to check every (unique) illegal character, you can iterate through remains and prompt the user for what to do, like ‘replace it’, ‘remove it’ and ‘quit’. To avoid unnecessary computation I wouldn’t repeat the process. On the other hand, the code is fast enough to rerun between prompts, and that is easier to write :).

Hello.

I haven’t bothered to find out whether I could get character positions back with diff, but I could use the illegal characters of a line as text item delimiters for a copy of the same line (paragraph), then write the two versions to files and process them with diff.

But I like the one I posted above better. :) I am thinking I might speed it up further by using references.