NSScanner -- what can this class do for me?


Let’s imagine I define correct “fields” in a text string as non-zero integer values made of contiguous digits enclosed in [ and ] delimiters.

In that case correct fields are: [1], [334], [32000]
And these are not: [], [00], [one], [ 3 ], [[4]
And this is normal text: 454, 23]

For the time being, my handler keeps a pointer to the current character and advances it one character at a time. The handler can be in three states: “out field”, “in field” and “accumulating digits”, and accepts four types of “tokens”: “0-9”, “[”, “]”, “ANY”. It can issue errors and warnings.

These are examples of errors:
state=“in field”, accumulator= 23, token=“[” : non-digit in field
state=“in field”, accumulator=0, token=“]” : null value in field

And these are warnings:
state=“out field”, accumulator= 0, token=“]” : unbalanced ]
state=“out field”, accumulator= 0, token=“2” : digit out of field

My first Finite State Automaton was written in FORTRAN, 25 years ago. No doubt there are better ways to do this today with Cocoa, maybe this NSScanner (the name is promising :))


I’m sure that NSScanner can handle this problem. My question is what outcome you want – do you really want a table showing all the errors, or is the goal to keep the user from making incorrect entries? Is this checker something that you will invoke after a user has created a whole document with these [#] entries in it, or do you want to give warnings as the user types to fix things in real time?



No, the checking has to be done for the entire document, because the user can create one text after the other, and when he creates one, the references may not exist yet – in fact, this would be the normal situation. But when he has finished the last text, it’s time to verify the links as a whole (maybe a text is empty, another never “called”, another may contain wrong IDs).

There is a table view containing the errors/warnings. When the user clicks on an error, the editor brings up the text and highlights the offending character – this part works already.

Constant checking (even just to give warnings) would be not only useless, but annoying.

In addition, the application is intended to be used by multiple people simultaneously (as each text is saved in its own file). Later, I will look at those famous “wrappers” to replace existing folders.

The consistency check must be done once, at the very end. That is why I can put up with the slowness of ASOC. But I miss my Pascal CASE, which was ten times faster on a IIsi …

There is also a CASE instruction in C? :rolleyes:
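C does have one: the switch statement with case labels. As a rough sketch only, the three-state automaton from the first post could look like this in plain C (which also compiles inside an Objective-C file). The names scan_fields, ScanResult and report are hypothetical, and two choices are assumptions rather than anything stated in the thread: only the first error inside a field is reported, and a leading zero such as [01] is accepted as long as the value is not zero.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { OUT_FIELD, IN_FIELD, ACCUMULATING } State;

typedef struct {
    int fields;    /* well-formed fields such as [334] */
    int errors;    /* malformed fields such as [] or [one] */
    int warnings;  /* suspicious characters outside any field */
} ScanResult;

/* Print one diagnostic and bump the matching counter. */
static void report(const char *kind, size_t pos, const char *msg, int *counter)
{
    fprintf(stderr, "%s at offset %zu: %s\n", kind, pos, msg);
    (*counter)++;
}

ScanResult scan_fields(const char *text)
{
    ScanResult r = {0, 0, 0};
    State state = OUT_FIELD;
    long acc = 0;   /* value of the digits read so far in the current field */
    int bad = 0;    /* the current field has already raised an error */
    size_t i;

    for (i = 0; text[i] != '\0'; i++) {
        char c = text[i];
        int is_digit = (c >= '0' && c <= '9');

        switch (state) {
        case OUT_FIELD:
            if (c == '[') {
                state = IN_FIELD;
                acc = 0;
                bad = 0;
            } else if (c == ']') {
                report("warning", i, "unbalanced ]", &r.warnings);
            } else if (is_digit) {
                report("warning", i, "digit out of field", &r.warnings);
            }
            break;

        case IN_FIELD:       /* "[" seen, no digit yet */
        case ACCUMULATING:   /* at least one digit seen */
            if (is_digit) {
                int d = c - '0';
                /* Saturate instead of overflowing on very long digit runs. */
                acc = (acc > (LONG_MAX - d) / 10) ? LONG_MAX : acc * 10 + d;
                state = ACCUMULATING;
            } else if (c == ']') {
                if (!bad) {
                    if (acc == 0)
                        report("error", i, "null value in field", &r.errors);
                    else
                        r.fields++;   /* acc holds the ID at this point */
                }
                state = OUT_FIELD;
            } else {
                if (!bad)
                    report("error", i, "non-digit in field", &r.errors);
                bad = 1;
            }
            break;
        }
    }
    if (state != OUT_FIELD && !bad)
        report("error", i, "field not closed", &r.errors);
    return r;
}
```

Calling scan_fields("[1], [334], [32000]") counts three fields and nothing else, while scan_fields("454, 23]") only produces warnings, one per stray digit plus one for the unbalanced bracket.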

I’m not sure what you mean by “the references may not exist yet” – it seems like the checking that you want to do is just checking for the correct form, [##], with nothing else inside, so do the references have to exist for you to check the form?

I’m not sure I agree with that – this is the default behavior of NSTextView. If you type something wrong you get immediate feedback with the red underlining. This is pretty standard behavior in word processors these days.

That being said, the NSScanner would do fine at the end of the process, since when you create a scanner, you have to pass it a string (which could be the whole contents of a text view). So, when do you do the checking now? Right before saving?



When a user types a misspelled word, the reference already exists, so it can be useful. But when the user types “from here you can go to the castle [96]”, maybe he knows that the text/card ID 96 is not yet created…

The verification time is defined in the preferences. It can be manual (menu command or button) or automatic (before closing the file).

Here is one way to check your links for errors using an NSScanner and NSString’s rangeOfCharacterFromSet. I broke it up into 3 methods for clarity – the first uses a scanner to find the links and check for 2 kinds of errors, the second looks for any non-digit characters and the third checks for zeros. I created an array called theErrors that holds the error info and is used to populate a table. I tested this with a file of 60,000 characters with 710 bad links and it took ~2 seconds to load the table.

on findLinks_(sender)
	set parser to current application's NSScanner's scannerWithString_(textView's textStorage()'s mutableString())
	parser's setCharactersToBeSkipped_(missing value)
	repeat while parser's isAtEnd() as integer is 0
		parser's scanUpToString_intoString_("[", missing value) -- scans until it finds a "["
		set linkStart to parser's scanLocation()
		parser's scanString_intoString_("[", missing value) -- scans past the "["
		set foundLink to item 2 of parser's scanUpToString_intoString_("]", reference) -- gets the characters between the "[" and the "]"
		parser's scanString_intoString_("]", missing value) -- scans past the "]"
		if parser's scanString_intoString_("]", missing value) as integer is 1 then -- looks for a second "]"
			theErrors's addObject_({link:foundLink, errorText:"Ends with two end brackets", loc:linkStart})
		else
			if foundLink is missing value then
				theErrors's addObject_({link:foundLink, errorText:"Empty link", loc:linkStart})
			else
				checkForNonDigits(foundLink, linkStart)
			end if
		end if
	end repeat
	theErrors's removeLastObject() -- presumably drops the spurious entry recorded when the scan runs off the end of the text
	log theErrors's |count|()
end findLinks_

on checkForNonDigits(foundLink, loc)
	if foundLink's characterAtIndex_(0) as integer is 32 then -- checks for a "space" at the beginning of the link
		theErrors's addObject_({link:foundLink, errorText:"Starts with a space", loc:loc})
	else if foundLink's rangeOfCharacterFromSet_(nonDigits)'s |length| is not 0 then
		theErrors's addObject_({link:foundLink, errorText:"Contains non-digit characters", loc:loc})
	else
		checkForZeros(foundLink, loc)
	end if
end checkForNonDigits

on checkForZeros(foundLink, loc)
	if foundLink as integer is 0 then
		theErrors's addObject_({link:foundLink, errorText:"Is zero or multiple zeros", loc:loc})
	end if
end checkForZeros

In addition to this code, I defined 3 properties (theErrors, nonDigits and linkStart) and added two lines to the applicationWillFinishLaunching method to define the nonDigits set and create theErrors:

	set theErrors to current application's NSMutableArray's alloc()'s init()
	set nonDigits to current application's NSCharacterSet's decimalDigitCharacterSet()'s invertedSet()

Of course there are multiple ways to do this, so this isn’t necessarily optimized.



I’m amazed – this is a very complete solution, even more accurate than mine!

I was looking on my side for a less accurate solution and tried this:

(all names beginning with “g” are properties and/or IBOutlets)

    on scan_(sender)
        set arr to {}
        set theString to current application's NSString's stringWithString_(gText as string)
        set sep to current application's NSCharacterSet's characterSetWithCharactersInString_("[]")
        set arr to theString's componentsSeparatedByCharactersInSet_(sep)
        repeat with substring in arr
            try
                set anInt to substring as integer
                log "integer substring is " & anInt -- only "IDs" are possible here. Check if they exist, etc.
            end try -- errors ignored
        end repeat
    end scan_

I formerly used the trick of starting at the first “[”; of course it speeds up the checking, but it skips any digit before the first potential field.

I’m going to try your solution. As mine works too, there could be a Verification option in the Preferences: “Complete” (your solution) or “Only Fields Contents”(mine)


Hello Ric,

I have now two half-way solutions, one using the “trick” slicing the text in “words” separated with brackets, the other using the NSScanner class developed by you (once again thanks, it’s perfectly working).

Both are extremely fast (instantaneous in fact), compared to the primitive AS equivalent of my previous Pascal CASE.

In order to satisfy my curiosity, could you help me to develop a pure Objective-C solution or is it out of the scope of this site?


Sure, I could do that if you want to see the objC equivalent.


The AppleScriptObjC code was like this:

                set inField to false
                set theID to 0
                set numberOfFields to 0
                set thePos to 1
                repeat while thePos ≤ length of theString
                    set theChar to character thePos of theString
                    if theChar is "[" then
                        if inField then
                            storeError(1,theCard's cID,thePos,"Unbalanced «[»")
                        else
                            set inField to true
                            set theID to 0
                        end if
                    else if theChar is "]" then
                        if not inField then
                            storeError(1,theCard's cID,thePos,"Unbalanced «]»")
                        else if theID > 0 then
                            set idCalled to idCalled & theID
                            set numberOfFields to numberOfFields + 1
                            if (theID as integer is theCard's cID) then
                                storeError(0,theCard's cID,thePos,"This card refers to itself")
                            end if
                            if not (existingIDs contains theID) then
                                storeError(3,theCard's cID,thePos,"The card [" & theID & "] does not exist")
                            end if
                        else if theID is 0 then
                            storeError(2,theCard's cID,thePos,"Empty field or null ID")
                        end if
                        set inField to false
                        set theID to 0
                    else if theChar is in digits then
                        if inField then
                            set theID to theID * 10 + (theChar as integer)
                        else
                            storeError(0,theCard's cID,thePos,"Digit character out of field")
                        end if
                    else
                        if inField then storeError(2,theCard's cID,thePos,"Non-digit character in field")
                    end if
                    set thePos to thePos + 1
                end repeat

For info, the former Pascal instructions were:

I’m not sure such a code fragment can help, but I give it anyway.


Why are you posting this code? Is this what you want me to convert to objective-C?


If you don’t mind, it would be very didactic for me to start from something I can understand— but of course if the Objective-C syntax allows more efficient coding, I’ll try to adapt.

I don’t want to try your patience anyway, the problem of calling such a code from ASOC is another challenge for me.

I posted this code because it is the complete (and working) solution for a char-by-char syntax verification.


Objective-C coding is still somewhat difficult for me, but not too hard if I’m working from my own ASOC code that I understand well. I don’t really want to try it with code that I don’t really understand, and that I don’t think is the right way to do it anyway (I’m assuming that this is the code that was so slow in ASOC). NSScanners seem like the logical way to go to me.



This is a simple approach in ObjC with NSScanner.
The workflow is
• scan (up to) a left bracket
• if no left bracket return error
• scan integer
• if no integer value found or integer value is 0 return error
• scan right bracket
• if no right bracket return error

return values: if errorString is nil, the returned NSInteger is valid

NSString *searchString = @"sdkfjghsdkjfh[3456]weorituy";
NSString *error = nil;
NSInteger number = [self scanIntegerInBrackets:searchString errorString:&error];
if (error)
	NSLog(@"error: %@", error);
else
	NSLog(@"success %ld", (long)number);

- (NSInteger)scanIntegerInBrackets:(NSString *)string errorString:(NSString **)error
{
	*error = nil;
	NSInteger result = NSNotFound;
	NSScanner *scanner = [NSScanner scannerWithString:string];
	[scanner setCharactersToBeSkipped:nil];
	// scanUpToString: returns NO when the text starts with "[" and YES when no "[" exists,
	// so skip the leading text first and test for the bracket itself.
	[scanner scanUpToString:@"[" intoString:NULL];
	BOOL leftBracket = [scanner scanString:@"[" intoString:NULL];
	if (leftBracket) {
		BOOL integer = [scanner scanInteger:&result];
		if (integer && result > 0) {
			BOOL rightBracket = [scanner scanString:@"]" intoString:NULL];
			if (!rightBracket)
				*error = @"no right bracket";
		} else {
			*error = @"no integer found";
		}
	} else {
		*error = @"no left bracket";
	}
	return result;
}

Thank you Stefan – this is the Objective-C use of NSScanner. As both ASOC and ObjC rely on this class, this is the obvious solution.

I’ll use this class then, after all there is no reason to reinvent the wheel!

But I have to adapt your solutions to make them more “char-by-char”, with no skipping at all. The problem is to issue warnings in case of a mistyped field. As brackets are not evident characters for my users, but also have to be specific to form a correct field, I have to detect every digit typed outside brackets and issue a warning.

So : [78] is a correct field, but I have to check if 78 is a valid ID, and issue an error if not.

And : (78) or {78} are possibly mistyped fields. I must issue a warning here, but only a warning, because something like Thomas Edison (1847 – 1931) is normal text but will raise a warning too. There will be no further ID checking.

Conversely, Thomas Edison [1847 – 1931] will give an error, because [ ] are reserved characters and this gives an incorrect field.

Thank you for your help, it’s really kind of you!



I think you need to clearly define the scope of what you want if you want help. The overall approach really depends on what exactly you need to do. In your first post on this thread, you gave examples of what kind of errors you wanted to detect, and the code I posted did just that. But now, you are adding new things you need to do, and maybe that requires a whole different approach. So, what does this mean:

What do you mean that the “brackets are not evident characters for my users” – are the brackets not visible?

Also what does “[78] is a correct field, but I have to check if 78 is a valid ID” mean? What do you mean by a valid ID? Are certain numbers valid Id’s and some not? Do you have constraints on what the numbers can be? Can they be any number of digits or is there a max number?

I still think that going character by character is going to be slow and inefficient, and that using scanners, at least to find all the integers, will be the way to go.



The warnings were mentioned from the very beginning of my post:

Consider a compiler:
the variable myTitle may be a correct name for a variable, but what if it is never defined?
I can write a correct handler, but it can be “dead code” if it’s never called.
I can wait for my variable myTitle to be updated, but what if I have a second variable called MyTitle?

Such errors are not evident to detect.

I think I have a reasonably clear idea of what I want – maybe it’s my English which is confused :confused:

If my user types (56) instead of [56], I have to warn him, but it’s not necessarily an error;
If my user types [56] and the ID 56 does not exist, it is an error;

So in addition to the strict syntax, I have to make four more tests:

  • does the ID exist in the list of the defined IDs? if no, ERROR.
  • does the card have at least one ID? if no, it’s a DEAD-END, because there is no way to exit from it.
  • does the card have an ID which is the same as its own ID? if yes it’s a RECURSION.
  • does the card have an ID which is never referred to by another card? if yes, it’s a DEAD CARD.
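To make these four tests concrete, here is a rough sketch in plain C. It assumes the syntax pass has already collected, for each card, the IDs found in its well-formed fields; the names Card, LinkReport and check_links are hypothetical and not taken from the application. Note that a starting card would be counted as a DEAD CARD by the literal rule, so a real checker would presumably exempt it.

```c
#include <stddef.h>

typedef struct {
    long id;             /* this card's own ID */
    const long *links;   /* IDs found in the well-formed [n] fields of this card */
    size_t link_count;
} Card;

typedef struct {
    int missing;     /* links to an ID that no card defines -> ERROR */
    int dead_ends;   /* cards with no link at all           -> DEAD-END */
    int recursions;  /* cards that link to themselves       -> RECURSION */
    int dead_cards;  /* cards no other card links to        -> DEAD CARD */
} LinkReport;

/* Does any card carry this ID? */
static int id_exists(const Card *cards, size_t n, long id)
{
    for (size_t i = 0; i < n; i++)
        if (cards[i].id == id)
            return 1;
    return 0;
}

/* Is cards[self] the target of a link in some other card? */
static int is_referenced(const Card *cards, size_t n, size_t self)
{
    for (size_t j = 0; j < n; j++) {
        if (j == self)
            continue;
        for (size_t k = 0; k < cards[j].link_count; k++)
            if (cards[j].links[k] == cards[self].id)
                return 1;
    }
    return 0;
}

LinkReport check_links(const Card *cards, size_t n)
{
    LinkReport r = {0, 0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        const Card *c = &cards[i];
        int self_link = 0;

        if (c->link_count == 0)
            r.dead_ends++;

        for (size_t k = 0; k < c->link_count; k++) {
            if (c->links[k] == c->id)
                self_link = 1;
            else if (!id_exists(cards, n, c->links[k]))
                r.missing++;
        }
        if (self_link)
            r.recursions++;
        if (!is_referenced(cards, n, i))
            r.dead_cards++;
    }
    return r;
}
```

The nested loops are quadratic in the number of cards, which should not matter for a check that runs once at the end; a lookup table of existing IDs would be the obvious refinement for very large sets.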

I am sorry not to be clearer :expressionless:


These were in the post, but I had no idea what they meant. I think the main problem is not your English (but I think that is a little problem), but that you have a very clear idea of how the whole app works and what it does, and I don’t – I have been working on little pieces, but still don’t have a good idea of the whole. I don’t think it’s possible for you to explain it to me, I think I would have to see the whole app and try it out.

In the code I posted above, the last method, checkForZeros, could have an “else” in it which would be the place where you could check for the first 3 of the tests you mentioned:

on checkForZeros(foundLink, loc)
	if foundLink as integer is 0 then
		theErrors's addObject_({link:foundLink, errorText:"Is zero or multiple zeros", loc:loc})
	else
		-- we have a number inside brackets with good syntax
		-- check to see if that number is a defined ID
		-- Make sure at least one link makes it this far
		-- Make sure this ID is not the ID of this card
	end if
end checkForZeros

Anything that makes it through my 3 methods and gets to the “else” has the right syntax, so you could check here for those 3 tests. I don’t know how you would do the last test.

The other checking, that is looking for numbers that may be inside the wrong kind of brackets or not inside brackets at all would require another set of methods. If the numbers are not inside any brackets, then it seems like you have to read the user’s mind to figure out whether it should have been, or whether it’s just a number in the text. I guess you could flag them as possible errors, and have the user check them to see if he meant them to be links or not.
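As one possible way to flag such candidates, the sketch below (plain C, hypothetical function name find_suspect_numbers) reports only digit runs wrapped directly in ( ) or { }, such as (78) or {78}. That is a deliberately narrower heuristic than warning on every digit outside brackets, and it is an assumption about what would be useful rather than a rule given in this thread.

```c
#include <stddef.h>

/* Counts digit runs enclosed directly by (...) or {...}.
   The offset of each opening delimiter is stored in positions[]
   as long as there is room (max_positions entries). */
size_t find_suspect_numbers(const char *text, size_t *positions, size_t max_positions)
{
    size_t found = 0;

    for (size_t i = 0; text[i] != '\0'; i++) {
        char close;

        if (text[i] == '(')
            close = ')';
        else if (text[i] == '{')
            close = '}';
        else
            continue;

        /* Skip over the digits that follow the opening delimiter. */
        size_t j = i + 1;
        while (text[j] >= '0' && text[j] <= '9')
            j++;

        /* At least one digit, immediately followed by the matching close. */
        if (j > i + 1 && text[j] == close) {
            if (found < max_positions)
                positions[found] = i;
            found++;
            i = j;   /* resume after the closing delimiter */
        }
    }
    return found;
}
```

With this rule, "go to the castle (96)" is flagged, while "Edison (1847 - 1931)" and a genuine field such as [96] are left alone; the user can then decide whether each flagged number was meant to be a link.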


Ok Ric, thank you, I’ll make one definitive handler for all the possible tests, using NSScanner.

I admit it is difficult to present – and worse, understand – an application with little code fragments at a time, but even so you have helped me to make real progress with ASOC!

The simplest would of course be to see the application running as a whole. I don’t know how to send it, but it’s ok for me.

It’s getting late here, I’ll continue tomorrow.