Awk case sensitive search

I have been using
set foundList to do shell script "awk -F, '(index($" & theCol & "," & searchChars & ") != 0)' " & filePath
to search, which has always worked, but I now need the ability to ignore case. I have tried many awk options but cannot find the correct combination, so any help would be gratefully received.

Hi. The awk implementation on my Mojave machine doesn’t have an ignore-case option, nor does it allow a regex within the index function. It’s unclear from your question what the input and result look like, but it may be easier to use another awk function, such as match, or a piped grep to filter the result. It’s also possible to spell out specific case alternates in a regex, which may be tedious, depending on how many there are.

do shell script "echo \"Test, TEST, test\" | awk -F '(T|t)est' '{print $2}' " --value returned; example essentially ignores either 'Test' or 'test' but not 'TEST'

do shell script "awk 'BEGIN { print match(\"This is a Test\", \"(T|t)est\") }' " --index returned, ignoring specified case
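If the case alternates are too tedious to type by hand, they can be generated. The following is only a sketch: ci_regex is a made-up helper name, the sample data is invented, and it assumes the search text is plain letters, digits and spaces (regex metacharacters are not escaped).

```shell
#!/bin/sh
# ci_regex TEXT (hypothetical helper)
# Expands TEXT into a regex with an upper/lower bracket pair per letter,
# so "abe" becomes "[Aa][Bb][Ee]".
ci_regex() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            u = toupper(c)
            l = tolower(c)
            # Letters get a bracket pair; anything else is copied as is.
            piece = (u != l) ? "[" u l "]" : c
            out = out piece
        }
        print out
    }'
}

# Usage with invented sample data: the name is in column 2.
sample=$(mktemp)
printf '%s\n' '1,Abe LINCOLN,DC' '2,John Doe,AZ' > "$sample"
awk -F, -v pat="$(ci_regex lincoln)" '$2 ~ pat' "$sample"   # prints the first line only
rm -f "$sample"
```

The generated pattern can be used anywhere awk accepts a regex, including match and the -F separator shown above.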

Apparently gawk, aka GNU awk, has an IGNORECASE variable, which doesn’t seem to exist in any of the other variants. It could be installed using one of the package managers, such as MacPorts. The GNU site has a brief discussion of how it works.
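For what it’s worth, a minimal sketch of the gawk route might look like the following. It assumes gawk is installed and on the PATH, that the file is comma separated, and that the name sits in a known column; search_col_gawk is a made-up name. Note that the search text is treated as a regex here, not as a plain string.

```shell
#!/bin/sh
# search_col_gawk FILE COL TEXT (hypothetical helper)
# Prints the lines of comma-separated FILE whose field COL matches TEXT,
# ignoring case. Requires gawk; plain awk has no IGNORECASE.
search_col_gawk() {
    if ! command -v gawk >/dev/null 2>&1; then
        echo "gawk is not installed" >&2
        return 127
    fi
    gawk -F, -v IGNORECASE=1 -v col="$2" -v needle="$3" '
        # With IGNORECASE set, the regex match below disregards case.
        $col ~ needle
    ' "$1"
}
```

Called as search_col_gawk names.csv 2 lincoln, it would print every line whose second field contains "lincoln" in any mix of case (names.csv being a placeholder file name).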

I do apologise; it’s been a while since I asked a question in a forum and I forgot to include ALL the relevant information. I have written an AppleScriptObjC script where the user enters a name, and multiple files are then searched for a matching name and address to display. Most of the files have only one name to search on, so I have used the grep command, which can ignore case, but one file has multiple names, so I need to specify which field/column to search. I could not achieve this with grep, so I tried awk to specify a field, but awk is case sensitive and, as the names are a mixture of upper and lower case, it misses certain names. Is there another way I can achieve the desired result? Big Sur 11.7.3 Xcode 12.5 Brain Addled!

The OP already has a script and is looking for a shell solution, and I don’t have one. However, I thought I would use his request to practice some ASObjC skills and, FWIW, I’ve included my script below. It searches on the first and, depending on user input, the second word of each paragraph of each text file. I have tested this script on Ventura only.

My script is not going to win any awards for brevity, but it searched 3 text files, each of which contained 1704 paragraphs, in 4 milliseconds.

use framework "Foundation"
use scripting additions

set theFolder to "/Volumes/Store/Test/" -- change to desired folder
set theString to text returned of (display dialog "Enter the person's last name or last name and first name:" default answer "Lincoln Abe")
set theFiles to getFiles(theFolder)
set matchingData to getMatchingData(theFiles, theString)

on getFiles(theFolder)
	set fileManager to current application's NSFileManager's defaultManager()
	set theFolder to current application's |NSURL|'s fileURLWithPath:theFolder
	set folderContents to fileManager's contentsOfDirectoryAtURL:(theFolder) includingPropertiesForKeys:{} options:4 |error|:(missing value)
	set thePredicate to current application's NSPredicate's predicateWithFormat:"pathExtension ==[c] 'txt'"
	set theFiles to (folderContents's filteredArrayUsingPredicate:thePredicate)'s valueForKey:"path"
	return (theFiles's sortedArrayUsingSelector:"localizedStandardCompare:")
end getFiles

on getMatchingData(theFiles, theString)
	set theString to current application's NSString's stringWithString:theString
	set theArray to (theString's componentsSeparatedByString:" ")
	if (theArray's |count|()) = 1 then
		set searchPattern to "(?im)^" & ((theArray's objectAtIndex:0) as text) & ".*$"
	else if (theArray's |count|()) = 2 then
		set searchPattern to "(?im)^" & ((theArray's objectAtIndex:0) as text) & " " & ((theArray's objectAtIndex:1) as text) & ".*$"
	else
		display alert "The entered data could not be processed"
		error number -128
	end if
	set matchingData to current application's NSMutableArray's new()
	repeat with aFile in theFiles
		(matchingData's addObject:aFile)
		set matchingLines to getMatchingLines(aFile, searchPattern)
		if (matchingLines's isEqualToString:"") then set matchingLines to (matchingLines's stringByAppendingString:"** No matching lines were found **")
		set matchingLines to (matchingLines's stringByAppendingString:linefeed)
		(matchingData's addObject:matchingLines)
	end repeat
	return ((matchingData's componentsJoinedByString:linefeed) as text)
end getMatchingData

on getMatchingLines(theFile, searchPattern)
	set theString to current application's NSString's stringWithContentsOfFile:theFile encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	set theDelimiters to (current application's NSCharacterSet's newlineCharacterSet())
	set theArray to (theString's componentsSeparatedByCharactersInSet:theDelimiters)
	set thePredicate to current application's NSPredicate's predicateWithFormat_("(self MATCHES %@)", searchPattern)
	set theData to (theArray's filteredArrayUsingPredicate:thePredicate)
	return (theData's componentsJoinedByString:linefeed)
end getMatchingLines

The following is a test text FWIW:

Doe John 111 Anywhere Street Prescott AZ 83222
Lincoln Abraham The White House Washington DC
Lincoln Abe The White House Washington DC
Peabody Peavine, 1 Forest Drive Prescott AZ


If you can install a utility, consider ack, which can be installed using a package manager such as MacPorts (% port search --exact ack). It is described as a grep replacement and offers an option for case-insensitive search.

From its man page:

       -i, --ignore-case
           Ignore case distinctions in PATTERN.  Overrides --smart-case and
           -I.

       -I, --no-ignore-case
           Turns on case distinctions in PATTERN.  Overrides --smart-case and
           -i.

       -S, --[no]smart-case, --no-smart-case
           Ignore case in the search strings if PATTERN contains no uppercase
           characters. This is similar to "smartcase" in the vim text editor.
           The options overrides -i and -I.

You can also set case sensitivity in ack’s configuration but setting the option on the command line will override it. You can read its man page online.


Thank you to everyone for their input, especially peavine for an amazing script (which I’m still trying to assimilate). I will now try each solution, although gawk is looking favourite!

I’m not sure if this could be an option for you or not, but I thought I would throw the idea out there… if for no other reason than as a new perspective or thought generator.

What if you…

  • Read your file into a variable within your script
  • Converted the text to all upper or lower case
  • Performed your existing awk command with a search string that was also converted to the same case, which makes the search effectively case insensitive

This approach could quickly fall flat if you need the exact case of the content in the document for your final result. Just thought I would share the concept to see if it could assist in your final need.
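A possible variation on the idea above, as a sketch only: awk has tolower and toupper built in, so the conversion can happen inside the awk command and apply to the comparison alone, leaving the printed line in its original case. search_col and the sample data are made up, and a comma-separated file with the name in a known column is assumed.

```shell
#!/bin/sh
# search_col FILE COL TEXT (hypothetical helper)
# Prints every line of comma-separated FILE whose field number COL
# contains TEXT, ignoring case. The line itself is printed untouched.
search_col() {
    awk -F, -v col="$2" -v needle="$3" '
        # Lowercase both sides for the comparison only.
        index(tolower($col), tolower(needle)) != 0
    ' "$1"
}

# Usage with invented sample data: the name is in column 2.
sample=$(mktemp)
printf '%s\n' '1,Abe LINCOLN,Washington DC' \
              '2,John Doe,Prescott AZ' \
              '3,abraham lincoln,Washington DC' > "$sample"
search_col "$sample" 2 "Lincoln"   # prints lines 1 and 3 in their original case
rm -f "$sample"
```

From AppleScript, the same awk program could be passed to do shell script, with quoted form of applied to the search text and the file path.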

Wait. I like this: convert to all upper or all lower case, search, get ranges, then apply the ranges back to the untouched text?

Yes, exactly. This may or may not work for your situation, so I thought I would share the idea to spark thoughts.
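The "get ranges, apply them back" step can be sketched in awk as well. This is only an illustration under assumptions: find_original is a made-up name, the file is comma separated, and the text is plain ASCII so that the lowercased copy has the same length and offsets as the original.

```shell
#!/bin/sh
# find_original FILE COL TEXT (hypothetical helper)
# For each line whose field COL contains TEXT (ignoring case), prints the
# line number and the matched text exactly as it appears in the file.
find_original() {
    awk -F, -v col="$2" -v needle="$3" '
        {
            # Search a lowercased copy of the field...
            pos = index(tolower($col), tolower(needle))
            # ...then use the same offset on the untouched field.
            if (pos > 0) print NR ": " substr($col, pos, length(needle))
        }
    ' "$1"
}
```

Only the first occurrence in each field is reported; finding every occurrence would need a loop over the remainder of the field.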