Searching for a partial string inside a list

Hey folks,
I’ve been racking my brain trying to figure this out, searching high and low here on Macscripter and through the Language Guide, etc.: Is it possible to remove items from a list (of about 200 items) using only a partial string match?

So for example I’ve got a large list of different packages from an Android device; a list that looks like this:

{com.htc.blah.blah, com.google.googledevice, com.android.bliggyblah, com.random.companyapp}

I want to remove certain items from that list but keep others; anything containing “com.htc.” or “com.google.” needs to go, and anything without it should stay. I can’t do exact string matches since there’s a practically endless number of possibilities that any of the “com.whatever.” prefixes could be attached to. Here’s what I’ve tried so far as a proof of concept:

	set itemsToDelete to {"com.htc."}
	set cleanList to {}
	repeat with i from 1 to count packageList_
		if {packageList_'s item i} does not contain itemsToDelete then set cleanList's end to packageList_'s item i
	end repeat

packageList_ contains the list of 200 items
cleanList is where I’m trying to send all the good stuff so I can use it later
itemsToDelete - fairly self-explanatory. Ideally this will hold multiple strings to search for and remove.

As you can see above, I have “com.htc.” in there, hoping it will remove any items in the list that contain that string. But this does not work: the partial string never matches, so nothing is filtered out. If I replace “com.htc.” with an exact entry from packageList_ (like “com.google.android.talk”) it seems to work, but only for that single item. If I add a second item to itemsToDelete, it seems to be skipped.

Currently I’m stumped, since by all my testing this works:

set myTestList to {1, 2, 3, 4, 5, 6, 7, 8}

set itemsToDelete to {4, 5, 6}
set goodList to {}

repeat with i from 1 to count myTestList
	if {myTestList's item i} is not in itemsToDelete then set goodList's end to myTestList's item i
end repeat

goodList

The Language Guide seems to indicate that you can use / but that doesn’t seem to work either, and the only other reliable source I was able to find (http://www.acm.uiuc.edu/iCal/workshops/applescript/1999/introduction/conditions.html) indicates that it’s not possible to get a partial match inside a list. But that’s from ages ago, and it doesn’t give any alternatives.

Is it possible to do this? Am I missing something stupid?

Browser: Safari 535.11
Operating System: Mac OS X (10.7)

Hi,

In vanilla AppleScript you can filter list items by partial strings only with a repeat loop.
Here is a solution that uses the shell instead.
It coerces the list to text with one item per line and filters the lines with grep.
At the end, the paragraphs of the result are turned back into a list.


set itemsToKeep to {"com.htc", "com.google"}
set myTestList to {"com.htc.blah.blah", "com.google.googledevice", "com.android.bliggyblah", "com.random.companyapp"}

set TID to text item delimiters
set text item delimiters to "\\|"
set grepItems to itemsToKeep as text
set text item delimiters to linefeed
set myTestText to myTestList as text
set goodText to do shell script "echo  " & quoted form of myTestText & " | grep " & quoted form of grepItems
set goodList to paragraphs of goodText -- split the grep output back into a list, one item per line
set text item delimiters to TID
goodList
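
For comparison, here is a minimal sketch of the plain repeat-loop route mentioned above. The variable names (unwantedParts, packageList, keepIt) are made up for illustration. The point is that "contains" between two strings is a substring test, whereas "contains" between two lists only matches whole items, which is why wrapping the item in braces never finds a partial match.

```applescript
-- Sample data for illustration only; substitute the real 200-item list
set unwantedParts to {"com.htc.", "com.google."}
set packageList to {"com.htc.blah.blah", "com.google.googledevice", "com.android.bliggyblah", "com.random.companyapp"}
set cleanList to {}

repeat with i from 1 to (count packageList)
	set thisPackage to item i of packageList
	set keepIt to true
	repeat with j from 1 to (count unwantedParts)
		-- String "contains" string checks for a substring;
		-- {thisPackage} contains {"com.htc."} would compare whole list items instead
		if thisPackage contains (item j of unwantedParts) then
			set keepIt to false
			exit repeat
		end if
	end repeat
	if keepIt then set end of cleanList to thisPackage
end repeat

cleanList
--> {"com.android.bliggyblah", "com.random.companyapp"}
```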


Stefan,

This keeps any item containing com.htc or com.google. Diffusion9 wanted everything else but…

Changing the grep to grep -v works!

set itemsToDelete to {"com.htc", "com.google"}
set myTestList to {"com.htc.blah.blah", "com.google.googledevice", "com.android.bliggyblah", "com.random.companyapp", "com.dot.company", "or_something.completely-different!!"}

set TID to text item delimiters
set text item delimiters to "\\|"
set grepItems to itemsToDelete as text
set text item delimiters to linefeed
set myTestText to myTestList as text
set goodText to do shell script "echo  " & quoted form of myTestText & " | grep -v " & quoted form of grepItems
set goodList to paragraphs of goodText -- split the grep output back into a list, one item per line
set text item delimiters to TID
goodList
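
One caveat with the shell route: grep exits with status 1 when it selects no lines, and do shell script reports any non-zero exit status as an error. So if every item gets filtered out, the script above would stop with an error instead of returning an empty list. A hedged sketch of one way to guard against that follows. The handler name removeMatching is made up, and it swaps in grep -E with a plain "|" separator rather than the "\\|" form used above, so the patterns are treated as extended regular expressions (hence the escaped dots in the sample call).

```applescript
-- removeMatching is a hypothetical helper, not part of the scripts above
on removeMatching(theList, patterns)
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "|"
	set grepPattern to patterns as text -- e.g. com\.htc\.|com\.google\.
	set AppleScript's text item delimiters to linefeed
	set listText to theList as text -- one item per line
	set AppleScript's text item delimiters to oldTIDs
	try
		set keptText to do shell script "echo " & quoted form of listText & " | grep -v -E " & quoted form of grepPattern
	on error errMsg number errNum
		-- Exit status 1 means grep output no lines, i.e. everything was removed
		if errNum is 1 then return {}
		error errMsg number errNum -- anything else is a real failure; pass it on
	end try
	return paragraphs of keptText
end removeMatching

removeMatching({"com.htc.blah.blah", "com.google.googledevice", "com.android.bliggyblah"}, {"com\\.htc\\.", "com\\.google\\."})
```

Restoring the delimiters before calling the shell keeps them from being left in a changed state if the command fails.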

Hi there!
What would your code look like? Hard to help without seeing it.

Here’s a trick I learned from one of Nigel Garvey’s posts:

set the_list to {1, 2, 3, 4, 5}
tell the_list
	set item 2 to missing value
	set item 4 to missing value
end tell
integers of the_list

I think it’s mentioned in the AppleScriptLanguageGuide.pdf also.

gl,
kel

Hello kel1

It seems that you missed a detail.
The code posted by the Original Poster was a sample.
In real life, the list isn’t a list of integers but something like :
{com.htc.blah.blah, com.google.googledevice, com.android.bliggyblah, com.random.companyapp}

So I’m not sure that your fine tip is relevant in this case.

Yvan KOENIG (VALLAURIS, France) Thursday, 18 September 2014 18:30:05

Hi Yvan,

The list of integers was only an example; the OP does use a list of strings. I was lazy and did not want to type out a list of strings, or even copy one. :) Thanks for pointing that out.

gl,
kel

Proof of concept:

set itemsToDelete to {"com.htc", "com.google"}
set myTestList to {"com.htc.blah.blah", "com.google.googledevice", "com.android.bliggyblah", "com.random.companyapp", "com.dot.company", "or_something.completely-different!!"}

repeat with thisItem in myTestList
	if thisItem contains "com.htc" then set contents of thisItem to missing value
end repeat

myTestList
--> {missing value, "com.google.googledevice", "com.android.bliggyblah", "com.random.companyapp", "com.dot.company", "or_something.completely-different!!"}

myTestList's text
-->{"com.google.googledevice", "com.android.bliggyblah", "com.random.companyapp", "com.dot.company", "or_something.completely-different!!"}

(twas this, or picking my nose. AppleScript won :D)

Thanks kel1 and alastor933

I wrote :
So I’m not sure that your fine tip is relevant in this case.

It appears that I was wrong.
Thanks, I learnt something.

Yvan KOENIG (VALLAURIS, France) Thursday, 18 September 2014 19:32:34

Hi,

When Nigel pointed this out in his post, I was surprised. I must have read through the AppleScriptLanguageGuide.pdf at least a thousand times and never got what that brief list manipulation info could do.

Thanks Nigel,
kel

Another implementation. Deletes items from large lists pretty fast as long as the list of items to remove is smaller.

set itemsToDelete to {"com.htc", "com.google"}
set myTestList to {"com.htc.blah.blah", "com.google.googledevice", "com.android.bliggyblah", "com.random.companyapp", "com.dot.company", "com.google.googledevice", "or_something.completely-different!!", "com.google.googledevice"}

repeat with itemToDelete in itemsToDelete
	set myTestList to removeItemsContainingString(myTestList, itemToDelete)
end repeat
return myTestList

on removeItemsContainingString(theList, theText)
	set delimiter to linefeed & "--s0m3Th1n9Not1NL15t--" & linefeed --use a unique delimiter
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delimiter
	set delimitedText to delimiter & (theList as string) & delimiter
	if delimitedText contains theText then
		set AppleScript's text item delimiters to theText
		set theItems to every text item of delimitedText
		set AppleScript's text item delimiters to delimiter
		repeat with iItem in theItems
			if (count of (text items of contents of iItem)) < 3 then
				set contents of iItem to missing value
			else
				set contents of iItem to text items 2 thru -2 of iItem as string
			end if
		end repeat
		set theItems to text items of (strings of theItems as string)
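	else
		-- Nothing to remove: restore the delimiters before returning early
		set AppleScript's text item delimiters to oldTIDs
		return theList
	end if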
	set AppleScript's text item delimiters to oldTIDs
	return theItems
end removeItemsContainingString

Would this be faster (compared with DJ’s proposal)?


set itemsToDelete to {"com.htc", "com.google"}
set myTestList to {"com.htc.blah.blah", "com.google.googledevice", "com.android.bliggyblah", "com.random.companyapp", "com.dot.company", "com.google.googledevice", "or_something.completely-different!!", "com.google.googledevice"}

copy myTestList to myListToProcess
repeat with i from 1 to length of itemsToDelete
	set myNewList to {}
	repeat with j from 1 to length of myListToProcess
		if (item 1 of myListToProcess) does not contain (item i of itemsToDelete) then
			copy item 1 of myListToProcess to end of myNewList
		end if
		set myListToProcess to (rest of myListToProcess)
	end repeat
	set myListToProcess to myNewList
end repeat
return myNewList

Your script can be made faster by putting the list length into a variable, and not getting it with every pass of the loop. Wouldn’t matter with lists of 10 items, but imagine fetching that count 1000 times…

OK, I tried running the very code above in a repeat loop 100000 times.
My version runs in 10.99 sec
Yours in 10.75 sec

So you won :)

That said, I should try it on much longer lists and see whether it’s worth storing the list length in a variable.

Very interesting!

I haven’t thought of the filter form of a list for some while. That’s a very slick way to filter stuff out of a list in-place.


set itemsToDelete to {"com.htc", "com.google"}
set myTestList to {"com.htc.blah.blah", "com.google.googledevice", "com.android.bliggyblah", "com.random.companyapp", "com.dot.company", "or_something.completely-different!!"}

repeat with listItem in myTestList
	repeat with deleteItem in itemsToDelete
		if listItem contains deleteItem then
			set contents of listItem to 0
		end if
	end repeat
end repeat

set myTestList to text of myTestList

Running that with 500 items in the primary list takes about 0.038 seconds in Smile (timing with chrono) and is maybe a little faster in Shane’s Script Geek.

I would normally do something like this with the Satimage.osax.


set myTestList to {"com.htc.blah.blah", "com.google.googledevice", "com.android.bliggyblah", "com.random.companyapp", "com.dot.company", "or_something.completely-different!!"}
set regexStr to ".*com\\.(htc|google).*\\s?"

set myTestList to join myTestList using linefeed
set myTestList to change regexStr into "" in myTestList with regexp without case sensitive
set myTestList to change "\\s+\\Z" into "" in myTestList with regexp without case sensitive
set myTestList to splittext myTestList using linefeed
set myTestList to sortlist myTestList comparison 1 with remove duplicates

On a big list it’s faster than the other method.