Removing Pairs from a List

Adam_Bell · July 3, 2005, 9:40pm

Suppose I have a long list of unique pairs, OL, made up of parts that are not unique:
set OL to {"A", 1, "B", 1, "J", 7, "Q", 7, "J", 16}
and I want to remove a pair, RL:
set RL to {"J", 7}
say, to leave FL:
set FL to {"A", 1, "B", 1, "Q", 7, "J", 16}
without removing the second "J" or the second 7.
Is there a neat way to do this? So far all I’ve thought of seems needlessly convoluted.

The test “RL is in OL” is true, but doesn’t give location.
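
As a quick illustration (a minimal sketch using the sample list from the question), the containment test only reports whether the items occur consecutively somewhere in the list. It returns no index, and it does not respect pair boundaries either, so a value followed by the next key also matches:

```applescript
set OL to {"A", 1, "B", 1, "J", 7, "Q", 7, "J", 16}
set RL to {"J", 7}
-- The second test straddles two pairs ("A", 1 and "B", 1) yet still succeeds.
{RL is in OL, {1, "B"} is in OL}
--> {true, true}
```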

John_M · July 3, 2005, 10:02pm

Hi NovaScotian,

Is this way any less convoluted than yours?

set OL to {"A", 1, "B", 1, "J", 7, "Q", 7, "J", 16}
set RL to {"J", 7}
set FL to OL -- default result, so FL is defined even if the pair is not found
repeat with x from 1 to ((count of OL) - 1)
	if (item x of OL is equal to item 1 of RL) and (item (x + 1) of OL is equal to item 2 of RL) then
		if x is equal to 1 then
			set FL to items 3 thru -1 of OL
		else if x is equal to ((count of OL) - 1) then
			set FL to items 1 thru -3 of OL
		else
			set FL to (items 1 thru (x - 1) of OL) & (items (x + 2) thru -1 of OL)
		end if
		exit repeat
	end if
end repeat
return FL

Best wishes

John M

Adam_Bell · July 3, 2005, 10:13pm

Thanks John M. Yours is simpler, but still complex - I’m driven nuts by the vague recollection that I have seen a really unintuitive but crisp way of doing this.

I’ll go with it nonetheless until I find my example (lost among my script snippets with some crazy title) of a really neat way of removing an element from a list and generalize it for two adjacent elements. I’m beginning to think that the whole problem (from which this is an abstraction) would be better off if I made two lists in which item 1 corresponded to item 1, etc.
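
The two-list idea could look something like the following. This is only a sketch: the names theKeys, theValues, newKeys and newValues are made up for illustration, and the data is the sample list split by hand into two parallel lists.

```applescript
-- Hypothetical parallel lists: item i of theKeys belongs with item i of theValues.
set theKeys to {"A", "B", "J", "Q", "J"}
set theValues to {1, 1, 7, 7, 16}
set newKeys to {}
set newValues to {}
repeat with i from 1 to (count theKeys)
	-- Keep every position except the one where both halves of the pair match.
	if not ((item i of theKeys is "J") and (item i of theValues is 7)) then
		set end of newKeys to item i of theKeys
		set end of newValues to item i of theValues
	end if
end repeat
{newKeys, newValues}
--> {{"A", "B", "Q", "J"}, {1, 1, 7, 16}}
```

With this layout a single index identifies a pair, so there is no risk of matching across a pair boundary.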

hhas · July 3, 2005, 10:19pm

Not really. Here’s a simple but naive solution that creates a new list from the original, minus all occurrences of the specified pair:

set OL to {"A", 1, "B", 1, "J", 7, "Q", 7, "J", 16}
set RL to {"J", 7}

set NL to {}
repeat with i from 1 to OL's length by 2
	set pair to OL's items i thru (i + 1)
	if pair is not RL then set {end of NL, end of NL} to pair
end repeat
NL --> {"A", 1, "B", 1, "Q", 7, "J", 16}

Though if you’re dealing with large lists you’ll really want to add the usual speed kludges and/or use a more sophisticated algorithm to get acceptable (or at least non-atrocious) performance.

BTW, if all you need is general-purpose key-value data storage, you might want to check out AppleMods’ Types library, which provides reasonably efficient AssociativeList and Dictionary objects.

HTH

Adam_Bell · July 3, 2005, 11:24pm

This speeds it up for longer lists (mine is an event/date list from the future events on a calendar):

on Remove(OrigList, Removal)
	-- Set up Properties
	script ALL
		property OL : OrigList
		property NL : {}
		property Out : Removal
	end script
	-- Build new list without the removal item
	repeat with i from 1 to (ALL's OL)'s length by 2
		set pair to (ALL's OL)'s items i thru (i + 1)
		if pair is not ALL's Out then set {end of ALL's NL, end of ALL's NL} to pair
	end repeat
	return ALL's NL
end Remove

kel · July 4, 2005, 1:02am

Hi,

Using AppleScript’s text item delimiters is a lot faster, but there is a limit on text items. I forget what the limit is, but you can take the list items in sets, and when the pair is found you’re done. The following will error if more letters are added:

set a to "ABCDEFGHIJKLMNOPQRST"
set OL to {}
repeat with this_a in a
	repeat with i from 0 to 99
		set end of OL to (contents of this_a)
		set end of OL to i
	end repeat
end repeat
set RL to {"J", 7}
set t1 to the ticks
set the_delims to {space}
set d to AppleScript's text item delimiters
set AppleScript's text item delimiters to the_delims
set OLs to (OL as string) & space
set RLs to (RL as string) & space
set AppleScript's text item delimiters to d
set NewOLs to ReplaceText(OLs, RLs, {""})
set d to AppleScript's text item delimiters
set AppleScript's text item delimiters to the_delims
try
	set NewOL to text items of NewOLs
	set AppleScript's text item delimiters to d
on error err_mess
	set AppleScript's text item delimiters to d
	error err_mess
end try
set NewOL to items 1 thru -2 of NewOL

on ReplaceText(t, s, r)
	set d to AppleScript's text item delimiters
	set AppleScript's text item delimiters to s
	set l to text items of t
	set AppleScript's text item delimiters to r
	set n to l as string
	set AppleScript's text item delimiters to d
	return n
end ReplaceText

I haven’t figured out why it doesn’t error in the subroutine. Note that there isn’t any error trapping there. If it does error, then weird things might happen and you should set the TIDs back to {""}.

gl,

Nigel_Garvey · July 4, 2005, 1:07am

I haven’t tested this exhaustively for speed, but in theory it should be pretty hot. The assumptions are:

That the pairs in the list are unique.
That they’re character/integer pairs. (I see this has now changed to events and dates, so I don’t know what classes are actually involved.)
That the cases match throughout.

on Remove(OrigList, Removal)
	-- Script object list property for speed of access through a reference.
	script ALL
		property OL : OrigList
	end script
	
	-- Get each item of the removal pair in its own variable.
	set {theCharacter, theInteger} to Removal
	-- Get the length of the original list.
	set len to (count ALL's OL)
	
	-- Considering case will speed up string comparisons (but only do it if the cases are known to be the same). 
	considering case
		-- Only iterate through the list if you know you'll find something.
		if Removal is in OrigList then
			-- Integer comparison is faster than character comparison, so base the search on the integers.
			repeat with i from 2 to len by 2
				-- If an integer match is found, THEN do a character comparison.
				-- If the characters match too, make a new list and exit the repeat.
				if (item i of ALL's OL is theInteger) and (item (i - 1) of ALL's OL is theCharacter) then
					set NL to {}
					if (i > 2) then set NL to NL & items 1 thru (i - 2) of ALL's OL
					if (i < len) then set NL to NL & items (i + 1) thru len of ALL's OL
					exit repeat
				end if
			end repeat
		else
			-- The removal pair isn't in the list. Return the original.
			set NL to OrigList
			-- Alternatively, return a copy of the original.
			-- copy OrigList to NL
		end if
	end considering
	return NL
end Remove

Nigel_Garvey · July 4, 2005, 1:54am

Hi, kel. The limit on the number of text items you could actually ‘get’ in one go used to be about 4000. I’ve no idea what the current situation is.

When using this method, the items in the list have to be coercible to (some sort of) text and items so coerced mustn’t be confused with similar items that were text anyway. That’s no problem with letters and integers.

But your script returns everything in the new list as a string. Ideally, the integers should be preserved:

set a to "ABCDEFGHIJKLMNOPQRST"
set OL to {}
repeat with this_a in a
	repeat with i from 0 to 99
		set end of OL to (contents of this_a)
		set end of OL to i
	end repeat
end repeat
set RL to {"J", 7}

set d to AppleScript's text item delimiters
set AppleScript's text item delimiters to space
set OLs to (OL as string) & space
set RLs to (RL as string) & space
set AppleScript's text item delimiters to RLs
set i to (count words of OLs's first text item)
set len to (count OL)
set NL to {}
if (i > 0) then set NL to NL & items 1 thru i of OL
if (i < len - 2) then set NL to NL & items (i + 3) thru len of OL
set AppleScript's text item delimiters to d

return NL

kel · July 4, 2005, 2:21am

Oops. I just realized that the integers in the original list of letters and integers have changed to text. If speed is important, then I’d begin with an alphanumeric.

gl,

Adam_Bell · July 4, 2005, 3:37am

As always, gentlemen, I ask what I think is a simple question and am blown away completely by the range and cleverness of the answers. Thanks, all; this is a great forum.

NovaScotian (whose name is Adam Bell)

kel · July 4, 2005, 4:45am

Hi Nigel,

Thanks for the great solution. Simply go back to the original list and remove the found items!

gl,

anaxamander · July 4, 2005, 5:06am

set OL to {"A", 1, "B", 1, "J", 7, "Q", 7, "J", 16}
set RL to {"J", 7}

set pairoffset to the offset of RL as string in OL as string
set final_out to (items 1 thru (pairoffset - 1) of OL) & (items (pairoffset + 2) thru -1 of OL)

kel · July 4, 2005, 10:31am

Hi Adam,

Here’s another method. What I do is put the data in blocks of certain length and get an index through the offset. You don’t get overlapping this way. Here’s an example:

set l to {}
set lref to a reference to l
repeat with i from 1 to 40000
	set s to i as string
	-- Pad with ten spaces so every block is exactly 10 characters long.
	set s to text 1 thru 10 of (s & "          ")
	set end of lref to s
end repeat
beep 2
set t to l as string
set o to (offset of "40000" in t)
set the_index to (o div 10) + 1

You can also place your data in the string. Anyway, it’s just an idea you might use for your big lists.

gl,

Adam_Bell · July 4, 2005, 12:43pm

anaxamander has compressed John M’s by using offset in a string to avoid the ifs required in a list. Clever.

I must confess, Kel, that I don’t “see” what’s happening in your large string version, and I know that’s because I don’t understand references (lref) properly and don’t understand what adding something to the end of a reference does (since it can’t be seen in the Results pane).

Nigel_Garvey · July 4, 2005, 1:45pm

The ifs are still required, otherwise the script errors when the pair to be removed occurs at the beginning or end of the list. It also deletes the wrong items if any of the preceding numbers in the list have more than one digit.

And ideally, when coercing a list to string, you should explicitly set the text item delimiters to “”, just in case they’re not what you think they are.
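
Both points can be seen in a small sketch. The test list here is made up for illustration: it puts a two-digit number ahead of the pair to be removed, and the delimiters are set to "" explicitly before the coercions.

```applescript
-- Hypothetical test data: a two-digit number precedes the pair to remove.
set OL to {"A", 10, "B", 1, "J", 7, "Q", 7}
set RL to {"J", 7}

-- Set the delimiters explicitly so the coercions don't depend on their current value.
set d to AppleScript's text item delimiters
set AppleScript's text item delimiters to ""
set charOffset to offset of (RL as string) in (OL as string)
set AppleScript's text item delimiters to d

-- The text is "A10B1J7Q7": "J7" starts at character 6, but "J" is item 5 of OL.
charOffset
--> 6
```

A character offset and an item index only coincide when every preceding item coerces to exactly one character.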

hhas · July 4, 2005, 3:04pm

Word of advice here: use the simplest thing that works. If iterating over a list is fast enough for your needs, then that’s what you should use because it’s easy to design, write, test, and - most importantly - be confident that it will work correctly.

Once you start getting into clever tricks like, say, mashing about with text tables and TIDs, there’s a much higher risk of introducing unexpected problems; e.g. overlooking a corner case (leading to unexpected errors), producing something that’s too specialised and inflexible to be of practical use (especially easy when given vague initial requirements and trivial test data as may be the case here), etc. (Something I’ve learnt myself - all too often the hard way.:p) To quote Brian Kernighan: “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”

So before you decide to go down that route, you might want to check with the OP whether or not the simple list-iterating solution was fast enough, and if it’s not then get a more detailed “big picture” description of exactly what it is they’re trying to achieve, along with some realistic and representative test data. For example: just what is this ‘event/date’ list that the OP mentions, because if it’s different to the sample data provided then you need to know exactly what does it look like, what do valid keys and values look like, how big might it be, if all its items are unique, whether to remove all matches or only the first one found, etc.

/sermon
HTH

kel · July 4, 2005, 3:32pm

Hi Adam,

The “a reference to” is just used to speed up the creation of the list. It’s the same principle as the script property, but may be used only at the root level of the script. Here’s an expanded version of the idea. When creating the list, the program should add fillers to the list items, so when the list is coerced to string the data are placed in blocks of a certain length. There should be at least one filler per item. Something like this:

-- fillers are ASCII character 208 (the en dash in Mac OS Roman), three per filler
set the_list to {"e1", "–––", "d1", "–––", "e2", "–––", "d2", "–––", "e3", "–––", "d3", "–––"}
set the_pair to {"e3", "–––", "d3", "–––"} -- 10 characters
set the_offset to (offset of (the_pair as string) in (the_list as string))
set the_index to ((the_offset div 10) * 4) + 1

The result should be 9 which is the index for “e3”. I better go to sleep.

Have a good day,
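
To make the reference idea concrete, here is a minimal sketch (the variable names are only illustrative). Appending to the end of the reference appends to the list it points at, so the change shows up when the list itself is inspected:

```applescript
-- Run at the top level of a script, as described above.
set l to {}
set lref to a reference to l
set end of lref to "x"
set end of lref to "y"
-- Ask for the list itself to see what was added through the reference.
l
--> {"x", "y"}
```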

kai · July 5, 2005, 12:34am

I believe it depends on the space available in the stack at any time. On more recent systems the size of the stack may have increased, but I suspect that for a considerable time it was around 4096 items. Internal usage of the stack generally reduced this capacity to somewhere around 4060 (depending on what else was going on at the time), so a figure of 4000 would allow a reasonable safety margin. (Incidentally, for an example of a workaround to this limit, see the ‘textItems’ handler at: http://bbs.applescript.net/viewtopic.php?pid=42284#p42284)

Anyway, that wasn’t the reason I jumped in here. Seems you guys have been having quite a party while I was looking the other way!

I confess that I haven’t really been following the thread (more’s the pity), but just thought I’d throw in yet another approach anyway - purely for the mental exercise. (Script object for long lists added in edit.) Regardless of the reality, it assumes that pairs are as indicated in the original example. That is, they consist of a single text character followed by a number, the first pair found is the one deleted, yadda yadda yadda…

on everyPair from o apart from r
	script l
		property m : o
	end script
	considering case
		tell l's m to if r is not in it then return it
		set i to r's item 2
		set d to text item delimiters
		set text item delimiters to ""
		tell l's m's strings to set s to beginning & ({""} & rest)
		set text item delimiters to r's item 1
		tell s to repeat with n from 1 to (count text items) - 1
			tell (2 * (1 + (count (text 1 thru text item n)))) to if l's m's item it is i then
				set p to it
				exit repeat
			end if
		end repeat
		set text item delimiters to d
	end considering
	tell l's m to if p is 2 then
		items 3 thru -1
	else if p is (count) then
		items 1 thru -3
	else
		items 1 thru (p - 2) & items (p + 1) thru -1
	end if
end everyPair

set OL to {"A", 1, "B", 1, "J", 7, "Q", 7, "J", 16}
set RL to {"J", 7}
everyPair from OL apart from RL
--> {"A", 1, "B", 1, "Q", 7, "J", 16}