How to remove a column from List using Shell Script Applescript

Hi,

I need your kind support to learn scripting through this forum. I am not from a software background, so my description of the problem may not be clear.
Thanks in advance.
Question:
How can I remove one or more columns from a list using a shell script called from AppleScript?

A sample of my 4-column list (approx. 4000 records) is below:

set Test_List to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}

I want Result_List to contain only the 3rd column, i.e.

Result_List value = {"217224033", "057181074", "193039045", "247147030"}

I can’t convert commands like

awk '{print $1,$2,$3,$4,$5,$7}' file
cut -f1,2,3,4,5,7 file

to work with the list. How can I write a ‘do shell script’ command for the list above to get the desired result?

(I would also like to save Test_List as a newline-delimited Test_List.txt, and the result as Result_List, for learning.)

Hi. Welcome to MacScripter.

At its simplest, your third-column-only problem could be solved like this:

set Test_List to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}

set Result_List to {}
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to space
repeat with this_Item in Test_List
	set end of my Result_List to text item 3 of this_Item
end repeat
set AppleScript's text item delimiters to astid

return Result_List

With 4000 records, the script would end up containing two 4000-item lists, which could lead to a bloated script file. But it’s OK for experimenting in Script Editor.

As I’m not fond of shell, here is a version using ASObjC.

----------------------------------------------------------------
use AppleScript version "2.5"
use framework "Foundation"
use scripting additions
----------------------------------------------------------------

property saveIt : true
-- true --> save the result in a text file
-- false --> don't save the result in a text file

my germaine()

on germaine()
	set Test_List to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}
	
	set theArray to current application's NSArray's arrayWithArray:Test_List
	
	set tempArray to current application's NSMutableArray's new()
	repeat with aRecord in theArray
		
		set wanted to item 3 of (my splitString:aRecord usingString:space)
		(tempArray's addObject:wanted)
	end repeat
	
	set theData to (tempArray's componentsJoinedByString:linefeed)
	if saveIt then -- save data into a file
		set newURL to ((path to desktop as text) & "third column.txt") as «class furl»
		(theData's writeToURL:newURL atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))
	else
		set theData to theData as string
	end if
end germaine

#=====

on splitString:someText usingString:d1
	set theString to current application's NSString's stringWithString:someText
	set theList to theString's componentsSeparatedByString:d1
	return theList as list
end splitString:usingString:

#=====

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Monday 15 June 2020 22:01:01

Hi Nigel Garvey,

Thanks for your quick response. Great your solution works nicely.

I am looking for a “Do Shell Script” Solution for Learning so that I can run various commands through AppleScript for fast processing.

Could you post some examples of calling awk, cut, etc. from AppleScript? I found various examples of terminal commands, but how can I convert them to “Do Shell Script” format?

Hi Yvan Koenig,

Excellent script! Thanks a lot.

I have to work very hard to learn this type of highly efficient stuff.

ASObjC, which is really verbose, may be intimidating, but it’s really powerful.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Monday 15 June 2020 22:52:49

I would do this as described by Nigel but the following appears to work:

set theList to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}

set newList to {}

repeat with anItem in theList
	do shell script "echo " & quoted form of (contents of anItem) & " | cut -d' ' -f3"
	set end of newList to result
end repeat

newList --> {"217224033", "057181074", "193039045", "247147030"}

Here is an alternate code which uses a single array. I’m not sure that it’s more efficient than my first one and I don’t have a 4000 items list for testing.

----------------------------------------------------------------
use AppleScript version "2.5"
use framework "Foundation"
use scripting additions
----------------------------------------------------------------

property saveIt : true
-- true --> save the result in a text file
-- false --> don't save the result in a text file

my germaine()

on germaine()
	set Test_List to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}
	
	set theArray to current application's NSMutableArray's arrayWithArray:Test_List
	
	set i to 0
	repeat with aRecord in theArray
		set wanted to item 3 of (my splitString:aRecord usingString:space)
		(theArray's removeObjectAtIndex:i)
		(theArray's insertObject:wanted atIndex:i)
		set i to i + 1
	end repeat
	set theData to (theArray's componentsJoinedByString:linefeed)
	if saveIt then -- save data into a file
		set newURL to ((path to desktop as text) & "third column.txt") as «class furl»
		(theData's writeToURL:newURL atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))
	else
		set theData to theData as string
	end if
end germaine

#=====

on splitString:someText usingString:d1
	set theString to current application's NSString's stringWithString:someText
	set theList to theString's componentsSeparatedByString:d1
	return theList as list
end splitString:usingString:

#=====

You may also test:

----------------------------------------------------------------
use AppleScript version "2.5"
use framework "Foundation"
use scripting additions
----------------------------------------------------------------

property saveIt : true
-- true --> save the result in a text file
-- false --> don't save the result in a text file

my germaine()

on germaine()
	set Test_List to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}
	
	set theArray to current application's NSMutableArray's arrayWithArray:Test_List
	set nbRec to (count theArray) - 1
	repeat with i from 0 to nbRec
		set arecord to (theArray's objectAtIndex:0) as string
		set wanted to (item 3 of (my splitString:arecord usingString:space))
		(theArray's removeObjectAtIndex:0)
		(theArray's insertObject:wanted atIndex:nbRec)
	end repeat
	set theData to (theArray's componentsJoinedByString:linefeed)
	if saveIt then -- save data into a file
		set newURL to ((path to desktop as text) & "third column.txt") as «class furl»
		(theData's writeToURL:newURL atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))
	else
		set theData to theData as string
	end if
end germaine

#=====

on splitString:someText usingString:d1
	set theString to current application's NSString's stringWithString:someText
	set theList to theString's componentsSeparatedByString:d1
	return theList as list
end splitString:usingString:

#=====

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Tuesday 16 June 2020 00:02:25

These alternate versions prove to be less efficient than the first one, which processed a list of 400,000 strings in less than 78 seconds.

Hi Peavine,

Thank you so much. This was exactly the solution I was struggling to find. My problem is solved. Thanks once again.

Hi Yvan Koenig,

Thanks again for your alternative script!

Will test both the scripts with my full dataset.

Hi, peavine. While that works, I wouldn’t instantiate a shell script call more than a time or two; 4000 times—as is the OP’s intended purpose—is going to be excessively expensive, and one call is all that’s needed.

set theList to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}
set text item delimiters to linefeed
set theList to theList as text
set text item delimiters to ""

do shell script "echo " & theList's quoted form & "| cut -d '" & space & "' -f3 "
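For anyone converting Terminal commands to ‘do shell script’ form, it may help to see what the shell itself receives. The following is only a sketch of the pipeline as it could be typed in Terminal; the function names and the records variable are made up for illustration, and printf stands in for the echo used above.

```shell
# Hypothetical sample data: the four records from this thread, one per line.
records='217.0 241.0 217224033 37.404
225.0 241.0 057181074 25.407
249.0 241.0 193039045 11.751
241.0 241.0 247147030 35.026'

# third_column_cut (illustrative name): keep only field 3,
# treating a single space as the field delimiter.
third_column_cut() {
    printf '%s\n' "$1" | cut -d ' ' -f3
}

# third_column_awk (illustrative name): same result with awk,
# which splits on runs of whitespace by default.
third_column_awk() {
    printf '%s\n' "$1" | awk '{ print $3 }'
}

third_column_cut "$records"
```

In AppleScript, the text built from the list plays the role of "$records", and ‘quoted form’ supplies the single quotes that protect it from the shell.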

Thanks Marc Anthony. I’m always eager to learn new stuff and appreciate your post. I ran our scripts through Script Geek and your script was three times faster, which is certainly a worthwhile improvement.

The requested shell script solution’s been provided, but here’s another entry in the ASObjC stakes. With four thousand records, it’s about four times as fast as the shell script, although that itself takes less than half a second on my machine.

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

set Test_List to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}
set sample to Test_List
repeat 999 times
	set Test_List to Test_List & sample
end repeat

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set Test_String to Test_List as text
set AppleScript's text item delimiters to astid

set Test_String to current application's class "NSString"'s stringWithString:(Test_String)
set Test_String to Test_String's stringByReplacingOccurrencesOfString:("(?m)^(?:\\S++\\s){2}+(\\S++).++$") withString:("$1") options:(current application's NSRegularExpressionSearch) range:({0, Test_String's |length|()})
return (Test_String as text) -- 's paragraphs

It’s as well to remember that there’s a limit to the length of text that can be used to constitute a ‘do shell script’ command (https://developer.apple.com/library/archive/technotes/tn2065/_index.html#//apple_ref/doc/uid/DTS10003093-CH1-TNTAG6-HOW_LONG_CAN_MY_COMMAND_BE__REALLY_), although in the current case, the “echo” and “cut” commands and the four thousand paragraphs appear to be OK. Otherwise, the multi-K-paragraph text would need to be saved to a file for the shell script to read itself while running.
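A rough shell-side sketch of that limit and the file workaround, assuming the limit in question is the system's ARG_MAX value. The temporary file name and the two-record sample are placeholders, not anything from the scripts above.

```shell
# Print the system limit, in bytes, on the combined size of a command's
# arguments and environment.
getconf ARG_MAX

# Hypothetical workaround: put the data in a temporary file and let cut
# read the file, so the data never appears on the command line.
tmpfile=$(mktemp "${TMPDIR:-/tmp}/third_column.XXXXXX")
printf '%s\n' \
    '217.0 241.0 217224033 37.404' \
    '225.0 241.0 057181074 25.407' > "$tmpfile"

third_column=$(cut -d ' ' -f3 < "$tmpfile")
rm -f "$tmpfile"

printf '%s\n' "$third_column"
```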

I hadn’t run into the argument length pitfall before, Nigel, but I’ll note the technical limit tidbit for the future. Regarding a recommendation to newbies, the vanilla method in post #2 is probably the better starting point. I couldn’t recommend the ASObjC approach to a newbie; I would have abandoned AppleScript altogether, had that been the barrier to entry.

It seems clear that the OP has 4000 records of 4 columns.

But it’s not really the problem.

On my side, I tried to enhance my original proposal, without using regex which I don’t understand.
To test the new version, I built a list of 400,000 ‘records’ of 4 columns each.

----------------------------------------------------------------
use AppleScript version "2.5"
use framework "Foundation"
use scripting additions

-- Edited on 2020/06/17
----------------------------------------------------------------

property saveIt : true
-- true --> save the result in a text file
-- false --> don't save the result in a text file

my germaine()

on germaine()
	set record1 to "217.0 241.0 217224033 37.404"
	-- split the string using space as delimiter
	set nbColumns to count (my splitString:record1 usingString:space)
	set wantedColumn to 3
	set Test_List to {record1, "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}
	-- Create an empty array (one whose content may be modified)
	set theArray to current application's NSMutableArray's new()
	set nbLists to 100000
	-- append nbLists times the list Test_List at the end of the array
	repeat nbLists times
		theArray's addObject:Test_List
	end repeat
	-- Gather the list of nbLists sublists in a list of 4 * nbLists 'records'
	set theArray to (theArray's valueForKeyPath:"@unionOfArrays.self")
	-- Now we have an array of 400,000 'records'
	tell me to say "Go"
	
	set startDate to current application's NSDate's |date|()
	-- Concat them with space character
	set NSString to theArray's componentsJoinedByString:space
	-- Split it using space character
	set newArray to NSString's componentsSeparatedByString:space
	set indexMax to count newArray # EDITED
	-- Create an empty array (one whose content may be modified)
	set mutableArray to current application's NSMutableArray's new()
	-- Extract the 3rd item from every record
	repeat with i from wantedColumn to indexMax by nbColumns -- EDITED
		set wanted to item i of newArray
		(mutableArray's addObject:wanted) -- add it to the mutable array
	end repeat
	-- Concat the extracted strings using linefeed
	set theData to (mutableArray's componentsJoinedByString:linefeed)
	if saveIt then -- save data into a file
		set newURL to ((path to desktop as text) & "third column.txt") as «class furl»
		(theData's writeToURL:newURL atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(missing value))
	else
		set theData to theData as string
	end if
	set timeDiff to startDate's timeIntervalSinceNow()
	display dialog "That took " & (-timeDiff as real) & " seconds."
end germaine

#=====

on splitString:someText usingString:d1
	set theString to current application's NSString's stringWithString:someText
	set theList to theString's componentsSeparatedByString:d1
	return theList as list
end splitString:usingString:

#=====

On my iMac (mid 2011), extracting the 400,000 strings took less than 30 seconds.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Tuesday 16 June 2020 18:57:17

Hi Yvan.

I think your line in post #15 which sets indexMax to (count newArray) - 1 is actually meant to set nbItems to (count newArray) or newArray’s |count|(). With that fix, the script displays a processing time of just over 22 seconds on my machine. This timing version of the regex script takes just over 1 second:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

on main()
	set Test_List to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}
	set sample to Test_List
	repeat 999 times
		set Test_List to Test_List & sample
	end repeat
	set sample to Test_List
	repeat 99 times
		set Test_List to Test_List & sample
	end repeat
	set sample to missing value
	count Test_List
	say result
	
	set start to current application's class "NSDate"'s new()
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to linefeed
	set Test_String to Test_List as text
	set AppleScript's text item delimiters to astid
	
	set Test_String to current application's class "NSString"'s stringWithString:(Test_String)
	set Test_String to Test_String's stringByReplacingOccurrencesOfString:("(?m)^(?:\\S++\\s){2}+(\\S++).++$") withString:("$1") options:(current application's NSRegularExpressionSearch) range:({0, Test_String's |length|()})
	(Test_String as text) -- 's paragraphs
	display dialog "That took " & -(start's timeIntervalSinceNow()) & " seconds."
end main

main()
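For readers less familiar with regex, here is a rough shell equivalent of the pattern used above, written for sed. It is a sketch and not the ASObjC code itself: POSIX extended regular expressions have no possessive quantifiers (the ++ forms), so ordinary greedy ones are used, which gives the same result on this data. The function name is made up for illustration.

```shell
# third_column_sed (illustrative name): for each line,
#   ^                              start of the line
#   ([^[:space:]]+[[:space:]]){2}  two runs of non-space text, each followed by one whitespace character
#   ([^[:space:]]+)                the third run of non-space text, captured as group 2
#   .*$                            whatever remains on the line
# and the whole line is replaced by group 2.
third_column_sed() {
    printf '%s\n' "$1" | sed -E 's/^([^[:space:]]+[[:space:]]){2}([^[:space:]]+).*$/\2/'
}

third_column_sed '217.0 241.0 217224033 37.404
225.0 241.0 057181074 25.407'
```

sed already works line by line, so nothing corresponding to the (?m) flag is needed here.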

I was surprised by Nigel’s results and so I ran some timing tests with Script Geek. Some comments:

  • In both cases, I created Test_List with Nigel’s code.

  • Before testing, I ran both scripts in Script Editor to make sure they returned the desired results.

  • The timing results reported below are those shown under “First Run”.

  • I restarted Script Geek after each run (which is significant).

My results:

Nigel - Post 13 - 0.338 seconds

Marc Anthony - Post 11 - 0.121 seconds

Test results for subsequent runs of both scripts with everything in memory were very close (about 0.11 seconds). I guess this reflects the time it takes ASObjC to start.

I didn’t test any of Yvan’s scripts because I didn’t understand them sufficiently.

Because I changed it somewhat for test purposes, the following is the script I tested with Marc Anthony’s shell solution:

set Test_List to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}
set sample to Test_List
repeat 999 times
	set Test_List to Test_List & sample
end repeat

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set Test_String to Test_List as text
set AppleScript's text item delimiters to astid

do shell script "echo " & Test_String's quoted form & "| cut -d '" & space & "' -f3 "

BTW, I tested the above script in Script Geek but without the last line and the test results were 0.10 seconds. So, the shell command only takes about 21 milliseconds to do its work.

Hello Nigel

newArray contains (count newArray) items.

I confused theArray’s objectAtIndex:i, whose index starts at 0 and ends at (count newArray) - 1,
with theArray’s item i, whose index starts at 1 and ends at (count newArray).

Worse, my script extracted the 2nd column when it was supposed to extract the 3rd one;
the index in the array starts at 0, while it starts at 1 if I convert the array into a list.

I edited message #15 accordingly.

If I read correctly, your script processes a list of 4,000 records while mine processes a list of 400,000.
Am I wrong?

I was wrong; I missed that you replicated the original list 999 times.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Wednesday 17 June 2020 11:21:41

I edited the script in message #15 adding comments explaining what it’s doing.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Wednesday 17 June 2020 11:31:26

Hi peavine.

My initial timings were admittedly crudely done, mainly to establish which method was the faster. The scripts were run in separate Script Debugger windows and were timed with SD’s built-in timer. Both timings included the time taken to build the 4,000-item list — which isn’t a good way to compare the performances of the methods themselves — and I didn’t bother pausing the BOINC tasks I normally have running in the background.

If I substitute the ‘do shell script’ command for the ASObjC stuff in my post #16 timing script, comment out the second repeat to leave the number of items at 4,000, and have the handler return the final message instead of displaying it:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

on main()
	set Test_List to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}
	set sample to Test_List
	repeat 999 times
		set Test_List to Test_List & sample
	end repeat
	(*set sample to Test_List
	repeat 99 times
		set Test_List to Test_List & sample
	end repeat*)
	set sample to missing value
	count Test_List
	say result
	
	set start to current application's class "NSDate"'s new() -- Start timing here.
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to linefeed
	set Test_String to Test_List as text
	set AppleScript's text item delimiters to astid
	
	do shell script "echo " & Test_String's quoted form & "| cut -d '" & space & "' -f3 "
	
	return "That took " & -(start's timeIntervalSinceNow()) & " seconds."
end main

main()

… the result is typically “That took 0.049322962761 seconds.”, ± 0.003 seconds, in Script Editor with BOINC paused on my machine. The same arrangement with the original ASObjC/regex code gives “That took 0.010099053383 seconds.”, ± 0.001 seconds.

With the second repeat uncommented to raise the number of items to 400,000, the ASObjC version typically returns “That took 1.024765014648 seconds.”, ± 0.008 seconds. The shell script version errors out with “The command exited with a non-zero status.”

Intriguingly, if the shell script version’s modified to be able to handle 400,000 items (i.e. the text is written out to a file which the shell script then reads), the shell script’s faster than the ASObjC with that many items:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

on main()
	set Test_List to {"217.0 241.0 217224033 37.404", "225.0 241.0 057181074 25.407", "249.0 241.0 193039045 11.751", "241.0 241.0 247147030 35.026"}
	set sample to Test_List
	repeat 999 times
		set Test_List to Test_List & sample
	end repeat
	set sample to Test_List
	repeat 99 times
		set Test_List to Test_List & sample
	end repeat
	set sample to missing value
	count Test_List
	say result
	
	set start to current application's class "NSDate"'s new() -- Start timing here.
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to linefeed
	set Test_String to Test_List as text
	set AppleScript's text item delimiters to astid
	
	set Text_File to ((path to desktop as text) & "Test.txt") as «class furl»
	set fRef to (open for access Text_File with write permission)
	try
		set eof fRef to 0
		write Test_String as «class utf8» to fRef
		close access fRef
	on error errMsg
		close access fRef
		display dialog errMsg buttons {"Stop"} default button 1 cancel button 1
	end try
	
	do shell script "cut -d '" & space & "' -f3 <" & quoted form of POSIX path of Text_File
	
	return "That took " & -(start's timeIntervalSinceNow()) & " seconds."
end main

main()
--> "That took 0.783697009087 seconds."
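The thread title asks about removing a column, while every solution above keeps only column 3. As a sketch of the complementary operation with the same file-based approach, cut can be given the list of fields to keep. The working directory is a throwaway placeholder, and the file names simply follow those mentioned in the original question.

```shell
# Hypothetical working directory; replace with wherever the files should live.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/columns.XXXXXX")

# Save the records as a newline-delimited text file.
printf '%s\n' \
    '217.0 241.0 217224033 37.404' \
    '225.0 241.0 057181074 25.407' \
    '249.0 241.0 193039045 11.751' \
    '241.0 241.0 247147030 35.026' > "$workdir/Test_List.txt"

# Drop column 3 by listing only the fields to keep (1, 2 and 4),
# and save the result next to the input.
cut -d ' ' -f1,2,4 < "$workdir/Test_List.txt" > "$workdir/Result_List.txt"

cat "$workdir/Result_List.txt"
# Delete "$workdir" once the files are no longer needed.
```

From AppleScript, the cut line could be passed to ‘do shell script’ in the same way as the command above, with ‘quoted form of’ applied to both POSIX paths.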