Removing excess rows in Numbers

Hi All,

I am a longtime lurker here and recently have been able to do more scripting. I am working with data tables in Numbers, and in these tables I recieve data with rows that do not contain data interspersed with rows that do contain data. Since there are several thousand lines worth of data, removing these rows makes the rest of my work faster and easier. So I wrote a script that works to remove the rows without data, but it runs much slower than I would like like. I am new to applescript, and scripting in general, so I am sure that there is a much faster way to do the same task. Thanks for any help! Below is the script…sorry it is so messy!


set theCol to display dialog "Choose column with Data (Enter a Single Letter)" default answer "C" buttons {"OK"} default button 1
set TestCol to text returned of theCol

set StartTime to time of (current date)

tell application "Numbers"
	set totalChanges to 0
	tell document 1
		tell active sheet
			set theTable to the first table
			set EachRow to the rows of theTable
			try
				repeat with theRowRef in EachRow
					set theRow to the contents of theRowRef
					set RowIndex to the address of theRow
					tell theTable
						set theCelladd to cell (TestCol & RowIndex as string)
						set theCell to the value of theCelladd
					end tell
					
					if theCell = missing value then
						remove theRow
						set totalChanges to totalChanges + 1
						repeat while theCell = missing value
							set theRow to the contents of theRowRef
							set RowIndex to the address of theRow
							tell theTable
								set theCelladd to cell (TestCol & RowIndex as string)
								set theCell to the value of theCelladd
							end tell
							if theCell = missing value then
								remove theRow
								set totalChanges to totalChanges + 1
								
							end if
						end repeat
						
						--exit repeat (for troubleshooting)
						
					end if
					
				end repeat
			end try
		end tell
		
	end tell
end tell
set EndTime to time of (current date)
set ScriptTime to EndTime - StartTime
display dialog (totalChanges as string) & " total changes were made." & " This script took " & (ScriptTime as string) & " seconds."


Hi.

This should be faster in theory, but it’s about the same speed when there are only fifty rows to check! It sends fewer individual data requests to Numbers, so it should be faster if you have thousands of rows. But it still has to do the same number of row removals.


on deleteEmptyRows()
	script o
		property rowValues : missing value -- Will be a list of lists of the rows' cell values.
	end script
	
	tell application "Numbers"
		set theTable to table 1 of active sheet of document 1
		return row 1 of theTable
		-- Get a list of lists of row cell values rather than a list of Numbers rows. It saves quizzing the rows individually in the application.
		set o's rowValues to value of cells of rows of theTable
	end tell
	
	-- Go through the rows from the bottom up because the later row numbers will change as the earlier rows are deleted. It may also be faster.
	set rowCount to length of o's rowValues
	repeat with rowIndex from rowCount to 1 by -1
		set valuesInThisRow to item rowIndex of o's rowValues
		set emptyRow to true
		repeat with thisCellValue in valuesInThisRow
			if (thisCellValue's contents is not missing value) then
				set emptyRow to false
				exit repeat
			end if
		end repeat
		if (emptyRow) then tell application "Numbers" to remove row rowIndex of theTable
	end repeat
	
	tell application "Numbers" to return rowCount - (address of last row of theTable)
end deleteEmptyRows

set startTime to (current date)
set totalChanges to deleteEmptyRows()
set endTime to (current date)
set ScriptTime to endTime - startTime
display dialog (totalChanges as string) & " total changes were made." & " This script took " & (ScriptTime as string) & " seconds."

Nigel-

Thanks so much for taking the time write a responses! It never dawned on me to parse the table backward to avoid the changing row issue. Your script works very well. I had to modify it slightly because in my data tables every row has a time reference in column A, and your data check would keep the row unless it was completely empty. Below is the modified version, but I think it is very similar to yours, instead of looping through every cell of a row, it disregards the first cell of the row.

I thought you would be interested to know that I ran both my original script and your modified script on one of my smaller data sets.

My Script: 1055 changes, 551 seconds
Nigel’s Script: 1055 changes, 428 seconds.

I think that is a serious improvement! I did have one question for you, and I am sorry if it seems rudimentary, but I don’t understand the script/end script block. Why use a property instead of a local variable to keep track of the all the row’s cell’s values?

Thanks so much for the help! It will save me quite a bit of time in the future.

Mac



on deleteEmptyRows()
	script o
		property rowValues : missing value -- Will be a list of lists of the rows' cell values.
	end script
	
	tell application "Numbers"
		set theTable to table 1 of active sheet of document 1
		-- return row 1 of theTable
		-- Get a list of lists of row cell values rather than a list of Numbers rows. It saves quizzing the rows individually in the application.
		set o's rowValues to value of cells of rows of theTable
	end tell
	
	-- Go through the rows from the bottom up because the later row numbers will change as the earlier rows are deleted. It may also be faster.
	set rowCount to length of o's rowValues
	repeat with rowIndex from rowCount to 1 by -1
		set valuesInThisRow to item rowIndex of o's rowValues
		set emptyRow to true
		set cellCount to count of valuesInThisRow
		-- return cellCount
		repeat with cellIndex from cellCount to 2 by -1
			set CellValue to item cellIndex of valuesInThisRow
			
			if (CellValue's contents is not missing value) then
				set emptyRow to false
				exit repeat
			end if
		end repeat
		if (emptyRow) then tell application "Numbers" to remove row rowIndex of theTable
	end repeat
	
	tell application "Numbers" to return rowCount - (address of last row of theTable)
end deleteEmptyRows




set startTime to (current date)
set totalChanges to deleteEmptyRows()
set endTime to (current date)
set ScriptTime to endTime - startTime
display dialog (totalChanges as string) & " total changes were made." & " This script took " & (ScriptTime as string) & " seconds."


Hi Mac.

AppleScript has a quirk whereby if a list is quite long and you need to access its items many times, the process is much faster if you include a reference to the list variable in the references to the items, instead of just the bare variable. ie.:

item x of someScript's myList -- someScript could be the current script. eg. my myLIst

… rather than just …

item x of myList

In my script and your second one, the list rowValues is going to be very long, since its items correspond to the thousands of rows in your Numbers table, so it’s a candidate for the reference treatment. It’s not possible to reference local variables, only globals, properties, and variables defined in a script’s run handler. My preferred way round this is to have a script object with the list variable as one of its properties. This way, the list variable can be referenced in relation to the script object. However, the script object itself is assigned to a local variable, so that when the action’s over, it and its very long contents are simply discarded and aren’t saved back into the main script file as a persistent value.

Each item in rowValues is a list containing only as many items as there are columns in the table, so it’s not worth using references with them. In fact it can be slightly faster not to use references with short lists!

I’m glad you were able to adapt the script to your requirements. Since all the rows are the same length, you could, if you liked, set cellCount just once, before the repeat, to the length of any of the lists in o’s rowValues. And it’s not necessary to do reverse repeats with the valuesInThisRow lists — although if there’s more likely to be a non-empty cell near the end of each row than near the beginning, it could be worth it.

I’ve been looking today to see if it’s possible to speed up the actual row removals. The only way to do this is to have Numbers remove ranges of rows at a time, which means fewer commands have to be sent and less time is spent on sending and acknowledging them. On the other hand, unless you have blocks of multiple empty rows, it’s not worth doing. And it turns out that while Numbers (5.2)'s remove command will accept a range of rows as a parameter, it behaves in the same way as it would in response to individual forward-indexed script commands — that is, removing rows one at a time and updating its internal indexing as it goes. It’s faster than removing the rows with individual commands, but the wrong rows get removed!

A compromise is to delete half the range of empty rows (actually alternate rows through the whole range), then half of what’s left, and so on until just one empty row remains, which is then also removed. I’m posting the script here, but unless you have several groups of four or more empty lines, your own adaptation above is probably the best for your purposes:


on deleteEmptyRows()
	script o
		property rowValues : missing value -- Will be a list of lists of the rows' cell values.
	end script
	
	tell application "Numbers"
		set theTable to table 1 of active sheet of document 1
		-- return row 1 of theTable
		-- Get a list of lists of row cell values rather than a list of Numbers rows. It saves quizzing the rows individually in the application.
		set o's rowValues to value of cells of rows of theTable
	end tell
	
	-- Go through the rows from the bottom up because the later row numbers will change as the earlier rows are deleted. It may also be faster.
	set rowCount to length of o's rowValues
	set cellCount to count of beginning of o's rowValues
	set emptyBlockEnd to missing value
	repeat with rowIndex from rowCount to 1 by -1
		set valuesInThisRow to item rowIndex of o's rowValues
		set emptyRow to true
		-- return cellCount
		repeat with cellIndex from 2 to cellCount
			set CellValue to item cellIndex of valuesInThisRow
			
			if (CellValue's contents is not missing value) then
				set emptyRow to false
				exit repeat
			end if
		end repeat
		
		if (emptyRow) then
			-- If this is an empty row, update the range of empty rows waiting for removal.
			if (emptyBlockEnd is missing value) then set emptyBlockEnd to rowIndex
			set emptyBlockStart to rowIndex
		else if (emptyBlockEnd is not missing value) then
			-- Otherwise, if this is a non-empty row and there are empty rows waiting to be removed, remove them.
			-- Because range removals work forwards and upset their own indexing, delete the range progressively by halves.
			-- The row index variable names refer to the indices at the time they're set, not necessarily to rows subsequently deleted!
			repeat while (emptyBlockEnd > emptyBlockStart)
				set midBlock to (emptyBlockStart + emptyBlockEnd) div 2
				tell application "Numbers" to remove rows emptyBlockStart thru midBlock of theTable
				set emptyBlockEnd to emptyBlockEnd - (midBlock - emptyBlockStart + 1)
			end repeat
			try -- Numbers doesn't allow tables to contain less than one non-header row.
				tell application "Numbers" to remove row emptyBlockStart of theTable
			end try
			-- Set the block end index to missing value to indicate that there are currently no empty lines waiting to be removed.
			set emptyBlockEnd to missing value
		end if
	end repeat
	-- Remove any empty rows still waiting when the repeat ends.
	if (emptyBlockEnd is not missing value) then
		repeat while (emptyBlockEnd > emptyBlockStart)
			set midBlock to (emptyBlockStart + emptyBlockEnd) div 2
			tell application "Numbers" to remove rows emptyBlockStart thru midBlock of theTable
			set emptyBlockEnd to emptyBlockEnd - (midBlock - emptyBlockStart + 1)
		end repeat
		try
			tell application "Numbers" to remove row emptyBlockStart of theTable
		end try
	end if
	
	tell application "Numbers" to return rowCount - (address of last row of theTable)
end deleteEmptyRows




set startTime to (current date)
set totalChanges to deleteEmptyRows()
set endTime to (current date)
set ScriptTime to endTime - startTime
display dialog (totalChanges as string) & " total changes were made." & " This script took " & (ScriptTime as string) & " seconds."


Nigel,

Thank you so much, that is an excellent explanation, and I don’t think I would have ever stumbled on that just looking around the forums or searching the web. That will be a useful technique in the future as I anticipate having to do large scale manipulation in numbers fairly often.

I will take your advice and set the cellCount before the repeat. And you are correct that the table is more likely to have a data value in later columns rather than earlier which is why I looped from the rear. I was reminded of the capability of doing that by your looping through theTable backwards.

Thanks so much for looking into ways of speeding up the row removals. The way the data is generated is by a recording device that samples at a faster rate than the measurement device, leading to time intervals in between measurements with no data. So that leads to 1-3 rows between each actual measurement. So I do not believe that removing the blocks will lead to better results. That behavior is good to know for future reference though.

Thanks so much for your help, and now that I get a chance to do some actual work on my mac, I hope to post here more often. Thanks!

Mac