I am a longtime lurker here and recently have been able to do more scripting. I am working with data tables in Numbers, and in these tables I recieve data with rows that do not contain data interspersed with rows that do contain data. Since there are several thousand lines worth of data, removing these rows makes the rest of my work faster and easier. So I wrote a script that works to remove the rows without data, but it runs much slower than I would like like. I am new to applescript, and scripting in general, so I am sure that there is a much faster way to do the same task. Thanks for any help! Below is the script…sorry it is so messy!
set theCol to display dialog "Choose column with Data (Enter a Single Letter)" default answer "C" buttons {"OK"} default button 1
set TestCol to text returned of theCol
set StartTime to time of (current date)
tell application "Numbers"
set totalChanges to 0
tell document 1
tell active sheet
set theTable to the first table
set EachRow to the rows of theTable
try
repeat with theRowRef in EachRow
set theRow to the contents of theRowRef
set RowIndex to the address of theRow
tell theTable
set theCelladd to cell (TestCol & RowIndex as string)
set theCell to the value of theCelladd
end tell
if theCell = missing value then
remove theRow
set totalChanges to totalChanges + 1
repeat while theCell = missing value
set theRow to the contents of theRowRef
set RowIndex to the address of theRow
tell theTable
set theCelladd to cell (TestCol & RowIndex as string)
set theCell to the value of theCelladd
end tell
if theCell = missing value then
remove theRow
set totalChanges to totalChanges + 1
end if
end repeat
--exit repeat (for troubleshooting)
end if
end repeat
end try
end tell
end tell
end tell
set EndTime to time of (current date)
set ScriptTime to EndTime - StartTime
display dialog (totalChanges as string) & " total changes were made." & " This script took " & (ScriptTime as string) & " seconds."
This should be faster in theory, but it’s about the same speed when there are only fifty rows to check! It sends fewer individual data requests to Numbers, so it should be faster if you have thousands of rows. But it still has to do the same number of row removals.
on deleteEmptyRows()
script o
property rowValues : missing value -- Will be a list of lists of the rows' cell values.
end script
tell application "Numbers"
set theTable to table 1 of active sheet of document 1
return row 1 of theTable
-- Get a list of lists of row cell values rather than a list of Numbers rows. It saves quizzing the rows individually in the application.
set o's rowValues to value of cells of rows of theTable
end tell
-- Go through the rows from the bottom up because the later row numbers will change as the earlier rows are deleted. It may also be faster.
set rowCount to length of o's rowValues
repeat with rowIndex from rowCount to 1 by -1
set valuesInThisRow to item rowIndex of o's rowValues
set emptyRow to true
repeat with thisCellValue in valuesInThisRow
if (thisCellValue's contents is not missing value) then
set emptyRow to false
exit repeat
end if
end repeat
if (emptyRow) then tell application "Numbers" to remove row rowIndex of theTable
end repeat
tell application "Numbers" to return rowCount - (address of last row of theTable)
end deleteEmptyRows
set startTime to (current date)
set totalChanges to deleteEmptyRows()
set endTime to (current date)
set ScriptTime to endTime - startTime
display dialog (totalChanges as string) & " total changes were made." & " This script took " & (ScriptTime as string) & " seconds."
Thanks so much for taking the time write a responses! It never dawned on me to parse the table backward to avoid the changing row issue. Your script works very well. I had to modify it slightly because in my data tables every row has a time reference in column A, and your data check would keep the row unless it was completely empty. Below is the modified version, but I think it is very similar to yours, instead of looping through every cell of a row, it disregards the first cell of the row.
I thought you would be interested to know that I ran both my original script and your modified script on one of my smaller data sets.
I think that is a serious improvement! I did have one question for you, and I am sorry if it seems rudimentary, but I don’t understand the script/end script block. Why use a property instead of a local variable to keep track of the all the row’s cell’s values?
Thanks so much for the help! It will save me quite a bit of time in the future.
Mac
on deleteEmptyRows()
script o
property rowValues : missing value -- Will be a list of lists of the rows' cell values.
end script
tell application "Numbers"
set theTable to table 1 of active sheet of document 1
-- return row 1 of theTable
-- Get a list of lists of row cell values rather than a list of Numbers rows. It saves quizzing the rows individually in the application.
set o's rowValues to value of cells of rows of theTable
end tell
-- Go through the rows from the bottom up because the later row numbers will change as the earlier rows are deleted. It may also be faster.
set rowCount to length of o's rowValues
repeat with rowIndex from rowCount to 1 by -1
set valuesInThisRow to item rowIndex of o's rowValues
set emptyRow to true
set cellCount to count of valuesInThisRow
-- return cellCount
repeat with cellIndex from cellCount to 2 by -1
set CellValue to item cellIndex of valuesInThisRow
if (CellValue's contents is not missing value) then
set emptyRow to false
exit repeat
end if
end repeat
if (emptyRow) then tell application "Numbers" to remove row rowIndex of theTable
end repeat
tell application "Numbers" to return rowCount - (address of last row of theTable)
end deleteEmptyRows
set startTime to (current date)
set totalChanges to deleteEmptyRows()
set endTime to (current date)
set ScriptTime to endTime - startTime
display dialog (totalChanges as string) & " total changes were made." & " This script took " & (ScriptTime as string) & " seconds."
AppleScript has a quirk whereby if a list is quite long and you need to access its items many times, the process is much faster if you include a reference to the list variable in the references to the items, instead of just the bare variable. ie.:
item x of someScript's myList -- someScript could be the current script. eg. my myLIst
… rather than just …
item x of myList
In my script and your second one, the list rowValues is going to be very long, since its items correspond to the thousands of rows in your Numbers table, so it’s a candidate for the reference treatment. It’s not possible to reference local variables, only globals, properties, and variables defined in a script’s run handler. My preferred way round this is to have a script object with the list variable as one of its properties. This way, the list variable can be referenced in relation to the script object. However, the script object itself is assigned to a local variable, so that when the action’s over, it and its very long contents are simply discarded and aren’t saved back into the main script file as a persistent value.
Each item in rowValues is a list containing only as many items as there are columns in the table, so it’s not worth using references with them. In fact it can be slightly faster not to use references with short lists!
I’m glad you were able to adapt the script to your requirements. Since all the rows are the same length, you could, if you liked, set cellCount just once, before the repeat, to the length of any of the lists in o’s rowValues. And it’s not necessary to do reverse repeats with the valuesInThisRow lists — although if there’s more likely to be a non-empty cell near the end of each row than near the beginning, it could be worth it.
I’ve been looking today to see if it’s possible to speed up the actual row removals. The only way to do this is to have Numbers remove ranges of rows at a time, which means fewer commands have to be sent and less time is spent on sending and acknowledging them. On the other hand, unless you have blocks of multiple empty rows, it’s not worth doing. And it turns out that while Numbers (5.2)'s remove command will accept a range of rows as a parameter, it behaves in the same way as it would in response to individual forward-indexed script commands — that is, removing rows one at a time and updating its internal indexing as it goes. It’s faster than removing the rows with individual commands, but the wrong rows get removed!
A compromise is to delete half the range of empty rows (actually alternate rows through the whole range), then half of what’s left, and so on until just one empty row remains, which is then also removed. I’m posting the script here, but unless you have several groups of four or more empty lines, your own adaptation above is probably the best for your purposes:
on deleteEmptyRows()
script o
property rowValues : missing value -- Will be a list of lists of the rows' cell values.
end script
tell application "Numbers"
set theTable to table 1 of active sheet of document 1
-- return row 1 of theTable
-- Get a list of lists of row cell values rather than a list of Numbers rows. It saves quizzing the rows individually in the application.
set o's rowValues to value of cells of rows of theTable
end tell
-- Go through the rows from the bottom up because the later row numbers will change as the earlier rows are deleted. It may also be faster.
set rowCount to length of o's rowValues
set cellCount to count of beginning of o's rowValues
set emptyBlockEnd to missing value
repeat with rowIndex from rowCount to 1 by -1
set valuesInThisRow to item rowIndex of o's rowValues
set emptyRow to true
-- return cellCount
repeat with cellIndex from 2 to cellCount
set CellValue to item cellIndex of valuesInThisRow
if (CellValue's contents is not missing value) then
set emptyRow to false
exit repeat
end if
end repeat
if (emptyRow) then
-- If this is an empty row, update the range of empty rows waiting for removal.
if (emptyBlockEnd is missing value) then set emptyBlockEnd to rowIndex
set emptyBlockStart to rowIndex
else if (emptyBlockEnd is not missing value) then
-- Otherwise, if this is a non-empty row and there are empty rows waiting to be removed, remove them.
-- Because range removals work forwards and upset their own indexing, delete the range progressively by halves.
-- The row index variable names refer to the indices at the time they're set, not necessarily to rows subsequently deleted!
repeat while (emptyBlockEnd > emptyBlockStart)
set midBlock to (emptyBlockStart + emptyBlockEnd) div 2
tell application "Numbers" to remove rows emptyBlockStart thru midBlock of theTable
set emptyBlockEnd to emptyBlockEnd - (midBlock - emptyBlockStart + 1)
end repeat
try -- Numbers doesn't allow tables to contain less than one non-header row.
tell application "Numbers" to remove row emptyBlockStart of theTable
end try
-- Set the block end index to missing value to indicate that there are currently no empty lines waiting to be removed.
set emptyBlockEnd to missing value
end if
end repeat
-- Remove any empty rows still waiting when the repeat ends.
if (emptyBlockEnd is not missing value) then
repeat while (emptyBlockEnd > emptyBlockStart)
set midBlock to (emptyBlockStart + emptyBlockEnd) div 2
tell application "Numbers" to remove rows emptyBlockStart thru midBlock of theTable
set emptyBlockEnd to emptyBlockEnd - (midBlock - emptyBlockStart + 1)
end repeat
try
tell application "Numbers" to remove row emptyBlockStart of theTable
end try
end if
tell application "Numbers" to return rowCount - (address of last row of theTable)
end deleteEmptyRows
set startTime to (current date)
set totalChanges to deleteEmptyRows()
set endTime to (current date)
set ScriptTime to endTime - startTime
display dialog (totalChanges as string) & " total changes were made." & " This script took " & (ScriptTime as string) & " seconds."
Thank you so much, that is an excellent explanation, and I don’t think I would have ever stumbled on that just looking around the forums or searching the web. That will be a useful technique in the future as I anticipate having to do large scale manipulation in numbers fairly often.
I will take your advice and set the cellCount before the repeat. And you are correct that the table is more likely to have a data value in later columns rather than earlier which is why I looped from the rear. I was reminded of the capability of doing that by your looping through theTable backwards.
Thanks so much for looking into ways of speeding up the row removals. The way the data is generated is by a recording device that samples at a faster rate than the measurement device, leading to time intervals in between measurements with no data. So that leads to 1-3 rows between each actual measurement. So I do not believe that removing the blocks will lead to better results. That behavior is good to know for future reference though.
Thanks so much for your help, and now that I get a chance to do some actual work on my mac, I hope to post here more often. Thanks!