Delete Duplicate Rows in Excel

I have a script that merges multiple Excel files into one master file. After the merge, I’d like to delete all duplicate rows in that file so that only unique records remain. I’ve looked at the Excel Applescript Reference but can’t seem to figure this out. I’d appreciate some help.

Thanks.

Have you looked at Advanced Filter’s Unique only option?

When you say “duplicate rows” does that mean that each cell in the row is to be checked?
i.e.

Smith 1
Jones 2
Smith 3

is that OK
or
should one of the Smith’s go, if so which Smith?

OR (he hopes. he hopes) is there a column where a duplicate in that column implies that the other columns are the same. i.e. two Smiths implies that the number is the same and it doesn’t matter which row gets deleted?

Mike,

In your example above, the Smith entries are unique since not all cells in their respective rows are identical. If you had Smith 1 in row 3, then that would be duplicated. So, yes, each row’s data needs to be unique.

I looked at this function:

advanced filter range “Database” action filter in place ¬
criteria range range “Criteria”

but don’t really know how that works.

To use Advanced Filter, each column has to have a unique header and there can be no blank rows.
This will copy myMessyRange to myCleanRange without duplicates. The optional last line deletes myMessyRange, leaving the un-duplicated data.

tell application "Microsoft Excel"
	set myMessySheet to get worksheet "Sheet1" of workbook "Workbook2.xls"
	set myMessyRange to get current region of range "A1" of myMessySheet
	
	set myCleanRange to get resize (get offset (myMessyRange) column offset ((count of columns of myMessyRange) + 1)) row size 1
	
	advanced filter myMessyRange action filter copy copy to range myCleanRange with unique
	
	-- optional deletes dirty range
	delete range (entire column of (get resize myMessyRange column size ((count of columns of myMessyRange) + 1))) shift shift to left
end tell

As you clearly indicated, this will only work IF the header is unique. Therein lies the problem. Most of these files share the same top row header (and hence the need to get rid of them). I tried your script and sure enough, it will work if the header is not duplicated elsewhere.

I will try to poke around to see how I may modify your script so that I may de-dupe if the header is NOT unique. Actually, the script works when blank spaces were present. :wink:

Maybe a tall order but I have to get this to work. Thanks for you insight and direction so far!

This will insert a header row, fill it with unique headers and delete after Filter has done its work.

tell application "Microsoft Excel"
	set myMessySheet to get worksheet "Sheet1" of workbook "Workbook1.xls"
	set myMessyRange to get current region of range "A1" of myMessySheet
	
	insert into range (entire row of (row 1 of myMessyRange)) shift shift down
	
	set headerRow to get resize (row 1 of myMessyRange) row size 1 column size (count of columns of myMessyRange)
	
	set formula of headerRow to "=\"header\"&column()"
	set value of headerRow to get value of headerRow
	
	set myCleanRange to get resize (get offset (myMessyRange) column offset ((count of columns of myMessyRange) + 1)) row size 1
	
	advanced filter myMessyRange action filter copy copy to range myCleanRange with unique
	delete range (entire row of headerRow) shift shift up
	-- optional deletes dirty range
	delete range (entire column of (get resize myMessyRange column size ((count of columns of myMessyRange) + 1))) shift shift to left
end tell

Aw man, I spent the last 2 hours trying do this exact workaround of inserting a dummy row header, filter, then deleting the dummy. Now, I feel like the dummy! Many thanks for the quick response by the way. Much appreciated. I am sure this is good to go.

Marlon

Mike,

I’ve tried to implement this script as part of a subroutine in my application. I have a popup button where if the user selects xls files to merge, a checkbox button “duplicate” becomes visible and the default is unchecked (0 state). During the merge of the xls files, if the state of the “duplicate” is 1, I need to run the delete duplicate rows subroutine deleteRows.

tell application "Microsoft Excel"
		
		open text file filename output_ data type delimited text qualifier text qualifier double quote with comma
		set display alerts to false
		save active workbook in output_ as default save format
		if state of button "duplicate" of window "merger" is 1 then
			deleteRows()
		end if
		
		quit

on deleteRows()

set myMessyRange to get current region of range "A1"
	insert into range (entire row of (row 1 of myMessyRange)) shift shift down
	set headerRow to get resize (row 1 of myMessyRange) row size 1 column size (count of columns of myMessyRange)
	
	set formula of headerRow to "=\"header\"&column()"
	set value of headerRow to get value of headerRow
	
	set myCleanRange to get resize (get offset (myMessyRange) column offset ((count of columns of myMessyRange) + 1)) row size 1
	
	advanced filter myMessyRange action filter copy copy to range myCleanRange with unique
	delete range (entire row of headerRow) shift shift up
	-- optional deletes dirty range
	delete range (entire column of (get resize myMessyRange column size ((count of columns of myMessyRange) + 1))) shift shift to left
end deleteRows

I am not certain the subroutine can be called in the middle of the script where it needs to look at the state of a button on a separate window. I get a syntax error where “region” is highlighted in the deleteRows sub. Your thoughts…

The script needs to be inside a Tell block

on deleteRows()
    tell application "Microsoft Excel"
        -- script
    end tell
end deleteRows

You may also need to call the method with the my keyword

If .... my deleteRows() End If

Although the subroutine was being called from within the Tell block of Excel application, I still need to indicate the Tell block for deleteRows()? Yep, that worked…now, the code is being hung up on this section where the state of button is being checked:

...
tell application "Microsoft Excel"
		
		open text file filename output_ data type delimited text qualifier text qualifier double quote with comma
		set display alerts to false
		save active workbook in output_ as default save format
		if state of button "duplicate" of window "merger" is 1 then
			my deleteRows() *******hung up here*************
		end if
		if state of button "duplicate" of window "merger" is 0 then
			quit
		end if
		
	end tell

Am I setting the button correctly? This is driving me nuts! Thanks for all your help (as a sidebar, Stefan also pointed out the issue with the missing tell block)

I don’t know how you are setting the button.

Is “the button” a checkbox on a sheet (presumably the active sheet of that book) of the Excel workbook “merger”?

Edit: I did some investigation and got this script to probe the value of a checkbox (from the Forms menu) on a sheet.

tell application "Microsoft Excel"
	if (value of checkbox "Check Box 2" of active sheet) = checkbox on then
		display dialog "on"
	else
		display dialog "off"
	end if
end tell

The checkbox is on the main NSWindow “merger” (the UI).

I’m not a computer guy. I’m an Excel guy. So I’m not sure what you mean by an NSWindow.

If the checkbox is not on an Excel sheet, getting its value should probably be done outside an Excel try block.

SOLVED!

My if-else syntax was incorrect. All’s well and thanks Mike.