Find and Delete all duplicate rows but keep one


In this post “Find and Delete all duplicate rows but keep one”, we are going to discuss that how we can find and delete all the duplicate rows of a table except one row. Assume that we have a table named tbl_sample which has four columns – EmpId, EmpName, Age, and City. This table has some duplicate data (in all the four columns) which needs to be deleted except the original one row. To demonstrate this, let’s create the dummy table with some sample data.

Here is the code to create the dummy table with sample data:

IF OBJECT_ID('dbo.tbl_Sample') IS NOT NULL
	DROP TABLE dbo.tbl_Sample
GO

CREATE TABLE dbo.tbl_Sample
(
	EmpId INT,
	EmpName VARCHAR(256),
	Age TINYINT,
	City VARCHAR(50)
)
GO

INSERT INTO dbo.tbl_Sample
(EmpId, EmpName, Age, City)
VALUES('1', 'Smith', 30, 'New York'),
('1', 'Smith', 30, 'New York'),
('1', 'Smith', 30, 'New York'),
('2', 'Adam', 35, 'California'),
('2', 'Adam', 35, 'California'),
('3', 'John', 42, 'Chicago'),
('3', 'John', 42, 'Chicago'),
('3', 'John', 42, 'Chicago'),
('3', 'John', 42, 'Chicago'),
('4', 'Jose', 36, 'Boston'),
('4', 'Jose', 36, 'Boston'),
('4', 'Jose', 36, 'Boston'),
('4', 'Jose', 36, 'Boston'),
('5', 'Elgar', 28, 'Seattle')
GO

Let’s have a look at the table data now.

SELECT * FROM dbo.tbl_Sample
GO
Dummy table with sample data

Dummy table with sample data

In above table, we can see that we have multiple duplicate rows for EmpId 1, 2, 3 and 4. To find the duplicate records from the above table, we can use this query. This query will output all the duplicate rows from the table with the duplicate rows count.

SELECT * FROM dbo.tbl_Sample GROUP BY EmpId, EmpName, Age, City HAVING COUNT(1) > 1
GO
Duplicate rows

Duplicate rows

Now, we need to write a query to find and delete all the duplicate records from the table excluding the original one record. Here is the query.

WITH CTE AS
(
	SELECT *, ROW_NUMBER() OVER(PARTITION BY EmpId, EmpName, Age, City ORDER BY (SELECT NULL)) Seq 
	FROM dbo.tbl_Sample
)
DELETE FROM CTE WHERE Seq > 1
GO

In the above code, we are using a CTE (Common Table Expression) to find the duplicates and then we are deleting all the records using DELETE command but keeping the only one record for each employee.

After executing this query, we have only one row for each employee as this.

Table with original data

Table with original data

Thanks for the reading. Please share your inputs in the comments.

Rate This
[Total: 5    Average: 2.2/5]

Gopal Krishna Ranjan

About Gopal Krishna Ranjan

Gopal has 8 years of industry experience in Software development. He has a head down experience in Data Science, Database, Data Warehouse, Big Data and cloud technologies and has implemented end to end solutions. He has extensively worked on SQL Server, Python, Hadoop, Hive, Spark, Azure, Machine Learning, and MSBI (SSAS, SSIS, and SSRS). He also has good experience in windows and web application development using ASP.Net and C#.

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.