🧹 How to Identify and Remove Duplicate Records in SQL Server
Duplicate records are a common headache in databases. They mess up reporting, inflate numbers, and introduce inconsistencies that can quietly break things. Whether you're cleaning legacy data or preventing future problems, knowing how to find and eliminate duplicates is an essential SQL skill.
Here’s a practical approach using a sample table we'll call xxyyzz.
Step 1: Explore the Table
Start by reviewing what’s in the table:
SELECT * FROM xxyyzz;
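If the table is large, pulling every row can be slow. Sampling a handful of rows with TOP is a lighter-weight way to get a feel for the data:

```sql
-- Peek at a sample instead of scanning the whole table
SELECT TOP (100) * FROM xxyyzz;
```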
Step 2: Detect Duplicates
Let’s say duplicates are defined by a combination of columns — for example, everything except a unique ID
or timestamp. You can group by those columns and count how many times each combination appears.
SELECT
    column1, column2, column3, ..., columnN,
    COUNT(*) AS DuplicateCount
FROM xxyyzz
GROUP BY
    column1, column2, column3, ..., columnN
HAVING COUNT(*) > 1;
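As a concrete sketch (assuming, purely for illustration, that xxyyzz has FirstName, LastName, and Email columns that together define a unique row), the query might look like:

```sql
-- Hypothetical columns; substitute the ones that define uniqueness for you
SELECT
    FirstName, LastName, Email,
    COUNT(*) AS DuplicateCount
FROM xxyyzz
GROUP BY FirstName, LastName, Email
HAVING COUNT(*) > 1;
```

Any group that comes back with a DuplicateCount greater than 1 is a set of duplicate rows.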
Replace column1 through columnN with the fields that define a unique row in your context.
Step 3: Remove Duplicates Using a CTE
Now comes the cleanup. You want to keep one row from each group and delete the rest. This is where a Common Table Expression (CTE) with ROW_NUMBER()
comes in.
WITH DuplicatesCTE AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY column1, column2, column3, ..., columnN
            ORDER BY ID -- or CreatedDate, or another tie-breaker
        ) AS rn
    FROM xxyyzz
)
DELETE FROM DuplicatesCTE WHERE rn > 1;
What this does:
- The ROW_NUMBER() function assigns a row number to each record in each duplicate group.
- Rows are grouped by the columns that define uniqueness.
- The row with rn = 1 (usually the oldest or lowest ID) is kept.
- Everything else (rn > 1) is deleted.
⚠️ Always back up your data or test in a staging environment before running deletes.
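One low-risk way to follow that advice is to run the same CTE with a SELECT first, so you can see exactly which rows the DELETE would remove:

```sql
-- Dry run: lists the rows that the DELETE above would remove
WITH DuplicatesCTE AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY column1, column2, column3, ..., columnN
            ORDER BY ID
        ) AS rn
    FROM xxyyzz
)
SELECT * FROM DuplicatesCTE WHERE rn > 1;
```

If the preview looks right, swap the final SELECT back to the DELETE.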
✅ Optional: Prevent Future Duplicates
If this table should never have duplicates in the first place, consider enforcing a constraint with a unique index:
CREATE UNIQUE INDEX UX_xxyyzz_UniqueRows
ON xxyyzz (column1, column2, column3, ..., columnN);
This will block any future insertions that would create another duplicate.
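If you would rather have SQL Server silently skip duplicate rows instead of failing the whole insert, the index can also be created with the IGNORE_DUP_KEY option; inserts then succeed with a warning, and only the duplicate rows are discarded:

```sql
-- Duplicate rows are dropped with a warning rather than raising an error
CREATE UNIQUE INDEX UX_xxyyzz_UniqueRows
ON xxyyzz (column1, column2, column3, ..., columnN)
WITH (IGNORE_DUP_KEY = ON);
```

Whether silently dropping rows is acceptable depends on your application; in most cases the plain unique index, with its explicit error, is the safer default.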
Final Thoughts
Cleaning duplicates is routine, but it pays off every time. It enhances data accuracy, streamlines queries, and optimises performance. The combination of GROUP BY, ROW_NUMBER(), and a targeted DELETE gives you a clean, reliable fix.
Let this be part of your standard SQL toolkit — and if you need help customising it for a specific dataset or situation, just ask.