I mistakenly allowed inserting duplicate records with the same name and email in my SQL table. I know how to find duplicates based on a single column using GROUP BY, but I need to identify and remove duplicates that share both name and email values.
What’s the best way to query and clean up these records efficiently?
Hey! I totally understand, duplicates like these can be a pain
. I’ve dealt with this exact issue before, and my favorite approach is using ROW_NUMBER().
It’s clean and super precise:
WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id) AS rn
FROM your_table
)
DELETE FROM CTE
WHERE rn > 1;
What I usually do first is run:
SELECT * FROM CTE WHERE rn > 1;
…just to double-check which rows will be deleted. Works really well even on big tables and gives you a safety net before removing anything.
Sometimes I prefer a more cautious approach, especially if I’m on a production database.
Here’s what I do:
CREATE TABLE temp_table AS
SELECT MIN(id) AS id
FROM your_table
GROUP BY name, email;
DELETE FROM your_table
WHERE id NOT IN (SELECT id FROM temp_table);
DROP TABLE temp_table;
The idea here is simple: you keep only one row per name + email. I like this because you can peek at temp_table
first to make sure everything looks right. It’s a nice safety net when you don’t want to risk deleting the wrong rows.
Just want to add up, if your table isn’t huge, sometimes I just create a fresh table with distinct rows:
CREATE TABLE new_table AS
SELECT DISTINCT name, email, other_columns
FROM your_table;
-- Optional: Replace the old table
DROP TABLE your_table;
ALTER TABLE new_table RENAME TO your_table;
I like this method because it avoids complex deletes entirely. Works great when I’m cleaning up a messy import and just want a clean slate.