Friday, March 30, 2012

Remove Duplication

Hi
Is there a way to remove duplicated records
Also if there is a way what about records that are the same except the key
field, is there a way to remove them?
Thank you in advance,
Shmuel Shulman
SBS Technologies LTDRather than removing duplicates they should be prevented. The designer shoul
d
be spanked, at least. :)
Seriously:
First of all identify duplicates (Qry#1):
select <column_list>
from <table>
group by <column_list>
having (count(*) > 1)
(Only include columns that aren't actually unique in the table).
Then decide which one's to drop (Qry#2):
select <unique_column>
from <table>
inner join (
..put Qry#1 here
) Duplicates
on Duplicates.<Col1> = <table>.<Col1>
and ... <-- put the rest of the columns
as additional conditions
Then find the keys in all foreign tables and decide whether to drop foreign
rows or update them to reference the keys of the rows you intend to keep.
I'd suggest you post DDL adn sample data for a more accurate solution.
ML
http://milambda.blogspot.com/|||Believe it or not, but "adn" is another way to spell "and". It'll be a big
hit in 2006. ;)
ML
http://milambda.blogspot.com/
"ML" wrote:

> Rather than removing duplicates they should be prevented. The designer sho
uld
> be spanked, at least. :)
> Seriously:
> First of all identify duplicates (Qry#1):
> select <column_list>
> from <table>
> group by <column_list>
> having (count(*) > 1)
> (Only include columns that aren't actually unique in the table).
> Then decide which one's to drop (Qry#2):
> select <unique_column>
> from <table>
> inner join (
> ...put Qry#1 here
> ) Duplicates
> on Duplicates.<Col1> = <table>.<Col1>
> and ... <-- put the rest of the columns
> as additional conditions
> Then find the keys in all foreign tables and decide whether to drop foreig
n
> rows or update them to reference the keys of the rows you intend to keep.
> I'd suggest you post DDL adn sample data for a more accurate solution.
>
> ML
> --
> http://milambda.blogspot.com/
>|||>> Is there a way to remove duplicated records [sic] <<
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are.
The right way to always declare tables with a key. In fact, if there
is no key, then by definition that thing is NOT a table at all. Fire
the moron who did not provide a key.
If this is the result of the aforesaid moron using IDENTITY in a table,
then run over him in the parking lot so that the quality of databases
is improved everywhere.
DELETE FROM StinkingNonTable
WHERE <pseudo_key_column>
IN (SELECT MAX (<pseudo_key_column> ),
<non-key column list>
FROM StinkingNonTable
GROUP BY <non-key column list>
HAVING COUNT(*) > 1);
Run this a few times until zero rows are dropped, then fix the table
and hunt down the moron.

No comments:

Post a Comment