
Friday, March 30, 2012

Remove identity property of a primary key

Hi,
I have a table with a column named ID as primary key, this column has
the identity property. This ID is referenced by some other tables as foreign
key.
Is there a way I can use "alter table alter ID int not null...." TSQL
to remove this identity property?
Thanks!
WWW: http://hardywang.1accesshost.com
ICQ: 3359839
yours Hardy
1. drop the FK constraints
2. sp_rename the table with the identity to some other name
3. create a new table with the same name, without the identity
4. insert new select * from old
5. recreate the FK constraints
6. drop the old table (a sketch of this pattern follows below)
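A minimal T-SQL sketch of that pattern, using hypothetical names (dbo.MyTable with identity column ID, and one referencing table dbo.ChildTable whose MyTableID column points at it); a real script has to cover every referencing foreign key:

ALTER TABLE dbo.ChildTable DROP CONSTRAINT FK_ChildTable_MyTable    -- 1. drop the FK
GO
EXEC sp_rename 'dbo.MyTable', 'MyTable_old'                         -- 2. rename the old table
GO
CREATE TABLE dbo.MyTable                                            -- 3. same shape, no IDENTITY
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
GO
INSERT INTO dbo.MyTable (ID, Name)                                  -- 4. copy the data across
SELECT ID, Name FROM dbo.MyTable_old
GO
ALTER TABLE dbo.ChildTable ADD CONSTRAINT FK_ChildTable_MyTable     -- 5. recreate the FK
FOREIGN KEY (MyTableID) REFERENCES dbo.MyTable (ID)
GO
DROP TABLE dbo.MyTable_old                                          -- 6. drop the old table
GO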
"Hardy Wang" <hardywang@.hotmail.com> wrote in message
news:O7AqRnXrFHA.1252@.TK2MSFTNGP09.phx.gbl...
> Hi,
> I have a table with a column named ID as primary key, this column has
> the identity property. This ID is referenced by some other tables as
> foreign key.
> Is there a way, I can use "alter table alter ID int not null...." TSQL
> to remove this identity property?
> Thanks!
> --
> WWW: http://hardywang.1accesshost.com
> ICQ: 3359839
> yours Hardy
|||The only way is dropping the column. Try using EM (Enterprise Manager) if you really want to do
this. In EM, before saving changes, press the "Save change script" button, third
from the left in the toolbar. You will see what EM really does in order
to accomplish this task.
Example: (from northwind.orders)
BEGIN TRANSACTION
SET QUOTED_IDENTIFIER ON
SET ARITHABORT ON
SET NUMERIC_ROUNDABORT OFF
SET CONCAT_NULL_YIELDS_NULL ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
COMMIT
BEGIN TRANSACTION
ALTER TABLE dbo.Orders
DROP CONSTRAINT FK_Orders_Shippers
GO
COMMIT
BEGIN TRANSACTION
ALTER TABLE dbo.Orders
DROP CONSTRAINT FK_Orders_Employees
GO
COMMIT
BEGIN TRANSACTION
ALTER TABLE dbo.Orders
DROP CONSTRAINT FK_Orders_Customers
GO
COMMIT
BEGIN TRANSACTION
ALTER TABLE dbo.Orders
DROP CONSTRAINT DF_Orders_Freight
GO
CREATE TABLE dbo.Tmp_Orders
(
OrderID int NOT NULL,
CustomerID nchar(5) NULL,
EmployeeID int NULL,
OrderDate datetime NULL,
RequiredDate datetime NULL,
ShippedDate datetime NULL,
ShipVia int NULL,
Freight money NULL,
ShipName nvarchar(40) NULL,
ShipAddress nvarchar(60) NULL,
ShipCity nvarchar(15) NULL,
ShipRegion nvarchar(15) NULL,
ShipPostalCode nvarchar(10) NULL,
ShipCountry nvarchar(15) NULL
) ON [PRIMARY]
GO
DECLARE @v sql_variant
SET @v = N''
EXECUTE sp_addextendedproperty N'MS_Description', @v, N'user', N'dbo',
N'table', N'Tmp_Orders', N'column', N'OrderID'
GO
ALTER TABLE dbo.Tmp_Orders ADD CONSTRAINT
DF_Orders_Freight DEFAULT (0) FOR Freight
GO
IF EXISTS(SELECT * FROM dbo.Orders)
EXEC('INSERT INTO dbo.Tmp_Orders (OrderID, CustomerID, EmployeeID,
OrderDate, RequiredDate, ShippedDate, ShipVia, Freight, ShipName,
ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry)
SELECT OrderID, CustomerID, EmployeeID, OrderDate, RequiredDate,
ShippedDate, ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion,
ShipPostalCode, ShipCountry FROM dbo.Orders (HOLDLOCK TABLOCKX)')
GO
ALTER TABLE dbo.[Order Details]
DROP CONSTRAINT FK_Order_Details_Orders
GO
DROP TABLE dbo.Orders
GO
EXECUTE sp_rename N'dbo.Tmp_Orders', N'Orders', 'OBJECT'
GO
ALTER TABLE dbo.Orders ADD CONSTRAINT
PK_Orders PRIMARY KEY CLUSTERED
(
OrderID
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX CustomerID ON dbo.Orders
(
CustomerID
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX CustomersOrders ON dbo.Orders
(
CustomerID
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX EmployeeID ON dbo.Orders
(
EmployeeID
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX EmployeesOrders ON dbo.Orders
(
EmployeeID
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX OrderDate ON dbo.Orders
(
OrderDate
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX ShippedDate ON dbo.Orders
(
ShippedDate
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX ShippersOrders ON dbo.Orders
(
ShipVia
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX ShipPostalCode ON dbo.Orders
(
ShipPostalCode
) ON [PRIMARY]
GO
ALTER TABLE dbo.Orders WITH NOCHECK ADD CONSTRAINT
FK_Orders_Customers FOREIGN KEY
(
CustomerID
) REFERENCES dbo.Customers
(
CustomerID
)
GO
ALTER TABLE dbo.Orders WITH NOCHECK ADD CONSTRAINT
FK_Orders_Employees FOREIGN KEY
(
EmployeeID
) REFERENCES dbo.Employees
(
EmployeeID
)
GO
ALTER TABLE dbo.Orders WITH NOCHECK ADD CONSTRAINT
FK_Orders_Shippers FOREIGN KEY
(
ShipVia
) REFERENCES dbo.Shippers
(
ShipperID
)
GO
COMMIT
BEGIN TRANSACTION
ALTER TABLE dbo.[Order Details] WITH NOCHECK ADD CONSTRAINT
FK_Order_Details_Orders FOREIGN KEY
(
OrderID
) REFERENCES dbo.Orders
(
OrderID
)
GO
COMMIT
AMB
"Hardy Wang" wrote:

> Hi,
> I have a table with a column named ID as primary key, this column has
> the identity property. This ID is referenced by some other tables as forei
gn
> key.
> Is there a way, I can use "alter table alter ID int not null...." TSQ
L
> to remove this identity property?
> Thanks!
> --
> WWW: http://hardywang.1accesshost.com
> ICQ: 3359839
> yours Hardy
>
>

Remove Identity column constraint/mgmt

Hello...
We have some tables that had been using Identity columns as a Primary
Key...but we abandoned that approach a few weeks ago and adopted GUIDs
instead.
These tables are included in Publications that were originally on SQL
2000...but we upgraded to SQL 2005.
Is there a way I can remove the Identity constraint from the server
pub...does the Not for Replication handle this?
Or...would be better to Drop the column and reinitialize the subscribers?
thanks for any help
- will
You would be best to drop the identity column. NFR will not drop the
identity column, but it will stop the identity property being enforced when the insert is
caused by a replication process.
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.
This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
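For reference, NOT FOR REPLICATION is declared on the identity column itself rather than being a separate constraint; a minimal sketch with hypothetical table and column names (the GUID key reflects the move away from identity primary keys described above):

CREATE TABLE dbo.SampleOrders
(
OrderID int IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,   -- identity kept, but not enforced for replication inserts
OrderGuid uniqueidentifier NOT NULL DEFAULT NEWID(),
CONSTRAINT PK_SampleOrders PRIMARY KEY (OrderGuid)
)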
"dw" <dw@.discussions.microsoft.com> wrote in message
news:16F706A2-F1D3-4215-9A70-DDDD5575920E@.microsoft.com...
> Hello...
> We have some tables that had been using Identity columns as a Primary
> Key...but we abandoned that approach a few weeks ago and adopted GUIDs
> instead.
> These tables are included in Publications that were originally on SQL
> 2000...but we upgraded to SQL 2005.
> Is there a way I can remove the Identity constraint from the server
> pub...does the Not for Replication handle this?
> Or...would be better to Drop the column and reinitialize the subscribers?
> thanks for any help
> - will
|||Thanks for the help. I may test out going the NFR route...just so I don't
have to mess with the table schema too much. Will changing to NFR force a
Re-Init for the subscribers?
"Hilary Cotter" wrote:

> You would be best to drop the identity column, NFR will not drop the
> identity column but will not enforce the identity property if the insert is
> caused by a replication process.
> --
> Hilary Cotter
> Director of Text Mining and Database Strategy
> RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.
> This posting is my own and doesn't necessarily represent RelevantNoise's
> positions, strategies or opinions.
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
>
> "dw" <dw@.discussions.microsoft.com> wrote in message
> news:16F706A2-F1D3-4215-9A70-DDDD5575920E@.microsoft.com...
>
>
|||This is something you do on the subscriber - so for transactional
replication it will have no impact on reinitialization. For merge and
updateable subscribers it will.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"dw" <dw@.discussions.microsoft.com> wrote in message
news:C109A5B7-07DE-4E85-88C5-6BA71CBCCEFD@.microsoft.com...[vbcol=seagreen]
> Thanks for the help. I may test out going the NFR route...just so I
> don't
> have to mess with the table schema too much. Will changing to NFR force a
> Re-Init of for the subscribers?
> "Hilary Cotter" wrote:
sql

remove duplicates, add PK..need help

I am currently cleaning up a "loose" database, by adding primary keys on a
few particular tables that currently have none.
The Primary key will contain 2 fields, but before I add the pk, I need to
delete any duplicates. There should be none or very few that snuck by the
application, so deleting them is not a concern.
Here is a sample of the CURRENT table format:
CREATE TABLE OrderSalesReps(
OrderID int NOT NULL,
SalesRepID int NOT NULL,
Revenue_1 decimal(14,2) NOT NULL,
Revenue_2 decimal(14,2) NOT NULL
)
So the new Primary Key will be on [OrderID and SalesRepID]
but there may be duplicates that exist currently.
What is a clean query to delete dups before I create the Primary Key.
Thanks in advance.
[Please note: OrderID in this table is a Foreign Key which links to
Orders.OrderID - doesn't matter for this case, but thought I'd mention]
Hi Chris,
Which version of SQL Server are you working with?
In SQL Server 2005 the solution is pretty simple and fast:
WITH Dups AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY OrderID, SalesRepID
ORDER BY OrderID, SalesRepID) AS RowNum
FROM OrderSalesReps
)
DELETE FROM Dups WHERE RowNum > 1;
BG, SQL Server MVP
www.SolidQualityLearning.com
www.insidetsql.com
Anything written in this message represents my view, my own view, and
nothing but my view (WITH SCHEMABINDING), so help me my T-SQL code.
"Chris" <rooster575@.hotmail.com> wrote in message
news:utBY9xXhGHA.3904@.TK2MSFTNGP02.phx.gbl...
>I am currently cleaning up a "loose" database, by adding primary keys on a
>few particular tables that currently have none.
> The Primary key will contain 2 fields, but before I add the pk, I need to
> delete any duplicates. There should be none or very few that snuck by the
> application, so deleting them is not a concern.
> Here is a sample of the CURRENT table format:
> CREATE TABLE OrderSalesReps(
> OrderID int NOT NULL,
> SalesRepID int NOT NULL,
> Revenue_1 decimal(14,2) NOT NULL,
> Revenue_2 decimal(14,2) NOT NULL
> )
> So the new Primary Key will be on [OrderID and SalesRepID]
> but there may be duplicates that exist currently.
> What is a clean query to delete dups before I create the Primary Key.
> Thanks in advance.
> [Please note: OrderID in this table is a Foreign Key which links to
> Orders.OrderID - doesn't matter for this case, but thought I'd mention]
>
>|||Unfortunately, this solution has to work with SQL Server 2000 and 2005.
Thanks
-Chris
"Itzik Ben-Gan" <itzik@.REMOVETHIS.SolidQualityLearning.com> wrote in message
news:eMVD1ZYhGHA.4144@.TK2MSFTNGP02.phx.gbl...
> Hi Chris,
> Which version of SQL Server are you working with?
> In SQL Server 2005 the solution is pretty simple and fast:
> WITH Dups AS
> (
> SELECT *,
> ROW_NUMBER() OVER(PARTITION BY OrderID, SalesRepID
> ORDER BY OrderID, SalesRepID) AS RowNum
> FROM OrderSalesReps
> )
> DELETE FROM Dups WHERE RowNum > 1;
> --
> BG, SQL Server MVP
> www.SolidQualityLearning.com
> www.insidetsql.com
> Anything written in this message represents my view, my own view, and
> nothing but my view (WITH SCHEMABINDING), so help me my T-SQL code.
>
> "Chris" <rooster575@.hotmail.com> wrote in message
> news:utBY9xXhGHA.3904@.TK2MSFTNGP02.phx.gbl...
>|||First you need to decide on what your rules are for deciding which
duplicate to keep. If you're talking about exact duplicates (i.e., all
column values are identical) then you can select the unique records
into a temporary table, truncate the original table, add the primary
key, then insert the unique records back into the original table (you
could insert back then recreate the PK if you wanted as well). For
example:
SELECT DISTINCT OrderID, SalesRepID, Revenue_1, Revenue_2
INTO tmp_OrderSalesReps
FROM OrderSalesReps
TRUNCATE TABLE OrderSalesReps
ALTER TABLE OrderSalesReps
ADD CONSTRAINT PK_OrderSalesReps PRIMARY KEY CLUSTERED (OrderID,
SalesRepID)
INSERT OrderSalesReps (OrderID, SalesRepID, Revenue_1, Revenue_2)
SELECT OrderID, SalesRepID, Revenue_1, Revenue_2
FROM tmp_OrderSalesReps
--
Of course, if you end up with two rows with identical OrderIDs and
SalesRepIDs, but different Revenues then you need to decide which one
to keep. If it's really only a few rows then you could manually remove
them from your work table before inserting back to the original table.
If you wanted to use the highest revenue values found for an order/rep
(for example) then you might change your SELECT...INTO to something
like:
SELECT OrderID, SalesRepID, MAX(Revenue_1) AS Revenue_1, MAX(Revenue_2) AS Revenue_2
INTO tmp_OrderSalesReps
FROM OrderSalesReps
GROUP BY OrderID, SalesRepID
HTH,
-Tom.

Remove duplicates within pipeline

I have a situation where we get XML files sent daily that need uploading into SQL Server tables, but the source system producing these files sometimes generates duplicate records in the file. The tricky part is that the record isn't entirely duplicated. What I mean is that if I look for duplicates by grouping the key columns, having count(*) > 1, I find which ones are duplicates, but when I inspect the data on these duplicates, the other details in the remaining columns may differ. So our rule is: pick the first record, toss the rest of the duplicates.

Because we don't sort on any columns during the import, the first record kept of the duplicates is arbitrary. Again, we can't tell at this point which of the duplicated records is more correct. Someday down the road, we will do this research.

Now, I need to know the most efficient way to accomplish this in SSIS. If it makes it easier, I could just discard all the duplicates, since the number of them is so small.

If the source were a relational table, I could use a SQL statement to filter the records to remove the duplicates, but since the source is an XML file, I don't know how to filter these out in the pipeline, since the file has to be aggregated to search for dups.

Thanks

Kory

Never mind... I think I found exactly what I needed: The Sort Transform.

-Kory

|||

The only way I can think of is to use the sort or aggregate transformations. Have you explored those? Notice that those are fully blocking transformations, so memory usage and performance are things you may want to check.

Rafael Salas

|||

Yes, I thought the sort transform would do the trick- and it did for small files. Files with < 500,000 rows sorted immediately, within 5-10 seconds. Files > 500,000 or so just hung. Looking at task manager, the DTSDebugHost.exe kept climbing and my overall memory consumption was > 5G and I only have 3G total on the server.

I would have thought the performance would decrease linearly, and not go from 10 seconds to indefinite for just 200K more rows.

I've downloaded and installed the ExtraSort component, but get an error when I try to put it on the design surface. It complains that it wasn't installed correctly. I've uninstalled and reinstalled it twice. I know NSort is another option, but I don't really need sorting functionality, just duplicate removal.

SSIS comes with a sample solution that builds a component for removing duplicates, but as far as I can tell, the fields to pick to determine duplicates are the only fields that it passes through the pipeline. I need to remove dups based on 3 fields, but pass through the rest of the fields, like the sort component does.

Any other ideas out there?

Thanks

Kory
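One option that keeps all of the columns is to land the XML rows in a staging table and dedupe in T-SQL before the final insert; a minimal sketch, assuming SQL Server 2005 and hypothetical names (dbo.StagingTable with key columns KeyA, KeyB, KeyC):

WITH Ranked AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY KeyA, KeyB, KeyC ORDER BY KeyA) AS RowNum
FROM dbo.StagingTable
)
-- Keep an arbitrary first row per key and delete the rest, mirroring the "pick the first record" rule.
DELETE FROM Ranked WHERE RowNum > 1;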

|||

Hi, I'm having the same problem.

Does SSIS have the capability of separating out the duplicate records, or do I still need to use a query? Can you give me some advice on this?


|||What version of ExtraSort are you using and what platform is it running on?

Had no problems running ExtraSort file version 1.0.0.3 (98,304 bytes) on a 32-bit dev platform Win XP SP2 as well as Win2k3 SP1. The SQL Server Build on both is 2153, which "everyone" running IS should be on at this point. Have been unable to get ExtraSort to run natively on x64.

By default, the component installs to C:\Program Files\Ivolva Digital\ExtraSort Component\ExtraSort.dll.


Monday, March 26, 2012

remove article from trans replication?

Hi,
Using sql server 2005, we've got transactional replication going between two
servers. if i need to remove a couple of tables from the replication, do i
have to reinitialize the snapshot every time i do that? or is there an
easier way to achieve this?
thanks for any help on this,
Fred
Hi Paul, and thanks for your response,
I've checked out the page you referenced, and because I'm just learning
about replication, it's not clear to me whether it applies to our situation, and
also to SQL Server 2005. So, allow me to ask again with more details.
In our next release procedure, we're changing the name of a dozen fields in
a dozen tables. Some of them are primary key fields. We're also dropping a
couple of constraints. (I had thought we were dropping tables but just
learned differently.) Can these things be done without reinitializing the
snapshot?
What I've done this time is to remove all the tables from replication and
reinit the snapshot. Now the developer can make the schema changes
mentioned above, and then immediately after the release, I'll add the tables
back into the replication and reinit the snapshot again. Not pretty, but it
works. But that's why I'm asking about a way to make changes and have them
propagated through replication, without breaking the replication.
I hope this is clear, and makes some sense! thanks for your thoughts.
Fred
"Paul Ibison" <Paul.Ibison@.Pygmalion.Com> wrote in message
news:enB1dfLsHHA.3492@.TK2MSFTNGP02.phx.gbl...
> Please take a look here: http://www.replicationanswers.com/AddColumn.asp
> HTH,
> Paul Ibison
>
|||Hi Paul,
If you're still reading this, might it work to turn off DDL replication for
the publication, make the changes, then turn DDL replication back on?
I'm starting to implement via stored procedures now...
thanks,
Fred
"Paul Ibison" <Paul.Ibison@.Pygmalion.Com> wrote in message
news:OGPTFsSsHHA.3480@.TK2MSFTNGP02.phx.gbl...
> The way involving the least amount of work for SQL Server would be to drop
> the subscriptions to those tables which you are going to change radically
> (several fields and/or changing the PK fields) then drop the articles.
> Make the changes then add the articles, add the subscription and then run
> the snapshot agent and distribution agent. This effectively is a
> reinitialization of just those articles you are changing. For minor
> changes you'd use sp_repladdcolumn and sp_repldropcolumn.
> HTH,
> Paul Ibison
>
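For reference, the drop/re-add sequence described above maps onto the replication stored procedures roughly as follows; a minimal sketch run at the publisher, where the publication, article, subscriber and database names are hypothetical placeholders:

-- Drop the subscription and the article for the table being changed
EXEC sp_dropsubscription @publication = N'MyPub', @article = N'MyTable', @subscriber = N'all'
EXEC sp_droparticle @publication = N'MyPub', @article = N'MyTable', @force_invalidate_snapshot = 1

-- ... make the schema changes to dbo.MyTable here ...

-- Re-add the article and the subscription, then regenerate the snapshot
EXEC sp_addarticle @publication = N'MyPub', @article = N'MyTable', @source_object = N'MyTable'
EXEC sp_addsubscription @publication = N'MyPub', @subscriber = N'MySubscriber', @destination_db = N'MySubDb'
EXEC sp_startpublication_snapshot @publication = N'MyPub'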

Remove Article

sp_droparticle

>--Original Message--
>How do I remove a table (aka article) from replication?
I have 19 tables replicating to 15 servers, i need to
remove 4 of the 19 tables from the replication. How?
>Thanks so much for your assistance,
>Adam
>.
>
(applies to snapshot or transactional)
Paul

Remove all constraints in a database

Hi Everyone..

I want to remove all the constraints from all the tables in a database. I'm using SQL Server 2000.

will you please help me.. Thanks in advance

with regards

Fraijo

Select the database in the console tree of SQL Server Enterprise Manager, right-click, select All Tasks and then Generate SQL Script. Generate a script with all tables specified. Get the constraint drops from within that script.

I am curious as to why you wish to do this.
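If a T-SQL route is preferred, a sketch that works on SQL Server 2000 is to generate the DROP statements from INFORMATION_SCHEMA and run them by hand. Note that INFORMATION_SCHEMA.TABLE_CONSTRAINTS lists PRIMARY KEY, FOREIGN KEY, UNIQUE and CHECK constraints but not DEFAULT constraints, so review the output before executing it:

-- Foreign keys are ordered first so the primary/unique keys they reference can be dropped afterwards.
-- Copy the result set into a new query window and execute it.
SELECT 'ALTER TABLE [' + TABLE_SCHEMA + '].[' + TABLE_NAME + '] DROP CONSTRAINT [' + CONSTRAINT_NAME + ']'
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
ORDER BY CASE CONSTRAINT_TYPE WHEN 'FOREIGN KEY' THEN 0 ELSE 1 END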

Wednesday, March 21, 2012

Remote tables

Hi everybody:

I am working on a query that references a table in a remote database. It seems that in Management Studio 2005, any time you open a new query window you have to connect to a specific database instance; that's why, when I refer to the remote database table using a fully qualified name, it tells me the database name is unknown.

Do you have any solutions for this?

Thanks a lot

Which qualified name did you use ?


HTH, jens Suessmeyer.


http://www.sqlserver2005.de

|||

[server].[database].[owner].[table]

|||Hi,

if you connect to a database / server you can reach any (linked) server, and therefore any remote database, that is set up on that server machine. You don't need to specify the remote server at connection time; that's why you can specify it with the four-part name:

[server].[database].[owner].[table]

But the [server] has to be a linked server. If you don't know how to set one up, look in the BOL; there are some good, straightforward examples for it.

HTH, jens Suessmeyer.

http://www.sqlserver2005.de
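A minimal sketch of registering a linked server and then querying it with a four-part name (the server, database and table names below are hypothetical):

-- Register the remote SQL Server as a linked server and map the current login
EXEC sp_addlinkedserver @server = N'REMOTESRV', @srvproduct = N'SQL Server'
EXEC sp_addlinkedsrvlogin @rmtsrvname = N'REMOTESRV', @useself = 'TRUE'

-- Four-part name query against the remote table
SELECT TOP 10 * FROM REMOTESRV.RemoteDb.dbo.RemoteTable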

|||

You can try the OPENROWSET or OPENQUERY functions. To do this you need to enable the feature from "SQL Server Surface Area Configuration" - Ad Hoc Remote Queries (you need to check "Enable OPENROWSET and OPENDATASOURCE"). After you enable this option you can use these functions to retrieve data from another server or to get data from another format (including MS Excel, for example).

SELECT a.* FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;', 'SELECT GroupName, Name, DepartmentID FROM AdventureWorks.HumanResources.Department ORDER BY GroupName, Name') AS a;

Remote synchronization

Hi
We are considering an access front end app with back end tables on sql
server in our office. Is there a way to have a copy of this app on a laptop
for a tele-worker and have it synchronised from time to time using either a
3G data connection, or the office LAN when the tele-worker is in the office?
I presume we will have to make copies of both access front end and sql
server backend on the laptop and then the laptop sql server will synchronise
with the office sql server. Is there a better way to handle this?
Thanks
Regards
have a look at DB Ghost - http://www.dbghost.com - I use it to synchronize my
laptop with the production database when I'm online so I have a database at
all times to run reports on.
"John" wrote:

> Hi
> We are considering an access front end app with back end tables on sql
> server in our office. Is there a way to have a copy this app on a laptop
> for a tele-worker and have it synchronised from time to time using either a
> 3G data connection, or the office LAN when the tele-worker is in the office?
> I presume we will have to make copies of both access front end and sql
> server backend on the laptop and then the laptop sql server will synchronise
> with the office sql server. Is there a better way to handle this?
> Thanks
> Regards
>
>
|||For occasional testing... I simply backup/restore or attach/detach... Mostly
I use backup/restore because the production database does not have to come
down ( and the backups are already made).
Wayne Snyder, MCDBA, SQL Server MVP
Mariner, Charlotte, NC
www.mariner-usa.com
(Please respond only to the newsgroups.)
I support the Professional Association of SQL Server (PASS) and it's
community of SQL Server professionals.
www.sqlpass.org
"John" <John@.nospam.infovis.co.uk> wrote in message
news:eZjXQQABFHA.4004@.tk2msftngp13.phx.gbl...
> Hi
> We are considering an access front end app with back end tables on sql
> server in our office. Is there a way to have a copy this app on a laptop
> for a tele-worker and have it synchronised from time to time using either
a
> 3G data connection, or the office LAN when the tele-worker is in the
office?
> I presume we will have to make copies of both access front end and sql
> server backend on the laptop and then the laptop sql server will
synchronise
> with the office sql server. Is there a better way to handle this?
> Thanks
> Regards
>
|||I was under the impression that sql server supports synchronisation...or is
replication something different?
Thanks
Regards
"Wayne Snyder" <wayne.nospam.snyder@.mariner-usa.com> wrote in message
news:uzGwBYHBFHA.3140@.TK2MSFTNGP15.phx.gbl...
> For occasional testing... I simply backup/restore or attach/detach...
Mostly[vbcol=seagreen]
> I use backup/restore because the production database does not have to come
> down ( and the backups are already made).
> --
> Wayne Snyder, MCDBA, SQL Server MVP
> Mariner, Charlotte, NC
> www.mariner-usa.com
> (Please respond only to the newsgroups.)
> I support the Professional Association of SQL Server (PASS) and it's
> community of SQL Server professionals.
> www.sqlpass.org
> "John" <John@.nospam.infovis.co.uk> wrote in message
> news:eZjXQQABFHA.4004@.tk2msftngp13.phx.gbl...
laptop[vbcol=seagreen]
either
> a
> office?
> synchronise
>
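For reference, merge replication is the SQL Server feature aimed at intermittently connected clients such as laptops. A minimal sketch of creating a merge publication on the office server (the database, publication and article names are hypothetical, and the snapshot folder, agents and subscription setup are omitted):

-- Run in the publication database on the office server
EXEC sp_replicationdboption @dbname = N'OfficeDB', @optname = N'merge publish', @value = N'true'
EXEC sp_addmergepublication @publication = N'OfficePub', @sync_mode = N'native'
EXEC sp_addmergearticle @publication = N'OfficePub', @article = N'Customers',
@source_object = N'Customers', @source_owner = N'dbo'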


Tuesday, March 20, 2012

Remote SQL admin across the internet

Hi,

What is a secure way or accepted method to make database changes,IE: edit tables, add sprocs. ect. on a server being ran by an ASP.NET hosting service across the internet?

Thank you
-heywade
You should check out the Data Access Security section of "Building Secure ASP.NET Applications: Authentication, Authorization, and Secure Communication" in Microsoft's Patterns & Practices series.

Terri|||Thanks Terri,

That is a great article. A lot of that is out of my control though. An ASP.NET hosting service said that I would use Enterprise Manager to admin my "would be" database across the internet. They didn't mention using SSL, also the use of IPSec wouldn't work on my end with NAT. So I guess I'll have to follow up with them. I didn't know that SSL would work with an SQL server...cool.

Thanks again
-heywade

Remote Server login

I've got a problem: a system administrator cannot log on to a remote SQL server so he can back up the system tables. He's doing it through Active Directory; not sure what the problem is. Can anyone help please? The reason given says "not associated with a trusted SQL connection".
|||Is the Builtin Administrators group registered with the server with the sysadmin server role? And if not, is the server configured for Mixed Security Mode? How is your admin trying to connect?
|||He's logging in through Windows only, not in mixed mode.
|||Then you need to look into what account he uses and whether that account is explicitly present in the Logins of the server, or whether its group is present there. I'd make sure that Builtins is in the logins and tell him that he should fix it himself - include Domain Admins or his account in the Local Administrators group (Builtin Administrators).
|||Actually, if the BUILTIN/Administrators group is not part of SQL Server, that's a good thing. The DBA group should be the ONLY group with sysadmin rights on SQL Servers. The Domain Admins should be allowed backup permissions if that's part of their job description. Normally, the DBA does that also though.
|||Well, not if you're doing replication.
|||Well, I doubt they're doing replication if they can't even figure out how to log into a box. Codd, what were you thinking?
|||Hey, you know we're talking about Brett here (wink-wink), Codd knows what he does there...between margaritas... :D ...I'm losing my mind...OF COURSE IT'S a WRONG THREAD!!! Oh well, eventually Brett will show up and say what I thought he already did...How many of those shots did I have?
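For reference, a minimal sketch of checking for and granting a Windows (Active Directory) login on SQL Server 2000; the group name is a hypothetical placeholder, and adding it to sysadmin is only appropriate if that group really should administer the server (a point debated above):

-- Is the Windows group already a server login?
SELECT name FROM master.dbo.syslogins WHERE name = 'DOMAIN\SqlAdmins'

-- Grant the group access and, if appropriate, add it to the sysadmin role
EXEC sp_grantlogin 'DOMAIN\SqlAdmins'
EXEC sp_addsrvrolemember 'DOMAIN\SqlAdmins', 'sysadmin'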

Friday, March 9, 2012

Remote Foreign keys.. What should I do?

Right now I'm building a language centre DB. It is going to hold translations for data in tables in another DB (the English DB). The idea is that there is going to be a table in the Language DB for every language and table it is going to translate in the English DB.

So lets consider the following in the English DB:

PROJ_TBL_HELPTOPICS

-> PK_HELP_ID

-> TITLE

-> DESCR

PROJ_TBL_CATEGORIES

-> PK_CAT_ID

-> TITLE

-> DESCR

In the Language DB I want to hold translations for HELPPTOPICS and CATEGORIES, and I want translations for Spanish and Japanese.

PROJ_TBL_HELPTOPICS_ES

-> PK_TRANS_ID

-> FK_HELP_ID

-> TRANS_TITLE

-> TRANS_DESCR

The rest is going to be the layout as above

PROJ_TBL_HELPTOPICS_JA

PROJ_TBL_CATEGORIES_ES

PROJ_TBL_CATEGORIES_JA

The reasons I separated up the language DB from the english DB are:

1. The English DB has, and is going to have, a lot more tables, and is going to be heavily queried (plus I don't think the translations are going to be used anywhere near as often as the English). I figured the fewer tables, where possible, the better.

2. Putting translations in a different DB, I could take better advantage of collations specific to a language, for example when using Full-Text searching on Japanese text

Anyways, here's my question!?!

I want to link the foreign key column to the table it is translating primary key column (in the English DB). I want to be able to take advantage of Cascade on Delete. So when an item is deleted from EnglishDB.PROJ_HELP_TOPICS it is going to be deleted from LanguageDB.PROJ_HELP_TOPICS_[LANG ISO]. Is this done through Mirroring?

Mirroring is a high-availability feature, it is not the solution you are looking for.

Quote from BOL: "FOREIGN KEY constraints can reference only tables within the same database on the same server. Cross-database referential integrity must be implemented through triggers."

I'm not sure but if they are different servers or instances you may be able to create a linked server and then create triggers to maintain the integrity.
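A minimal sketch of the trigger approach for the cascade-on-delete requirement, assuming both databases sit on the same instance and using the table and column names from the post above (an illustration only, not a documented cascade mechanism):

-- Created in the English database: when help topics are deleted, remove their Spanish translations
CREATE TRIGGER TRG_HELPTOPICS_CASCADE_ES
ON dbo.PROJ_TBL_HELPTOPICS
AFTER DELETE
AS
BEGIN
DELETE T
FROM LanguageDB.dbo.PROJ_TBL_HELPTOPICS_ES AS T
INNER JOIN deleted AS D ON T.FK_HELP_ID = D.PK_HELP_ID
END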


Remote Foreign Keys

Right now I'm building a language centre DB. It is going to hold translations for data in tables in another DB (the English DB). The idea is that there is going to be a table in the Language DB for every language and table it is going to translate in the English DB.

So lets consider the following in the English DB:

PROJ_TBL_HELPTOPICS
-> PK_HELP_ID
-> TITLE
-> DESCR

PROJ_TBL_CATEGORIES
-> PK_CAT_ID
-> TITLE
-> DESCR

In the Language DB I want to hold translations for HELPPTOPICS and CATEGORIES, and I want translations for Spanish and Japanese.

PROJ_TBL_HELPTOPICS_ES
-> PK_TRANS_ID
-> FK_HELP_ID
-> TRANS_TITLE
-> TRANS_DESCR

The rest is going to be the layout as above
PROJ_TBL_HELPTOPICS_JA
PROJ_TBL_CATEGORIES_ES
PROJ_TBL_CATEGORIES_JA

The reasons I separated up the language DB from the english DB are:

1. The English DB has, and is going to have, a lot more tables, and is going to be heavily queried (plus I don't think the translations are going to be used anywhere near as often as the English). I figured the fewer tables, where possible, the better.

2. Putting translations in a different DB, I could take better advantage of collations specific to a language, for example when using Full-Text searching on Japanese text

Anyways, here's my question!?!

I want to link the foreign key column to the table it is translating primary key column (in the English DB). I want to be able to take advantage of Cascade on Delete. So when an item is deleted from EnglishDB.PROJ_HELP_TOPICS it is going to be deleted from LanguageDB.PROJ_HELP_TOPICS_[LANG ISO]. Is this done through Mirroring?

Maybe this is a dumb question, but why don't you make the language a column on the table instead of having n number of tables? What u are doing doesn't seem necessary. (A CREATE TABLE sketch of this layout follows below.)

PROJ_TBL_HELPTOPICS
PK_HELP_ID
PK_HELP_LANG
TITLE
DESCRIPTION (why abbreviate?)
(don't you need a FK to the category?)

PROJ_TBL_CATEGORIES
PK_CAT_ID
PK_CAT_LANG
TITLE
DESCRIPTION
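A sketch of that single-table design in T-SQL (the column types are assumptions; the composite primary key covers both the id and the language code):

CREATE TABLE PROJ_TBL_HELPTOPICS
(
PK_HELP_ID int NOT NULL,
PK_HELP_LANG char(2) NOT NULL,    -- ISO language code, e.g. 'en', 'es', 'ja'
TITLE nvarchar(200) NOT NULL,
DESCRIPTION nvarchar(max) NULL,
CONSTRAINT PK_PROJ_TBL_HELPTOPICS PRIMARY KEY (PK_HELP_ID, PK_HELP_LANG)
)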

Unless you plan on putting the different language databases on completely different servers I don't see the performance benefit in splitting it up. I do however see lots of headaches occurring as a result of de-normalising ur data structure. Not to mention your data access code: instead of only varying by one parameter (the lang - ie. "en") you need to write code that accesses different databases and different tables (by name). That's just nasty.

Having the lang as part of the PK means the lang (and id) will be indexed. Access should be very fast. It may be best to put the LANG column before the id column... depends on how you use it.

I dunno about your point about collations. Based on my limited understanding that may be relevant to simple stringA = stringB comparisons. That said, Japanese doesn't have things like different case etc so it's not really relevant... IMHO.

And in regards to your question, I dunno if FKs can work across separate DBs. My guess is NO!! If it was possible you'd need to create a cascading-delete FK from the English help topics table to all the other language databases' help_topics tables.

|||

Remember, the point of this forum is not to trash one another; we're supposed to be making helpful hints and tips.

My reason for different table per language
I want the ability to store the last 10 versions of a translation. So when translations are modified, I have the ability to go back to the last 10 versions of the same translation. In the past I've found that when I have a number of different translators come in, they have a tendency to change each other's translations, and I want to be able to undo their translation and be able to backtrack up to 10 versions of that same translation. Keeping that in mind, I was a little reluctant to throw that into a single table, LanguageDB.PROJ_TBL_HELPTOPICS with a LANG_ISO column, as the size of the table could potentially grow really big and hurt performance in the long run.

Right now EnglishDB.PROJ_TBL_HELPTOPICS has 20,000 rows; if I have 5 languages and am going to keep up to 10 versions of a translation, that means LanguageDB.PROJ_TBL_HELPTOPICS can potentially grow up to 1,000,000 rows. I figured if I split it up into language-specific tables (ie PROJ_TBL_HELPTOPICS_[LANG ISO]) then I could limit each table's size to only 10x that of the table it is translating, which I figured would help as this is going to be a Web Application and a reasonably highly queried table.

My reason for Column abbreviation
This is a safety precaution; let's say down the line I'm moving to a scenario where I feel comfortable putting all of this in a single DB (ie migrating to Oracle), then I'll be able to tell the difference between a translation column and an English column when I'm doing a JOIN in a query.

Remote Foreign Key Constraints
I know there is a way of doing it through linked servers, I just wanted to get feedback from experienced DBA's to get their input as far as whether that is the right way to go about it as they might have real world examples where it worked, or didn't work.

Conclusion
I'm trying to design my database for performance as this has the potential of becoming a web project with a huge user base, sorting, storing and updating an enormous amount of data. Of course, later on I'm going to hire proper DBAs to tweak the DBs for optimum performance, but I'm trying to do as much as I can from the get-go. I could be doing this wrong, but that's the reason why I'm posting it up here BEFORE I get started on it.

So, as I asked before, is DB mirroring / linking the way to go about this? Does anyone have any scenarios where they've used mirroring or linking?

|||

My apologies for the trash Tim, I've removed it.

Tim Dilbert:

My reason for different table per language
I want the ability to store the last 10 versions of translation. So when translations are modified, I have the ability to go back and the last 10 versions of the same translation. In the past I've found that when I have a number of different translators come in, they have a tendancy to change each others' translations, and I want to be able to undo their translation to and be able to backtrack up to 10 versions of that same translation. Keeping that in mind, I was a little reluctant throwing that in a single table in the LanguageDB.PROJ_TBL_HELPTOPICS with a LANG_ISO column as the size of the table could potentially grow really big, and hurt performance in the long run.

I hear what ur saying, you don't want a massive table. You could create a historical table; create another table called LanguageDB.PROJ_TBL_HELPTOPICS_HISTORY with the same schema as LanguageDB.PROJ_TBL_HELPTOPICS. Remove the identity property from the identity field and add a new field called DATECREATED. Then every time you save a new record to LanguageDB.PROJ_TBL_HELPTOPICS you can copy the old row to LanguageDB.PROJ_TBL_HELPTOPICS_HISTORY. You can then easily get the last X historical records by filtering by ID & LANG and ordering by DATECREATED DESC. You could copy the old data to the HISTORY table by use of a TRIGGER (having only one table makes this easy to manage, I hate lots of triggers) OR you just copy from your application when you save.

This approach keeps your primary tables lean and also gives you a complete history.
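
To make that concrete, here is a rough sketch of the history table and trigger. The column names beyond ID, LANG_ISO and DATECREATED are just placeholders for whatever your real schema has, so adjust accordingly:

-- run in the LanguageDB database
CREATE TABLE dbo.PROJ_TBL_HELPTOPICS_HISTORY
(
    ID int NOT NULL,               -- same value as the main table, no IDENTITY here
    LANG_ISO char(5) NOT NULL,
    TOPIC_TEXT nvarchar(max) NULL, -- placeholder for the translation column
    DATECREATED datetime NOT NULL DEFAULT (GETDATE())
)
GO

CREATE TRIGGER TRG_HELPTOPICS_HISTORY ON dbo.PROJ_TBL_HELPTOPICS
AFTER UPDATE
AS
BEGIN
    -- copy the row as it looked before the update into the history table
    INSERT INTO dbo.PROJ_TBL_HELPTOPICS_HISTORY (ID, LANG_ISO, TOPIC_TEXT)
    SELECT d.ID, d.LANG_ISO, d.TOPIC_TEXT
    FROM deleted d
END
GO

-- last 10 versions of one translation
SELECT TOP 10 *
FROM dbo.PROJ_TBL_HELPTOPICS_HISTORY
WHERE ID = 42 AND LANG_ISO = 'fr'
ORDER BY DATECREATED DESC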

Tim Dilbert:

My reason for Column abbreviation
This is a safety precaution; let's say down the line I move to a scenario where I feel comfortable putting all of this in a single DB (i.e. migrating to Oracle); then I'll be able to tell the difference between a translation column and an English column when I'm doing a JOIN in a query.

I'm not sure I understand that, but to each his own :) What DB are you currently using? I was assuming MS SQL.

Tim Dilbert:

Remote Foreign Key Constraints
I know there is a way of doing it through linked servers; I just wanted to get feedback from experienced DBAs on whether that is the right way to go about it, as they might have real-world examples where it worked, or didn't.

Well, I'm not an experienced DBA, but given that your goal is to improve performance, I think running your application through linked servers is probably going to negate any performance benefit of the smaller tables.

Tim Dilbert:

So, as I asked before, is DB mirroring / linking the way to go about this? Does anyone have any scenarios where they've used mirroring or linking?

Well, if your heart's set on some kind of distributed architecture, have you considered a SQL cluster?

|||

Tim Dilbert:

Remember, the point of this forum is not to trash one another; we're supposed to be giving helpful hints and tips.

...

I could be doing this wrong, but that's the reason why I'm posting it up here BEFORE i get started on it.

I think you are being perhaps a little over-sensitive; Sam Critchley's suggestions could save you a lot of grief in the future.

I do agree that this forum is not for trashing each other, but what do you do when your friend strikes a match to look for a gas leak? You say, here is a helpful hint or tip; don't do that, it's a bad idea, you'll regret it later.

Having tables for each language could get expensive, especially if you add a language or six. Adding a language would require code changes, an ideal opportunity to introduce a few bugs every now and again; always gives the users something to laugh about. Even other changes that didn't involve adding a language would need to be made in several places instead of one.

I shudder at the idea of splitting the data not only into separate tables but separate databases. That is another opportunity for something to go wrong, for versions to get out of step with each other. Murphy's law has not yet been repealed.

Your reasons for splitting up the database seem to be about performance: not having too many records, not having tables that are too big.

Having big tables is not a problem for a proper industrial-strength database. You said "potentially grow up to 1,000,000 rows" as if that was a lot. No. I have a database with over 200 tables, some of them with 20-30 million rows. One copy lives happily on my laptop for a current project; another copy supports 200 concurrent users actively updating on a clapped-out 12-year-old machine with 192 MB of memory.

However, there is the point about using different collation for different languages. I really don't know how much difference that would make, but if it was going to be significant, that might be a good reason for splitting into separate tables.

And if you were going to have separate web sites for each language, that might be a justification for splitting into separate databases.

BUT

I would want to keep all the data for all languages in a single central database where you do all the updating and maintenance, and periodically extract data for a particular language, to refresh the separate language database. That would be a one-way flow, data would go out to the satellite databases, but no changes would be made there, and no data would come back. The separate single language database should have the same structure, the same table names and column names, as the master database, so that you only need to maintain one version of the code.

I think you should have a master table of languages, and language keys in the translation tables, all languages in the same tables. Your searches would include language as a parameter, and even a full-text search (and you should try to avoid them as much as possible) would only look at records with a particular language.
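
As a minimal sketch of that layout, assuming a single translations table (all table and column names below are made up purely for illustration):

-- master table of languages
CREATE TABLE dbo.TBL_LANGUAGES
(
    LANG_ISO  char(5)      NOT NULL PRIMARY KEY,  -- e.g. 'en', 'fr', 'de'
    LANG_NAME nvarchar(50) NOT NULL
)
GO

-- every lookup takes the language as a parameter
DECLARE @TopicId int, @LangIso char(5)
SET @TopicId = 42
SET @LangIso = 'fr'

SELECT TRANSLATED_TEXT
FROM dbo.TBL_HELPTOPIC_TRANSLATIONS   -- hypothetical single table holding all languages
WHERE HELPTOPIC_ID = @TopicId
  AND LANG_ISO = @LangIso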

As Sam pointed out, you have a category table, but nothing else connects back to it? What is it for?

Have you given any thought to how you would identify which records were the 10 different translations of the same topic, and which was the latest one?

Should there be a version number?

Should there be a date added, maybe a date obsolete/superseded? I tend to add a date/time last updated and a UserId_last_updated_by too.

Might you want to store the name of the translator, or a key to link to a translators table?

You didn't say how much text there is in each topic: minimum, maximum, average? A line, a page, a fifty-page article, War and Peace? That's a very important design consideration. If more than a paragraph or two, I would consider separating the main text into one table and a bunch of keywords, search terms, categories, or whatever, maybe a summary, into another table. I would encourage users to search the summary and keywords rather than the full text, because it will be quicker; but still allow the option to search the full text, requiring another step, a conscious effort rather than the default.
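
If the topics do run long, a split along those lines might look something like this (again, purely illustrative names):

-- full text lives in its own table
CREATE TABLE dbo.TBL_HELPTOPIC_BODY
(
    HELPTOPIC_ID int NOT NULL PRIMARY KEY,
    BODY_TEXT    nvarchar(max) NOT NULL
)
GO

-- summary and keywords, searched by default
CREATE TABLE dbo.TBL_HELPTOPIC_SUMMARY
(
    HELPTOPIC_ID int NOT NULL PRIMARY KEY
        REFERENCES dbo.TBL_HELPTOPIC_BODY (HELPTOPIC_ID),
    SUMMARY_TEXT nvarchar(500) NULL,
    KEYWORDS     nvarchar(500) NULL
)
GO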

It would be a good idea to structure the data so that you only search the current versions, so a current/obsolete flag could be in an index with the language.
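
Pulling those questions together, one possible shape for a single translations table, with a version number, dates, a translator link, and a current flag that can sit in an index alongside the language. This is only an illustration of the idea, not a finished design:

CREATE TABLE dbo.TBL_HELPTOPIC_TRANSLATIONS
(
    TRANSLATION_ID  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    HELPTOPIC_ID    int NOT NULL,            -- key of the English help topic
    LANG_ISO        char(5) NOT NULL,        -- FK to the master language table
    VERSION_NO      int NOT NULL,            -- increments per topic/language
    TRANSLATOR_ID   int NULL,                -- FK to a translators table, if you add one
    IS_CURRENT      bit NOT NULL DEFAULT (1),
    DATE_ADDED      datetime NOT NULL DEFAULT (GETDATE()),
    DATE_SUPERSEDED datetime NULL,
    TRANSLATED_TEXT nvarchar(max) NULL
)
GO

-- searches filter on language and the current flag first
CREATE NONCLUSTERED INDEX IX_TRANSLATIONS_LANG_CURRENT
ON dbo.TBL_HELPTOPIC_TRANSLATIONS (LANG_ISO, IS_CURRENT, HELPTOPIC_ID)
GO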

Alternatively, keep obsolete stuff out of the way by moving it into an archive table, so it is out of the way when searching.

Optionally, don't save the full text of each version of the translation; save only one full-text version, plus the changes that need to be applied to get back to each of the other versions. Like a source code version system, in fact. You'd want to keep the latest version online in full: keep the current version in the main table, with a separate table for the change history. That would keep the overall size down a bit.

Well, that's my tuppence worth.

Good luck with your project.

|||

Alright, sorry for taking so long to get back, but I've done some extensive research on this and found out the answers.

Seems like my idea of splitting data into language-specific tables is the right thing to do. In SQL Server 2005, you can set collations at the server level, the database level and the column level. If I were to put all translations into a central table referencing languages through a "lang" column, I wouldn't be able to specify a collation (i.e. Chinese_Hong_Kong_90_xxxx, French_CS_AS etc.), since a collation on a per-row basis is not available.
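
For reference, a column-level collation is just declared on the column; for example (the table name and collations here are only illustrative):

CREATE TABLE dbo.PROJ_TBL_HELPTOPICS_FR
(
    ID          int NOT NULL PRIMARY KEY,
    TOPIC_TITLE nvarchar(200) COLLATE French_CS_AS NULL,
    TOPIC_TEXT  nvarchar(max) COLLATE French_CS_AS NULL
)
GO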

Collations & Full Text
When building a multilingual application, if you want to take full advantage of the features that SQL Server has to offer, you should be using a collation specific to that language so that the appropriate word breakers, stemmers, and iFilters are used at index and query time. You can specify which language you'd like to use when creating your full-text catalogue (though it doesn't say so in the article I'm linking here). I did read somewhere else that it is still a good idea to use the proper collation so your words are stemmed properly. For example, in German "Häuser" (houses) would also be stemmed as "haeuser" (houses) and "haus" (house) with a German collation and a German full-text catalogue; however, I think it only gets stemmed as "Häuser" (houses) or "haeuser" (houses) with Latin1_General_CI_AS and a German full-text catalogue (but I COULD BE WRONG).
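
In case it helps anyone else reading this: the language is actually given per column when the full-text index is created, something like the sketch below. The catalogue, table and key index names are only examples, and whether the German stemmer expands a given word is something you'd want to test:

CREATE FULLTEXT CATALOG HelpTopicsCatalog AS DEFAULT
GO

CREATE FULLTEXT INDEX ON dbo.PROJ_TBL_HELPTOPICS_DE (TOPIC_TEXT LANGUAGE 'German')
KEY INDEX PK_PROJ_TBL_HELPTOPICS_DE   -- unique index on the table's primary key
ON HelpTopicsCatalog
GO

-- the German stemmer should then match inflected forms of 'haus'
SELECT ID, TOPIC_TEXT
FROM dbo.PROJ_TBL_HELPTOPICS_DE
WHERE CONTAINS(TOPIC_TEXT, N'FORMSOF(INFLECTIONAL, haus)')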

Conclusion
After everything, I am liking the suggestions that LockH made; putting the older versions of translations in another table is a really good idea. That's going to save me a hell of a lot of headaches later on. Translations in the archived table will have two timestamps on them: the date the translation was entered into the DB, and the date it was replaced. The new translation is going to go into LanguageDB.TBL_HELP_TOPICS_TCH, updating the row that's in there.

Linked Server Overhead
Now, as far as the overhead of using linked servers: I think you all misunderstood me when I said that I want to use linked servers. I'm not talking about doing SELECTs through a linked server; the link would only be used in a trigger when a delete is done, so it cascades down to the translations attached to the help topic. In SQL Server you can dynamically concatenate a query string, then run it with 'EXECUTE'.

-- so
execute('select * from TBL_LANGS');
-- that will execute the same as this
select * from TBL_LANGS

Keeping that in mind, I was able to make a stored procedure which is going to be attached to an 'after delete' trigger on each of the tables, passing the table name and the PK of the deleted row; the stored procedure loops through all the languages in the EnglishDB.TBL_LANGS table, deleting the rows in the LanguageDB.TBL_HELP_TOPICS_[LANG ISO] table related to the deleted row. Since this is only going to be done on deletes, the overhead is minimal considering how helpful it is going to be for managing the data in LanguageDB.
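
For what it's worth, a rough sketch of that stored procedure is below. The linked server name LANGSERVER and all object and column names are placeholders, and a real trigger would also need to handle multi-row deletes:

CREATE PROCEDURE dbo.USP_DELETE_TRANSLATIONS
    @TableBaseName sysname,
    @DeletedId     int
AS
BEGIN
    DECLARE @LangIso char(5)
    DECLARE @Sql nvarchar(1000)

    DECLARE lang_cursor CURSOR FOR
        SELECT LANG_ISO FROM dbo.TBL_LANGS

    OPEN lang_cursor
    FETCH NEXT FROM lang_cursor INTO @LangIso

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- delete from the language-specific table on the linked server
        SET @Sql = N'DELETE FROM LANGSERVER.LanguageDB.dbo.' + @TableBaseName
                 + N'_' + RTRIM(@LangIso)
                 + N' WHERE HELPTOPIC_ID = ' + CAST(@DeletedId AS nvarchar(10))
        EXECUTE (@Sql)

        FETCH NEXT FROM lang_cursor INTO @LangIso
    END

    CLOSE lang_cursor
    DEALLOCATE lang_cursor
END
GO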

In EnglishDB.TBL_LANGS there is a column which is going to tell C# which DB to connect to to get the translation from. Since you open and close connections for every query you make to SQL Server, there is no extra overhead other than the extra query (which I think is unavoidable, because you'd be making that call anyway whether all the translations are in one database or not).

Anyways, here are the articles I read, which are actually amazing. I think you should read them in their entirety before you even get started designing any DB for multilingual content.

http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/ - Part 1

http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features,-part-2/ - Part 2

I wish more people would reply to this thread. It'd be good if there were some suggestions for which collation is best to use for which language. I haven't really been able to find anyone with real-world examples.

|||

After everything, I am liking the suggestions that LockH made; putting the older versions of translations in another table is a really good idea.

Errm... I actually suggested that... with the timestamp... (second post, first paragraph)

Well good luck.