Tuesday, March 20, 2012

Remote Source and Destination Performance

Given the following scenario, what kind of performance should be expected in transferring about half a million rows? We are seeing about a 9 minute execution time. Is this reasonable for about 460,000 records moving from source to target, with 3 inner joins from the source?

Source: Server A.OLTPDB

Target: ServerA.DataMartDB

Server A is running SQL Server 2000. SSIS is running on a different machine, Server B.

The reason for this is that we are distributing the SSIS package for use with a BI product built on SSAS 2005 and the requirements are such that the client could very well have the source OLTP database on a different physical machine than the data mart.

My understanding is therefore that:

1. SSIS will do all of the heavy lifting on Server B.

2. Even though OLTPDB and DataMartDB are on the same server, it is expensive for Server B to pull the records from Server A and then send them back to Server B, but with SSIS being on a different machine, this is inevitable.

3. In the OLE DB Source Adapter, specifying "Table or view" has the effect of an OPENQUERY command, whereas a SQL command with straight SQL will be executed on Server B, and the former would be somewhat more performant.

Can you guys validate/dispel these assumptions?

TIA,

Rick

First, only you can determine performance stats... You'll have to test within your environment.

With that said, though, OPENQUERY will almost always be slower than defining your own SQL in the OLE DB source. Better still, explicitly define the columns you want returned in that SQL (instead of doing a "select * from...").
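For illustration (the table and column names here are hypothetical, not from Rick's schema), the difference in the OLE DB source looks like this:

```sql
-- "Table or view" access mode effectively issues this, dragging
-- every column across the wire:
SELECT * FROM dbo.Orders

-- A SQL command that names only the columns the data flow needs
-- moves far less data between the servers:
SELECT o.OrderID,
       o.CustomerID,
       o.OrderDate,
       o.TotalDue
FROM   dbo.Orders AS o
```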

Phil|||

Phil, thanks. I understand your point about baselining and benchmarking and all environments and scenarios being unique, but would you agree that pulling half a million records from a remote SQL box is not trivial?

I am just, at a high level, trying to assert that working with that large a dataset between two remote SQL boxes is expensive and that the performance is what it is. If, on the other hand, folks have done something like this and seen a response in mere seconds, then I know I am doing something very wrong.

Thanks,

Rick

|||

Assert.True wrote:

Given the following scenario, what kind of performance should be expected in transferring about half a million rows? We are seeing about a 9 minute execution time. Is this reasonable for about 460,000 records moving from source to target, with 3 inner joins from the source?

Impossible to say. There are so many variables in play here. 9 minutes seems like a very long time but it COULD be the query, it COULD be network latency, it COULD be any number of things. If you want some tips on performance tuning then follow this link:

Donald Farmer's Technet webcast
(http://blogs.conchango.com/jamiethomson/archive/2006/06/14/4076.aspx)

Assert.True wrote:

Source: Server A.OLTPDB

Target: ServerA.DataMartDB

Server A is running SQL Server 2000. SSIS is running on a different machine, Server B.

The reason for this is that we are distributing the SSIS package for use with a BI product built on SSAS 2005 and the requirements are such that the client could very well have the source OLTP database on a different physical machine than the data mart.

My understanding is therefore that:

1. SSIS will do all of the heavy lifting on Server B.

Explain "heavy lifting". The query to get the data gets executed on Server A. If you are doing any manipulation of the data in the pipeline, then that will occur on Server B. The insertion of the data will occur on Server A. Any of these things COULD be bottlenecks - you need to find which of them IS the bottleneck.

Assert.True wrote:

2. Even though OLTPDB and DataMartDB are on the same server, it is expensive for Server B to pull the records from Server A and then send them back to Server B, but with SSIS being on a different machine, this is inevitable.

It seems that way. Your network becomes a variable here.

Assert.True wrote:

3. In the OLE DB Source Adapter, specifying "Table or view" has the effect of an OPENQUERY command, whereas a SQL command with straight SQL will be executed on Server B, and the former would be somewhat more performant.

Some info here that may help:

SELECT *... or select from a dropdown in an OLE DB Source component?
(http://blogs.conchango.com/jamiethomson/archive/2006/02/21/SSIS_3A00_-SELECT-_2A002E002E002E00_-or-select-from-a-dropdown-in-an-OLE-DB-Source-component_3F00_.aspx)

Assert.True wrote:

Can you guys validate/dispel these assumptions?

Hope that has helped somewhat!

-Jamie

|||

Jamie/Phil - thanks for your feedback. It does help, and I am confident that the bottleneck has been isolated to a single query, whose timing I can reproduce by running it from local Query Analyzer (which repros the effect of having to pull all those records from "Server A").

Having changed all Merge Joins to Lookups (which did help), there is no question that pulling half a million rows (to perform transformation) is really expensive, and I guess it makes sense: would you try to do that in a web browser, for instance? No way (unless you want to crash it).

So, I think the options include limiting the result set or just dealing with the 10-minute (+/-) duration for the ETL to run. I will also look into source indexes, and I will watch Farmer's webcast; if anything changes, I will report back.
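A sketch of those two ideas - limiting the result set with an incremental extract, and checking the plan before indexing the source - with every table, column, and index name hypothetical:

```sql
-- 1. Limit the result set: pull only rows changed since the last
--    load. @LastLoadDate would normally come from an SSIS variable
--    or a control table; all object names are illustrative.
DECLARE @LastLoadDate datetime
SET @LastLoadDate = '20120319'

SELECT o.OrderID, o.CustomerID, o.OrderDate, o.TotalDue
FROM   dbo.Orders AS o
WHERE  o.ModifiedDate > @LastLoadDate

-- 2. Inspect the extract query's plan (SQL Server 2000 / Query
--    Analyzer syntax); if it shows a table scan on the filter
--    column, a supporting index may help:
SET SHOWPLAN_TEXT ON
GO
SELECT OrderID, CustomerID, OrderDate, TotalDue
FROM   dbo.Orders
WHERE  ModifiedDate > '20120319'
GO
SET SHOWPLAN_TEXT OFF
GO

CREATE INDEX IX_Orders_ModifiedDate ON dbo.Orders (ModifiedDate)
```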

Thanks again for the responses- especially on a weekend!

Rick

|||~500,000 rows (depending on row size) should be extremely trivial to process; the timings should easily be under one minute. I'm always amazed at the thought that 1,000,000 rows is a large amount of data. I'll put it into perspective: First Data Resources (one of the major credit card processing companies in the world) processes so many transactions per minute that if 500,000 rows took over 10 minutes to run, they'd be out of business -- they'd never be able to catch up.

With that said, the slow times are usually a factor of (and Jamie listed most of them already): network, local and remote hardware, row size, unique processing constraints (converting non-standard date formats to database date/time fields), whether or not you are logging each transaction, referential integrity checks, etc...

Have you looked at the execution plan for your query to see if adding indexes or something might help?|||

I'd like to add my 5 cents to this discussion. When you are trying to improve performance or find bottlenecks, you want to isolate and measure every piece involved. This paper offers a couple of very simple but helpful suggestions in the troubleshooting section. I hope you find it helpful.

|||Hi Rafael, there was no link provided (to the paper).|||

Assert.True wrote:

Hi Rafael, there was no link provided (to the paper).

There you go:

http://www.microsoft.com/technet/prodtechnol/sql/2005/ssisperf.mspx
