Parallel Data Warehouse (PDW) How-To: Avoid ShuffleMove and PartitionMove Operations

There are a couple process hungry operations that can be avoided when developing or migrating T-SQL queries into Microsoft’s Parallel Data Warehouse (PDW) environment. With proper planning and attention to detail, one can side-step two operations more commonly known as the Data Movement Service (DMS) ShuffleMove and PartitionMove operations.  To shed a bit of light on what DMS is, its basically a  service agent that moves data between the appliance compute nodes within the PDW ‘shared-nothing’ architecture.

In an attempt to set a hook of interest in on ya, I’ll share a recent project breakthrough associated with the bypassing of these two DMS operations.  Recently my PDW implementation team experienced a 600% performance boost in query runtimes after reformulating a few steps in a query that previously utilized the PartitionMove and ShuffleMove DMS operations….I would assume I have your attention now!

So lets get to it… the idea of a large table or the concept of a subquery should be a familiar to data developers, DBAs,  and the like. When I refer to table distributions this may sound foreign, however, I am referring to the method in which the developer has decided to distribute the data across the PDW nodes via a CREATE TABLE statement. There are only two methods of distribution, and one must be chosen when creating a table:

Replicated Tables – Replicated tables are an ideal method for storing dimensional tables, lookup tables, or other small reference tables that are commonly used in PDW joins. The developer should consider creating tables via the replicated method when they would like to have a local copy of a table on each node. Since every node has a local copy, joins to the replicated table should be quick and efficient as long as the DMS doesn’t have to arrange for datasets to be distributed on other nodes (see incompatible joins toward the bottom of this post) and as long as the replicated table is relatively small, say less than 5mil records (this is just an estimate, there is a science behind choosing distribution modes and I’ll dive further into this on a future post).

Sample:

CREATE TABLE [testpdwdb].[dbo].[replicatedTable]
(
[AcctCorp]
INT NULL,
[CutoverDate] DATE
NULL
)
WITH (DISTRIBUTION = REPLICATE);

Distributed Tables –  Distributed tables are best suited for fact tables, very large dimension tables, or larger tables that contain a column that has many distinct values used in table join predicates (columns that meet this criteria are considered candidate distribution columns).  Finding the best distribution column can be a time consuming process as you really have to understand the queries that run/may run against the table. After an appropriate column has been chosen, PDW’s hash distrubution algorithim takes care of evenly distributing the data across the appliance nodes.

Note: Even data distribution across the appliance compute nodes prevents queries from encountering  performance draining ‘data skew’ scenarios. Data skew occurs when the developer makes a poor choice in a distribution column which can result in the hash algorithm ‘clumping’  together data that have same distributed column values. Ultimately, this results in some nodes working overtime while others nodes are vacationing…it’s tough to call this a parallel data warehouse if all nodes are not working at the same time under similar workloads!

Sample:

CREATE TABLE [testpdwdb].[dbo].[distrubutedTable]
(
[AcctCorp]
INT NULL,
[CutoverDate] DATE
NULL
)
WITH (CLUSTEREDINDEX([Cutoff_Date]),
DISTRIBUTION = HASH
([AcctCorp]));

With that mini-lesson out of the way, let me formally introduce the ShuffleMove and the PartitionMove DMS operations as these  performance-draining operations kick in when tables are not distributed correctly.

The ShuffleMove: 

Lets start with the ShuffleMove. A ShuffleMove occurs when the PDW must move data between nodes to fulfill a query request.  These occur when a query joins between two or more tables that are not distributed on the same column (also knows as an incompatible join).  To create a compatible join, PDW must create a temp table on every node for the incompatible table, redistribute the data from the incompatible table on a compatible column across the nodes, join the temp table back to the other table,  and stream results to the client.

As you can see in the following diagram, there is a lot of data movement going on in a shuffle move to perform a simple query task.  Records in Fact_dealer_sales must be ‘shuffled’ to align with the make_id hash of  table fact_manufacturer_sales.

.

..

.

The Partition Move:
A Partition move is the most expensive DMS operation and involves moving large amounts of data to the Control Node and across all of the appliance distributions on each node (8 per node).  Partition moves are typically the result of a GROUP BY statement that does not include the distribution column.  When a query attempts to a GROUP BY on  a non-distribution column, the aggregation takes place locally on each nodes distribution (10 nodes * 8 per node = 80 distributions), then the results are sent up to the Control Node for final aggregation.

Note: the column fact_dreamcar_sales.make would not reside in the fact table as a text field (more likely to be make_id) and was included just for kicks. BTW Lamborghini makes one heck of a supercar, however, it would’ve made the ‘make’ column quite wide.  Additionally, Zonda is a ‘model’ and not a ‘make’ although the Zonda did ‘make’ a name for its manufacturer - Pagini (just occured to me that I’ve been watching too much of BBC’s Top Gear lately).

Here are some steps that be taken to avoid PDW ShuffleMove and PartitionMove operations:

  • CTAS, CTAS, CTAS…when you see a table that has a distribution method that doesn’t work for the majority of queries ran, use CTAS to reassign distribution modes or the distribution column (for syntax on how to do this, see http://saldeloera.wordpress.com/2012/07/12/creating-tables-on-sql-server-parallel-data-warehouse-pdw/)
  • Research the size (row count, width)  and common joins to every table that you create in the PDW. It is critical to align every table to a proper distribution method and column.
  • Keep joins between non-compatible data sets to a minimum.
  • Try to use a distributed column as a join predicates when joining between two distributed tables.
  • If neccessary, explicitly create temp tables via CTAS to store non-compatible datasets and align distributions in the WITH clause
  • Avoid using GROUP BY on non-distributed columns.
  • If GROUP BY is neccessary on a non-distributed column, use the DISTRIBUTED_AGG query hint. This will force dynamic data redistributoin on a table before query is processed.
  • Cost based optimization is the execution engines ability to derive the most efficient query plan based on table size, statistics, and cardinality. In AU2, there is no cost-based query optimizer so additional steps may be required (see CTAS hint above) to get adequate performance out of your queries.
  • Thoroughly examine the steps created in the DMS D-SQL (Distributed SQL) plans to identify ShuffleMove or PartitionMove operations.
  • Left Outer Joins between a replicated (Left table) and distributed table (Right table) are considered incompatible joins, use sparingly.
  • Right Outer Joins between a distributed(Left table) and replicated table (Right table) are considered incompatible joins, use sparingly.
  • Full Outer Joins are incompatible joins, use sparingly.

If interested, below I included the T-SQL skeleton and the DMS Query Plan for the query that was running for 4+ hours on the PDW. Note the T-SQL was pulled directly off of the SMP (where runs in 40 min) and was modified slightly to run on the PDW.

As you look at the T-SQL and the DMS Query Plan below, hopefully a couple of burning questions come to mind after reading this post – maybe something along the lines of “Are all of those joins to subquries aligned?”, or “Is the distribution key included in those group by’s ? ” or maybe even “WTH is this?”… we asked ourselves these questions, made some code changes (CTAS’d the subqueries into temp tables and aligned distribution columns before joining), and managed to get it down to a 30min runtime on the PDW…not bad considering all of the indexes the SMP required to get it to run in 40 min!

Declare @EndDate as date
Set @EndDate = '7/4/2012'

Declare @StartDate as date
Set @StartDate = ’7/4/2012′

Delete From testpdwdb.dbo.someFinancialMasterTable where fiscal_date = @EndDate
insert into testpdwdb.dbo.someFinancialMasterTable
SELECT
FROM
(
   SELECT
   FROM
  (
    SELECT
    FROM
(
      SELECT
      FROM
(
         SELECT
         FROM testpdwdb.dbo.someOrderSnapshotTable
         WHERE someDatecol = DATEADD(D, -1, @StartDate)
)c
      JOIN
      (
         SELECT
         FROM testpdwdb.dbo.someProductRevenueTable
         WHERE Fiscal_date = DATEADD(D, -1, @StartDate)
)r
      ON c.sys_oci = r.sys
      AND c.prin_oci = r.prin
      AND c.sub_acct_no_oci = r.sub_acct_no
      AND R.some_other_date_col = C.some_date_col
      JOIN
      (
        SELECT
        FROM testpdwdb.dbo.someMarketTable
      )M
      ON R.sys = M.sys
      and r.prin = m.prin
    LEFT JOIN
    (
        SELECT
        FROM testpdwdb.dbo.someHierarchyTable
     )h
    ON r.gl_acct_id = h.gl_acct_id
    ) m
GROUP BY
) a
FULL OUTER JOIN
(
SELECT
FROM
(
    SELECT
    FROM
    (
        SELECT
        FROM testpdwdb.dbo.someItemTable a
        full outer JOIN
        (
            SELECT
            FROM testpdwdb.dbo.someAccountItemTable b
            left join testpdwdb.dbo.someActivityTable c
            on b.sys = c.sys
            and b.prin = c.prin
            and b.SUB_ACCT_NO = c.sub_acct_no
            and b.date = c.date
            WHERE b.date BETWEEN @StartDate AND @EndDate
            and c.date BETWEEN @StartDate AND @EndDate
        ) b
    ON a.sys = b.sys
    AND a.prin = b.prin
    AND a.acct = b.sub_acct_no
    AND UPPER(a.product) = UPPER(b.product)
    AND a.date = b.date
    GROUP BY
) z
JOIN
(
    SELECT
    FROM
    testpdwdb.dbo.someMarketTable
)M
ON z.sys = M.sys
and z.prin = m.prin
GROUP BY
) X
FULL OUTER JOIN
(
    SELECT
    FROM
    (
        SELECT
        FROMtestpdwdb.dbo.someFinancialModelTable
WHERE DATE BETWEEN @StartDate AND @EndDate AND gl_acct_id IS NOT NULL
    )pm
    JOIN
    (
        SELECT
        FROM testpdwdb.dbo.someMarketTable
    )M
    ON pm.sys = M.sys
    and pm.prin = m.prin
    GROUP BY
) PM
ON X.some_entity_col = PM.some_entity_col
AND UPPER(X.some_customer_col) = UPPER(PM.some_customer_col)
AND UPPER(X.some_product_col) = UPPER(PM.some_product_col)
AND X.gl_acct_id = PM.gl_acct_id
) b
ON a.some_entity_col = b.some_entity_col
AND UPPER(a.some_customer_col) = UPPER(b.some_customer_col)
AND UPPER(a.some_product_col) = UPPER(b.some_product_col)
AND a.gl_acct_id = b.gl_acct_id
GROUP BY
) x
GROUP BY

…and the Parallel Data Warehouse DMS plan only a mother could love:

About these ads

About Sal De Loera
I have been involved in Information Technology, specifically Data Warehousing and BI, for over 8 years. Over these years I have assumed the following roles: Software Engineer Intern, Programmer/Analyst, Sr. Data Warehouse Engineer, Data Architect, Director of Information Technology, and as most recently as a Data Warehouse / BI consultant. Additionally, my IT experience spans the following industries: Insurance, Restaurant, Mass Media, Education, and Local Government. I have created this blog primarily as a means to not only share information and case studies regarding Business Intelligence, Data Warehousing, and Data Modeling, but also receive input from the community on some of the topics that I'l cover. Looking forward to your feedback and to the exciting opportunities that come with new people and technology!

5 Responses to Parallel Data Warehouse (PDW) How-To: Avoid ShuffleMove and PartitionMove Operations

  1. Great Post Sal !! Keep up the good work. I would like to add that PDW or any MPP will always preform the best with the right architecture (fact & dimension tables). You can put different architectures on a MPP but there will be a lot of work/rework to get it to preform i.e. lots of CTAS ;)

    • Sal De Loera says:

      Thanks for the feedback and to your point, absolutely! MPP performance can be hampered by the same db architecture pitfalls that slow SMP systems to a crawl. Furthermore, well-wrought SMP architectures have the capability to trump an MPP solution in terms of performance and usability – provided that data volumes stay within a supportable range, do not undergo significant factors of scale, and have a well-designed db.

      Some organizations may overlook the root cause data warehousing initiative failures (see poorly planned database and/or ETL architectures) in hopes finding their DW savior in the form of a MPP appliance. This could ultimately lead to some organizations ending up with an MPP that performs comparably or even slower than their SMP instance!

      Alas, some database environments are just too small or too frumpy to reside on an MPP appliance!

      As a sidenote, it will be interesting to see how a 3NF model (vs Multidimensional) performs under the MPP architecture. I am in ongoing contact with an organization that will be utilizing an industry specific 3NF model (for television and broadcast) as their primary PDW model, updates to follow!

  2. Pingback: Great BLOG posts part 2 « Garrett Edmondson

  3. I love your blog.. very nice colors & theme.
    Did you design this website yourself or did you

    hire someone to do it for you? Plz answer back
    as I’m looking to design my own blog and

    would like to find out where u got this from. appreciate it

    • Sal De Loera says:

      Thanks, glad you like the look and feel and more importantly the content! There is a standard template called ‘Corporate’ that you can apply to your free wordpress site. It will set all of the layouts, borders, and fonts that you see on my site.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 60 other followers

%d bloggers like this: