iShare
The IntelliCorp Community Site
Batching a LiveCompare DataSet

Working through a problem today with a customer we hit upon a way to batch a large dataset for use with a specialized LiveCompare HR action.  The basic problem was that trying to retrieve the payroll data for 10,000+ employees was running out of memory on SAP.  At first I thought we could simply cut the employees up into fixed batches are store them all as external data sources.  Unfortunately the dynamic nature of the employees - different analyses used different employees - meant that wouldn't work.

Since we already had a series of actions for discovering the employees and we knew the LiveCompare HR action could process about 1,000 employees comfortably the problem became one of how can we split the 10,000 employees into batches of 1,000.

Update (2010.01.06) - yes, I could have used the Start Loop / End Loop Collect to collect the payroll data one employee at a time but the time overhead of calling out to SAP 10,000+ times meant I needed to find a way to process the dynamically collected employees in batches.

The solution relies on the SQL Server T-SQL function - ROW_NUMBER(). First we use an Execute SQL (XSQL) action to compute the start row and end row numbers for each batch.  As a workflow designer you can easily change the page size - the number of rows to be sent in a single batch.  The SQL looks like this:

  • 1. Create the results table:
    CREATE TABLE Result
    (
    START_ROW INT,
    END_ROW INT
    )
  • 2. Insert the batch start row and end row pairs:
    DECLARE @page_size INT
    DECLARE @total_rows INT
    DECLARE @rows_to_read INT
    DECLARE @start_row INT
    DECLARE @end_row INT
    SET @page_size = 9
    SET @total_rows = (SELECT COUNT(*) FROM Source1)
    SET @end_row = 0
    WHILE @total_rows > 0
    BEGIN
    IF @total_rows > @page_size
    SET @rows_to_read = @page_size
    ELSE
    SET @rows_to_read = @total_rows
    SET @start_row = @end_row + 1
    SET @end_row = @end_row + @rows_to_read
    INSERT INTO Result (START_ROW, END_ROW) SELECT @start_row, @end_row
    SET @total_rows = @total_rows - @rows_to_read
    END

The highlight - SET @page_size = 9 - line is where you can tailor the size of each batch.  9 is the number of rows from the Source 1 that will make-up a single batch.

Now we can use the standard Start Loop Collect action to iterate through each batch.  The Result dataset will contain a single row with START_ROW and END_ROW fields.  We use another XSQL action to retrieve the rows from the input data for that particular batch we are processing.  The SQL looks like:

DECLARE @start_row INT;
DECLARE @end_row INT;
SET @start_row = (SELECT TOP 1 START_ROW FROM Source2);
SET @end_row = (SELECT TOP 1 END_ROW FROM Source2);
WITH OrderedPerns_CTE AS(
  SELECT PERNR, ROW_NUMBER() OVER (ORDER BY PERNR) AS 'ROW_NUMBER'
  FROM Source1
)
INSERT INTO Result (PERNR)
SELECT PERNR FROM OrderedPerns_CTE
WHERE ROW_NUMBER BETWEEN @start_row AND @end_row

The WITH statement prepares for us a temporary table called OrderedPerns_CTE that contains an ordered set of records from the input data with the accompanying row number.  The INSERT INTO then selects only those rows that fall within the bounds of the current batch.

The output of this XSQL action represents the batch of source data to be processed.  In this case we fed the rows into the HR Get Payroll Data action.

Lastly the End Loop Collection action collects into a single dataset all the payroll data for all 10,000 employees.

As a test, my sample workflow simply collects each batch of source data into the final dataset.  It's a good way to test that everything's working correctly - the dataset should be identical to the input data.  In effect the sample workflow is possible the most complex way of implementing XSQL's quick CopyRows option!


Posted 01-04-2010 8:57 PM by Chris Trueman
Filed under: