How to use Script Component as Asynchronous Transformation

FacebooktwitterredditpinterestlinkedintumblrmailFacebooktwitterredditpinterestlinkedintumblrmail
FacebooktwitterlinkedinrssyoutubeFacebooktwitterlinkedinrssyoutube

Suppose you have a flat file which is not delimited or fixed width or Ragged right. the structure of data is not well formed for Flat File Connection manager to read, 

This is an example of data file:

date                     : 04/14/2010
customer name        : John doe
address                  : 2014 apple st. Chicago, IT 30000
sale order         : 04/15/2010
confirmation tag     : abcdefghi
date                     : 12/14/2010
customer name        : reza rad
address                  : Tehran Iran
sale order         : 02/15/2010
confirmation tag     : rererererere
date                     : 06/14/2009
customer name        : Abolfazl rad goudarzi
address                  : Tehran Iran
sale order         : 02/05/2010
confirmation tag     : fasgfsagas

You need to fetch data and fill them in appropriate fields in a Database table.

As you may know you can not use Flat File manager without any handy works to change data in appropriately manner. in fact you need a Transformation between Flat File Source and the Destination.

But how can you do this?

the answer is …

With Asynchronous Transformation.

 Let me talk about what the Asynchronous transformation is exactly:

there are two types of transformation in SSIS .

1- Synchronous Transformation

in this kind of transformation , the input data will process ROW BY ROW . and output is synchronize with the input. a good example of Synchronous transformation is the Derived Column Transformation. 

2- Asynchronous Transformation

in this kind of transformation, the input data will fetch all at first, ALL ROWS will read in first step, and then output will generate. there is no synchronization between input and output. an example of Asynchronous Transformation is the Aggregate Transformation.

Let’s back to sample data file above, We need to fetch data of 5 rows and combine them and make one output row. combination of input data needs to be done by a Script Component Transformation. And this is obvious that this script component should be Asynchronous Transformation.

Now we will implement a Script Component which work Asynchronously . Let me explain in steps:

1- Create new SSIS package,and add a Data Flow task in it. then add a Flat File Source in the Data Flow.

2- Configure the Flat File source, add a Flat File connection manager, connect it to the sample datafile above, let’s name it as ee.txt . in the Columns tab, set Column Delimiter as Colon {:} .


this Column delimiter will separates field names and field values . this means that there will be two column : Column 0 and Column 1. column 0 will consists of field names in flat file: date  – customer name – address – sale order – confirmation tag . and column 1 will consists of field values for each field name.

3- Add a Script Component Transformation and set it as transformation. double click on it and configure it.

4- in Input Columns tab, add two columns : column 0 , column 1


5- in Input and Outputs tab, select the Output 0, and rename it as myFlatOutput. set the SynchronousInputID as None. this will change your script component to asynchronous.


6- add these output columns under myFlatOutput :

column name           DataType

date                      date [DT_DATE]

customername         string [DT_STR]

address                  string [DT_STR]

saleorder                date [DT_DATE]

confirmationtag       string [DT_STR]


7- in script tab, set Script Language as Microsoft Visual C# 2008, and Edit Script.

8- override the Input0_ProcessInput method to fetch all data till end of file, as below:

public override void Input0_ProcessInput(Input0Buffer Buffer)
    {
        while (Buffer.NextRow())
        {
            Input0_ProcessInputRow(Buffer);
        }

        if (Buffer.EndOfRowset())
        {
            myFlatOutputBuffer.SetEndOfRowset();
        }
    }

9- override the Input0_ProcessInputRow method to add new rows to output, and combine values and make a full row, as below:

public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if(Row.Column0.ToLower().Trim() == "date")
        {
            myFlatOutputBuffer.AddRow();
        }

        switch(Row.Column0.ToLower().Trim())
        {
            case "date":
                myFlatOutputBuffer.date = DateTime.ParseExact(Row.Column1.Trim(), "MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture);
                break;
            case "customer name":
                myFlatOutputBuffer.customername = Row.Column1.Trim();
                break;
            case "address":
                myFlatOutputBuffer.address = Row.Column1.Trim();
                break;
            case "sale order":
                myFlatOutputBuffer.saleorder = DateTime.ParseExact(Row.Column1.Trim(), "MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture);
                break;
            case "confirmation tag":
                myFlatOutputBuffer.confirmationtag = Row.Column1.Trim();
                break;

        }
    }

Note that you should use AddRow() method to add new rows to the output of transformation.

10- Build the script and exit. this is the whole script:

/* Microsoft SQL Server Integration Services Script Component
*  Write scripts using Microsoft Visual C# 2008.
*  ScriptMain is the entry point class of the script.*/

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void PreExecute()
    {
        base.PreExecute();
        /*
          Add your code here for preprocessing or remove if not needed
        */
    }

    public override void PostExecute()
    {
        base.PostExecute();
        /*
          Add your code here for postprocessing or remove if not needed
          You can set read/write variables here, for example:
          Variables.MyIntVar = 100
        */
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if(Row.Column0.ToLower().Trim() == "date")
        {
            myFlatOutputBuffer.AddRow();
        }

        switch(Row.Column0.ToLower().Trim())
        {
            case "date":
                myFlatOutputBuffer.date = DateTime.ParseExact(Row.Column1.Trim(), "MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture);
                break;
            case "customer name":
                myFlatOutputBuffer.customername = Row.Column1.Trim();
                break;
            case "address":
                myFlatOutputBuffer.address = Row.Column1.Trim();
                break;
            case "sale order":
                myFlatOutputBuffer.saleorder = DateTime.ParseExact(Row.Column1.Trim(), "MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture);
                break;
            case "confirmation tag":
                myFlatOutputBuffer.confirmationtag = Row.Column1.Trim();
                break;

        }
    }

    public override void Input0_ProcessInput(Input0Buffer Buffer)
    {
        while (Buffer.NextRow())
        {
            Input0_ProcessInputRow(Buffer);
        }

        if (Buffer.EndOfRowset())
        {
            myFlatOutputBuffer.SetEndOfRowset();
        }
    }

    public override void CreateNewOutputRows()
    {
        /*
          Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
          For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
        */
    }

}

11- Great Done. and now, I want to use a Recordset destination and a DataViewer to show the results, Note that you can use your OLEDB destination or any other destination you want here.

Create a variable and name it as Variable, set datatype as object.

add a RecordSet Destination, double click on it, set variable name as User::Variable,

in Input columns tab, add all input columns.

right click on precedence constraint between script component and recordset destination, and select DataViewer.

hit add , and in next window hit ok.

12- run the package.

this is a schema of DataFlow :


and the result:


That’s all.



FacebooktwitterredditpinterestlinkedintumblrmailFacebooktwitterredditpinterestlinkedintumblrmail
FacebooktwitterlinkedinrssyoutubeFacebooktwitterlinkedinrssyoutube
Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for nine continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He wrote some books on MS SQL BI and also is writing some others, He was also an active member on online technical forums such as MSDN and Experts-Exchange, and was a moderator of MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content and the Power BI Pro Architecture published by Apress.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.

Leave a Reply