Script Component as Source – SSIS

There are many situations which you can use power of scripting in SSIS data flow. Today I encounter a situation here , there is source flat file here with header and trailer records which have different structure from data rows. Data rows delimited by vertical line ( | ) , but header and trailer not. Also Trailer rows information needs to import to another destination.

So we have a situation to read both data rows delimited and trailer row which isn’t delimited ( it’s fixed length on each value )

This is one of the situations which Power of script component makes sense.

In this sample I used a Script Component as Source.


Problem:

This is content of source flat file:

HVWBB630 201012302010123005473333
Iseyapcm            |88071523|E|Y|N|N|N|Y|
Anitakaul1          |88072163|E|Y|N|N|N|Y|
efarrellssb         |88072233|E|Y|N|N|N|Y|
markghouse          |88072450|E|Y|N|N|N|Y|
dechopra            |88074988|E|Y|N|N|N|Y|
TVWBB630 00000005122010123020101230TRADE PROFILE

first row is header row which should be skipped so we don’t consider this .

and last row is trailer which have useful information for us, like:

Program Name : VWBB630

Record Count: 512

Date:  20101230

Data rows should import into a data table, and trailer information should import in another destination table.

Solution:

1- Add a Data flow task.

2- Add a Script Component Transformation as SOURCE .

3- Double click on script component, and in General tab, set language as Visual Basic


4- Go to Inputs and Outputs Tab,

rename the existing Output0 to OutputMainRows

and add another output named OutputTrailerRow


Under OutputMainRows add 8 columns with names : Column 0 , Column 1 , … , Column 7 and set data type of them as DT_STR

( this is because we have 8 data columns in this source file sample )

Under OutputTrailerRow add 3 Columns as below:

Column name                      data type

————————————–

ProgramName                      DT_STR

RecordCount                       DT_I4

Date                                  DT_DATE


4- Go back to general tab, and click on Edit Script,

Here is where we should read data from source file, in this sample I used System.IO.StreamReader .NET class which get us the ability to read data from file.

First of all , in the class definition, declare a variable of this type:

Dim sr As System.IO.StreamReader

the StreamReader class should be instantiated once the script component is in pre execute step, so I write this line in the PreExecute() method as below:

Public Overrides Sub PreExecute()
        MyBase.PreExecute()
sr = New System.IO.StreamReader("D:\SSIS\ScriptComponentAsSource\sourcefile.txt")
End Sub


Now we opened the file and we can read it with .ReadLine() method line by line, this is where we process source data and redirect each data to appropriate output.

We should use the CreateNewOutputRows method, and write our main code there, also note that for adding row to every output we should use this structure <OUTPUTNAME>Buffer.AddRow()

This is the code which should be written in CreateNewOutputRows() :

Public Overrides Sub CreateNewOutputRows()
        Dim lineIndex As Integer = 0

        While (Not sr.EndOfStream)
            Dim line As String = sr.ReadLine()
            If (lineIndex <> 0) Then ‘remove header row
                Dim columnArray As String() = line.Split(Convert.ToChar("|"))
                If (columnArray.Length > 1) Then
                    ‘main rows
                    OutputMainRowsBuffer.AddRow()
                    OutputMainRowsBuffer.Column0 = columnArray(0)
                    OutputMainRowsBuffer.Column1 = columnArray(1)
                    OutputMainRowsBuffer.Column2 = columnArray(2)
                    OutputMainRowsBuffer.Column3 = columnArray(3)
                    OutputMainRowsBuffer.Column4 = columnArray(4)
                    OutputMainRowsBuffer.Column5 = columnArray(5)
                    OutputMainRowsBuffer.Column6 = columnArray(6)
                    OutputMainRowsBuffer.Column7 = columnArray(7)
                Else
                    ‘trailer row
                    OutputTrailerRowBuffer.AddRow()
                    OutputTrailerRowBuffer.ProgramName = line.Substring(1, 7)
                    OutputTrailerRowBuffer.RecordCount = line.Substring(9, 10)
                    OutputTrailerRowBuffer.Date = DateTime.ParseExact(line.Substring(19, 8), "yyyyMMdd", New   System.Globalization.CultureInfo("en-US"))
                End If
            End If
            lineIndex = lineIndex + 1
        End While

    End Sub


When reading data completed we should close the StreamReader , the PostExecute() method is the best place to close connections.

Public Overrides Sub PostExecute()
        MyBase.PostExecute()
sr.Close()
End Sub


This is whole code in this script component:

‘ Microsoft SQL Server Integration Services Script Component
‘ Write scripts using Microsoft Visual Basic 2008.
‘ ScriptMain is the entry point class of the script.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent
    Dim sr As System.IO.StreamReader

    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
        sr = New System.IO.StreamReader("D:\SSIS\ScriptComponentAsSource\sourcefile.txt")
    End Sub

    Public Overrides Sub PostExecute()
        MyBase.PostExecute()
        sr.Close()
    End Sub

    Public Overrides Sub CreateNewOutputRows()
        Dim lineIndex As Integer = 0

        While (Not sr.EndOfStream)
            Dim line As String = sr.ReadLine()
            If (lineIndex <> 0) Then ‘remove header row
                Dim columnArray As String() = line.Split(Convert.ToChar("|"))
                If (columnArray.Length > 1) Then
                    ‘main rows
                    OutputMainRowsBuffer.AddRow()
                    OutputMainRowsBuffer.Column0 = columnArray(0)
                    OutputMainRowsBuffer.Column1 = columnArray(1)
                    OutputMainRowsBuffer.Column2 = columnArray(2)
                    OutputMainRowsBuffer.Column3 = columnArray(3)
                    OutputMainRowsBuffer.Column4 = columnArray(4)
                    OutputMainRowsBuffer.Column5 = columnArray(5)
                    OutputMainRowsBuffer.Column6 = columnArray(6)
                    OutputMainRowsBuffer.Column7 = columnArray(7)
                Else
                    ‘trailer row
                    OutputTrailerRowBuffer.AddRow()
                    OutputTrailerRowBuffer.ProgramName = line.Substring(1, 7)
                    OutputTrailerRowBuffer.RecordCount = line.Substring(9, 10)
                    OutputTrailerRowBuffer.Date = DateTime.ParseExact(line.Substring(19, 8), "yyyyMMdd", New System.Globalization.CultureInfo("en-US"))
                End If
            End If
            lineIndex = lineIndex + 1
        End While

    End Sub

End Class

Save and close the script editor.

5- Now you have two outputs in script component. connect OutputMainRows to the table which you want to insert data rows.


and connect OutputTrailerRow to the destination table which should store trailer information.


6- Now is the time to run package, This is output showed in the OutputMainRows:


and this is output in the OutputTrailerRow:


Hope this be helpful post.

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

Leave a Reply