Regular Expression in SSIS

Regular expressions are very useful for text processing.

Many tasks, such as validating a text against a pattern or finding the parts of a text that match a defined pattern, can be solved with regular expressions.

To find out more about Regular Expressions read here.

Today I came across a simple case of this in SSIS. SSIS works very well when combined with regular expressions.

Consider a case where the input column has values like this:

 col1

2011\09\23 rev.018
2011\09\26 rev.019
2011\09\25 rev.005
\\ rev.

The desired output is just the date part, like this:

col1

NULL
2011\09\23
2011\09\26
2011\09\25
NULL

Suppose we have a text file that contains the source data.

After the source, add a Script Component as a Transformation, select col1 as an input column, and create a new output column of type DT_STR named OutputCleansed.

Then set the language to C#, click Edit Script, and write the following script to apply the regular expression to the input column's data:

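public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Matches four word characters, a backslash, two word characters,
    // a backslash, and two more word characters (YYYY\MM\DD).
    Regex reg = new Regex(@"\w{4}\\\w{2}\\\w{2}");

    // NULL or blank input produces a NULL output.
    if (Row.col1_IsNull || string.IsNullOrEmpty(Row.col1.Trim()))
    {
        Row.OutputCleansed_IsNull = true;
        return;
    }

    // Run the match once; emit the matched text, or NULL when nothing matches.
    Match match = reg.Match(Row.col1);
    if (match.Success)
        Row.OutputCleansed = match.Value;
    else
        Row.OutputCleansed_IsNull = true;
}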
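
To try the same logic outside SSIS, the row handler can be sketched as a plain console program. This is only a minimal sketch, not the generated Script Component code: the names DateExtractor and ExtractDate are hypothetical, a null return value stands in for setting OutputCleansed_IsNull, and the Regex is assumed to be built once in a static field instead of once per row.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateExtractor
{
    // Same pattern as the Script Component, built once and reused for every row.
    private static readonly Regex DatePattern = new Regex(@"\w{4}\\\w{2}\\\w{2}");

    // Hypothetical helper: returns the matched YYYY\MM\DD text, or null where
    // the SSIS script would set OutputCleansed_IsNull = true.
    public static string ExtractDate(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
            return null;

        Match match = DatePattern.Match(input);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        // Sample rows from the input column above, plus a null row.
        string[] rows =
        {
            @"2011\09\23 rev.018",
            @"2011\09\26 rev.019",
            @"2011\09\25 rev.005",
            @"\\ rev.",
            null
        };

        foreach (string row in rows)
            Console.WriteLine(ExtractDate(row) ?? "NULL");
    }
}
```

Keeping the matching in a small helper like this makes it easy to check a pattern against sample values before pasting it into the Script Component.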

Note that to use regular expressions in the script you need to add this using directive:

using System.Text.RegularExpressions;

The expression used in this sample only fetches the YYYY\MM\DD part. The expression is: \w{4}\\\w{2}\\\w{2}
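
Because \w matches letters and underscores as well as digits, this expression also accepts values that are not numeric dates. If the column should only yield digits, a stricter variant with \d may be preferable. The sketch below compares the two; the class name PatternComparison and the sample value abcd\ef\gh are made up for illustration and do not come from the source data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // The pattern from the article: any word characters in each position.
    public static readonly Regex Loose = new Regex(@"\w{4}\\\w{2}\\\w{2}");

    // Assumed stricter alternative: digits only in each position.
    public static readonly Regex Strict = new Regex(@"\d{4}\\\d{2}\\\d{2}");

    public static void Main()
    {
        string[] samples = { @"2011\09\23 rev.018", @"abcd\ef\gh rev.001" };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0}: loose={1}, strict={2}",
                sample, Loose.IsMatch(sample), Strict.IsMatch(sample));
        }
    }
}
```

The first sample matches both patterns, while the second matches only the loose one. Neither pattern checks that the month or day is in a valid range.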

For other cases you can use any other regular expression. A quick reference for regular expressions can be found here:

http://www.regular-expressions.info/reference.html

http://www.regular-expressions.info/refadv.html

After the Script Component, add a destination and a Data Viewer.

The Data Viewer then shows the desired output produced by the Script Component using regular expressions.

Reza Rad
