Regular Expressions are very useful expressions for text processing
there are many usages like validation a text against a pattern or find appropriate parts of text with defined patterns which can be solved with Regular Expressions.
To find out more about Regular Expressions read here.
Today I find a simple case of such things in SSIS, SSIS will act great if combined with Regular Expressions.
Consider a case when Input column has values like this:
col1
2011\09\23 rev.018
2011\09\26 rev.019
2011\09\25 rev.005
\\ rev.
Desired output is to fetch out just date part like this:
col1
NULL
2011\09\23
2011\09\26
2011\09\25
NULL
Suppose we have a text file which contains source data,
after the source add a Script Component as Transformation and set the Col1 as Input Column, and create new output columns of type DT_STR, name this as OutputCleansed
then Set language as C#, and Edit script, write this script to apply the Regular Expression to input column’s data as below:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
Regex reg = new Regex(@"\w{4}\\\w{2}\\\w{2}");
if (Row.col1_IsNull || string.IsNullOrEmpty(Row.col1.Trim()))
Row.OutputCleansed_IsNull = true;
else
if (reg.IsMatch(Row.col1))
Row.OutputCleansed = reg.Match(Row.col1).Groups[0].Value;
else
Row.OutputCleansed_IsNull = true;
}
Note that for using Regular Expressions in Script you need to add this using part :
using System.Text.RegularExpressions;
The expression used in this sample is just to fetch YYYY\MM\DD part , and the expression is : \w{4}\\\w{2}\\\w{2}
but for any other cases you can use any other regular expression, a quick reference of regular expressions can be found here:
http://www.regular-expressions.info/reference.html
http://www.regular-expressions.info/refadv.html
After the Script Component add a destination, and add a Data Viewer.
this is a sample of desired output fetched by Script Component resorting Regular Expressions: