Problem with comma values in Comma Delimited File

Posted by on Jan 22, 2010 in SSIS | No Comments
Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

Assume .csv file which columns delimited by comma ( , ) . and Texts delimited by Double Quote ( " ) like below:

"Invoice Number","Invoice Status","Invoice Date of Service","Invoice Date Opened","Invoice Hold from printing/submission","Invoice Branch","Invoice Balances Adjustments","Invoice Balances Payments","Invoice Balances Balance","Patient Last Name","Patient First Name","Patient ID","Policy Payor Name","Invoice Detail Item ID","Invoice Detail Item Name","Invoice Detail Balance","Invoice Detail Qty","Invoice Detail Item Group"
"12345","Open","3/31/2009 12:00:00 AM","5/26/2009 8:09:02 AM","False","Test","($346.23)","$0.00","$346.23","Smith","John","12344","[Patient]","Test-1234","5""x8"" Multi-Axis Stump Support, Therafin, 31552","$51.12","1","Wheelchair Accessories"

If you look at "5""x8"" Multi-Axis Stump Support, Therafin, 31552" , there are some ( " ) and ( , ) inside this value.

So, How it must be handled in SSIS Flat File Source Connection Manager?


If you set column delimiter as comma ( , ) and text qualifier as double quote ( " )  you will receive this error:

The preview sample contains embedded text qualifiers ("). The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.

the error says that you can not have  delimiter values like comma inside a column.

Solution is:

1-change every comma values with another unique delimiter which values haven’t any of this characters inside,like : vertical bar ( | )

2- change column delimiter to this new delimiter , and set text qualifier with double quote ( " )

for the above .csv file, we can use ( | ) as delimiter because there is no instance of this characters inside values

but we can not replace any comma ( , ) with vertical bar ( | ). because there are commas who are inside the values . So I replaced every "," with "|" .

this will replace only column delimiters surrounded by text qualifiers .

the result will be

 "Invoice Number"|"Invoice Status"|"Invoice Date of Service"|"Invoice Date Opened"|"Invoice Hold from printing/submission"|"Invoice Branch"|"Invoice Balances Adjustments"|"Invoice Balances Payments"|"Invoice Balances Balance"|"Patient Last Name"|"Patient First Name"|"Patient ID"|"Policy Payor Name"|"Invoice Detail Item ID"|"Invoice Detail Item Name"|"Invoice Detail Balance"|"Invoice Detail Qty"|"Invoice Detail Item Group"
"12345"|"Open"|"3/31/2009 12:00:00 AM"|"5/26/2009 8:09:02 AM"|"False"|"Test"|"($346.23)"|"$0.00"|"$346.23"|"Smith"|"John"|"12344"|"[Patient]"|"Test-1234"|"5""x8"" Multi-Axis Stump Support, Therafin, 31552"|"$51.12"|"1"|"Wheelchair Accessories"

as you can see there is no delimiter inside values , and now we can set column delimiter as ( | ) ,  and text qualifier as  ( " ) .

everything is OK now.

but for automating the replace process, I added a Script Task before Dataflow Task and added this code in Visual C# 2008 for replacing delimiters:

public void Main()
        {
            System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\Users\Reza\Downloads\test(2).csv");
            string str1=(char)34+","+(char)34;
            string str2=(char)34+"|"+(char)34;
            string strContent = sr.ReadToEnd().Replace(str1, str2);
            sr.Close();
            sr.Dispose();


            System.IO.StreamWriter sw = new System.IO.StreamWriter(@"C:\Users\Reza\Downloads\test(2).csv",false);
            sw.Write(strContent);
            sw.Flush();
            sw.Close();
            sw.Dispose();

            Dts.TaskResult = (int)ScriptResults.Success;
        }

Thats All.

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for eight continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He wrote some books on MS SQL BI and also is writing some others, He was also an active member on online technical forums such as MSDN and Experts-Exchange, and was a moderator of MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content and the Power BI Pro Architecture published by Apress.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.

Leave a Reply

Your email address will not be published. Required fields are marked *