Problem with comma values in Comma Delimited File

Assume .csv file which columns delimited by comma ( , ) . and Texts delimited by Double Quote ( " ) like below:

"Invoice Number","Invoice Status","Invoice Date of Service","Invoice Date Opened","Invoice Hold from printing/submission","Invoice Branch","Invoice Balances Adjustments","Invoice Balances Payments","Invoice Balances Balance","Patient Last Name","Patient First Name","Patient ID","Policy Payor Name","Invoice Detail Item ID","Invoice Detail Item Name","Invoice Detail Balance","Invoice Detail Qty","Invoice Detail Item Group"
"12345","Open","3/31/2009 12:00:00 AM","5/26/2009 8:09:02 AM","False","Test","($346.23)","$0.00","$346.23","Smith","John","12344","[Patient]","Test-1234","5""x8"" Multi-Axis Stump Support, Therafin, 31552","$51.12","1","Wheelchair Accessories"

If you look at "5""x8"" Multi-Axis Stump Support, Therafin, 31552" , there are some ( " ) and ( , ) inside this value.

So, How it must be handled in SSIS Flat File Source Connection Manager?


If you set column delimiter as comma ( , ) and text qualifier as double quote ( " )  you will receive this error:

The preview sample contains embedded text qualifiers ("). The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.

the error says that you can not have  delimiter values like comma inside a column.

Solution is:

1-change every comma values with another unique delimiter which values haven’t any of this characters inside,like : vertical bar ( | )

2- change column delimiter to this new delimiter , and set text qualifier with double quote ( " )

for the above .csv file, we can use ( | ) as delimiter because there is no instance of this characters inside values

but we can not replace any comma ( , ) with vertical bar ( | ). because there are commas who are inside the values . So I replaced every "," with "|" .

this will replace only column delimiters surrounded by text qualifiers .

the result will be

 "Invoice Number"|"Invoice Status"|"Invoice Date of Service"|"Invoice Date Opened"|"Invoice Hold from printing/submission"|"Invoice Branch"|"Invoice Balances Adjustments"|"Invoice Balances Payments"|"Invoice Balances Balance"|"Patient Last Name"|"Patient First Name"|"Patient ID"|"Policy Payor Name"|"Invoice Detail Item ID"|"Invoice Detail Item Name"|"Invoice Detail Balance"|"Invoice Detail Qty"|"Invoice Detail Item Group"
"12345"|"Open"|"3/31/2009 12:00:00 AM"|"5/26/2009 8:09:02 AM"|"False"|"Test"|"($346.23)"|"$0.00"|"$346.23"|"Smith"|"John"|"12344"|"[Patient]"|"Test-1234"|"5""x8"" Multi-Axis Stump Support, Therafin, 31552"|"$51.12"|"1"|"Wheelchair Accessories"

as you can see there is no delimiter inside values , and now we can set column delimiter as ( | ) ,  and text qualifier as  ( " ) .

everything is OK now.

but for automating the replace process, I added a Script Task before Dataflow Task and added this code in Visual C# 2008 for replacing delimiters:

public void Main()
        {
            System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\Users\Reza\Downloads\test(2).csv");
            string str1=(char)34+","+(char)34;
            string str2=(char)34+"|"+(char)34;
            string strContent = sr.ReadToEnd().Replace(str1, str2);
            sr.Close();
            sr.Dispose();


            System.IO.StreamWriter sw = new System.IO.StreamWriter(@"C:\Users\Reza\Downloads\test(2).csv",false);
            sw.Write(strContent);
            sw.Flush();
            sw.Close();
            sw.Dispose();

            Dts.TaskResult = (int)ScriptResults.Success;
        }

Thats All.

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

Leave a Reply