Retrieve Part of XML File

Today I was asked how to restructure an XML file and save the desired part of it as another XML file.

The XML file has many unnecessary tags that should be removed. I did this with C# .NET code.


This is the input XML:

<?xml version="1.0" encoding="utf-16"?>
<DataTable>
  <xs:schema id="NewDataSet" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
    <xs:element name="NewDataSet" msdata:IsDataSet="true" msdata:MainDataTable="Table" msdata:UseCurrentLocale="true">
      <xs:complexType>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element name="Table">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="PersonId" type="xs:long" minOccurs="0" />
                <xs:element name="StudentId" type="xs:long" minOccurs="0" />
                <xs:element name="FaAliyasFName" type="xs:string" minOccurs="0" />
                <xs:element name="FaAliyasLName" type="xs:string" minOccurs="0" />
                <xs:element name="BornInCity" type="xs:string" minOccurs="0" />
                <xs:element name="BornInCountry" type="xs:int" minOccurs="0" />
                <xs:element name="BornInProvince" type="xs:int" minOccurs="0" />
                <xs:element name="DateIn" type="xs:dateTime" minOccurs="0" />
                <xs:element name="EnAliyasFName" type="xs:string" minOccurs="0" />
                <xs:element name="EnAliyasLName" type="xs:string" minOccurs="0" />
                <xs:element name="EnFName" type="xs:string" minOccurs="0" />
                <xs:element name="EnLName" type="xs:string" minOccurs="0" />
                <xs:element name="HealthId" type="xs:int" minOccurs="0" />
                <xs:element name="MarriageId" type="xs:int" minOccurs="0" />
                <xs:element name="NoNational" type="xs:string" minOccurs="0" />
                <xs:element name="PassExpire" type="xs:dateTime" minOccurs="0" />
                <xs:element name="PassIssuance" type="xs:dateTime" minOccurs="0" />
                <xs:element name="PassNumber" type="xs:string" minOccurs="0" />
                <xs:element name="PursueCode" type="xs:string" minOccurs="0" />
                <xs:element name="ShIssunce" type="xs:string" minOccurs="0" />
                <xs:element name="ShNo" type="xs:string" minOccurs="0" />
                <xs:element name="Status" type="xs:short" minOccurs="0" />
                <xs:element name="SolicitorshipId" type="xs:int" minOccurs="0" />
                <xs:element name="FaAliyasMName" type="xs:string" minOccurs="0" />
                <xs:element name="EnAliyasMName" type="xs:string" minOccurs="0" />
                <xs:element name="NaAliyasFName" type="xs:string" minOccurs="0" />
                <xs:element name="NaAliyasLName" type="xs:string" minOccurs="0" />
                <xs:element name="NaAliyasMName" type="xs:string" minOccurs="0" />
                <xs:element name="EnMName" type="xs:string" minOccurs="0" />
                <xs:element name="NaFName" type="xs:string" minOccurs="0" />
                <xs:element name="NaLName" type="xs:string" minOccurs="0" />
                <xs:element name="NaMName" type="xs:string" minOccurs="0" />
                <xs:element name="FName" type="xs:string" minOccurs="0" />
                <xs:element name="LName" type="xs:string" minOccurs="0" />
                <xs:element name="MName" type="xs:string" minOccurs="0" />
                <xs:element name="NationId" type="xs:int" minOccurs="0" />
                <xs:element name="GHozeId" type="xs:int" minOccurs="0" />
                <xs:element name="GUniId" type="xs:int" minOccurs="0" />
                <xs:element name="Expr1" type="xs:long" minOccurs="0" />
                <xs:element name="FaMarriage" type="xs:string" minOccurs="0" />
                <xs:element name="EnMarriage" type="xs:string" minOccurs="0" />
                <xs:element name="ArMarriage" type="xs:string" minOccurs="0" />
                <xs:element name="ArNation" type="xs:string" minOccurs="0" />
                <xs:element name="EnNation" type="xs:string" minOccurs="0" />
                <xs:element name="FaNation" type="xs:string" minOccurs="0" />
                <xs:element name="FaGender" type="xs:string" minOccurs="0" />
                <xs:element name="EnGender" type="xs:string" minOccurs="0" />
                <xs:element name="ArGender" type="xs:string" minOccurs="0" />
                <xs:element name="Is_Sadat" type="xs:boolean" minOccurs="0" />
                <xs:element name="BirthDate" type="xs:dateTime" minOccurs="0" />
                <xs:element name="Code" type="xs:string" minOccurs="0" />
                <xs:element name="Accept_Date" type="xs:dateTime" minOccurs="0" />
                <xs:element name="GenderId" type="xs:int" minOccurs="0" />
                <xs:element name="ReligionId" type="xs:int" minOccurs="0" />
                <xs:element name="FatherName" type="xs:string" minOccurs="0" />
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:choice>
      </xs:complexType>
    </xs:element>
  </xs:schema>
  <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
    <DocumentElement>
      <Table diffgr:id="Table1" msdata:rowOrder="0">
        <PersonId>1</PersonId>
        <StudentId>1</StudentId>
        <FaAliyasFName>اکبر </FaAliyasFName>
        <EnFName>MAGERRAM</EnFName>
        <EnLName>KAZEM OV</EnLName>
        <MarriageId>1</MarriageId>
        <FName>محرم</FName>
        <LName>كاظم</LName>
        <NationId>111</NationId>
        <Expr1>1</Expr1>
        <FaMarriage>نامشخص</FaMarriage>
        <EnMarriage>نامشخص</EnMarriage>
        <ArMarriage>نامشخص</ArMarriage>
        <FaNation>ایران</FaNation>
        <FaGender>مرد</FaGender>
        <EnGender>مرد</EnGender>
        <ArGender>مرد</ArGender>
        <Is_Sadat>false</Is_Sadat>
        <BirthDate>1976-03-21T00:00:00-07:00</BirthDate>
        <Code>1111</Code>
        <GenderId>2</GenderId>
        <ReligionId>6</ReligionId>
      </Table>
      <Table diffgr:id="Table2" msdata:rowOrder="1">
        <PersonId>2</PersonId>
        <StudentId>2</StudentId>
        <EnFName>tahsin</EnFName>
        <EnLName>tahmaz ov</EnLName>
        <MarriageId>1</MarriageId>
        <FName>تحسين</FName>
        <LName>تحمازاف</LName>
        <NationId>111</NationId>
        <Expr1>2</Expr1>
        <FaMarriage>نامشخص</FaMarriage>
        <EnMarriage>نامشخص</EnMarriage>
        <ArMarriage>نامشخص</ArMarriage>
        <FaNation>ایران</FaNation>
        <FaGender>مرد</FaGender>
        <EnGender>مرد</EnGender>
        <ArGender>مرد</ArGender>
        <Is_Sadat>false</Is_Sadat>
        <BirthDate>1978-03-21T00:00:00-07:00</BirthDate>
        <Code>1112</Code>
        <GenderId>2</GenderId>
        <ReligionId>6</ReligionId>
      </Table>  
    </DocumentElement>
  </diffgr:diffgram>
</DataTable>

If you try to use this XML directly in an XML Source in a data flow task, you will get errors. You should therefore remove wrapper elements such as diffgr:diffgram, along with the inline schema, and keep only the row data.

I used this C# script to transform the input XML:

using System.Xml; // place at the top of the script file

XmlDocument xdoc = new XmlDocument();        // source document (the DataTable diffgram)
XmlDocument xdocResult = new XmlDocument();  // flattened output document
xdoc.Load(@"D:\SSIS\New folder\GetStudentByCode.xml");

// New root element that will hold the copied rows.
XmlNode parentNew = xdocResult.CreateElement("Data");

foreach (XmlNode xnode in xdoc.GetElementsByTagName("DocumentElement"))
{
    foreach (XmlNode xchild in xnode.ChildNodes)
    {
        // Skip comments and other non-element nodes; their Attributes property is null.
        if (xchild.NodeType != XmlNodeType.Element) continue;

        // Drop diffgr:id, msdata:rowOrder and any other attributes on the row element.
        xchild.Attributes.RemoveAll();

        // A node has to be imported into the target document before it can be appended.
        XmlNode importNode = xdocResult.ImportNode(xchild, true);
        parentNew.AppendChild(importNode);
    }
}
xdocResult.AppendChild(parentNew);
xdocResult.Save(@"D:\SSIS\New folder\GetStudentByCode_Converted.xml");

The result of applying this code is:

<Data>
  <Table>
    <PersonId>1</PersonId>
    <StudentId>1</StudentId>
    <FaAliyasFName>اکبر </FaAliyasFName>
    <EnFName>MAGERRAM</EnFName>
    <EnLName>KAZEM OV</EnLName>
    <MarriageId>1</MarriageId>
    <FName>محرم</FName>
    <LName>كاظم</LName>
    <NationId>111</NationId>
    <Expr1>1</Expr1>
    <FaMarriage>نامشخص</FaMarriage>
    <EnMarriage>نامشخص</EnMarriage>
    <ArMarriage>نامشخص</ArMarriage>
    <FaNation>ایران</FaNation>
    <FaGender>مرد</FaGender>
    <EnGender>مرد</EnGender>
    <ArGender>مرد</ArGender>
    <Is_Sadat>false</Is_Sadat>
    <BirthDate>1976-03-21T00:00:00-07:00</BirthDate>
    <Code>1111</Code>
    <GenderId>2</GenderId>
    <ReligionId>6</ReligionId>
  </Table>
  <Table>
    <PersonId>2</PersonId>
    <StudentId>2</StudentId>
    <EnFName>tahsin</EnFName>
    <EnLName>tahmaz ov</EnLName>
    <MarriageId>1</MarriageId>
    <FName>تحسين</FName>
    <LName>تحمازاف</LName>
    <NationId>111</NationId>
    <Expr1>2</Expr1>
    <FaMarriage>نامشخص</FaMarriage>
    <EnMarriage>نامشخص</EnMarriage>
    <ArMarriage>نامشخص</ArMarriage>
    <FaNation>ایران</FaNation>
    <FaGender>مرد</FaGender>
    <EnGender>مرد</EnGender>
    <ArGender>مرد</ArGender>
    <Is_Sadat>false</Is_Sadat>
    <BirthDate>1978-03-21T00:00:00-07:00</BirthDate>
    <Code>1112</Code>
    <GenderId>2</GenderId>
    <ReligionId>6</ReligionId>
  </Table>
</Data>

This is an example of using the .NET System.Xml classes to flatten and simplify a complicated XML file. You can now use the resulting XML directly in an XML Source.
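
The same flattening could also be sketched with LINQ to XML (System.Xml.Linq) instead of XmlDocument. This is not the script used above, only a possible variant under a few assumptions: the DiffgramFlattener class and Flatten method names are hypothetical, the rows are assumed to be the direct children of DocumentElement as in the input shown earlier, and attributes are stripped from nested elements as well as from the row itself.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original script.
public static class DiffgramFlattener
{
    // Returns a new document whose <Data> root contains only the row
    // elements found under <DocumentElement>, with all attributes removed.
    public static XDocument Flatten(XDocument source)
    {
        var root = new XElement("Data");

        foreach (XElement row in source.Descendants("DocumentElement").Elements())
        {
            // Deep copy, so the source document is left unchanged.
            var copy = new XElement(row);

            // Remove diffgr:id, msdata:rowOrder and any other attributes.
            foreach (XElement element in copy.DescendantsAndSelf().ToList())
            {
                element.RemoveAttributes();
            }

            root.Add(copy);
        }

        return new XDocument(root);
    }
}
```

With the file paths from the script above, usage might look like DiffgramFlattener.Flatten(XDocument.Load(@"D:\SSIS\New folder\GetStudentByCode.xml")).Save(@"D:\SSIS\New folder\GetStudentByCode_Converted.xml"). Working on a copy keeps the loaded document intact, which helps if the original rows are needed again later in the same script.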


Reza Rad
