XML Task – Validating Data

I decided to explain XML Task for a while. This task has many abilities to work with XML.
to use this Task, create a SSIS package , add a XML task ,
double click on it, Xml Task Editor window will open.
the most important property of XML task is : OperationType .
there are 6 OperationTypes in SSIS 2008:
-Validate
-XSLT
-XPATH
-Merge
-Diff
-Patch
there operation types has great abilities on xml files. in this post I will talk about Validate operation type…

Validate operation types will get the ability of validating a xml file with two types: XSD or DTD.
If you are currently familiar with xml , you may know XSD files probably. XSD ( Xml Schema Definition ) is a Xml based language and used to define or describe Xml data.
let’s see it with an example,
I used msdn books.xml sample ,
this is books.xml file:


<?xml version="1.0"?>
<x:books xmlns:x="urn:books">
   <book id="bk001">
      <author>Writer</author>
      <title>The First Book</title>
      <genre>Fiction</genre>
      <price>44.95</price>
      <pub_date>2000-10-01</pub_date>
      <review>An amazing story of nothing.</review>
   </book>

   <book id="bk002">
      <author>Poet</author>
      <title>The Poet’s First Poem</title>
      <genre>Poem</genre>
      <price>24.95</price>
      <review>Least poetic poems.</review>
   </book>
</x:books>


let’s try to validate this xml file , first assume this .xsd file(books.xsd):


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:books"
            xmlns:bks="urn:books">

  <xsd:element name="books" type="bks:BooksForm"/>

  <xsd:complexType name="BooksForm">
    <xsd:sequence>
      <xsd:element name="book"
                  type="bks:BookForm"
                  minOccurs="0"
                  maxOccurs="unbounded"/>
      </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="BookForm">
    <xsd:sequence>
      <xsd:element name="author"   type="xsd:string"/>
      <xsd:element name="title"    type="xsd:string"/>
      <xsd:element name="genre"    type="xsd:string"/>
      <xsd:element name="price"    type="xsd:float" />
      <xsd:element name="pub_date" type="xsd:date" />
      <xsd:element name="review"   type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="id"   type="xsd:string"/>
  </xsd:complexType>
</xsd:schema>


As the XSD file declared, the xml file should contain( 0 or more ) complex elements of book, which should contains these sub elements:
author, title, genre, price, pub_date, review .

If you have difficulties to understand the .xsd structure above, read tutorials here about XSD:

http://www.w3schools.com/schema/default.asp


Result of this validation should be false, because second book element in the books.xml hasn’t pub_date element inside.

Now we set up the XML Task,
1) set OperationType as Validate.
2) set SourceType as FileConnection , and choose books.xml file for Source.
3) set ValidationType as XSD.
4) set secondOperatorType as FileConnection , and choose books.xsd file for SecondOperand. ( Note that if you want to use inline .xsd , just set SecondOperandType as Direct input and then put xsd script in SecondOperand )
6) now you should check the result of validation. create a variable, let’s name it xmlResult , set datatype as object.
7) set SaveOperationResult as True, set DestinationType as Variable, and destination as User::xmlResult.
this is the screenshot of whole above properties:

then add a Script task, and write result of variable to messagebox with this code:
MessageBox.Show(Dts.Variables["User::xmlResult"].Value.ToString());

That’s all. now everytime you run the package you will get the result FALSE.( this is because of second book element which hasn’t any pub_date sub element)

Another ValiationType is DTD ( Document Type Definition) , the DTD is also structural language to define form of xml document.
for out example assume this DTD:


<!DOCTYPE books
[
<!ELEMENT books (book+)>

    <!ELEMENT book (author,title,genre,price,review)>
    <!ATTLIST book  id CDATA  #REQUIRED >

    <!ELEMENT author (#PCDATA)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT genre (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
    <!ELEMENT review (#PCDATA)>
]>


Note that I removed pub_date element this time in DTD.

If you want more information about the structure of DTD , go here:

http://www.xmlfiles.com/dtd/


and this is combination of XML and DTD , ( booksWithDTD.xml) :


<?xml version="1.0"?>
<!DOCTYPE books
[
<!ELEMENT books (book+)>

    <!ELEMENT book (author,title,genre,price,review)>
    <!ATTLIST book  id CDATA  #REQUIRED >

    <!ELEMENT author (#PCDATA)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT genre (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
    <!ELEMENT review (#PCDATA)>
]>
<books>
    <book id="bk001">
        <author>Writer</author>
        <title>The First Book</title>
        <genre>Fiction</genre>
        <price>44.95</price>
        <review>An amazing story of nothing.</review>
    </book>

    <book id="bk002">
        <author>Poet</author>
        <title>The Poet’s First Poem</title>
        <genre>Poem</genre>
        <price>24.95</price>
        <review>Least poetic poems.</review>
    </book>
</books>

Note that in XML Task you can use only InLine DTD, because there is no place to set source .dtd file when you set ValicationType as DTD, for this reason I combined xml with dtd in one file.
Now change the ValidationType as DTD, and other properties are same as XSD format.
and the result this time is TRUE, because the XML data meets the DTD requirements.
this is screenshot of DTD setting in XML Task:

This was simple example of use XML Task to Validate xml data, I will explain about other operation types in Xml Task in the future posts.


Reza.

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

Leave a Reply