XML Task – Validating Data

FacebooktwitterredditpinterestlinkedintumblrmailFacebooktwitterredditpinterestlinkedintumblrmail
FacebooktwitterlinkedinrssyoutubeFacebooktwitterlinkedinrssyoutube

I decided to explain XML Task for a while. This task has many abilities to work with XML.
to use this Task, create a SSIS package , add a XML task ,
double click on it, Xml Task Editor window will open.
the most important property of XML task is : OperationType .
there are 6 OperationTypes in SSIS 2008:
-Validate
-XSLT
-XPATH
-Merge
-Diff
-Patch
there operation types has great abilities on xml files. in this post I will talk about Validate operation type…

Validate operation types will get the ability of validating a xml file with two types: XSD or DTD.
If you are currently familiar with xml , you may know XSD files probably. XSD ( Xml Schema Definition ) is a Xml based language and used to define or describe Xml data.
let’s see it with an example,
I used msdn books.xml sample ,
this is books.xml file:


<?xml version="1.0"?>
<x:books xmlns:x="urn:books">
   <book id="bk001">
      <author>Writer</author>
      <title>The First Book</title>
      <genre>Fiction</genre>
      <price>44.95</price>
      <pub_date>2000-10-01</pub_date>
      <review>An amazing story of nothing.</review>
   </book>

   <book id="bk002">
      <author>Poet</author>
      <title>The Poet’s First Poem</title>
      <genre>Poem</genre>
      <price>24.95</price>
      <review>Least poetic poems.</review>
   </book>
</x:books>


let’s try to validate this xml file , first assume this .xsd file(books.xsd):


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:books"
            xmlns:bks="urn:books">

  <xsd:element name="books" type="bks:BooksForm"/>

  <xsd:complexType name="BooksForm">
    <xsd:sequence>
      <xsd:element name="book"
                  type="bks:BookForm"
                  minOccurs="0"
                  maxOccurs="unbounded"/>
      </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="BookForm">
    <xsd:sequence>
      <xsd:element name="author"   type="xsd:string"/>
      <xsd:element name="title"    type="xsd:string"/>
      <xsd:element name="genre"    type="xsd:string"/>
      <xsd:element name="price"    type="xsd:float" />
      <xsd:element name="pub_date" type="xsd:date" />
      <xsd:element name="review"   type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="id"   type="xsd:string"/>
  </xsd:complexType>
</xsd:schema>


As the XSD file declared, the xml file should contain( 0 or more ) complex elements of book, which should contains these sub elements:
author, title, genre, price, pub_date, review .

If you have difficulties to understand the .xsd structure above, read tutorials here about XSD:

http://www.w3schools.com/schema/default.asp


Result of this validation should be false, because second book element in the books.xml hasn’t pub_date element inside.

Now we set up the XML Task,
1) set OperationType as Validate.
2) set SourceType as FileConnection , and choose books.xml file for Source.
3) set ValidationType as XSD.
4) set secondOperatorType as FileConnection , and choose books.xsd file for SecondOperand. ( Note that if you want to use inline .xsd , just set SecondOperandType as Direct input and then put xsd script in SecondOperand )
6) now you should check the result of validation. create a variable, let’s name it xmlResult , set datatype as object.
7) set SaveOperationResult as True, set DestinationType as Variable, and destination as User::xmlResult.
this is the screenshot of whole above properties:

then add a Script task, and write result of variable to messagebox with this code:
MessageBox.Show(Dts.Variables["User::xmlResult"].Value.ToString());

That’s all. now everytime you run the package you will get the result FALSE.( this is because of second book element which hasn’t any pub_date sub element)

Another ValiationType is DTD ( Document Type Definition) , the DTD is also structural language to define form of xml document.
for out example assume this DTD:


<!DOCTYPE books
[
<!ELEMENT books (book+)>

    <!ELEMENT book (author,title,genre,price,review)>
    <!ATTLIST book  id CDATA  #REQUIRED >

    <!ELEMENT author (#PCDATA)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT genre (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
    <!ELEMENT review (#PCDATA)>
]>


Note that I removed pub_date element this time in DTD.

If you want more information about the structure of DTD , go here:

http://www.xmlfiles.com/dtd/


and this is combination of XML and DTD , ( booksWithDTD.xml) :


<?xml version="1.0"?>
<!DOCTYPE books
[
<!ELEMENT books (book+)>

    <!ELEMENT book (author,title,genre,price,review)>
    <!ATTLIST book  id CDATA  #REQUIRED >

    <!ELEMENT author (#PCDATA)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT genre (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
    <!ELEMENT review (#PCDATA)>
]>
<books>
    <book id="bk001">
        <author>Writer</author>
        <title>The First Book</title>
        <genre>Fiction</genre>
        <price>44.95</price>
        <review>An amazing story of nothing.</review>
    </book>

    <book id="bk002">
        <author>Poet</author>
        <title>The Poet’s First Poem</title>
        <genre>Poem</genre>
        <price>24.95</price>
        <review>Least poetic poems.</review>
    </book>
</books>

Note that in XML Task you can use only InLine DTD, because there is no place to set source .dtd file when you set ValicationType as DTD, for this reason I combined xml with dtd in one file.
Now change the ValidationType as DTD, and other properties are same as XSD format.
and the result this time is TRUE, because the XML data meets the DTD requirements.
this is screenshot of DTD setting in XML Task:

This was simple example of use XML Task to Validate xml data, I will explain about other operation types in Xml Task in the future posts.


Reza.

FacebooktwitterredditpinterestlinkedintumblrmailFacebooktwitterredditpinterestlinkedintumblrmail
FacebooktwitterlinkedinrssyoutubeFacebooktwitterlinkedinrssyoutube
Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for nine continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He wrote some books on MS SQL BI and also is writing some others, He was also an active member on online technical forums such as MSDN and Experts-Exchange, and was a moderator of MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content and the Power BI Pro Architecture published by Apress.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.

Leave a Reply