I decided to explain XML Task for a while. This task has many abilities to work with XML.
to use this Task, create a SSIS package , add a XML task ,
double click on it, Xml Task Editor window will open.
the most important property of XML task is : OperationType .
there are 6 OperationTypes in SSIS 2008:
-Validate
-XSLT
-XPATH
-Merge
-Diff
-Patch
there operation types has great abilities on xml files. in this post I will talk about Validate operation type…
Validate operation types will get the ability of validating a xml file with two types: XSD or DTD.
If you are currently familiar with xml , you may know XSD files probably. XSD ( Xml Schema Definition ) is a Xml based language and used to define or describe Xml data.
let’s see it with an example,
I used msdn books.xml sample ,
this is books.xml file:
<?xml version="1.0"?>
<x:books xmlns:x="urn:books">
<book id="bk001">
<author>Writer</author>
<title>The First Book</title>
<genre>Fiction</genre>
<price>44.95</price>
<pub_date>2000-10-01</pub_date>
<review>An amazing story of nothing.</review>
</book>
<book id="bk002">
<author>Poet</author>
<title>The Poet’s First Poem</title>
<genre>Poem</genre>
<price>24.95</price>
<review>Least poetic poems.</review>
</book>
</x:books>
let’s try to validate this xml file , first assume this .xsd file(books.xsd):
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="urn:books"
xmlns:bks="urn:books">
<xsd:element name="books" type="bks:BooksForm"/>
<xsd:complexType name="BooksForm">
<xsd:sequence>
<xsd:element name="book"
type="bks:BookForm"
minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="BookForm">
<xsd:sequence>
<xsd:element name="author" type="xsd:string"/>
<xsd:element name="title" type="xsd:string"/>
<xsd:element name="genre" type="xsd:string"/>
<xsd:element name="price" type="xsd:float" />
<xsd:element name="pub_date" type="xsd:date" />
<xsd:element name="review" type="xsd:string"/>
</xsd:sequence>
<xsd:attribute name="id" type="xsd:string"/>
</xsd:complexType>
</xsd:schema>
As the XSD file declared, the xml file should contain( 0 or more ) complex elements of book, which should contains these sub elements:
author, title, genre, price, pub_date, review .
If you have difficulties to understand the .xsd structure above, read tutorials here about XSD:
http://www.w3schools.com/schema/default.asp
Result of this validation should be false, because second book element in the books.xml hasn’t pub_date element inside.
Now we set up the XML Task,
1) set OperationType as Validate.
2) set SourceType as FileConnection , and choose books.xml file for Source.
3) set ValidationType as XSD.
4) set secondOperatorType as FileConnection , and choose books.xsd file for SecondOperand. ( Note that if you want to use inline .xsd , just set SecondOperandType as Direct input and then put xsd script in SecondOperand )
6) now you should check the result of validation. create a variable, let’s name it xmlResult , set datatype as object.
7) set SaveOperationResult as True, set DestinationType as Variable, and destination as User::xmlResult.
this is the screenshot of whole above properties:
then add a Script task, and write result of variable to messagebox with this code:
MessageBox.Show(Dts.Variables["User::xmlResult"].Value.ToString());
That’s all. now everytime you run the package you will get the result FALSE.( this is because of second book element which hasn’t any pub_date sub element)
Another ValiationType is DTD ( Document Type Definition) , the DTD is also structural language to define form of xml document.
for out example assume this DTD:
<!DOCTYPE books
[
<!ELEMENT books (book+)>
<!ELEMENT book (author,title,genre,price,review)>
<!ATTLIST book id CDATA #REQUIRED >
<!ELEMENT author (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT genre (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT review (#PCDATA)>
]>
Note that I removed pub_date element this time in DTD.
If you want more information about the structure of DTD , go here:
and this is combination of XML and DTD , ( booksWithDTD.xml) :
<?xml version="1.0"?>
<!DOCTYPE books
[
<!ELEMENT books (book+)>
<!ELEMENT book (author,title,genre,price,review)>
<!ATTLIST book id CDATA #REQUIRED >
<!ELEMENT author (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT genre (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT review (#PCDATA)>
]>
<books>
<book id="bk001">
<author>Writer</author>
<title>The First Book</title>
<genre>Fiction</genre>
<price>44.95</price>
<review>An amazing story of nothing.</review>
</book>
<book id="bk002">
<author>Poet</author>
<title>The Poet’s First Poem</title>
<genre>Poem</genre>
<price>24.95</price>
<review>Least poetic poems.</review>
</book>
</books>
Note that in XML Task you can use only InLine DTD, because there is no place to set source .dtd file when you set ValicationType as DTD, for this reason I combined xml with dtd in one file.
Now change the ValidationType as DTD, and other properties are same as XSD format.
and the result this time is TRUE, because the XML data meets the DTD requirements.
this is screenshot of DTD setting in XML Task:
This was simple example of use XML Task to Validate xml data, I will explain about other operation types in Xml Task in the future posts.
Reza.