Thursday, February 18, 2010

Validate XML against schema files using XLINQ

In many applications, mainly integration projects, you have probably used XML. XML is often used for data transfer, which makes an integration very flexible, but it also calls for some precautions.

Some integrations are very complex, and the data being sent may not be in the format we expect. So it is always a good idea to validate whatever we receive from the other end before doing anything with it. This is a best practice that avoids a lot of problems.


Normally the schema of the XML is shared between the two parties, and the XML is generated according to that schema. Most often I have seen developers create a sample XML and start building their logic around it. But what if the other party sends something wrong? It can break your program, and you will not be able to tell the other party which value was wrong.


Let’s see how we can achieve this easily.

I am going to create a sample schema file. Follow the steps below in Visual Studio:
File -> New -> File. Select XML Schema.

Here is my schema

<?xml version="1.0" encoding="utf-16"?>
<xs:schema xmlns:b="http://schemas.microsoft.com/BizTalk/2003" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Document">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="1" maxOccurs="10" name="DocumentIndex">
          <xs:complexType>
            <xs:attribute name="ID" type="xs:int" />
            <xs:attribute name="Value" type="xs:string" />
          </xs:complexType>
        </xs:element>
        <xs:element name="Attachment">
          <xs:complexType>
            <xs:attribute name="Type" type="xs:string" />
            <xs:attribute name="Size" type="xs:string" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="DocumentID" type="xs:int" />
    </xs:complexType>
  </xs:element>
</xs:schema>


According to this schema, my XML should look like this:


<Document DocumentID="12345">
<DocumentIndex ID="1" Value="03457911" />
<DocumentIndex ID="2" Value="03457911" />
<Attachment Type="Doc" Size="4567" />
</Document>


We need to validate this XML against our schema. Here is how we do it.



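// Requires: using System.Windows.Forms; using System.Xml.Linq; using System.Xml.Schema;

// Load the schema; "" means the schema has no target namespace.
XmlSchemaSet schemas = new XmlSchemaSet();
schemas.Add("", "XMLSchema.xsd");

// Load the XML document to validate.
XDocument objXDoc = XDocument.Load("testXMl.xml");

// The callback runs once for each validation problem found.
bool errors = false;
objXDoc.Validate(schemas, (o, y) =>
{
    MessageBox.Show(y.Message);
    errors = true;
});

MessageBox.Show("validation error status " + errors.ToString());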



In the above code, if there is any problem with the XML, it shows the error message; otherwise it just shows "validation error status False". The XML is valid only if errors is still false after the call. If validation fails, you can simply return the error message from your program and reject the request.

Let us put our XML in testXMl.xml
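In a service or console program there is no message box to show, so the same idea can be sketched as a helper that collects the error messages and hands them back to the caller. This is only a sketch under some assumptions: the names SchemaCheck, LoadSchemas and GetValidationErrors are made up for illustration, and the schema and XML are passed in as strings rather than read from files.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical helper class; the names are illustrative, not from the post.
public static class SchemaCheck
{
    // Builds a schema set from XSD text held in memory.
    // "" means the schema has no target namespace, as in the sample schema.
    public static XmlSchemaSet LoadSchemas(string xsdText)
    {
        var schemas = new XmlSchemaSet();
        using (var reader = XmlReader.Create(new StringReader(xsdText)))
        {
            schemas.Add("", reader);
        }
        return schemas;
    }

    // Returns one message per validation error; an empty list means valid.
    // XDocument.Parse still throws XmlException if the text is not well-formed.
    public static List<string> GetValidationErrors(string xmlText, XmlSchemaSet schemas)
    {
        var errors = new List<string>();
        XDocument doc = XDocument.Parse(xmlText);
        doc.Validate(schemas, (sender, e) =>
        {
            // Keep errors only; warnings, if any are raised, are skipped.
            if (e.Severity == XmlSeverityType.Error)
            {
                errors.Add(e.Message);
            }
        });
        return errors;
    }
}
```

The caller can then reject the request when the returned list is not empty and send the messages back to the other party.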



<Document DocumentID="12345">
<DocumentIndex ID="1" Value="03457911" />
<DocumentIndex ID="2" Value="03457911" />
<Attachment Type="Doc" Size="1234" />
</Document>


So it validated, and the output shows that there is no error. Now, what if we don’t supply the Attachment element?


<Document DocumentID="12345">
<DocumentIndex ID="1" Value="03457911" />
<DocumentIndex ID="2" Value="03457911" />
</Document>


The error message shown is “The element 'Document' has incomplete content. List of possible elements expected: 'DocumentIndex, Attachment'.” According to our schema, the XML must contain exactly one Attachment element after the DocumentIndex elements, because minOccurs and maxOccurs both default to 1.

What if the XML contains all the attributes and elements and still has a problem? Consider the XML below.


<Document DocumentID="12345">
<DocumentIndex ID="1a" Value="03457911" />
<DocumentIndex ID="2" Value="03457911" />
<Attachment Type="Doc" Size="1234" />
</Document>



Now when you run the program again, it pops up a different error:
The 'ID' attribute is invalid - The value '1a' is invalid according to its datatype 'http://www.w3.org/2001/XMLSchema:int' - The string '1a' is not a valid Int32 value.
We expect the DocumentIndex ID to be an int, but the XML supplies a value of the wrong type.
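The message says what is wrong but not where. A possible refinement, sketched below, is to load the XML with LoadOptions.SetLineInfo and read the line and position from the exception attached to each validation event. The class and method names (LocatedSchemaCheck, Describe) are hypothetical, and I am assuming the position is filled in for documents loaded this way; when it is not available the values are simply 0.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical helper; the names are illustrative only.
public static class LocatedSchemaCheck
{
    // Returns one line of text per validation event, including its position.
    public static List<string> Describe(string xmlText, XmlSchemaSet schemas)
    {
        var report = new List<string>();

        // SetLineInfo asks the loader to remember where each node came from.
        XDocument doc = XDocument.Parse(xmlText, LoadOptions.SetLineInfo);

        doc.Validate(schemas, (sender, e) =>
        {
            // Guard against a missing exception; fall back to 0 for the position.
            XmlSchemaException ex = e.Exception;
            int line = ex != null ? ex.LineNumber : 0;
            int position = ex != null ? ex.LinePosition : 0;

            report.Add(string.Format("{0} at line {1}, position {2}: {3}",
                e.Severity, line, position, e.Message));
        });

        return report;
    }
}
```

With a large XML file, a report like this is usually easier for the other party to act on than the bare message.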


When you are working with large XML documents, it is hard to validate each element or attribute by hand in your program. Validating the XML against the defined schema makes it very easy. If you change your schema later on, you don’t need to change anything in your validation code.


So it is very easy to validate XML against a schema and make sure you are actually working with valid input. This avoids a lot of errors caused by datatype mismatches or required data missing from the XML.
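If the first problem is enough to reject the input, a stricter variant is possible: Validate throws on the first validation error when the handler argument is null. The sketch below wraps that in a helper; StrictSchemaCheck and FirstErrorOrNull are hypothetical names chosen for this example.

```csharp
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical helper; the names are illustrative only.
public static class StrictSchemaCheck
{
    // Returns null when the document is valid, otherwise the first error message.
    public static string FirstErrorOrNull(XDocument doc, XmlSchemaSet schemas)
    {
        try
        {
            // With a null handler, Validate throws on the first validation error.
            doc.Validate(schemas, null);
            return null;
        }
        catch (XmlSchemaException ex)
        {
            // XmlSchemaException is the base class of the validation exception.
            return ex.Message;
        }
    }
}
```

This stops at the first error, so it suits a quick accept or reject decision; use a callback when you want to report every problem at once.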
