Converting XML to JSON with XSL – Part 3

In this final installment of converting XML to JSON with XSL I will step outside the ‘normal’ JSON conventions in order to add data typing to the returned values and to flatten the resulting JSON for greater efficiency. Conventions like BadgerFish allow the JSON to be converted back to XML, but it is often the case that this is not required. They also assume that the source of the JSON may not be safe, and therefore encase all names and their corresponding values as strings.

However, in the majority of cases your XML comes from within your own system, and it is then reasonable to assume that the resulting objects are safe and can be typed to a certain degree.

Data Typing your JSON

Typing your result set offers numerous advantages in your client side code. In this section I will discuss how to do some basic typing of your XML to JSON values. It would be fairly easy to expand the number of types, but for starters we will stick with the basics: numbers, booleans, dates and JavaScript functions.

Let’s assume the following XML structure:

<root>
   <aDateTime>2009-02-01T13:15:00.000</aDateTime>
   <aDate>2008-02-01</aDate>
   <aNumber>2458.23</aNumber>
   <aBooleanAsANumber>1</aBooleanAsANumber>
   <aBooleanAsText>true</aBooleanAsText>
   <aString>Just another string of information</aString>
   <aString>Just another string of information part 2</aString>
   <aFunction>function(){alert('hello, I am a method!')}</aFunction>
   <someOtherData myDate="2007-04-19">blurb</someOtherData>
</root>

In part 2 of this series I discussed how to do character escaping within an iterative template entitled “replace”. That template returned the escaped content and placed it in a result variable entitled $cleanData. In order to implement data typing we need to pass this value on to the data typing template. To do this correctly, however, we need some way of matching the value being passed to the data type it should be converted to. Again, we will turn to the configuration file to provide this ability:

<pointers>
  <pointer type="datetime" match="exact">/root/aDateTime</pointer>
  <pointer type="date" match="exact">/root/aDate</pointer>
  <pointer type="number" match="exact">/root/aNumber</pointer>
  <pointer type="boolean" match="exact">/root/aBooleanAsANumber</pointer>
  <pointer type="boolean" match="exact">/root/aBooleanAsText</pointer>
  <pointer type="string" match="any">aString</pointer>
  <pointer type="native" match="exact">/root/aFunction</pointer>
</pointers>

The above pointers element within the config.xml file contains pointer nodes that define the data type to convert to, the type of match to look for, and the string/XPath to match on. To make use of this information we need to track the full XPath location of the current element as we walk through the XML data. This is done fairly easily: each time the template calls itself it appends the name of the current element, obtained with the name() function, plus a trailing / to the path. This value is then passed to the data typing template along with the value to process. The listing below shows the full workflow of the templates in order to fully express the logic.

<!-- initial template -->
<xsl:template match="/">
   <xsl:variable name="initial_JSON">
      <xsl:apply-templates select="current()/child::*" mode="build">
         <xsl:with-param name="path">/</xsl:with-param>
      </xsl:apply-templates>
   </xsl:variable>
</xsl:template>

<!-- iterative template to build JSON -->
<xsl:template match="*" mode="build">

   <!-- location path variable -->
   <xsl:param name="path"/>
   <!-- name of current node -->
   <xsl:variable name="nName" select="name()"/>

   <!-- count of preceding and following nodes with the same name as the current node -->
   <xsl:variable name="iPreceding" select="count(preceding-sibling::*[name()=$nName])"/>
   <xsl:variable name="iFollowing" select="count(following-sibling::*[name()=$nName])"/>
   <xsl:choose>
      <!-- singleton: no sibling shares this name -->
      <xsl:when test="$iPreceding = 0 and $iFollowing = 0">
         <xsl:variable name="properties">
            <xsl:apply-templates select="@*" mode="attributes">
               <xsl:with-param name="path" select="concat($path,name(),'/')"/>
            </xsl:apply-templates>
            <xsl:if test="count(@*) != 0 and string-length(text()) != 0">,</xsl:if>
            <!-- the current node's own text -->
            <xsl:apply-templates select="." mode="elements">
               <xsl:with-param name="path" select="concat($path,name(),'/')"/>
            </xsl:apply-templates>
         </xsl:variable>
         <xsl:value-of select="concat($encaseObject, $nName, $encaseObject, $cln, '{', $properties)"/>
         <xsl:if test="child::*">
            <xsl:if test="string-length($properties) != 0">,</xsl:if>
            <xsl:apply-templates select="current()/*" mode="build">
               <xsl:with-param name="path" select="concat($path,name(),'/')"/>
            </xsl:apply-templates>
         </xsl:if>
         <xsl:text>}</xsl:text>
         <xsl:if test="following-sibling::*">,</xsl:if>
      </xsl:when>
      <!-- first of several siblings with the same name: emit them as an array -->
      <xsl:when test="$iPreceding = 0 and $iFollowing &gt; 0">
         <xsl:value-of select="concat($encaseObject, $nName, $encaseObject, $cln, '[')"/>
         <xsl:for-each select="../*[name() = $nName]">
            <xsl:variable name="properties">
               <xsl:apply-templates select="@*" mode="attributes">
                  <xsl:with-param name="path" select="concat($path,name(),'/')"/>
               </xsl:apply-templates>
               <xsl:if test="count(@*) != 0 and string-length(text()) != 0">,</xsl:if>
               <xsl:apply-templates select="." mode="elements">
                  <xsl:with-param name="path" select="concat($path,name(),'/')"/>
               </xsl:apply-templates>
            </xsl:variable>
            <!-- array members are emitted without repeating the element name -->
            <xsl:value-of select="concat('{', $properties)"/>
            <xsl:if test="child::*">
               <xsl:if test="string-length($properties) != 0">,</xsl:if>
               <xsl:apply-templates select="current()/*" mode="build">
                  <xsl:with-param name="path" select="concat($path,name(),'/')"/>
               </xsl:apply-templates>
            </xsl:if>
            <xsl:text>}</xsl:text>
            <xsl:if test="position() != last()">,</xsl:if>
         </xsl:for-each>
         <xsl:text>]</xsl:text>
         <xsl:if test="following-sibling::*">,</xsl:if>
      </xsl:when>
   </xsl:choose>
</xsl:template>

<xsl:template match="@*" mode="attributes">
   <xsl:param name="path"/>
   <xsl:variable name="cleaned">
      <xsl:call-template name="process-string-content">
         <!-- an attribute has no text() children, so read its string value -->
         <xsl:with-param name="value" select="normalize-space(.)"/>
         <xsl:with-param name="path" select="concat($path,'@',name())"/>
      </xsl:call-template>
   </xsl:variable>
   <xsl:value-of select="concat($encaseObject, name(), $encaseObject, $cln, $encaseString, $cleaned, $encaseString)"/>
   <xsl:if test="position() != last()">,</xsl:if>
</xsl:template>

<xsl:template match="*" mode="elements">
   <xsl:param name="path"/>

   <xsl:variable name="cleaned">
      <xsl:call-template name="process-string-content">
         <xsl:with-param name="value" select="normalize-space(text())"/>
         <xsl:with-param name="path" select="$path"/>
      </xsl:call-template>
   </xsl:variable>

   <xsl:value-of select="concat($encaseObject, $txtPrefix, $txtSuffix, $encaseObject, $cln, $finalData, $cleaned, $encaseString)"/>
</xsl:template>

As can be seen above, in each instance where we call the build, elements or attributes template we pass the current path as a concatenated string. Notice that the attributes template adds an @ to the path to indicate an attribute. Once this has been accomplished the actual data type template is a fairly simple matter to code. Notice in the below template that I automatically derive a number whenever the XSL number() function returns something other than NaN.
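Before moving on to that template, the path bookkeeping described above can be sketched outside of XSL. The following JavaScript is only an illustration under assumptions of my own: the node shape ({name, attributes, text, children}) and the collectPaths function are hypothetical stand-ins and are not part of the stylesheet. Unlike the templates, the sketch does not carry a trailing / on element paths, so nothing has to be stripped later.

```javascript
// Hypothetical in-memory stand-in for an XML element:
// { name, attributes: {...}, text, children: [...] }
function collectPaths(node, parentPath = "/", out = []) {
  // parentPath always ends with "/", so appending the name gives the full path.
  const elementPath = parentPath + node.name;

  // Attributes are marked with "@", as in the attributes template.
  for (const attrName of Object.keys(node.attributes || {})) {
    out.push({ path: elementPath + "/@" + attrName, value: node.attributes[attrName] });
  }

  // The element's own text is keyed by the element path itself.
  if (node.text !== undefined && node.text !== "") {
    out.push({ path: elementPath, value: node.text });
  }

  // Recurse into children, passing the extended path down.
  for (const child of node.children || []) {
    collectPaths(child, elementPath + "/", out);
  }
  return out;
}

const sample = {
  name: "root",
  children: [
    { name: "aDate", text: "2008-02-01" },
    { name: "someOtherData", attributes: { myDate: "2007-04-19" }, text: "blurb" },
  ],
};

const paths = collectPaths(sample).map((entry) => entry.path);
console.log(paths);
// ["/root/aDate", "/root/someOtherData/@myDate", "/root/someOtherData"]
```

Each path produced this way is what a pointer with match set to exact would be compared against.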

<xsl:template name="datatype">
   <xsl:param name="valueToProcess"/>
   <xsl:param name="path"/>
   <xsl:variable name="value">
      <xsl:call-template name="support-escape-characters">
         <xsl:with-param name="cleanIt" select="$valueToProcess"/>
      </xsl:call-template>
   </xsl:variable>
   <xsl:variable name="nPtrs" select="$config/xmltojson/pointers"/>
   <!-- strip trailing / on path to check for match -->
   <xsl:variable name="pathStrip" select="substring($path,1,string-length($path) - 1)"/>
   <!-- first let's define the data type to code to; notice we do an inherent number conversion outside of pointer nodes -->
   <xsl:variable name="datatype">
      <xsl:choose>
         <xsl:when test="$nPtrs/pointer[text()=$pathStrip and @match='exact']">
            <xsl:value-of select="$nPtrs/pointer[text()=$pathStrip and @match='exact']/@type"/>
         </xsl:when>
         <xsl:when test="$nPtrs/pointer[contains($pathStrip,text()) and @match='any']">
            <xsl:value-of select="$nPtrs/pointer[contains($pathStrip,text()) and @match='any']/@type"/>
         </xsl:when>
         <xsl:when test="string(number($value))!='NaN'">
            <xsl:text>number</xsl:text>
         </xsl:when>
         <xsl:when test="translate($value,'true','TRUE')='TRUE' or translate($value,'false','FALSE')='FALSE'">
            <xsl:text>boolean</xsl:text>
         </xsl:when>
         <xsl:otherwise>
            <xsl:text>string</xsl:text>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:variable>
   <!-- then emit the JavaScript for the resolved data type -->
   <xsl:choose>
      <xsl:when test="$datatype='native'">
         <xsl:if test="string-length($value)=0">
            <xsl:text>{}</xsl:text>
         </xsl:if>
         <xsl:value-of select="$value"/>
      </xsl:when>
      <xsl:when test="$datatype='number'">
         <xsl:text>new Number(</xsl:text>
         <xsl:if test="string-length($value)=0">
            <xsl:text>0</xsl:text>
         </xsl:if>
         <xsl:value-of select="$value"/>
         <xsl:text>)</xsl:text>
      </xsl:when>
      <xsl:when test="$datatype='boolean'">
         <xsl:choose>
            <xsl:when test="translate($value,'TRUE','true')='true' or $value='1'">
               <!-- JavaScript boolean literals are lower case -->
               <xsl:text>new Boolean(true)</xsl:text>
            </xsl:when>
            <xsl:otherwise>
               <xsl:text>new Boolean(false)</xsl:text>
            </xsl:otherwise>
         </xsl:choose>
      </xsl:when>
      <xsl:when test="$datatype='date'">
         <!-- JavaScript months are zero based; number() drops leading zeros -->
         <xsl:value-of select="concat('new Date(',substring($value,1,4),',',number(substring($value,6,2))-1,',',number(substring($value,9,2)),')')"/>
      </xsl:when>
      <xsl:when test="$datatype='datetime'">
         <xsl:value-of select="concat('new Date(',substring($value,1,4),',',number(substring($value,6,2))-1,',',number(substring($value,9,2)),',',number(substring($value,12,2)),',',number(substring($value,15,2)),',',number(substring($value,18,2)),')')"/>
      </xsl:when>
      <xsl:when test="$datatype='string'">
         <xsl:value-of select="$value"/>
      </xsl:when>
   </xsl:choose>
</xsl:template>

As can be seen above, the first choose block works out the correct data type and the second choose block emits the corresponding code.
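For readers more at home in JavaScript than in XSL, the same two steps can be sketched as follows. This is an approximation written for illustration, not the stylesheet's code: resolveType, toLiteral and the pointers array are hypothetical names, only a few of the pointers are reproduced, and JavaScript's Number() is not identical to the XPath number() function (for instance it accepts hexadecimal strings).

```javascript
// Pointer entries mirror some of the <pointer> nodes from the config file.
const pointers = [
  { type: "datetime", match: "exact", path: "/root/aDateTime" },
  { type: "date", match: "exact", path: "/root/aDate" },
  { type: "boolean", match: "exact", path: "/root/aBooleanAsANumber" },
  { type: "string", match: "any", path: "aString" },
];

// Step 1: same precedence as the first choose block: exact pointer,
// "any" pointer, numeric test, boolean test, and finally string.
function resolveType(path, value) {
  const exact = pointers.find((p) => p.match === "exact" && p.path === path);
  if (exact) return exact.type;
  const any = pointers.find((p) => p.match === "any" && path.includes(p.path));
  if (any) return any.type;
  // Number("") is 0 in JavaScript, so empty values are excluded explicitly.
  if (value.trim() !== "" && !Number.isNaN(Number(value))) return "number";
  if (/^(true|false)$/i.test(value)) return "boolean";
  return "string";
}

// Step 2: emit the JavaScript source text for the resolved type.
function toLiteral(type, value) {
  switch (type) {
    case "number":
      return "new Number(" + (value === "" ? "0" : value) + ")";
    case "boolean":
      return /^true$/i.test(value) || value === "1"
        ? "new Boolean(true)"
        : "new Boolean(false)";
    case "date": {
      const [year, month, day] = value.split("-").map(Number);
      // JavaScript months are zero based, hence month - 1.
      return `new Date(${year},${month - 1},${day})`;
    }
    case "datetime": {
      const [datePart, timePart] = value.split("T");
      const [year, month, day] = datePart.split("-").map(Number);
      const [hour, minute, second] = timePart
        .split(":")
        .map((part) => Math.floor(Number(part)));
      return `new Date(${year},${month - 1},${day},${hour},${minute},${second})`;
    }
    default:
      // "string" and "native" values are passed through unchanged.
      return value;
  }
}

console.log(toLiteral(resolveType("/root/aDate", "2008-02-01"), "2008-02-01"));
// new Date(2008,1,1)
console.log(toLiteral("datetime", "2009-02-01T13:15:00.000"));
// new Date(2009,1,1,13,15,0)
```

Pointers are checked before the automatic number test, which is what lets a value such as 1 be typed as a boolean rather than a number.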

Flattening that Fat JSON

While it can easily be argued that JSON is a much more efficient means of transporting asynchronous data to the caller, there is still some ‘fatness’ in it that could be trimmed if desired. Take for example the following XML snippet.

<mydata id="1" guid="1234-ab1234-13443-tyuad-asdfew" name="A flatter JSON">
   <volume>500</volume>
   <sell>300</sell>
   <purchaseDate>2007-06-30</purchaseDate>
</mydata>

In a normal JSON implementation under the BadgerFish convention, the above XML would be converted into a rather FAT JavaScript object. For instance, to get to the sell element value you would have to write MYJSON[0].mydata.sell.$. In code this would appear as:

var MYJSON=
   [{mydata:{id:1,guid:"1234-ab1234-13443-tyuad-asdfew",name:"A flatter JSON",volume:{$:500}, sell:{$:300}, purchaseDate:{$:new Date(2007,5,30)}}
   }]

Notice that I have data typed the above example and moved even further from the BadgerFish convention by not enclosing object names and non-string values in quotes. The idea of flattening your JSON is to streamline the data coming down the pipe. It does not inherently allow for reverse engineering back to XML, but along with data typing it provides for a much more accurate and streamlined resulting JavaScript object. In essence we want to drop the $ for text elements or other child elements whose name does NOT conflict with any parent element or attribute.

In order to flatten a node it must meet some basic criteria:

  • The node must not contain any attributes 
  • The node must not contain any child nodes

If the current node meets both of the above criteria then it can be safely moved into the parent JSON object without fear of conflict.
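The criteria can be expressed as a small test on the BadgerFish style object. The sketch below is an illustration in JavaScript and not the XSL implementation: isSimple and flattenSimple are hypothetical helpers, and the sketch assumes that element text lives under the $ key while attributes and child elements are ordinary properties, as in the example above.

```javascript
// True only for plain {...} objects, so typed values such as Date are left alone.
function isPlainObject(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    Object.getPrototypeOf(value) === Object.prototype
  );
}

// A node is "simple" when its only property is "$":
// no attributes and no child nodes.
function isSimple(node) {
  if (!isPlainObject(node)) return false;
  const keys = Object.keys(node);
  return keys.length === 1 && keys[0] === "$";
}

// Move every simple child into its parent as a direct property.
function flattenSimple(node) {
  if (Array.isArray(node)) return node.map((item) => flattenSimple(item));
  if (!isPlainObject(node)) return node;
  const result = {};
  for (const [key, value] of Object.entries(node)) {
    result[key] = isSimple(value) ? value.$ : flattenSimple(value);
  }
  return result;
}

const fat = [{
  mydata: {
    id: 1,
    name: "A flatter JSON",
    volume: { $: 500 },
    sell: { $: 300 },
    key: { stamp: "2009-01-01", $: 123435 }, // has an attribute, so it stays as is
  },
}];

const flat = flattenSimple(fat);
console.log(flat[0].mydata.sell); // 300
console.log(JSON.stringify(flat[0].mydata.key)); // {"stamp":"2009-01-01","$":123435}
```

After flattening, the sell value is reached with flat[0].mydata.sell rather than through the extra $ level.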

The difficult part in implementing this logic is recognizing that there are child nodes that fit the category and incorporating them into the current object. The result is a fair amount of impact on the majority of templates to ensure valid JSON. I will not, however, go into great detail on the actual XSL implementation for flattening the JSON, as it would take far longer than all of the other posts combined. Instead I will go over the switches supplied within the CONFIG.XML file and how they affect the resulting JSON, along with some examples. Following is a basic summary of the CONFIG options the transformation supports:

  • Encase For Array [boolean string true|false]
    Location: xmltojson/options/encaseforarray
    XSL Param Name: $encaseForArray
    Wraps the resulting JSON output in [ ] when true.
  • Drop Root [boolean string true|false]
    Location: xmltojson/options/dropRoot
    XSL Param Name: $dropRoot
    Drops the root XML element for the resulting JSON when true.
  • Flatten Simple Elements [boolean string true|false]
    Location: xmltojson/options/flattenSimpleElements and xmltojson/options/flattenSimpleCollectionsToArrays
    XSL Param Names: $flattenSimpleElements and $flattenSimpleCollectionsToArrays
    Forces simple child elements to be moved into the parent object as direct properties with no BadgerFish convention for element values.
  • Append to Element Name for Uniqueness [string]
    Location: xmltojson/options/elementAppendForUnique
    XSL Param Name: $elementAppendForUnique
    When flattening JSON, child node names can conflict with parent attribute names. This string is appended to the child element name in order to ensure uniqueness. Works for both singletons and array constructs.

Of the 4 CONFIG switches above, Encase For Array and Drop Root are fairly obvious in nature and result, so I will not provide examples of them. The other two switches, especially Flatten Simple Elements, can be fairly difficult to understand with regard to their impact. As a result the following code block will provide several examples of JSON with the switch on or off.

----------------------------------------------------------------------
Example 1: Parent Inherits Child
----------------------------------------------------------------------
Flatten Simple Off:
   [{data:{id:1,someinfo:{$:"This is a simple element"}}}]
Flatten Simple On: 
   [{data:{id:1,someinfo:"This is a simple element"}}]

----------------------------------------------------------------------
Example 2: Parent Inherits Simple Children Only
----------------------------------------------------------------------
<data id="1">
   <someinfo>This is a simple element</someinfo>
   <key stamp="2009-01-01">123435</key>
</data>
Flatten Simple Off:
   [{data:{id:1,someinfo:{$:"This is a simple element"},key:{stamp:"2009-01-01",$:123435}}}]
Flatten Simple On: 
   [{data:{id:1,someinfo:"This is a simple element",key:{stamp:"2009-01-01",$:123435}}}]
----------------------------------------------------------------------
Example 3: Parent Inherits Simple Children but Ignores Collection
----------------------------------------------------------------------
<data id="1">
   <someinfo>This is a simple element</someinfo>
   <key>1</key>
   <key>2</key>
   <key>3</key>
</data>
Flatten Simple Off:
   [{data:{id:1,someinfo:{$:"This is a simple element"},key:[{$:1},{$:2},{$:3}]}}]
Flatten Simple On: 
   [{data:{id:1,someinfo:"This is a simple element",key:[{$:1},{$:2},{$:3}]}}]
Flatten Simple On and Flatten Simple Collections On
   [{data:{id:1,someinfo:"This is a simple element",key:[1,2,3]}}]
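The last step of Example 3 can be mimicked with a short JavaScript sketch. It is an illustration only: collapseSimpleCollections is a hypothetical helper, not the stylesheet's logic, and it assumes its input already had Flatten Simple applied. A collection is collapsed only when every member holds nothing but a $ value, so members that carry attributes keep their object form.

```javascript
// A member is "simple" when it is an object whose only property is "$".
const isSimple = (node) =>
  node !== null &&
  typeof node === "object" &&
  !Array.isArray(node) &&
  Object.keys(node).length === 1 &&
  "$" in node;

function collapseSimpleCollections(node) {
  if (Array.isArray(node)) {
    // Collapse [{$:1},{$:2}] to [1,2]; otherwise keep the members and recurse.
    return node.every(isSimple)
      ? node.map((item) => item.$)
      : node.map((item) => collapseSimpleCollections(item));
  }
  if (node === null || typeof node !== "object") return node;
  const result = {};
  for (const [key, value] of Object.entries(node)) {
    result[key] = collapseSimpleCollections(value);
  }
  return result;
}

const flattenSimpleOn = [{
  data: { id: 1, someinfo: "This is a simple element", key: [{ $: 1 }, { $: 2 }, { $: 3 }] },
}];

const bothOn = collapseSimpleCollections(flattenSimpleOn);
console.log(JSON.stringify(bothOn));
// [{"data":{"id":1,"someinfo":"This is a simple element","key":[1,2,3]}}]
```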

As you can see in the above examples, turning both Flatten Simple and Flatten Simple Collections on can have a fairly dramatic impact on the size and complexity of your resulting JSON. There is however a situation that can cause this to break: when a child node name conflicts with a parent node's attribute name. This is the reason for the Append to Element Name for Uniqueness setting:

<data id="1" key="abcd">
   <key>12345</key>
</data>
Append For unique not set:
   [{data:{id:1,key:"abcd",key:{$:12345}}}]
Append For unique set to string "myNode":
   [{data:{id:1,key:"abcd",keymyNode:{$:12345}}}]
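Why the clash matters on the client can be shown with a few lines of JavaScript. When an object contains the same key twice, the later value silently replaces the earlier one, so the attribute would be lost. The mergeChildren function below is a hypothetical sketch of the renaming idea, not the stylesheet's implementation; its third parameter plays the role of $elementAppendForUnique.

```javascript
// With duplicate keys the last one wins, so the attribute value "abcd" is lost.
const clash = JSON.parse('{"id":1,"key":"abcd","key":12345}');
console.log(clash.key); // 12345

// Merge child elements into the parent's attribute map, appending a suffix
// to any child name that would collide with an existing attribute name.
function mergeChildren(attributes, children, elementAppendForUnique) {
  const result = { ...attributes };
  for (const [name, value] of Object.entries(children)) {
    const taken = Object.prototype.hasOwnProperty.call(result, name);
    result[taken ? name + elementAppendForUnique : name] = value;
  }
  return result;
}

const merged = mergeChildren({ id: 1, key: "abcd" }, { key: 12345 }, "myNode");
console.log(JSON.stringify(merged));
// {"id":1,"key":"abcd","keymyNode":12345}
```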

Conclusion

Although far from perfect, JSON provides for easy integration into your client side code. It is still fairly early in its development and has not yet been fully adopted by the W3C. That being said, it still offers some advantages over XML.

This transformation is far from perfect. For instance, it does not render attributes or elements with a null value; rather, it ignores them. This is the first issue that comes to mind and I am sure there will be others as time goes by. Also, the code has changed somewhat since the writing of this series but remains in essence the same. I have split some of the templates in order to provide greater readability.

You can download the code at http://code.google.com/p/xmltojson

Until next time….
Keith Chadwick

8 responses to “Converting XML to JSON with XSL – Part 3”

  1. I really like the clarity of your code and the customization allowed by the configuration flags.

    I still spotted an error in your description of the stylesheet:

    “When dealing with a node there are only two things we have to deal with, properties and content/text. To be clear I am looking at xml as XML not HTML. In other words the following is invalid … Although the above may be valid in the HTML world, in the XML world it is NOT… ”

    This is actually false: XML supports mixed content (what HTML would allow is, for instance, an open tag with no corresponding closing tag). Therefore your transform deals with data oriented XML, but not with document oriented XML.

  2. Thanks and yes you are correct I do NOT deal with mixed content but more data centric content. The reason for this is that the goal was to convert to JSON which given its structure does not fit well with mixed content. From my point of view mixed content, such as HTML, should be left belonging to a single property within the JSON such as myRichText:’content’ and the caller can deal with it.

  3. I would like to use your stylesheet and found a problem, when flattenSimpleElements and flattenSimpleCollectionsToArrays are true :

    
    	0
    	
    		A
    		B
    	
    	
    		C
    		D
    	
    
    

    becomes

    {"root":{"simple":0,"listElt1":{"item":[{"A"},{"B"}]},"listElt2":{"item":[{"att":"c","C"},{"att":"d","D"}]}}}

    where I’m expecting

    {"root":{"simple":0,"listElt1":{"item":["A","B"]},"listElt2":{"item":[{"att":"c","$":"C"},{"att":"d","$":"D"}]}}}

    I’m debugging the stylesheet, but this is not easy. I found a first bug l.162 (withParam isSimpleNode should be isSimple) but this is not enough.

    Any help welcome.

  4. mmmm

    reads
    <root>
    <simple>0</simple>
    <listElt1>
    <item>A</item>
    <item>B</item>
    </listElt1>
    <listElt2>
    <item att=”c”>C</item>
    <item att=”d”>D</item>
    </listElt2>
    </root>

  5. Hi,
    First, thanks for providing us with your work, it’s very useful.
    However, when I use your prStyleSheet it does not work for my XML document because I have attributes mixed with elements.

    Thanks in advance for your help

  6. Just to communicate on another issue I had to solve in the transform: quotes are escaped by default in config.xml, where json does not specify this for strings, and validators like jsonlint throw an error in this case.

  7. Hi,
    Thank you very much for providing this! I tried a lot of converters and they all had issues. But with your XSL I can finally get good results.

    I found one small issue:
    If I set dropRoot to true, the resulting JSON is empty (“[]”).
