Converting XML to JSON with XSL – Part 2

In this installment of converting XML to JSON with XSL, I will discuss escaping your resulting JSON and introduce a config.xml file to drive the conversion. The config file becomes a necessity once we want to do data typing and character escaping.

Source XML

In my previous post we converted a fairly simple XML document. For this example we will use the same XML but with two book elements, which in our JSON results in an array of books.

<root>
   <book id="780102497">
     <title>Building a Better Web Site</title>
     <pricing releaseDate="2007-07-25">
          <sellPrice>120.00</sellPrice>
          <purchasePrice>95.49</purchasePrice>
          <sales>
               <item count="52" value="120" dtm="2008-08-01">Below Forecast</item>
               <item count="208" value="180" dtm="2007-08-01">Exceeded Forecast</item>
          </sales>
     </pricing>
     <description>If you’re a developer working with XML, you know there’s a lot to know about XML, and the XML space is evolving almost moment by moment. But you don’t need to commit every XML syntax, API, or XSLT transformation to memory; you only need to know where to find it. Use the &lt; &amp; &gt; values to define your xml nodes.</description>
   </book>
   <book id="780102498">
     <title>Web for Dummies</title>
     <pricing releaseDate="2007-07-02">
          <sellPrice>30.00</sellPrice>
          <purchasePrice>10.50</purchasePrice>
          <sales>
               <item count="50" value="100" dtm="2008-08-01">Below Forecast</item>
               <item count="3104" value="183" dtm="2007-08-01">Exceeded Forecast</item>
          </sales>
     </pricing>
     <description>Confused on how to use the web? Use this manual for idiots when you don’t know a bit from a bite if it bit you on the behind!</description>
   </book>
</root>

The basic resulting JSON structure is shown below. I have left out the second book element to save a little space, since its structure is the same as the first.

var myJSON=
[{"root":[
 {"book":{"id":"780102497","title":{"$":"Building a Better Web Site"},
  "pricing":{"releaseDate":"2007-07-25","sellPrice":{"$":120.00},"purchasePrice":{"$":95.49},
   "sales":{"item":[{"count":"52","value":"120","dtm":"2008-08-01"},{"count":"208","value":"180","dtm":"2007-08-01"}]}},
  "description":{"$":"If you’re a developer working with XML, you know there’s a lot to know about XML, and the XML space is evolving almost moment by moment. But you don’t need to commit every XML syntax, API, or XSLT transformation to memory; you only need to know where to find it. Use the < & > values to define your xml nodes."}}},
 {next book element}
]}];
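As a quick sanity check, the sketch below trims the structure above to the first book (the description is shortened and the second-book placeholder is dropped) and shows how a consumer might walk the array of books. It keeps the `myJSON` name from the example but uses `const`; the `titles` and `firstSale` names are only for illustration, and attribute values are kept as strings since data typing comes in a later post.

```javascript
// First book only; description shortened for space.
const myJSON = [{
  "root": [
    { "book": {
        "id": "780102497",
        "title": { "$": "Building a Better Web Site" },
        "pricing": {
          "releaseDate": "2007-07-25",
          "sellPrice": { "$": 120.00 },
          "purchasePrice": { "$": 95.49 },
          "sales": { "item": [
            { "count": "52", "value": "120", "dtm": "2008-08-01" },
            { "count": "208", "value": "180", "dtm": "2007-08-01" }
          ] }
        },
        "description": { "$": "If you’re a developer working with XML, ..." }
    } }
  ]
}];

// Each entry of root is a { book: ... } wrapper, so map over it to reach the books.
const titles = myJSON[0].root.map(entry => entry.book.title.$);
console.log(titles); // [ 'Building a Better Web Site' ]

// Element text sits under "$"; attributes are plain properties.
const firstSale = myJSON[0].root[0].book.pricing.sales.item[0];
console.log(firstSale.count, firstSale.dtm); // 52 2008-08-01
```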

Step 1: Implementing a Config File

Given the number of different JSON conventions, it is difficult to build that level of flexibility directly into the XSL. However, if we introduce a config.xml file and load it with the XSL document() function, we can provide a much greater range of switches and configuration.

If we use the root/book[@id="780102497"]/pricing/sales/item nodes as an example, the basic JSON under the BadgerFish convention would be as follows:

{ "item" : [
            { "count" : "52", "value" : "120", "dtm" : "2008-08-01", "item" : { "$" : "Below Forecast" } },
            { "count" : "208", "value" : "180", "dtm" : "2007-08-01", "item" : { "$" : "Exceeded Forecast" } }
           ]
}

The above JSON snippet lets us reach the item text values within the array as …preamble["item"][0]["item"]["$"] or …preamble.item[0].item.$, where …preamble is the path leading to this object.
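To make the two access forms concrete, here is the item snippet assigned to a variable. The name `preamble` is a stand-in for whatever path leads to this object in your own document; it is not part of the generated JSON.

```javascript
// Stand-in for "…preamble": the object that holds the item array.
const preamble = {
  "item": [
    { "count": "52", "value": "120", "dtm": "2008-08-01", "item": { "$": "Below Forecast" } },
    { "count": "208", "value": "180", "dtm": "2007-08-01", "item": { "$": "Exceeded Forecast" } }
  ]
};

// Bracket notation and dot notation reach the same value;
// "$" is a legal JavaScript identifier, so .$ works without quotes.
const viaBrackets = preamble["item"][0]["item"]["$"];
const viaDots = preamble.item[0].item.$;
console.log(viaBrackets === viaDots, viaDots); // true Below Forecast
```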

If we look at the syntax of the above JSON, the count property could be viewed as:

[encase object] [xml attribute name] [encase object] : [encase value] [xml attribute value] [encase value]

The item node content could be interpreted as:

[encase object] [node name] [encase object] : { [encase object] [element text identifier] [encase object] : [encase value] [node text value] [encase value] }
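The two patterns above can be sketched as small JavaScript functions driven by the same settings that go into config.xml. The `badgerfishLike` object mirrors the config settings in this post (a double quote for both encase values, `$` as the element text prefix); `cln` stands for the `$cln` separator used by the XSL and is assumed here to be a colon. The function names are hypothetical and only illustrate how the placeholders compose.

```javascript
// Settings mirroring config.xml; cln is assumed to be ":" as in the JSON above.
const badgerfishLike = {
  encaseObject: '"', encaseString: '"',
  attPrefix: "", attSuffix: "",
  txtPrefix: "$", txtSuffix: "",
  cln: ":"
};

// [encase object][attribute name][encase object] : [encase value][attribute value][encase value]
function attributePair(cfg, name, value) {
  return cfg.encaseObject + cfg.attPrefix + name + cfg.attSuffix + cfg.encaseObject +
         cfg.cln + cfg.encaseString + value + cfg.encaseString;
}

// [encase object][node name][encase object] : { [text identifier] : [node text value] }
function elementText(cfg, nodeName, text) {
  const textKey = cfg.encaseObject + cfg.txtPrefix + cfg.txtSuffix + cfg.encaseObject;
  return cfg.encaseObject + nodeName + cfg.encaseObject + cfg.cln +
         "{" + textKey + cfg.cln + cfg.encaseString + text + cfg.encaseString + "}";
}

console.log(attributePair(badgerfishLike, "count", "52"));          // "count":"52"
console.log(elementText(badgerfishLike, "item", "Below Forecast")); // "item":{"$":"Below Forecast"}

// Swapping in a different settings object changes the output without touching the functions.
const atPrefixed = { ...badgerfishLike, attPrefix: "@" };
console.log(attributePair(atPrefixed, "count", "52"));              // "@count":"52"
```

This is the same idea as moving the settings into config.xml: the templates stay fixed and only the settings change.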

Based on the above, we have several variables we can place into our configuration file. Originally these were defined as params in the XSL, but let's now push them over into config.xml.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<!-- name of the config file; can be overridden when the transformation is run -->
<xsl:param name="doc" select="'config.xml'"/>
<xsl:param name="config" select="document($doc)"/>
<xsl:param name="encaseObject" select="$config/xmltojson/settings/encase/objectNames"/>
<xsl:param name="encaseString" select="$config/xmltojson/settings/encase/stringValues"/>
<xsl:param name="attPrefix" select="$config/xmltojson/settings/attributes/prefix"/>
<xsl:param name="attSuffix" select="$config/xmltojson/settings/attributes/suffix"/>
<xsl:param name="txtPrefix" select="$config/xmltojson/settings/elements/prefix"/>
<xsl:param name="txtSuffix" select="$config/xmltojson/settings/elements/suffix"/>
<!-- ..remainder of xsl file -->
</xsl:stylesheet>

The config.xml file would be the following:

<?xml version="1.0" encoding="utf-8"?>
<xmltojson>
   <settings>
      <encase>
         <objectNames>"</objectNames>
         <stringValues>"</stringValues>
      </encase>
      <attributes>
         <prefix/>
         <suffix/>
      </attributes>
      <elements>
         <prefix>$</prefix>
         <suffix/>
      </elements>
   </settings>
</xmltojson>

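So far, this is fairly straightforward. Moving these parameters into a config file has no impact on the rest of the XSL code, because the templates still refer to the same parameter names. Notice that the document() function takes a variable as well, $doc. Because doc is a top-level parameter, you can pass a different value when running the transformation to override the default config.xml file. This means that you can define multiple configuration files on your site and pick the one you need for the current request. The above config settings follow the BadgerFish convention of placing element text under $; strict BadgerFish also prefixes attribute names with @, which is what the attributes/prefix setting is there for.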

Step 2: Escaping XML Characters

Since our source XML may contain rich text with backslashes (\) and double quotes ("), we have to escape these characters in the resulting JSON; otherwise the JSON will fail to parse. Again, we will use our config.xml to provide the list of items to escape.
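A quick way to see the failure is to build a JSON string by plain concatenation, the way the stylesheet does, and hand it to `JSON.parse`. The sample text is made up for illustration, and split/join stands in for the XSL replace logic.

```javascript
const text = 'Use "quotes" and C:\\temp freely'; // the actual string contains one backslash

// Concatenating without escaping, as a naive stylesheet would.
const naive = '{"description":"' + text + '"}';

// Escaping backslashes first, then double quotes.
const escaped = '{"description":"' +
  text.split("\\").join("\\\\").split('"').join('\\"') + '"}';

let naiveOk = true;
try { JSON.parse(naive); } catch (e) { naiveOk = false; }

console.log(naiveOk);                                  // false
console.log(JSON.parse(escaped).description === text); // true
```

JSON also forbids raw control characters such as line feeds inside strings, so if your text nodes can contain line breaks, an escape entry with `<from>&#10;</from>` and `<to>\n</to>` would be needed as well.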

<xmltojson>
   <settings>…</settings>
   <escape>
      <item>
         <from>\</from>
         <to>\\</to>
      </item>
      <item>
         <from>"</from>
         <to>\"</to>
      </item>
   </escape>
</xmltojson>

As the structure shows, adding a value to escape is straightforward: for each conversion we add an item node whose from element holds the text to look for and whose to element holds its replacement. The items are applied in document order, which is why the backslash entry comes first; if quotes were escaped first, the backslashes they introduce would be doubled by the later backslash pass. To implement this in XSL we, unfortunately, have to write a recursive template that does the replacement. It is frustrating that XSLT 1.0 only offers translate(), which maps single characters to single characters, rather than a replace() that can swap one string for a longer one (replace() only arrives with XPath 2.0). We need to call the cleaning template from within both the attribute and element templates.
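The sketch below shows why the order of the escape items matters, since each pass runs over the output of the previous one. It applies a list of from/to pairs in order, one pass per pair as the template does, but uses plain split/join instead of XSL recursion; the `escapeList` arrays are hand-written stand-ins for the config entries.

```javascript
// Apply each {from, to} pair in order over the whole string, one pass per escape item.
function applyEscapes(text, escapeList) {
  return escapeList.reduce((s, { from, to }) => s.split(from).join(to), text);
}

const backslashFirst = [{ from: "\\", to: "\\\\" }, { from: '"', to: '\\"' }];
const quoteFirst     = [{ from: '"', to: '\\"' }, { from: "\\", to: "\\\\" }];

const input = 'say "hi"';
console.log(applyEscapes(input, backslashFirst)); // say \"hi\"
console.log(applyEscapes(input, quoteFirst));     // say \\"hi\\"
```

With the quote entry first, the backslashes it adds are doubled by the second pass, so the quotes end up unescaped again and the JSON string breaks.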

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- …preamble -->
<xsl:template match="@*" mode="attributes">
   <xsl:variable name="cleanData">
      <xsl:call-template name="replace">
         <!-- an attribute has no text() child, so pass its string value -->
         <xsl:with-param name="cleanIt" select="string(.)"/>
      </xsl:call-template>
   </xsl:variable>
   <xsl:value-of select="concat($encaseObject, name(), $encaseObject, $cln, $encaseString, $cleanData, $encaseString)"/>
   <xsl:if test="position() != last()">,</xsl:if>
</xsl:template>
<xsl:template match="*" mode="elements">
   <xsl:variable name="cleanData">
      <xsl:call-template name="replace">
         <xsl:with-param name="cleanIt" select="string(text())"/>
      </xsl:call-template>
   </xsl:variable>
   <xsl:value-of select="concat($encaseObject, $txtPrefix, $txtSuffix, $encaseObject, $cln, $encaseString, $cleanData, $encaseString)"/>
</xsl:template>
<!-- Applies each escape/item (from -> to) to the whole string, one item per pass -->
<xsl:template name="replace">
   <xsl:param name="cleanIt"/>
   <xsl:param name="cleaned"/>
   <xsl:param name="nodePos" select="1"/>
   <xsl:variable name="escapeFrom" select="string($config/xmltojson/escape/item[$nodePos]/from)"/>
   <xsl:variable name="escapeTo" select="string($config/xmltojson/escape/item[$nodePos]/to)"/>
   <xsl:choose>
      <!-- "from" found: escape the first occurrence and keep scanning the rest -->
      <xsl:when test="$escapeFrom != '' and contains($cleanIt, $escapeFrom)">
         <xsl:call-template name="replace">
            <xsl:with-param name="cleanIt" select="substring-after($cleanIt, $escapeFrom)"/>
            <xsl:with-param name="cleaned" select="concat($cleaned, substring-before($cleanIt, $escapeFrom), $escapeTo)"/>
            <xsl:with-param name="nodePos" select="$nodePos"/>
         </xsl:call-template>
      </xsl:when>
      <!-- more escape items left: start the next pass on the whole string -->
      <xsl:when test="$nodePos &lt; count($config/xmltojson/escape/item)">
         <xsl:call-template name="replace">
            <xsl:with-param name="cleanIt" select="concat($cleaned, $cleanIt)"/>
            <xsl:with-param name="nodePos" select="$nodePos + 1"/>
         </xsl:call-template>
      </xsl:when>
      <!-- all escape items applied: output the result -->
      <xsl:otherwise>
         <xsl:value-of select="concat($cleaned, $cleanIt)"/>
      </xsl:otherwise>
   </xsl:choose>
</xsl:template>
</xsl:stylesheet>

Although the replace template looks fairly complex, its implementation is rather simple. In essence, it scans the passed-in value for the current escape/item/from value. When it finds one, it takes the text to the left of it, appends the escape/item/to value, and calls itself with the remaining text to the right, so the process repeats on what is left. When the from value no longer occurs, it moves nodePos on to the next escape/item and starts a new pass over the whole string. Finally, when there are no more escape items to test for, it returns the result.
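For readers more at home in JavaScript, here is a rough port of the replace template that keeps its parameter names (`cleanIt`, `cleaned`, `nodePos`). The `escapeItems` array is a hand-written stand-in for `$config/xmltojson/escape/item`; note that XPath positions start at 1 while the array is indexed from 0.

```javascript
// Stand-in for $config/xmltojson/escape/item, in document order.
const escapeItems = [
  { from: "\\", to: "\\\\" },
  { from: '"', to: '\\"' }
];

function replace(cleanIt, cleaned = "", nodePos = 1) {
  const item = escapeItems[nodePos - 1];          // XPath item[$nodePos] is 1-based
  const escapeFrom = item ? item.from : "";
  const escapeTo = item ? item.to : "";

  if (escapeFrom !== "" && cleanIt.includes(escapeFrom)) {
    const idx = cleanIt.indexOf(escapeFrom);
    const left = cleanIt.slice(0, idx);                  // substring-before
    const rest = cleanIt.slice(idx + escapeFrom.length); // substring-after
    // Escaped text moves into "cleaned" and is not rescanned in this pass.
    return replace(rest, cleaned + left + escapeTo, nodePos);
  }
  if (nodePos < escapeItems.length) {
    // Pass finished: rejoin the string and start over with the next escape item.
    return replace(cleaned + cleanIt, "", nodePos + 1);
  }
  return cleaned + cleanIt;
}

console.log(replace('C:\\path "quoted"')); // C:\\path \"quoted\"
```

Keeping the escaped text in `cleaned` is what stops a replacement such as `\` to `\\` from being matched again within the same pass.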

Finally 

In the next installment of this series I will move outside of the normal JSON conventions in order to provide greater flexibility in the result set. In the final post I will show how to provide data typing for both elements and attributes, plus how to return native JavaScript functions within your JSON. Finally, I will introduce how to flatten your JSON to reduce both its size and complexity. I like to call this SJSON, Simplified JavaScript Object Notation.

I will also make all of the code available on a code hosting site with the last post, but I have not decided which one to go with yet.

Until then…
Keith Chadwick
