I’ve got a requirement to take some XML and transform it into a fixed-width file for loading into an SAP system. My algorithm works fine except for some European characters such as Ã: each occurrence of such a character in a string adds 1 to the reported string length. So for example the text Ãbcd has a string-length($value) of 5 instead of 4.
This is a problem, because my code checks to see what the length of the property is, then subtracts that from the max-length of the fixed-length output format (i.e. for a 30-width field if it read Ãbcd it would think it needed 25 spaces instead of 26).
Does anyone know of a better way to do this, or what I’m doing wrong in my algorithm?
Below are my xsl templates (for the most part… can’t get them in here quite right…)
Template to Write out Property:
<xsl:param name="value"/>
<xsl:param name="width"/>
<!-- find the current length of the field-->
<xsl:variable name="valueWidth" select="string-length($value)" />
<xsl:variable name="difference" select="$width - $valueWidth" />
<xsl:if test="$difference > 0">
<xsl:value-of select="$value"/>
<!-- run this for loop x times, outputting a space for each -->
<xsl:call-template name="for-loop-spaces">
<xsl:with-param name="count" select="$difference - 1" />
</xsl:call-template>
</xsl:if>
<xsl:if test="$difference &lt; 0">
<!-- XPath positions start at 1; substring($value,0,$width) returns only $width - 1 characters -->
<xsl:value-of select="substring($value,1,$width)"/>
</xsl:if>
<xsl:if test="$difference = 0">
<xsl:value-of select="$value"/>
</xsl:if>
</xsl:template>
For-loop-spaces template (it wouldn’t copy-paste):
Outputs a space each time it’s called. Accepts param “count”. If count is greater than zero, it recursively calls itself with count-1 until count reaches 0.
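Since the template itself didn’t paste, here is a rough sketch of what the description above could look like. It is reconstructed from the description only, not the original code, and is meant to sit inside the existing stylesheet:

```xml
<xsl:template name="for-loop-spaces">
  <xsl:param name="count"/>
  <!-- emit one space on every call -->
  <xsl:text> </xsl:text>
  <!-- recurse while count is above 0, so a call with count = N emits N + 1 spaces -->
  <xsl:if test="$count > 0">
    <xsl:call-template name="for-loop-spaces">
      <xsl:with-param name="count" select="$count - 1"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
```

Written this way, calling it with $difference - 1 (as the property template does) produces exactly $difference spaces.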
Any input would be very useful 🙂
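The problem is that combining diacritical marks can be used instead of single precomposed characters. For example, Ã can be stored as a plain A followed by a combining tilde (U+0303), and string-length counts those as two characters. This is what gives you the “wrong length”.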
See http://en.wikipedia.org/wiki/Combining_character for more info on those characters.
If you have XSLT 2, there is a built-in function to normalize them which should work: fn:normalize-unicode
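As a minimal sketch of the XSLT 2.0 route, assuming an XSLT 2.0 processor and using a made-up test value (A followed by U+0303, then bcd), the following compares the raw length with the length after NFC normalization:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- hypothetical test value: "A" + U+0303 COMBINING TILDE + "bcd" -->
  <xsl:variable name="value" select="'A&#x303;bcd'"/>

  <xsl:template match="/">
    <!-- raw length: the combining mark is counted as its own character -->
    <xsl:value-of select="string-length($value)"/>
    <xsl:text> </xsl:text>
    <!-- NFC composes base letter + mark into one character where a precomposed form exists -->
    <xsl:value-of select="string-length(normalize-unicode($value, 'NFC'))"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document this should print 5 4. In the property template you would normalize $value once, then use the normalized string both for the length check and for the output. Note that NFC only helps where a precomposed character exists; other base-plus-mark combinations stay as two characters.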
For XSLT 1.0, you’d have to use some function to count the characters excluding the combining characters. One possibility may be to use translate to strip the combining characters before taking the length.
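A minimal sketch of the translate idea, assuming the same made-up test value as a stand-in for $value. The list of combining marks is illustrative only, not exhaustive; extend it to whatever marks actually occur in your data:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- hypothetical test value: "A" + U+0303 COMBINING TILDE + "bcd" -->
  <xsl:variable name="value" select="'A&#x303;bcd'"/>

  <!-- assumed subset of combining marks: grave, acute, circumflex, tilde,
       diaeresis, ring above, cedilla -->
  <xsl:variable name="combining"
      select="'&#x300;&#x301;&#x302;&#x303;&#x308;&#x30A;&#x327;'"/>

  <xsl:template match="/">
    <!-- raw length, combining mark included -->
    <xsl:value-of select="string-length($value)"/>
    <xsl:text> </xsl:text>
    <!-- translate with an empty third argument deletes the listed characters -->
    <xsl:value-of select="string-length(translate($value, $combining, ''))"/>
  </xsl:template>
</xsl:stylesheet>
```

This should print 5 4. It only corrects the count used for padding: the combining marks are still written to the output, so whether the target system treats them as extra characters is something you would need to check.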
Note that you’ll have even more problems if your data contains Asian characters that are built from combined characters.
Quote from http://www.dpawson.co.uk/xsl/characters.html