Recently we ran into a performance problem in a piece of code that generates XML, and I thought I would share the experience here. This is slightly long, so please bear with me.
We prepare a simple XML document containing a number of items. Each item can have 5-10 elements. The structure is something like this:
<Root>
<Item>
<Element1Key>Element1Val</Element1Key>
<Element2Key>Element2Val</Element2Key>
<Element3Key>Element3Val</Element3Key>
<Element4Key>Element4Val</Element4Key>
<Element5Key>Element5Val</Element5Key>
</Item>
<Item>
<Element1Key>Element1Val</Element1Key>
<Element2Key>Element2Val</Element2Key>
<Element3Key>Element3Val</Element3Key>
<Element4Key>Element4Val</Element4Key>
<Element5Key>Element5Val</Element5Key>
</Item>
</Root>
The code that generates the XML was (in simplified form, as global functions):
void addElement(std::string& aStr_inout, const std::string& aKey_in, const std::string& aValue_in)
{
    // Opening tag: <Key>
    aStr_inout += "<";
    aStr_inout += aKey_in;
    aStr_inout += ">";
    // Element text
    aStr_inout += aValue_in;
    // Closing tag: </Key>
    aStr_inout += "</";
    aStr_inout += aKey_in;
    aStr_inout += ">";
}
void PrepareXML_Original()
{
    clock_t commence, complete;
    commence = clock();
    std::string anXMLString;
    anXMLString += "<Root>";
    for (int i = 0; i < 200; i++)
    {
        anXMLString += "<Item>";
        addElement(anXMLString, "Element1Key", "Element1Val");
        addElement(anXMLString, "Element2Key", "Element2Val");
        addElement(anXMLString, "Element3Key", "Element3Val");
        addElement(anXMLString, "Element4Key", "Element4Val");
        addElement(anXMLString, "Element5Key", "Element5Val");
        anXMLString += "</Item>";
        // Encode the special characters.
        replaceAll(anXMLString, "&", "&amp;");
        replaceAll(anXMLString, "'", "&apos;");
        replaceAll(anXMLString, "\"", "&quot;");
        replaceAll(anXMLString, "<", "&lt;");
        replaceAll(anXMLString, ">", "&gt;");
    }
    anXMLString += "</Root>";
    complete = clock();
    long lTime = static_cast<long>(complete - commence);  // elapsed clock ticks
    std::cout << "Time taken for the operation is: " << lTime << std::endl;
}
replaceAll() replaces the special characters with their encoded forms. It is given below.
void replaceAll(std::string& str, const std::string& from, const std::string& to)
{
    size_t start_pos = 0;
    while ((start_pos = str.find(from, start_pos)) != std::string::npos)
    {
        str.replace(start_pos, from.length(), to);
        start_pos += to.length();  // skip past the text just inserted
    }
}
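A small usage sketch may help show what replaceAll() does to a string. It assumes the encoded form of "&" is the XML entity "&amp;", and escapeAmpersand() is a hypothetical helper added only for this illustration. Each pass scans the whole string it is given, so a second pass over text that is already encoded encodes the "&" of the entity again.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same logic as replaceAll() above, repeated so the sketch compiles on its own.
void replaceAll(std::string& str, const std::string& from, const std::string& to)
{
    std::size_t start_pos = 0;
    while ((start_pos = str.find(from, start_pos)) != std::string::npos)
    {
        str.replace(start_pos, from.length(), to);
        start_pos += to.length();  // skip past the text just inserted
    }
}

// Hypothetical helper: applies the "&" encoding `passes` times to a copy of `text`.
std::string escapeAmpersand(std::string text, int passes)
{
    for (int i = 0; i < passes; ++i)
        replaceAll(text, "&", "&amp;");
    return text;
}
```

With one pass, "a&b" becomes "a&amp;b"; with two passes it becomes "a&amp;amp;b", so the result depends on how many times the same text is passed through replaceAll().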
In this minimal example I encode 200 items, but in the actual situation there could be more. The above code took around 20 seconds to create the XML, which is far beyond any acceptable limit. What could be the problem, and how can the performance be improved?
Note: The choice of string class doesn’t make much difference. I tested the same logic with another string implementation, MFC's CString, and got a similar (in fact much worse) result. Also, I don’t want to use a DOM XML parser here to prepare the XML in a better way; the question is not specific to XML.
If you can estimate the length of the result string (anXMLString) before creating its content, you can allocate enough buffer space for the string up front. When the buffer is big enough, re-allocation and copying of the target string won’t happen.
This way:
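A minimal sketch of pre-allocating with std::string::reserve(). The function name buildWithReserve, the single element per item, and the per-item size estimate are assumptions made for illustration, not the original code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: builds `itemCount` items after reserving space up front.
std::string buildWithReserve(int itemCount)
{
    const std::size_t bytesPerItem = 256;  // assumed rough upper bound per <Item>
    std::string xml;
    // One allocation up front; appends that stay below this capacity
    // do not reallocate or copy the existing content.
    xml.reserve(16 + bytesPerItem * static_cast<std::size_t>(itemCount));

    xml += "<Root>";
    for (int i = 0; i < itemCount; ++i)
    {
        xml += "<Item>";
        xml += "<Element1Key>Element1Val</Element1Key>";
        xml += "</Item>";
    }
    xml += "</Root>";
    return xml;
}
```

Note that reserve() only removes reallocation of the target string; the cost of any find/replace passes over that string is unaffected.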
I’m not sure about std::string: does it need to search for the append point, or does it keep the length of the string in memory?
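For what it's worth, std::string does keep its length: size() is specified as a constant-time operation, so an append does not scan for the end the way strcat() does on a C string. A small sketch that makes this visible, using a string with an embedded '\0' (sizeAfterAppend is a hypothetical name used only here):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends one character to a string that contains an
// embedded '\0' and returns the resulting size.
std::size_t sizeAfterAppend()
{
    std::string s("ab\0cd", 5);  // 5 characters, including the embedded '\0'
    s += 'e';                    // appended after the stored length, at index 5
    return s.size();
}
```

If the append had searched for a terminating '\0', the new character would have landed after "ab"; instead the size simply grows from 5 to 6.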