I have a set of CSV data to be converted to XML. The code looks OK, but the output isn’t quite right: it omits some columns because they have no value, and it produces one long line of XML instead of breaking it across lines.
This is a sample of my CSV data:
Name Age Sex
chi 23
kay 19 male
John male
And my code:
public class XMLCreators {
// Protected Properties
protected DocumentBuilderFactory domFactory = null;
protected DocumentBuilder domBuilder = null;
public XMLCreators() {
try {
domFactory = DocumentBuilderFactory.newInstance();
domBuilder = domFactory.newDocumentBuilder();
} catch (FactoryConfigurationError exp) {
System.err.println(exp.toString());
} catch (ParserConfigurationException exp) {
System.err.println(exp.toString());
} catch (Exception exp) {
System.err.println(exp.toString());
}
}
public int convertFile(String csvFileName, String xmlFileName,
String delimiter) {
int rowsCount = -1;
try {
Document newDoc = domBuilder.newDocument();
// Root element
Element rootElement = newDoc.createElement("XMLCreators");
newDoc.appendChild(rootElement);
// Read csv file
BufferedReader csvReader;
csvReader = new BufferedReader(new FileReader(csvFileName));
int fieldCount = 0;
String[] csvFields = null;
StringTokenizer stringTokenizer = null;
// Assumes the first line in CSV file is column/field names
// The column names are used to name the elements in the XML file,
// avoid the use of Space or other characters not suitable for XML element
// naming
String curLine = csvReader.readLine();
if (curLine != null) {
// how about other form of csv files?
stringTokenizer = new StringTokenizer(curLine, delimiter);
fieldCount = stringTokenizer.countTokens();
if (fieldCount > 0) {
csvFields = new String[fieldCount];
int i = 0;
while (stringTokenizer.hasMoreElements())
csvFields[i++] = String.valueOf(stringTokenizer.nextElement());
}
}
// At this point the columns are known, now read the data line by line
while ((curLine = csvReader.readLine()) != null) {
stringTokenizer = new StringTokenizer(curLine, delimiter);
fieldCount = stringTokenizer.countTokens();
if (fieldCount > 0) {
Element rowElement = newDoc.createElement("row");
int i = 0;
while (stringTokenizer.hasMoreElements()) {
try {
String curValue = String.valueOf(stringTokenizer.nextElement());
Element curElement = newDoc.createElement(csvFields[i++]);
curElement.appendChild(newDoc.createTextNode(curValue));
rowElement.appendChild(curElement);
} catch (Exception exp) {
// Report the failure instead of silently swallowing it
System.err.println(exp.toString());
}
}
rootElement.appendChild(rowElement);
rowsCount++;
}
}
csvReader.close();
// Save the document to the disk file
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(newDoc);
Result result = new StreamResult(new File(xmlFileName));
aTransformer.transform(src, result);
rowsCount++;
// Output to console for testing
// Result result = new StreamResult(System.out);
} catch (IOException exp) {
System.err.println(exp.toString());
} catch (Exception exp) {
System.err.println(exp.toString());
}
return rowsCount;
// "XML document has been created" + rowsCount;
}
}
When this code is executed on the above data it produces:
<?xml version="1.0" encoding="UTF-8"?>
<XMLCreators>
<row>
<Name>chi</Name>
<Age>23</Age>
</row>
<row>
<Name>kay</Name>
<Age>19</Age>
<sex>male</sex>
</row>
<row>
<Name>john</Name>
<Age>male</Age>
</row>
</XMLCreators>
I arranged it in this form myself; the actual output is one long line. The output it should produce is:
<?xml version="1.0" encoding="UTF-8"?>
<XMLCreators>
<row>
<Name>chi</Name>
<Age>23</Age>
<sex></sex>
</row>
<row>
<Name>kay</Name>
<Age>19</Age>
<sex>male</sex>
</row>
<row>
<Name>john</Name>
<Age></Age>
<sex>male</sex>
</row>
</XMLCreators>
I’d agree with Kennet.
I simply added output properties to the Transformer before calling transform.
This added a new line between the elements and allowed for indentation.
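A minimal sketch of that kind of change, assuming the standard JAXP `OutputKeys.INDENT` property is what does the work. The `indent-amount` key is implementation-specific (it is understood by the Xalan-based transformer bundled with the JDK), and the class and method names here are hypothetical:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class IndentDemo {
    // Serializes a DOM document to a string with line breaks and indentation.
    static String toIndentedXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Standard JAXP property: ask the serializer to break and indent elements.
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Implementation-specific property (assumption: Xalan-based transformer);
        // other implementations may simply ignore it.
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Builds a tiny document shaped like the one in the question.
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("XMLCreators");
        doc.appendChild(root);
        Element row = doc.createElement("row");
        root.appendChild(row);
        Element name = doc.createElement("Name");
        name.appendChild(doc.createTextNode("kay"));
        row.appendChild(name);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toIndentedXml(sampleDocument()));
    }
}
```

In the question's code, the two `setOutputProperty` calls would go on `aTransformer`, before `aTransformer.transform(src, result)`.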
UPDATED
Let’s start with the fact that the file you’ve presented isn’t a CSV (comma-separated values) file, and I’ll let you worry about that problem…
Now I’ve used a List instead of a Map here. You’ll need to decide how best to approach the missing values problem. Without knowing the structure of the file in advance, this is not going to be a simple solution.
Anyway, I end up with
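One way to approach the missing values, sketched under the assumption that the file really does put a delimiter between every column, including empty ones (for example `John,,male`). `StringTokenizer` treats consecutive delimiters as one and never returns empty tokens, which is why the values in the question shift into the wrong columns. `String.split` with a negative limit keeps the empty strings. The class and method names are hypothetical, and quoted fields are not handled:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

class RowSplitter {
    // Splits one line into its column values, keeping empty columns.
    // The limit of -1 tells split() not to discard trailing empty strings.
    static List<String> splitRow(String line, String delimiter) {
        return new ArrayList<>(Arrays.asList(line.split(Pattern.quote(delimiter), -1)));
    }

    // Pads a short row with empty strings so it has one value per column.
    static List<String> padTo(List<String> values, int columnCount) {
        List<String> padded = new ArrayList<>(values);
        while (padded.size() < columnCount) {
            padded.add("");
        }
        return padded;
    }

    public static void main(String[] args) {
        String line = "John,,male";
        // StringTokenizer skips the empty Age column entirely.
        System.out.println(new StringTokenizer(line, ",").countTokens());
        // split(..., -1) keeps it as an empty string.
        System.out.println(splitRow(line, ","));
    }
}
```

With whitespace-separated data like the sample in the question, there is no reliable way to tell which column is missing, so this only helps once the file uses an explicit delimiter.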
Updated with merge
UPDATED with use of OpenCSV
Next update (2022)
So, for example, using something like…
It will generate an output of something like…
Runnable example
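A minimal runnable sketch of the whole conversion, not the original listing. It assumes a comma-delimited input with a header row whose names are valid XML element names, no quoted fields, and it uses only the JDK. The class name `CsvToXmlSketch` and method `convert` are hypothetical:

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CsvToXmlSketch {
    // Reads delimited text and returns indented XML with one element per column,
    // including columns that have no value in a given row.
    static String convert(Reader csv, String delimiter) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("XMLCreators");
        doc.appendChild(root);

        String splitRegex = Pattern.quote(delimiter);
        try (BufferedReader reader = new BufferedReader(csv)) {
            String header = reader.readLine();
            if (header != null) {
                // Assumption: the first line holds the column names.
                String[] columns = header.split(splitRegex, -1);
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.trim().isEmpty()) {
                        continue; // skip blank lines
                    }
                    String[] values = line.split(splitRegex, -1);
                    Element row = doc.createElement("row");
                    // Iterate over the columns, not the values, so every
                    // column gets an element even when the value is missing.
                    for (int i = 0; i < columns.length; i++) {
                        Element cell = doc.createElement(columns[i].trim());
                        String value = i < values.length ? values[i].trim() : "";
                        cell.appendChild(doc.createTextNode(value));
                        row.appendChild(cell);
                    }
                    root.appendChild(row);
                }
            }
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String csv = "Name,Age,Sex\nchi,23,\nkay,19,male\nJohn,,male\n";
        System.out.println(convert(new StringReader(csv), ","));
    }
}
```

The serializer may write an empty element as `<Sex/>` rather than `<Sex></Sex>`; the two forms are equivalent in XML.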