I am a newbie in Java. After evaluating some Java libraries I chose VTD-XML because of its performance tests and the option to use XPath. I tried StAX and I think it is not for human beings; it is really hard to understand how the parsing works (at least for me XD).
So, my goal is to "inject" the geo_code node from partial_geo_codes.xml into the geo_code node of accommodation.xml, matching on the value of the ext_id node in both files.
accommodation.xml
<accommodations>
  <accommodation>
    <ext_id>12345</ext_id>
    <type>A</type>
    <details>D</details>
    <geo_code />
  </accommodation>
  ...
</accommodations>
And this is the file to be merged into accommodation.xml:
partial_geo_codes.xml
<geo_codes>
  <geo_code>
    <ext_id>12345</ext_id>
    <geo_idlocacion>77500</geo_idlocacion>
    <latitude>42.578114</latitude>
    <longitude>1.648293</longitude>
  </geo_code>
  <geo_code>
    ...
  </geo_code>
  <geo_code>
    ...
  </geo_code>
</geo_codes>
This is the expected output:
accommodation_new.xml
<accommodations>
  <accommodation>
    <ext_id>12345</ext_id>
    <type>A</type>
    <details>D</details>
    <geo_code>
      <ext_id>12345</ext_id>
      <geo_idlocacion>77500</geo_idlocacion>
      <latitude>42.578114</latitude>
      <longitude>1.648293</longitude>
    </geo_code>
  </accommodation>
  <accommodation>
    .....
  </accommodation>
  ......
</accommodations>
And this is my "wannabe-really-sucks" Java class:
import com.ximpleware.extended.*;
import java.io.*;

public class MergeVtd {
    public static void main(String[] args) throws Exception {
        String filesPath = new java.io.File("").getAbsolutePath().concat("/main/src/");
        long start = System.currentTimeMillis();
        // parser for the original XML (accommodation.xml)
        VTDGenHuge vgh = new VTDGenHuge();
        // parser for the XML to be merged (the geo codes)
        VTDGenHuge vgm = new VTDGenHuge();
        if (vgm.parseFile(filesPath.concat("partial_geo_code.xml"), true, VTDGenHuge.MEM_MAPPED)) {
            VTDNavHuge vnm = vgm.getNav();
            AutoPilotHuge apm = new AutoPilotHuge(vnm);
            // visit every ext_id element in the geo codes file
            apm.selectElement("ext_id");
            int count = 0;
            while (apm.iterate()) {
                int t = vnm.getText();
                if (t != -1) {
                    System.out.println("Value vnm ==> " + vnm.toNormalizedString(t));
                    // we have an id to match; note that accommodation.xml
                    // is parsed again here for every single ext_id
                    if (vgh.parseFile(filesPath.concat("accommodation.xml"), true, VTDGenHuge.MEM_MAPPED)) {
                        VTDNavHuge vnh = vgh.getNav();
                        AutoPilotHuge aph = new AutoPilotHuge(vnh);
                        aph.selectXPath("/accommodations/accommodation/ext_id[text()='"
                                + vnm.toNormalizedString(t) + "']");
                        int result = -1;
                        while ((result = aph.evalXPath()) != -1) {
                            int g = vnh.getText();
                            if (g != -1) {
                                System.out.println("Value vnh ==> " + vnh.toNormalizedString(g));
                            } else {
                                System.out.println("no match in vnh !======= ");
                            }
                        }
                    }
                }
                System.out.println("============================== " + count);
                count++;
            }
        }
        long end = System.currentTimeMillis();
        System.out.println("Execution time was " + (end - start) + " ms.");
        System.exit(0);
    }
}
I would really appreciate any clue on how to iterate over the two XML files at once and merge them by the ext_id node value much faster. Right now it takes far too much time.
How big is partial_geo_codes.xml? Can it fit in memory? If so, I would recommend indexing it with a hash map: create a simple HashMap and store references to the geo_code nodes in it, keyed by their ext_id values.
Once that is done, you only need to pass through accommodation.xml once. Right now your algorithm's complexity is O(n^2) and, what's worse, it involves parsing accommodation.xml from disk n times. The version with a HashMap will take O(n) time and will require only a single pass through each XML file.
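To make the idea concrete, here is a minimal sketch, assuming both files fit in memory. It uses the JDK's built-in DOM parser rather than VTD-XML so that it runs without extra dependencies; the indexing idea itself does not depend on the parser. The class and method names (GeoCodeMerge, indexGeoCodes, merge) are made up for illustration, and the sample XML in main is a shortened version of the files in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GeoCodeMerge {

    // Parses an XML string into a DOM document (hypothetical helper).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the first direct child element with the given name, or null.
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    // Pass 1: index every <geo_code> element by the text of its <ext_id>.
    static Map<String, Element> indexGeoCodes(Document geoCodes) {
        Map<String, Element> index = new HashMap<>();
        NodeList list = geoCodes.getElementsByTagName("geo_code");
        for (int i = 0; i < list.getLength(); i++) {
            Element geo = (Element) list.item(i);
            Element id = child(geo, "ext_id");
            if (id != null) {
                index.put(id.getTextContent().trim(), geo);
            }
        }
        return index;
    }

    // Pass 2: one pass over the accommodations, one hash lookup each.
    // Returns how many accommodations received a geo_code.
    static int merge(Document accommodations, Map<String, Element> index) {
        int merged = 0;
        NodeList list = accommodations.getElementsByTagName("accommodation");
        for (int i = 0; i < list.getLength(); i++) {
            Element acc = (Element) list.item(i);
            Element id = child(acc, "ext_id");
            Element match = (id == null) ? null : index.get(id.getTextContent().trim());
            if (match == null) {
                continue; // no geo_code for this ext_id, leave it untouched
            }
            // Nodes belong to their document, so copy the match across first.
            Node imported = accommodations.importNode(match, true);
            Element placeholder = child(acc, "geo_code");
            if (placeholder != null) {
                acc.replaceChild(imported, placeholder);
            } else {
                acc.appendChild(imported);
            }
            merged++;
        }
        return merged;
    }

    // Serializes a document back to a string without the XML declaration.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document acc = parse("<accommodations><accommodation><ext_id>12345</ext_id>"
                + "<type>A</type><details>D</details><geo_code/></accommodation></accommodations>");
        Document geo = parse("<geo_codes><geo_code><ext_id>12345</ext_id>"
                + "<latitude>42.578114</latitude><longitude>1.648293</longitude></geo_code></geo_codes>");
        int merged = merge(acc, indexGeoCodes(geo));
        System.out.println("merged " + merged + " accommodation(s)");
        System.out.println(serialize(acc));
    }
}
```

Each file is walked exactly once and every match is a constant-time map lookup, which is where the O(n) comes from. If accommodation.xml is too large for DOM, the same map could be consulted from a streaming or VTD-XML pass over that file instead; only partial_geo_codes.xml has to fit in memory for the index.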