I have written following code for extratcting URIs from a web page with content type application/rdf-xml for Linked Data application.
public static void test(String url) {
try {
Model read = ModelFactory.createDefaultModel().read(url);
System.out.println("to go");
StmtIterator si;
si = read.listStatements();
System.out.println("to go");
while(si.hasNext()) {
Statement s=si.nextStatement();
Resource r=s.getSubject();
Property p=s.getPredicate();
RDFNode o=s.getObject();
System.out.println(r.getURI());
System.out.println(p.getURI());
System.out.println(o.asResource().getURI());
}
}
catch(JenaException | NoSuchElementException c) {}
}
But for the input
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ex="http://example.org/stuff/1.0/">
<rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar"
dc:title="RDF/XML Syntax Specification (Revised)">
<ex:editor>
<rdf:Description ex:fullName="Dave Beckett">
<ex:homePage rdf:resource="http://purl.org/net/dajobe/" />
</rdf:Description>
</ex:editor>
</rdf:Description>
</rdf:RDF>
The output is :
Subject URI is http://www.w3.org/TR/rdf-syntax-grammar
Predicate URI is http://example.org/stuff/1.0/editor
Object URI is null
Subject URI is http://www.w3.org/TR/rdf-syntax-grammar
Predicate URI is http://purl.org/dc/elements/1.1/title
Website is read
I require in the output all the URIs present on that page to build a web crawler for RDF pages.
I require all following links in output:
http://www.w3.org/TR/rdf-syntax-grammar
http://example.org/stuff/1.0/editor
http://purl.org/net/dajobe
http://example.org/stuff/1.0/fullName
http://www.w3.org/TR/rdf-syntax-grammar
http://purl.org/dc/elements/1.1/title
Minor mistake: you mean
application/rdf+xml(note the plus).Anyway, your problem is very simple:
Bad! You’re missing the error thrown here, and the output is being truncated:
oisn’t always a resource, and this will break on the tripleso you need to guard against that:
or even more specific:
which will skip the
nulloutput you see forex:editor.Now write one thousand times I will not swallow exceptions 🙂