I have built a SPARQL query for DBpedia with a regex in it, and it is very slow:
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
select ?label where {
?s rdfs:label ?label.
?s dbpedia-owl:thumbnail ?photo.
?s dbpedia-owl:abstract ?abstract.
FILTER langMatches( lang(?label), "FR" ).
FILTER langMatches( lang(?abstract), "FR" ).
FILTER regex(?label, "^Jules V", "i").
}
LIMIT 10
You can try it using the public endpoint http://fr.dbpedia.org/sparql and you will see that it takes several seconds to return.
Is there a way for me to get better performance on this, even if the results are less precise?
Thanks,
Samuel
Any query using REGEX will almost certainly be slow unless your query restricts to a small enough portion of the dataset. Processing a REGEX basically requires that the store do a linear scan over the potential results, checking each to see whether it matches the regular expression.
If you have a sufficiently simple regular expression, as in your case, you should try one of two things:
Solution 1 – Use a lighter weight string function
In your case you’re looking for strings that start with a certain substring, so it will almost certainly be more efficient to use the STRSTARTS function instead, since that doesn’t require full regex. This of course assumes your SPARQL engine complies with the latest SPARQL 1.1 draft specification.
Solution 2 – Use Full Text Search
Many stores include full text search extensions which can be used in place of REGEX and often yield significantly better performance, because you are accessing a full text index rather than doing a linear scan over the potential results.
In the case of DBpedia, the Virtuoso store behind it supports the following syntax:
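As a minimal sketch, Virtuoso exposes its full text index through the bif:contains predicate. The query below assumes the endpoint has a full text index over label literals and that the bif: prefix is predefined, as it normally is on Virtuoso. The search term "Jules" is chosen only for illustration.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?label WHERE {
  ?s rdfs:label ?label .
  # bif:contains is a Virtuoso-specific extension, not standard SPARQL.
  # It matches literals that contain the word "Jules" via the text index.
  ?label bif:contains "Jules" .
}
LIMIT 10
```

Because bif:contains is not part of the SPARQL standard, a query like this is not portable to other stores.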
Note that the Virtuoso full text syntax is somewhat limited, so you can’t use Jules V as is, because each term must be at least 4 characters (possibly 3). But you can combine this with a further FILTER to narrow down to the results you wanted, like so:
This query runs almost instantaneously.
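A combined query of that kind might look like the sketch below. It assumes the same Virtuoso full text index on the French DBpedia endpoint and has not been timed here. The full text match on "Jules" does the coarse selection, and a STRSTARTS filter (the lighter function from Solution 1) narrows the result to labels beginning with "Jules V". LCASE is used to mimic the case-insensitive "i" flag of the original regex.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?label WHERE {
  ?s rdfs:label ?label .
  ?s dbpedia-owl:thumbnail ?photo .
  ?s dbpedia-owl:abstract ?abstract .
  # Coarse selection through the full text index (Virtuoso-specific).
  ?label bif:contains "Jules" .
  FILTER(langMatches(lang(?label), "FR"))
  FILTER(langMatches(lang(?abstract), "FR"))
  # Fine selection: case-insensitive prefix test, no regex needed.
  FILTER(STRSTARTS(LCASE(STR(?label)), "jules v"))
}
LIMIT 10
```

On a store without a full text extension, removing the bif:contains line leaves a plain SPARQL 1.1 query that relies on STRSTARTS alone.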