Use Ontology to correctly assign data types in Virtuoso
I have ingested the Geonames RDF dump (https://download.geonames.org/all-geonames-rdf.zip) into a Virtuoso instance, and I've been running queries against it with varying degrees of success. However, I've found that certain objects have an incorrect datatype. For example, population is encoded as xsd:string, so sorting by population orders the results lexicographically rather than numerically:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gn: <http://www.geonames.org/ontology#>
SELECT ?country ?name ?population (datatype(?population) AS ?type)
WHERE {
  ?country a gn:Feature .
  ?country gn:name ?name .
  # A.PCLI is feature code for 'independent political entity'
  ?country gn:featureCode <https://www.geonames.org/ontology#A.PCLI> .
  ?country gn:population ?population .
}
ORDER BY DESC(?population)
LIMIT 10
| country | name | population | type |
|---|---|---|---|
| https://ift.tt/3Fg8k6V | China | 1330044000 | https://ift.tt/2VRgVvo |
| https://ift.tt/3c4uLiM | India | 1173108018 | https://ift.tt/2VRgVvo |
| https://ift.tt/3HhNLc1 | United States | 310232863 | https://ift.tt/2VRgVvo |
| https://ift.tt/3oj64ou | Indonesia | 242968342 | https://ift.tt/2VRgVvo |
| https://ift.tt/31Ylaby | Brazil | 201103330 | https://ift.tt/2VRgVvo |
I know I can cast the variable to get the correct result, e.g. ORDER BY DESC(xsd:integer(?population)), but this stops working once my queries get more complicated, specifically when I run a subquery and use its results to apply further logic. For example:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gn: <http://www.geonames.org/ontology#>
SELECT ?cityName ?countryName ?population (datatype(?population) AS ?type)
WHERE
{
  ?city gn:parentCountry ?country ;
        gn:population ?population ;
        gn:name ?cityName .
  ?country gn:name ?countryName .
  {
    # Uncomment exactly one of the three SELECT lines below:
    # a) SELECT ?country (MAX(?population) AS ?population)
    # b) SELECT ?country (MAX(xsd:integer(?population)) AS ?population)
    # c) SELECT ?country (xsd:string(MAX(xsd:integer(?population))) AS ?population)
    WHERE
    {
      ?city a gn:Feature ;
            gn:featureClass <https://www.geonames.org/ontology#P> ;
            gn:population ?population ;
            gn:parentCountry ?country .
    }
    GROUP BY ?country
    ORDER BY DESC(?population)
  }
}
Select a) returns the populations in lexicographic order, as before.
Select b) orders the populations correctly, but because the subquery now returns the population as an integer, I can no longer match the city by population outside the subquery: I'd be comparing strings with integers. So b) returns an empty result set.
Select c) was my attempt at casting the results back to strings so that they can be matched outside the subquery, but this ends in a timeout (estimated execution time: 4000 seconds).
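One variant of select b) that might sidestep the string/integer mismatch is to give the aggregated value its own variable name and to cast on the outer side as well, so that both sides of the comparison are integers. This is only a sketch: ?maxPopulation and ?cityPopulation are names introduced here for illustration, and how it performs on the full dump is not known.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gn: <http://www.geonames.org/ontology#>
SELECT ?cityName ?countryName ?maxPopulation
WHERE
{
  {
    # Largest populated place per country, compared numerically.
    # ?maxPopulation is a hypothetical name, not part of the original query.
    SELECT ?country (MAX(xsd:integer(?population)) AS ?maxPopulation)
    WHERE
    {
      ?city a gn:Feature ;
            gn:featureClass <https://www.geonames.org/ontology#P> ;
            gn:population ?population ;
            gn:parentCountry ?country .
    }
    GROUP BY ?country
  }
  ?city gn:parentCountry ?country ;
        gn:population ?cityPopulation ;
        gn:name ?cityName .
  ?country gn:name ?countryName .
  # Cast the stored string on the outer side too, so integers are compared with integers.
  FILTER(xsd:integer(?cityPopulation) = ?maxPopulation)
}
ORDER BY DESC(?maxPopulation)
```

Because the subquery no longer projects a variable called ?population, the join with the outer pattern happens on ?country alone, and the FILTER does the population match after casting.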
My question is this: Is there a way to either
a) change the datatype in Virtuoso manually
b) use the Geonames ontology to instruct Virtuoso about the correct types
c) alter my query to more efficiently cast to the correct type
I'm hoping option b is possible, as it seems the most effective solution: the Geonames ontology already specifies the correct types for the objects of all of these predicates.
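For comparison, option a) could in principle be expressed as a SPARQL 1.1 Update that rewrites each string-typed population literal as an xsd:integer. The following is a sketch under assumptions: the graph IRI <http://example.org/geonames> is a placeholder for whichever graph the dump was loaded into, the account running it needs update permission, and on a dataset this size it may have to be run in batches.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gn: <http://www.geonames.org/ontology#>
# <http://example.org/geonames> is a hypothetical graph IRI; substitute the real one.
WITH <http://example.org/geonames>
DELETE { ?feature gn:population ?old }
INSERT { ?feature gn:population ?new }
WHERE
{
  ?feature gn:population ?old .
  FILTER(datatype(?old) = xsd:string)
  BIND(xsd:integer(?old) AS ?new)
  # Skip values that cannot be cast, so no triple is deleted without a replacement.
  FILTER(BOUND(?new))
}
```

Once the literals are stored as integers, ORDER BY DESC(?population) and MAX(?population) would compare them numerically without any cast in the query.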
You can find the Geonames ontology here.
You can test the queries above and your own against our endpoint here: http://18.170.45.162:8890/sparql
from Recent Questions - Stack Overflow https://ift.tt/3Cc4F83