Learn SPARQL in 5 minutes and use it to query WikiData

Share on:

I’m working on Entity Linking and Knowledge Bases. In that context, exporting a relevant part of Wikidata can be really useful to build surface form dictionaries and coocurence probabilities etc… In order to know which part of Wikidata is relevant to dump, I thought we could query Wikidata (although it seems we can only download the entire dump and filter afterwards).

SPARQL is a language to formulate questions (queries) for knowledge databases. Therefore you can query Wikidata with SPARQL. At first sight, the syntax is not particularly easy and I’ve gone through this tutorial.


  • #is the comment character
  • The SELECT clause lists variables that you want returned (variables start with a question mark)
  • The WHERE clause contains restrictions on them, in the form of SPO triples (subject, predicate, object), e.g. ?fruit hasColor yellow.
  • On Wikidata, items and properties are not identified by human-readable names like “hasColor” (property) or “yellow” (item).
  • Instead, Wikidata items and properties are assigned an identifier, that you need to know beforehand.
    • for items, it’s a Q-number, e.g. “yellow” is Q943
    • for properties, it’s a P-number, e.g. “hasColor” is P462
  • you can search for the itentifiers using search term for items and P:search term for properties
  • but you should rely on autocompletion in query.wikidata.org by pressing Ctrl + Space
  • Finally, you need to include prefix namespaces to query the WQDS (Wikidata Query Service). There are many prefixes, one for each namespace in SPARQL :
    • wd: for items
    • wdt: for properties, pointing to the object
  • to doublecheck what prefix links to what resource, use http://prefix.cc/
  • to get more than the Wikidata ID as selectable attributes, you need to include them in the WHERE clause using SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }

Putting this all together we get:

SELECT ?fruit ?fruitLabel
  # fruit hasColor yellow
  ?fruit wdt:P462 wd:Q943
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }

Try it

  • You can further filter this down by adding more triple conditions using ; character e.g. you could filter for actual fruits by doing
# fruit instance of or subclass of a fruit
?fruit wdt:P31/wdt:P279* wd:Q3314483;
  • Advanced filters :
    • p: for properties, pointing to the subject
    • ps: for property statement
    • pq: for property qualifier
  • You can abbreviate a lot with the [] syntax
SELECT ?painting ?paintingLabel ?material ?materialLabel
  # element is a painting
  ?painting wdt:P31/wdt:P279* wd:Q3305213;
  # extract the statement node 'material' (P186)
            p:P186 [
              # get material property statement
              ps:P186 ?material;
              # 'applies to part'(P518) 'painting surface'(Q861259)
              pq:P518 wd:Q861259
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  • More grammar by example: ORDER BY, LIMIT
SELECT ?country ?countryLabel ?population
  # instances of sovereign state
  ?country wdt:P31/wdt:P279* wd:Q3624078;
  # hasPopulation populationValue
           wdt:P1082 ?population.
  # filter for english translations
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
# ASC(?something) or DESC(?something)
ORDER BY DESC(?population)
  • If you add more variables like population above, the query will filter out countries that don’t have a population value. To fix this, use an OPTIONAL clause
SELECT ?book ?title ?illustratorLabel ?publisherLabel ?published
  ?book wdt:P50 wd:Q35610.
  OPTIONAL { ?book wdt:P1476 ?title. }
  OPTIONAL { ?book wdt:P110 ?illustrator. }
  OPTIONAL { ?book wdt:P123 ?publisher. }
  OPTIONAL { ?book wdt:P577 ?published. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
SELECT ?person ?personLabel ?age
  # instance of human
  ?person wdt:P31 wd:Q5;
          wdt:P569 ?born;
          wdt:P570 ?died;
  # died from capital punishment
          wdt:P1196 wd:Q8454.
  BIND(?died - ?born AS ?ageInDays).
  BIND(?ageInDays/365.2425 AS ?ageInYears).
  BIND(FLOOR(?ageInYears) AS ?age).
  FILTER(?age > 90)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }

Try it

  • One can select based on a list of items using VALUES
SELECT ?item ?itemLabel ?mother ?motherLabel
  # A. Einstein or J.S. Bach
  VALUES ?item { wd:Q937 wd:Q1339 }
  # mother of
  OPTIONAL { ?item wdt:P25 ?mother. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  • The Label Service extension automatically generates labels as follows:
    • ?xxxLabel as a shortcut for rdfs:label
    • ?xxxAltLabel as a shortcut for skos:altLabel
    • ?xxxDescription as a shortcut for schema:description

Get the 🇳🇱 dutch nicknames of a team:

# get the dutch nicknames from Bayern München
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel
  VALUES ?item { wd:Q15789 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl". }

Try it

Get the stadium names of the teams that are part of the Big 5:

SELECT ?item ?itemLabel ?venue ?venueLabel ?venueAltLabel
  ?item wdt:P31/wdt:P279* wd:Q847017;
        wdt:P118 ?league;
        wdt:P115 ?venue.
  # filter for Big 5
  VALUES ?league { wd:Q82595 wd:Q9448 wd:Q13394 wd:Q15804 wd:Q324867 wd:Q206813}.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
ORDER BY ?league

Try it

Solutions to the tutorial exercices 

Here are my solution to the exercises in that tutorial.

Chemical elements 

Write a query that returns all chemical elements with their element symbol and atomic number, in order of their atomic number.

SELECT ?element ?elementLabel ?symbol ?atomic_number
  ?element wdt:P31 wd:Q11344;
           wdt:P246 ?symbol ;
           wdt:P1086 ?atomic_number .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
ORDER BY ASC(?atomic_number)

Try it

Rivers that flow into the Mississippi 

Write a query that returns all rivers that flow directly or indirectly into the Mississippi River.

SELECT ?river ?riverLabel
  ?river wdt:P31 wd:Q4022;
         wdt:P403/wdt:P403* wd:Q1497 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
ORDER BY ASC(?riverLabel)

Try it

References to Le Figaro website 

  ?ref pr:P854 ?refURL .
  FILTER (CONTAINS(str(?refURL),'lefigaro.fr')) .
} LIMIT 10

Try it