“… try to think your way into the main part of a fungus, the mycelium, a proliferating network of tiny white threads known as hyphae. Decentralised, inquisitive, exploratory and voracious, a mycelial network ranges through soil in search of food. It tangles itself in an intimate scrawl with the roots of plants, exchanging nutrients and sugars with them; it meets with the hyphae of other networks and has mycelial sex; messages from its myriad tips are reported rapidly across the whole network ...”
The quotation is from Francis Gooding's article “From its Myriad Tips”, London Review of Books, 20 May 2021.
It inspired me to prepare a presentation pack at the time; its text is reproduced here.
Fungal networks resemble human projects: a complex mesh of chance encounters and temporary cross-fertilisations whose dynamics are difficult to grasp at the time or in retrospect.
Can Wikidata provide the ground to investigate an activity network?
Wikidata Q-items provide a good harvest for queries based on intrinsic properties (“Show me all X in the Y area”).
However, most endeavours involve some form of associative network, and queries seeking a view of network involvement can be disappointing.
Queries to visualise individuals’ involvement in art projects, football clubs’ participation in league competitions, and railway stations associated with early rail companies have all run into gaps in the associational data, for example:
P137 (operator), linking railway stations to companies;
P118 (league), linking football clubs to leagues;
P710 (participant), linking projects to participating artists.
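A gap of this kind can be measured before any repair is attempted. The following is a minimal sketch, not the query used for the original work: it builds a SPARQL string listing instances of a class that have no statement for a given property. The function name is hypothetical, and the class QID is whatever the caller supplies; Q55488 appears in the usage comment on the assumption that it denotes "railway station" and should be checked before use.

```python
def missing_property_query(class_qid: str, property_pid: str, limit: int = 100) -> str:
    """Build a SPARQL query for instances of class_qid lacking property_pid.

    Hypothetical helper: it only assembles the query text. Running it
    against the Wikidata Query Service is left to the caller.
    """
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        # P31 is "instance of"; subclasses are not followed in this sketch.
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        # Keep only items with no statement at all for the sought property.
        f"  FILTER NOT EXISTS {{ ?item wdt:{property_pid} ?value . }}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {limit}"
    )


# Example: railway stations (QID assumed, see above) with no operator (P137).
query = missing_property_query("Q55488", "P137", limit=10)
```

The resulting text can be pasted into the Wikidata Query Service to see how many items a visualisation would silently omit.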
Supplementary information is often available in Wikipedia article texts, as links if not as structured data. The sought properties can be added by hand after reading the Wikipedia article. The process can also be semi-automated using a Python script:
using wikipedia 1.4 functions to identify all pages in a category (e.g. Defunct Scottish football leagues) which themselves fit a given property;
then screen-scraping Wikipedia and Wikidata to check each link for the sought item type and identify all those lacking the sought property value;
finally generating a file of property updates to be applied using QuickStatements.
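The last of these steps can be sketched as follows. This is an illustration under stated assumptions rather than the script itself: the function name and the QIDs in the usage example are placeholders, and the candidate pairs are assumed to have been collected by the earlier steps. The output uses the tab-separated QuickStatements version 1 layout of subject, property, value.

```python
def quickstatements_lines(candidates, prop, existing):
    """Turn (subject_qid, value_qid) pairs into QuickStatements V1 lines.

    candidates: iterable of pairs harvested from article links.
    prop:       the property to add, e.g. "P118".
    existing:   set of pairs already present on Wikidata, which are skipped.
    """
    lines = []
    seen = set()
    for subject, value in candidates:
        pair = (subject, value)
        # Skip statements Wikidata already has, and duplicates in the harvest.
        if pair in existing or pair in seen:
            continue
        seen.add(pair)
        lines.append(f"{subject}\t{prop}\t{value}")
    return lines


# Placeholder QIDs, for illustration only.
harvested = [("Q1001", "Q2001"), ("Q1002", "Q2001"), ("Q1001", "Q2001")]
already_on_wikidata = {("Q1002", "Q2001")}
batch = quickstatements_lines(harvested, "P118", already_on_wikidata)
```

Writing the lines to a file rather than submitting them directly leaves room for the manual review that the harvest needs.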
Unfortunately, the generated file needs sense-checking and filtering. For example, an article on a rural rail link may mention onward connections to distant locations such as London Euston. Anachronism is also a problem: an article on a historical rail line may mention stations that opened only in the 21st century.
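Part of the anachronism check could be automated where dates are known. The sketch below assumes each candidate station carries an opening year (or None when unknown) and that the closing year of the historical line is available; the class, function and field names are hypothetical. Candidates with unknown dates are kept, since the aim is only to reduce what must be checked by hand.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    station: str              # label or QID of the linked station
    opened: Optional[int]     # year the station opened, None if unknown


def split_by_era(
    candidates: List[Candidate], line_closed_year: int
) -> Tuple[List[Candidate], List[Candidate]]:
    """Separate plausible candidates from clearly anachronistic ones.

    A station that opened after the line closed cannot have belonged to it,
    so it goes into the second list for rejection or manual review.
    """
    plausible, anachronistic = [], []
    for c in candidates:
        if c.opened is not None and c.opened > line_closed_year:
            anachronistic.append(c)
        else:
            plausible.append(c)
    return plausible, anachronistic


# Illustrative data only.
sample = [Candidate("A", 1850), Candidate("B", 2015), Candidate("C", None)]
keep, review = split_by_era(sample, line_closed_year=1965)
```

A date test of this kind does nothing for the London Euston case, which needs a geographic or editorial judgement instead.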
This kind of informal research to visualise a particular project network is both nourished by Wikidata and able to enrich Wikidata’s property-based associations.
The network visualisation is made possible by base data in Wikidata.
Attempts to visualise a network expose gaps in the Wikidata properties that the SPARQL visualisation queries rely on. Many of these gaps can be filled by harvesting textual information from the corresponding Wikipedia articles.
Visualisation also helps identify and correct data faults, for example where a similarly-named but distant location has been accidentally blended with one in the area in question.
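The same kind of fault can be flagged numerically as well as by eye. This is a minimal sketch, assuming each item has a latitude and longitude (as stored under Wikidata's coordinate location property) and that a centre point and radius for the area of interest are chosen by the researcher. The function names, the threshold and the approximate coordinates are illustrative.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def distant_items(items, centre, max_km):
    """Return names of items lying further than max_km from the centre."""
    return [
        name
        for name, (lat, lon) in items.items()
        if haversine_km(centre[0], centre[1], lat, lon) > max_km
    ]


# Approximate coordinates, for illustration only.
places = {
    "Edinburgh": (55.95, -3.19),
    "Perth, Scotland": (56.40, -3.43),
    "Perth, Western Australia": (-31.95, 115.86),
}
suspects = distant_items(places, centre=(55.95, -3.19), max_km=300)
```

Items returned by such a check are candidates for inspection, not automatic removal, since a genuine long-distance association is always possible.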
What are the limitations?
Data sources that are mastered outside Wiki* but were loaded into Wikidata through a one-off exercise pose a problem: it is unclear how or where the resulting Wikidata records should be enhanced or corrected.