[Image: a black and white photo of a large crowd. Credit: Rob Curran on Unsplash]

The sad state of synthetic populations

Synthetic populations are artificial big data sets in which each record represents a fictional individual, with attributes that reflect those of the real population. These data sets are widely used in agent-based models for urban research and, increasingly, in urban planning and urban tech product development. This wider use has caused conflicts over privacy and over the provenance of the private-sector data used to construct the synthetic populations.

A recent paper suggests, however, that the problems with synthetic populations may go much deeper. Although "[s]everal well established methods have been proposed to build consistent and realistic synthetic populations from partial data, as information regarding the target population is often limited due to privacy concerns", those methods see little use even in research, let alone in business or policy analysis. Reviewing studies published in a leading forum, the Journal of Artificial Societies and Social Simulation, the authors found "during the last 5 years, that these methods are rarely used."
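The brief does not say which methods the paper has in mind. Purely as an illustration, the sketch below uses iterative proportional fitting (IPF), one widely known way to build a synthetic population from partial data: a small sample cross-tabulation is rescaled until it matches published marginal totals, and fictional individuals are then drawn from the fitted table. The attribute names, the seed table, and the totals are all made-up assumptions, not figures from the paper.

```python
import random


def ipf(seed, row_targets, col_targets, iterations=50):
    """Iterative proportional fitting on a 2-D table (list of lists).

    Alternately rescales rows and columns so their sums approach the
    target marginals while keeping the seed's association pattern.
    """
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row so that its sum matches the row target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Scale each column so that its sum matches the column target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
    return table


def sample_population(table, row_labels, col_labels, size, rng):
    """Draw fictional individuals with probability proportional to cell weight."""
    cells = [(r, c) for r in row_labels for c in col_labels]
    weights = [table[i][j]
               for i in range(len(row_labels))
               for j in range(len(col_labels))]
    return rng.choices(cells, weights=weights, k=size)


# Hypothetical inputs: every label and number here is invented for illustration.
AGE = ["under_35", "35_plus"]
TENURE = ["renter", "owner"]
seed_table = [[30.0, 10.0],      # small survey cross-tab: age x tenure
              [20.0, 40.0]]
age_totals = [400.0, 600.0]      # published marginal totals for age
tenure_totals = [450.0, 550.0]   # published marginal totals for tenure

fitted = ipf(seed_table, age_totals, tenure_totals)
population = sample_population(fitted, AGE, TENURE, size=1000,
                               rng=random.Random(0))
```

Each element of `population` is one fictional individual, such as `("under_35", "renter")`. No record corresponds to a real person, yet the fitted table reproduces the age and tenure totals supplied as input.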

Synthetic populations are widely seen as a panacea for expanding the use of sensitive big data in urban contexts. But this signal suggests that startups and others supplying urban tech software will struggle to deliver on that promise for some time to come.

Source: redtailmedia.org
Sector: Urban Science
Tags: geodemographics, ABM, agent-based models, data quality