Emerging Leaders

An argument for Data unions

© Atlas of Inequality

These days we generate a lot of data simply by carrying our phones around and consenting to cookies on websites. While some of the collected data can be accessed by researchers for social good, other data is used to deliver targeted advertisements. Companies buy and sell our data, yet individuals currently lack a mechanism to see where their personal data is being used and for what purpose. An article by Saulė Gabrielė, Nadia Leonova and Lukas Utzig, Urban AI's Emerging Leaders.

Discover the extended version, Who owns your data?

On the one hand, we may want more insight into how data about us is collected and used; on the other, we have already given up a lot of our data without this consideration. Yet for any other form of data collection and storage to take hold, we would need a shift in mindset.

This paper examines how data collection and exchange might take place in the future, particularly with regard to personal oversight and centralized access, and how the proposed strategies may influence future research by private companies and for the public good.

We argue that there is not enough public interest in, and understanding of, the ownership and use of data. We review the ideas on data management proposed in Building the New Economy (Pentland et al, 2021) and discuss their feasibility in today’s context.

Establishing data ownership

Aggregated data can provide valuable insights into a variety of socio-economic factors, which in turn can be used to identify and analyze target audiences, whereas individual data points hold little value on their own. As described by Pentland et al (2021), data is nonfungible, meaning that the same amount of data may not have the same value, which makes it difficult to trade. It is also nonexclusive: unlike labour or capital, it can be used for a variety of purposes at the same time. This allows different parties to gain insights for different purposes by supplying an algorithm that is checked and then applied to the data held by a data exchange. However, Pentland et al (2021) argue that individuals should be able to own their data and can rightfully expect it to be protected and secured so that they are not identified and targeted. Having a third party manage data access is imperative if citizens are to reclaim control of their data. This is the central idea discussed in Building the New Economy (Pentland et al, 2021).

Pentland (2014) and Pentland et al (2021) criticize the way individuals currently give away data to large entities, such as companies, without a proper process for consumers to bargain over its value and without a safe and reliable record of permissions. Pentland compares this inequality of power to the situation of workers during 19th-century industrialization, which led to the formation of unions for collective wage bargaining and to the establishment of credit unions and cooperative banks that supported low- and mid-income households with lending and financial services.

In the UK, trade unions are labour unions that first emerged during the Industrial Revolution to defend workers’ rights in matters such as pay, working hours and collective bargaining.

Using the terms ‘data union’ or ‘data cooperative’, Pentland et al (2021) lay out a strategy of decentralised community organisations whose members share a common bond, for example geographically, socially or through their consumer behaviour. These unions would hold a record of their members’ data, together with the history of usage rights that were granted, and they would represent the collective interest legally and financially. Members would gain in two ways: the aggregate data would be available for collective analysis, yielding insights and improvements in areas such as health and transport that benefit the community, and the union would bargain for fair compensation when corporations use personal data for commercial purposes. All companies, small or large, established or start-up, would have equal access to the information should they wish to use it, thus allowing for fair market competition.
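To make the mechanics of such a union more tangible, the following is a minimal Python sketch of our own. It is not an implementation described by Pentland et al (2021), and every name in it (`DataUnion`, `Grant`, `min_group_size`, the purposes and requesters) is a hypothetical assumption made for illustration. It shows a union that stores members' data with the purposes they consented to, answers requests only with an aggregate over consenting members, and appends each use to a history of granted usage rights.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Grant:
    """One entry in the usage-rights history (hypothetical structure)."""
    member_id: str
    requester: str
    purpose: str


@dataclass
class DataUnion:
    # member_id -> {"value": float, "purposes": set of consented purposes}
    members: dict = field(default_factory=dict)
    history: list = field(default_factory=list)
    # Assumed threshold: refuse to answer for very small groups,
    # so that individuals are harder to single out.
    min_group_size: int = 3

    def join(self, member_id, value, purposes):
        """Register a member's data point and the purposes they consent to."""
        self.members[member_id] = {"value": value, "purposes": set(purposes)}

    def aggregate(self, requester, purpose):
        """Return the mean over consenting members, or None if too few consent."""
        consenting = [
            m for m, rec in self.members.items() if purpose in rec["purposes"]
        ]
        if len(consenting) < self.min_group_size:
            return None
        # Record every use so members can later see who used their data and why.
        for m in consenting:
            self.history.append(Grant(m, requester, purpose))
        return mean(self.members[m]["value"] for m in consenting)


union = DataUnion()
union.join("a", 30.0, {"transport"})
union.join("b", 40.0, {"transport", "advertising"})
union.join("c", 50.0, {"transport"})

print(union.aggregate("city_council", "transport"))  # 40.0
print(union.aggregate("ad_firm", "advertising"))     # None (only one member consents)
```

In this sketch any requester, large or small, goes through the same `aggregate` call, which loosely mirrors the idea of equal access. Compensation and legal representation are deliberately left out.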

In the book Social Physics (2014), Pentland uses the term ‘public data commons’, a reference to the economic concept of a shared public good that is accessible and usable by everyone but also requires a strict set of rules for everyone to follow. This concept has evolved in his later work Building the New Economy (Pentland et al, 2021), where the unions and cooperatives he mentions are private or nonprofit entities whose goal is to defend their members’ interests. This indicates a conceptual shift from a system of individual use under state rules towards a fluid bargaining system of opposing interests, in which large companies negotiate with collectives of many members, re-establishing an equilibrium that has so far been missing.

On Alex Pentland's proposal

One of the main ideas described by Pentland et al (2021) is the data cooperative: a third party that manages citizens’ data. These cooperatives would work like labour unions, defending the rights of their members. Moreover, such cooperatives would operate mainly at community scale and would have strong geographical constraints. Anyone who wants to learn more about a community and analyse its data could do so through that community’s data cooperative. If the community is widely involved in giving consent for the use of its data, such a request could be successful. Most often, however, the data would not cover the full community population and would therefore be neither representative nor useful.

In our view, such quality issues could be solved if data cooperatives were not geographically bounded but instead united the users of a platform, for example a social network such as Facebook or a healthcare provider such as the NHS. The users of a platform would share their consent about their data with a union that is designed to manage only this particular platform’s data footprint. The data cooperative concept would then be clearer for businesses wishing to use the data for analysis: they could request the particular type of data that they know they need for their research question or business. The data could still be easily split geographically, depending on the region needed. Such cooperatives would thus address the data quality question, as the data specific to a company or platform would come with more complete coverage. Avoiding geographical limitations would also allow the cooperatives’ usability to be scaled geographically much faster. Even if data communities based on geographical proximity existed, the question of data quality would remain. If a community consists of people of various ages, very different types of data could be collected about them, since older generations have smaller digital footprints. Data in such a cooperative would then be more scattered, less representative and biased, and thus hardly usable by businesses that seek the full picture of multiple areas through aggregated, high-quality datasets.

Moreover, having specialised data cooperatives for a company or a particular use case could motivate users to join. Since users are already on the platform, this could work as an upgrade of the terms and conditions, much like signing an employment contract that includes a section on labour union rights. Such data unions could also defend users’ rights against the platform, and they would be more powerful because it would not be one person fighting a company such as Facebook but millions who are unsatisfied with its data policy. We have already seen cases of people uniting in a similar way to protect their rights in cases of a data breach.

Another aspect is the indifference of users whose data is being used as capital. For example, many people are unaware that their data is collected by cookies and that there is an option to opt out of everything, which does not jeopardize the provision of the service they are after. Similarly, there are probably very few people who read the terms and conditions of the social platforms they are registered on. It has become all too common that the price of accessing a platform that is free in monetary terms is paid in the currency of one’s data. Even though data cooperatives on specific platforms could help to advocate for their users’ data sharing rights, progress in implementing any changes would be slow if data sharing is not a concern for the majority. Such indifference to data security might arise from a lack of data and technology literacy. Most people do not really understand what data is and what information is being collected about them. Even though Pentland et al (2021) advocate putting control of data back in the hands of individuals, this requires global awareness. While some of us might question why outdated rules, such as those on land ownership (Minton, 2009) and access to nature (Right to Roam, 2022), still govern in the 21st century, it will take time to realise the scale of what we’ve given up by agreeing to share our data in recent years, and it will take time for regulations to catch up. The only way we can hold companies accountable as of now is in the court of public opinion.

As an outlook towards further investigation, it would be useful to study the emergence of labour unions in the US and Europe, and specifically the intensive pushback they face from corporations in the United States. Union membership there has declined over the last decades due to companies’ aggressive measures. The prospect of global enterprises having to give up their free use of data is likely to evoke a similar response, including political lobbying efforts. To overcome this, a thorough knowledge of negotiation processes and of precedents in countries with higher union membership rates, such as Sweden or Denmark, will be necessary.


This essay was written by Saulė Gabriele, Nadia Leonova and Lukas Utzig as part of the Emerging Leaders Program.

Saule Gabriele Petraityte is a spatial data scientist from Lithuania working on data-driven cities projects. She is the CEO of Datahood and Co-Founder of GovTech Lab Lithuania.

Nadia Leonova is a consultant for the World Bank’s Global Facility for Disaster Reduction and Recovery (GFDRR). Her work focuses on the analysis of the impacts of natural disasters on urban environments. Nadia holds an MSc degree in Smart Cities and Urban Analytics and a BSc degree in Architecture.

Lukas Utzig is a researcher and designer holding a master’s degree in spatial research from the Space Syntax Lab, UCL. Currently he works as lead architect and urban designer for an international practice. In his research, he focusses on understanding spatial patterns of movement, segregation, and social networks.


Minton, A. (2009). Ground control: Fear and happiness in the twenty-first-century city. London: Penguin Books.

Right to Roam. (2022). [online] Available at: https://www.righttoroam.org.uk. (Accessed: 10/05/2022).

Pentland, A. (2014). Social physics: How good ideas spread-the lessons from a new science. Penguin.

Pentland, A., Lipton, A. and Hardjono, T. (2021). Building the New Economy: Data as Capital. The MIT Press.