The Amsterdam Law & Technology Institute’s team is inviting external faculty members to publish guest articles in the ALTI Forum. Here is a contribution authored by Jason Potts, Professor of Economics at RMIT University, Co-director of the RMIT Blockchain Innovation Hub, and a chief investigator on the ARC Centre of Excellence for Automated Decision-Making and Society.
This forum note elaborates an idea first proposed by Andrew Torrance, Dietmar Harhoff, Eric von Hippel and me (Potts et al 2021), as one of several solutions to the problem of how to redesign innovation policy for a 21st century economy in which open innovation (von Hippel 2017) is growing compared to industrial innovation due to the rise of the digital economy and the innovation commons (Potts 2019).
The basic idea is this: we need a new type of property right over data to incentivise getting data into the commons. Blockchain can be used to build this—in a time-locked, DAO governed data vault—with antitrust law supplying the enforcement mechanism.
Innovation policy and the digital economy
In an industrial economy, innovation policy works through three instruments, all addressing a diagnosed market failure in the production of new ideas (Bloom et al 2019). The first is public supply of key inputs (e.g. science funding, training). The second is public subsidy of private investment (e.g. R&D tax credits). The third is intellectual property, particularly patents, which create a temporary monopoly in an idea, enabling monopoly pricing to generate a temporary rent, in return for putting the knowledge in the public domain (compulsory disclosure). Observe that industrial innovation policies do not target data.
In a digital economy, however, data is a prime economic resource for production and innovation. As the cost of data falls due to technological change, both demand and supply increase. Digital technologies lower the cost of creating, searching, sharing and analyzing data (Goldfarb and Tucker 2019). This is due to economic activity becoming increasingly digital, or interacting with digital infrastructure, as more economic goods produce digital data streams. This is itself due to falling costs of hardware and improving technologies for processing, storing and sharing data (Brynjolfsson and McElheran 2017). The increased supply and falling price of data induces increased demand from machines, as an input into automation and artificial intelligence (Cockburn et al 2018).
An important question concerns the optimal economic organisation—i.e. the institutional form that is both incentive compatible and maximizes social welfare—of the data economy. Economic theory suggests that, by efficiency criteria, data is best governed by the consumers who produced it, to control privacy (Acemoglu et al 2020), and in pools or trusts to ensure efficient bargaining in data markets (Coyle et al 2020). The most efficient institutional configuration for data—the prime resource in a digital economy—is not as a private or public good, but as a commons.
Note that an Ostrom-type commons is not the predominant institutional form in which data exists in modern economies. Instead, most data today is harvested and used on private platforms. A social welfare loss accrues to the opportunity costs of that data not being re-used by others due to restricted access. Potts et al (2021) argue that innovation policy for a digital economy should find ways to get key resources for innovation into the commons. The objective is to get data efficiently into the commons to facilitate its large-scale and wide-spread use as an innovation resource, while minimising distortions to incentives to produce data.
A possible policy mechanism to do this is a new type of intellectual property right—like patents, operating as a time limited monopoly, with obligations—but targeting not ideas (which patents do), but data.
Economists have long recognised that certain classes of technology—called general purpose technologies—disproportionately raise social welfare when they are in the commons. An analogous argument can be made about general purpose information, as a way of thinking about large data sets for general purpose input into production and innovation. Examples of such data commons occurring in the wild, i.e. without specific institutional incentives, are PubChem, ImageNet (led by Fei-Fei Li), and the Open Search Foundation (led by Stefan Voigt).
What is data?
What is data, and how does it relate to economic policy? In mediaeval and early industrial economies, data was scarce and administrative. Data mostly consisted of what we now think of as census data, i.e. collected by hand, for government use to facilitate administration of the realm. This data was not fundamentally commercial, nor economically valuable.
Due to the continuing advance in digital technologies, the quantity of raw data in the world has exploded by orders of magnitude. Data is now measured in exabytes, zettabytes and yottabytes, doubling in size every year or so. It is almost entirely generated as exhaust, as a byproduct of ordinary economic activities. It is produced (and used!) by machines, not humans. And it is collected and stored (and used!) by very large companies (as well as governments, at all levels of administration). Typical data nowadays is transactions data, or location data, or continuous state data.
But this data is also a powerful source of competitive advantage for those that possess it, enabling many types of modern business processes to function. The falling costs, diffusion and capability-enhancing affordances of new digital technologies are making the commons a steadily more valuable institution for supporting innovation. So data is a potent economic resource, and access to data is essential both for competition and for innovation.
Data and the law
Data is of course represented in modern intellectual property law, although obliquely. It is covered by copyright law and regulations governing databases (database rights), and in some cases trade secrets law. Copyright law also safeguards the programs used to collect and analyse data. But copyright law is difficult to apply when authorship of data is ambiguous, as with most machine-collected or platform-derived data. Because most data is co-produced, ownership is difficult to establish. (Database rights seek to resolve this in order to protect database investment.)
Much legislation and regulation relating to data in an economy has concerned civil issues carried over from administrative use in relation to privacy (e.g. the EU’s GDPR legislation). But from an economic perspective, the institutional governance of data is fundamentally an issue of competition policy and innovation policy.
How to get data into the commons?
Data is a nonrival economic good. It is economically valuable because it is an input into production and innovation. The corollary is that when data is collected and access-controlled by large platforms (for various technical reasons of efficiency), that data is held monopolistically. It follows that economic policy—both competition policy and innovation policy—should seek to eliminate that monopoly (and its costs) by creating incentives to push that data into the commons.
How can economic policy do that? The first best solution to data commons is voluntary contribution, in which the owners and producers of the data—leaving aside the question of who or what that is—voluntarily choose (i.e. without regulatory coercion) to supply the data to a common pool resource (Ostrom 1990), or to make it available at a price (i.e. sell into a data market).
Data pools require both technical and legal solutions to govern joint ownership. One version is direct consumer data sovereignty. This model has been proposed by Tim Berners-Lee in a project called Solid (Social Linked Data), with the technical architecture of data PODS (Personal Online Data Stores). Individuals have control to place data in a commons, or sell directly to the market.
Another model is data trusts—a cooperative to pool data that is governed by trustees with fiduciary responsibility (Coyle et al 2020). Trustees negotiate on behalf of data providers to sell into data markets or place data in data commons, where it is deemed safe and in accordance with the objectives of the trust to do so.
Self-sovereign PODS and data trusts are good architectural solutions where ownership of data can be established on the consumer side (Jones and Tonetti 2020). However, these solutions lack the benefits of automation. A technically feasible advance is to overlay smart contracts into data PODS and DAOs into data trusts. In a DAO governed data trust, control of the data and decisions whether to place it in the commons and under what circumstances would be shaped by token governance and based on a fiduciary constitution. Neither model requires the sort of new intellectual property that we propose here, but they do rely on initial ownership and control of data separated from platforms that collect the data.
Consider a crude approach: Congress passes a law requiring that all large companies collecting data shall make that data publicly available to everyone, on threat of punishment.
How might that work? Because data is non-rivalrous, a legislative requirement to make it open and publicly available is not like nationalization or a taking per se, e.g. of land or a factory, which are rivalrous resources, because the firm would still have the data. It would just make a copy available to the public. Because that data is largely collected as a byproduct of economic activity, forcing firms to give it away does not imply they won’t collect it. For instance, transaction data is collected in the process of making a transaction: the firm can’t help but collect data about the transaction in the process of making a sale. Similarly for location data, etc. We’re assuming that a firm’s willingness to collect the data in the first place, even if public sharing is mandatory, follows from private incentives in its core business. Of course, additional incentives (say tax credits or direct subsidies, or extended periods of exclusivity) could be designed in as part of the regulatory bargaining.
However, it is equally obvious that the firm has no incentive to let that happen, because that data is a competitive advantage: it doesn’t want competitors getting the same data in real time. In that case, firms would likely make the data deliberately hard to use, releasing it as a giant unstructured, uncleaned, error-ridden table. We could force firms to release data into the commons, but they would rationally resist and delay, and the policy may well backfire if only low-quality data were released, or if firms and platforms went to extra effort to obscure data. It would produce a poor quality resource, with high monitoring costs.
Property right solutions
Our approach builds on a framework of compulsory revealing, but seeks to address fundamental concerns about competitive advantage by offering a period of exclusive use. This is similar in design to patent law, where an exclusive period of usually 20 years is granted, and enforced, in return for immediate disclosure of the idea, creating a temporary monopoly that can be exploited by the patentee through legally sanctioned monopoly pricing. Our proposal for a ‘data patent’ (in a time-locked vault) is a variation on this scheme.
The scheme itself is simple. At the point of production, data is streamed into a protocol-controlled vault that is time-locked. A ‘data patent’ is granted that provides exclusive use of that data for a period X. After X time, the raw data (not the processed findings or uses) is put into the commons (the vault is opened). In return, companies that hold a valid data patent are not in breach of antitrust law.
This model is based on a particular recognition: data gathered and held exclusively on a platform is a form of monopoly because of its effect—it is a resource that is a nonrival good. If it were made open access, that would enhance competition and social welfare; held in closed form, it restricts competition and imposes a social welfare loss on innovation.
What to do about that monopoly? From the perspective of antitrust or competition law, we ought to enforce anti-monopoly provisions and require that the resource be made open access. The trade-off our proposal suggests is to allow a temporary monopoly—say 21 days—in which to exclusively use that data, e.g. for AI training. But after that period the data must be released to the commons. Failure to do so would trigger an antitrust action—to prosecute the illegal monopoly—potentially resulting in a fine, and an order to release the data.
The radical idea here is to bring raw data held monopolistically within the realm of antitrust consideration. When a corporation is found guilty of violating antitrust law, several remedies are common. For egregious monopolistic behavior, criminal penalties may be assessed or civil penalties may be amplified. Future antitrust violations may be prevented by courts granting injunctions against specific courses of action. Consent decrees mandating particular behavior may also be granted by courts.
The basic logic of this argument in the special context of innovation commons suggests some novel antitrust remedies. Specifically, anti-competitive behaviour could be prosecuted to seek restitution not only through monetary fines or criminal prosecution, but through data fines – i.e. an enforcement action that requires the company or platform to put data into the commons. Similarly, the data patent could be revoked, requiring data to be placed in the commons with a shortened or completely annulled exclusivity period.
A safeguard on this remedy is that companies are not the only parties with a record of the data collected; users who participated in generating that data in the first place will also hold evidence of it. There are many specific instances of data-revealing requirements in current regulation that may suggest additional, more general approaches to the problem.
Blockchain fixes this
Blockchain technologies provide the primitives to support such a new type of property right, and to comply with antitrust enforcement.
It would work like this: all data is placed in a time-locked vault at the moment of collection. This establishes a timestamp on collection, and a smart contract automatically releases the data after a set period (say 21 days). Privacy can be assured through zero-knowledge proofs that the data exists without revealing its contents. Moreover, enforcement works through a user being able to prove that they held the data at time t, in the case that the platform did not put the data into a vault until time t+1. The time-locked contract can accommodate variations in the lock period or disclosure conditions as directed by an external agency.
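The enforcement idea can be illustrated with a simple hash commitment. In this sketch, a user commits to the data they co-produced at time t; anyone later holding the same data can verify the commitment, so a platform that failed to vault the data by that time can be shown to have withheld it. The function names are hypothetical, and a real system would anchor the commitment on-chain and use zero-knowledge proofs rather than plain hashes.

```python
import hashlib
import json

# Hypothetical sketch: a user hash-commits to (data, timestamp) without
# revealing the data. The commitment is evidence the data existed at
# that time, supporting enforcement against late or missing vaulting.

def commit(data: bytes, timestamp: float) -> str:
    """Produce a commitment to data at a given time."""
    payload = json.dumps(
        {"h": hashlib.sha256(data).hexdigest(), "t": timestamp},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def verify(data: bytes, timestamp: float, commitment: str) -> bool:
    """Anyone holding the data can check it against the commitment."""
    return commit(data, timestamp) == commitment
```

Changing either the data or the claimed timestamp breaks verification, which is what ties the user's copy of the data to a specific moment of collection.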
Vaults themselves can be DAO governed, with governance apportioned between various stakeholders, including users and public agencies. This balances the desire to maximize the social value of the data while maintaining transparency and accountability (Lane 2020).
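A DAO-governed vault decision can be sketched as a token-weighted vote. Everything here is illustrative: the stakeholder names, token balances, and simple-majority rule are assumptions, standing in for whatever fiduciary constitution a real vault DAO would encode.

```python
# Hypothetical sketch of token-weighted DAO governance over a vault:
# stakeholders (users, public agencies, the platform) hold voting
# tokens and decide, e.g., whether to vary a disclosure period.

def tally(votes: dict[str, bool], balances: dict[str, int]) -> bool:
    """Approve a proposal if the token-weighted 'yes' stake exceeds
    half of the total stake that voted."""
    yes = sum(balances[v] for v, choice in votes.items() if choice)
    total = sum(balances[v] for v in votes)
    return total > 0 and yes * 2 > total
```

Apportioning balances across user collectives and public agencies is how the trade-off described above—maximizing the social value of the data while maintaining accountability—would be expressed in governance weights.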
Data is a valuable economic resource for production in a digital economy, but it is also an important resource for innovation. Economic theory shows the benefits of data being controlled by users and consumers (rather than platforms) in order to ensure efficient tradeoffs with privacy. Economic theory also supports the benefits of data commons as a way to maximise the social value of collected data. Time-locked data vaults are a new sort of intellectual property right over platform-collected data that could offer a solution for governing this increasingly valuable and useful resource for both production and innovation in a digital economy.
Citation: Jason Potts, A proposal for a new type of intellectual property: Time-locked data vaults, ALTI Forum, February 17, 2022.