THE ECONOMIC VALUE OF DATA IN BIG DATA
Attaching an economic value to the data in big data is a difficult task, as this data is used in many situations and contexts for many different purposes as discussed above. Arguably, approximating the specific value of big data is only possible in specific cases.
There are at least two potentially differing ways to define value in big data: from a provider perspective (supply) and from a user perspective (demand).
25.2.1 Value for Providers
For providers, three approaches to approximating the monetary value of data exist: valuation based on company reports, valuation according to observed market prices, and ‘production cost’ (i.e., the cost of obtaining and/or preparing data) (Feijoo et al., 2013). These monetary transactions and market valuations can serve as an initial basis for understanding the economics of data, although each method has strengths and weaknesses.
Provided that datasets are capitalized and thus appear on the balance sheet as assets, a direct way of approaching the value of data would be to rely on companies’ own accounts. Unfortunately, this is rarely the case and most companies that ground their businesses on the commercial use of data opt not to disclose this type of information.
The valuation of data assets based on financial figures such as market capitalization and revenues is easier and more straightforward given that most companies, whether publicly traded or not, report such figures. However, several challenges related to the use of financial company information for approximating the value of data exist. First, a wealth of other company (internal) components beyond data influence the firm’s financial results, such as its human and physical capital stock, volume of other intangibles, expertise/know-how, and so on. The same holds for approximations relying on the measures of intangible assets and/or derived from the goodwill of a company.
Moreover, the financial results of a company are also influenced by external factors, such as market trends, random shocks, and speculation. This means that measures of data value will be imprecise and will fluctuate over time in response to general market sentiment or speculative activity rather than according to the intrinsic value of the data. Overall, revenue per data record has recently been suggested as the most appropriate approximation of economic returns to data. Where future earnings attributed to the data can be estimated and appropriately discounted, a net present value may be determined. Feijoo et al. (2014) applied this approach to the largest collectors of personal data. According to their results, the value of a personal profile ranged from US$8 to US$43, although these figures vary considerably from region to region.
Several data brokers publish their retail prices for various types of data records. These prices reflect, at least partially, the real market price for obtaining specific data in a given market. The observed retail price of a record incorporates several economic components. These include the (marginal) value of the data (which might be different from the average price of an individual record if purchased as part of a comprehensive database), the costs incurred in generating the data, as well as current and future revenues achievable by utilizing the data. The price will also be influenced by considerations concerning the competitive use of the information and how the corresponding business activity may evolve in the future. Moreover, the price at which data are exchanged in an open market relates to a specific context. The quality of data provided by data brokers cannot be verified ex ante, and the data can be flawed or inaccurate. Data are also exchanged on illegal markets, but prices on these markets are difficult to collect. Given the illegal nature of the goods and services offered, the prices of offers and deals will never be fully transparent and are thus difficult to measure. As a result, estimations based on such transactions are subject to biases due to unrepresentative samples.
From the perspective of producing big data, that is, from a cost perspective, and given the current stage in the evolution of data markets, big data requires considerable investment in infrastructure for acquisition and storage, as well as in tools for management and analytics. It is, therefore, similar to typical ICT industries characterized by high fixed costs and low incremental costs. The cost structure, however, could be different if infrastructure and tools are shared in some way and/or the prices for such infrastructure and tools drop significantly in the future.
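The provider-side approximations discussed in this subsection (revenue per data record, a discounted net present value, and a cost structure with high fixed and low incremental costs) can be illustrated with a minimal Python sketch. All names and figures below are hypothetical and chosen only for illustration; they are not taken from Feijoo et al. or from any company report, and the constant yearly revenue stream is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class DataHolder:
    """Hypothetical figures for a firm whose business rests on data."""
    annual_revenue: float   # yearly revenue attributed to the data (assumed)
    records: int            # number of data records, e.g. user profiles
    fixed_cost: float       # infrastructure, storage, and analytics tooling
    marginal_cost: float    # incremental cost of one additional record


def revenue_per_record(firm: DataHolder) -> float:
    """Revenue per data record for a single year."""
    return firm.annual_revenue / firm.records


def npv_per_record(firm: DataHolder, discount_rate: float, years: int) -> float:
    """Discount a constant yearly revenue-per-record stream to present value."""
    yearly = revenue_per_record(firm)
    return sum(yearly / (1 + discount_rate) ** t for t in range(1, years + 1))


def average_cost_per_record(firm: DataHolder) -> float:
    """Fixed cost spread over all records, plus the incremental cost."""
    return firm.fixed_cost / firm.records + firm.marginal_cost


# Illustrative numbers only.
firm = DataHolder(annual_revenue=5_000_000.0, records=1_000_000,
                  fixed_cost=2_000_000.0, marginal_cost=0.05)
larger = DataHolder(annual_revenue=50_000_000.0, records=10_000_000,
                    fixed_cost=2_000_000.0, marginal_cost=0.05)

print(revenue_per_record(firm))
print(npv_per_record(firm, discount_rate=0.10, years=3))
print(average_cost_per_record(firm), average_cost_per_record(larger))
```

With the same fixed cost spread over ten times as many records, the average cost per record of the larger holder is lower, which is the pattern expected in industries with high fixed and low incremental costs.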
25.2.2 Value for Users
Particular types of data are, at least in principle, controlled by their owners, who have to (or should) grant access to them. In those cases, assuming that markets are not fully developed, a completely different way of associating an economic value with data is to approximate the price a firm would need to pay to obtain them.
This is particularly the case for personal data. In recent years, several experimental studies attempting to quantify individual valuations of personal data in diverse contexts have been conducted. Even though this research remains at a preliminary stage, two general messages can already be extracted. First, people differ with respect to their individual valuation of personal data (measured by the monetary compensation sufficient for them to divulge personal information) and with respect to their individual valuation of privacy (i.e., the amount of money they are prepared to spend in order to protect their personal data from disclosure) (Grossklags et al., 2007). Second, theoretical and empirical studies point out that both the valuation of privacy and the valuation of personal data are extremely sensitive to contextual effects (Nissenbaum, 2010).
To deepen the understanding of data value from the demand side, several additional methods have been proposed. One is the application of a method widely used in areas such as transport or health, known as the stated preference discrete choice experiment (SPDCE) approach, which can be used to estimate the value of personal data in real-life contexts and situations (Potoglou et al., 2013). Another is to conduct laboratory experiments (Jentzsch et al., 2012). In the case of individuals, the SPDCE approach reveals users’ latent utility function, allowing respondents’ welfare to be gauged over a range of circumstances and for a range of services that imply the disclosure and/or collection of personal information. These methods, although promising, include two potential sources of bias: the data may lack market verification (hence the need for a realistic experimental setting), and they may capture the individually perceived cost of damage caused by data breaches rather than the value of the data themselves.
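How a latent utility function translates into a monetary value can be sketched with a simple multinomial logit model, a common choice for analysing discrete choice experiments. The sketch below is an illustration under stated assumptions, not the specification used in the cited studies: the linear utility form, the two attributes, and both coefficients are hypothetical.

```python
import math

# Hypothetical utility coefficients, as if estimated from a stated preference
# discrete choice experiment. They are NOT taken from any cited study.
BETA_PRICE = -0.40        # utility change per currency unit paid for a service
BETA_DISCLOSURE = -1.20   # utility change when the service collects personal data


def utility(price: float, discloses_data: bool) -> float:
    """Linear latent utility of one service alternative (assumed form)."""
    return BETA_PRICE * price + BETA_DISCLOSURE * (1 if discloses_data else 0)


def choice_probabilities(alternatives):
    """Multinomial logit: P(i) = exp(U_i) / sum over j of exp(U_j)."""
    utils = [utility(price, discloses) for price, discloses in alternatives]
    top = max(utils)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]


def implied_value_of_disclosure() -> float:
    """Amount of money whose disutility equals that of disclosing data."""
    return BETA_DISCLOSURE / BETA_PRICE


# A free service that collects data versus a paid service that does not.
alternatives = [(0.0, True), (3.0, False)]
print(choice_probabilities(alternatives))
print(implied_value_of_disclosure())
```

The ratio of the disclosure coefficient to the price coefficient gives the payment at which a respondent is indifferent between disclosing and not disclosing. With these illustrative coefficients that payment is about 3 currency units, so the two alternatives above are chosen with roughly equal probability.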
25.3