BACKGROUND AND DEFINITIONS
As a general concept, big data loosely refers to datasets exceeding a certain size, although there is no widely agreed formal or informal threshold above which a dataset shall be considered ‘big’.
As of 2014 there seems to be tacit agreement that this value is at least in the terabyte range and that it is increasing rapidly. The term big data is also invoked in reference to the huge amounts of varied data stored by the public and private sectors in the course of their regular activities. Both ideas - size and/or heterogeneity - hint at the challenges of managing the data as well as the promises embedded in it. Alluding to new economic and social opportunities, the term is frequently invoked in policy declarations, initiatives and scientific as well as non-scientific documents.
A more technical approach defines big data as ‘data whose size forces us to look beyond the tried-and-true methods for data analysis that are prevalent at that time’ (Jacobs, 2009). In other words, ‘big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze’ (Manyika et al., 2011). Obviously this definition is a moving target and will change along with the evolution of ‘typical database software’. Furthermore, size is just one parameter when classifying a dataset as big data. Another important dimension relates to the structure of the dataset. Traditional relational database management systems have built-in capabilities to store, manage and analyze quite large datasets, subject to one important condition: that the datasets are sufficiently structured. Significant amounts of the data created nowadays via social media, devices and sensors in smart cities, public agencies, and so on, are highly unstructured, thus requiring different technologies to store, manage, and analyze them.
A more complete definition of big data is therefore ‘a large dataset combining structured and unstructured data, which cannot be managed and/or analyzed by legacy - conventional - database management methodologies’.
These technical approaches are complemented by a strand of definitions adopting an economic perspective. For instance, the BBVA banking corporation defines big data as ‘a set of processes, technologies and business models that are based on data and on capturing the value hidden in the data itself’.1 From such a value and business point of view, big data could then be considered a new class of economic asset (World Economic Forum, 2012) and the basis of ‘a drift toward data-driven discovery and decision-making’ (Lohr, 2012).
Summarizing and restating what has been discussed, big data can be characterized by specific properties or ‘dimensions’, most importantly:2
• Volume: The threshold above which a dataset could be considered big data. Some experts suggest that any database exceeding the capacity of a single contemporary, ordinary computer can be considered big data.3
• Variety: The different types of data involved in big data, including unstructured data.
• Velocity: The need to manage and analyze big data as much as possible in real time or near-real time.4
• Veracity and/or validity: In contrast to traditional statistical inference, the relevant issues in big data analytics do not stem from data sampling, because data is not scarce; rather, the opposite holds.
Other concepts such as data science, data mining and data visualization have arisen around the term big data. ‘Data science’ is probably best regarded as the umbrella term encompassing all the others, including big data. Data science differs from statistics and other similar disciplines because of the increasingly heterogeneous and unstructured nature of the data and the new processes required to analyze it (Dhar, 2013).
Big data needs to be acquired, ingested, processed, persisted, integrated, analyzed and exposed (Akred et al., 2013) to produce results. Thus, ‘the real issue is not that you are acquiring large amounts of data. It’s what you do with the data that counts’.5 In Hal Varian’s words: ‘the complimentary [sic] scarce factor is the ability to understand that data and extract value from it’ (Varian, 2009). In any case, the technologies that have made the emergence of big data possible differ from conventional techniques, and this may be the ultimate reason why the term has prevailed. Other similar terms such as business intelligence and analytics (Chen et al., 2012) are losing momentum compared to big data. Moreover, big data is intricately linked with the ‘app economy’ and may even be considered part of it (Mulligan and Card, 2014). Many applications, and the business models that support them, are based on the exploitation of data about and related to users.
Big data has also been heralded as contributing to a major transformation in the methodology of scientific work. Traditional notions of causality may be (at least partially) replaced by a focus on correlation, a shift from knowing ‘why’ to only knowing ‘what’ (Mayer-Schonberger and Cukier, 2013). Others speak of a fourth paradigm in science, based on data-intensive computing, in which data would be stored in archival media - like libraries for paper-based storage - and be publicly accessible in the cloud so that humans and machines can extract knowledge from it (e.g., Hey et al., 2009).
Big data is also linked to concepts such as the ‘Internet of Things’ (IoT) and ‘open data’, which are becoming increasingly popular in their own right. The IoT is essentially a network of sensors and devices that is fully compatible with and accessible from the Internet. A network of ‘countless digital sensors worldwide in industrial equipment, automobiles, electrical meters and shipping crates’ that ‘can measure and communicate location, movement, vibration, temperature, humidity, even chemical changes in the air’ (Lohr, 2012) is an obvious source of big data.
Open data refers, according to the Open Data Institute,6 to ‘data that is available and can be reused by anyone at no cost, subjected to a pre-defined license under which, the user/distributor of the data has to provide appropriate credit to the primary owner of the data’. Murray-Rust (2008) contends that open data in science is published data that should be available to the scientific community for reuse with proper crediting to the primary owner, but should not carry any copyright or monetary value associated with its reuse.
A number of non-profit foundations and organizations define open data, or sometimes ‘open access to data’, using different terminology, but their definitions mostly revolve around two constructs: open standards/interoperability and freedom to use/reuse. Many governments around the world provide open data through portals and agencies such as the European Union Open Data Portal,7 the US Government General Services Administration’s Open Government Initiative,8 and the UK Government,9 while international organizations such as the Organisation for Economic Co-operation and Development (OECD)10 study its impact. It is worth noting that this data comes in different types, formats, structures, file systems, and so on, that are not necessarily compatible with each other. The combination of open with big data, that is, massive open data with varying levels of complexity, results in ‘open big data’ (Marton et al., 2013). Last but not least, in parallel with this relatively new discipline ‘a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data’.11
Despite the considerable hype surrounding the big data concept, available figures from research institutions and market analysts point to its huge potential. There is some evidence indicating that data-driven decision-makers are better off (McAfee and Brynjolfsson, 2012).
To provide perspective, industry analysts estimate that a typical corporation of 1000 employees has about 200 TB of stored data and that the total amount of data stored in companies amounts to some tens of exabytes (Manyika et al., 2011). Industry forecasts (Cisco, Gartner, or McKinsey) put this figure at hundreds of exabytes by 2016. The same sources suggest that banking will take the largest share of big data (about 25 percent), followed by services (15 percent), manufacturing (15 percent) and government (12 percent). At a macroeconomic level, the Warsaw Institute for Economic Studies estimates that big and open data will contribute 1.9 percent of EU-28 GDP by 2020 (Buchholtz et al., 2014). The forecast anticipates that trade will contribute 23 percent of this total, manufacturing 22 percent, finance and insurance 13 percent, public administration 13 percent, and health and social care 5 percent.
Given the emergent nature of the big data domain, it is no wonder that, from an economic perspective, it is still a field with more questions than answers. The main topics slowly starting to appear in the scientific literature and research roadmaps include the economic value of data12 and its impact on growth, jobs and the quality of life; the structure of this emerging industry and its implications for innovation and competition; and the application of new and existing economic theories to explain its dynamic behavior (see, among others, Feijoo et al., 2013 on personal data). Beyond the narrower economic aspects, considerable uncertainties prevail over the overall effects of the intensive usage of big data on society. boyd and Crawford (2012) rightly state that it is time to start critically interrogating the phenomenon of big data as well as its assumptions and its biases.
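As a rough illustration of the Warsaw Institute forecast quoted above, the following sketch converts each sector's share of the total contribution into percentage points of EU-28 GDP. The figures are those given in the text; the variable and function names are illustrative assumptions and do not come from the cited study.

```python
# Figures as quoted in the text from Buchholtz et al. (2014).
TOTAL_GDP_CONTRIBUTION_PCT = 1.9  # percent of EU-28 GDP by 2020

# Sector shares of that total contribution, in percent, as listed in the text.
SECTOR_SHARES_PCT = {
    "trade": 23,
    "manufacturing": 22,
    "finance and insurance": 13,
    "public administration": 13,
    "health and social care": 5,
}


def sector_contributions(total_pct, shares_pct):
    """Convert each sector's share of the total into percentage points of GDP."""
    return {sector: total_pct * share / 100 for sector, share in shares_pct.items()}


def unlisted_share(shares_pct):
    """Share of the total (in percent) not attributed to any listed sector."""
    return 100 - sum(shares_pct.values())


contributions = sector_contributions(TOTAL_GDP_CONTRIBUTION_PCT, SECTOR_SHARES_PCT)
for sector, points in contributions.items():
    print(f"{sector}: {points:.2f} percentage points of GDP")
print(f"sectors not listed: {unlisted_share(SECTOR_SHARES_PCT)} percent of the total")
```

Read this way, trade would account for roughly 0.44 percentage points of GDP, and the five sectors named in the text cover 76 percent of the forecast total, leaving 24 percent to sectors the text does not list.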
Within the framework described in this introductory section, this chapter explores the emerging domain of big data economics. The next section describes the features of the big data ecosystem, the main players, and their relationships. From there, different economic approaches are used to explore the big data market, its dynamics and the value of data within it. Opportunities and challenges for both researchers and marketers are presented in section 25.4. Conclusions with a policy view close the chapter.
25.2