Why Collect Data?

Part of Datacop's Blog Series on Data Science in the Digital Economy (#2)

Today, the market for technology that enables organisations to collect and process digital data is immense and growing rapidly. In its 2019 Global Cloud Computing Report, MarketLine estimates the size of the market at 153.9bn USD in 2018. That is already a staggering size, and MarketLine expects it to grow to 539bn USD by 2023. For context, the annual GDP of Slovakia was 105bn USD in 2019. The market is dominated by a handful of technology firms such as Amazon Web Services, Google Cloud, Microsoft Azure and Alibaba Cloud. These companies have developed the digital infrastructure for thousands of SaaS companies that provide a variety of data collection services specialising in different digital verticals, including e-commerce, digital apps, finance, social media, entertainment and the internet of things.

Figure 1.1: Global Cloud Computing Market (2014 - 2018). In this four-year period the market grew by more than $100bn.
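
To put the forecast in perspective, here is a minimal Python sketch of the annual growth rate it implies. It uses only the two MarketLine figures quoted above. The function name `implied_cagr` is our own, chosen for illustration, and the calculation assumes smooth compound growth between 2018 and 2023, which the report itself may not assume.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that takes start_value to end_value over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text (bn USD); the smooth-growth assumption is ours.
market_2018_bn = 153.9
market_2023_bn = 539.0

rate = implied_cagr(market_2018_bn, market_2023_bn, years=2023 - 2018)
print(f"Implied annual growth 2018-2023: {rate:.1%}")  # about 28.5%
```

In other words, the forecast implies the market growing by more than a quarter every year for five years in a row.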

Why do we invest such a large amount of our economy's resources in collecting data? Why do digital companies allocate a significant portion of their budget to data infrastructure? If you have been in meetings where data is discussed, how often do you hear anyone discuss the why behind the data collection?

Well, the short answer is that collecting data enables us to observe, and make useful decisions about, things that we cannot perceive with our senses alone. Specifically, data collection stretches the human capacity for knowing in three ways: it lets us know what is happening at scale, over time and online. This takes on a variety of shapes depending on the context in which we are discussing data collection. This blog post will explore those perspectives.

We collect data to know what is happening at scale.

Firstly, data collection enables us to observe things in a new dimension - scale. The human ability to subitize, that is, to instantly recognise how many items are in front of us, tops out at around 3 to 5 items. Any more than that and most of us have to start counting. Recording a count of things would have been among the first types of data collection humans did. Once humans started to settle down and develop agriculture, we started facing a lot of critical questions, such as: How many bushels of food do we have? How many citizens live in our settlement? How large is that field? Even thousands of years ago, these questions forced us to start relying on recorded data, as it was impossible to answer them with our senses alone. This in turn enabled us to make better decisions about the size of the farming yield, food storage and the size of a settlement. It follows that data collection is a human technology that has been enabling human development and enterprise ever since we settled down to farm. Just like any other human technology, data is only as good as the people using it.

Figure 1.2: Example of an unfilled template of a “letter of credit”, a form of storing value in pre-modern financial systems. Letters of credit are known to have been used in Europe as far back as the Middle Ages.

We collect data to know what is happening over time.

Collecting data about time enables us to conceptualise time itself. Unless humans commit a piece of information to long-term memory, our short-term memory is accurate for approximately 20 to 30 seconds. In general, there is no debate that human memory decays and becomes less reliable as time passes. Once humans started to collect data about time, we could start conceptualising things like age and debt. We could plan our day better. We could start trying to predict the future based on the events of the past. Again, it enabled us to make better decisions. I am sure you are used to relying on data about time to make decisions.

An often overlooked added value of data is its potential to make human collaboration more productive. Consider this example with time. The standardisation of how we communicate time enabled the construction of the assembly line - a collaborative effort impossible without the coordination of hundreds, sometimes thousands, of people. When London and Edinburgh were first connected by train, people realised that local time in the two cities differed by “17 minutes”. The subsequent need to harmonise the perception of time globally created the modern framework for thinking about time. Just as today, it took time to create the innovations that enabled us to take full advantage of the data, which in this case was “simply” time. Consider that it took 14 years after the first regular train connection was set up in 1825 for a publicly available train timetable to appear. This historical parallel raises a question for today - what benefits for human collaboration could we unlock once we figure out how to standardise communication about data across society in the coming decades? Well, that sounds like something our children may deal with. For now, it would be cool if teams within the same organisation figured out how to standardise their communication about data. What productivity benefits would those organisations reap?

Figure 1.3: Bradshaw’s Timetable from the 1850s, Britain. Bradshaw’s was a commercially successful series of travel guides and timetables. These were all a spin-off from the first railway timetable Bradshaw developed in 1839. The publishing company that grew out of that success closed operations in 1961.

We collect data to know what is happening online.

Only recently did “data collection” start to be used interchangeably with “digital data collection”. Data collection enables us to perceive what is happening online. Unlike interactions in the analogue world, digital interactions can’t be perceived directly. Managers of websites have to rely on data collection to make sense of what is happening on their site. Who are their customers? Where do they connect from? The reasons for collecting data have changed over time, as the internet has developed from a helpful tool for early web developers into a fundamental pillar of every digital firm’s existence and a deciding factor in the digital competitive landscape.

This is the modern answer to why we collect data. Data is a core strategic asset for digital firms. The larger and more mature a digital industry or digital company is, the more relevant data collection becomes. In many digital verticals, such as e-commerce, social media, finance, entertainment, news platforms and applications, the ability to extract as much insight and value as possible from data is the key differentiator for success. It is also important to be mindful of the minimum size at which data starts to become really relevant. For instance, in an e-commerce company, investing in your data will return limited rewards until you reach at least 1m EUR of annual revenue. This number is a rough rule of thumb and it varies between companies and verticals. If your ambition as a digital company is to grow over time, then you need to have a good data strategy in place.

In summary, data collection acts as an additional human sense of perception. Processing the data, in turn, allows humans to make informed and effective decisions in abstract areas beyond our biological ability to perceive. When the internet came about, these tools were slowly adopted by digital professionals, who needed to make sense of the many business questions that digital firms face and that can only be answered by data collection. Digital data science has itself been a very dynamic field, as the complexity of the web has increased rapidly over the past three decades, in step with the pace of the digital revolution. By 2019, less than three decades after the Internet became a commercialised technology, 50% of the global population, or 3.5bn+ people, were connected.


Brief History of Digital Data Collection: Part of Datacop's Weekly Blog Series on Data Science in the Digital Economy (#3)

Tune in to next week’s article, in which we are going to look at how the function of data collection has changed in the digital economy since the Internet first came about. We will cover how data collection went from an obscure field of study to the “new oil” powering an entirely new economy online. We will examine the key innovations of each developmental stage and the key business pressures motivating the adoption of these innovations. Lastly, we will illustrate the key challenges digital players are facing in the new digital economy that had emerged by the late 2010s.
