Big Data: opportunities and risks

At every point in our lives, more or less consciously, we leave traces behind: our data. Visiting a website, using a device at home, paying by credit card, clicking ‘like’ on Facebook: there are many sources of data. This data, often unstructured, is not only stored, but also processed and analyzed.

The volume of data produced is rising at an impressive pace: it is predicted that data growth in 2018 will reach 50,000 GB/s, whereas in 2002 it was ‘only’ 100 GB/s, a 500-fold increase. This abundance of data has popularized a new term, ‘big data’. It refers to data with the following characteristics:

  1. Big volume of data
  2. Big variety of data
  3. High velocity at which new data is generated
  4. Significant value of knowledge that comes from analyzing the data

Tools such as Apache Spark and Apache Hadoop are commonly used to prepare the data for analysis. R and Python are recommended for processing and analyzing the data, whereas for the final stage, data visualization, business intelligence tools like Tableau and QlikView can be used.

Big data is applied in almost every sector and every area of everyday life, from industry, banking and insurance to trade, communication and politics. In this article we will look at the opportunities big data offers, but also at the risks connected with it.

Big data brings a lot of opportunities…

Applying big data in an enterprise can increase the company’s competitive advantage, because data analysis helps it make better-informed decisions.

One of the possibilities that big data offers is the personalization of online marketing: based on the products the customer was interested in in the past, the advertiser (for example, Google or Amazon) shows a specially tailored offer that is likely to be well received. In particular, big data can help create offers aimed at customers the company has lost. This process can be reinforced by individual pricing and promotions.
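As a rough illustration of offers based on past interest, the sketch below counts how often a customer viewed each product category and suggests unseen products from the most-viewed categories. The catalog, the product names and the function `tailored_offer` are hypothetical, and real advertisers use far richer models; this only shows the basic idea.

```python
from collections import Counter


def tailored_offer(viewed, catalog, size=2):
    """Suggest unseen products from the categories the customer viewed most often."""
    # How many times each category appears among the viewed products
    interest = Counter(catalog[p] for p in viewed if p in catalog)
    unseen = [p for p in catalog if p not in viewed]
    # Most-viewed categories first; ties are broken alphabetically by product name
    unseen.sort(key=lambda p: (-interest[catalog[p]], p))
    # Keep only products from categories the customer has shown interest in
    return [p for p in unseen if interest[catalog[p]] > 0][:size]


# Hypothetical catalog: product -> category
catalog = {
    "trail shoes": "running",
    "running socks": "running",
    "sports watch": "running",
    "yoga mat": "yoga",
    "yoga blocks": "yoga",
    "espresso machine": "kitchen",
}
viewed = ["trail shoes", "running socks", "yoga mat"]

offer = tailored_offer(viewed, catalog)
print(offer)
```

Here the customer looked mostly at running products, so the remaining running product is suggested first, followed by a product from the yoga category.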

An individual approach to the customer can also be observed in banking and insurance: in the analysis of the customer’s creditworthiness and in the evaluation of the risk of a security incident. Using algorithms and models built on historical data, it is possible to estimate whether a customer with given characteristics is likely, for instance, to fail to repay a loan. It is a win-win situation: reliable customers improve their chances of getting a loan, and the bank avoids granting loans that would not be repaid.
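A minimal sketch of such a model, assuming a logistic regression trained by plain gradient descent: it learns from past customers whether they defaulted and then estimates the probability of default for a new applicant. The two features (debt-to-income ratio, number of late payments), the tiny data set and all function names are made up for illustration; a real scoring model would use far more data and careful validation.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def default_probability(x, weights, bias):
    """Estimated probability that a customer with features x defaults."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = default_probability(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient of the log loss
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Hypothetical history: (debt-to-income ratio, late payments last year) -> defaulted?
history = [
    ((0.10, 0), 0), ((0.15, 0), 0), ((0.20, 1), 0), ((0.25, 0), 0),
    ((0.50, 3), 1), ((0.60, 2), 1), ((0.55, 4), 1), ((0.70, 3), 1),
]
rows = [x for x, _ in history]
labels = [y for _, y in history]
weights, bias = train_logistic(rows, labels)

reliable = default_probability((0.12, 0), weights, bias)
risky = default_probability((0.65, 3), weights, bias)
print(f"reliable applicant: {reliable:.2f}, risky applicant: {risky:.2f}")
```

The bank could then compare the estimated probability with a threshold of its choosing when deciding whether to grant the loan.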

Another example of a much-needed use of big data is forecasting epidemics. One project in this area, Google Flu Trends, was initiated by Google. Using flu-related phrases that web users entered into the search engine, Google created a model that showed the level of epidemic threat for a given area. A comparison between the model’s forecast and real data showed a high correlation (approx. 0.8). What is more, Google’s predictions were about ten days ahead of the real data, which can have a key impact on epidemic prevention in the future.
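To make the idea of a correlation with a lead time concrete, the sketch below computes the Pearson correlation between a search signal and reported cases at several time shifts and picks the shift where the two series agree best. The weekly numbers are invented and the function names are hypothetical; this is not Google's actual method, only an illustration of how a lead time can be measured.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(leading, lagging, lag):
    """Correlate leading[t] with lagging[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(leading, lagging)
    return pearson(leading[:-lag], lagging[lag:])


def best_lag(leading, lagging, max_lag):
    """Return the (lag, correlation) pair with the highest correlation."""
    candidates = (
        (lag, lagged_correlation(leading, lagging, lag))
        for lag in range(max_lag + 1)
    )
    return max(candidates, key=lambda pair: pair[1])


# Invented weekly data: reported cases repeat the search signal two weeks later
searches = [5, 8, 13, 21, 34, 30, 22, 15, 9, 6, 4, 3]
cases = [2, 3] + searches[:-2]

lag, r = best_lag(searches, cases, max_lag=4)
print(f"searches lead reported cases by {lag} weeks (r = {r:.2f})")
```

Because the invented case counts are an exact copy of the search signal shifted by two weeks, the search finds a lag of 2 with a correlation of 1; real data would be noisier, as the reported figure of about 0.8 suggests.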

…but also threats

Despite the opportunities mentioned above, which can positively influence both a company’s performance and the everyday life of ordinary people, big data also carries some potential threats. There is a risk that companies will gather data in ways that violate our privacy. To continue the banking example: in order to check our creditworthiness, a bank could misuse data posted on social media (e.g. photos that suggest reckless or risky behavior by the customer).

Another problem is data discrimination: treating a particular group differently because of a specific feature. It may affect job candidates who are passed over during the hiring process because of a decision made by an algorithm based on big data technology. Discrimination can also affect customers, for example on grounds of race, gender or political opinions.

Moreover, big data can reduce the variety of information that we receive via social media and web search engines. Things that we liked in the past, or things that we searched for before, influence the information that will be delivered to us in the future. This phenomenon is called the ‘filter bubble’. It not only narrows our worldview and interests, but can also discourage users from comparing opinions on various subjects.

Conclusions

Should we stop using big data because of these potential threats? The answer to this question should be no: big data brings too many advantages to ignore. Some of the problematic issues can be avoided through supervision (internal or external) of the way big data is used. At the same time, users should be aware that their past behavior will affect the content that service providers show them in the future.

Karolina Stelmaszek 

Analyst

Junior analyst at Hycom and recent graduate of Quantitative Methods and Information Systems at the Warsaw School of Economics. She spends her free time learning foreign languages (especially German) and is also enthusiastic about dance, theatre and musicals.
