
What is ‘Big Data’? And why should I care?

If you’re reading this on your laptop or on your phone, take a second to double-check that your wifi is on and you’re not wasting data. Now, is your wifi on? What about your Bluetooth, and your location? How many devices around you right now have a sensor of some sort? Any wireless headphones? Speakers?

You may have heard that Google knows us better than we know ourselves; this is due to the enormous amounts of data that we are constantly, and often inadvertently, generating. By one widely cited estimate, 90 percent of the data in the world today was created in the past two years, and the volume of data is forecast to double every two years. We do not realise that, if someone wanted to (and had average hacking expertise), they could know everything about our lives, down to the smallest detail. It is true that nobody cares enough to hack into your alarm clock to find out what time you wake up (or how many times you hit the snooze button), but what if they knew what time you wake up, what you have for breakfast, where you park, what coffee you like, which online stores you frequent, your shoe size and your mom’s phone number? This is where Big Data comes in.

 

Just tell me what ‘Big Data’ is!

Big Data refers to the process of extracting information by synthesising extremely large volumes of data. The key aspect is not the volume itself, but what is done with it. Data sets may be analysed computationally to reveal patterns, trends and associations, especially relating to human behaviour and interactions. Once these large amounts of data are synthesised, we are able to do things like determine consumer choices, foresee the spread of new diseases or even predict what time a certain person will wake up this Saturday. This means that, rather than focusing on the relationships between individual pieces of data, Big Data uses algorithms and other techniques to infer general trends across an entire data set: it looks at correlation (what) rather than causation (why).

What counts as ‘Big Data’ will depend on what is useful for a particular enterprise or purpose, so big data sets can vary greatly. However, three (or five, or seven, depending on who you ask) characteristics have been identified, the most important being Volume, Variety, Velocity and Veracity. Volume refers to the quantity of data generated and stored; it determines the potential value of the data set, since more data is generally more representative and therefore more valuable. Variety refers to the type and nature of the data; greater variety means more bases are covered, so the data set reflects reality more accurately. Velocity is the speed at which data is generated, processed and shared; its importance depends on the object of the inquiry, as some enterprises need quick feedback more than others. Veracity is arguably the most important after volume, since the consistency and quality of the data are essential when drawing broad conclusions.

Our ‘digital footprint’ allows the entities that process this information to see us in a new way: rather than the image we want to project, they see us through our likes and dislikes, some of which may be unknown even to ourselves. With every passing year we become more interconnected, to the point where interconnectedness is now the norm. We make claims like “Guys, check this out, I can make toast with my phone” without understanding the repercussions of phone-toast: our toaster and phone may be communicating through a poorly protected channel that many others can access. This information is extremely valuable because it can be used to target us and even to foresee trends in our lives, with implications ranging from economic to medical. As technologies become ever more interconnected, this trend is likely to continue.


 

Repercussions of "Phone-Toast"

The biggest problem with Big Data is what other people might infer from the information we generate. Employers could easily build an algorithm that uses your public information, such as your ‘last seen’ on WhatsApp or the number of social media posts you make after midnight, to estimate how many nights out you have each week. Insurance companies could also access this data to determine what sort of lifestyle you lead and decide not to insure you. We are also seeing this behaviour in banks (in credit scoring), hospitals and even the tourism industry (a phone number associated with a 20-year-old British man in Barcelona will be shown completely different tourist advertising than that of a 60-year-old Japanese lady).

These hypothetical situations may not seem frightening yet. We are aware of the fact that they want our information and we still have autonomy over the decision to share it or not, right? Well no, not exactly. For example, many of us are unaware that every time we read a review, check a price or simply look at a product online, our activity is being monitored. Little things like how much time we spend on a product page or how much money we are willing to ‘impulse spend’ are just some of the millions of bytes of information that we give companies simply by accessing certain content online. Did you know that this information can influence the prices you are shown on Black Friday? Or that, through cookies, the media, information and products we consume end up being tailored to a "profile of us" that has been created? Does this mean that our ability to choose, and therefore our personalities, are being limited to a set of categories? The internet used to be thought of as "The Great Unifier", but it appears that in the end it will be the tool that efficiently and mathematically puts people in boxes.

You’ll be waving your fists to the chant of ‘this should be illegal’ at this point, I hope. But it is not... not yet. This is a new field and it is up to us citizens to mobilise and protect ourselves (I urge you to look up the cases brought by Max Schrems, if you haven’t done so already). As your lawyer, I would recommend that you know your rights and keep your mouth shut; but this is the internet and we can’t keep our mouths shut, so at least know your rights.

How do I protect myself?

If you’re inside the European Union, good news! The EU is already looking out for our needs and has enacted the GDPR (General Data Protection Regulation), which will apply from 25 May 2018. The GDPR replaces the previous Data Protection Directive (Directive 95/46/EC) and operates together with the rest of the EU legal framework that applies to Big Data. One of its most important principles is purpose limitation. This simply means that there must be a specified and legitimate purpose for processing data. This is an essential first step, since it provides transparency and legal certainty and limits how data controllers can use personal data.

For example, if an organisation wanted to analyse or predict your preferences (you being an individual customer) in order to take measures concerning you, it would need your free, specific and informed consent, which you would have to actively ‘opt in’ to give (no more pre-ticked boxes that you forget to untick). Because the consent you give must be informed, the company must disclose its decision-making criteria: often the inferences drawn from our data matter much more than the data itself. If the same company only wanted to detect trends and correlations, without this affecting you, it would need functional separation: your data cannot be used to decide how products are tailored to you. This requires that your data be anonymised (fully, or partially where full anonymisation is not feasible) and combined with other safeguards such as key-coding or key-hashing, pseudonymisation or encryption.
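For readers curious about what key-hashing looks like in practice, here is a minimal sketch in Python. It assumes a keyed hash (HMAC-SHA256) is used to replace a direct identifier, such as an email address, with a pseudonym before the data reaches the people analysing it. The function name `pseudonymise`, the secret key and the sample records are all hypothetical, and this is an illustration of the technique rather than a compliance recipe.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records
    can still be linked for trend analysis, but without the key nobody can
    recompute the pseudonym from a guessed identifier.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret key. It is meant to be stored separately from the
# analysed data, as the GDPR's definition of pseudonymisation requires
# for the "additional information" that allows re-identification.
SECRET_KEY = secrets.token_bytes(32)

# Hypothetical customer records: a direct identifier plus behavioural data.
records = [
    {"email": "ana@example.com", "minutes_on_product_page": 12},
    {"email": "ben@example.com", "minutes_on_product_page": 3},
    {"email": "ana@example.com", "minutes_on_product_page": 7},
]

# The analytics team only ever sees this pseudonymised version.
pseudonymised = [
    {
        "customer": pseudonymise(r["email"], SECRET_KEY),
        "minutes_on_product_page": r["minutes_on_product_page"],
    }
    for r in records
]

for row in pseudonymised:
    # Print a shortened pseudonym next to the behavioural data.
    print(row["customer"][:12], row["minutes_on_product_page"])
```

A keyed hash is used here rather than a plain hash because anyone could hash a list of guessed email addresses and match the results against plain hashes; without the secret key, that shortcut does not work. Note that this is pseudonymisation, not anonymisation: whoever holds the key can still link the records back to people, which is why the GDPR continues to treat pseudonymised data as personal data.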

Bodies such as the Article 29 Data Protection Working Party (an advisory body made up of a representative from the data protection authority of each EU Member State) issue statements to help interpret this new legislation. For example, in September 2014 the Working Party issued a statement on the impact of the development of Big Data on the protection of individuals with regard to the processing of their personal data in the EU. It stresses the need for international cooperation with other competent authorities, because only the simultaneous application of different legal frameworks (regional, national and international) will ensure that the rights recognised by European law are correctly upheld: for example, transparency, the rights of access, rectification and objection, and the right to be forgotten (the case brought by Mario Costeja, Case C-131/12, is another example of citizen activism to protect our rights).

In October 2017, the Working Party issued guidelines reviewing the three different types of profiling distinguished by the GDPR. In addition to ‘general profiling’, they distinguish between ‘decision-making based on profiling’, where a human decides something based on a profile produced by an algorithm, and ‘solely automated decision-making’, where decisions are made without any human involvement. The second is subject to the GDPR’s general rules, whereas the third is generally prohibited by Article 22(1) of the GDPR where the decision has legal or similarly significant effects, subject to limited exceptions. This right not to be subject to a decision made solely by a computer adds to two ‘reinforced’ rights: the right of access (under Article 15(1)(h)) and the right to be adequately informed (enshrined in Articles 13(2)(f) and 14(2)(g)).

People ought to be informed that their data is being used and should be given adequate information in order to make an educated decision.

Moving on to the USA. Under the Obama administration, American doctrine on the matter was slowly converging with European legislation. Under Obama, PCAST (President's Council of Advisors on Science and Technology) wrote that “[t]he challenges to privacy arise because technologies collect so much data (e.g., from sensors in everything from phones to parking lots) and analyze them so efficiently (e.g., through data mining and other kinds of analytics) that it is possible to learn far more than most people had anticipated or can anticipate given continuing progress. These challenges are compounded by limitations on traditional technologies used to protect privacy (such as de-identification). PCAST concludes that technology alone cannot protect privacy, and policy intended to protect privacy needs to reflect what is (and is not) technologically feasible.” This extract, I believe, captures the essence of the GDPR and previous European regulations, which sought to regulate in order to prevent abuses. These ideas can also be seen in the FTC’s (Federal Trade Commission) report on Big Data (do read it if you’re bored; it explains more or less everything in this article in much more depth). Under Trump, however, the situation has been completely reversed. The Trump administration is opting to undo all of the steps that Obama took and is moving in the opposite direction: deregulating and letting the market take care of it all.

We will have to wait and see which policy turns out to be best for individuals and companies. New technologies are developing at such a rapid pace that legislators are struggling to foresee the problems and legislate quickly enough to prevent abuses. The future is very uncertain for us as individuals, so the best thing we can do as citizens is to take a page out of Schrems', Costeja's or even Megakini’s book and take our safety into our own hands. And I reiterate, as your lawyer: know your rights and keep your metaphorical online mouths shut.

Article written by: Irene Duque Femenia, 4th Year LLB Student

Assignment: Disruption and Technology in the Legal Markets

Professor: Cristina Sirera

IE LAW SCHOOL