Data-data-data: why and how to use data

Posted on 25/08/2016


Data are generated everywhere, and about everything. With 2.5 quintillion bytes of data generated per day, we create data like our cells create carbon dioxide – relentlessly.

While most of these data are created and used by machines, people need to understand and use them too. We’re getting to a stage where data skills are going the way of literacy and numeracy – becoming an essential life skill.

Why data analysis is important

Data analytics / data mining is fast becoming a core professional skill. It’s no longer just analysts and scientists who need it – we all do, whether we’re reviewing time sheets, budgets, marketing campaign analytics or information about our own health.

[Figure: applications of data mining (T. Khabaza)]

I believe everyone can and should be able to analyse data. Why? Because it’s a skill the human brain has evolved to do. Every day our minds assess:

  1. Information the world presents us, against
  2. Our knowledge and experience of the world, in order to make
  3. Judgements and decisions.

The process of analysing data works in much the same way as our everyday skills of perception. You combine data and their trends with your industry knowledge to decide what actions to take. Analysis of data is simply informed perception.

How to analyse data

1. Brush up that GCSE mathematics

Getting up to speed might mean refreshing a few skills (the mathematics and statistics we learnt at school). But it’s easy enough to do and you can even do it from the privacy of your own home – GCSE Bitesize, Coursera, Khan Academy etc.

2. Follow a process

But it’s not a straightforward process.

[Figure: the myth of data mining (T. Khabaza)]

Data analysis is iterative and incremental. One finding leads to another line of enquiry, modelling and evaluation, much like the cycle in the CRISP-DM methodology, one of the leading methods.

[Figure: CRISP-DM process diagram]
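
For anyone who thinks better in code, here is a rough sketch of that loop. It uses the six standard CRISP-DM phase names, but everything else (`run_project`, `good_enough`, the three-round stopping rule) is made up for illustration. Real CRISP-DM also lets you hop back and forth between individual phases; this simplified version only repeats the whole cycle.

```python
# The six CRISP-DM phases, in their usual order.
PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modelling",
    "evaluation",
    "deployment",
]


def run_project(good_enough, max_rounds=5):
    """Repeat the first five phases until the evaluation passes.

    `good_enough` is a hypothetical callback: it takes the round number
    and returns True once the findings meet the business objective.
    Returns the list of (round, phase) steps that were visited.
    """
    history = []
    for round_no in range(1, max_rounds + 1):
        # Everything up to and including evaluation happens every round.
        for phase in PHASES[:-1]:
            history.append((round_no, phase))
        if good_enough(round_no):
            # Only deploy once the evaluation says the result is useful.
            history.append((round_no, "deployment"))
            break
    return history


# Pretend the findings only become useful on the third pass.
history = run_project(lambda round_no: round_no >= 3)
for step in history:
    print(step)
```

The point is the shape rather than the detail: evaluation sends you back to the start more often than it sends you on to deployment.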

3. Understand why we analyse data

Data mining is the process of searching for as yet unknown connections. But it’s not just about finding any old connection – the connection needs to help you better understand something, and guide you towards doing something else as a result.

Turns out, there’s been a lot of thinking put into this – there are a whole 9 laws of data mining, defined by this guy, Tom Khabaza. In essence:

  1. Business objectives are the origin of every data mining solution
  2. Business knowledge is central to every step of the data mining process
  3. Data preparation is more than half of every data mining process
  4. The right model for a given application can only be discovered by experiment, aka “there is no free lunch for the data miner”
  5. There are always patterns
  6. Data mining amplifies perceptions in the business domain
  7. Prediction increases information locally by generalisation
  8. The value of data mining results is not determined by the accuracy or stability of predictive models
  9. All patterns are subject to change
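
Law 4 is the easiest one to try for yourself. Below is a minimal sketch, assuming some made-up weekly figures and two deliberately simple candidate models (a flat average and a straight line). The function names and the numbers are my own inventions, not part of Khabaza's laws; the idea is only that you fit each candidate on one slice of data and judge it on a slice it has not seen.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Candidate 1: always predict the average of what we have seen."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Candidate 2: a straight line fitted by ordinary least squares."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_abs_error(model, xs, ys):
    """Average distance between the model's predictions and the truth."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


def pick_model(candidates, train, test):
    """Fit every candidate on `train`, score it on `test`, keep the best."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mean_abs_error(model, *test)
    best = min(scores, key=scores.get)
    return best, scores


# Made-up data: week number against units sold.
train = ([1, 2, 3, 4, 5, 6], [12, 14, 15, 18, 20, 21])
test = ([7, 8], [24, 25])

best, scores = pick_model({"mean": fit_mean, "line": fit_line}, train, test)
print(best, scores)
```

On these numbers the straight line wins comfortably, but that is a property of the data, not of the model. Swap in figures with no trend and the flat average could do just as well, which is exactly why the law says you have to experiment.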

Data are increasingly a part of life. If we ignore them, if we refuse to question data and the decisions people base on them, we risk losing our ability to understand the world. Don’t just believe what you read – explore, test and evaluate for yourself.

Posted in: data, Digital