I _Really_ Don't Know

A low-frequency blog by Rob Styles

Linked Data, Big Data and Your Data

Week five of my new challenge, and I figured I really should get around to scribbling down some thoughts. In my last post I talked about RDF and graphs being useful inside the enterprise, and here I am, inside an enterprise.

Callcredit is a data business: a mixture of credit reference agency (CRA) and consumer data specialist. At 12 years old it is the youngest of the UK's CRAs, yet it has built an enviable position and is one of the few businesses growing strongly even in the current climate. I've worked with CRAs from the outside, during my time at the Internet bank Egg; from the inside there's a lot to learn and some interesting opportunities.

Being a CRA, we hold a lot of data about the UK population – you. Some of this comes from the electoral roll, much of it from the banks. Banks share their data with the three CRAs in order to help prevent fraud and lower the risk of lending. We know quite a lot about you.

Actually, if you want to see what we know, check out your free credit report from Noddle – part of the group.

Given the kind of data we hold, you’d hope that we’re pretty strict about security and access. I was pleased to see that everyone is. Even the data that is classed as public record is well looked after; there’s a very healthy respect for privacy and security here.

The flip side is that those who should have access need to be able to do their jobs in the best way possible, and that's where big data tools come in.

As in my previous post, variety in the data is a key component here. Data comes from lots of different places, and folks here are already expert at correcting it, matching it and making it consistent. Volume also plays a part. The current relational database systems have more than 100 tables, tracking not only data about you but also provenance data, so we know where information came from, and audit data, so we know who's been using it.
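To make the provenance and audit idea a little more concrete, here's a minimal sketch in Python. It is purely illustrative and is not how Callcredit's systems work: the `Fact` and `Store` names, the source labels and the normalisation rule (trim, collapse spaces, upper-case) are all hypothetical assumptions. Each value keeps a note of which source it came from, every read is written to an audit log, and values from different sources are cleaned up so they can be matched against each other.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Fact:
    """One value about a person, tagged with the source it came from (provenance)."""
    subject: str    # hypothetical person identifier
    attribute: str  # e.g. "postcode"
    value: str
    source: str     # hypothetical source label, e.g. "electoral_roll" or "bank_a"


def normalise(value: str) -> str:
    """Assumed cleaning rule: trim, collapse internal whitespace, upper-case."""
    return " ".join(value.split()).upper()


@dataclass
class Store:
    facts: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def add(self, fact: Fact) -> None:
        # Store the cleaned value but keep the original source alongside it.
        self.facts.append(Fact(fact.subject, fact.attribute, normalise(fact.value), fact.source))

    def read(self, user: str, subject: str) -> list:
        # Audit first: record who asked about whom, and when, before returning data.
        self.audit_log.append((datetime.now(timezone.utc), user, subject))
        return [f for f in self.facts if f.subject == subject]

    def sources_by_value(self, subject: str, attribute: str) -> dict:
        """Group sources by the normalised value they report, showing where they agree."""
        groups: dict = {}
        for f in self.facts:
            if f.subject == subject and f.attribute == attribute:
                groups.setdefault(f.value, []).append(f.source)
        return groups


store = Store()
store.add(Fact("p1", "postcode", "ab1  2cd", "electoral_roll"))
store.add(Fact("p1", "postcode", "AB1 2CD ", "bank_a"))
print(store.sources_by_value("p1", "postcode"))  # {'AB1 2CD': ['electoral_roll', 'bank_a']}
print(len(store.read("analyst_1", "p1")))         # 2
print(store.audit_log[0][1:])                     # ('analyst_1', 'p1')
```

Once the two differently formatted postcodes are normalised they match, and both sources remain visible, which is the point of keeping provenance next to the data rather than discarding it during cleaning.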

Over the past few weeks I've been working with the team here to design and start building a new product, using a mix of Hadoop and Big Data® for the data tiers and ASP.NET for the web UI, along with Rob Vesse's dotNetRDF. The product is commercially sensitive, so I can't tell you much about it yet, but I'll blog about the technology and approaches we're using where I can.