How to Assess Your Big Data Quality
Updated: Feb 7
Quality over quantity: a mantra the industry has been repeating for a very long time. But when data arrives in such abundance, how do you determine the quality of your big data?
Everyone knows that bad data is troublesome. It wastes time, increases costs, weakens decision making and makes executing any strategy very difficult. Yet very few managers have any real appreciation for the quality of their data. One way to address this gap is the Friday Afternoon Measurement (FAM) Method.
In a study conducted in Ireland with a group of executives from different fields, the researchers asked managers to assemble the last 100 units of work completed by their departments and focus on 10-15 critical data attributes. The managers combed through each record, spotting obvious errors. The total number of error-free records was noted; expressed as a percentage of the sample, this is the share of records entered correctly, their Data Quality (DQ) score.
Over the past two years, 75 managers have taken part, yielding 75 data quality measurements. These provide a real opportunity to assess just how good or bad your data is.
While the analysis painted a grim picture of the data overall, some findings stood out:
On average, 47% of newly created data records have at least one critical (i.e., work-impacting) error.
A quarter of the scores in the sample were below 30%, and half were below 57%. In today's world, where business and data are tightly intertwined, no one can claim to be functioning effectively on such poor-quality data. It is hard to understand how businesses can survive under these conditions.
Only 3% of the DQ scores in our study can be rated "acceptable" using the loosest possible standard.
When managers were asked how good their data needed to be and how much an error cost them, none thought that a score below the high nineties was acceptable.
Reality? Well, that’s a completely different story.
Less than 3% of the sample met this standard. For the vast majority, the problem is quite severe.
How to try out the Friday Afternoon Measurement (FAM) Method
This method lets you quickly measure your level of big data quality, develop a high-level estimate of its impact, and synthesize the results. It adapts well to different companies, processes, and data sets. It can be done in four simple steps:
Assemble the last 100 data records your department created or used. For example, if your department takes customer orders, assemble the last 100 orders; if you create engineering drawings, assemble the last 100 drawings. Then focus on 10-15 critical data elements and lay them out.
Ask data experts to join you for a meeting, where you can work through the records together and weigh differing opinions.
Work record by record and ask your colleagues to mark the most obvious errors. Most of the time this goes quickly: your team will either spot errors, such as misspelled customer names, or they won't. Don't expect to spend more than 30 seconds on a record.
Summarize your results to see how many perfect records you have. The number of error-free records out of 100 is your data quality percentage.
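The scoring at the end of these steps can be sketched in a few lines of code. This is a minimal illustration, not part of the FAM method itself; the record fields and the `has_error` check below are hypothetical stand-ins for whatever critical attributes your team marks up.

```python
def fam_score(records, has_error):
    """Return the DQ score: the percentage of error-free records."""
    perfect = sum(1 for record in records if not has_error(record))
    return 100 * perfect / len(records)

def has_error(order):
    # Flag only the most obvious errors, e.g. a blank customer name
    # or a non-positive quantity, as the team would in the meeting.
    return not order["name"] or order["qty"] <= 0

# A toy sample of 100 customer orders: 67 clean, 33 with a blank name.
orders = [{"name": "Alice", "qty": 3}] * 67 + [{"name": "", "qty": 3}] * 33

print(fam_score(orders, has_error))  # prints 67.0
```

In practice the "function" is your colleagues' judgment during the meeting; the code only shows how the marked-up records roll up into a single percentage.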
These findings will confirm whether you have a data quality problem. If you want to understand the problem better, go one step further: imagine that every unit of data is worth 1. The number of records with errors is the amount of value you lose. For example, if 67 of your 100 records contain errors, your day's earnings were only 33.
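The rule-of-thumb cost estimate above is simple arithmetic, shown here as a sketch (the function name is illustrative, not from the article):

```python
def effective_yield(total_records, records_with_errors):
    # Each record is worth one unit; records with errors are lost value.
    return total_records - records_with_errors

# The article's example: 67 of 100 records have errors,
# so the day's effective earnings are only 33 units.
print(effective_yield(100, 67))  # prints 33
```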
This tool is simple and gives excellent insight into where your strategy and data are going wrong. For more info on data quality, contact Datahut, your big data experts.