Measuring Data Quality
With a strong data analytics strategy, small and medium-sized enterprises (SMEs) can enhance performance, incorporate fact-based insights and make more effective decisions. However, such strategies are only as good as the quality of the data used to fuel them.
Good-quality data makes analysis more accurate, allows processes to be made more efficient and automated, and ensures that historical information can be effectively collated and potentially used to identify further opportunities.
Poor-quality data can be damaging not only for analytics initiatives, but also for the business as a whole. The following are some examples of areas where poor-quality data can have a negative impact:
- Inability to identify high-value customers
- Time and cost of remediating data inaccuracies
- Inability to identify suppliers for spend analysis
- Poor customer interactions due to inaccurate data
- Inability to provide unified billing to customers
- Poor decision-making for price-setting
- Inability to automate routine tasks
- Increased need for reconciliation of reports
- Poor supply-chain management
- Inability to perform accurate risk assessments
- Lack of compliance with regulation
- Privacy or data protection violations.
Because SMEs are often subject to less strict regulation and have fewer IT resources, many are only beginning to harness analytics and do not yet view data quality as a “must-have” process. However, the continued growth of data as a strategic enabler for organisations of all sizes means that data quality cannot be ignored. While the extent to which organisations monitor and control data quality varies considerably, there are six key dimensions of data quality that deserve our focus.
How comprehensive is our dataset?
Completeness is the expected comprehensiveness of our dataset. It is essential that we know which items of information are critical and which are optional. For example, we definitely need the first name and last name of a customer to identify them, but their middle name is far less important. Completeness gives us a true indication of just how much we know about a customer, how well a product is defined or how identifiable a location is. Complete, accurate contact information is obviously vital in this regard.
Failing to capture these critical or mandatory items can make it impossible to perform a certain task or analysis. Worse still, we may build activities around incomplete information, risking misleading or inaccurate results or hampering our ability to act on the information (e.g. by contacting former customers).
By implementing simple data-validation techniques that require users to enter specific fields, we can ensure that essential information is captured. For critical datasets, it is important to define tolerances or thresholds and to review the datasets' completeness regularly.
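As an illustration of reviewing completeness against a threshold, the sketch below measures the share of records in which each critical field is populated and flags fields that fall below an agreed tolerance. The field names, the sample records and the 95% threshold are all hypothetical; treating blank strings as missing is also an assumption you may want to adjust.

```python
from typing import Iterable

# Hypothetical critical fields for a customer record (an assumption for illustration).
CRITICAL_FIELDS = ("first_name", "last_name", "email")


def is_populated(value) -> bool:
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def completeness(records: list[dict], fields: Iterable[str]) -> dict[str, float]:
    """Return the share of records (0.0 to 1.0) in which each field is populated."""
    fields = tuple(fields)
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(is_populated(r.get(f)) for r in records) / total for f in fields}


def below_threshold(scores: dict[str, float], threshold: float) -> list[str]:
    """List the fields whose completeness falls short of the agreed tolerance."""
    return [f for f, score in scores.items() if score < threshold]


customers = [
    {"first_name": "Aoife", "last_name": "Murphy", "middle_name": "", "email": "aoife@example.com"},
    {"first_name": "Sean", "last_name": "", "email": "sean@example.com"},
    {"first_name": "Niamh", "last_name": "Kelly", "email": None},
    {"first_name": "Cian", "last_name": "Byrne", "email": "cian@example.com"},
]

scores = completeness(customers, CRITICAL_FIELDS)
print(scores)                                   # {'first_name': 1.0, 'last_name': 0.75, 'email': 0.75}
print(below_threshold(scores, threshold=0.95))  # ['last_name', 'email']
```

Note that the optional `middle_name` field is simply not measured: deciding which fields are critical comes first, and the metric follows from that decision.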
Is data available when it is needed?
Timeliness is the availability of data at the time that we require it. To make informed decisions, we need to know the relevant facts without delay. Examples of this are:
- Having the underlying data for monthly results within a given timeframe
- Using the most current customer contact information and
- Checking up-to-date billing activity for a particular vendor
The nature and intended use of data dictate what level of timeliness is acceptable. Billing information may need to be as recent as possible for customer services to perform efficiently, but reporting on sales of a particular product may only be required on, say, a weekly basis. Very often, poor timeliness is the result of inefficient data capture or data manipulation techniques. By assessing the timeliness of key datasets, we can uncover inefficiencies and bottlenecks that need to be addressed.
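One simple way to assess timeliness is to give each dataset a maximum acceptable age, reflecting its intended use, and compare that with when it was last updated. The sketch below assumes hypothetical dataset names and tolerances (one day for billing, one week for a product sales report) and a known last-update timestamp per dataset.

```python
from datetime import datetime, timedelta

# Hypothetical tolerances: how stale each dataset may be before it counts as late.
MAX_AGE = {
    "billing": timedelta(days=1),
    "product_sales_report": timedelta(weeks=1),
}


def stale_datasets(last_updated: dict[str, datetime], now: datetime) -> dict[str, timedelta]:
    """Return each dataset whose age exceeds its tolerance, mapped to how far overdue it is."""
    overdue = {}
    for name, updated_at in last_updated.items():
        age = now - updated_at
        limit = MAX_AGE[name]
        if age > limit:
            overdue[name] = age - limit
    return overdue


now = datetime(2024, 3, 15, 9, 0)
last_updated = {
    "billing": datetime(2024, 3, 13, 18, 0),             # 1 day 15 hours old
    "product_sales_report": datetime(2024, 3, 11, 9, 0),  # 4 days old
}
print(stale_datasets(last_updated, now))  # billing is 15 hours overdue; the sales report is within tolerance
```

Tracking how far overdue a dataset is, rather than a simple late/on-time flag, makes it easier to see which capture or processing steps are the real bottlenecks.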
Are attributes consistent across datasets?
Consistency is the synchronisation of data across the organisation. We need to be able to trust the information we receive regardless of the source. Examples of poor consistency include:
- Sending invoices to Ms Murphy, but marketing material to Mrs Murphy
- Different monthly sales figures in SAP and the data warehouse
- Regional offices recording dates of birth in various formats.
Conflicts across data sources force us to manipulate data to achieve consistency, and this introduces additional risk. By taking a consistent approach throughout the organisation, we can make gathering and compiling data much more efficient.
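Taking the date-of-birth example, a consistent approach might be to agree on the format each regional office uses and convert everything to a single standard such as ISO 8601 (YYYY-MM-DD). The sketch below assumes a hypothetical office-to-format mapping; an explicit mapping is used rather than guessing, because a value like "07/03/1985" could mean either 7 March or 3 July.

```python
from datetime import datetime
from typing import Optional

# Hypothetical mapping: the date-of-birth format each regional office uses.
OFFICE_DATE_FORMATS = {
    "Dublin": "%d/%m/%Y",  # e.g. 07/03/1985
    "Cork": "%Y-%m-%d",    # e.g. 1985-03-07
    "Galway": "%d-%b-%Y",  # e.g. 07-Mar-1985 (English month abbreviations assumed)
}


def to_iso_date(raw: str, office: str) -> Optional[str]:
    """Convert an office's local date string to ISO 8601, or None if it does not parse."""
    fmt = OFFICE_DATE_FORMATS.get(office)
    if fmt is None:
        return None  # unknown office: no agreed format to interpret the value with
    try:
        return datetime.strptime(raw.strip(), fmt).date().isoformat()
    except ValueError:
        return None  # value does not match the office's agreed format


rows = [
    ("Dublin", "07/03/1985"),
    ("Cork", "1985-03-07"),
    ("Galway", "07-Mar-1985"),
    ("Cork", "07/03/1985"),  # does not match Cork's agreed format
]
for office, raw in rows:
    print(office, raw, "->", to_iso_date(raw, office))
```

Values that come back as `None` are candidates for manual review rather than silent correction, which keeps the extra manipulation step, and its risk, visible.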
Does the data conform to its expected definition?
Validity is the measure of whether data meets an expected range or definition. It is important that we have defined acceptable values where necessary and that these rules are adhered to. For example, we expect to capture a “days past due” value in our billing database as a numeric figure. If we come across unexpected alpha characters or negative values within this field, it is likely that any analysis incorporating “days past due” will be inaccurate.
Our expected values and acceptable ranges should be captured in data dictionaries and enforced by data validation or business rules.
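A data dictionary can be made executable by attaching a rule to each field. The sketch below encodes the “days past due” example as a rule requiring a whole number of zero or more; the dictionary structure, field names and sample invoices are hypothetical, and a real implementation would more likely live in database constraints or a validation framework.

```python
from typing import Any, Callable


def is_non_negative_int(value: Any) -> bool:
    """Rule: a whole number of zero or more (bool is excluded because it subclasses int)."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


# Hypothetical minimal data dictionary: field name -> rule its values must satisfy.
DATA_DICTIONARY: dict[str, Callable[[Any], bool]] = {
    "days_past_due": is_non_negative_int,
}


def invalid_values(records: list[dict], dictionary: dict[str, Callable[[Any], bool]]) -> list[tuple]:
    """Return (row index, field, value) for each present value that breaks its rule.

    Missing fields are skipped: they are a completeness issue rather than a validity one.
    """
    problems = []
    for i, record in enumerate(records):
        for field, rule in dictionary.items():
            if field in record and not rule(record[field]):
                problems.append((i, field, record[field]))
    return problems


billing = [
    {"invoice_id": "INV-001", "days_past_due": 0},
    {"invoice_id": "INV-002", "days_past_due": 45},
    {"invoice_id": "INV-003", "days_past_due": "N/A"},
    {"invoice_id": "INV-004", "days_past_due": -3},
]
print(invalid_values(billing, DATA_DICTIONARY))
# [(2, 'days_past_due', 'N/A'), (3, 'days_past_due', -3)]
```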
Does the data accurately represent an object or event?
Accuracy is the extent to which data correctly reflects the real-world object or event being described. If data does not accurately reflect reality, then any insights and analysis we derive from it will be inherently flawed. Examples of inaccurate data are:
- Sales of the Cork office are reported as €2.1 million when the true value is €2.5 million
- The amount charged on an invoice does not reflect the true usage of a client or
- A customer has been flagged for marketing phone calls when they have expressly indicated otherwise.
While defining measures to monitor accuracy can be difficult, accuracy is a crucial dimension that affects both operational and analytical activities.
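Because accuracy is judged against reality, one common approach is to compare a sample of stored records with values verified from a trusted source, such as details confirmed directly with customers. The sketch below assumes such a verified sample exists; the record identifiers, fields and values are hypothetical, and the resulting rate is only as representative as the sample.

```python
def accuracy_check(records: dict[str, dict], verified: dict[str, dict]) -> tuple[float, list[tuple]]:
    """Compare stored values against a verified reference sample.

    Returns the share of compared values that match, plus the mismatches found
    as (record id, field, stored value, verified value).
    """
    compared = 0
    mismatches = []
    for record_id, truth in verified.items():
        stored = records.get(record_id)
        if stored is None:
            continue  # record not held in our system, so there is nothing to compare
        for field, true_value in truth.items():
            compared += 1
            if stored.get(field) != true_value:
                mismatches.append((record_id, field, stored.get(field), true_value))
    rate = (compared - len(mismatches)) / compared if compared else 0.0
    return rate, mismatches


# Stored customer data (hypothetical).
crm = {
    "C1": {"city": "Cork", "marketing_calls": True},
    "C2": {"city": "Dublin", "marketing_calls": False},
    "C3": {"city": "Galway", "marketing_calls": True},
}
# Values confirmed directly with a sample of customers (hypothetical).
verified_sample = {
    "C1": {"city": "Cork", "marketing_calls": False},
    "C3": {"city": "Galway", "marketing_calls": True},
}
rate, mismatches = accuracy_check(crm, verified_sample)
print(rate, mismatches)
# 0.75 [('C1', 'marketing_calls', True, False)]
```

The mismatch for customer C1 mirrors the marketing-call example above: the record says yes, but the customer has said otherwise.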
Is there unwanted duplication in the data?
Uniqueness means there is no unexpected duplication within our datasets. Data containing unwanted duplicates creates doubt about which record is the most accurate and undermines the goal of having a “single source of the truth”.
A lack of uniqueness can waste time and money and lead to misrepresentation in reporting and analysis. For example, duplicate data could lead to us sending the same letter to a customer several times, causing them annoyance.
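A basic duplicate check groups records by a normalised matching key and reports any key shared by more than one record. The sketch below assumes a hypothetical matching rule (full name ignoring case and extra spaces, plus postcode ignoring spaces and case) and hypothetical sample data; exact-key matching like this will miss typos, which is where fuzzy matching in dedicated data quality tools comes in.

```python
from collections import defaultdict


def match_key(customer: dict) -> tuple[str, str]:
    """Hypothetical matching rule: case- and space-insensitive name plus postcode."""
    name = " ".join(f"{customer['first_name']} {customer['last_name']}".lower().split())
    postcode = customer["postcode"].replace(" ", "").upper()
    return name, postcode


def find_duplicates(customers: list[dict]) -> dict[tuple[str, str], list[int]]:
    """Return each matching key shared by more than one record, with the record ids."""
    groups = defaultdict(list)
    for customer in customers:
        groups[match_key(customer)].append(customer["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


customers = [
    {"id": 1, "first_name": "Mary", "last_name": "Murphy", "postcode": "T12 AB34"},
    {"id": 2, "first_name": "mary ", "last_name": "MURPHY", "postcode": "t12ab34"},
    {"id": 3, "first_name": "John", "last_name": "Walsh", "postcode": "D02 XY56"},
]
print(find_duplicates(customers))
# {('mary murphy', 'T12AB34'): [1, 2]}
```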
Using market-leading data quality and profiling tools, combined with our unique business insights, we can quickly identify data quality issues and how they affect your business. Our data-quality assessment will identify key datasets, perform data-profiling to benchmark maturity, and report on identified risks and recommendations. By implementing the resulting controls, metrics and governance, your organisation can move up the maturity scale and begin harnessing data in a more strategic manner.