Data and Audit
By Shri Karan Vohra, Officer Trainee, 2016 Batch
In a world of rapid digitisation, data has become the “oxygen of the Digital World”. A vast amount of meaningful data is being generated and preserved. The figure below shows the levels of growth of both structured and unstructured data. The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s. As of 2012, 2.5 exabytes (2.5×10^18 bytes) of data were generated every day, which is equivalent to 530,000,000 million songs or 90 years of HD video [1]. The amount of data produced keeps growing day by day. Such levels of data generation have led to the coining of the term “Big Data”. What exactly is Big Data?
What is Big Data?
Big Data refers to extremely large, complex datasets that exceed the traditional processing capabilities of the IT infrastructure due to their size, format diversity and speed of generation. The characteristic features of Big Data are the 4 V’s: Volume, Variety, Velocity and Veracity.
The United Nations Economic Commission for Europe's (UNECE) task team on Big Data (2013) classified this large volume of data into three categories.
I. Human-Sourced Information - loosely structured and often ungoverned data stored everywhere from personal computers to social networks.
II. Process-Mediated Data - structured data stored in relational database systems like the traditional business and administrative data.
III. Machine-Generated Data - large volume of well-structured data derived from sensors and machines used to measure and record the events and situations in the physical world.
The general view is that big data will have a dramatic impact on enhancing productivity, profits and risk management. But data in itself yields limited value until it has been processed and analyzed. Thus it necessitates the use of various analytic methods to extract value from such data.
Personal Experience with Data Analytics:
Having worked in the data and analytics industry, I was exposed to how different kinds of organizations use data to answer questions. These range from the simple, such as who a company's profitable and valuable customers are, to the complex, such as what kind of customers the company should acquire, and with which strategy, to improve profits. During 4 years with a data analytics firm, I had the opportunity to work with a bank and analyze its different types of data.
In my early days, I worked on creating a Quality Checking tool for the bank which compared reports for 2 consecutive time periods and highlighted major deviations, outliers, missing numbers and duplicates. Around 150 reports that were otherwise checked manually were cleaned using this tool, saving a lot of man-hours.
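The kind of period-over-period check described above can be sketched roughly as follows. This is only an illustration, not the tool built for the bank: the report layout (rows of metric name and value), the function name `quality_check`, the 25% deviation threshold and the sample figures are all assumptions made for the example.

```python
from collections import Counter


def quality_check(previous, current, threshold=0.25):
    """Compare two period reports given as lists of (metric, value) rows.

    Hypothetical sketch: flags duplicated metrics, missing values and
    relative changes larger than `threshold` between the two periods.
    """
    # A metric that appears on more than one row of the current report.
    counts = Counter(name for name, _ in current)
    duplicates = sorted(name for name, n in counts.items() if n > 1)

    prev = dict(previous)
    curr = dict(current)  # for duplicated metrics the last row wins

    # A metric reported last period but absent or empty this period.
    missing = sorted(name for name in prev if curr.get(name) is None)

    # Relative change versus the previous period.
    deviations = {}
    for name, old in prev.items():
        new = curr.get(name)
        if old in (None, 0) or new is None:
            continue  # no meaningful ratio can be computed
        change = (new - old) / abs(old)
        if abs(change) > threshold:
            deviations[name] = round(change, 4)

    return {"duplicates": duplicates, "missing": missing, "deviations": deviations}


# Made-up sample reports for two consecutive periods.
previous_report = [("accounts_opened", 1000), ("fee_income", 200.0), ("complaints", 40)]
current_report = [
    ("accounts_opened", 1040),
    ("fee_income", 320.0),
    ("fee_income", 320.0),  # duplicated row
    ("complaints", None),   # missing number
]

result = quality_check(previous_report, current_report)
print(result)
```

Here a simple percentage threshold stands in for the "major deviations and outliers" test; a real tool would probably tune the threshold per report or use a longer history.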
Later, I went on to create more complex tools, such as one that tracked the performance of all campaigns being run on the website at any point, based on transaction records which ran into millions. It was a self-serve tool which anyone could use to pull information about any campaign ever run and compare it with any other campaign on around 50 KPIs, with trend charts. This required the data pull to be done in SAS (Statistical Analysis System) and the output data to be linked to an Excel dashboard using VBA (Visual Basic for Applications) coding.
My experience of working with data enabled me to understand not only the kind of data we capture or could capture but, more importantly, how to use that data in every decision, small or big, that had the power to completely transform the future of companies. The key is first to improve the capability to capture different forms of data that give insights into various aspects of business performance, and then to ensure that the captured data is used strategically to support growth. Both steps are equally important.
The experience of working in the data analytics industry has exposed me to the numerous possibilities this field holds. With data growing at breakneck speed, we can expect data analytics to play a pivotal role in all fields, especially auditing.
Big Data and Audit:
Since the Supreme Audit Institutions (SAIs) have the mandate and the legal authority to access data generated by the audited entities, they have significant opportunities to leverage both open data and public sector information to enhance their capabilities and audit work. In fact, the Netherlands Court of Audit can even access sensitive and confidential information such as data held by the Dutch secret service and the Ministry of Defence. SAIs enjoy varying degrees of access to data, and plenty of opportunities exist for them to make use of the ever-growing volume of data. This data can be analysed to investigate potential fraud and corruption, conduct performance audits, evaluate outcomes, and offer insights and recommendations about the issues that the government faces.
Greater reliance on IT is leading to digitisation of the records of not only private sector companies but also public sector organisations. Traditional records are being eliminated, and more and more financial and non-financial electronic data is now available. This necessitates a change in the way these organisations are audited. The different techniques of Big Data can even impact the fundamental nature of audit. Technology allows the monitoring of very large or complete sets of data, rather than samples, on a more frequent basis, resulting in a more focused and efficient audit. Data analytics can enable the auditor to perform tests on large or complex datasets where a manual approach would not be feasible.
The use of data analytic techniques also allows unstructured data to be analysed as part of the audit, which was not done earlier. At the moment auditors focus on the information within the organisation's business and systems, but the advent of social media and internet-enabled devices means that there is a lot of data sitting outside the organisation that could be equally relevant.
Using intelligent analytics will help to deliver a higher quality of audit evidence and more relevant business insights. Big data and analytics enable auditors to identify fraud and operational business risks and tailor their approach to deliver a more relevant audit.
Approaches and techniques of Data Analytics:
Various approaches to analysing and understanding data can be effective in detecting and preventing potential fraud and corruption. Predictive techniques and models can be used to identify historical patterns and processes associated with known forms of fraud or corruption, and can be applied to available data to prevent fraud before it occurs.
There are other techniques of data analytics which can be used to detect fraud after it has occurred. For instance, script-based analysis inspects vast amounts of data through continuous combinations and comparisons with other data sources in order to identify anomalies. This technique helps in identifying root causes and trends. Often, the lack of scrutiny of all individual transactions enables fraud that exploits weaknesses in the internal control systems. Data analytics can be particularly effective in transaction-heavy areas, such as expenditures, billing and capital projects, which further demonstrates its relevance for the investigation of fraud and corruption.
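As a rough illustration of script-based analysis, the sketch below tests every payment, rather than a sample, against a second data source. The field names, the purchase-order register and the three rules are assumptions chosen for the example; they are not taken from any particular audit tool.

```python
def flag_anomalies(payments, purchase_orders):
    """Test every payment against a purchase-order register (hypothetical layout).

    Returns a list of (payment_id, reason) pairs.
    """
    # Approved amount per purchase order, taken from the second data source.
    approved = {po["po_id"]: po["amount"] for po in purchase_orders}
    seen_invoices = set()
    flags = []

    for p in payments:
        # Rule 1: the same vendor invoice paid more than once.
        key = (p["vendor"], p["invoice"])
        if key in seen_invoices:
            flags.append((p["payment_id"], "duplicate invoice"))
        seen_invoices.add(key)

        # Rule 2: payment refers to a purchase order that does not exist.
        if p["po_id"] not in approved:
            flags.append((p["payment_id"], "no purchase order"))
        # Rule 3: payment is larger than the approved amount.
        elif p["amount"] > approved[p["po_id"]]:
            flags.append((p["payment_id"], "exceeds purchase order"))

    return flags


# Made-up sample data.
payments = [
    {"payment_id": "P1", "vendor": "V1", "invoice": "I-10", "po_id": "PO1", "amount": 500},
    {"payment_id": "P2", "vendor": "V1", "invoice": "I-10", "po_id": "PO1", "amount": 500},
    {"payment_id": "P3", "vendor": "V2", "invoice": "I-77", "po_id": "PO9", "amount": 900},
    {"payment_id": "P4", "vendor": "V3", "invoice": "I-31", "po_id": "PO2", "amount": 1500},
]
purchase_orders = [
    {"po_id": "PO1", "amount": 500},
    {"po_id": "PO2", "amount": 1200},
]

flags = flag_anomalies(payments, purchase_orders)
for payment_id, reason in flags:
    print(payment_id, reason)
```

Because the script runs over the full set of transactions, each flagged item is a lead for the auditor to follow up, not proof of fraud in itself.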
Data mining and data matching are techniques which can be effective in detecting fraud. Data matching is the large-scale comparison of data sets, disregarding duplicates, in order to identify specific matters of interest. Data mining views large data sets from different perspectives to identify previously unknown information. For instance, a federal interagency task force in Chile, which included representatives from the Council for the Internal Auditor General of Government, used data mining to prevent fraud and corruption in public procurement. By mining the transactions of the newly introduced e-procurement system, they were able to avoid bid rigging and break up the nexus among suppliers, officials and procurement officers (OECD, 2010).
The Auditor General South Africa (AGSA) has been using data analytics in various ways. The most useful of these is cross-correlating the various databases accessible to AGSA. These include the National Population Register (NPR), which maintains the database of marriages; the Basic Accounting System (BAS), which records all financial transactions; the Companies and Intellectual Property Commission (CIPC) database, which registers the directors of registered companies; and PERSAL, which is an HR management system for employees. Cross-correlation and analysis of these databases has helped identify and understand the complete supply chain of government contracts and reveal cases of fictitious suppliers or conflicts of interest (where family members and business partners have close ties with government employees).
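The cross-correlation idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not AGSA's actual procedure: the four inputs only imitate what PERSAL, NPR, CIPC and BAS extracts might contain, and every identifier and record below is made up. It covers the conflict-of-interest check only, not the detection of fictitious suppliers.

```python
def conflicts_of_interest(employees, marriages, directors, payments):
    """Find paid suppliers whose director is an employee or an employee's spouse.

    employees: set of person IDs (stand-in for an HR system extract)
    marriages: list of (person_id, person_id) pairs (population register)
    directors: list of (supplier, person_id) pairs (company register)
    payments:  list of (supplier, amount) pairs (accounting system)
    """
    employee_ids = set(employees)

    # Map each spouse of an employee back to that employee.
    spouse_of = {}
    for a, b in marriages:
        if a in employee_ids:
            spouse_of[b] = a
        if b in employee_ids:
            spouse_of[a] = b

    # Only suppliers that actually received government payments matter.
    paid_suppliers = {supplier for supplier, _ in payments}

    findings = []
    for supplier, director in directors:
        if supplier not in paid_suppliers:
            continue
        if director in employee_ids:
            findings.append((supplier, director, "director is an employee"))
        elif director in spouse_of:
            findings.append(
                (supplier, director, f"director is married to employee {spouse_of[director]}")
            )
    return sorted(findings)


# Made-up sample extracts.
employees = {"E1", "E2"}
marriages = [("E2", "C7")]
directors = [("SupplierA", "E1"), ("SupplierB", "C7"), ("SupplierC", "C9"), ("SupplierD", "E2")]
payments = [("SupplierA", 10000), ("SupplierB", 25000), ("SupplierC", 4000)]

findings = conflicts_of_interest(employees, marriages, directors, payments)
for row in findings:
    print(row)
```

Joining on a shared person identifier is the key design choice here; in practice the databases may not share a clean key, and matching on names or addresses would need extra care.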
Many of the standard data analytics tools employ data visualization techniques. These provide insights into the data being analyzed by placing it in a visual context, such as graphs, plots and information graphics. They make it easier to identify patterns, trends, correlations and outliers that may go unnoticed in text-based data. Visualization techniques may also be useful in communicating insights arising out of the audit to the stakeholders.
Big Data and CAG:
In addition to detecting fraud and corruption, data analytics can also be used for other tasks performed by SAIs. There are examples of SAIs using data analytic techniques to increase the quality of performance audits. For instance, India’s SAI (the Comptroller and Auditor General) has actively explored the use of data analytics and applied it to a study of social security programmes in Kerala, India. In the performance audit of the social assistance programmes of Old Age Pension, Widows Pension and disabled pension schemes, the SAI made use of data in its audit model. The conventional approach would have allowed the auditors to detect improper payments to ineligible persons in pension schemes. Departing from the conventional approach, the SAI leveraged external data and was able to verify whether eligible beneficiaries had been excluded. This resulted in a comprehensive performance audit, which covered both the inclusion of ineligible beneficiaries and the exclusion of eligible ones.
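The difference between the two approaches can be pictured as a pair of set comparisons. The sketch below is purely illustrative: the article does not describe how the audit was actually implemented, and the beneficiary IDs, the function name `inclusion_exclusion` and the idea of a ready-made list of eligible persons from an external source are assumptions.

```python
def inclusion_exclusion(paid, eligible):
    """Compare the pension payroll with an externally derived eligibility list.

    paid:     IDs of persons currently receiving the pension
    eligible: IDs of persons who meet the scheme criteria per external data
    """
    paid, eligible = set(paid), set(eligible)
    return {
        # Paid but not eligible: what the conventional approach looks for.
        "inclusion_errors": sorted(paid - eligible),
        # Eligible but not paid: only visible once external data is used.
        "exclusion_errors": sorted(eligible - paid),
    }


# Made-up sample IDs.
pension_payroll = ["A101", "A102", "A103"]
eligible_per_external_data = ["A102", "A103", "A104", "A105"]

audit_result = inclusion_exclusion(pension_payroll, eligible_per_external_data)
print(audit_result)
```

The payroll alone can never reveal the second list, which is why the external data source matters for a complete performance audit.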
With a view to institutionalising data analytics, the CAG formulated a Big Data Management Policy, which categorises data sources as internal (i.e. created and maintained by the CAG) or external (i.e. available from audited entities or in the public domain). The policy also stresses data protocols to ensure that the data is authentic, relevant and usable, and sets out guidelines for data access arrangements with external sources. Further, a Centre for Data Management and Analytics has been set up by the CAG to optimise the potential of new data analytics technology.
While opportunities exist to make better use of data in audit work, there are also risks that need to be considered when employing data analytics. For example, if the data source is unreliable or not properly understood, it will have a negative impact on the quality of the audit. Errors in the data itself are a big challenge that SAIs may face in ensuring that the data is valid, appropriate and reliable for their purposes.
The multiplicity of data types and sources produces large volumes of structured and unstructured data, which poses a big challenge in terms of proper storage and of the tools needed to analyse data at that scale. The cost of procuring the required software and hardware is another constraint.
There are also problems in obtaining access to data from entities where centralised data sharing platforms are not available. Non-uniformity in the structure of data is a major concern that restricts the use of data analytics tools.
The new tools of data analysis being developed require corresponding capacity building in the workforce which has to use them. The shortfall of data analysts in SAIs limits their potential to employ data analysis techniques for audit.
The capture, storage and processing of entity data present SAIs with challenges in relation to data security and data protection. Entities need to have confidence that their data will be held and processed securely, so that they can fulfil their own legal and regulatory obligations in making the data available to auditors.
While data analytics can lead to valuable insights, it will not provide everything that the auditor needs to know, and human judgement is still required when interpreting results. Human error in applying data analytics techniques cannot be ruled out.
The types of challenges that SAIs face while employing data analytics techniques will vary depending on the maturity of the SAI in terms of capacity, expertise and infrastructure. External factors such as the legal framework for accessing and using data also pose a pertinent challenge.
In a highly digitised world, regardless of the type of challenges faced, SAIs have to keep improving their strategies towards big data and data analytics to maintain their relevance. Resource allocation for the different techniques of data analytics, and capacity building through training and workshops, are the need of the hour. The implementation of appropriate policies and procedures in relation to data security is a necessary part of the effective deployment of data analytic techniques in audit. Audit needs to take ‘a quantum leap’ and redesign its processes by leveraging the opportunities presented by today’s advanced technologies.