Data Hackathon
By Mr. Vijay Kumar, Asstt. Admn. Officer, GST, Office of the CAG of India, New Delhi.
1. The Background
The Indian Audit & Accounts Department (Department) formulated a Big Data Management Policy in January 2016. The Policy recognises that as Governments and other organisations transition into a digital environment, they generate, process and store voluminous data. In addition, useful and relevant data in disparate forms, such as Census data, NSSO data, the Economic Survey and industry- or domain-specific data, are continually produced by various agencies and entities. When collated, they provide the contextual framework for, and valuable insight into, the functioning of an audited organisation. In the course of performing its mandate, the Department has traditionally created, gathered and analysed large volumes of data. However, given the technology explosion that we are witnessing, the Department needs to prepare to transform the way audit is done. Big data analytics can enhance risk assessment by discovering red flags, outliers and abnormal behaviour and by providing deeper insights. It can also facilitate predictive analysis and the use of advanced statistics to transform data into actionable information. It can thus contribute to a greater level of assurance in audits.
The Central Government tax departments have automated most of the functions relating to the levy, collection and assessment of taxes, and taxation data can be correlated with data from other databases such as those of the Ministry of Corporate Affairs (MCA), state VAT, etc. Recognising this opportunity to bring about an orbital change in the way our wing carries out the audit mandate, the Central Revenue Audit (CRA) wing decided to embark on a unique experiment: exposing the staff to big data, data analytic tools and the opportunities and possibilities that data analytics can throw up. Our wing obtained pan-India Central Excise (CX) and Service Tax (ST) data on Registrations, Returns and Payments for three years from the Department of Revenue in August 2016. This coincided with the creation of Data Analytics Groups (DAGs) in each of the offices of the Director General/Principal Director, Central, and the completion of training of at least two persons from these newly formed DAGs on big data, data analytics and the use of data analytic tools. Taking advantage of the availability of data and the training given, it was decided to give the DAG members an opportunity to explore and experiment with real data.
Bringing about transformation and innovation requires even the process of introduction to be innovative and transformative. It requires changing mindsets, behaviour, methods and technologies to bring about disruptive changes. It was, therefore, decided to hold a forty-eight-hour Data Hackathon, something which the Department had not attempted before.
2. Data Hackathon - Concept
A Data Hackathon is an event that runs for a continuous period of time, in which people get together and work on data-related projects for practice, prizes, recognition and networking. Teamwork and a competitive spirit between the teams are the group dynamics on which the Hackathon works. In essence, it is a regular Hackathon devoted to data-related projects. Several types of Data Hackathon are commonly held: Data tooling Hackathons, Data products Hackathons, Predictive modelling Hackathons and Insights and visualization Hackathons, to name a few. The Hackathon that we conducted combined two of these types: data tooling, and insights and visualization.
3. The Objectives of the Hackathon
The general objectives of the Hackathon were to increase the popularity of data analytics and enthuse the participants to make it a part of their regular work activity; expose them to big data and help them make the transition from theory to practice; encourage innovation and ‘out of the box’ thinking; and instil a new way of working with an emphasis on teamwork.
4. Organisers and Participants
The Hackathon was organised by the Office of the Principal Director of Audit, Central, Bengaluru, the Regional Training Centre, Bengaluru and representatives from the CRA wing of Headquarters, under the guidance of the Deputy Comptroller & Auditor General (CRA). It was decided that members of DAGs trained by the Centre for Data Management and Analytics (CDMA) on the basics of big data and data analytic tools would be the participants. Further, recognising that domain knowledge and IT and data analytic skills are equally important and complementary, it was decided to form teams in such a manner that the members of each team together possessed domain expertise as well as IT acumen.
Twenty participants from all the nine CRA field offices, located at Hyderabad, Chandigarh, Bengaluru, Chennai, Ahmedabad, Lucknow, Delhi, Mumbai and Kolkata, took part in the Hackathon. The participants were divided into five teams, keeping in view each participant's choice of data analytics tool and domain knowledge. The teams were named Velocity, Volume, Veracity, Variety and Variability. The team members were drawn from different offices and various locations across the country and were to work very closely during the two-day Hackathon. To create a positive team identity, each team was given T-shirts of the same colour with the team name written on them. The work space was organised in such a way that the members of a team could sit together. As competitive spirit is one of the key drivers of a Hackathon, it was announced during the inauguration that there were attractive prizes for the members of the winning team.
5. Place and Time of event
The forty-eight-hour non-stop Hackathon started at 4:00 PM on 20 November 2016 and ended at 4:00 PM on 22 November 2016. The RTC, Bengaluru was available throughout these forty-eight hours, during which participants were allowed to work on the computers. The Hackathon was followed by presentations by the participating teams on their insights from the data and an evaluation of their performance.
6. Preparation of the site
Four areas were earmarked for the Hackathon. The first was the computer room, where the desktop computers were specially prepared for the event. Four computers were loaded with a trial version of Tableau and the remaining 16 with KNIME, as per the choice of the participants. So that participants could search the internet to resolve any doubts about the data analytic tools or the subject matter of the data, they were given internet access on their laptops through Wi-Fi and were also allowed to use desktops with an internet connection in another training hall. There was also a break-out room where the teams could go to hold consultations, strategize and make plans amongst themselves before returning to their computers.
The fourth area was the informal meeting area, where the food and beverages were laid out. Snacks, coffee and tea were available round the clock during the forty-eight-hour Hackathon. A coffee/tea/soup vending machine and snacks were arranged to enable participants to stay alert and energetic and to work continuously through the day and night on the site. A comfortable hotel within walking distance of the Hackathon venue was arranged so that the participants could work through the forty-eight hours with short breaks.
During the Hackathon, in addition to the data supplied by the CBEC, the MCA data, as downloaded from the website of the Ministry of Corporate Affairs, was used to cross-check the registration of assessees across the two databases. The MCA database was linked with the central excise and service tax data on the basis of the CIN, which is available in both databases. Participants were also encouraged to bring any data that they had access to which could be linked to these databases to obtain additional insights.
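The linking step described above can be illustrated with a small sketch. The teams worked in Tableau and KNIME, so this Python version is not their method. It is only a minimal illustration of joining two sets of records on the CIN and listing the discrepancies. The field names (`cin`, `status`), the function `link_on_cin` and the sample records are all hypothetical.

```python
# Illustrative sketch only: field names and sample records are hypothetical,
# not the actual layout of the CBEC or MCA data.

def link_on_cin(tax_records, mca_records):
    """Match two record lists on the 'cin' key and list the discrepancies."""
    tax_by_cin = {r["cin"]: r for r in tax_records}
    mca_by_cin = {r["cin"]: r for r in mca_records}
    findings = []
    # Walk over every CIN that appears in either database (a full outer join).
    for cin in sorted(tax_by_cin.keys() | mca_by_cin.keys()):
        tax = tax_by_cin.get(cin)
        mca = mca_by_cin.get(cin)
        if tax is None:
            findings.append((cin, "missing in tax data"))
        elif mca is None:
            findings.append((cin, "missing in MCA data"))
        elif tax["status"] != mca["status"]:
            findings.append((cin, "status mismatch"))
    return findings


# Made-up records with placeholder CINs.
tax_records = [
    {"cin": "CIN-A", "status": "active"},
    {"cin": "CIN-B", "status": "active"},
    {"cin": "CIN-C", "status": "inactive"},
]
mca_records = [
    {"cin": "CIN-A", "status": "active"},
    {"cin": "CIN-B", "status": "inactive"},
    {"cin": "CIN-D", "status": "active"},
]

findings = link_on_cin(tax_records, mca_records)
for cin, issue in findings:
    print(cin, issue)
```

Records that agree in both databases produce no finding. Only the CINs that are missing from one side, or whose status differs, are listed for follow-up.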
7. Results of the Data Hackathon
The experience was truly inspiring, stimulating and energizing. It kindled a lot of enthusiasm amongst the participants, who showed great grit in working on the task assigned to them and came out with a variety of insights, as given below:
Insights from data analysis:
As there was not enough time to cleanse the entire database, the data was split and analytics was done only on limited data. As the legends for the codes of various commodities/services were not available, commodity-wise and service-wise analysis could not be done by the participants, though some of them listed this as an area that they would like to explore. Key insights developed by the participating teams, within these constraints, are listed below:
- Comparison of the tax and MCA databases revealed companies that are shown as active in one database but are shown as inactive in, or are missing altogether from, the other.
- Details of taxpayers who had not furnished statutory returns, or had furnished them late, were generated.
- Cases were found where the CENVAT credit utilised for duty payment was more than the available credit.
- Comparative analysis of duty payment from CENVAT credit and in cash. Zones where duty paid in cash is very low compared with payment through CENVAT credit can be given greater attention and selected for audit, as CENVAT credit utilisation is subject to conditions and is a high-risk area for audit.
- Macro analysis of data, such as Zone-wise central excise and service tax payments and the State-wise spread of SSI units. These can be used in audit planning for identifying focus areas, high-risk zones, etc.
- Analysis of revenue data across categories of CX assessees, such as manufacturers, dealers and exporters. This insight can be used while finalizing the selection of assessees for verification, as risk factors might be different for different categories of assessees.
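Two of the checks listed above, credit utilisation exceeding available credit and the zone-wise share of duty paid in cash, can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the analysis the teams actually built. The field names, zone labels and amounts are hypothetical, and the helper names `over_utilised` and `cash_share_by_zone` are introduced here only for the example.

```python
# Illustrative sketch only: field names, zones and amounts are hypothetical.
from collections import defaultdict

returns = [
    {"assessee": "A1", "zone": "Zone-1", "credit_available": 100.0,
     "credit_utilised": 80.0, "cash_paid": 20.0},
    {"assessee": "A2", "zone": "Zone-1", "credit_available": 50.0,
     "credit_utilised": 70.0, "cash_paid": 30.0},
    {"assessee": "A3", "zone": "Zone-2", "credit_available": 200.0,
     "credit_utilised": 190.0, "cash_paid": 10.0},
]


def over_utilised(rows):
    """Assessees whose utilised CENVAT credit exceeds the credit available."""
    return [r["assessee"] for r in rows
            if r["credit_utilised"] > r["credit_available"]]


def cash_share_by_zone(rows):
    """Share of total duty paid in cash per zone, lowest share first."""
    cash = defaultdict(float)
    total = defaultdict(float)
    for r in rows:
        cash[r["zone"]] += r["cash_paid"]
        total[r["zone"]] += r["cash_paid"] + r["credit_utilised"]
    # Skip zones with no payments to avoid dividing by zero.
    shares = {z: cash[z] / total[z] for z in total if total[z] > 0}
    return sorted(shares.items(), key=lambda item: item[1])


flagged = over_utilised(returns)
ranked = cash_share_by_zone(returns)
print("Credit over-utilised:", flagged)
for zone, share in ranked:
    print(zone, round(share, 2))
```

Zones at the top of the ranked list pay the smallest share of duty in cash, so under the reasoning above they would be the first candidates for closer audit attention.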
The insights generated show that there is potential and willingness among the officials to explore the area of data analytics. In fact, while the event was underway, some members could not tear themselves away from their computers, and we heard them say that they wished they had also been provided with sleeping bags.
The Hackathon was an excellent example of teamwork by officials pooled from different offices across the country. Teams were seen helping each other when stuck on technical glitches, thereby displaying a spirit of healthy competition.
The Hackathon experience assured us that once our officials are given systematic training and provided with the required infrastructure, environment, data and guidance, they are capable of bringing about a transformational change in the way audit work is done, by using data analytics at every stage of audit.
8. Action proposed to continue/replicate the hackathon
- Wide dissemination of the concept of hackathon and its usefulness.
- Holding hackathons for developing standard models for sample selection and other audit processes may be prescribed.
- The benefits of hackathons, viz. providing a ground for new ideas and serving as a particularly good tool for stimulating a creative, problem-solving environment with a low cost of failure, may be widely circulated among all offices.
 