Data Analytics for Investigations

Purchase Cards – Quick Start

The World of Purchase Card Fraud

How can organizations avoid purchase card fraud but still take advantage of the ease of using credit cards to acquire the goods and services they need? Many organizations have some protections already in place. They develop internal policies to limit purchase card use to designated buyers, products, or situations.  They require their employees to read and sign their “Purchase Card Use” policy. Employees promise they will use the card for only appropriate business transactions and never for personal, non-work use. In an ideal world, these protections should allow organizations to gain the benefits of simplified purchasing without experiencing any issues of purchase card fraud.

In our world, however, misuse of purchase cards occurs frequently.  

[Image: an employee using a company purchase card to buy personal luxury items online]

Identify – Business Questions and Available Data

Business Questions

If your organization issues purchase cards to more than just a few people, you likely have access to a comprehensive data set that you can use to evaluate risk and detect potential misuse or even fraudulent activities. What types of concerns do you have regarding your purchase cards? Do you think employees might be making personal purchases? Are you concerned that suspicious vendors are receiving payments? Which of your departments does the most purchase card spending? Are there opportunities to reduce costs by negotiating contracts with your frequently used purchase card vendors? Are some purchase agents “splitting” transactions to stay under the established approval limits? When employees ship purchases to your organization, are they sent to valid addresses? Have any of your purchase cards been compromised and become available on the dark web, where external fraudsters could use your employees’ stolen identities to make purchases? Do you want to identify anomalies in spending quickly, before they balloon into a multi-year fraud scheme?

Available Data 

Where can you find the data to help you answer these concerns? Purchase card data can come in various formats. I’ll cover some of the most useful and widely available here. However, if your data is slightly different from what I’ve described, do not worry; you can likely still use it in analytics. The best data is available on a regular schedule and in a consistent format, such as these:

    1. Data Set Format – Monthly Credit Card Feed

      The ideal data set would be a monthly feed of data from the credit card company that includes all features of the purchase transaction, including, at a minimum: the cardholder’s id, name, transaction date, vendor name, and transaction amount. It is also beneficial if additional data is available about the transaction. These fields could be Merchant Category Code (MCC), transaction description, online purchase indicator, or other vendor metadata. You may need to download this data monthly as a comma-delimited file (*.csv) from the credit card company’s website. The act of downloading the data, once you incorporate it into your routine, could further be simplified using robotic process automation (RPA).
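
      As a rough illustration, here is a minimal Python sketch of such a monthly import, using pandas. The file name and column names are my own assumptions; substitute whatever your card provider actually delivers.

```python
# Minimal sketch of a monthly purchase card feed import (assumed names).
import pandas as pd

# Read the monthly *.csv downloaded from the credit card company's website.
feed = pd.read_csv("pcard_feed_2024-01.csv", parse_dates=["transaction_date"])

# Verify the minimum fields discussed above are present before analyzing.
required = {"cardholder_id", "cardholder_name", "transaction_date",
            "vendor_name", "transaction_amount"}
missing = required - set(feed.columns)
if missing:
    raise ValueError(f"Feed is missing expected fields: {missing}")

print(f"{len(feed)} transactions, "
      f"${feed['transaction_amount'].sum():,.2f} total spend this month")
```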

    2. Data Set Format – Purchase Approval System

      Another ideal data set could come from your purchase approval system. It may be your organization’s policy that all purchase card transactions need to be approved, and there is a system that facilitates this process. If this is the case, you will have all of the monthly credit card feed data fields listed above. Plus, you will have other fields such as approver id, approver name, and approval date. Then, take a look to see if you can capture and match to some additional vendor information from your internal vendor master file. Finally, if the approval system stores copies of receipts supporting the purchases, you will more easily be able to perform follow-up investigations into possible anomalies. Be sure to retrieve the most granular data available, down to the transaction line item, if available.
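
      To illustrate the vendor master match, here is a hedged pandas sketch; all file and field names are hypothetical stand-ins for your own systems.

```python
import pandas as pd

# Hypothetical extracts: approval-system transactions and the vendor master.
approvals = pd.read_csv("pcard_approvals.csv")
vendor_master = pd.read_csv("vendor_master.csv")

# A left join keeps every card transaction, even those with no master record.
enriched = approvals.merge(
    vendor_master[["vendor_id", "vendor_tax_id", "vendor_status"]],
    on="vendor_id", how="left", indicator=True)

# Payments to vendors absent from the vendor master deserve a closer look.
no_master = enriched[enriched["_merge"] == "left_only"]
print(f"{len(no_master)} transactions with no vendor master match")
```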

    3. Data Set Format  – Monthly Credit Card Statement *.pdf

      A less-than-ideal data set, but one that provides the same basic information as data set number 1, would be a *.pdf monthly billing statement sent by the credit card company that is received electronically or is downloadable. The electronic *.pdf file can be ingested by different data analytics software programs, although the initial import is a bit more complicated than a text-based *.csv file. However, once you establish the process and define a standardized import, the next month’s import should be straightforward to automate.
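
      As one possible approach, the pdfplumber library can pull tables out of a text-based *.pdf statement. Statement layouts differ by issuer, so treat this as a starting sketch that will need tuning; the file name and header assumption are mine.

```python
import pdfplumber
import pandas as pd

rows = []
with pdfplumber.open("statement_2024-01.pdf") as pdf:  # assumed file name
    for page in pdf.pages:
        table = page.extract_table()  # returns a list of rows, or None
        if table:
            rows.extend(table)

# Assume the first extracted row is the header; adjust for your statement layout.
statement = pd.DataFrame(rows[1:], columns=rows[0])
print(statement.head())
```

      Once the extraction settings work for one month’s statement, the same script should handle each subsequent month, which is what makes the automation worthwhile.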

[Image: a hand pointing to a computer screen displaying credit card data, surrounded by other purchase card data sources: a document, cloud data, a magnifying glass, a chat window with the credit card company, and a mobile device]

Other Concerns

Analysis Frequency

If the layout stays the same from period to period, you have the opportunity to build a process that lets you monitor purchase cards on an ongoing basis. The ideal way to stay on top of purchase card risk is to perform the analysis continuously. However, the first run-through of the data can start with whatever large file you have available. For example, if your credit card company allows an 18-month historical download, start with that. Keep in mind that once you perform the initial analysis, you will want to repeat the analytics each subsequent month.

Protect your Data and Yourself

You may notice that I did not include the credit card number as one of the data fields you would want in your data set. If you have the cardholder’s name or id, you probably have enough information to analyze individual purchasers, even if the cardholder uses more than one purchase card. Actually, even if you don’t have the cardholder information, you can still perform analysis, as shown in the included case study. However, there is a practical reason that I did not include this information: the laws surrounding the possession and security of credit card numbers are plentiful, and many states impose their own penalties for abuses of this information.

[Image: a purchase card receipt with information blocked out to protect the credit card data]

For example, Virginia has strict laws governing credit card numbers. The simple possession of two credit card numbers can indicate a violation of the law. Federally, the FTC limits the information that can be stored by merchants about customers’ credit card numbers. The primary reason for this is to protect the numbers so that they are not revealed to fraudsters. That’s why you typically see only a portion of your credit card number on all receipts. Credit card number limitations extend to the way they are stored in databases. As you handle this information, take the necessary precautions to protect it. Do not accumulate a treasure trove of credit card numbers that could be stolen. Ensure secure storage and transfer of all credit card information. If you have the credit card number, please take care to obscure the numbers entirely or at least limit the digits shown.
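
If card numbers do end up in your working files, mask them before sharing or displaying anything. Here is a tiny illustrative helper, not a substitute for your organization’s PCI guidance:

```python
# Keep only the last four digits of a card number; mask everything else.
def mask_pan(pan: str, visible: int = 4) -> str:
    digits = "".join(ch for ch in pan if ch.isdigit())
    return "*" * (len(digits) - visible) + digits[-visible:]

print(mask_pan("4111 1111 1111 1234"))  # ************1234
```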

Once you have identified your data file >> Connect >>


Purchase Cards – Connect

Case Study – Washington, DC Purchase Card Transactions

The City of Washington, DC provides public data sets to encourage residents’ engagement and promote transparency.  They have over a thousand different data sets in many areas, from construction permits to parking tickets.  They also have a data set that covers the purchase card use for their 89 City Agencies that goes back more than ten years.  For the case study, I’m limiting the data set to the available data since Mayor Muriel Bowser has been in office.  It covers January 2015 to January 2022.

Data retrieved from the Open Data DC catalog (https://opendata.dc.gov)

Access

Accessing the data was easy, thanks to The City’s Open Data Catalog, where the Purchase Card Transactions data set can be found. The format provided the following data fields: Transaction ID, Agency, Transaction Date, Transaction Amount, Vendor Name, Vendor City, and Merchant Category Code Description. These seven fields can provide us with some initial insights and then allow us to pose further questions to be answered later.

Download and Import

Downloading the data produced a comma-delimited *.csv file with a header row and 468,035 records. It was fairly small, weighing in at just under 60 MB. With a *.csv file, you have the option to bring it into MS Excel easily. However, my analytics software of choice for this project was Arbutus Analyzer. It was a simple import, using the already existing header row values. Beyond letting you see the records as you would in Excel and being able to graph and use pivot tables, Arbutus protects the data from being accidentally changed as you work with it, maintaining the integrity of the data file. ACL Desktop Analytics by Galvanize/Diligent is a similar platform, and both are excellent choices for automating this type of import and further analysis.

Data Preparation and Export File

I knew that I wanted to add a few additional computed fields to the data based on the seven existing fields. Here are four fields that I created (a pandas sketch of them, and of the export step, follows below):

  • Upper MCC – This field standardizes the Merchant Category Code so that it is always displayed in uppercase. Because computers are very literal about how they evaluate field values, without this field, the values of “Hardware Stores” and “HARDWARE STORES” would be considered different.
  • MCC Interesting – I created a flag for MCC codes of interest, those that I wanted to be sure to take a look at later. I was curious about the details related to The City’s spending on chiropractors, podiatrists, and other medical services that would typically fall under a health care plan instead of spending on a purchase card. I also wanted to learn more about Video Game Arcades and Pool Halls. Perhaps these were teambuilding activities, after-school activities, or maybe a restaurant visited during work travel. Other interesting categories included places to purchase vehicles, toupee shops, liquor, bail and bond, tax preparation, and child care. To define this list of “interesting codes,” a group of people familiar with the policies surrounding purchase cards can engage in a brainstorming session. They can propose and rule out different categories based on their knowledge.
  • MCC Interesting Category – I noticed a theme among the different “MCC Interesting” codes and created an overall category that helped to show WHY they might be interesting. Categories included “Casino, Entertainment, Personal Supplies, Healthcare, Funeral Services, Vehicles, Legal Issues, and Political / Religious.”
  • Hotel Indicator – I created an indicator for MCC Codes related to travel, such as hotels and airlines. This prepares the data to be analyzed for travel anomalies and also allows the exclusion of these transactions during a search for duplicate transactions. For example, if a group of employees attends a conference, it is typical that multiple transactions will occur for their hotel and airline charges on the same dates.

Once the additional fields were added, I limited the file to transactions since January 2015, then exported the resulting 274,243 records to a text file using the “|” pipe character to delimit the fields. This is my input file for Tableau.
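
Here is the pandas sketch promised above, approximating the computed fields, the January 2015 cutoff, and the pipe-delimited export. I built these in Arbutus, so the Python below is an equivalent, not the original; the “interesting” list is abbreviated, and the header spellings may differ from the actual download.

```python
import pandas as pd

df = pd.read_csv("dc_pcard_transactions.csv",        # assumed download name
                 parse_dates=["Transaction Date"])

# Upper MCC: "Hardware Stores" and "HARDWARE STORES" should compare as equal.
df["Upper MCC"] = df["Merchant Category Code Description"].str.upper()

# MCC Interesting: flag categories worth a later look (abbreviated example list).
interesting = {"CHIROPRACTORS", "PODIATRISTS", "VIDEO GAME ARCADES", "POOL HALLS"}
df["MCC Interesting"] = df["Upper MCC"].isin(interesting)

# Hotel Indicator: mark travel MCCs to analyze, or exclude from duplicate tests.
df["Hotel Indicator"] = df["Upper MCC"].str.contains("HOTEL|AIRLINE", na=False)

# Keep transactions since January 2015 and export pipe-delimited for Tableau.
since_2015 = df[df["Transaction Date"] >= "2015-01-01"]
since_2015.to_csv("pcard_for_tableau.txt", sep="|", index=False)
```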

Once you have Connected to Your Data >> Visualize >>


Purchase Cards – Visualize

Tableau Public Visualizations

This case study uses Tableau Public for visualizations. Tableau Public can be used to quickly create data visualizations and then pull them together in a dashboard. Another tool commonly included in organizations’ existing software collections is MS PowerBI, which may be part of your current Microsoft Office 365 license.

Initial Visualizations: Quick Start – Purchase Card Transactions

You can interact with this Tableau Public Dashboard. It selects and shows two example agencies, the Metropolitan Police Department and the Fire & Emergency Medical Services.  The Agency Filter is on the right and shows checkboxes next to the two selected Agencies. 

[Image: investigations analytics dashboard showing merchant category code by agency, card risk by month, vendor outliers by agency, and risk totals by agency and MCC]

You can select other agencies of interest.  The dashboard has been designed to show one or more agencies.  If you are interested in all agencies, the dashboard is not going to help you because it is difficult to visualize all 89 agencies at once.  However, if you go to an individual visualization worksheet you can choose “(All)” in the Agency filter and compare the information.

Question: What is that giant spike?!

If you select the second tab in the visualization, “PCards by Month” and select “(All)” in the “Agency” filter on the right, you will see all Purchase Cards totals per month graphed across 2015 to 2022. 

[Image: graph of Purchase Cards by Month, with a giant spending anomaly in February 2016]

If you mouse over the giant spike in 2016, you can see the description of this purchase card activity.  It seems that this anomalous purchase card spending occurred in February 2016 by the Office of Contracting & Procurement in the amount of $13,412,015.  Now, this is quite out of the ordinary as you can tell from the graph.  Would you see this if you were simply looking at a spreadsheet, or worse, one credit card statement?  Unlikely.  But when you accumulate and look at a month-to-month comparison by agency, it is clear.  The highest peak of spending for all agencies from 2015 to 2022 is always under $6 million during a month.  February 2016’s total was over $14 million.

Perhaps you already know what this spike represents?  Let me know in the comments if you guessed it before the continued investigation below.

You can delve deeper into this information.  Next, let’s find out two pieces of information about February 2016, the MCCs and Vendors. 

MCC Investigation

First, what were the Merchant Category Codes (MCCs) for the data spike?  MCCs represent the primary code assigned to a vendor’s business.  If you go to the third tab, “Total by Agency and MCC – Scatter Plot,” you can limit by both Agency and Month.  Under “Agency,” uncheck “(All).”  Then scroll down or use the magnifying glass in the “Agency” filter to search for “procurement” and find the Office of Contracting & Procurement.  Under the “Month, Year of Transaction Date” filter, uncheck “(All)” and select February 2016.  This produces a “Total by Agency and MCC – Scatter Plot” with useful information.

[Image: scatter plot of Merchant Category Codes (MCCs) by total value, with two outliers to investigate]

The horizontal axis shows the Transaction Amount total.  The vertical axis shows the purchase card transaction count.  As we are looking at a spike in dollars, we will pay attention to values that are further to the right, rather than higher. 

We can see the highest dollars are the dot “1” showing $5,634,116 spent on the MCC of “GENL CONTRACTORS-RESIDENTIAL, AND COMMERCIAL.”  Dot “2” is not far behind at $5,428,495 for an MCC of “RENTALS-TOOLS, EQUIPMENT AND FURNITURE.”  But who is being paid by purchase card for these particular MCCs: General Contracting and Tool Rental & Equipment? 

Vendor Investigation

We can look at the vendors by going to the next tab, “Total by Agency and Vendor – Scatter Plot.”  These visualizations have been configured to apply the previous filters of “Office of Contracting & Procurement” and “February 2016.”

[Image: scatter plot showing the top four vendors with anomalous purchase card totals]

The top vendor is Total Civil Construction.  The second one is CW Strittmatter, another construction company.  The other two vendors that have over a million dollars are JJ Prime Services (construction) and Hurb Landscaping.  So why would the city pay for Tools and Equipment rental from construction and landscape companies in February 2016 in Washington, DC?  Have you figured it out yet?

[Image: snowflake representing snow removal paid for with purchase cards]

Of course, if you had wanted, you could have googled “Washington DC February 2016” at the beginning of this process and seen the top results were weather-related, specifically “snowmageddon.”  While the big snowstorm actually happened on January 23, the city was paying for the snow removal services continuing into February.  In fact, if you go to the Hurb Landscaping website, one of their featured services happens to be Snow Removal.  They have pictures of the snowmageddon removal with the Washington monument in the background.

The giant spike in the data was an emergency situation where there wasn’t time to negotiate purchase orders and contracts with snow removal companies.  The City Office of Contracting & Procurement used their purchase cards to spend millions to deal with the emergency.  Perhaps in the future, The City of Washington, DC can negotiate a better rate with its vendors ahead of time to plan for emergency situations. 

Question: Purchase Card Payments vs Negotiated Contracts?

Can we identify any other instances where better rates might be negotiated for purchases?  Using a credit card is convenient and can expedite a transaction. But if you can plan your purchases and arrange for a vendor to provide what you need through a contract and associated purchase orders, you can often negotiate discounts.

In order to determine which Merchant Categories might benefit from the negotiation of a contract, we can start on the visualization tab called “Vendor Outliers within MCC.” Be sure that “(All)” is selected under “Agency.” This chart is filtered by the top ten MCCs.

[Image: graph of vendor outlier anomalies for purchase card Merchant Category Codes (MCCs)]

This graph shows the top 10 highest MCC categories with exceptional spending for outlier vendors.  If all Agencies are selected, you can see three of the top ten MCC Codes are related to different versions of “Stationery and Office Supplies.”  We can see the top vendors for these categories are The Hamilton Group, Standard Office Supply, and Capital Services and Supplies.

[Image: two women shaking hands, representing the negotiation and signing of a contract]

A quick visit to these vendors’ websites reveals that they are minority-owned, female-owned, or small businesses, all located in Washington, DC.  The City of DC encourages business development within its borders with underrepresented businesses.  Excellent!  This is in line with The City’s purchasing objectives.  However, as the owner of a small women-owned business myself, I know that there are significant fees associated with purchases on credit cards.  It could benefit the City AND the vendors to negotiate long-term, steady contracts for the purchases of office supplies.

Question: Where can purchasing transparency be improved?

Vague MCCs

A few of the MCCs are a bit vague. This makes sense, given that a single Merchant Category Code is trying to summarize a vendor’s complete identity. However, it can lead to a lack of transparency. For this example, I’m going to focus on “Other Services – Not Elsewhere Classified” and “Professional Services – Not Elsewhere Classified.” Both can indicate a lack of diligence in identifying the purchase.  There are approximately 500 MCCs to choose from.  Please try not to pick “Not Elsewhere Classified.”

PayPal

I love PayPal and have been using it since 2000. However, they are always struggling to keep scammers off their platform. Additionally, a vendor name may show up simply as “PayPal” instead of their actual vendor name. When this happens, the purchase card transaction may by default fall under “Not Elsewhere Classified” categories.

Report Filtered on MCC and Vendor

Let’s see which agencies are using their purchase cards with PayPal.  To do this, go back to “PCards By Month.” 

  1. For “Agency” select “(All).”
  2. Under “Merchant Cat Code Desc,” deselect “(All).” Then perform a search using the tiny magnifying glass and type “not elsewhere classified.” While there are many options, limit the search by checking only “Professional Services – Not Elsewhere Classified” and “Other Services – Not Elsewhere Classified.” Because of data cleanliness issues, you will check 4 boxes.
  3. Under “Vendor Name,” deselect “(All).” Then use the tiny magnifying glass to type “PayPal.”  You will notice some variations on the PayPal vendor name for the vendors that have their store name included as part of the “PayPal” transaction. For this example, you can simply choose the first “PayPal” on its own.

For the variations on the PayPal vendor name, there are some interesting vendors such as Adult Tutus, 1800LOVECO, 1LUVLYLADY, or AbsoluteDat. They may be valid vendors that the City of DC uses. You would likely need some additional information such as receipts for validating these transactions if they appeared material. Typically, though, it is better to investigate long-term, high-dollar trends rather than getting lost in the tiny details of the data.

[Image: graph of spending in the non-transparent categories, PayPal and “Other Services,” of the kind a Certified Fraud Examiner (CFE) might use to identify unusual spending]

One of the highest spending agencies for PayPal is the DC Public Schools, which makes sense as they are one of the highest spending agencies overall.  They are shown at the top of the graph in blue.

The more interesting spending is actually the one that threads its way through the graph in green.  It is identified as “Unknown New Agency.” Surprise! It’s a third generic identifier that reduces spending transparency.  Perhaps this agency is a placeholder as new agencies are created in Washington, DC. 

For better control of spending, the City might consider fast-tracking the setup of new agencies. Each agency could then spend under its own name, making its spending more visible.  Alternatively, they could leave purchasing for these agencies to the Office of Contracting & Procurement until they are set up, or restrict these agencies to a few approved standard vendors and not allow them to make purchases from PayPal.

Another helpful control would be to turn off the ability to bill to the “Unknown New Agency” once the buyer has their new agency’s credentials. This would prevent new transactions from inadvertently or accidentally falling under this agency. When you look at all of its purchase card spending across all vendors, it appears that the “Unknown New Agency” may contain values from other agencies.

[Image: graph of “Unknown New Agency” spending, with spikes revealed in September]
Unknown New Agency Spending

Most of the significant spikes in the “Unknown New Agency” data fall in the month of September. One possible explanation is that budgets for newly established agencies do not start until the beginning of the fiscal year in October, so end-of-year spending needs somewhere to be categorized in the meantime. Another possible explanation is that some spending for the highest-spending agency, the District of Columbia Public Schools, was necessary in September when the school year started but was not available in the ending fiscal year’s budget, so it was categorized as “Unknown New Agency.”

Questions about the spending in the “Unknown New Agency” can only be answered by reviewing purchase card spending policy documents and receipts. How this agency’s spending is tracked may need additional oversight to provide better transparency.

Data Visualizations – More Information Needed

These are just a few examples you can look at with seven fields of purchase card data.  Imagine if you worked for The City of Washington, DC, and could find out which employees were making and approving these purchases. 

Visualization is not the final step, but simply a quick start.

Once you have Initial Visualizations >> Plan >>


Purchase Cards – Plan

The initial data visualizations that are created as part of any analytics project typically inspire more questions than they answer. It is vitally important to coordinate between the policy, business, and data people of your organization to move on to the next step, planning.

[Image: investigators brainstorming around a table, with notes and a computer showing purchase card anomaly data]

For brainstorming, consider:

  • What kind of purchase card data does your organization have? 
  • What overall trends occur in your data? Are there spikes and anomalies?
  • What kind of cost savings can be identified for your organization?
  • What are the advantages of identifying suspicious spending? Fraud Identification or Peace of Mind?
  • What is your organization’s highest priority when it comes to the use of Purchase Cards?
  • Who would benefit from seeing the data in an organized manner?

Does it sound overwhelming?  You can do it!  There are many off-the-shelf tools you can use, including the ones shown here: Arbutus Analyzer and Tableau Public.  MS Excel and PowerBI are also widely available and can be used to design similar data visualizations.  The sample visualizations in the case study are easy to set up and take less than a month if you have the data.

If you have never evaluated your Purchase Card data before, take the next step with your data today!


Graphing Challenge

What will you choose? ACL/Galvanize, Arbutus, Tableau, PowerBI, Python, R, Excel, Something Else?

A long time ago, we had data analytics software and big ideas, but occasionally our ideas were bigger than the built-in functionality the software provided. Data is data, though, and with access to basic tools to manipulate it, we could take the next step and make those tools do things beyond the out-of-the-box software capabilities.

To get today’s challenge out there at the top of this article, I’m putting it here. Then I’ll share some examples from the past about what we did “back in the day” to make the data dance the way we wanted.

The Challenge – Data File

The data I am working with is contract awards data. Here is a simple layout of the fields to represent what I am working with:

contract_award_id | vendor_id | contract_type_code | product_code | contract_obligation_amt

Some additional information about the data fields:

  • contract_award_id is unique in the table
  • contract_obligation_amt is the dollar value that I wish to total
  • vendor_id uniquely identifies a vendor; vendors may recur in the file and have multiple contract_award_ids
  • contract_type_code and product_code are the two fields relevant to this challenge, but if you solve for one, the other follows the same solution

If you want a sample data file, grab this one https://hidrive.ionos.com/share/a0rj4f4aug

The Challenge – Defined

For each product_code, produce a 1 page *.pdf file containing a bar graph of the top 10 vendor_ids for that product_code. “Top 10” vendor_ids are defined by the highest total of all of their contract_obligation_amt for that product_code.

Then apply the same solution for contract_type_code. In the sample file, you will find there are far fewer of these codes. For each contract_type_code, produce a 1 page *.pdf file containing a bar graph of the top 10 vendor_ids for that contract_type_code. “Top 10” vendor_ids are defined by the highest total of all of their contract_obligation_amt for that contract_type_code.

Sound easy? That’s what I thought as well. I started out by using Arbutus Analyzer but was unable to write a procedure to create a graph output to a file. I’m thinking of focusing on writing an Excel macro next. Then I’ll move on to Python. I might even try out using Tableau or PowerBI but haven’t tried automating this kind of output from either of those visualization tools yet. I’ll likely circle back to ACL/Galvanize/Diligent Desktop to see if I can get a graph output with a script. I feel like there is a solution with Highbond HCL as well. Who knows, maybe I’ll try it in Alteryx!
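
For the Python attempt, here is a minimal pandas/matplotlib sketch of the product_code half of the challenge (swap in contract_type_code for the other half). It assumes the sample file uses the field names listed above, and it is a starting point rather than a polished solution:

```python
import pandas as pd
import matplotlib.pyplot as plt

awards = pd.read_csv("contract_awards.csv")  # assumed name for the sample file

for code, group in awards.groupby("product_code"):
    # "Top 10" = highest total contract_obligation_amt within this product_code.
    top10 = (group.groupby("vendor_id")["contract_obligation_amt"]
                  .sum()
                  .nlargest(10))

    fig, ax = plt.subplots(figsize=(8.5, 11))        # one US-letter page
    top10.plot.bar(ax=ax)
    ax.set_title(f"Top 10 vendors for product_code {code}")
    ax.set_ylabel("Total contract_obligation_amt")
    fig.tight_layout()
    fig.savefig(f"product_{code}.pdf")               # one 1-page *.pdf per code
    plt.close(fig)
```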

The Challenge – Submit Your Success!

Yes, it will take me a while to try out all of the options listed above. And it is likely that I have missed an obvious tool or data programming approach. Take a stab at it and flex on me. I want to see some solutions that work. Email challenge@krobothconsulting.com or post in the comments below your:

  • Description of Approach
  • Tools Used
  • Example Output
  • Code Used (optional)

Historic Backstory

Alright, now that the challenge presentation is out of the way, it’s time for the backstory. Back in the early 2000s, I used ACL for most of my data analytics automation. One of the commands that hadn’t been written for it yet was CROSSTAB. This performs a cross-tabulation, basically the same thing as creating an Excel Pivot table.

Fun Fact: Excel Pivot Tables arrived in 1993 as part of Excel 5, but the concept was first developed at the Lotus Corporation in 1986 and released in 1991 as part of Lotus Improv for the NeXT platform.

To Pivot/CROSSTAB, one field is selected for the row values and another field is selected for the columns. The number of rows and columns corresponds to the count of unique values in each of those fields. Then, at the row/column intersections, you provide a numeric field for totaling. That is, if you have a file with 20 rows of contract awards covering 3 different contract types and 4 different vendors, your Pivot/CROSSTAB would show 3 rows of contract types and 4 columns of vendors, with the total per contract type rolled up by vendor and shown in the 12 cells.
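
In today’s tools this is nearly a one-liner. For instance, here is a pandas version of the 20-row example above, using toy stand-in data for illustration only:

```python
import pandas as pd

# Toy stand-in for the example: 20 awards, 3 contract types, 4 vendors.
awards = pd.DataFrame({
    "contract_type": ["A", "B", "C", "A", "B"] * 4,
    "vendor":        ["V1"] * 5 + ["V2"] * 5 + ["V3"] * 5 + ["V4"] * 5,
    "amount":        range(20),
})

# pivot_table plays the role of CROSSTAB / an Excel Pivot Table.
crosstab = pd.pivot_table(awards, index="contract_type", columns="vendor",
                          values="amount", aggfunc="sum", fill_value=0)
print(crosstab)  # 3 rows x 4 columns = 12 totaled cells
```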

But how could this be done in pre-CROSSTAB ACL? The underlying base logic was incorporated into different commands in ACL Desktop, so it was possible to cobble together a set of scripts that displayed data in a cross-tabulated format, allowing the selection of a row, column, and total field. If I remember correctly, I wrote the script to perform these steps:

  • Use SUMMARIZE to create a new file that lists each total for the ROW-COL combinations
  • Use SUMMARIZE to create a second file that only gives a list of all unique ROW values
  • Use SUMMARIZE to create a third file that only gives a list of all unique COL values
  • Write a loop that opened the ROW file, saved the ROW value in record 1 to a variable (v_RowName), went to the COL file, saved the COL value in record 1 to a variable, then went to the ROW-COL file, filtered on both values, and used TOTAL to save the dollar field to a variable, v_TotalValue (no hits result in a zero TOTAL). Repeat this as many times as the COL file has records, iterating by 1 to create additional variables and capture each v_ColName.
  • Create headers by writing all variables to a new file: v_RowName, v_ColName1, v_ColName2, … v_ColNameX. Create the first row of data by writing the variable values with NEXT 1 so that only one line is exported. ACL runs commands on all records, so the variable export needs to be limited to just the NEXT or FIRST 1.

Ok, maybe this isn’t the best description of a fully functioning CROSSTAB script from back in the day. I’ll go re-create it and update the above description when I get a chance. I’ll post my results in the comments below when I produce them.

I thought that this particular script, and some others I had created, were pretty neat. So I entered the 2002 ACL User Challenge. During the 2001 ACL User Challenge, I had been recognized for some work I had done with complex Workers’ Compensation data. While my new scripts for 2002 weren’t quite as fancy as that analysis, they were included in a compilation CD with other users’ projects:

Yes, I am one of those people who have many CDs, 3.5″ floppy disks, a few larger (actual) floppy disks, and even some ZipDisks in some boxes in my home office closet. Recently I went on an “archaeological” dig into the closet, nearly trapping myself in there. However, I emerged victorious with this relic of the past, “A compilation of innovative 2002 User Challenge entries.”

My entry read like a personal ad for some sort of data analytics dating site. 🙂

Computer specialist, Angela Watterworth, a member of the CAATS team at USPS OIG, enjoys automating processes and developing batches to reduce redundant, manual work. Consequently, she applied her “creative logic” to optimize ACL functionality.

Using ACL™ for Windows, Angela created three applications.
* Subtotal Reporting: to produce a Microsoft Excel spreadsheet as output with subtotals already in place. She wrote a batch to automatically create subtotals, and then export the results to a delimited file, which in turn, she imports to Excel.
* Category Extract: to automatically break a file into specific categories based on one key field. For example, auditors can select an input file definition such as “Contracts” from a drop-down list. Then, from another drop-down list, they can select a key field such as “City” to export all records for each city to a separate file.
* Unique Sampling: to automatically select samples that let you look at certain characteristics of the data. For example, you might want to review at least one record from each US state or each type of contract from your data.

My CROSSTAB script was a combination of the Category Extract and Subtotal Reporting, I believe. I love that ACL/Galvanize, even today, incorporates ideas and suggestions from their users to make their software more effective.

Conclusion – Unlimited Possibilities

These days, there are tons of pre-built libraries available for every sort of programming language or software application. The possibilities are unlimited. We had to be creative with just a few tools in the past, but today we can typically find the answer to our challenges by googling, watching YouTube, going old school and reading documentation, or even crowd-sourcing.

Help me solve this challenge. I’ll update this post to clarify anything that is unclear as people engage with it. According to Will Kroboth, creating these graph output files is a “small problem.” If you have a few minutes and find this sort of thing fun, suggest an answer!


[Image: a data scientist holding a business intelligence graph showing data analytics fraud detection increasing for both auditors and criminal investigators]

Taking Analytics from Audits to Investigations

Beyond CAATs

Auditors embraced data analytics back in the 1980s to transform the way information was used to test and verify business processes. My experience in data analytics over the past 20 years has included both audit tests and investigative questions. I started out on a “CAATs Team” that served both groups, even though the acronym “CAATs” or “CAATTs” means Computer Assisted Audit Techniques or Computer Assisted Audit Tools and Techniques. When it came to using analytics, the collaborative relationship between auditors and investigators wasn’t strongly established.  Occasionally an audit would test some controls and find some weird results. Auditors might send these over to a criminal investigator colleague to take a look.

How did this Auditor-Investigator data analytics collaboration originally develop? It started with the auditors. Before they adopted analytics, auditors worked manually: they would define the entire universe of what they wanted to audit, select a sample, and then review backup documents related to the sample. For example, a review of the Purchasing process in their organization would look something like this.

Audit Review of Purchasing

  • Define the Universe – A standard audit could include a universe of 5,000 Purchase Orders issued during the prior year.
  • Select a Statistically Valid Sample – Based on the errors discovered in prior audit work, perhaps they expected an error rate of about 5% and wanted a 95% confidence level with a 5% margin of error. They would then select 357 of the Purchase Orders to manually review (see the sketch after this list for the arithmetic), possibly comparing the purchase order total dollar value with the total dollars invoiced and paid across all bills received against that purchase order.
[Image: a file cabinet drawer of the sort auditors once sampled purchase orders, invoices, and payments from. Source: openclipart-vectors-30363 on pixabay]
  • Review Backup Documents – The auditors would need to track down original documentation for the 357 Purchase Orders. This was quite a manual process even if the auditor was lucky to have a centralized storage system located locally in their primary work office. And how would they decide how to physically select these 357 purchase orders, pull every 14th document in the filing cabinet? Hope that all of the purchase orders were physically available and filed in order and choose the ones numbered with a multiple of 14, documenting if any were missing? Carolyn G, my favorite statistician, with whom I worked early in my career, told a story of auditors taking a ruler with them and pounding carpenter’s nails into it an inch and a half apart. They would go to the file drawer they needed to audit, slide the nails into the files, and pull the documents that were identified this way.  You could say they nailed the control anomaly test sampling process. After finding the 357 purchase orders, they would hope that a diligent clerk had properly filed all related vendor invoices and copies of remitted payments with each purchase order. Then they could manually total up the dollars and compare.
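
If you are curious where 357 comes from, it matches the standard finite-population sample size formula with a conservative 50% proportion; here is a quick check:

```python
import math

# N = 5,000 POs, 95% confidence (z = 1.96), 5% margin of error, p = 0.5.
N, z, e, p = 5000, 1.96, 0.05, 0.5

n0 = (z**2 * p * (1 - p)) / e**2     # infinite-population size: about 384.16
n = n0 / (1 + (n0 - 1) / N)          # finite-population correction
print(math.ceil(n))                  # -> 357
```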

Transition to Electronic Records

Companies started to electronically store purchase orders, invoices, and payments. This changed the activity of auditing. No longer limited to a manual process, auditors were now able to incorporate computer techniques in their work. In the 1980s, I was just starting to learn how to code in BASIC on my Commodore 64. However, other pioneers in the data analytics world were busy using personal computers to make the job of auditors both easier and more thorough. The two pioneers I read the most when I was getting started in data analytics were David Coderre and Richard Lanza. Instead of physically, manually, and awkwardly pulling a sample, they applied Computer Assisted Audit Tools and Techniques to perform a computerized process.

Computer Assisted Audit Techniques Review of Purchasing

  • Define the Universe Electronically – Identify a spreadsheet or electronic data file that stores information about purchase orders. Find the other files that store the invoice and payment details.
  • Skip the Sample and Perform a 100% Review – Run a straightforward SUMMARIZE command to total up all invoices and payments per purchase order. JOIN the results of the Invoice Summary and Payments Summary to the list of Purchase Orders. Compare the TOTALS columns, perhaps adding a computed field that shows the difference between the authorized PO amount, the total invoiced, and the total paid. This results in a 100% review of the 5,000 Purchase Orders mentioned earlier. Additional analytics could identify UNMATCHED payments that don’t apply to any Purchase Order but were paid anyway. (A pandas sketch of this pattern follows the list.)
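
Here is the pandas sketch mentioned in the list above. The file and column names are hypothetical; the point is the SUMMARIZE/JOIN/UNMATCHED pattern, not any specific system:

```python
import pandas as pd

# Hypothetical extracts: purchase orders, invoice lines, and payments.
pos      = pd.read_csv("purchase_orders.csv")   # po_id, po_amount
invoices = pd.read_csv("invoices.csv")          # po_id, invoice_amount
payments = pd.read_csv("payments.csv")          # po_id, paid_amount

# SUMMARIZE: total invoices and payments per purchase order.
inv_tot = invoices.groupby("po_id")["invoice_amount"].sum().rename("invoiced")
pay_tot = payments.groupby("po_id")["paid_amount"].sum().rename("paid")

# JOIN the summaries to the full PO universe -- a 100% review, not a sample.
review = pos.set_index("po_id").join([inv_tot, pay_tot]).fillna(0)
review["paid_vs_po"] = review["paid"] - review["po_amount"]

# UNMATCHED: payments that reference no known purchase order.
orphans = payments[~payments["po_id"].isin(pos["po_id"])]
print(review[review["paid_vs_po"] > 0])
print(f"{len(orphans)} payments with no matching purchase order")
```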

Computerized reviews became easier and more thorough as auditors applied these computer-assisted analytical techniques. Today, most audit shops use computerized techniques from MS Excel to specialized software such as SAS, ACL/Galvanize/Diligent, Arbutus, IDEA, or Alteryx. Some are even increasing their sophistication and are starting to use Python or R to apply advanced statistical methods and machine learning to their data and detect trends that a traditional sample might not reveal.

[Image: a trench-coated investigator with a magnifying glass and pipe, representing the stealthy investigation of data. Image by iirliinnaa on Pixabay]

CAATs for Investigators?

Investigations teams were slower to adapt this audit-focused, computer-assisted technology to their own fraud detection efforts. Some auditors make an effort to share weird anomalies with their investigator colleagues. Many times, however, by the time the audit takes place and the findings are communicated, the data tends to be stale. A reasonable “quick audit” might last eight months from beginning to end: two months planning, four months of fieldwork, and two months compiling the audit report. In an ideal world, the audit team may define their scope to include recent data, perhaps from the year before the audit starts. This means that for an audit starting in January and focusing on the prior year’s data, the data is 8 to 20 months stale by the time the audit is complete. If any anomalies identified during the audit tests are immediately referred to investigations, they are still stale, as they come from the prior year as defined by the audit scope.

Different Goals

Investigators are most impactful when they can find potential fraud as it happens. They cannot depend on stale referrals from completed audits. With an investigative dashboard connected to live data, they can see anomalies as they occur, removing the latency in detecting potentially fraudulent activity.  Additionally, the queries don’t need to be based on audit techniques for testing internal controls; they can be tailored to seek out and identify suspicious individual behavior. Audits and Investigations have different goals:

[Image: chart contrasting Computer Assisted Audit Techniques applied to audits (controls testing) versus investigations (detecting employee fraud and kickback schemes)]

Investigative dashboards are designed with the detection of fraudulent activity in mind. Instead of only identifying a list of potential duplicate invoices, the totals can be rolled up by invoicing clerk to draw out exceptions based on employees. An auditor may set a materiality figure of $10,000 for an invoice of concern, but an investigator might be interested in many small-dollar amounts that roll up to the vendor or employee level. Connecting to live data and refining analytic questions with an investigative perspective ensures that investigators can identify emerging threats quickly.
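
As a small illustration of that shift in perspective, here is a hedged pandas sketch that rolls potential duplicate invoices up to the employee level instead of stopping at the invoice list (all file and field names are hypothetical):

```python
import pandas as pd

invoices = pd.read_csv("invoices.csv")  # assumed: clerk_id, vendor, amount, inv_date

# Classic audit test: flag invoices sharing the same vendor, amount, and date.
dupes = invoices[invoices.duplicated(
    subset=["vendor", "amount", "inv_date"], keep=False)]

# Investigative twist: roll the duplicates up by the clerk who entered them, so
# many small-dollar repeats surface even when no single invoice is material.
by_clerk = (dupes.groupby("clerk_id")["amount"]
                 .agg(dup_count="count", dup_total="sum")
                 .sort_values("dup_total", ascending=False))
print(by_clerk.head(10))
```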

Computer Assisted Techniques for Investigators

Today, investigators are expanding the ways that they use data analytics to change how they identify high-risk activity. An excellent example of creating and using dashboards comes from the United States Postal Service – Office of Inspector General (USPS-OIG). They build dashboards that are “letting investigators drill down for details where suspicious trends are shown.” This drill-down customization of an inquiry can happen during the individual user session of the dashboard. USPS-OIG has been empowering its investigators with data since the early 2000s. But if an investigator doesn’t have a dashboard already, how can they get started without asking their auditors for the results of completed audits?

Where Can Investigators Start?

Investigators should start with the information they know and are most comfortable with.  If you are an investigator and are familiar with issuing subpoenas to retrieve phone records, start there. If you commonly pursue healthcare fraud cases, identify the source of your referrals and look at the universe. If you have been tasked with sorting through all of the Paycheck Protection Program (PPP) Loans in your geographical area and can’t imagine how to thoroughly sort through it, that is your starting point. Does your organization worry about how employees might be abusing their Purchasing Credit Cards? Get that data!

[Image: investigative data areas for criminal investigators to choose from: purchase card fraud, subpoenaed phone records, health care provider billing fraud, and Paycheck Protection Program (PPP) loan fraud]

For smaller data sets, you can use the tools currently at your disposal, such as MS Excel, to profile the data. Does your organization have MS PowerBI as part of its Microsoft 365 implementation? Could you try out Tableau Public? Try creating a simple visualization showing total costs per person. Identify your data, capture it, display it, and make a plan to investigate further. Keep identifying and investigating weird anomalies, adjusting to find true instances of fraud. Then refer the common issues over to audit to test the controls and find the root cause of the problem, shutting down the fraud opportunity.

[Image: an investigator’s dashboard for the SBA Paycheck Protection Program (PPP), showing loan status, approval, forgiveness, and loan counts by state]

Keep Taking the Next Step

It has been a few decades since data analytics started being applied to audits and investigations. Techniques are changing and developing, much like my programming skills since the 1980s. It’s up to you to figure out where you are in the analytics process. Then take the next step to grow your understanding. Use the data you know to expand your insight beyond a manual review of records. Then do it again, take the next next step. Data Analytics and “Computer Assisted Assessment Techniques” are for both Auditors and Investigators!