Case Study – Washington, DC Purchase Card Transactions
The City of Washington, DC provides public data sets to encourage residents’ engagement and promote transparency. They have over a thousand different data sets in many areas, from construction permits to parking tickets. They also have a data set that covers the purchase card use for their 89 City Agencies that goes back more than ten years. For the case study, I’m limiting the data set to the available data since Mayor Muriel Bowser has been in office. It covers January 2015 to January 2022.
Accessing the data was easy, due to The City’s Open Data Catalog. The Purchase Card Transactions can be found here. The format provided the following data fields: Transaction ID, Agency, Transaction Date, Transaction Amount, Vendor Name, Vendor City, and Merchant Category Code Description. These seven fields can provide us with some initial insights and then allow us to pose future questions to be answered later.
Download and Import
Downloading the data produced a comma-delimited file *.csv with a header row and 468,035 records. It was fairly small, weighing in at just under 60 MB. With a *.csv file, you have the option to bring it into MSExcel easily. However, my analytics software of choice was Arbutus Analyzer for this project. It was a simple import, using the already existing header row values. However, besides seeing the records as you would in Excel and being able to graph and use pivot tables, Arbutus protects the data from accidentally being changed as you work with it, maintaining the integrity of the data file. ACL Desktop Analytics by Galvanize/Diligent has a similar platform, and both are excellent choices for automating this type of import and further analysis.
Data Preparation and Export File
I knew that I wanted to add a few additional computed fields to the data based on the seven existing fields. Here are three fields that I created:
- Upper MCC – This field standardizes the Merchant Category Code so that it is always displayed in uppercase. Because computers are very literal about how they evaluate field values, without this field, the values of “Hardware Stores” and “HARDWARE STORES” would be considered different.
- MCC Interesting – I created a flag for MCC codes of interest, those that I wanted to be sure to take a look at later. I was curious about the details related to The City’s spending on chiropractors, podiatrists, and other medical services that would typically fall under a health care plan instead of spending on a purchase card. I also wanted to learn more about Video Game Arcades and Pool Halls. Perhaps these were teambuilding activities, after-school activities, or maybe a restaurant visited during work travel. Other interesting categories included places to purchase vehicles, toupee shops, liquor, bail and bond, tax preparation, and child care. To define this list of “interesting codes,” a group of people familiar with the policies surrounding purchase cards can engage in a brainstorming session. They can propose and rule out different categories based on their knowledge.
- MCC Interesting Category – I noticed a theme among the different “MCC Interesting” codes and created an overall category that helped to show WHY they might be interesting. Categories included “Casino, Entertainment, Personal Supplies, Healthcare, Funeral Services, Vehicles, Funeral Services, Legal Issues, and Political / Religious.
- Hotel Indicator – I created an indicator for MCC Codes related to travel, such as hotels and airlines. This prepares the data to be analyzed for travel anomalies and also allows the exclusion of these transactions during a search for duplicate transactions. For example, if a group of employees attends a conference, it is typical that multiple transactions will occur for their hotel and airline charges on the same dates.
Once the additional fields were added, I limited the file to transactions since January 2015, then exported the resulting 274,243 records to a text file using the “|” pipe character to delimit the fields. This is my input file for Tableau.