Summary
Hammerton is a high-end decorative lighting manufacturer based in Salt Lake City, Utah. As a Financial/Data Analyst at Hammerton, I have done ad-hoc reporting and analysis across a diverse set of projects aimed at analyzing and improving business processes.
These projects have included, but are not limited to:
- Repricing over 30,000 SKUs of the company's main product line under time pressure. This involved calculating product costs (the overhead to produce a product plus direct material costs) and gross margins, and acting as a project manager, coordinating with multiple departments and executive management to decide what we would sell and at what price (there's a sketch of the margin math after this list).
- Building dashboards for Quality to analyze service order data.
- Updating database fields to simplify the chart of accounts for Accounting.
- Creating a randomly sampled list of inventory items to audit inventory counts.
- Reconciling discrepancies between purchase orders (what we ordered) and accounting (what we think we paid for) from legacy systems.
- Delivering ad-hoc reports to Sales teams to measure changes in sales across different product lines and retailers.
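To give a flavor of the repricing work, the core margin math looked something like this; the SKUs, numbers, and column names here are made up for illustration:

```python
import pandas as pd

# Hypothetical SKU data: cost = overhead to produce + direct materials
skus = pd.DataFrame({
    "sku": ["CH-100", "SC-200"],
    "overhead": [120.00, 45.00],
    "materials": [310.00, 80.00],
    "list_price": [980.00, 295.00],
})
skus["cost"] = skus["overhead"] + skus["materials"]
skus["gross_margin"] = (skus["list_price"] - skus["cost"]) / skus["list_price"]
print(skus[["sku", "cost", "gross_margin"]])
```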
I have used a combination of SQL, Python pandas, Excel, and Power BI as my primary tools during my time here, and I have especially enjoyed learning and using pandas.
One problem I needed to solve was how to handle changes to my projects as I was executing them. For example, on my first project I did everything in Excel and SQL: I used queries to transform data in Excel and produced some analysis. That worked, but if requirements changed for a step near the beginning of the workflow, I had to redo everything by hand. That is what led me to pandas: you define the steps once and then re-run the entire process with a single command. Building a pipeline this way doesn't take much longer than working in Excel, and it saves hours of rework. The next time you need to do the same project, the Python script is already there, ready to run. Every step is documented in Markdown within the Jupyter notebook, so the documentation and the workflow live in a single document. It's an excellent solution for quickly rerunning common data tasks.
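Here's a minimal sketch of what I mean; the file and column names are made up:

```python
import pandas as pd

# Each step is a function, so a requirements change upstream means editing
# one step and re-running the notebook, not redoing the work by hand.
def load(path):
    return pd.read_excel(path)

def clean(df):
    df = df.rename(columns=str.lower)   # normalize column names
    return df.dropna(subset=["sku"])    # drop rows missing a SKU

def summarize(df):
    return df.groupby("product_line", as_index=False)["sales"].sum()

# Re-running the whole pipeline is a single cell execution
result = summarize(clean(load("sales_q1.xlsx")))
result.to_excel("sales_summary.xlsx", index=False)
```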
I'll highlight one project here where I was able to have a positive impact on the business.
The problem
Management wanted to close Q1 2023 on 4/14/23, two weeks after quarter end. To hit that date with limited resources, Accounting needed my help to clean and process expense data and prepare it for ingest into the accounting software. That meant compressing work that used to take one person three months into five days, across three projects in quick succession:
- Move inventory that had been received out of the China materials asset account
- Reconcile Amex credit card transactions with expense reports
- Organize and import expenses from the Mexico factory
For each project, I wrote a Python script to pull data, clean it, and export it in a useful format for reporting, analysis, and ingest into other software.
Process
First, I outlined the steps the project would require and did the necessary imports.
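The exact imports varied, but a representative setup looked something like this:

```python
import glob           # find the Excel workbooks for the period
import pandas as pd   # read, clean, and combine the data
```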
The first step was to pull the list of Excel sheets for the relevant time period. There were exceptions, so I needed to handle the normal workbooks and the building lease invoices separately. To make sure I wasn't missing data, I wrote a for loop to print out all the sheet names; some workbooks include a "Taxes" tab, which would have been bad to miss.
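Roughly, with hypothetical paths, that check looked like:

```python
# Lease invoices have a different layout, so they get handled in a separate pass
files = sorted(glob.glob("expenses/2023-q1/*.xlsx"))
lease_files = [f for f in files if "lease" in f.lower()]
expense_files = [f for f in files if f not in lease_files]

# Print every sheet name so nothing (like a stray "Taxes" tab) gets missed
for f in expense_files:
    print(f, pd.ExcelFile(f).sheet_names)
```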

The next step was to iteratively read the files into data frames and clean the data: I skipped the first 15 rows, selected columns 1-6, dropped rows with empty values, filtered the data, labeled each row with the file it came from, formatted dates and dollar amounts, and combined it all into a single dataframe. I did this three times, once each for rent, taxes, and other expenses.
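The column names and exact cleaning rules are simplified here, but each pass followed the same pattern:

```python
def read_category(files, sheet):
    """Read one expense category from each workbook, applying the same cleaning."""
    frames = []
    for f in files:
        df = pd.read_excel(
            f,
            sheet_name=sheet,
            skiprows=15,           # ignore the first 15 rows
            usecols=range(1, 7),   # select columns 1-6 (0-indexed positions)
        )
        df = df.dropna(how="all")              # drop empty rows
        df["source_file"] = f                  # label each row with its file
        df["date"] = pd.to_datetime(df["date"]).dt.strftime("%m/%d/%Y")
        df["amount"] = df["amount"].round(2)   # dollars to two decimal places
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```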
Finally, I combined these three data frames into one.
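With the helper above, that's one more call per category and a concat; the sheet names here are placeholders:

```python
rent = read_category(expense_files, sheet="Rent")
taxes = read_category(expense_files, sheet="Taxes")
other = read_category(expense_files, sheet="Other")
expenses = pd.concat([rent, taxes, other], ignore_index=True)
```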
I won't drag on with the details of the rest of the project. From here, the process was to pull purchase order data from a database, merge the data on purchase order numbers, show where things did and did not match up, identify which data needed to be imported, and filter that data to create an import file the accounting system could use to make a journal entry.
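The merge step, with hypothetical column names, looked roughly like this:

```python
# purchase_orders would come from the database, e.g. via pd.read_sql
merged = expenses.merge(
    purchase_orders,
    on="po_number",
    how="outer",
    indicator=True,   # adds a _merge column: both / left_only / right_only
)
matched = merged[merged["_merge"] == "both"]
unmatched = merged[merged["_merge"] != "both"]   # discrepancies to investigate

# Matched rows become the journal-entry import for the accounting system
matched[["po_number", "date", "amount"]].to_csv("journal_import.csv", index=False)
```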
Using sets to discover process breakdowns
Part of this project was considering all the exceptions that could happen. Every expense needed an invoice, a purchase order number on the invoice, a matching purchase order in the database, and to be imported into the accounting system. Using a little set theory, a small Python program, and Excel, I identified 10 logical possibilities where the process could break down, and worked with the team to find and fix them. For example, an expense could have an invoice with no purchase order number on it while a matching purchase order still existed in the database; if we imported it anyway, we would double-count that expense. We used this list to proactively uncover issues and prevent mistakes.
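Here's a toy version of the idea with made-up expense IDs:

```python
# Each stage of the process is a set of expense IDs; set algebra then
# surfaces the combinations where the process broke down
has_invoice   = {"E1", "E2", "E3", "E4"}
has_po_number = {"E1", "E2", "E3"}   # PO number present on the invoice
in_database   = {"E1", "E2", "E4"}   # matching purchase order in the database
imported      = {"E1"}               # already in the accounting system

# Invoice lacks a PO number but the PO is in the database: importing it
# blindly would double-count the expense
double_count_risk = (has_invoice - has_po_number) & in_database
print(double_count_risk)  # {'E4'}

# PO number on the invoice but no matching purchase order in the database
missing_po = has_po_number - in_database
print(missing_po)  # {'E3'}
```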
Tools
I am a heavy user of ChatGPT and Bard to rapidly iterate and learn. Without them, the pace of learning I have achieved here without mentorship would have been impossible. I have gone from knowing very little Python to knowing a decent range of operations in Python and in libraries like pandas and openpyxl.
Future
Now that I have learned the basics of manipulating data frames and pulling data from multiple sources like databases and CSVs, I am interested in getting into deeper statistical analysis, data visualization, and delivering insights. Most of what I have done at this company has been data wrangling, which has been fun because it was so needed here. And I think I did it well.