Code Testing in Python: Red, Green and Refactor

You've written some code dealing with logistics data. It has to be tested before it can be deployed into production.

by Nick Whitt

Last week, we went over some tips and tricks with code typing (which can be found here). In this post, we're going to go over some code testing processes with Python. Before we get started, here's a bit of context. Let’s assume our system operates over logistics data grouped by zip code and date. It could look something like the following:

{
  "12345": [
    {"day": "2022-01-01", "deliveries": 1.123},
    {"day": "2021-12-31", "deliveries": 3.14},
    {"day": "2021-12-30", "deliveries": 1.618},
    {"day": "2021-12-29", "deliveries": 6.022},
    ...,
  ],
  "23456": [...],
  ...
}

All has been working well for logistics on the West Coast, but processing data for the rest of the US has caused production issues. We have tracked a series of 500 errors to their source, but our logs don’t provide much detail. Here’s the relevant code snippet:

# daily/report.py
from datetime import date

def deliveries_by_day(logistics, zips, day=date.today()):
  deliveries = 0
  for zip_code in zips:
    for data in logistics[zip_code]:
      if data["day"] == day:
        deliveries += data["deliveries"]
        break

  return deliveries

...
# logistics = {"12345": [{...}, ...], ...}
# CT_HARTFORD = {"06101", "06102", ...}
report = deliveries_by_day(logistics, CT_HARTFORD)
...

On the surface, everything looks as expected: the zip codes for Hartford, CT, are defined as a set of strings, and today’s deliveries are summed across them. However, manual testing from the command line shows that Python is raising `KeyError: '06101'` on lookup. Examining the logistics data, we find that zip code keys with one or more leading zeros have had those zeros trimmed:

print(logistics)
"""
{
	...,
	"6101": [...],  # 06101
	"6102": [...],  # 06102
	"501": [...],   # 00501
	...
}
"""

Red - Green - Refactor

Now that we have identified the error, we can write a test that should pass but will fail with our current code.

# tests/test_deliveries_by_day.py
from daily.report import deliveries_by_day

logistics = {
  "9876": [
    {"day": "2022-01-01", "deliveries": 4.56},
    {"day": "2022-12-31", "deliveries": 3.21},
  ],
}

def test_zip_starts_with_0():
  assert deliveries_by_day(logistics, {"09876"}, "2022-01-01") == 4.56

This will fail (aka “red”) just as in production:

$ pytest
...
FAILED tests/test_deliveries_by_day.py::test_zip_starts_with_0 - KeyError: '09876'
1 failed in 0.03s

Now, we want to make minimal changes to our code so that the test passes (aka “green”). If we update the function to cast each zip code to an integer and then back to a string, the leading zeroes are “trimmed” from the zip code before the lookup:

def deliveries_by_day(logistics, zips, day=date.today()):
   deliveries = 0
   for zip_code in zips:
-    for data in logistics[zip_code]:
+    for data in logistics[str(int(zip_code))]:
       if data["day"] == day:
         deliveries += data["deliveries"]
         break

And, voila:

$ pytest -v
...
tests/test_deliveries_by_day.py::test_zip_starts_with_0 PASSED [100%]
1 passed in 0.01s
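The int round trip behind this fix can be checked in isolation. The helper name `normalize_zip` below is hypothetical (the fix above inlines the expression), and the sketch assumes every zip code is a purely numeric string; a value such as a ZIP+4 code with a hyphen would make `int()` raise `ValueError`.

```python
def normalize_zip(zip_code):
    """Drop leading zeros by round-tripping through int (assumes digits only)."""
    return str(int(zip_code))


assert normalize_zip("06101") == "6101"
assert normalize_zip("00501") == "501"
assert normalize_zip("12345") == "12345"  # no leading zero, unchanged

try:
    normalize_zip("06101-1234")  # not purely numeric
except ValueError:
    print("non-numeric zip codes need separate handling")
```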

Now, to make it better: refactor! With a green test, we can safely improve the implementation. Let’s get that `break` statement out of the inner for-loop and let the built-in `sum()` function do most of the work. We can build a filtered list comprehension and pass it to `sum()`, like so:

def deliveries_by_day(logistics, zips, day=date.today()):
  deliveries = 0
  for zip_code in zips:
    deliveries += sum(
      [
        data["deliveries"]
        for data in logistics[str(int(zip_code))]
        if data["day"] == day
      ]
    )

  return deliveries

After we verify that our test is still green, we can continue to refactor the function into a single nested comprehension:

def deliveries_by_day(logistics, zips, day=date.today()):
  return sum(
    [
      data["deliveries"]
      for zip_code, days in logistics.items()
      if zip_code in [str(int(x)) for x in zips]
      for data in days
      if data["day"] == day
    ]
  )

$ pytest -v
...
tests/test_deliveries_by_day.py::test_zip_starts_with_0 PASSED [100%]
1 passed in 0.01s
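Two things about this version are worth a look. It rebuilds the list of normalized zip codes for every key in `logistics`, and because it filters `logistics.items()` instead of indexing, a requested zip code with no data is now skipped silently rather than raising `KeyError`. The sketch below is one possible variant, not the article's code: it normalizes the zip codes once into a set and takes `day` as a required argument to stay self-contained.

```python
def deliveries_by_day(logistics, zips, day):
    # Normalize once, so membership checks are set lookups.
    wanted = {str(int(z)) for z in zips}
    return sum(
        data["deliveries"]
        for zip_code, days in logistics.items()
        if zip_code in wanted
        for data in days
        if data["day"] == day
    )


logistics = {
    "9876": [
        {"day": "2022-01-01", "deliveries": 4.56},
        {"day": "2021-12-31", "deliveries": 3.21},
    ],
}

assert deliveries_by_day(logistics, {"09876"}, "2022-01-01") == 4.56
# A zip code with no data contributes nothing instead of raising KeyError.
assert deliveries_by_day(logistics, {"09876", "06101"}, "2022-01-01") == 4.56
```

Whether silently skipping unknown zip codes is acceptable is a design decision; a dedicated test would pin the chosen behavior down.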

Unit Tests

Our code is fixed, validated with a test, and ready to commit! But while writing our test, we accidentally discovered another potential issue: the code fails if a day's record in the logistics dictionary has no “deliveries” key. After a short discussion, the team decides to resolve this issue at the same time. Research into the logistics data shows that the “day” key is always present, but all others appear to be optional. It would be simple to just fix the code as part of our refactor, but this specific edge case deserves its own dedicated (aka “unit”) test.

# tests/test_deliveries_by_day.py
from daily.report import deliveries_by_day

logistics = {
  ...
  "12345": [{"day": "2022-01-01", "foo": "bar"}],
}

...

def test_deliveries_key_not_present():
  assert deliveries_by_day(logistics, {"12345"}, "2022-01-01") == 0

The test fails as expected, so it’s time to make minimal changes so that the test passes:

def deliveries_by_day(logistics, zips, day=date.today()):
   return sum(
     [
-      data["deliveries"]
+      data.get("deliveries", 0)
       for zip_code, days in logistics.items()
       if zip_code in [str(int(x)) for x in zips]
       for data in days
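`dict.get` returns the supplied default only when the key is absent. A short sketch with made-up records shows the difference from indexing, along with one case the change does not cover: a key that is present but set to `None`.

```python
with_key = {"day": "2022-01-01", "deliveries": 4.56}
without_key = {"day": "2022-01-01", "foo": "bar"}
null_value = {"day": "2022-01-01", "deliveries": None}

assert with_key.get("deliveries", 0) == 4.56
assert without_key.get("deliveries", 0) == 0  # indexing would raise KeyError
assert null_value.get("deliveries", 0) is None  # default is not used here

# One way to treat None like a missing value, if the data can contain it.
assert (null_value.get("deliveries") or 0) == 0
```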

Now, we have both of our dedicated (unit) tests passing:

$ pytest -v
...
tests/test_deliveries_by_day.py::test_zip_starts_with_0 PASSED [ 50%]
tests/test_deliveries_by_day.py::test_deliveries_key_not_present PASSED [100%]
2 passed in 0.01s
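
A natural next candidate for the same red-green cycle is the `day` default. In Python, a default value such as `date.today()` is evaluated once, when the function is defined, so a long-running process keeps using its start-up date; and a `date` object never compares equal to the ISO strings stored under "day". The sketch below illustrates both points with hypothetical helpers (`stamp`, `resolve_day`) that are not part of the code above.

```python
from datetime import date


def stamp(day=date.today()):
    # The default was computed once, when this `def` statement ran.
    return day


def resolve_day(day=None):
    # Compute the default at call time, as an ISO string like the data uses.
    return date.today().isoformat() if day is None else day


assert stamp() is stamp()  # same object on every call
assert date(2022, 1, 1) != "2022-01-01"  # date vs. str never match
assert date(2022, 1, 1).isoformat() == "2022-01-01"
assert resolve_day("2022-01-01") == "2022-01-01"
```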
