With Python’s popularity exploding over the past few years, especially on the machine learning front, more and more Python developers have turned to type hints, which were added to the language in Python 3.5 (PEP 484). With type hints, we can better express data expectations within our code. Consider the following function definition:

from datetime import date

def deliveries_by_day(logistics, zips, day=date.today()):
	pass

Without additional context, how do we know what format the function expects for each of the parameters? What is to be returned from the function? Compare this to the same function definition with type hints:

from datetime import date

def deliveries_by_day(logistics: dict, zips: set, day: date = date.today()) -> float:
	pass

This gives the developer a lot more information without changing how the code runs (see the note below). Additionally, static type checkers such as mypy can help identify potential bugs in your code, often providing feedback directly within your IDE of choice:

deliveries_by_day("foo", "bar")
# Argument 1 to "deliveries_by_day" has incompatible type "str"; expected "Dict[Any, Any]"
# Argument 2 to "deliveries_by_day" has incompatible type "str"; expected "Set[Any]"

Note: type hints will not prevent code from executing in the Python runtime. The following code fails static type checking, but runs without errors:

t: list[str] = [1, 2, 3]
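To see that annotations are metadata rather than checks, here is a minimal sketch using the standard library's `typing.get_type_hints`. The `return 0.0` body is a hypothetical placeholder, since the function above is only a stub.

```python
from datetime import date
from typing import get_type_hints


def deliveries_by_day(logistics: dict, zips: set, day: date = date.today()) -> float:
    # Hypothetical placeholder body; the original function is only a stub.
    return 0.0


# The annotations are stored as metadata on the function object...
hints = get_type_hints(deliveries_by_day)
print(hints)

# ...but nothing enforces them: this call runs even though mypy rejects it.
result = deliveries_by_day("foo", "bar")
print(result)

# The same holds for annotated variables: the list of ints is assigned as-is.
t: list[str] = [1, 2, 3]
print(t)
```

mypy would flag both the call and the assignment, yet the script runs to completion.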

Generic Alias Type

Type hints can also express complex data structures such as nested mappings or objects. Consider the following data structure from a previous post:

{
  "12345": [
    {"day": "2022-01-01", "deliveries": 1.123},
    {"day": "2021-12-31", "deliveries": 3.14},
    {"day": "2021-12-30", "deliveries": 1.618},
    {"day": "2021-12-29", "deliveries": 6.022},
    ...,
  ],
  "23456": [...],
  ...
}

We could better represent this in Python with a `GenericAlias` object defined using standard library types:

from typing import Union

logistics = dict[str, list[dict[str, Union[str, float]]]]

This creates a type alias, `logistics`: a dict mapping strings to lists of dicts, each of which maps strings to either strings or floats. (`Union` lets us say “one of these types”; since Python 3.10 it can also be written as `X | Y`.)
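Because the alias is a real runtime object (a `types.GenericAlias`), its structure can be inspected with `typing.get_origin` and `typing.get_args`. As an illustration, the hypothetical helper `conforms` below walks the alias to check a sample of the data above. It handles only `dict`, `list`, `Union` and plain classes, so treat it as a sketch rather than a replacement for a type checker or a validation library.

```python
import types
from typing import Any, Union, get_args, get_origin

logistics = dict[str, list[dict[str, Union[str, float]]]]

# Union[...] reports typing.Union as its origin; on 3.10+ the `X | Y` form may
# report types.UnionType instead, so accept either.
UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def conforms(value: Any, tp: Any) -> bool:
    """Hypothetical helper: does `value` match the (simple) type `tp`?"""
    origin = get_origin(tp)
    if origin is None:
        # A plain class such as str or float.
        return isinstance(value, tp)
    args = get_args(tp)
    if origin in UNION_ORIGINS:
        # "One of these types": any member may match.
        return any(conforms(value, arg) for arg in args)
    if origin is dict:
        key_tp, value_tp = args
        return isinstance(value, dict) and all(
            conforms(k, key_tp) and conforms(v, value_tp) for k, v in value.items()
        )
    if origin is list:
        (item_tp,) = args
        return isinstance(value, list) and all(conforms(v, item_tp) for v in value)
    raise TypeError(f"unsupported type: {tp!r}")


sample = {
    "12345": [
        {"day": "2022-01-01", "deliveries": 1.123},
        {"day": "2021-12-31", "deliveries": 3.14},
    ],
}

print(type(logistics) is types.GenericAlias)
print(get_origin(logistics), conforms(sample, logistics))
print(conforms({"12345": [{"day": 20220101}]}, logistics))  # an int is neither str nor float
```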

That’s a bit messy, but we can nest alias types, too:

from typing import Union

logistics = dict[str, Union[str, float]]
data = dict[str, list[logistics]]
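Nesting aliases only improves readability: generic aliases compare equal when their origins and arguments match, so the composed alias is equal to the one-line version. A quick check (the name `flat` is just for illustration):

```python
from typing import Union

logistics = dict[str, Union[str, float]]
data = dict[str, list[logistics]]

# The one-line alias from the previous example, spelled out in full.
flat = dict[str, list[dict[str, Union[str, float]]]]

print(data == flat)
print(data)
```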

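Note: subscripting the built-in collection types, as in `dict[...]` and `list[...]`, is only valid since Python 3.9 (PEP 585). For 3.8 and below, use the generic types from the `typing` module instead, either the concrete `Dict` and `List` or abstract collection types such as `Mapping` and `Sequence`: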

# Python <3.9
from typing import Mapping, Sequence, Union

logistics = Mapping[str, Union[str, float]]
data = Mapping[str, Sequence[logistics]]
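Both pre-3.9 spellings still work in later versions, as deprecated aliases. The concrete `Dict` mirrors `dict` exactly, while `Mapping` and `Sequence` describe only read access, so they also accept tuples, `MappingProxyType` objects and other read-only containers. A small sketch; `first_day` is a hypothetical function used only for illustration:

```python
import collections.abc
from types import MappingProxyType
from typing import Dict, Mapping, Sequence, Union, get_origin

# Pre-3.9 spellings of the inner record type.
logistics_concrete = Dict[str, Union[str, float]]
logistics_abstract = Mapping[str, Union[str, float]]

# At runtime each alias still knows which class it stands for.
print(get_origin(logistics_concrete))  # the built-in dict
print(get_origin(logistics_abstract))  # collections.abc.Mapping


def first_day(records: Sequence[Mapping[str, Union[str, float]]]) -> Union[str, float]:
    """Hypothetical helper: return the "day" entry of the first record.

    It only reads from its argument, so the abstract types are the more
    permissive contract: any sequence of any mapping is acceptable.
    """
    return records[0]["day"]


print(first_day([{"day": "2022-01-01", "deliveries": 1.123}]))
print(first_day(({"day": "2021-12-31"},)))  # a tuple of dicts
print(first_day([MappingProxyType({"day": "2021-12-30"})]))  # a read-only mapping
```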

Returning to our function definition from above, we can now annotate its parameters more precisely:

from datetime import date
from typing import Union

logistics = dict[str, Union[str, float]]
data = dict[str, list[logistics]]

def deliveries_by_day(
    logistics: data,
    zips: set[str],
    day: date = date.today(),
) -> float:
    pass
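To show how these annotations guide an implementation, here is one hypothetical body for the function. The original never says what it computes, so this sketch simply assumes it totals the deliveries recorded on `day` for the given zip codes. One deliberate departure: the default is `None` rather than `date.today()`, because a default expression is evaluated only once, when the function is defined, so a long-running process would keep using the date it started on.

```python
from datetime import date
from typing import Optional, Union

logistics = dict[str, Union[str, float]]
data = dict[str, list[logistics]]


def deliveries_by_day(
    logistics: data,
    zips: set[str],
    day: Optional[date] = None,
) -> float:
    """Hypothetical semantics: total deliveries on `day` across `zips`."""
    # Resolve the default at call time, not at definition time.
    target = (day if day is not None else date.today()).isoformat()
    total = 0.0
    for zip_code in zips:
        for record in logistics.get(zip_code, []):
            if record["day"] == target:
                # Still assumes every record has a "deliveries" key; data that
                # breaks this assumption raises KeyError.
                total += float(record["deliveries"])
    return total


sample: data = {
    "12345": [
        {"day": "2022-01-01", "deliveries": 1.123},
        {"day": "2021-12-31", "deliveries": 3.14},
    ],
    "23456": [{"day": "2021-12-31", "deliveries": 1.618}],
}
print(deliveries_by_day(sample, {"12345", "23456"}, date(2021, 12, 31)))
```

The alias tells us how to walk the data (zip code, then list, then record), but it cannot tell us that a key may be missing, which is exactly the kind of surprise the next section deals with.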

Pydantic

Recall the situation from a previous post, where data was expected in one format but delivered in another. We knew the data was grouped (keyed) by zip code, but discovered that the keys had lost their leading zeroes: data expected under zip code 06101 was actually keyed to 6101. We also discovered that the daily deliveries element within each zip code group, which we had assumed was required, was sometimes missing.

print(data)
"""
{
  "12345": [...],
  ...,
  "6101": [  # 06101
    {"day": "2021-12-31", "deliveries": 3.14, ...},
    {"day": "2022-01-01", ...},  # no deliveries key
    ...
  ],
}
"""

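Before reaching for a library, it helps to see what checking these two assumptions by hand looks like. A minimal standard library sketch; the `find_problems` helper and the sample data are hypothetical, and the five-digit rule assumes US zip codes:

```python
# Hypothetical sample reproducing both problems described above.
data = {
    "12345": [{"day": "2021-12-31", "deliveries": 3.14}],
    "6101": [  # should have been "06101"
        {"day": "2021-12-31", "deliveries": 3.14},
        {"day": "2022-01-01"},  # no deliveries key
    ],
}


def find_problems(data: dict[str, list[dict[str, object]]]) -> list[str]:
    """Hypothetical helper: report keys and records that break our assumptions."""
    problems: list[str] = []
    for zip_code, records in data.items():
        # US zip codes have five digits; a shorter key suggests lost leading zeroes.
        if len(zip_code) != 5:
            problems.append(f"{zip_code!r}: expected 5 digits, maybe {zip_code.zfill(5)!r}")
        for i, record in enumerate(records):
            if "deliveries" not in record:
                problems.append(f"{zip_code!r}[{i}]: missing 'deliveries'")
    return problems


for problem in find_problems(data):
    print(problem)
```

Every new assumption means another hand-written check like these, and that bookkeeping is what a validation library can take over.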
This is an all-too-common experience with data sources, many of which will be outside of your control, e.g. consumed over an API or supplied by third-party providers. Pydantic is a library that uses Python type annotations to validate data, with syntax nearly identical to the standard library's dataclasses. This allows us to convert the generic alias for `logistics` above into a model:

from datetime import date
from pydantic import BaseModel, ValidationError
from typing import Optional

# logistics = dict[str, Union[str, float]]
class Logistics(BaseModel):
  day: date
  deliveries: Optional[float] = None

  class Config:
    extra = "allow"

data = {"day": "2021-12-31", "deliveries": 3.14}
logistics = Logistics(**data)
#> Logistics(day=datetime.date(2021, 12, 31), deliveries=3.14)
logistics.day.isoweekday()
#> 5

try:
  logistics = Logistics(deliveries="foo")
except ValidationError as e:
  print(e)
"""
2 validation errors for Logistics
day
  field required (type=value_error.missing)
deliveries
  value is not a valid float (type=type_error.float)
"""

Data elements are validated on instantiation: invalid data raises a `ValidationError`; `day` is parsed into a `date` object; `deliveries`, when present, is coerced to a float (and defaults to `None` when missing); and extra elements (all optional) are allowed without further validation.
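Since the model syntax mirrors a dataclass, it can help to see roughly what Pydantic is doing for us. The sketch below is a hypothetical standard library equivalent (`LogisticsRecord` and `from_dict` are illustrative names): the dataclass declares the same fields, but every conversion and check has to be written by hand, and unlike Pydantic it stops at the first error instead of collecting all of them.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Optional


@dataclass
class LogisticsRecord:
    """Hypothetical stdlib counterpart to the Pydantic `Logistics` model."""

    day: date
    deliveries: Optional[float] = None
    extra: dict[str, Any] = field(default_factory=dict)  # allowed, unvalidated

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "LogisticsRecord":
        if "day" not in raw:
            raise ValueError("day: field required")
        day = raw["day"]
        if not isinstance(day, date):
            day = date.fromisoformat(day)  # "2021-12-31" -> date(2021, 12, 31)
        deliveries = raw.get("deliveries")
        if deliveries is not None:
            deliveries = float(deliveries)  # float("foo") raises ValueError
        extra = {k: v for k, v in raw.items() if k not in ("day", "deliveries")}
        return cls(day=day, deliveries=deliveries, extra=extra)


# "route" is a hypothetical extra element.
record = LogisticsRecord.from_dict({"day": "2021-12-31", "deliveries": 3.14, "route": 7})
print(record, record.day.isoweekday())

try:
    LogisticsRecord.from_dict({"deliveries": "foo"})
except ValueError as e:
    print(e)  # only the first problem (the missing day) is reported
```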

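Pydantic can also parse a raw JSON string and validate its contents in one step with the `Json` type. Here `source` is the JSON text of the zip code data shown earlier, typed as a mapping of integer zip codes to lists of `Logistics` objects: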

from pydantic import BaseModel, Json

class SourceData(BaseModel):
    obj: Json[dict[int, list[Logistics]]]

source_data = SourceData(obj=source)
"""
SourceData(
  obj={
    12345: [...],
    ...,
    6101: [Logistics(day=datetime.date(2021, 12, 31), deliveries=3.14), ...],
    ...,
  }
)
"""

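Typing the keys as `int` appears to be how this model sidesteps the leading-zero problem: `"06101"` and `"6101"` both parse to the same integer, 6101. The same idea in plain Python, assuming `source` is a JSON string shaped like the sample data (the value below is hypothetical):

```python
import json

# Hypothetical JSON text standing in for `source`.
source = """
{
  "12345": [{"day": "2021-12-31", "deliveries": 3.14}],
  "6101": [{"day": "2022-01-01"}]
}
"""

raw = json.loads(source)  # JSON object keys always arrive as strings
by_zip = {int(key): records for key, records in raw.items()}

print(sorted(by_zip))               # integer keys
print(int("06101") == int("6101"))  # the leading zero no longer matters
print(f"{6101:05d}")                # format back to a five-digit zip when needed
```

One caveat with this sketch: if a source ever contained both `"06101"` and `"6101"`, the two groups would collide on the same integer key and one would silently replace the other.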
If you would like to look into testing your code, check out our next blog post, Code Testing in Python: Red, Green and Refactor.
