Cache Store: Skip The Wait

When application speed is a concern, the biggest gains come from reducing I/O calls to files and databases and from cutting computation time by reusing previously calculated results.

Data Storage Without the Storage

A cache store is used primarily to hold short-lived data for high-speed retrieval. Most often, this is accomplished in memory (RAM) as opposed to on-disk file stores. A cache layer can sit between an application and a durable data layer: prior to performing some costly database retrieval, computation, or API request, the application can query the cache for the result. If the cache already has the value (known as a cache hit), there is no need to perform the costly operation, and the response can be served directly from the cache. If the cache does not yet have the data (a cache miss), the expensive operation is performed and its result is stored in the cache as well as returned. As storage is much more limited in a cache store, the oldest or least recently used data will typically be purged to make room for new data.
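
To make the hit/miss flow concrete, here is a minimal sketch of a toy in-memory cache. The `TinyCache` class, its `get_or_compute` method, and the `expensive_square` function are hypothetical names invented for illustration, and the eviction policy (drop the least recently used entry once a size limit is reached) is an assumption about how a small store might make room for new data.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache (hypothetical): evicts the least recently used entry when full."""

    def __init__(self, max_items=2):
        self.max_items = max_items
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        if key in self._data:
            # Cache hit: skip the costly operation and refresh the key's recency.
            self.hits += 1
            self._data.move_to_end(key)
            return self._data[key]
        # Cache miss: perform the costly operation and store its result.
        self.misses += 1
        value = compute()
        self._data[key] = value
        if len(self._data) > self.max_items:
            # Over capacity: purge the least recently used entry.
            self._data.popitem(last=False)
        return value


def expensive_square(n):
    # Stand-in for a slow database query, calculation, or API request.
    return n * n


cache = TinyCache(max_items=2)
first = cache.get_or_compute(4, lambda: expensive_square(4))   # miss, computed
second = cache.get_or_compute(4, lambda: expensive_square(4))  # hit, reused
print(first, second, cache.hits, cache.misses)
```

The second call never reaches `expensive_square`; that skipped work is the entire benefit of the cache.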


Problem: Planetary Distance


Let’s start out with an astronomy refresher: our Solar System’s four inner planets, in order from the Sun, are Mercury, Venus, Earth, and Mars. The distance of each of these planets from the Sun varies due to their elliptical orbits, but the order never changes as their orbital paths do not cross. Their order from the Earth, however, varies widely as their paths bring them closer to and farther from each other. This means that, for any given point in time, we can calculate which celestial body is closest to the Earth: Mercury, Venus, Mars, or the Sun.

Note: the four outer planets (Jupiter, Saturn, Uranus, and Neptune), though also varying in distance due to orbital paths, will never alter in order from Earth or the Sun due to their large distances from either.

We’re going to write a back-end Python component that will empower an imaginary application to display the order of the inner planets and the Sun from the Earth. Our component’s sole responsibility is to calculate, for a given time, the distance of Mercury, Venus, and Mars to both the Earth and the Sun, along with the distance from the Earth to the Sun. To do this, we will make use of pre-calculated tables published by NASA’s Jet Propulsion Laboratory (JPL) giving planetary positions from 1900 through 2050. We’ll use Skyfield to handle additional calculations. For simplicity, we’ll generalize the solution to always use the current timestamp for calculations.

import json
from skyfield.api import load

t = load.timescale().now()
jpl = load("de421.bsp")

def distance_of(source, dest):
    return source.at(t).observe(dest).apparent().distance()

response = [{"earth": {"sun": distance_of(jpl["earth"], jpl["sun"]).km}}]
for p in ("mercury", "venus", "mars"):
    response.append(
        {
            p: {
                "earth": distance_of(jpl["earth"], jpl[p]).km,
                "sun": distance_of(jpl[p], jpl["sun"]).km,
            }
        }
    )

print(json.dumps(response))

The script first loads the JPL file, downloading it locally if not already available. Then, using the position of the Earth or a planet at a given point in time, we calculate the apparent distance (accounting for light-travel time, aberration of light, and gravitational deflection) to the target body. We generate a JSON list of objects, where each object holds a body’s distance in kilometers from the Earth and/or the Sun. An example response follows.

[
  {"earth": {"sun": 148966428.22571564}},
  {"mercury": {"earth": 187142995.568854, "sun": 53301981.50119822}},
  {"venus": {"earth": 256886561.059272, "sun": 108004745.43671818}},
  {"mars": {"earth": 100954164.32083812, "sun": 220127226.7952331}}
]

We can imagine a front-end component using these values to show the following order out from Earth:

Mars (100.954 Mil km) -> Sun (148.966 Mil km) -> Mercury (187.143 Mil km) -> Venus (256.887 Mil km)
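
As a rough sketch of what that front-end step could look like, the snippet below sorts the example response by distance from Earth and formats it. The `order_from_earth` function is a hypothetical helper, not part of the component described here; it assumes the response shape shown above, where the Sun’s distance lives under the "earth" entry.

```python
# Example response copied from the output above.
response = [
    {"earth": {"sun": 148966428.22571564}},
    {"mercury": {"earth": 187142995.568854, "sun": 53301981.50119822}},
    {"venus": {"earth": 256886561.059272, "sun": 108004745.43671818}},
    {"mars": {"earth": 100954164.32083812, "sun": 220127226.7952331}},
]


def order_from_earth(response):
    """Hypothetical helper: rank the Sun and inner planets by distance from Earth."""
    distances = {}
    for entry in response:
        for body, values in entry.items():
            if body == "earth":
                # The Earth entry carries the Earth-to-Sun distance.
                distances["sun"] = values["sun"]
            else:
                distances[body] = values["earth"]
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return " -> ".join(
        f"{name.capitalize()} ({km / 1e6:.3f} Mil km)" for name, km in ranked
    )


print(order_from_earth(response))
```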

Planets move fast: even microsecond updates to the timestamp t produce different distances in our calculations. Space, however, is big: the distance a planet travels in a second is tiny compared to hundreds of millions of kilometers. As such, let’s reduce our timestamp’s precision from 1 microsecond to 1 minute. With this change, we can take advantage of a cache store: reusing the values from the previous calculation if they have yet to expire.

from datetime import datetime, timezone

t = load.timescale().from_datetime(
    datetime.now(timezone.utc).replace(second=30, microsecond=0)
)

Cache stores are often described as key-value stores: a single key refers to an entire cached value. Effective keys include enough information to be unique within the lifespan of the data value. As the timestamp is what defines a unique calculation, we can use it as the cache key for the JSON response value.
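
To see why a reduced-precision timestamp makes a workable key, here is a small standard-library sketch. The `minute_key` function is a hypothetical helper, and the string format is an assumption chosen to resemble an ISO 8601 UTC timestamp. Any two moments within the same minute collapse to the same key, while the next minute yields a new one.

```python
from datetime import datetime, timezone


def minute_key(moment):
    """Hypothetical helper: collapse a timestamp to its minute and format it as a key."""
    bucket = moment.astimezone(timezone.utc).replace(second=30, microsecond=0)
    return bucket.strftime("%Y-%m-%dT%H:%M:%SZ")


a = datetime(2024, 1, 1, 12, 0, 1, 250000, tzinfo=timezone.utc)
b = datetime(2024, 1, 1, 12, 0, 59, 999999, tzinfo=timezone.utc)
c = datetime(2024, 1, 1, 12, 1, 0, tzinfo=timezone.utc)

print(minute_key(a))  # same minute as b, so the same key
print(minute_key(b))
print(minute_key(c))  # next minute, so a different key
```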

Memcached Solution

Cache stores vary widely, but one popular system is memcached. Setup and configuration for production environments are outside our scope, but local development is easy to start with a Docker image. Add the following service configuration to a local docker-compose.yml file.

cache:
  image: memcached
  ports:
    - 11211:11211

You can start the docker services with docker compose up -d. Modify the script as follows to use the pymemcache client.

import json
from datetime import datetime, timezone
from pymemcache.client.base import Client
from skyfield.api import load

cache = Client("localhost")

t = load.timescale().from_datetime(
    datetime.now(timezone.utc).replace(second=30, microsecond=0)
)

def distance_of(source, dest):
    return source.at(t).observe(dest).apparent().distance()

response = cache.get(t.utc_iso())
if response is None:
    jpl = load("de421.bsp")

    response = [{"earth": {"sun": distance_of(jpl["earth"], jpl["sun"]).km}}]
    for p in ("mercury", "venus", "mars"):
        response.append(
            {
                p: {
                    "earth": distance_of(jpl["earth"], jpl[p]).km,
                    "sun": distance_of(jpl[p], jpl["sun"]).km,
                }
            }
        )

    cache.set(t.utc_iso(), response, expire=60)

print(t.utc_iso(), json.dumps(response))

In the case of a cache miss, the script will continue as before and calculate the distances, which will then be stored in the cache. Note: an explicit expiration time of 60 seconds is provided for this cache.set() call; memcached will automatically purge this key/value pair from the store once the expiration time has elapsed. If a cache hit occurs, there is no need to perform the calculation again, and instead we skip right to the response. However, this code will raise the following error on a cache hit:

TypeError: Object of type bytes is not JSON serializable

If you look closely at the cache.set() call, you’ll see that response has the type list[dict]. Unfortunately, memcached can store keys and values only as strings; without a serializer, pymemcache tries to help by converting response to its string form and encoding it as a byte string. This is what causes the error: cache.get() returns that byte string, and json.dumps() cannot serialize bytes. Instead, we can pickle the response object by configuring the pymemcache client with a serde (serializer/deserializer) object so that the cached value comes back with its original type.

from pymemcache import serde

cache = Client("localhost", serde=serde.pickle_serde)
...
response: list[dict] = cache.get(t.utc_iso())
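
Pickle is not the only option. Since our response is plain JSON data, a JSON-based serde may also be worth considering, for example when non-Python clients need to read the same cache. The sketch below is an alternative to the approach above, not a replacement for it: the `JsonSerde` class and its flag value are hypothetical, and it assumes the serde interface pymemcache documents, where serialize(key, value) returns a (value, flags) pair and deserialize(key, value, flags) reverses it.

```python
import json

JSON_FLAG = 1  # hypothetical flag value marking entries stored as JSON


class JsonSerde:
    """Hypothetical serde: stores values as UTF-8 encoded JSON."""

    def serialize(self, key, value):
        # Return the bytes to store plus a flag describing the format.
        return json.dumps(value).encode("utf-8"), JSON_FLAG

    def deserialize(self, key, value, flags):
        if flags == JSON_FLAG:
            return json.loads(value.decode("utf-8"))
        # Unknown format: hand back the raw bytes untouched.
        return value


# Round trip without a running memcached server.
serde = JsonSerde()
payload = [{"earth": {"sun": 148966428.22571564}}]
stored, flags = serde.serialize("example-key", payload)
restored = serde.deserialize("example-key", stored, flags)
print(type(stored).__name__, restored == payload)

# Assumed wiring into the client: Client("localhost", serde=JsonSerde())
```

Unlike pickle, JSON cannot represent arbitrary Python objects, so this trade-off only makes sense for values that are already JSON-friendly.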

Cache Hit

While primarily used for session storage, cache stores can also improve application response times. Downsides such as non-persistent storage are easy to overcome for use cases with reproducible operations, and in-memory cache stores are often orders of magnitude faster than their disk-backed alternatives. Consider a cache-based solution for your next resource-heavy feature set!
