Data Testing

Data testing: an important step in writing clean and successful code.

by Nick Whitt

Programming frameworks typically come with embedded testing functionality that makes it easy to test multiple facets of an application. These frameworks take most of the setup work out of the process, but it’s not difficult to add a robust test suite to your application even if you’re not using a dedicated framework.

Test Strategy

Often referred to as Unit Testing, the most prevalent testing strategy involves validating the expectations of individual functions of a given module. Assume a simple function that takes two integers, x and y, and returns the difference between them, as x - y. A good unit test would start with known values of x and y and ensure the function’s result, z, is equal to x - y. Regardless of the current or future implementations of this function, so long as this test passes, we can be confident of its validity.
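As a minimal sketch of the example above, the function and its test might look like the following. The names difference and checkEqual are hypothetical, and the hand-rolled check only stands in for the assertion a test framework would provide.

```javascript
// Hypothetical function under test: returns x - y.
function difference(x, y) {
  return x - y;
}

// Minimal stand-in for a framework assertion: throws when values differ.
function checkEqual(actual, expected) {
  if (actual !== expected) {
    throw new Error(`expected ${expected}, received ${actual}`);
  }
}

// Start from known inputs and compare against the known result.
checkEqual(difference(7, 3), 4);
checkEqual(difference(3, 7), -4);
checkEqual(difference(0, 0), 0);
```

The test only states inputs and the expected output, so the body of difference can be rewritten freely as long as these checks keep passing.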

As the application expands, logic will become dependent on multiple components interacting together. A testing strategy involving more than a single component is referred to as Integration Testing. As an example, consider a router which, when given a web request, will properly decompose the request into a method and action, as well as dependent attributes. Given a list of valid application methods, e.g. user, role, or admin, a request composed as user/x, where x is an integer, will generate a response of User.list(x).
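The router example could be sketched roughly as follows. Everything here is hypothetical: the shape of the User module, the route function, and the handlers table are assumptions made for illustration, and only the user method is wired up.

```javascript
// Hypothetical component 1: a User module with a list action.
const User = {
  list(id) {
    return { method: 'user', action: 'list', id };
  },
};

// Hypothetical component 2: a router that decomposes a request path
// into a method and an integer attribute, then dispatches to a handler.
const handlers = new Map([['user', (id) => User.list(id)]]);

function route(path) {
  const [method, rawId] = path.split('/');
  const id = Number.parseInt(rawId, 10);
  if (!handlers.has(method) || Number.isNaN(id)) {
    throw new Error(`no route for "${path}"`);
  }
  return handlers.get(method)(id);
}

// Integration check: the router and the User module are exercised together.
const response = route('user/42');
if (response.action !== 'list' || response.id !== 42) {
  throw new Error('router did not dispatch to User.list(42)');
}
```

Unlike the unit test, a failure here could come from either component, or from the way the two are connected.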

Obviously, integration tests can become much more complex than unit tests. Often they require the system to be in a specific state before a test can even begin: in the example above, x must be a valid User, whatever that means to the application. Additionally, interactions between components often involve state manipulations that could be difficult to test, or even have undesired effects. Consider testing a User’s permissions before and after a Role update, where the update itself requires approval interactions between two User accounts: Employee and Manager.

Mocks

To help reduce complexity, test strategies involving fake components, or Mocks, can be employed. A mock is a testing component that replaces an application component, either in part or in full. This allows the test to bypass complex or time-consuming logic that is unnecessary for the value under test. In the previous example, we don’t care to test manager approvals, so we can mock out that entire process so that, for example, User.promote() in our test just returns true and continues without any additional logic paths.
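A rough sketch of that idea, with a hypothetical User class whose promote() method stands in for the approval workflow. The class, its methods, and the role names are all assumptions for illustration.

```javascript
// Hypothetical application class: promote() would normally wait on a
// manager's approval, which is unavailable inside a test.
class User {
  constructor(role) {
    this.role = role;
  }

  promote() {
    throw new Error('approval workflow is not available in tests');
  }

  // Logic under test: the role only changes once promotion is approved.
  requestPromotion(newRole) {
    if (this.promote()) {
      this.role = newRole;
    }
    return this.role;
  }

  canApprove() {
    return this.role === 'manager';
  }
}

// Mock: replace only promote() on this instance so it returns true.
const user = new User('employee');
user.promote = () => true;
user.requestPromotion('manager');
```

Only the approval step is replaced, so requestPromotion() and canApprove() still run their real logic. With Jest, jest.spyOn(user, 'promote').mockReturnValue(true) performs a similar replacement and can be undone after the test.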

Mocks aren’t just for logic. Application state itself is commonly stored in a database, which must be properly established and configured so that the application can access it. All of this can be simplified using a mock database connection, e.g. through an in-memory SQLite interface. This can be especially effective when paired with a Continuous Integration service, e.g. GitHub Actions, GitLab CI/CD, or CircleCI.

Database Tests

Consider an example app introduced in a previous article that provides baseball statistics. The data models are defined using Sequelize ORM, and our local development and deployed environments are configured to use a PostgreSQL database through Docker Compose or Kubernetes pods respectively. With local and CI testing, however, we don’t want to be bothered by long-running setup scripts or potential environment issues generating noisy errors. Instead, while testing, we’ll utilize a mock database connection to an in-memory SQLite instance.

The easiest way to generate a mock is to provide some way to override its internals. This is similar to extending a parent’s method in a child object, or providing a callback function to utilize within a wrapper. In the case of Sequelize, we can use a configuration object keyed by the environment when generating the connection:

// models/index.js
import { Sequelize } from 'sequelize';
import BatterModel from './batter';

const config = {
  development: { dialect: 'sqlite', storage: '.sqlite3.db' },
  test: 'sqlite::memory:',
};

const sequelize = new Sequelize(config[process.env.NODE_ENV ?? 'development']);

export const Batter = BatterModel(sequelize);
export default sequelize;

For our model, we’ll define a Batter such that statistics can be calculated (as methods) from observed performance (as attributes):

// models/batter.js
import { Model, DataTypes } from 'sequelize';

class Batter extends Model {
  /**
   * Ratio of hits to total at-bats
   */
  battingAverage() {
    const avg = this.hits / this.atBats;
    // Zero at-bats yields Infinity or NaN; report 0 in either case
    return (Number.isFinite(avg) ? avg : 0).toFixed(3);
  }

  /**
   * Number of bases gained by hits
   */
  totalBases() {
    return this.hits + this.doubles + this.tripples * 2 + this.homeRuns * 3;
  }

  /**
   * Number of bases recorded per at-bat
   */
  sluggingPercentage() {
    const slg = this.totalBases() / this.atBats;
    // Zero at-bats yields Infinity or NaN; report 0 in either case
    return (Number.isFinite(slg) ? slg : 0).toFixed(3);
  }
}

export default function (sequelize) {
  return Batter.init(
    {
      name: DataTypes.STRING,
      atBats: DataTypes.TINYINT,
      runs: DataTypes.TINYINT,
      hits: DataTypes.TINYINT,
      doubles: DataTypes.TINYINT,
      tripples: DataTypes.TINYINT,
      homeRuns: DataTypes.TINYINT,
      runsBattedIn: DataTypes.TINYINT,
      walks: DataTypes.TINYINT,
      strikeOuts: DataTypes.TINYINT,
    },
    { sequelize }
  );
}

With Jest, we can start to define our tests. Though technically integration tests, as the models require interaction with the database, our mock connection allows us to treat each one as if it is a unit test. To prevent any one test from influencing another, we can use the beforeEach Jest hook to rebuild the database before each test.

// models/__tests__/batter.js
import models, { Batter } from '..';

beforeEach(async () => {
  await models.sync({ force: true });
});

describe('Batters', () => {
  test('batting average', async () => {
    expect(
      Batter.build({ atBats: 10, hits: 5 }).battingAverage()
    ).toBe('0.500');
  });

  test('total bases', async () => {
    expect(
      Batter.build({
        hits: 16,
        doubles: 8,
        tripples: 4,
        homeRuns: 2,
      }).totalBases()
    ).toBe(38);
  });

  test('slugging percentage', async () => {
    expect(
      Batter.build({
        atBats: 64,
        hits: 16,
        doubles: 8,
        tripples: 4,
        homeRuns: 2,
      }).sluggingPercentage()
    ).toBe('0.594');
  });
});
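One case these tests leave out is a batter with no at-bats. In JavaScript, 1 / 0 evaluates to Infinity but 0 / 0 evaluates to NaN, so a guard that only compares against Infinity would let NaN through. The sketch below uses a hypothetical helper, formatRatio, to illustrate a guard based on Number.isFinite; it is an illustration only and not part of the Batter model.

```javascript
// Hypothetical helper: format a ratio to three decimals, treating any
// non-finite result (Infinity, -Infinity, or NaN) as zero.
function formatRatio(numerator, denominator) {
  const ratio = numerator / denominator;
  return (Number.isFinite(ratio) ? ratio : 0).toFixed(3);
}

formatRatio(5, 10); // '0.500'
formatRatio(1, 0);  // '0.000' (1 / 0 is Infinity)
formatRatio(0, 0);  // '0.000' (0 / 0 is NaN)
```

A matching test would build a Batter with atBats set to 0 and assert on the string the statistic returns.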

Running our suite confirms that all tests pass. The run below also reports a batter instance test, which is not shown in the listing above.

$ npm test

 PASS  models/__tests__/batter.js
  Batters
    ✓ batter instance (379 ms)
    ✓ batting average (64 ms)
    ✓ total bases (55 ms)
    ✓ slugging percentage (46 ms)

Test Suites: 1 passed, 1 total
Tests:       4 passed, 4 total
Snapshots:   0 total
Time:        2.489 s, estimated 3 s
Ran all test suites.

Model Factories

As model complexity grows, it becomes more difficult to test basic functionality, especially when associated models become involved. To help make this easier, specialized mocks known as Factories can be used to ensure the application is in a valid state. We can define a BatterFactory using Fishery:

// models/factories/batter.js
import { Factory } from 'fishery';
import { Batter } from '..';
import { faker } from '@faker-js/faker';

export const BatterFactory = Factory.define(({ onCreate }) => {
  onCreate((batter) => batter.save());

  return Batter.build({
    name: faker.name.lastName(),
    atBats: faker.datatype.number(99),
    runs: faker.datatype.number(99),
    hits: faker.datatype.number(99),
    doubles: faker.datatype.number(99),
    tripples: faker.datatype.number(99),
    homeRuns: faker.datatype.number(99),
    runsBattedIn: faker.datatype.number(99),
    walks: faker.datatype.number(99),
    strikeOuts: faker.datatype.number(99),
  });
});

We can now update the Batter tests to import BatterFactory and replace all calls to Batter.build() with BatterFactory.build(). Notice that we use Faker to generate realistic data within our model: any values not provided explicitly to .build() will be populated with this fake data. Defaults can also come from other factories, which is how associated models are populated:

import { PlayerFactory, GameFactory } from './';

export const BatterFactory = Factory.define(({ onCreate }) => {
  onCreate((batter) => batter.save());

  return Batter.build(
    {
      ...
      player: PlayerFactory.build(),
      game: GameFactory.build(),
    },
    { include: [models.Player, models.Game] }
  );
});

With factory models in place, along with our database mock providing a consistent starting state, we no longer have to concern ourselves with manual state manipulation for test setup. Our factory will produce a valid model for testing, including any necessary related models, while still allowing custom data where needed.
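The pattern of defaults plus overrides can be sketched without any library. This is not how Fishery is implemented; defineFactory and batterFactory are hypothetical names, and the records are plain objects rather than Sequelize models.

```javascript
// Hypothetical, framework-free factory: defaults() supplies a valid
// record, and build() lets a test override only the fields it cares about.
function defineFactory(defaults) {
  return {
    build(overrides = {}) {
      return { ...defaults(), ...overrides };
    },
  };
}

// Each call to defaults() produces a fresh record with a unique name.
let sequence = 0;
const batterFactory = defineFactory(() => ({
  name: `Batter ${++sequence}`,
  atBats: 4,
  hits: 1,
}));

// Only hits is specified; name and atBats come from the defaults.
const slugger = batterFactory.build({ hits: 3 });
```

A test then states only the values it depends on, which keeps the setup short and makes the intent of each test easier to read.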

Conclusion

While not necessarily trivial, including a testing suite in your code base is not difficult, even without an embedded framework. Proper use of mocks and factories will help reduce data and state manipulation, even as application logic grows in complexity.
