Taking a Data Warehouse to the Finish Line for a Global Testing Organization

A Clear Path for Quickly Accessing Comprehensive Data to Leverage Its Vital Business Value

Overview

This private, non-profit educational testing and assessment organization—a global leader in its industry—is committed to advancing quality and equity in education by providing fair and valid assessments, research, and related services.

The organization provides a diverse range of products and solutions that measure knowledge and skills and promote learning and performance, all while supporting education and professional development endeavors worldwide. It is best known for its long-standing work developing, administering, and evaluating the results of popular standardized tests. Government agencies, states, and organizations routinely hire this company to develop and manage critical assessment programs on their behalf.

The company also provides research and development (R&D) consulting services to address psychometric measurement and training needs that involve significant statistical analysis and test design. It is also instrumental in developing EdTech solutions aimed at fueling an organization’s success. Moreover, the organization has its own research centers and labs, where it experiments with innovative solutions using artificial intelligence (AI) and rapid prototyping.

Challenges

A critical part of this organization’s work is upholding the highest quality standards so that it delivers valid, reliable, timely, and accurate information. The organization makes extensive use of the data it collects about tests, assessments, and test taker demographics to glean insights that continually improve its products, services, and solutions.

Within the company, the psychometric analysis and research team currently relies on a mix of company-developed and commercial off-the-shelf (COTS) technology to capture and evaluate data for high-level operational statistical analysis. This core team also has an extensive Amazon Web Services (AWS)-based data warehouse, sometimes referred to as the data lake.

The biggest challenge facing the team was how best to extract data from the data lake so that its inherent business value could be fully leveraged for both research and operations. Staff could already easily ingest, transform, and load data into the data lake; however, they could not easily extract it without highly skilled developers and IT specialists running manual Structured Query Language (SQL) processes. This was done on an ad hoc basis that was inefficient for all involved.

More specifically, the difficulty with getting data out of the warehouse was largely the result of a lack of the following:

  • Automation
  • Flexibility
  • User-friendly interface tools
  • Development of reproducible/reusable applications and processes that would provide a clear path to different deployments and environments for multiple use cases

As a result of these data extraction challenges, the team was also restricted in its ability to move data between multiple work environments in a fast, efficient, self-service manner.

Ultimately, they wanted their non-IT employees (researchers, statisticians, and analysts) to be able to work with this data directly from the warehouse on multiple concurrent projects without constant IT support or intervention.

How JBS Helped

In early 2021, through a mutual acquaintance, the testing organization connected with JBS Solutions to discuss the data warehouse access challenge and find a solution grounded in best practices. Joshua Miller, JBS Solutions’ architect and team lead, said, “A fast solution to take this testing company to the finish line with its data warehouse was a business imperative. This extraordinarily talented group of individuals needed to have an easier way to access the valuable data collected to perform statistical analysis, capture valuable insights, and provide research for their global customers.”

After multiple discussions and a refined proposal, JBS Solutions laid out a clear path to make the data lake directly available to the organization’s users in the shortest timeframe possible. The company partnered with JBS Solutions with the understanding that the solution would first be delivered as a “proof of concept,” or minimum viable product (MVP), demonstrating its future technical and financial viability.

To build out and productionize the proof of concept, JBS Solutions assembled a team led by Joshua Miller that also included a senior engineer from JBS Solutions. The testing organization dedicated three more people to the project: two Java developers and a quality assurance (QA) expert. This core squad of five would report to the IT manager of the testing company.

In addition to the proof of concept, the squad would also assist with some of the design and architecture for the existing data warehouse as well as the currently functioning extract, transform, load (ETL) pipeline. Also, the squad would provide support for security, data governance, fault tolerance, tuning, and optimization where needed.

AWS Cloud Development Kit (CDK)

Getting underway, JBS Solutions followed Infrastructure as Code (IaC) best practices when implementing the first AWS CDK setup for an entire project at the organization. The AWS CDK is an open-source software development framework for defining cloud application resources using familiar programming languages.

IaC helps to reduce bottlenecks and enables self-service environments across an application delivery pipeline using version control and consistent infrastructure. This setup would help the testing organization achieve its DevOps goals of automation, self-service, error reduction, and fewer time-consuming rollbacks as JBS Solutions built out the proof of concept.

Flexible and Reusable Code

With the development of the proof of concept, the organization wanted to move to more flexible technologies and languages, with Python being a major focus. While the squad did feature several talented Java developers, it did not have many experienced Python engineers, which presented a short-term challenge.

To meet this challenge head on, JBS Solutions assisted multiple squads with writing mature, enterprise-quality production code in Python. JBS Solutions also quickly refactored several Python applications to follow Pythonic conventions.

The benefits of using Python were faster times to production and better usability while leveraging open-source resources. Overall, the intent was to write code that could be reused for other purposes rather than duplicated, a principle often called “Don’t Repeat Yourself,” or DRY. In short, the move to Python was focused on getting a working product out the door more quickly.

While Python was used to develop new applications, JBS Solutions did support some of the existing Java applications within the organization. To do so, JBS Solutions used Docker images for the Java applications, a widely used way to containerize applications and create predefined, replicable environments. This drastically reduced build-to-deployment time from approximately 46 minutes to 56 seconds.

DevOps

The testing company was new to owning its DevOps and deployment processes. Jenkins, an open-source continuous integration/continuous delivery (CI/CD) automation tool, was present but not implemented in a maintainable manner. The company’s leadership wanted to replace Jenkins with GitLab CI to make its DevOps structure more functional going forward.

Accordingly, JBS Solutions piloted deployments of GitLab CI, part of a DevOps platform that combines the ability to develop, secure, and operate software in a single application. While doing so, JBS Solutions embraced containerization to show other teams how GitLab CI could be leveraged for fast, independent builds along with testing and deployments.

Results

In September 2021, JBS Solutions delivered the proof of concept to the testing organization: a new query application programming interface (API) that provided the first external programmatic interface to the company’s data warehouse. “They were extremely pleased with the proof of concept,” said Joshua Miller. “While it was initially designed to be a minimum viable product, we will be expanding on this concept to ensure that all non-IT users will be able to easily self-service the data warehouse.”
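
The case study does not describe the query API itself, so the following is only a rough sketch of what a programmatic query interface to a warehouse might look like. It uses Python’s built-in sqlite3 module as a stand-in for the warehouse; the `scores` table, its columns, the sample rows, and the `query_scores` function are all hypothetical and invented for illustration.

```python
import sqlite3

# Hypothetical whitelist of columns a caller may filter on. Restricting
# filters to known columns keeps arbitrary SQL out of the query text.
ALLOWED_FILTERS = {"test_name", "region"}


def query_scores(conn, filters):
    """Return rows from the hypothetical `scores` table matching `filters`."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    columns = sorted(filters)
    sql = "SELECT test_name, region, score FROM scores"
    if columns:
        # Column names come from the whitelist; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in columns)
    sql += " ORDER BY score"
    params = [filters[col] for col in columns]
    return conn.execute(sql, params).fetchall()


def build_demo_warehouse():
    """Create an in-memory database with made-up rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (test_name TEXT, region TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?, ?)",
        [("math", "EU", 71), ("math", "US", 64), ("reading", "EU", 88)],
    )
    return conn
```

With an interface along these lines, an analyst could call `query_scores(build_demo_warehouse(), {"test_name": "math"})` instead of asking a developer to write SQL by hand, which is the kind of self-service access the proof of concept was aiming for.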

In addition to delivering the proof of concept, JBS Solutions further pushed the organization toward the finish line by being an early adopter of the AWS CDK / IaC setup. By doing so, the team was able to tackle many blockers with the AWS account by working directly with engineering. This work included pushing for new DevOps and feature implementations and writing code that other groups could easily leverage.

JBS Solutions also implemented an “orchestration engine” to tie together several of the testing organization’s services. Essentially, it acted as the glue for an end-to-end analysis, running from the user interface through the various systems that provide data and processing, and delivering reports and business intelligence back to end users.
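
The idea of an orchestration layer acting as glue between services can be sketched in a few lines of Python. This is not the engine JBS Solutions built, whose design the case study does not describe; the `Orchestrator` class, the step functions, and the sample scores are hypothetical stand-ins for real services and data.

```python
from typing import Any, Callable, Dict, List, Tuple

# A step takes the shared context and returns new values to merge into it.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


class Orchestrator:
    """Runs named steps in order, passing a shared context between them."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Step]] = []

    def add_step(self, name: str, func: Step) -> "Orchestrator":
        self._steps.append((name, func))
        return self  # allows chained add_step calls

    def run(self, request: Dict[str, Any]) -> Dict[str, Any]:
        context = dict(request)
        context["trace"] = []
        for name, func in self._steps:
            context.update(func(context))
            context["trace"].append(name)  # record which steps completed
        return context


# Hypothetical steps standing in for real data, processing, and reporting services.
def fetch_data(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"scores": [64, 71, 88]}  # made-up values, not real results


def analyze(ctx: Dict[str, Any]) -> Dict[str, Any]:
    scores = ctx["scores"]
    return {"mean": sum(scores) / len(scores)}


def build_report(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"report": f"{ctx['test_name']}: mean score {ctx['mean']:.1f}"}


def run_demo() -> Dict[str, Any]:
    engine = (
        Orchestrator()
        .add_step("fetch", fetch_data)
        .add_step("analyze", analyze)
        .add_step("report", build_report)
    )
    return engine.run({"test_name": "math"})
```

Calling `run_demo()` walks a request from the initial input through each step and returns the accumulated context, including the final report string. A production engine would also need error handling, retries, and asynchronous execution, none of which are shown here.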

As a result of the exceptional work completed by JBS Solutions, the testing organization recently extended its partnership to other related software projects.