About the Client
A financial information services company that creates tools to analyze, assess, and manage risk in the peer-to-peer lending sector.
The client needed an innovative solution that lets users review loan metrics as aggregated reports, gauge the overall performance of loans, and make informed investment decisions. The solution also needed to let users view metrics specific to their own loan portfolios.
JBS created an extensive multi-tenant system for peer-to-peer loan analytics. The tenants in this system were the major loan originators in the peer-to-peer loan market. Beyond the functionality provided for each tenant, the system also delivered inter-tenant analytics to specific customers.
JBS's solution imported loan and payment processing data from datasets that ran into the tens of millions of records. In all, the system processed more than one hundred million records for its batch and near-real-time analytics.
There were two major parts of this system. The first was an extensive ETL process built around each originator's loan and payment data. Because the loan originators' input formats had nothing in common, JBS made the process flexible and extensible. It cleaned and processed the data through type checking, data scrubbing, and data normalization. Normalization was essential because the originators' monetary amounts came with different precision and accuracy, and the system's validity depends in part on every dollar amount matching the various systems of record to the penny.
Another vital part of the ETL process was reshaping the data. Because each tenant's data arrived in a different schema, JBS transformed it into a common, shared schema that supported all the required functionality while keeping queries easy to construct and data retrieval fast.
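The monetary normalization and schema reshaping described above can be sketched in a few lines of Python. This is a minimal illustration, not JBS's actual implementation: the originator names, the source and target column names, and the half-up rounding rule are all assumptions made for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Hypothetical per-originator mappings: source column -> shared schema column.
FIELD_MAPS = {
    "originator_a": {"LoanID": "loan_id", "Amt": "principal", "Rate": "interest_rate"},
    "originator_b": {"id": "loan_id", "loan_amount": "principal", "apr": "interest_rate"},
}


def to_cents(raw):
    """Parse a monetary value and round it to the penny (assumed half-up)."""
    text = str(raw).replace("$", "").replace(",", "").strip()
    return Decimal(text).quantize(CENT, rounding=ROUND_HALF_UP)


def normalize(originator, record):
    """Reshape one source record into the shared schema with typed fields."""
    mapping = FIELD_MAPS[originator]
    row = {target: record[source] for source, target in mapping.items()}
    row["loan_id"] = str(row["loan_id"])                        # type checking
    row["principal"] = to_cents(row["principal"])               # scrub and normalize
    row["interest_rate"] = Decimal(str(row["interest_rate"]))   # avoid binary floats
    row["originator"] = originator                              # keep the tenant
    return row


row_a = normalize("originator_a", {"LoanID": 42, "Amt": "$1,250.5", "Rate": "0.0799"})
row_b = normalize("originator_b", {"id": "B-7", "loan_amount": 2500.005, "apr": 0.12})
```

Using `Decimal` rather than binary floating point keeps amounts exact, which matters when totals must reconcile to the penny against a system of record.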
The second major part of the system focused on the data analysis itself. Using a combination of classical statistical inference, Bayesian statistics, and both supervised and unsupervised machine learning methods, JBS created intelligent data analysis that gave all tenants in the system a basis for better, more accurate decisions at both the strategic and operational levels.
Techniques used in this process included distribution curves, weighted averages, linear regression (gradient descent), K-means clustering, and decision trees. A standard data set, sampled from all the tenant data sets, was used to train the algorithms and served as a control for validity testing. In addition, data validity was checked against each tenant's own reference data and information from its own systems of record.
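As one illustration of the techniques named above, the following is a minimal sketch of linear regression fitted by batch gradient descent. The function name, learning rate, epoch count, and the tiny synthetic data set are assumptions chosen for the example; they do not come from the project and are not real loan data.

```python
def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Prediction error for every sample under the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_gd(xs, ys)
```

On this data the fitted slope and intercept converge to approximately 2 and 1. With real features on very different scales, the inputs would typically be standardized first so that a single learning rate works for every parameter.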
Results and Benefits
As a result of the solution JBS built, this financial services company has a machine learning-based analytics engine that is intelligently able to assess, forecast, and determine the risk factor of each loan.
The client was able to obtain additional rounds of funding and now has a robust platform and technology foundation on which to keep building and growing its customer base, which it continues to do today.
AWS EMR, AWS Glue, AWS Redshift, Python, Django, D3, and additional data processing libraries