What's New From AWS reInvent 2020

As an AWS partner, we rely constantly on AWS services to help us build solutions for our clients that are manageable, scalable, and fault-tolerant. At the end of every year, AWS holds its annual event, reInvent, where it announces all the latest developments in the AWS ecosystem - everything from entirely new services to new features and enhancements of existing ones. Because so many of our clients and solutions use AWS as their cloud provider, we take a special interest in reInvent, looking specifically for anything new that will help us provide more value to our clients. After going through all 145(!) announcements for 2020, here are some that we're particularly excited about at JBS. Links to all the AWS announcements mentioned are given at the end.

AWS Lambda

As a serverless compute platform, Lambda is a big part of our solutions toolkit at JBS. We use it for everything from building scalable, asynchronous workloads and processing tasks to hosting full-blown microservice-based applications. There were quite a few announcements regarding AWS Lambda at reInvent this year; here are the ones that we feel were particularly important for us.

First and most importantly, Lambda now supports container images as a packaging format. Originally, Lambda ran your code on its own preconfigured image, which included many popular third-party libraries used in common development tasks. However, there is always a need for libraries or modules that are not included, and getting Lambda to support software outside of the base image could be problematic. Bundling extra third-party software could also complicate deployments by introducing external dependencies into the process. Supporting Docker images as packaging and deployment artifacts addresses these problems: any additional software needed in the Lambda environment is built into the image, which makes deployments and configuration much simpler and more robust.
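The function code itself looks the same whether it ships as a zip archive or inside an image. As a minimal sketch, here is a Python handler of the kind that could be packaged this way. The module name app.py, the name field in the event, and the response shape are all assumptions made for illustration; an image built on one of the AWS-provided Python base images would typically set its command to app.handler.

```python
# app.py (hypothetical module name) - a handler that could be baked into a container image.
import json


def handler(event, context):
    """Entry point the Lambda runtime invokes; `event` is the decoded request payload."""
    name = event.get("name", "world")  # "name" is an assumed field, not a Lambda convention
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local smoke test only; inside Lambda the runtime calls handler() directly.
    print(handler({"name": "JBS"}, None))
```

Any extra libraries the handler needs would be installed while building the image, rather than bundled separately at deployment time.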

Next, while not quite as big an impact as the above, Lambda now supports up to 10 GB of memory and up to 6 virtual CPU cores. Having the extra memory and processing power available gives Lambda much more flexibility in the kinds of workloads it can be used for. ETL, ML, and other computationally complex workloads are now much more viable on Lambda, along with all the benefits of going serverless.
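Lambda allocates CPU in proportion to the configured memory, so memory is the one setting to raise in order to reach the larger sizes. The sketch below validates a requested size against the new ceiling and builds the arguments for a configuration update. The helper and the function name "etl-job" are hypothetical; the returned keys mirror the FunctionName and MemorySize parameters of boto3's update_function_configuration call, which is not invoked here.

```python
MIN_MEMORY_MB = 128      # smallest memory setting Lambda accepts
MAX_MEMORY_MB = 10_240   # the 10 GB ceiling announced at reInvent 2020


def memory_update_request(function_name, memory_mb):
    """Build keyword arguments for a function-configuration update (hypothetical helper)."""
    if not MIN_MEMORY_MB <= memory_mb <= MAX_MEMORY_MB:
        raise ValueError(
            f"memory_mb must be between {MIN_MEMORY_MB} and {MAX_MEMORY_MB}, got {memory_mb}"
        )
    return {"FunctionName": function_name, "MemorySize": memory_mb}


if __name__ == "__main__":
    # With boto3 installed and credentials configured, this dict could be passed as
    # boto3.client("lambda").update_function_configuration(**request).
    request = memory_update_request("etl-job", MAX_MEMORY_MB)
    print(request)
```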

Redshift

Redshift, AWS's managed data warehouse platform, is another key technology in the data solutions we build at JBS. While there were many announcements targeting new features and enhancements to Redshift, here are three that we thought were particularly interesting for us:

  • Automatic Table Optimization
    • Data in Redshift tables is usually organized (at the physical level) via sort keys and distribution keys. At table creation time, it's not always obvious which fields would be proper candidates for these keys. Also, as query patterns or other operational needs change, the initial choices may need to be altered. All of this requires manual intervention of some sort, and making the wrong choices can wreak havoc on query performance. With Automatic Table Optimization, Redshift observes workloads and query patterns and, with the help of machine learning, figures out the optimal sort and distribution keys for any specified table. The other obvious benefit is a further reduction in the administrative overhead needed to manage a Redshift cluster. This is a huge win.
  • Support for native JSON and semi-structured data
    • Over time, JSON has become one of the most ubiquitous ways of storing text/string-based data. However, due to its nested, semi-structured format, it's difficult to work with in relational-oriented data stores. There are many ways of working with JSON data in a relational model, but for many reasons, none of them are optimal. With the addition of the new SUPER data type, Redshift now has native support for working with JSON data as a first-class citizen. This includes very fast array navigation and unnesting of JSON structures, as well as support for joins and aggregate operations. For ETL purposes, it also supports dynamic type inference and schemaless semantics, so importing JSON and other kinds of semi-structured data results in much simpler pipelines and transformations - in many cases eliminating any existing code related to flattening or "schematizing" JSON documents.
  • Redshift ML
    • Redshift ML uses AWS SageMaker (AWS's managed machine learning service) to allow data warehouse users to create and manage machine learning models using SQL, without having to move data out of Redshift and into some other ML-specific service. This opens up the power of ML to data practitioners other than trained data scientists and allows models to be used directly in queries and reports. It can also pick the most appropriate model based on the data or use case (you can pick specific models if needed). With this new functionality, it's possible to remove much of the complexity typically seen in machine learning pipelines and workflows. The ability to build, train, and deploy ML models in SQL and run them directly on data in an existing Redshift cluster can be a phenomenal win and an incredibly low-barrier way to start incorporating machine learning into your enterprise.
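To make the "flattening" point from the JSON item above concrete, here is a sketch of the kind of ETL helper that native semi-structured support can make unnecessary. It is not Redshift code: the order document, its field names, and both helper functions are hypothetical, and they stand in for the pre-processing a pipeline might run before loading JSON into purely relational tables.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts into one dict with dotted column names."""
    flat = {}
    for key, item in value.items():
        column = f"{prefix}{key}"
        if isinstance(item, dict):
            flat.update(flatten(item, prefix=f"{column}."))
        else:
            flat[column] = item
    return flat


def unnest_items(order):
    """Emit one flat row per element of the order's `items` array."""
    header = flatten({k: v for k, v in order.items() if k != "items"})
    return [{**header, **flatten(item, prefix="item.")} for item in order.get("items", [])]


if __name__ == "__main__":
    order = {
        "id": 1,
        "customer": {"name": "Ada", "city": "Leeds"},
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    }
    for row in unnest_items(order):
        print(row)
```

With a SUPER column, the document could instead be loaded as-is, and the navigation and unnesting would be expressed in the query itself.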

Miscellaneous

Aside from Lambda and Redshift, which are two important services and technologies for us, there were many other announcements covering the entire spectrum of services that AWS offers. Here are some additional announcements, organized by service, that we feel are worth mentioning and that we know we will be leveraging heavily in the future:

  • S3
    • S3 was one of the very first services offered by AWS, but that doesn't mean it hasn't continued to evolve to accommodate changing needs. One big change announced was that S3 now has strong read-after-write consistency. Previously, if you needed this consistency with S3, you usually had to manage it through application code or more complex infrastructure build-outs. The new model will be very beneficial during data migrations to the cloud, since most on-prem applications are written expecting strong read-after-write consistency. Another interesting new feature of S3 is that you can now target multiple destination buckets for replication operations. These buckets can be in the same region as the replication source or in different regions. Shared data sets, latency reduction for content retrieval, and fault tolerance are just some of the possibilities offered by this new replication feature.
  • EKS
    • As Kubernetes continues to dominate as the container management platform, EKS - AWS's managed Kubernetes service - continues to grow in popularity. It's a relatively new service and, as such, is evolving at a fast pace. Two interesting new features announced this year were enhancements to the EKS console that allow insight into the resources managed by your EKS clusters, and built-in logging support for containers running on AWS's Fargate compute platform. The former provides far more observability into EKS clusters, letting you visualize many of your EKS resources and making troubleshooting much easier. The latter provides, right out of the box, logging functionality that would typically require additional software (usually installed as a sidecar). This eliminates much of the complexity of configuring, installing, deploying, and maintaining additional software assets.
  • SageMaker
    • SageMaker is AWS's general-purpose, managed machine learning platform. As machine learning has been a very hot topic over the last several years, SageMaker has become an incredibly popular way to incorporate ML functionality into greenfield or existing solutions (whether or not they are hosted on AWS). With ML becoming a competitive advantage for many enterprises (and start-ups), AWS is constantly working on making it better. SageMaker JumpStart, newly announced at reInvent, is a set of pre-built machine learning models that provide solutions to problems many enterprises face, such as fraud detection, demand forecasting, or image classification. SageMaker JumpStart allows almost any organization to start incorporating ML immediately with almost no additional overhead. Another really interesting SageMaker announcement was the introduction of SageMaker Data Wrangler. Preparing and transforming data for ingestion by machine learning algorithms can be a time-consuming, complex, and difficult process. Normalization, scrubbing, handling missing values, and data validation are just some of the typical steps that go into preparing data. SageMaker Data Wrangler automates this process and can dramatically reduce the time needed for these activities. It's very robust, with over 300 out-of-the-box transformations ready to use, and it can work with data from a wide variety of sources such as Redshift, S3, Athena, and Lake Formation.
  • Lake Formation and Glue
    • Lake Formation and Glue are two data tools from AWS that help with the creation and management of data lakes on AWS. Lake Formation automates much of the work in an initial data lake setup that is usually handled by complex ETL pipelines and processes. It also handles many of the steps needed to make sure that your data lake complies with typical data governance policies, such as auditing access, data scrubbing, and managing access control settings. Row-level security and transactions are two newly announced features that will go a long way toward letting Lake Formation cover additional data governance scenarios, and they should make it much more viable for enterprises looking for a very low-overhead tool to create, maintain, and manage an organizational or enterprise-wide data lake. Glue is AWS's managed, serverless ETL tool, and it can also help create and manage data lakes on AWS - it integrates nicely with Lake Formation. Glue Elastic Views is a new feature that allows AWS Glue to build a materialized view across data sources such as S3, DynamoDB, and Redshift using just SQL, with no custom coding needed. Glue gathers the data into a virtual table and stores it in the data store of your choice. It then monitors the source data stores and updates the materialized view, ensuring your view is always up to date.
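Returning to the S3 consistency change mentioned above: the sketch below illustrates the kind of application-level workaround that strong read-after-write consistency can make unnecessary. It uses a toy in-memory store, not the S3 API. The EventuallyConsistentStore class, its stale_reads knob, and the read_after_write helper are all hypothetical, invented to simulate reads that lag behind a write.

```python
import time


class EventuallyConsistentStore:
    """Toy stand-in for a store whose reads may lag behind writes (not the S3 API)."""

    def __init__(self, stale_reads=2):
        self._data = {}
        self._pending = {}  # key -> [value, stale reads left before the write is visible]
        self._stale_reads = stale_reads

    def put(self, key, value):
        self._pending[key] = [value, self._stale_reads]

    def get(self, key):
        if key in self._pending:
            value, remaining = self._pending[key]
            if remaining == 0:
                self._data[key] = value  # the write finally becomes visible
                del self._pending[key]
            else:
                self._pending[key][1] = remaining - 1
        return self._data.get(key)


def read_after_write(store, key, expected, attempts=5, delay=0.01):
    """Poll until the write is visible, backing off exponentially between reads."""
    for attempt in range(attempts):
        if store.get(key) == expected:
            return attempt + 1  # number of reads it took
        time.sleep(delay * 2 ** attempt)
    raise TimeoutError(f"{key!r} not visible after {attempts} attempts")


if __name__ == "__main__":
    store = EventuallyConsistentStore(stale_reads=2)
    store.put("report.csv", b"v1")
    print("reads needed:", read_after_write(store, "report.csv", b"v1"))
```

Against a strongly consistent store (stale_reads=0 in this toy model), the first read already returns the new value, so the polling loop and its tuning parameters can simply be dropped.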

We couldn't include every new service and feature announced by AWS this year, so the above is just a sampling of a few that we think anybody working in the AWS ecosystem could greatly benefit from. Many of these also align with where we can add more value for our clients. With almost 150 announcements made this year, we urge you to check out all of them and see if any can help add value to your organization.

All AWS reInvent announcements:

AWS Lambda now supports up to 10GB of memory and 6 vCPU cores for Lambda functions

AWS Lambda now supports container images as a packaging format

Automatic Table Optimization

Support for native JSON and semi-structured data processing

Redshift ML

SageMaker JumpStart

SageMaker Data Wrangler

Glue Elastic Views

Transactions and Row-level security for AWS Lake Formation

S3 now has strong read after write consistency

S3 now has support for multiple destinations in the same or different regions replication

EKS now includes Kubernetes Resource information in the AWS Console

EKS now includes built-in logging support for AWS Fargate
