Amazon’s DynamoDB is a fantastic NoSQL solution. It’s easy to get started with, but as your data sets grow, so does the emphasis on writing performant queries. In this article, we will go over learnings that have allowed us to create high-performing applications using DynamoDB.
Querying large datasets
DynamoDB offers several operations for retrieving data. They can appear interchangeable, but their differences have drastic effects on performance.
The scan operation is a convenient way to retrieve items from your table, but that convenience comes at a cost. Under the hood, a scan reads every item in the table, returning at most 1 MB of data per request, so retrieving a full table means paging through it request by request. Every item read consumes read capacity whether or not you need it, which hampers performance as the table grows.
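To make the cost concrete, here is a toy model in Python. It is not the DynamoDB API. It assumes each Scan request reads items in order and stops once roughly 1 MB (taken here as 1,048,576 bytes) has been read, and the item sizes are made up.

```python
# Toy model (not the DynamoDB API): a Scan reads items in order and ends each
# request once about 1 MB has been read; the caller then pages on.
PAGE_LIMIT_BYTES = 1024 * 1024  # assumption: "1 MB" taken as 1,048,576 bytes


def simulate_scan(item_sizes: list[int],
                  page_limit: int = PAGE_LIMIT_BYTES) -> tuple[int, int]:
    """Return (requests, items_read) needed to scan every item once."""
    requests = 0
    items_read = 0
    i = 0
    while i < len(item_sizes):
        requests += 1
        page_bytes = 0
        # Each request keeps reading until the page limit is reached.
        while i < len(item_sizes) and page_bytes < page_limit:
            page_bytes += item_sizes[i]
            items_read += 1
            i += 1
    return requests, items_read
```

For 10,000 items of about 400 bytes each, `simulate_scan([400] * 10000)` returns `(4, 10000)`: four requests, and every item is read even if you only needed a handful of them.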
A more performant alternative is the query operation, which retrieves only the relevant items. A query requires an exact match on the partition key and can optionally narrow the results with a condition on the sort key. Filtering on other attributes is covered below.
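As a minimal sketch, the function below builds the keyword arguments for a low-level boto3 `client.query` call. The table name and the key attribute names `pk` and `sk` are hypothetical; the partition key is matched exactly, and `begins_with` is one of the sort key conditions DynamoDB accepts.

```python
from typing import Optional


# Sketch: kwargs for boto3's low-level client.query(**kwargs).
# The key attribute names "pk" and "sk" are hypothetical.
def build_query_request(table_name: str, pk_value: str,
                        sk_prefix: Optional[str] = None) -> dict:
    # A query must match the partition key exactly.
    condition = "#pk = :pk"
    names = {"#pk": "pk"}
    values = {":pk": {"S": pk_value}}
    # Optionally narrow the results with a sort key condition.
    if sk_prefix is not None:
        condition += " AND begins_with(#sk, :sk)"
        names["#sk"] = "sk"
        values[":sk"] = {"S": sk_prefix}
    return {
        "TableName": table_name,
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }
```

With boto3 installed and AWS credentials configured, the result could be sent with `boto3.client("dynamodb").query(**build_query_request("Orders", "CUSTOMER#1", "ORDER#2024"))`.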
A common way to filter in DynamoDB is with filter expressions. On the surface, a filter expression looks similar to a query's conditions on the partition or sort key: both let you pass in attributes to narrow the result set. The similarities stop there. A filter expression is applied after the items have been read, so the items it discards still consume read capacity and still count toward the 1 MB limit. This distinction is noted in the DynamoDB docs, but it’s easy to overlook.
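The following toy model illustrates the order of operations. It is not the DynamoDB API. It borrows the response field names `Count` and `ScannedCount`, which a real Query response reports as the items returned after filtering and the items evaluated before filtering. The `orders` data is made up.

```python
# Toy model (not the DynamoDB API): the key condition decides what is read,
# and the filter only trims what is returned afterwards.
def query_with_filter(items: list[dict], pk_value: str, keep) -> dict:
    read = [item for item in items if item["pk"] == pk_value]  # key condition
    returned = [item for item in read if keep(item)]           # filter expression
    return {"Items": returned, "ScannedCount": len(read), "Count": len(returned)}


# Hypothetical data: one customer with 100 orders, 10 of them open.
orders = [{"pk": "CUSTOMER#1", "status": "open" if n % 10 == 0 else "closed"}
          for n in range(100)]
result = query_with_filter(orders, "CUSTOMER#1", lambda o: o["status"] == "open")
print(result["ScannedCount"], result["Count"])  # prints: 100 10
```

All 100 items are read to return 10, which is why a selective filter on a large item collection can be expensive.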
In use cases where the performance hit of post-query filtering isn’t acceptable, an index can be the ideal solution. A secondary index lets you define an alternate partition key (and an optional sort key) that you can query directly.
For example, consider a product table that uses a product ID as its partition key. Each product has its own category and subcategory attributes. To show a product list for a category or subcategory, we could scan the table with a filter expression. This may work in the short term, but as the product catalog grows, it becomes an inefficient way to retrieve a product list.
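For comparison, here is a sketch of that filter-expression approach as kwargs for a low-level boto3 `client.scan` call. The table name `Products` and the attribute name `category` are assumptions for illustration.

```python
# Sketch of the filter-expression approach: scan the whole hypothetical
# "Products" table and let DynamoDB discard non-matching items after reading.
def build_category_scan_request(table_name: str, category: str) -> dict:
    """Return kwargs for boto3's low-level client.scan(**kwargs)."""
    return {
        "TableName": table_name,
        # Applied after items are read, so every item still costs read capacity.
        "FilterExpression": "#cat = :cat",
        # A name placeholder avoids any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#cat": "category"},
        "ExpressionAttributeValues": {":cat": {"S": category}},
    }
```

There is no key condition here, so every request still walks the table from the start, and the cost grows with the whole catalog, not with the size of one category.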
A more scalable approach is to create global secondary indexes that use the category and subcategory as their keys. You can then query the products on those keys and get the results back efficiently.
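A minimal sketch of that design follows, expressed as kwargs for boto3's low-level `create_table` and `query` calls. The table name, the index names `CategoryIndex` and `SubcategoryIndex`, and the attribute names are hypothetical. Global (not local) secondary indexes are needed here because a local secondary index must share the table's partition key.

```python
# Sketch: create_table kwargs for a hypothetical "Products" table with one
# global secondary index (GSI) per attribute we want to query on.
def products_table_definition() -> dict:
    def gsi(name: str, attribute: str) -> dict:
        return {
            "IndexName": name,
            "KeySchema": [{"AttributeName": attribute, "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},  # copy all attributes
        }

    return {
        "TableName": "Products",
        "BillingMode": "PAY_PER_REQUEST",  # on-demand: no throughput settings
        # Only attributes used in a table or index key schema are declared.
        "AttributeDefinitions": [
            {"AttributeName": "productId", "AttributeType": "S"},
            {"AttributeName": "category", "AttributeType": "S"},
            {"AttributeName": "subcategory", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "productId", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            gsi("CategoryIndex", "category"),
            gsi("SubcategoryIndex", "subcategory"),
        ],
    }


def build_index_query(index_name: str, attribute: str, value: str) -> dict:
    """Return kwargs for client.query(**kwargs) against one of the GSIs."""
    return {
        "TableName": "Products",
        "IndexName": index_name,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": attribute},
        "ExpressionAttributeValues": {":v": {"S": value}},
    }
```

Another option is a single index with the category as its partition key and the subcategory as its sort key, which supports both category queries and category-plus-subcategory queries.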
There are many scenarios where you may want to page through your data, such as a list view where showing the full data set isn’t practical. For this, you can pass a Limit parameter to a query, which caps the number of items DynamoDB evaluates in a single request. If the query stops before reaching the end of the results, the response includes a LastEvaluatedKey value, which holds the primary key of the last item evaluated. Pass this value to the next query as the ExclusiveStartKey, and DynamoDB resumes with the item immediately after it, so each request retrieves only the next batch.
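Here is a sketch of that loop. The `query_page` parameter stands in for a call such as boto3's `client.query(**kwargs)` and is injected so the loop runs without AWS. `fake_query` and its rows are a hypothetical in-memory stand-in for local testing only.

```python
from typing import Callable, Iterator, Optional


# Sketch of Limit/ExclusiveStartKey paging over any query-like callable.
def iterate_pages(query_page: Callable[..., dict], request: dict,
                  page_size: int) -> Iterator[list]:
    start_key: Optional[dict] = None
    while True:
        kwargs = dict(request, Limit=page_size)
        if start_key is not None:
            kwargs["ExclusiveStartKey"] = start_key  # resume after last item
        response = query_page(**kwargs)
        yield response.get("Items", [])
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:  # no key: nothing left to read
            return


# Hypothetical in-memory stand-in for client.query, for local testing only.
ROWS = [{"id": {"N": str(n)}} for n in range(7)]


def fake_query(Limit: int, ExclusiveStartKey: Optional[dict] = None, **_) -> dict:
    start = 0
    if ExclusiveStartKey is not None:
        start = int(ExclusiveStartKey["id"]["N"]) + 1
    page = ROWS[start:start + Limit]
    response = {"Items": page}
    if start + Limit < len(ROWS):
        response["LastEvaluatedKey"] = {"id": page[-1]["id"]}
    return response


pages = list(iterate_pages(fake_query, {"TableName": "Products"}, page_size=3))
print([len(p) for p in pages])  # prints: [3, 3, 1]
```

For a list view, you would usually fetch one page per user action and keep the LastEvaluatedKey between requests instead of looping. Note that DynamoDB can return a LastEvaluatedKey even when no items remain, in which case the next request simply returns an empty page.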
DynamoDB is a powerful NoSQL solution, and when used correctly, it can produce fantastic results. I hope this guide has been helpful in providing solutions to common performance challenges when using DynamoDB.