Although technology has seen an explosion of innovation, the PDF format still dominates when you need to replicate paper in the digital sphere. Any enterprise-scale website project will eventually require you to provide dynamic information in PDF format. There are services that offer to turn your HTML into PDFs, but if you don't want another bill and the risk of outsourcing vital functionality, you can set up the tools for creating PDFs from HTML yourself.
Some of the most popular tools for quickly turning a website into a PDF are WeasyPrint and wkhtmltopdf. These are available for free, but setting them up can be difficult depending on your server operating system and language stack. For example, if you need the latest version of WeasyPrint but are running an older version of Python for your web application, you're looking at a significant software update or running multiple Python environments.
If you are deploying to many servers, you'll need to install these libraries and their dependencies across your entire fleet. Alternatively, you could install the PDF-generating program on a single server with all of the correct dependencies, but that means you have a server running only to make PDFs. If you generate PDFs every few seconds, that's probably a reasonable decision, but if generating a PDF only happens once a day, such as when a new product is added, this is a waste of resources.
If you want a PDF-generating service that is under your control, available any time, and only running when you need it, AWS Lambda is a natural fit. The only question is, how do you install the dependencies you need to generate a PDF on Lambda? Fortunately, the hard work has already been done for you.
The Cloud Print Utils project has taken the time to package the tools you need into one convenient bundle. All you need is Docker. Let's walk through the steps to making your own working PDF generator on AWS Lambda.
Step One - Clone the project from GitHub
git clone https://github.com/kotify/cloud-print-utils.git
Step Two - Build the layer
This will download a Lambda Docker image and install the libraries you need for generating PDFs, then save that to a zip file you can use as a layer in Lambda. (You may need to use sudo if your user isn't part of the
Step Three - Add your layer file to your Lambda account
The layer file, named weasyprint-layer-python3.8.zip, will be stored in the
If you have AWS CLI installed, you can upload the layer with the following command:
aws lambda publish-layer-version --region <region> --layer-name <name> --zip-file fileb://build/weasyprint-layer-python3.8.zip
Don't forget to replace <region> with your region identifier and <name> with a memorable name such as
If you are not using AWS CLI, you can also upload your layer through the AWS website. Go to your AWS Lambda console and select "Layers" in the left sidebar. Click the "Create layer" button in the top right, and upload your zip file.
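If you would rather script the upload from Python, the same operation is exposed by boto3 as publish_layer_version. The following is a minimal sketch, assuming boto3 is installed and your AWS credentials are already configured; the helper names and the default zip path constant are illustrative, not part of the Cloud Print Utils project.

```python
from pathlib import Path

# Path used in the CLI command above; adjust if your build output lives elsewhere.
LAYER_ZIP = Path("build/weasyprint-layer-python3.8.zip")


def layer_request(name: str, zip_bytes: bytes) -> dict:
    """Build the keyword arguments for Lambda's PublishLayerVersion call."""
    return {
        "LayerName": name,
        "Content": {"ZipFile": zip_bytes},
        "CompatibleRuntimes": ["python3.8"],
    }


def publish_layer(name: str, region: str, zip_path: Path = LAYER_ZIP) -> str:
    """Upload the zip as a new layer version and return the new version's ARN."""
    import boto3  # imported lazily so layer_request() works without boto3 installed

    client = boto3.client("lambda", region_name=region)
    response = client.publish_layer_version(
        **layer_request(name, zip_path.read_bytes())
    )
    return response["LayerVersionArn"]
```

Calling publish_layer("my-pdf-layer", "us-east-1") (both values are placeholders) would upload the file and return the ARN you can later attach to your function.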
Step Four - Configure your Lambda function
From the Lambda console on the AWS website, click "Functions" in the left sidebar, then click the "Create function" button.
Set whatever name you like, but make sure you select the Python 3.8 runtime, to match the layer you created. Python 3.8 is the newest Python runtime offered in Lambda at this time. Click "Create function".
Once your function is created, you will need to link it to the layer you uploaded. Click "Layers" under your function name, which will open the Layers section below. Click the "Add a layer" button, and then select the "Custom layers" group. You can find the layer you uploaded in the "Custom layers" select box. You should only have one version of your layer, so select that, and click the "Add" button.
Then, you need to set a few environment variables for your function. Scroll down to the "Environment variables" section of the page and click "Manage environment variables". You need to add three environment variables in order for the layer to function:
- GDK_PIXBUF_MODULE_FILE: /opt/lib/loaders.cache
- FONTCONFIG_PATH: /opt/fonts
- XDG_DATA_DIRS: /opt/lib
Once you've added them, hit "Save".
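A typo in one of these values tends to surface later as a confusing font or image-loading error. As an optional safeguard, you could check the variables at the top of your function. This is a small sketch rather than anything the project ships; the function name is made up for illustration.

```python
import os

# Names and values copied from the three variables listed above.
REQUIRED_ENV = {
    "GDK_PIXBUF_MODULE_FILE": "/opt/lib/loaders.cache",
    "FONTCONFIG_PATH": "/opt/fonts",
    "XDG_DATA_DIRS": "/opt/lib",
}


def misconfigured_variables(environ=None) -> list:
    """Return the sorted names of variables that are missing or have a different value."""
    env = os.environ if environ is None else environ
    return sorted(
        name for name, expected in REQUIRED_ENV.items() if env.get(name) != expected
    )
```

An empty list means all three variables are set as expected; anything else names the variables to revisit in the console.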
Step Five - Make your function accessible
You need to be able to trigger your function to use it. If you plan to access it across the internet, an API Gateway should be set up. Click on the "Add trigger" button and select "API Gateway". Configure your gateway and click "Add". This will add a gateway to your project. Click on it to find your function's URL. You'll need that later.
Step Six - Write the function
The Cloud Print Utils project provides a ready-made Lambda function for you. You can view it in the repository you cloned at weasyprint/lambda_function.py. It supports writing its output to S3 or returning it, as well as generating PDFs and PNGs from raw HTML or links. Copy the contents of that Python file into AWS's Lambda editor. Then click "Deploy" above the function editor.
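To get a feel for what a function like this does, here is a stripped-down sketch of a handler that renders a PDF with WeasyPrint and returns it base64-encoded. It is not the project's code: the helper name, the error handling, and the response shape (the API Gateway proxy format) are assumptions made for illustration, and it leaves out S3 storage and PNG output entirely.

```python
import base64


def source_kwargs(event: dict) -> dict:
    """Pick the WeasyPrint input: a URL if one is given, otherwise raw HTML."""
    if event.get("url"):
        return {"url": event["url"]}
    if event.get("html"):
        return {"string": event["html"]}
    raise ValueError("event needs either 'url' or 'html'")


def lambda_handler(event, context):
    """Render the requested source to PDF and return it base64-encoded."""
    from weasyprint import HTML  # supplied by the layer at runtime

    # write_pdf() with no target returns the document as bytes.
    pdf_bytes = HTML(**source_kwargs(event)).write_pdf()
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/pdf"},
        "isBase64Encoded": True,
        "body": base64.b64encode(pdf_bytes).decode("ascii"),
    }
```

The import sits inside the handler only so that the source-selection helper can be exercised on a machine without WeasyPrint installed; in a deployed function you would normally import it at module level.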
Step Seven - Generate your PDF
Now, all you need to do is send a POST request to your Lambda function to turn HTML into a PDF. The default function accepts the following arguments:
- filename: REQUIRED. The name of the file that will be returned or stored
- url: The URL of an HTML resource to be used as the source
- html: Raw HTML to be used as the source. If used with url, this will be ignored
- return: Sends the PDF as a base64 encoded response if set to 'base64'. If return is not set or has any other value, the file will be saved to S3, provided you've set a
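The arguments above can be assembled into a request using only the Python standard library. Treat the following as a sketch under a few assumptions: the endpoint is the API Gateway URL from Step Five, the gateway passes the JSON body through to the function, and the response body is the base64 text itself. Depending on how your gateway is configured, the response may be wrapped differently, so adjust decode_pdf to match. All function names here are hypothetical.

```python
import base64
import json
import urllib.request


def build_payload(filename, url=None, html=None, return_base64=True) -> dict:
    """Assemble the JSON arguments, sending only one of url or html."""
    if not filename:
        raise ValueError("filename is required")
    payload = {"filename": filename}
    if url:
        payload["url"] = url  # html would be ignored anyway when url is set
    elif html:
        payload["html"] = html
    else:
        raise ValueError("provide either url or html")
    if return_base64:
        payload["return"] = "base64"
    return payload


def request_pdf(endpoint: str, payload: dict, timeout: float = 30.0) -> bytes:
    """POST the payload as JSON and return the raw response body."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def decode_pdf(body: bytes) -> bytes:
    """Decode a base64 response body into PDF bytes (assumes the body is plain base64)."""
    return base64.b64decode(body)
```

A typical call would build a payload with build_payload("invoice.pdf", html="<h1>Hello</h1>"), pass it to request_pdf along with your function's URL, and write the result of decode_pdf to disk.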
Supporting PDFs doesn't have to take a week. Most of the work has already been done for you. All you need to do is customize your function for your specific needs, and let AWS Lambda take care of the rest.