How to Rasterize PDF Files with Python, Tutorial

Share this page

Why Rasterize PDF Files with Python?

The pdfRest Rasterize PDF API Tool is a powerful utility designed to convert PDF documents into rasterized images. This tutorial will guide you through the process of sending an API call to the Rasterize PDF endpoint using Python, enabling you to automate this task within your applications or workflows. By utilizing this tool, you can easily transform vector-based PDFs into raster images, which can be beneficial for various use cases.

You might need to rasterize a PDF to ensure that all elements, such as fonts and vector graphics, are flattened into a single image layer. This can be particularly useful for ensuring consistent rendering across different devices and platforms, or for preparing a document for printing or archiving where vector elements might otherwise be lost or altered.

Rasterize PDF with Python Code Example

from requests_toolbelt import MultipartEncoder
import requests
import json

rasterize_endpoint_url = 'https://api.pdfrest.com/rasterized-pdf'

# The /rasterized endpoint can take a single PDF file or id as input and turn it into a rasterized PDF file.
mp_encoder_rasterize = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output' : 'example_rasterize_out',
    }
)

# Let's set the headers that the rasterized-pdf endpoint expects.
# Since MultipartEncoder is used, the 'Content-Type' header gets set to 'multipart/form-data' via the content_type attribute below.
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_rasterize.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' # place your api key here
}

print("Sending POST request to rasterized-pdf endpoint...")
response = requests.post(rasterize_endpoint_url, data=mp_encoder_rasterize, headers=headers)

print("Response status code: " + str(response.status_code))

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent = 2))
else:
    print(response.text)

# If you would like to download the file instead of getting the JSON response, please see the 'get-resource-id-endpoint.py' sample.

Source: GitHub

Breaking Down the Code

The code begins by importing necessary libraries: MultipartEncoder from requests_toolbelt, requests, and json. These are used for handling multipart form data, making HTTP requests, and parsing JSON responses, respectively.

rasterize_endpoint_url = 'https://api.pdfrest.com/rasterized-pdf'

This line defines the endpoint URL for the Rasterize PDF API.

mp_encoder_rasterize = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output' : 'example_rasterize_out',
    }
)

The MultipartEncoder is used to encode the file and other data into a format suitable for a multipart form submission. The fields dictionary includes:

file: The PDF file to be rasterized, specified with its filename, opened in binary read mode, and its MIME type.
output: The desired output file name for the rasterized PDF.

headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_rasterize.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' # place your api key here
}

The headers dictionary sets the necessary HTTP headers, including:

Accept: Specifies that the client expects a JSON response.
Content-Type: Automatically set to multipart/form-data by the MultipartEncoder.
Api-Key: Your unique API key for authentication.

response = requests.post(rasterize_endpoint_url, data=mp_encoder_rasterize, headers=headers)

This line sends a POST request to the Rasterize PDF endpoint using the URL, encoded data, and headers.

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent = 2))
else:
    print(response.text)

Finally, the response is checked. If successful, the JSON response is printed; otherwise, the error text is displayed.

Beyond the Tutorial

In this tutorial, you learned how to use Python to send an API call to the pdfRest Rasterize PDF endpoint, which converts each page of a PDF into a rasterized image. This process can be integrated into larger applications or automated workflows. To explore more functionalities, try out the pdfRest API Tools in the API Lab and refer to the API Reference Guide for detailed documentation.

Note: This example uses a multipart API call. For code samples using JSON payloads, visit here.