How to Convert PDF to PDF/A with Python

Learn how to make a document PDF/A-compliant by calling the pdfRest Convert to PDF/A API Tool from Python.

Why Use Convert to PDF/A with Python?

The pdfRest Convert to PDF/A API Tool converts PDF files to PDF/A, an ISO-standardized subset of PDF designed for the long-term preservation of electronic documents. This tutorial walks through sending an API call to Convert to PDF/A using Python.

Converting to PDF/A is particularly useful for archiving: the format is designed to preserve a document's visual appearance over time, so that it can be reliably opened and rendered in the future.

Python Code Sample for PDF/A

from requests_toolbelt import MultipartEncoder
import requests
import json

pdfa_endpoint_url = 'https://api.pdfrest.com/pdfa'

# The /pdfa endpoint can take a single PDF file or id as input.
mp_encoder_pdfa = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output_type': 'PDF/A-1b',
        'rasterize_if_errors_encountered': 'on',
        'output' : 'example_pdfa_out',
    }
)

# Let's set the headers that the pdfa endpoint expects.
# Since MultipartEncoder is used, the 'Content-Type' header gets set to 'multipart/form-data' via the content_type attribute below.
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_pdfa.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' # place your api key here
}

print("Sending POST request to pdfa endpoint...")
response = requests.post(pdfa_endpoint_url, data=mp_encoder_pdfa, headers=headers)

print("Response status code: " + str(response.status_code))

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent = 2))
else:
    print(response.text)

# If you would like to download the file instead of getting the JSON response, please see the 'get-resource-id-endpoint.py' sample.

Source: pdf-rest-api-samples on GitHub
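
The sample's last comment points to a separate script for downloading the converted file. As a rough sketch of that follow-up step, the helpers below pull a download link out of the parsed JSON response and write streamed bytes to disk. The key name outputUrl is an assumption about the response shape (check the API Reference before relying on it), and find_output_url and save_chunks are hypothetical names used only for illustration.

```python
from pathlib import Path


def find_output_url(response_json, key="outputUrl"):
    """Return the download link from a parsed JSON response.

    'outputUrl' is an assumed key name; adjust it to match the real response.
    """
    url = response_json.get(key)
    if not url:
        raise KeyError(f"response has no '{key}' field: {sorted(response_json)}")
    return url


def save_chunks(chunks, destination):
    """Write an iterable of byte chunks to destination; return bytes written."""
    written = 0
    with Path(destination).open("wb") as out_file:
        for chunk in chunks:
            written += out_file.write(chunk)
    return written
```

With the response from the sample, the link could be read with find_output_url(response.json()), and the chunks could come from requests.get(url, stream=True).iter_content(chunk_size=8192), which avoids holding the whole file in memory.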

Breaking Down the Python

The provided code block demonstrates how to make a multipart/form-data POST request to the pdfRest API to convert a PDF to PDF/A format.

from requests_toolbelt import MultipartEncoder
import requests
import json

This imports the necessary modules: MultipartEncoder from requests_toolbelt encodes the multipart form data, requests sends the HTTP request, and json formats the response for printing.

pdfa_endpoint_url = 'https://api.pdfrest.com/pdfa'

This sets the API endpoint URL for the PDF/A conversion.

mp_encoder_pdfa = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output_type': 'PDF/A-1b',
        'rasterize_if_errors_encountered': 'on',
        'output' : 'example_pdfa_out',
    }
)

Here we define the payload for the POST request. The fields include:

  • 'file': The PDF file to convert, given as a (file name, file object, content type) tuple. Replace '/path/to/file' with the actual file path and 'file_name.pdf' with the file's name.
  • 'output_type': The type of PDF/A to convert to (e.g., PDF/A-1b).
  • 'rasterize_if_errors_encountered': If set to 'on', the service will rasterize the PDF if it encounters errors during conversion.
  • 'output': The desired name for the output file.
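
The fields above can also be assembled by a small helper that checks its inputs before anything is uploaded. This is a minimal sketch, not the sample's method: build_pdfa_fields is a hypothetical name, and both the list of output types and the 'off' value are assumptions to confirm against the API Reference (the sample itself only shows 'PDF/A-1b' and 'on').

```python
import io

# Assumed set of accepted values; confirm against the API Reference.
ASSUMED_OUTPUT_TYPES = {"PDF/A-1b", "PDF/A-2b", "PDF/A-2u", "PDF/A-3b", "PDF/A-3u"}


def build_pdfa_fields(file_obj, file_name, output_type="PDF/A-1b",
                      rasterize=True, output="example_pdfa_out"):
    """Return the 'fields' dict for MultipartEncoder, validating inputs first."""
    if output_type not in ASSUMED_OUTPUT_TYPES:
        raise ValueError(f"unsupported output_type: {output_type!r}")
    return {
        "file": (file_name, file_obj, "application/pdf"),
        "output_type": output_type,
        # 'off' is an assumed counterpart to the 'on' value used in the sample.
        "rasterize_if_errors_encountered": "on" if rasterize else "off",
        "output": output,
    }


# In-memory stand-in for open('/path/to/file', 'rb'), so the sketch runs anywhere.
with io.BytesIO(b"%PDF-1.4 placeholder") as pdf_file:
    fields = build_pdfa_fields(pdf_file, "file_name.pdf", output_type="PDF/A-2b")
    print(fields["output_type"], fields["rasterize_if_errors_encountered"])
```

In real use, a with open(path, 'rb') block would wrap both the MultipartEncoder construction and the requests.post call, so the file is closed once the upload finishes.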

headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_pdfa.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
}

These are the headers for the request. Replace 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with your actual API key.
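
Rather than pasting the key into the script, one option is to read it from the environment. The sketch below assumes a variable named PDFREST_API_KEY, which is a hypothetical name chosen for this example, as is the build_headers helper.

```python
import os


def build_headers(content_type, env=None, var_name="PDFREST_API_KEY"):
    """Build request headers, reading the API key from an environment variable.

    PDFREST_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"set the {var_name} environment variable first")
    return {
        "Accept": "application/json",
        "Content-Type": content_type,
        "Api-Key": api_key,
    }
```

In the sample this would be called as headers = build_headers(mp_encoder_pdfa.content_type), which keeps the key out of source control.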

response = requests.post(pdfa_endpoint_url, data=mp_encoder_pdfa, headers=headers)

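This line sends the POST request with the encoded data and headers, then waits for the response. Because requests applies no timeout by default, consider passing a timeout argument (for example, timeout=60) when working with large files or unreliable networks.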

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent = 2))
else:
    print(response.text)

If the request succeeds (response.ok is True for status codes below 400), the JSON response is pretty-printed with two-space indentation. Otherwise, the raw error text is printed.
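
The same branching can be factored into a function that also copes with a success status whose body is not valid JSON, a case where response.json() would raise. This is only a sketch: describe_response is a hypothetical helper, and the sample bodies are made up.

```python
import json


def describe_response(ok, body_text):
    """Return printable text: pretty JSON on success, raw text otherwise."""
    if not ok:
        return f"Request failed: {body_text}"
    try:
        return json.dumps(json.loads(body_text), indent=2)
    except json.JSONDecodeError:
        # A success status with a non-JSON body; show it unchanged.
        return body_text


# Made-up sample bodies, used only to exercise both branches.
print(describe_response(True, '{"outputId": "abc"}'))
print(describe_response(False, "Unauthorized"))
```

With a real response object, the call would be print(describe_response(response.ok, response.text)).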

Beyond this Tutorial

We have now successfully made an API call to the pdfRest Convert to PDF/A endpoint using Python. This allows us to convert PDF documents to the archival PDF/A format programmatically. You can demo all of the pdfRest API Tools in the API Lab and refer to the API Reference documentation for more details.

Note: This is an example of a multipart API call. Code samples using JSON payloads can be found at our GitHub Repository.
