How to Convert PDF to PDF/A with Python
Why Convert PDF to PDF/A with Python?
The pdfRest Convert to PDF/A API Tool converts PDF files to PDF/A, an ISO-standardized version of PDF designed for the long-term digital preservation of electronic documents. This tutorial walks you through sending an API call to Convert to PDF/A using Python.
This can be particularly useful for archiving documents in a way that preserves their visual appearance over time, ensuring that they can be reliably accessed and rendered in the future.
Python Code Sample for PDF/A
from requests_toolbelt import MultipartEncoder
import requests
import json

pdfa_endpoint_url = 'https://api.pdfrest.com/pdfa'

# The /pdfa endpoint can take a single PDF file or id as input.
mp_encoder_pdfa = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output_type': 'PDF/A-1b',
        'rasterize_if_errors_encountered': 'on',
        'output': 'example_pdfa_out',
    }
)

# Let's set the headers that the pdfa endpoint expects.
# Since MultipartEncoder is used, the 'Content-Type' header gets set to 'multipart/form-data' via the content_type attribute below.
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_pdfa.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'  # place your api key here
}

print("Sending POST request to pdfa endpoint...")
response = requests.post(pdfa_endpoint_url, data=mp_encoder_pdfa, headers=headers)

print("Response status code: " + str(response.status_code))

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent=2))
else:
    print(response.text)

# If you would like to download the file instead of getting the JSON response, please see the 'get-resource-id-endpoint.py' sample.
Source: pdf-rest-api-samples on GitHub
Breaking Down the Python Code
The provided code block demonstrates how to make a multipart/form-data POST request to the pdfRest API to convert a PDF to PDF/A format.
from requests_toolbelt import MultipartEncoder
import requests
import json
This imports the necessary modules: MultipartEncoder from requests_toolbelt encodes the multipart form data, requests sends the HTTP request, and json is used to pretty-print the response.
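To see roughly what MultipartEncoder builds for you, here is a minimal standard-library sketch that assembles a multipart/form-data body by hand. It is for illustration only: `encode_multipart` is a hypothetical helper, not part of requests_toolbelt, and it assumes small in-memory files, whereas the real encoder can stream file objects instead of holding everything in memory.

```python
import uuid


def encode_multipart(fields):
    """Build a multipart/form-data body (hypothetical helper, for illustration).

    `fields` maps form names to either a plain string value or a
    (filename, file_bytes, content_type) tuple, mirroring the shape
    passed to MultipartEncoder in the sample.
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        # Each part starts with "--" followed by the boundary.
        lines.append(f"--{boundary}".encode())
        if isinstance(value, tuple):
            filename, data, content_type = value
            lines.append(
                f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode()
            )
            lines.append(f"Content-Type: {content_type}".encode())
            lines.append(b"")  # blank line separates part headers from content
            lines.append(data)
        else:
            lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
            lines.append(b"")
            lines.append(str(value).encode())
    # The closing delimiter carries two trailing hyphens.
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    body = b"\r\n".join(lines)
    # The boundary must be advertised in the Content-Type header.
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned content type is why the sample passes `mp_encoder_pdfa.content_type` as the 'Content-Type' header: the server needs the boundary string to split the body back into fields.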
pdfa_endpoint_url = 'https://api.pdfrest.com/pdfa'
This sets the API endpoint URL for the PDF/A conversion.
mp_encoder_pdfa = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output_type': 'PDF/A-1b',
        'rasterize_if_errors_encountered': 'on',
        'output': 'example_pdfa_out',
    }
)
Here we define the payload for the POST request. The fields include:
'file': The PDF file to convert. Replace '/path/to/file' with the actual file path.
'output_type': The PDF/A version and conformance level to convert to (e.g., PDF/A-1b).
'rasterize_if_errors_encountered': If set to 'on', the service will rasterize the PDF if it encounters errors during conversion.
'output': The desired name for the output file.
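If you build this payload in more than one place, the fields can be assembled by a small function. The sketch below uses a hypothetical helper, `build_pdfa_fields` (our own name, not part of pdfRest or requests_toolbelt), and only the field names shown in the sample. As an assumption, it simply omits 'rasterize_if_errors_encountered' when rasterizing is not wanted, rather than guessing at any other value the API might accept.

```python
def build_pdfa_fields(file_name, file_obj, output_type="PDF/A-1b", rasterize=True, output=None):
    """Return a fields dict shaped like the one passed to MultipartEncoder.

    Hypothetical helper: field names come from the pdfRest sample, and the
    defaults are illustrative. `file_obj` may be an open binary file or bytes;
    the caller is responsible for closing any file it opened.
    """
    fields = {
        "file": (file_name, file_obj, "application/pdf"),
        "output_type": output_type,
    }
    # Only send the rasterize flag when it is wanted ('on' is the value shown in the sample).
    if rasterize:
        fields["rasterize_if_errors_encountered"] = "on"
    # Let the service pick a default name when no output name is given.
    if output:
        fields["output"] = output
    return fields
```

A usage pattern that also closes the input file is `with open('/path/to/file', 'rb') as pdf:` followed by `MultipartEncoder(fields=build_pdfa_fields('file_name.pdf', pdf))` and the `requests.post` call inside the same block, so the file stays open only while the request is sent.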
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_pdfa.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
}
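These are the headers for the request: 'Accept' asks for a JSON response, 'Content-Type' is taken from the encoder so that it includes the multipart boundary, and 'Api-Key' authenticates the call. Replace 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with your actual API key.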
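Hardcoding the key in a script makes it easy to commit by accident. A common alternative is to read it from an environment variable. This sketch assumes a variable named PDFREST_API_KEY, which is our own naming choice rather than anything pdfRest requires, and `build_headers` is likewise a hypothetical helper.

```python
import os


def build_headers(content_type, env_var="PDFREST_API_KEY"):
    """Build the request headers, reading the API key from the environment.

    PDFREST_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError(f"Set the {env_var} environment variable to your pdfRest API key")
    return {
        "Accept": "application/json",
        "Content-Type": content_type,
        "Api-Key": api_key,
    }
```

In the sample, this would replace the literal dictionary with `headers = build_headers(mp_encoder_pdfa.content_type)`.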
response = requests.post(pdfa_endpoint_url, data=mp_encoder_pdfa, headers=headers)
This line sends the POST request with the encoded data and headers.
if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent=2))
else:
    print(response.text)
If the request is successful, the response is printed in a formatted JSON structure. If not, the error text is printed.
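For scripts that do more than print, you may prefer to turn the response into either a parsed dictionary or an exception. The sketch below works on the status code and body text (for example `response.status_code` and `response.text`), so it can be tested without a network call. `PdfRestError` and `parse_pdfa_response` are hypothetical names, and since this tutorial does not describe which keys a successful pdfRest response contains, the helper returns the whole parsed object rather than assuming particular fields.

```python
import json


class PdfRestError(Exception):
    """Raised for an error response (hypothetical class for this sketch)."""

    def __init__(self, status_code, body):
        super().__init__(f"HTTP {status_code}: {body}")
        self.status_code = status_code
        self.body = body


def parse_pdfa_response(status_code, body_text):
    """Return the parsed JSON body on success, otherwise raise PdfRestError."""
    # requests' response.ok is False for 4xx and 5xx status codes; mirror that check.
    if status_code >= 400:
        raise PdfRestError(status_code, body_text)
    return json.loads(body_text)
```

With the sample's `response` object, this would be called as `result = parse_pdfa_response(response.status_code, response.text)`.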
Beyond this Tutorial
We have now successfully made an API call to the pdfRest Convert to PDF/A endpoint using Python. This allows us to convert PDF documents to the archival PDF/A format programmatically. You can demo all of the pdfRest API Tools in the API Lab and refer to the API Reference documentation for more details.
Note: This is an example of a multipart API call. Code samples using JSON payloads can be found at our GitHub Repository.