How to Linearize PDF with Python, Tutorial

Share this page

Why Linearize PDF with Python?

The pdfRest Linearize PDF API Tool is designed to optimize PDF files for faster web viewing. Linearization, also known as "web optimization" or "fast web view," restructures a PDF document so that its pages can be displayed one by one as they are downloaded, rather than waiting for the entire file to be downloaded. This is particularly useful for large PDFs that need to be viewed online, where users can start reading the PDF before the entire document has finished downloading.

In this tutorial, we will demonstrate how to send an API call to the Linearize PDF endpoint using Python. This can be beneficial for developers who want to integrate PDF linearization into their web applications or services to enhance user experience.

Python Code for Linearization

from requests_toolbelt import MultipartEncoder
import requests
import json

linearized_pdf_endpoint_url = 'https://api.pdfrest.com/linearized-pdf'

# The /linearized-pdf endpoint can take a single PDF file or id as input.
# This sample demonstrates linearizing a PDF file.
mp_encoder_linearizedPdf = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output' : 'example_linearizedPdf_out',
    }
)

# Let's set the headers that the linearized-pdf endpoint expects.
# Since MultipartEncoder is used, the 'Content-Type' header gets set to 'multipart/form-data' via the content_type attribute below.
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_linearizedPdf.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' # place your api key here
}

print("Sending POST request to linearized-pdf endpoint...")
response = requests.post(linearized_pdf_endpoint_url, data=mp_encoder_linearizedPdf, headers=headers)

print("Response status code: " + str(response.status_code))

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent = 2))
else:
    print(response.text)

# If you would like to download the file instead of getting the JSON response, please see the 'get-resource-id-endpoint.py' sample.

Source: pdf-rest-api-samples on GitHub

Breaking Down the Python

The code block above demonstrates how to make a multipart/form-data POST request to the pdfRest Linearize PDF API endpoint using Python. Let's break down the key parts of the code:

from requests_toolbelt import MultipartEncoder
import requests
import json

This imports the necessary modules: MultipartEncoder for creating multipart-encoded files, requests for making HTTP requests, and json for handling JSON data.

mp_encoder_linearizedPdf = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output' : 'example_linearizedPdf_out',
    }
)

The MultipartEncoder object is created with a dictionary specifying the fields to be sent. The 'file' field contains a tuple with the file name, the file object opened in binary read mode, and the MIME type. The 'output' field specifies the name of the output file.

headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_linearizedPdf.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
}

Headers are set for the request, including the 'Accept' header indicating the response format, the 'Content-Type' header automatically set by the MultipartEncoder, and the 'Api-Key' header where you must insert your own API key.

response = requests.post(linearized_pdf_endpoint_url, data=mp_encoder_linearizedPdf, headers=headers)

A POST request is sent to the linearized-pdf endpoint with the encoded data and headers. The response is stored in the response variable.

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent = 2))
else:
    print(response.text)

If the request is successful, the JSON response is printed formatted. If not, the error text is printed.

So Much More with pdfRest

We have covered how to use Python to make an API call to the pdfRest Linearize PDF endpoint, including how to prepare the data, set the headers, and handle the response. This allows developers to easily integrate PDF linearization into their applications for a better user experience when viewing PDFs online.

I encourage you to demo all of the pdfRest API Tools in the API Lab and refer to the API Reference documentation for more information.

Note: This is an example of a multipart API call. Code samples using JSON payloads can be found at pdf-rest-api-samples on GitHub.