How to Crop PDF with Python

Learn how to use the pdfRest Set Page Boxes API with Python to crop PDF files by setting the CropBox.

Why Crop PDF Files with Python?

The pdfRest Set Page Boxes API Tool modifies the page boxes of a PDF document. This tutorial focuses on one of them, the CropBox, which controls the visible content area of a page, and walks through sending an API call to the Set Page Boxes endpoint using Python to crop PDF files.

Imagine you have a PDF document with excessive white space around the content, or perhaps you only want to focus on a specific region of a page. By using the Set Page Boxes API to adjust the CropBox, you can effectively crop the PDF to remove these unwanted areas and highlight the essential information for better viewing or printing.

Crop PDF with Python Code Example

from requests_toolbelt import MultipartEncoder
import requests
import json

pdf_with_page_boxes_endpoint_url = 'https://api.pdfrest.com/pdf-with-page-boxes-set'

# Define the CropBox settings to crop the first page
crop_box_options = {
    "boxes": [
        {
            "box": "crop",
            "pages": [
                {
                    "range": "1",
                    "left": 50,  # Adjust these values to your desired crop
                    "top": 50,
                    "bottom": 50,
                    "right": 50
                }
            ]
        }
    ]
}

mp_encoder_setBoxesPDF = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/your/file.pdf', 'rb'), 'application/pdf'), # Replace with your file path
        'boxes': json.dumps(crop_box_options),
        'output': 'cropped_example_out'
    }
)

headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_setBoxesPDF.content_type,
    'Api-Key': 'YOUR_API_KEY' # Replace with your actual API key
}

print("Sending POST request to pdf-with-page-boxes-set endpoint to crop PDF...")
response = requests.post(pdf_with_page_boxes_endpoint_url, data=mp_encoder_setBoxesPDF, headers=headers)

print("Response status code: " + str(response.status_code))

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent=2))
else:
    print(response.text)

Source: GitHub (Note: This example has been modified to focus on cropping)

Breaking Down the Code for Cropping

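The code starts by importing three libraries: requests to send the HTTP request, MultipartEncoder from requests_toolbelt to build the multipart form body, and json to serialize the box options and print the response.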

pdf_with_page_boxes_endpoint_url = 'https://api.pdfrest.com/pdf-with-page-boxes-set'

This line defines the API endpoint URL for setting page boxes.

crop_box_options = {
    "boxes": [
        {
            "box": "crop",
            "pages": [
                {
                    "range": "1",
                    "left": 50,  # Adjust these values to your desired crop
                    "top": 50,
                    "bottom": 50,
                    "right": 50
                }
            ]
        }
    ]
}

The crop_box_options dictionary configures the crop. The "box" key is set to "crop", which targets the CropBox. The "pages" array applies the setting to the first page ("range": "1"), and the "left", "top", "bottom", and "right" values, given in points (1 point is 1/72 inch), define the visible content area. Adjust these values to achieve the crop you want.
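Because the box values are expressed in points, it can help to convert from more familiar units first. The sketch below assumes the standard PDF unit of 72 points per inch. The helper names inches_to_points and mm_to_points are hypothetical and are not part of the pdfRest API.

```python
POINTS_PER_INCH = 72.0  # standard PDF unit: 1 point = 1/72 inch
MM_PER_INCH = 25.4


def inches_to_points(inches: float) -> float:
    """Convert a length in inches to PDF points."""
    return inches * POINTS_PER_INCH


def mm_to_points(mm: float) -> float:
    """Convert a length in millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# Hypothetical example: use half an inch for every side of page 1.
margin = inches_to_points(0.5)
page_entry = {
    "range": "1",
    "left": margin,
    "top": margin,
    "bottom": margin,
    "right": margin,
}
print(page_entry)
```

The resulting page_entry has the same shape as the entry inside the "pages" array above, so it could be dropped into crop_box_options in place of the hard-coded values.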

mp_encoder_setBoxesPDF = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/your/file.pdf', 'rb'), 'application/pdf'), # Replace with your file path
        'boxes': json.dumps(crop_box_options),
        'output': 'cropped_example_out'
    }
)

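Here, we create a MultipartEncoder to send the PDF file and the crop options as multipart form data. The 'file' field holds the opened PDF, the 'boxes' field carries crop_box_options serialized to a JSON string with json.dumps, and the 'output' field supplies a name for the processed file. Make sure to replace '/path/to/your/file.pdf' with the actual path to your PDF file.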
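Before building the encoder, a quick local check can catch a wrong path early. This is a minimal sketch and not part of the pdfRest sample: the helper name looks_like_pdf is hypothetical, and it assumes the file starts with the usual %PDF- header, which may not hold for every file a PDF reader accepts.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if path is an existing file whose first bytes are '%PDF-'."""
    candidate = Path(path)
    if not candidate.is_file():
        return False
    with candidate.open("rb") as handle:
        return handle.read(5) == b"%PDF-"
```

Calling looks_like_pdf('/path/to/your/file.pdf') before the upload lets the script stop with a clear message instead of sending a request that is likely to fail.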

headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_setBoxesPDF.content_type,
    'Api-Key': 'YOUR_API_KEY' # Replace with your actual API key
}

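The headers dictionary carries the information the API request needs. 'Accept' asks for a JSON response, 'Content-Type' is taken from the encoder so that it includes the multipart boundary, and 'Api-Key' holds your unique API key. Remember to replace 'YOUR_API_KEY' with your actual pdfRest API key.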
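To keep the key out of source code, one option is to read it from an environment variable. The sketch below is an assumption-laden variation on the headers shown above: the function name build_headers and the variable name PDFREST_API_KEY are hypothetical choices, not something pdfRest prescribes.

```python
import os


def build_headers(content_type: str, env_var: str = "PDFREST_API_KEY") -> dict:
    """Build the request headers, reading the API key from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message rather than sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable to your pdfRest API key")
    return {
        "Accept": "application/json",
        "Content-Type": content_type,
        "Api-Key": api_key,
    }
```

With this helper, the headers could be created as build_headers(mp_encoder_setBoxesPDF.content_type) once the variable is set in your shell.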

response = requests.post(pdf_with_page_boxes_endpoint_url, data=mp_encoder_setBoxesPDF, headers=headers)

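This line sends the POST request to the pdfRest API endpoint to crop the PDF using the specified CropBox settings. The full example then prints the status code and, if the request succeeded, the JSON response; otherwise it prints the raw response text so you can see the error.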

Next Steps for Cropping PDFs

This tutorial demonstrated how to use Python and the pdfRest Set Page Boxes API to crop PDF files by setting the CropBox. You can further explore the crop_box_options to apply different cropping settings to various page ranges within your PDF. Experiment with different left, top, bottom, and right values to achieve the precise cropping you need.
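As a starting point for applying different settings to different page ranges, the following sketch builds the same dictionary structure used in this tutorial from a list of tuples. The helper name build_crop_options is hypothetical, and the "2-3" range string is an assumed syntax for illustration; check the API Reference Guide for the range formats the endpoint accepts.

```python
import json


def build_crop_options(page_crops):
    """Build the options dict from (range, left, top, bottom, right) tuples."""
    pages = [
        {"range": page_range, "left": left, "top": top, "bottom": bottom, "right": right}
        for page_range, left, top, bottom, right in page_crops
    ]
    return {"boxes": [{"box": "crop", "pages": pages}]}


crop_box_options = build_crop_options([
    ("1", 50, 50, 50, 50),     # same values as the tutorial example
    ("2-3", 20, 40, 40, 20),   # hypothetical second range with different values
])
print(json.dumps(crop_box_options, indent=2))
```

The returned dictionary can be passed to json.dumps for the 'boxes' field exactly as in the main example.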

To learn more about the Set Page Boxes API and its capabilities, including setting other page boxes, you can demo all of the pdfRest API Tools in the API Lab and refer to the API Reference Guide for detailed documentation.

Note: This example uses a multipart API call. JSON payload examples for setting page boxes, including the CropBox, can be found on GitHub.
