How to Summarize PDF Text with Python

Learn how to use the pdfRest Summarize PDF API to generate summaries of PDF text content with Python.

Why Summarize PDF with Python?

The pdfRest Summarize PDF API Tool condenses the text content of PDF documents into shorter summaries. This tutorial walks through sending an API call to the Summarize PDF endpoint using Python. With it, you can pull the key information out of lengthy PDFs without reading them in full.

Imagine a researcher who needs to review hundreds of academic papers. Instead of reading each paper in its entirety, the researcher can use the Summarize PDF API to generate concise summaries, saving time and effort. This tool is particularly useful in fields such as academia, law, and business, where large volumes of text need to be processed and understood quickly.

Summarize PDF with Python Code Example

from requests_toolbelt import MultipartEncoder
import requests
import json

# By default, we use the US-based API service. This is the primary endpoint for global use.
api_url = "https://api.pdfrest.com"

# For GDPR compliance and enhanced performance for European users, you can switch to the EU-based service by uncommenting the URL below.
# For more information visit https://pdfrest.com/pricing#how-do-eu-gdpr-api-calls-work
#api_url = "https://eu-api.pdfrest.com"

endpoint_url = api_url+'/summarized-pdf-text'

# The endpoint can take a single PDF file or id as input.
mp_encoder = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'target_word_count': '100',
    }
)

headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
}

print("Sending POST request to summarized-pdf-text endpoint...")
response = requests.post(endpoint_url, data=mp_encoder, headers=headers)

print("Response status code: " + str(response.status_code))

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent=2))
else:
    print(response.text)

Source: GitHub

Breaking Down the Code

The code begins by importing the libraries it needs: MultipartEncoder from requests_toolbelt for building the multipart form data, requests for making the HTTP request, and json for formatting the JSON response.

api_url = "https://api.pdfrest.com"

This line sets the base URL for the API. By default, it uses the US-based service. An EU-based URL (https://eu-api.pdfrest.com) is included as a commented-out line; uncomment it for GDPR compliance and better performance for European users.

endpoint_url = api_url+'/summarized-pdf-text'

The endpoint_url is constructed by appending the specific endpoint path to the base API URL.
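The switch between the two base URLs can also be written as a small helper instead of a commented-out line. The following is a minimal sketch of one way to do that. The function name build_endpoint_url and the region labels "us" and "eu" are assumptions made for this example; the two base URLs and the endpoint path come from the sample above.

```python
# Base URLs taken from the sample; the "us"/"eu" keys are illustrative labels.
API_BASE_URLS = {
    "us": "https://api.pdfrest.com",
    "eu": "https://eu-api.pdfrest.com",
}


def build_endpoint_url(region="us", path="/summarized-pdf-text"):
    """Return the full endpoint URL for the chosen region (hypothetical helper)."""
    try:
        base = API_BASE_URLS[region]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None
    # Normalize slashes so exactly one separates the base URL and the path.
    return base.rstrip("/") + "/" + path.lstrip("/")


endpoint_url = build_endpoint_url("eu")
print(endpoint_url)
```

Keeping the region in one argument means the rest of the script does not change when you move between the US and EU services.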

mp_encoder = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'target_word_count': '100',
    }
)

This snippet sets up the multipart form data. The fields dictionary includes:

  • file: A tuple containing the filename, file object, and MIME type.
  • target_word_count: Specifies the desired word count for the summary.
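The sample passes target_word_count as the string '100', so a caller holding an integer needs to convert it before building the form. Here is a minimal sketch of that step, assuming a hypothetical helper named build_fields. It returns the same dictionary shape the sample passes to MultipartEncoder and leaves closing the file to the caller. The positive-integer check is a local sanity check only; the range the API accepts is not stated in this tutorial.

```python
import os


def build_fields(file_obj, target_word_count=100):
    """Build the form fields used in the sample (hypothetical helper).

    file_obj must be a file opened in binary mode; the caller closes it.
    """
    count = int(target_word_count)
    # Local sanity check only; not a documented API limit.
    if count <= 0:
        raise ValueError("target_word_count must be a positive integer")
    return {
        # (filename, file object, MIME type), as in the sample.
        "file": (os.path.basename(file_obj.name), file_obj, "application/pdf"),
        # Sent as a string, matching the '100' in the sample.
        "target_word_count": str(count),
    }
```

Opening the PDF in a with block, building the fields from that file object, and sending the request inside the block closes the file handle afterward, which the sample leaves open.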

headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
}

The headers dictionary includes:

  • Accept: Specifies that the response should be in JSON format.
  • Content-Type: Taken from mp_encoder.content_type, which supplies the multipart/form-data type together with the boundary string the encoder generated.
  • Api-Key: A placeholder for the user's API key, which is required for authentication.
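Hard-coding the key is fine for a quick test, but a key kept in source code is easy to leak. One common alternative is to read it from the environment. The sketch below assumes the key is stored in an environment variable named PDFREST_API_KEY; that name is chosen for this example and is not defined by pdfRest, and build_headers is a hypothetical helper.

```python
import os


def build_headers(content_type, env=None):
    """Return request headers with the API key read from the environment.

    PDFREST_API_KEY is a hypothetical variable name used for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("PDFREST_API_KEY")
    if not api_key:
        raise RuntimeError("set PDFREST_API_KEY before running this script")
    return {
        "Accept": "application/json",
        "Content-Type": content_type,
        "Api-Key": api_key,
    }
```

In the sample, the call would be build_headers(mp_encoder.content_type). The optional env argument exists only so the helper can be tried with a plain dictionary.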

response = requests.post(endpoint_url, data=mp_encoder, headers=headers)

This line sends the POST request to the API endpoint with the specified data and headers.

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent=2))
else:
    print(response.text)

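The response is checked for success with response.ok, which is true for status codes below 400. If the request succeeded, the JSON response is pretty-printed; otherwise, the raw response body containing the error message is printed.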
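The error branch prints the raw body, which may or may not be JSON. As a sketch of how both cases could be handled in one place, the hypothetical helper format_response below takes a plain status code and body text, so it can be tried without a network call. It assumes nothing about the response schema, and the "example" key in the demonstration call is a placeholder, not a real API field.

```python
import json


def format_response(status_code, body_text):
    """Return a printable report for an HTTP response (hypothetical helper)."""
    # Mirrors response.ok in requests, which is true for status codes below 400.
    ok = status_code < 400
    try:
        # Pretty-print the body when it parses as JSON.
        body = json.dumps(json.loads(body_text), indent=2)
    except ValueError:
        # Otherwise fall back to the raw text, as the sample does on errors.
        body = body_text
    label = "OK" if ok else "ERROR"
    return f"{label} ({status_code})\n{body}"


print(format_response(200, '{"example": "value"}'))
```

With the sample's variables, the call would be format_response(response.status_code, response.text).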

Beyond the Tutorial

In this tutorial, you learned how to use Python to send a request to the pdfRest Summarize PDF API, allowing you to generate summaries of PDF documents. This is just one of the many tools available in the pdfRest API suite. To explore more, try out the various API Tools in the API Lab. For detailed information on all available endpoints, refer to the API Reference Guide.

Note: This example demonstrates a multipart API call. For code samples using JSON payloads, visit GitHub.
