How to Convert a PDF to Excel with Python

Why Convert PDF to Excel with Python?

The pdfRest PDF to Excel API Tool lets developers convert PDF documents into Excel spreadsheets programmatically. This tutorial walks through sending an API call to the PDF to Excel API using Python.

A common reason to convert PDF to Excel is to extract tabular data from PDF reports so that it can be analyzed or manipulated in a spreadsheet, which is better suited to such tasks.

PDF to Excel with Python Code Example

from requests_toolbelt import MultipartEncoder
import requests
import json

excel_endpoint_url = 'https://api.pdfrest.com/excel'

# The /excel endpoint can take a single PDF file or id as input.
# This sample demonstrates converting a PDF to an Excel document.
mp_encoder_excel = MultipartEncoder(
    fields={
        'file': ('file_name', open('/path/to/file', 'rb'), 'application/pdf'),
        'output' : 'example_excel_out',
    }
)

# Let's set the headers that the Excel endpoint expects.
# Since MultipartEncoder is used, the 'Content-Type' header gets set to 'multipart/form-data' via the content_type attribute below.
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_excel.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' # place your api key here
}

print("Sending POST request to excel endpoint...")
response = requests.post(excel_endpoint_url, data=mp_encoder_excel, headers=headers)

print("Response status code: " + str(response.status_code))

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent = 2))
else:
    print(response.text)

# If you would like to download the file instead of getting the JSON response, please see the 'get-resource-id-endpoint.py' sample.

Source code reference: pdf-rest-api-samples
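
The sample above prints the JSON response rather than saving the converted file. As a rough sketch of the download step, assuming the JSON response carries a download link under a key named 'outputUrl' (an assumption; check the response printed by the sample, and note that the download may require additional headers), the file could be streamed to disk as shown here. The helper names output_url_from and download_output and the 60-second timeout are hypothetical and are not part of the pdfRest samples.

```python
import requests


def output_url_from(response_json):
    """Return the download URL from a conversion response, or None if absent.

    Assumption: the URL is stored under the key 'outputUrl'.
    """
    return response_json.get("outputUrl")


def download_output(response_json, destination_path):
    """Stream the converted file to destination_path and return that path."""
    url = output_url_from(response_json)
    if url is None:
        raise KeyError("no 'outputUrl' key found in the response")

    # stream=True avoids loading the whole spreadsheet into memory at once.
    with requests.get(url, stream=True, timeout=60) as download:
        download.raise_for_status()
        with open(destination_path, "wb") as out_file:
            for chunk in download.iter_content(chunk_size=8192):
                out_file.write(chunk)
    return destination_path
```

With the sample's variables, this could be called as download_output(response_json, 'example_excel_out.xlsx') inside the response.ok branch.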

Breaking Down the Python Code

The code above demonstrates how to use the pdfRest API to convert a PDF file to an Excel spreadsheet using Python:

MultipartEncoder is used to encode the PDF file and the desired output name as multipart form-data for the POST request.
The fields dictionary contains the file to be converted and the desired output file name. The 'file' key expects a tuple with the filename, file object, and content type.
The headers dictionary sets the required headers, including the 'Accept' header specifying the response format, the 'Content-Type' header set by the MultipartEncoder, and the 'Api-Key' header for authentication.
A POST request is sent to the excel_endpoint_url with the encoded data and headers.
The response is checked for success (response.ok), and the JSON response is printed. If the request fails, the error text is printed instead.
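
The steps above can also be sketched with the files= and data= arguments built into requests instead of MultipartEncoder. This is an alternative under stated assumptions, not the method used by the sample: the helper name build_excel_request and the placeholder file name input.pdf are hypothetical, and unlike MultipartEncoder this approach reads the whole file into memory when the request is prepared. The request is only prepared here, not sent, so its headers and body can be inspected first.

```python
import requests

EXCEL_ENDPOINT_URL = "https://api.pdfrest.com/excel"


def build_excel_request(pdf_file, output_name, api_key):
    """Prepare (but do not send) a multipart POST to the excel endpoint.

    pdf_file is a binary file object opened by the caller.
    """
    request = requests.Request(
        "POST",
        EXCEL_ENDPOINT_URL,
        # No 'Content-Type' here: requests generates the multipart
        # header, including the boundary, when files= is used.
        headers={"Accept": "application/json", "Api-Key": api_key},
        # Same tuple layout as the sample: file name, file object, content type.
        files={"file": ("input.pdf", pdf_file, "application/pdf")},
        data={"output": output_name},
    )
    return request.prepare()
```

To send the prepared request, pass it to requests.Session().send(prepared, timeout=60), where the timeout value is illustrative. Opening the PDF in a with open(path, 'rb') block before calling the helper ensures the file handle is closed afterwards.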

Beyond the Tutorial

In this tutorial, we sent an API request to convert a PDF to an Excel spreadsheet using the pdfRest API. This is particularly useful for automating the extraction of data from PDFs into a more flexible format like Excel. You can demo all of the pdfRest API Tools in the API Lab and consult the API Reference documentation for further details.

Note: This is an example of a multipart API call. Code samples using JSON payloads can be found at pdf-rest-api-samples.
