How to Convert PDF to Excel with Python
Why Convert PDF to Excel with Python?
The pdfRest PDF to Excel API Tool lets developers convert PDF documents into Excel spreadsheets programmatically. This tutorial walks through sending an API call to the PDF to Excel endpoint using Python.
There are many reasons to convert PDF to Excel. One example is extracting tabular data from PDF reports to perform data analysis or manipulation within an Excel spreadsheet, which is more suitable for such tasks.
PDF to Excel with Python Code Example
    from requests_toolbelt import MultipartEncoder
    import requests
    import json

    excel_endpoint_url = 'https://api.pdfrest.com/excel'

    # The /excel endpoint can take a single PDF file or id as input.
    # This sample demonstrates converting a PDF to an Excel document.
    mp_encoder_excel = MultipartEncoder(
        fields={
            'file': ('file_name', open('/path/to/file', 'rb'), 'application/pdf'),
            'output': 'example_excel_out',
        }
    )

    # Let's set the headers that the Excel endpoint expects.
    # Since MultipartEncoder is used, the 'Content-Type' header gets set to
    # 'multipart/form-data' via the content_type attribute below.
    headers = {
        'Accept': 'application/json',
        'Content-Type': mp_encoder_excel.content_type,
        'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'  # place your api key here
    }

    print("Sending POST request to excel endpoint...")
    response = requests.post(excel_endpoint_url, data=mp_encoder_excel, headers=headers)

    print("Response status code: " + str(response.status_code))

    if response.ok:
        response_json = response.json()
        print(json.dumps(response_json, indent=2))
    else:
        print(response.text)

    # If you would like to download the file instead of getting the JSON response,
    # please see the 'get-resource-id-endpoint.py' sample.
Source code reference: pdf-rest-api-samples
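The sample above places the API key directly in the headers dictionary. As a minimal sketch of one alternative, the key can be read from an environment variable so it stays out of source control. The variable name PDFREST_API_KEY and the helper load_api_key are assumptions made for this illustration, not names defined by pdfRest.

```python
import os


def load_api_key(env_var="PDFREST_API_KEY"):
    """Return the API key stored in env_var (a hypothetical variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your pdfRest API key."
        )
    return key
```

The returned value can then be used for the 'Api-Key' entry in the headers dictionary in place of the placeholder string.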
Breaking Down the Python Code
The code above demonstrates how to use the pdfRest API to convert a PDF file to an Excel spreadsheet using Python:
- MultipartEncoder is used to encode the PDF file and the desired output name as multipart form-data for the POST request.
- The fields dictionary contains the file to be converted and the desired output file name. The 'file' key expects a tuple with the filename, file object, and content type.
- The headers dictionary sets the required headers, including the 'Accept' header specifying the response format, the 'Content-Type' header set by the MultipartEncoder, and the 'Api-Key' header for authentication.
- A POST request is sent to the excel_endpoint_url with the encoded data and headers.
- The response is checked for success (response.ok), and the JSON response is printed. If the request fails, the error text is printed instead.
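The steps above can also be wrapped in a reusable function. The following is a sketch under stated assumptions rather than part of the official sample: the names build_headers and convert_pdf_to_excel and the 60-second default timeout are chosen here for illustration. It sends the same fields and headers as the sample, but opens the PDF in a with block so the file handle is closed once the request finishes.

```python
import os

EXCEL_ENDPOINT_URL = "https://api.pdfrest.com/excel"


def build_headers(api_key, content_type):
    """Return the three headers the sample sends with the POST request."""
    return {
        "Accept": "application/json",
        "Content-Type": content_type,
        "Api-Key": api_key,
    }


def convert_pdf_to_excel(pdf_path, output_name, api_key, timeout=60):
    """Hypothetical helper: POST one PDF to the /excel endpoint.

    Returns the requests.Response object so the caller can inspect it.
    """
    # Imported here so build_headers can be used on its own
    # without the HTTP libraries installed.
    import requests
    from requests_toolbelt import MultipartEncoder

    # The context manager closes the file even if the request raises.
    with open(pdf_path, "rb") as pdf_file:
        encoder = MultipartEncoder(
            fields={
                "file": (os.path.basename(pdf_path), pdf_file, "application/pdf"),
                "output": output_name,
            }
        )
        headers = build_headers(api_key, encoder.content_type)
        # The encoder is streamed as the request body while the file is open.
        return requests.post(
            EXCEL_ENDPOINT_URL, data=encoder, headers=headers, timeout=timeout
        )
```

A call such as convert_pdf_to_excel('/path/to/file', 'example_excel_out', api_key) returns the response, which can be checked with response.ok and printed exactly as in the sample.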
Beyond the Tutorial
In this tutorial, we sent an API request to convert a PDF to an Excel spreadsheet using the pdfRest API. This is particularly useful for automating the extraction of data from PDFs into a more flexible format like Excel. To explore further, try the other pdfRest API Tools in the API Lab and consult the API Reference documentation.
Note: This is an example of a multipart API call. Code samples using JSON payloads can be found at pdf-rest-api-samples.