How to Export PDF Form Data with Python
Why Export PDF Form Data with Python?
The pdfRest Export Form Data API Tool is designed to extract data from fillable forms within a PDF document. This functionality is particularly useful for automating the collection of form data for analysis or integration into other systems. For instance, an organization might use this API to extract survey responses or application form data without manual data entry.
In this tutorial, we will demonstrate how to send an API call to the Export Form Data endpoint using Python. We will walk through the code to understand how it works and how to set up the necessary parameters for the API request.
Python Code Sample for Form Export
from requests_toolbelt import MultipartEncoder
import requests
import json

exported_form_data_endpoint_url = 'https://api.pdfrest.com/exported-form-data'

# Build the multipart/form-data body: the PDF plus the export options.
mp_encoder_exportedFormData = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output': 'example_exportedFormData_out',
        'data_format': 'xml',
    }
)

# Content-Type comes from the encoder so it carries the multipart boundary.
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_exportedFormData.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'  # replace with your pdfRest API key
}

print("Sending POST request to exported-form-data endpoint...")
response = requests.post(exported_form_data_endpoint_url, data=mp_encoder_exportedFormData, headers=headers)

print("Response status code: " + str(response.status_code))

# Pretty-print the JSON result on success; otherwise show the raw error text.
if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent=2))
else:
    print(response.text)
The source of the provided code is available on GitHub.
Breaking Down the Python Code
The code snippet above demonstrates how to call the Export Form Data API endpoint:
from requests_toolbelt import MultipartEncoder
import requests
import json
This imports the necessary libraries: MultipartEncoder (from the requests_toolbelt package) for building multipart/form-data payloads, requests for making HTTP requests, and json for formatting the JSON response.
exported_form_data_endpoint_url = 'https://api.pdfrest.com/exported-form-data'
This sets the URL of the Export Form Data API endpoint.
mp_encoder_exportedFormData = MultipartEncoder(
    fields={
        'file': ('file_name.pdf', open('/path/to/file', 'rb'), 'application/pdf'),
        'output': 'example_exportedFormData_out',
        'data_format': 'xml',
    }
)
This creates a MultipartEncoder object with the fields the API expects:
- file: The PDF file to be processed, given as a (filename, file object, MIME type) tuple. The file must be opened in binary read mode ('rb').
- output: The desired name for the output file.
- data_format: The format in which to export the form data. This example uses 'xml'; see the API Reference for the full list of supported values.
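The field-building step can also be wrapped in a small helper. This is a minimal sketch, not part of the pdfRest sample: build_export_fields is a hypothetical name, and it derives the uploaded filename from the actual path instead of the 'file_name.pdf' placeholder. It uses only the standard library, and the returned dict is meant to be passed to MultipartEncoder(fields=...).

```python
import os
from typing import BinaryIO, Dict, Tuple, Union

# A field value is either a plain string or a (filename, file object, MIME type)
# tuple, which is the shape used for the 'file' part in the sample above.
FieldValue = Union[str, Tuple[str, BinaryIO, str]]


def build_export_fields(pdf_path: str, output_name: str,
                        data_format: str = "xml") -> Dict[str, FieldValue]:
    """Hypothetical helper: build the fields dict for the Export Form Data request.

    The caller owns the returned file handle and should close it after the request.
    """
    pdf_file = open(pdf_path, "rb")  # binary read mode, as in the original sample
    return {
        # Send the real file name rather than a hard-coded placeholder.
        "file": (os.path.basename(pdf_path), pdf_file, "application/pdf"),
        "output": output_name,
        "data_format": data_format,
    }
```

Because the helper opens the file, close fields['file'][1] once the request has been sent, for example in a try/finally block around the requests.post call.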
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_exportedFormData.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
}
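These are the HTTP request headers. Accept asks for a JSON response, Content-Type is taken from the encoder so that it includes the multipart boundary, and Api-Key authenticates the request; replace the placeholder with your own API key.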
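Hard-coding the API key, as the sample does, is fine for a quick test, but the following sketch keeps it out of source code. The build_headers helper and the PDFREST_API_KEY environment variable name are assumptions made for this example; pass mp_encoder_exportedFormData.content_type as the first argument.

```python
import os
from typing import Dict, Optional


def build_headers(content_type: str, api_key: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build request headers, reading the key from the environment.

    PDFREST_API_KEY is a variable name chosen for this sketch, not one pdfRest requires.
    """
    key = api_key or os.environ.get("PDFREST_API_KEY")
    if not key:
        raise RuntimeError("Set PDFREST_API_KEY or pass api_key explicitly")
    return {
        "Accept": "application/json",
        # Use the encoder's content_type so the boundary matches the request body.
        "Content-Type": content_type,
        "Api-Key": key,
    }
```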
response = requests.post(exported_form_data_endpoint_url, data=mp_encoder_exportedFormData, headers=headers)
This sends a POST request to the API endpoint with the multipart form data and headers.
if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent=2))
else:
    print(response.text)
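If response.ok is True (a status code below 400), the code parses the JSON body and prints it with two-space indentation. Otherwise, it prints the raw response text, which typically contains the error details.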
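The success/error branch can be isolated into a pure function so it can be tested without calling the API. This is a sketch under stated assumptions: describe_response is a hypothetical name, and while requests' Response.ok is False for 4xx and 5xx status codes, the sketch simplifies that to treating any code below 400 as success.

```python
import json


def describe_response(status_code: int, body: str) -> str:
    """Hypothetical helper mirroring the sample's if/else on response.ok."""
    if status_code < 400:
        try:
            # Pretty-print JSON bodies, like json.dumps(..., indent=2) in the sample.
            return json.dumps(json.loads(body), indent=2)
        except json.JSONDecodeError:
            # A successful response that is not valid JSON is returned unchanged.
            return body
    # On errors, return the raw body, which typically holds the error message.
    return body
```

With a live response, it would be called as print(describe_response(response.status_code, response.text)).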
Next Steps with pdfRest
We've gone through the process of setting up and making an API call to the pdfRest Export Form Data endpoint using Python. By following this tutorial, you should be able to extract form data from PDFs programmatically.
Feel free to demo all of the pdfRest API Tools in the API Lab at https://pdfrest.com/apilab/ and refer to the API Reference documentation at https://pdfrest.com/documentation/.
Note: This is an example of a multipart API call. Code samples using JSON payloads can be found on GitHub.