How to Convert PDF to Word with Python
Why Convert PDF to Word with Python?
The pdfRest PDF to Word API Tool converts PDF documents into editable Word documents. This is useful when you need to extract text from a PDF for editing, repurpose its content, or update a document that exists only in PDF format.
In this tutorial, we will explore how to send an API call to the PDF to Word tool using Python.
Convert PDF to Word Python Code Example
from requests_toolbelt import MultipartEncoder
import requests
import json

word_endpoint_url = 'https://api.pdfrest.com/word'

# The /word endpoint can take a single PDF file or id as input.
# This sample demonstrates converting a PDF to a Word document.
mp_encoder_word = MultipartEncoder(
    fields={
        'file': ('file_name', open('/path/to/file', 'rb'), 'application/pdf'),
        'output': 'example_word_out',
    }
)

# Let's set the headers that the word endpoint expects.
# Since MultipartEncoder is used, the 'Content-Type' header gets set to
# 'multipart/form-data' via the content_type attribute below.
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_word.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'  # place your api key here
}

print("Sending POST request to word endpoint...")
response = requests.post(word_endpoint_url, data=mp_encoder_word, headers=headers)

print("Response status code: " + str(response.status_code))

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent=2))
else:
    print(response.text)

# If you would like to download the file instead of getting the JSON response,
# please see the 'get-resource-id-endpoint.py' sample.
The source of the provided code is available on GitHub.
Breaking Down the Code
The code begins by importing the necessary libraries:
from requests_toolbelt import MultipartEncoder
import requests
import json
MultipartEncoder is used to create a multipart/form-data payload, which is required for file uploads. The requests library is used to make HTTP requests, and json is used to handle JSON data.
The API endpoint URL is defined:
word_endpoint_url = 'https://api.pdfrest.com/word'
The MultipartEncoder is configured with the file to be uploaded and the desired output filename:
mp_encoder_word = MultipartEncoder(
    fields={
        'file': ('file_name', open('/path/to/file', 'rb'), 'application/pdf'),
        'output': 'example_word_out',
    }
)
The 'file' field contains a tuple with the filename, file object, and content type. The 'output' field specifies the base name for the output file.
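The field layout described above can be wrapped in a small helper. The following is a minimal sketch rather than part of the pdfRest sample: build_word_fields and send_word_request are hypothetical names, and the sketch assumes you want the upload filename taken from the path and the file handle closed once the request has been sent.

```python
from pathlib import Path


def build_word_fields(pdf_path, file_obj, output_name):
    """Return the form fields for a /word request as a plain dict."""
    return {
        # (filename, file object, content type), matching the tuple in the sample
        'file': (Path(pdf_path).name, file_obj, 'application/pdf'),
        # Base name for the converted output file
        'output': output_name,
    }


def send_word_request(pdf_path, output_name, api_key):
    """Post the PDF to the /word endpoint and return the response object."""
    # Imported here so build_word_fields stays usable without these packages.
    import requests
    from requests_toolbelt import MultipartEncoder

    # The encoder streams the file while the request is sent, so the POST
    # has to happen inside the 'with' block, before the file is closed.
    with open(pdf_path, 'rb') as pdf_file:
        encoder = MultipartEncoder(
            fields=build_word_fields(pdf_path, pdf_file, output_name)
        )
        headers = {
            'Accept': 'application/json',
            'Content-Type': encoder.content_type,
            'Api-Key': api_key,
        }
        return requests.post(
            'https://api.pdfrest.com/word', data=encoder, headers=headers
        )
```

Calling send_word_request('/path/to/file', 'example_word_out', api_key) should send the same request as the sample, with the difference that the file handle is released when the call returns.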
Headers are set to accept JSON responses and to include the API key:
headers = {
    'Accept': 'application/json',
    'Content-Type': mp_encoder_word.content_type,
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'  # place your api key here
}
The API key should be replaced with a valid key provided by pdfRest.
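One way to avoid pasting the key into the script is to read it from the environment. This is a sketch under assumptions: the variable name PDFREST_API_KEY and the helpers load_api_key and build_headers are hypothetical and not defined by pdfRest; only the three header names come from the sample above.

```python
import os


def load_api_key(env_var='PDFREST_API_KEY'):
    """Read the API key from an environment variable (name is an assumption)."""
    key = os.environ.get(env_var, '').strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your pdfRest API key."
        )
    return key


def build_headers(api_key, content_type):
    """Build the same three headers the sample sends to the word endpoint."""
    return {
        'Accept': 'application/json',
        'Content-Type': content_type,
        'Api-Key': api_key,
    }
```

With these in place, the headers in the sample could be built as build_headers(load_api_key(), mp_encoder_word.content_type), which keeps the key out of source control.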
The POST request is sent, and the response is handled:
response = requests.post(word_endpoint_url, data=mp_encoder_word, headers=headers)

if response.ok:
    response_json = response.json()
    print(json.dumps(response_json, indent=2))
else:
    print(response.text)
If the request is successful, the JSON response is printed; otherwise, the error message is displayed.
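The branch described above assumes a successful response always carries valid JSON. A slightly more defensive variant is sketched below; format_response_body is a hypothetical helper, not part of the pdfRest sample, and it makes no assumption about which keys the JSON contains.

```python
import json


def format_response_body(ok, body_text):
    """Return a printable version of a response body.

    Successful responses are pretty-printed as JSON when possible;
    anything else is returned exactly as received.
    """
    if not ok:
        # Error responses are shown as-is, like response.text in the sample
        return body_text
    try:
        return json.dumps(json.loads(body_text), indent=2)
    except ValueError:
        # The body was not valid JSON, so fall back to the raw text
        return body_text
```

In the sample, the if/else block could then be replaced with print(format_response_body(response.ok, response.text)).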
Beyond the Tutorial
In this tutorial, we have learned how to make an API call to the pdfRest PDF to Word API tool using Python. This allows for the conversion of PDF documents to Word format programmatically.
For further exploration and to demo all of the pdfRest API Tools, visit the API Lab. For more detailed information, refer to the API Reference documentation.
Note: This is an example of a multipart API call. Code samples using JSON payloads can be found on GitHub.