
Extract Text

Extract Text is a REST API tool that extracts all text from PDF documents, optionally including style and position information.

Try Now with API Lab

Start right from your browser - upload files, choose parameters, generate code, and send API Calls directly from API Lab!  

Parameters
Required Parameters
POST /extracted-text
curl -X POST "https://api.pdfrest.com/extracted-text" \
  -H "Accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -H "Api-Key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
  -F "file=@/path/to/file"
Response
Once you've sent your POST request and received a valid response, you can download your output file using the output URL.

Why is pdfRest the best API to extract text from PDF?

pdfRest offers the best solution for extracting text from PDF documents because it preserves positional data, includes text style information, and makes the data held in PDFs available to batch and automated workflows.

Preserve Positional Data

Unlike most PDF text extraction tools, Extract Text by pdfRest can optionally include page and coordinate metadata for each word extracted from the PDF in easy-to-parse JSON format. Simply turn on the word_coordinates parameter.

This data is essential if you're aiming to preserve the position of text in a different file format or create a PDF viewer with searchable and selectable text. If you don't need this extra information, it's just as easy to turn off.
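
As a rough sketch of turning positional data on, the function below adds a word_coordinates form field to the request. It assumes the parameter takes the value on, the same way word_style does, and that the API Key lives in a hypothetical PDFREST_API_KEY environment variable; the function name and input path are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: request page and coordinate metadata for each extracted word.
# Assumption: word_coordinates takes the value "on", like word_style.
# PDFREST_API_KEY is a hypothetical environment variable holding your key.
extract_with_coordinates() {
  local input_pdf="$1"
  curl -X POST "https://api.pdfrest.com/extracted-text" \
    -H "Accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -H "Api-Key: ${PDFREST_API_KEY}" \
    -F "file=@${input_pdf}" \
    -F "word_coordinates=on"
}

# Usage: PDFREST_API_KEY=your-key extract_with_coordinates /path/to/input.pdf
```

Leaving the field out of the request should give the default behavior without coordinate metadata.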

Include Text Style Info

Turn on the word_style option to include detailed style information about each word extracted from the PDF, including font, size, color, and color space.

This optional metadata supports use cases that require reproducing the original document's text appearance in another format or user interface. It can be combined with word_coordinates if you need both style and positional data for each word, or simply left off when not needed.

Tap into Data

The world's collective archive of PDFs is estimated to contain over 2.5 trillion documents, representing an abundance of opportunity for discovering new sources of untapped data. Accessing and aggregating data from many documents can be challenging without the right tools for the job.

pdfRest Extract Text is just the tool you need to batch process or configure automated workflows to extract data from PDFs and facilitate easy database entry and integration with other services.
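
One way such a batch job might look is sketched below: a loop that sends every PDF in a directory to the endpoint and saves each raw JSON response for later parsing or database loading. The function name, directory arguments, and PDFREST_API_KEY environment variable are illustrative assumptions, and the sketch stores the response as-is rather than assuming any particular response schema.

```shell
#!/usr/bin/env bash
# Sketch: send every PDF in a directory to /extracted-text and save each
# raw JSON response so it can be parsed or loaded into a database later.
# PDFREST_API_KEY and both directory arguments are placeholders.
extract_text_batch() {
  local input_dir="$1" output_dir="$2" pdf name
  mkdir -p "$output_dir"
  for pdf in "$input_dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # the glob matched nothing
    name="$(basename "$pdf" .pdf)"
    # --fail makes curl exit non-zero on HTTP errors so failures get reported.
    curl -sS --fail -X POST "https://api.pdfrest.com/extracted-text" \
      -H "Accept: application/json" \
      -H "Content-Type: multipart/form-data" \
      -H "Api-Key: ${PDFREST_API_KEY}" \
      -F "file=@${pdf}" \
      -o "${output_dir}/${name}.json" \
      || echo "extraction failed for ${pdf}" >&2
  done
}

# Usage: PDFREST_API_KEY=your-key extract_text_batch ./invoices ./extracted
```

Processing files one at a time keeps the sketch simple; a real workflow would likely add retries and respect whatever rate limits apply to your plan.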
Start from Code Examples
  1. First, you'll need an API Key. You can:
    • Stay anonymous with a Guest API Key for 10 free API Calls
    • Sign up for an upgraded API Key with unlimited, continuous service
  2. Choose your programming language
  3. Copy and paste the code to your project
  4. Update the Api-Key field with your unique API Key
  5. Update the file field with the local path to your input
  6. Run this code to send an API Call
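
Steps 4 through 6 can be sketched in shell as follows. This is only an illustration: the function name and the PDFREST_API_KEY environment variable are assumptions, and the two checks simply catch a missing key or input file before any API Call is spent.

```shell
#!/usr/bin/env bash
# Sketch of steps 4-6: supply the key, point at the input file, send the call.
# PDFREST_API_KEY is a hypothetical environment variable for your API Key.
extract_text() {
  local input_pdf="$1"
  if [ -z "${PDFREST_API_KEY:-}" ]; then
    echo "Set PDFREST_API_KEY to your API Key first" >&2
    return 1
  fi
  if [ ! -f "$input_pdf" ]; then
    echo "Input file not found: ${input_pdf}" >&2
    return 1
  fi
  curl -X POST "https://api.pdfrest.com/extracted-text" \
    -H "Accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -H "Api-Key: ${PDFREST_API_KEY}" \
    -F "file=@${input_pdf}"
}

# Usage: PDFREST_API_KEY=your-key extract_text /path/to/input.pdf
```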
See more code examples in our GitHub repository.
Try pdfRest with just a few clicks
Download our Postman Collection
Customize Your Solution
Word Style

The word_style parameter toggles whether the response includes a JSON-formatted list of every word in the document along with its style information: font, size, color, and color space.

Accepts on and off; defaults to off.
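
A small wrapper can make the toggle explicit. The sketch below passes word_style as a form field, falls back to off when no value is given, and rejects anything other than on or off before sending. The function name and the PDFREST_API_KEY environment variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pass word_style explicitly, rejecting anything other than on/off.
# The second argument is optional and falls back to the documented default, off.
extract_with_style() {
  local input_pdf="$1" word_style="${2:-off}"
  case "$word_style" in
    on|off) ;;
    *) echo "word_style must be on or off" >&2; return 1 ;;
  esac
  curl -X POST "https://api.pdfrest.com/extracted-text" \
    -H "Accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -H "Api-Key: ${PDFREST_API_KEY}" \
    -F "file=@${input_pdf}" \
    -F "word_style=${word_style}"
}

# Usage: PDFREST_API_KEY=your-key extract_with_style /path/to/input.pdf on
```

Validating locally avoids spending an API Call on a request that uses an unsupported value.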

Generate a self-service API Key now!

Create your free API Key and start processing PDFs with pdfRest in seconds.
