Extract Text

Extract Text is a REST API tool that extracts all text from PDF documents, optionally including style and position information.

Try Now with API Lab

Start right from your browser - upload files, choose parameters, generate code, and send API Calls directly from API Lab!

Sign up to receive your free API Key.

POST /extracted-text

curl -X POST "https://api.pdfrest.com/extracted-text" \
  -H "Accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -H "Api-Key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
  -F "file=@/path/to/file"

Response

Once you've sent your POST request and received a valid response, you can download your output file using the output URL.

Build your solution

You have document processing problems; we have Solutions. Explore the many ways pdfRest can align your documents with your business objectives.

Browse all Solutions

Create Searchable PDF Files with OCR

Generating Summaries of PDF Documents using ChatGPT

Translate PDF Text to New Language with ChatGPT

Discover Sentiment Insights from PDF Documents with pdfRest and ChatGPT

Convert PDF to Text to Unlock Trapped Data

Integrate pdfRest with Microsoft Power Automate

Why is pdfRest the best API to extract text from PDF?

pdfRest offers the best solution for extracting text from PDF documents because it preserves positional data, includes text style information, and unlocks the data stored in your PDFs for use in other systems.

Preserve Positional Data

Unlike most PDF text extraction tools, Extract Text by pdfRest can optionally include page and coordinate metadata for each word extracted from the PDF in easy-to-parse JSON format. Simply turn on the word_coordinates parameter.

This data is essential if you're aiming to preserve the position of text in a different file format or create a PDF viewer with searchable and selectable text. If you don't need this extra information, it's just as easy to turn off.

Include Text Style Info

Turn on the word_style option to include detailed style information about each word extracted from the PDF, including font, size, color, and the color space.

This optional metadata supports use cases that require reproducing the original document's text appearance in another format or user interface. It can be combined with word_coordinates if you require both style and positional data about each word, or simply turned off when not needed.
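As a rough illustration of combining the two options, the sketch below builds the form fields for a request. The helper name is hypothetical, and sending the toggles as the strings "on" and "off" is an assumption; check the documentation for the exact accepted values.

```python
def style_and_position_fields(word_style=False, word_coordinates=False):
    """Return form fields for the two per-word metadata toggles."""

    def as_flag(enabled):
        # ASSUMPTION: the API expects the strings "on" / "off" for these toggles.
        return "on" if enabled else "off"

    return {
        "word_style": as_flag(word_style),
        "word_coordinates": as_flag(word_coordinates),
    }


# Request both style and positional data for each word.
fields = style_and_position_fields(word_style=True, word_coordinates=True)
```

These fields would be sent alongside the file in the multipart form body of the POST request.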

Tap into Data

The world's collective archive of PDFs is estimated to contain over 2.5 trillion documents, representing an abundance of opportunity for discovering new sources of untapped data. Accessing and aggregating data from many documents can be challenging without the right tools for the job.

pdfRest Extract Text is just the tool you need to batch process or configure automated workflows to extract data from PDFs and facilitate easy database entry and integration with other services.

Start from Code Examples

First, you'll need an API Key. You can:
- Stay anonymous with a Guest API Key for 10 free API Calls
- Sign up for an upgraded API Key with unlimited, continuous service

Then:
- Choose your programming language
- Copy and paste the code to your project
- Update the Api-Key field with your unique API Key
- Update file with the local path to your input
- Run this code to send an API Call
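
The steps above can be sketched in Python using only the standard library. This is a minimal, unofficial example, not pdfRest's own sample code: the endpoint and header names come from the cURL command on this page, while the helper names build_multipart and extract_text are hypothetical, and the shape of the JSON response is not assumed here.

```python
import json
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://api.pdfrest.com/extracted-text"  # endpoint from the cURL example


def build_multipart(fields, filename, file_bytes):
    """Encode text fields plus one PDF (under the 'file' field) as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract_text(api_key, pdf_path, **fields):
    """POST a local PDF to the endpoint and return the decoded JSON response."""
    path = Path(pdf_path)
    body, content_type = build_multipart(fields, path.name, path.read_bytes())
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": content_type,
            "Api-Key": api_key,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like extract_text("your-api-key", "/path/to/file.pdf", full_text="document"), where the keyword arguments are passed through as form fields.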

See more code examples in our GitHub repository

Try pdfRest with just a few clicks
Download our Postman Collection

Need more help?

Start with a Tutorial for step-by-step guidance

How to Extract PDF Text in .NET with C#

How to Extract PDF Text with cURL

How to Extract PDF Text with JavaScript in NodeJS

How to Extract PDF Text with PHP

How to Extract PDF Text with Python

Customize Your Solution

File

The file parameter allows you to select a local file to be uploaded to pdfRest’s processing server.

The id parameter allows you to submit a resource ID generated by one of our API Tools. Each of our API Tools assigns a unique resource ID to your output file(s), allowing you to chain requests together without having to download intermediate files between requests.
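
To make the choice between the two inputs concrete, here is a small sketch that selects either a local upload or a resource ID for a request. The helper name is hypothetical, and requiring exactly one of the two is an assumption made for this example, not documented API behavior.

```python
def input_fields(file_path=None, resource_id=None):
    """Return the input a request should carry: a local upload or a resource ID."""
    # ASSUMPTION: a request carries exactly one of 'file' or 'id'.
    if (file_path is None) == (resource_id is None):
        raise ValueError("provide exactly one of file_path or resource_id")
    if resource_id is not None:
        # Chain from a previous API Tool's output without downloading it first.
        return {"id": resource_id}
    return {"file": file_path}
```

When chaining, the resource ID returned by an earlier request would be passed as resource_id, so the intermediate file never has to leave the processing server.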

Word Style

The word_style parameter allows you to toggle whether or not to extract styling information about font and color for individual words in the document.

Word Coordinates

The word_coordinates parameter allows you to toggle whether or not to extract coordinate information for the text boxes of individual words in the document.

Full Text

The full_text parameter allows you to specify whether to extract the full text of a document. The three options are as follows:
- off : Do not extract the full text
- by_page : Extract the full text of each page and return them as separate chunks
- document : Extract the full text of the document and return it as a single block of text
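
As a small illustration, the three full_text options can be checked on the client before a request is sent. The helper name below is hypothetical; the option values are the ones listed above.

```python
FULL_TEXT_MODES = ("off", "by_page", "document")  # options listed for full_text


def full_text_field(mode):
    """Return the full_text form field, rejecting values outside the listed options."""
    if mode not in FULL_TEXT_MODES:
        raise ValueError(
            f"full_text must be one of {FULL_TEXT_MODES}, got {mode!r}"
        )
    return {"full_text": mode}
```

Validating locally catches typos such as "by-page" before an API Call is spent on a request that would not do what you intended.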

See Documentation

Generate a self-service API Key now!

Create your FREE API Key to start processing PDFs in seconds, only possible with pdfRest.

Compare Plans