Extract Text
Extract Text is a REST API tool that extracts all text from PDF documents, optionally including style and position information.
Start right from your browser: upload files, choose parameters, generate code, and send API calls directly from API Lab.
Preserve Positional Data
Unlike most PDF text extraction tools, Extract Text by pdfRest can optionally include page and coordinate metadata for each word extracted from the PDF in easy-to-parse JSON format. Simply turn on the word_coordinates parameter.
This data is essential if you're aiming to preserve the position of text in a different file format or create a PDF viewer with searchable and selectable text. If you don't need this extra information, it's just as easy to turn off.
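As a rough sketch of how the positional metadata might be consumed, the Python below builds the form field for a request and groups the returned words by page. Only the word_coordinates name comes from this page. The "on"/"off" values and the response shape (a "words" list whose entries carry "text", "page", and "coordinates") are assumptions made for illustration, so check the API reference before relying on them.

```python
from collections import defaultdict


def build_form_fields(word_coordinates: bool = False) -> dict[str, str]:
    """Form fields for an Extract Text request.

    Assumption: the parameter takes the string values "on" and "off".
    """
    return {"word_coordinates": "on" if word_coordinates else "off"}


def words_by_page(response: dict) -> dict[int, list[dict]]:
    """Group word entries by page number, keeping their original order.

    Assumption: response["words"] is a list of dicts with "text", "page",
    and "coordinates" keys. This shape is hypothetical.
    """
    pages = defaultdict(list)
    for word in response.get("words", []):
        pages[word["page"]].append(word)
    return dict(pages)


# Hypothetical response, hand-written for this example.
sample = {
    "words": [
        {"text": "Invoice", "page": 1, "coordinates": {"x": 72.0, "y": 700.0}},
        {"text": "Total", "page": 2, "coordinates": {"x": 72.0, "y": 120.0}},
        {"text": "#1042", "page": 1, "coordinates": {"x": 130.0, "y": 700.0}},
    ]
}

grouped = words_by_page(sample)
page_one_text = " ".join(w["text"] for w in grouped[1])
```

Grouping by page first keeps later layout work (for example, placing a selectable text layer in a viewer) local to one page at a time.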
Include Text Style Info
Turn on the word_style option to include detailed style information about each word extracted from the PDF, including font, size, color, and color space.
This optional metadata supports use cases that require preserving the original appearance of the document's text in another format or user interface. It can be combined with word_coordinates if you need both style and positional data for each word, or simply turned off when not needed.
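To illustrate combining the two options, here is a minimal Python sketch that enables both parameters and then uses the style data to pick out large text, such as headings. The parameter names come from this page. The "on"/"off" values and the per-word "style" object with a "size" key are assumptions made for illustration, not a confirmed response schema.

```python
def style_and_position_fields(word_style: bool, word_coordinates: bool) -> dict[str, str]:
    """Form fields enabling style and/or positional metadata.

    Assumption: both parameters take the string values "on" and "off".
    """
    def flag(enabled: bool) -> str:
        return "on" if enabled else "off"

    return {
        "word_style": flag(word_style),
        "word_coordinates": flag(word_coordinates),
    }


def large_words(words: list[dict], min_size: float) -> list[str]:
    """Return the text of words whose font size is at least min_size.

    Assumption: each word dict has a "style" dict with a numeric "size".
    Words without style data are skipped.
    """
    return [
        w["text"]
        for w in words
        if w.get("style", {}).get("size", 0) >= min_size
    ]


# Hypothetical word entries, hand-written for this example.
sample_words = [
    {"text": "Quarterly", "style": {"font": "Helvetica-Bold", "size": 24.0}},
    {"text": "Report", "style": {"font": "Helvetica-Bold", "size": 24.0}},
    {"text": "Revenue", "style": {"font": "Helvetica", "size": 11.0}},
    {"text": "unstyled"},
]

fields = style_and_position_fields(word_style=True, word_coordinates=True)
headline = large_words(sample_words, min_size=18.0)
```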
Tap into Data
The world's collective archive of PDFs is estimated to contain over 2.5 trillion documents, representing an abundance of opportunity for discovering new sources of untapped data. Accessing and aggregating data from many documents can be challenging without the right tools for the job.
pdfRest Extract Text lets you batch process PDFs or configure automated workflows that extract their data, easing database entry and integration with other services.
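A batch workflow of this kind could be sketched as follows, using Python's built-in sqlite3 module as the database. The extract_text callable is a hypothetical stand-in for whatever function wraps the actual API request. It is replaced by a fake here so the example runs offline, and the table layout is likewise only an illustration.

```python
import sqlite3
from typing import Callable, Iterable


def store_extracted_text(
    conn: sqlite3.Connection,
    pdf_names: Iterable[str],
    extract_text: Callable[[str], str],
) -> int:
    """Extract text from each named PDF and store one row per document.

    extract_text is supplied by the caller (hypothetical here); in a real
    workflow it would send the file to the API and return the text.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (name TEXT PRIMARY KEY, body TEXT)"
    )
    rows = [(name, extract_text(name)) for name in pdf_names]
    # INSERT OR REPLACE lets the batch be re-run without duplicate rows.
    conn.executemany("INSERT OR REPLACE INTO documents VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def fake_extract(name: str) -> str:
    """Stand-in extractor used only for this example."""
    return f"text of {name}"


conn = sqlite3.connect(":memory:")
stored = store_extracted_text(conn, ["a.pdf", "b.pdf"], fake_extract)
body = conn.execute(
    "SELECT body FROM documents WHERE name = ?", ("b.pdf",)
).fetchone()[0]
```

Passing the extractor in as a function keeps the storage step independent of how the request is made, which also makes the workflow easy to test.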
Need more help?
Start with a Tutorial for step-by-step guidance.
Learn about the parameters for this tool to create your custom solution.
The word_style parameter toggles whether styling information about font and color is extracted for individual words in the document.