OCR PDF API Tool - Make Searchable PDF, Extract Text from Images

Try Now with API Lab

Start right from your browser - upload files, choose parameters, generate code, and send API calls directly from API Lab!

Request

POST

Headers

Api-Key

Don't have a key? Create an account to get one.

Response-Type

Choose between a full response after processing completes or an immediate response containing only the requestId to poll for the processing status later.

Full Response

Request ID
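
The Request ID option implies a submit-then-poll workflow. Below is a minimal Python sketch of the polling side only. It assumes a caller-supplied `fetch_status` function that takes the requestId and returns the parsed JSON status; the `status` field and its `"pending"` value are illustrative assumptions, not confirmed details of the pdfRest API.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    request_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(request_id) until the job is no longer pending.

    Assumption: the status payload is a dict with a "status" key whose
    value is "pending" while processing is still running.
    """
    for attempt in range(max_attempts):
        payload = fetch_status(request_id)
        if payload.get("status") != "pending":
            # Finished, successfully or not; the caller inspects the payload.
            return payload
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(
        f"request {request_id} still pending after {max_attempts} checks"
    )
```

Passing `fetch_status` in as a parameter keeps the loop independent of any particular HTTP client and makes the retry limit easy to test.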

Required Parameters

file

File to be uploaded and processed

id

Alphanumeric ID (UUID) of an existing file on the server to be processed

Optional Parameters

output

Name of the generated output file, without extension

languages

Comma-separated list specifying the languages the OCR engine should recognize within the document. Including many languages, particularly CJK languages (Chinese, Japanese, Korean), may affect performance.

Code

curl -X POST "https://api.pdfrest.com/pdf-with-ocr-text" \
  -H "Accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -H "Api-Key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
  -F "file=@/path/to/file.pdf"

Response

Once you've sent your POST request and received a valid response, you can download your output file using the output URL.
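
For projects that do not shell out to cURL, the same request can be assembled in Python. The sketch below uses only the standard library and builds the request without sending it. The endpoint, header names, and the `file`, `output`, and `languages` parameters come from this page; the function names, the `application/pdf` part type, and the hand-built multipart encoding are illustrative choices rather than anything prescribed by pdfRest.

```python
import uuid
import urllib.request
from typing import Dict, Optional, Tuple

# Endpoint taken from the cURL sample on this page.
OCR_ENDPOINT = "https://api.pdfrest.com/pdf-with-ocr-text"


def build_multipart(fields: Dict[str, str], filename: str,
                    file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one upload under the form name "file".

    Returns (body, content_type) where content_type carries the boundary.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_ocr_request(api_key: str, filename: str, file_bytes: bytes,
                      output: Optional[str] = None,
                      languages: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL sample."""
    fields: Dict[str, str] = {}
    if output:
        fields["output"] = output        # output filename, without extension
    if languages:
        fields["languages"] = languages  # e.g. "English,German,French"
    body, content_type = build_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        OCR_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": content_type,
            "Api-Key": api_key,
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the JSON body of the response; the output URL mentioned above can then be fetched the same way.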

Build Your Solution

You have document processing problems, we have Solutions. Explore the many ways pdfRest can align your documents with your business objectives.

Browse all solutions

Parse PDF Files to Streamline Data Extraction

Create Searchable PDF Files with OCR

Integrate pdfRest with Microsoft Power Automate

Ensure GDPR Compliance for PDF Processing with EU-Based Cloud API

Extract Text from PDF using OCR

Integrate PDF API Tools with Salesforce Apex Code

Why is pdfRest the best API to OCR PDF Documents?

pdfRest offers the best solution for applying OCR to PDF documents, because it generates searchable PDF files, supports image-based text extraction, and integrates easily with all projects.

Enhance Searchability and Accessibility with PDF to OCR Technology

Traditional text extraction methods struggle with scanned documents or PDFs containing embedded images. pdfRest addresses this challenge with Optical Character Recognition (OCR) technology. The OCR PDF API Tool detects text within images and places the recognized text behind the image in the PDF document. This enables developers to:

Transform Non-searchable PDFs: Previously inaccessible image-based text becomes selectable and searchable within the PDF.
Boost Efficiency: Eliminate the need for manual data entry, saving development time and resources.
Improve User Experience: Enhance user workflows by letting users highlight, copy, and search for text within images directly in the PDF.

Extract Text Easily with OCR from PDF Technology

pdfRest offers a comprehensive approach to PDF text extraction. The OCR PDF API Tool makes the text within images extractable, which makes it an ideal pre-processing step: it adds image text directly to the PDF before you apply the Extract Text API Tool. With this combined approach, developers can reliably extract all text, including rasterized content, from PDFs.

pdfRest OCR + Text Extraction functionality supports a wide range of applications, including document archival, content search, and data analysis, empowering developers to unlock the full potential of their PDF data.

Seamless PDF and OCR Integration

The OCR PDF API Tool lets you add OCR without sacrificing development efficiency. Focus on core functionality and streamline your workflows with a solution designed to integrate into any development project, regardless of programming language or technology stack.

Unlike traditional methods that require complex setup and configuration, the pdfRest API offers a frictionless integration experience. With well-documented references and readily available code samples, developers can implement workflows to OCR PDF files within their applications with minimal code and effort.

Start from Code Examples

See more code examples in our GitHub repository

Need more help?

Start with a Tutorial for step-by-step guidance

How to Programmatically OCR PDFs to Create Searchable Documents

How to Use OCR to Extract Text from PDF Images in .NET with C#

How to Use OCR to Extract Text from PDF Images with cURL

How to Use OCR to Extract Text from PDF Images with JavaScript in NodeJS

How to Use OCR to Extract Text from PDF Images with PHP

How to Use OCR to Extract Text from PDF Images with Python

How to Use OCR to Make PDF Image Text Searchable in .NET with C#

How to Use OCR to Make PDF Image Text Searchable with cURL

Customize Your Solution

Learn about the parameters for this tool to create your custom solution.

File

The file parameter allows you to select a local file to be uploaded to pdfRest’s processing server.

See Documentation

ID

The id parameter allows you to submit a resource ID generated by one of our API Tools. Each of our API Tools assigns a unique resource ID to your output file(s), allowing you to chain requests together without having to download intermediate files between requests.
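
Chaining can be sketched as a small Python helper that turns one tool's parsed JSON response into the form fields for the next request. The `id` parameter name comes from this page; the helper name `chained_fields` and the default response key `outputId` are assumptions for illustration, so check the key against an actual response before relying on it.

```python
from typing import Any, Dict, Optional


def chained_fields(previous_response: Dict[str, Any],
                   id_key: str = "outputId",
                   extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build form fields for a follow-up request that reuses a prior output.

    Assumption: the earlier response stores the resource ID under id_key.
    """
    resource_id = previous_response.get(id_key)
    if not resource_id:
        raise KeyError(f"previous response has no {id_key!r} value to chain from")
    fields = dict(extra or {})
    # The follow-up request references the stored file instead of uploading it.
    fields["id"] = str(resource_id)
    return fields
```

For example, the fields returned for an OCR response could be posted to a second tool together with `extra={"output": "extracted"}`, so the intermediate PDF never has to be downloaded.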

See Documentation

Output

The output parameter lets you set a filename (without extension) for your OCR-processed PDF.

See Documentation

Languages

The languages parameter allows you to specify the languages that the OCR engine should recognize within your PDF document. This is particularly useful when dealing with multilingual documents or documents containing text in languages other than English.

Supported Languages:

ChineseSimplified
ChineseTraditional
Dutch
English
French
German
Italian
Japanese
Korean
Portuguese
Spanish

How to Use:

Identify Languages: Determine the primary languages present in your PDF document. Query PDF can be used in many cases to detect the metadata value for the document's language.
Specify Languages: Provide a comma-separated list of language names, spelled as in the supported list above, in the languages parameter of your API request.

Example:

English,German,French

Important Considerations:

Performance Impact: Including multiple languages, especially CJK languages (Chinese, Japanese, Korean), can affect OCR processing time. Carefully consider the languages present in your document and balance accuracy with performance.
Default Language: If the languages parameter is not specified, the OCR engine will default to English.

By effectively utilizing the languages parameter, you can optimize the OCR performance and accuracy for your multilingual PDF documents.
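
The rules above (supported names, comma-separated format, English default, CJK performance cost) can be captured in a small Python helper. This is a sketch, not part of the pdfRest API: the names `languages_value` and `includes_cjk` are hypothetical, and matching input case-insensitively is a convenience of this helper, which always emits the spelling listed above.

```python
from typing import Iterable, Optional

# Names exactly as listed under "Supported Languages" on this page.
SUPPORTED_LANGUAGES = (
    "ChineseSimplified", "ChineseTraditional", "Dutch", "English", "French",
    "German", "Italian", "Japanese", "Korean", "Portuguese", "Spanish",
)
CJK_LANGUAGES = {"ChineseSimplified", "ChineseTraditional", "Japanese", "Korean"}


def languages_value(languages: Optional[Iterable[str]] = None) -> str:
    """Return the comma-separated string for the languages parameter."""
    names = list(languages or [])
    if not names:
        return "English"  # documented default when nothing is specified
    lookup = {name.lower(): name for name in SUPPORTED_LANGUAGES}
    chosen = []
    for raw in names:
        name = lookup.get(raw.strip().lower())
        if name is None:
            raise ValueError(f"unsupported OCR language: {raw!r}")
        if name not in chosen:  # drop duplicates, keep first-seen order
            chosen.append(name)
    return ",".join(chosen)


def includes_cjk(value: str) -> bool:
    """True if the comma-separated value names any CJK language."""
    return any(name in CJK_LANGUAGES for name in value.split(","))
```

A caller might log a warning when `includes_cjk` returns true, since those languages are the ones flagged above as most likely to increase processing time.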

See Documentation

Frequently Asked Questions

Need more help? Contact Us or visit our documentation.

What is the OCR PDF API and how does it work?

Why should I use the OCR PDF API for document processing?

Can pdfRest OCR PDFs under GDPR compliance?

Can I automate the text extraction process with the OCR PDF API?

What types of documents can the OCR PDF API process?

How do I integrate the OCR PDF API into my existing systems?

Is there a way to specify languages for OCR processing?

Can I test the OCR PDF API for free before committing?

Does the OCR PDF API support cloud-based or self-hosted deployment?

What makes pdfRest the best OCR software for PDFs?

How can I use pdfRest to OCR PDF online?

Is there a tutorial for using pdfRest's OCR PDF API?