How to Use OCR to Make PDF Image Text Searchable with JavaScript in Node.js

Learn how to use the pdfRest OCR PDF API Tool with JavaScript to make image text in PDFs searchable

Why Use OCR to Make Searchable PDFs with JavaScript?

The pdfRest OCR PDF API Tool lets developers make scanned PDF documents searchable by applying Optical Character Recognition (OCR), which inserts machine-readable text behind the images of text. This tutorial shows how to send an API call to OCR PDF with JavaScript so that the conversion of image-based text can be automated.

Imagine you work in an office that frequently receives scanned documents. Each page of such a document is essentially an image, so a text search cannot find anything in it. By running these documents through the OCR PDF API, you get searchable PDFs in which a standard text search can locate specific words and phrases.

OCR PDF with JavaScript Code Example

// This request demonstrates how to apply OCR to a PDF document and insert text behind images of text.
var axios = require('axios');
var FormData = require('form-data');
var fs = require('fs');

// Create a new form data instance and append the PDF file and parameters to it
var data = new FormData();
data.append('file', fs.createReadStream('/path/to/file'));
data.append('output', 'pdfrest_pdf-with-ocr-text');

// define configuration options for axios request
var config = {
  method: 'post',
  maxBodyLength: Infinity, // set maximum length of the request body
  url: 'https://api.pdfrest.com/pdf-with-ocr-text', 
  headers: { 
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx', // Replace with your API key
    ...data.getHeaders() // set headers for the request
  },
  data : data // set the data to be sent with the request
};

// send request and handle response or error
axios(config)
.then(function (response) {
  console.log(JSON.stringify(response.data));
})
.catch(function (error) {
  console.log(error); 
});

// If you would like to download the file instead of getting the JSON response, please see the 'get-resource-id-endpoint.js' sample.

Source: GitHub

Breaking Down the Code

Let's break down the code to understand how it works:

var axios = require('axios');
var FormData = require('form-data');
var fs = require('fs');

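This snippet imports the required modules: axios for making HTTP requests, form-data for building the multipart form body, and fs for file system operations. The fs module is built into Node.js; axios and form-data are third-party packages that must be installed (for example with npm) before the script will run.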

var data = new FormData();
data.append('file', fs.createReadStream('/path/to/file'));
data.append('output', 'pdfrest_pdf-with-ocr-text');

Here, a new instance of FormData is created, and the PDF file is appended to it as a stream using fs.createReadStream; replace '/path/to/file' with the path to your own scanned PDF. The output parameter sets the name given to the processed file, in this case pdfrest_pdf-with-ocr-text.
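A read stream does not report a missing or unreadable file until it is actually opened, so a quick check before building the form can catch obvious mistakes early. The following is a minimal sketch and not part of the pdfRest sample: looksLikePdf is a hypothetical helper name, and it assumes the file begins with the usual %PDF- signature.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of pdfRest or form-data): returns true when
// the file at filePath can be read and starts with the "%PDF-" signature.
function looksLikePdf(filePath) {
  const signature = Buffer.from('%PDF-');
  const header = Buffer.alloc(signature.length);
  let fd;
  try {
    fd = fs.openSync(filePath, 'r');
    // Read the first bytes of the file into the header buffer.
    const bytesRead = fs.readSync(fd, header, 0, header.length, 0);
    return bytesRead === signature.length && header.equals(signature);
  } catch (err) {
    return false; // missing or unreadable file
  } finally {
    if (fd !== undefined) fs.closeSync(fd);
  }
}
```

A script could call looksLikePdf('/path/to/file') and stop with a clear message when it returns false, before any request is sent.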

var config = {
  method: 'post',
  maxBodyLength: Infinity, 
  url: 'https://api.pdfrest.com/pdf-with-ocr-text', 
  headers: { 
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
    ...data.getHeaders() 
  },
  data : data 
};

The config object configures the Axios request. It sets the HTTP method to post, sets maxBodyLength to Infinity so that Axios does not reject large PDF uploads, and provides the URL of the API endpoint. The headers combine the Api-Key header (replace the placeholder with your own key) with the multipart headers generated by data.getHeaders(), which include the Content-Type and its boundary. The data field contains the form data to be sent with the request.
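To keep the key out of source code, the same config can be built from an environment variable. This is a minimal sketch under assumptions: buildOcrConfig is a hypothetical helper, and PDFREST_API_KEY is an environment variable name chosen for illustration, not one defined by pdfRest.

```javascript
// Hypothetical helper: builds the same Axios config as above from a
// form-data instance and an API key supplied by the caller.
function buildOcrConfig(form, apiKey) {
  if (!apiKey) {
    // Fail early instead of sending a request that cannot be authenticated.
    throw new Error('Missing API key: set the PDFREST_API_KEY environment variable');
  }
  return {
    method: 'post',
    maxBodyLength: Infinity,
    url: 'https://api.pdfrest.com/pdf-with-ocr-text',
    headers: {
      'Api-Key': apiKey,
      ...form.getHeaders(), // multipart Content-Type with boundary
    },
    data: form,
  };
}

// Usage with the form built earlier (assumed variable name: data):
// const config = buildOcrConfig(data, process.env.PDFREST_API_KEY);
```

Throwing when the key is absent surfaces a configuration problem before any network call is made.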

axios(config)
.then(function (response) {
  console.log(JSON.stringify(response.data));
})
.catch(function (error) {
  console.log(error); 
});

This part of the code sends the HTTP request using Axios. When the request succeeds, the response body is serialized with JSON.stringify and logged to the console; when it fails, the error object is logged instead.
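Logging the whole error object can be noisy. As a sketch, the catch handler could distinguish the three cases that Axios documents for failed requests: the server responded with an error status (error.response), no response arrived (error.request), or the request could not be set up. describeAxiosError is a hypothetical helper name, and the message wording is illustrative only.

```javascript
// Hypothetical helper: turns an Axios error into a short, readable message.
function describeAxiosError(error) {
  if (error.response) {
    // The server replied with a status code outside the 2xx range.
    return `HTTP ${error.response.status}: ${JSON.stringify(error.response.data)}`;
  }
  if (error.request) {
    // The request was sent but no response was received.
    return `No response received: ${error.message}`;
  }
  // Something went wrong before the request was sent.
  return `Request setup failed: ${error.message}`;
}

// Usage in the promise chain shown above:
// .catch(function (error) { console.error(describeAxiosError(error)); });
```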

Beyond the Tutorial

In this tutorial, we demonstrated how to use JavaScript to send an API call to the pdfRest OCR PDF API Tool, converting a scanned PDF document into a searchable PDF. This is just one of the many functions offered by pdfRest.

We encourage you to explore all of the pdfRest API Tools in the API Lab. For more detailed information, refer to the API Reference Guide.

Note: This example demonstrates a multipart API call. Code samples using JSON payloads can be found on GitHub.
