How to Use OCR to Make PDF Image Text Searchable with JavaScript in Node.js
Why Use OCR to Make Searchable PDFs with JavaScript?
The pdfRest OCR PDF API Tool lets developers convert scanned PDF documents into searchable PDFs by applying Optical Character Recognition (OCR). This tutorial shows how to send an API call to OCR a PDF with JavaScript, automating the conversion of image-based text into machine-readable text within PDF documents.
Imagine you work in an office where you frequently receive scanned documents. These documents are essentially images, so you cannot search for specific text within them. With the OCR PDF API, you can convert them into searchable PDFs and find the text you need without reading every page.
OCR PDF with JavaScript Code Example
// This request demonstrates how to apply OCR to a PDF document and insert text behind images of text.
var axios = require('axios');
var FormData = require('form-data');
var fs = require('fs');

// Create a new form data instance and append the PDF file and parameters to it
var data = new FormData();
data.append('file', fs.createReadStream('/path/to/file')); // replace with the path to your scanned PDF
data.append('output', 'pdfrest_pdf-with-ocr-text');

// Define configuration options for the axios request
var config = {
  method: 'post',
  maxBodyLength: Infinity, // do not limit the size of the request body (large PDFs)
  url: 'https://api.pdfrest.com/pdf-with-ocr-text',
  headers: {
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx', // Replace with your API key
    ...data.getHeaders() // multipart Content-Type header, including the boundary
  },
  data: data // the form data to be sent with the request
};

// Send the request and handle the response or error
axios(config)
  .then(function (response) {
    console.log(JSON.stringify(response.data));
  })
  .catch(function (error) {
    console.log(error);
  });

// If you would like to download the file instead of getting the JSON response, please see the 'get-resource-id-endpoint.js' sample.
Source: GitHub
Breaking Down the Code
Let's break down the code to understand how it works:
var axios = require('axios');
var FormData = require('form-data');
var fs = require('fs');
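This snippet imports the required modules: axios for making HTTP requests, form-data (imported as FormData) for building the multipart/form-data request body, and fs for file system operations. The axios and form-data packages come from npm (npm install axios form-data); fs is built into Node.js.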
var data = new FormData();
data.append('file', fs.createReadStream('/path/to/file'));
data.append('output', 'pdfrest_pdf-with-ocr-text');
Here, a new FormData instance is created, and the PDF file is appended to it as a stream opened with fs.createReadStream; replace /path/to/file with the path to your scanned PDF. The output parameter sets the name of the generated file, in this case pdfrest_pdf-with-ocr-text.
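Because fs.createReadStream opens the file asynchronously, a wrong path only surfaces later as a stream error, after the request is already being assembled. A cheap pre-upload check can fail fast instead. The sketch below is a hypothetical helper, not part of pdfRest: it looks for the %PDF- signature within the first 1024 bytes of the file, a lenient heuristic that tolerates a few leading bytes but does not prove the file is a valid PDF.

```javascript
// Sketch: hypothetical pre-upload sanity check, not part of the pdfRest API.
// Fails fast on a missing file or a file without a PDF header.
const fs = require('fs');

function assertLooksLikePdf(filePath) {
  // openSync throws ENOENT immediately if the path is wrong, unlike
  // createReadStream, which reports it later as an 'error' event
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(1024);
    const bytesRead = fs.readSync(fd, header, 0, header.length, 0);
    // Lenient heuristic: accept the %PDF- signature anywhere in the first 1024 bytes
    if (!header.subarray(0, bytesRead).includes('%PDF-')) {
      throw new Error(`${filePath} does not look like a PDF (no %PDF- header)`);
    }
  } finally {
    fs.closeSync(fd);
  }
}
```

Calling assertLooksLikePdf('/path/to/file') just before data.append('file', ...) keeps the rest of the tutorial code unchanged.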
var config = {
  method: 'post',
  maxBodyLength: Infinity,
  url: 'https://api.pdfrest.com/pdf-with-ocr-text',
  headers: {
    'Api-Key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
    ...data.getHeaders()
  },
  data: data
};
The config object configures the Axios request. It sets the HTTP method to post, sets maxBodyLength to Infinity so that large PDF uploads are not rejected by a body-size limit, and provides the URL of the API endpoint. The headers include the API key (replace the placeholder with your own) and the headers returned by the FormData instance's getHeaders() method, which supply the multipart Content-Type and its boundary. The data field contains the form data to be sent with the request.
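Hard-coding the API key in source code makes it easy to leak through version control. A minimal sketch, assuming you store the key in an environment variable (PDFREST_API_KEY and buildOcrConfig are hypothetical names chosen for illustration), builds the same config object as above but reads the key at runtime and fails with a clear message if it is missing:

```javascript
// Sketch: build the tutorial's axios config without hard-coding the API key.
// PDFREST_API_KEY and buildOcrConfig are hypothetical names for this example.
function buildOcrConfig(data, env = process.env) {
  const apiKey = env.PDFREST_API_KEY;
  if (!apiKey) {
    throw new Error('Set PDFREST_API_KEY to your pdfRest API key');
  }
  return {
    method: 'post',
    maxBodyLength: Infinity, // do not limit the size of the uploaded PDF
    url: 'https://api.pdfrest.com/pdf-with-ocr-text',
    headers: {
      'Api-Key': apiKey,
      ...data.getHeaders(), // multipart Content-Type and boundary from form-data
    },
    data: data,
  };
}
```

With this helper, the request becomes axios(buildOcrConfig(data)). Passing env as a parameter (defaulting to process.env) also makes the function easy to exercise without touching the real environment.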
axios(config)
  .then(function (response) {
    console.log(JSON.stringify(response.data));
  })
  .catch(function (error) {
    console.log(error);
  });
This part of the code sends the HTTP request with Axios. On success, the response body (response.data) is serialized with JSON.stringify and logged to the console; on failure, the error is logged instead.
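Logging the raw error object prints a large, hard-to-read structure. Axios errors distinguish three cases: error.response is set when the server replied with a non-2xx status, error.request is set when the request went out but no reply arrived, and otherwise the failure happened while setting up the request. The sketch below (describeAxiosError is a hypothetical helper name) turns those cases into one readable line:

```javascript
// Sketch: hypothetical helper that summarizes an axios error using the
// error.response / error.request / error.message distinction axios provides.
function describeAxiosError(error) {
  if (error.response) {
    // The server replied with a non-2xx status; the body may contain error details
    return `HTTP ${error.response.status}: ${JSON.stringify(error.response.data)}`;
  }
  if (error.request) {
    // The request was sent but no response arrived (network problem, timeout, DNS)
    return `No response received: ${error.message}`;
  }
  // Something failed before the request was sent (for example, a bad config)
  return `Request setup failed: ${error.message}`;
}
```

To use it, replace console.log(error) in the catch handler with console.error(describeAxiosError(error)).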
Beyond the Tutorial
In this tutorial, we have demonstrated how to use JavaScript to send an API call to the pdfRest OCR PDF API Tool, converting a scanned PDF document into a searchable PDF. This is just one of the many functionalities offered by pdfRest.
We encourage you to explore all of the pdfRest API Tools in the API Lab. For more detailed information, refer to the API Reference Guide.
Note: This example demonstrates a multipart API call. Code samples using JSON payloads can be found on GitHub.