How to Summarize PDF Text with Java

Learn how to use the pdfRest Summarize PDF API from Java to summarize the text content of a PDF.

Why Summarize PDF with Java?

The pdfRest Summarize PDF API Tool extracts the text of a PDF document and returns a summary of it. This tutorial walks through sending an API call to the Summarize PDF endpoint from Java. Once the call is part of your Java application, you can pull the key information out of PDFs automatically instead of reading each document by hand.

Imagine a scenario where a company receives numerous lengthy PDF reports daily. Manually reading and summarizing these reports can be time-consuming and prone to errors. By using the Summarize PDF API, the company can automatically extract concise summaries, allowing employees to quickly grasp the essential points without having to read through the entire documents. This can significantly enhance productivity and ensure that critical information is not overlooked.

Summarize PDF with Java Code Example

import io.github.cdimascio.dotenv.Dotenv;
import java.io.File;
import java.io.IOException;
import okhttp3.MediaType;
import okhttp3.MultipartBody;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;
import org.json.JSONObject;

public class SummarizedPDFText {

  // By default, we use the US-based API service. This is the primary endpoint for global use.
  private static final String API_URL = "https://api.pdfrest.com";

  // For GDPR compliance and enhanced performance for European users, you can switch to the EU-based
  // service by commenting out the URL above and uncommenting the URL below.
  // For more information visit https://pdfrest.com/pricing#how-do-eu-gdpr-api-calls-work
  // private static final String API_URL = "https://eu-api.pdfrest.com";

  // Specify the path to your file here, or as the first argument when running the program.
  private static final String DEFAULT_FILE_PATH = "/path/to/file.pdf";

  // Specify your API key here, or in the environment variable PDFREST_API_KEY.
  // You can also put the environment variable in a .env file.
  private static final String DEFAULT_API_KEY = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";

  public static void main(String[] args) {
    File inputFile;
    if (args.length > 0) {
      inputFile = new File(args[0]);
    } else {
      inputFile = new File(DEFAULT_FILE_PATH);
    }

    final Dotenv dotenv = Dotenv.configure().ignoreIfMalformed().ignoreIfMissing().load();

    final RequestBody inputFileRequestBody =
        RequestBody.create(inputFile, MediaType.parse("application/pdf"));
    RequestBody requestBody =
        new MultipartBody.Builder()
            .setType(MultipartBody.FORM)
            .addFormDataPart("file", inputFile.getName(), inputFileRequestBody)
            .addFormDataPart("target_word_count", "100")
            .build();
    Request request =
        new Request.Builder()
            .header("Api-Key", dotenv.get("PDFREST_API_KEY", DEFAULT_API_KEY))
            .url(API_URL + "/summarized-pdf-text")
            .post(requestBody)
            .build();
    try {
      OkHttpClient client = new OkHttpClient().newBuilder().build();
      Response response = client.newCall(request).execute();
      System.out.println("Result code " + response.code());
      if (response.body() != null) {
        System.out.println(prettyJson(response.body().string()));
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  private static String prettyJson(String json) {
    return new JSONObject(json).toString(4);
  }
}

Source: GitHub

Breaking Down the Code

The code begins by importing the libraries it needs: io.github.cdimascio.dotenv.Dotenv to read environment variables (including values from a .env file), okhttp3 for the HTTP client, and org.json.JSONObject to pretty-print the JSON response.

private static final String API_URL = "https://api.pdfrest.com";

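This line sets the base URL of the API. The endpoint path /summarized-pdf-text is appended to it when the request is built. For GDPR compliance and better performance for European users, you can switch to the EU-based service at https://eu-api.pdfrest.com.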
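Instead of commenting and uncommenting the constant, the base URL could be chosen at run time. The following is a minimal sketch under that assumption: the class name PdfRestEndpoints and the PDFREST_REGION environment variable are hypothetical names introduced for illustration, not part of pdfRest. Only the two base URLs and the /summarized-pdf-text path come from the example above.

```java
public class PdfRestEndpoints {

  private static final String US_BASE_URL = "https://api.pdfrest.com";
  private static final String EU_BASE_URL = "https://eu-api.pdfrest.com";

  // Returns the EU base URL when region is "eu" (case-insensitive),
  // and the US base URL for any other value, including null.
  static String baseUrl(String region) {
    return "eu".equalsIgnoreCase(region) ? EU_BASE_URL : US_BASE_URL;
  }

  // Appends the Summarize PDF endpoint path to the selected base URL.
  static String summarizeUrl(String region) {
    return baseUrl(region) + "/summarized-pdf-text";
  }

  public static void main(String[] args) {
    // PDFREST_REGION is a hypothetical variable used only in this sketch.
    System.out.println(summarizeUrl(System.getenv("PDFREST_REGION")));
  }
}
```

The value returned by summarizeUrl could then be passed to the .url(...) call on the request builder.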

private static final String DEFAULT_FILE_PATH = "/path/to/file.pdf";

This specifies the default path of the PDF document to be summarized. Passing a file path as the first command-line argument overrides it.
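The example does not check that the file exists before uploading it, so a wrong path only shows up when the request is sent. A minimal sketch of an up-front check is shown here, using only the Java standard library. The class and method names (InputFileCheck, resolveInputFile) are hypothetical and not part of the original sample.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class InputFileCheck {

  // Picks the first command-line argument if present, otherwise the default path,
  // and rejects anything that is not a readable regular file.
  static Path resolveInputFile(String[] args, String defaultPath) {
    Path path = Paths.get(args.length > 0 ? args[0] : defaultPath);
    if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
      throw new IllegalArgumentException("Cannot read input file: " + path);
    }
    return path;
  }

  public static void main(String[] args) {
    try {
      Path input = resolveInputFile(args, "/path/to/file.pdf");
      System.out.println("Using input file " + input.toAbsolutePath());
    } catch (IllegalArgumentException e) {
      System.err.println(e.getMessage());
    }
  }
}
```

The resulting Path can be converted with toFile() where the example expects a java.io.File.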

private static final String DEFAULT_API_KEY = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";

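The API key is required for authentication. You can set it directly in the code, in the PDFREST_API_KEY environment variable, or in a .env file. At run time, dotenv.get("PDFREST_API_KEY", DEFAULT_API_KEY) returns the PDFREST_API_KEY value when one is set and falls back to the constant otherwise.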
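If neither the environment variable nor the constant has been set, the example sends the placeholder key and the call fails only at the server. A minimal sketch of failing earlier is shown here. It is an illustration only: ApiKeyResolver and resolveApiKey are hypothetical names, and the sketch reads a plain map of environment variables rather than a .env file, so it is not a drop-in replacement for the Dotenv lookup.

```java
import java.util.Map;

public class ApiKeyResolver {

  // The placeholder value used in the tutorial's DEFAULT_API_KEY constant.
  static final String PLACEHOLDER = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";

  // Prefers PDFREST_API_KEY from the given environment, falls back to the
  // supplied default, and refuses to return a missing or placeholder key.
  static String resolveApiKey(Map<String, String> env, String fallback) {
    String key = env.get("PDFREST_API_KEY");
    if (key == null || key.trim().isEmpty()) {
      key = fallback;
    }
    if (key == null || key.trim().isEmpty() || key.equals(PLACEHOLDER)) {
      throw new IllegalStateException(
          "No API key configured: set PDFREST_API_KEY or replace the placeholder.");
    }
    return key.trim();
  }

  public static void main(String[] args) {
    try {
      resolveApiKey(System.getenv(), PLACEHOLDER);
      System.out.println("API key found.");
    } catch (IllegalStateException e) {
      System.err.println(e.getMessage());
    }
  }
}
```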

RequestBody requestBody =
    new MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        .addFormDataPart("file", inputFile.getName(), inputFileRequestBody)
        .addFormDataPart("target_word_count", "100")
        .build();

This snippet constructs a multipart form request body. The PDF is added as the form field file, using the inputFileRequestBody created earlier with the application/pdf media type, and the form field target_word_count requests a summary of about 100 words.
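The word count is hard-coded in the example. One possible variation, sketched here, reads it from a second command-line argument and falls back to 100. The class and method names are hypothetical, and the requirement that the value be a positive whole number is an assumption of this sketch. The source does not state which values the API accepts.

```java
public class WordCountArg {

  // Default taken from the tutorial's example request.
  static final int DEFAULT_TARGET_WORD_COUNT = 100;

  // Returns the value for the target_word_count form field as a string.
  // Uses args[1] when present, otherwise the default.
  static String targetWordCount(String[] args) {
    if (args.length < 2) {
      return String.valueOf(DEFAULT_TARGET_WORD_COUNT);
    }
    int count;
    try {
      count = Integer.parseInt(args[1].trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "target_word_count must be a whole number: " + args[1]);
    }
    // Assumption of this sketch: only positive counts make sense.
    if (count <= 0) {
      throw new IllegalArgumentException("target_word_count must be positive: " + count);
    }
    return String.valueOf(count);
  }

  public static void main(String[] args) {
    System.out.println("target_word_count=" + targetWordCount(args));
  }
}
```

The returned string could be passed as the second argument of addFormDataPart("target_word_count", ...).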

Request request =
    new Request.Builder()
        .header("Api-Key", dotenv.get("PDFREST_API_KEY", DEFAULT_API_KEY))
        .url(API_URL + "/summarized-pdf-text")
        .post(requestBody)
        .build();

The request is built as a POST to the /summarized-pdf-text endpoint, with the API key in the Api-Key header and the multipart form as the request body.

OkHttpClient client = new OkHttpClient().newBuilder().build();
Response response = client.newCall(request).execute();

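An OkHttpClient instance is created to execute the request synchronously, and the response is captured. The program then prints the HTTP status code and, if the response has a body, pretty-prints the JSON with four-space indentation.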
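The example prints the body the same way whatever the status code is. A minimal sketch of separating success from failure is shown here, using only the standard library. ResultReporter and its methods are hypothetical names, and the sketch does not assume anything about the fields inside the JSON response.

```java
public class ResultReporter {

  // HTTP 2xx status codes indicate success.
  static boolean isSuccess(int statusCode) {
    return statusCode >= 200 && statusCode < 300;
  }

  // Builds a one-message report from the status code and the raw response body.
  static String describe(int statusCode, String body) {
    String text = (body == null) ? "" : body;
    if (isSuccess(statusCode)) {
      return "Summary response:\n" + text;
    }
    return "Request failed with HTTP " + statusCode + ": " + text;
  }

  public static void main(String[] args) {
    // Sample values for illustration only, not real API output.
    System.out.println(describe(401, "{}"));
  }
}
```

In the tutorial's code, the two arguments would come from response.code() and response.body().string().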

Beyond the Tutorial

In this tutorial, you learned how to use Java to call the pdfRest Summarize PDF API, sending a PDF document and receiving a summary of its text in the response. This is just one of the many tools offered by pdfRest. We encourage you to explore all the available API tools in the API Lab.

For more detailed information, refer to the API Reference Guide. This example demonstrates a multipart API call. For code samples using JSON payloads, visit this GitHub repository.
