How to Redact PDF Text with Java

Learn how to redact text in a PDF document using the pdfRest Redact PDF API Tool with Java.

Why Redact PDF Text with Java?

The pdfRest Redact PDF API Tool lets developers programmatically redact sensitive information from PDF documents. This tutorial walks through an API call to the Redact PDF endpoint using Java, so you can automate redaction and integrate it into your Java applications.

In real-world scenarios, businesses often need to share documents while ensuring that sensitive information, such as personal data or confidential details, is not exposed. For instance, a legal firm might need to redact client information from case documents before sharing them with third parties. By using the Redact PDF API, developers can automate this redaction process, saving time and reducing the risk of human error.

Redact PDF Text with Java Code Example

import io.github.cdimascio.dotenv.Dotenv;
import java.io.File;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import okhttp3.*;
import org.json.JSONObject;

public class PDFWithRedactedTextPreview {

  // Specify the path to your file here, or as the first argument when running the program.
  private static final String DEFAULT_FILE_PATH = "/path/to/file";

  // Specify your API key here, or in the environment variable PDFREST_API_KEY.
  // You can also put the environment variable in a .env file.
  private static final String DEFAULT_API_KEY = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";

  public static void main(String[] args) {
    File inputFile;
    if (args.length > 0) {
      inputFile = new File(args[0]);
    } else {
      inputFile = new File(DEFAULT_FILE_PATH);
    }

    final Dotenv dotenv = Dotenv.configure().ignoreIfMalformed().ignoreIfMissing().load();

    final String redaction_options =
        "[{\"type\":\"preset\",\"value\":\"email\"},{\"type\":\"regex\",\"value\":\"(\\\\+\\\\d{1,2}\\\\s)?\\\\(?\\\\d{3}\\\\)?[\\\\s.-]\\\\d{3}[\\\\s.-]\\\\d{4}\"},{\"type\":\"literal\",\"value\":\"word\"}]";
    final RequestBody inputFileRequestBody =
        RequestBody.create(inputFile, MediaType.parse("application/pdf"));
    RequestBody requestBody =
        new MultipartBody.Builder()
            .setType(MultipartBody.FORM)
            .addFormDataPart("file", inputFile.getName(), inputFileRequestBody)
            .addFormDataPart("redactions", redaction_options)
            .addFormDataPart("output", "pdfrest_redacted_text")
            .build();
    Request request =
        new Request.Builder()
            .header("Api-Key", dotenv.get("PDFREST_API_KEY", DEFAULT_API_KEY))
            .url("https://api.pdfrest.com/pdf-with-redacted-text-preview")
            .post(requestBody)
            .build();
    OkHttpClient client =
        new OkHttpClient().newBuilder().readTimeout(60, TimeUnit.SECONDS).build();

    // try-with-resources closes the response, releasing its body and connection.
    try (Response response = client.newCall(request).execute()) {
      System.out.println("Result code " + response.code());
      if (response.body() != null) {
        System.out.println(prettyJson(response.body().string()));
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  private static String prettyJson(String json) {
    // https://stackoverflow.com/a/9583835/11996393
    return new JSONObject(json).toString(4);
  }
}

Source: GitHub Repository

Breaking Down the Code

The code begins by importing necessary libraries such as `Dotenv` for environment variable management, `OkHttp` for handling HTTP requests, and `JSONObject` for JSON manipulation. The `DEFAULT_FILE_PATH` and `DEFAULT_API_KEY` are placeholders for the file path and API key, respectively.

File inputFile;
if (args.length > 0) {
  inputFile = new File(args[0]);
} else {
  inputFile = new File(DEFAULT_FILE_PATH);
}

This snippet checks if a file path is provided as a command-line argument. If not, it defaults to the `DEFAULT_FILE_PATH`.
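The tutorial code passes the path straight to the upload, so a wrong path only surfaces once the request is being sent. A minimal sketch of an early check follows; the class `InputFileCheck` and the helper `resolveInputFile` are hypothetical names introduced for illustration and are not part of the pdfRest sample.

```java
import java.io.File;

class InputFileCheck {

  // Hypothetical helper: uses the first argument if present, otherwise the
  // default path, and rejects anything that is not a readable regular file.
  static File resolveInputFile(String[] args, String defaultPath) {
    String path = args.length > 0 ? args[0] : defaultPath;
    File file = new File(path);
    if (!file.isFile() || !file.canRead()) {
      throw new IllegalArgumentException("Not a readable file: " + file.getPath());
    }
    return file;
  }

  public static void main(String[] args) {
    try {
      File input = resolveInputFile(args, "/path/to/file");
      System.out.println("Would upload " + input.getName());
    } catch (IllegalArgumentException e) {
      // Fail before any network call is made.
      System.err.println(e.getMessage());
    }
  }
}
```

Failing before the request is built keeps the error message about the file rather than about the HTTP call.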

final Dotenv dotenv = Dotenv.configure().ignoreIfMalformed().ignoreIfMissing().load();

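The `Dotenv` library loads environment variables, including any defined in a `.env` file, so the API key can be kept outside the source code. Because of `ignoreIfMissing()` and `ignoreIfMalformed()`, the program still runs when the `.env` file is absent or malformed, and the later `dotenv.get("PDFREST_API_KEY", DEFAULT_API_KEY)` call falls back to the placeholder key.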
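If neither the environment nor a `.env` file supplies a key, the request goes out with the placeholder value and is likely to be rejected. The sketch below shows one way to catch that earlier. `ApiKeyCheck` and `requireConfiguredKey` are hypothetical names, and `System.getenv` stands in for the `dotenv.get(...)` call so the example has no external dependencies.

```java
class ApiKeyCheck {

  // The placeholder value used in the tutorial's DEFAULT_API_KEY.
  static final String PLACEHOLDER = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";

  // Hypothetical helper: returns the key unchanged, or throws if it is
  // missing, blank, or still the placeholder.
  static String requireConfiguredKey(String key) {
    if (key == null || key.trim().isEmpty() || key.equals(PLACEHOLDER)) {
      throw new IllegalStateException(
          "Set PDFREST_API_KEY in the environment or in a .env file.");
    }
    return key;
  }

  public static void main(String[] args) {
    // Stand-in for dotenv.get("PDFREST_API_KEY", DEFAULT_API_KEY).
    String fromEnv = System.getenv("PDFREST_API_KEY");
    String candidate = fromEnv != null ? fromEnv : PLACEHOLDER;
    try {
      requireConfiguredKey(candidate);
      System.out.println("API key is configured.");
    } catch (IllegalStateException e) {
      System.err.println(e.getMessage());
    }
  }
}
```

In the tutorial code, the same check could wrap the value passed to the `Api-Key` header.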

final String redaction_options = "[{\"type\":\"preset\",\"value\":\"email\"},{\"type\":\"regex\",\"value\":\"(\\\\+\\\\d{1,2}\\\\s)?\\\\(?\\\\d{3}\\\\)?[\\\\s.-]\\\\d{3}[\\\\s.-]\\\\d{4}\"},{\"type\":\"literal\",\"value\":\"word\"}]";

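The `redaction_options` variable defines the redaction rules as a JSON array with three rule types: `preset` (here, `email`), `regex` (here, a pattern for US-style phone numbers), and `literal` (here, the word `word`). Each backslash in the regex appears four times in the source because it is escaped once for the JSON string and again for the Java string literal.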
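Hand-escaping the regex twice is easy to get wrong. The sketch below builds the same JSON array from values that carry only the usual Java escaping, then tries the phone pattern against a few sample strings with `java.util.regex`. The class `RedactionRules` and its helpers are hypothetical, the escaping covers only backslashes and double quotes, and the service's regex engine may not behave identically to Java's, so treat the local matches as a sanity check only.

```java
import java.util.regex.Pattern;

class RedactionRules {

  // Wraps a value in quotes, escaping backslashes and double quotes for JSON.
  // Assumption: input is plain printable text with no control characters.
  static String jsonString(String value) {
    return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  // One rule object with the "type" and "value" keys used in the tutorial.
  static String rule(String type, String value) {
    return "{\"type\":" + jsonString(type) + ",\"value\":" + jsonString(value) + "}";
  }

  // Joins already-serialized rules into a JSON array.
  static String redactions(String... rules) {
    return "[" + String.join(",", rules) + "]";
  }

  // The tutorial's phone pattern with a single level of (Java) escaping.
  static final String PHONE_REGEX =
      "(\\+\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{4}";

  public static void main(String[] args) {
    String json =
        redactions(
            rule("preset", "email"),
            rule("regex", PHONE_REGEX),
            rule("literal", "word"));
    System.out.println(json);

    // Local sanity check of what the pattern accepts.
    Pattern phone = Pattern.compile(PHONE_REGEX);
    String[] samples = {"555-123-4567", "(555) 123-4567", "+1 555.123.4567", "5551234567"};
    for (String sample : samples) {
      System.out.println(sample + " -> " + phone.matcher(sample).matches());
    }
  }
}
```

The string returned by `redactions(...)` could be passed to `addFormDataPart("redactions", ...)` in place of the hand-written literal. Since the tutorial already imports `org.json`, building the array with `JSONArray` and `JSONObject` would be a more robust alternative to the manual escaping shown here.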

RequestBody requestBody = new MultipartBody.Builder()
    .setType(MultipartBody.FORM)
    .addFormDataPart("file", inputFile.getName(), inputFileRequestBody)
    .addFormDataPart("redactions", redaction_options)
    .addFormDataPart("output", "pdfrest_redacted_text")
    .build();

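This section constructs the multipart request body from three parts: the PDF file, the redaction rules, and the `output` value, which names the resulting file. The `inputFileRequestBody` it references is created just beforehand in the full example with `RequestBody.create(inputFile, MediaType.parse("application/pdf"))`.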

Request request = new Request.Builder()
    .header("Api-Key", dotenv.get("PDFREST_API_KEY", DEFAULT_API_KEY))
    .url("https://api.pdfrest.com/pdf-with-redacted-text-preview")
    .post(requestBody)
    .build();

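The `Request` object carries the API key in the `Api-Key` header and targets the `pdf-with-redacted-text-preview` endpoint. The full example then executes it with an `OkHttpClient` configured with a 60-second read timeout, and prints the status code followed by the pretty-printed JSON response.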
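The tutorial only prints the JSON response. To act on it, a program needs to read individual fields from the body. The sketch below is a dependency-free illustration of pulling one string field out of a flat JSON object. The class `ResponseFields`, the helper `stringField`, and the sample body with its `outputUrl` field are assumptions for illustration; consult the API Reference Guide for the actual response fields.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResponseFields {

  // Returns the value of the first string field with the given name, or null
  // if there is none. Limitation: values containing escaped quotes are not
  // handled, so this is suitable only for simple, flat responses.
  static String stringField(String json, String name) {
    Pattern pattern =
        Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
    Matcher matcher = pattern.matcher(json);
    return matcher.find() ? matcher.group(1) : null;
  }

  public static void main(String[] args) {
    // Hypothetical response body, not captured from the real service.
    String body = "{\"outputUrl\": \"https://example.com/out.pdf\", \"outputId\": \"abc\"}";
    String url = stringField(body, "outputUrl");
    if (url != null) {
      System.out.println("Download from " + url);
    } else {
      System.out.println("No outputUrl field in response.");
    }
  }
}
```

In the tutorial's own code, `new JSONObject(body).optString(name)` from the already-imported `org.json` library would be the sturdier choice; the regex version is shown only to keep the example self-contained.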

Beyond the Tutorial

In this tutorial, you learned how to use Java to make an API call to the pdfRest Redact PDF endpoint, automating the redaction of sensitive information from PDF files. You can explore other pdfRest API Tools in the API Lab and refer to the API Reference Guide for more detailed information.

Note: This example demonstrates a multipart API call. For code samples using JSON payloads, visit this GitHub repository.
