How to Convert PDF to Markdown with Java

Learn how to convert PDF content to structured Markdown output with Java using pdfRest API Toolkit.

Why Convert PDF to Markdown with Java?

The pdfRest PDF to Markdown API Tool converts PDF documents into Markdown, a lightweight markup format widely used for web content. This tutorial shows how to send an API call to the PDF to Markdown endpoint using Java, so you can automate the conversion within your Java applications.

You might have a repository of PDF documents that you want to convert into Markdown for easier web publishing or integration into a content management system. For example, a technical writer might use this tool to convert PDF manuals into Markdown to maintain a consistent format across a documentation website.

PDF to Markdown with Java Code Example

import io.github.cdimascio.dotenv.Dotenv;
import java.io.File;
import java.io.IOException;
import okhttp3.MediaType;
import okhttp3.MultipartBody;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;
import org.json.JSONObject;

public class Markdown {

  // Specify the path to your file here, or as the first argument when running the program.
  private static final String DEFAULT_FILE_PATH = "/path/to/file.pdf";

  // Specify your API key here, or in the environment variable PDFREST_API_KEY.
  // You can also put the environment variable in a .env file.
  private static final String DEFAULT_API_KEY = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";

  public static void main(String[] args) {
    File inputFile;
    if (args.length > 0) {
      inputFile = new File(args[0]);
    } else {
      inputFile = new File(DEFAULT_FILE_PATH);
    }

    final Dotenv dotenv = Dotenv.configure().ignoreIfMalformed().ignoreIfMissing().load();

    final RequestBody inputFileRequestBody =
        RequestBody.create(inputFile, MediaType.parse("application/pdf"));
    RequestBody requestBody =
        new MultipartBody.Builder()
            .setType(MultipartBody.FORM)
            .addFormDataPart("file", inputFile.getName(), inputFileRequestBody)
            .addFormDataPart("page_break_comments", "on")
            .build();
    Request request =
        new Request.Builder()
            .header("Api-Key", dotenv.get("PDFREST_API_KEY", DEFAULT_API_KEY))
            .url("https://api.pdfrest.com/markdown")
            .post(requestBody)
            .build();
    OkHttpClient client = new OkHttpClient();
    // try-with-resources closes the response and releases the connection.
    try (Response response = client.newCall(request).execute()) {
      System.out.println("Result code " + response.code());
      if (response.body() != null) {
        System.out.println(prettyJson(response.body().string()));
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  private static String prettyJson(String json) {
    // https://stackoverflow.com/a/9583835/11996393
    return new JSONObject(json).toString(4);
  }
}

Source: GitHub

Breaking Down the Code

The code begins by importing the libraries it needs: OkHttp for HTTP requests, Dotenv for loading environment variables, and org.json for formatting the response. DEFAULT_FILE_PATH and DEFAULT_API_KEY are placeholders for the file path and API key; replace them, or supply the real values at run time.

File inputFile;
if (args.length > 0) {
  inputFile = new File(args[0]);
} else {
  inputFile = new File(DEFAULT_FILE_PATH);
}

This snippet checks if a file path is provided as a command-line argument. If not, it defaults to the specified file path.
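The sample does not verify that the file exists before uploading it. A minimal sketch of such a check, using only the JDK, might look like the following. The class and method names (InputFileResolver, resolve, isUploadable) are hypothetical and are not part of the pdfRest sample.

```java
import java.io.File;

final class InputFileResolver {

  // Hypothetical helper: uses the first command-line argument if present,
  // otherwise falls back to the supplied default path.
  static File resolve(String[] args, String defaultPath) {
    return new File(args.length > 0 ? args[0] : defaultPath);
  }

  // Returns true only for an existing, readable regular file.
  static boolean isUploadable(File file) {
    return file.isFile() && file.canRead();
  }

  public static void main(String[] args) {
    File inputFile = resolve(args, "/path/to/file.pdf");
    if (!isUploadable(inputFile)) {
      System.err.println("Cannot read input file: " + inputFile.getPath());
      System.exit(1);
    }
    System.out.println("Ready to upload " + inputFile.getName());
  }
}
```

Failing early with a clear message avoids sending a request that cannot succeed.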

final Dotenv dotenv = Dotenv.configure().ignoreIfMalformed().ignoreIfMissing().load();

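The Dotenv library loads environment variables, including any defined in a .env file. With ignoreIfMissing() and ignoreIfMalformed(), a missing or malformed .env file does not cause an error. Reading the API key from the environment keeps it out of the source code.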
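To make the fallback behavior concrete, here is a rough JDK-only sketch of the same idea: look up PDFREST_API_KEY and use a placeholder when it is absent. It is not how the Dotenv library is implemented. ApiKeyLookup and resolve are hypothetical names, and treating a blank value as missing is an extra assumption of this sketch.

```java
import java.util.Map;

final class ApiKeyLookup {

  // Hypothetical helper: returns the PDFREST_API_KEY value from the given
  // environment map, or the fallback when the variable is absent or blank.
  static String resolve(Map<String, String> env, String fallback) {
    String value = env.get("PDFREST_API_KEY");
    return (value == null || value.isBlank()) ? fallback : value;
  }

  public static void main(String[] args) {
    String placeholder = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";
    String key = resolve(System.getenv(), placeholder);
    // Report only where the key came from, never the key itself.
    System.out.println(
        key.equals(placeholder) ? "Using placeholder key" : "Using key from environment");
  }
}
```

Passing the environment as a Map keeps the helper easy to test without changing real environment variables.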

final RequestBody inputFileRequestBody =
    RequestBody.create(inputFile, MediaType.parse("application/pdf"));
RequestBody requestBody =
    new MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        .addFormDataPart("file", inputFile.getName(), inputFileRequestBody)
        .addFormDataPart("page_break_comments", "on")
        .build();

This segment constructs the request body for the API call. It specifies the file to be uploaded and includes a form data part named "page_break_comments" set to "on", which instructs the API to include comments for page breaks in the Markdown output.
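It can help to see roughly what a multipart/form-data body looks like on the wire. The sketch below assembles one by hand with the JDK only. It is an illustration, not OkHttp's implementation: MultipartSketch is a hypothetical class, the boundary string is arbitrary, and the exact headers OkHttp emits for each part may differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class MultipartSketch {

  private static final String CRLF = "\r\n";

  // Builds a multipart/form-data body with one PDF file part and one text field.
  static byte[] build(String boundary, String fileName, byte[] fileBytes,
                      String fieldName, String fieldValue) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // File part: headers, blank line, raw bytes.
    write(out, "--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"" + CRLF
        + "Content-Type: application/pdf" + CRLF + CRLF);
    out.write(fileBytes, 0, fileBytes.length);
    write(out, CRLF);
    // Text part: headers, blank line, value.
    write(out, "--" + boundary + CRLF
        + "Content-Disposition: form-data; name=\"" + fieldName + "\"" + CRLF + CRLF
        + fieldValue + CRLF);
    // Closing delimiter marks the end of the body.
    write(out, "--" + boundary + "--" + CRLF);
    return out.toByteArray();
  }

  private static void write(ByteArrayOutputStream out, String text) {
    byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
    out.write(bytes, 0, bytes.length);
  }

  public static void main(String[] args) {
    byte[] fakePdf = "%PDF-1.7 (placeholder bytes)".getBytes(StandardCharsets.US_ASCII);
    byte[] body = build("example-boundary", "file.pdf", fakePdf, "page_break_comments", "on");
    System.out.print(new String(body, StandardCharsets.UTF_8));
  }
}
```

In practice MultipartBody.Builder handles the boundary and part headers for you, so the hand-built version is only useful for understanding or debugging.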

Request request =
    new Request.Builder()
        .header("Api-Key", dotenv.get("PDFREST_API_KEY", DEFAULT_API_KEY))
        .url("https://api.pdfrest.com/markdown")
        .post(requestBody)
        .build();

The request carries the API key in the Api-Key header, falling back to DEFAULT_API_KEY when PDFREST_API_KEY is not set, and targets the PDF to Markdown endpoint. It uses the POST method to send the multipart form data.
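If you would rather avoid the OkHttp dependency, a comparable request can probably be built with the JDK's own java.net.http API (Java 11 and later). This is a swapped-in alternative, not the approach used in the pdfRest sample. JdkRequestSketch is a hypothetical class, and the caller is assumed to supply an already encoded multipart body along with its boundary.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class JdkRequestSketch {

  // Builds (but does not send) a POST request to the PDF to Markdown endpoint.
  // The multipart body and its boundary are assumed to be prepared by the caller.
  static HttpRequest build(String apiKey, String boundary, byte[] multipartBody) {
    return HttpRequest.newBuilder()
        .uri(URI.create("https://api.pdfrest.com/markdown"))
        .header("Api-Key", apiKey)
        .header("Content-Type", "multipart/form-data; boundary=" + boundary)
        .POST(HttpRequest.BodyPublishers.ofByteArray(multipartBody))
        .build();
  }

  public static void main(String[] args) {
    HttpRequest request = build("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", "example-boundary", new byte[0]);
    // Only inspects the request; nothing is sent over the network here.
    System.out.println(request.method() + " " + request.uri());
  }
}
```

Sending it would then be a matter of passing the request to HttpClient.send with a suitable body handler.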

OkHttpClient client = new OkHttpClient();
// try-with-resources closes the response and releases the connection.
try (Response response = client.newCall(request).execute()) {
  System.out.println("Result code " + response.code());
  if (response.body() != null) {
    System.out.println(prettyJson(response.body().string()));
  }
} catch (IOException e) {
  throw new RuntimeException(e);
}

Finally, the OkHttp client executes the request. The program prints the HTTP status code, then the response body as pretty-printed JSON. If an IOException occurs, it is caught and rethrown wrapped in a RuntimeException.
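The sample prints the body whatever the status code is. In a larger application you may want to separate success from failure before going further. The following is a small sketch of one way to do that. ResultSummary and its methods are hypothetical, and the 200-character truncation limit is an arbitrary choice.

```java
final class ResultSummary {

  // Treats any 2xx status as success, the same range that
  // OkHttp's Response.isSuccessful() reports.
  static boolean isSuccess(int statusCode) {
    return statusCode >= 200 && statusCode <= 299;
  }

  // Hypothetical helper: builds a one-line summary and truncates long bodies.
  static String summarize(int statusCode, String body) {
    String text = body == null ? "" : body.strip();
    if (text.length() > 200) {
      text = text.substring(0, 200) + "...";
    }
    String prefix = isSuccess(statusCode) ? "OK " : "FAILED ";
    return prefix + statusCode + (text.isEmpty() ? "" : ": " + text);
  }

  public static void main(String[] args) {
    System.out.println(summarize(200, "{}"));
    System.out.println(summarize(401, null));
  }
}
```

With OkHttp you would pass response.code() and the body string to such a helper, or simply branch on response.isSuccessful().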

Beyond the Tutorial

In this tutorial, you learned how to use Java to call the pdfRest PDF to Markdown API, converting PDF files into Markdown format. This process can be integrated into larger Java applications, enabling automated document conversion workflows.

To explore more, you can demo all of the pdfRest API Tools in the API Lab. For detailed documentation, refer to the API Reference Guide.

Note: This example demonstrates a multipart API call. For code samples using JSON payloads, visit GitHub.
