How to Convert PDF to Markdown with PHP

Learn how to use pdfRest API Toolkit to convert PDF content to structured Markdown output with PHP.

Why Convert PDF to Markdown with PHP?

The pdfRest PDF to Markdown API Tool converts PDF documents into Markdown. This tutorial shows how to send a request to the PDF to Markdown endpoint from PHP, using the Guzzle HTTP client to build the multipart request and read the response.

Converting PDFs to Markdown makes their content easier to edit and to publish through web content management systems. For instance, a content manager who receives reports as PDFs can convert them to Markdown, then edit and format the text directly in the website's CMS instead of retyping it, which streamlines publishing.

PDF to Markdown with PHP Code Example

<?php

require 'vendor/autoload.php'; // Load Composer's autoloader so the Guzzle classes are available.

use GuzzleHttp\Client; // Import the Guzzle HTTP client namespace.
use GuzzleHttp\Psr7\Request; // Import the PSR-7 Request class.
use GuzzleHttp\Psr7\Utils; // Import the PSR-7 Utils class for working with streams.

$client = new Client(); // Create a new instance of the Guzzle HTTP client.

$headers = [
  'Api-Key' => 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' // Set the API key in the headers for authentication.
];

$options = [
  'multipart' => [
    [
      'name' => 'file', // Specify the field name for the file.
      'contents' => Utils::tryFopen('/path/to/file', 'r'), // Open the file specified by '/path/to/file' for reading.
      'filename' => '/path/to/file', // Set the filename for the file to be processed, in this case, '/path/to/file'.
      'headers' => [
        'Content-Type' => 'application/pdf' // Set the Content-Type header for the file.
      ]
    ],
    [
      'name' => 'page_break_comments', // Specify the field name for the page_break_comments option.
      'contents' => 'on' // Set the value for the page_break_comments option (in this case, 'on').
    ]
  ]
];

$request = new Request('POST', 'https://api.pdfrest.com/markdown', $headers); // Create an HTTP POST request to the /markdown endpoint with the authentication headers.

$res = $client->sendAsync($request, $options)->wait(); // Send the asynchronous request and wait for the response.

echo $res->getBody(); // Output the response body, which contains the generated markdown from the document.

Source: GitHub

Breaking Down the Code

The code begins by requiring the Composer autoload file, which loads the necessary classes for the Guzzle HTTP client:

require 'vendor/autoload.php';

Next, the Guzzle HTTP client, PSR-7 Request, and Utils classes are imported to handle the HTTP request and file operations:

use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Utils;

A new instance of the Guzzle HTTP client is created:

$client = new Client();

The headers array is defined to include the API key for authentication:

$headers = [
  'Api-Key' => 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
];
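Hard-coding the key is fine for a quick test, but in a real project it is safer to keep it out of source control. The sketch below reads the key from an environment variable instead. The variable name PDFREST_API_KEY and the helper pdfrestHeaders() are hypothetical choices for this example, not part of pdfRest or Guzzle.

```php
<?php

/**
 * Build the pdfRest authentication headers from an environment variable.
 * pdfrestHeaders() and the PDFREST_API_KEY name are hypothetical, chosen for this sketch.
 */
function pdfrestHeaders(string $envVar = 'PDFREST_API_KEY'): array
{
    // getenv() returns false when the variable is not set.
    $apiKey = getenv($envVar);

    if ($apiKey === false || $apiKey === '') {
        throw new RuntimeException("Environment variable {$envVar} is not set.");
    }

    // Same header shape as the tutorial's $headers array.
    return ['Api-Key' => $apiKey];
}
```

With the variable exported in your shell, `$headers = pdfrestHeaders();` can replace the literal array above, and a missing key fails with a clear message instead of an authentication error from the API.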

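The options array holds a multipart payload with two parts. The first contains the PDF to be converted, specifying the field name, contents, filename, and content type. The second sets the page_break_comments option to 'on', which, as the name suggests, asks for comments marking page breaks in the generated Markdown: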

$options = [
  'multipart' => [
    [
      'name' => 'file',
      'contents' => Utils::tryFopen('/path/to/file', 'r'),
      'filename' => '/path/to/file',
      'headers' => [
        'Content-Type' => 'application/pdf'
      ]
    ],
    [
      'name' => 'page_break_comments',
      'contents' => 'on'
    ]
  ]
];
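If you convert more than one file, it can help to build this array in a function. The sketch below is one way to do it, reusing the field names shown above. buildMarkdownMultipart() is a hypothetical helper; it uses PHP's built-in fopen() in place of Utils::tryFopen() so it has no dependencies, and it sends only the file's base name rather than the full local path. It also assumes that leaving page_break_comments out of the request is equivalent to turning it off; check the API Reference Guide to confirm.

```php
<?php

/**
 * Build the multipart options for the /markdown endpoint.
 * buildMarkdownMultipart() is a hypothetical helper; the field names
 * 'file' and 'page_break_comments' come from the tutorial.
 */
function buildMarkdownMultipart(string $pdfPath, bool $pageBreakComments = true): array
{
    // Open the PDF in binary read mode and fail early with a clear message.
    $stream = @fopen($pdfPath, 'rb');
    if ($stream === false) {
        throw new RuntimeException("Unable to open {$pdfPath}");
    }

    $parts = [
        [
            'name' => 'file',
            'contents' => $stream, // Guzzle accepts a PHP stream resource here.
            'filename' => basename($pdfPath), // Send the base name, not the local path.
            'headers' => ['Content-Type' => 'application/pdf'],
        ],
    ];

    // Assumption: omitting the option leaves page break comments off.
    if ($pageBreakComments) {
        $parts[] = ['name' => 'page_break_comments', 'contents' => 'on'];
    }

    return ['multipart' => $parts];
}
```

Calling `$options = buildMarkdownMultipart('/path/to/file');` produces the same two-part payload as the array above, apart from the shorter filename.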

A new HTTP POST request is created with the endpoint URL and headers:

$request = new Request('POST', 'https://api.pdfrest.com/markdown', $headers);

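The request is sent with sendAsync(), which returns a promise; calling wait() blocks until the response arrives, so in practice this line behaves like a synchronous call: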

$res = $client->sendAsync($request, $options)->wait();
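As written, any failure surfaces as an uncaught Guzzle exception. By default Guzzle rejects 4xx and 5xx responses with a RequestException, and wait() rethrows it. The sketch below wraps the call so the error message includes the status code and whatever body the API returned. convertPdfToMarkdown() is a hypothetical helper; it assumes Guzzle 7 is installed and that vendor/autoload.php has already been required, as in the tutorial.

```php
<?php

use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;
use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\Psr7\Request;

/**
 * Send the /markdown request and return the response body as a string.
 * convertPdfToMarkdown() is a hypothetical helper for this sketch.
 */
function convertPdfToMarkdown(Client $client, array $headers, array $options): string
{
    $request = new Request('POST', 'https://api.pdfrest.com/markdown', $headers);

    try {
        // wait() rethrows the exception if the promise is rejected.
        $res = $client->sendAsync($request, $options)->wait();
    } catch (RequestException $e) {
        // HTTP-level error: report the status code and the API's error body when available.
        $detail = $e->hasResponse()
            ? $e->getResponse()->getStatusCode() . ' ' . $e->getResponse()->getBody()
            : $e->getMessage();
        throw new RuntimeException("pdfRest request failed: {$detail}", 0, $e);
    } catch (GuzzleException $e) {
        // Transport-level failures such as DNS errors or connection timeouts.
        throw new RuntimeException('Could not reach pdfRest: ' . $e->getMessage(), 0, $e);
    }

    return (string) $res->getBody();
}
```

With this helper, `echo convertPdfToMarkdown($client, $headers, $options);` replaces the request, send, and output steps of the tutorial.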

Finally, the response body, which contains the converted Markdown, is output:

echo $res->getBody();
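Instead of echoing the result, you may want to save it as a .md file next to the source PDF. The tutorial does not show the exact shape of the response body, so the sketch below is written defensively: if the body is a JSON object with a markdown field it saves that field, otherwise it saves the body as-is. Both the markdown key and the saveMarkdown() helper are assumptions for this example; check the API Reference Guide for the actual response format.

```php
<?php

/**
 * Save the Markdown from a response body next to the source PDF.
 * saveMarkdown() and the 'markdown' JSON key are assumptions for this sketch.
 */
function saveMarkdown(string $body, string $pdfPath): string
{
    $markdown = $body;

    // If the body is a JSON object with a string 'markdown' field, use that field.
    $decoded = json_decode($body, true);
    if (is_array($decoded) && isset($decoded['markdown']) && is_string($decoded['markdown'])) {
        $markdown = $decoded['markdown'];
    }

    // e.g. /reports/q3.pdf -> /reports/q3.md
    $target = dirname($pdfPath) . DIRECTORY_SEPARATOR
        . pathinfo($pdfPath, PATHINFO_FILENAME) . '.md';

    if (file_put_contents($target, $markdown) === false) {
        throw new RuntimeException("Unable to write {$target}");
    }

    return $target;
}
```

For example, `saveMarkdown((string) $res->getBody(), '/path/to/file');` writes the output beside the original file and returns the path it wrote.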

Beyond the Tutorial

In this tutorial, you learned how to use PHP to send an API request to the pdfRest PDF to Markdown endpoint, converting a PDF document into Markdown format. This process can be particularly useful for integrating PDF content into web platforms.

To explore more features and tools offered by pdfRest, you can try all the API Tools in the API Lab. For detailed information on each API endpoint, refer to the API Reference Guide.

Note that this example demonstrates a multipart API call. For examples using JSON payloads, visit GitHub.
