How to Extract Images from PDF Files with PHP, Tutorial

Why Extract PDF Images with PHP?

The pdfRest Extract Images API Tool extracts the images embedded in a PDF document. This tutorial shows how to send an API call to that tool from PHP. Because PHP is a widely used server-side scripting language, the same call can run inside a script or web application to automate image extraction from PDFs.

Imagine a scenario where a company needs to extract images from a large number of PDF reports to create a visual database or gallery. Manually downloading and extracting these images would be time-consuming and error-prone. By using the pdfRest Extract Images API Tool with PHP, the company can automate this process, ensuring accuracy and saving valuable time and resources.

Extract PDF Images with PHP Code Example

<?php
require 'vendor/autoload.php'; // Require the Composer autoload file to load the Guzzle HTTP client.

use GuzzleHttp\Client; // Import the Guzzle HTTP client namespace.
use GuzzleHttp\Psr7\Request; // Import the PSR-7 Request class.
use GuzzleHttp\Psr7\Utils; // Import the PSR-7 Utils class for working with streams.

$client = new Client(); // Create a new instance of the Guzzle HTTP client.

$headers = [
  'Api-Key' => 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' // Set the API key in the headers for authentication.
];

$options = [
  'multipart' => [
    [
      'name' => 'file', // Specify the field name for the file.
      'contents' => Utils::tryFopen('/path/to/file', 'r'), // Open the file specified by the '/path/to/file' for reading.
      'filename' => '/path/to/file', // Set the filename for the file containing images, in this case, '/path/to/file'.
      'headers' => [
        'Content-Type' => '' // Set the Content-Type header for the file.
      ]
    ],
    [
      'name' => 'pages', // Specify the field name for the target page numbers.
      'contents' => '1-last' // Set the value for the target pages (in this case, '1-last', or all pages).
    ],
    [
      'name' => 'output', // Specify the field name for the output option.
      'contents' => 'pdfrest_extracted_images' // Set the value for the output option (in this case, 'pdfrest_extracted_images').
    ]
  ]
];

$request = new Request('POST', 'https://api.pdfrest.com/extracted-images', $headers); // Create a new HTTP POST request with the API endpoint and headers.

$res = $client->sendAsync($request, $options)->wait(); // Send the asynchronous request and wait for the response.

echo $res->getBody(); // Output the response body returned by the API.

Source: GitHub PDF Rest API Samples

Breaking Down the Code

Let's break down the code to understand how it works. It begins by requiring the Composer autoload file, which loads Guzzle, the PHP HTTP client used to make the request, and then imports the three Guzzle classes the script uses.

require 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Utils;

Next, we create a new instance of the Guzzle HTTP client to handle our API requests.

$client = new Client();

We then set up the headers for the request, including the API key for authentication. The API key is crucial for accessing the pdfRest API services.

$headers = [
  'Api-Key' => 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
];
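
Hard-coding the key is fine for a quick test, but in a deployed script it is usually safer to read it from the environment. The following is a minimal sketch of that idea; the helper name `buildPdfRestHeaders` and the environment variable `PDFREST_API_KEY` are assumptions made for illustration, not part of the pdfRest samples.

```php
<?php
// Hypothetical helper: builds the request headers for the API call.
// The environment variable name PDFREST_API_KEY is an assumption of this sketch.
function buildPdfRestHeaders(?string $apiKey = null): array
{
    // Fall back to the environment when no key is passed explicitly.
    // getenv() returns false when the variable is not set.
    $apiKey = $apiKey ?? (getenv('PDFREST_API_KEY') ?: null);

    // Fail early instead of sending a request that cannot authenticate.
    if ($apiKey === null || trim($apiKey) === '') {
        throw new RuntimeException('No pdfRest API key configured.');
    }

    return ['Api-Key' => trim($apiKey)];
}
```

The returned array can be passed as `$headers` in place of the literal array shown above.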

The options array configures the multipart form data for the request. It includes the file to be uploaded, the pages to extract images from, and the output option. Each part of the multipart array has a 'name' and a 'contents' field, which define the parameter name and its value, respectively. The file part also carries a 'filename' and its own 'headers'.

$options = [
  'multipart' => [
    [
      'name' => 'file',
      'contents' => Utils::tryFopen('/path/to/file', 'r'),
      'filename' => '/path/to/file',
      'headers' => [
        'Content-Type' => ''
      ]
    ],
    [
      'name' => 'pages',
      'contents' => '1-last'
    ],
    [
      'name' => 'output',
      'contents' => 'pdfrest_extracted_images'
    ]
  ]
];
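
When the same call is made for many PDFs, the options array can be produced by a small function. The sketch below is one way to do that; `buildExtractImagesOptions` is a hypothetical helper, and the `application/pdf` content type assumes the upload really is a PDF. It uses PHP's built-in `fopen()`, since Guzzle also accepts a plain stream resource as part contents.

```php
<?php
// Hypothetical helper that assembles the multipart options for one PDF.
function buildExtractImagesOptions(
    string $pdfPath,
    string $pages = '1-last',
    string $output = 'pdfrest_extracted_images'
): array {
    // Check the file up front so the error names the missing path.
    if (!is_file($pdfPath) || !is_readable($pdfPath)) {
        throw new InvalidArgumentException("Cannot read PDF file: {$pdfPath}");
    }

    return [
        'multipart' => [
            [
                'name'     => 'file',
                // An open stream resource is accepted as part contents.
                'contents' => fopen($pdfPath, 'r'),
                // Send only the base name rather than the full local path.
                'filename' => basename($pdfPath),
                // Assumption: the uploaded file is a PDF.
                'headers'  => ['Content-Type' => 'application/pdf'],
            ],
            ['name' => 'pages',  'contents' => $pages],
            ['name' => 'output', 'contents' => $output],
        ],
    ];
}
```

A call such as `buildExtractImagesOptions('/path/to/file')` returns an array with the same three parts as the one above, using the default pages and output values.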

Finally, the code creates a POST request for the pdfRest API endpoint, sends it with sendAsync(), and calls wait() to block until the response arrives. The response body returned by the API is then echoed.

$request = new Request('POST', 'https://api.pdfrest.com/extracted-images', $headers);
$res = $client->sendAsync($request, $options)->wait();
echo $res->getBody();
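
Rather than echoing the raw body, a script will usually want to decode it. The sketch below assumes the API answers with JSON that carries the result locations in an `outputUrl` field (either one URL or a list); that field name and the helper `extractOutputUrls` are assumptions for illustration, so check the API Reference Guide for the actual response shape.

```php
<?php
// Hypothetical helper: pulls download URLs out of a JSON response body.
// The 'outputUrl' field name is an assumption; confirm it in the API reference.
function extractOutputUrls(string $body): array
{
    // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException.
    $data = json_decode($body, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($data) || !isset($data['outputUrl'])) {
        throw new UnexpectedValueException('Response has no outputUrl field.');
    }

    // Normalise a single URL or a list of URLs to a list of strings.
    return array_values(array_map('strval', (array) $data['outputUrl']));
}
```

With the response object from the code above, this could be called as `extractOutputUrls((string) $res->getBody())`.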

Beyond the Tutorial

This tutorial showed how to call the pdfRest Extract Images API Tool from PHP. With an understanding of each part of the code, you can adapt and expand the example to fit your own needs. We encourage you to explore all the pdfRest API Tools in the API Lab and to refer to the API Reference Guide for more detailed information.

Note: This example uses a multipart API call. For code samples using JSON payloads, visit GitHub JSON Payload Examples.