How to Extract Pages from PDF Files in .NET with C#
Why Extract PDF Pages with C#?
The pdfRest Split PDF API Tool lets developers extract pages from PDF documents programmatically. You send a PDF along with one or more page ranges, and the API creates a new document from each range.
Imagine a lengthy legal contract where different departments need specific clauses. Extracting pages to new documents lets you create separate PDFs containing only the relevant sections for each team, streamlining internal communication and reducing unnecessary document sharing.
Extract PDF Pages with C# Code Example
using System.Text;

using (var httpClient = new HttpClient { BaseAddress = new Uri("https://api.pdfrest.com") })
{
    using (var request = new HttpRequestMessage(HttpMethod.Post, "split-pdf"))
    {
        request.Headers.TryAddWithoutValidation("Api-Key", "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx");
        request.Headers.Accept.Add(new("application/json"));
        var multipartContent = new MultipartFormDataContent();

        var byteArray = File.ReadAllBytes("/path/to/file_name.pdf");
        var byteAryContent = new ByteArrayContent(byteArray);
        multipartContent.Add(byteAryContent, "file", "file_name.pdf");
        byteAryContent.Headers.TryAddWithoutValidation("Content-Type", "application/pdf");

        var byteArrayOption = new ByteArrayContent(Encoding.UTF8.GetBytes("1"));
        multipartContent.Add(byteArrayOption, "pages[]");

        var byteArrayOption2 = new ByteArrayContent(Encoding.UTF8.GetBytes("2-last"));
        multipartContent.Add(byteArrayOption2, "pages[]");

        var byteArrayOption3 = new ByteArrayContent(Encoding.UTF8.GetBytes("split"));
        multipartContent.Add(byteArrayOption3, "output");

        request.Content = multipartContent;
        var response = await httpClient.SendAsync(request);

        var apiResult = await response.Content.ReadAsStringAsync();

        Console.WriteLine("API response received.");
        Console.WriteLine(apiResult);
    }
}
Source: pdf-rest-api-samples on GitHub
Breaking Down the Code
The provided code demonstrates how to make an API call to the pdfRest Split PDF endpoint using C#. Let's break down each part of the code:
using (var httpClient = new HttpClient { BaseAddress = new Uri("https://api.pdfrest.com") })
This creates an instance of HttpClient with the base address set to the pdfRest API.
using (var request = new HttpRequestMessage(HttpMethod.Post, "split-pdf"))
A new HttpRequestMessage is created for making a POST request to the "split-pdf" endpoint.
request.Headers.TryAddWithoutValidation("Api-Key", "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx");
The API key is added to the request headers. You must replace the placeholder with your actual pdfRest API key.
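Hard-coding the key is fine for a quick test, but you may prefer to keep it out of source control. The following is a minimal sketch of one way to do that, assuming the key is stored in an environment variable. The variable name PDFREST_API_KEY and the helper names GetApiKey and CreateSplitRequest are hypothetical and chosen for illustration; they are not part of the pdfRest sample.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: read the API key from an environment variable.
// The default name PDFREST_API_KEY is an assumption, not a pdfRest convention.
static string GetApiKey(string variableName = "PDFREST_API_KEY")
{
    var key = Environment.GetEnvironmentVariable(variableName);
    if (string.IsNullOrWhiteSpace(key))
    {
        throw new InvalidOperationException(
            $"Set the {variableName} environment variable to your pdfRest API key.");
    }
    return key.Trim();
}

// Hypothetical helper: build the same POST request as the tutorial, without sending it.
static HttpRequestMessage CreateSplitRequest(string apiKey)
{
    var request = new HttpRequestMessage(HttpMethod.Post, "split-pdf");
    request.Headers.TryAddWithoutValidation("Api-Key", apiKey);
    return request;
}
```

With these in place, the request in the tutorial could be created with CreateSplitRequest(GetApiKey()) instead of embedding the placeholder string.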
var multipartContent = new MultipartFormDataContent();
This initializes a new multipart/form-data content to send files and data in the request body.
var byteArray = File.ReadAllBytes("/path/to/file_name.pdf");
Reads the PDF file at the specified path into a byte array.
multipartContent.Add(byteAryContent, "file", "file_name.pdf");
Adds the PDF content (the byte array wrapped in a ByteArrayContent) to the multipart form data under the key "file". The third argument, "file_name.pdf", is the file name sent with the upload.
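For large PDFs, loading the whole file into memory may be undesirable. As a sketch of an alternative, the file part can be built from a stream with StreamContent, which is a standard .NET class. The helper name CreateFileForm is hypothetical; the form key "file" and the "application/pdf" content type are taken from the sample above.

```csharp
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: attach a PDF as a streamed "file" part instead of a byte array.
static MultipartFormDataContent CreateFileForm(string pdfPath)
{
    var form = new MultipartFormDataContent();

    // The stream is read when the request is sent and is closed
    // when the form (and therefore this part) is disposed.
    var fileContent = new StreamContent(File.OpenRead(pdfPath));
    fileContent.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");

    // Use the file's own name as the upload file name.
    form.Add(fileContent, "file", Path.GetFileName(pdfPath));
    return form;
}
```

The returned form can have the "pages[]" and "output" parts added to it exactly as in the tutorial before it is assigned to request.Content.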
var byteArrayOption = new ByteArrayContent(Encoding.UTF8.GetBytes("1")); multipartContent.Add(byteArrayOption, "pages[]");
Adds the page range "1" to the request, indicating that the first page should be split into a separate PDF.
var byteArrayOption2 = new ByteArrayContent(Encoding.UTF8.GetBytes("2-last")); multipartContent.Add(byteArrayOption2, "pages[]");
Adds the page range "2-last" to the request, indicating that all pages from the second to the last should be split into another separate PDF.
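Since each page range is sent as its own "pages[]" part, the repeated calls can be folded into a loop. This is only a sketch: the helper name AddPageRanges is hypothetical, and it encodes each range the same way the sample does (UTF-8 bytes in a ByteArrayContent).

```csharp
using System.Net.Http;
using System.Text;

// Hypothetical helper: add one "pages[]" part per range,
// mirroring the repeated calls in the tutorial.
static void AddPageRanges(MultipartFormDataContent form, params string[] ranges)
{
    foreach (var range in ranges)
    {
        var rangeContent = new ByteArrayContent(Encoding.UTF8.GetBytes(range));
        form.Add(rangeContent, "pages[]");
    }
}
```

Calling AddPageRanges(multipartContent, "1", "2-last") would add the same two parts as the sample code.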
var byteArrayOption3 = new ByteArrayContent(Encoding.UTF8.GetBytes("split")); multipartContent.Add(byteArrayOption3, "output");
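Sets the optional "output" parameter, which pdfRest uses as the base name for the generated files, in this case "split". It does not select an output format; the separate PDFs come from the multiple "pages[]" entries.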
var response = await httpClient.SendAsync(request);
Sends the request to the pdfRest API and awaits the response.
var apiResult = await response.Content.ReadAsStringAsync();
Reads the response body as a string. Because the request sets the Accept header to "application/json", the body is JSON describing the result of the split operation.
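The raw response string can be hard to read in a console. The sketch below re-indents it with System.Text.Json before printing. It makes no assumptions about which fields pdfRest returns, and it falls back to the original text if the body is not valid JSON. The helper name FormatApiResult is hypothetical.

```csharp
using System.Text.Json;

// Hypothetical helper: re-indent a JSON response body for readable logging.
// No field names are assumed; non-JSON input is returned unchanged.
static string FormatApiResult(string body)
{
    try
    {
        using var document = JsonDocument.Parse(body);
        var options = new JsonSerializerOptions { WriteIndented = true };
        return JsonSerializer.Serialize(document.RootElement, options);
    }
    catch (JsonException)
    {
        return body;
    }
}
```

In the sample, Console.WriteLine(apiResult) could become Console.WriteLine(FormatApiResult(apiResult)). It may also be worth checking response.IsSuccessStatusCode before treating the body as a successful result.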
Beyond the Tutorial
By following the steps above, you have learned how to extract pages from a PDF using the pdfRest Split PDF API with C#. You can now apply this knowledge to automate document processing tasks within your applications. To explore more capabilities and demo all of the pdfRest API Tools, visit the API Lab. For further details on the API, refer to the API Reference documentation.
Note: This is an example of a multipart API call. Code samples using JSON payloads can be found in the pdf-rest-api-samples repository on GitHub.