How to Summarize PDF Text in .NET with C#

Learn how to summarize PDF text with the pdfRest Summarize PDF API using C#.

Why Summarize PDF with C#?

The pdfRest Summarize PDF API Tool condenses the text of a PDF file into a shorter summary that you can request programmatically. With this API, developers can add PDF summarization directly to their C# applications. This tutorial walks through sending a multipart/form-data request to the Summarize PDF endpoint (summarized-pdf-text) from C#, so you can automate the extraction of key information from PDF documents.

Businesses and individuals often handle large volumes of lengthy PDF documents. For example, a law firm might need to summarize long legal documents quickly to surface key points and speed up decision-making. With the Summarize PDF API, that work can be automated, saving time and reducing manual review.

Summarize PDF with C# Code Example

/*
 * What this sample does:
 * - Summarizes PDF content via multipart/form-data.
 * - Routed from Program.cs under the command name `summarized-pdf-text-multipart`.
 *
 * Setup (environment):
 * - Copy .env.example to .env
 * - Set PDFREST_API_KEY=your_api_key_here
 * - Optional: set PDFREST_URL to override the API region. For EU/GDPR compliance and proximity, use:
 *     PDFREST_URL=https://eu-api.pdfrest.com
 *   For more information visit https://pdfrest.com/pricing#how-do-eu-gdpr-api-calls-work
 *
 * Usage:
 *   dotnet run -- summarized-pdf-text-multipart /path/to/input.pdf
 *
 * Output:
 * - Prints the JSON response. Validation errors (args/env) exit non-zero.
 */

using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

namespace Samples.EndpointExamples.MultipartPayload
{
    public static class SummarizedPdfText
    {
        public static async Task Execute(string[] args)
        {
            if (args == null || args.Length < 1)
            {
                Console.Error.WriteLine("summarized-pdf-text-multipart requires an input PDF path");
                Environment.Exit(1);
                return;
            }
            var inputPath = args[0];
            if (!File.Exists(inputPath))
            {
                Console.Error.WriteLine($"File not found: {inputPath}");
                Environment.Exit(1);
                return;
            }
            var apiKey = Environment.GetEnvironmentVariable("PDFREST_API_KEY");
            if (string.IsNullOrWhiteSpace(apiKey))
            {
                Console.Error.WriteLine("Missing required environment variable: PDFREST_API_KEY");
                Environment.Exit(1);
                return;
            }
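            var baseUrl = Environment.GetEnvironmentVariable("PDFREST_URL");
            if (string.IsNullOrWhiteSpace(baseUrl))
            {
                baseUrl = "https://api.pdfrest.com"; // default region when PDFREST_URL is unset or empty
            }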

            using (var httpClient = new HttpClient { BaseAddress = new Uri(baseUrl) })
            using (var request = new HttpRequestMessage(HttpMethod.Post, "summarized-pdf-text"))
            {
                request.Headers.TryAddWithoutValidation("Api-Key", apiKey);
                request.Headers.Accept.Add(new("application/json"));
                var multipartContent = new MultipartFormDataContent();

                // Attach the PDF under the "file" form field as raw bytes.
                var byteArray = File.ReadAllBytes(inputPath);
                var byteAryContent = new ByteArrayContent(byteArray);
                multipartContent.Add(byteAryContent, "file", Path.GetFileName(inputPath));
                byteAryContent.Headers.TryAddWithoutValidation("Content-Type", "application/octet-stream");

                // Request a summary of about 100 words.
                var byteArrayOption = new ByteArrayContent(Encoding.UTF8.GetBytes("100"));
                multipartContent.Add(byteArrayOption, "target_word_count");

                // Disposing the request also disposes this multipart content.
                request.Content = multipartContent;
                using var response = await httpClient.SendAsync(request);
                var apiResult = await response.Content.ReadAsStringAsync();

                if (!response.IsSuccessStatusCode)
                {
                    Console.Error.WriteLine($"Request failed with HTTP {(int)response.StatusCode}.");
                }
                Console.WriteLine("API response received.");
                Console.WriteLine(apiResult);
            }
        }
    }
}

Source: GitHub
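The header comment says this sample is routed from Program.cs, which the tutorial does not show. A minimal sketch of such routing, assuming a dictionary keyed by command name, is below. CommandRouter and RunAsync are hypothetical names, and the placeholder handler only prints instead of calling SummarizedPdfText.Execute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical Program.cs-style router; the tutorial does not show the real one.
public static class CommandRouter
{
    // Maps the first CLI argument (the command name) to a handler for the remaining arguments.
    private static readonly Dictionary<string, Func<string[], Task>> Commands = new()
    {
        // Placeholder handler; the sample project would call SummarizedPdfText.Execute here.
        ["summarized-pdf-text-multipart"] = rest =>
        {
            Console.WriteLine($"Would summarize: {string.Join(" ", rest)}");
            return Task.CompletedTask;
        },
    };

    // Runs the matching handler; returns false if the command name is missing or unknown.
    public static async Task<bool> RunAsync(string[] args)
    {
        if (args.Length == 0 || !Commands.TryGetValue(args[0], out var handler))
        {
            return false;
        }

        await handler(args.Skip(1).ToArray());
        return true;
    }
}
```

In a real entry point, a false result would be reported as a usage error and the process would exit non-zero.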

Breaking Down the Code

The code begins by checking that the required argument was provided: the path to the PDF to summarize.

if (args == null || args.Length < 1)
{
    Console.Error.WriteLine("summarized-pdf-text-multipart requires an input PDF path");
    Environment.Exit(1);
    return;
}
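Because Environment.Exit ends the process, checks written this way are hard to unit-test. One alternative, sketched below with a hypothetical InputValidation.Validate helper, returns an error message instead and lets the caller decide whether to exit. The file-existence check is passed in as a delegate so a test can fake it.

```csharp
#nullable enable
using System;

// Hypothetical alternative to the inline checks: return an error message instead of exiting,
// so the logic can be unit-tested. fileExists is injected (pass File.Exists in real code).
public static class InputValidation
{
    public static string? Validate(string[]? args, Func<string, bool> fileExists)
    {
        if (args == null || args.Length < 1)
        {
            return "summarized-pdf-text-multipart requires an input PDF path";
        }

        var inputPath = args[0];
        if (!fileExists(inputPath))
        {
            return $"File not found: {inputPath}";
        }

        return null; // null means the arguments are usable
    }
}
```

In the sample, the caller would pass File.Exists, write any non-null message to Console.Error, and call Environment.Exit(1).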

Next, it verifies that the specified PDF file exists and reads the API key from the PDFREST_API_KEY environment variable. If the file is missing or the key is not set, the program prints an error and exits with status code 1:

var inputPath = args[0];
if (!File.Exists(inputPath))
{
    Console.Error.WriteLine($"File not found: {inputPath}");
    Environment.Exit(1);
    return;
}
var apiKey = Environment.GetEnvironmentVariable("PDFREST_API_KEY");
if (string.IsNullOrWhiteSpace(apiKey))
{
    Console.Error.WriteLine("Missing required environment variable: PDFREST_API_KEY");
    Environment.Exit(1);
    return;
}
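The setup notes ask you to copy .env.example to .env, but this code only reads process environment variables, so something else (presumably Program.cs) has to load the file. A minimal loader is sketched below, assuming simple KEY=VALUE lines. The DotEnv class is hypothetical, and the real project may use a library instead.

```csharp
using System;
using System.IO;

// Hypothetical minimal .env loader (the sample project may use a library instead).
// Handles KEY=VALUE lines, skips blank lines and '#' comments, strips surrounding
// double quotes, and never overrides a variable that is already set.
public static class DotEnv
{
    public static void Load(string path)
    {
        if (!File.Exists(path))
        {
            return; // a missing .env file is not treated as an error
        }

        foreach (var rawLine in File.ReadAllLines(path))
        {
            var line = rawLine.Trim();
            if (line.Length == 0 || line[0] == '#')
            {
                continue;
            }

            var separator = line.IndexOf('=');
            if (separator <= 0)
            {
                continue; // no '=' or an empty key, so skip the malformed line
            }

            var key = line.Substring(0, separator).Trim();
            var value = line.Substring(separator + 1).Trim().Trim('"');

            if (Environment.GetEnvironmentVariable(key) == null)
            {
                Environment.SetEnvironmentVariable(key, value);
            }
        }
    }
}
```

Because existing variables win, a PDFREST_API_KEY exported in your shell takes precedence over the value in .env.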

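The code then creates the HTTP client, using PDFREST_URL as the base address when it is set and https://api.pdfrest.com otherwise, and builds a POST request to the summarized-pdf-text endpoint. The API key goes in the Api-Key header, and the Accept header asks for JSON. A MultipartFormDataContent object carries the file and parameters: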

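var baseUrl = Environment.GetEnvironmentVariable("PDFREST_URL");
if (string.IsNullOrWhiteSpace(baseUrl))
{
    baseUrl = "https://api.pdfrest.com"; // default region when PDFREST_URL is unset or empty
}

using (var httpClient = new HttpClient { BaseAddress = new Uri(baseUrl) })
using (var request = new HttpRequestMessage(HttpMethod.Post, "summarized-pdf-text"))
{
    request.Headers.TryAddWithoutValidation("Api-Key", apiKey);
    request.Headers.Accept.Add(new("application/json"));
    var multipartContent = new MultipartFormDataContent();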
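The request setup can also be written as a small helper that is easy to check without network access. In this sketch, SummarizeRequest.Create is a hypothetical name. It treats an empty PDFREST_URL the same as an unset one and resolves the endpoint against the base URL the same way HttpClient does for a relative URI when BaseAddress is set.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper mirroring the tutorial's request setup so it can be checked offline.
public static class SummarizeRequest
{
    private const string DefaultBaseUrl = "https://api.pdfrest.com";

    public static HttpRequestMessage Create(string? baseUrlSetting, string apiKey)
    {
        // Fall back to the default region when PDFREST_URL is unset, empty, or whitespace.
        var baseUrl = string.IsNullOrWhiteSpace(baseUrlSetting) ? DefaultBaseUrl : baseUrlSetting;

        // Same resolution HttpClient applies to a relative URI when BaseAddress is set.
        var endpoint = new Uri(new Uri(baseUrl), "summarized-pdf-text");

        var request = new HttpRequestMessage(HttpMethod.Post, endpoint);
        request.Headers.TryAddWithoutValidation("Api-Key", apiKey);
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        return request;
    }
}
```

Building an absolute URI up front also makes the target endpoint visible in logs or tests before anything is sent.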

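The PDF file is read into a byte array and added to the multipart content under the form field name file, with the part's Content-Type set to application/octet-stream. The target_word_count field is added as well; it sets the desired length of the summary, here 100 words: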

var byteArray = File.ReadAllBytes(inputPath);
var byteAryContent = new ByteArrayContent(byteArray);
multipartContent.Add(byteAryContent, "file", Path.GetFileName(inputPath));
byteAryContent.Headers.TryAddWithoutValidation("Content-Type", "application/octet-stream");

var byteArrayOption = new ByteArrayContent(Encoding.UTF8.GetBytes("100"));
multipartContent.Add(byteArrayOption, "target_word_count");
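Instead of hard-coding "100", the form can be built from a caller-supplied word count. This sketch keeps the tutorial's field names (file and target_word_count) and part types. The SummarizeForm.Build helper and its positive-number check are assumptions, not documented pdfRest validation rules.

```csharp
using System;
using System.Globalization;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical helper: builds the tutorial's multipart body with a caller-supplied word count.
// The positive-number check is an assumption, not a documented pdfRest rule.
public static class SummarizeForm
{
    public static MultipartFormDataContent Build(string fileName, byte[] pdfBytes, int targetWordCount)
    {
        if (targetWordCount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(targetWordCount), "Word count must be positive.");
        }

        var form = new MultipartFormDataContent();

        // The PDF goes in the "file" field as raw bytes.
        var fileContent = new ByteArrayContent(pdfBytes);
        fileContent.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        form.Add(fileContent, "file", fileName);

        // Form fields travel as text; InvariantCulture avoids locale-specific number formatting.
        var wordCount = targetWordCount.ToString(CultureInfo.InvariantCulture);
        form.Add(new ByteArrayContent(Encoding.UTF8.GetBytes(wordCount)), "target_word_count");

        return form;
    }
}
```

A command-line caller could, for example, parse an optional second argument with int.TryParse and pass the result here.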

Finally, the request is sent, and the response is printed to the console:

var response = await httpClient.SendAsync(request);
var apiResult = await response.Content.ReadAsStringAsync();

Console.WriteLine("API response received.");
Console.WriteLine(apiResult);
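The excerpt above prints the response body without inspecting the HTTP status code. The sketch below returns the status alongside the body, and uses a stub HttpMessageHandler so the send path can be exercised offline. StubHandler and SummarizeCall are hypothetical names, and any canned body you feed the stub is illustrative, not an actual pdfRest response.

```csharp
#nullable enable
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Test double (hypothetical): answers every request with a canned response and records the request.
public sealed class StubHandler : HttpMessageHandler
{
    private readonly HttpStatusCode _status;
    private readonly string _body;

    public StubHandler(HttpStatusCode status, string body)
    {
        _status = status;
        _body = body;
    }

    public HttpRequestMessage? LastRequest { get; private set; }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        LastRequest = request;
        return Task.FromResult(new HttpResponseMessage(_status) { Content = new StringContent(_body) });
    }
}

// Hypothetical helper: sends the request and returns the status alongside the body.
public static class SummarizeCall
{
    public static async Task<(bool Success, int StatusCode, string Body)> SendAsync(HttpClient client, HttpRequestMessage request)
    {
        using var response = await client.SendAsync(request);
        var body = await response.Content.ReadAsStringAsync();
        return (response.IsSuccessStatusCode, (int)response.StatusCode, body);
    }
}
```

In production code the HttpClient would use its default handler; injecting the handler is only what makes the call testable without reaching the API.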

Beyond the Tutorial

In this tutorial, you learned how to make an API call to the pdfRest Summarize PDF endpoint using C#. This allows you to automate the summarization of PDF documents within your applications. To explore more capabilities, try out all the pdfRest API Tools in the API Lab. For detailed information on each endpoint, visit the API Reference Guide.

Note: This is an example of a multipart API call. Code samples using JSON payloads can be found at GitHub.
