How to Redact PDF Text in .NET with C#, Tutorial

Why Redact PDF Text with C#?

The pdfRest Redact PDF API Tool lets developers programmatically remove sensitive information from PDF documents. This tutorial shows how to send an API call to the Redact PDF endpoint using C#. Integrating this API into a C# application automates text redaction, so confidential information is removed by rule rather than by hand.

A real-world example is a legal firm that needs to share case documents with clients or opposing counsel. These documents often contain sensitive information, such as phone numbers or specific keywords, that must be redacted to protect privacy and comply with legal requirements. With the Redact PDF API, the firm can automate this step so that the same redaction rules are applied to every document before it is shared.

Redact PDF Text with C# Code Example

using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.Text;

// All requests go to the pdfRest API host; the endpoint path below is resolved against it.
using (var httpClient = new HttpClient { BaseAddress = new Uri("https://api.pdfrest.com") })
{
    using (var request = new HttpRequestMessage(HttpMethod.Post, "pdf-with-redacted-text-preview"))
    {
        // Replace the placeholder with your own API key.
        request.Headers.TryAddWithoutValidation("Api-Key", "xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx");
        request.Headers.Accept.Add(new("application/json"));
        var multipartContent = new MultipartFormDataContent();

        // Attach the input PDF as the "file" form field.
        var byteArray = File.ReadAllBytes("/path/to/file");
        var byteAryContent = new ByteArrayContent(byteArray);
        multipartContent.Add(byteAryContent, "file", "file_name.pdf");
        byteAryContent.Headers.TryAddWithoutValidation("Content-Type", "application/pdf");

        // Build the redaction rules: one regex rule (phone-number pattern) and one literal rule.
        var redaction_option_array = new JArray();
        var redaction_option1 = new JObject
        {
            ["type"] = "regex",
            ["value"] = "(?:\\(\\d{3}\\)\\s?|\\d{3}[-.\\s]?)?\\d{3}[-.\\s]?\\d{4}"
        };
        var redaction_option2 = new JObject
        {
            ["type"] = "literal",
            ["value"] = "word"
        };
        redaction_option_array.Add(redaction_option1);
        redaction_option_array.Add(redaction_option2);
        // Serialize the rules to JSON and send them as the "redactions" form field.
        var byteArrayOption = new ByteArrayContent(Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(redaction_option_array)));
        multipartContent.Add(byteArrayOption, "redactions");

        // Send the request and read the response body as a string.
        request.Content = multipartContent;
        var response = await httpClient.SendAsync(request);

        var apiResult = await response.Content.ReadAsStringAsync();

        Console.WriteLine("API response received.");
        Console.WriteLine(apiResult);
    }
}

Source: GitHub

Breaking Down the Code

The provided code begins by setting up an HttpClient with the base address of the pdfRest API:

using (var httpClient = new HttpClient { BaseAddress = new Uri("https://api.pdfrest.com") })

Next, it creates an HttpRequestMessage for a POST request to the "pdf-with-redacted-text-preview" endpoint. Because the path is relative, it is resolved against the client's base address:

using (var request = new HttpRequestMessage(HttpMethod.Post, "pdf-with-redacted-text-preview"))

The API key is added to the request headers to authenticate the request:

request.Headers.TryAddWithoutValidation("Api-Key", "xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx");
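Hard-coding a key in source makes it easy to commit by accident. As a minimal sketch, the key could instead be read from an environment variable at startup. The helper names `GetApiKey` and `CreateRequest` and the variable name `PDFREST_API_KEY` are assumptions made for this illustration, not pdfRest conventions.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: reads the API key from an environment variable.
// The variable name PDFREST_API_KEY is an assumption chosen for this sketch.
static string GetApiKey(string variableName = "PDFREST_API_KEY")
{
    var key = Environment.GetEnvironmentVariable(variableName);
    if (string.IsNullOrWhiteSpace(key))
    {
        throw new InvalidOperationException(
            $"Set the {variableName} environment variable to your pdfRest API key.");
    }
    return key.Trim();
}

// Hypothetical helper: builds the same POST request as the tutorial,
// with the key supplied by the caller instead of a string literal.
static HttpRequestMessage CreateRequest(string endpoint, string apiKey)
{
    var request = new HttpRequestMessage(HttpMethod.Post, endpoint);
    request.Headers.TryAddWithoutValidation("Api-Key", apiKey);
    request.Headers.Accept.Add(new("application/json"));
    return request;
}

// Usage, in place of the hard-coded header in the tutorial code:
// using var request = CreateRequest("pdf-with-redacted-text-preview", GetApiKey());
```

Failing fast when the variable is missing surfaces a configuration problem before any file is uploaded.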

The "Accept" header tells the API that the client expects a JSON response:

request.Headers.Accept.Add(new("application/json"));

A MultipartFormDataContent object is created to hold the file and redaction options:

var multipartContent = new MultipartFormDataContent();

The PDF file is read into a byte array and added to the multipart content as a form field named "file", with its Content-Type set to application/pdf:

var byteArray = File.ReadAllBytes("/path/to/file");
var byteAryContent = new ByteArrayContent(byteArray);
multipartContent.Add(byteAryContent, "file", "file_name.pdf");
byteAryContent.Headers.TryAddWithoutValidation("Content-Type", "application/pdf");
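`File.ReadAllBytes` loads the entire PDF into memory before the request is built. For large files, one alternative is to stream the file from disk with `StreamContent`. The sketch below is a variation on the tutorial's approach, not part of the original sample; the helper name `BuildFileContent` is hypothetical.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: builds multipart content whose "file" part is streamed from disk.
static MultipartFormDataContent BuildFileContent(string path)
{
    if (!File.Exists(path))
    {
        throw new FileNotFoundException("Input PDF not found.", path);
    }

    // StreamContent reads from the file as the request is sent,
    // rather than copying the whole file into a byte array first.
    var fileContent = new StreamContent(File.OpenRead(path));
    fileContent.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");

    var multipart = new MultipartFormDataContent();
    // Same form field name as the tutorial ("file"); the file name comes from the path.
    multipart.Add(fileContent, "file", Path.GetFileName(path));
    return multipart;
}

// Usage, in place of the byte-array lines in the tutorial code:
// var multipartContent = BuildFileContent("/path/to/file");
```

Disposing the returned `MultipartFormDataContent` (or the request that owns it) also disposes the stream part and closes the file.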

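Redaction options are specified as a JSON array of objects, each with a "type" and a "value". Here, a "regex" entry matches phone-number patterns and a "literal" entry matches the exact text "word". The array is serialized and added to the multipart content as the "redactions" field: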

var redaction_option_array = new JArray();
var redaction_option1 = new JObject
{
    ["type"] = "regex",
    ["value"] = "(?:\\(\\d{3}\\)\\s?|\\d{3}[-.\\s]?)?\\d{3}[-.\\s]?\\d{4}"
};
var redaction_option2 = new JObject
{
    ["type"] = "literal",
    ["value"] = "word"
};
redaction_option_array.Add(redaction_option1);
redaction_option_array.Add(redaction_option2);
var byteArrayOption = new ByteArrayContent(Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(redaction_option_array)));
multipartContent.Add(byteArrayOption, "redactions");
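Before sending a pattern to the API, it can help to try it against sample text locally. The sketch below runs the tutorial's phone-number pattern through .NET's `Regex` class. It assumes the API's regex dialect behaves like .NET's for this pattern, which this tutorial does not confirm, so treat it as a quick sanity check only. The helper name `FindMatches` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every substring of the sample text
// that the pattern matches when evaluated by .NET's regex engine.
static List<string> FindMatches(string pattern, string sampleText)
{
    var results = new List<string>();
    foreach (Match match in Regex.Matches(sampleText, pattern))
    {
        results.Add(match.Value);
    }
    return results;
}

// Same pattern as the tutorial, written as a verbatim string
// so the backslashes do not need to be doubled.
const string phonePattern = @"(?:\(\d{3}\)\s?|\d{3}[-.\s]?)?\d{3}[-.\s]?\d{4}";

foreach (var hit in FindMatches(phonePattern, "Call (555) 123-4567 or 555.123.4567 today."))
{
    Console.WriteLine(hit);
}
```

With this sample text, the loop prints `(555) 123-4567` and `555.123.4567`. Because the area-code group is optional, the pattern also matches bare seven-digit numbers, so it is worth testing against text that should not be redacted as well.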

The request is sent asynchronously, and the response body is read as a string and printed to the console:

var response = await httpClient.SendAsync(request);
var apiResult = await response.Content.ReadAsStringAsync();
Console.WriteLine("API response received.");
Console.WriteLine(apiResult);
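The sample prints the response body regardless of whether the call succeeded. A minimal sketch of more defensive handling follows: it checks the HTTP status code and pretty-prints the body if it parses as JSON. It uses the built-in `System.Text.Json` instead of the tutorial's Newtonsoft.Json, makes no assumption about which fields the API returns, and the helper name `DescribeResponse` is hypothetical.

```csharp
using System;
using System.Net;
using System.Text.Json;

// Hypothetical helper: turns a status code and response body into a printable message.
static string DescribeResponse(HttpStatusCode statusCode, string body)
{
    var code = (int)statusCode;
    if (code < 200 || code > 299)
    {
        // Non-2xx status: report the code together with whatever the server returned.
        return $"Request failed with HTTP {code}: {body}";
    }

    try
    {
        // Parse and re-serialize with indentation; no specific field names are assumed.
        using var document = JsonDocument.Parse(body);
        return JsonSerializer.Serialize(
            document.RootElement,
            new JsonSerializerOptions { WriteIndented = true });
    }
    catch (JsonException)
    {
        return $"Response was not valid JSON: {body}";
    }
}

// Usage, in place of the final Console.WriteLine calls in the tutorial code:
// Console.WriteLine(DescribeResponse(response.StatusCode, apiResult));
```

Keeping the logic in a function that takes the status code and body makes it easy to exercise without a network call.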

Beyond the Tutorial

In this tutorial, you learned how to use the pdfRest Redact PDF API Tool with C# to redact text from a PDF document. This example demonstrated setting up a multipart API call, including file and redaction options, and processing the API response. To explore more, you can demo all of the pdfRest API Tools in the API Lab and refer to the API Reference Guide for detailed documentation.

Note: This is an example of a multipart API call. Code samples using JSON payloads can be found at GitHub.