How to Convert PDF to Markdown in .NET with C#
Why Convert PDF to Markdown with C#?
The pdfRest PDF to Markdown API Tool is an efficient solution for developers looking to convert PDF documents into Markdown format using C#. This tutorial will guide you through the process of sending an API call to the PDF to Markdown endpoint using C#. By following this guide, you will learn how to integrate this functionality into your applications, making it easier to manipulate and present PDF content in a more accessible and editable Markdown format.
Converting PDFs to Markdown can be incredibly useful for content creators and developers who need to repurpose or edit documents. For instance, a technical writer might receive a PDF document containing a user manual that needs to be updated. By converting the PDF to Markdown, the writer can easily edit the content in a text editor, apply version control, and collaborate with others, streamlining the documentation process.
PDF to Markdown with C# Code Example
```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;

// Top-level program (C# 9+); the HttpClient is disposed when the block ends.
using (var httpClient = new HttpClient { BaseAddress = new Uri("https://api.pdfrest.com") })
{
    // POST to the "markdown" endpoint, relative to the base address.
    using (var request = new HttpRequestMessage(HttpMethod.Post, "markdown"))
    {
        // Authenticate with your pdfRest API key and ask for a JSON response.
        request.Headers.TryAddWithoutValidation("Api-Key", "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx");
        request.Headers.Accept.Add(new("application/json"));

        // Build the multipart form body containing the PDF file.
        var multipartContent = new MultipartFormDataContent();
        var byteArray = File.ReadAllBytes("/path/to/file");
        var byteAryContent = new ByteArrayContent(byteArray);
        multipartContent.Add(byteAryContent, "file", "file_name");
        byteAryContent.Headers.TryAddWithoutValidation("Content-Type", "application/pdf");

        // Optional: request page break comments in the Markdown output.
        var byteArrayOption = new ByteArrayContent(Encoding.UTF8.GetBytes("on"));
        multipartContent.Add(byteArrayOption, "page_break_comments");

        request.Content = multipartContent;

        // Send the request and print the raw response body.
        var response = await httpClient.SendAsync(request);
        var apiResult = await response.Content.ReadAsStringAsync();

        Console.WriteLine("Markdown API response received.");
        Console.WriteLine(apiResult);
    }
}
```
Source: GitHub Repository
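If you convert more than one file, you may want to move request construction into a function. The sketch below is a hypothetical refactoring, not code from the pdfRest repository. BuildMarkdownRequest and its parameters are names introduced here for illustration. The endpoint, Api-Key header, and form fields are the ones used in the example above. The function builds the HttpRequestMessage without sending it, so you can inspect it or send it with a shared HttpClient.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical helper: builds (but does not send) a pdfRest "markdown" request.
// The caller owns the returned message and should dispose it after sending.
static HttpRequestMessage BuildMarkdownRequest(string apiKey, byte[] pdfBytes, string fileName, bool pageBreakComments)
{
    var request = new HttpRequestMessage(HttpMethod.Post, "markdown");
    request.Headers.TryAddWithoutValidation("Api-Key", apiKey);
    request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

    var form = new MultipartFormDataContent();
    var filePart = new ByteArrayContent(pdfBytes);
    filePart.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
    form.Add(filePart, "file", fileName);

    // Only add the option field when the caller asks for it.
    if (pageBreakComments)
    {
        form.Add(new ByteArrayContent(Encoding.UTF8.GetBytes("on")), "page_break_comments");
    }

    request.Content = form;
    return request;
}

// Build a request from placeholder bytes and report what it contains (nothing is sent).
using var demo = BuildMarkdownRequest("test-key", Encoding.ASCII.GetBytes("%PDF-1.4"), "file_name", pageBreakComments: true);
Console.WriteLine($"{demo.Method} {demo.RequestUri} with {((MultipartFormDataContent)demo.Content!).Count()} form parts");
```

To send it, pass the returned message to httpClient.SendAsync on a client whose BaseAddress is https://api.pdfrest.com, as in the tutorial.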
Breaking Down the Code
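The code begins by creating an instance of HttpClient whose base address is the pdfRest API host, https://api.pdfrest.com. Relative request paths, such as "markdown" in the next step, are resolved against this address. The using statement disposes the client when the block ends.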
```csharp
using (var httpClient = new HttpClient { BaseAddress = new Uri("https://api.pdfrest.com") })
```
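When BaseAddress is set and a request URI is relative, HttpClient combines the two using standard relative-URI rules. The short sketch below uses only System.Uri to show the resulting address. It also shows why a base address that includes a path needs a trailing slash. The https://example.com addresses are placeholders, not pdfRest URLs.

```csharp
using System;

// Resolve a relative request path against a base address, as HttpClient does
// when BaseAddress is set and the request URI is relative.
static string Resolve(string baseAddress, string relativePath) =>
    new Uri(new Uri(baseAddress), relativePath).ToString();

// Host-only base address: the relative path is appended to the root.
Console.WriteLine(Resolve("https://api.pdfrest.com", "markdown"));

// Placeholder base addresses with a path: without a trailing slash the last
// segment ("v1") is replaced; with one, the relative path is appended.
Console.WriteLine(Resolve("https://example.com/v1", "markdown"));
Console.WriteLine(Resolve("https://example.com/v1/", "markdown"));
```

In a long-running application, Microsoft's guidance is to reuse a single HttpClient instance, or obtain clients from IHttpClientFactory. Creating and disposing a client per request can exhaust available sockets under load.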
A new HttpRequestMessage is created for a POST request to the relative path "markdown", which resolves to https://api.pdfrest.com/markdown. The API key is sent in an Api-Key request header for authentication, and an Accept header asks the API to respond with JSON.
```csharp
using (var request = new HttpRequestMessage(HttpMethod.Post, "markdown"))
{
    request.Headers.TryAddWithoutValidation("Api-Key", "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx");
    request.Headers.Accept.Add(new("application/json"));
```
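Hard-coding the API key is fine for a quick test, but you will usually want to keep it out of source code. The sketch below assumes the key is stored in an environment variable named PDFREST_API_KEY. That name is chosen here for illustration and is not required by pdfRest. CreateAuthorizedRequest is likewise a hypothetical helper that sets the same Api-Key and Accept headers as the tutorial.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: a POST to "markdown" carrying the pdfRest Api-Key header
// and a JSON Accept header, matching the tutorial's request setup.
static HttpRequestMessage CreateAuthorizedRequest(string apiKey)
{
    if (string.IsNullOrWhiteSpace(apiKey))
        throw new ArgumentException("An API key is required.", nameof(apiKey));

    var request = new HttpRequestMessage(HttpMethod.Post, "markdown");
    request.Headers.TryAddWithoutValidation("Api-Key", apiKey);
    request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    return request;
}

// PDFREST_API_KEY is an assumed variable name; use whatever your deployment provides.
var keyFromEnv = Environment.GetEnvironmentVariable("PDFREST_API_KEY");
if (keyFromEnv is null)
{
    Console.WriteLine("PDFREST_API_KEY is not set; no request was created.");
}
else
{
    using var envRequest = CreateAuthorizedRequest(keyFromEnv);
    Console.WriteLine("Request created with an Api-Key header.");
}
```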
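A MultipartFormDataContent object is used to build the request body. The PDF is read into a byte array, wrapped in a ByteArrayContent, and added to the form under the field name "file" with the file name "file_name". The part's Content-Type header is then set to "application/pdf". Replace "/path/to/file" and "file_name" with the path and name of your own PDF.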
```csharp
var multipartContent = new MultipartFormDataContent();
var byteArray = File.ReadAllBytes("/path/to/file");
var byteAryContent = new ByteArrayContent(byteArray);
multipartContent.Add(byteAryContent, "file", "file_name");
byteAryContent.Headers.TryAddWithoutValidation("Content-Type", "application/pdf");
```
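File.ReadAllBytes loads the whole PDF into memory before the upload starts. For larger files, a StreamContent can send the file directly from a stream instead. The sketch below is an alternative to the tutorial's ByteArrayContent, not the repository's code. AddPdfPart is a hypothetical helper name, and it sets the part's Content-Type through the typed ContentType property rather than TryAddWithoutValidation. A MemoryStream stands in for the file, so the example runs without one.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical helper: adds a PDF as the "file" form field from any readable stream.
// The stream must stay open until the request has been sent.
static void AddPdfPart(MultipartFormDataContent form, Stream pdfStream, string fileName)
{
    var filePart = new StreamContent(pdfStream);
    filePart.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
    form.Add(filePart, "file", fileName);
}

// Stand-in for File.OpenRead("/path/to/file"): a few bytes that start like a PDF.
using var pdfStream = new MemoryStream(Encoding.ASCII.GetBytes("%PDF-1.4 placeholder"));
using var form = new MultipartFormDataContent();
AddPdfPart(form, pdfStream, "file_name");

// Serialize the body locally to inspect what would be uploaded.
var body = await form.ReadAsStringAsync();
Console.WriteLine(body);
```

With a real file, pass File.OpenRead(path) as the stream. Disposing the form also disposes its parts and the underlying stream.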
Next, the page_break_comments option is added as a second form field. Its value is the string "on", converted to UTF-8 bytes and wrapped in a ByteArrayContent, which tells the API to include page break comments in the Markdown output.
```csharp
var byteArrayOption = new ByteArrayContent(Encoding.UTF8.GetBytes("on"));
multipartContent.Add(byteArrayOption, "page_break_comments");
```
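If you later send other text options, the same pattern can be wrapped in a small helper. AddTextField below is a hypothetical name. page_break_comments with the value "on" is the only option this tutorial documents, so check the API Reference Guide for any other fields the markdown endpoint accepts. Like the tutorial, the helper uses ByteArrayContent, so the field is sent without a Content-Type header of its own. StringContent, by contrast, would add text/plain; charset=utf-8.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical helper: adds a plain text form field, encoded as UTF-8 bytes,
// the same way the tutorial adds page_break_comments.
static void AddTextField(MultipartFormDataContent form, string name, string value) =>
    form.Add(new ByteArrayContent(Encoding.UTF8.GetBytes(value)), name);

using var form = new MultipartFormDataContent();
AddTextField(form, "page_break_comments", "on");

// Print the serialized part to see the field name and value.
var body = await form.ReadAsStringAsync();
Console.WriteLine(body);
```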
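Finally, the request is sent asynchronously with SendAsync, and the response body is read as a string and printed to the console. The example does not check the HTTP status code, so an error response from the API is printed the same way as a successful one.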
```csharp
var response = await httpClient.SendAsync(request);
var apiResult = await response.Content.ReadAsStringAsync();
Console.WriteLine("Markdown API response received.");
Console.WriteLine(apiResult);
```
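If your application needs to react to failures, one option is to check the status code before using the body and then parse the JSON with System.Text.Json. The sketch below does not assume any particular response fields; see the API Reference Guide for the actual schema. ReadJsonOrThrowAsync is a hypothetical helper. The canned response, with made-up JSON, exists only so the example runs offline in place of httpClient.SendAsync.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper: returns the parsed JSON body on success and throws with
// the status code and body text otherwise. The caller disposes the JsonDocument.
static async Task<JsonDocument> ReadJsonOrThrowAsync(HttpResponseMessage response)
{
    var text = await response.Content.ReadAsStringAsync();
    if (!response.IsSuccessStatusCode)
    {
        throw new HttpRequestException($"Request failed with status {(int)response.StatusCode}: {text}");
    }
    return JsonDocument.Parse(text);
}

// Canned response standing in for httpClient.SendAsync(request); the JSON is made up.
using var sample = new HttpResponseMessage(HttpStatusCode.OK)
{
    Content = new StringContent("{\"example\":\"value\"}")
};

// List whatever top-level properties the response contains.
using var json = await ReadJsonOrThrowAsync(sample);
foreach (var property in json.RootElement.EnumerateObject())
{
    Console.WriteLine($"{property.Name}: {property.Value}");
}
```

Throwing keeps error handling in one place. If failures are routine in your workflow, returning a result object instead may read more naturally.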
Beyond the Tutorial
By following this tutorial, you have learned how to make an API call to convert a PDF document to Markdown using C#. This process can be integrated into your applications to automate document conversion tasks. For further exploration, you can demo all of the pdfRest API Tools in the API Lab and refer to the API Reference Guide for more detailed information.
Note: This is an example of a multipart API call. Code samples that use JSON payloads can be found in the GitHub Repository.