
Query PDF
Query PDF is a REST API tool that retrieves a wide range of information about a PDF document programmatically. Developers can check document properties, metadata, and content details such as forms, fonts, security settings, and digital signatures. These results support conditional processing: you can automate workflows and trigger subsequent actions based on a file’s characteristics.
Key Benefits of Query PDF API
- Perform over 25 different queries in a single API call, including checks for document metadata, embedded fonts, and JavaScript, for a comprehensive overview of a PDF's properties.
- Validate PDF/A conformance with the industry-standard veraPDF validation engine, returning a simple true or false value for easy programmatic checks without complex reporting.
- Automate workflows with conditional processing, saving time and resources by using file properties to determine if you need to apply OCR, convert to PDF/A, or perform other operations.
- Seamlessly audit page boundaries (MediaBox, CropBox, BleedBox, TrimBox, ArtBox) before making precise document dimension adjustments with the Set Page Boxes API.
- Identify accessibility features by checking for the presence of structure tags, ensuring your documents meet compliance standards.
- Retrieve and leverage custom metadata, returned as a JSON list of key:value pairs, enabling you to extract unique data properties added by other applications.
- Extract key document information, including whether a file contains signatures, passwords, or forms (Acroforms or XFA), to drive secure and specialized workflows.
Extract PDF Metadata to Automate Document Workflows
Our API delivers precise document intelligence that allows you to programmatically assess files and determine the next steps for each document. By extracting valuable metadata and file properties, you can build smart, conditional logic into your applications to solve real-world processing challenges. Common workflow automations include:
- Preparing files for prepress: Seamlessly audit page boundaries (MediaBox, CropBox, BleedBox, TrimBox, and ArtBox) so you know exactly how to adjust margins and boundaries using our Set Page Boxes API.
- Optimizing document storage: Conditionally split, compress, or route PDFs based on their exact page count or total file size.
- Securing sensitive data: Automatically detect and encrypt files that do not already have the necessary security permissions applied.
- Quality assurance routing: Confirm that inbound PDFs contain expected elements, such as accessibility tags, digital signatures, or forms, before sending them to downstream systems or intended audiences.
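The routing ideas above can be sketched as a small decision function. This is a minimal illustration, not pdfRest's own logic: it assumes the JSON response has already been parsed into a Python dict keyed by query name, that boolean answers arrive as JSON true/false, and that the thresholds and action names (all hypothetical) are replaced with your own.

```python
from typing import Any

# Hypothetical thresholds; tune these for your own storage policy.
MAX_PAGES = 200
MAX_BYTES = 10 * 1024 * 1024  # 10 MiB


def plan_actions(info: dict[str, Any]) -> list[str]:
    """Map Query PDF results to follow-up steps (action names are illustrative)."""
    actions: list[str] = []
    if info.get("page_count", 0) > MAX_PAGES:
        actions.append("split")
    if info.get("file_size", 0) > MAX_BYTES:
        actions.append("compress")
    # A file with no restrictions applied is a candidate for encryption.
    if info.get("restrict_permissions_set") is False:
        actions.append("encrypt")
    # Hold back files that lack accessibility tags for QA review.
    if info.get("tagged") is False:
        actions.append("review_accessibility")
    return actions


sample = {
    "page_count": 350,
    "file_size": 2_000_000,
    "restrict_permissions_set": False,
    "tagged": True,
}
print(plan_actions(sample))  # ['split', 'encrypt']
```

Checking `is False` rather than falsiness means a property that was never requested does not trigger an action by accident.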
Verify PDF/A Compliance and Document Archiving Standards
When preparing documents for long-term storage or legal compliance, knowing whether a file meets strict archiving standards is critical. While competitors often generate convoluted validation reports that require custom code to decipher, pdfRest produces straightforward, actionable results you can depend on. Powered by veraPDF, the industry-recognized standard for PDF/A validation, our API seamlessly checks conformance levels.
- Instant verification: Receive a simple true or false boolean value in your JSON response to instantly confirm a document's compliance status.
- Eliminate developer overhead: Avoid wasting valuable engineering hours trying to parse through complex XML validation logs or superfluous reporting data.
- Smart conversion routing: Automatically trigger a conversion to PDF/A only when a document is flagged as non-conformant, saving server processing time and reducing API costs.
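As a rough sketch of the conversion routing described above, the check below converts only when the pdfa query explicitly reported false. It assumes the response has been parsed into a dict and that the result is stored under a `pdfa` key as a JSON boolean; the function name is hypothetical.

```python
from typing import Any


def needs_pdfa_conversion(info: dict[str, Any]) -> bool:
    """Return True only when the pdfa query explicitly reported False."""
    if "pdfa" not in info:
        # Treat a missing answer as an error rather than guessing.
        raise KeyError("response has no 'pdfa' key; was the pdfa query requested?")
    return info["pdfa"] is False


print(needs_pdfa_conversion({"pdfa": False}))  # True
print(needs_pdfa_conversion({"pdfa": True}))   # False
```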
Get PDF Properties and Document Information in a Single API Call
Many PDF processing libraries require developers to make separate, costly API requests just to check different document attributes. pdfRest eliminates this bottleneck by letting you retrieve all the information you need about a PDF and its contents simultaneously. Simply send one API request with your PDF file and a comma-separated list of your required checks.
- Unmatched flexibility: Pick and choose exactly what matters from a list of more than 25 query options, or simply include the 'all' query to return everything at once.
- Clean, structured data: Get a rapid response containing all requested information as easy-to-parse key:value pairs in standard JSON format.
- Optimized performance: Reduce network latency and computational overhead by getting all the answers you need without heavy reports to parse or superfluous data to sift out.
See Customize Your Solution below for more details about all of the supported queries.
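One way to assemble the comma-separated list of checks is sketched below. The query names are the ones documented on this page; the helper name is hypothetical, and sending the request is left out because the endpoint and authentication details are not covered here.

```python
from typing import Iterable

# Query names as listed in the parameter reference on this page.
SUPPORTED_QUERIES = frozenset({
    "all", "tagged", "image_only", "title", "subject", "author", "producer",
    "creator", "creation_date", "modified_date", "keywords", "custom_metadata",
    "doc_language", "page_count", "page_boxes", "contains_annotations",
    "contains_signature", "pdf_version", "file_size", "filename",
    "restrict_permissions_set", "contains_xfa", "contains_acroforms",
    "contains_javascript", "contains_transparency", "contains_embedded_file",
    "uses_embedded_fonts", "uses_nonembedded_fonts", "pdfa",
    "requires_password_to_open",
})


def build_queries(names: Iterable[str]) -> str:
    """Validate query names and join them into one comma-separated string."""
    cleaned = [n.strip().lower() for n in names if n.strip()]
    if not cleaned:
        raise ValueError("at least one query is required")
    unknown = sorted(set(cleaned) - SUPPORTED_QUERIES)
    if unknown:
        raise ValueError(f"unsupported queries: {', '.join(unknown)}")
    # 'all' is meant to be used on its own, so it replaces any other names.
    if "all" in cleaned:
        return "all"
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return ",".join(dict.fromkeys(cleaned))


print(build_queries(["page_count", "pdfa", "tagged", "pdfa"]))  # page_count,pdfa,tagged
```

Validating names locally catches typos before a request is spent on them.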
Customize Your Solution
Learn about the parameters for this tool to create your custom solution.
all - A comprehensive query that returns the document's full profile at once. Use this alone to retrieve every supported property without having to list individual options.
tagged - Checks for the presence of structure tags in the input document.
- Returns true or false
image_only - Checks whether the document is "image only": a series of embedded images, one per page, with no text or other features common to PDF documents apart from some metadata.
- Returns true or false
title - The title of the PDF as listed in the metadata.
- Returns a string, which may be empty if the document does not have a title
subject - The subject of the PDF as listed in the metadata.
- Returns a string, which may be empty if the document does not have a subject
author - The author of the PDF as listed in the metadata.
- Returns a string, which may be empty if the document does not have an author
producer - The producer of the PDF as listed in the metadata.
- Returns a string, which may be empty if the document does not have a producer
creator - The creator of the PDF as listed in the metadata.
- Returns a string, which may be empty if the document does not have a creator
creation_date - The creation date of the PDF as listed in the metadata.
- Returns a string, which may be empty if the document does not have a creation date
modified_date - The most recent modification date of the PDF as listed in the metadata.
- Returns a string, which may be empty if the document does not have a modification date
keywords - The keywords of the PDF as listed in the metadata.
- Returns a string, which may be empty if the document does not have keywords
custom_metadata - Retrieves custom metadata from the PDF.
- Returns a JSON list of key:value pairs, where each pair represents a custom property and its value.
doc_language - The language that the file claims to be written in.
- Returns a string
page_count - The number of pages in the PDF document.
- Returns an integer
page_boxes - Retrieves the dimensions of all page boundaries (MediaBox, CropBox, BleedBox, TrimBox, and ArtBox) for each page in the document.
- Returns a JSON object mapping each page number to its respective box coordinates, along with a boolean indicating whether each box matches the media box for that page.
contains_annotations - Checks whether the document contains annotations, such as notes, highlighted text, file attachments, crossed-out text, and text callout boxes.
- Returns true or false
contains_signature - Checks whether the document contains any digital signatures.
- Returns true or false
pdf_version - Retrieves the version of the PDF standard that the document was created with.
- Returns a string of the form X.Y.Z, where X, Y, and Z are the major, minor, and extension versions respectively
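Because pdf_version comes back as an X.Y.Z string, comparing versions is easier after converting it to a tuple of integers. The helper below is a small sketch with a hypothetical name; it assumes all three components are plain decimal numbers.

```python
def parse_pdf_version(value: str) -> tuple[int, int, int]:
    """Split an 'X.Y.Z' string into (major, minor, extension) integers."""
    parts = value.strip().split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected X.Y.Z, got {value!r}")
    major, minor, extension = (int(p) for p in parts)
    return major, minor, extension


# Tuples compare element by element, so version checks read naturally.
print(parse_pdf_version("1.7.0") >= (1, 5, 0))  # True
```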
file_size - Retrieves the size of the input file in bytes.
- Returns an integer
filename - The name of the input file.
- Returns a string
restrict_permissions_set - Checks whether the document has restrictive permissions set that prevent actions such as printing, copying, or signing.
- Returns true or false
contains_xfa - Checks whether the document contains XFA forms.
- Returns true or false
contains_acroforms - Checks whether the document contains Acroforms.
- Returns true or false
contains_javascript - Checks whether the document contains JavaScript.
- Returns true or false
contains_transparency - Checks whether the document contains transparent objects.
- Returns true or false
contains_embedded_file - Checks whether the document contains one or more embedded files.
- Returns true or false
uses_embedded_fonts - Checks whether the document contains fully embedded fonts.
- Returns true or false
uses_nonembedded_fonts - Checks whether the document contains non-embedded fonts.
- Returns true or false
pdfa - Checks whether the document claims and conforms to a PDF/A standard.
- Returns true or false
requires_password_to_open - Checks whether the document requires a password to open.
- Returns true or false
- Note: A document that requires a password cannot be opened by this route, so most other queries will not be able to return information for it.
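Given the note above, it can help to check requires_password_to_open before trusting any other property. The guard below is a minimal sketch; it assumes the response is a parsed dict with a JSON boolean under that key, and the exception and function names are hypothetical.

```python
from typing import Any


class PasswordProtectedError(Exception):
    """Raised when a document cannot be inspected without its password."""


def require_openable(info: dict[str, Any]) -> dict[str, Any]:
    """Return the results unchanged, or raise if the file is locked."""
    if info.get("requires_password_to_open") is True:
        raise PasswordProtectedError(
            "document requires a password; other properties are unreliable"
        )
    return info


results = require_openable({"requires_password_to_open": False, "page_count": 3})
print(results["page_count"])  # 3
```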
Safe & Secure
Confidently process your sensitive data with pdfRest. Our platform is built for robust, enterprise-grade security and compliance. We meet rigorous standards for GDPR and HIPAA, and our controls are independently audited to ensure strict SOC 2 Type 2 compliance. Your data's protection is our commitment.




