Developer Tool

Convert PDF to JSON
Structured Data

Extract text, positions, fonts, and metadata from any PDF with a text layer as clean, structured JSON. Built for developers and data pipelines.

Why Use Our PDF to JSON Tool?

Structured data extraction for developers, analysts, and automation workflows.

100% Private

All extraction happens in your browser using PDF.js. Your documents never reach any server.

Rich Data

Extract not just text but x/y positions, font names, font sizes, and bounding boxes for each text item.

Full Metadata

Capture title, author, creation date, PDF version, and page dimensions alongside the text content.

Dev-Ready Output

Clean, structured JSON, pretty-printed or minified. Ready to feed into any API, database, or script.

How It Works

Three Simple Steps

1

Upload PDF

Drag & drop or browse to select any PDF from your device. No size limits enforced.

2

Configure Output

Choose which data fields to include: metadata, positions, fonts, word counts, and more.

3

Download or Copy JSON

Get your structured JSON file instantly, or copy it straight to the clipboard.

Frequently Asked Questions

What data is extracted into the JSON?
By default, the output includes document metadata (title, author, page count) and, for each page, the page dimensions, the full text string, and an array of individual text items with their x/y coordinates. You can also enable font names and word counts.
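As a rough sketch of how one text item could be derived, assume the input looks like the items PDF.js returns from getTextContent(): a `str` string, a six-number `transform` matrix, `width`, `height`, and `fontName`. The output field names (`text`, `x`, `y`, `fontSize`, `bbox`) and the helpers `toOutputItem` and `countWords` are hypothetical and only illustrate the idea; they are not this tool's confirmed schema.

```typescript
// Minimal shape of a PDF.js-style text item (only the fields used here).
interface RawTextItem {
  str: string;
  transform: number[]; // [a, b, c, d, e, f]; e and f are the x/y translation
  width: number;
  height: number;
  fontName: string;
}

// Hypothetical output item; the field names are illustrative assumptions.
interface OutputItem {
  text: string;
  x: number;
  y: number;
  fontName: string;
  fontSize: number;
  bbox: [number, number, number, number]; // [left, bottom, right, top]
}

function toOutputItem(item: RawTextItem): OutputItem {
  const [, , c, d, e, f] = item.transform;
  // Approximate the font size from the vertical scale of the matrix.
  const fontSize = Math.hypot(c, d);
  return {
    text: item.str,
    x: e,
    y: f,
    fontName: item.fontName,
    fontSize,
    // Assumes unrotated, left-to-right text; rotated text needs the full matrix.
    bbox: [e, f, e + item.width, f + item.height],
  };
}

// Count whitespace-separated words in a page's text.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}
```

The bounding box here is a simplification for horizontal text. Rotated or skewed text would need the full transform applied to each corner.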
What do the x/y positions represent?
Positions are in PDF user units (points). The origin (0,0) is the bottom-left of the page, so y increases upward. 1 point = 1/72 inch, or about 0.352778 mm. These are the raw positions as stored inside the PDF.
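If you need millimetres or a top-left origin (as most screen and image coordinate systems use), the conversion is simple arithmetic. The sketch below is illustrative; the helper names `pointsToMm` and `flipY` are made up for this example and are not part of the tool.

```typescript
// 1 point = 1/72 inch, and 1 inch = 25.4 mm.
const MM_PER_POINT = 25.4 / 72;

// Convert a length or coordinate from PDF points to millimetres.
function pointsToMm(points: number): number {
  return points * MM_PER_POINT;
}

// Convert a bottom-left-origin y coordinate to a top-left-origin one.
// Pass itemHeight to get the top edge of the item rather than its baseline.
function flipY(y: number, pageHeight: number, itemHeight: number = 0): number {
  return pageHeight - y - itemHeight;
}

// Example: an item at y = 700 pt, 12 pt tall, on a 792 pt high page.
const topEdge = flipY(700, 792, 12); // 80 pt from the top of the page
const topEdgeMm = pointsToMm(topEdge);
```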
Can I extract data from scanned PDFs?
Scanned PDFs are essentially images with no embedded text layer. This tool can only extract machine-readable text. For scanned documents, you would need an OCR tool first.
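One way to spot a likely scanned file in your own pipeline is to check whether every page came back with no text items. This is only a heuristic sketch: a PDF made purely of vector drawings would also match. The function name `looksScanned` is hypothetical, and the input is assumed to follow the "pages" / "items" / "text" layout described in this FAQ.

```typescript
// Only the fields this check needs from each page of the JSON output.
interface PageLike {
  text: string;
  items: unknown[];
}

interface DocLike {
  pages: PageLike[];
}

// Returns true when the document has pages but none of them contain text.
function looksScanned(doc: DocLike): boolean {
  return (
    doc.pages.length > 0 &&
    doc.pages.every((p) => p.items.length === 0 && p.text.trim() === "")
  );
}
```

If the check returns true, running OCR before extraction is probably the next step.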
Is my PDF uploaded to your servers?
No. All processing is done entirely in your browser using the open-source PDF.js library. Your file never leaves your device.
What is the JSON output structure?
The root object contains a "metadata" key and a "pages" array. Each page object has pageNumber, width, height, text (full string), and an "items" array of individual text segments with their properties.
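The structure described above could be typed roughly as follows. The top-level keys and the page fields (`metadata`, `pages`, `pageNumber`, `width`, `height`, `text`, `items`) come from the description; the names inside `metadata` and inside each item (`title`, `author`, `pageCount`, `text`, `x`, `y`) are assumptions for illustration, and the sample values are made up.

```typescript
// Assumed item fields; the real output may name these differently.
interface TextSegment {
  text: string;
  x: number;
  y: number;
}

interface PageEntry {
  pageNumber: number;
  width: number; // in PDF points
  height: number; // in PDF points
  text: string; // full text of the page
  items: TextSegment[];
}

interface PdfToJsonOutput {
  // Assumed metadata field names.
  metadata: { title?: string; author?: string; pageCount: number };
  pages: PageEntry[];
}

// A made-up single-page document, for illustration only.
const sample: PdfToJsonOutput = {
  metadata: { title: "Example", author: "Jane Doe", pageCount: 1 },
  pages: [
    {
      pageNumber: 1,
      width: 612,
      height: 792,
      text: "Hello world",
      items: [{ text: "Hello world", x: 72, y: 700 }],
    },
  ],
};

// Pretty-printed (2-space indent) versus minified serialization.
const pretty = JSON.stringify(sample, null, 2);
const minified = JSON.stringify(sample);
```

Both strings parse back to the same object; the minified form just omits the whitespace.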