PdfStructuredText

data class PdfStructuredText(val pageWidth: Double, val pageHeight: Double, val blocks: List<PdfTextBlock>)

Structured text, built in stages: pageGlyphs → spans → lines → blocks (ISO 32000-1 §14.8 is the spec basis; MuPDF's fz_stext_page is the architectural reference).

The renderer's text state machine already knows how to turn Tj/TJ operators plus the text-matrix stack into positioned glyph runs. We hijack it: a recording canvas captures every drawText call, and a layout pass then clusters those positioned spans into lines and blocks in reading order.
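The clustering pass described above can be sketched roughly as follows. This is an illustration under stated assumptions, not this library's implementation: RecordedSpan and LayoutLine are hypothetical stand-ins for what the recording canvas captures, y is assumed to grow downward, and the thresholds (half a font size for "same line", 1.5 font sizes for "same block", 0.2 font sizes for an inter-span space) are made-up tuning values.

```kotlin
import kotlin.math.abs

// Hypothetical record of one drawText call: position, advance width, size, text.
// Assumes a top-left origin with y growing downward.
data class RecordedSpan(
    val x: Double,
    val baseline: Double,
    val width: Double,
    val fontSize: Double,
    val text: String,
)

// Hypothetical line: spans sorted left to right.
data class LayoutLine(val spans: List<RecordedSpan>) {
    val baseline: Double get() = spans.first().baseline

    // Inserts a space only where the horizontal gap between spans looks like a word break.
    val text: String
        get() = buildString {
            spans.forEachIndexed { i, s ->
                if (i > 0) {
                    val prev = spans[i - 1]
                    if (s.x - (prev.x + prev.width) > s.fontSize * 0.2) append(' ')
                }
                append(s.text)
            }
        }
}

// Spans whose baselines lie within half a font size of the line's first span share a line.
fun clusterLines(spans: List<RecordedSpan>): List<LayoutLine> {
    val lines = mutableListOf<MutableList<RecordedSpan>>()
    for (span in spans.sortedBy { it.baseline }) {
        val current = lines.lastOrNull()
        if (current != null && abs(current.first().baseline - span.baseline) <= span.fontSize / 2) {
            current.add(span)
        } else {
            lines.add(mutableListOf(span))
        }
    }
    return lines.map { LayoutLine(it.sortedBy { s -> s.x }) }
}

// A new block starts when the baseline gap exceeds 1.5x the font size.
fun clusterBlocks(lines: List<LayoutLine>): List<List<LayoutLine>> {
    val blocks = mutableListOf<MutableList<LayoutLine>>()
    for (line in lines) {
        val prev = blocks.lastOrNull()?.lastOrNull()
        val fontSize = line.spans.first().fontSize
        if (prev != null && line.baseline - prev.baseline <= fontSize * 1.5) {
            blocks.last().add(line)
        } else {
            blocks.add(mutableListOf(line))
        }
    }
    return blocks
}
```

Sorting by baseline and then by x within a line yields a simple top-to-bottom, left-to-right order; multi-column pages would need an extra column-detection step that this sketch omits.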

Use this when:

  • you want copy/paste-quality text (spacing and line breaks preserved)

  • you need to highlight/search a region of the page

  • you need geometric data alongside the text (positions, fonts)
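For the highlight/search use cases, a minimal sketch of region selection and match highlighting over positioned runs might look like this. Rect and PositionedRun are hypothetical stand-ins, since the fields of PdfTextBlock are not documented on this page, and a top-left origin is assumed.

```kotlin
// Hypothetical axis-aligned rectangle in page units (top-left origin assumed).
data class Rect(val left: Double, val top: Double, val right: Double, val bottom: Double) {
    // True when the two rectangles overlap with non-zero area.
    fun intersects(other: Rect): Boolean =
        left < other.right && other.left < right && top < other.bottom && other.top < bottom
}

// Hypothetical positioned run: its text plus its bounding box on the page.
data class PositionedRun(val text: String, val bounds: Rect)

// Text of every run overlapping the selection, keeping the input (reading) order.
fun textInRegion(runs: List<PositionedRun>, selection: Rect): String =
    runs.filter { it.bounds.intersects(selection) }.joinToString(" ") { it.text }

// Bounding boxes of runs containing the query, e.g. for drawing search highlights.
fun highlightsFor(runs: List<PositionedRun>, query: String): List<Rect> =
    runs.filter { it.text.contains(query, ignoreCase = true) }.map { it.bounds }
```

This per-run matching misses a query that straddles two runs; handling that would mean searching the flattened text and mapping character offsets back to runs.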

For a raw concatenated string, PdfPage.extractText is still simpler and cheaper.

Constructors

constructor(pageWidth: Double, pageHeight: Double, blocks: List<PdfTextBlock>)

Properties


Flattened plain text: paragraph breaks become \n\n and line breaks become \n.
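The flattening rule can be illustrated with a short sketch. SketchBlock and SketchLine are hypothetical mirror types (the real PdfTextBlock fields are not listed here), and runs are assumed to carry their own inter-word spacing, so they are concatenated without a separator.

```kotlin
// Hypothetical mirror of the block hierarchy; field names are assumptions.
data class SketchLine(val runs: List<String>)
data class SketchBlock(val lines: List<SketchLine>)

// Runs are concatenated, lines joined with "\n", blocks (paragraphs) with "\n\n".
fun flatten(blocks: List<SketchBlock>): String =
    blocks.joinToString("\n\n") { block ->
        block.lines.joinToString("\n") { line -> line.runs.joinToString("") }
    }
```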


Every text run in reading order.
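If blocks, lines, and runs are each already stored in reading order, a depth-first flatten preserves that order. The sketch below uses hypothetical OrderedBlock, OrderedLine, and OrderedRun types; it only shows the traversal, not this class's actual fields.

```kotlin
// Hypothetical mirror types; the real run type likely carries position and font data too.
data class OrderedRun(val text: String, val fontName: String)
data class OrderedLine(val runs: List<OrderedRun>)
data class OrderedBlock(val lines: List<OrderedLine>)

// Block order, then line order, then run order: a depth-first walk of the hierarchy.
fun runsInReadingOrder(blocks: List<OrderedBlock>): List<OrderedRun> =
    blocks.flatMap { it.lines }.flatMap { it.runs }
```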