Package-level declarations

Types

data class PdfStructuredText(val pageWidth: Double, val pageHeight: Double, val blocks: List<PdfTextBlock>)

Structured text for one page, built up as pageGlyphs → spans → lines → blocks. ISO 32000-1 §14.8 is the spec basis; MuPDF's fz_stext_page is the architectural reference.
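The nesting can be walked top-down to recover plain text. The sketch below uses simplified stand-in types (Span, Line, Block, StructuredText are hypothetical names that mirror only the `text`, `spans`, `lines`, and `blocks` properties shown on this page; bounds, font, and origin are left out), and the newline conventions are an assumption, not something the library is documented to do.

```kotlin
// Simplified stand-ins mirroring the documented shape; bounds, font, and origin are omitted.
data class Span(val text: String)
data class Line(val spans: List<Span>)
data class Block(val lines: List<Line>)
data class StructuredText(val blocks: List<Block>)

// Flattens blocks → lines → spans into plain text (assumed conventions):
// spans are concatenated within a line, lines are separated by a newline,
// and blocks are separated by a blank line.
fun StructuredText.toPlainText(): String =
    blocks.joinToString("\n\n") { block ->
        block.lines.joinToString("\n") { line ->
            line.spans.joinToString("") { it.text }
        }
    }
```

With the real types the same three nested loops would apply to `PdfStructuredText.blocks`, `PdfTextBlock.lines`, and `PdfTextLine.spans`.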

data class PdfTextBlock(val bounds: Rectangle, val lines: List<PdfTextLine>)

One paragraph-like chunk: a vertical run of lines with no large gap between them. A block boundary falls wherever the vertical spacing between consecutive lines exceeds GAP_TO_NEW_BLOCK × the median line height. This is a heuristic, not authoritative, but it matches what readers consider "paragraph breaks" in the absence of structure tagging.
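A minimal sketch of that gap rule, under stated assumptions: `LineBox` is a hypothetical stand-in for a line's vertical extent, the y axis is assumed to grow downward, lines are assumed to arrive sorted top to bottom, and the factor 1.5 is a placeholder because the actual value of GAP_TO_NEW_BLOCK does not appear on this page.

```kotlin
// Hypothetical stand-in for a line's vertical extent (y assumed to grow downward).
data class LineBox(val top: Double, val bottom: Double) {
    val height: Double get() = bottom - top
}

// Assumed placeholder; the library's real GAP_TO_NEW_BLOCK value is not shown here.
val ASSUMED_GAP_TO_NEW_BLOCK = 1.5

// Median of a non-empty list (mean of the two middle values for even sizes).
fun median(values: List<Double>): Double {
    require(values.isNotEmpty()) { "median of an empty list is undefined" }
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2.0
}

// Splits lines (sorted top to bottom) into blocks wherever the gap between
// consecutive lines exceeds gapFactor × the median line height.
fun groupIntoBlocks(
    lines: List<LineBox>,
    gapFactor: Double = ASSUMED_GAP_TO_NEW_BLOCK
): List<List<LineBox>> {
    if (lines.isEmpty()) return emptyList()
    val threshold = gapFactor * median(lines.map { it.height })
    val blocks = mutableListOf(mutableListOf(lines.first()))
    for ((prev, next) in lines.zipWithNext()) {
        val gap = next.top - prev.bottom
        if (gap > threshold) blocks.add(mutableListOf(next)) else blocks.last().add(next)
    }
    return blocks
}
```

Using the median rather than the mean keeps one oversized heading line from inflating the threshold for the whole page.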

data class PdfTextLine(val bounds: Rectangle, val spans: List<PdfTextSpan>)

One line of text: spans whose Y origins cluster within Y_CLUSTER_TOL × font size. Spans are stored left-to-right.
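The clustering rule can be sketched as follows. `SpanStub` is a hypothetical stand-in carrying only the fields the rule needs, the tolerance 0.3 is a placeholder because the real Y_CLUSTER_TOL value is not shown here, and comparing each span against the first span of the current line is one possible reading of "cluster", not necessarily the library's.

```kotlin
import kotlin.math.abs

// Hypothetical stand-in for a span: text, origin (x, y), and font size only.
data class SpanStub(val text: String, val x: Double, val y: Double, val fontSize: Double)

// Assumed placeholder; the library's real Y_CLUSTER_TOL value is not shown here.
val ASSUMED_Y_CLUSTER_TOL = 0.3

// Groups spans into lines: a span joins the current line when its Y origin lies
// within tol × its font size of the line's first span; otherwise it starts a new line.
fun clusterIntoLines(
    spans: List<SpanStub>,
    tol: Double = ASSUMED_Y_CLUSTER_TOL
): List<List<SpanStub>> {
    val lines = mutableListOf<MutableList<SpanStub>>()
    for (span in spans.sortedBy { it.y }) {
        val current = lines.lastOrNull()
        // Anchoring on the line's first span keeps small offsets from accumulating.
        if (current != null && abs(span.y - current.first().y) <= tol * span.fontSize) {
            current.add(span)
        } else {
            lines.add(mutableListOf(span))
        }
    }
    // Store each line's spans left-to-right, as the description above states.
    return lines.map { line -> line.sortedBy { it.x } }
}
```

Scaling the tolerance by font size lets slightly offset baselines in large text stay on one line without merging adjacent lines of small text.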

data class PdfTextSpan(val text: String, val font: PdfFont, val fontSize: Double, val origin: <Error class: unknown class><Double, Double>, val bounds: Rectangle)

The glyphs emitted by one renderer drawText call, all sharing the same font, size, and baseline. The origin is the device-space position where the baseline starts.

Naive text extraction (ISO 32000-1 §9.4).