Open-source, auditable LLM extraction for Java enterprise systems. Every extracted field carries its source page and line, a confidence score, and bi-temporal provenance.
When a Java backend extracts a value from a document, DocTruth keeps the source location, match quality, confidence, and provenance attached to that value.
Prompt for page numbers: LLMs hallucinate citations when the citation contract is not enforced.
Brittle on tables, columns, scans, and formatting drift.
The app inherits framework coupling without evidence guarantees.
Java enterprise teams reject the extra runtime topology.
DocTruth is a focused evidence layer that drops into Java backends already calling OpenAI-compatible endpoints, Anthropic, or Gemini.
Parsing turns PDF, DOCX, XLSX, and CSV files into structured sections while preserving page, line, and offset.
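Keeping the offset means a character position found later (for example, where a quote was matched) can be turned back into a line number. The following is a minimal sketch. It assumes a hypothetical Section record, which is not DocTruth's published API, and it counts lines by the newline characters inside the section text:

```java
/** Hypothetical helper, not DocTruth's API: maps offsets in a parsed section back to lines. */
final class SourceLocations {

    /** A parsed section: the page it came from, the document line it starts on, and its text. */
    record Section(int page, int firstLine, String text) {

        /** Returns the document line number for a character offset within this section. */
        int lineAt(int offset) {
            if (offset < 0 || offset > text.length()) {
                throw new IndexOutOfBoundsException(offset);
            }
            int line = firstLine;
            for (int i = 0; i < offset; i++) {
                if (text.charAt(i) == '\n') {
                    line++; // every newline before the offset moves down one line
                }
            }
            return line;
        }
    }
}
```

For example, if a section starts on line 10 of page 3, a quote found after its second newline resolves to line 12.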
Every LLM call is wrapped in a citation contract. When evidence is missing, the response fails validation and the call is retried.
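The citation contract can be pictured as a validate-and-retry loop. The sketch below is an illustration, not DocTruth's implementation. The model is stubbed as a plain Function<String, Answer>, and a citation counts as valid only if its quote appears verbatim in the source. CitationContract, Answer, Outcome, and the feedback wording are all hypothetical.

```java
import java.util.function.Function;

/** Hypothetical sketch of a citation contract; not DocTruth's API. */
final class CitationContract {

    /** What the stubbed model returns: a value plus the quote that supports it. */
    record Answer(String value, String quote) {}

    /** Result of the loop: the last answer, how many calls were made, and whether it is cited. */
    record Outcome(Answer answer, int attempts, boolean cited) {}

    /** The contract holds when the quote is non-blank and occurs verbatim in the source. */
    static boolean isCited(Answer a, String source) {
        return a.quote() != null && !a.quote().isBlank() && source.contains(a.quote());
    }

    /** Calls the model, validates the citation, and retries with feedback up to maxAttempts times. */
    static Outcome extract(Function<String, Answer> model, String prompt, String source, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String currentPrompt = prompt;
        Answer last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            last = model.apply(currentPrompt);
            if (isCited(last, source)) {
                return new Outcome(last, attempt, true);
            }
            // Explain the rejection before the next attempt.
            currentPrompt = prompt + "\nYour previous quote was not found in the document. "
                    + "Quote the supporting text verbatim.";
        }
        // Never satisfied: return the last answer flagged as uncited instead of dropping it.
        return new Outcome(last, maxAttempts, false);
    }
}
```

The loop returns the last answer with cited = false rather than throwing. This mirrors the document's rule that a failed citation should surface to the caller instead of disappearing.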
Documents longer than the model's context window are handled with priority truncation, sliding windows, and hierarchical context.
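Of the three strategies, sliding windows are the simplest to show. This sketch measures windows in lines rather than tokens, which is an assumption. The SlidingWindows and Window names are hypothetical, and priority truncation and hierarchical context are not covered. Each window records its first line so that citations found inside it can still be mapped to document lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: overlapping line windows that remember where they start. */
final class SlidingWindows {

    /** A chunk of the document plus the 1-based line number of its first line. */
    record Window(int firstLine, List<String> lines) {}

    /**
     * Splits lines into windows of at most windowSize lines. Consecutive windows share
     * `overlap` lines, so text spanning up to overlap + 1 lines is never split in
     * every window that contains part of it.
     */
    static List<Window> split(List<String> lines, int windowSize, int overlap) {
        if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
            throw new IllegalArgumentException("need windowSize > overlap >= 0");
        }
        List<Window> windows = new ArrayList<>();
        int step = windowSize - overlap;
        for (int start = 0; start < lines.size(); start += step) {
            int end = Math.min(start + windowSize, lines.size());
            windows.add(new Window(start + 1, List.copyOf(lines.subList(start, end))));
            if (end == lines.size()) {
                break; // this window already reaches the end of the document
            }
        }
        return windows;
    }
}
```

With windowSize = 3 and overlap = 1, a seven-line document yields windows starting at lines 1, 3, and 5.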
The whole pipeline is one fluent call chain: DocTruth.from(provider).extract(...).withProvenance().run(doc).
The audit trail is written as PROV-O JSON-LD and records confidence, retry count, model version, extracted_at, and source_published_at.
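One plausible reading of the two timestamps is a bi-temporal model. Under this reading, source_published_at is when the source document was published, and extracted_at is when the extraction was recorded. The sketch below assumes that reading. The BitemporalProvenance and Extraction names are hypothetical, and the sketch does not produce PROV-O JSON-LD. It answers as-of questions such as "what had we recorded by February about sources published this year?"

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of bi-temporal provenance; not DocTruth's API. */
final class BitemporalProvenance {

    /**
     * One extracted value with two time axes: sourcePublishedAt is when the source was
     * published, and extractedAt is when the system recorded the extraction.
     */
    record Extraction(String field, String value, String modelVersion, int retryCount,
                      double confidence, Instant extractedAt, Instant sourcePublishedAt) {}

    /**
     * Keeps records extracted no later than knownAt from sources published no later than
     * validAt, then prefers the newest source and, within it, the newest extraction.
     */
    static Optional<Extraction> asOf(List<Extraction> history, String field,
                                     Instant validAt, Instant knownAt) {
        return history.stream()
                .filter(e -> e.field().equals(field))
                .filter(e -> !e.extractedAt().isAfter(knownAt))
                .filter(e -> !e.sourcePublishedAt().isAfter(validAt))
                .max(Comparator.comparing(Extraction::sourcePublishedAt)
                        .thenComparing(Extraction::extractedAt));
    }
}
```

Old records are kept rather than overwritten, so an auditor can reproduce what this sketch would have reported at any earlier point in time.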
Published on Maven Central. No framework runtime, no Python service, no extra deployment topology.
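Maven:

```xml
<dependency>
  <groupId>ai.doctruth</groupId>
  <artifactId>doctruth-java</artifactId>
  <version>0.2.0-alpha</version>
</dependency>
```

Gradle:

```groovy
implementation "ai.doctruth:doctruth-java:0.2.0-alpha"
```

```java
import java.math.BigDecimal;
import java.nio.file.Path;
import java.time.Instant;
// DocTruth imports (DocTruth, PdfDocumentParser, OpenAiProvider, Citation, Confidence) omitted: package names not shown.

public class ExtractContract {
    // Target type: one record component per value to extract.
    record Contract(String partyA, String partyB, BigDecimal totalValue) {}

    public static void main(String[] args) throws Exception {
        var doc = PdfDocumentParser.parse(Path.of("contract.pdf"));
        var result = DocTruth.from(new OpenAiProvider(System.getenv("OPENAI_API_KEY")))
                .extract("Extract the contract terms", Contract.class)
                .withProvenance()                                              // attach provenance
                .withSourcePublishedAt(Instant.parse("2026-01-01T00:00:00Z")) // when the source was published
                .withBitemporal()                                              // bi-temporal provenance
                .withConfidence()                                              // per-field confidence
                .run(doc);
        // Per-field evidence, looked up by field name.
        Citation cite = result.citations().get("totalValue");
        Confidence conf = result.confidence().get("totalValue");
        result.toAuditJson(Path.of("audit/contract.jsonld"));
    }
}
```

The value alone is not enough. The result also carries the exact quote, source location, match score, confidence rationale, model, version, and timestamps.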
A failed citation match is never allowed to disappear silently. It emits a warning and surfaces a low match score, so the caller can decide how to handle the risk.
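A match score can be computed in many ways, and the document does not describe DocTruth's exact method. The sketch below uses one assumed approach. It normalizes whitespace and case, treats an exact hit as 1.0, and otherwise scores every quote-length window of the source by normalized Levenshtein distance. QuoteMatcher, Match, and the threshold parameter are hypothetical, and the warning is sent through java.util.logging.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of scoring a model's quote against the source text. */
final class QuoteMatcher {

    private static final Logger LOG = Logger.getLogger(QuoteMatcher.class.getName());

    /** Best match: a score in [0, 1] and its start offset in the normalized source (-1 if none). */
    record Match(double score, int offset) {}

    /** Collapses whitespace and lowercases, so line breaks and casing are not mismatches. */
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim().toLowerCase();
    }

    /** Levenshtein edit distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Scores the quote; a score below the threshold is logged as a warning but still returned. */
    static Match match(String quote, String source, double threshold) {
        String q = normalize(quote);
        String s = normalize(source);
        if (q.isEmpty()) {
            LOG.warning("Empty quote: citation cannot be matched");
            return new Match(0.0, -1);
        }
        int exact = s.indexOf(q);
        if (exact >= 0) {
            return new Match(1.0, exact);
        }
        Match best = new Match(0.0, -1);
        for (int start = 0; start + q.length() <= s.length(); start++) {
            String window = s.substring(start, start + q.length());
            double score = 1.0 - (double) editDistance(q, window) / q.length();
            if (score > best.score()) {
                best = new Match(score, start);
            }
        }
        if (best.score() < threshold) {
            LOG.warning(String.format("Low citation match score %.2f for quote \"%s\"", best.score(), quote));
        }
        return best;
    }
}
```

A quote that differs from the source by one character out of eleven scores about 0.91, so it passes a 0.8 threshold. An unrelated quote scores low and is logged, but it is still returned to the caller.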