API docs

Evidence-grounded extraction in Java.

Parse a document, extract a typed Java record, and inspect the citation, confidence, and provenance attached to each field.

Install

Use the published alpha from Maven Central.

<dependency>
  <groupId>ai.doctruth</groupId>
  <artifactId>doctruth-java</artifactId>
  <version>0.2.0-alpha</version>
</dependency>

1. Parse a source document

Parsers preserve document identity, metadata, section structure, and source locations so later extraction can cite page and line.

import java.nio.file.Path;
import java.util.List;

ParsedDocument doc = PdfDocumentParser.parse(Path.of("contract.pdf"));

String docId = doc.docId();
List<ParsedSection> sections = doc.sections();
DocumentMetadata metadata = doc.metadata();

2. Extract a typed record

Builder options make citation, confidence, retries, and bitemporal provenance explicit parts of the extraction contract.

import java.math.BigDecimal;
import java.time.Instant;

record Contract(String partyA, String partyB, BigDecimal totalValue) {}

ExtractionResult<Contract> result = DocTruth.from(new OpenAiProvider(System.getenv("OPENAI_API_KEY")))
    .extract("Extract the contract terms", Contract.class)
    .withProvenance()
    .withSourcePublishedAt(Instant.parse("2026-01-01T00:00:00Z"))
    .withBitemporal()
    .withConfidence()
    .withMaxRetries(2)
    .run(doc);
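
As a rough illustration of what "bitemporal" usually means, the sketch below tracks two separate time axes: when a fact held according to the source, and when the system recorded it. The Stamp record, its field names, and knownAsOf are hypothetical stand-ins for illustration. They are not DocTruth types, and the library's actual Provenance model may differ.

```java
import java.time.Instant;

final class BitemporalSketch {
    private BitemporalSketch() {}

    // Hypothetical stand-in, not a DocTruth type.
    // validFrom:  when the fact held according to the source (e.g. its publication time).
    // recordedAt: when the extraction system learned the fact.
    record Stamp(Instant validFrom, Instant recordedAt) {

        // True if the fact was in effect at validAt AND already recorded at systemAt.
        boolean knownAsOf(Instant validAt, Instant systemAt) {
            return !validFrom.isAfter(validAt) && !recordedAt.isAfter(systemAt);
        }
    }

    public static void main(String[] args) {
        Stamp stamp = new Stamp(
                Instant.parse("2026-01-01T00:00:00Z"),   // source published
                Instant.parse("2026-02-15T09:30:00Z"));  // extraction ran

        // The fact was valid on Jan 10, but the system had not recorded it by Feb 1.
        System.out.println(stamp.knownAsOf(
                Instant.parse("2026-01-10T00:00:00Z"),
                Instant.parse("2026-02-01T00:00:00Z"))); // prints "false"
    }
}
```

Keeping the two axes apart lets a caller ask both "what did the source say was true on date X" and "what did we know on date Y" without overwriting history.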

3. Inspect evidence

Values are useful only when callers can reconstruct where they came from. Citations and confidence are keyed by field path; provenance is attached to the result as a whole.

Contract value = result.value();

Citation cite = result.citations().get("totalValue");
SourceLocation loc = cite.location();
double matchScore = cite.matchScore();

Confidence confidence = result.confidence().get("totalValue");
Provenance provenance = result.provenance();
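
One way to use the per-field evidence is to turn it into a short audit trail that a reviewer can check against the source. The sketch below assumes a citation carries a page, a line, and a match score. The Location and Cite records and their accessor names are hypothetical stand-ins, not DocTruth's real Citation and SourceLocation types.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class EvidenceAuditSketch {
    private EvidenceAuditSketch() {}

    // Hypothetical stand-ins for illustration only.
    record Location(int page, int line) {}
    record Cite(Location location, double matchScore) {}

    // Renders one line per field path, sorted by path, showing where the value was found.
    static List<String> auditLines(Map<String, Cite> citations) {
        return new TreeMap<>(citations).entrySet().stream()
                .map(e -> String.format(Locale.ROOT, "%s: page %d, line %d (match %.2f)",
                        e.getKey(),
                        e.getValue().location().page(),
                        e.getValue().location().line(),
                        e.getValue().matchScore()))
                .toList();
    }

    public static void main(String[] args) {
        Map<String, Cite> citations = Map.of(
                "totalValue", new Cite(new Location(3, 12), 0.41),
                "partyA", new Cite(new Location(1, 4), 0.97));
        auditLines(citations).forEach(System.out::println);
    }
}
```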

4. Use a context strategy

Large documents should not be blindly truncated. Prioritize the sections that matter for the extraction task.

ContextStrategy strategy = new PriorityTruncate(
    List.of("Qualifications", "Scoring Criteria", "Contract Terms"),
    25_000,
    OverBudgetPolicy.STRICT
);

// provider is any configured LlmProvider, for example the OpenAiProvider from step 2.
ExtractionResult<Contract> result = DocTruth.from(provider)
    .extract("Extract contract terms", Contract.class)
    .withContextStrategy(strategy)
    .withProvenance()
    .run(doc);
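
To make the idea of priority-based selection concrete, here is a minimal sketch of one way such a strategy could work. It is not DocTruth's implementation. The Section record, the token counts, and the boolean strict flag are all assumptions made for illustration, and the real semantics of OverBudgetPolicy.STRICT may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class PriorityBudgetSketch {
    private PriorityBudgetSketch() {}

    // Hypothetical stand-in: a section title plus an estimated token count.
    record Section(String title, int tokens) {}

    // Picks prioritized sections first (in priority order), then fills any remaining
    // budget with other sections in document order. Returns picks in document order.
    // Assumption: strict=true fails when a prioritized section does not fit;
    // strict=false skips that section instead.
    static List<Section> select(List<Section> sections, List<String> priorities,
                                int budget, boolean strict) {
        boolean[] picked = new boolean[sections.size()];
        int used = 0;

        // Pass 1: prioritized titles.
        for (String title : priorities) {
            for (int i = 0; i < sections.size(); i++) {
                Section s = sections.get(i);
                if (picked[i] || !s.title().equals(title)) continue;
                if (used + s.tokens() > budget) {
                    if (strict) {
                        throw new IllegalStateException("Priority section over budget: " + title);
                    }
                    continue;
                }
                picked[i] = true;
                used += s.tokens();
            }
        }

        // Pass 2: fill the remaining budget with non-prioritized sections.
        for (int i = 0; i < sections.size(); i++) {
            Section s = sections.get(i);
            if (!picked[i] && used + s.tokens() <= budget) {
                picked[i] = true;
                used += s.tokens();
            }
        }

        List<Section> result = new ArrayList<>();
        for (int i = 0; i < sections.size(); i++) {
            if (picked[i]) result.add(sections.get(i));
        }
        return result;
    }
}
```

Emitting the selection in document order keeps page and line references monotonic, which tends to make downstream citation matching easier to reason about.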

Public surface

Document parsers: PdfDocumentParser, DocxDocumentParser, XlsxDocumentParser, CsvDocumentParser
Parsed model: ParsedDocument, ParsedSection, TextSection, TableSection, FigureSection, SourceLocation
Extraction: DocTruth, ExtractionBuilder<T>, ExtractionResult<T>
Evidence: Citation, Confidence, Provenance
Context: ContextStrategy, PriorityTruncate, SlidingWindow, Hierarchical
Providers: OpenAiProvider, AnthropicProvider, GeminiProvider, LlmProvider
Exceptions: ParseException, ExtractionException, ProviderException

Contract rule

Missing source evidence is a validation problem. When citation matching is weak, DocTruth surfaces a low match score instead of silently dropping the field's evidence chain.
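
A caller could enforce this rule with a small check of its own. The sketch below treats a field with no citation as a violation and reports a weak match instead of hiding it. It is a hypothetical helper, not part of DocTruth: the map of field name to match score stands in for the library's citation map, and the threshold is an arbitrary choice.

```java
import java.lang.reflect.RecordComponent;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class EvidenceContractCheck {
    private EvidenceContractCheck() {}

    // Same shape as the Contract record used in step 2.
    record Contract(String partyA, String partyB, BigDecimal totalValue) {}

    // Returns one message per problem, in record declaration order:
    // a field with no citation at all, or a field whose match score is below minScore.
    static List<String> violations(Class<? extends Record> type,
                                   Map<String, Double> matchScores,
                                   double minScore) {
        List<String> problems = new ArrayList<>();
        for (RecordComponent component : type.getRecordComponents()) {
            String field = component.getName();
            Double score = matchScores.get(field);
            if (score == null) {
                problems.add(field + ": missing citation");
            } else if (score < minScore) {
                problems.add(field + ": weak match " + score);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical scores: partyB has no citation, totalValue matched weakly.
        Map<String, Double> scores = Map.of("partyA", 0.95, "totalValue", 0.4);
        violations(Contract.class, scores, 0.8).forEach(System.out::println);
    }
}
```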