NLP Extensions

Spring Prism keeps prism-core deterministic and zero-dependency. Higher-level person-name detection lives in the optional prism-extensions-nlp module so teams can opt in explicitly when they want more recall than regex-style detectors provide.

Design Goals

keep prism-core unchanged and fast
make person-name redaction explicit, never implicit
reduce false positives on technical text such as Spring Boot or Redis Cluster
support a stronger hybrid mode for enterprise deployments that can provide a local OpenNLP model

Detection Pipeline

The person-name extension works in three stages:

Candidate extraction: heuristic matching finds capitalized name-like spans and optional honorifics.
Optional OpenNLP candidate extraction: OpenNLP NameFinderME contributes additional person-name candidates when a model is configured.
Contextual scoring: Spring Prism merges the candidates, drops spans that match blocked technical phrases, and scores each remaining span on titles, token count, nearby human-oriented context, backend agreement, and known technical terms.

That gives the module a safer profile than raw NER alone:

heuristic keeps rollout simple and conservative
hybrid improves recall by adding model candidates, and treats agreement between the model and the heuristics as a scoring signal
opennlp is available for teams that want direct model-only behavior
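
To make the first stage concrete, here is a minimal sketch of heuristic candidate extraction. It is an illustration only, not the module's actual implementation: the class name, the honorific list, and the regular expression are all assumptions. It shows why a scoring stage is needed afterwards, since a capitalization heuristic alone also picks up technical phrases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of prism-extensions-nlp.
final class NameCandidateSketch {

    // Assumed pattern: an optional honorific followed by one to three
    // capitalized tokens (three matches max-tokens in the example config).
    private static final Pattern CANDIDATE = Pattern.compile(
            "\\b(?:(?:Mr|Mrs|Ms|Dr|Prof)\\.?\\s+)?[A-Z][a-z]+(?:\\s+[A-Z][a-z]+){0,2}\\b");

    // Returns every name-like span in document order, without any filtering.
    static List<String> extract(String text) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = CANDIDATE.matcher(text);
        while (matcher.find()) {
            spans.add(matcher.group());
        }
        return spans;
    }
}
```

For the input `please call Dr. Jane Doe about Spring Boot`, this sketch yields both `Dr. Jane Doe` and `Spring Boot`. The second span is exactly the kind of candidate that blocked phrases and contextual scoring are meant to reject.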

Configuration

spring:
  prism:
    extensions:
      nlp:
        enabled: true
        backend: hybrid
        model-resource: classpath:/models/en-ner-person.bin
        confidence-threshold: 4
        max-tokens: 3
        allow-single-token-with-title: true
        positive-context-terms:
          - customer
          - employee
          - patient
        blocked-phrases:
          - Spring Boot
          - Azure OpenAI
          - Redis Cluster

If you want a practical guide for where to place the model and how to mount it in real deployments, continue with NLP Model Guide.
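
The following sketch illustrates how the settings in the example above could feed a contextual score. The class name, the weights (2 points per signal), and the rule that the title does not count toward max-tokens are assumptions made for illustration; the real scoring inside Spring Prism may differ.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; values mirror the YAML example, weights are invented.
final class NameScoringSketch {

    static final int CONFIDENCE_THRESHOLD = 4;
    static final int MAX_TOKENS = 3;
    static final boolean ALLOW_SINGLE_TOKEN_WITH_TITLE = true;
    static final Set<String> TITLES = Set.of("mr", "mrs", "ms", "dr", "prof");
    static final List<String> POSITIVE_CONTEXT = List.of("customer", "employee", "patient");
    static final List<String> BLOCKED_PHRASES = List.of("Spring Boot", "Azure OpenAI", "Redis Cluster");

    static int score(String span, String context, boolean modelAgrees) {
        String lowerSpan = span.toLowerCase(Locale.ROOT);
        for (String blocked : BLOCKED_PHRASES) {
            if (lowerSpan.contains(blocked.toLowerCase(Locale.ROOT))) {
                return 0; // blocked technical phrase: never a person name
            }
        }
        String[] tokens = span.trim().split("\\s+");
        boolean hasTitle = TITLES.contains(tokens[0].replace(".", "").toLowerCase(Locale.ROOT));
        int nameTokens = hasTitle ? tokens.length - 1 : tokens.length;
        if (nameTokens < 1 || nameTokens > MAX_TOKENS) {
            return 0;
        }
        if (nameTokens == 1 && !(hasTitle && ALLOW_SINGLE_TOKEN_WITH_TITLE)) {
            return 0; // a bare single token is too ambiguous
        }
        int score = 0;
        if (hasTitle) score += 2;        // honorific present
        if (nameTokens >= 2) score += 2; // given name plus family name
        String lowerContext = context.toLowerCase(Locale.ROOT);
        for (String term : POSITIVE_CONTEXT) {
            if (lowerContext.contains(term)) {
                score += 2;              // nearby human-oriented context
                break;
            }
        }
        if (modelAgrees) score += 2;     // backend agreement (hybrid mode)
        return score;
    }

    static boolean isPersonName(String span, String context, boolean modelAgrees) {
        return score(span, context, modelAgrees) >= CONFIDENCE_THRESHOLD;
    }
}
```

Under these assumed weights, a span needs two independent signals to reach the threshold of 4, so an untitled two-token span in a build log is not redacted, while the same span next to "customer" is.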

Backend Modes

`heuristic`

no model dependency
easiest path for initial adoption
best when you want strict rollout control and simple operations

`opennlp`

requires spring.prism.extensions.nlp.model-resource
useful when your team already curates OpenNLP models internally
should still be tested on technical corpora before wide production rollout

`hybrid`

requires spring.prism.extensions.nlp.model-resource
combines heuristic and OpenNLP candidates
recommended enterprise mode when you want stronger recall with better false-positive control
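
The model requirement of each mode can be summarized as a small startup check. This is a sketch under stated assumptions: the enum, the class name, and the fail-fast behavior are hypothetical and only restate the rules listed above, not the starter's actual autoconfiguration.

```java
import java.util.Locale;

// Hypothetical sketch of a fail-fast check for the backend setting.
final class BackendModeSketch {

    enum Backend {
        HEURISTIC, OPENNLP, HYBRID;

        // Only the heuristic backend runs without an OpenNLP model.
        boolean requiresModel() {
            return this != HEURISTIC;
        }
    }

    // Parses the backend name and rejects model-based modes without a model resource.
    // An unknown backend name raises IllegalArgumentException via Enum.valueOf.
    static Backend validate(String backend, String modelResource) {
        Backend mode = Backend.valueOf(backend.trim().toUpperCase(Locale.ROOT));
        boolean hasModel = modelResource != null && !modelResource.isBlank();
        if (mode.requiresModel() && !hasModel) {
            throw new IllegalStateException(
                    "backend '" + backend + "' requires spring.prism.extensions.nlp.model-resource");
        }
        return mode;
    }
}
```

Failing at startup rather than on the first request keeps a missing model from silently degrading redaction in production.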

Production Guidance

Treat the OpenNLP model as a versioned deployment artifact: reference it by a versioned path and deploy the exact same file to every node.
Keep blocked-phrases tuned for your domain vocabulary, especially product names and platform terms.
Validate on realistic corpora such as support tickets, CRM notes, or RAG chunks before broad enablement.
Start with heuristic in conservative environments, then promote to hybrid once the model has been validated against production-shaped text.
Keep the extension disabled in services that do not need person-name redaction.
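
One way to confirm that every node runs the exact same model file is to compare a checksum at startup. The sketch below assumes the model is mounted as a regular file and that an expected SHA-256 value is supplied by your deployment tooling; the class and method names are hypothetical and not part of Spring Prism.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: verifies a model artifact against a pinned SHA-256 digest.
final class ModelChecksumSketch {

    // Streams the file through SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(Path modelFile) {
        try (InputStream in = Files.newInputStream(modelFile)) {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Throws if the file on this node differs from the pinned artifact.
    static void requireExpectedModel(Path modelFile, String expectedSha256Hex) {
        String actual = sha256Hex(modelFile);
        if (!actual.equalsIgnoreCase(expectedSha256Hex)) {
            throw new IllegalStateException(
                    "Model checksum mismatch: expected " + expectedSha256Hex + " but found " + actual);
        }
    }
}
```

Pinning the digest alongside the versioned path makes a partially rolled-out or corrupted model visible immediately, instead of surfacing later as inconsistent redaction between nodes.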

Observability

When the extension is enabled through the starter:

NLP_EXTENSIONS appears in the active rule pack list
PERSON_NAME appears in the entity metrics once matches are detected
the default deterministic detector set remains unchanged when the extension is disabled

Test Coverage Expectations

Every change to this module should update:

unit tests in prism-extensions-nlp
starter wiring tests when autoconfiguration changes
prism-integration-tests for end-to-end opt-in behavior and false-positive guardrails
