Troubleshooting

This page covers the failure modes teams most often hit first in production.

If you are still choosing the right deployment mode, read Production Playbook first.

Response Returned With Prism Tokens Still Present

Likely causes:

  • the restore ran on a different node from the one that issued the token, and the nodes do not share a Redis vault
  • spring.prism.app-secret differs between nodes
  • token TTL expired before restoration

Check:

  • configuredVaultMode
  • vaultType
  • sharedVaultReady
  • vaultReadinessStatus

Fix:

  • move to spring.prism.vault.type=redis
  • ensure every node shares the same Redis deployment
  • ensure every node shares the same non-default app secret
  • increase TTL if the restore window is too short
  • confirm the node-level metrics endpoint reports sharedVaultReady = true
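
One way to check the "same secret on every node" condition without printing the secret itself is to compare a short hash of it across nodes. The following is a minimal sketch, not part of Prism: the class SecretFingerprint is hypothetical, and it assumes the secret reaches the process through an environment variable named SPRING_PRISM_APPSECRET (the Spring Boot environment form of spring.prism.app-secret).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Prism: derives a short fingerprint of the
// configured secret so operators can compare nodes without exposing the value.
final class SecretFingerprint {

    // Returns the first 12 hex characters of the SHA-256 digest of the secret.
    static String fingerprint(String secret) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(secret.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, 12);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this is not expected.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the secret is supplied through this environment variable.
        String secret = System.getenv().getOrDefault("SPRING_PRISM_APPSECRET", "");
        System.out.println("app-secret fingerprint: " + fingerprint(secret));
    }
}
```

Run it on each node and compare the printed values. Two nodes with different fingerprints cannot be sharing the same secret. A fingerprint of a weak secret can still be brute-forced, so treat the output as sensitive.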

Startup Fails in Redis Mode

Symptom:

  • startup fails with a message saying spring.prism.vault.type=redis requires a StringRedisTemplate bean

Cause:

  • Prism was explicitly told to use Redis, but the application has no Redis client bean

Fix:

  • add Spring Data Redis and a working Redis configuration
  • or change spring.prism.vault.type back to auto or in-memory for local-only deployments
  • do not leave production manifests half-migrated between local and shared vault modes

Shared Vault Ready Is False

Likely causes:

  • Redis-backed vault is active but the default app secret is still configured
  • the node is still using a local vault

Interpretation:

  • distributed restore may be enabled, but the deployment is not yet in a production-ready posture

Fix:

  • override spring.prism.app-secret with a shared non-default secret on every node
  • ensure the runtime vault is actually RedisPrismVault
  • verify the deployment spec, not just local application.yml, because env overrides often win
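
To see whether an environment override is in play, it helps to know which variable name to look for. The sketch below assumes Prism properties are bound through standard Spring Boot configuration binding, where a property name maps to an environment variable by replacing dots with underscores, removing dashes, and upper-casing. The class EnvOverrideCheck is hypothetical and not part of Prism.

```java
import java.util.Locale;

// Hypothetical helper, not part of Prism.
final class EnvOverrideCheck {

    // Spring Boot's documented mapping from a canonical property name to its
    // environment-variable form: dots become underscores, dashes are removed,
    // and the result is upper-cased.
    static String toEnvName(String property) {
        return property.replace('.', '_').replace("-", "").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] properties = {"spring.prism.vault.type", "spring.prism.app-secret"};
        for (String property : properties) {
            String envName = toEnvName(property);
            boolean overridden = System.getenv(envName) != null;
            // Report only whether an override exists, never the value itself.
            System.out.println(property + " -> " + envName
                    + (overridden ? " (set in environment)" : " (not set)"));
        }
    }
}
```

If a variable is reported as set, the value in application.yml is probably not the one the node is running with, so check the deployment spec that defines that variable.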

Restore Works On One Node But Fails After Deployment

Likely causes:

  • rolling deployment introduced different secrets
  • some nodes point at a different Redis instance or namespace
  • TTL is too short for the real request/restore delay

Fix:

  • verify every node shares the same secret and Redis target
  • avoid rotating the secret while tokens are still in flight
  • size TTL for the actual end-to-end latency window
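
Sizing the TTL comes down to simple arithmetic: take the worst request-to-restore delay you actually observe, multiply by a safety factor, and never go below a floor. The sketch below illustrates that calculation. The class TtlSizing, the factor of 2.0, and the 60-second floor are illustrative assumptions, not Prism defaults or recommendations.

```java
import java.time.Duration;

// Hypothetical helper, not part of Prism.
final class TtlSizing {

    // Recommended TTL = worst observed delay * safety factor, but never below the floor.
    static Duration recommendTtl(Duration worstObservedDelay, double safetyFactor, Duration floor) {
        if (safetyFactor < 1.0) {
            throw new IllegalArgumentException("safetyFactor must be at least 1.0");
        }
        long scaledMillis = (long) Math.ceil(worstObservedDelay.toMillis() * safetyFactor);
        Duration scaled = Duration.ofMillis(scaledMillis);
        return scaled.compareTo(floor) < 0 ? floor : scaled;
    }

    public static void main(String[] args) {
        // Example inputs: a 45-second worst-case delay taken from your own timing data.
        Duration ttl = recommendTtl(Duration.ofSeconds(45), 2.0, Duration.ofSeconds(60));
        System.out.println("Suggested TTL: " + ttl.getSeconds() + "s");
    }
}
```

Feed it the tail of your own latency distribution rather than the average, since the slowest requests are the ones that expire first.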

Token Backlog Keeps Growing

Likely causes:

  • responses are not being restored
  • some traffic is one-way, so tokens are issued but never restored
  • restore latency is higher than expected

Check:

  • tokenBacklog
  • detectionErrorCount
  • integration timing summaries
  • Grafana or other dashboard views of request/response activity

Fix:

  • verify the restore path is active
  • verify distributed vault readiness
  • inspect integration-specific timing spikes
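
A backlog that rises and falls with traffic is normal. One that never drops across a sampling window is the signal worth alerting on. The sketch below shows one simple way to tell the two apart, assuming tokenBacklog is sampled at a fixed interval into an array. The class BacklogTrend and its three-sample minimum are illustrative choices, not Prism behavior.

```java
// Hypothetical helper, not part of Prism.
final class BacklogTrend {

    // True when the backlog never decreases across the window and ends higher
    // than it started. Fewer than three samples is treated as inconclusive.
    static boolean isSteadilyGrowing(long[] samples) {
        if (samples.length < 3) {
            return false;
        }
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] < samples[i - 1]) {
                return false; // the backlog drained at least once
            }
        }
        return samples[samples.length - 1] > samples[0];
    }

    public static void main(String[] args) {
        long[] draining = {120, 180, 90, 140};  // rises and falls with traffic
        long[] leaking = {120, 180, 260, 410};  // never drains
        System.out.println("draining window grows steadily: " + isSteadilyGrowing(draining));
        System.out.println("leaking window grows steadily: " + isSteadilyGrowing(leaking));
    }
}
```

Choose a window long enough to span at least one full restore cycle, otherwise normal bursts will look like leaks.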

Detection Errors Increase Unexpectedly

Interpretation:

  • Prism is still fail-open by default
  • protection continues, but reliability posture is degrading

Fix:

  • inspect the affected integration path
  • review recent deployment changes
  • verify custom rules and locale configuration
  • verify optional NLP rollout changes if the affected entity is PERSON_NAME

When debugging a production issue, collect:

  • configuredVaultMode
  • vaultType
  • sharedVaultReady
  • vaultReadinessStatus
  • tokenBacklog
  • detectionErrorCount
  • top affected integration timing values
  • current TTL
  • confirmation that all nodes share the same secret and Redis target
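
Once the values above have been collected from every node, the quickest check is to list which fields differ between nodes. The following is a minimal sketch of that comparison. The class NodeSnapshotDiff, the node names, and the sample field values (including the secretFingerprint field) are hypothetical; how you obtain each snapshot depends on your metrics setup.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Prism.
final class NodeSnapshotDiff {

    // Returns the names of fields whose values are not identical on every node.
    // A field missing from one node counts as a mismatch.
    static Set<String> mismatchedFields(Map<String, Map<String, String>> snapshotsByNode) {
        // Gather every field name reported by any node.
        Set<String> fieldNames = new TreeSet<>();
        for (Map<String, String> snapshot : snapshotsByNode.values()) {
            fieldNames.addAll(snapshot.keySet());
        }
        Set<String> mismatched = new TreeSet<>();
        for (String field : fieldNames) {
            Set<String> distinctValues = new HashSet<>();
            for (Map<String, String> snapshot : snapshotsByNode.values()) {
                distinctValues.add(snapshot.get(field)); // null when the node did not report it
            }
            if (distinctValues.size() > 1) {
                mismatched.add(field);
            }
        }
        return mismatched;
    }

    private static Map<String, String> snapshot(String vaultType, String ready, String fingerprint) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("vaultType", vaultType);
        values.put("sharedVaultReady", ready);
        values.put("secretFingerprint", fingerprint);
        return values;
    }

    public static void main(String[] args) {
        // Illustrative values only; real values come from each node's metrics.
        Map<String, Map<String, String>> nodes = new LinkedHashMap<>();
        nodes.put("node-a", snapshot("RedisPrismVault", "true", "ba7816bf8f01"));
        nodes.put("node-b", snapshot("RedisPrismVault", "false", "2cf24dba5fb0"));
        System.out.println("Fields that differ across nodes: " + mismatchedFields(nodes));
    }
}
```

Counters such as tokenBacklog and detectionErrorCount will naturally differ between nodes, so limit the comparison to configuration-style fields.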