Production Checklist

Ensure your agents are production-ready. Review every item before you ship — security, governance, observability, cost, reliability, performance, testing, and user experience.

Security

  • API keys are loaded from environment variables — never hardcoded or committed to version control (see the sketch after this list).
  • Guardrails are active on input (PII detection, prompt-injection screening) and output (topic validation, toxicity filtering).
  • Rate limiting is configured on all API endpoints exposed to agents and users.
  • Tool permissions are scoped to least privilege — each agent only accesses tools it strictly needs.
  • Secrets are encrypted at rest and rotated on a defined schedule (90 days max).
  • Network allowlists restrict which external endpoints tools can reach — no unrestricted egress.
  • Human-in-the-loop approval is required for dangerous or irreversible tool actions (e.g. database writes, payments).
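
As a minimal sketch of the first item: load keys from the process environment and fail fast when they are missing, so a misconfigured deployment never falls back to a hardcoded default (the variable name is just an example):

```python
import os

# Fail fast at startup rather than falling back to a hardcoded default.
api_key = os.environ.get("OPENAI_API_KEY")  # example variable name
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; refusing to start")
```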

Governance

  • Policy-as-code rules are defined, version-controlled, and enforced via AKIOS AI before every agent action.
  • Agent risk levels are tagged according to regulatory requirements (e.g. EU AI Act risk tiers).
  • Approval gates are in place for high-risk decisions — no autonomous execution without sign-off (a generic sketch follows this list).
  • Data retention policies are configured per environment, with automatic purge schedules for PII.
  • Compliance audit export has been tested end-to-end — immutable audit trails can be pulled on demand.
  • Role-based access control is enforced per agent, per team, and per environment.
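
AKIOS AI enforces policy-as-code natively; as a generic illustration of the approval-gate pattern (every name below is hypothetical, not the AKIOS API), a high-risk action blocks until a reviewer signs off:

```python
# Generic approval-gate sketch; all names here are hypothetical.
def request_approval(action_name: str) -> bool:
    """Stub for your review channel (ticket, chat approval, review queue)."""
    print(f"Awaiting human sign-off for: {action_name}")
    return False  # deny by default until a reviewer explicitly approves

def execute(action_name: str, risk_level: str, run):
    # High-risk actions never execute autonomously; everything else proceeds.
    if risk_level == "high" and not request_approval(action_name):
        raise PermissionError(f"{action_name}: no sign-off, refusing to run")
    return run()
```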

Observability

  • AKIOS RADAR is connected and receiving telemetry from all deployed agents.
  • Semantic traces are enabled — every LLM call, tool invocation, and decision point is captured (a generic tracing sketch follows this list).
  • Session replay is configured for debugging and post-incident review of agent conversations.
  • Hallucination detection is active — outputs are cross-checked against grounding sources.
  • Alerting rules are defined for error-rate spikes, latency breaches (policy overhead above 2 ms), and cost anomalies.
  • Dashboards are set up for key operational metrics: request volume, success rate, p95 latency, and cost per agent.
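
The RADAR SDK captures traces automatically; for intuition, here is a generic Python sketch of the same idea: wrap each LLM call so its latency and outcome are emitted as one telemetry span (the `emit` sink is yours to supply):

```python
import functools
import time
import uuid

# Generic tracing sketch (not the RADAR SDK): wrap an LLM call so its
# latency and outcome are emitted as a single span to your telemetry sink.
def traced(emit):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = {"span_id": uuid.uuid4().hex, "name": fn.__name__}
            start = time.time()
            try:
                result = fn(*args, **kwargs)
                span["status"] = "ok"
                return result
            except Exception as exc:
                span["status"] = f"error: {exc}"
                raise
            finally:
                span["duration_ms"] = round((time.time() - start) * 1000, 2)
                emit(span)  # e.g. emit=print in dev, a real pipeline in prod
        return wrapper
    return decorator
```

Apply it as `@traced(emit=print)` on your LLM-call wrapper during development, then point `emit` at your real telemetry pipeline.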

Cost control

  • AKIOS FLUX budget limits are set per agent — runaway costs are automatically capped.
  • Cost attribution is configured per tool, per user, and per team for chargeback visibility.
  • Max tokens are capped per request to prevent unbounded generation costs (see the sketch after this list).
  • Fallback models are configured for cost optimization — expensive models only used when needed.
  • Auto-scaling limits are defined to prevent unbounded compute spend under traffic spikes.
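
FLUX enforces budget limits and token caps natively; a generic sketch of the two guards, with hypothetical names throughout:

```python
# Generic cost-guard sketch; every name below is hypothetical.
MAX_TOKENS_PER_REQUEST = 1024
DAILY_BUDGET_USD = 50.0

class BudgetExceeded(RuntimeError):
    pass

def guarded_request(client, spent_today_usd: float, **params):
    if spent_today_usd >= DAILY_BUDGET_USD:
        raise BudgetExceeded(f"daily budget ${DAILY_BUDGET_USD:.2f} exhausted")
    # Hard-cap generation length so one request can't run unbounded.
    params["max_tokens"] = min(params.get("max_tokens", MAX_TOKENS_PER_REQUEST),
                               MAX_TOKENS_PER_REQUEST)
    return client.complete(**params)  # hypothetical client interface
```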

Reliability

  • Retry logic is implemented with exponential backoff and jitter for all LLM API calls (see the sketch after this list).
  • Fallback models are configured — if the primary model is unavailable, traffic routes to an alternative (e.g. GPT-4 → Claude 3).
  • Timeout limits are set on all tool executions to prevent hung processes from blocking the pipeline.
  • Circuit breakers are enabled — repeated failures to a provider trigger fast-fail instead of cascading retries.
  • Health check endpoints are exposed for load balancers and orchestrators to verify agent readiness.
  • A graceful degradation strategy is defined — agents return safe fallback responses when upstream services fail.
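
A minimal version of the retry item, using exponential backoff with full jitter; tune the retry count and delays to your own SLA, and list your provider's actually-retryable exception types:

```python
import random
import time

# Exponential backoff with full jitter. `transient` should name your
# provider's retryable errors (rate limits, 5xx responses, timeouts).
def with_retries(call, retries=5, base_delay=0.5, max_delay=30.0,
                 transient=(TimeoutError, ConnectionError)):
    for attempt in range(retries):
        try:
            return call()
        except transient:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            # Sleep a random amount up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```

Usage is a one-liner, e.g. `with_retries(lambda: client.complete(prompt))`.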

Performance

  • System prompts are optimized — short, clear, and free of unnecessary boilerplate.
  • Semantic caching is enabled for frequent or repetitive queries to reduce latency and cost (a sketch follows this list).
  • Dependencies are minimized in the deployment image — no dev tools or unused packages in production.
  • Cold start time has been measured and optimized to stay within acceptable latency budgets.
  • A latency budget is allocated per pipeline step — no single step consumes more than its share of end-to-end SLA.
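
One way to sketch semantic caching: store an embedding per past query and reuse the cached answer when a new query is close enough in cosine similarity. Here `embed` is a placeholder for your embedding model (assumed to return a 1-D numpy array), and the 0.95 threshold is an arbitrary starting point to tune on real traffic:

```python
import numpy as np

# Semantic-cache sketch: reuse a cached answer when a new query's
# embedding is close enough to a previously seen one.
class SemanticCache:
    def __init__(self, embed, threshold=0.95):
        self.embed, self.threshold = embed, threshold
        self.keys, self.values = [], []

    def get(self, query):
        if not self.keys:
            return None
        q = self.embed(query)
        sims = [float(q @ k) / (np.linalg.norm(q) * np.linalg.norm(k))
                for k in self.keys]
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query, answer):
        self.keys.append(self.embed(query))
        self.values.append(answer)
```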

Testing

  • Unit tests cover all custom tools — each tool is tested in isolation with deterministic inputs (example after this list).
  • Integration tests run against mock LLM responses to validate full pipeline behavior without live API calls.
  • Guardrail violation tests confirm that PII, off-topic, and injection attempts are correctly blocked.
  • Load tests simulate production traffic patterns to validate throughput and identify bottlenecks.
  • Chaos engineering exercises simulate model provider failures, network partitions, and timeout scenarios.
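
A minimal example of the first item: a custom tool tested in isolation with deterministic inputs (the tool itself is a stand-in for one of yours):

```python
import pytest

# A custom tool tested in isolation; `convert_currency` stands in for
# any deterministic tool your agents call.
def convert_currency(amount: float, rate: float) -> float:
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return round(amount * rate, 2)

def test_convert_currency_is_deterministic():
    assert convert_currency(100.0, 1.1) == 110.0

def test_convert_currency_rejects_negative_amounts():
    with pytest.raises(ValueError):
        convert_currency(-1.0, 1.1)
```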

User experience

  • Streaming is enabled for long-running responses — users see incremental output instead of waiting (a sketch follows this list).
  • Error messages are clear and user-friendly — raw stack traces and internal IDs are never exposed.
  • Citation links are provided for RAG responses so users can verify sources.
  • A user feedback mechanism (thumbs up / thumbs down) is in place to capture signal for continuous improvement.
  • Graceful loading states and timeout messages are shown — users are never left staring at a blank screen.
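
A streaming sketch assuming the OpenAI Python SDK; the pattern is the same with other providers: request a stream and flush tokens to the user as they arrive.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o",  # example model
    messages=[{"role": "user", "content": "Summarize my last order."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # render tokens as they arrive
```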