Production Checklist

Ensure your agents are production-ready. Review every item before you ship — security, governance, observability, cost, reliability, performance, testing, and user experience.

Security

  • API keys are loaded from environment variables — never hardcoded or committed to version control (see the sketch after this list).
  • Guardrails are active on input (PII detection, prompt-injection screening) and output (topic validation, toxicity filtering).
  • Rate limiting is configured on all API endpoints exposed to agents and users.
  • Tool permissions are scoped to least privilege — each agent only accesses tools it strictly needs.
  • Secrets are encrypted at rest and rotated on a defined schedule (90 days max).
  • Network allowlists restrict which external endpoints tools can reach — no unrestricted egress.
  • Human-in-the-loop approval is required for dangerous or irreversible tool actions (e.g. database writes, payments).
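
As a minimal sketch of the first item: load keys from the process environment and fail fast when they are missing, so a misconfigured deployment never falls back to a hardcoded default (the variable name is just an example):

```python
import os

# Fail fast at startup rather than falling back to a hardcoded default.
api_key = os.environ.get("OPENAI_API_KEY")  # example variable name
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; refusing to start")
```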

Governance

  • Policy-as-code rules are defined, version-controlled, and enforced via AKIOS AI before every agent action.
  • Agent risk levels are tagged according to regulatory requirements (e.g. EU AI Act risk tiers).
  • Approval gates are in place for high-risk decisions — no autonomous execution without sign-off (a generic sketch follows this list).
  • Data retention policies are configured per environment, with automatic purge schedules for PII.
  • Compliance audit export has been tested end-to-end — immutable audit trails can be pulled on demand.
  • Role-based access control is enforced per agent, per team, and per environment.
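
AKIOS AI enforces policy-as-code natively; as a generic illustration of the approval-gate pattern (every name below is hypothetical, not the AKIOS API), a high-risk action blocks until a reviewer signs off:

```python
# Generic approval-gate sketch; all names here are hypothetical.
def request_approval(action_name: str) -> bool:
    """Stub for your review channel (ticket, chat approval, review queue)."""
    print(f"Awaiting human sign-off for: {action_name}")
    return False  # deny by default until a reviewer explicitly approves

def execute(action_name: str, risk_level: str, run):
    # High-risk actions never execute autonomously; everything else proceeds.
    if risk_level == "high" and not request_approval(action_name):
        raise PermissionError(f"{action_name}: no sign-off, refusing to run")
    return run()
```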

Observability

  • AKIOS RADAR is connected and receiving telemetry from all deployed agents.
  • Semantic traces are enabled — every LLM call, tool invocation, and decision point is captured (a generic tracing sketch follows this list).
  • Session replay is configured for debugging and post-incident review of agent conversations.
  • Hallucination detection is active — outputs are cross-checked against grounding sources.
  • Alerting rules are defined for error-rate spikes, latency breaches (policy overhead above 2 ms), and cost anomalies.
  • Dashboards are set up for key operational metrics: request volume, success rate, p95 latency, and cost per agent.
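
The RADAR SDK captures traces automatically; for intuition, here is a generic Python sketch of the same idea: wrap each LLM call so its latency and outcome are emitted as one telemetry span (the `emit` sink is yours to supply):

```python
import functools
import time
import uuid

# Generic tracing sketch (not the RADAR SDK): wrap an LLM call so its
# latency and outcome are emitted as a single span to your telemetry sink.
def traced(emit):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = {"span_id": uuid.uuid4().hex, "name": fn.__name__}
            start = time.time()
            try:
                result = fn(*args, **kwargs)
                span["status"] = "ok"
                return result
            except Exception as exc:
                span["status"] = f"error: {exc}"
                raise
            finally:
                span["duration_ms"] = round((time.time() - start) * 1000, 2)
                emit(span)  # e.g. emit=print in dev, a real pipeline in prod
        return wrapper
    return decorator
```

Apply it as `@traced(emit=print)` on your LLM-call wrapper during development, then point `emit` at your real telemetry pipeline.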

Cost control

  • AKIOS FLUX budget limits are set per agent — runaway costs are automatically capped.
  • Cost attribution is configured per tool, per user, and per team for chargeback visibility.
  • Max tokens are capped per request to prevent unbounded generation costs (see the sketch after this list).
  • Fallback models are configured for cost optimization — expensive models only used when needed.
  • Auto-scaling limits are defined to prevent unbounded compute spend under traffic spikes.
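
FLUX enforces budget limits and token caps natively; a generic sketch of the two guards, with hypothetical names throughout:

```python
# Generic cost-guard sketch; every name below is hypothetical.
MAX_TOKENS_PER_REQUEST = 1024
DAILY_BUDGET_USD = 50.0

class BudgetExceeded(RuntimeError):
    pass

def guarded_request(client, spent_today_usd: float, **params):
    if spent_today_usd >= DAILY_BUDGET_USD:
        raise BudgetExceeded(f"daily budget ${DAILY_BUDGET_USD:.2f} exhausted")
    # Hard-cap generation length so one request can't run unbounded.
    params["max_tokens"] = min(params.get("max_tokens", MAX_TOKENS_PER_REQUEST),
                               MAX_TOKENS_PER_REQUEST)
    return client.complete(**params)  # hypothetical client interface
```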

Reliability

  • Retry logic is implemented with exponential backoff and jitter for all LLM API calls (see the sketch after this list).
  • Fallback models are configured — if the primary model is unavailable, traffic routes to an alternative (e.g. GPT-4 → Claude 3).
  • Timeout limits are set on all tool executions to prevent hung processes from blocking the pipeline.
  • Circuit breakers are enabled — repeated failures to a provider trigger fast-fail instead of cascading retries.
  • Health check endpoints are exposed for load balancers and orchestrators to verify agent readiness.
  • A graceful degradation strategy is defined — agents return safe fallback responses when upstream services fail.
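
A minimal version of the retry item, using exponential backoff with full jitter; tune the retry count and delays to your own SLA, and list your provider's actually-retryable exception types:

```python
import random
import time

# Exponential backoff with full jitter. `transient` should name your
# provider's retryable errors (rate limits, 5xx responses, timeouts).
def with_retries(call, retries=5, base_delay=0.5, max_delay=30.0,
                 transient=(TimeoutError, ConnectionError)):
    for attempt in range(retries):
        try:
            return call()
        except transient:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            # Sleep a random amount up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```

Usage is a one-liner, e.g. `with_retries(lambda: client.complete(prompt))`.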

Performance

  • System prompts are optimized — short, clear, and free of unnecessary boilerplate.
  • Semantic caching is enabled for frequent or repetitive queries to reduce latency and cost (a sketch follows this list).
  • Dependencies are minimized in the deployment image — no dev tools or unused packages in production.
  • Cold start time has been measured and optimized to stay within acceptable latency budgets.
  • A latency budget is allocated per pipeline step — no single step consumes more than its share of end-to-end SLA.
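
One way to sketch semantic caching: store an embedding per past query and reuse the cached answer when a new query is close enough in cosine similarity. Here `embed` is a placeholder for your embedding model (assumed to return a 1-D numpy array), and the 0.95 threshold is an arbitrary starting point to tune on real traffic:

```python
import numpy as np

# Semantic-cache sketch: reuse a cached answer when a new query's
# embedding is close enough to a previously seen one.
class SemanticCache:
    def __init__(self, embed, threshold=0.95):
        self.embed, self.threshold = embed, threshold
        self.keys, self.values = [], []

    def get(self, query):
        if not self.keys:
            return None
        q = self.embed(query)
        sims = [float(q @ k) / (np.linalg.norm(q) * np.linalg.norm(k))
                for k in self.keys]
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query, answer):
        self.keys.append(self.embed(query))
        self.values.append(answer)
```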

Testing

  • Unit tests cover all custom tools — each tool is tested in isolation with deterministic inputs (example after this list).
  • Integration tests run against mock LLM responses to validate full pipeline behavior without live API calls.
  • Guardrail violation tests confirm that PII, off-topic, and injection attempts are correctly blocked.
  • Load tests simulate production traffic patterns to validate throughput and identify bottlenecks.
  • Chaos engineering exercises simulate model provider failures, network partitions, and timeout scenarios.
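
A minimal example of the first item: a custom tool tested in isolation with deterministic inputs (the tool itself is a stand-in for one of yours):

```python
import pytest

# A custom tool tested in isolation; `convert_currency` stands in for
# any deterministic tool your agents call.
def convert_currency(amount: float, rate: float) -> float:
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return round(amount * rate, 2)

def test_convert_currency_is_deterministic():
    assert convert_currency(100.0, 1.1) == 110.0

def test_convert_currency_rejects_negative_amounts():
    with pytest.raises(ValueError):
        convert_currency(-1.0, 1.1)
```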

User experience

  • Streaming is enabled for long-running responses — users see incremental output instead of waiting (a sketch follows this list).
  • Error messages are clear and user-friendly — raw stack traces and internal IDs are never exposed.
  • Citation links are provided for RAG responses so users can verify sources.
  • A user feedback mechanism (thumbs up / thumbs down) is in place to capture signal for continuous improvement.
  • Graceful loading states and timeout messages are shown — users are never left staring at a blank screen.
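
A streaming sketch assuming the OpenAI Python SDK; the pattern is the same with other providers: request a stream and flush tokens to the user as they arrive.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o",  # example model
    messages=[{"role": "user", "content": "Summarize my last order."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # render tokens as they arrive
```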