# Production Checklist
Ensure your agents are production-ready. Review every item before you ship — security, governance, observability, cost, reliability, performance, testing, and user experience.
## Security
- API keys are loaded from environment variables — never hardcoded or committed to version control.
- Guardrails are active for input (PII detection, prompt injection) and output (topic validation, toxicity filtering).
- Rate limiting is configured on all API endpoints exposed to agents and users.
- Tool permissions are scoped to least privilege — each agent only accesses tools it strictly needs.
- Secrets are encrypted at rest and rotated on a defined schedule (90 days max).
- Network allowlists restrict which external endpoints tools can reach — no unrestricted egress.
- Human-in-the-loop approval is required for dangerous or irreversible tool actions (e.g. database writes, payments).
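The least-privilege item above can be sketched as a deny-by-default tool registry. This is a minimal illustration, not a prescribed implementation — the registry contents and agent names are hypothetical:

```python
# Hypothetical least-privilege tool registry: each agent is granted an
# explicit allowlist of tools; anything not granted is denied by default.
AGENT_TOOL_GRANTS = {
    "support-bot": {"search_docs", "create_ticket"},
    "billing-bot": {"lookup_invoice"},
}

def authorize_tool(agent: str, tool: str) -> bool:
    """Deny by default: an agent may call only tools it was explicitly granted."""
    return tool in AGENT_TOOL_GRANTS.get(agent, set())
```

An unknown agent falls back to the empty set, so it can call nothing — the safe failure mode for a permission check.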
## Governance
- Policy-as-code rules are defined, version-controlled, and enforced via AKIOS AI before every agent action.
- Agent risk levels are tagged according to regulatory requirements (e.g. EU AI Act risk tiers).
- Approval gates are in place for high-risk decisions — no autonomous execution without sign-off.
- Data retention policies are configured per environment, with automatic purge schedules for PII.
- Compliance audit export has been tested end-to-end — immutable audit trails can be pulled on demand.
- Role-based access control is enforced per agent, per team, and per environment.
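The approval-gate idea — no autonomous execution above a configured risk tier — can be sketched in a few lines. This is an illustrative policy shape only; it does not reflect the actual AKIOS AI rule format, and the tier names loosely mirror the EU AI Act categories:

```python
# Illustrative policy-as-code sketch: a policy declares the highest risk
# tier an agent may act on autonomously; anything above it needs sign-off.
from dataclasses import dataclass

RISK_TIERS = ["minimal", "limited", "high", "unacceptable"]

@dataclass(frozen=True)
class Policy:
    max_autonomous_risk: str  # highest tier allowed without human approval

def requires_approval(policy: Policy, action_risk: str) -> bool:
    """High-risk actions must not execute without human sign-off."""
    return RISK_TIERS.index(action_risk) > RISK_TIERS.index(policy.max_autonomous_risk)
```

Keeping the policy as data (rather than scattered `if` statements) is what makes it version-controllable and auditable.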
## Observability
- AKIOS RADAR is connected and receiving telemetry from all deployed agents.
- Semantic traces are enabled — every LLM call, tool invocation, and decision point is captured.
- Session replay is configured for debugging and post-incident review of agent conversations.
- Hallucination detection is active — outputs are cross-checked against grounding sources.
- Alerting rules are defined for error rate spikes, latency breaches (>2ms policy overhead), and cost anomalies.
- Dashboards are set up for key operational metrics: request volume, success rate, p95 latency, and cost per agent.
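The dashboard and alerting items rest on two small computations: a p95 latency percentile and an error-rate threshold. A minimal sketch (nearest-rank percentile; the 5% alert threshold is an assumed default, not a recommendation):

```python
# Sketch of the metrics behind a latency dashboard and an error-rate alert.
import math

def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank p95: the value below which 95% of requests fall."""
    ranked = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ranked))
    return ranked[rank - 1]

def should_alert(errors: int, total: int, threshold: float = 0.05) -> bool:
    """Fire when the observed error rate exceeds the configured threshold."""
    return total > 0 and errors / total > threshold
```

In production these would run over a rolling window rather than the full history, so alerts react to spikes instead of lifetime averages.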
## Cost control
- AKIOS FLUX budget limits are set per agent — runaway costs are automatically capped.
- Cost attribution is configured per tool, per user, and per team for chargeback visibility.
- Max tokens are capped per request to prevent unbounded generation costs.
- Fallback models are configured for cost optimization — expensive models only used when needed.
- Auto-scaling limits are defined to prevent unbounded compute spend under traffic spikes.
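The budget-cap behavior can be illustrated with a small guard that blocks a request before it would exceed the limit. Note this is only a local sketch of the concept — AKIOS FLUX's actual enforcement is a product feature, not this class:

```python
# Illustrative per-agent budget guard: reject a request whose estimated
# cost would push cumulative spend past the configured cap.
class BudgetGuard:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def try_spend(self, cost_usd: float) -> bool:
        if self.spent_usd + cost_usd > self.limit_usd:
            return False  # cap reached: block instead of overspending
        self.spent_usd += cost_usd
        return True
```

Checking *before* committing the spend is the point: a runaway loop gets capped at the limit rather than one request past it.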
## Reliability
- Retry logic is implemented with exponential backoff and jitter for all LLM API calls.
- Fallback models are configured — if the primary model is unavailable, traffic routes to an alternative (e.g. GPT-4 → Claude 3).
- Timeout limits are set on all tool executions to prevent hung processes from blocking the pipeline.
- Circuit breakers are enabled — repeated failures to a provider trigger fast-fail instead of cascading retries.
- Health check endpoints are exposed for load balancers and orchestrators to verify agent readiness.
- A graceful degradation strategy is defined — agents return safe fallback responses when upstream services fail.
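The retry item can be sketched concretely. This uses the "full jitter" variant (random delay between zero and the exponential cap); the parameter defaults are illustrative, not recommended values:

```python
# Retry a flaky call with exponential backoff plus full jitter.
import random
import time

def retry_with_backoff(call, max_attempts: int = 5, base_delay: float = 0.5,
                       max_delay: float = 30.0, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # full jitter: uniform delay in [0, base * 2^attempt], capped
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```

Jitter matters because many clients retrying on the same schedule after a provider blip will otherwise hammer it in synchronized waves.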
## Performance
- System prompts are optimized — short, clear, and free of unnecessary boilerplate.
- Semantic caching is enabled for frequent or repetitive queries to reduce latency and cost.
- Dependencies are minimized in the deployment image — no dev tools or unused packages in production.
- Cold start time has been measured and optimized to stay within acceptable latency budgets.
- A latency budget is allocated per pipeline step — no single step consumes more than its share of end-to-end SLA.
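The per-step latency budget can be checked mechanically once the end-to-end SLA is split across steps. A minimal sketch with hypothetical step names:

```python
# Illustrative latency-budget check: compare each pipeline step's measured
# duration against its allocated share of the end-to-end SLA.
def over_budget(step_durations_ms: dict[str, float],
                budgets_ms: dict[str, float]) -> list[str]:
    """Return the steps that exceeded their latency allocation."""
    return [step for step, took in step_durations_ms.items()
            if took > budgets_ms.get(step, float("inf"))]
```

A step with no configured budget is treated as unbounded here; in practice you would want the stricter default of flagging unbudgeted steps.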
## Testing
- Unit tests cover all custom tools — each tool is tested in isolation with deterministic inputs.
- Integration tests run against mock LLM responses to validate full pipeline behavior without live API calls.
- Guardrail violation tests confirm that PII, off-topic, and injection attempts are correctly blocked.
- Load tests simulate production traffic patterns to validate throughput and identify bottlenecks.
- Chaos engineering exercises simulate model provider failures, network partitions, and timeout scenarios.
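The first item — deterministic unit tests per tool — looks like this in its simplest form. The tool here is a stand-in; in a real suite you would import your own tools rather than define one inline:

```python
# Sketch of a deterministic unit test for a custom tool: fixed inputs,
# no live API calls, exact expected outputs.
def currency_tool(amount: float, rate: float) -> float:
    """Example stand-in tool: convert an amount using a fixed rate."""
    return round(amount * rate, 2)

def test_currency_tool():
    assert currency_tool(10.0, 1.5) == 15.0
    assert currency_tool(0.0, 1.5) == 0.0
```

Determinism is the property worth protecting: a tool test that depends on network state or model output belongs in the integration or load tiers above, not here.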
## User experience
- Streaming is enabled for long-running responses — users see incremental output instead of waiting.
- Error messages are clear and user-friendly — raw stack traces and internal IDs are never exposed.
- Citation links are provided for RAG responses so users can verify sources.
- A user feedback mechanism (thumbs up / thumbs down) is in place to capture signal for continuous improvement.
- Graceful loading states and timeout messages are shown — users are never left staring at a blank screen.
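The streaming item can be illustrated with a generator that yields partial output as it becomes available. Chunking a finished string by words here stands in for real token-by-token streaming from a model API:

```python
# Minimal streaming sketch: yield chunks incrementally so the UI can
# render as output arrives instead of waiting for the full response.
from typing import Iterator

def stream_response(text: str, chunk_words: int = 3) -> Iterator[str]:
    words = text.split()
    for i in range(0, len(words), chunk_words):
        yield " ".join(words[i:i + chunk_words])
```

The consumer renders each yielded chunk immediately, which is what keeps users from staring at a blank screen during a long generation.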