Architecture

A deep dive into how the AKIOS control plane orchestrates autonomous AI agents with deterministic governance, real-time observability, and intelligent cost management across the entire request lifecycle.

System Overview

AKIOS sits between your application and LLM providers as an engineering control plane for autonomous AI agents. Every prompt and completion passes through a deterministic pipeline of policy evaluation, guardrails, and telemetry — adding less than 2ms of overhead.

[Diagram: client apps, the edge runtime, and the SDK in your infrastructure connect over TLS 1.3 to the AKIOS control plane (Input Rails, Router, Output Rails, Memory, Audit, Secrets, Policy, Sandbox, Proxy), which forwards requests to the model providers (OpenAI, Anthropic, Mistral).]

Open-Core Model

AKIOS is open-core under GPL-3.0. The core control plane — policy engine, sandboxing, routing — is fully open-source. Pro enterprise plugins (SSO, advanced analytics, SLA) are available under a commercial license.

The AKIOS Trinity

The AKIOS platform is composed of three products that work in concert. Each product addresses a distinct concern of running autonomous AI agents in production: governance, observability, and cost.

AKIOS AI

GA

Governance & Security

The core control plane. Policy-as-code engine that evaluates every request against your security policies in under 2ms. Includes zero-trust access control, tool sandboxing, guardrails (input/output), PII redaction, prompt injection detection, and immutable audit trails.

Policy Engine, Sandboxing, Guardrails, Audit

AKIOS RADAR

Beta

Observability & Debugging

Real-time observability for AI agents. Semantic tracing captures the full reasoning chain — not just HTTP spans, but the cognitive steps: observation, reasoning, action. Session replay lets you re-watch agent executions step by step for debugging and compliance review.

Semantic Traces, Session Replay, Anomaly Detection, Alerting

AKIOS FLUX

Beta

Compute & Cost Management

Intelligent cost attribution and budget enforcement. Every token is tracked and attributed to a specific agent, user, team, or project. Circuit breakers automatically halt runaway agents before they burn through your budget. Smart routing optimizes model selection by cost, latency, and quality.

Cost Attribution, Circuit Breakers, Smart Routing, Budgets
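To make the circuit-breaker idea concrete, here is a minimal sketch of a per-session budget breaker. It is an illustration under stated assumptions, not the FLUX implementation: the class, its method names, and the three states are hypothetical. Only the `maxCostPerSession` and `alertThreshold` field names are borrowed from the policy example later on this page.

```typescript
// Hypothetical sketch: class, methods, and states are illustrative, not AKIOS APIs.
interface BudgetConfig {
  maxCostPerSession: number; // USD
  alertThreshold: number;    // fraction of the budget, e.g. 0.8
}

type BreakerState = "closed" | "alerting" | "open";

class BudgetCircuitBreaker {
  private spentUsd = 0;
  private readonly config: BudgetConfig;

  constructor(config: BudgetConfig) {
    this.config = config;
  }

  // Adds the cost of a completed request and returns the resulting state.
  record(costUsd: number): BreakerState {
    this.spentUsd += costUsd;
    return this.state();
  }

  state(): BreakerState {
    const { maxCostPerSession, alertThreshold } = this.config;
    if (this.spentUsd >= maxCostPerSession) return "open";
    if (this.spentUsd >= maxCostPerSession * alertThreshold) return "alerting";
    return "closed";
  }

  // An open breaker halts the agent: no further requests are allowed.
  allowRequest(): boolean {
    return this.state() !== "open";
  }
}
```

The breaker only ever moves toward "open" within a session, which is what stops a runaway agent from continuing to spend after the limit is reached.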

How They Connect

All three products share the same control plane and data pipeline. AKIOS AI enforces the rules, RADAR watches what happens, and FLUX tracks how much it costs. Enable any combination — they compose seamlessly through the shared event bus.

Request Lifecycle

Every request through the AKIOS control plane follows a deterministic six-stage pipeline. Each stage is independently configurable, and the entire lifecycle is captured in an immutable audit record.

1. Auth & Rate Limiting

The request hits the AKIOS gateway. The API key is validated against the tenant registry, and rate limits are checked (per-user, per-agent, per-organization). Requests that exceed their budget allocation are rejected before entering the pipeline.
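The per-user, per-agent, and per-organization checks can be pictured as one counter per scope. The sketch below uses a simple fixed-window counter. That algorithm is an assumption made for illustration (the text does not say which one the gateway uses), and every name in the block is hypothetical.

```typescript
// Hypothetical sketch: none of these names come from the AKIOS SDK.
type Scope = "user" | "agent" | "organization";
const SCOPES: Scope[] = ["user", "agent", "organization"];

interface Limit {
  maxRequests: number;
  windowMs: number;
}

interface WindowCounter {
  windowStart: number;
  count: number;
}

class FixedWindowLimiter {
  private readonly limits: Record<Scope, Limit>;
  private readonly counters = new Map<string, WindowCounter>();

  constructor(limits: Record<Scope, Limit>) {
    this.limits = limits;
  }

  // Returns the first scope whose limit is exhausted, or null if the
  // request may enter the pipeline (in which case it is counted).
  check(ids: Record<Scope, string>, nowMs: number): Scope | null {
    // Pass 1: look for an exhausted window without mutating any counter,
    // so a rejected request does not consume quota in the other scopes.
    for (const scope of SCOPES) {
      const entry = this.counters.get(`${scope}:${ids[scope]}`);
      const limit = this.limits[scope];
      if (entry && nowMs - entry.windowStart < limit.windowMs && entry.count >= limit.maxRequests) {
        return scope;
      }
    }
    // Pass 2: count the request in every scope, opening a new window if needed.
    for (const scope of SCOPES) {
      const key = `${scope}:${ids[scope]}`;
      const entry = this.counters.get(key);
      if (entry && nowMs - entry.windowStart < this.limits[scope].windowMs) {
        entry.count += 1;
      } else {
        this.counters.set(key, { windowStart: nowMs, count: 1 });
      }
    }
    return null;
  }
}
```

Returning the offending scope rather than a bare boolean lets the gateway tell the caller which limit was hit.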

2. Policy Evaluation

The policy engine evaluates the request against your organization's security policies. Policies are defined as code (TypeScript DSL), version-controlled, and deterministically evaluated in under 2ms. Failed policies return structured error responses with the specific rule that was violated.

3. Input Guardrails

Input rails scan the prompt for PII, prompt injection attempts, forbidden topics, and content policy violations. Detected PII is redacted or masked before the prompt reaches the model provider. Injection attempts are blocked and logged.
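As a rough illustration of the redaction step, the sketch below replaces detected PII with numbered placeholders and remembers the originals so they can be restored later. The regular expressions are deliberately simplistic and the function and type names are hypothetical. Real detectors are considerably more thorough.

```typescript
// Hypothetical sketch: patterns and names are illustrative only.
interface RedactionResult {
  text: string;
  placeholders: Map<string, string>; // placeholder -> original value
}

const PII_PATTERNS: Array<[string, RegExp]> = [
  ["EMAIL", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["SSN", /\b\d{3}-\d{2}-\d{4}\b/g],
];

function redactPii(prompt: string): RedactionResult {
  const placeholders = new Map<string, string>();
  let text = prompt;
  for (const [label, pattern] of PII_PATTERNS) {
    let counter = 0;
    // Replace each match with a numbered placeholder such as [EMAIL_1].
    text = text.replace(pattern, (match) => {
      counter += 1;
      const placeholder = `[${label}_${counter}]`;
      placeholders.set(placeholder, match);
      return placeholder;
    });
  }
  return { text, placeholders };
}
```

Only the redacted `text` would be forwarded to the model provider. The `placeholders` map stays inside the control plane.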

4. LLM Execution

The router selects the optimal model provider based on your configuration (or FLUX smart routing, if enabled). The sanitized prompt is forwarded to the provider. Streaming responses are proxied back through the control plane. Tool calls are intercepted and executed in sandboxed environments.
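One way to picture cost-aware routing is "the cheapest model that still meets the latency and quality bar". The sketch below is an assumption about how such a selection could work, not a description of the FLUX router. The types, the scoring fields, and the example model names are all hypothetical.

```typescript
// Hypothetical sketch: field names and selection rule are illustrative only.
interface ModelOption {
  name: string;
  costPer1kTokensUsd: number;
  p50LatencyMs: number;
  qualityScore: number; // 0..1, higher is better
}

interface RoutingConstraints {
  maxLatencyMs: number;
  minQuality: number;
}

// Returns the cheapest option that satisfies both constraints, or null if none does.
function selectModel(options: ModelOption[], constraints: RoutingConstraints): ModelOption | null {
  const eligible = options.filter(
    (option) =>
      option.p50LatencyMs <= constraints.maxLatencyMs &&
      option.qualityScore >= constraints.minQuality,
  );
  if (eligible.length === 0) return null;
  return eligible.reduce((best, option) =>
    option.costPer1kTokensUsd < best.costPer1kTokensUsd ? option : best,
  );
}
```

Returning `null` instead of silently falling back keeps the decision explicit: the caller decides whether to relax the constraints or reject the request.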

5. Output Guardrails

Output rails scan the completion for data leaks (internal URLs, IP addresses, secrets), hallucination indicators, and content policy violations. Previously redacted PII placeholders are restored for the client if needed.
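The output stage can be sketched as two small functions: one that scans a completion for leak patterns, and one that swaps PII placeholders back to their original values. Both are hypothetical illustrations. The patterns cover only internal URLs and IPv4 addresses, and the `[EMAIL_1]`-style placeholder format is an assumption.

```typescript
// Hypothetical sketch: patterns, names, and placeholder format are assumptions.
interface Finding {
  kind: "internal_url" | "ip_address";
  value: string;
}

const LEAK_PATTERNS: Array<[Finding["kind"], RegExp]> = [
  ["internal_url", /https?:\/\/[^\s/]+\.internal\b[^\s]*/g],
  ["ip_address", /\b(?:\d{1,3}\.){3}\d{1,3}\b/g],
];

// Collects every leak-pattern match found in the completion.
function scanCompletion(completion: string): Finding[] {
  const findings: Finding[] = [];
  for (const [kind, pattern] of LEAK_PATTERNS) {
    for (const value of completion.match(pattern) ?? []) {
      findings.push({ kind, value });
    }
  }
  return findings;
}

// Replaces each placeholder with the original value recorded at redaction time.
function restorePlaceholders(completion: string, placeholders: Map<string, string>): string {
  let text = completion;
  placeholders.forEach((original, placeholder) => {
    text = text.split(placeholder).join(original);
  });
  return text;
}
```

In this sketch the scan runs on the raw completion, so restored PII is never mistaken for a leak. Whether AKIOS orders the two steps this way is not stated in the text.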

6. Audit & Telemetry

An immutable audit record is written with cryptographic signatures: input hash, output hash, policy decisions, latency, token counts, cost, and model provider. RADAR captures the semantic trace. FLUX updates the cost ledger. The response is returned to the client.
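A tamper-evident audit record can be sketched with Node's built-in `node:crypto` module. The example below hashes the input and output with SHA-256 and authenticates the record with HMAC-SHA256. HMAC is used here as a stand-in: the text does not say which signature scheme AKIOS uses, and the record shape and function names are hypothetical.

```typescript
import { createHash, createHmac } from "node:crypto";

// Hypothetical sketch: record shape and helpers are illustrative only.
interface AuditFields {
  inputHash: string;
  outputHash: string;
  policyDecisions: string[];
  latencyMs: number;
  tokenCount: number;
  costUsd: number;
  provider: string;
}

interface AuditRecord extends AuditFields {
  signature: string;
}

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Serializes fields in a fixed order so the same record always signs identically.
function canonicalize(f: AuditFields): string {
  return JSON.stringify([
    f.inputHash, f.outputHash, f.policyDecisions,
    f.latencyMs, f.tokenCount, f.costUsd, f.provider,
  ]);
}

function signRecord(fields: AuditFields, key: string): AuditRecord {
  const signature = createHmac("sha256", key).update(canonicalize(fields)).digest("hex");
  return { ...fields, signature };
}

// Recomputes the HMAC and compares it. Production code would use a
// constant-time comparison such as crypto.timingSafeEqual.
function verifyRecord(record: AuditRecord, key: string): boolean {
  const { signature, ...fields } = record;
  return signRecord(fields, key).signature === signature;
}
```

Storing hashes rather than raw prompts keeps the audit trail verifiable without copying sensitive content into it.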

Performance Guarantee

The AKIOS stages of the pipeline add less than 2ms of latency to your requests, on top of the model provider's own response time. Policy evaluation is compiled to optimized decision trees at deploy time, not interpreted at runtime.

Policy Evaluation Pipeline

Policies are the core primitive of the AKIOS governance model. They are defined as code, version-controlled alongside your application, and compiled to optimized decision trees that evaluate in microseconds.

Declarative

Policies describe what is allowed, not how to check it. The engine handles evaluation, caching, and optimization.

Composable

Combine policies with boolean logic. Inherit organization-wide defaults and override per-agent or per-environment.

Versioned

Every policy change is tracked. Roll back to any previous version instantly. Diff policies across deployments.
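The "composable" property can be illustrated with plain functions: rules as predicates combined with boolean logic, and settings resolved by layering overrides on top of organization defaults. This is a sketch of the idea only. None of these helpers are part of `@akios/policy`, and the real DSL may compose policies differently.

```typescript
// Hypothetical sketch: these helpers are not part of @akios/policy.
type Rule<T> = (request: T) => boolean;

function allOf<T>(...rules: Rule<T>[]): Rule<T> {
  return (request) => rules.every((rule) => rule(request));
}

function anyOf<T>(...rules: Rule<T>[]): Rule<T> {
  return (request) => rules.some((rule) => rule(request));
}

function not<T>(rule: Rule<T>): Rule<T> {
  return (request) => !rule(request);
}

// Later overrides win: organization defaults first, then per-environment
// or per-agent settings layered on top.
function resolveSettings<T extends object>(orgDefaults: T, ...overrides: Partial<T>[]): T {
  return overrides.reduce<T>((merged, override) => ({ ...merged, ...override }), orgDefaults);
}
```

For example, `allOf(isProduction, isShellTool)` would describe "shell tools in production", and `resolveSettings(orgDefaults, agentOverrides)` would yield the effective settings for one agent (both predicate names are placeholders).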

akios.policy.ts
import { Policy, Allow, Deny, RateLimit } from '@akios/policy'

export default Policy.define({
  name: 'production-guardrails',
  version: '2.1.0',

  // Global rules applied to all agents
  rules: [
    // Block dangerous tool calls
    Deny.toolCall({
      tools: ['shell_exec', 'file_delete', 'network_raw'],
      reason: 'Dangerous system-level tools are blocked in production',
    }),

    // Allow only approved model providers
    Allow.models({
      providers: ['openai', 'anthropic', 'mistral'],
      models: ['gpt-4-turbo', 'claude-3-opus', 'mistral-large'],
    }),

    // Enforce rate limits per user
    RateLimit.perUser({
      requests: 100,
      window: '1m',
      tokensPerMinute: 50_000,
    }),

    // Require human approval for high-risk actions
    Allow.toolCall({
      tools: ['database_write', 'send_email'],
      requireApproval: true,
      approvalTimeout: '5m',
    }),
  ],

  // PII detection configuration
  pii: {
    detect: ['email', 'phone', 'ssn', 'credit_card', 'ip_address'],
    action: 'redact', // 'redact' | 'block' | 'warn'
  },

  // Cost controls (requires AKIOS FLUX)
  budget: {
    maxCostPerRequest: 0.50,    // USD
    maxCostPerSession: 5.00,
    dailyBudget: 500.00,
    alertThreshold: 0.8,        // Alert at 80% of budget
  },
})

Policies are evaluated in a deterministic order: Deny rules are checked first, then Allow rules, then RateLimit. If no rule matches, the default-deny policy rejects the request. This ensures that new, uncovered scenarios are blocked by default.

[Diagram: evaluation order. Phase 1: Deny. Phase 2: Allow. Phase 3: Rate Limit. Default: Deny.]
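The evaluation order described above can be sketched as a single function. This is a simplified model under stated assumptions: rules are reduced to tool names and one request counter, and the types and `evaluate` function are hypothetical rather than the engine's actual interface.

```typescript
// Hypothetical sketch: a simplified model of the Deny -> Allow -> RateLimit order.
type Decision =
  | { allowed: true }
  | { allowed: false; reason: string };

interface RuleSet {
  deniedTools: Set<string>;
  allowedTools: Set<string>;
  maxRequestsPerWindow: number;
}

function evaluate(tool: string, requestsInWindow: number, rules: RuleSet): Decision {
  // Phase 1: Deny rules are checked first and always win.
  if (rules.deniedTools.has(tool)) {
    return { allowed: false, reason: `deny rule matched: ${tool}` };
  }
  // Phase 2: an Allow rule must match explicitly.
  if (rules.allowedTools.has(tool)) {
    // Phase 3: rate limits apply only to requests that are otherwise allowed.
    if (requestsInWindow >= rules.maxRequestsPerWindow) {
      return { allowed: false, reason: "rate limit exceeded" };
    }
    return { allowed: true };
  }
  // Default: nothing matched, so the request is rejected.
  return { allowed: false, reason: "default deny: no rule matched" };
}
```

Because Deny is checked before Allow, a tool that appears in both sets is still blocked, and a tool that appears in neither falls through to the default deny.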

Sandboxing Architecture

When an LLM decides to call a tool, the tool does not execute in the same process as your application. AKIOS intercepts the tool call and executes it in an isolated sandbox with strict resource controls. A compromised tool cannot access other tools, agent memory, secrets, or the host system.

Network Isolation

Allowlist-based network access. Tools can only reach explicitly permitted endpoints. DNS resolution is controlled. All traffic is logged.

Filesystem Controls

Read-only, writable, and denied filesystem paths. Tools get a temporary workspace that is destroyed after execution. No access to the host filesystem.

Resource Limits

Memory caps, CPU time limits, execution timeouts, and concurrent tool limits. Prevents runaway tools from consuming cluster resources.

sandbox.config.ts
import { Agent, Tool, Sandbox } from '@akios/core'

const agent = new Agent({
  name: 'data-analyst',
  model: 'gpt-4-turbo',
  tools: [sqlQuery, chartGenerator, emailSender], // Tool instances, assumed to be defined elsewhere

  sandbox: Sandbox.configure({
    // Network: only allow specific endpoints
    network: {
      allowlist: ['db.internal:5432', 'charts-api.internal:8080'],
      denylist: ['*.external.com', '0.0.0.0/0'],
      logAllTraffic: true,
    },

    // Filesystem: temporary workspace only
    filesystem: {
      readOnly: ['/data/public', '/config/shared'],
      writable: ['/tmp/agent-workspace'],
      denied: ['/etc', '/var', '/home', '/root'],
    },

    // Resource limits per tool execution
    resources: {
      maxMemoryMB: 256,
      timeoutMs: 30_000,
      maxCpuSeconds: 10,
      maxConcurrentTools: 3,
    },

    // Dangerous tools require human approval
    approvalRequired: ['emailSender'],
  }),
})

Default Sandbox Behavior

By default, all tools run in a sandbox with no network access, no filesystem access, a 128MB memory limit, and a 10-second timeout. You must explicitly grant permissions. This is the zero-trust principle applied to tool execution.
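The "deny by default, grant explicitly" behavior can be modeled as a defaults object that explicit grants are layered onto. The default values below come from the paragraph above. The interface, constant, and function names are hypothetical and are not the `Sandbox.configure` API.

```typescript
// Hypothetical sketch: names are illustrative, not the Sandbox.configure API.
interface SandboxPermissions {
  networkAllowlist: string[];
  readOnlyPaths: string[];
  writablePaths: string[];
  maxMemoryMB: number;
  timeoutMs: number;
}

// Defaults as stated in the text: no network, no filesystem, 128MB, 10 seconds.
const DEFAULT_SANDBOX: SandboxPermissions = {
  networkAllowlist: [],
  readOnlyPaths: [],
  writablePaths: [],
  maxMemoryMB: 128,
  timeoutMs: 10_000,
};

// Anything not explicitly granted keeps its restrictive default.
function resolveSandbox(grants: Partial<SandboxPermissions>): SandboxPermissions {
  return { ...DEFAULT_SANDBOX, ...grants };
}

// A host is reachable only if it appears verbatim in the allowlist.
function isHostAllowed(host: string, sandbox: SandboxPermissions): boolean {
  return sandbox.networkAllowlist.indexOf(host) !== -1;
}
```

With no grants at all, `isHostAllowed` returns `false` for every host, which is the zero-trust starting point the text describes.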

Deployment & Scalability

AKIOS is designed to be stateless at the compute layer. All persistent state is offloaded to external stores, enabling horizontal scaling and multi-region high availability.

Compute Layer

Stateless workers that run the policy engine, guardrails, and routing logic. Deploy on Vercel Edge, AWS Lambda, Cloudflare Workers, or Docker containers. No sticky sessions required.

State Stores

Redis for session state and conversation memory. PostgreSQL for audit logs, policy versions, and analytics. Both support replicated, multi-region configurations.

Multi-Region HA

Active-active deployment across regions. Requests are routed to the nearest healthy region via Anycast DNS. Automatic failover with zero downtime. Data residency controls per region.

Edge Runtime

Policy evaluation and input guardrails can run at the edge, adding sub-millisecond latency. The heavy lifting (LLM calls, tool execution) routes to the nearest regional compute cluster.

[Diagram: multi-region deployment. EU-West, US-East, and AP-Tokyo each run their own Compute, Redis, and Postgres.]
akios.deploy.yml
# AKIOS Multi-Region Deployment Configuration
apiVersion: akios/v1
kind: Deployment
metadata:
  name: production
spec:
  regions:
    - name: eu-west-1
      primary: true
      dataResidency: EU
      compute:
        replicas: 3
        runtime: edge     # Policy eval at edge
      stores:
        redis: redis://eu-redis.internal:6379
        postgres: postgres://eu-db.internal:5432/akios

    - name: us-east-1
      primary: false
      dataResidency: US
      compute:
        replicas: 3
        runtime: container
      stores:
        redis: redis://us-redis.internal:6379
        postgres: postgres://us-db.internal:5432/akios

  failover:
    strategy: active-active
    healthCheck:
      interval: 5s
      timeout: 2s
      unhealthyThreshold: 3
    dns:
      type: anycast
      ttl: 30
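
The `healthCheck` settings above suggest a probe loop that marks a region unhealthy after repeated failures. The sketch below assumes that `unhealthyThreshold` counts consecutive failed probes and that a single success resets the count. Both are assumptions, and the class and function names are hypothetical.

```typescript
// Hypothetical sketch: assumes unhealthyThreshold means consecutive failed probes.
class RegionHealth {
  private consecutiveFailures = 0;
  private readonly unhealthyThreshold: number;

  constructor(unhealthyThreshold: number) {
    this.unhealthyThreshold = unhealthyThreshold;
  }

  // Records one probe result; a success resets the failure streak.
  recordProbe(ok: boolean): void {
    this.consecutiveFailures = ok ? 0 : this.consecutiveFailures + 1;
  }

  isHealthy(): boolean {
    return this.consecutiveFailures < this.unhealthyThreshold;
  }
}

// Returns the names of the regions that should still receive traffic.
function routableRegions(regions: Map<string, RegionHealth>): string[] {
  const names: string[] = [];
  regions.forEach((health, name) => {
    if (health.isHealthy()) names.push(name);
  });
  return names;
}
```

With `interval: 5s` and `unhealthyThreshold: 3`, this model would take a failing region out of rotation roughly 15 seconds after its first failed probe.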

Start Simple

You don't need multi-region on day one. Start with a single region and a single replica. AKIOS scales linearly — add regions and replicas as your traffic grows. The stateless architecture means zero migration effort.