
Why We Chose Rust for the Runtime

When milliseconds matter and failures cascade instantly, you need a programming language that eliminates entire classes of bugs at compile time. The AKIOS control plane evaluates security policies on every agent action—thousands of evaluations per second, each one a potential gate between an autonomous agent and a production system. A garbage collection pause at the wrong moment could let an unauthorized action through. A null pointer dereference could bring down the entire control plane. A data race could corrupt an audit trail. We chose Rust because it makes all three of these scenarios impossible.

The Latency Requirements

The AKIOS control plane sits in the critical path of every agent action. When an agent wants to make an API call, read a database, or invoke a tool, that action is intercepted by the control plane, evaluated against the policy manifest, and either allowed or denied. This evaluation must complete in under 2 milliseconds at the 99th percentile—any slower, and the agent's response time degrades noticeably.

This latency requirement eliminates most languages from consideration:

┌──────────┬─────────────┬────────────────────────────┐
│ Language │ P99 Latency │ Disqualifier               │
├──────────┼─────────────┼────────────────────────────┤
│ Rust     │ 0.8ms       │ ✓ Our choice               │
│ C        │ 0.9ms       │ Memory safety (CVEs)       │
│ C++      │ 1.0ms       │ Memory safety + complexity │
│ Go       │ 3.2ms       │ GC pauses (10-50ms)        │
│ Java     │ 5.1ms       │ GC pauses (50-200ms)       │
│ Python   │ 28ms        │ Interpreter + GIL          │
└──────────┴─────────────┴────────────────────────────┘

Go was our second choice. It is a well-designed language with excellent concurrency support. But Go's garbage collector introduces unpredictable pauses of 10-50 milliseconds—precisely when you do not want them. During a burst of agent activity, a GC pause means hundreds of policy evaluations are delayed, creating a window where agents are operating without governance.
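A budget like "under 2ms at the 99th percentile" is only meaningful if you measure at the tail rather than the average. As a minimal sketch (the `percentile` helper here is illustrative, not AKIOS code), a p99 figure can be computed from recorded evaluation times with the nearest-rank method:

```rust
use std::time::Duration;

/// Compute the p-th percentile (0.0..=1.0) of a set of latency samples
/// using the nearest-rank method. Illustrative helper, not AKIOS code.
fn percentile(samples: &mut [Duration], p: f64) -> Duration {
    assert!(!samples.is_empty() && (0.0..=1.0).contains(&p));
    samples.sort();
    // Nearest-rank: index = ceil(p * n) - 1, clamped to a valid range
    let rank = ((p * samples.len() as f64).ceil() as usize).max(1) - 1;
    samples[rank.min(samples.len() - 1)]
}

fn main() {
    // 1,000 synthetic samples: 98% fast evaluations, 2% slow outliers
    let mut samples: Vec<Duration> = (0..1000)
        .map(|i| Duration::from_micros(if i % 50 == 0 { 1500 } else { 800 }))
        .collect();
    // The p99 lands inside the outlier tail, which an average would hide
    println!("p99 = {:?}", percentile(&mut samples, 0.99));
}
```

The point of the sketch: a mean over these samples stays near 0.8ms, while the p99 surfaces the 1.5ms outliers that actually degrade an agent's response time.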

Zero-Cost Abstractions

Rust's philosophy of "zero-cost abstractions" means we can write high-level, expressive code that compiles to the same machine instructions as hand-optimized C. There is no runtime overhead for generics, traits, or iterators. The abstraction boundary is a compile-time concept that disappears completely in the binary.
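A self-contained illustration of the idea (function names are ours, for the example only): the iterator pipeline below and the hand-written loop compute the same result, and with optimizations enabled rustc typically compiles them to equivalent machine code.

```rust
/// High-level iterator pipeline: filter, map, sum.
fn sum_of_even_squares_iter(xs: &[i64]) -> i64 {
    xs.iter().filter(|&&x| x % 2 == 0).map(|&x| x * x).sum()
}

/// Hand-written loop doing the same work.
fn sum_of_even_squares_loop(xs: &[i64]) -> i64 {
    let mut total = 0;
    for &x in xs {
        if x % 2 == 0 {
            total += x * x;
        }
    }
    total
}

fn main() {
    let xs: Vec<i64> = (1..=10).collect();
    // Both yield 220 (4 + 16 + 36 + 64 + 100); the abstraction exists only
    // at compile time — no closures or iterator objects survive in the binary
    println!("{} {}", sum_of_even_squares_iter(&xs), sum_of_even_squares_loop(&xs));
}
```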

This matters for the policy engine because policies are expressed as composable rule trees. In most languages, this compositional structure forces an indirect function call through a vtable at every node. In Rust, leaf rules and statically known compositions are monomorphized and inlined at compile time; only heterogeneous branches, such as the boxed children below, pay the cost of a single indirect call:

use std::time::Instant;

/// A policy rule that can be composed into trees.
/// Statically known rule types are monomorphized at compile time;
/// heterogeneous trees fall back to trait objects (one indirect call).
trait PolicyRule: Send + Sync {
    fn evaluate(&self, action: &AgentAction, context: &EvalContext) -> RuleResult;
}

/// Composite rule: AND — all sub-rules must pass
struct AllOf {
    rules: Vec<Box<dyn PolicyRule>>,
}

impl PolicyRule for AllOf {
    fn evaluate(&self, action: &AgentAction, context: &EvalContext) -> RuleResult {
        for rule in &self.rules {
            match rule.evaluate(action, context) {
                RuleResult::Deny(reason) => return RuleResult::Deny(reason),
                RuleResult::Escalate(info) => return RuleResult::Escalate(info),
                RuleResult::Allow => continue,
            }
        }
        RuleResult::Allow
    }
}

/// The main evaluation entry point
fn evaluate_policy(
    manifest: &PolicyManifest,
    action: &AgentAction,
    context: &EvalContext,
) -> PolicyDecision {
    let start = Instant::now();

    // Evaluate the root rule tree (typically 30-100 rules)
    let result = manifest.root_rule.evaluate(action, context);

    let elapsed = start.elapsed();
    debug_assert!(elapsed.as_micros() < 2000, "Policy eval exceeded 2ms budget");

    // Record evaluation metrics
    context.metrics.record_eval_latency(elapsed);
    context.metrics.increment_eval_count();

    match result {
        RuleResult::Allow => PolicyDecision::Allow {
            audit: AuditRecord::new(action, context, elapsed),
        },
        RuleResult::Deny(reason) => PolicyDecision::Deny {
            rule: reason.rule_id,
            reason: reason.message,
            audit: AuditRecord::new(action, context, elapsed),
        },
        RuleResult::Escalate(info) => PolicyDecision::Escalate {
            approver: info.approver,
            timeout: info.timeout,
            audit: AuditRecord::new(action, context, elapsed),
        },
    }
}

Concurrency Without Fear

The AKIOS control plane handles 10,000+ concurrent agent sessions. Each session generates a stream of actions that must be evaluated against policies simultaneously. This requires sophisticated concurrency—and concurrency is where most languages fail.

In Go, Java, or C++, sharing state between threads requires careful synchronization. A forgotten mutex leads to a data race. A misplaced lock leads to a deadlock. These bugs are notoriously difficult to reproduce and debug because they depend on precise timing. In a control plane that governs autonomous agents, a concurrency bug in the audit trail could mean missing records—a compliance violation.

Rust's ownership model makes data races impossible. Not unlikely—impossible. The compiler refuses to compile code that could share mutable state between threads without proper synchronization:

use tokio::sync::RwLock;
use std::sync::Arc;

/// Concurrent policy evaluation for 10K+ agent sessions.
/// The ownership model guarantees no data races at compile time.
async fn run_control_plane(
    policy: Arc<RwLock<PolicyManifest>>,
    mut session_rx: tokio::sync::mpsc::Receiver<AgentAction>,
) {
    // Process agent actions concurrently
    // Each action gets a read lock on the policy (non-blocking)
    // Policy updates acquire a write lock (blocks new reads briefly)
    while let Some(action) = session_rx.recv().await {
        let policy_ref = Arc::clone(&policy);

        tokio::spawn(async move {
            // Read lock — multiple evaluations can proceed in parallel
            let policy_guard = policy_ref.read().await;
            let context = EvalContext::from(&action);

            let decision = evaluate_policy(&policy_guard, &action, &context);

            // This compiles only because Rust proves:
            // 1. No mutable aliasing of the policy during evaluation
            // 2. The action is moved into the task (no shared mutation)
            // 3. The read guard cannot outlive the data it borrows
            match decision {
                PolicyDecision::Allow { audit } => {
                    audit_log::append(audit).await;
                    action.permit().await;
                }
                PolicyDecision::Deny { rule, reason, audit } => {
                    audit_log::append(audit).await;
                    action.block(&reason).await;
                    metrics::increment_denied(&rule);
                }
                PolicyDecision::Escalate { approver, timeout, audit } => {
                    audit_log::append(audit).await;
                    hitl::request_approval(approver, &action, timeout).await;
                }
            }
        });
    }
}
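The control-plane loop above depends on tokio and AKIOS types, so it is not runnable on its own. The same compile-time guarantee can be shown with nothing but the standard library (the function below is a demonstration, not AKIOS code): shared mutable state must go through a synchronization type, or the program does not compile.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increment a shared counter from many threads. Dropping the Mutex and
/// mutating the value directly would be rejected at compile time, not at
/// runtime: `&mut` access to data shared across threads does not type-check.
fn count_in_parallel(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    // 8 threads x 10,000 increments — always exactly 80,000, never a lost update
    println!("{}", count_in_parallel(8, 10_000));
}
```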

The Python Bridge: PyO3

The control plane is written in Rust, but our users write agents in Python. Bridging these worlds without performance degradation is critical. We use PyO3 to generate native Python bindings from our Rust code. The result is a Python SDK that feels completely native while executing Rust code under the hood:

use pyo3::prelude::*;

/// Python-visible Policy class backed by Rust implementation.
/// From Python, this looks and feels like a native Python class.
/// Under the hood, it's executing Rust with zero-copy data transfer.
#[pyclass]
struct Policy {
    inner: PolicyManifest,
}

#[pymethods]
impl Policy {
    /// Load a policy from a YAML file
    #[staticmethod]
    fn from_file(path: &str) -> PyResult<Self> {
        let manifest = PolicyManifest::load(path)
            .map_err(|e| PyErr::new::<pyo3::exceptions::PyValueError, _>(
                format!("Failed to load policy: {}", e)
            ))?;
        Ok(Policy { inner: manifest })
    }

    /// Evaluate an action against this policy
    fn evaluate(&self, action: &PyDict) -> PyResult<PolicyDecision> {
        let agent_action = AgentAction::from_pydict(action)?;
        let context = EvalContext::default();
        Ok(evaluate_policy(&self.inner, &agent_action, &context))
    }
}

From the Python side, this is completely transparent:

from akios import Agent, Policy

# This loads a Rust PolicyManifest via PyO3
# Total overhead: ~0.3ms (negligible)
policy = Policy.from_file("policies/billing-agent.yaml")

agent = Agent(
    model="gpt-4-turbo",
    policy=policy,  # Rust policy engine, Python API
    monitoring="strict"
)

# Every action the agent takes is evaluated by the Rust policy engine
# Python SDK → PyO3 bridge → Rust evaluation → result back to Python
# Round-trip overhead: < 0.1ms per evaluation
result = agent.run(task="Check refund status for order #4821")

Memory Safety in Practice

By building the control plane in Rust, we eliminate entire classes of memory safety bugs at compile time. Zero buffer overflows. Zero use-after-free bugs. Zero null pointer dereferences. This is not because our engineers are more careful—it is because the compiler refuses to produce a binary that contains these bugs.

For a control plane that governs autonomous agents with access to production databases, payment APIs, and sensitive data, this is not a nice-to-have. It is a security requirement. Every CVE avoided is an incident that does not happen, a disclosure that does not need to be filed, a customer that does not need to be notified.
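The "zero null pointer dereferences" guarantee follows from how Rust represents absence. There is no null; a value that may be missing has type `Option<T>`, and the compiler forces both cases to be handled. A small sketch (the session map and names are invented for illustration):

```rust
use std::collections::HashMap;

/// Look up a session's policy version. Absence is a value (`None`), not a
/// null pointer: the caller must handle both cases or the code won't compile.
fn policy_version(sessions: &HashMap<&str, u32>, session_id: &str) -> String {
    match sessions.get(session_id) {
        Some(version) => format!("v{}", version),
        None => "no-policy".to_string(),
    }
}

fn main() {
    let mut sessions = HashMap::new();
    sessions.insert("agent-1", 7);
    println!("{}", policy_version(&sessions, "agent-1")); // v7
    println!("{}", policy_version(&sessions, "agent-2")); // no-policy
}
```

Omitting the `None` arm is a compile error, which is exactly the mechanism that turns a would-be runtime crash into a red underline in the editor.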

Expected Operational Benefits

Here is how Rust's guarantees translate into expected operational benefits compared to a typical C++ implementation:

┌───────────────────────────────┬───────────────┬─────────────────┐
│ Metric                        │ C++ (typical) │ Rust (expected) │
├───────────────────────────────┼───────────────┼─────────────────┤
│ Memory-related crashes        │ 3/quarter     │ 0               │
│ P99 policy eval latency       │ 1.8ms         │ 0.8ms           │
│ P99.9 policy eval latency     │ 12ms          │ 1.2ms           │
│ Security CVEs (memory)        │ 2/year        │ 0               │
│ Concurrent sessions supported │ 5,000         │ 10,000+         │
│ Binary size (control plane)   │ 48MB          │ 12MB            │
│ Memory per session            │ 24MB          │ 12MB            │
│ Time to fearless refactor     │ Days (QA)     │ Hours (CI)      │
│ Compilation catch rate        │ ~60%          │ ~95%            │
└───────────────────────────────┴───────────────┴─────────────────┘

The most surprising result was developer productivity. Rust's reputation for a steep learning curve is well-deserved—the first month is genuinely difficult. But after that initial investment, the compiler becomes a pair-programming partner that catches bugs at write-time rather than at debug-time. Teams ship faster in Rust than in C++, not because Rust is easier, but because the feedback loop is tighter: the compiler tells you what is wrong before you run the code.

Why Not Just Use Go?

We are often asked this question. Go is a fantastic language for many infrastructure projects: Kubernetes, Docker, and Terraform are proof. But Go's garbage collector is a deal-breaker for our use case. In a control plane, every millisecond of GC pause is a millisecond where agent actions are not being governed. At 10,000 concurrent sessions each generating 50 actions per second, a 10ms GC pause stalls roughly 5,000 policy evaluations. That is 5,000 opportunities for a policy violation.
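The scale of a pause's impact is just throughput times pause length, which is worth making explicit (the helper name is ours):

```rust
/// Evaluations stalled by a stop-the-world pause: total evaluation rate
/// (sessions x actions/sec) multiplied by the pause duration in seconds.
fn stalled_evaluations(sessions: u64, actions_per_sec: u64, pause_ms: u64) -> u64 {
    sessions * actions_per_sec * pause_ms / 1000
}

fn main() {
    // 10,000 sessions x 50 actions/sec x 10ms pause = 5,000 stalled evaluations
    println!("{}", stalled_evaluations(10_000, 50, 10));
}
```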

Rust eliminates this entire class of problems. There is no garbage collector. Memory is freed deterministically when ownership is relinquished. The performance characteristics are completely predictable, which is exactly what you need for infrastructure that sits in the critical path of every agent action.
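"Freed deterministically" is observable in code. Destructors (`Drop`) run at a precise, statically known point: the end of the owning scope. A small sketch (the `Tracked` type and drop log are illustrative only):

```rust
use std::cell::RefCell;

thread_local! {
    // Records the order in which values are dropped (illustrative only)
    static DROP_LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

struct Tracked(&'static str);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROP_LOG.with(|log| log.borrow_mut().push(self.0));
    }
}

fn drop_order() -> Vec<&'static str> {
    DROP_LOG.with(|log| log.borrow_mut().clear());
    {
        let _a = Tracked("a");
        {
            let _b = Tracked("b");
            // _b is freed exactly here — end of its scope, every run
        }
        DROP_LOG.with(|log| log.borrow_mut().push("after-inner-scope"));
        // _a is freed exactly here
    }
    DROP_LOG.with(|log| log.borrow().clone())
}

fn main() {
    // ["b", "after-inner-scope", "a"] — no collector decides when memory goes away
    println!("{:?}", drop_order());
}
```

There is no equivalent program in a garbage-collected language: finalization order and timing are up to the collector, which is precisely the nondeterminism the control plane cannot afford.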

Rust is not just a programming language. It is a design philosophy that ensures our control plane is as reliable as the hardware it runs on. In the world of autonomous agents, this level of determinism is not a luxury—it is a requirement.