UX

Visualizing high-dimensional agent trajectories

An autonomous agent does not move in a straight line. It explores a high-dimensional state space, backtracking, forking, and looping. A single ReAct-style agent might consider three candidate tools, partially execute two of them, backtrack after an observation invalidates an assumption, and then fork into parallel sub-tasks. Visualizing this execution path is one of the hardest problems in AI UX—and one of the most important for production operators.

Beyond the Chat Bubble

The standard "chat" interface flattens a complex tree into a linear list. This is a catastrophic loss of information. The chat transcript shows the final path the agent took, but hides everything that mattered: the branches it explored and abandoned, the tools it considered but rejected, the moments where its confidence dropped below threshold.

For a developer building a demo, the chat view is fine. For an operator responsible for an agent processing financial transactions, it is like monitoring an aircraft with only an altimeter—you can see where you are, but not where you are going or what went wrong.

At AKIOS, we believe that an agent's execution trace is not a conversation. It is a trajectory through a decision space, and it must be visualized as such.

The DAG Representation

The first layer of our visualization system in AKIOS Radar renders the agent's execution as a Directed Acyclic Graph (DAG). Each node represents a discrete state: a thought step, a tool invocation, an observation from the environment, or a decision point. Edges encode transitions, annotated with the probability or confidence score that drove the choice.

This representation immediately reveals patterns invisible in a linear transcript:

  • Fan-out nodes: Points where the agent considered multiple actions. Wide fan-out suggests uncertainty or poorly constrained tool selection.
  • Dead-end branches: Paths the agent explored and abandoned. These are often the most diagnostically valuable—they reveal what the agent tried before finding the correct approach.
  • Loops: Repetitive patterns that indicate the agent is stuck in a retry loop, burning tokens without progress. Because each execution step is a distinct node, the graph stays acyclic; loops appear as repeated, near-identical subpaths, and Radar automatically flags repetition that exceeds a configurable depth.
  • Convergence points: Moments where multiple reasoning paths led to the same conclusion, which increases confidence in the result.
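
These patterns can be surfaced mechanically rather than by eye. The sketch below (plain Python, using a hypothetical edge-list shape rather than Radar's actual trace schema) flags wide fan-out and repeated action signatures, a crude proxy for retry loops:

```python
from collections import Counter, defaultdict

def find_structural_warnings(edges, max_repeats=3, max_fanout=4):
    """Flag retry loops and wide fan-out in a trace graph.

    `edges` is a list of (source_id, target_id, action_signature) triples,
    where action_signature identifies what the step did, e.g. "tool:search".
    """
    fanout = defaultdict(set)
    signature_counts = Counter()
    for source, target, signature in edges:
        fanout[source].add(target)
        signature_counts[signature] += 1

    warnings = []
    # Wide fan-out: one node branching into many candidate actions
    for node, targets in fanout.items():
        if len(targets) > max_fanout:
            warnings.append(('fanout', node, len(targets)))
    # Repetition: the same action signature recurring too often in the trace
    for signature, count in signature_counts.items():
        if count > max_repeats:
            warnings.append(('retry_loop', signature, count))
    return warnings

# One node fanning out into five identical search calls trips both checks
edges = [('n0', f'n{i}', 'tool:search') for i in range(1, 6)]
print(find_structural_warnings(edges))
```

A production detector would also consider ordering (consecutive repeats, not just totals), but the thresholded-count idea is the core of it.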

Rendering the DAG

We use a force-directed layout engine adapted for temporal ordering. Nodes are positioned left-to-right by execution time, with vertical spread determined by branch divergence. The rendering pipeline is implemented in TypeScript with D3:

import * as d3 from 'd3';

interface TraceNode extends d3.SimulationNodeDatum {
  id: string;
  type: 'thought' | 'tool_call' | 'observation' | 'decision';
  timestamp: number;
  confidence: number;
  content: string;
}

interface TraceEdge extends d3.SimulationLinkDatum<TraceNode> {
  source: string | TraceNode;
  target: string | TraceNode;
  probability: number;
}

function renderTrajectoryDAG(
  nodes: TraceNode[],
  edges: TraceEdge[],
  container: SVGElement
): void {
  const simulation = d3.forceSimulation<TraceNode>(nodes)
    .force('link', d3.forceLink<TraceNode, TraceEdge>(edges).id(d => d.id).distance(80))
    .force('charge', d3.forceManyBody().strength(-200))
    .force('x', d3.forceX<TraceNode>(d => d.timestamp * 120)) // Temporal ordering
    .force('y', d3.forceY(300)); // Center vertically

  // Color-code nodes by type
  const colorScale: Record<TraceNode['type'], string> = {
    thought:     '#00D4FF', // Cyan — reasoning step
    tool_call:   '#FBBF24', // Amber — external action
    observation: '#4ADE80', // Green — environment response
    decision:    '#EF4444', // Red — branching point
  };

  // Render edges with opacity proportional to transition probability
  const links = d3.select(container)
    .selectAll<SVGLineElement, TraceEdge>('line')
    .data(edges)
    .enter()
    .append('line')
    .attr('stroke', '#1E1E2A')
    .attr('stroke-opacity', d => d.probability);

  // Render nodes
  const circles = d3.select(container)
    .selectAll<SVGCircleElement, TraceNode>('circle')
    .data(nodes)
    .enter()
    .append('circle')
    .attr('r', 6)
    .attr('fill', d => colorScale[d.type]);

  // Reposition elements on every simulation tick; without this handler the
  // shapes are appended once and never move as the layout converges.
  simulation.on('tick', () => {
    links
      .attr('x1', d => (d.source as TraceNode).x!)
      .attr('y1', d => (d.source as TraceNode).y!)
      .attr('x2', d => (d.target as TraceNode).x!)
      .attr('y2', d => (d.target as TraceNode).y!);
    circles
      .attr('cx', d => d.x!)
      .attr('cy', d => d.y!);
  });
}

Embedding Space Projection

The DAG shows the structural trajectory—what the agent did. But it doesn't show the semantic trajectory—what the agent was thinking about. For this, we need dimensionality reduction.

At each step in the agent's execution, we capture the embedding vector of its context window. These vectors live in a high-dimensional space (typically 1536 or 3072 dimensions, depending on the embedding model). We project them into 2D using UMAP, which preserves local neighborhood structure far better than t-SNE for this use case.

import numpy as np
from umap import UMAP
from akios.radar import TraceCollector

def project_agent_trajectory(session_id: str) -> np.ndarray:
    """Project an agent's context embeddings into 2D for visualization."""
    collector = TraceCollector(session_id=session_id)
    
    # Collect embedding at each reasoning step
    embeddings = []  # One record per reasoning step
    for step in collector.get_trace_steps():
        embeddings.append({
            'timestamp': step.timestamp,
            'step_type': step.type,  # thought, tool_call, observation
            'vector': step.context_embedding,  # shape: (embedding_dim,)
            'token_count': step.cumulative_tokens,
        })
    
    # Stack into matrix: (n_steps, embedding_dim)
    embedding_matrix = np.stack([e['vector'] for e in embeddings])
    
    # Project to 2D with UMAP
    reducer = UMAP(
        n_components=2,
        n_neighbors=min(15, len(embeddings) - 1),
        min_dist=0.1,
        metric='cosine',  # Cosine distance for text embeddings
    )
    projection = reducer.fit_transform(embedding_matrix)
    
    return projection  # shape: (n_steps, 2)
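
Once projected, the 2D coordinates support simple summary metrics. One illustrative example (not part of Radar's API): the cumulative path length of the projected trajectory, which separates an agent that stayed in one semantic neighborhood from one that wandered across the space:

```python
import numpy as np

def trajectory_path_length(projection: np.ndarray) -> float:
    """Total distance traveled through the projected 2D space.

    `projection` has shape (n_steps, 2), as returned by the UMAP projection.
    """
    # Displacement vectors between consecutive projected points
    deltas = np.diff(projection, axis=0)
    # Sum of the Euclidean lengths of each step
    return float(np.linalg.norm(deltas, axis=1).sum())

# A trajectory that moves one unit per step for four steps
projection = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
print(trajectory_path_length(projection))  # → 4.0
```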

Semantic Drift Detection

The projected embedding space reveals a critical failure mode: semantic drift. This occurs when an agent's reasoning gradually migrates away from its system prompt's intent. In the embedding projection, drift appears as a trajectory that starts in one cluster and slowly migrates toward a different region of the semantic space.

We compute drift as the cosine distance between the current step's embedding and the system prompt's embedding. When this distance exceeds a configurable threshold, Radar raises an alert:

def detect_semantic_drift(
    step_embedding: np.ndarray,
    system_prompt_embedding: np.ndarray,
    threshold: float = 0.35
) -> bool:
    """Returns True if the agent has drifted from its system prompt intent."""
    cosine_distance = 1.0 - np.dot(step_embedding, system_prompt_embedding) / (
        np.linalg.norm(step_embedding) * np.linalg.norm(system_prompt_embedding)
    )
    # Cast explicitly: a NumPy comparison yields np.bool_, not Python bool
    return bool(cosine_distance > threshold)

This is particularly valuable for long-running agents that accumulate context over extended sessions. A customer support agent that starts by discussing billing inquiries but gradually drifts into providing unauthorized technical support is a policy violation—but one that is invisible in a chat transcript. In the embedding projection, it is immediately obvious as a trajectory moving away from the "billing" cluster toward the "engineering" cluster.
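
Single-step checks can be noisy: one tangential tool observation should not trip an alert. A common refinement, sketched below with illustrative (not Radar-default) threshold and window values, is to average the distance over a sliding window of recent steps so that only sustained drift fires:

```python
import numpy as np

def detect_sustained_drift(
    step_embeddings: np.ndarray,          # shape: (n_steps, dim)
    system_prompt_embedding: np.ndarray,  # shape: (dim,)
    threshold: float = 0.35,
    window: int = 5,
) -> bool:
    """True if mean drift over the last `window` steps exceeds the threshold."""
    recent = step_embeddings[-window:]
    # Cosine distance of each recent step from the system prompt
    norms = np.linalg.norm(recent, axis=1) * np.linalg.norm(system_prompt_embedding)
    distances = 1.0 - recent @ system_prompt_embedding / norms
    return bool(distances.mean() > threshold)
```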

Real-Time vs. Post-Hoc Analysis

AKIOS Radar supports both modes. In real-time mode, the DAG and embedding projections update live as the agent executes. This is essential for monitoring high-stakes agent sessions where an operator needs to intervene if the trajectory deviates. The real-time projection uses incremental UMAP updates to avoid the cost of re-fitting the entire model at each step.
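
Radar's incremental update machinery is internal to the product, but the underlying idea can be sketched without it. One simple out-of-sample extension, assuming the fitted embeddings and their projections are kept around, places each new point at the inverse-distance-weighted average of its nearest already-projected neighbors (umap-learn's own `transform` method is the more principled equivalent):

```python
import numpy as np

def place_new_point(
    new_vector: np.ndarray,      # shape: (dim,) — embedding of the newest step
    fitted_vectors: np.ndarray,  # shape: (n, dim) — embeddings already projected
    projection: np.ndarray,      # shape: (n, 2) — their 2D coordinates
    k: int = 3,
) -> np.ndarray:
    """Approximate a 2D position for a new embedding without re-fitting."""
    # Distances from the new embedding to every fitted embedding
    distances = np.linalg.norm(fitted_vectors - new_vector, axis=1)
    nearest = np.argsort(distances)[:k]
    # Inverse-distance weights (epsilon avoids division by zero on exact matches)
    weights = 1.0 / (distances[nearest] + 1e-9)
    weights /= weights.sum()
    return weights @ projection[nearest]
```

This trades fidelity for latency, which is the right trade in a live view; the full model can still be re-fit periodically in the background.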

In post-hoc mode, the full trajectory is available for forensic analysis. Operators can replay the agent's execution, examining each decision point and understanding why the agent chose the path it did. This mode is critical for incident review, compliance audits, and improving agent policies.

Toward Trajectory-Native Interfaces

The chat interface was designed for conversations between humans. Agent execution is not a conversation—it is a computation. The tools we build for monitoring agents should reflect this fundamental difference. DAG rendering shows the structure of what happened. Embedding projection shows the semantics of what happened. Together, they give operators the visibility they need to trust, debug, and improve autonomous systems.

The agent's trajectory is the new log file. It is time we learned to read it.