EXECUTION ENGINE

AURA Runtime

The durable execution engine that powers every agent on Metaprise. Crash recovery, event replay, saga compensation, and checkpoint persistence — tasks are never lost, even across infrastructure failures.

ENGINE ARCHITECTURE
EXECUTION LAYER — Agent Executor · Memory Engine · Tool Gateway
DURABILITY LAYER — Audit Engine · Durable Execution · Saga Compensation · Checkpoint
MONITORING LAYER — Activity Heartbeat · Retry Policy · Continue-As-New
CRASH RECOVERY · EVENT REPLAY · ZERO TASK LOSS
10 Engine Components · 0 Tasks Lost · 99.99% Durability SLA · Auto Crash Recovery

Ten engines.
Zero task loss.

AURA Runtime is organized into three layers: execution, durability, and monitoring. Together they guarantee that every agent task completes — even across crashes, timeouts, and infrastructure failures.

EXECUTION LAYER — CORE TASK PROCESSING
ENGINE 01

Agent Executor

Task Execution Engine

The core execution engine that receives Missions, orchestrates tool calls, manages agent state, and drives the Mission through its lifecycle. Every execution is deterministic and reproducible.

ORCHESTRATION · DETERMINISTIC · REPRODUCIBLE
ENGINE 02

Memory Engine

State Memory Management

Manages the agent's working memory during execution — MissionStateData reads and writes, context retrieval, and inter-step state passing. Every write generates an AuditChain entry.

STATE MGMT · AUDIT-ON-WRITE · CONTEXT
ENGINE 03

Tool Gateway

DualToken-Authenticated Gateway

Every tool call passes through the Tool Gateway, which validates both AgentIdentity and ExecutionToken before allowing the call to proceed. No valid DualToken, no execution.

DUALTOKEN · GATEWAY · ZERO TRUST
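As a rough illustration of the gateway check, here is a minimal sketch in Python. The class and method names (`ToolGateway`, `DualToken`, `call`) are illustrative assumptions, not the actual AURA SDK; the point is only that both halves of the token must validate before any tool call proceeds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualToken:
    """Hypothetical pairing of an AgentIdentity and an ExecutionToken."""
    agent_identity: str
    execution_token: str


class ToolGateway:
    """Sketch: reject the call unless BOTH token halves are valid."""

    def __init__(self, known_agents, active_executions):
        self.known_agents = set(known_agents)
        self.active_executions = set(active_executions)

    def call(self, token, tool, payload):
        if token.agent_identity not in self.known_agents:
            raise PermissionError("unknown AgentIdentity")
        if token.execution_token not in self.active_executions:
            raise PermissionError("invalid ExecutionToken")
        # Only now does the tool call actually execute
        return {"tool": tool, "status": "executed", "payload": payload}
```

Either check failing blocks the call entirely, which is the "no valid DualToken, no execution" rule in code form.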
DURABILITY & FAULT TOLERANCE — CRASH RECOVERY
ENGINE 04

Audit Engine

Audit Write Engine

Dedicated engine for synchronous audit writes. Every action — tool calls, state changes, permission checks — is written to the AuditChain before execution continues.

SYNCHRONOUS · AUDITCHAIN
ENGINE 05

Durable Execution

Crash Recovery + Event Replay

If the runtime crashes mid-execution, the Durable Execution Engine replays the event history to recover exact state. No task is ever lost — guaranteed.

EVENT REPLAY · CRASH RECOVERY
ENGINE 06

Saga Compensation

Automatic Multi-Step Rollback

When a multi-step Mission fails partway through, the Saga engine executes compensating transactions to roll back completed steps — automatically and in reverse order.

COMPENSATION · AUTO-ROLLBACK
ENGINE 07

Checkpoint

Per-Step State Persistence

After each execution step, the agent's state is checkpointed. On recovery, execution resumes from the last checkpoint — not from scratch.

PER-STEP · INCREMENTAL
MONITORING & LONGEVITY — KEEP TASKS ALIVE
ENGINE 08

Activity Heartbeat

Progress Monitoring

Long-running tasks emit heartbeats to prove they're still alive. If heartbeats stop, the runtime intervenes — rescheduling on healthy infrastructure or triggering compensation.

LIVENESS · LONG-RUNNING · AUTO-RESCHEDULE
ENGINE 09

Retry Policy Engine

Configurable Retry Strategies

Exponential backoff, maximum attempts, non-retryable error classification — all configurable per Mission. Failed steps retry automatically according to policy.

EXPONENTIAL · MAX ATTEMPTS · NON-RETRYABLE
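A minimal sketch of such a policy, assuming nothing about the real AURA configuration surface (the `RetryPolicy` name and its parameters here are illustrative): exponential backoff between attempts, a hard attempt cap, and an error allowlist that is never retried.

```python
class RetryPolicy:
    """Illustrative per-Mission retry policy: exponential backoff,
    max attempts, and non-retryable error classification."""

    def __init__(self, initial_interval=1.0, backoff=2.0,
                 max_attempts=5, non_retryable=()):
        self.initial_interval = initial_interval
        self.backoff = backoff
        self.max_attempts = max_attempts
        self.non_retryable = tuple(non_retryable)

    def delay(self, attempt):
        # Delay before retrying attempt N grows exponentially (attempt is 1-based)
        return self.initial_interval * (self.backoff ** (attempt - 1))

    def run(self, step):
        for attempt in range(1, self.max_attempts + 1):
            try:
                return step()
            except self.non_retryable:
                raise                      # classified as permanent: fail fast
            except Exception:
                if attempt == self.max_attempts:
                    raise                  # budget exhausted
                # A real runtime would sleep(self.delay(attempt)) here
```

Transient failures burn through the backoff schedule; errors classified as non-retryable surface immediately without consuming attempts.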
ENGINE 10

Continue-As-New

State Renewal for Ultra-Long Executions

For executions that exceed single-history limits, Continue-As-New refreshes the execution state while preserving all progress — overcoming platform limits on execution duration.

STATE RENEWAL · UNLIMITED · SEAMLESS
DURABILITY

Durable Execution
& Crash Recovery

Every execution step is recorded as an event. If the runtime crashes — hardware failure, network partition, OOM kill — the Durable Execution Engine replays the event history to recover exact state. The agent resumes from where it stopped, not from the beginning.

EVENT SOURCING
Event History: Every execution step is persisted as an immutable event before execution proceeds
Deterministic Replay: Events are replayed in exact order to reconstruct pre-crash state
Idempotent Actions: Replayed tool calls detect prior completion and skip re-execution
Zero Data Loss: The combination of event sourcing and checkpoints guarantees no work is ever lost
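The event-sourcing mechanics above can be sketched in a few lines. This is a toy model, not the runtime's actual storage layer: events are appended before their effects apply, replay is deterministic and in order, and a replayed step detects prior completion and skips re-execution.

```python
class DurableExecution:
    """Toy event-sourced execution: the event log is the source of truth.
    Rebuilding from the log reconstructs the exact pre-crash state."""

    def __init__(self, event_log=None):
        self.event_log = event_log if event_log is not None else []
        self.state = {}
        self.completed = set()
        for event in self.event_log:        # deterministic replay, exact order
            self._apply(event)

    def _apply(self, event):
        step_id, key, value = event
        self.state[key] = value
        self.completed.add(step_id)

    def execute(self, step_id, key, value):
        if step_id in self.completed:       # idempotent: already done, skip
            return self.state[key]
        event = (step_id, key, value)
        self.event_log.append(event)        # persist event BEFORE proceeding
        self._apply(event)
        return value
```

If the process dies after step two, constructing a fresh `DurableExecution` from the surviving log yields state identical to the moment of the crash, and re-issuing step two is a no-op.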
RECOVERY GUARANTEES
Automatic Detection: Heartbeat failure triggers immediate recovery sequence
Infrastructure Migration: Recovery can resume on different infrastructure if the original host is down
State Consistency: Post-recovery state is bit-identical to pre-crash state
Transparent to Agents: The agent doesn't need to handle recovery — the runtime manages everything
FAULT TOLERANCE

Saga Compensation
& Checkpoints

Multi-step Missions often involve irreversible side effects — sending emails, transferring funds, updating records. When a later step fails, the Saga engine executes compensating transactions to undo completed steps. Checkpoints ensure recovery starts from the last successful step, not from scratch.

SAGA COMPENSATION
Compensating Transactions: Each forward action has a registered compensation action
Reverse-Order Execution: Compensations run in reverse order of the original steps
Partial Failure Handling: Only steps that completed are compensated — no unnecessary rollbacks
Compensation Audit: Every compensation action is recorded in the AuditChain
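The compensation flow above reduces to a small control structure. This sketch assumes a hypothetical `Saga` class (not the AURA API): each forward action registers its undo, and on failure only the completed steps are compensated, in reverse order.

```python
class Saga:
    """Toy saga: run forward actions; on failure, run registered
    compensations for completed steps in reverse order."""

    def __init__(self):
        self.completed = []        # (name, compensate) for finished steps
        self.compensated = []      # names, in the order they were rolled back

    def run(self, steps):
        # steps: list of (name, action, compensate) tuples
        try:
            for name, action, compensate in steps:
                action()
                self.completed.append((name, compensate))
        except Exception:
            for name, compensate in reversed(self.completed):
                compensate()       # a real runtime also audits this write
                self.compensated.append(name)
            raise
```

Note that the failing step itself is never compensated, only the steps that fully completed before it, which is the "no unnecessary rollbacks" property.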
CHECKPOINT PERSISTENCE
Per-Step Checkpoints: State is persisted after each execution step
Incremental Snapshots: Only state deltas are persisted — not full copies
Resume from Last: Recovery starts from the most recent checkpoint, not from zero
Storage Efficient: Automatic cleanup of old checkpoints after Mission completion
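Incremental checkpointing can be pictured as folding per-step deltas. The `CheckpointStore` below is an illustrative assumption, not the real persistence layer: each step persists only its delta, and recovery merges deltas to find the last good state and the step to resume from.

```python
class CheckpointStore:
    """Toy per-step incremental checkpoints: store deltas, not full
    state copies; recovery folds deltas up to the last checkpoint."""

    def __init__(self):
        self.deltas = []           # ordered (step, delta) pairs

    def checkpoint(self, step, delta):
        self.deltas.append((step, dict(delta)))

    def recover(self):
        """Return (last_step, merged_state); resume after last_step."""
        state, last_step = {}, None
        for step, delta in self.deltas:
            state.update(delta)    # later deltas overwrite earlier keys
            last_step = step
        return last_step, state
```

Because only deltas are stored, storage cost tracks how much state each step actually changed rather than total state size.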
MONITORING & LONGEVITY

Heartbeat &
Continue-As-New

Long-running agent tasks — multi-hour data analysis, overnight compliance scans, continuous monitoring — need special handling. Activity Heartbeat ensures liveness detection, while Continue-As-New allows executions to run indefinitely by refreshing state at platform limits.

ACTIVITY HEARTBEAT
Liveness Signal: Long-running activities emit periodic heartbeats to prove they're alive
Timeout Detection: If heartbeats stop, the runtime detects the task as stalled
Auto-Reschedule: Stalled tasks are rescheduled on healthy infrastructure
Progress Reporting: Heartbeats can carry progress data for real-time monitoring
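The heartbeat contract above is simple enough to sketch directly. `HeartbeatMonitor` is a hypothetical name for illustration: tasks report a timestamp (optionally with progress), and anything whose last beat is older than the timeout is flagged as stalled and eligible for rescheduling.

```python
class HeartbeatMonitor:
    """Toy liveness monitor: a task whose last heartbeat is older
    than the timeout is considered stalled."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_beat = {}        # task_id -> time of last heartbeat
        self.progress = {}         # task_id -> latest reported progress

    def beat(self, task_id, now, progress=None):
        self.last_beat[task_id] = now
        if progress is not None:
            self.progress[task_id] = progress

    def stalled(self, now):
        return [t for t, beat in self.last_beat.items()
                if now - beat > self.timeout]
```

A real runtime would react to the `stalled` list by rescheduling those tasks on healthy infrastructure or triggering compensation, as described above.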
CONTINUE-AS-NEW
State Renewal: When execution history grows too large, state is compacted and execution continues
Seamless Transition: The agent doesn't perceive the renewal — it's transparent
Unlimited Duration: No practical limit on how long a Mission can run
History Compaction: Old events are archived; active state is carried forward
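The renewal cycle above can be sketched as an execution that watches its own history size. This is a toy model with invented names (`Execution`, `HISTORY_LIMIT`): when the history hits the limit, a fresh generation starts with the compacted state carried forward, so accumulated progress is never lost.

```python
class Execution:
    """Toy Continue-As-New: when history reaches the limit, start a
    fresh generation carrying compacted state forward."""

    HISTORY_LIMIT = 4              # illustrative; real limits are far larger

    def __init__(self, state=None, generation=1):
        self.state = dict(state or {})
        self.history = []
        self.generation = generation

    def step(self, name):
        self.history.append(name)
        self.state["steps_done"] = self.state.get("steps_done", 0) + 1
        if len(self.history) >= self.HISTORY_LIMIT:
            return self.continue_as_new()
        return self

    def continue_as_new(self):
        # Old events are archived (dropped here); active state carries over
        return Execution(state=self.state, generation=self.generation + 1)
```

The caller keeps driving `step(...)` and never observes the generation change, which is the "seamless transition" property: progress accumulates across renewals while the live history stays bounded.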

From Mission to outcome — durable at every step.

Every step is checkpointed, every action is audited, every failure triggers recovery. The runtime ensures completion — not just execution.

01 Mission In — Approved Mission received
02 Executor — Agent Executor begins task
03 Tool Gateway — DualToken validated
04 Memory Write — State persisted + audited
05 Checkpoint — State snapshot saved
06 Heartbeat — Liveness confirmed
07 Outcome — Result sealed immutably
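The seven stages above can be strung together in a single pass. This sketch is illustrative only (the function name, tool-lookup shape, and audit/checkpoint containers are assumptions, not the AURA API), but it shows where each stage sits relative to the others.

```python
def run_mission(mission, tools, audit_log, checkpoints):
    """Toy end-to-end pipeline: gate every tool call, audit every
    write, checkpoint after every step, then seal the outcome."""
    state = {"mission": mission["id"]}                  # 01 Mission In
    for step_no, (tool, payload) in enumerate(mission["steps"], start=1):
        if tool not in tools:                           # 03 Tool Gateway
            raise PermissionError(f"tool not permitted: {tool}")
        result = tools[tool](payload)                   # 02 Executor drives the step
        state[tool] = result                            # 04 Memory Write
        audit_log.append((step_no, tool, result))       #    ...audited on write
        checkpoints.append((step_no, dict(state)))      # 05 Checkpoint
        # 06 Heartbeat would be emitted here for long-running steps
    state["sealed"] = True                              # 07 Outcome sealed
    return state
```

Each iteration leaves a checkpoint and an audit entry behind it, so a crash at any point resumes from the last completed step rather than from the start.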

Zero task loss guarantee: The combination of event sourcing, checkpoint persistence, saga compensation, and automatic recovery means no agent task is ever lost — even across hardware failures, network partitions, and infrastructure migrations.

Agent execution that survives anything.

Durable execution, crash recovery, saga compensation — tasks complete no matter what.