How Workforce Works: A Technical Overview
Deep dive into Workforce's agent engine architecture, five-layer memory model, session continuity, and self-hosted deployment for autonomous AI engineering agents.

Every AI agent platform gives you the pitch: agents that write code, handle tickets, ship features. The question a CTO should ask next is: how does it actually work?
This is the technical reference for Workforce — the architecture, the data model, the operational mechanics. If you've seen a demo and want to understand the system before bringing it to your team, start here.
The Agent Engine
Workforce's runtime is a single Rust binary built around a central Agent Engine with four subsystems:
Intelligence — The Knowledge Graph indexes your codebase into entities (functions, structs, traits, modules) and the relationships between them (calls, imports, implements). A production codebase might index 7,000+ entities and 25,000+ relationships across hundreds of files. The Sentinel Scanner runs alongside it, detecting dangerous code patterns in real time.
Automation — Workflows, Heartbeats, the Scheduler, and Sub-Agents. Workflows are defined in TOML with steps, branching logic, and pause gates for human approval. Heartbeats are cron-scheduled monitoring cycles. Sub-agents spawn for parallel execution.
Integrations — GitHub, Linear, and Slack are first-class integrations, not bolted-on API wrappers. Each has deep, bidirectional capabilities covered in detail below.
Core — The LLM Pool and Router, the Memory system with its Vault, and Context Compaction. This is the infrastructure layer that keeps agents running efficiently.
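To make the Automation layer concrete, a workflow definition might look something like this — a hypothetical sketch, since the actual schema isn't reproduced here and all key names are illustrative:

```toml
# Hypothetical workflow definition — step and key names are illustrative,
# not Workforce's actual schema.
[workflow]
name = "ship-fix"

[[step]]
id  = "implement"
run = "write failing test, then fix"

[[step]]
id      = "review-gate"
pause   = true          # pause gate: human approval before proceeding
on_fail = "implement"   # branching logic: loop back if review requests changes

[[step]]
id  = "merge"
run = "merge-pr"
```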
Defining an Agent
Agents are configured in workforce.toml. A typical agent definition includes an identity, a role template, memory workspace, and explicit capability grants:
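The configuration itself isn't reproduced here, but a minimal sketch might look like this — field names are illustrative, not the actual schema:

```toml
# Hypothetical workforce.toml agent definition — field and table names
# are illustrative, not the product's actual schema.
[[agents]]
name     = "rex"
role     = "backend-engineer"        # role template
identity = "agents/rex/identity"     # directory holding SOUL.md, IDENTITY.md, USER.md
memory   = "agents/rex/memory"       # memory workspace

[agents.capabilities]
github = ["clone", "branch", "open_pr"]  # explicit grants; anything absent is denied
linear = ["read", "update_status"]
slack  = ["mention_only"]
```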
Inside the identity directory, three files define who the agent is:
SOUL.md — The agent's core disposition, values, and working style. Think of it as a constitution that doesn't change between tasks.
IDENTITY.md — Role-specific context: what this agent is responsible for, how it collaborates with other agents, what its specialisations are.
USER.md — Preferences and conventions from the humans on the team. Coding standards, review expectations, communication norms.
These files aren't decorative. They're loaded during every bootstrap sequence and their integrity is hash-verified at startup. If any file has been modified outside normal channels, the system flags it immediately.
The Five-Layer Memory Model
Most AI tools are stateless. Every session starts blank. Workforce agents carry persistent memory across sessions, structured in five layers:
Working Memory — The active context window. What the agent is doing right now, the current task, relevant code, open threads.
Episodic Memory — Records of past sessions. What happened, what was decided, what was tried and failed. Agents can recall prior work on a ticket or a codebase without re-discovering everything from scratch.
Semantic Memory — Distilled knowledge about the codebase, team conventions, architectural patterns. Not raw event logs — curated understanding.
Team Memory — Shared context between agents. When one agent reviews a PR and leaves comments, other agents on the team can access that context. This is how multi-agent coordination avoids redundant work.
Organisation Memory — Global knowledge: company-wide conventions, cross-team standards, architectural decisions that affect everyone.
Memory is backed by both the filesystem (markdown files in each agent's workspace) and a vector store for semantic retrieval. Agents don't just accumulate memories passively. During heartbeat cycles, they actively curate MEMORY.md — reviewing daily notes and distilling what's worth retaining long-term.
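Conceptually, recall can be pictured as a local-first fall-through across the five layers — a simplified sketch, assuming layers are queried from most task-specific to most global (that ordering is our assumption, not documented behaviour):

```rust
// Illustrative layered recall — not Workforce's actual retrieval code.
// Assumption: layers are consulted from most task-local to most global.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Layer {
    Working,      // active context window
    Episodic,     // past-session records
    Semantic,     // distilled codebase knowledge
    Team,         // shared agent context
    Organisation, // global conventions
}

const LOOKUP_ORDER: [Layer; 5] = [
    Layer::Working,
    Layer::Episodic,
    Layer::Semantic,
    Layer::Team,
    Layer::Organisation,
];

/// Return the first layer that has an answer for the query.
fn recall(hit: impl Fn(Layer) -> bool) -> Option<Layer> {
    LOOKUP_ORDER.into_iter().find(|&l| hit(l))
}
```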
Session Bootstrap and Continuity
When an agent wakes up — whether it's the start of a workday or a recovery from a context reset — it follows a deterministic bootstrap sequence:
Load and verify identity files (SOUL.md, IDENTITY.md, USER.md)
Read daily notes and active task state
Load curated memory from MEMORY.md
Check for pending work in Linear, GitHub, and Slack
Resume or pick up the next task
This isn't a loose heuristic. It's a fixed sequence that ensures agents start every session with full orientation, regardless of what happened to the previous session.
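The gating role of the first step can be sketched as follows — names and types are ours, not the actual API; the point is that a failed identity check halts everything downstream:

```rust
// Hypothetical sketch of the fixed bootstrap sequence: identity verification
// gates every later step. Names are illustrative, not the actual API.
fn run_bootstrap(identity_ok: bool) -> Result<Vec<&'static str>, &'static str> {
    // Step 1 gates everything else: a tampered identity file halts the session.
    if !identity_ok {
        return Err("identity hash mismatch: aborting bootstrap");
    }
    Ok(vec![
        "read daily notes + active task state",
        "load curated MEMORY.md",
        "check Linear / GitHub / Slack for pending work",
        "resume or pick up next task",
    ])
}
```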
If context resets mid-task — because the context window filled up, a process restarted, or a provider switch occurred — the continuity protocol kicks in. Key facts from the active session are extracted to vector memory before compaction. When the agent resumes, it rebuilds working context from these saved checkpoints. The result: agents don't lose track of what they were doing.
Context compaction itself runs three strategies:
Adaptive at 75% context usage — older, less relevant context is compressed
Emergency at 95% — aggressive compaction to keep the agent operational
Proactive tool eviction — tool results that have already been processed are cleared first
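A simplified sketch of how these thresholds might compose — the ordering of proactive eviction relative to adaptive compaction is our assumption:

```rust
// Illustrative threshold logic for the three strategies. The 75% and 95%
// thresholds come from the text; how they compose is an assumption.
#[derive(Debug, PartialEq)]
enum Compaction {
    None,
    EvictProcessedToolResults, // proactive: cheapest savings first
    Adaptive,                  // >= 75%: compress older, less relevant context
    Emergency,                 // >= 95%: aggressive compaction
}

fn choose_strategy(usage: f64, processed_tool_results: usize) -> Compaction {
    if usage >= 0.95 {
        Compaction::Emergency
    } else if processed_tool_results > 0 {
        // Already-consumed tool output is the cheapest thing to drop.
        Compaction::EvictProcessedToolResults
    } else if usage >= 0.75 {
        Compaction::Adaptive
    } else {
        Compaction::None
    }
}
```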
Integrations in Depth
GitHub
Workforce agents manage the full PR lifecycle through three chained skills: review-pr → prepare-pr → merge-pr.
An agent can clone a repo, create a branch, write code, and open a PR with labels and reviewers assigned. Other agents review with inline comments and formal review states (APPROVE, REQUEST_CHANGES, COMMENT). Before merging, the system performs SHA-pinned safety verification — it confirms that the exact commit that was reviewed is the commit being merged. No race conditions between review and merge.
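The SHA-pinned check reduces to a single comparison — a minimal sketch, with names of our choosing:

```rust
// Sketch of SHA-pinned merge safety: merge only the exact commit that was
// reviewed. Function and parameter names are illustrative.
fn safe_to_merge(reviewed_sha: &str, head_sha: &str) -> Result<(), String> {
    if reviewed_sha == head_sha {
        Ok(())
    } else {
        // A push landed between review and merge; re-review is required.
        Err(format!(
            "head moved: reviewed {reviewed_sha} but branch is at {head_sha}"
        ))
    }
}
```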
CI status checks are monitored automatically. If a build fails, the agent can read the failure, diagnose it, push a fix, and re-trigger the pipeline.
Linear
Multi-workspace issue tracking with automatic ticket prefix resolution. If an agent encounters STA-255, it knows the ticket routes to the Starmoire workspace; DW-404 routes to the Driftwerk workspace. Full CRUD operations, search, bulk updates, and linked issue relations are all supported.
Agents pick up assigned tickets, read the description and any linked context, and start working. Status updates flow back to Linear as work progresses.
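Prefix resolution is straightforward to sketch — the workspace table here is hypothetical:

```rust
// Illustrative ticket-prefix resolution. The prefix-to-workspace table is
// hypothetical; a real deployment would load it from configuration.
fn resolve_workspace(ticket: &str) -> Option<&'static str> {
    let routes = [("STA", "Starmoire"), ("DW", "Driftwerk")];
    let prefix = ticket.split('-').next()?;
    routes.iter().find(|(p, _)| *p == prefix).map(|(_, w)| *w)
}
```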
Slack
Slack runs on a persistent Socket Mode WebSocket connection — not polling. Agents maintain thread-aware conversations with per-channel engagement modes:
all — Respond to any relevant message
mention_only — Only engage when directly mentioned
quiet — Monitor but don't participate
In group chats, agents follow what we call "the human rule": they don't respond to every message. They participate like a team member would — contributing when they have something relevant to add, staying quiet otherwise.
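The engagement gate can be sketched as a small decision function — the enum mirrors the modes above, while the function shape is our assumption:

```rust
// Sketch of per-channel engagement gating. The variants mirror the documented
// modes; the decision function itself is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Engagement {
    All,
    MentionOnly,
    Quiet,
}

fn should_respond(mode: Engagement, mentioned: bool, relevant: bool) -> bool {
    match mode {
        Engagement::Quiet => false,               // monitor only
        Engagement::MentionOnly => mentioned,     // only on direct @mention
        Engagement::All => mentioned || relevant, // "the human rule": relevance, not every message
    }
}
```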
Security Model
Three layers, each addressing a different threat surface:
Layer 1: Policy Engine — Tool-level access control with priority, scope (global, team, or agent-specific), and pattern matching. An agent can be given GitHub access but restricted from force-pushing. Agents can propose new policy rules, but a human must approve them.
Layer 2: Sentinel Scanner — Detects dangerous code patterns including reverse shells, crypto mining, data exfiltration attempts, and path traversal. It runs automatically on file changes and is available for on-demand scans. Each detection is classified by severity.
Layer 3: Integrity Verification — Cryptographic hashes are computed at load time for SOUL.md, IDENTITY.md, and AGENTS.md. If any hash doesn't match the expected value, the system alerts immediately. This prevents identity tampering — an agent's core instructions can't be silently modified.
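Layer 1's rule resolution might look something like this — a hypothetical sketch in which the highest-priority matching rule wins and the default is deny:

```rust
// Hypothetical policy-rule sketch: rules match on a tool-name prefix, the
// highest-priority match wins, and no match means default-deny.
struct Rule {
    priority: u32,
    pattern: &'static str, // tool-name prefix, e.g. "github."
    allow: bool,
}

fn is_allowed(tool: &str, rules: &[Rule]) -> bool {
    rules
        .iter()
        .filter(|r| tool.starts_with(r.pattern))
        .max_by_key(|r| r.priority)
        .map(|r| r.allow)
        .unwrap_or(false) // default-deny: no matching rule, no access
}
```

This is how "GitHub access, but no force-pushing" falls out naturally: a broad allow rule at low priority, a narrow deny rule at higher priority.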
Credentials are managed through an encrypted Vault with dot-notation paths. Secrets are never logged and never exposed in agent context windows.
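Dot-notation lookup over a nested namespace can be sketched like this — the structure is illustrative, and encryption at rest is deliberately omitted:

```rust
// Illustrative dot-notation secret lookup over a nested namespace tree.
// A real vault would keep values encrypted at rest; this sketch omits that.
use std::collections::HashMap;

enum Entry {
    Secret(String),
    Namespace(HashMap<String, Entry>),
}

/// Resolve "github.bot.token"-style paths segment by segment.
fn resolve<'a>(root: &'a Entry, path: &str) -> Option<&'a str> {
    let mut node = root;
    for seg in path.split('.') {
        match node {
            Entry::Namespace(map) => node = map.get(seg)?,
            Entry::Secret(_) => return None, // path descends past a leaf
        }
    }
    match node {
        Entry::Secret(s) => Some(s),
        Entry::Namespace(_) => None, // path stops at a namespace, not a secret
    }
}
```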
Adaptive Model Routing and Cost
Workforce doesn't burn frontier model tokens on every request. The LLM Pool and Router automatically classify task complexity and route accordingly:
Complex reasoning tasks (architecture decisions, nuanced code review) route to frontier models like Claude Sonnet
Routine tasks (formatting, simple lookups, status updates) drop to cheaper models like Haiku
The system detects read-only rounds, fast rounds, and failure escalation patterns
In practice, this saves 40–60% on token costs compared to running everything through a single model tier.
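A toy version of the routing decision — the classifier heuristics and keyword list are our assumption, not the actual implementation:

```rust
// Illustrative complexity-based routing. Model names follow the text;
// the keyword classifier and escalation rule are assumptions.
#[derive(Debug, PartialEq)]
enum Model {
    Sonnet, // frontier: architecture decisions, nuanced review
    Haiku,  // cheap: formatting, lookups, status updates
}

fn route(task: &str, prior_failures: u32) -> Model {
    // Failure escalation: retries move up a tier regardless of task type.
    if prior_failures > 0 {
        return Model::Sonnet;
    }
    let routine = ["format", "lookup", "status"];
    if routine.iter().any(|k| task.contains(k)) {
        Model::Haiku
    } else {
        Model::Sonnet
    }
}
```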
Cross-provider failover adds resilience: if Claude goes down, agents switch to Grok, then Gemini. A token pool with multiple accounts handles automatic rate-limit rotation. Agents don't stall waiting for API availability.
Per-ticket cost tracking lets you see exactly what each piece of work costs in tokens and dollars. Efficiency reports break down tool usage, model selection, and total spend.
Self-Hosted Deployment
Workforce runs on your infrastructure. Your code, your credentials, your agent conversations — none of it transits external servers beyond the LLM API calls themselves.
In practice, this means you deploy the Rust binary on your own machines or cloud instances, configure it against your existing GitHub, Linear, and Slack accounts, and the system operates entirely within your security boundary. There's no Workforce-hosted backend processing your source code.
For teams where data residency, compliance, or intellectual property protection matters, self-hosted isn't a premium add-on. It's the default architecture.
Book a demo to see Workforce in action.