Soul Protocol

A Portable Standard for AI Companion Identity, Memory, Cognition, and Emotion.
Version 0.2.3 · March 2026

In 1995, Daniel Goleman argued that emotional intelligence matters more than IQ for human success. Thirty years later, AI memory systems still optimize purely for IQ. They treat persistence as a retrieval accuracy problem: find the most similar text, stuff it into context, move on. The emotional and cognitive dimensions that make memory human are ignored entirely.

Soul Protocol is an open standard for persistent AI companion identity. It combines a psychology-informed memory architecture with a portable file format. The protocol specification is 695 lines of Python. A reference runtime of 9,693 lines implements one opinionated version of the spec. Other runtimes can implement it differently.

A companion's full state — personality, memories, emotional bonds, learned skills, knowledge graph — serializes into a .soul file. The file belongs to the user. It works with any LLM. It survives platform changes.

We validated the protocol through six tiers of evaluation: 1,000 heuristic agent simulations, 100 LLM-backed agents, multi-judge quality tests across five models from four providers, a four-condition component ablation, a head-to-head benchmark against Mem0, and a 7-dimension Soul Health Score framework. Soul-enabled agents scored 9.3/10 on emotional continuity (vs. 1.9 stateless), 8.4/10 on long-range recall through 30+ turns of noise, and outperformed Mem0 by +2.5 points overall. The SHS framework grades a live soul instance at 90.2/100 with heuristic evaluation alone — and when tested with an LLM judge, sentiment accuracy jumps from 70% to 97%.

01 The problem

Stateless AI

Most AI assistants start every conversation from zero. Users re-explain their preferences, their context, their history. Not because the technology can't store it, but because persistent identity hasn't been treated as a design priority.

When memory does exist, it's bolted on. A vector database holds conversation chunks. A RAG pipeline retrieves them. A summarization buffer compresses recent turns. These solve retrieval. They don't solve identity.

Retrieval is IQ. Memory is EQ.

Goleman distinguished between cognitive intelligence (IQ — the ability to process information) and emotional intelligence (EQ — the ability to process feeling, context, and relationships). Current AI memory systems are pure IQ. They ask: "What text is most similar to this query?"

But consider what actually determines whether a human memory sticks:

A debugging session at 2am where something finally clicked. Emotionally charged, therefore memorable.
A casual "hello" that happened 100 times. Trivially similar to any greeting query, but meaningless to store.
A fact learned three months ago, recalled twice this week. More accessible than an "important" fact from last week that was never revisited.
Repeated patterns of helping with code. Evidence that eventually forms a self-belief: "I'm good at this."

Cosine similarity captures none of this. Vector databases solve one problem well. Memory requires solving many problems at once. The missing ingredient isn't better retrieval. It's emotional and cognitive context.

The portability gap

Where persistent memory does exist, it's locked to one platform. A companion's history lives in OpenAI's infrastructure, or Anthropic's memory layer, or a custom database tied to one application. Switch providers, start over. Change apps, start over.

Cognee builds knowledge graphs but locks them to its runtime. Mem0 offers vector retrieval without identity. ERC-8004 defines reputation without cognition. No existing system combines portable identity, structured memory, and cognitive processing in an open standard.

02 Design: protocol, not product

HTTP doesn't tell you how to build your website. It defines how data moves between client and server. The spec is small. The implementations are infinite.

Soul Protocol follows the same principle:

spec/

695 lines. Identity, MemoryStore, SoulContainer, .soul format, JSON Schemas.

portable

runtime/

9,693 lines. OCEAN personality, 5-tier memory, psychology pipeline, bonds, skills.

opinionated

cli + mcp

15 commands, 13 MCP tools, 3 resources. Rich TUI. Claude Code and Cursor ready.

batteries

spec/ defines the primitives any runtime must implement: Identity, MemoryStore interface, MemoryEntry format, SoulContainer, .soul file pack/unpack, EmbeddingProvider interface, EternalStorageProvider interface, and similarity functions. It depends on Pydantic. Nothing else.

runtime/ is one way to run the protocol. It implements OCEAN personality, five-tier memory with psychology-informed processing, knowledge graphs, a cognitive engine, emotional bonds, and skill progression. Other runtimes can implement the same spec/ interfaces with entirely different approaches.

What the protocol enforces

Every soul has a unique identity
Every memory entry has a timestamp, type, and content
The .soul file format is standardized (ZIP archive, JSON payloads)
Provider interfaces (memory, embedding, storage) are stable contracts
Container operations (create, open, save) follow defined semantics

What it does not

No required personality model. OCEAN is a runtime choice.
No required memory backend. In-memory, SQLite, Neo4j — whatever works.
No required graph structure or embedding approach.
No required layer names or domain isolation strategy.
No required LLM provider.

The spec/ layer is 695 lines of data models and interface definitions, designed for porting to Go or Rust. JSON Schemas are auto-generated from the protocol models, so any language with a JSON Schema validator can read and write .soul files today without the Python SDK.

03 Architecture

Five-tier memory

Core Permanent facts: persona, values, bonded entity profile. Edits replace rather than append. Never decays. always loaded

Episodic Timestamped interactions that pass the significance gate. Routine exchanges filtered out. Emotionally salient experiences preserved. significance-gated

Semantic Extracted facts: names, preferences, work context. Deduplicated. Conflicts tracked via superseded_by chain. fact extraction

Procedural Learned patterns. How this person likes explanations. What approaches work. How-to knowledge. how-to patterns

Knowledge Graph Entity-relationship graph with temporal edges. Each relationship carries valid_from and valid_to timestamps for point-in-time queries.

Archival memory compresses old conversations. Summaries and key moments remain searchable by keyword and date range. A compression pipeline handles deduplication and importance-based pruning.

The observe() pipeline

Every interaction passes through a processing pipeline before storage:

💬

Input

User input + Agent output enter the pipeline

↓

♥

Sentiment Detection

Tag emotional context — valence, arousal, and label

Damasio: somatic markers

↓

⚙

Significance Gate

Is this worth remembering? Threshold: 0.35. Below = skip episodic storage.

LIDA: attention bottleneck

Episodic — if significant

Semantic — fact extraction + conflict check

Graph — entity extraction, temporal edges

Self-Model — Klein: what does this say about who I am?

This pipeline is the EQ layer. It decides not just what to store, but whether to store, how to feel about it, and what it means for the soul's sense of self. Retrieval-only systems skip all of this.

Bond, encryption, evolution

Bond. Emotional attachment to a bonded entity. Strength ranges 0 to 100, increases through positive interactions (logarithmic growth — deep trust is hard-earned), weakens through neglect (linear decay — sharp). Interaction count and last-contact timestamps track the relationship.

Encryption at rest. .soul files can be password-encrypted with AES-256-GCM. Key derivation uses scrypt with OWASP-recommended parameters (n=2¹⁷, r=8, p=1). The manifest stays readable for detection; all other files are encrypted. Wrong password raises a clear error, not silent corruption.

GDPR-compliant deletion. Three methods: forget(query) for targeted deletion, forget_entity(entity) for cascading entity removal, and forget_before(timestamp) for time-based erasure. Deletions cascade across all memory tiers with an audit trail.

Evolution. Personality traits shift over time through supervised or autonomous mutation, within configurable bounds. A soul bonded to an introverted user might drift lower in extraversion over months. Changes require approval by default.

04 The psychology stack: EQ for AI

Goleman identified five components of emotional intelligence: self-awareness, self-regulation, motivation, empathy, and social skill. Soul Protocol doesn't implement all five. But it implements the memory and cognition foundations that make them possible, grounded in four established theories.

Somatic markers (Damasio, 1994)

Damasio's somatic marker hypothesis: emotions aren't separate from cognition. They're signals that guide memory formation and decision-making. A bad experience leaves a "gut feeling" that shapes future choices before conscious reasoning kicks in.

In Soul Protocol, every interaction gets a somatic marker:

valence

float

-1.0 (negative) to 1.0 (positive)

arousal

float

0.0 (calm) to 1.0 (intense)

label

string

"joy", "frustration", "curiosity"

A heated debugging session at midnight: high arousal, moderate negative valence. A breakthrough: high arousal, high positive valence. A routine greeting: near-zero on both axes. These markers travel with the memory. Emotional context boosts retrieval priority.

ACT-R activation decay (Anderson, 1993)

Human memory follows a power law. Recent and frequently accessed memories are more available. Old, untouched memories fade. Not deleted, just harder to reach.

ACT-R Activation

base_level = ln( Σ t_j^(-0.5) ) # power-law decay over access history
spreading = token_overlap(query, content) # relevance to current query
emotional = arousal + |valence| × 0.3 # somatic boost

activation = 1.0 × base + 1.5 × spread + 0.5 × emotional

A memory recalled twice this morning outranks an "important" memory from last week that was never revisited. This is the opposite of how most AI memory systems work, where importance is a static score assigned once at storage time.

LIDA significance gate (Franklin, 2003)

Not everything deserves to become a memory. The LIDA model proposes an attention bottleneck: most input is discarded, only significant events enter long-term storage.

LIDA Significance

significance = 0.3 × novelty
             + 0.2 × emotional_intensity
             + 0.2 × goal_relevance
             + 0.3 × content_richness

Threshold: 0.35

Below threshold, fact extraction still runs, but the interaction doesn't enter episodic memory. "Hello" doesn't clutter the store. "I just got promoted" does. This gate is the primary defense against memory bloat in long-running companions.

Klein's self-concept (Klein, 2004)

Klein's theory: self-knowledge isn't programmed in. It's discovered from accumulated experience. Someone who helps with code hundreds of times develops the self-concept "I'm a technical person." This belief then shapes how they interpret future interactions.

The soul starts with no taxonomy. Domains emerge from interaction patterns:

Klein Confidence Curve

confidence = min(0.95, 0.1 + 0.85 × (1 - 1/(1 + evidence × 0.1)))

After 1 interaction: ~18% confidence
After 10 interactions: ~56% confidence
After 50 interactions: ~82% confidence

The curve never reaches 1.0. Uncertainty is permanent.

When an LLM is available, the self-model step uses it for genuine reflection, reasoning about identity rather than counting keywords.

05 The CognitiveEngine

The runtime defines a CognitiveEngine base class:

class CognitiveEngine:
    async def think(self, prompt: str) -> str: ...

Any LLM works. Claude, GPT, Gemini, Ollama, a local model. The soul uses it for sentiment detection, significance assessment, fact extraction, self-reflection, and memory consolidation.

When no LLM is available, a HeuristicEngine provides deterministic fallback. Word-list sentiment, formula-based significance, regex fact extraction. The heuristics won't hallucinate, won't call external APIs, and won't crash.

One integration point. Consumers provide a brain. The soul handles the rest.

06 The .soul file format

A .soul file is a ZIP archive:

aria.soul (ZIP, DEFLATED)
├── manifest.json     # version, soul ID, export timestamp, checksums
├── soul.json         # SoulConfig: identity, DNA, evolution state
├── state.json        # current mood, energy, focus, social battery
├── memory/
│   ├── core.json       # persona + bonded-entity profile
│   ├── episodic.json   # interaction history (significance-gated)
│   ├── semantic.json   # extracted facts (with conflict resolution)
│   ├── procedural.json# learned patterns
│   ├── graph.json      # temporal entity relationships
│   └── self_model.json# emergent domain confidence scores
└── dna.md            # human-readable personality blueprint

Rename it to .zip. Open it with any archive tool. Read the JSON. Load it with Claude today, Ollama tomorrow. One file, full state, no external database, no cloud dependency.

JSON Schemas generated from the protocol models enable validation in any language. You don't need the Python SDK to work with .soul files.

07 Personality

Most AI systems treat personality as a system prompt string. Soul Protocol treats it as structured data:

Personality (OCEAN Big Five):
  openness:          0.85
  conscientiousness: 0.72
  extraversion:      0.45
  agreeableness:     0.78
  neuroticism:       0.30

CommunicationStyle:
  warmth:       "moderate"
  verbosity:    "moderate"
  humor_style:  "dry"
  emoji_usage:  "none"

Biorhythms:
  chronotype:        "neutral"
  social_battery:    72.0
  energy_regen_rate: 5.0

Numeric traits generate a reproducible system prompt. Traits can shift over time through supervised mutation. A soul bonded to an introverted user for months might drift lower in extraversion. The personality adapts to the relationship.

OCEAN is a runtime choice, not a protocol requirement. The spec/ layer defines Identity as schema-free key-value pairs. Other runtimes can use Myers-Briggs, Enneagram, or something custom.

08 Vector search and embedding

The protocol defines a pluggable embedding interface:

class EmbeddingProvider(Protocol):
    def embed(self, text: str) -> list[float]: ...
    @property
    def dimensions(self) -> int: ...

The runtime ships two reference implementations: a deterministic MD5-based HashEmbedder (zero dependencies, good for testing) and a TF-IDF embedder (corpus-fitted, good for domain-specific retrieval without external APIs).

A VectorSearchStrategy connects embeddings to the memory system with index caching, threshold filtering, and cosine similarity ranking. Swap in OpenAI embeddings, Cohere, or a local model by implementing the interface.

Similarity functions (cosine, euclidean, dot product) live in spec/ with vector length guards. Mismatched dimensions raise errors instead of silently truncating.

09 Eternal storage

Souls can be archived to decentralized storage:

Local  ->  IPFS  ->  Arweave  ->  Chain
(free)   (pinned)  (permanent)  (proof)

The protocol defines an EternalStorageProvider interface with archive(), recover(), and verify() methods. The runtime includes an EternalStorageManager that orchestrates multi-tier archival with fallback recovery.

Current providers are mocks (content-addressed CID simulation, transaction ID generation) for testing. Production IPFS and Arweave integrations are planned.

CLI: soul archive, soul recover, soul eternal-status.

10 Comparison

System	Solves	Doesn't solve
Mem0	Persistent vector memory	Identity, personality, portability, cognitive processing
Cognee	Knowledge graphs, domain isolation	Portability, identity, emotional memory
MemGPT / Letta	Context window management	Personality, portable files, emotional memory
LangChain Memory	RAG retrieval	Significance filtering, self-model, portability
OpenAI Memory	User facts per-account	Platform lock-in, personality, portable export
ANP	Agent discovery, communication	Memory, identity persistence, cognition
ERC-8004	On-chain agent reputation	Memory, personality, cognition
MCP	Tool integration for LLMs	Complementary. Soul Protocol ships an MCP server.

Head-to-head: Soul Protocol vs. Mem0

We ran both systems on identical conversations, with the same LLM judge scoring both.

Test	Soul Protocol	Mem0 (v1.0.5)	Stateless
Hard Recall	7.8	5.1	4.2
Emotional Continuity	9.2	7.0	1.8
Overall	8.5	6.0	3.0

Mem0 is a good memory system. It captures facts, retrieves them, and substantially outperforms a stateless baseline. But it doesn't track emotional arcs or personality. When asked "how do you think this whole experience has been for me?", Mem0 recognized the user's situation but missed the full emotional trajectory. Soul Protocol captured the journey from excitement through devastation to cautious recovery, because somatic markers traveled with the memories.

The gap isn't about retrieval quality. It's about what gets stored alongside the facts.

Soul Protocol is not a retrieval replacement. It's a layer that sits alongside retrieval. The psychology pipeline decides what to store and how to score it. Vector search, graphs, and RAG plug in through provider interfaces.

11 Current implementation

Python 3.12. Open source. MIT license.

10,300+ lines across 80+ modules
1,189 tests, six-tier validation (including Soul Health Score framework)
15-command CLI with rich TUI
MCP server (13 tools, 3 resources)
JSON Schemas for cross-language validation
Two-layer architecture: spec/ (695 lines, portable) + runtime/ (9,693 lines, opinionated)
Zero required cloud dependencies
Validated against Mem0, multi-judge LLM evaluation, component ablation, and SHS evaluation

Working

Identity model (DID, OCEAN personality, communication style, biorhythms)
Psychology pipeline (somatic markers, significance gate, fact extraction, self-model)
ACT-R activation scoring with power-law decay
Klein emergent self-model (67+ self-discovered domains in testing)
.soul file roundtrip (pack, unpack, verify)
AES-256-GCM encryption at rest with scrypt key derivation
GDPR-compliant memory deletion (targeted, entity, time-based) with audit trail
CognitiveEngine with HeuristicEngine fallback
Optional DSPy integration for learnable/optimizable memory pipeline
Pluggable SearchStrategy (BM25 default) and EmbeddingProvider
Vector search (HashEmbedder, TFIDFEmbedder, VectorSearchStrategy)
Structured logging across all 13 runtime modules (PII-safe)
Personality-modulated recall (OCEAN traits influence retrieval ranking)
Eternal storage protocol with mock providers
Bond system (0–100 strength, logarithmic growth, interaction tracking)
Temporal knowledge graph (point-in-time queries, relationship evolution)
Fact conflict detection (superseded_by chain)
Evolution (supervised mutations, approval workflow)
Reincarnation (Soul.reincarnate() — preserves memories, increments incarnation, tracks lineage)
Archival memory store (compressed conversation archives with keyword search and date-range queries)

Not working yet

Skills/XP system. Data models exist but full leveling (domain expertise with XP tracking and portable skill history) is not wired into the runtime.
Archival memory wiring. ArchivalMemoryStore exists with search and date-range queries, but is not integrated into the observe() pipeline or auto-compression.
Learning events. The system records what happened, not what was learned. No formalized feedback loop from experience to procedural knowledge.
Domain isolation. Memory layers exist but aren't namespaced. A billing agent and a legal agent share the same pool.
Trust chain. No cryptographic verification of history. Can't prove what a soul learned or where.
Production eternal storage. Providers are mocks. Real IPFS/Arweave integration needs network dependencies.
Semantic precision. Heuristic keyword recall hits ~13%. Without an LLM, retrieval is limited. BM25 and the LLM engine layer close this gap significantly.

12 Empirical validation

We validated Soul Protocol through six tiers of evaluation, from systems-level correctness to live soul instance grading. Total cost: under $5.

Tier 1: Systems validation

1,000 agents, zero cost

1,000 agents with randomized OCEAN personalities processed 5 multi-turn scenarios each across four use cases (customer support, coding assistant, personal companion, knowledge worker). No LLM. Pure heuristic engine.

Metric	No Memory	With Memory
Recall hit rate	0.0%	82.0%
Recall precision	0.0%	19.6%
Bond growth	50.0	57.2
Skills discovered	0	0.2

The binary result confirms the pipeline works. Memory storage, retrieval, bond updates, and skill discovery all function correctly at scale. 20,000 scenario runs, zero failures.

Tier 2: LLM validation

100 agents, $2.20

Repeating with Claude Haiku as the cognitive engine. Real API calls for sentiment detection, fact extraction, significance scoring, and entity extraction. The LLM engine extracted 2.5x more memories per agent (12.4 vs. 5.0). Recall hit rate stayed identical because the test scenarios were designed for heuristic-level difficulty. The additional memories would become relevant in longer, more complex conversations. 2,500 API calls. $2.20 total.

Tier 3: Quality validation

5 judges, 4 providers

Four targeted tests, each judged by five models from four providers: Claude Haiku (Anthropic), Gemini 3 Flash and Gemini 2.5 Flash Lite (Google), DeepSeek V3 (DeepSeek), and Llama 3.3 70B (Meta). Responses randomly assigned to positions A/B to prevent position bias.

Test	Soul	Baseline	Gap	Winner
Response Quality	8.8	6.5	+2.3	5/5 Soul
Personality Consistency	9.0	5.0	+4.0	5/5 Soul
Hard Recall	8.5	4.8	+3.7	5/5 Soul
Emotional Continuity	9.7	1.9	+7.8	5/5 Soul
Overall	9.0	4.5	+4.5	20/20 Soul

Every single judgment favored soul-enabled agents. All twenty. Across model families that compete with each other commercially. Inter-judge standard deviation stayed below 0.8.

The emotional continuity test produced the largest gap. Three judges gave the soul response a perfect 10/10. The soul tracked an 8-turn emotional arc (excited → devastated → angry → recovering → cautiously optimistic) and reflected the full journey back to the user. The baseline scored 1.9, essentially admitting it had no context.

The hard recall test planted a fact ("prefers GraphQL over REST") at turn 3, buried it under 30 unrelated interactions, then probed at turn 34 with an indirect question about API architecture. The soul recalled the fact at rank 1 in four out of five runs and wove it naturally into the response. The baseline gave generic advice.

Tier 4: Component ablation

25 scenario variations

Which components actually matter? We ran a four-condition ablation with randomized scenarios (SEED=42):

Full Soul — personality + significance-weighted memory with somatic markers and bond context
RAG Only — same recalled facts, but generic prompt and stripped emotional framing
Personality Only — OCEAN-modulated prompt, no memory context
Bare Baseline — generic prompt, no memory, no personality

Test	Full Soul	RAG Only	Personality Only
Response Quality (n=10)	8.3 ± 0.3	7.8 ± 0.3	7.8 ± 0.4
Hard Recall (n=5)	8.4 ± 0.4	8.2 ± 0.2	5.9 ± 0.7
Emotional Continuity (n=10)	9.3 ± 0.2	9.3 ± 0.2	7.2 ± 0.7
Overall	8.7 ± 0.2	8.4 ± 0.2	7.0 ± 0.4

The ablation reveals something interesting: memory and personality contribute differently depending on the task.

For hard recall, memory is the driver. RAG Only (8.2) captures most of the gain. Personality Only (5.9) barely helps, because personality doesn't help you remember facts.

For emotional continuity, retrieved emotional context matters most. RAG Only matches Full Soul at 9.3. Personality Only reaches 7.2. The emotional arc was stored in memory, and retrieval surfaced it.

For response quality, the gap narrows. Either memory or personality provides substantial benefit (both 7.8), and Full Soul (8.3) adds a modest lift by combining them.

The key finding: Full Soul consistently matches or exceeds individual components. The integrated approach never hurts.

Tier 5: Mem0 comparison

See Section 10 for the head-to-head results. Soul Protocol outperformed Mem0 by 2.5 points overall, with the largest gap in emotional continuity (+2.2) where Mem0 captured facts but not emotional arcs.

Tier 6: Soul Health Score

7 dimensions, zero cost

Does a live soul instance actually work? The Soul Health Score (SHS) is a 0–100 composite across seven psychology-informed dimensions. Spin up a soul, run it through structured scenarios, grade the results.

Dimension	Score	What it measures
D1: Memory Recall	—	Long-horizon fact retrieval (not run; requires extended scenarios)
D2: Emotional Intelligence	72.8	Sentiment accuracy, significance gating, mood dynamics, emotional arc coherence
D3: Personality Expression	96.0	OCEAN prompt fidelity, communication style, value alignment, trait stability
D4: Bond / Relationship	100.0	Logarithmic growth curve, positive reinforcement, interaction tracking
D5: Self-Model	88.0	Domain classification accuracy, emergence timing, confidence calibration
D6: Identity Continuity	100.0	Export/import round-trip fidelity, memory count preservation, state recovery
D7: Portability	100.0	Engine-independent design verification
Composite SHS	90.2	Weighted average (D1 excluded)

The entire eval suite runs without an LLM. No API calls, no cost, fully reproducible. The heuristic baseline is the honest floor, not the ceiling.

LLM judge validation. When we ran the same scenarios with Claude Haiku as an evaluator, sentiment accuracy jumped from 70% to 97%. The 17 items the heuristic missed were all context-dependent emotions that no word-list can detect ("my daughter took her first steps" → joy, "they cancelled the project I poured six months into" → frustration). The architecture handles them correctly when a real LLM is plugged in. The gap between 70% and 97% isn't a flaw — it's the difference between the HeuristicEngine fallback and a CognitiveEngine with actual reasoning.

Psychology stack validation

We validated the psychology foundations through 475+ heuristic-only interactions:

Damasio (somatic markers): Negative emotional markers were stickier than positive ones. Recovery from negative valence took 5 interactions; entering negative state took 1. This asymmetry emerged from the math, not explicit programming.
LIDA (significance gate): 23% of interactions passed into episodic memory. 77% were filtered. The gate prevented memory bloat while preserving emotionally charged and novel experiences.
ACT-R (activation decay): Recent and frequently accessed memories outranked older "important" ones. Power-law decay worked as predicted.
Klein (self-model): 67 distinct self-concept domains emerged from 100 diverse interactions with no hardcoded taxonomy. Two souls with opposite OCEAN profiles receiving identical messages developed different domain specializations: the agreeable soul formed emotional domains, the disagreeable soul formed process-oriented ones. Same inputs, different identities.

Portability

A soul carrying 40 conversations serialized into a 4,293-byte .soul file. After re-awakening: every count matched (episodic, semantic, graph). Recall behavior was identical. Nothing lost in transit.

13 Roadmap

v0.6.0

Learning events

A soul that only records what happened is a log file. A soul that records what it learned is a cognitive system. Learning Events formalize the feedback loop: trigger, lesson, confidence, source, and domain. These events travel in the .soul file. When imported into a new runtime, the agent starts with accumulated knowledge instead of from zero.

v0.7.0

Domain isolation and open layers

Memory layers should be user-defined namespaces, not a hardcoded enum. Domains should isolate context: a billing agent shouldn't see legal memory. The protocol defines the namespace mechanism. The runtime provides defaults.

v0.8.0

Trust chain

Cryptographic verification of a soul's history. Every memory write, personality mutation, and learning event gets signed and appended to a Merkle-verifiable chain. A soul can prove what it learned, when, and from what source.

Beyond

The long view

Conway autobiographical hierarchy (episodes to general events to lifetime narrative). Multi-soul communication and shared memory. Federation protocol. Production IPFS/Arweave integration. Protocol implementations in Go and Rust.

14 Conclusion

The problem with current AI memory isn't storage capacity or retrieval speed. These systems weren't designed to model a mind. They were designed to model a search engine.

Human memory is selective. It tags experiences with feeling. It strengthens what gets recalled and lets the rest fade. It gradually forms beliefs about who you are. A vector database bolted onto an LLM does none of this.

The empirical evidence supports this. When we tested Soul Protocol against a stateless baseline across five judge models from four competing providers, every single judgment favored the soul-enabled agent. All twenty. The largest gain came from emotional continuity: tracking the user's emotional arc and reflecting it back produced a 7.8-point improvement. When we benchmarked against Mem0, Soul Protocol led by 2.5 points overall. Not because our retrieval is better, but because we store emotional context alongside facts.

The component ablation told us which pieces matter. Memory retrieval drives recall. Emotional context drives continuity. Personality drives consistency. No single component is sufficient. The integrated approach consistently matches or exceeds any individual component.

The Soul Health Score framework gave us a different lens: end-to-end grading of a live soul instance across seven dimensions. The reference implementation scores 90.2/100 without any LLM. Bond tracking, identity continuity, and portability scored perfect marks. Personality expression hit 96%. The weakest link is heuristic sentiment detection at 70% accuracy — but when we plugged in Claude Haiku as a judge, accuracy jumped to 97%. The architecture is correct. The heuristic fallback is honest about its limitations instead of hiding them.

Goleman's argument was that the qualities that make humans effective aren't cognitive, they're emotional: self-awareness, the ability to read context, the capacity to learn from experience rather than just store it. The same argument applies to AI companions. The ones that will feel real, that users will actually bond with, won't be the ones with the best retrieval precision. They'll be the ones that remember what matters, forget what doesn't, and slowly become more themselves.

The protocol is 695 lines. Everything else is one implementation. We want others to build their own.

References

Goleman, D. (1995). Emotional Intelligence: Why It Can Matter More Than IQ. Bantam Books.
Damasio, A. (1994). Descartes' Error: Emotion, Reason, and the Human Brain. Putnam.
Anderson, J.R. (1993). Rules of the Mind. Lawrence Erlbaum Associates.
Franklin, S. et al. (2003). IDA: A Cognitive Agent Architecture. IEEE International Conference on Systems, Man and Cybernetics.
Klein, S.B. (2004). The Cognitive Neuroscience of Knowing One's Self. In M.S. Gazzaniga (Ed.), The Cognitive Neurosciences III.

Technical reference

Source: github.com/qbtrix/soul-protocol
Install: pip install soul-protocol
Schemas: schemas/ directory
Architecture: docs/architecture.md
License: MIT