Architecture

System Layers

┌─────────────────────────────────┐
│      Agent Tools                │  memory_search, memory_block_update
├─────────────────────────────────┤
│      MemoryManager              │  Orchestration & extraction
├─────────────────────────────────┤
│      LanceDBMemoryStore         │  Storage operations
├─────────────────────────────────┤
│      LanceDB                    │  Vector & full-text storage
└─────────────────────────────────┘

Components

LanceDBMemoryStore (lancedb_store.py)

Storage layer

Responsibilities:

LanceDB connection management
Core memory block CRUD
Archival memory with vector embeddings
Semantic and hybrid search (vector + full-text)
Statistics

Key Methods:

get_all_memory_blocks() - Retrieve with scoping priority
add_memory() - Store with embedding
semantic_search() - Pure vector similarity
hybrid_search() - Combined vector + FTS scoring
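
As an illustration of what "combined vector + FTS scoring" can mean, here is a minimal sketch that ranks hits by a weighted sum of two normalized scores. The `Hit` class, the `combine_scores` helper, and the 0.7 weight are assumptions made for this example; they are not taken from lancedb_store.py, and the actual store may combine scores differently.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    memory_id: str
    vector_score: float  # assumed normalized to [0, 1], higher is better
    fts_score: float     # assumed normalized to [0, 1], higher is better


def combine_scores(hits, vector_weight=0.7):
    """Rank hits by a weighted sum of vector and full-text scores (hypothetical weights)."""
    fts_weight = 1.0 - vector_weight
    scored = [
        (h.memory_id, vector_weight * h.vector_score + fts_weight * h.fts_score)
        for h in hits
    ]
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


hits = [Hit("a", 0.9, 0.1), Hit("b", 0.6, 0.9), Hit("c", 0.2, 0.2)]
ranked = combine_scores(hits)
```

In this sample data, hit "b" outranks "a" even though "a" has the better vector score, because the keyword match contributes to the total.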

Scoping: chat-specific > user-level > global

MemoryManager (manager.py)

Orchestration layer

Responsibilities:

Core memory formatting
Automatic fact extraction
Deduplication
Embedding generation

Key Methods:

get_core_memory() - All blocks with defaults
format_core_memory_for_context() - Prompt injection
retrieve_relevant_memories() - Auto-retrieval
process_conversation_turn_for_memories() - Extract from full context
refresh_core_memory_facts() - Auto-summarize core memory
search_memories() - Agent-facing search

Extraction Process:

Full Conversation Turn (User + Agent Steps + Response)
LLM extraction with "Rich Context" prompts
Deduplication check
Store unique facts
Monitor for high-importance facts → Trigger Core Memory Refresh
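
The deduplication and refresh-trigger steps above can be sketched as follows. This is only one plausible approach, assuming deduplication compares embeddings by cosine similarity and that each fact carries a numeric importance; the function names, dictionary fields, and both thresholds are hypothetical and are not taken from manager.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def store_unique_facts(facts, stored, dedup_threshold=0.9, importance_threshold=0.8):
    """Append non-duplicate facts to `stored`; return True if a core refresh is due."""
    needs_refresh = False
    for fact in facts:
        is_duplicate = any(
            cosine_similarity(fact["embedding"], old["embedding"]) >= dedup_threshold
            for old in stored
        )
        if is_duplicate:
            continue  # a near-identical fact is already stored
        stored.append(fact)
        if fact["importance"] >= importance_threshold:
            needs_refresh = True  # a high-importance fact was added
    return needs_refresh


stored = [{"content": "Likes tea", "embedding": [1.0, 0.0], "importance": 0.3}]
facts = [
    {"content": "Enjoys tea", "embedding": [0.99, 0.05], "importance": 0.3},
    {"content": "Allergic to peanuts", "embedding": [0.0, 1.0], "importance": 0.9},
]
needs_refresh = store_unique_facts(facts, stored)
```

Here the first extracted fact is dropped as a near-duplicate, while the second is stored and flags a refresh.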

Memory Tools (tools.py)

Agent interface

MemorySearchTool:

Semantic search across archival memory
Formatted results with scores
Thread-safe execution

MemoryBlockUpdateTool:

Update core blocks
Operations: replace, append, search_replace
Auto-scoping (user vs chat level)
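
To make the three operations concrete, here is a small sketch of how each could transform a block's text. The `apply_block_update` function, its parameters, and its error handling are hypothetical; the real MemoryBlockUpdateTool in tools.py may behave differently (for example, in how it joins appended text).

```python
def apply_block_update(current, operation, content, search=None):
    """Return the new block text after applying one update operation."""
    if operation == "replace":
        # Overwrite the whole block.
        return content
    if operation == "append":
        # Add to the end, on a new line if the block is non-empty.
        return f"{current}\n{content}" if current else content
    if operation == "search_replace":
        # Swap every occurrence of `search` for `content`.
        if not search or search not in current:
            raise ValueError("search text not found in block")
        return current.replace(search, content)
    raise ValueError(f"unknown operation: {operation}")


block = "Name: Sam"
block = apply_block_update(block, "append", "Likes: tea")
block = apply_block_update(block, "search_replace", "coffee", search="tea")
```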

Thread Safety: Uses asyncio.run_coroutine_threadsafe() for safe execution from worker threads.
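
A minimal sketch of this pattern is shown below. `asyncio.run_coroutine_threadsafe()` is the standard-library call named above; everything else (the stand-in `search_memories` coroutine, `run_from_worker`, and the `demo` setup that runs the loop in a background thread) is hypothetical scaffolding for illustration, not the code in tools.py.

```python
import asyncio
import threading


async def search_memories(query):
    """Stand-in for an async search; hypothetical, not the real manager method."""
    await asyncio.sleep(0)
    return [f"result for {query}"]


def run_from_worker(loop, query, timeout=5.0):
    """Call from a thread that is NOT running `loop`: schedule the coroutine, wait for it."""
    future = asyncio.run_coroutine_threadsafe(search_memories(query), loop)
    # concurrent.futures.Future.result() blocks this thread, not the event loop.
    return future.result(timeout=timeout)


def demo():
    """Run an event loop in one thread and call into it from another."""
    loop = asyncio.new_event_loop()
    loop_thread = threading.Thread(target=loop.run_forever, daemon=True)
    loop_thread.start()
    try:
        return run_from_worker(loop, "coffee preferences")
    finally:
        loop.call_soon_threadsafe(loop.stop)
        loop_thread.join()
        loop.close()
```

The timeout is a design choice worth keeping in a sketch like this: without it, a worker thread would block indefinitely if the loop never runs the coroutine.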

Memory Context (memory_context.py)

Prompt templates

format_core_memory_section() - Agent context
format_retrieved_memories_section() - Search results with intent/outcome
FACT_EXTRACTION_SYSTEM_PROMPT - Rich extraction instructions
CORE_MEMORY_SUMMARIZATION_PROMPT - Auto-summary instructions

Data Flow

Read Path (Memory Injection)

User Query
  ↓
manager.retrieve_relevant_memories()
  ↓
Generate embedding → Hybrid search
  ↓
Format results (Content + Context + Outcome)
  ↓
Inject into agent prompt
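
The formatting step in this path might look roughly like the sketch below. The dictionary keys (`content`, `context`, `outcome`), the header text, and the `format_retrieved_memories` name are assumptions for illustration; the real template is format_retrieved_memories_section() in memory_context.py and may differ in wording and layout.

```python
def format_retrieved_memories(memories):
    """Render retrieved memories as a text section for the agent prompt."""
    if not memories:
        return ""  # nothing to inject
    lines = ["Relevant memories:"]
    for index, memory in enumerate(memories, start=1):
        lines.append(f"{index}. {memory['content']}")
        # Context and outcome are optional in this sketch.
        if memory.get("context"):
            lines.append(f"   Context: {memory['context']}")
        if memory.get("outcome"):
            lines.append(f"   Outcome: {memory['outcome']}")
    return "\n".join(lines)


section = format_retrieved_memories([
    {"content": "Prefers dark mode", "context": "Asked about UI settings",
     "outcome": "Theme switched"},
    {"content": "Works in UTC+2"},
])
```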

Write Path (Extraction)

Conversation Turn (User + Agent Steps + Response)
  ↓
manager.process_conversation_turn_for_memories()
  ↓
LLM extracts facts + context
  ↓
Store unique facts
  ↓
If high importance → refresh_core_memory_facts()

Tool Usage

Agent decides to search
  ↓
Calls memory_search(query)
  ↓
Execute on the main event loop (scheduled from the worker thread)
  ↓
Return formatted results

Memory Scoping

User-Level

Scope: All chats
Storage: user_id="x", chat_id=NULL
Use: Preferences, facts, persona

Chat-Level

Scope: Single conversation
Storage: user_id="x", chat_id="y"
Use: Current context, session state

Global

Scope: All users/chats
Storage: user_id=NULL, chat_id=NULL
Use: Default persona

Priority

Chat-specific (most specific)
User-level (persistent)
Global (fallback)
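
The priority order above can be sketched as a small resolver that keeps, for each block, the row from the most specific matching scope. The row layout (a `label` field plus `user_id`, `chat_id`, and `content`) and the `resolve_blocks` function are hypothetical; they only mirror the storage keys listed in this section and are not the actual get_all_memory_blocks() implementation.

```python
def resolve_blocks(rows, user_id, chat_id):
    """Pick one row per label, preferring chat-level, then user-level, then global."""

    def rank(row):
        row_user, row_chat = row["user_id"], row["chat_id"]
        if chat_id is not None and row_user == user_id and row_chat == chat_id:
            return 0  # chat-specific
        if user_id is not None and row_user == user_id and row_chat is None:
            return 1  # user-level
        if row_user is None and row_chat is None:
            return 2  # global fallback
        return None  # belongs to another user or chat

    best = {}
    for row in rows:
        row_rank = rank(row)
        if row_rank is None:
            continue
        label = row["label"]
        if label not in best or row_rank < best[label][0]:
            best[label] = (row_rank, row)
    return {label: row["content"] for label, (_, row) in best.items()}


rows = [
    {"label": "persona", "user_id": None, "chat_id": None, "content": "default persona"},
    {"label": "persona", "user_id": "x", "chat_id": None, "content": "user persona"},
    {"label": "context", "user_id": "x", "chat_id": "y", "content": "chat context"},
    {"label": "context", "user_id": "x", "chat_id": "z", "content": "other chat"},
    {"label": "facts", "user_id": None, "chat_id": None, "content": "global facts"},
]
resolved = resolve_blocks(rows, user_id="x", chat_id="y")
```

With this sample data, the user-level persona overrides the global default, and the row belonging to chat "z" is ignored.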

File Structure

src/suzent/memory/
├── __init__.py
├── lancedb_store.py     # Storage layer
├── manager.py           # Orchestration
├── memory_context.py    # Templates
├── tools.py             # Agent interface
├── models.py            # Pydantic models
└── lifecycle.py         # Initialization

Design Principles

Separation of Concerns - Clear layer boundaries
Async by Default - Non-blocking I/O
Flexible Scoping - Automatic priority resolution
Automatic Management - Facts extracted without commands
Production Ready - File-based storage with vector + FTS indexing
