Architecture
System Layers
┌─────────────────────┐
│ Agent Tools         │  memory_search, memory_block_update
├─────────────────────┤
│ MemoryManager       │  Orchestration & extraction
├─────────────────────┤
│ LanceDBMemoryStore  │  Storage operations
├─────────────────────┤
│ LanceDB             │  Vector & full-text storage
└─────────────────────┘
Components
LanceDBMemoryStore (lancedb_store.py)
Storage layer
Responsibilities:
- LanceDB connection management
- Core memory block CRUD
- Archival memory with vector embeddings
- Semantic and hybrid search (vector + full-text)
- Statistics
Key Methods:
- get_all_memory_blocks() - Retrieve with scoping priority
- add_memory() - Store with embedding
- semantic_search() - Pure vector similarity
- hybrid_search() - Combined vector + FTS scoring
Scoping: chat-specific > user-level > global
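The hybrid scoring behind hybrid_search() can be sketched as a weighted blend of normalized vector-similarity and full-text (BM25-style) scores. This is a minimal illustration, not the actual LanceDBMemoryStore implementation; the function names, the min-max normalization, and the default alpha weight are assumptions.

```python
def min_max_normalize(scores):
    """Scale raw scores into [0, 1] so the two signals are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(vector_hits, fts_hits, alpha=0.7):
    """Merge two ranked lists of (doc_id, score) pairs.

    alpha weights vector similarity; (1 - alpha) weights full-text score.
    Documents found by only one signal contribute 0 for the other.
    """
    v_ids = [d for d, _ in vector_hits]
    f_ids = [d for d, _ in fts_hits]
    v_norm = dict(zip(v_ids, min_max_normalize([s for _, s in vector_hits]))) if vector_hits else {}
    f_norm = dict(zip(f_ids, min_max_normalize([s for _, s in fts_hits]))) if fts_hits else {}
    combined = {
        doc: alpha * v_norm.get(doc, 0.0) + (1 - alpha) * f_norm.get(doc, 0.0)
        for doc in set(v_ids) | set(f_ids)
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

A document that ranks highly on both signals beats one that dominates only one list, which is the property hybrid search buys over pure vector similarity.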
MemoryManager (manager.py)
Orchestration layer
Responsibilities:
- Core memory formatting
- Automatic fact extraction
- Deduplication
- Embedding generation
Key Methods:
- get_core_memory() - All blocks with defaults
- format_core_memory_for_context() - Prompt injection
- retrieve_relevant_memories() - Auto-retrieval
- process_conversation_turn_for_memories() - Extract from full context
- refresh_core_memory_facts() - Auto-summarize core memory
- search_memories() - Agent-facing search
Extraction Process:
1. Full conversation turn (User + Agent Steps + Response)
2. LLM extraction with "Rich Context" prompts
3. Deduplication check
4. Store unique facts
5. Monitor for high-importance facts → trigger core memory refresh
Memory Tools (tools.py)
Agent interface
MemorySearchTool:
- Semantic search across archival
- Formatted results with scores
- Thread-safe execution
MemoryBlockUpdateTool:
- Update core blocks
- Operations: replace, append, search_replace
- Auto-scoping (user vs chat level)
Thread Safety:
Tools may be invoked from worker threads; asyncio.run_coroutine_threadsafe() schedules their coroutines on the main event loop and blocks the worker until the result is ready.
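The thread-safety pattern looks roughly like this: a worker thread hands a coroutine to the main event loop and waits on the returned future. The memory_search coroutine below is a stand-in for the real tool; only asyncio.run_coroutine_threadsafe() and the future's result() are the actual mechanism.

```python
import asyncio
import threading


async def memory_search(query):
    """Stand-in for the real async search tool."""
    await asyncio.sleep(0.01)  # simulate I/O against the store
    return f"results for {query!r}"


def worker(loop, out):
    """Runs in a worker thread: schedule the coroutine on the main loop."""
    fut = asyncio.run_coroutine_threadsafe(memory_search("preferences"), loop)
    out.append(fut.result(timeout=5))  # blocks this thread, not the loop


async def main():
    loop = asyncio.get_running_loop()
    out = []
    t = threading.Thread(target=worker, args=(loop, out))
    t.start()
    await asyncio.to_thread(t.join)  # keep the loop free while we wait
    return out[0]


result = asyncio.run(main())
```

The key detail is that fut.result() blocks only the worker thread; the event loop stays free to actually run the coroutine.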
Memory Context (memory_context.py)
Prompt templates
- format_core_memory_section() - Agent context
- format_retrieved_memories_section() - Search results with intent/outcome
- FACT_EXTRACTION_SYSTEM_PROMPT - Rich extraction instructions
- CORE_MEMORY_SUMMARIZATION_PROMPT - Auto-summary instructions
Data Flow
Read Path (Memory Injection)
User Query
↓
manager.retrieve_relevant_memories()
↓
Generate embedding → Hybrid search
↓
Format results (Content + Context + Outcome)
↓
Inject into agent prompt
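The read path above can be sketched as follows. The record fields (content, context, outcome), the search_fn interface, and the section header are illustrative assumptions, not the actual MemoryManager code.

```python
def format_memory(mem):
    """Render one retrieved memory as Content + Context + Outcome."""
    return (f"- {mem['content']}\n"
            f"  Context: {mem.get('context', 'n/a')}\n"
            f"  Outcome: {mem.get('outcome', 'n/a')}")


def retrieve_relevant_memories(query, search_fn, top_k=3):
    """Hybrid-search the archival store and format hits for prompt injection.

    search_fn stands in for the store's hybrid search (embedding + FTS);
    an empty result returns an empty string so nothing is injected.
    """
    hits = search_fn(query, limit=top_k)
    if not hits:
        return ""
    body = "\n".join(format_memory(m) for m in hits)
    return f"## Relevant Memories\n{body}"
```

Returning an empty string on no hits keeps the agent prompt clean rather than injecting an empty section.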
Write Path (Extraction)
Conversation Turn (User + Agent + Response)
↓
manager.process_conversation_turn_for_memories()
↓
LLM extracts facts + context
↓
Store unique facts
↓
If high importance → refresh_core_memory_facts()
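The write path can be sketched as extract, deduplicate, store, and conditionally refresh. Here extract_facts stands in for the LLM call and similarity for an embedding comparison; the fact shape, threshold, and importance cutoff are illustrative assumptions.

```python
def dedup(facts, existing, similarity, threshold=0.9):
    """Keep only facts that are not near-duplicates of stored ones."""
    unique = []
    for fact in facts:
        if all(similarity(fact, old) < threshold for old in existing):
            unique.append(fact)
    return unique


def process_turn(turn, existing, extract_facts, similarity, importance_cutoff=0.8):
    """One conversation turn through the write path.

    Returns the facts to store and whether a core-memory refresh
    should be triggered by a high-importance fact.
    """
    facts = extract_facts(turn)  # LLM extraction (stubbed here)
    unique_texts = dedup([f["text"] for f in facts], existing, similarity)
    stored = [f for f in facts if f["text"] in unique_texts]
    refresh = any(f.get("importance", 0) >= importance_cutoff for f in stored)
    return stored, refresh
```

Gating the refresh on importance means routine facts accumulate in archival memory without churning the core blocks on every turn.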
Tool Usage
Agent decides to search
↓
Calls memory_search(query)
↓
Execute in main loop
↓
Return formatted results
Memory Scoping
User-Level
- Scope: All chats
- Storage: user_id="x", chat_id=NULL
- Use: Preferences, facts, persona
Chat-Level
- Scope: Single conversation
- Storage: user_id="x", chat_id="y"
- Use: Current context, session state
Global
- Scope: All users/chats
- Storage: user_id=NULL, chat_id=NULL
- Use: Default persona
Priority
1. Chat-specific (most specific)
2. User-level (persistent)
3. Global (fallback)
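Priority resolution can be sketched as picking, per block label, the most specific visible scope. The block shape (label, user_id, chat_id) mirrors the storage columns above; the function itself is an illustration, not the store's actual query logic.

```python
def resolve_blocks(blocks, user_id, chat_id):
    """Return one block per label, most specific scope winning."""

    def specificity(b):
        if b["user_id"] == user_id and b["chat_id"] == chat_id:
            return 2   # chat-specific
        if b["user_id"] == user_id and b["chat_id"] is None:
            return 1   # user-level
        if b["user_id"] is None and b["chat_id"] is None:
            return 0   # global
        return -1      # belongs to another user/chat, not visible here

    resolved = {}
    for b in blocks:
        s = specificity(b)
        if s < 0:
            continue
        current = resolved.get(b["label"])
        if current is None or s > specificity(current):
            resolved[b["label"]] = b
    return resolved
```

Blocks scoped to other users or chats are filtered out entirely rather than falling back, so one user's memories can never leak into another's context.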
File Structure
src/suzent/memory/
├── __init__.py
├── lancedb_store.py # Storage layer
├── manager.py # Orchestration
├── memory_context.py # Templates
├── tools.py # Agent interface
├── models.py # Pydantic models
└── lifecycle.py # Initialization
Design Principles
- Separation of Concerns - Clear layer boundaries
- Async by Default - Non-blocking I/O
- Flexible Scoping - Automatic priority resolution
- Automatic Management - Facts extracted without commands
- Production Ready - File-based storage with vector + FTS indexing