Chat Post-Processing

When an agent turn finishes, Suzent runs a series of background tasks — writing the transcript, updating memory, compressing context, and saving state. This page explains how that works and what to do when something goes wrong.

Why it runs in the background

Saving state after a turn involves several steps that can take a few seconds each (memory extraction, context compression). Rather than making you wait, Suzent closes the stream immediately and handles those steps in the background. You can send your next message right away.
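
The idea of returning control before the slow work finishes can be sketched with `asyncio`. This is only an illustration of the pattern, not Suzent's actual code: `handle_turn` and `post_process` are hypothetical names, and the sleep stands in for the slow steps.

```python
import asyncio

events = []  # records the order in which things happen


async def post_process(turn_id: int) -> None:
    # Hypothetical stand-in for the slow steps (memory extraction, compression).
    await asyncio.sleep(0.05)
    events.append(f"post-processing done for turn {turn_id}")


async def handle_turn(turn_id: int) -> asyncio.Task:
    events.append(f"stream closed for turn {turn_id}")
    # Schedule the slow work without awaiting it, so the caller regains control.
    return asyncio.create_task(post_process(turn_id))


async def main() -> None:
    task = await handle_turn(1)
    events.append("ready for the next message")
    await task  # awaited here only so the demo exits cleanly


asyncio.run(main())
print(events)
```

The "ready for the next message" entry is recorded before the post-processing entry, which is the behavior described above.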

Two phases

Phase A: Quick snapshot

As soon as the agent finishes streaming, Suzent saves a lightweight snapshot of the conversation — just enough that if you send another message before the background work completes, the new turn still has the full history to work with.

Phase B: Background job

Alongside the snapshot, a background job runs these steps in order:

| Step | What it does |
| --- | --- |
| Transcript | Appends the turn to a JSONL log file on disk |
| Memory | Extracts and indexes any facts worth remembering |
| Compression | Trims the context window if it's getting large |
| Persistence | Writes the final conversation state to the database |

Each step records whether it succeeded or failed, so it's easy to see exactly where something went wrong.
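
As a rough sketch of per-step recording, the runner below produces a dictionary in the same shape that the troubleshooting snippets on this page read from `step_status_json` (a `status` plus an optional `error` per step). The function name `run_steps`, the lowercase step keys, and the choice to stop at the first failure are assumptions made for illustration, not Suzent's implementation.

```python
import json


def run_steps(steps):
    """Run (name, callable) pairs in order and record each outcome."""
    status = {}
    for name, fn in steps:
        try:
            fn()
            status[name] = {"status": "success"}
        except Exception as exc:
            status[name] = {"status": "failed", "error": str(exc)}
            break  # assumption: later steps are not attempted after a failure
    return status


def failing_memory_step():
    # Hypothetical failure used only to demonstrate the recording.
    raise RuntimeError("embedding model unavailable")


status = run_steps([
    ("transcript", lambda: None),
    ("memory", failing_memory_step),
    ("compression", lambda: None),
    ("persistence", lambda: None),
])
step_status_json = json.dumps(status)
print(step_status_json)
```

Reading the result back with `json.loads` shows immediately that the memory step is the one that failed.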

Fast-follow turns

If you send another message before the background job finishes, Suzent detects the overlap and skips the stale write — the newer turn's job takes over. This is expected and harmless. The quick snapshot from Phase A means no history is lost.
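
One way to picture the stale check is a per-chat turn counter: a job that belongs to an older turn than the latest one skips its final write. The sketch below assumes that mechanism; `ChatState`, `start_turn`, `finish_job`, and the `"skipped_stale"` outcome string are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ChatState:
    latest_turn: int = 0     # highest turn number started so far
    persisted_turn: int = 0  # turn whose state was last written


def start_turn(chat: ChatState) -> int:
    chat.latest_turn += 1
    return chat.latest_turn


def finish_job(chat: ChatState, job_turn: int) -> str:
    # A newer turn has started, so this job's write would be out of date.
    if job_turn < chat.latest_turn:
        return "skipped_stale"
    chat.persisted_turn = job_turn
    return "success"


chat = ChatState()
first = start_turn(chat)
second = start_turn(chat)  # the user sends again before the first job ends

print(finish_job(chat, first))
print(finish_job(chat, second))
```

The first job reports that it was skipped, and the second job writes the state that already includes both turns.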

Retries

Failed jobs can be retried up to 3 times. A job is only eligible for retry if it failed (not if it was intentionally skipped as stale).
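
The eligibility rule can be written as a small predicate. This is a sketch under assumptions: the `retry_count` argument and the `"failed"` and `"skipped_stale"` outcome strings are hypothetical, and only the limit of 3 comes from the text above.

```python
MAX_RETRIES = 3  # limit stated in the text above


def is_retriable(outcome: str, retry_count: int) -> bool:
    """Return True only for failed jobs that still have retries left."""
    return outcome == "failed" and retry_count < MAX_RETRIES


print(is_retriable("failed", 0))         # failed, no retries used yet
print(is_retriable("failed", 3))         # failed, retries exhausted
print(is_retriable("skipped_stale", 0))  # intentionally skipped, never retried
```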

Troubleshooting

Something feels off with message history

Check whether the most recent job for that chat succeeded:

from suzent.database import get_database

db = get_database()
jobs = db.list_postprocess_jobs("your-chat-id", limit=5)

for job in jobs:
    icon = "✓" if job.outcome == "success" else "✗"
    print(f"{icon} {job.job_id[:8]} {job.outcome or job.status} {job.duration_ms}ms")
    if job.error_message:
        print(f" {job.error_message}")

Jobs are failing consistently

Find which step is failing:

import json

# Reuses `db` from the previous snippet.
for job in db.list_postprocess_jobs("your-chat-id", limit=20):
    if job.step_status_json:
        steps = json.loads(job.step_status_json)
        for step, info in steps.items():
            if info["status"] == "failed":
                # Guard against a missing or null error field before slicing.
                error = info.get("error") or ""
                print(f"{job.job_id[:8]} {step}: {error[:80]}")

Common causes:

  • Transcript step — disk full or the transcript directory isn't writable
  • Memory step — embedding model unavailable; check that memory is enabled in config
  • Compression step — usually a malformed message in the conversation history
  • Persistence step — database write error; run sqlite3 chats.db "PRAGMA integrity_check;" to verify

To queue eligible jobs for retry:

for job in db.get_retriable_postprocess_jobs():
    db.prepare_job_for_retry(job.job_id)

A lot of "skipped stale" jobs

This is normal under heavy use — it just means you were sending messages faster than the background job could finish. As long as the most recent job succeeded, nothing was lost.

Overall health check

metrics = db.get_postprocess_metrics()
started = metrics["job_started"]

if started > 0:
    print(f"Success: {metrics['job_success'] / started * 100:.0f}%")
    print(f"Failed: {metrics['job_failed'] / started * 100:.0f}%")
    print(f"Stale: {metrics['job_skipped_stale'] / started * 100:.0f}%")

A failure rate above ~5% is worth investigating. A stale rate of 10–30% is normal for active use.
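
Those thresholds can be turned into a simple check. The sketch below takes a dictionary with the same keys as the metrics shown above. The function name `health_warnings` is hypothetical, the sample numbers are invented for illustration, and treating a stale rate above 30% as worth a look is an assumption drawn from the "normal" range quoted above.

```python
def health_warnings(metrics: dict) -> list:
    """Return human-readable warnings based on the thresholds in the text."""
    started = metrics["job_started"]
    if started == 0:
        return []  # nothing has run yet, so there is nothing to judge

    failed_pct = metrics["job_failed"] / started * 100
    stale_pct = metrics["job_skipped_stale"] / started * 100

    warnings = []
    if failed_pct > 5:
        warnings.append(f"failure rate {failed_pct:.0f}% is above 5%")
    if stale_pct > 30:  # assumption: above the normal 10-30% range
        warnings.append(f"stale rate {stale_pct:.0f}% is above 30%")
    return warnings


# Invented sample values, not real measurements.
sample = {
    "job_started": 200,
    "job_success": 150,
    "job_failed": 14,
    "job_skipped_stale": 36,
}
print(health_warnings(sample))
```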

Useful SQL

If you need to dig into the database directly:

-- Recent jobs for a specific chat
SELECT job_id, status, outcome, duration_ms, error_message
FROM postprocess_jobs
WHERE chat_id = 'your-chat-id'
ORDER BY created_at DESC
LIMIT 10;

-- Jobs currently running (anything here for > 5 min is stuck)
SELECT job_id, chat_id, started_at
FROM postprocess_jobs
WHERE status = 'running';

-- Success rate over the last 24 hours
SELECT
  ROUND(100.0 * SUM(CASE WHEN outcome = 'success' THEN 1 ELSE 0 END) / COUNT(*), 1) AS success_pct,
  COUNT(*) AS total
FROM postprocess_jobs
WHERE finished_at > datetime('now', '-1 day');
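
The "stuck for more than 5 minutes" rule from the running-jobs query can also be applied in the query itself. The sketch below demonstrates that against an in-memory stand-in table rather than the real database. It assumes `started_at` is stored as UTC text in the format produced by SQLite's `datetime()`, and the reduced schema, job IDs, and the `'completed'` status value are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table with only the columns the query needs (assumed schema).
conn.execute(
    "CREATE TABLE postprocess_jobs "
    "(job_id TEXT, chat_id TEXT, status TEXT, started_at TEXT)"
)
conn.executescript(
    """
    INSERT INTO postprocess_jobs VALUES
      ('old-job',  'chat-1', 'running',   datetime('now', '-10 minutes')),
      ('new-job',  'chat-1', 'running',   datetime('now')),
      ('done-job', 'chat-1', 'completed', datetime('now', '-10 minutes'));
    """
)

STUCK_SQL = """
SELECT job_id, chat_id, started_at
FROM postprocess_jobs
WHERE status = 'running'
  AND started_at < datetime('now', '-5 minutes')
"""

stuck = [row[0] for row in conn.execute(STUCK_SQL)]
print(stuck)
```

Only the job that has been running for ten minutes is reported. The recently started job and the finished one are left out.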