Sync domain -- L1¶
Sync loop¶
The sync engine runs as a Rust async task per enabled account. Four triggers cause a sync cycle:
- startup
- remote push notification
- periodic poll
- manual sync
Remote push notifications are scheduling hints. JMAP notifications must carry changed object state or a checkpoint to trigger sync. IMAP IDLE returns are selected-mailbox observation hints, so the supervisor can schedule sync even when no changed IDs are named; provider policy decides when that hint requires a full observation path, as Gmail does for label and multi-mailbox state.
The poll timer is scheduled after a sync cycle completes and uses skipped missed-tick behavior. Startup, manual, and push-triggered syncs therefore reset the next poll deadline instead of allowing an overdue periodic tick to run immediately after a long cycle.
For each cycle, the engine loads stored cursors from SQLite, then syncs mailbox state and email state. There is no standalone thread delta fetch for the local conversation list; thread and conversation projections are derived from synced Email records, using JMAP threadId as the authoritative grouping key for JMAP accounts. Manual sync commands can request incremental or fullMetadata mode. fullMetadata drops the stored message cursor for that cycle so the provider performs an authoritative message metadata snapshot. JMAP sync requests both sentAt and receivedAt; the local message date prefers sentAt for user-visible chronology and falls back to receivedAt when needed. The email cursor stores a local metadata-version wrapper around the server state so metadata projection changes can trigger one authoritative full snapshot for existing local rows.
The runtime emits INFO-level structured progress logs for sync start, provider discovery, mailbox/message fetch phases, store writes, and sync completion/failure. Each sync cycle has a sync_id span field so nested gateway and store events can be queried as one operation. IMAP sync logs per-mailbox planning decisions, conservative STATUS no-op skips, per-mailbox header fetch start/completion, and chunked header fetch progress; JMAP full snapshots log ID discovery and metadata chunk progress. These diagnostics are intentionally backend logs first; user-facing progress UI consumes a smaller account progress model rather than raw log lines.
Local cache diagnostics are structured backend logs as well. Candidate generation logs the account, driver, fetch unit, candidate counts, scored byte totals, and per-candidate score details at TRACE. Cache worker batches log budget inputs, scan limits, candidate counts, admission decisions, fetch starts, fetch failures, and successful stores at DEBUG. Supervisor logs summarize each worker batch with scanned, attempted, cached, failed, and skipped counts.
While a sync is running, the supervisor stores compact user-facing progress in
AccountRuntimeOverview.syncProgress. Progress contains the sync ID, trigger,
start timestamp, stage (connecting, discovering, planning, fetching,
storing, or waiting), a short detail string, and optional mailbox/message
counts. Gateways report provider-specific phases through this stable model:
JMAP reports discovery and mailbox/message fetch phases, while IMAP reports
capability discovery, mailbox planning, per-mailbox no-op skips, and mailbox
fetch phases. The supervisor clears progress on success or failure.
Local cache planning¶
Metadata remains mandatory and outside eviction. Optional local cache objects are scored per layer:
- body
- raw message
- attachment blob
The first cache planner uses manual utility scoring. A candidate's priority is:
Default alpha is 0.7, so large objects are penalized without making high
utility large attachments impossible to cache.
Message utility is a weighted sum of normalized signals:
message_utility =
0.35 * recency
+ 0.20 * thread_activity
+ 0.15 * sender_affinity
+ 0.10 * explicit_importance
+ 0.10 * search_context
+ 0.10 * local_behavior
Recency uses a 30-day half-life. Thread, sender, and local behavior signals are saturating decayed counts. Explicit importance is derived from flags, unread state, and Inbox membership. Search context is stronger for tight result sets and top-ranked visible results:
Every message has exactly one structural body cache_object row, keyed by
(account_id, message_id, layer = body, object_id = ''). message remains the
source of truth for message existence and metadata; cache_object is the child
ledger for optional-content work and cache state. Sync writes create the body
row in the same transaction as message metadata, lazy body writes mark it
cached, message deletes remove cache child state, and store startup repairs
legacy databases that have messages without body cache rows.
The first implemented candidate source is baseline sync-time scoring. It uses
metadata available in the synced message record: recency, Inbox membership,
unread state, flagged state, and fetch size. Structural rows start with neutral
byte counts until the scorer materializes provider-aware values. Local user/app
activity then updates message-level signals separately from metadata sync.
Search result visibility is the first signal producer: visible ranked results
write search context and a rank-decayed direct user boost into
cache_message_signal, enqueue the message in cache_rescore_queue with a
cheap local-signal rescore_priority, and wake the account runtime for cache
maintenance. The re-score queue is priority-ordered: local user/app signals such
as visible search results, opened messages, and pinned/thread-active messages
must outrank structural repair and stale periodic maintenance, even when the
maintenance rows are older. If a signal lands on a legacy message that is
missing its structural body row, the store materializes that row before queueing
the re-score. Opening/starred thread behavior and thread-level activity should
use the same signal queue instead of adding fetch-specific shortcuts.
Layer weights are 1.0 for bodies, 0.45 for raw messages, and 0.25 for
attachment blobs. Attachment blobs receive object modifiers for inline
attachments and previously opened attachments.
value_bytes is the useful local content being prioritized. fetch_bytes is
the remote fetch/storage unit needed to satisfy the candidate and is the value
used by size_cost. JMAP body candidates can use a body_only fetch unit,
while IMAP body candidates use raw_message because IMAP cannot reliably fetch
the parsed body/attachment split that JMAP exposes. That makes large IMAP
messages naturally score lower for speculative background caching while still
allowing a high-utility message to win.
Cache budgets have a soft cap and a hard cap. Interactive work such as opening a message or narrowing a search can raise the temporary target between those caps:
Admission is allowed when the candidate fits under effective_target. When it
does not, the candidate must beat the lowest-priority evictable cached object.
Admission is never allowed when it would cross the hard cap.
Metadata sync only records cache candidates; it does not fetch optional content. The account runtime runs cache maintenance through a supervisor-owned resource governor instead of fixed sleeps plus fixed batch sizes. The governor grants short leases for re-score rows, provider fetch requests, and estimated fetch bytes. Background leases refill slowly and stay bounded; interactive maintenance triggered by visible search/opening activity receives a small immediate burst. Gateway/cache failures lower the network-rate multiplier and can place fetch work in short backoff while still allowing local re-score work to continue. Each maintenance slice logs the granted lease and the feedback used to tune the next one.
Each maintenance batch first queues a bounded oldest-first set of stale cache
objects whose last_scored_at is older than the runtime threshold, excluding
objects already queued or currently fetching. This lets time-sensitive signals
such as recency converge even when no sync/search/user signal touches a message.
Stale maintenance may use existing cache priority to order background work, but
its rescore_priority stays below the local-signal band.
The batch then consumes dirty re-score rows ordered by rescore_priority DESC,
then queued_at ASC, rebuilds each candidate's current signal set from message
metadata plus cache_message_signal, updates
provider-aware fetch_unit, value_bytes, fetch_bytes, and
cache_object.priority, and marks non-cached/non-fetching objects wanted.
Unscored structural rows have fetch_bytes = 0 and are not eligible for fetch
selection until this re-score step gives them a concrete fetch cost. The worker
then scans a bounded priority-ordered window of wanted body candidates, admits
only candidates that fit both the cache budget and the current fetch byte/request
lease, marks them fetching, fetches through the same gateway body path as lazy
open, applies the body to the local store, and marks the cache object cached.
Periodic maintenance uses background pressure; interactive maintenance triggered
by visible search results may use the burst space between the soft and hard
caps. Scanning more rows than the fetch-attempt cap prevents one large
over-budget or over-lease message from starving smaller candidates behind it.
Gateway failures mark the object failed with the service error code and do not
fail the whole runtime. Eviction and attachment-blob workers are later policy
layers, not part of the first worker slice.
State management¶
State strings are per-type, per-account, and stored in sync_cursor. The engine reads them on startup and after every successful cycle.
If the server returns cannotCalculateChanges, the engine falls back:
- mailbox sync falls back from
Mailbox/changestoMailbox/query + Mailbox/get - email sync falls back from
Email/changestoEmail/query + Email/get
This fallback is part of the normal sync contract and must preserve correctness even after long offline periods. RFC 8620 treats cannotCalculateChanges as cache invalidation for the affected object type. A full fallback snapshot is authoritative: local objects of that type that are absent from the snapshot are stale and must be removed before the new cursor is persisted.
SyncBatch and apply_sync_batch¶
The primary write path is apply_sync_batch, which receives a single SyncBatch and executes it within one SQLite transaction. Bodies fetched on demand are still stored separately because they happen outside the metadata sync loop.
Required SyncBatch shape:
pub struct SyncBatch {
pub mailboxes: Vec<MailboxRecord>,
pub messages: Vec<MessageRecord>,
pub imap_mailbox_states: Vec<ImapMailboxSyncState>,
pub imap_message_locations: Vec<ImapMessageLocation>,
pub deleted_imap_message_locations: Vec<ImapMessageLocationKey>,
pub deleted_mailbox_ids: Vec<MailboxId>,
pub deleted_message_ids: Vec<MessageId>,
pub replace_all_mailboxes: bool,
pub replace_all_messages: bool,
pub cursors: Vec<SyncCursor>,
}
The important details are replace_all_mailboxes and replace_all_messages:
falsefor delta mailbox syncs fromMailbox/changestruefor full mailbox snapshots fromMailbox/query + Mailbox/getfalsefor delta email syncs fromEmail/changestruefor full email snapshots fromEmail/query + Email/get
When replace_all_mailboxes is true, the store compares the current local mailbox IDs for the account to the mailbox IDs present in the incoming snapshot and deletes any stale local rows before applying upserts. This is what prevents removed server mailboxes from persisting forever in the sidebar after a full resync.
When replace_all_messages is true, the store performs the same reconciliation for messages. This prevents deleted or expunged remote email from surviving locally after an Email/changes cursor gets too old for the server to calculate.
IMAP sync batches also carry imap_mailbox_states,
imap_message_locations, and deleted_imap_message_locations. Upserted
locations are full rows; deleted locations are identity keys without row metadata.
They are applied in the same transaction as their MessageRecords so later sync
cycles, lazy body fetches, and mutations can use persisted IMAP command state
without deriving it from presentation fields. Deleted IMAP locations are
mailbox-scoped: removing one location removes that mailbox membership, but it
must not delete the canonical message while another IMAP location still addresses
it. When an IMAP account has local mailbox role overrides, the store applies
those overrides over the incoming discovery role before writing the mailbox
projection.
SQLite schema¶
The runtime schema is centered around raw message state plus locally derived projections:
mailboxmailbox_role_overridemessageconversationconversation_messagemessage_mailboxmessage_keywordmessage_bodythread_viewsync_cursorevent_logsource_projectionautomation_backfill_jobsender_address_cachecache_objectcache_message_signalcache_rescore_queue
Important derived tables:
conversationstores the latest message pointer, display subject, unread count, and message count used by the paginated middle pane. For JMAP accounts, conversation identity is derived from serverthreadId, not from subject or RFC 5322 threading headers.conversation_messagelinks conversation IDs to concrete(account_id, message_id)pairs.event_logstores ordered domain events with a monotonically increasingseq, which powers/v1/events.sync_cursorstores per-account mailbox and message state strings.imap_mailbox_sync_statestores IMAP cursor state per account and mailbox: mailbox ID/name,UIDVALIDITY, highest UID, andHIGHESTMODSEQwhen available. This table is separate from JMAP-stylesync_cursorbecause IMAP validity and delta state are mailbox-scoped.imap_message_locationstores the mailbox UID locations that make an IMAP message addressable for fetch and mutation commands. This is separate from message identity so provider-stable IDs, or the Gmail IMAP profile's current RFCMessage-IDcanonical fallback, can deduplicate messages across labels while retaining per-mailbox UIDs.mailbox_role_overridestores user-assigned mailbox roles for providers such as IMAP/SMTP where SPECIAL-USE roles are discovered but not portably mutable. A row withrole = NULLis an explicit local clear; absence of a row means use the discovered role.automation_backfill_jobstores durable per-account work for the current automation-rule fingerprint, so completed backfills are not repeated after restart while changed rules enqueue fresh work.sender_address_cachestores account-scoped sender addresses that previously passed provider submission. Entries are keyed by(account_id, normalized_email), ordered bylast_used_at, and used only as compose suggestions.cache_objectstores scored optional-content child objects and their state (wanted,fetching,cached,failed, orevicted). Every message has one structural body row; later layers add raw-message or attachment rows as policy decides. Rows includelayer,fetch_unit,value_bytes,fetch_bytes, priority, reason, timestamps, and last error code. Body cache workers read scored wanted rows from this table and cached rows contribute to the configured cache budget.cache_message_signalstores local cache utility signals that do not come from provider metadata, including search result visibility, local behavior scores, direct user boost, and pinned state.cache_rescore_queuestores account/message pairs whose cache objects need priority re-scoring after signal changes. Rows includereason,queued_at, andrescore_priority; runtime workers drain high-priority signal work before low-priority structural/stale maintenance, then select fetch work from scoredcache_objectrows.
The store maintains account-scoped indexes for message-page reads, including received date and the sortable sender, subject, flagged, and attachment keys used by the message list. These indexes support seek pagination without making the frontend maintain a duplicate message index.
Conversation pagination¶
Conversation pages are generated inside the store from the conversation and message projections using seek pagination:
- sort key:
latest_received_at DESC, conversation_id DESC - cursor contents:
latest_received_atplusconversation_id - query strategy: "strictly older than this cursor", never
OFFSET
This matters because the frontend keeps many pages cached while live updates keep arriving at the top. Seek pagination is stable under prepend-heavy workloads; offset pagination is not.
Event propagation¶
apply_sync_batch emits domain events as it mutates the store:
- mailbox upserts and deletions emit
mailbox.updated - message deletions emit
message.updated - mailbox membership changes can emit
message.arrivedandmessage.mailboxes_changed
These events are inserted into event_log and also published over the local broadcast channel used by /v1/events. Event payloads include a resources array when the changed resource can be expressed generically. SSE is a transport for durable resource-change facts, not a procedural frontend command channel. A successful sync appends sync.completed with the trigger, mode, batch counts, and sync/message/mailbox resources. Settings writes append settings.updated; smart mailbox config writes append smart_mailbox.*; config reload appends config.reloaded. Separate desktop windows consume the same backend stream, map resources to read-model invalidations, then refetch authoritative state through REST.
Account runtime transitions emit account.status_changed. Its payload includes
status, push, lastSyncAt, lastSyncError, lastSyncErrorCode, and
syncProgress, using the same camelCase enum values as REST responses. Progress
updates reuse this event so clients can update account settings and list rows
without polling logs.
Automation actions¶
After a sync batch is written, global automation rules evaluate matching synced message records from that batch. Rules have explicit triggers, smart-mailbox-style conditions, and typed actions. Account and mailbox targeting is expressed as ordinary conditions. The backend still evaluates each rule inside the current account runtime and adds the current source_id plus the synced message IDs to the internal query before applying actions through that account's gateway. The initial settings UI creates an account-conditioned rule equivalent to:
- if a message belongs to account
Aand its sender display name or email contains textX, apply user tagY
Actions mutate the remote server through the same JMAP command paths as manual message actions, then persist the returned mutation locally. Supported action variants include applying/removing user tags, read/unread, flag/unflag, and moving a message to a target mailbox. Action execution is idempotent: if the target state is already true, the action is skipped. Action mutations happen after apply_sync_batch, so they do not weaken the atomicity of the incoming metadata write.
The account runtime also performs automatic backfill for existing local messages. Backfill is intentionally low priority: it runs only while the account runtime has a live gateway, starts after a delay, processes a small bounded batch, publishes resulting mutation events, then waits before the next batch. Foreground sync, push handling, and manual commands remain the primary work of the runtime.
Backfill scheduling is backend-owned and durable. The domain service fingerprints the enabled backfill = true automation rules, creates a pending automation_backfill_job for each enabled account when rules change or when an account runtime first observes missing current work, and marks the job completed once a batch reports no more matching work. Worker failures increment the job attempt count and keep it pending for a later retry.
Conflict model¶
Mutations include optimistic concurrency checks when the gateway supports them. If the server returns stateMismatch, the engine re-syncs the affected type and presents the updated state to the UI. The original mutation is not retried blindly.
Error handling¶
The important sync failure mode is cannotCalculateChanges. That is not treated as a terminal error; it is converted into a full resync for the affected object type. Database failures still abort the transaction and surface as sync failures.
Invariants¶
- The local SQLite database is the single source of truth for the UI. The frontend reads via the REST API, never directly from JMAP.
- State strings are persisted atomically with data in a single SQLite transaction.
- A failed sync never leaves the database inconsistent.
apply_sync_batchruns in a single transaction. - Full mailbox snapshots are authoritative: missing remote mailboxes are pruned locally when
replace_all_mailboxesis true. - Full email snapshots are authoritative: missing remote messages are pruned locally when
replace_all_messagesis true. - Conversation projections are derived from synced message state and use server
threadIdas the grouping key for JMAP mail. - Metadata sync never fetches email bodies. Bodies are optional content fetched by lazy open or by the background cache worker after metadata is committed.
Assertions¶
| ID | Sev. | Assertion |
|---|---|---|
| cache-source-of-truth | MUST | Frontend reads only via REST API, never directly from JMAP |
| state-atomic | MUST | State strings persisted in same SQLite transaction as data via apply_sync_batch |
| snapshot-authoritative | MUST | Full mailbox snapshots prune stale local mailboxes when they disappear remotely |
| message-snapshot-authoritative | MUST | Full email snapshots prune stale local messages when they disappear remotely |
| body-lazy | MUST | Metadata sync does not fetch email bodies; bodies are fetched by lazy open or cache maintenance after metadata is committed |
| fallback-resync | MUST | On cannotCalculateChanges, engine performs full resync for the affected type |
| cache-priority-size-aware | MUST | Optional cache priority uses fetch-unit bytes, so IMAP raw-message body fetches are penalized by combined message size |
| cache-worker-budget | MUST | Cache workers fetch only candidates admitted under the current cache budget and mark fetch failures in the cache ledger |
| cache-resource-governor | MUST | Cache maintenance uses bounded resource leases for re-score rows, fetch requests, and fetch bytes, logs lease/feedback decisions, and backs off fetches after failures |
| cache-object-parity | MUST | Every synced message has one structural body cache_object child row, and message deletion removes its cache object, cache signals, and rescore queue rows |
| cache-signal-rescore | MUST | Local cache utility signals update cache_message_signal, enqueue cache_rescore_queue with higher re-score priority than background maintenance, and are applied by re-scoring before fetch selection |
| cache-stale-rescore | MUST | Cache maintenance periodically queues bounded oldest-first stale cache objects for re-scoring so recency and other time-sensitive utility signals converge |
| imap-state-per-mailbox | MUST | IMAP sync state is stored per account and mailbox, including UIDVALIDITY and optional MODSEQ |
| imap-locations | MUST | IMAP message command locations are stored separately from local message identity |
| imap-modseq-for-keyword-skip | MUST | IMAP mailbox sync may skip metadata fetch only when MODSEQ state proves flags are unchanged |
| gmail-label-canonicalization | SHOULD | Gmail IMAP label observations are applied as one canonical local message with multiple IMAP locations |
| conversation-derived | MUST | Conversation summaries are derived from local message projections using JMAP threadId for JMAP sources |
| event-log-ordered | MUST | Local domain events are ordered by event_log.seq and replayable via afterSeq |
| transaction-scope | MUST | apply_sync_batch executes within a single SQLite transaction |
| automation-backfill-durable | MUST | Automation backfill progress is stored in SQLite and completed jobs for the same rule fingerprint do not rerun after restart |
| sync-progress-runtime | SHOULD | Running account syncs expose compact user-facing progress and clear it on success or failure |
| cache-priority-size-aware | SHOULD | Optional body, raw-message, and attachment cache candidates are prioritized by manual utility divided by size cost |
| cache-admission-hard-cap | MUST | Optional cache admission may exceed the soft cap under pressure but must not exceed the hard cap |