disco-reaper/docs/backup-specs.md
2026-03-21 19:43:39 +05:30

Discord Reaper: Backup System Technical Specification

Discord Reaper uses a high-performance SQLite-based backup system designed to handle massive community snapshots with millions of messages, deduplicated media, and cross-session persistence.


1. Architectural Overview

The backup system has transitioned from a legacy JSON-based flat-file structure to a central SQLite Database (reaper.db) architecture. This allows for:

  • Indexed lookups: Original message IDs map to target message IDs through an indexed query (O(log n) on SQLite's B-tree indexes) rather than a scan of the full history.
  • Memory Efficiency: The system no longer loads massive message lists into RAM; it streams data directly from disk.
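
The two properties above can be illustrated with a minimal sketch using Python's standard `sqlite3` module. The table `message_map` and its columns are hypothetical names chosen for this example, not necessarily those used in reaper.db.

```python
import sqlite3

# Hypothetical mapping table; names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message_map ("
    " original_id INTEGER PRIMARY KEY,"
    " target_id   INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO message_map VALUES (?, ?)",
    [(1000 + i, 9000 + i) for i in range(5)],
)


def target_for(conn, original_id):
    """Resolve one mapping through the primary-key index (no full scan)."""
    row = conn.execute(
        "SELECT target_id FROM message_map WHERE original_id = ?",
        (original_id,),
    ).fetchone()
    return row[0] if row else None


def stream_map(conn):
    """Yield rows one at a time instead of building a list in memory."""
    cursor = conn.execute(
        "SELECT original_id, target_id FROM message_map ORDER BY original_id"
    )
    for row in cursor:
        yield row


print(target_for(conn, 1002))
for original_id, target_id in stream_map(conn):
    print(original_id, target_id)
```

Iterating over the cursor pulls rows as they are needed, so memory use stays roughly constant regardless of how many rows the table holds.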

Component Relationship

graph TD
    A[Discord API] --> B[DiscordReader]
    B --> C[BackupDatabase]
    C --> D[reaper.db]
    C --> E[Media Pool /cas/]
    D --> F[Migration Shuttle]

2. The Database Schema (reaper.db)

The backup is stored in a single SQLite file, typically found in your ReaperFiles-{ServerID}/ directory. The schema is normalized to prevent redundancy.

Core Tables

  • guild_profile: Stores server name, ID, description, owner, and icon/banner URLs.
  • roles & permissions: Captures every custom role (colors, permission bitfields) and channel-specific permission overwrites.
  • channels & threads: Detailed metadata including category nesting, NSFW flags, bitrates, and thread archive statuses.
  • messages: The central history table. Stores sender IDs, timestamps, content, and references.
  • attachments / embeds / reactions: Relational tables linked to messages for storing rich data.
  • users: A deduplicated author cache (usernames, avatars, server roles).
  • user_alias: Stores mapping between User IDs and their generated "Privacy Aliases" (e.g., SwiftFox).
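
As a rough sketch of how some of these tables might relate, the following creates a reduced version of the schema and joins messages to their privacy aliases. The table names come from the list above; every column name is an assumption made for this example, and the real reaper.db schema may differ.

```python
import sqlite3

# Reduced, hypothetical schema: column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    channel_id INTEGER NOT NULL,
    author_id  INTEGER NOT NULL REFERENCES users(user_id),
    content    TEXT
);
CREATE TABLE attachments (
    attachment_id INTEGER PRIMARY KEY,
    message_id    INTEGER NOT NULL REFERENCES messages(message_id),
    filename      TEXT NOT NULL
);
CREATE TABLE user_alias (
    user_id INTEGER PRIMARY KEY REFERENCES users(user_id),
    alias   TEXT NOT NULL UNIQUE
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript(SCHEMA)

# The author is stored once and referenced by ID from each message.
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO user_alias VALUES (1, 'SwiftFox')")
conn.execute("INSERT INTO messages VALUES (100, 10, 1, 'hello')")
conn.execute("INSERT INTO messages VALUES (101, 10, 1, 'again')")
conn.execute("INSERT INTO attachments VALUES (1, 100, 'cat.png')")

# Read messages back with the alias substituted for the real author.
rows = conn.execute(
    "SELECT m.message_id, a.alias, m.content"
    " FROM messages m JOIN user_alias a ON a.user_id = m.author_id"
    " ORDER BY m.message_id"
).fetchall()
print(rows)
```

Because author details live in one row of `users`, two messages by the same person cost only two integer references, which is the redundancy the normalized layout avoids.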

3. Media Pool & CAS Logic

Reaper implements a Content-Addressable Storage (CAS) system for all media (images, videos, stickers, avatars).

How it works:

  1. Hashing: When a file is downloaded, Reaper calculates its SHA-256 hash.
  2. Deduplication: If another message contains the exact same file (even years later or in a different channel), Reaper finds that the hash already exists in the media_pool table.
  3. Referencing: Instead of storing a second copy, Reaper records a reference to the existing file in the /cas/ directory.

Benefit: This significantly reduces the disk footprint of backups for servers where users frequently share the same memes or assets.
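
The hash, check, and reference steps can be sketched as follows. This is a minimal illustration, not Reaper's actual implementation: the `media_pool` table name comes from the text above, while its columns (including `ref_count`), the `store_media` helper, and the on-disk naming by hex digest are assumptions.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_media(conn, cas_dir, data):
    """Store bytes under their SHA-256 digest; return (digest, was_new)."""
    digest = hashlib.sha256(data).hexdigest()
    exists = conn.execute(
        "SELECT 1 FROM media_pool WHERE sha256 = ?", (digest,)
    ).fetchone()
    if exists:
        # Same content seen before: record another reference, write nothing.
        conn.execute(
            "UPDATE media_pool SET ref_count = ref_count + 1 WHERE sha256 = ?",
            (digest,),
        )
        return digest, False
    # First occurrence: write the file once, named by its hash.
    (cas_dir / digest).write_bytes(data)
    conn.execute(
        "INSERT INTO media_pool (sha256, size, ref_count) VALUES (?, ?, 1)",
        (digest, len(data)),
    )
    return digest, True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media_pool ("
    " sha256    TEXT PRIMARY KEY,"
    " size      INTEGER NOT NULL,"
    " ref_count INTEGER NOT NULL)"
)
cas_dir = Path(tempfile.mkdtemp()) / "cas"
cas_dir.mkdir()

first = store_media(conn, cas_dir, b"same meme bytes")
second = store_media(conn, cas_dir, b"same meme bytes")
print(first[1], second[1], len(list(cas_dir.iterdir())))
```

Both calls return the same digest, but only the first writes a file, so the pool holds one copy however many messages reference it.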


4. Incremental Synchronization

Reaper relies on the time-ordered nature of Discord Snowflake IDs (newer messages have larger IDs) to perform "Delta Backups":

  1. Scan: The system queries the messages table for the highest (newest) ID in a specific channel.
  2. Fetch: It then calls the Discord API to fetch messages after that specific ID.
  3. Merge: Only the brand-new messages are inserted into the database.

This keeps the cost of updating a backup proportional to the number of new messages rather than the size of the existing history, even for servers with millions of stored messages.
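
The scan, fetch, and merge steps can be sketched as below. The `fake_fetch_after` function is a stand-in for the Discord API call (an assumption for this example, so the sketch runs offline), and the `messages` columns are hypothetical. A real client would also need to page through results, since the API returns messages in limited batches.

```python
import sqlite3

DISCORD_EPOCH_MS = 1420070400000  # first millisecond of 2015 (UTC)


def snowflake_time_ms(snowflake):
    """Creation time in Unix ms: bits above the low 22 count ms since the Discord epoch."""
    return (snowflake >> 22) + DISCORD_EPOCH_MS


def newest_id(conn, channel_id):
    """Scan: highest stored message ID for a channel, or 0 if none."""
    row = conn.execute(
        "SELECT MAX(message_id) FROM messages WHERE channel_id = ?",
        (channel_id,),
    ).fetchone()
    return row[0] or 0


def sync_channel(conn, channel_id, fetch_after):
    """Fetch messages newer than the stored maximum and merge them in."""
    last = newest_id(conn, channel_id)
    new_messages = fetch_after(channel_id, last)
    conn.executemany(
        "INSERT OR IGNORE INTO messages (message_id, channel_id, content)"
        " VALUES (?, ?, ?)",
        [(m["id"], channel_id, m["content"]) for m in new_messages],
    )
    return len(new_messages)


# Stand-in for the remote channel history (hypothetical data).
REMOTE = [{"id": i, "content": f"msg {i}"} for i in (101, 102, 103, 104)]


def fake_fetch_after(channel_id, after_id):
    return [m for m in REMOTE if m["id"] > after_id]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " message_id INTEGER PRIMARY KEY,"
    " channel_id INTEGER NOT NULL,"
    " content    TEXT)"
)
# An earlier backup already holds the first two messages.
conn.executemany(
    "INSERT INTO messages VALUES (?, 7, ?)",
    [(101, "msg 101"), (102, "msg 102")],
)

inserted = sync_channel(conn, 7, fake_fetch_after)
inserted_again = sync_channel(conn, 7, fake_fetch_after)
print(inserted, inserted_again)
```

Comparing IDs works as a "newer than" test because the timestamp occupies the high bits of each Snowflake, which `snowflake_time_ms` makes explicit.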