Discord Reaper: Backup System Technical Specification
Discord Reaper uses a high-performance SQLite-based backup system designed to handle massive community snapshots with millions of messages, deduplicated media, and cross-session persistence.
1. Architectural Overview
The backup system has transitioned from a legacy JSON-based flat-file structure to a central SQLite Database (reaper.db) architecture. This allows for:
- Fast indexed lookups: Original message IDs map to target message IDs through a B-tree index, so each lookup is O(log n) rather than a scan of the whole history.
- Memory Efficiency: The system no longer loads massive message lists into RAM; it streams data directly from disk.
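The two properties above can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch, not Reaper's actual code: the `id_map` table, its columns, and the helper names `lookup_target` and `stream_pairs` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for reaper.db

# Hypothetical mapping table; the real reaper.db layout may differ.
conn.execute(
    "CREATE TABLE id_map ("
    " original_id INTEGER PRIMARY KEY,"
    " target_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO id_map VALUES (?, ?)",
    ((1000 + i, 9000 + i) for i in range(5)),
)


def lookup_target(conn, original_id):
    """Return the target message ID for an original ID, or None if unmapped."""
    row = conn.execute(
        "SELECT target_id FROM id_map WHERE original_id = ?", (original_id,)
    ).fetchone()
    return row[0] if row else None


def stream_pairs(conn):
    """Yield rows one at a time instead of loading the full list into RAM."""
    cursor = conn.execute(
        "SELECT original_id, target_id FROM id_map ORDER BY original_id"
    )
    for row in cursor:
        yield row
```

Because `original_id` is the primary key, the lookup uses the table's B-tree directly, and iterating over the cursor keeps only the current row in memory.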
Component Relationship
graph TD
A[Discord API] --> B[DiscordReader]
B --> C[BackupDatabase]
C --> D[reaper.db]
C --> E[Media Pool /cas/]
D --> F[Migration Shuttle]
2. The Database Schema (reaper.db)
The backup is stored in a single SQLite file, typically found in your ReaperFiles-{ServerID}/ directory. The schema is normalized to prevent redundancy.
Core Tables
- guild_profile: Stores server name, ID, description, owner, and icon/banner URLs.
- roles & permissions: Captures every custom role (colors, bits) and complex channel-specific permission overwrites.
- channels & threads: Detailed metadata including category nesting, NSFW flags, bitrates, and thread archive statuses.
- messages: The central history table. Stores sender IDs, timestamps, content, and references.
- attachments / embeds / reactions: Relational tables linked to messages for storing rich data.
- users: A deduplicated author cache (usernames, avatars, server roles).
- user_alias: Stores mapping between User IDs and their generated "Privacy Aliases" (e.g., SwiftFox).
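To make the normalized layout concrete, here is a heavily simplified sketch of how three of these tables might relate. The table names come from the list above, but every column name, type, and index here is an assumption; the real reaper.db schema is not reproduced in this document.

```python
import sqlite3

# Assumed, simplified columns; only the table names come from the spec.
SCHEMA = """
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE messages (
    id         INTEGER PRIMARY KEY,
    channel_id INTEGER NOT NULL,
    author_id  INTEGER NOT NULL REFERENCES users(id),
    created_at TEXT NOT NULL,
    content    TEXT
);
CREATE TABLE attachments (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES messages(id),
    filename   TEXT NOT NULL
);
CREATE INDEX idx_messages_channel ON messages(channel_id, id);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
conn.executescript(SCHEMA)

# The author is stored once in users and referenced by ID from messages.
conn.execute("INSERT INTO users VALUES (1, 'example_user')")
conn.execute(
    "INSERT INTO messages VALUES (10, 5, 1, '2024-01-01T00:00:00Z', 'hello')"
)
conn.execute("INSERT INTO attachments VALUES (100, 10, 'cat.png')")

rows = conn.execute(
    """
    SELECT u.username, m.content, a.filename
    FROM messages AS m
    JOIN users AS u ON u.id = m.author_id
    JOIN attachments AS a ON a.message_id = m.id
    """
).fetchall()
```

Storing each author once and joining on demand is what keeps usernames and avatars from being repeated on every message row.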
3. Media Pool & CAS Logic
Reaper implements a Content-Addressable Storage (CAS) system for all media (images, videos, stickers, avatars).
How it works:
- Hashing: When a file is downloaded, Reaper calculates its SHA-256 hash.
- Deduplication: If another message contains the exact same image (even years later or in a different channel), Reaper sees the hash already exists in the media_pool table.
- Referencing: Instead of storing a second copy, Reaper simply creates a reference to the existing file in the /cas/ directory.
Benefit: This significantly reduces the disk footprint of backups for servers where users frequently share the same memes or assets.
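The hash, deduplicate, and reference steps can be sketched as follows. This is an illustration under assumptions rather than Reaper's implementation: the `media_pool` columns (`sha256`, `size`), the `store_media` helper, and the choice to name each file after its digest are all hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_media(conn, cas_dir, data):
    """Store bytes under their SHA-256 digest; return (digest, was_new)."""
    digest = hashlib.sha256(data).hexdigest()
    known = conn.execute(
        "SELECT 1 FROM media_pool WHERE sha256 = ?", (digest,)
    ).fetchone()
    if known:
        # Duplicate content: callers reference the existing file by digest.
        return digest, False
    (cas_dir / digest).write_bytes(data)
    conn.execute(
        "INSERT INTO media_pool (sha256, size) VALUES (?, ?)",
        (digest, len(data)),
    )
    return digest, True


conn = sqlite3.connect(":memory:")  # stand-in for reaper.db
conn.execute(
    "CREATE TABLE media_pool (sha256 TEXT PRIMARY KEY, size INTEGER NOT NULL)"
)
cas_dir = Path(tempfile.mkdtemp()) / "cas"
cas_dir.mkdir()

first = store_media(conn, cas_dir, b"same meme bytes")
second = store_media(conn, cas_dir, b"same meme bytes")  # deduplicated
```

Both calls return the same digest, but only the first writes a file, so the pool holds one copy no matter how many messages reference it.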
4. Incremental Synchronization
Reaper relies on Discord Snowflake IDs, which embed a creation timestamp and therefore sort chronologically, to perform "Delta Backups":
- Scan: The system queries the messages table for the highest (newest) ID in a specific channel.
- Fetch: It then calls the Discord API to fetch messages after that specific ID.
- Merge: Only the brand-new messages are inserted into the database.
This makes keeping a backup updated a fast operation even for servers with millions of existing messages, because the work scales with the number of new messages rather than the size of the stored history.
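The scan, fetch, and merge cycle might look like the sketch below. The Discord API call is replaced by a hypothetical in-memory stand-in (`fake_fetch_after`), and the `messages` columns and the helper names `newest_id` and `sync_channel` are assumptions. The Snowflake layout used in `snowflake_time_ms` (milliseconds since 2015-01-01 UTC in the bits above bit 22) follows Discord's published ID format.

```python
import sqlite3

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z in Unix milliseconds


def snowflake_time_ms(snowflake):
    """Extract the creation time (Unix ms) from a Snowflake's upper bits."""
    return (snowflake >> 22) + DISCORD_EPOCH_MS


def newest_id(conn, channel_id):
    """Scan: highest stored message ID for a channel, or 0 if none."""
    row = conn.execute(
        "SELECT MAX(id) FROM messages WHERE channel_id = ?", (channel_id,)
    ).fetchone()
    return row[0] or 0


def sync_channel(conn, channel_id, fetch_after):
    """Fetch messages newer than the stored maximum and merge them."""
    last_id = newest_id(conn, channel_id)
    new_messages = fetch_after(channel_id, last_id)
    conn.executemany(
        "INSERT OR IGNORE INTO messages (id, channel_id, content)"
        " VALUES (?, ?, ?)",
        [(m["id"], channel_id, m["content"]) for m in new_messages],
    )
    return len(new_messages)


# Hypothetical stand-in for the Discord API's "after" message query.
REMOTE = [{"id": i << 22, "content": f"msg {i}"} for i in range(1, 6)]


def fake_fetch_after(channel_id, after_id):
    return [m for m in REMOTE if m["id"] > after_id]


conn = sqlite3.connect(":memory:")  # stand-in for reaper.db
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY,"
    " channel_id INTEGER NOT NULL,"
    " content TEXT)"
)

first_run = sync_channel(conn, 42, fake_fetch_after)   # full initial backup
REMOTE.append({"id": 6 << 22, "content": "msg 6"})
second_run = sync_channel(conn, 42, fake_fetch_after)  # only the delta
```

The second run transfers a single message because the stored maximum ID acts as a cursor; nothing older than it is requested again.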