add docs

parent c4a6e18e2b
commit 11b9230078

5 changed files with 231 additions and 375 deletions

BACKUP.md

@@ -1,375 +0,0 @@

# Discord Reaper: Backup System Technical Specification

This document provides a deep-dive into the architecture, data lifecycle, and resilience strategies of the Discord Reaper backup system.

## 1. Architectural Overview

The backup system is built on a decoupled architecture that separates the API communication layer from the business logic and I/O operations.

- **`DiscordReader` (API Provider)**: A high-level wrapper around the `discord.py` library. It handles authentication, rate limiting, and provides an asynchronous interface for fetching guild data, message history, and binary assets. It focuses on *fetching* rather than *processing*.
- **`DiscordExporter` (Orchestration & Serialization)**: The core engine that defines the export lifecycle. It consumes data from the `Reader`, transforms it into standardized schemas, and manages local filesystem operations.

### Component Interaction Diagram

```mermaid
graph TD
    A[UI / CLI] --> B[DiscordExporter]
    B --> C[DiscordReader]
    C --> D[Discord API]
    B --> E[Local Filesystem]
    B --> F[User Cache Object]
```

### File Tree Structure

```
DISCORD_BACKUP-{ServerID}/
├── server_profile/
│   ├── profile.json          # Server metadata (name, ID, icon/banner paths)
│   ├── roles.json            # All server roles (permissions, colors, positions)
│   ├── structure.json        # Full category and channel hierarchy
│   ├── assets.json           # Index of custom emojis and stickers
│   └── assets/               # Binary media files
│       ├── server_icon.png
│       ├── server_banner.png
│       ├── emoji_{name}_{id}.png
│       └── sticker_{name}_{id}.png
└── message_backup/
    ├── users/
    │   ├── user_info.json    # Deduplicated user profile cache
    │   └── avatars/          # User avatar images
    │       └── {user_id}.png
    └── {channel_id}/
        ├── messages.json     # Channel message history + metadata
        ├── attachments/      # Channel-level attachments
        │   └── {filename}-{id_last_5}.{ext}
        └── {thread_id}/      # Thread nested inside parent channel
            ├── thread_messages.json
            └── thread_attachments/
                └── {filename}-{id_last_5}.{ext}
```

---

## 2. Data Lifecycle & Serialization

### 2.1 Incremental Synchronization Algorithm

To achieve idempotency and efficiency, the system implements an incremental sync strategy using Discord's snowflake IDs.

1. **State Loading**: The `Exporter` reads the existing `{channel_id}/messages.json` (if present).
2. **Snowflake Extraction**: It extracts the `lastMessageID` from the metadata.
3. **Filtered Fetch**: It calls `fetch_message_history(after_id=last_id)`, so only messages newer than `lastMessageID` (the delta) are requested from the API.
4. **In-Memory Merge**: New messages are appended to the existing list.
5. **Atomic Write**: The updated JSON is written back to disk.

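The five steps above can be sketched as a single function. This is a minimal illustration, not the exporter's actual code: `sync_channel` and the `fetch_after` callback are hypothetical stand-ins for the `Exporter` logic and the reader's `fetch_message_history(after_id=...)`, and the temp-file-plus-`os.replace()` pattern is one common way to make step 5 atomic, assumed here rather than taken from the source.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional


def sync_channel(
    channel_dir: Path,
    fetch_after: Callable[[Optional[str]], Iterable[dict]],
) -> dict:
    """Incrementally update {channel_id}/messages.json.

    `fetch_after` is a hypothetical stand-in for the reader's
    fetch_message_history(after_id=...): given the last stored snowflake
    (or None), it returns newer message dicts, oldest first.
    """
    path = channel_dir / "messages.json"

    # 1. State loading: reuse the existing file when there is one.
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    else:
        data = {"messages": [], "lastMessageID": None}

    # 2-3. Snowflake extraction and filtered fetch: only the delta is requested.
    new_messages = list(fetch_after(data.get("lastMessageID")))

    # 4. In-memory merge.
    data["messages"].extend(new_messages)
    data["messageCount"] = len(data["messages"])
    if new_messages:
        # Snowflakes are stored as strings; compare them numerically.
        data["lastMessageID"] = max((m["messageID"] for m in new_messages), key=int)

    # 5. Atomic write: dump to a temp file in the same directory, then swap it
    # in with os.replace() so an interrupted run never leaves a half-written file.
    channel_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=channel_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, ensure_ascii=False, indent=2)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # don't leave stray temp files behind
        raise
    return data
```

Comparing with `key=int` matters: as plain strings, `"9"` sorts after `"10"`, which would pin `lastMessageID` to an older message and cause messages to be re-fetched.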
### 2.2 User Profile Deduplication (`user_info.json`)

The system avoids redundant storage of user metadata (usernames, roles, colors) by using a global `user_cache` map.

- **Key**: `userID` (Snowflake).
- **Policy**: Users are added to the cache only on their first appearance in any channel's history.
- **Avatar Persistence**: User avatars are stored in `message_backup/users/avatars/` and referenced by relative paths in the JSON schemas.

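The first-appearance policy maps naturally onto a dictionary keyed by `userID`. The sketch below is illustrative only: `register_author` and `export_user_info` are hypothetical helpers, and each `author` dict is assumed to already be shaped like a `user_info.json` entry (Section 5.7).

```python
from typing import Dict, List


def register_author(user_cache: Dict[str, dict], author: dict) -> str:
    """Record an author on first appearance and return the userID to store.

    Messages then keep only this `userID`, not the full profile.
    """
    # setdefault inserts only when the key is missing, so the first
    # profile seen for a user is kept and later copies are ignored.
    user_cache.setdefault(author["userID"], author)
    return author["userID"]


def export_user_info(user_cache: Dict[str, dict]) -> List[dict]:
    # user_info.json is written as a flat array of user objects.
    return list(user_cache.values())
```

Because the first sighting wins, a user who was renamed keeps the earliest profile encountered during the run; a "latest wins" policy would use plain assignment (`user_cache[uid] = author`) instead.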
---

## 3. Special Channel Type Specifications

### 3.1 Forum Channels & Threads

Forums present a hierarchical challenge where the "starter message" and the "conversation" exist in separate contexts.

- **Forum Index (`{forum_id}/messages.json`)**: Contains an enriched list of "starter messages" representing each thread. These entries include thread titles, applied tags, and total attachment stats (summed from the entire thread).
- **Thread Persistence**: All threads nest inside their parent channel directory:
  - **Forum Threads**: `message_backup/{forum_id}/{thread_id}/thread_messages.json`
  - **Regular Threads**: `message_backup/{parent_channel_id}/{thread_id}/thread_messages.json`
- **Starter Identification**: The system uses `thread.history(limit=1, after=snowflake(thread_id - 1))` to reliably capture the first post even if it has been edited or pinned.

---

## 4. Resilience & Error Handling

### 4.1 Permission Resilience (403 Forbidden)

The system is designed to "fail-soft" when encountering restricted content:

- **Server Level**: If the bot lacks `view_channel` or `read_message_history` globally, the backup aborts with a clear error.
- **Channel Level**: If a specific channel is restricted, the error is logged, and the system proceeds to the next channel to ensure a partial backup is still completed.
- **Asset Level**: If an emoji or sticker cannot be downloaded due to permissions, the metadata is preserved with a `null` local path.

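The server-level and channel-level rules above can be sketched as follows. This is an assumption-laden illustration, not the exporter's code: `PermissionDenied` stands in for `discord.Forbidden` so the example runs without `discord.py`, and `require_guild_access` / `backup_channels` are hypothetical names. The bit values are Discord's `VIEW_CHANNEL` (`1 << 10`) and `READ_MESSAGE_HISTORY` (`1 << 16`) permission flags.

```python
import logging
from typing import Callable, Dict, Iterable, List

log = logging.getLogger("reaper.backup")

# Discord permission bits (from Discord's permission flag table).
VIEW_CHANNEL = 1 << 10
READ_MESSAGE_HISTORY = 1 << 16


class PermissionDenied(Exception):
    """Hypothetical stand-in for discord.Forbidden (HTTP 403)."""


def require_guild_access(permission_bits: int) -> None:
    """Server level: abort the whole backup if core read access is missing."""
    required = {"view_channel": VIEW_CHANNEL, "read_message_history": READ_MESSAGE_HISTORY}
    missing = [name for name, bit in required.items() if not (permission_bits & bit)]
    if missing:
        raise RuntimeError("Backup aborted: bot lacks " + ", ".join(missing))


def backup_channels(
    channel_ids: Iterable[int], backup_one: Callable[[int], None]
) -> Dict[str, List[int]]:
    """Channel level: log and skip restricted channels instead of aborting."""
    result: Dict[str, List[int]] = {"done": [], "skipped": []}
    for channel_id in channel_ids:
        try:
            backup_one(channel_id)
        except PermissionDenied as exc:
            # Fail soft: record the problem, keep going with the next channel.
            log.warning("Skipping channel %s: %s", channel_id, exc)
            result["skipped"].append(channel_id)
        else:
            result["done"].append(channel_id)
    return result
```

The guild check is deliberately naive: in Discord, the Administrator bit (`1 << 3`) implies every permission, so a real check should treat it as sufficient on its own.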
### 4.2 Lottie Sticker Workaround

Discord's Lottie stickers (format: 3) are not supported by standard `discord.py` save methods. The system implements a bypass:

1. Extracts the internal `aiohttp` session from the client: `client.http._HTTPClient__session`.
2. Performs a direct `GET` request to the sticker URL.
3. Streams the raw byte data directly to a `.json` file locally.

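The bypass can be sketched with standard `aiohttp` response calls (`raise_for_status()` and `content.iter_chunked()`). This is a minimal sketch under assumptions: `sticker_path` and `download_raw` are hypothetical helpers, the extension table follows Discord's sticker `format_type` values (1 PNG, 2 APNG, 3 LOTTIE, 4 GIF), and `session` can be any `aiohttp.ClientSession`, including the private one the steps above pull from `client.http` (a name-mangled attribute that may change between `discord.py` releases).

```python
from pathlib import Path

# Discord sticker format_type -> file extension.
# APNG keeps .png; Lottie stickers are JSON animation documents.
STICKER_EXTENSIONS = {1: "png", 2: "png", 3: "json", 4: "gif"}


def sticker_path(assets_dir: Path, name: str, sticker_id: int, format_type: int) -> Path:
    """Pick a local filename; unknown formats fall back to .bin."""
    ext = STICKER_EXTENSIONS.get(format_type, "bin")
    return assets_dir / f"sticker_{name}_{sticker_id}.{ext}"


async def download_raw(session, url: str, dest: Path, chunk_size: int = 64 * 1024) -> int:
    """Stream the response body to `dest` unchanged and return bytes written.

    `session` is assumed to behave like an aiohttp.ClientSession.
    """
    written = 0
    async with session.get(url) as resp:
        resp.raise_for_status()  # surface 403/404 instead of saving an error page
        with dest.open("wb") as fh:
            async for chunk in resp.content.iter_chunked(chunk_size):
                fh.write(chunk)
                written += len(chunk)
    return written
```

Writing the bytes without decoding them is the point of step 3: the Lottie JSON is stored exactly as served, and any conversion (for example to `.gif` or `.webp`) can happen later.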
---

## 5. Technical Schemas

### 5.1 Message Object (`_format_message`)

The internal representation of a message focuses on portability:

| Field | Type | Description |
| :--- | :--- | :--- |
| `messageID` | `String` | Original Discord Snowflake |
| `type` | `String` | Normalized type (Text, ThreadStarter, Forward, etc.) |
| `timestamp` | `ISO8601` | Created date/time |
| `isPinned` | `Boolean` | Pin status |
| `content` | `String` | Raw markdown content (or snapshot content for forwards) |
| `userID` | `String` | Reference to `user_info.json` |
| `attachments` | `Array` | List of local file references and metadata |
| `embeds` | `Array` | Raw Discord-formatted embed objects |
| `stickers` | `Array` | List of Message Sticker objects (see below) |
| `reactions` | `Array` | List of Reaction objects |

#### Message Sticker Object

| Field | Type | Description |
| :--- | :--- | :--- |
| `id` | `String` | Sticker Snowflake ID |
| `name` | `String` | Sticker name |
| `format` | `String` | File format (PNG, APNG, LOTTIE, GIF) |
| `localPath` | `String` | Relative path to local file in `{channel_id}/attachments/` |

#### Reaction Object

| Field | Type | Description |
| :--- | :--- | :--- |
| `emoji` | `String` | String representation (`unicode` or `name:id`) |
| `count` | `Integer` | Total count of this reaction |

### 5.2 Asset Naming Logic

To prevent filename collisions (e.g., multiple files named `image.png`), the system uses a suffixing strategy:

`{filename_stem}-{snowflake_last_5}.{ext}`

where `snowflake_last_5` is the last five digits of the attachment's snowflake ID.

Example: an attachment named `sunset.png` whose ID ends in `54321` is saved as `sunset-54321.png`.

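The naming template translates directly into `pathlib` terms, where `{filename_stem}` is `PurePath.stem` and `.{ext}` is `PurePath.suffix`. The helper name `suffixed_name` is hypothetical, and how the exporter treats multi-dot or extension-less names is not specified above, so the behavior for those cases is an assumption.

```python
from pathlib import PurePath


def suffixed_name(original_name: str, snowflake: int) -> str:
    """Build `{filename_stem}-{snowflake_last_5}.{ext}`.

    Assumption: only the final extension counts as {ext}, and names
    without an extension simply get no suffix after the ID digits.
    """
    p = PurePath(original_name)
    last5 = str(snowflake)[-5:]
    return f"{p.stem}-{last5}{p.suffix}"
```

The suffix makes collisions unlikely rather than impossible: two files with the same stem whose IDs share the last five digits would still map to the same name in the same folder.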
### 5.3 `profile.json` Specification

Path: `server_profile/profile.json`

| Field | Type | Description |
| :--- | :--- | :--- |
| `name` | `String` | Original Discord guild name |
| `id` | `String` | Guild Snowflake ID |
| `icon` | `String` | Relative path to local guild icon in `server_profile/assets/` |
| `banner` | `String` | Relative path to local guild banner in `server_profile/assets/` |
| `last_backup` | `ISO8601` | Timestamp of the last successful backup run |
| `ignore_channels` | `Array` | List of channel Snowflakes explicitly excluded from backup |

### 5.4 `roles.json` Specification (Array of objects)

Path: `server_profile/roles.json`

| Field | Type | Description |
| :--- | :--- | :--- |
| `id` | `String` | Role Snowflake ID |
| `name` | `String` | Role name |
| `color` | `String` | Hex-string representation of role color (e.g. `"#ffffff"`) |
| `position` | `Integer` | Vertical position in the hierarchy (0 is bottom) |
| `permissions` | `Integer` | Bitwise integer representing the role's Discord permissions |
| `hoist` | `Boolean` | Whether the role is displayed separately in the sidebar |
| `mentionable` | `Boolean` | Whether the role can be mentioned |

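A reader usually needs the hex color as an integer and the permission bitfield as testable flags. The sketch below decodes one `roles.json` entry without depending on `discord.py`. `RoleRecord` and `parse_role` are hypothetical names, and the three permission constants are Discord's documented bit positions. In a `discord.py` wrapper, the same integers could feed `discord.Color(value)` and `discord.Permissions(value)`.

```python
from dataclasses import dataclass

# A few Discord permission bits (from Discord's permission flag table).
VIEW_CHANNEL = 1 << 10
SEND_MESSAGES = 1 << 11
READ_MESSAGE_HISTORY = 1 << 16


@dataclass
class RoleRecord:
    id: int
    name: str
    color: int        # 0xRRGGBB
    position: int
    permissions: int  # raw permission bitfield
    hoist: bool
    mentionable: bool

    def has(self, bit: int) -> bool:
        # A permission is granted when its bit is set in the bitfield.
        return (self.permissions & bit) == bit


def parse_role(raw: dict) -> RoleRecord:
    return RoleRecord(
        id=int(raw["id"]),                        # snowflakes are stored as strings
        name=raw["name"],
        color=int(raw["color"].lstrip("#"), 16),  # "#ffffff" -> 0xFFFFFF
        position=int(raw["position"]),
        permissions=int(raw["permissions"]),      # tolerate int or numeric string
        hoist=bool(raw["hoist"]),
        mentionable=bool(raw["mentionable"]),
    )
```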
### 5.5 `assets.json` Specification

Path: `server_profile/assets.json`

Contains two primary arrays: `emojis` and `stickers`.

#### Emoji Object

| Field | Type | Description |
| :--- | :--- | :--- |
| `id` | `String` | Emoji Snowflake ID |
| `name` | `String` | Emoji name (without colons) |
| `animated` | `Boolean` | True if the emoji is a GIF |
| `filename` | `String` | Filename within `server_profile/assets/` |

#### Sticker Object

| Field | Type | Description |
| :--- | :--- | :--- |
| `id` | `String` | Sticker Snowflake ID |
| `name` | `String` | Sticker name |
| `filename` | `String` | Filename within `server_profile/assets/` |

### 5.6 `structure.json` Specification (Array of Category objects)

Path: `server_profile/structure.json`

#### Category Object

| Field | Type | Description |
| :--- | :--- | :--- |
| `type` | `String` | Always `"category"` |
| `id` | `String` | Category Snowflake ID (or `"uncategorized"`) |
| `name` | `String` | Category name |
| `position` | `Integer` | Vertical position in hierarchy |
| `channels` | `Array` | List of Channel objects (see below) |

#### Channel Object

| Field | Type | Description |
| :--- | :--- | :--- |
| `id` | `String` | Channel Snowflake ID |
| `name` | `String` | Channel name |
| `type` | `String` | "text", "voice", "forum", "news", or "thread" |
| `position` | `Integer` | Vertical position within the category |
| `topic` | `String` | Channel description/topic (null if empty) |
| `nsfw` | `Boolean` | True if marked Restricted/NSFW |
| `available_tags` | `Array` | List of Forum Tag objects (see below) |

#### Forum Tag Object

| Field | Type | Description |
| :--- | :--- | :--- |
| `id` | `String` | Tag Snowflake ID |
| `name` | `String` | Tag display name |
| `moderated` | `Boolean` | True if restricted to moderators |
| `emoji_id` | `String` | ID of the tag's emoji (null if unicode/none) |
| `emoji_name` | `String` | Name of the tag's emoji |

### 5.7 `user_info.json` Specification (Array of User objects)

Path: `message_backup/users/user_info.json`

| Field | Type | Description |
| :--- | :--- | :--- |
| `userID` | `String` | User Snowflake ID |
| `username` | `String` | Current global username |
| `userNickname` | `String` | Server-specific nickname (display name) |
| `userColor` | `String` | Role-derived color for the user |
| `userIsBot` | `Boolean` | True if the account is a bot |
| `userRoles` | `Array` | List of role snippets (name, id, color, position) |
| `userAvatar` | `String` | Relative path to local avatar in `users/avatars/` |

### 5.8 Channel History JSON Specification

Path: `message_backup/{channel_id}/messages.json`

This file contains the full history of a channel along with synchronization metadata.

| Field | Type | Description |
| :--- | :--- | :--- |
| `channelName` | `String` | Human-readable name of the channel |
| `channelID` | `String` | Channel Snowflake ID |
| `channelType` | `String` | "Text", "Thread", "News", or "Forum" |
| `messageCount` | `Integer` | Total number of messages stored in the `messages` array |
| `threadCount` | `Integer` | (If Parent) Count of threads associated with this channel |
| `lastMessageID` | `String` | ID of the most recent message (used for incremental sync) |
| `totalAttachmentSizeBytes` | `Integer` | Summed size of all attachments for this channel |
| `numberOfAttachments` | `Integer` | Total count of attachments |
| `lastBackup` | `ISO8601` | Timestamp of last message fetch |
| `messages` | `Array` | The message objects (see Section 5.1) |
| `parentID` | `String` | (If Thread) Snowflake of the parent channel |

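The summary fields in this table can be recomputed from the `messages` array, which is handy for a reader that wants to validate a file or for rebuilding metadata after manual edits. This is a sketch, not the exporter's own computation: `summarize_channel` is a hypothetical helper, and it assumes each attachment object carries a `fileSizeBytes` field, as used in Section 8.6.

```python
from datetime import datetime, timezone
from typing import List


def summarize_channel(messages: List[dict]) -> dict:
    """Recompute the channel-level summary fields from a `messages` array."""
    attachments = [a for m in messages for a in m.get("attachments", [])]
    return {
        "messageCount": len(messages),
        # Snowflakes are strings in the JSON; compare them as integers.
        "lastMessageID": max((m["messageID"] for m in messages), key=int, default=None),
        "totalAttachmentSizeBytes": sum(a.get("fileSizeBytes", 0) for a in attachments),
        "numberOfAttachments": len(attachments),
        "lastBackup": datetime.now(timezone.utc).isoformat(),
    }
```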
### 5.9 Thread History JSON Specification

Path: `message_backup/{channel_id}/{thread_id}/thread_messages.json`

Same schema as Section 5.8, with `channelType` set to `"Thread"` and `parentID` always present.

---

## 7. Backup Reader Implementation Guide

This section is a technical manual for developers building third-party tools (viewers, search engines, or analytics) to consume Discord Reaper backups.

### 7.1 Entry Point Discovery

A reader should start by identifying the backup root directory (prefixed with `DISCORD_BACKUP-`).

1. **Parse `server_profile/profile.json`**: Extract the server name, ID, and assets (icon/banner).
2. **Load `server_profile/structure.json`**: This defines the navigation tree for your UI.
   - Iterate through categories.
   - Map channels to their respective types (text, voice, forum).
   - Store the `position` to preserve the original visual order.

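The discovery steps above fit in a few lines of standard-library Python. `find_backup_roots` and `load_navigation` are hypothetical helper names; the only layout assumptions are the directory names and fields documented in Sections 5.3 and 5.6.

```python
import json
from pathlib import Path
from typing import List, Tuple


def find_backup_roots(base: Path) -> List[Path]:
    """Return directories under `base` named DISCORD_BACKUP-{ServerID}."""
    return sorted(
        p for p in base.iterdir() if p.is_dir() and p.name.startswith("DISCORD_BACKUP-")
    )


def load_navigation(root: Path) -> Tuple[dict, List[dict]]:
    """Load profile.json and structure.json, sorted into display order."""
    profile_dir = root / "server_profile"
    profile = json.loads((profile_dir / "profile.json").read_text(encoding="utf-8"))
    categories = json.loads((profile_dir / "structure.json").read_text(encoding="utf-8"))
    # Restore the original visual order at both levels of the tree.
    categories.sort(key=lambda c: c["position"])
    for category in categories:
        category["channels"].sort(key=lambda ch: ch["position"])
    return profile, categories
```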
### 7.2 Relational Data Mapping

The backup data is normalized to minimize duplication. A reader must implement the following resolution logic:

- **User Resolution**: When parsing a message in `{channel_id}/messages.json`, cross-reference its `userID` against the `userID` field of the entries in `message_backup/users/user_info.json`.
- **Role Resolution**: Take the `id` of each snippet in the user object's `userRoles` array and resolve it against the role metadata in `server_profile/roles.json` to get colors and names.
- **Static Asset Resolution**:
  - **Server Assets**: Prepend `server_profile/assets/` to filenames found in `server_profile/assets.json`.
  - **User Avatars**: Resolve `userAvatar` paths found in `user_info.json` (pointing to `users/avatars/`).

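Because `user_info.json` and `roles.json` are stored as arrays, a reader typically indexes them once and then joins per message. The sketch below assumes the field names from Sections 5.1, 5.4 and 5.7; `index_by` and `resolve_author`, the fallback name for missing users, and the highest-role-first ordering are illustrative choices, not documented behavior.

```python
from typing import Dict, List, Optional


def index_by(items: List[dict], key: str) -> Dict[str, dict]:
    """Turn a JSON array (user_info.json, roles.json) into a lookup table."""
    return {item[key]: item for item in items}


def resolve_author(message: dict, users: Dict[str, dict], roles: Dict[str, dict]) -> dict:
    """Join a message's userID to its user entry and full role records."""
    user: Optional[dict] = users.get(message["userID"])
    if user is None:
        # Degrade gracefully if the cache has no entry for this author.
        return {"name": f"Unknown user {message['userID']}", "avatar": None, "roles": []}
    # userRoles holds snippets; their `id` is the key into roles.json.
    full_roles = [roles[r["id"]] for r in user.get("userRoles", []) if r["id"] in roles]
    full_roles.sort(key=lambda r: r["position"], reverse=True)  # highest role first
    return {
        "name": user.get("userNickname") or user["username"],
        "avatar": user.get("userAvatar"),  # relative path, see Section 5.7
        "roles": full_roles,
    }
```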
### 7.3 Message Rendering Logic

When rendering the `messages` array from a channel JSON:

| Feature | Reader Implementation Logic |
| :--- | :--- |
| **Markdown** | Content is raw Discord markdown. Use a library like `markdown-it` with discord-specific plugins. |
| **Attachments** | Resolve `url` field (`{channel_id}/attachments/{filename}`) relative to the `message_backup/` directory. |
| **Emojis/Stickers** | If a message contains custom emojis/stickers, resolve their metadata via `server_profile/assets.json`. |
| **Replies** | Use the `reference` object to find the target `messageId`. Note: The target might be in the same file or a different channel/thread. |

### 7.4 Thread & Forum Reconstruction

Reconstructing the hierarchy requires specific pointer logic:

1. **Forums**:
   - Read `message_backup/{forum_id}/messages.json`.
   - Each message in this file is a `ThreadStarter`.
   - The `messageID` of the starter message *is usually* the same as the `thread_id`.
   - To load the full thread, open `message_backup/{forum_id}/{thread_id}/thread_messages.json`.
2. **Regular Threads**:
   - Discoverable via the top-level `parentID` field of each `thread_messages.json` file, or by scanning for `thread_messages.json` inside channel directories.
   - Match the `thread.id` in a `ThreadStarter` message to the respective subdirectory.

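The directory-scanning approach above can be sketched as a glob over `message_backup/{parent_id}/{thread_id}/thread_messages.json`, grouping threads by their file-level `parentID` (Sections 5.8 and 5.9). `discover_threads` is a hypothetical helper; falling back to directory names when a field is missing is an assumption for robustness, not documented behavior. A forum's starter messages can then be joined to these threads by `messageID` and thread ID, keeping in mind that they are only *usually* equal.

```python
import json
from pathlib import Path
from typing import Dict, List


def discover_threads(message_backup: Path) -> Dict[str, List[dict]]:
    """Map each parent channel ID to the threads stored beneath it."""
    threads: Dict[str, List[dict]] = {}
    for path in sorted(message_backup.glob("*/*/thread_messages.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Prefer the file-level fields; fall back to the directory layout.
        parent_id = data.get("parentID") or path.parent.parent.name
        threads.setdefault(parent_id, []).append({
            "threadID": data.get("channelID") or path.parent.name,
            "name": data.get("channelName"),
            "messageCount": data.get("messageCount", len(data.get("messages", []))),
            "path": path,
        })
    return threads
```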
---

## 8. Discord.py Model Hydration Guide

If you are building a `discord.py` API-compatible wrapper to read these backups directly into familiar Discord objects, here is the explicit property mapping from the schema to the standard `discord.py` object attributes.

### 8.1 Base Server (Guild)

File: `server_profile/profile.json` & `server_profile/roles.json` & `server_profile/structure.json`

- **`discord.Guild`**:
  - `id`: Cast `id` (str) to `int`.
  - `name`: Mapped directly from `name`.
  - `icon` / `banner`: Represented as `discord.Asset` objects. Use the local file paths from `icon` / `banner` as the asset URL/filepath.
  - `roles`: Hydrated from `server_profile/roles.json`.
  - `channels` / `categories`: Hydrated from `server_profile/structure.json`.

### 8.2 Roles (`discord.Role`)

File: `server_profile/roles.json`

- `id`: Cast `id` to `int`.
- `name`: Mapped directly.
- `color`: Parse the hex string to an integer (e.g. `int("#ffffff"[1:], 16)`) and pass it to `discord.Color(value)`.
- `position`: Mapped directly.
- `permissions`: Initialize `discord.Permissions(int(permissions))`, passing the bitfield as the positional argument.
- `hoist`: Mapped directly to boolean.
- `mentionable`: Mapped directly to boolean.

### 8.3 Users & Members (`discord.Member` / `discord.User`)

File: `message_backup/users/user_info.json`

- `id`: Cast `userID` to `int`.
- `name`: Mapped from `username`.
- `display_name`: Mapped from `userNickname`.
- `bot`: Mapped from `userIsBot`.
- `color`: Parse `userColor` string to `discord.Color`.
- `roles`: List of hydrated `discord.Role` objects via matching `id`s from the `userRoles` array.
- `avatar`: Mocked `discord.Asset` using the `userAvatar` local path.

### 8.4 Channels (`discord.TextChannel`, `discord.CategoryChannel`, `discord.ForumChannel`)

File: `server_profile/structure.json`

- Iterate over the top-level array (Categories):
  - **`discord.CategoryChannel`**:
    - `id`: Cast `id` to `int` (the `"uncategorized"` pseudo-category from Section 5.6 has no numeric ID, so handle it separately).
    - `name`: Mapped directly.
    - `position`: Mapped directly.
  - Iterate over the nested `channels` array:
    - **`discord.abc.GuildChannel` classes**:
      - `id`: Cast `id` to `int`.
      - `name`: Mapped directly.
      - `position`: Mapped directly.
      - `type`: Match the `type` string back to the `discord.ChannelType` enum.
      - `category_id`: Inherited from the parent category block.
      - `topic`: Mapped directly (if applicable).
      - `nsfw`: Mapped directly to boolean.

### 8.5 Messages (`discord.Message`)

File: `message_backup/{channel_id}/messages.json` (Iterating the `messages` array)

- `id`: Cast `messageID` to `int`.
- `type`: Map the string `type` (e.g., "Default", "Reply") to `discord.MessageType`.
- `created_at`: Parse `timestamp` (ISO-8601 string) into a timezone-aware `datetime` object.
- `pinned`: Mapped from `isPinned`.
- `content`: Mapped from `content`.
- `author`: Resolve the `userID` against the loaded `discord.Member` mocks.
- `embeds`: Instantiate using `discord.Embed.from_dict(embed_dict)` directly on the elements of the `embeds` array.
- **Reference (Replies)**:
  - If `reference` exists, hydrate a `discord.MessageReference`.
  - `message_id`: Cast `reference.messageId` to `int`.
  - `channel_id`: Cast `reference.channelId` to `int`.

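For `created_at`, the stored ISO-8601 string can be parsed with the standard library, and the `messageID` snowflake offers an independent cross-check, since its top 42 bits are milliseconds since Discord's epoch (2015-01-01T00:00:00Z). `discord.py` ships `discord.utils.snowflake_time` for the same purpose; the dependency-free helpers below are a sketch, and treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

DISCORD_EPOCH = datetime(2015, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO-8601 `timestamp` into a timezone-aware datetime."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)
    # Assumption: naive values are UTC, not the reader's local time.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def snowflake_time(snowflake: Union[int, str]) -> datetime:
    """Creation time encoded in a snowflake (bits 22+ = ms since epoch)."""
    return DISCORD_EPOCH + timedelta(milliseconds=int(snowflake) >> 22)
```

Using `timedelta` keeps the millisecond arithmetic exact instead of round-tripping through a float Unix timestamp.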
### 8.6 Attachments (`discord.Attachment`)

Nested within Message objects.

- `id`: Cast `id` to `int`.
- `filename`: Mapped from `fileName`.
- `size`: Mapped from `fileSizeBytes`.
- `url` / `proxy_url`: Point to the local relative path (`{channel_id}/attachments/{resolved_filename}`).

### 8.7 Reactions (`discord.Reaction` & `discord.PartialEmoji`)

Nested within Message objects.

- `count`: Mapped from `count`.
- `emoji`: Parse the `emoji` string. If it is a custom emoji (contains a `:`), split it to mock a `discord.PartialEmoji(name=..., id=...)`. Otherwise, treat it as a standard Unicode emoji string.

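The split described above is safest when it checks that the part after the last `:` is really a numeric ID. `EmojiRef` and `parse_reaction_emoji` are hypothetical names; the sketch assumes the stored format is exactly `name:id` for custom emoji, as Section 5.1 states.

```python
from typing import NamedTuple, Optional


class EmojiRef(NamedTuple):
    name: str
    id: Optional[int]  # None for standard Unicode emoji

    @property
    def is_custom(self) -> bool:
        return self.id is not None


def parse_reaction_emoji(value: str) -> EmojiRef:
    """Split a stored reaction emoji ("name:id" or a Unicode string)."""
    name, sep, tail = value.rpartition(":")
    # Only treat it as custom when the part after the last ":" is a numeric ID.
    if sep and tail.isdecimal():
        return EmojiRef(name=name, id=int(tail))
    return EmojiRef(name=value, id=None)
```

In a `discord.py` wrapper, a custom result could then be passed on as `discord.PartialEmoji(name=ref.name, id=ref.id)`, while Unicode results can be used as plain strings.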
docs/backup-specs.md (Normal file)

@@ -0,0 +1,62 @@

# Discord Reaper: Backup System Technical Specification

Discord Reaper uses a high-performance **SQLite-based** backup system designed to handle massive community snapshots with millions of messages, deduplicated media, and cross-session persistence.

---

## 1. Architectural Overview

The backup system has transitioned from a legacy JSON-based flat-file structure to a central **SQLite Database** (`reaper.db`) architecture. This allows for:

- **Indexed lookups**: Fast mapping of original message IDs to target message IDs. SQLite indexes are B-trees, so each lookup is O(log n) rather than strictly O(1), which is still near-instant at this scale.
- **Memory Efficiency**: The system no longer loads massive message lists into RAM; it streams data directly from disk.

### Component Relationship

```mermaid
graph TD
    A[Discord API] --> B[DiscordReader]
    B --> C[BackupDatabase]
    C --> D[reaper.db]
    C --> E["Media Pool /cas/"]
    D --> F[Migration Shuttle]
```

---

## 2. The Database Schema (`reaper.db`)

The backup is stored in a single SQLite file, typically found in your `ReaperFiles-{ServerID}/` directory. The schema is normalized to prevent redundancy.

### Core Tables

* **`guild_profile`**: Stores server name, ID, description, owner, and icon/banner URLs.
* **`roles` & `permissions`**: Captures every custom role (colors, permission bits) and complex channel-specific permission overwrites.
* **`channels` & `threads`**: Detailed metadata including category nesting, NSFW flags, bitrates, and thread archive statuses.
* **`messages`**: The central history table. Stores sender IDs, timestamps, content, and references.
* **`attachments` / `embeds` / `reactions`**: Relational tables linked to `messages` for storing rich data.
* **`users`**: A deduplicated author cache (usernames, avatars, server roles).
* **`user_alias`**: Stores the mapping between user IDs and their generated "Privacy Aliases" (e.g., `SwiftFox`).

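To make the normalization concrete, here is a trimmed-down schema using Python's built-in `sqlite3`. Only the table names `users`, `messages` and `reactions` come from the list above; every column, index and constraint is an illustrative assumption, not the real `reaper.db` DDL. Snowflakes fit in SQLite's signed 64-bit `INTEGER`, which also keeps `MAX(id)` and range queries numeric.

```python
import sqlite3

# Hypothetical, trimmed-down DDL: table names follow the list above,
# but all column names here are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id        INTEGER PRIMARY KEY,   -- Discord user snowflake
    username  TEXT NOT NULL,
    avatar    TEXT
);
CREATE TABLE IF NOT EXISTS messages (
    id          INTEGER PRIMARY KEY, -- Discord message snowflake
    channel_id  INTEGER NOT NULL,
    author_id   INTEGER NOT NULL REFERENCES users(id),
    created_at  TEXT NOT NULL,       -- ISO-8601
    content     TEXT,
    reply_to    INTEGER              -- referenced message snowflake, if any
);
CREATE INDEX IF NOT EXISTS idx_messages_channel ON messages(channel_id, id);
CREATE TABLE IF NOT EXISTS reactions (
    message_id  INTEGER NOT NULL REFERENCES messages(id),
    emoji       TEXT NOT NULL,
    count       INTEGER NOT NULL,
    PRIMARY KEY (message_id, emoji)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn
```

Authors are stored once in `users` and referenced by ID, so a username or avatar change touches one row instead of every message.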
---

## 3. Media Pool & CAS Logic

Reaper implements a **Content-Addressable Storage (CAS)** system for all media (images, videos, stickers, avatars).

### How it works

1. **Hashing**: When a file is downloaded, Reaper calculates its **SHA-256 hash**.
2. **Deduplication**: If another message contains the exact same image (even years later or in a different channel), Reaper sees that the hash already exists in the `media_pool` table.
3. **Referencing**: Instead of storing a second copy, Reaper simply creates a reference to the existing file in the `/cas/` directory.

**Benefit**: This significantly reduces the disk footprint of backups for servers where users frequently share the same memes or assets.

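The hash, check and reference cycle can be sketched with `hashlib` and `sqlite3`. This is not Reaper's implementation: `store_media`, the `media_pool` columns, and the two-character sharding under the CAS directory are all assumptions for illustration. An attachment row would then store the returned hash rather than its own copy of the bytes.

```python
import hashlib
import sqlite3
from pathlib import Path


def store_media(conn: sqlite3.Connection, cas_dir: Path, data: bytes, ext: str) -> str:
    """Store `data` once under cas_dir, keyed by its SHA-256; return the hash."""
    digest = hashlib.sha256(data).hexdigest()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS media_pool ("
        "sha256 TEXT PRIMARY KEY, path TEXT NOT NULL, size INTEGER NOT NULL)"
    )
    if conn.execute("SELECT 1 FROM media_pool WHERE sha256 = ?", (digest,)).fetchone():
        return digest  # already pooled: callers just reference the hash
    # Shard by the first two hex chars so no single directory grows huge.
    path = cas_dir / digest[:2] / f"{digest}.{ext}"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    conn.execute(
        "INSERT INTO media_pool VALUES (?, ?, ?)",
        (digest, str(path.relative_to(cas_dir)), len(data)),
    )
    conn.commit()
    return digest
```

Note that the file still has to be downloaded once to be hashed; the saving is in disk space, not bandwidth.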
---

## 4. Incremental Synchronization

Reaper uses **Snowflake ID** ordering to perform "Delta Backups". Because snowflake IDs increase with creation time, the highest stored ID marks the newest message already backed up:

1. **Scan**: The system queries the `messages` table for the highest (newest) ID in a specific channel.
2. **Fetch**: It then calls the Discord API to fetch messages `after` that specific ID.
3. **Merge**: Only the brand-new messages are inserted into the database.

This makes "keeping a backup updated" an extremely fast operation, even for servers with millions of existing messages.

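The scan, fetch and merge steps map onto one `MAX()` query plus a transactional bulk insert. In this sketch, `newest_message_id` and `delta_sync` are hypothetical names, `fetch_after` stands in for the Discord API call with an `after` bound, and the minimal `messages` columns are assumptions. `INSERT OR IGNORE` is an added safety net so that overlapping batches do not create duplicates.

```python
import sqlite3
from typing import Callable, Iterable, Optional, Tuple


def newest_message_id(conn: sqlite3.Connection, channel_id: int) -> Optional[int]:
    # MAX() is numeric only if IDs are stored as INTEGER; TEXT would compare lexicographically.
    row = conn.execute("SELECT MAX(id) FROM messages WHERE channel_id = ?", (channel_id,)).fetchone()
    return row[0]


def delta_sync(
    conn: sqlite3.Connection,
    channel_id: int,
    fetch_after: Callable[[Optional[int]], Iterable[Tuple[int, str]]],
) -> int:
    """Insert only messages newer than the stored maximum; return rows added.

    `fetch_after` yields (message_id, content) pairs newer than its argument.
    """
    last_id = newest_message_id(conn, channel_id)                 # 1. Scan
    rows = [(mid, channel_id, text) for mid, text in fetch_after(last_id)]  # 2. Fetch
    if not rows:
        return 0
    before = conn.total_changes
    with conn:  # 3. Merge in one transaction: all rows commit together or not at all
        conn.executemany(
            "INSERT OR IGNORE INTO messages (id, channel_id, content) VALUES (?, ?, ?)",
            rows,
        )
    return conn.total_changes - before
```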
---

docs/faq.md (Normal file)

@@ -0,0 +1,47 @@

# DiscoReaper - Help & FAQ

### Ctrl+V doesn't work for pasting the tokens?

- Try pasting with `Ctrl+Shift+V`; this is also the default paste shortcut in most Linux terminals.
- Windows 10 is known to be problematic here; it seems to use a separate clipboard for the terminal.
- As a workaround, you can paste the tokens directly into the **config.yaml** file inside the ReaperFiles-XXX folder:

```yaml
discord_bot_token: YOUR_DISCORD_BOT_TOKEN
fluxer_bot_token: YOUR_FLUXER_BOT_TOKEN
stoat_bot_token: YOUR_STOAT_BOT_TOKEN
```

### Messages are empty and only timestamps are migrated?

- The bot might be missing the privileged **Message Content Intent**.
- **Enable it** in the Discord Developer Portal under the **Bot** tab.

### Why are some Discord channels missing?

- If the missing channel has any custom permission overrides, **add the bot role manually** to the **channel or its parent category**. This allows the bot to access it.
- The bot role may not have access to the private channels in your Discord server, even when you give the bot role the Read Messages & View Channels permissions.

---

### Can I use the tool to delete messages in the Discord server?

- **No.** The tool operates using the Discord Bot API with **read-only permissions** (view and read access).
- It does not perform any write actions, so it cannot modify, delete, or change anything in your Discord server.

### Can I migrate personal messages (DMs)?

- **No.** The tool uses the Discord Bot API, which does not grant access to personal messages (DMs), so they cannot be migrated or exported.

### Which platforms are supported?

- **Fluxer** and **Stoat** are the supported target platforms.
- Eligibility criteria for new platforms:
  - Open Source
  - Self-Hostable
  - Bot API

---

### Where do I get help?

Ping me in the [Reaper Community](https://fluxer.gg/9KxDP8WH) on Fluxer.

Provide the following details when asking for help:

- Your operating system
- Reaper version
- A brief description of your issue

docs/features.md (Normal file)

@@ -0,0 +1,85 @@

# Extensive Feature List

Here's a complete list of everything the tool can handle, along with any specific caveats or workarounds.

- 🟩 Fully supported
- 🟧 Workaround implemented
- 🟥 Limitation

---

# Migration Tool

## Server Identity

* 🟩 **Server Name**
* 🟩 **Server Icon**
* 🟩 **Server Banner**

## Server Structure

* 🟩 **Categories**
  - 🟩 **Metadata**: Copies category names and preserves the position of categories and their child channels.
  - 🟩 **Permission Overwrites**: Copies role-specific permission overwrites set at the category level.
* 🟩 **Channels**
  - 🟩 **Metadata**: Copies channel names and topics/descriptions.
  - 🟩 **NSFW Flag** is preserved for age-restricted channels.
  - 🟧 **Slowmode** timing rules are preserved for text channels (Stoat doesn't have slowmode).
  - 🟩 **Bitrate** setting is also preserved for voice channels.
  - 🟩 **Permission Overwrites**: Copies role-specific permission overwrites set on individual channels.

## Roles

* 🟩 **Role Cloning**: Clones every custom role with its original name, color (hex code), and hierarchical position.
* 🟩 **Role Settings**: Copies the "Display role members separately from online members" (hoist) and "Allow anyone to @mention this role" settings.

### Text Messages
|
||||||
|
* 🟩 **Markdown Support**: Preserves all standard and advanced Discord markdown:
|
||||||
|
- `**Bold**`, `*Italics*`, `__Underline__`, `~~Strikethrough~~`.
|
||||||
|
- `> Blockquotes` and `>>> Multi-line quotes`.
|
||||||
|
- `Code blocks` and ` ```Full code blocks with syntax highlighting``` `.
|
||||||
|
- `||Spoilers||`.
|
||||||
|
* 🟩 **Message Authors**: Preserves the original author's name and avatar in migrated messages.
|
||||||
|
* 🟩 **Reply Links**: Preserves the connection of replies if the referenced message was previously migrated.
|
||||||
|
* 🟩 **Embeds**: Clones rich embeds sent by bots or webhooks, including their colors, fields, and descriptions.
|
||||||
|
|
||||||
|
### File Attachments
|
||||||
|
* 🟩 **Media**: Downloads and re-uploads images, videos, and audio clips.
|
||||||
|
* 🟩 **Documents**: Preserves PDFs, text files, and other message attachments.
|
||||||
|
* 🟧 **Size Limits**: Automatically checks size limits. If an attachment is too large for the target platform.
|
||||||
|
> Stoat doesnt support many file types, so those attachments are skipped and an error message will be sent in place of the attachment
|
||||||
|
|
||||||
|
### Emojis
|
||||||
|
* 🟩 **Standard Emojis**: Full support for Unicode emojis.
|
||||||
|
* 🟩 **Custom Server Emojis**: Automatically clones custom server emojis (both static and animated) to the target server.
|
||||||
|
* 🟩 **Emoji Mapping in messages**: References to custom emojis in migrated messages are updated to point to the new emoji IDs on the target platform.
|
||||||
|
* 🟥 **External Emojis**: Emojis from external servers are not migrated; only the `:emoji_name:` will be displayed.
|
||||||
|
|
### Stickers

* 🟧 **Normal Stickers**: Migrates static and animated stickers (`.png`, `.apng`, `.gif`).
* 🟧 **Lottie Stickers**: Converted during migration:
  - **Fluxer**: Converted to `.webp` for compatibility.
  - **Stoat**: Converted to `.gif`.

> Stickers are currently migrated as attachments, since sending stickers via the API is not yet fully supported on the target platforms.

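The per-platform sticker handling reduces to choosing an output format for each sticker. This is a minimal sketch. It assumes stickers carry Discord's sticker format type values (PNG 1, APNG 2, Lottie 3, GIF 4). `sticker_upload_extension` is a hypothetical helper, and the Lottie-to-image rendering step itself is not shown.

```python
from enum import IntEnum


class StickerFormat(IntEnum):
    # Discord's documented sticker format type values.
    PNG = 1
    APNG = 2
    LOTTIE = 3
    GIF = 4


# Lottie stickers are JSON animations, so they must be rendered to an image first.
LOTTIE_TARGET = {"fluxer": ".webp", "stoat": ".gif"}


def sticker_upload_extension(fmt: StickerFormat, platform: str) -> str:
    """Return the extension a sticker is re-uploaded with (as an attachment)."""
    if fmt == StickerFormat.LOTTIE:
        try:
            return LOTTIE_TARGET[platform.lower()]
        except KeyError:
            raise ValueError(f"unknown target platform: {platform!r}") from None
    # PNG, APNG and GIF stickers are uploaded in their original format.
    return "." + fmt.name.lower()
```
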
### Mentions & Discord Links

* 🟩 **Channel Mentions**: Automatically resolves `#channel` mentions to point to the correct migrated channel on the target server.
* 🟩 **Role Mentions**: Automatically resolves `@role` mentions to point to the cloned role on the target platform.
* 🟩 **User Mentions**: User mentions are preserved as plain text (`@nickname`).
* 🟩 **Message Links**: Standard Discord message URLs are resolved to the new URL on Fluxer or Stoat (as long as the target message has been migrated).
* 🟩 **Channel Links**: Discord channel URLs (like `discord.com/channels/...`) are resolved to their matching target channel link.

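Mention and link resolution amounts to looking up each Discord ID in the migration's ID maps. Discord uses these formats:

* `<#id>` for a channel mention.
* `<@&id>` for a role mention.
* `<@id>` or `<@!id>` for a user mention.
* `https://discord.com/channels/<guild>/<channel>/<message>` for a message link.

The `IdMaps` and `TargetSyntax` types below are hypothetical, because the target platforms' mention and URL formats are not specified here. This sketch also assumes that unmapped references are left unchanged.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Discord mention syntax: <#id> channel, <@&id> role, <@id> / <@!id> user.
MENTION_RE = re.compile(r"<(#|@&|@!?)(\d+)>")
# Discord links: .../channels/<guild>/<channel>[/<message>]
LINK_RE = re.compile(
    r"https?://(?:(?:ptb|canary)\.)?discord(?:app)?\.com"
    r"/channels/(?:\d+|@me)/(\d+)(?:/(\d+))?"
)


@dataclass
class IdMaps:
    """Hypothetical Discord ID -> target ID tables built up during migration."""
    channels: Dict[int, str] = field(default_factory=dict)
    roles: Dict[int, str] = field(default_factory=dict)
    # Discord message ID -> (target channel ID, target message ID)
    messages: Dict[int, Tuple[str, str]] = field(default_factory=dict)
    # Discord user ID -> display name captured from the source server
    nicknames: Dict[int, str] = field(default_factory=dict)


@dataclass
class TargetSyntax:
    """Hypothetical formatters for the target platform's mentions and URLs."""
    channel_mention: Callable[[str], str]
    role_mention: Callable[[str], str]
    channel_url: Callable[[str], str]
    message_url: Callable[[str, str], str]


def rewrite_references(content: str, maps: IdMaps, target: TargetSyntax) -> str:
    def mention(m: re.Match) -> str:
        kind, source_id = m.group(1), int(m.group(2))
        if kind == "#" and source_id in maps.channels:
            return target.channel_mention(maps.channels[source_id])
        if kind == "@&" and source_id in maps.roles:
            return target.role_mention(maps.roles[source_id])
        if kind in ("@", "@!"):
            # User mentions become plain text; the fallback name is an assumption.
            return "@" + maps.nicknames.get(source_id, "unknown-user")
        return m.group(0)  # unmapped channel/role: leave as-is

    def link(m: re.Match) -> str:
        channel_id, message_id = int(m.group(1)), m.group(2)
        if message_id is not None:
            # Only resolvable once the referenced message has been migrated.
            hit = maps.messages.get(int(message_id))
            return target.message_url(*hit) if hit else m.group(0)
        if channel_id in maps.channels:
            return target.channel_url(maps.channels[channel_id])
        return m.group(0)

    return LINK_RE.sub(link, MENTION_RE.sub(mention, content))
```
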
### Threads & Forums

* 🟩 **Active & Archived Threads**: Scans and migrates messages from all threads.
* 🟧 **Thread Structure**: Maintains the nesting of threads within their parent channels.

> This is a workaround, as Threads and Forums are not yet implemented in Fluxer and Stoat.

### Misc Features

* 🟩 **Incremental Sync**: Scans the migration database to find the last migrated message ID, skips data that has already been transferred, and continues where it left off.
* 🟩 **Audit Logging**: Creates a `#reaper-logs` channel in the target community and posts status updates and error reports to it.

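Incremental sync and reply links both depend on a table that records which Discord messages have already been migrated. This is a minimal sketch using Python's built-in `sqlite3`. It assumes a hypothetical `migrated_messages` table, because Reaper's actual schema is not documented here.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per Discord message that has been migrated.
SCHEMA = """
CREATE TABLE IF NOT EXISTS migrated_messages (
    source_channel_id INTEGER NOT NULL,
    source_message_id INTEGER PRIMARY KEY,
    target_message_id TEXT NOT NULL
)
"""


def record_migrated(db: sqlite3.Connection, channel_id: int,
                    message_id: int, target_id: str) -> None:
    # INSERT OR IGNORE makes re-running a batch harmless: no duplicate rows.
    db.execute("INSERT OR IGNORE INTO migrated_messages VALUES (?, ?, ?)",
               (channel_id, message_id, target_id))
    db.commit()


def last_migrated_id(db: sqlite3.Connection, channel_id: int) -> Optional[int]:
    # Discord message IDs are snowflakes that grow over time, so the largest
    # ID in a channel is the most recent message already migrated.
    row = db.execute(
        "SELECT MAX(source_message_id) FROM migrated_messages WHERE source_channel_id = ?",
        (channel_id,),
    ).fetchone()
    return row[0]  # None when nothing has been migrated yet


def target_for(db: sqlite3.Connection, message_id: int) -> Optional[str]:
    # For reply links: resolvable only if the referenced message was migrated.
    row = db.execute(
        "SELECT target_message_id FROM migrated_messages WHERE source_message_id = ?",
        (message_id,),
    ).fetchone()
    return row[0] if row else None
```

With discord.py, you could pass the returned ID as `channel.history(limit=None, after=discord.Object(id=last_id), oldest_first=True)` so that only newer messages are fetched. Use `after=None` when `last_migrated_id` returns `None`.
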
---

# Backup Tool

You can create a local snapshot of your Discord server using the Backup Tool.

* 🟩 **SQLite Database**: All server mappings, message IDs, and content are stored in a SQLite database.
* 🟩 **Deduplicated Attachments**: Attachment media is stored in a content-addressable storage system, so identical files are saved only once.
* 🟩 **Incremental Syncing**: Scans your existing local files and only downloads messages that have been sent since your last backup.
* 🟩 **Migration Support**: A local backup can later be used to migrate your data without depending on the Discord API, even if the original server has been deleted.

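Content-addressable storage names each file by a hash of its bytes, so identical attachments collapse to a single copy on disk. This is a minimal sketch that assumes SHA-256 keys and a two-character fan-out directory layout. Both choices, and the `store_attachment` name, are illustrative and not necessarily what the Backup Tool uses.

```python
import hashlib
import shutil
from pathlib import Path


def store_attachment(src: Path, store_root: Path) -> str:
    """Copy `src` into a content-addressable store and return its SHA-256 key.

    Files live at <store_root>/<first 2 hex chars>/<full hash>, so a file
    that appears in many messages is written to disk only once.
    """
    digest = hashlib.sha256()
    with src.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    key = digest.hexdigest()

    dest = store_root / key[:2] / key
    if not dest.exists():  # identical content already stored: nothing to do
        dest.parent.mkdir(parents=True, exist_ok=True)
        tmp = dest.with_suffix(".tmp")
        shutil.copyfile(src, tmp)
        # Rename into place so a crash never leaves a partial file under the final name.
        tmp.replace(dest)
    return key
```

A message row would then reference the returned key instead of a file path. Two messages that share an image therefore point at the same stored file.
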
37
docs/guide.md
Normal file
@ -0,0 +1,37 @@

# Discord Reaper Operations

## 1. Migration & Shuttle Tools

The primary suite of tools for moving data from Discord to Fluxer/Stoat.

* **Shuttle Migration (Direct)**:

  Connects directly to Discord and your target platform to migrate categories, channels, roles, and messages in real time. It maps each Discord ID to its counterpart on the target platform so that links and mentions work in the new community.

* **Resumed Migration**:

  If a large migration is interrupted or hits a rate limit, the Resumed Migration tool uses the local SQLite database to pick up exactly where it left off, skipping messages that have already been transferred.

* **Offline Migration (from Backup)**:

  Migrates data without contacting Discord's servers. It uses a previously created local backup as the source, making it the safest and fastest way to migrate if the original Discord server has already been deleted or restricted.

* **Batch Channel Processing**:

  Lets you select up to 50 specific channels or categories to migrate at once. Reaper processes them sequentially to stay within platform rate limits.

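You can sketch sequential batch processing as a loop that awaits each channel before starting the next. `migrate_channel` is a hypothetical coroutine that stands in for Reaper's per-channel migration. The choice to log failures and continue, rather than abort the batch, is an assumption of this sketch.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List

MAX_BATCH = 50  # selection limit stated in this guide


async def migrate_batch(
    channel_ids: Iterable[int],
    migrate_channel: Callable[[int], Awaitable[None]],
) -> List[int]:
    """Migrate the selected channel (or category) IDs one at a time and
    return the IDs that failed."""
    ids = list(dict.fromkeys(channel_ids))  # drop duplicates, keep order
    if len(ids) > MAX_BATCH:
        raise ValueError(f"select at most {MAX_BATCH} channels (got {len(ids)})")

    failed: List[int] = []
    for channel_id in ids:
        try:
            # Awaiting each call before the next keeps requests sequential.
            await migrate_channel(channel_id)
        except Exception as exc:  # assumption: report and keep going
            print(f"channel {channel_id} failed: {exc}")
            failed.append(channel_id)
    return failed
```
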
---

## 2. Backup & Archiving

Tools for creating and maintaining permanent local copies of your data.

* **Full Server Backup**:

  Downloads the entire server structure (all categories, channels, custom roles, and permission overwrites) along with full message histories and all file attachments into a SQLite database.

* **Incremental Syncing**:

  Scans your existing local backup database and only downloads messages sent since the last run. This makes it efficient to keep a continuous archive of an active community.

* **Media Deduplication**:

  A background operation that ensures identical file attachments (images, stickers, etc.) are stored only once on your hard drive, even if they appear in multiple messages or channels.

---

## 3. Danger Zone (Advanced Management)

Powerful tools for server cleanup and reset. Use with caution.

* **Server Wipe**:

  A high-speed utility that deletes all categories, channels, and custom roles on a target server. This is used to "reset" a community before testing a fresh migration. Requires explicit confirmation.

* **Permission Purge**:

  Scans every channel and category on the target server and removes all role and user-specific permission overwrites, restoring the server to its default permission state.

---