change backup file structure

This commit is contained in:
rambros 2026-03-05 11:47:24 +05:30
parent 146f6f7dd1
commit 5bec9e3ce7
4 changed files with 159 additions and 102 deletions

BACKUP.md

@ -20,6 +20,35 @@ graph TD
B --> F[User Cache Object] B --> F[User Cache Object]
``` ```
### File Tree Structure
```
DISCORD_BACKUP-{ServerID}/
├── server_profile/
│ ├── profile.json # Server metadata (name, ID, icon/banner paths)
│ ├── roles.json # All server roles (permissions, colors, positions)
│ ├── structure.json # Full category and channel hierarchy
│ ├── assets.json # Index of custom emojis and stickers
│ └── assets/ # Binary media files
│ ├── server_icon.png
│ ├── server_banner.png
│ ├── emoji_{name}_{id}.png
│ └── sticker_{name}_{id}.png
└── message_backup/
├── users/
│ ├── user_info.json # Deduplicated user profile cache
│ └── avatars/ # User avatar images
│ └── {user_id}.png
└── {channel_id}/
├── messages.json # Channel message history + metadata
├── attachments/ # Channel-level attachments
│ └── {filename}-{id_last_5}.{ext}
└── {thread_id}/ # Thread nested inside parent channel
├── thread_messages.json
└── thread_attachments/
└── {filename}-{id_last_5}.{ext}
```
--- ---
## 2. Data Lifecycle & Serialization ## 2. Data Lifecycle & Serialization
@ -27,7 +56,7 @@ graph TD
### 2.1 Incremental Synchronization Algorithm ### 2.1 Incremental Synchronization Algorithm
To achieve idempotency and efficiency, the system implements an incremental sync strategy using Discord's snowflake IDs. To achieve idempotency and efficiency, the system implements an incremental sync strategy using Discord's snowflake IDs.
1. **State Loading**: The `Exporter` reads the existing `{channel_id}.json` (if present). 1. **State Loading**: The `Exporter` reads the existing `{channel_id}/messages.json` (if present).
2. **Snowflake Extraction**: It extracts the `lastMessageID` from the metadata. 2. **Snowflake Extraction**: It extracts the `lastMessageID` from the metadata.
3. **Filtered Fetch**: It calls `fetch_message_history(after_id=last_id)`. 3. **Filtered Fetch**: It calls `fetch_message_history(after_id=last_id)`.
4. **In-Memory Merge**: New messages are appended to the existing list. 4. **In-Memory Merge**: New messages are appended to the existing list.
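
The four steps above can be sketched roughly as follows. This is a minimal illustration rather than the exporter's actual code: `sync_channel` and the `fetch` callback are hypothetical names, and placing `lastMessageID` and `messages` at the top level of the file is an assumption made for the example.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def sync_channel(json_file: Path,
                 fetch: Callable[[Optional[int]], list]) -> dict:
    """Merge newly fetched messages into an existing channel file (sketch)."""
    # 1. State loading: reuse the previous export if it exists.
    if json_file.exists():
        state = json.loads(json_file.read_text(encoding="utf-8"))
    else:
        state = {"lastMessageID": None, "messages": []}

    # 2. Snowflake extraction: the newest message ID already on disk.
    last_id = state.get("lastMessageID")
    after_id = int(last_id) if last_id else None

    # 3. Filtered fetch: only ask for messages newer than that ID.
    #    `fetch` stands in for fetch_message_history(after_id=...).
    new_messages = fetch(after_id)

    # 4. In-memory merge: append, then advance the sync marker.
    state["messages"].extend(new_messages)
    if new_messages:
        state["lastMessageID"] = new_messages[-1]["messageID"]

    json_file.parent.mkdir(parents=True, exist_ok=True)
    json_file.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state
```

Running the function twice against the same source appends nothing the second time, which is the idempotency property the strategy is after.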
@ -37,7 +66,7 @@ To achieve idempotency and efficiency, the system implements an incremental sync
The system avoids redundant storage of user metadata (usernames, roles, colors) by using a global `user_cache` map. The system avoids redundant storage of user metadata (usernames, roles, colors) by using a global `user_cache` map.
- **Key**: `userID` (Snowflake). - **Key**: `userID` (Snowflake).
- **Policy**: Users are added to the cache only on their first appearance in any channel's history. - **Policy**: Users are added to the cache only on their first appearance in any channel's history.
- **Avatar Persistence**: User avatars are stored in a centralized `user_avatars/` directory and referenced by relative paths in the JSON schemas. - **Avatar Persistence**: User avatars are stored in `message_backup/users/avatars/` and referenced by relative paths in the JSON schemas.
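
A minimal sketch of the first-appearance policy, assuming the cache is a plain dictionary keyed by the string form of `userID`. The helper name `cache_user` is hypothetical and does not come from the exporter.

```python
def cache_user(user_cache: dict, user: dict) -> bool:
    """Store a user on first appearance only; return True if newly added."""
    user_id = str(user["userID"])
    if user_id in user_cache:
        # Already cached from an earlier message or channel: keep the first copy.
        return False
    user_cache[user_id] = user
    return True


user_cache = {}
cache_user(user_cache, {"userID": "42", "username": "ada"})
cache_user(user_cache, {"userID": "42", "username": "ada"})  # no-op
```

Messages then only need to carry the `userID`, and a reader joins it back against the cache.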
--- ---
@ -46,10 +75,10 @@ The system avoids redundant storage of user metadata (usernames, roles, colors)
### 3.1 Forum Channels & Threads ### 3.1 Forum Channels & Threads
Forums present a hierarchical challenge where the "starter message" and the "conversation" exist in separate contexts. Forums present a hierarchical challenge where the "starter message" and the "conversation" exist in separate contexts.
- **Forum Index (`{channel_id}.json`)**: Contains an enriched list of "starter messages" representing each thread. These entries include thread titles, applied tags, and total attachment stats (summed from the entire thread). - **Forum Index (`{forum_id}/messages.json`)**: Contains an enriched list of "starter messages" representing each thread. These entries include thread titles, applied tags, and total attachment stats (summed from the entire thread).
- **Thread Persistence**: - **Thread Persistence**: All threads nest inside their parent channel directory:
- **Regular Threads**: `message_backup/threads/{thread_id}.json` - **Forum Threads**: `message_backup/{forum_id}/{thread_id}/thread_messages.json`
- **Forum Threads**: `message_backup/{forum_id}/{thread_id}.json` - **Regular Threads**: `message_backup/{parent_channel_id}/{thread_id}/thread_messages.json`
- **Starter Identification**: The system uses `thread.history(limit=1, after=snowflake(thread_id - 1))` to reliably capture the first post even if it has been edited or pinned. - **Starter Identification**: The system uses `thread.history(limit=1, after=snowflake(thread_id - 1))` to reliably capture the first post even if it has been edited or pinned.
--- ---
@ -84,7 +113,7 @@ The internal representation of a message focuses on portability:
| `content` | `String` | Raw markdown content (or snapshot content for forwards) | | `content` | `String` | Raw markdown content (or snapshot content for forwards) |
| `userID` | `String` | Reference to `user_info.json` | | `userID` | `String` | Reference to `user_info.json` |
| `attachments`| `Array` | List of local file references and metadata | | `attachments`| `Array` | List of local file references and metadata |
| `embeds` | `Array` | Raw Dicord-formatted embed objects | | `embeds` | `Array` | Raw Discord-formatted embed objects |
| `stickers` | `Array` | List of Message Sticker objects (see below) | | `stickers` | `Array` | List of Message Sticker objects (see below) |
| `reactions` | `Array` | List of Reaction objects | | `reactions` | `Array` | List of Reaction objects |
@ -94,7 +123,7 @@ The internal representation of a message focuses on portability:
| `id` | `String` | Sticker Snowflake ID | | `id` | `String` | Sticker Snowflake ID |
| `name` | `String` | Sticker name | | `name` | `String` | Sticker name |
| `format` | `String` | File format (PNG, APNG, LOTTIE, GIF) | | `format` | `String` | File format (PNG, APNG, LOTTIE, GIF) |
| `localPath` | `String` | Relative path to local file in `{channel_id}/` | | `localPath` | `String` | Relative path to local file in `{channel_id}/attachments/` |
#### Reaction Object #### Reaction Object
| Field | Type | Description | | Field | Type | Description |
@ -108,17 +137,21 @@ To prevent filename collisions (e.g., multiple files named `image.png`), the sys
Example: `sunset-54321.png` Example: `sunset-54321.png`
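
The naming rule can be illustrated with a short sketch, assuming the `{filename}-{id_last_5}.{ext}` pattern shown in the file tree. The function name `local_attachment_name` is hypothetical, and how the exporter treats names without an extension or with several dots is not stated here, so that behavior is a guess.

```python
from pathlib import PurePosixPath


def local_attachment_name(original_name: str, attachment_id: int) -> str:
    """Build '{filename}-{id_last_5}.{ext}' from a name and a snowflake ID."""
    path = PurePosixPath(original_name)
    id_last_5 = str(attachment_id)[-5:]
    # path.suffix already includes the leading dot (or is empty).
    return f"{path.stem}-{id_last_5}{path.suffix}"


print(local_attachment_name("sunset.png", 1234567890123454321))  # sunset-54321.png
```

Two uploads that are both named `image.png` get different local names as long as the last five digits of their IDs differ.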
### 5.3 `server_profile.json` Specification ### 5.3 `profile.json` Specification
Path: `server_profile/profile.json`
| Field | Type | Description | | Field | Type | Description |
| :--- | :--- | :--- | | :--- | :--- | :--- |
| `name` | `String` | Original Discord guild name | | `name` | `String` | Original Discord guild name |
| `id` | `String` | Guild Snowflake ID | | `id` | `String` | Guild Snowflake ID |
| `icon` | `String` | Relative path to local guild icon in `server_media/` | | `icon` | `String` | Relative path to local guild icon in `server_profile/assets/` |
| `banner` | `String` | Relative path to local guild banner in `server_media/` | | `banner` | `String` | Relative path to local guild banner in `server_profile/assets/` |
| `last_backup` | `ISO8601` | Timestamp of the last successful backup run | | `last_backup` | `ISO8601` | Timestamp of the last successful backup run |
| `ignore_channels` | `Array` | List of channel Snowflakes explicitly excluded from backup | | `ignore_channels` | `Array` | List of channel Snowflakes explicitly excluded from backup |
### 5.4 `server_roles.json` Specification (Array of objects) ### 5.4 `roles.json` Specification (Array of objects)
Path: `server_profile/roles.json`
| Field | Type | Description | | Field | Type | Description |
| :--- | :--- | :--- | | :--- | :--- | :--- |
| `id` | `String` | Role Snowflake ID | | `id` | `String` | Role Snowflake ID |
@ -129,24 +162,28 @@ Example: `sunset-54321.png`
| `hoist` | `Boolean` | Whether the role is displayed separately in the sidebar | | `hoist` | `Boolean` | Whether the role is displayed separately in the sidebar |
| `mentionable`| `Boolean` | Whether the role can be mentioned | | `mentionable`| `Boolean` | Whether the role can be mentioned |
### 5.5 `server_assets.json` Specification ### 5.5 `assets.json` Specification
Path: `server_profile/assets.json`
Contains two primary arrays: `emojis` and `stickers`. Contains two primary arrays: `emojis` and `stickers`.
#### Emoji Object #### Emoji Object
| Field | Type | Description | | Field | Type | Description |
| :--- | :--- | :--- | | :--- | :--- | :--- |
| `id` | `String` | Emoji Snowflake ID | | `id` | `String` | Emoji Snowflake ID |
| `name` | `String` | Emoji name (without colons) | | `name` | `String` | Emoji name (without colons) |
| `animated` | `Boolean` | True if the emoji is a GIF | | `animated` | `Boolean` | True if the emoji is a GIF |
| `filename` | `String` | Filename within `server_media/` | | `filename` | `String` | Filename within `server_profile/assets/` |
#### Sticker Object #### Sticker Object
| Field | Type | Description | | Field | Type | Description |
| :--- | :--- | :--- | | :--- | :--- | :--- |
| `id` | `String` | Sticker Snowflake ID | | `id` | `String` | Sticker Snowflake ID |
| `name` | `String` | Sticker name | | `name` | `String` | Sticker name |
| `filename` | `String` | Filename within `server_media/` | | `filename` | `String` | Filename within `server_profile/assets/` |
-### 5.6 `server_structure.json` Specification (Array of Category objects)
+### 5.6 `structure.json` Specification (Array of Category objects)
+Path: `server_profile/structure.json`
#### Category Object #### Category Object
| Field | Type | Description | | Field | Type | Description |
| :--- | :--- | :--- | | :--- | :--- | :--- |
@ -177,6 +214,8 @@ Contains two primary arrays: `emojis` and `stickers`.
| `emoji_name` | `String` | Name of the tag's emoji | | `emoji_name` | `String` | Name of the tag's emoji |
### 5.7 `user_info.json` Specification (Array of User objects) ### 5.7 `user_info.json` Specification (Array of User objects)
Path: `message_backup/users/user_info.json`
| Field | Type | Description | | Field | Type | Description |
| :--- | :--- | :--- | | :--- | :--- | :--- |
| `userID` | `String` | User Snowflake ID | | `userID` | `String` | User Snowflake ID |
@ -185,10 +224,10 @@ Contains two primary arrays: `emojis` and `stickers`.
| `userColor` | `String` | Role-derived color for the user | | `userColor` | `String` | Role-derived color for the user |
| `userIsBot` | `Boolean` | True if the account is a bot | | `userIsBot` | `Boolean` | True if the account is a bot |
| `userRoles` | `Array` | List of role snippets (name, id, color, position) | | `userRoles` | `Array` | List of role snippets (name, id, color, position) |
| `userAvatar` | `String` | Relative path to local avatar in `user_avatars/` | | `userAvatar` | `String` | Relative path to local avatar in `users/avatars/` |
### 5.8 Channel History JSON Specification ### 5.8 Channel History JSON Specification
File: `message_backup/{channel_id}.json` Path: `message_backup/{channel_id}/messages.json`
This file contains the full history of a channel along with synchronization metadata. This file contains the full history of a channel along with synchronization metadata.
@ -206,6 +245,11 @@ This file contains the full history of a channel along with synchronization meta
| `messages` | `Array` | The message objects (see Section 5.1) | | `messages` | `Array` | The message objects (see Section 5.1) |
| `parentID` | `String` | (If Thread) Snowflake of the parent channel | | `parentID` | `String` | (If Thread) Snowflake of the parent channel |
### 5.9 Thread History JSON Specification
Path: `message_backup/{channel_id}/{thread_id}/thread_messages.json`
Same schema as Section 5.8, with `channelType` set to `"Thread"` and `parentID` always present.
--- ---
## 7. Backup Reader Implementation Guide ## 7. Backup Reader Implementation Guide
@ -215,8 +259,8 @@ This section is a technical manual for developers building third-party tools (vi
### 7.1 Entry Point Discovery ### 7.1 Entry Point Discovery
A reader should start by identifying the backup root directory (prefixed with `DISCORD_BACKUP-`). A reader should start by identifying the backup root directory (prefixed with `DISCORD_BACKUP-`).
1. **Parse `server_profile.json`**: Extract the server name, ID, and assets (icon/banner). 1. **Parse `server_profile/profile.json`**: Extract the server name, ID, and assets (icon/banner).
2. **Load `server_structure.json`**: This defines the navigation tree for your UI. 2. **Load `server_profile/structure.json`**: This defines the navigation tree for your UI.
- Iterate through categories. - Iterate through categories.
- Map channels to their respective types (text, voice, forum). - Map channels to their respective types (text, voice, forum).
- Store the `position` to preserve the original visual order. - Store the `position` to preserve the original visual order.
@ -224,11 +268,11 @@ A reader should start by identifying the backup root directory (prefixed with `D
### 7.2 Relational Data Mapping ### 7.2 Relational Data Mapping
The backup data is normalized to minimize duplication. A reader must implement the following resolve logic: The backup data is normalized to minimize duplication. A reader must implement the following resolve logic:
- **User Resolution**: When parsing a message in `{channel_id}.json`, the `userID` must be cross-referenced against the `userID` keys in `message_backup/user_info.json`. - **User Resolution**: When parsing a message in `{channel_id}/messages.json`, the `userID` must be cross-referenced against the `userID` keys in `message_backup/users/user_info.json`.
- **Role Resolution**: Use the `userRoles` array (IDs) from the user object and resolve them against the role metadata in `server_roles.json` to get colors and names. - **Role Resolution**: Use the `userRoles` array (IDs) from the user object and resolve them against the role metadata in `server_profile/roles.json` to get colors and names.
- **Static Asset Resolution**: - **Static Asset Resolution**:
- **Server Assets**: Prepend `server_media/` to filenames found in `server_assets.json`. - **Server Assets**: Prepend `server_profile/assets/` to filenames found in `server_profile/assets.json`.
- **User Avatars**: Resolve `userAvatar` paths found in `user_info.json` (pointing to `user_avatars/`). - **User Avatars**: Resolve `userAvatar` paths found in `user_info.json` (pointing to `users/avatars/`).
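
The user and role lookups above might look like the sketch below. The helper names `index_by` and `resolve_author` are hypothetical. Because `userRoles` is described both as a list of IDs and as a list of role snippets, the sketch accepts either form; that tolerance is a defensive assumption, not documented behavior.

```python
def index_by(items: list, key: str) -> dict:
    """Turn a JSON array of objects into a dict keyed by one field."""
    return {str(item[key]): item for item in items}


def resolve_author(message: dict, users_by_id: dict, roles_by_id: dict):
    """Return (user, roles) for a message, or (None, []) if the user is unknown."""
    user = users_by_id.get(str(message["userID"]))
    if user is None:
        return None, []
    roles = []
    for entry in user.get("userRoles", []):
        # Accept bare IDs as well as snippets that carry an "id" field.
        role_id = entry["id"] if isinstance(entry, dict) else entry
        role = roles_by_id.get(str(role_id))
        if role is not None:
            roles.append(role)
    return user, roles
```

Here `users_by_id` would be built from `message_backup/users/user_info.json` with `index_by(users, "userID")`, and `roles_by_id` from `server_profile/roles.json` with `index_by(roles, "id")`.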
### 7.3 Message Rendering Logic ### 7.3 Message Rendering Logic
When rendering the `messages` array from a channel JSON: When rendering the `messages` array from a channel JSON:
@ -236,21 +280,21 @@ When rendering the `messages` array from a channel JSON:
| Feature | Reader Implementation Logic | | Feature | Reader Implementation Logic |
| :--- | :--- | | :--- | :--- |
| **Markdown** | Content is raw Discord markdown. Use a library like `markdown-it` with discord-specific plugins. | | **Markdown** | Content is raw Discord markdown. Use a library like `markdown-it` with discord-specific plugins. |
| **Attachments** | Resolve `url` field (`{channel_id}/{filename}`) relative to the `message_backup/` directory. | | **Attachments** | Resolve `url` field (`{channel_id}/attachments/{filename}`) relative to the `message_backup/` directory. |
| **Emojis/Stickers** | If a message contains custom emojis/stickers, resolve their metadata via `server_assets.json`. | | **Emojis/Stickers** | If a message contains custom emojis/stickers, resolve their metadata via `server_profile/assets.json`. |
| **Replies** | Use the `reference` object to find the target `messageId`. Note: The target might be in the same file or a different channel/thread. | | **Replies** | Use the `reference` object to find the target `messageId`. Note: The target might be in the same file or a different channel/thread. |
### 7.4 Thread & Forum Reconstruction ### 7.4 Thread & Forum Reconstruction
Reconstructing the hierarchy requires specific pointer logic: Reconstructing the hierarchy requires specific pointer logic:
1. **Forums**: 1. **Forums**:
- Read `message_backup/{forum_id}.json`. - Read `message_backup/{forum_id}/messages.json`.
- Each message in this file is a `Thread_starter_message`. - Each message in this file is a `Thread_starter_message`.
- The `messageID` of the starter message *is usually* the same as the `thread_id`. - The `messageID` of the starter message *is usually* the same as the `thread_id`.
- To load the full thread, open `message_backup/{forum_id}/{thread_id}.json`. - To load the full thread, open `message_backup/{forum_id}/{thread_id}/thread_messages.json`.
2. **Regular Threads**: 2. **Regular Threads**:
- Discoverable via the `parentID` field in any message or by scanning `message_backup/threads/`. - Discoverable via the `parentID` field in any message or by scanning for `thread_messages.json` inside channel directories.
- Match the `thread.id` in a `ThreadStarter` message to the respective JSON in the `threads/` folder. - Match the `thread.id` in a `ThreadStarter` message to the respective subdirectory.
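
Thread discovery under the nested layout could be sketched as below. The function name `list_thread_ids` is hypothetical, and skipping subdirectories whose names are not numeric is an assumption about how a reader would want to behave.

```python
from pathlib import Path


def list_thread_ids(backup_root: Path, channel_id: int) -> list:
    """Find threads of a channel by scanning for thread_messages.json."""
    channel_dir = backup_root / "message_backup" / str(channel_id)
    thread_ids = []
    if not channel_dir.is_dir():
        return thread_ids
    # Each thread lives in {channel_id}/{thread_id}/thread_messages.json.
    for thread_file in channel_dir.glob("*/thread_messages.json"):
        try:
            thread_ids.append(int(thread_file.parent.name))
        except ValueError:
            continue  # not a snowflake-named directory
    return sorted(thread_ids)
```

The same scan works for forum channels and regular text channels, since both keep their threads in subdirectories of the parent.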
--- ---
@ -259,16 +303,16 @@ Reconstructing the hierarchy requires specific pointer logic:
If you are building a `discord.py` API-compatible wrapper to read these backups directly into familiar Discord objects, here is the explicit property mapping from the schema to the standard `discord.py` object attributes. If you are building a `discord.py` API-compatible wrapper to read these backups directly into familiar Discord objects, here is the explicit property mapping from the schema to the standard `discord.py` object attributes.
### 8.1 Base Server (Guild) ### 8.1 Base Server (Guild)
File: `server_profile.json` & `server_roles.json` & `server_structure.json` File: `server_profile/profile.json` & `server_profile/roles.json` & `server_profile/structure.json`
- **`discord.Guild`**: - **`discord.Guild`**:
- `id`: Cast `id` (str) to `int`. - `id`: Cast `id` (str) to `int`.
- `name`: Mapped directly from `name`. - `name`: Mapped directly from `name`.
- `icon` / `banner`: Represented as `discord.Asset` objects. Use the local file paths from `icon` / `banner` as the asset URL/filepath. - `icon` / `banner`: Represented as `discord.Asset` objects. Use the local file paths from `icon` / `banner` as the asset URL/filepath.
- `roles`: Hydrated from `server_roles.json`. - `roles`: Hydrated from `server_profile/roles.json`.
- `channels` / `categories`: Hydrated from `server_structure.json`. - `channels` / `categories`: Hydrated from `server_profile/structure.json`.
### 8.2 Roles (`discord.Role`) ### 8.2 Roles (`discord.Role`)
File: `server_roles.json` File: `server_profile/roles.json`
- `id`: Cast `id` to `int`. - `id`: Cast `id` to `int`.
- `name`: Mapped directly. - `name`: Mapped directly.
- `color`: Parse the hex string to `discord.Color(value)`. - `color`: Parse the hex string to `discord.Color(value)`.
@ -278,7 +322,7 @@ File: `server_roles.json`
- `mentionable`: Mapped directly to boolean. - `mentionable`: Mapped directly to boolean.
### 8.3 Users & Members (`discord.Member` / `discord.User`) ### 8.3 Users & Members (`discord.Member` / `discord.User`)
File: `message_backup/user_info.json` File: `message_backup/users/user_info.json`
- `id`: Cast `userID` to `int`. - `id`: Cast `userID` to `int`.
- `name`: Mapped from `username`. - `name`: Mapped from `username`.
- `display_name`: Mapped from `userNickname`. - `display_name`: Mapped from `userNickname`.
@ -288,7 +332,7 @@ File: `message_backup/user_info.json`
- `avatar`: Mocked `discord.Asset` using the `userAvatar` local path. - `avatar`: Mocked `discord.Asset` using the `userAvatar` local path.
### 8.4 Channels (`discord.TextChannel`, `discord.CategoryChannel`, `discord.ForumChannel`) ### 8.4 Channels (`discord.TextChannel`, `discord.CategoryChannel`, `discord.ForumChannel`)
File: `server_structure.json` File: `server_profile/structure.json`
- Iterate over the top-level array (Categories): - Iterate over the top-level array (Categories):
- **`discord.CategoryChannel`**: - **`discord.CategoryChannel`**:
- `id`: Cast `id` to `int`. - `id`: Cast `id` to `int`.
@ -305,7 +349,7 @@ File: `server_structure.json`
- `nsfw`: Mapped directly to boolean. - `nsfw`: Mapped directly to boolean.
### 8.5 Messages (`discord.Message`) ### 8.5 Messages (`discord.Message`)
File: `message_backup/{channel_id}.json` (Iterating the `messages` array) File: `message_backup/{channel_id}/messages.json` (Iterating the `messages` array)
- `id`: Cast `messageID` to `int`. - `id`: Cast `messageID` to `int`.
- `type`: Map the string `type` (e.g., "Default", "Reply") to `discord.MessageType`. - `type`: Map the string `type` (e.g., "Default", "Reply") to `discord.MessageType`.
- `created_at`: Parse `timestamp` (ISO-8601 string) into a timezone-aware `datetime` object. - `created_at`: Parse `timestamp` (ISO-8601 string) into a timezone-aware `datetime` object.
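
The `id` and `created_at` conversions can be sketched with the standard library alone. The function name `parse_message_core` is hypothetical, and treating a timestamp without an offset as UTC is an assumption; the schema does not say which zone naive timestamps use.

```python
from datetime import datetime, timezone


def parse_message_core(raw: dict) -> dict:
    """Convert the string ID and ISO-8601 timestamp of a stored message."""
    # Older Python versions reject a trailing "Z", so normalize it first.
    created_at = datetime.fromisoformat(raw["timestamp"].replace("Z", "+00:00"))
    if created_at.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        created_at = created_at.replace(tzinfo=timezone.utc)
    return {"id": int(raw["messageID"]), "created_at": created_at}
```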
@ -323,7 +367,7 @@ Nested within Message objects.
- `id`: Cast `id` to `int`. - `id`: Cast `id` to `int`.
- `filename`: Mapped from `fileName`. - `filename`: Mapped from `fileName`.
- `size`: Mapped from `fileSizeBytes`. - `size`: Mapped from `fileSizeBytes`.
- `url` / `proxy_url`: Point to the local relative path (`{channel_id}/{resolved_filename}`). - `url` / `proxy_url`: Point to the local relative path (`{channel_id}/attachments/{resolved_filename}`).
### 8.7 Reactions (`discord.Reaction` & `discord.PartialEmoji`) ### 8.7 Reactions (`discord.Reaction` & `discord.PartialEmoji`)
Nested within Message objects. Nested within Message objects.

View file

@ -687,17 +687,17 @@ class BackupReader:
bp = self.backup_path bp = self.backup_path
# 1. Server profile -> BackupGuild # 1. Server profile -> BackupGuild
profile_file = bp / "server_profile.json" profile_file = bp / "server_profile" / "profile.json"
if profile_file.exists(): if profile_file.exists():
profile = json.loads(profile_file.read_text(encoding="utf-8")) profile = json.loads(profile_file.read_text(encoding="utf-8"))
self.guild = BackupGuild(profile, bp, reader=self) self.guild = BackupGuild(profile, bp, reader=self)
logger.info(f"[Backup] Loaded server profile: {self.guild.name} ({self.guild.id})") logger.info(f"[Backup] Loaded server profile: {self.guild.name} ({self.guild.id})")
else: else:
logger.warning(f"[Backup] server_profile.json not found in {bp}") logger.warning(f"[Backup] server_profile/profile.json not found in {bp}")
self.guild = None self.guild = None
# 2. Roles # 2. Roles
roles_file = bp / "server_roles.json" roles_file = bp / "server_profile" / "roles.json"
if roles_file.exists(): if roles_file.exists():
roles_data = json.loads(roles_file.read_text(encoding="utf-8")) roles_data = json.loads(roles_file.read_text(encoding="utf-8"))
self._roles = [BackupRole(r) for r in roles_data] self._roles = [BackupRole(r) for r in roles_data]
@ -705,7 +705,7 @@ class BackupReader:
logger.info(f"[Backup] Loaded {len(self._roles)} roles") logger.info(f"[Backup] Loaded {len(self._roles)} roles")
# 3. Structure -> categories + channels # 3. Structure -> categories + channels
struct_file = bp / "server_structure.json" struct_file = bp / "server_profile" / "structure.json"
if struct_file.exists(): if struct_file.exists():
structure = json.loads(struct_file.read_text(encoding="utf-8")) structure = json.loads(struct_file.read_text(encoding="utf-8"))
for cat_data in structure: for cat_data in structure:
@ -722,8 +722,8 @@ class BackupReader:
f"{len(self._channels)} channels") f"{len(self._channels)} channels")
# 4. Assets (emojis + stickers) # 4. Assets (emojis + stickers)
assets_file = bp / "server_assets.json" assets_file = bp / "server_profile" / "assets.json"
media_dir = bp / "server_media" media_dir = bp / "server_profile" / "assets"
if assets_file.exists(): if assets_file.exists():
assets = json.loads(assets_file.read_text(encoding="utf-8")) assets = json.loads(assets_file.read_text(encoding="utf-8"))
self._emojis = [BackupEmoji(e, media_dir) for e in assets.get("emojis", [])] self._emojis = [BackupEmoji(e, media_dir) for e in assets.get("emojis", [])]
@ -732,7 +732,7 @@ class BackupReader:
f"{len(self._stickers)} stickers") f"{len(self._stickers)} stickers")
# 5. Users # 5. Users
user_info_file = bp / "message_backup" / "user_info.json" user_info_file = bp / "message_backup" / "users" / "user_info.json"
if user_info_file.exists(): if user_info_file.exists():
try: try:
users = json.loads(user_info_file.read_text(encoding="utf-8")) users = json.loads(user_info_file.read_text(encoding="utf-8"))
@ -764,7 +764,7 @@ class BackupReader:
if not bp.exists() or not bp.is_dir(): if not bp.exists() or not bp.is_dir():
return results return results
profile = bp / "server_profile.json" profile = bp / "server_profile" / "profile.json"
if profile.exists(): if profile.exists():
try: try:
data = json.loads(profile.read_text(encoding="utf-8")) data = json.loads(profile.read_text(encoding="utf-8"))
@ -804,19 +804,23 @@ class BackupReader:
return channels return channels
async def get_backed_up_channel_ids(self) -> List[int]: async def get_backed_up_channel_ids(self) -> List[int]:
"""Returns a list of channel IDs that have corresponding backup JSON files.""" """Returns a list of channel IDs that have corresponding backup directories."""
backup_dir = self.backup_path / "message_backup" backup_dir = self.backup_path / "message_backup"
if not backup_dir.exists(): if not backup_dir.exists():
return [] return []
ids = [] ids = []
-for f in backup_dir.glob("*.json"):
-    if f.name == "user_info.json":
-        continue
-    try:
-        ids.append(int(f.stem))
-    except ValueError:
-        pass
+for d in backup_dir.iterdir():
+    if not d.is_dir():
+        continue
+    if d.name == "users":
+        continue
+    # A backed-up channel has a messages.json inside its directory
+    if (d / "messages.json").exists():
+        try:
+            ids.append(int(d.name))
+        except ValueError:
+            pass
return ids return ids
async def get_channel(self, channel_id: int) -> BackupChannel | None: async def get_channel(self, channel_id: int) -> BackupChannel | None:
@ -857,12 +861,12 @@ class BackupReader:
def _load_channel_messages(self, channel_id: int) -> list[dict]: def _load_channel_messages(self, channel_id: int) -> list[dict]:
"""Loads the messages array from a channel JSON file.""" """Loads the messages array from a channel JSON file."""
bp = self.backup_path / "message_backup" bp = self.backup_path / "message_backup"
-json_file = bp / f"{channel_id}.json"
+# Primary: message_backup/{channel_id}/messages.json
+json_file = bp / str(channel_id) / "messages.json"
 if not json_file.exists():
-    for candidate in [
-        bp / "threads" / f"{channel_id}.json",
-        *bp.glob(f"*/{channel_id}.json"),
-    ]:
+    # Fallback: search for thread_messages.json inside any parent channel
+    for candidate in bp.glob(f"*/{channel_id}/thread_messages.json"):
         if candidate.exists():
             json_file = candidate
             break

View file

@ -31,8 +31,12 @@ class DiscordExporter:
self.export_path = self.base_dir / f"DISCORD_BACKUP-{self.server_id}" self.export_path = self.base_dir / f"DISCORD_BACKUP-{self.server_id}"
self.export_path.mkdir(parents=True, exist_ok=True) self.export_path.mkdir(parents=True, exist_ok=True)
# Consolidate media into one folder # Server profile directory for metadata and assets
self.media_path = self.export_path / "server_media" self.profile_path = self.export_path / "server_profile"
self.profile_path.mkdir(exist_ok=True)
# Consolidate media into server_profile/assets/
self.media_path = self.profile_path / "assets"
self.media_path.mkdir(exist_ok=True) self.media_path.mkdir(exist_ok=True)
logger.info(f"Export directory set to: {self.export_path}") logger.info(f"Export directory set to: {self.export_path}")
@ -56,13 +60,13 @@ class DiscordExporter:
if self.reader.guild: if self.reader.guild:
if self.reader.guild.icon: if self.reader.guild.icon:
ext = "gif" if self.reader.guild.icon.is_animated() else "png" ext = "gif" if self.reader.guild.icon.is_animated() else "png"
metadata["icon"] = f"server_media/server_icon.{ext}" metadata["icon"] = f"server_profile/assets/server_icon.{ext}"
else: else:
metadata["icon"] = None metadata["icon"] = None
if self.reader.guild.banner: if self.reader.guild.banner:
ext = "gif" if self.reader.guild.banner.is_animated() else "png" ext = "gif" if self.reader.guild.banner.is_animated() else "png"
metadata["banner"] = f"server_media/server_banner.{ext}" metadata["banner"] = f"server_profile/assets/server_banner.{ext}"
else: else:
metadata["banner"] = None metadata["banner"] = None
@ -70,7 +74,7 @@ class DiscordExporter:
from datetime import datetime from datetime import datetime
metadata["last_backup"] = datetime.now().isoformat() metadata["last_backup"] = datetime.now().isoformat()
output_file = self.export_path / "server_profile.json" output_file = self.profile_path / "profile.json"
# Preserve ignore_channels if the file already exists # Preserve ignore_channels if the file already exists
ignore_channels = [] ignore_channels = []
@ -80,7 +84,7 @@ class DiscordExporter:
old_data = json.load(f) old_data = json.load(f)
ignore_channels = old_data.get("ignore_channels", []) ignore_channels = old_data.get("ignore_channels", [])
except Exception as e: except Exception as e:
logger.warning(f"Could not read existing server_profile.json to preserve ignore_channels: {e}") logger.warning(f"Could not read existing profile.json to preserve ignore_channels: {e}")
metadata["ignore_channels"] = ignore_channels metadata["ignore_channels"] = ignore_channels
@ -102,7 +106,7 @@ class DiscordExporter:
"mentionable": r.mentionable "mentionable": r.mentionable
}) })
output_file = self.export_path / "server_roles.json" output_file = self.profile_path / "roles.json"
await self._save_json(output_file, role_data) await self._save_json(output_file, role_data)
return role_data return role_data
@ -148,7 +152,7 @@ class DiscordExporter:
logger.info("No server banner found to download.") logger.info("No server banner found to download.")
async def export_assets(self): async def export_assets(self):
"""Exports emojis, stickers, and server media to server_assets.json and server_media/.""" """Exports emojis, stickers, and server media to assets.json and server_profile/assets/."""
await self.download_server_assets() await self.download_server_assets()
emojis = await self.reader.get_emojis() emojis = await self.reader.get_emojis()
@ -201,7 +205,7 @@ class DiscordExporter:
logger.error(f"Failed to download sticker {s.name}: {ex}") logger.error(f"Failed to download sticker {s.name}: {ex}")
# Try to load existing customization to merge (if it exists) # Try to load existing customization to merge (if it exists)
custom_file = self.export_path / "server_assets.json" custom_file = self.profile_path / "assets.json"
customization = {"emojis": emoji_data, "stickers": sticker_data, "members": []} customization = {"emojis": emoji_data, "stickers": sticker_data, "members": []}
if custom_file.exists(): if custom_file.exists():
try: try:
@ -262,7 +266,7 @@ class DiscordExporter:
# No need to increment cat_count for 'Uncategorized' usually, # No need to increment cat_count for 'Uncategorized' usually,
# but let's see if the user wants it. For now, cat_count is real Discord categories. # but let's see if the user wants it. For now, cat_count is real Discord categories.
output_file = self.export_path / "server_structure.json" output_file = self.profile_path / "structure.json"
await self._save_json(output_file, structure) await self._save_json(output_file, structure)
return structure, cat_count, chan_count return structure, cat_count, chan_count
@ -313,31 +317,29 @@ class DiscordExporter:
if is_thread: if is_thread:
parent = await self.reader.get_channel(channel.parent_id) parent = await self.reader.get_channel(channel.parent_id)
-    if isinstance(parent, discord.ForumChannel):
-        # Forum thread: nested inside forum folder
-        backup_dir = backup_root / str(channel.parent_id)
-        avatar_rel_base = "../../user_avatars"
-    else:
-        # Regular thread
-        backup_dir = backup_root / "threads"
-        avatar_rel_base = "../user_avatars"
+    # All threads nest inside their parent channel directory
+    backup_dir = backup_root / str(channel.parent_id) / str(channel_id)
+    backup_dir.mkdir(parents=True, exist_ok=True)
+    avatar_rel_base = "../../users/avatars"
 elif is_forum:
-    # Forum metadata root
-    backup_dir = backup_root
-    avatar_rel_base = "user_avatars"
+    # Forum metadata root: message_backup/{forum_id}/
+    backup_dir = backup_root / str(channel_id)
+    backup_dir.mkdir(parents=True, exist_ok=True)
+    avatar_rel_base = "../users/avatars"
 else:
-    # Regular channel
-    backup_dir = backup_root
-    avatar_rel_base = "user_avatars"
+    # Regular channel: message_backup/{channel_id}/
+    backup_dir = backup_root / str(channel_id)
+    backup_dir.mkdir(parents=True, exist_ok=True)
+    avatar_rel_base = "../users/avatars"
-backup_dir.mkdir(parents=True, exist_ok=True)
-# Shared avatars directory (always at root of message_backup)
-avatar_dir = backup_root / "user_avatars"
+# Shared avatars directory: message_backup/users/avatars/
+users_dir = backup_root / "users"
+users_dir.mkdir(exist_ok=True)
+avatar_dir = users_dir / "avatars"
avatar_dir.mkdir(exist_ok=True) avatar_dir.mkdir(exist_ok=True)
# Load existing user_info.json # Load existing user_info.json
user_info_file = backup_root / "user_info.json" user_info_file = users_dir / "user_info.json"
if not self.user_cache and user_info_file.exists(): if not self.user_cache and user_info_file.exists():
try: try:
with open(user_info_file, "r", encoding="utf-8") as f: with open(user_info_file, "r", encoding="utf-8") as f:
@@ -346,9 +348,16 @@ class DiscordExporter:
             except Exception:
                 self.user_cache = {}
-        base_filename = str(channel_id)
-        json_file = backup_dir / f"{base_filename}.json"
-        asset_dir = backup_dir / base_filename
+        # Determine file names based on type
+        if is_thread:
+            json_file = backup_dir / "thread_messages.json"
+            asset_dir = backup_dir / "thread_attachments"
+            # asset_prefix is the relative path from message_backup/ for URL references
+            asset_prefix = f"{channel.parent_id}/{channel_id}/thread_attachments"
+        else:
+            json_file = backup_dir / "messages.json"
+            asset_dir = backup_dir / "attachments"
+            asset_prefix = f"{channel_id}/attachments"
         if force and asset_dir.exists():
             import shutil
@@ -386,7 +395,7 @@ class DiscordExporter:
         async for msg in self.reader.fetch_message_history(channel_id, after_id=last_id):
             if not self.is_running: break
             await asyncio.sleep(0) # Yield control
-            msg_data = await self._format_message(msg, asset_dir, base_filename, avatar_dir, avatar_rel_base)
+            msg_data = await self._format_message(msg, asset_dir, asset_prefix, avatar_dir, avatar_rel_base)
             messages.append(msg_data)
             new_count += 1
             accumulated_count += 1
@@ -547,7 +556,7 @@ class DiscordExporter:
                 "userColor": str(author.color) if hasattr(author, "color") else None,
                 "userIsBot": author.bot,
                 "userRoles": roles,
-                "userAvatar": f"user_avatars/{user_id}.png" if author.avatar else None,
+                "userAvatar": f"users/avatars/{user_id}.png" if author.avatar else None,
                 "userAvatarUrl": str(author.display_avatar.url) if author.avatar else None
             }
@@ -684,9 +693,9 @@ class DiscordExporter:
         is_forum = isinstance(channel, discord.ForumChannel)
         backup_root = self.export_path / "message_backup"
-        forum_json_file = backup_root / f"{channel_id}.json"
+        forum_json_file = backup_root / str(channel_id) / "messages.json"
         forum_asset_dir = backup_root / str(channel_id)
-        avatar_dir = backup_root / "user_avatars"
+        avatar_dir = backup_root / "users" / "avatars"
         thread_count = 0
         if all_threads:
@@ -713,15 +722,15 @@ class DiscordExporter:
                 logger.debug(f"Found starter message {msg.id} for {thread.name}")
                 # Save assets in the thread's own directory inside the forum directory
-                thread_asset_dir = forum_asset_dir / str(thread.id)
+                thread_asset_dir = forum_asset_dir / str(thread.id) / "thread_attachments"
                 thread_asset_dir.mkdir(parents=True, exist_ok=True)
                 msg_data = await self._format_message(
                     msg,
                     thread_asset_dir,
-                    f"{channel_id}/{thread.id}", # Full relative path from message_backup/
+                    f"{channel_id}/{thread.id}/thread_attachments", # Full relative path from message_backup/
                     avatar_dir,
-                    "../../user_avatars" # Two levels up from {forum_id}/{thread_id}/
+                    "../../users/avatars" # Two levels up from {forum_id}/{thread_id}/
                 )
                 # Override type and add title for forum starter messages
                 msg_data["type"] = "Thread_starter_message"
@@ -732,7 +741,7 @@ class DiscordExporter:
                 # Enrich totalFileSizeBytes with the child thread's totalAttachmentSizeBytes
                 # (the thread JSON has already been written above)
-                thread_json = backup_root / str(channel_id) / f"{thread.id}.json"
+                thread_json = backup_root / str(channel_id) / str(thread.id) / "thread_messages.json"
                 if thread_json.exists():
                     try:
                         with open(thread_json, "r", encoding="utf-8") as f:
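
Review note: the path rules introduced across the hunks above can be condensed into one function, which may make the new layout easier to check at a glance. The sketch below is only an illustration under assumptions: `BackupPaths` and `resolve_backup_paths` are hypothetical names that do not exist in this commit, and passing a `parent_id` stands in for the exporter's `is_thread` check.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class BackupPaths(NamedTuple):
    """Hypothetical container for the per-channel paths used by the exporter."""
    json_file: Path        # message history file
    asset_dir: Path        # where attachments are written
    asset_prefix: str      # attachment path relative to message_backup/
    avatar_rel_base: str   # relative path from the channel dir to users/avatars


def resolve_backup_paths(
    backup_root: Path, channel_id: int, parent_id: Optional[int] = None
) -> BackupPaths:
    """Mirror the directory layout from this commit (illustrative helper only)."""
    if parent_id is not None:
        # Thread: message_backup/{parent_id}/{thread_id}/
        backup_dir = backup_root / str(parent_id) / str(channel_id)
        return BackupPaths(
            json_file=backup_dir / "thread_messages.json",
            asset_dir=backup_dir / "thread_attachments",
            asset_prefix=f"{parent_id}/{channel_id}/thread_attachments",
            avatar_rel_base="../../users/avatars",
        )
    # Regular channel or forum root: message_backup/{channel_id}/
    backup_dir = backup_root / str(channel_id)
    return BackupPaths(
        json_file=backup_dir / "messages.json",
        asset_dir=backup_dir / "attachments",
        asset_prefix=f"{channel_id}/attachments",
        avatar_rel_base="../users/avatars",
    )
```

For example, `resolve_backup_paths(Path("message_backup"), 7, parent_id=42).json_file` resolves to `message_backup/42/7/thread_messages.json`, matching the tree documented in BACKUP.md.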

View file

@@ -91,7 +91,7 @@ class BackupPane(Container):
             self._update_ui(f"[red]Error: {e}[/red]", "", "", False)
     def _get_backup_info(self) -> str | None:
-        profile_file = Path(f"Reaper-{self.cfg_name}") / "server_profile.json"
+        profile_file = Path(f"Reaper-{self.cfg_name}") / "server_profile" / "profile.json"
         if profile_file.exists():
             try:
                 with open(profile_file, "r", encoding="utf-8") as f:
@@ -195,7 +195,7 @@ class BackupPane(Container):
         any_found = False
         backed_up_ids = set()
         for chan in eligible_channels:
-            if (self.exporter.export_path / "message_backup" / f"{chan.id}.json").exists():
+            if (self.exporter.export_path / "message_backup" / str(chan.id) / "messages.json").exists():
                 any_found = True
                 backed_up_ids.add(chan.id)
@@ -271,7 +271,7 @@ class BackupPane(Container):
                 break
             await asyncio.sleep(0.01) # Yield to UI thread to keep it responsive
-            backup_exists = (self.exporter.export_path / "message_backup" / f"{chan.id}.json").exists()
+            backup_exists = (self.exporter.export_path / "message_backup" / str(chan.id) / "messages.json").exists()
             is_sync = backup_exists and not force_overwrite
             label = "Syncing Backup" if is_sync else "Backing up"
@@ -346,7 +346,7 @@ class BackupPane(Container):
         selected_channels = [
             c for c in eligible_channels
-            if (self.exporter.export_path / "message_backup" / f"{c.id}.json").exists()
+            if (self.exporter.export_path / "message_backup" / str(c.id) / "messages.json").exists()
         ]
         if not selected_channels: