feat: improve download speed with the in-built downloader #6632
LesnyRumcajs merged 1 commit into main from
Conversation
No actionable comments were generated in the recent review. 🎉

Walkthrough

Adds a built-in, configurable parallel HTTP snapshot downloader (default 5 connections), a larger read buffer (512KB), filename logging and resolved-URL peek usage in daemon flows; updates docs and changelog, and adds new dictionary tokens.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Client as Client
participant Download as DownloadManager
participant Preflight as Preflight Check
participant Server as HTTP Server
participant FileIO as File I/O
Client->>Download: download_http(url, dir, filename, option, callback)
Download->>Preflight: Check range support (HEAD/GET with Range)
Preflight->>Server: HEAD/GET
Server-->>Preflight: Response (supports Range? / Content-Length / MD5/ETag)
alt Range supported
Preflight-->>Download: Enable parallel mode
Download->>Download: Split into chunks (N)
par Concurrent chunk fetches
Download->>Server: GET Range chunk 1
Download->>Server: GET Range chunk 2
Download->>Server: GET Range chunk N
and
Server-->>Download: Chunk data
Download->>FileIO: Write chunk to temp file
end
Download->>FileIO: Rename temp -> final
else Range not supported
Preflight-->>Download: Fallback to single connection
Download->>Server: Single GET
Server-->>Download: Full file stream
Download->>FileIO: Write to temp file
Download->>FileIO: Rename temp -> final
end
    Download-->>Client: DownloadFileResult(path, cache_hit)
```
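To make the parallel branch of the diagram concrete, here is a minimal Rust sketch of that flow: preallocate the temp file, split the known Content-Length into one byte range per connection, fetch the ranges concurrently with HTTP `Range` headers, and write each chunk at its offset. The function names (`download_parallel`, `fetch_chunk`) and the reqwest/tokio plumbing are illustrative assumptions, not Forest's actual implementation.

```rust
use std::io::SeekFrom;

use anyhow::Context as _;
use futures::future::try_join_all;
use tokio::io::{AsyncSeekExt, AsyncWriteExt};

/// Fetch one byte range and write it at its offset in the preallocated file.
async fn fetch_chunk(
    client: reqwest::Client,
    url: String,
    path: std::path::PathBuf,
    start: u64,
    end: u64,
) -> anyhow::Result<()> {
    let bytes = client
        .get(&url)
        .header(reqwest::header::RANGE, format!("bytes={start}-{end}"))
        .send()
        .await?
        .error_for_status()?
        .bytes()
        .await?;
    // Each task opens its own handle so writes don't share a file cursor.
    let mut file = tokio::fs::OpenOptions::new().write(true).open(&path).await?;
    file.seek(SeekFrom::Start(start)).await?;
    file.write_all(&bytes).await?;
    Ok(())
}

/// Split `total_size` bytes into one range per connection and fetch them in parallel.
async fn download_parallel(
    url: &str,
    total_size: u64,
    path: &std::path::Path,
    num_connections: u64, // the walkthrough mentions a default of 5
) -> anyhow::Result<()> {
    // Clamp so chunk_size never becomes zero for files smaller than the connection count.
    let conns = num_connections.min(total_size.max(1));
    let chunk_size = total_size / conns;
    tokio::fs::File::create(path).await?.set_len(total_size).await?;
    let client = reqwest::Client::new();
    let tasks = (0..conns).map(|i| {
        let start = i * chunk_size;
        // The last chunk absorbs any remainder from the integer division.
        let end = if i == conns - 1 {
            total_size.saturating_sub(1)
        } else {
            start + chunk_size - 1
        };
        fetch_chunk(client.clone(), url.to_owned(), path.to_owned(), start, end)
    });
    try_join_all(tasks).await.context("a chunk download failed")?;
    Ok(())
}
```

The rename-to-final-destination step and the single-connection fallback from the diagram are omitted for brevity.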
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/utils/net/download_file.rs (3)
450-451: Minor: doc comment mentions "private HTTP client" but implementation uses `global_http_client()`.

The doc comment references a "private HTTP client," but the function delegates to `download_http_parallel` and `download_http_single`, which both use `global_http_client()`.

Suggested doc comment fix

```diff
-/// Download the file at `url` with a private HTTP client, returning the path to the downloaded file
+/// Download the file at `url`, returning the path to the downloaded file.
+///
+/// Uses parallel connections for `Resumable` option, single connection otherwise.
 pub async fn download_http(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/net/download_file.rs` around lines 450 - 451, The doc comment for download_http incorrectly says it uses a "private HTTP client" while the implementation delegates to download_http_parallel and download_http_single which use global_http_client(); update the doc comment on download_http to accurately state that it uses the shared/global HTTP client (via download_http_parallel/download_http_single) or otherwise describe that it delegates to those functions—reference download_http, download_http_parallel, download_http_single, and global_http_client() in the comment so readers know which client is actually used.
446: Consider using `tokio::fs::rename` for consistency.

Same sync-operation concern as in `download_http_parallel`. Consider using `tokio::fs::rename` for consistency with the async context.

Proposed fix

```diff
 tempfile.flush().await.context("couldn't flush file")?;
-std::fs::rename(&tmp_dst_path, &dst_path).context("couldn't rename file")?;
+tokio::fs::rename(&tmp_dst_path, &dst_path)
+    .await
+    .context("couldn't rename file")?;
 Ok(dst_path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/net/download_file.rs` at line 446, Replace the synchronous std::fs::rename call with the asynchronous tokio::fs::rename and .await it so the rename runs on the async runtime (e.g., change std::fs::rename(&tmp_dst_path, &dst_path).context("couldn't rename file")? to tokio::fs::rename(&tmp_dst_path, &dst_path).await.context("couldn't rename file")?); ensure the surrounding function is async or already running on tokio, import tokio::fs if needed, and mirror the same change pattern used in download_http_parallel for consistency.
398-403: Use async file operations in async context.

`std::fs::rename` is a blocking call that can stall the async runtime. While renames are typically fast, using `tokio::fs::rename` maintains consistency with the async context. Also, the enumerate index at line 399 reflects completion order rather than the original chunk number (which is correctly embedded in the inner error context from line 387).

Proposed fix

```diff
-for (i, result) in results.into_iter().enumerate() {
-    result.with_context(|| format!("Chunk {} failed", i))?;
+for result in results {
+    result?;
 }
 // Rename to final destination
-std::fs::rename(&tmp_dst_path, &dst_path).context("couldn't rename file")?;
+tokio::fs::rename(&tmp_dst_path, &dst_path)
+    .await
+    .context("couldn't rename file")?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/net/download_file.rs` around lines 398 - 403, The loop over results uses enumerate which reflects completion order not original chunk indices—replace "for (i, result) in results.into_iter().enumerate()" with a simple "for result in results.into_iter()" and propagate each result (keeping the inner error context already added) so you don't log/attach the wrong chunk index; then replace the blocking std::fs::rename(&tmp_dst_path, &dst_path) call with the async tokio::fs::rename(&tmp_dst_path, &dst_path).await and convert the .context(...) usage to work with the awaited future (e.g., await then .context(...)?), ensuring the surrounding function remains async and returns the same Result type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/net/download_file.rs`:
- Around line 267-291: The chunking logic can underflow when total_size <
num_connections: chunk_size = total_size / num_connections may be 0 and
computing end = (i + 1) as u64 * chunk_size - 1 produces u64::MAX; to fix,
compute an effective_connections = min(num_connections as u64,
total_size.max(1)) (or clamp to at most total_size) and derive chunk_size =
total_size / effective_connections, then compute start/end using
effective_connections and ensure the last chunk covers the remainder (using
min(total_size-1, ...)) in the download_tasks mapping (references: chunk_size,
num_connections, start, end, download_tasks).
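A compact sketch of the clamped chunking math this comment proposes; names (`chunk_ranges`, `total_size`, `num_connections`) follow the comment, not necessarily the committed fix.

```rust
/// Clamped range computation: never more chunks than bytes, no u64 underflow.
fn chunk_ranges(total_size: u64, num_connections: usize) -> Vec<(u64, u64)> {
    // Without clamping, total_size < num_connections gives chunk_size == 0,
    // and (i + 1) * chunk_size - 1 underflows to u64::MAX.
    let effective_connections = (num_connections as u64).min(total_size.max(1));
    let chunk_size = total_size / effective_connections;
    (0..effective_connections)
        .map(|i| {
            let start = i * chunk_size;
            let end = if i == effective_connections - 1 {
                // The last chunk covers the remainder.
                total_size.saturating_sub(1)
            } else {
                start + chunk_size - 1
            };
            (start, end)
        })
        .collect()
}
```

For example, `chunk_ranges(3, 5)` yields `[(0, 0), (1, 1), (2, 2)]` instead of producing a `u64::MAX` end offset.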
Force-pushed from b255c79 to 7387df3 (Compare)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/net/download_file.rs`:
- Around line 545-905: The handle_file_request function currently uses direct
slicing content[start as usize..=end as usize] which can panic on edge cases
(empty content or out-of-bounds ranges); update the code in handle_file_request
to use content.get(start as usize..=end as usize) and handle the Option safely:
if .get(...) returns Some(range_content) use it to build the PARTIAL_CONTENT
response (headers: CONTENT_RANGE, CONTENT_LENGTH, ACCEPT_RANGES), otherwise
return an appropriate error response (e.g., StatusCode::RANGE_NOT_SATISFIABLE or
fall back to full content) to avoid panics; ensure variables referenced (start,
end, content_len, range_content) and the Response::builder(...) calls are
adjusted accordingly (a safe-slicing sketch follows these inline comments).
- Around line 429-500: download_http_single currently always calls
crate::utils::net::reader(..., DownloadFileOption::Resumable, ...) which ignores
the caller's intent; change the signature of download_http_single to accept
option: DownloadFileOption and pass that option to reader() instead of the
hardcoded Resumable, then update both call sites in download_http() to forward
the caller's option; additionally, where download_http_parallel falls back to
download_http_single after a failed Resumable parallel attempt, call
download_http_single(..., DownloadFileOption::Resumable) explicitly so the
fallback still uses resumable semantics expected from that path.
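As a concrete illustration of the first comment's safe-slicing suggestion, here is a hedged sketch using the `http` crate's types as stand-ins for whatever the test server actually uses; `range_response` is a hypothetical helper, not code from `download_file.rs`.

```rust
use http::{header, Response, StatusCode};

/// Build a 206 response for a byte range of `content`, or a 416 if the
/// requested range is out of bounds, instead of panicking on a bad slice.
fn range_response(content: &[u8], start: u64, end: u64) -> Response<Vec<u8>> {
    let content_len = content.len();
    match content.get(start as usize..=end as usize) {
        Some(range_content) => Response::builder()
            .status(StatusCode::PARTIAL_CONTENT)
            .header(header::CONTENT_RANGE, format!("bytes {start}-{end}/{content_len}"))
            .header(header::CONTENT_LENGTH, range_content.len())
            .header(header::ACCEPT_RANGES, "bytes")
            .body(range_content.to_vec())
            .expect("valid response"),
        // Empty content or an out-of-bounds range: answer 416 rather than panic.
        None => Response::builder()
            .status(StatusCode::RANGE_NOT_SATISFIABLE)
            .header(header::CONTENT_RANGE, format!("bytes */{content_len}"))
            .body(Vec::new())
            .expect("valid response"),
    }
}
```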
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
... and 5 files with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
Force-pushed from 7387df3 to 147962b (Compare)
Actionable comments posted: 1
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/utils/net/download_file.rs (2)
503-544: ⚠️ Potential issue | 🟡 Minor

Document public helpers that are part of the API surface.

`download_file_with_retry` and `download_to` are public and should have doc comments to meet the project’s documentation requirement.

As per coding guidelines: "Document all public functions and structs with doc comments".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/net/download_file.rs` around lines 503 - 544, Add public doc comments for the two exported helpers download_file_with_retry and download_to: describe what each function does, its parameters (url, directory/destination, filename/option, callback) and the return value/result and error conditions; ensure the comments use Rust doc comment style (///) placed immediately above the function signatures and include examples or usage notes if helpful for callers. Also document the semantics of retry behavior for download_file_with_retry and clarify that download_to writes to destination.parent() and returns an anyhow::Result<()> on failure.
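A possible shape for the requested doc comments, with signatures guessed from the review text; the actual parameter lists in `download_file.rs` may differ, and `DownloadFileOption` is the enum from this module.

```rust
use std::path::{Path, PathBuf};
use std::sync::Arc;

use url::Url;

/// Download the file at `url` into `directory` as `filename`, retrying
/// transient failures before giving up. `callback`, when provided, receives
/// progress messages. Returns the path to the downloaded file.
pub async fn download_file_with_retry(
    url: &Url,
    directory: &Path,
    filename: &str,
    option: DownloadFileOption,
    callback: Option<Arc<dyn Fn(String) + Send + Sync>>,
) -> anyhow::Result<PathBuf> {
    unimplemented!()
}

/// Download the file at `url` to `destination`, writing into
/// `destination.parent()`. Errors if the download or the write fails.
pub async fn download_to(
    url: &Url,
    destination: &Path,
    option: DownloadFileOption,
    callback: Option<Arc<dyn Fn(String) + Send + Sync>>,
) -> anyhow::Result<()> {
    unimplemented!()
}
```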
89-106: ⚠️ Potential issue | 🟡 Minor

Add doc comments for public API items.

`DownloadFileOption`, `DownloadFileResult`, and `download_file_with_cache` are public but undocumented, which violates the project’s documentation requirement.

📝 Doc comment skeleton

```diff
+/// Options controlling HTTP download behavior.
 pub enum DownloadFileOption {
     NonResumable,
     Resumable,
 }

+/// Result of a cached download.
 pub struct DownloadFileResult {
     pub path: PathBuf,
     #[allow(dead_code)]
     pub cache_hit: bool,
 }

+/// Download a URL into the cache directory and return the cached path + hit flag.
 pub async fn download_file_with_cache(
     url: &Url,
     cache_dir: &Path,
     option: DownloadFileOption,
 ) -> anyhow::Result<DownloadFileResult> {
```

As per coding guidelines: "Document all public functions and structs with doc comments".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/net/download_file.rs` around lines 89 - 106, Add doc comments for the public API: document the DownloadFileOption enum (describe NonResumable vs Resumable behavior), the DownloadFileResult struct (document the path field and cache_hit meaning), and the download_file_with_cache function (document parameters url, cache_dir, option, the async behavior and the returned Result). Place /// doc comments above the enum, struct, fields (use /// on each public field), and function signature, briefly describing purpose, behavior, and return value so the public items are properly documented.
🧹 Nitpick comments (1)
src/utils/net/download_file.rs (1)
450-455: Consider relying on `BufWriter` drop for flushing.

The explicit `tempfile.flush()` is likely redundant and can be removed to align with the project’s preferred pattern.

🧹 Optional cleanup

```diff
-    tempfile.flush().await.context("couldn't flush file")?;
```

Based on learnings: "In the Forest project, LesnyRumcajs prefers to rely on BufWriter's Drop implementation for automatic flushing rather than explicit flush() calls".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/net/download_file.rs` around lines 450 - 455, The tempfile.flush().await.context("couldn't flush file")? call is redundant; remove that explicit flush and rely on the BufWriter created by tokio::io::BufWriter::with_capacity (the tempfile variable) to flush on drop, keeping the tokio::io::copy(...).await.context("couldn't download file")? and the final tokio::fs::rename(&tmp_dst_path, &dst_path) calls intact so error handling and file rename behavior remains unchanged.
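For reference, a sketch of the buffered-writer pattern under discussion, assuming the 512KB capacity from the walkthrough; `write_download` and its parameters are hypothetical. One caveat worth verifying before taking the cleanup: unlike `std::io::BufWriter`, tokio's `BufWriter` cannot flush in `Drop` (flushing is async), so dropping it with buffered data can silently lose the tail of the file.

```rust
use tokio::io::{AsyncRead, AsyncWriteExt, BufWriter};

const BUFFER_SIZE: usize = 512 * 1024; // 512KB buffer mentioned in the walkthrough

// Hypothetical helper mirroring the temp-file write path in the review.
async fn write_download(
    mut stream: impl AsyncRead + Unpin,
    tmp_dst_path: &std::path::Path,
) -> anyhow::Result<()> {
    let file = tokio::fs::File::create(tmp_dst_path).await?;
    let mut tempfile = BufWriter::with_capacity(BUFFER_SIZE, file);
    tokio::io::copy(&mut stream, &mut tempfile).await?;
    // Explicit flush: tokio's BufWriter has no async Drop to do this for us.
    tempfile.flush().await?;
    Ok(())
}
```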
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/net/download_file.rs`:
- Around line 674-683: The range parsing currently uses direct indexing on parts
(parts[0], parts[1]) which can panic; change it to use .get() on parts and parse
via safe combinators: retrieve start via parts.get(0).and_then(|s|
s.parse::<u64>().ok()).unwrap_or(0), and compute end via parts.get(1).map(|s|
s.as_str()).filter(|s| !s.is_empty()).and_then(|s|
s.parse::<u64>().ok()).unwrap_or(content_len.saturating_sub(1)); keep the
existing strip_prefix("bytes=") check and preserve the variables range_str,
parts, start, end and content_len so behavior is identical but safe from panics.
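A standalone sketch of the panic-free parsing described above; `parse_range` is a hypothetical helper and the variable names follow the comment.

```rust
/// Parse a header value like "bytes=0-499" into (start, end), defaulting
/// missing or unparsable bounds instead of panicking on parts[0]/parts[1].
fn parse_range(range_str: &str, content_len: u64) -> Option<(u64, u64)> {
    let spec = range_str.strip_prefix("bytes=")?;
    let parts: Vec<&str> = spec.splitn(2, '-').collect();
    let start = parts
        .first()
        .and_then(|s| s.parse::<u64>().ok())
        .unwrap_or(0);
    let end = parts
        .get(1)
        .filter(|s| !s.is_empty())
        .and_then(|s| s.parse::<u64>().ok())
        .unwrap_or(content_len.saturating_sub(1));
    Some((start, end))
}
```

For instance, `parse_range("bytes=500-", 1000)` returns `Some((500, 999))`, and a malformed header falls back to the defaults rather than panicking.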
Force-pushed from 147962b to 837e8f1 (Compare)
Summary of changes
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes #3715
Other information and links
Change checklist
Outside contributions
Summary by CodeRabbit
New Features
Documentation
Chores