PR for Download File From Link function #241
Open
marindigen wants to merge 6 commits into geometric-intelligence:main from …
…o check and test the training. To download the dataset, the verify parameter of requests.get() in the function 'download_file_from_link' should be set to False. Note also that the run script on the data currently doesn't run, as it fails to download the data even when verify is set to False.
…ll_to_dict and process_mat. I have also modified download_file_from_link by specifying verify=False in requests.get()
…is flag in the config file
marindigen added a commit to marindigen/TopoBenchForNeuro that referenced this pull request Nov 26, 2025
…ion. Move the test class to the appropriate file. Note that the same changes were made in PR geometric-intelligence#241 (they are duplicated here, as the script wouldn't run otherwise and would require additional adaptation to the old download_file_from_link function).
Summary
This PR improves the download_file_from_link utility to support robust, memory-efficient downloads for large datasets and adds a dedicated test suite to ensure correct behaviour under different network conditions.
Motivation
Some of the datasets used in TopoBench (e.g. those hosted on external academic servers) can be:
- … response.content downloads memory-inefficient.
- … verify=False, which previously wasn't configurable.
The old implementation used a single requests.get call, loaded the entire response into memory, and did not retry on transient failures. This could lead to frequent failures or hangs when downloading large files over slow connections.
What this PR does
1. Improve download_file_from_link
The function download_file_from_link in topobench.data.utils.io_utils is updated to:
- … os.makedirs(path_to_save, exist_ok=True).
- … verify argument (default True).
- … timeout argument (default: 60 seconds for the read timeout, 30 seconds for connection).
- … retries argument.
- … content-length is available:
Behavioural notes:
- … 200, the function logs an error and returns without creating a file (same high-level behaviour as before, but now explicit).
2. Add tests for download_file_from_link
This PR introduces a new test file (e.g. tests/data/utils/test_io_utils.py) containing a test suite for download_file_from_link. The tests use unittest.mock and pytest to cover:
- … iter_content with multiple chunks totalling 5 MB.
- … path_to_save.
- … 404 response.
- … requests.get call raises requests.exceptions.Timeout.
- … requests.get is called twice.
- … requests.get calls raise requests.exceptions.Timeout.
- … ["zip", "tar", "tar.gz"].
- … iter_content.
- … content-length header.
- … download_file_from_link with verify=False.
- … requests.get was invoked with verify=False.
- … timeout value.
- … requests.get uses (30, custom_timeout) for (connect, read) timeouts.
Backwards compatibility
- … (file_link, path_to_save, dataset_name, file_format) are unchanged.
- … (verify, timeout, retries) have sensible defaults and should not break existing call sites.
Testing
- … download_file_from_link (see tests/data/utils/test_io_utils.py).
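The behaviour listed under "What this PR does" can be sketched roughly as follows. This is a minimal illustration, not the code from this PR. The function name download_file_from_link_sketch, the default retries=3, the 1 MB chunk size, the fixed one-second pause between attempts, the output file naming, the use of print instead of a logger, and re-raising after the last failed attempt are all assumptions. Only the verify, timeout, and retries arguments, the (30, timeout) connect/read tuple, the os.makedirs call, streaming via iter_content, and the early return on a non-200 status are taken from the description.

```python
import os
import time

import requests


def download_file_from_link_sketch(
    file_link,
    path_to_save,
    dataset_name,
    file_format="tar.gz",
    verify=True,
    timeout=60,  # read timeout in seconds; the connect timeout is fixed at 30
    retries=3,  # assumed default; the PR description does not state one
):
    """Stream a file to disk, retrying on transient network errors."""
    os.makedirs(path_to_save, exist_ok=True)
    # Assumed naming scheme: <path_to_save>/<dataset_name>.<file_format>
    output_path = os.path.join(path_to_save, f"{dataset_name}.{file_format}")

    for attempt in range(1, retries + 1):
        try:
            response = requests.get(
                file_link,
                stream=True,  # do not load the whole body into memory
                verify=verify,
                timeout=(30, timeout),  # (connect, read)
            )
            if response.status_code != 200:
                # Report the failure and return without creating a file.
                print(f"Failed to download the file (status {response.status_code}).")
                return None

            total = int(response.headers.get("content-length", 0))
            written = 0
            with open(output_path, "wb") as f:
                for chunk in response.iter_content(chunk_size=1024 * 1024):
                    if chunk:  # skip keep-alive chunks
                        f.write(chunk)
                        written += len(chunk)
            if total:
                print(f"Downloaded {written} of {total} bytes.")
            return output_path
        except (requests.exceptions.Timeout, requests.exceptions.ConnectionError) as exc:
            print(f"Attempt {attempt}/{retries} failed: {exc}")
            if attempt == retries:
                raise  # assumed: surface the error once retries are exhausted
            time.sleep(1)  # assumed fixed pause before the next attempt
```

A call such as download_file_from_link_sketch(url, "data/raw", "my_dataset", file_format="zip", verify=False) would stream the archive to data/raw/my_dataset.zip. Since verify=False disables TLS certificate checks, it is best reserved for hosts that are known to need it.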