Util Net
- exception dffml.util.net.DirectoryNotExtractedError(directory_path)
Raised when extraction of a directory fails.
- exception dffml.util.net.ProtocolNotAllowedError(url, allowlist)
Raised when a URL’s protocol is not in the allowlist of protocols.
- async dffml.util.net.cached_download(url: Union[str, Request], target_path: Union[str, Path], expected_hash: str, protocol_allowlist: List[str] = ['https://'])
Download a file and verify the hash of the downloaded file. If the file already exists and the hash matches, do not re-download the file.
The path to the downloaded file is prepended to the argument list of the wrapped function.
You can use tools like curl to download the file, and then sha384sum to calculate the hash value used for the expected_hash argument.
$ curl -sSL 'https://github.com/intel/dffml/raw/152c2b92535fac6beec419236f8639b0d75d707d/MANIFEST.in' | sha384sum
f7aadf5cdcf39f161a779b4fa77ec56a49630cf7680e21fb3dc6c36ce2d8c6fae0d03d5d3094a6aec4fea1561393c14c -
- Parameters:
url (str) – The URL to download
target_path (str, pathlib.Path) – Path on disk to store download
expected_hash (str) – SHA384 hash of the contents
protocol_allowlist (list, optional) – List of strings, one of which the URL must start with. If you want to be able to download http:// (rather than https://) links, you’ll need to override this.
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> cached_manifest = asyncio.run(
...     cached_download(
...         "https://github.com/intel/dffml/raw/152c2b92535fac6beec419236f8639b0d75d707d/MANIFEST.in",
...         "MANIFEST.in",
...         "f7aadf5cdcf39f161a779b4fa77ec56a49630cf7680e21fb3dc6c36ce2d8c6fae0d03d5d3094a6aec4fea1561393c14c",
...     )
... )
>>>
>>> with open(cached_manifest) as manifest:
...     print(manifest.read().split()[:2])
['include', 'README.md']
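The expected_hash value can also be computed in Python instead of with curl and sha384sum. The following is a minimal sketch using only the standard library. The helpers sha384_of_file and is_cached are hypothetical names, not part of dffml, and the check only mirrors the behaviour described above (skip the download when the file exists and its hash matches); it is not the library's actual implementation.

```python
import hashlib
import pathlib
import tempfile


def sha384_of_file(path, chunk_size=8192):
    """Return the hex SHA-384 digest of the file at path (hypothetical helper)."""
    digest = hashlib.sha384()
    with open(path, "rb") as fileobj:
        # Read in chunks so large downloads do not need to fit in memory
        for chunk in iter(lambda: fileobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_cached(path, expected_hash):
    """True if path exists and its SHA-384 matches expected_hash (hypothetical helper)."""
    path = pathlib.Path(path)
    return path.is_file() and sha384_of_file(path) == expected_hash


with tempfile.TemporaryDirectory() as tempdir:
    # A small local file stands in for a previously downloaded file
    target = pathlib.Path(tempdir, "example.txt")
    target.write_bytes(b"include README.md\n")
    expected = hashlib.sha384(b"include README.md\n").hexdigest()
    cached = is_cached(target, expected)
    missing = is_cached(pathlib.Path(tempdir, "absent.txt"), expected)
    print(cached, missing)
```

A SHA-384 hex digest is 96 characters long, which is a quick sanity check on any value passed as expected_hash.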
- async dffml.util.net.cached_download_unpack_archive(url, file_path, directory_path, expected_hash, protocol_allowlist=['https://'])
Download an archive and extract it to a directory on disk.
Verify the hash of the downloaded file. If the hash matches, the file is not re-downloaded.
The path to the extracted directory is prepended to the argument list of the wrapped function.
See cached_download for instructions on how to calculate expected_hash.
Warning
This function does not verify the integrity of the unpacked archive on disk, only that of the downloaded file.
- Parameters:
url (str) – The URL to download
file_path (str, pathlib.Path) – Path on disk to store download
directory_path (str, pathlib.Path) – Path on disk to store extracted contents of downloaded archive
expected_hash (str) – SHA384 hash of the contents
protocol_allowlist (list, optional) – List of strings, one of which the URL must start with. If you want to be able to download http:// (rather than https://) links, you’ll need to override this.
Examples
>>> import asyncio
>>> from dffml import cached_download_unpack_archive
>>>
>>> dffml_dir = asyncio.run(
...     cached_download_unpack_archive(
...         "https://github.com/intel/dffml/archive/c4469abfe6007a50144858d485537324046ff229.tar.gz",
...         "dffml.tar.gz",
...         "dffml",
...         "bb9bb47c4e6e4c6b7147bb3c000bc4069d69c0c77a3e560b69f476a78e6b5084adf5467ee83cbbcc47ba5a4a0696fdfc",
...     )
... )
>>> print(len(list(dffml_dir.rglob("**/*"))))
124
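The extraction step can be illustrated offline. This is a minimal sketch that assumes the standard library's shutil.unpack_archive as the extraction mechanism; whether dffml uses that call internally is an assumption here. The archive is built locally to stand in for a downloaded file, and all paths are hypothetical.

```python
import pathlib
import shutil
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tempdir:
    root = pathlib.Path(tempdir)
    # Build a small archive to stand in for a downloaded file
    source = root / "source"
    source.mkdir()
    (source / "README.md").write_text("hello\n")
    archive_path = root / "example.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source / "README.md", arcname="example/README.md")
    # Extract into a separate directory, playing the role of directory_path
    directory_path = root / "extracted"
    shutil.unpack_archive(str(archive_path), str(directory_path))
    # List everything that was extracted, relative to the extraction root
    extracted = sorted(
        path.relative_to(directory_path).as_posix()
        for path in directory_path.rglob("*")
    )
    print(extracted)
```

Nothing in this sketch checks the extracted files themselves, which matches the warning above: only the archive file is covered by the hash.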
- dffml.util.net.progress_reporthook(blocknum, blocksize, totalsize, logger)
Serve as a reporthook for monitoring download progress.
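For context, urllib.request.urlretrieve() calls its reporthook with three positional arguments: block number, block size, and total size. The sketch below shows one way a hook with an extra logger argument could be adapted to that interface with functools.partial. The names percent_complete and example_reporthook are hypothetical and this is not dffml's implementation.

```python
import functools
import logging


def percent_complete(blocknum, blocksize, totalsize):
    """Return download progress as a percentage, or None if the size is unknown."""
    if totalsize <= 0:
        return None
    # The final block may be partial, so cap the result at 100 percent
    return min(100.0, blocknum * blocksize * 100.0 / totalsize)


def example_reporthook(blocknum, blocksize, totalsize, logger):
    """Log progress; hypothetical stand-in for a reporthook that takes a logger."""
    percent = percent_complete(blocknum, blocksize, totalsize)
    if percent is None:
        logger.debug("Downloaded %d bytes", blocknum * blocksize)
    else:
        logger.debug("Downloaded %.1f%%", percent)


# Bind the logger so the hook takes the three arguments urlretrieve passes
hook = functools.partial(example_reporthook, logger=logging.getLogger("example"))
hook(5, 1024, 10240)
```

A hook built this way could be passed as the reporthook argument of urllib.request.urlretrieve().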
- dffml.util.net.progressbar_no_totalsize(blocknum, logger)
Progress bar that bounces back and forth since we don’t know total size.
- dffml.util.net.sync_urlopen(url: Union[str, Request], protocol_allowlist: List[str] = ['https://'], **kwargs)
Check that url has a protocol defined in protocol_allowlist, then return the result of calling urllib.request.urlopen(), passing it url and any keyword arguments.
- dffml.util.net.sync_urlretrieve(url: Union[str, Request], protocol_allowlist: List[str] = ['https://'], **kwargs) -> Tuple[Path, EmailMessage]
Check that url has a protocol defined in protocol_allowlist, then return the result of calling urllib.request.urlretrieve(), passing it url and any keyword arguments.
- dffml.util.net.validate_protocol(url: Union[str, Request], protocol_allowlist=['https://']) -> str
Check that url has a protocol defined in protocol_allowlist. Raises ProtocolNotAllowedError if it does not.
Examples
>>> from dffml.util.net import validate_protocol, DEFAULT_PROTOCOL_ALLOWLIST
>>>
>>> validate_protocol("http://example.com")
Traceback (most recent call last):
    ...
dffml.util.net.ProtocolNotAllowedError: Protocol of URL 'http://example.com' is not in allowlist: ['https://']
>>>
>>> validate_protocol("https://example.com")
'https://example.com'
>>>
>>> validate_protocol("sshfs://example.com", ["sshfs://"] + DEFAULT_PROTOCOL_ALLOWLIST)
'sshfs://example.com'
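The allowlist check described above amounts to a prefix match on the URL. The following is a minimal sketch of that idea for plain string URLs only (the real function also accepts Request objects). The names EXAMPLE_ALLOWLIST, ExampleProtocolError, and example_validate_protocol are hypothetical and are not dffml's own.

```python
EXAMPLE_ALLOWLIST = ["https://"]


class ExampleProtocolError(ValueError):
    """Hypothetical stand-in for ProtocolNotAllowedError."""


def example_validate_protocol(url, allowlist=None):
    """Return url if it starts with an allowed prefix, otherwise raise."""
    allowlist = EXAMPLE_ALLOWLIST if allowlist is None else allowlist
    # A URL is accepted when it begins with any entry in the allowlist
    if not any(url.startswith(protocol) for protocol in allowlist):
        raise ExampleProtocolError(
            f"Protocol of URL {url!r} is not in allowlist: {allowlist!r}"
        )
    return url


print(example_validate_protocol("https://example.com"))
print(example_validate_protocol("http://example.com", ["http://"] + EXAMPLE_ALLOWLIST))
try:
    example_validate_protocol("http://example.com")
except ExampleProtocolError as error:
    print(error)
```

Matching on the full prefix including "://" means "https://" in the allowlist does not accidentally admit "http://" URLs.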