Util Net

exception dffml.util.net.DirectoryNotExtractedError(directory_path)[source]

Raised when extraction of an archive to a directory fails.

exception dffml.util.net.ProtocolNotAllowedError(url, allowlist)[source]

Raised when a URL’s protocol is not in the list of allowed protocols.

async dffml.util.net.cached_download(url: Union[str, Request], target_path: Union[str, Path], expected_hash: str, protocol_allowlist: List[str] = ['https://'])[source]

Download a file and verify the hash of its contents. If the file already exists on disk and its hash matches, it is not re-downloaded.

The path to the downloaded file is prepended to the argument list of the wrapped function.

You can use tools like curl to download the file, and then sha384sum to calculate the hash value used for the expected_hash argument.

$ curl -sSL 'https://github.com/intel/dffml/raw/152c2b92535fac6beec419236f8639b0d75d707d/MANIFEST.in' | sha384sum
f7aadf5cdcf39f161a779b4fa77ec56a49630cf7680e21fb3dc6c36ce2d8c6fae0d03d5d3094a6aec4fea1561393c14c  -
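
The same digest can be computed in Python with hashlib; a minimal sketch, assuming the file fetched above has been saved locally as MANIFEST.in:

>>> import hashlib
>>>
>>> with open("MANIFEST.in", "rb") as manifest:
...     print(hashlib.sha384(manifest.read()).hexdigest())
f7aadf5cdcf39f161a779b4fa77ec56a49630cf7680e21fb3dc6c36ce2d8c6fae0d03d5d3094a6aec4fea1561393c14c
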
Parameters:
  • url (str) – The URL to download

  • target_path (str, pathlib.Path) – Path on disk to store download

  • expected_hash (str) – SHA384 hash of the contents

  • protocol_allowlist (list, optional) – List of strings, one of which the URL must start with. If you want to be able to download http:// (rather than https://) links, you’ll need to override this (see the sketch after the example below).

Examples

>>> import asyncio
>>> from dffml import *
>>>
>>> cached_manifest = asyncio.run(
...     cached_download(
...         "https://github.com/intel/dffml/raw/152c2b92535fac6beec419236f8639b0d75d707d/MANIFEST.in",
...         "MANIFEST.in",
...         "f7aadf5cdcf39f161a779b4fa77ec56a49630cf7680e21fb3dc6c36ce2d8c6fae0d03d5d3094a6aec4fea1561393c14c",
...     )
... )
>>>
>>> with open(cached_manifest) as manifest:
...     print(manifest.read().split()[:2])
['include', 'README.md']
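
As noted for the protocol_allowlist parameter above, http:// URLs are rejected unless the default allowlist is overridden. A minimal sketch of doing so; the URL and hash below are placeholders, shown only to illustrate the keyword argument:

>>> import asyncio
>>> from dffml import cached_download
>>>
>>> # Placeholder URL and hash; substitute real values before running.
>>> downloaded = asyncio.run(
...     cached_download(
...         "http://example.com/file.txt",
...         "file.txt",
...         "0" * 96,
...         protocol_allowlist=["http://", "https://"],
...     )
... )
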
async dffml.util.net.cached_download_unpack_archive(url, file_path, directory_path, expected_hash, protocol_allowlist=['https://'])[source]

Download an archive and extract it to a directory on disk.

Verify the hash of the downloaded file. If the hash matches, the file is not re-downloaded.

The path to the extracted directory is prepended to the argument list of the wrapped function.

See cached_download for instructions on how to calculate expected_hash.

Warning

This function does not verify the integrity of the unpacked archive on disk, only that of the downloaded file.

Parameters:
  • url (str) – The URL to download

  • file_path (str, pathlib.Path) – Path on disk to store download

  • directory_path (str, pathlib.Path) – Path on disk to store extracted contents of downloaded archive

  • expected_hash (str) – SHA384 hash of the contents

  • protocol_allowlist (list, optional) – List of strings, one of which the URL must start with. If you want to be able to download http:// (rather than https://) links, you’ll need to override this.

Examples

>>> import asyncio
>>> from dffml import cached_download_unpack_archive
>>>
>>> dffml_dir = asyncio.run(
...     cached_download_unpack_archive(
...         "https://github.com/intel/dffml/archive/c4469abfe6007a50144858d485537324046ff229.tar.gz",
...         "dffml.tar.gz",
...         "dffml",
...         "bb9bb47c4e6e4c6b7147bb3c000bc4069d69c0c77a3e560b69f476a78e6b5084adf5467ee83cbbcc47ba5a4a0696fdfc",
...     )
... )
>>> print(len(list(dffml_dir.rglob("**/*"))))
124
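
Because only the download itself is hash-verified (see the warning above), you may want to check the extracted files yourself. A minimal sketch of one way to do that, reusing dffml_dir from the example above:

>>> import hashlib
>>>
>>> # Map each extracted file to its SHA384 digest; compare these against a
>>> # manifest of known-good hashes if one is available.
>>> digests = {
...     str(path): hashlib.sha384(path.read_bytes()).hexdigest()
...     for path in dffml_dir.rglob("**/*")
...     if path.is_file()
... }
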
dffml.util.net.progress_reporthook(blocknum, blocksize, totalsize, logger)[source]

Serve as a reporthook for monitoring download progress.
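
A minimal sketch of wiring this into urllib.request.urlretrieve, assuming logger can be bound as a keyword argument via functools.partial (the URL is a placeholder):

>>> import functools
>>> import logging
>>> import urllib.request
>>> from dffml.util.net import progress_reporthook
>>>
>>> logger = logging.getLogger("download")
>>> # urlretrieve calls the reporthook with (blocknum, blocksize, totalsize);
>>> # partial supplies the trailing logger argument.
>>> path, headers = urllib.request.urlretrieve(
...     "https://example.com/file.txt",
...     "file.txt",
...     reporthook=functools.partial(progress_reporthook, logger=logger),
... )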

dffml.util.net.progressbar(percent, logger)[source]

Simple progressbar to show download progress.

dffml.util.net.progressbar_no_totalsize(blocknum, logger)[source]

Progress bar that bounces back and forth, used when the total size is unknown.

dffml.util.net.sync_urlopen(url: Union[str, Request], protocol_allowlist: List[str] = ['https://'], **kwargs)[source]

Check that url has a protocol defined in protocol_allowlist, then return the result of calling urllib.request.urlopen() with url and any keyword arguments.
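
A minimal sketch of typical use; since the return value is whatever urllib.request.urlopen() returns, it can be used as a context manager (the URL is a placeholder):

>>> from dffml.util.net import sync_urlopen
>>>
>>> # Any https:// URL passes the default allowlist.
>>> with sync_urlopen("https://example.com") as response:
...     body = response.read()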

dffml.util.net.sync_urlretrieve(url: Union[str, Request], protocol_allowlist: List[str] = ['https://'], **kwargs) → Tuple[Path, EmailMessage][source]

Check that url has a protocol defined in protocol_allowlist, then return the result of calling urllib.request.urlretrieve() with url and any keyword arguments.
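
A minimal sketch of typical use, relying on the Path and headers tuple shown in the signature (the URL is a placeholder):

>>> from dffml.util.net import sync_urlretrieve
>>>
>>> # Returns the on-disk path of the download and the response headers.
>>> path, headers = sync_urlretrieve("https://example.com/file.txt")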

dffml.util.net.validate_protocol(url: Union[str, Request], protocol_allowlist=['https://']) → str[source]

Check that url has a protocol defined in protocol_allowlist. Raise ProtocolNotAllowedError if it does not, otherwise return the url.

Examples

>>> from dffml.util.net import validate_protocol, DEFAULT_PROTOCOL_ALLOWLIST
>>>
>>> validate_protocol("http://example.com")
Traceback (most recent call last):
    ...
dffml.util.net.ProtocolNotAllowedError: Protocol of URL 'http://example.com' is not in allowlist: ['https://']
>>>
>>> validate_protocol("https://example.com")
'https://example.com'
>>>
>>> validate_protocol("sshfs://example.com", ["sshfs://"] + DEFAULT_PROTOCOL_ALLOWLIST)
'sshfs://example.com'