All notable changes to this project will be documented in this file.
[0.3.7] - 2020-04-14
IO operations demo and
>>> can now be enabled or disabled for easy copying of code into interactive sessions.
Whitespace check now checks .rst and .md files too.
GetMulti operation which gets all Inputs of a given definition
Python usage example for LogisticRegression and its related tests.
Support for async generator operations
Example CLI commands and Python code for
save function in high level API to quickly save all given records to a source
Ability to configure sources and models for HTTP API from command line when starting server
Documentation page for command line usage of HTTP API
Usage of HTTP API to the quickstart to use trained model
CSV source sorts feature names within headers when saving
Moved HTTP service testing code to HTTP service
Issue parsing string values when using the dataflow run command and specifying extra inputs.
[0.3.6] - 2020-04-04
Operations for taking input from the user AcceptUserInput and for printing the output
Hugging Face Transformers tensorflow based NER models.
PNG ConfigLoader for reading images as arrays to predict using MNIST trained models
Docstrings and doctestable examples to
Inputs can be validated using operations
New db source can utilize any database that inherits from
Logistic Regression with SAG optimizer
Test tensorflow DNNEstimator documentation examples in CI
shouldi got an operation to run cargo-audit on rust code.
Moved all the downloads to tests/downloads to speed the CI test.
Test tensorflow DNNEstimator documentation examples in CI
Add python code for tensorflow DNNEstimator
Ability to run a subflow as if it were an operation using the
Support for operations without inputs.
Partial doctestable examples to
Doctestable examples for
Instructions for setting up debugging environment in VSCode
New model tutorial mentions file paths that should be edited.
DataFlow is no longer a dataclass to prevent it from being exported incorrectly.
Ignore generated files in
"~" as the home directory rather than a literal
Windows support by selecting asyncio.ProactorEventLoop and not using
Moved SLR into the main dffml package and removed
[0.3.5] - 2020-03-10
Parent flows can now forward inputs to active contexts of subflows.
Documentation on writing examples and running doctests
Doctestable Examples to high-level API.
Docstrings and doctestable examples for
record.py(features and evaluated)
Simplified model API with SimpleModel
Documentation on how DataFlows work conceptually.
Style guide now contains information on class, variable, and function naming.
Restructured contributing documentation
Use randomly generated data for scikit tests
Change Core to Official to clarify who maintains each plugin
Name of output of unsupervised model from “Prediction” to “cluster”
Test scikit LR documentation examples in CI
Create a fresh archive of the git repo for release instead of cleaning existing repo with git clean for development service release command.
Simplified SLR tests for scratch model
Test tensorflow DNNClassifier documentation examples in CI
config directories and files associated with ConfigLoaders have been renamed to configloader.
Model config directory parameters are now
New model tutorial and skel/model use simplified model API.
[0.3.4] - 2020-02-28
Tensorflow hub NLP models.
Notes on development dependencies in setup.py files to codebase notes.
dffml.util.net.cached_download_unpack_archive to run a cached download and unpack the archive, very useful for testing. Documented on the Networking Helpers API docs page.
Directions on how to read the CI under the Git and GitHub page of the contributing documentation.
Static file serving from a directory with
api.js file serving with the
shouldi got an operation to run golangci-lint on Golang code
Note about using black via VSCode
Port assignment for the HTTP API via the
Definitions with a spec can use the subspec parameter to declare that they are a list or a dict where the values are of the spec type. Rather than the list or dict itself being of the
Fixed the URL mentioned in example to configure a model.
Sphinx doctests are now run in the CI in the DOCS task.
[0.3.3] - 2020-02-10
Moved from TensorFlow 1 to TensorFlow 2.
IDX Sources to read binary data files and train models on MNIST Dataset
allowempty added to source config parameters.
Quickstart document to show how to use models from Python.
The latest release of the documentation now includes a link to the documentation for the master branch (on GitHub pages).
Virtual environment, GitPod, and Docker development environment setup notes to the CONTRIBUTING.md file.
Changelog now included in documentation website.
Documented style for imports.
Documented use of numpy docstrings.
Inputs can now be sanitized using function passed in
Helper utilities to take callables with numpy style docstrings and create config classes out of them using
File listing endpoint to HTTP service.
When an operation throws an exception the name of the instance and the parameters it was executed with will be thrown via an
Network utilities to perform cached downloads with hash validation.
Development service got a new command, which can retrieve an argument passed to setuptools setup function within a
All instances of
readonly parameter in source config is now changed to
predict parameter of all model config classes has been changed from
Defining features on the command line no longer requires that defined features be prefixed with
The model predict operation will now raise an exception if the model it is passed via its config is a class rather than an instance.
entry_point and friends have been renamed to
FastChildWatcher when run via the CLI to prevent
TensorFlow based neural network classifier had the classification parameter in its config changed to
SciKit models use
repos are now dictionary.
All instances of BaseConfigurable will now auto instantiate their respective config classes using kwargs if the config argument isn’t given and keyword arguments are.
The quickstart documentation was improved as well as the structure of docs.
-e in the wrong place in the getting setup section.
Since moving to auto config(), BaseConfigurable no longer produces odd typenames in conjunction with docs.py.
Autoconvert Definitions with spec into their spec
The model predict operation erroneously had a msg parameter in its config.
Unused imports identified by deepsource.io
Evaluation code from feature.py file as well as tests for those evaluations.
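The entry in this release about network utilities for cached downloads with hash validation can be illustrated with a minimal standard-library sketch. This is not DFFML's implementation: `cached_fetch`, `fake_fetch`, and the file name `archive.bin` are hypothetical names chosen for illustration, and SHA-384 is an assumed choice of hash.

```python
import hashlib
import pathlib
import tempfile


def cached_fetch(fetch, target, expected_sha384):
    """Call fetch() only when target is missing, then validate its hash.

    Hypothetical helper for illustration; not part of DFFML.
    """
    if not target.is_file():
        # Cache miss: fetch() returns the file contents as bytes.
        target.write_bytes(fetch())
    actual = hashlib.sha384(target.read_bytes()).hexdigest()
    if actual != expected_sha384:
        raise ValueError(f"Hash mismatch for {target}: got {actual}")
    return target


calls = []


def fake_fetch():
    # Stand-in for a real download such as urllib.request.urlopen(url).read()
    calls.append(1)
    return b"example archive contents"


expected = hashlib.sha384(b"example archive contents").hexdigest()

with tempfile.TemporaryDirectory() as tempdir:
    target = pathlib.Path(tempdir) / "archive.bin"
    cached_fetch(fake_fetch, target, expected)  # cache miss, calls fake_fetch
    cached_fetch(fake_fetch, target, expected)  # cache hit, no second call
    download_count = len(calls)
```

Validating the hash on every call, not only after a fresh download, means a corrupted cache file is also caught.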
[0.3.2] - 2020-01-03
AsyncExitStackTestCase which instantiates and enters async and non-async contextlib exit stacks. Provides temporary file creation.
Automatic releases to PyPI via GitHub Actions
Automatic documentation deployment to GitHub Pages
Function to create a config class dynamically, analogous to
ConfigLoaders class which loads config files from a file or directory to a dictionary.
CLI tests and integration tests derive from
SciKit models now use the auto args and config methods.
Correctly identify when functions decorated with
self to reference the
shouldi safety operation uses subprocess communicate method instead of stdin pipe writes.
Negative values are correctly parsed when input via the command line.
Do not lowercase development mode install location when reporting version.
[0.3.1] - 2019-12-12
Integration tests using the command line interface.
run_dataflow to run a dataflow and test for the same.
Features were moved from ModelContext to ModelConfig
CI is now run via GitHub Actions
CI testing script is now verbose
args and config methods of all classes no longer require implementation. BaseConfigurable handles exporting of arguments and creation of config objects for each class based off of the CONFIG property of that class. The CONFIG property is a class which has been decorated with dffml.base.config to make it a dataclass.
Speed up development service install of all plugins in development mode
Speed up named plugin load times
DataFlows with multiple possibilities for a source for an input, now correctly look through all possible sources instead of just the first one.
DataFlow MemoryRedundancyCheckerContext was using all inputs in an input set and all their ancestors to check redundancy (a holdover from pre-uid days). It now correctly only uses the inputs in the parameter set. This fixes a major performance issue.
MySQL packaging issue.
Develop service running one off operations correctly json-loads dict types.
Operations with configs can be run via the development service
JSON dumping numpy int* and float* caused crash on dump.
CSV source always loads
operations removed in favor of
Duplicate dataflow diagram code from development service
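The entry in this release about args and config methods being derived from a CONFIG dataclass can be pictured with a small sketch. This only illustrates the general pattern with the standard `dataclasses` module. `Configurable`, `ExampleConfig`, `ExampleSource`, and `with_config` are hypothetical names, and DFFML's real BaseConfigurable and dffml.base.config may behave differently.

```python
import dataclasses


@dataclasses.dataclass
class ExampleConfig:
    # Hypothetical config fields for illustration only.
    filename: str
    readonly: bool = False


class Configurable:
    """Hypothetical base class: subclasses only need to set CONFIG."""

    CONFIG = None

    def __init__(self, config):
        self.config = config

    @classmethod
    def args(cls):
        # Export argument names and types by inspecting the dataclass fields.
        return {field.name: field.type for field in dataclasses.fields(cls.CONFIG)}

    @classmethod
    def with_config(cls, **kwargs):
        # Create the config object from keyword arguments, then the instance.
        return cls(cls.CONFIG(**kwargs))


class ExampleSource(Configurable):
    CONFIG = ExampleConfig


exported = ExampleSource.args()
source = ExampleSource.with_config(filename="data.csv")
```

Because the argument list is read from the dataclass fields, a subclass declares its options once instead of implementing both methods by hand.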
[0.3.0] - 2019-10-26
Real DataFlows, see operations tutorial and usage examples
Async helper concurrently got a nocancel optional keyword argument which, if set, is a set of tasks not to cancel when the concurrently execution loop completes.
FileSourceTest has a test_label method which checks that a FileSource knows how to properly load and save repos under a given label.
Test case for Merge CLI command
Repo.feature method to select a single piece of feature data within a repo.
Dev service to help with hacking on DFFML and to create models from templates in the skel/ directory.
Classification type parameter to DNNClassifierModelConfig to specify data type of given classification options.
util.cli CMD classes have their argparse description set to their docstring.
util.cli CMD classes can specify the formatter class used in
Skeleton for service creation was added
Simple Linear Regression model from scratch
Scikit Linear Regression model
Community link in CONTRIBUTING.md.
Explained three main parts of DFFML on docs homepage
Documentation on how to use ML models on docs Models plugin page.
Mailing list info
Issue template for questions
Multiple Scikit Models with dynamic config
Entrypoint listing command to development service to aid in debugging issues with entrypoints.
HTTP API service to enable interacting with DFFML over HTTP. Currently includes APIs for configuring and using Sources and Models.
MySQL protocol source to work with data from a MySQL protocol compatible db
shouldi example got a bandit operation which tells users not to install if there are more than 5 issues of high severity and confidence.
dev service got the ability to run a single operation in a standalone fashion.
About page to docs.
Tensorflow DNNEstimator based regression model.
feature/codesec became its own branch, binsec
run_operations strict now defaults to true. With strict as true, errors will be raised and not just logged.
MemoryInputNetworkContext got an sadd method which is shorthand for creating a MemoryInputSet with a StringInputSetContext.
basic_config method takes a list of operations and optional config for them.
shouldi example uses updated MemoryOrchestrator.basic_config method and includes more explanation in comments.
CSVSource allows for setting the Repo’s src_url from a csv column
util Entrypoint defines a new class for each loaded class and sets the ENTRY_POINT_LABEL parameter within the newly defined class.
Tensorflow model removed usages of repo.classifications methods.
Entrypoint prints traceback of loaded classes to standard error if they fail to load.
Updated Tensorflow model README.md to match functionality of DNNClassifierModel.
DNNClassifierModel no longer splits data for the user.
black on whole codebase, including all submodules
CI style check now checks whole codebase
Merged HACKING.md into CONTRIBUTING.md
shouldi example runs bandit now in addition to safety
The way safety gets called
Switched documentation to Read The Docs theme
Models yield only a repo object instead of the value and confidence of the prediction as well. Models are not responsible for calling the predicted method on the repo. This will ease the process of making predict feature specific.
Updated Tensorflow model README.md to include usage of regression model
Docs get version from dffml.version.VERSION.
FileSource zipfiles are wrapped with TextIOWrapper because CSVSource expects the underlying file object to return str instances rather than bytes.
FileSourceTest inherits from SourceTest and is used to test json and csv sources.
A temporary directory is used to replicate mktemp -u functionality so as to provide tests using a FileSource with a valid tempfile name.
Labels for JSON sources
Labels for CSV sources
util.cli CMDs correctly set the description of subparsers instead of their help; they also accept the
CSV source now has
JSON source now has
Strict flag in df.memory is now on by default
Dynamically created scikit models get config args correctly
DNNClassifierModelContext first init arg from
BaseSource now has
Repo objects are no longer classification specific. Their classification methods were removed.
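The FileSource zipfile entry in this release describes wrapping the file object with TextIOWrapper so that a CSV reader receives str rather than bytes. A minimal standard-library sketch of that idea follows. It is not DFFML's code: `read_csv_from_zip` and the member name `data.csv` are hypothetical.

```python
import csv
import io
import zipfile


def read_csv_from_zip(archive, member):
    """Return the rows of a CSV file stored inside a zip archive.

    Hypothetical helper for illustration; not part of DFFML.
    """
    with zipfile.ZipFile(archive) as zf:
        # ZipFile.open yields a binary file object (bytes).
        with zf.open(member) as binary:
            # csv.DictReader needs str lines, so decode through TextIOWrapper.
            with io.TextIOWrapper(binary, encoding="utf-8", newline="") as text:
                return list(csv.DictReader(text))


# Build a small in-memory archive to demonstrate.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.csv", "key,feature\na,1\nb,2\n")
buffer.seek(0)

rows = read_csv_from_zip(buffer, "data.csv")
```

Without the wrapper, `csv.DictReader` would be handed bytes lines and fail, which is the mismatch the entry refers to.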
[0.2.1] - 2019-06-07
Definition spec field to specify a class representative of key value pairs for definitions with primitives which are dictionaries
Auto generation of documentation for operation implementations, models, and sources. Generated docs include information on configuration options and inputs and outputs for operation implementations.
Async helpers got an aenter_stack method which creates and returns a contextlib.AsyncExitStack after entering all the contexts passed to it.
Example of how to use Data Flow Facilitator / Orchestrator / Operations by writing a Python meta static analysis tool, shouldi
add_orig_label methods now use op.name instead of
Make output specs and remap arguments optional for Operations CLI commands.
Feature skeleton project is now operations skeleton project
MemoryOperationImplementationNetwork instantiates OperationImplementations using their
MemorySource now decorated with
MemorySource takes arguments correctly via
skel modules have long_description_content_type set to “text/markdown”
__aexit__ methods were moved to the Memory Orchestrator because they are specific to that config.
inspect.isfunction so it will bind lambdas
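The aenter_stack entry in this release describes creating a contextlib.AsyncExitStack, entering every context passed in, and returning the stack. A minimal sketch of that pattern, assuming a simplified signature: `aenter_stack_sketch` and `Resource` are hypothetical names, and the real helper's arguments may differ.

```python
import asyncio
import contextlib


class Resource:
    """Toy async context manager that records when it is entered and exited."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    async def __aenter__(self):
        self.log.append(f"enter {self.name}")
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        self.log.append(f"exit {self.name}")


async def aenter_stack_sketch(*contexts):
    """Enter every context on a new AsyncExitStack and return the stack."""
    stack = contextlib.AsyncExitStack()
    await stack.__aenter__()
    for context in contexts:
        await stack.enter_async_context(context)
    return stack


async def main():
    log = []
    stack = await aenter_stack_sketch(Resource("a", log), Resource("b", log))
    log.append("work")
    # Closing the stack exits the contexts in reverse order of entry.
    await stack.aclose()
    return log


log = asyncio.run(main())
```

Returning the stack lets the caller close everything with a single `aclose()` call, however many contexts were entered.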
[0.2.0] - 2019-05-23
Support for zip file source
Async helper for running tasks concurrently
Gitter badge to README
Documentation on the Data Flow Facilitator subsystem
codesec plugin containing operations which gather security related metrics on code and binaries.
auth plugin containing an scrypt operation as an example of thread pool usage.
Standardized the API for most classes in DFFML via inheritance from dffml.base
Configuration of classes is now done via the args() and config() methods
Documentation is now generated using Sphinx
Corrected maxsplit in util.cli.parser
Check that dtype is a class in Tensorflow DNN
CI script no longer always exits 0 for plugin tests
Corrected render type in setup.py to markdown
[0.1.2] - 2019-03-29
Example usage of Git features
New Model and Feature creation script
New Feature skeleton directory
New Model skeleton directory
New Feature creation tutorial
New Model creation tutorial
Update functionality to the CSV source
Support for Gzip file source
Support for bz2 file source
Travis checks for additions to CHANGELOG.md
Travis checks for trailing whitespace
Support for lzma file source
Support for xz file source
Data Flow Facilitator
Restructured documentation to docs folder and moved from rST to markdown
Git feature cloc logs if no binaries are in path
Enable source.file to read from /dev/fd/XX
[0.1.0] - 2019-03-07
Feature class to collect a feature in a dataset
Git features to collect feature data from Git repos
Model class to wrap implementations of machine learning models
Tensorflow DNN model for generic usage of the DNN estimator
CLI interface and framework
Source class to manage dataset storage