Command Line

Almost anything you can do with the Python API you can also do with the command line interface (and the HTTP API).

There are many more commands than are listed here. Use the -h flag to see them all.


Train, assess accuracy, and use models for prediction. See the plugin docs for Models for usage.


Create, modify, run, and visualize DataFlows.


Output the dataflow description to standard output using the specified config format.

$ dffml dataflow create -config yaml get_single clone_git_repo > df.yaml


Combine two dataflows into one. Dataflows must either be all linked or all not linked.

$ dffml dataflow merge base.yaml overrides.yaml
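
The exact merge rules are not spelled out here. As a rough mental model only, and assuming that values in later files override values in earlier ones key by key, a recursive dictionary merge looks like the sketch below. `merge_dicts` and the example keys are hypothetical and are not part of DFFML.

```python
import copy


def merge_dicts(base, overrides):
    """Return a new dict with overrides applied on top of base.

    Nested dicts are merged recursively; any other value in overrides
    replaces the value in base. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_dicts(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Made-up stand-ins for the parsed contents of base.yaml and overrides.yaml
base = {"settings": {"retries": 1, "timeout": 10}, "name": "base"}
overrides = {"settings": {"retries": 3}, "extra": True}
combined = merge_dicts(base, overrides)
print(combined)
```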


Run a dataflow on a set of records.

The -no-strict flag tells DFFML not to exit if one key fails, and instead to continue running the dataflow until everything is complete. This is useful for error-prone scraping tasks.

$ dffml dataflow run records set \
    -keys \
    -record-def URL \
    -dataflow df.yaml \
    -sources gathered=json \
    -source-filename /tmp/data.json
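
The difference between strict and non-strict runs can be pictured with a small sketch. This is not DFFML's implementation. `run_for_keys` and `flaky_scrape` are hypothetical names used only to illustrate the "keep going when one key fails" idea.

```python
def run_for_keys(keys, func, strict=True):
    """Call func once per key and collect the results.

    With strict=True the first failure is re-raised and the run stops.
    With strict=False the failure is recorded and the remaining keys
    are still processed.
    """
    results, errors = {}, {}
    for key in keys:
        try:
            results[key] = func(key)
        except Exception as error:
            if strict:
                raise
            errors[key] = error
    return results, errors


def flaky_scrape(key):
    """Stand-in for an error-prone scraping task."""
    if "bad" in key:
        raise ValueError(f"could not scrape {key}")
    return len(key)


results, errors = run_for_keys(["a", "bad", "ccc"], flaky_scrape, strict=False)
print(results)
print(sorted(errors))
```

With `strict=True` the same call would raise on the second key, and the third key would never be processed.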


Output a mermaidjs graph description of a DataFlow.

$ dffml dataflow diagram -simple shouldi.json

You can now copy the graph description and paste it in the mermaidjs live editor (or use the CLI tool) to generate an SVG or other format of the graph.
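
To give a feel for what a mermaidjs graph description is, the sketch below builds one from a list of edges. The `to_mermaid` helper and the edges are made up for illustration; the real diagram command derives its output from the DataFlow itself and will look different.

```python
def to_mermaid(edges, direction="TD"):
    """Render (source, target) pairs as a mermaid flowchart description."""
    lines = [f"graph {direction}"]
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Hypothetical connections, chosen only to have something to draw
edges = [("URL", "clone_git_repo"), ("clone_git_repo", "get_single")]
print(to_mermaid(edges))
```

The printed text is the kind of description you would paste into the live editor.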



Convert one config file format into another.

$ dffml config convert -config-out yaml config_in.json


Services are various command line utilities that are associated with DFFML.

For a complete list of services maintained within the core codebase see the Services plugin docs.


Everything you can do via the Python library or command line interface you can also do over an HTTP interface. See the HTTP API docs for more information.


Development utilities for creating new packages or hacking on the core codebase.


Given the entrypoint of an object, convert the object to its dict representation, and export it using the given config format.

$ dffml service dev export -config json shouldi.cli:DATAFLOW
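
An entrypoint such as shouldi.cli:DATAFLOW is a module path and an attribute path separated by a colon. The sketch below shows one way to resolve such a reference and dump the result as JSON using only the standard library. `load_object` and `export_json` are hypothetical helpers, and `dict(obj)` is a stand-in for however DFFML converts its own objects to dicts.

```python
import importlib
import json


def load_object(reference):
    """Resolve a "module.path:attr.path" reference to a Python object."""
    module_name, _, attr_path = reference.partition(":")
    obj = importlib.import_module(module_name)
    for attr in filter(None, attr_path.split(".")):
        obj = getattr(obj, attr)
    return obj


def export_json(reference):
    """Load the referenced mapping and return it as a JSON string."""
    obj = load_object(reference)
    # Assumption: the object is a mapping, so dict() is enough here
    return json.dumps(dict(obj), indent=2, sort_keys=True, default=str)


# errno.errorcode is a plain dict in the standard library, used as a demo target
print(export_json("errno:errorcode")[:60])
```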


You can create a new Python package and start implementing a new plugin for DFFML right away with the create command of dev.

$ dffml service dev create model cool-ml-model
$ cd cool-ml-model
$ python test

When you’re done you can upload it to PyPI and it’ll be pip installable, so that other DFFML users can use it in their code or via the CLI. If you don’t want to deal with uploading to PyPI, you can install it from your Git repo (wherever you host it).

$ python -m pip install -U git+

Make sure to edit your package’s entry_points to match whatever you’ve changed. This way whatever you make will be usable by others within the DFFML CLI and HTTP API as soon as they pip install your package, with nothing else required.


DFFML makes heavy use of the Python entrypoint system. The following tools will help you develop for and use the entrypoint system.


Sometimes you’ll find that you’ve installed a package in development mode, but the code that’s being run when you’re using the CLI or HTTP API isn’t the code you’ve modified. Instead it seems to be the latest released version. That’s because if the latest released version is installed, the development mode source will be ignored by Python.

If you face this problem the first thing you’ll want to do is identify the entrypoint your plugin is being loaded from. Then you’ll want to run this command giving it that entrypoint. It will list all the registered plugins for that entrypoint, along with the location of the source code being used.

In the following example, we see that the is_binary_pie operation registered under the dffml.operation entrypoint is using the source from the site-packages directory. When you see site-packages you’ll know that the development version is not the one being used! That’s the location where release packages get installed. For any package whose released version you don’t want to use, remove its directory (and its .dist-info directory) from the site-packages directory. Python will then start using the development version (provided you have installed that source with the -e flag to pip install).

$ dffml service dev entrypoints list dffml.operation
is_binary_pie = dffml_operations_binsec.operations:is_binary_pie.op -> dffml-operations-binsec 0.0.1 (/home/user/.pyenv/versions/3.7.2/lib/python3.7/site-packages)
pypi_package_json = shouldi.pypi:pypi_package_json -> shouldi 0.0.1 (/home/user/Documents/python/dffml/examples/shouldi)
clone_git_repo = dffml_feature_git.feature.operations:clone_git_repo -> dffml-feature-git 0.2.0 (/home/user/Documents/python/dffml/feature/git)
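
A similar check can be approximated from Python with the standard library. The sketch below is not how the dev service is implemented. It lists the entry points registered under a group, looks up where each module would be imported from, and flags paths that pass through site-packages. The helper names are hypothetical.

```python
import importlib.util
from importlib import metadata
from pathlib import Path


def entry_points_in_group(group):
    """Return the entry points registered under group, across Python versions."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and newer
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8 and 3.9 return a dict


def is_release_install(path):
    """True if the path passes through a site-packages directory."""
    return "site-packages" in Path(path).parts


def source_locations(group):
    """Map each entry point name to the file its module would load from."""
    found = {}
    for ep in entry_points_in_group(group):
        module_name = ep.value.split(":", 1)[0].strip()
        try:
            spec = importlib.util.find_spec(module_name)
        except Exception:
            # Parent package missing or broken; report the location as unknown
            spec = None
        found[ep.name] = spec.origin if spec is not None else None
    return found


for name, origin in source_locations("dffml.operation").items():
    status = "release" if origin and is_release_install(origin) else "development or unknown"
    print(f"{name}: {origin} ({status})")
```

If no plugins are installed for the group, the loop prints nothing.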


Utilities for working with files.


Import a setup.py file and return the value of the specified keyword argument.

$ dffml service dev setuppy kwarg name model/tensorflow/
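
For comparison, a keyword argument can also be read without importing the file at all, by parsing it statically. The sketch below uses that alternative technique (the `ast` module) rather than the import-based approach described above, and it only works for literal values. `setup_kwarg` and the example source are hypothetical.

```python
import ast


def setup_kwarg(source, keyword):
    """Return the literal value passed as keyword to a setup(...) call."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Matches both setup(...) and setuptools.setup(...)
        called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if called != "setup":
            continue
        for kw in node.keywords:
            if kw.arg == keyword:
                return ast.literal_eval(kw.value)
    raise KeyError(keyword)


# Made-up file contents for illustration
EXAMPLE = '''
from setuptools import setup

setup(name="cool-ml-model", version="0.0.1")
'''

print(setup_kwarg(EXAMPLE, "name"))  # prints cool-ml-model
```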


Utilities for bumping version numbers.


Update the version of DFFML used by all of the plugins.

$ dffml service dev bump main

Update the version number of a package or all packages. Increments the version of each package by the version string given.

$ dffml service dev bump packages -log debug -skip dffml -- 0.0.1
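
One way to read "increments by the version string given" is component-wise addition, so that an increment of 0.0.1 raises the last component by one. The sketch below assumes that reading. `bump_version` here is an illustrative helper, not necessarily what the command does internally.

```python
def bump_version(original, increment):
    """Add two dotted version strings component by component."""
    old = [int(part) for part in original.split(".")]
    inc = [int(part) for part in increment.split(".")]
    if len(old) != len(inc):
        raise ValueError("version strings must have the same number of components")
    return ".".join(str(a + b) for a, b in zip(old, inc))


print(bump_version("0.3.2", "0.0.1"))  # prints 0.3.3
```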