Codebase Layout And Notes
Plugins
DFFML is plugin based. This means that the source code for the main package, dffml, is separate from the source code for many of the things you might want to use in conjunction with it. For example, if you wanted to use the machine learning models based on scikit-learn, you’d install dffml-model-scikit. If you wanted to use machine learning models based on TensorFlow, you’d install dffml-model-tensorflow. The source code for all Official plugins is within the same Git repo (https://github.com/intel/dffml). An Official plugin is any plugin maintained within the main Git repo.
Keeping plugins in separate packages means users only have to install what they need. TensorFlow is several hundred megabytes; not everyone wants that, or needs it to get machine learning models that perform accurately on their problem.
All plugins derive from a base class in the main package, which is located in the dffml directory at the root of the Git repo. The plugin packages are located within their respective directories at the root of the Git repo. For example, source base classes are in dffml/source/ and source plugin packages are in source/.
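To make the relationship concrete, here is a minimal, stdlib-only sketch of a base class "in the main package" and a subclass "in a plugin package", plus a lookup by name. BaseSource, ListSource, NAME, registry and load are all hypothetical stand-ins; this is not DFFML's actual base class or its plugin loading mechanism.

```python
import abc
from typing import Dict, Iterator, Type


class BaseSource(abc.ABC):
    """Hypothetical stand-in for a base class living in the main package."""

    # Maps a plugin name to the class that implements it (illustrative only)
    registry: Dict[str, Type["BaseSource"]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every subclass that declares a NAME
        name = getattr(cls, "NAME", None)
        if name:
            BaseSource.registry[name] = cls

    @abc.abstractmethod
    def records(self) -> Iterator[str]:
        """Every plugin must implement this method."""


class ListSource(BaseSource):
    """Hypothetical stand-in for a class shipped in a separate plugin package."""

    NAME = "list"

    def __init__(self, items):
        self.items = list(items)

    def records(self) -> Iterator[str]:
        return iter(self.items)


def load(name: str) -> Type[BaseSource]:
    """Look up a plugin class by its registered name."""
    return BaseSource.registry[name]


source = load("list")(["a", "b"])
print(list(source.records()))  # ['a', 'b']
```

The abstract method means a plugin that forgets to implement records() fails at instantiation time rather than at first use.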
Adding A New Plugin
To add a new Official plugin to DFFML, you first need to create the plugin in the appropriate directory, then add it to the lists of core plugins.
Warning
The release process is automated. You should NOT upload the package to PyPI! Someone from Intel has to be the one to do that for Official plugins.
For Official plugins, the name given to the create command should be in the form dffml-{PLUGIN_TYPE}-{NAME}.
$ cd model/
$ dffml service dev create model dffml-model-someframework
$ mv dffml-model-someframework someframework
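The naming rule can be checked mechanically. The sketch below assumes PLUGIN_TYPE is a single lowercase word and NAME is lowercase with optional hyphens; plugin_directory is a hypothetical helper (not part of the dev service) that mirrors the cd and mv steps above by mapping a package name to its directory in the repo.

```python
import re

# dffml-{PLUGIN_TYPE}-{NAME}; the character classes are an assumption
PLUGIN_NAME = re.compile(r"^dffml-(?P<type>[a-z]+)-(?P<name>[a-z0-9][a-z0-9-]*)$")


def plugin_directory(package_name: str) -> str:
    """Map e.g. 'dffml-model-someframework' to 'model/someframework'."""
    match = PLUGIN_NAME.match(package_name)
    if match is None:
        raise ValueError(
            f"{package_name!r} is not of the form dffml-{{PLUGIN_TYPE}}-{{NAME}}"
        )
    # Plugin type selects the top-level directory, NAME becomes the subdirectory
    return f"{match['type']}/{match['name']}"


print(plugin_directory("dffml-model-someframework"))  # model/someframework
```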
Now that we’ve created the plugin, we need to add it to a few lists:
- Add the plugin to the CORE_PLUGINS list in dffml/plugins.py
- Update .github/workflows/testing.yml
  - Add the path to the plugin to the jobs.test.strategy.matrix.plugin list.
  - Open an issue to have the PYPI_{PLUGIN_TYPE}_{NAME} token added under jobs.steps[-1].run.
    Sample issue format:
    pypi: Add token for dffml-{PLUGIN_TYPE}-{NAME} to testing.yml
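The strings needed for the testing.yml steps follow directly from the plugin type and name. A hypothetical helper, ci_entries, assembling them; it assumes the matrix entry is the repo-relative path (e.g. model/someframework) and that any hyphens in NAME become underscores in the token name, since secret and environment variable names cannot contain hyphens.

```python
def ci_entries(plugin_type: str, name: str) -> dict:
    """Build the strings used when registering a new Official plugin with CI.

    Hypothetical helper; the matrix path format and hyphen handling are
    assumptions, not taken from the DFFML tooling.
    """
    return {
        # Entry for jobs.test.strategy.matrix.plugin (assumed repo-relative path)
        "matrix_plugin": f"{plugin_type}/{name}",
        # Token name to request in the issue: PYPI_{PLUGIN_TYPE}_{NAME}
        "pypi_token": "PYPI_{}_{}".format(
            plugin_type.upper(), name.upper().replace("-", "_")
        ),
        # Issue title following the sample format
        "issue_title": f"pypi: Add token for dffml-{plugin_type}-{name} to testing.yml",
    }


entries = ci_entries("model", "someframework")
print(entries["pypi_token"])  # PYPI_MODEL_SOMEFRAMEWORK
```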
Double Context Entry Pattern
All classes in DFFML follow a double asynchronous context entry pattern: you first enter the object itself, then enter a context created from it. This is ideal for usages such as creating a connection pool and then using a connection from that pool, be that with a database, client HTTP sessions, etc.
import asyncio

from dffml.record import Record
from dffml.source.csv import CSVSource, CSVSourceConfig


async def main():
    # One: enter the source itself, setting up state shared by its contexts
    async with CSVSource(
        CSVSourceConfig(
            filename="test.csv",
            allowempty=True,
            readwrite=True,
        )
    ) as source:
        # Two: enter a context created from the source
        async with source() as sctx:
            # Punch: use the context to add or update the record with key "0"
            await sctx.update(Record("0", data={
                "features": {
                    "first_column": 42,
                    "second_column": 1776,
                }
            }))


asyncio.run(main())
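The same shape can be shown without DFFML installed. In this stdlib-only sketch, the hypothetical Pool and PoolContext classes stand in for CSVSource and its context: the outer entry sets up shared state once, calling the entered object returns a per-use context, and exits run in reverse order. It is an illustration of the pattern, not DFFML code.

```python
import asyncio


class Pool:
    """Hypothetical outer object: owns shared resources (e.g. a connection pool)."""

    def __init__(self, log):
        self.log = log

    async def __aenter__(self):
        self.log.append("open pool")  # first entry: set up shared resources
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.log.append("close pool")

    def __call__(self):
        # Calling the entered object hands back a per-use context
        return PoolContext(self)


class PoolContext:
    """Hypothetical inner context, analogous to sctx in the CSVSource example."""

    def __init__(self, parent):
        self.parent = parent

    async def __aenter__(self):
        self.parent.log.append("acquire connection")  # second entry
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.parent.log.append("release connection")

    async def update(self, value):
        self.parent.log.append(f"update {value}")


async def main(log):
    async with Pool(log) as pool:  # One
        async with pool() as ctx:  # Two
            await ctx.update(42)  # Punch


log = []
asyncio.run(main(log))
print(log)
# ['open pool', 'acquire connection', 'update 42', 'release connection', 'close pool']
```

Splitting setup this way lets expensive work happen once in the outer entry while many cheap inner contexts are opened and closed against it.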
Config
Much of the DFFML codebase is dedicated to transforming configuration structures from their incoming form into a dict, which can be used to determine which plugin needs to be loaded and what the arguments for that plugin’s configuration class are.
For example:
model:
  plugin: tfdnnc
  config:
    epochs: 400
    steps: 4000
    classifications:
    - '0'
    - '1'
    predict:
      dtype: int
      length: 1
      name: maintained
    features:
    - dtype: int
      length: 10
      name: authors
    - dtype: int
      length: 10
      name: commits
    - dtype: int
      length: 10
      name: work
Here, plugin is the ...Arg class, which signifies the plugin to load, and config is the ...Config class for that plugin, expressed as a dict.
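As an illustration of that transformation, here is a stdlib-only sketch that takes the model section above (already parsed from YAML into a Python dict) and splits it into a plugin name and a config object. DemoModelConfig, CONFIG_CLASSES and load_section are hypothetical stand-ins; the real tfdnnc configuration class and DFFML's loading code are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DemoModelConfig:
    """Hypothetical stand-in for a model plugin's ...Config class."""

    predict: dict
    features: List[dict]
    epochs: int
    steps: int
    classifications: List[str] = field(default_factory=list)


# Hypothetical lookup table from plugin name to its config class
CONFIG_CLASSES: Dict[str, type] = {"tfdnnc": DemoModelConfig}


def load_section(section: dict) -> Tuple[str, object]:
    """Split a {'plugin': ..., 'config': {...}} section into (name, config)."""
    name = section["plugin"]
    config_cls = CONFIG_CLASSES[name]
    # Each key under config becomes a keyword argument of the config class
    return name, config_cls(**section["config"])


# The YAML example above, as it looks once parsed into a dict
model_section = {
    "plugin": "tfdnnc",
    "config": {
        "epochs": 400,
        "steps": 4000,
        "classifications": ["0", "1"],
        "predict": {"dtype": "int", "length": 1, "name": "maintained"},
        "features": [
            {"dtype": "int", "length": 10, "name": "authors"},
            {"dtype": "int", "length": 10, "name": "commits"},
            {"dtype": "int", "length": 10, "name": "work"},
        ],
    },
}

name, config = load_section(model_section)
print(name, config.epochs, len(config.features))  # tfdnnc 400 3
```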
The command line equivalent for the model is…
$ dffml ... \
    -model tfdnnc \
    -model-epochs 400 \
    -model-steps 4000 \
    -model-classifications 0 1 \
    -model-predict maintained:str:1 \
    -model-features \
      authors:int:10 \
      commits:int:10 \
      work:int:10
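The command-line form carries the same information in a flatter shape. A hypothetical sketch of mapping it back onto the dict structure: flag_to_key and parse_feature are illustrative helpers, not DFFML's actual argument parser, and assume the -{prefix}-{field} flag naming and name:dtype:length specs shown above.

```python
def flag_to_key(flag: str, prefix: str = "model") -> tuple:
    """Map '-model' to ('model', 'plugin') and '-model-epochs' to ('model', 'config', 'epochs')."""
    stripped = flag.lstrip("-")
    if stripped == prefix:
        return (prefix, "plugin")
    if not stripped.startswith(prefix + "-"):
        raise ValueError(f"{flag!r} does not belong to {prefix!r}")
    return (prefix, "config", stripped[len(prefix) + 1:])


def parse_feature(spec: str) -> dict:
    """Turn 'authors:int:10' into {'name': 'authors', 'dtype': 'int', 'length': 10}."""
    # rsplit keeps any colons inside the name itself
    name, dtype, length = spec.rsplit(":", 2)
    return {"name": name, "dtype": dtype, "length": int(length)}


features = [parse_feature(s) for s in ("authors:int:10", "commits:int:10", "work:int:10")]
print(flag_to_key("-model-epochs"))  # ('model', 'config', 'epochs')
print(features[0])  # {'name': 'authors', 'dtype': 'int', 'length': 10}
```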
setup.py
There are various setup.py files throughout the codebase: one for the main package, one for each plugin, and one in skel/. There are also setup_common.py files.
Notes on Various Subsystems
DFFML is made up of various subsystems. The following notes might be helpful when working on each of them.
Working on skel/
The packages in skel/ are used to create new DFFML packages.
For example, to create a new package containing operations we run the following.
$ dffml service dev create operations dffml-operations-feedface
If you want to work on any of the packages in skel/, you’ll need to run the skel link command from the dev service first. This will symlink the required files in from common/ so that testing will work.
$ dffml service dev skel link
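For intuition about what "symlink required files in from common/" means, here is a hypothetical re-creation using pathlib, exercised in a temporary directory. link_common and the example_shared.py file name are illustrative assumptions; this is not the dev service's implementation, and it deliberately leaves real (non-symlink) files in the package untouched.

```python
import os
import tempfile
from pathlib import Path


def link_common(common: Path, package: Path) -> list:
    """Symlink every file under common/ into package/ (hypothetical helper)."""
    linked = []
    for src in sorted(common.rglob("*")):
        if not src.is_file():
            continue
        dest = package / src.relative_to(common)
        dest.parent.mkdir(parents=True, exist_ok=True)
        if dest.is_symlink():
            dest.unlink()  # refresh an existing link
        elif dest.exists():
            continue  # never clobber a real file in the package
        # Relative target so the tree can be moved without breaking links
        dest.symlink_to(os.path.relpath(src, dest.parent))
        linked.append(dest)
    return linked


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "common").mkdir()
    (root / "common" / "example_shared.py").write_text("# shared\n")
    (root / "operations").mkdir()
    links = link_common(root / "common", root / "operations")
    names = [p.name for p in links]
    is_link = (root / "operations" / "example_shared.py").is_symlink()
    content = (root / "operations" / "example_shared.py").read_text()

print(names, is_link)  # ['example_shared.py'] True
```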