GSoC 2019

DFFML participated in Google Summer of Code under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/

Student Contributions

Huge thanks to our students of the 2019 GSoC program, who significantly grew DFFML’s capabilities in using machine learning models and accessing various data sources, and who also contributed various bug fixes.

Sudharsana @sudharsana-kjl

Project: Labeled and Versioned data sources and expansion of data source backends.

Yash @yashlamba

Project: Addition of new Machine Learning Models, both written from scratch and built on the scikit-learn (sklearn) APIs.

About DFFML

DFFML is a plugin-based library / framework for machine learning. It allows users to wrap high- or low-level implementations of models that use various machine learning libraries, so that they can interact with lots of different model implementations in the same way.
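
To make the idea of wrapping different implementations behind one interface concrete, here is a minimal sketch. The names Model, MeanModel, NearestNeighborModel, and train_and_predict are hypothetical and are not DFFML's actual classes or API; the point is only that calling code does not need to know which implementation it is talking to.

```python
import abc
from statistics import mean


class Model(abc.ABC):
    """Hypothetical common interface; NOT DFFML's real Model class."""

    @abc.abstractmethod
    def train(self, xs, ys):
        """Fit the model to inputs xs and targets ys."""

    @abc.abstractmethod
    def predict(self, x):
        """Return a prediction for a single input x."""


class MeanModel(Model):
    """Always predicts the mean of the training targets."""

    def __init__(self):
        self.value = None

    def train(self, xs, ys):
        self.value = mean(ys)

    def predict(self, x):
        return self.value


class NearestNeighborModel(Model):
    """Predicts the target of the closest training input."""

    def __init__(self):
        self.points = []

    def train(self, xs, ys):
        self.points = list(zip(xs, ys))

    def predict(self, x):
        # Pick the (input, target) pair whose input is closest to x.
        return min(self.points, key=lambda point: abs(point[0] - x))[1]


def train_and_predict(model, xs, ys, x):
    """Works with any Model subclass, whatever is behind it."""
    model.train(xs, ys)
    return model.predict(x)
```

Both wrappers are driven by the same train_and_predict function, which is the property the paragraph above describes.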

DFFML is also a tool for dataset generation. DFFML defines a Feature abstract base class which is responsible for generating feature data given a unique key.
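
As a rough illustration of a feature that generates data from a unique key, the sketch below defines a small abstract base class and two toy features. Everything here (the Feature class shape, the generate method, KeyLength, VowelCount, build_row) is an assumption made for illustration and does not reflect DFFML's real Feature API.

```python
import abc


class Feature(abc.ABC):
    """Hypothetical feature interface; NOT DFFML's real Feature class."""

    # Column name under which this feature's data is stored.
    name = "feature"

    @abc.abstractmethod
    def generate(self, key):
        """Return this feature's data for the given unique key."""


class KeyLength(Feature):
    name = "key_length"

    def generate(self, key):
        return len(key)


class VowelCount(Feature):
    name = "vowel_count"

    def generate(self, key):
        return sum(1 for char in key.lower() if char in "aeiou")


def build_row(key, features):
    """Generate one dataset row by running every feature on the key."""
    return {feature.name: feature.generate(key) for feature in features}
```

Calling build_row on each key in a list of keys would yield one row per key, which is the sense in which a set of features produces a dataset.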

Project Ideas

We currently have three project ideas. You can read about them and discuss them in their respective issues:

  1. GSoC 2019 Project Idea: File Source Compression (Difficulty: easy)

  2. GSoC 2019 Project Idea: Labeled and Versioned Datasets (Difficulty: intermediate)

  3. GSoC 2019 Project Idea: YOLO/darknet Model (Difficulty: hard)

If you’ve got a brilliant idea you’d like to propose, please make a new issue with the gsoc and project tags to discuss it! Students are also welcome to add “stretch goal” ideas to their application if they’d like to start with one of our ideas but have a few extra feature ideas of their own to work on at the end of the summer, provided everything stays on schedule. Take a look at the current open issues to see what users want. Issues that came out of a conversation with someone who would use the feature as part of their product or service have the label customer. Those are cool because we know they will get used!

Getting Started

  • Follow the README and make sure you can run the TensorFlow and Git examples. Looking at the Travis CI setup may come in handy here.

  • Run the tests. DFFML has unit tests which are at about 90% coverage (the share of lines of code exercised by tests) for the main library, the Git features, and the TensorFlow model. Make sure you know how to run them, and if you’ve never written Python unit tests before you might want to read up on Python’s unittest library. Figure out how to run a single test! Running one test instead of all of them will speed up your workflow when you are writing your tests.

  • Make your first contribution!

    • Work on anything labeled good first issue.

    • Help us increase the test coverage in any of the packages (check out the Python package coverage to learn how to do this).

    • Write a new feature! Features can do anything you want. They generate some data based on a unique key, so think of them like a scraper. See the new feature guide for more info. Make sure to include tests!

    • Write a new model! Models are wrappers around any machine learning implementation or library. See the new model guide for more info. Make sure to include tests!
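
For the "run a single test" step above, here is a small sketch using only Python's standard unittest library. The function normalize_key and the test class are made up for illustration and are not part of DFFML. From the command line, the standard form is python -m unittest package.module.TestClass.test_method, where the dotted path depends on where your test actually lives.

```python
import unittest


def normalize_key(key):
    """Toy function under test (hypothetical, not from DFFML)."""
    return key.strip().lower()


class TestNormalizeKey(unittest.TestCase):
    def test_strips_whitespace(self):
        self.assertEqual(normalize_key("  abc  "), "abc")

    def test_lowercases(self):
        self.assertEqual(normalize_key("ABC"), "abc")


def run_one(method_name):
    """Run only the named test method instead of the whole class."""
    # Instantiating a TestCase with a method name selects that one test.
    suite = unittest.TestSuite([TestNormalizeKey(method_name)])
    return unittest.TextTestRunner(verbosity=2).run(suite)
```

Calling run_one("test_lowercases") executes just that method, which keeps the edit-and-rerun loop short while you are writing a new test.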

Writing your GSoC application

Instructions on how to apply can be found on the Python GSoC website. Please don’t forget to use our name (dffml) in your application title!

Contacting the DFFML team

Most of our communication will take place in the issue tracker under the label ‘gsoc’. Not sure where to ask? Try here!

IRC: Contact us using the main python-gsoc channel, #python-gsoc on freenode (How to connect). Note that all our developers are currently located in the US Pacific time zone.

Thanks

Thanks to Terri for helping DFFML be a part of GSoC and for letting us copy the format she used for CVE Binary Tool, another awesome project with a security focus that’s a part of GSoC 2019.