Simple source for new file types¶
This tutorial will help you with implementing your own Source
for new file
types. You may want to do this if you have some specific file formats from which
you want to load and save data from.
Here we will go through implementing a source for .ini
file formats.
Create the Package¶
To create a new source we first create a new Python package. DFFML has a script to do it for you.
$ dffml service dev create source dffml-source-ini
$ cd dffml-source-ini
We will start writing our source in ./dffml_source_ini/misc.py
About INI files¶
An INI file is a configuration file used by Operating Systems and programs to initialize program settings. It contains sections for settings and preferences (delimited by a string in square brackets) with each section containing one or more name and value parameters.
Import modules¶
dffml_source_ini/misc.py
from configparser import ConfigParser
from dffml import config, entrypoint, Record, FileSource, MemorySource, parser_helper
Here we are importing some common modules which will be required. The configparser
module will be helpful in parsing INI files. The FileSource
and MemorySource
will be used as base class for our new Source. The config was imported to set the
configuration options for our new INISource. The entrypoint will be used to add the
entrypoint to our INISource. A Record
is a unique entry in a source.
Add configuration¶
dffml_source_ini/misc.py
@config
class INISourceConfig:
filename: str
readwrite: bool = False
allowempty: bool = False
Here we will be writing the configuration options which will be available for our INISource. The INISourceConfig will be decorated by config. Here we have provided three configuration options.
Create Source¶
dffml_source_ini/misc.py
@entrypoint("ini")
class INISource(FileSource, MemorySource):
"""
Source to read files in .ini format.
"""
CONFIG = INISourceConfig
Here we have added the entrypoint to INISource class as “ini”. Do note that “ini” source
already exist in dffml list of sources, so you may want to call your own source as “myini”.
The new Source should inherit from FileSource and MemorySource as it abstracts the saving
and loading of files so that we only have to implement the load_fd and dump_fd methods.
It takes care of decompression on load and re-compression on save if the files extension
signifies that it’s compressed. We inherit from MemorySource because it implements the
methods required by a Source
provided
that self.mem contains Record objects.
Set the CONFIG variable to the INISourceConfig which we created earlier. Setting the CONFIG variable is important because the instantiated version of CONFIG is accessible as self.config.
Next we will writing the load and dump methods for INISource.
Add load method¶
dffml_source_ini/misc.py
async def load_fd(self, fileobj):
# Creating an instance of configparser
parser = ConfigParser()
# Read from a file object
parser.read_file(fileobj)
# Get all the sections present in the file
sections = parser.sections()
self.mem = {}
# Go over each section
for section in sections:
# Get data under each section as a dict
temp_dict = {}
for k, v in parser.items(section):
temp_dict[k] = parser_helper(v)
# Each section used as a record
self.mem[str(section)] = Record(
str(section), data={"features": temp_dict}
)
self.logger.debug("%r loaded %d sections", self, len(self.mem))
This method will be used to load the data from the file(s). We will be reading data
from the file object (fileobj) and loading that data into memory (self.mem). Each
Record
instance consist of key (str type) of the
record and data (dict type), with data having a key features
which stores all
the data for that record.
Going over the code, we have defined a coroutine with parameter fileobj, here
fileobj is the file object. we are reading from the fileobj file object. Each section of the
INI file is used as a Record, with the name of the section used as key for that Record.
Each section consists of key value pairs stored as a dict. We’re going to treat
this data as the feature data for each Record. To do so we pass the data as
the value for the features
key under the data
keyword argument when
creating a new Record.
Add dump method¶
dffml_source_ini/misc.py
async def dump_fd(self, fileobj):
# Create an instance of configparser
parser = ConfigParser()
# Go over each section and record in mem
for section, record in self.mem.items():
# Get each section data as a dict
section_data = record.features()
if section not in parser.keys():
# If section does not exist add new section
parser.add_section(section)
# Set section data
parser[section] = section_data
# Write to the fileobject
parser.write(fileobj)
self.logger.debug("%r saved %d sections", self, len(self.mem))
This method will be used to dump the data to the file. We will read data from memory (self.mem) and save that data in file object (fileobj).
Going over the code, we have defined a coroutine with parameter fileobj, here fileobj is the file object. We are going over each section name and its corresponding Record. We are reading all the data from the memory (self.mem) and writing that data to our file object (fileobj). Hence dumping all our data into file.
Add Tests¶
tests/test_source.py
import os
from tempfile import TemporaryDirectory
from dffml import Record, load, save, AsyncTestCase
from dffml_source_ini.misc import INISource
Before writing the test we need to import some modules which we will be using. We need
to import the source file which we created earlier. We need to import save
and
load
from high_level. save method will be used to save the records to the source and
load will be used to yield records from a source. AsyncTestCase will be used to run our
test methods as coroutines in default event loop.
tests/test_source.py
class TestINISource(AsyncTestCase):
async def test_ini(self):
with TemporaryDirectory() as testdir:
self.testfile = os.path.join(testdir, "testfile.ini")
# Create a source
source = INISource(
filename=self.testfile, allowempty=True, readwrite=True
)
# Save some data in the source
await save(
source,
Record("section1", data={"features": {"A": 1, "B": 2}}),
Record("section2", data={"features": {"C": 3, "D": 4}}),
)
# Load all the records
records = [record async for record in load(source)]
self.assertIsInstance(records, list)
self.assertEqual(len(records), 2)
self.assertDictEqual(records[0].features(), {"a": 1, "b": 2})
self.assertDictEqual(records[1].features(), {"c": 3, "d": 4})
To test the working of the INISource created we will create a class TestINISource, since we are using the unittest testing framework. The TestINISource will inherit from AsyncTestCase so that it can be run as a coroutine in the default event loop. In the test method we will create a TemporaryDirectory which will contain our .ini file.
We will create an instance of our INISource with configuration options. Next we will use the save method to save some records to the source. Do not forget to await the save method as it is a coroutine. Next we will use the load method to yield records from the source. load method will return AsyncIterator[Record] type object. Lastly, We need to check that the records we saved is the same record which gets loaded.
Run the tests¶
To run the tests
$ python -m unittest discover -v
This will look into the file test_source.py and run all the tests.
Add the entrypoint¶
To register your source under dffml entrypoint you need to make sure you have a
shorthand
equals python.path.to:Class
line in the entry_points.txt
file.
entry_points.txt
[dffml.source]
myini = dffml_source_ini.misc:INISource
This will add the newly created source to the dffml entrypoints and hence can also be used in CLI.
Install your package¶
To install your new source run
$ python -m pip install -e .[dev]
CLI Usage¶
Create a .ini
file with some record data in it.
data.ini
[dffml]
third_party = yes
maintained = true
[python]
third_party = no
maintained = true
To use your newly created source in CLI, try listing some records.
$ dffml list records -sources data=myini -source-filename data.ini
[
{
"extra": {},
"features": {
"maintained": true,
"third_party": true
},
"key": "dffml"
},
{
"extra": {},
"features": {
"maintained": true,
"third_party": false
},
"key": "python"
}
]