Source Dataframe
Expose Pandas DataFrame as DFFML Source
- class dffml.source.dataframe.DataFrameSource(config)
Proxy for a pandas DataFrame
Examples
You can pass a pandas DataFrame to this class directly via the Python API, or you can create a DataFrame from another data source via the Python API or the command line.
Example of creating a DataFrame from HTML via the command line.
Create an HTML table.
index.html
<table>
  <tr>
    <th>Years</th>
    <th>Salary</th>
  </tr>
  <tr>
    <td>0</td>
    <td>10</td>
  </tr>
  <tr>
    <td>1</td>
    <td>20</td>
  </tr>
  <tr>
    <td>2</td>
    <td>30</td>
  </tr>
</table>
Start an HTTP server to serve the HTML page with the table.
$ python -m http.server 8000
In another terminal, list all the records in the source.
$ dffml list records \
    -sources table=dataframe \
    -source-table-html http://127.0.0.1:8000/index.html \
    -source-table-protocol_allowlist http://
[
    {
        "extra": {},
        "features": {
            "Salary": 10,
            "Years": 0
        },
        "key": "0"
    },
    {
        "extra": {},
        "features": {
            "Salary": 20,
            "Years": 1
        },
        "key": "1"
    },
    {
        "extra": {},
        "features": {
            "Salary": 30,
            "Years": 2
        },
        "key": "2"
    }
]
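The listing above shows each table row becoming one record: the row position is the key and the column headers name the features. The following is a minimal standard-library sketch of that mapping, not DFFML's implementation (the source itself works on a pandas DataFrame). The names `TableParser`, `url_allowed`, and `table_to_records` are hypothetical, and treating `protocol_allowlist` as a list of URL prefixes is an assumption based on the `http://` value passed on the command line.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every row in an HTML table (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Start a new row
            self.rows.append([])
        elif tag in ("th", "td"):
            # Start a new, initially empty, cell in the current row
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def url_allowed(url, protocol_allowlist):
    # Assumption: the allowlist is matched as a URL prefix such as "http://"
    return any(url.startswith(prefix) for prefix in protocol_allowlist)


def table_to_records(html):
    """Map table rows to record-like dicts shaped like the listing above."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [
        {
            "key": str(index),
            "features": {name: int(cell) for name, cell in zip(header, row)},
            "extra": {},
        }
        for index, row in enumerate(body)
    ]


HTML = (
    "<table><tr><th>Years</th><th>Salary</th></tr>"
    "<tr><td>0</td><td>10</td></tr>"
    "<tr><td>1</td><td>20</td></tr>"
    "<tr><td>2</td><td>30</td></tr></table>"
)

url = "http://127.0.0.1:8000/index.html"
if url_allowed(url, ["http://"]):
    records = table_to_records(HTML)
    for record in records:
        print(record)
```

The sketch converts every cell with `int`, which only suits this all-integer table; real data would need per-column type handling.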
- CONFIG
alias of DataFrameSourceConfig
- CONTEXT
alias of DataFrameSourceContext
- class dffml.source.dataframe.DataFrameSourceConfig(dataframe: 'pandas.DataFrame' = None, predictions: List[str] = <factory>, html: str = None, html_table_index: int = 0, protocol_allowlist: List[str] = <factory>)
- no_enforce_immutable()
By default, all properties of a config object are immutable. To mutate them, you must explicitly call this method and use it as a context manager.
Examples
>>> from dffml import config
>>>
>>> @config
... class MyConfig:
...     C: int
>>>
>>> config = MyConfig(C=2)
>>> with config.no_enforce_immutable():
...     config.C = 1
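To illustrate the pattern behind `no_enforce_immutable()`, here is a minimal sketch of an object that rejects attribute assignment unless a context manager has temporarily lifted the restriction. This is not DFFML's implementation: the class `ImmutableConfig`, its `_enforce_immutable` flag, and the choice of `AttributeError` are assumptions made for illustration only.

```python
from contextlib import contextmanager


class ImmutableConfig:
    """Hypothetical stand-in for a config object; not DFFML's implementation."""

    def __init__(self, **values):
        # Bypass our own __setattr__ while the object is being built
        object.__setattr__(self, "_enforce_immutable", True)
        for name, value in values.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        # Assumption: a blocked mutation raises AttributeError
        if self._enforce_immutable:
            raise AttributeError(f"{name} is immutable")
        object.__setattr__(self, name, value)

    @contextmanager
    def no_enforce_immutable(self):
        # Lift the restriction for the duration of the with block,
        # then restore it even if the block raises
        object.__setattr__(self, "_enforce_immutable", False)
        try:
            yield self
        finally:
            object.__setattr__(self, "_enforce_immutable", True)


config = ImmutableConfig(C=2)

# Outside the context manager, assignment is rejected
try:
    config.C = 1
    blocked = False
except AttributeError:
    blocked = True

# Inside the context manager, assignment succeeds
with config.no_enforce_immutable():
    config.C = 1

print(blocked, config.C)
```

Restoring the flag in a `finally` clause keeps the object immutable again even when the body of the `with` block fails.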
- class dffml.source.dataframe.DataFrameSourceContext(parent: BaseSource)
- async record(key: str) -> Record
Get a record from the source or add it if it doesn’t exist.
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> async def main():
...     async with MemorySource(records=[Record("example", data=dict(features=dict(dead="beef")))]) as source:
...         # Open, update, and close
...         async with source() as ctx:
...             example = await ctx.record("example")
...             # Let's also try calling `record` for a record that doesn't exist.
...             one = await ctx.record("one")
...             await ctx.update(one)
...             async for record in ctx.records():
...                 print(record.export())
>>>
>>> asyncio.run(main())
{'key': 'example', 'features': {'dead': 'beef'}, 'extra': {}}
{'key': 'one', 'extra': {}}
- async records() -> AsyncIterator[Record]
Asynchronously yields the records retrieved from self.src
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> async def main():
...     async with MemorySource(records=[Record("example", data=dict(features=dict(dead="beef")))]) as source:
...         async with source() as ctx:
...             async for record in ctx.records():
...                 print(record.export())
>>>
>>> asyncio.run(main())
{'key': 'example', 'features': {'dead': 'beef'}, 'extra': {}}
- async update(record: Record)
Updates a record for a source
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> async def main():
...     async with MemorySource(records=[]) as source:
...         # Open, update, and close
...         async with source() as ctx:
...             example = Record("one", data=dict(features=dict(feed="face")))
...             # ... Update one into our records ...
...             await ctx.update(example)
...             # Let's check out our records after calling `record` and `update`.
...             async for record in ctx.records():
...                 print(record.export())
>>>
>>> asyncio.run(main())
{'key': 'one', 'features': {'feed': 'face'}, 'extra': {}}