I am working on a structured data analysis framework which is based on streaming data between nodes. Currently nodes are implemented as subclasses of root Node class provided by the framework. For each node class/factory I need metadata, such as list of node’s attributes, their description, node output. The metadata might be both: for end-users in front-end application or for programming use – some other stream management tools. In the future there will be more of them.
(Note that I just started to learn python while writing that code)
Currently the metadata are provided in a class variable
class AggregateNode(base.Node):
"""Aggregate"""
__node_info__ = {
"label" : "Aggregate Node",
"description" : "Aggregate values grouping by key fields.",
"output" : "Key fields followed by aggregations for each aggregated field. Last field is "
"record count.",
"attributes" : [
{
"name": "keys",
"description": "List of fields according to which records are grouped"
},
{
"name": "record_count_field",
"description": "Name of a field where record count will be stored. "
"Default is `record_count`"
}
]
}
More examples can be found here.
I feel that it can be done in much cleaner way. There is one restriction: as nodes are custom subclasses classes, there should be minimal interference with potential future attribute names.
What I was thinking to do was to split the current node_info. It was meant to be private to the framework, but now I realize it has much wider use. I was thinking about using node_ attributes: will have common attribute namespace, not taking too much of names from potential custom node attributes.
My question is: What is the most common way of providing such metadata in python programs? Single variable with a dictionary? Multiple variables, one for each metadata attribute? (this would conflict with the restriction) Custom class/structure? Use some kind of prefix, like node_* and use multiple variables?
I’m not sure if there is some “standard” way to store custom metadata in python objects, but as an example, the python implementation of dbus adds attributes with the “
_dbus” prefix to the published methods and signals.