Tool extension example
The MLSToolbox Code Generator is designed to be extensible. The most common extension is the addition of new tasks, which are represented as new node types in the editor. This page describes how to extend the editor and the code generator component with new nodes.
The example used here adds a new node named IrisDatasetLoader, which belongs to the Data Collection stage. This new task lets data scientists add a node to their pipelines that collects the data and target for the Iris dataset.
To extend the tool, two changes are required:
- Updating a JSON configuration file to add the definition of the new node
- Implementing the source code for a new class as a subclass of the Task class (defined in mls_lib)
After these changes, the task can be used in the editor when defining the data collection stage of an ML pipeline, and the code generated for that pipeline will contain the code for the new task.
The tool needs a new entry in the JSON configuration file nodes.json, located in the mls_code_generator_config repository, that specifies the metadata for the new task. This entry lets the tool recognize the task and offer it in the editor.
The new node is configured with the following properties:
{
    "node" : "IrisDatasetLoader",
    "category" : "Data Collection",
    "info" : {
        "title" : "Loads Iris dataset from sklearn"
    },
    "params" : [
        {"param_label" : "description", "param_type" : "description", "show" : true}
    ],
    "inputs" : [],
    "outputs" : [
        {"port_label" : "data", "port_type" : "DataFrame"},
        {"port_label" : "target", "port_type" : "DataFrame"}
    ],
    "dependencies" : {
        "data_collection" : {
            "origin": "custom",
            "value": "IrisDatasetLoader"
        }
    },
    "external_dependencies" : [],
    "origin" : {
        "custom" : "IrisDatasetLoader"
    }
}
Where:
- node provides the task's name
- category specifies the stage to which the node will be added
- info provides a description of the purpose of the node
- params provides details about the node
- inputs defines the ports expected as inputs (in this example there are no inputs)
- outputs defines the ports for the task outputs. Each port needs:
  - port_label for the name
  - port_type for the type of data (in this example, both are DataFrame)
- dependencies defines the name of the class that contains the code to be generated. It holds one item, named after the stage folder, with:
  - origin, always with the value "custom"
  - value for the name of the class
- external_dependencies (an empty list in this example)
- origin
  - custom for the name of the class
With this information in place, a new option appears in the "Add" menu of the General Editor, which is used to add tasks. The option is listed under the corresponding category.
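Before adding the entry to nodes.json, it can help to check its shape. The following is a minimal sketch, not part of MLSToolbox: the function name validate_node_entry is hypothetical, and the set of required keys is inferred only from the example entry above, so the real configuration may accept other keys.

```python
import json

# Keys taken from the IrisDatasetLoader example entry. This is an assumption,
# not an official schema for nodes.json.
REQUIRED_KEYS = {
    "node", "category", "info", "params", "inputs",
    "outputs", "dependencies", "external_dependencies", "origin",
}


def validate_node_entry(entry: dict) -> list:
    """Return a list of problems found in a node entry (empty if none)."""
    problems = []
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
        return problems

    # Every port needs a label and a type.
    for direction in ("inputs", "outputs"):
        for port in entry[direction]:
            for key in ("port_label", "port_type"):
                if key not in port:
                    problems.append(f"{direction} port lacks {key!r}: {port}")

    # Each dependency item has origin "custom" and names the class in "value".
    for stage, dep in entry["dependencies"].items():
        if dep.get("origin") != "custom":
            problems.append(f"dependency {stage!r} must have origin 'custom'")
        if not dep.get("value"):
            problems.append(f"dependency {stage!r} lacks a class name in 'value'")
    return problems


# The example entry from this page, parsed from its JSON text.
ENTRY = json.loads("""
{
    "node": "IrisDatasetLoader",
    "category": "Data Collection",
    "info": {"title": "Loads Iris dataset from sklearn"},
    "params": [
        {"param_label": "description", "param_type": "description", "show": true}
    ],
    "inputs": [],
    "outputs": [
        {"port_label": "data", "port_type": "DataFrame"},
        {"port_label": "target", "port_type": "DataFrame"}
    ],
    "dependencies": {
        "data_collection": {"origin": "custom", "value": "IrisDatasetLoader"}
    },
    "external_dependencies": [],
    "origin": {"custom": "IrisDatasetLoader"}
}
""")

print(validate_node_entry(ENTRY))  # prints []
```

An empty list means the entry has the same shape as the example. It does not guarantee that the editor will accept it.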
To define a new type of task, implement a subclass of Task in the mls_lib repository. The class must implement the execute() method, which encapsulates the logic required to perform the desired operation.
The new task should be placed in the folder for its stage. In this example, the class is added to the data_collection folder:
""" IRIS DATASET LOADER """
from mls_lib.orchestration.task import Task
from mls_lib.objects.data_frame import DataFrame
from sklearn import datasets


class IrisDatasetLoader(Task):
    """ Iris Dataset Loader """

    def __init__(self) -> None:
        super().__init__()

    def execute(self):
        # Load the Iris dataset bundled with scikit-learn.
        iris = datasets.load_iris()
        iris_data = iris.data
        iris_target = iris.target

        # Wrap the feature matrix and the labels in mls_lib DataFrame objects.
        iris_data_df = DataFrame()
        iris_data_df.set_data(iris_data)
        iris_target_df = DataFrame()
        iris_target_df.set_data(iris_target)

        # Publish both objects on the output ports declared in nodes.json.
        self._set_output("data", iris_data_df)
        self._set_output("target", iris_target_df)
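In the example, the labels passed to _set_output ("data" and "target") are the same as the port_label values declared under outputs in nodes.json. The sketch below illustrates that subclassing pattern without mls_lib installed. The Task class here is a stand-in written for this illustration, not the mls_lib implementation, and both the outputs dictionary and the ConstantLoader task are hypothetical.

```python
class Task:
    """Stand-in for mls_lib's Task; NOT the real implementation."""

    def __init__(self) -> None:
        # Hypothetical storage for output ports, keyed by port label.
        self.outputs = {}

    def _set_output(self, label, value) -> None:
        self.outputs[label] = value

    def execute(self) -> None:
        raise NotImplementedError


class ConstantLoader(Task):
    """Hypothetical task that emits two fixed rows instead of the Iris data."""

    def execute(self) -> None:
        data = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
        target = [0, 0]
        # The labels mirror the "port_label" values of the node's outputs.
        self._set_output("data", data)
        self._set_output("target", target)


task = ConstantLoader()
task.execute()
print(sorted(task.outputs))  # prints ['data', 'target']
```

A stand-in like this is only useful for trying out the logic of execute() in isolation. The real class must inherit from mls_lib.orchestration.task.Task, as shown above.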
It is important to update the __init__.py of the data_collection folder so that the class can be imported directly:
""" Data Collection Components """
# PREVIOUS MODULES
from .iris_dataset_loader import IrisDatasetLoader
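One way to confirm that the class is re-exported is a small import check. This is a sketch under assumptions: the helper exports_name is hypothetical, and the package path mls_lib.data_collection is inferred from the folder name and the import style used in the example, so adjust it to match your checkout.

```python
import importlib


def exports_name(package: str, name: str) -> bool:
    """Return True if `package` can be imported and exposes `name`."""
    try:
        module = importlib.import_module(package)
    except ImportError:
        # The package is not installed or not on the import path.
        return False
    return hasattr(module, name)


# Assumed package path; prints False if mls_lib is not installed.
print(exports_name("mls_lib.data_collection", "IrisDatasetLoader"))
```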
The source code used in this example can be found in the new_feature_demo branch of the mls_lib and mls_code_generator_config repositories.