
Tool extension example

Lidia edited this page Sep 11, 2025 · 1 revision

The MLSToolbox Code Generator is designed to be extensible. The most common extension is the addition of new tasks, which are represented as new types of nodes in the editor. This page describes how to extend the editor and the code generator component with new nodes.

The example used here adds a new node named IrisDatasetLoader, which belongs to the Data Collection stage. Data scientists can add this node to their pipelines to collect the data and target of the Iris dataset.

To extend the tool, the needed changes are:

  • Updating a JSON configuration file to add the definition of the new node
  • Implementing the source code for a new class as a subclass of the Task class (defined in mls_lib)

After all these changes, the task can be used in the editor when defining the data collection stage of an ML pipeline and the generated code for that pipeline will contain the code corresponding to the task defined.

Updating a Configuration File

The tool needs a new entry in nodes.json, the JSON configuration file located in the mls_code_generator_config repository. The entry specifies the metadata for the new task, so that the tool recognizes the task and offers it in the editor.

The new node is configured with the following properties:

{ 
    "node" : "IrisDatasetLoader",
    "category" : "Data Collection",
    "info" : {
        "title" : "Loads Iris dataset from sklearn"
    },
    "params" : [
        {"param_label" : "description", "param_type" : "description", "show" : true}
    ],
    "inputs" : [],
    "outputs" : [
        {"port_label" : "data", "port_type" : "DataFrame"},
        {"port_label" : "target", "port_type" : "DataFrame"}
    ],
    "dependencies" : {
        "data_collection" : {
            "origin": "custom",
            "value": "IrisDatasetLoader"
        }
    },
    "external_dependencies" : [],
    "origin" : {
        "custom" : "IrisDatasetLoader"
    }
}

Where:

  • node provides the task's name
  • category specifies the stage to which the node is added
  • info describes the purpose of the node
  • params provides details about the node (in this example, a single description parameter)
  • inputs defines the ports expected as inputs (in this example there are no inputs)
  • outputs defines the ports for the task outputs. Each port needs:
    • port_label for the name
    • port_type for the type of data (DataFrame for both ports in this example)
  • dependencies defines where the class with the code to be generated is found. It contains one item named after the stage folder (data_collection in this example), with:
    • origin, always with the value "custom"
    • value for the name of the class
  • external_dependencies (an empty list in this example)
  • origin, with:
    • custom for the name of the class

With this entry in place, a new option appears in the "Add" menu of the General Editor, which is used to add tasks. The option is listed under the corresponding category.
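Before committing a new entry, it can help to check it mechanically. The following is a minimal sketch, not part of the tool: the helper check_node_entry and the idea that every key of the example entry is required are assumptions made for illustration. It also assumes that the class name under dependencies and under origin should agree, as they do in the example above.

```python
import json

# Keys present in the example entry on this page. Treating all of them as
# required is an assumption of this sketch, not a documented rule of the tool.
EXPECTED_KEYS = {
    "node", "category", "info", "params", "inputs",
    "outputs", "dependencies", "external_dependencies", "origin",
}


def check_node_entry(entry: dict) -> list:
    """Return a list of human-readable problems found in a node entry."""
    problems = [f"missing key: {key}" for key in sorted(EXPECTED_KEYS - entry.keys())]

    # Every port should carry both a label and a type.
    for direction in ("inputs", "outputs"):
        for port in entry.get(direction, []):
            for field in ("port_label", "port_type"):
                if field not in port:
                    problems.append(f"{direction} port lacks {field}")

    # The class name appears under "dependencies" and under "origin".
    names = {dep.get("value") for dep in entry.get("dependencies", {}).values()}
    names.add(entry.get("origin", {}).get("custom"))
    if len(names) > 1:
        problems.append("class name differs between dependencies and origin")
    return problems


IRIS_ENTRY = json.loads("""
{
    "node": "IrisDatasetLoader",
    "category": "Data Collection",
    "info": {"title": "Loads Iris dataset from sklearn"},
    "params": [
        {"param_label": "description", "param_type": "description", "show": true}
    ],
    "inputs": [],
    "outputs": [
        {"port_label": "data", "port_type": "DataFrame"},
        {"port_label": "target", "port_type": "DataFrame"}
    ],
    "dependencies": {
        "data_collection": {"origin": "custom", "value": "IrisDatasetLoader"}
    },
    "external_dependencies": [],
    "origin": {"custom": "IrisDatasetLoader"}
}
""")

print(check_node_entry(IRIS_ENTRY))
```

An empty list means the entry has the same shape as the example. The check says nothing about whether the editor accepts the entry, which only the tool itself can confirm.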

Implementing a New Class

To define a new type of task, a subclass that inherits from Task must be implemented in the mls_lib repository. This class must implement the execute() method, which encapsulates the logic required to perform the desired operation.

The new class should be placed in the folder of the corresponding stage. In this example, it is added to the data_collection folder:

""" IRIS DATASET LOADER """
from mls_lib.orchestration.task import Task
from mls_lib.objects.data_frame import DataFrame
from sklearn import datasets

class IrisDatasetLoader(Task):
    """ Iris Dataset Loader """

    def __init__(self) -> None:
        super().__init__()
    
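    def execute(self):
        # load_iris() returns a Bunch whose .data holds the features
        # and whose .target holds the class labels
        iris = datasets.load_iris()
        iris_data = iris.data
        iris_target = iris.target

        # Wrap each array in an mls_lib DataFrame object
        iris_data_df = DataFrame()
        iris_data_df.set_data(iris_data)

        iris_target_df = DataFrame()
        iris_target_df.set_data(iris_target)

        # Publish the results on the output ports declared in nodes.json
        self._set_output("data", iris_data_df)
        self._set_output("target", iris_target_df)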
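In the example, the labels passed to _set_output ("data" and "target") are the same as the port_label values declared under outputs in nodes.json. The sketch below illustrates a way to check that correspondence without installing mls_lib. The Task class here is only a stand-in written for this sketch, and ToyLoader and labels_out_of_sync are hypothetical names. The real Task class may store outputs differently.

```python
class Task:
    """Stand-in for mls_lib's Task so this sketch runs on its own."""

    def __init__(self) -> None:
        self._outputs = {}

    def _set_output(self, label, value):
        # Assumed behavior: keep the value under its port label.
        self._outputs[label] = value

    def execute(self):
        raise NotImplementedError


class ToyLoader(Task):
    """Hypothetical task with the same two output ports as IrisDatasetLoader."""

    def execute(self):
        self._set_output("data", [[5.1, 3.5, 1.4, 0.2]])
        self._set_output("target", [0])


# Port labels as declared under "outputs" in the nodes.json entry.
DECLARED_PORTS = {"data", "target"}


def labels_out_of_sync(task: Task, declared: set) -> set:
    """Return labels that are set but not declared, or declared but not set."""
    task.execute()
    return set(task._outputs) ^ declared


print(labels_out_of_sync(ToyLoader(), DECLARED_PORTS))
```

An empty set means every declared port receives a value and no undeclared label is used.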

It is important to update the __init__.py of the data_collection folder so the class can be imported directly:

""" Data Collection Components """
# PREVIOUS MODULES
from .iris_dataset_loader import IrisDatasetLoader

Source files with this example

The source code used in this example can be found in the branch new_feature_demo of the mls_lib and mls_code_generator_config repositories.
