Welcome to the repository for "LLM-Augmented Concept Drift Detection for Object Centric Process Mining"! This project extends existing frameworks for concept drift detection in process mining by integrating Large Language Models (LLMs) to provide human-readable, causal explanations and actionable insights.
Building upon the work by Adams et al. (2023), which applies the PELT change point algorithm and Granger causality to object-centric event logs, our method enhances interpretability: for each statistically explainable drift, we leverage LLMs to generate detailed explanations that cover both quantitative and qualitative aspects, and we investigate the impact of providing rich contextual information in the prompts.
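To make that statistical layer concrete, here is a minimal, self-contained sketch of PELT-based change point detection and a Granger causality test using the `ruptures` and `statsmodels` libraries. It is illustrative only; the synthetic data, penalty value, and lag order are assumptions, not the repository's actual configuration.

```python
# Illustrative sketch of the detection layer (not the repository's exact code):
# PELT change point detection via `ruptures`, Granger causality via `statsmodels`.
import numpy as np
import ruptures as rpt
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
# Synthetic process feature (e.g., average waiting time) with a drift at t=100.
signal = np.concatenate([rng.normal(1.0, 0.2, 100), rng.normal(2.0, 0.2, 100)])

# PELT segments the series at the detected change points.
change_points = rpt.Pelt(model="rbf").fit(signal).predict(pen=10)
print("Detected change points:", change_points)

# Granger causality: does a second feature help predict the drifting one?
other = np.roll(signal, 3) + rng.normal(0.0, 0.1, 200)  # lagged, noisy copy
results = grangercausalitytests(np.column_stack([signal, other]), maxlag=3)
```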
This repository provides the code to reproduce the experiments, results, and figures presented in the corresponding research paper. Our key contributions include:
- LLM-Augmented Explanations: Generating natural language explanations for detected concept drifts, making complex statistical findings accessible to domain experts.
- Targeted Prompt Design: A systematic approach to crafting specific LLM prompts to elicit both quantitative (e.g., percentage changes) and qualitative (e.g., business impact, causal relationships, recommendations) responses.
- Contextual Impact Analysis: Investigating how providing varying levels of domain-specific context within prompts influences the quality and relevance of LLM-generated explanations.
- Multi-LLM Support: Integration with OpenAI (GPT), Anthropic (Claude), and Google (Gemini) models for comparative analysis of their explanatory capabilities.
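As a rough illustration of the multi-LLM setup, the sketch below sends a single prompt to all three providers via their public Python SDKs. The model identifiers and the prompt text are placeholders, not the configuration used in the experiments.

```python
# Hedged sketch of querying the three providers with one prompt; model names
# are illustrative and may need updating to currently available versions.
import os
from openai import OpenAI
import anthropic
import google.generativeai as genai

prompt = "Explain the detected drift in average waiting time."

# OpenAI (GPT) -- reads OPENAI_API_KEY from the environment.
gpt = OpenAI().chat.completions.create(
    model="gpt-4o", messages=[{"role": "user", "content": prompt}]
)
print(gpt.choices[0].message.content)

# Anthropic (Claude) -- reads ANTHROPIC_API_KEY from the environment.
claude = anthropic.Anthropic().messages.create(
    model="claude-3-5-sonnet-latest", max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(claude.content[0].text)

# Google (Gemini)
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini = genai.GenerativeModel("gemini-1.5-pro").generate_content(prompt)
print(gemini.text)
```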
The repository contains the following key files:
- llm_explainer.py: The core Python script responsible for loading drift data, constructing prompts, calling the various LLM APIs, and saving the generated explanations.
- prompts.json: A JSON file containing the templates for the different prompt categories (quantitative, qualitative) and types (plain data, context-rich), allowing flexible and systematic prompt management.
- drift_results.json: An example input file containing pre-detected concept drift data, including time series values and Granger causality p-values.
- llm_explanations.json: The output file where all LLM-generated explanations are saved in JSON format.
- environment.yml: Conda environment configuration file for easy dependency management.
- experiments.py: Script to reproduce the main drift detection experiments and generate the related figures.
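For illustration, the snippet below shows one plausible way prompts.json could be structured and consumed. The key names ("quantitative", "context_rich") and the placeholder fields are assumptions; the authoritative schema is the file itself.

```python
# Hypothetical sketch of loading and filling a prompt template; the schema
# shown in the comments is an assumption, not the repository's actual layout.
import json

with open("prompts.json") as f:
    prompts = json.load(f)

# e.g., prompts["quantitative"]["context_rich"] might hold a template such as:
# "Given the process context {context}, quantify the change in {feature} ..."
template = prompts["quantitative"]["context_rich"]
print(template.format(context="loan application handling at a bank",
                      feature="average waiting time"))
```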
Follow these steps to set up the environment and run the experiments:
Ensure you have Conda installed on your system.
First, unzip example_logs/mdl/BPI2017.zip in place, so that the extracted log is available at example_logs/mdl/BPI2017.csv.
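On Linux or macOS this can be done with, for example (assuming the unzip utility is available):
unzip example_logs/mdl/BPI2017.zip -d example_logs/mdl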
Create and activate the Conda environment by running the following commands in your terminal:
conda env create --file environment.yml
conda activate explainable_concept_drift_experiments
This project uses external LLM APIs. You need to set your API keys as environment variables:
export OPENAI_API_KEY="your_openai_api_key"
export ANTHROPIC_API_KEY="your_anthropic_api_key"
export GOOGLE_API_KEY="your_google_api_key"
Replace "your_openai_api_key", "your_anthropic_api_key", and "your_google_api_key" with your actual API keys.
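A quick sanity check like the following (a small sketch, not part of the repository's scripts) can confirm the keys are visible to Python before launching the experiments:

```python
# Verify that all three API keys are set in the current environment.
import os

required = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
print("All API keys found.")
```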
Navigate to the root directory of this repository and run the main experiment script:
python experiments.py
This will reproduce the core drift detection and analysis. To generate LLM explanations based on the detected drifts, run:
python llm_explainer.py
This script will generate llm_explanations.json containing the LLM outputs for various prompt strategies and models.
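To get a quick look at the results, something like the following can be used; the field names mentioned in the comment are assumptions about the output schema, not a guarantee:

```python
# Inspect the generated explanations (truncated for readability).
import json

with open("llm_explanations.json") as f:
    explanations = json.load(f)

# Each entry is expected to pair a drift with a model and prompt strategy,
# e.g. fields such as "model", "prompt_type", and "explanation".
print(json.dumps(explanations, indent=2)[:1000])
```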
For any questions or inquiries, please contact [email protected].