LogDelta - Go Beyond Grepping with NLP-based Log Analysis!
Textual log line-level anomaly detection. Which one is the anomaly?
See the YouTube video demonstrating the tool in action.
We recommend using a virtual environment to ensure smooth operations.
conda create -n logdelta python=3.11
conda activate logdelta
Install logdelta.
pip install logdelta
Download the source code and navigate to the demo folder
git clone https://github.com/EvoTestOps/LogDelta.git
cd LogDelta/demo
Get data
wget -O Hadoop.zip "https://zenodo.org/records/8196385/files/Hadoop.zip?download=1"
unzip Hadoop.zip -d Hadoop
Run analysis
python -m logdelta.config_runner -c config.yml
Observe the results in LogDelta/demo/Output.
For more examples see LogDelta/demo/label_investigation and LogDelta/demo/full
LogDelta assumes your folders represent a collection of software logs of interest. LogDelta performs a comparison between two or more folders using matching file names. A target run represents a software run we are interested in analyzing. LogDelta uses comparison runs as a baseline. For example, the "My_passing_logs1", "My_passing_logs2", "My_passing_logs3" folders can be comparison runs, while "My_failing_logs" would be your target run that you want to analyze with respect to comparison runs.
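The folder convention above can be sketched in a few lines of Python. This is only an illustration of matching files by name between a target run and its comparison runs, not LogDelta's own code; the function name `match_files` is hypothetical.

```python
from pathlib import Path


def match_files(target_run: Path, comparison_runs: list[Path]) -> dict[str, list[Path]]:
    """Map each file name in the target run to the same-named files in the comparison runs.

    Hypothetical helper for illustration; LogDelta's internal matching may differ.
    """
    matches: dict[str, list[Path]] = {}
    for target_file in sorted(p for p in target_run.iterdir() if p.is_file()):
        # A comparison run contributes only if it holds a file with the same name.
        matches[target_file.name] = [
            run / target_file.name
            for run in comparison_runs
            if (run / target_file.name).is_file()
        ]
    return matches
```

With the example folders above, `match_files(Path("My_failing_logs"), [Path("My_passing_logs1"), Path("My_passing_logs2"), Path("My_passing_logs3")])` would list, for every file in the target run, the baseline files it can be compared against.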
In LogDelta, three types of analysis are available:
- Visualize:
  - Multiple log files or runs with UMAP, based on a two-dimensional scaling of the log contents.
  - Individual log files with log anomaly scoring (see the third item below for the supported anomaly detection models).
- Measure the distance between two logs or sets of logs using:
  - Jaccard distance
  - Cosine distance
  - Containment distance
  - Compression distance
- Build an anomaly detection model from a set of logs and use it to score anomalies (higher scores are more anomalous) in a log file using:
  - KMeans (kmeans)
  - IsolationForest (IF)
  - RarityModel (RM)
  - Out-of-Vocabulary Detector (OOVD)
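To give a feel for the four distance measures, here is a minimal sketch using only the Python standard library. It assumes whitespace tokenization, defines containment as the share of the first log's tokens missing from the second, and uses the normalized compression distance with zlib. These are textbook definitions chosen for illustration; LogDelta's own implementations and parameters may differ.

```python
import math
import zlib
from collections import Counter


def tokens(text: str) -> list[str]:
    # Assumption: plain whitespace tokenization.
    return text.split()


def jaccard_distance(a: str, b: str) -> float:
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def cosine_distance(a: str, b: str) -> float:
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    if norm == 0:
        return 1.0
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - dot / norm)


def containment_distance(a: str, b: str) -> float:
    # Share of a's distinct tokens that do not appear in b (asymmetric).
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa)


def compression_distance(a: str, b: str) -> float:
    # Normalized compression distance: similar texts compress well together.
    size_a = len(zlib.compress(a.encode()))
    size_b = len(zlib.compress(b.encode()))
    size_ab = len(zlib.compress((a + b).encode()))
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)
```

Jaccard and containment look only at which tokens occur, cosine also weighs how often they occur, and the compression distance needs no tokenization at all.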
Analysis can be done at four different levels:
- Run (folder) level, investigating the names of files without looking at their contents.
- Run (folder) level, investigating run contents (this is slower than the file-name-based analysis above).
- File level, investigating file contents (matched with the same names between runs).
- Line level, investigating line contents (matched with the same names between runs).
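As an illustration of line-level analysis, the sketch below scores each line of a target log by the fraction of its tokens never seen in the comparison logs. It is a simplified stand-in for an out-of-vocabulary detector, assuming whitespace tokenization; the function names are hypothetical and LogDelta's OOVD may compute its scores differently.

```python
def build_vocabulary(comparison_lines: list[str]) -> set[str]:
    """Collect every token seen in the comparison (baseline) lines."""
    vocab: set[str] = set()
    for line in comparison_lines:
        vocab.update(line.split())
    return vocab


def oov_scores(target_lines: list[str], vocab: set[str]) -> list[float]:
    """Return one score per target line: the share of tokens absent from the vocabulary."""
    scores = []
    for line in target_lines:
        toks = line.split()
        unseen = sum(1 for t in toks if t not in vocab)
        # Empty lines carry no evidence, so they get the lowest score.
        scores.append(unseen / len(toks) if toks else 0.0)
    return scores
```

Higher scores mark lines made up mostly of tokens the baseline never produced, which matches the "higher scores are more anomalous" convention described above.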
LogAI: LogDelta shares many similarities with LogAI, a tool developed by Salesforce. However, the last time we checked, LogAI was not actively maintained. With some help from the issue tracker, we were able to get it running, yet our impression was that it was somewhat slow compared to LogDelta. LogDelta runs on top of Polars, which offers excellent performance for processing log files with more than ten million rows on a laptop computer.
angle-grinder performs statistical analysis on log files, such as calculating the average response time in the logs. It complements our tool because it supports analysis within a single log file. LogDelta is not really useful for single log file analysis; rather, it requires two or more log files.
lnav - Logfile navigator is advertised as a tool for merging, tailing, searching, filtering, and querying log files. This is a great complement to LogDelta. In fact, during our Hadoop use case, we implemented a small script for log querying, but we would likely have been much better off using lnav.
Loglizer performs anomaly detection on logs. The last commit was 18 months ago, so it might no longer be actively maintained. It assumes parsed log data (e.g., with Drain), whereas LogDelta accepts raw text files. Loglizer does not appear to offer any visualizations. It seems to be more focused on anomaly detection benchmarking and, in this sense, is similar to our previous tool, LogLead, which was published a year ago. LogDelta is built on top of LogLead [1]: https://pypi.org/project/LogLead/
Footnotes
1. Mäntylä MV, Wang Y, Nyyssölä J. LogLead - Fast and Integrated Log Loader, Enhancer, and Anomaly Detector. In 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2024 Mar 12 (pp. 395-399). IEEE.