Add performance benchmarking and regression detection #494
Conversation
Implements Phase 6.3 Performance Regression Detection with:
- Frame serialization benchmarks (4M+ ops/sec)
- Connection operations benchmarks (25-35K ops/sec)
- Vitest bench infrastructure with dedicated config
- README documentation for benchmark interpretation

Benchmarks cover:
- Small text frames (17 bytes)
- Medium binary frames (1KB)
- Large binary frames (64KB)
- Connection lifecycle and message sending
- Ping/Pong frames

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
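For context on what such a benchmark file looks like, here is a minimal sketch of a frame-serialization bench in the same Vitest style used elsewhere in this PR. The serializeTextFrame helper is a hypothetical stand-in that builds a plain RFC 6455 text frame; it is not the library's internal WebSocketFrame API, and the payload sizes simply mirror the ones listed above.

// Illustrative sketch only - not the file added in this PR.
import { bench, describe } from 'vitest';

// Hypothetical helper: serializes an unmasked FIN text frame per RFC 6455.
function serializeTextFrame(payload) {
  const len = payload.length;
  let header;
  if (len < 126) {
    header = Buffer.from([0x81, len]);
  } else if (len < 65536) {
    header = Buffer.alloc(4);
    header[0] = 0x81;
    header[1] = 126;
    header.writeUInt16BE(len, 2);
  } else {
    header = Buffer.alloc(10);
    header[0] = 0x81;
    header[1] = 127;
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  return Buffer.concat([header, payload]);
}

describe('WebSocket frame serialization (sketch)', () => {
  const small = Buffer.from('Hello, WebSocket!'); // 17 bytes
  const medium = Buffer.alloc(1024);              // 1KB
  const large = Buffer.alloc(64 * 1024);          // 64KB

  bench('serialize small text frame (17 bytes)', () => {
    serializeTextFrame(small);
  });

  bench('serialize medium frame (1KB)', () => {
    serializeTextFrame(medium);
  });

  bench('serialize large frame (64KB)', () => {
    serializeTextFrame(large);
  });
});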
- Automated baseline save/compare script
- 15% regression threshold for CI failures
- npm scripts for baseline management
- Initial performance baseline saved

Commands:
- pnpm bench:baseline - Save current perf as baseline
- pnpm bench:check - Check for regressions (exits 1 if found)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
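To make the 15% threshold concrete, the comparison step could look roughly like the sketch below. Only the baseline.json location, the suite/benchmark/ops-per-second result shape, and the threshold come from this PR; the function name and control flow are assumptions, not the actual track-performance.mjs code.

import { readFileSync } from 'node:fs';

const REGRESSION_THRESHOLD = 0.15; // fail bench:check when >15% slower than baseline

// baseline and current share the { suiteName: { benchName: opsPerSec } } shape
function findRegressions(baseline, current) {
  const regressions = [];
  for (const [suite, benches] of Object.entries(baseline)) {
    for (const [name, baselineHz] of Object.entries(benches)) {
      const currentHz = current[suite]?.[name];
      if (currentHz === undefined) continue; // benchmark renamed or removed
      const drop = (baselineHz - currentHz) / baselineHz;
      if (drop > REGRESSION_THRESHOLD) {
        regressions.push({ suite, name, baselineHz, currentHz, drop });
      }
    }
  }
  return regressions;
}

const baseline = JSON.parse(readFileSync('test/benchmark/baseline.json', 'utf8'));

// bench:check would run the suite, parse its output into `current`, then
// report findRegressions(baseline, current) and process.exit(1) if any exist.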
- Runs benchmarks on PRs that modify lib/ or benchmarks
- Manual workflow_dispatch trigger available
- Regression check is informational (warning only)
- Uses Node.js 20.x for consistent results

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
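A plausible shape for the .github/workflows/performance.yml described above is sketched below. The job name, action versions, and pnpm setup steps are assumptions; the path filters, workflow_dispatch trigger, Node.js 20.x, and informational regression check come from the commit description.

name: Performance

on:
  pull_request:
    paths:
      - 'lib/**'
      - 'test/benchmark/**'
  workflow_dispatch:

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.x'
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm run bench
      # Informational only: a regression prints a warning but does not fail the job
      - run: pnpm run bench:check
        continue-on-error: true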
Summary of Changes

Hello @theturtle32, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request establishes a robust performance monitoring system for the project, designed to proactively identify and prevent performance regressions in critical WebSocket operations. By integrating a comprehensive benchmarking suite with automated baseline tracking and CI checks, it ensures that future code changes do not inadvertently degrade the library's speed and responsiveness.
Code Review
This pull request introduces a comprehensive performance benchmarking and regression detection system, which is a great addition for maintaining performance. My review focuses on improving the accuracy of the benchmarks, increasing the robustness of the tracking script, and clarifying the configuration. I've suggested refactoring the connection operation benchmarks to separate setup logic from the measured operations for more precise results. I've also proposed a fix to the performance tracking script to handle benchmark output more reliably and a simplification to the vitest configuration.
bench('send small UTF-8 message', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  connection.sendUTF('Hello, WebSocket!');
});

bench('send medium UTF-8 message (1KB)', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  const message = 'x'.repeat(1024);
  connection.sendUTF(message);
});

bench('send binary message (1KB)', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  const buffer = Buffer.alloc(1024);
  connection.sendBytes(buffer);
});

bench('send ping frame', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  connection.ping();
});

bench('send pong frame', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  connection.pong();
});
The current implementation of these benchmarks includes the setup time (creating MockSocket and WebSocketConnection) in the measurement for each operation. This can lead to inaccurate performance metrics for the operations you intend to measure (e.g., sendUTF).
To get more accurate results, you should separate the setup logic from the code being benchmarked using a beforeEach hook. This ensures only the operation itself is timed.
Here is a suggested refactoring that groups the connection operation benchmarks and uses beforeEach for setup. Note that this will change the benchmark suite names in the output (by nesting them), so you will need to regenerate the baseline.json file by running pnpm bench:baseline.
describe('operations on an open connection', () => {
let connection;
const mediumMessage = 'x'.repeat(1024);
const mediumBuffer = Buffer.alloc(1024);
beforeEach(() => {
const socket = new MockSocket();
connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
connection._addSocketEventListeners();
connection.state = 'open';
});
bench('send small UTF-8 message', () => {
connection.sendUTF('Hello, WebSocket!');
});
bench('send medium UTF-8 message (1KB)', () => {
connection.sendUTF(mediumMessage);
});
bench('send binary message (1KB)', () => {
connection.sendBytes(mediumBuffer);
});
bench('send ping frame', () => {
connection.ping();
});
bench('send pong frame', () => {
connection.pong();
});
});

pnpm run bench

# Compare with previous results
pnpm run bench:compare
The pnpm run bench:compare command is mentioned here, but it's not defined as a script in package.json. The track-performance.mjs script does accept a compare argument. For consistency and ease of use, you should either add the bench:compare script to package.json or update this documentation to refer to an existing script like bench:check.
- pnpm run bench:compare
+ pnpm run bench:check
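If the maintainer instead takes the first option and defines the missing script, the scripts section of package.json might look roughly like this. The exact CLI arguments accepted by track-performance.mjs (baseline / check / compare) and the vitest flags are assumptions based on the commands mentioned in this PR:

{
  "scripts": {
    "bench": "vitest bench --run --config vitest.bench.config.mjs",
    "bench:baseline": "node test/benchmark/track-performance.mjs baseline",
    "bench:check": "node test/benchmark/track-performance.mjs check",
    "bench:compare": "node test/benchmark/track-performance.mjs compare"
  }
}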
function parseBenchmarkOutput(output) {
  const results = {};
  const lines = output.split('\n');

  let currentSuite = null;
  for (const line of lines) {
    // Detect suite name
    if (line.includes('> WebSocket')) {
      currentSuite = line.match(/> (.*)/)[1].trim();
      results[currentSuite] = {};
    }

    // Parse benchmark results
    const benchMatch = line.match(/^\s*[·•]\s+(.+?)\s+(\d+(?:,\d+)*(?:\.\d+)?)\s/);
    if (benchMatch && currentSuite) {
      const [, name, hz] = benchMatch;
      results[currentSuite][name.trim()] = parseFloat(hz.replace(/,/g, ''));
    }
  }
The current parsing logic for benchmark output can create empty suite entries in baseline.json if the vitest output contains suite headers without immediate benchmark results. This can happen if vitest prints a summary before the detailed results.
You can make the parser more robust by lazily initializing the suite in the results object only when the first benchmark for that suite is found. This will keep the baseline.json file clean and prevent potential issues.
function parseBenchmarkOutput(output) {
const results = {};
const lines = output.split('\n');
let currentSuite = null;
for (const line of lines) {
// Detect suite name
if (line.includes('> WebSocket')) {
currentSuite = line.match(/> (.*)/)[1].trim();
}
// Parse benchmark results
const benchMatch = line.match(/^\s*[·•]\s+(.+?)\s+(\d+(?:,\d+)*(?:\.\d+)?)\s/);
if (benchMatch && currentSuite) {
if (!results[currentSuite]) {
results[currentSuite] = {};
}
const [, name, hz] = benchMatch;
results[currentSuite][name.trim()] = parseFloat(hz.replace(/,/g, ''));
}
}

test: {
  include: ['test/benchmark/**/*.bench.mjs'],
  benchmark: {
    include: ['test/benchmark/**/*.bench.mjs'],
    exclude: ['node_modules/', 'test/unit/', 'test/integration/'],
  },
},
The include property at test.include is redundant. When running vitest bench, the test.benchmark.include path is used to find benchmark files. The top-level test.include is for regular test files and is not needed in this benchmark-specific configuration. Removing it simplifies the config and avoids potential confusion.
test: {
benchmark: {
include: ['test/benchmark/**/*.bench.mjs'],
exclude: ['node_modules/', 'test/unit/', 'test/integration/'],
},
},
* Address Gemini code review comments on PR #494

  High Priority:
  - Separate setup logic from measured operations in connection benchmarks
  - Create shared connection once, reuse across all benchmark iterations
  - Dramatically improved benchmark accuracy (2.4M ops/sec for ping vs 33K before)

  Medium Priority:
  - Lazily initialize suite results in track-performance.mjs to prevent empty entries
  - Remove redundant test.include from vitest.bench.config.mjs
  - Add bench:compare script to package.json to match README documentation

  Updated baseline.json with new accurate performance measurements:
  - Connection operations now measure actual send performance (not setup)
  - Frame serialization remains consistent at 4M+ ops/sec
  - Ping/pong operations: 2.4M / 1.9M ops/sec (previously 33K)
  - Message sending: 900K / 220K / 108K ops/sec (previously 25-28K)

* Add explanation for not using beforeAll in benchmarks

  Vitest's benchmark runner doesn't execute hooks (beforeAll/beforeEach) before benchmarks in the same way as test(). Direct initialization at module scope ensures the shared connection is available when benchmarks run.

  Attempted using beforeAll(), but it resulted in 0 hz for all benchmarks using sharedConnection, indicating the hook wasn't executed before benchmark iterations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Code <[email protected]>
Co-authored-by: Claude <[email protected]>
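The module-scope setup described in that second commit would look roughly like the sketch below. The sharedConnection name, MockSocket, and the connection setup calls are taken from the benchmark code quoted earlier in this review; the import paths are assumptions.

import { bench, describe } from 'vitest';
import WebSocketConnection from '../../lib/WebSocketConnection.js';
import { MockSocket } from './mock-socket.mjs'; // hypothetical location of the mock

// Initialized at module scope instead of beforeAll()/beforeEach(), since the
// hooks were not run before benchmark iterations in this setup.
const sharedSocket = new MockSocket();
const sharedConnection = new WebSocketConnection(sharedSocket, [], 'echo-protocol', false, {});
sharedConnection._addSocketEventListeners();
sharedConnection.state = 'open';

describe('WebSocket connection operations', () => {
  bench('send small UTF-8 message', () => {
    sharedConnection.sendUTF('Hello, WebSocket!');
  });

  bench('send ping frame', () => {
    sharedConnection.ping();
  });
});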
Summary

Implements Phase 6.3 Performance Regression Detection with a comprehensive benchmark suite and automated tracking.

Key Features:
- Vitest bench infrastructure with a dedicated config
- Automated baseline save/compare script with a 15% regression threshold
- CI workflow that runs benchmarks on PRs touching lib/ or the benchmarks

Benchmarks Implemented

Frame Operations
- Small text frames (17 bytes)
- Medium binary frames (1KB)
- Large binary frames (64KB)

Connection Operations
- Connection lifecycle and message sending (UTF-8 and binary)
- Ping/Pong frames

Usage
- pnpm run bench - Run the benchmark suite
- pnpm bench:baseline - Save current performance as the baseline
- pnpm bench:check - Check for regressions (exits 1 if found)

Regression Detection

The tracking script automatically detects performance regressions: benchmarks that fall more than 15% below the saved baseline cause bench:check to fail, while the CI workflow treats the check as informational (warning only).

Files Added
- test/benchmark/frame-operations.bench.mjs - Frame serialization benchmarks
- test/benchmark/connection-operations.bench.mjs - Connection operation benchmarks
- test/benchmark/track-performance.mjs - Baseline tracking script
- test/benchmark/baseline.json - Initial performance baseline
- test/benchmark/README.md - Documentation
- .github/workflows/performance.yml - CI workflow
- vitest.bench.config.mjs - Benchmark configuration

Test plan
🤖 Generated with Claude Code