Conversation

@theturtle32 (Owner)

Summary

Implements Phase 6.3 Performance Regression Detection with a comprehensive benchmark suite and automated tracking.

Key Features:

  • ✅ Performance benchmarks for critical operations
  • ✅ Baseline tracking and regression detection
  • ✅ GitHub Actions workflow for CI integration
  • ✅ Automated alerts for >15% performance degradation

Benchmarks Implemented

Frame Operations

  • Small text frames (17 bytes): 4.3M ops/sec (unmasked), 3M ops/sec (masked)
  • Medium binary frames (1KB): 4.2M ops/sec
  • Large binary frames (64KB): 4.1M ops/sec
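
For reference, these benchmarks are written with Vitest's bench API. Below is a minimal sketch of what one frame-serialization benchmark might look like; the import path and the WebSocketFrame construction details are assumptions about the library's internals, not the PR's exact code:

  import { bench, describe } from 'vitest';
  import WebSocketFrame from '../../lib/WebSocketFrame.js'; // assumed path

  describe('WebSocket frame serialization', () => {
    // Scratch buffers the frame implementation writes into (sizes assumed)
    const maskBytes = Buffer.alloc(4);
    const frameHeader = Buffer.alloc(10);
    const config = { maxReceivedFrameSize: 0x10000 };

    bench('serialize small text frame (17 bytes, unmasked)', () => {
      const frame = new WebSocketFrame(maskBytes, frameHeader, config);
      frame.fin = true;
      frame.opcode = 0x01; // text frame
      frame.mask = false;
      frame.binaryPayload = Buffer.from('Hello, WebSocket!'); // 17 bytes
      frame.toBuffer();
    });
  });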

Connection Operations

  • Connection creation: 30K ops/sec
  • Send UTF-8 messages: 25-35K ops/sec
  • Send binary messages: 27K ops/sec
  • Ping/Pong frames: 34-38K ops/sec

Usage

# Run benchmarks
pnpm bench

# Save current performance as baseline
pnpm bench:baseline

# Check for regressions (CI)
pnpm bench:check

Regression Detection

The tracking script automatically compares each run against the saved baseline and classifies the change (a minimal sketch of the comparison logic follows the list):

  • >15% slower: ❌ Regression (CI warning)
  • 5–15% slower: ⚠️ Slower (informational)
  • >5% faster: 🚀 Faster (improvement)
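
The names below are illustrative, not necessarily those used in track-performance.mjs; only the thresholds come from the rules above.

  const REGRESSION_THRESHOLD = 0.15; // >15% slower fails the CI check
  const NOISE_THRESHOLD = 0.05;      // changes within ±5% are treated as noise

  function classify(baselineHz, currentHz) {
    const change = (currentHz - baselineHz) / baselineHz;
    if (change < -REGRESSION_THRESHOLD) return 'regression'; // ❌
    if (change < -NOISE_THRESHOLD) return 'slower';          // ⚠️
    if (change > NOISE_THRESHOLD) return 'faster';           // 🚀
    return 'unchanged';
  }

  // Example: baseline 4,300,000 ops/sec, current 3,500,000 is ~18.6% slower.
  console.log(classify(4_300_000, 3_500_000)); // 'regression'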

Files Added

  • test/benchmark/frame-operations.bench.mjs - Frame serialization benchmarks
  • test/benchmark/connection-operations.bench.mjs - Connection operation benchmarks
  • test/benchmark/track-performance.mjs - Baseline tracking script
  • test/benchmark/baseline.json - Initial performance baseline
  • test/benchmark/README.md - Documentation
  • .github/workflows/performance.yml - CI workflow
  • vitest.bench.config.mjs - Benchmark configuration

Test plan

  • ✅ All benchmarks run successfully
  • ✅ Baseline tracking works correctly
  • ✅ Regression detection script validates properly
  • ✅ All existing tests still pass (628 unit/integration)

🤖 Generated with Claude Code

Claude Code and others added 3 commits October 6, 2025 16:42
Implements Phase 6.3 Performance Regression Detection with:
- Frame serialization benchmarks (4M+ ops/sec)
- Connection operations benchmarks (25-35K ops/sec)
- Vitest bench infrastructure with dedicated config
- README documentation for benchmark interpretation

Benchmarks cover:
- Small text frames (17 bytes)
- Medium binary frames (1KB)
- Large binary frames (64KB)
- Connection lifecycle and message sending
- Ping/Pong frames

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Automated baseline save/compare script
- 15% regression threshold for CI failures
- npm scripts for baseline management
- Initial performance baseline saved

Commands:
- pnpm bench:baseline - Save current perf as baseline
- pnpm bench:check - Check for regressions (exits 1 if found)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Runs benchmarks on PRs that modify lib/ or benchmarks
- Manual workflow_dispatch trigger available
- Regression check is informational (warning only)
- Uses Node.js 20.x for consistent results

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@gemini-code-assist (bot)

Summary of Changes

Hello @theturtle32, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes a performance monitoring system for the project, designed to proactively identify and prevent performance regressions in critical WebSocket operations. By integrating a comprehensive benchmarking suite with automated baseline tracking and CI checks, the system ensures that future code changes do not inadvertently degrade the library's speed and responsiveness.

Highlights

  • Performance Benchmarking Suite: Introduced a comprehensive suite of performance benchmarks using Vitest, covering critical WebSocket frame serialization and connection operations (e.g., creating connections, sending messages, ping/pong frames).
  • Automated Regression Detection: Implemented a script to track performance baselines and automatically detect regressions. It flags performance degradations greater than 15% as regressions and provides warnings for changes between 5% and 15%.
  • CI Integration: Integrated the performance benchmarks and regression checks into the CI pipeline via a new GitHub Actions workflow, ensuring continuous monitoring of performance with every code change.
  • Developer Experience Enhancements: Updated package.json scripts and CLAUDE.md documentation to provide clear commands for running benchmarks, saving new baselines, and checking for regressions, making performance monitoring accessible to developers.

Ignored Files

  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/performance.yml

theturtle32 merged commit 6116cd1 into v2 on Oct 6, 2025
4 checks passed

gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a comprehensive performance benchmarking and regression detection system, which is a great addition for maintaining performance. My review focuses on improving the accuracy of the benchmarks, increasing the robustness of the tracking script, and clarifying the configuration. I've suggested refactoring the connection operation benchmarks to separate setup logic from the measured operations for more precise results. I've also proposed a fix to the performance tracking script to handle benchmark output more reliably and a simplification to the vitest configuration.

Comment on lines +12 to +52
bench('send small UTF-8 message', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  connection.sendUTF('Hello, WebSocket!');
});

bench('send medium UTF-8 message (1KB)', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  const message = 'x'.repeat(1024);
  connection.sendUTF(message);
});

bench('send binary message (1KB)', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  const buffer = Buffer.alloc(1024);
  connection.sendBytes(buffer);
});

bench('send ping frame', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  connection.ping();
});

bench('send pong frame', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  connection.pong();
});

Priority: high

The current implementation of these benchmarks includes the setup time (creating MockSocket and WebSocketConnection) in the measurement for each operation. This can lead to inaccurate performance metrics for the operations you intend to measure (e.g., sendUTF).

To get more accurate results, you should separate the setup logic from the code being benchmarked using a beforeEach hook. This ensures only the operation itself is timed.

Here is a suggested refactoring that groups the connection operation benchmarks and uses beforeEach for setup. Note that this will change the benchmark suite names in the output (by nesting them), so you will need to regenerate the baseline.json file by running pnpm bench:baseline.

  describe('operations on an open connection', () => {
    let connection;
    const mediumMessage = 'x'.repeat(1024);
    const mediumBuffer = Buffer.alloc(1024);

    beforeEach(() => {
      const socket = new MockSocket();
      connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
      connection._addSocketEventListeners();
      connection.state = 'open';
    });

    bench('send small UTF-8 message', () => {
      connection.sendUTF('Hello, WebSocket!');
    });

    bench('send medium UTF-8 message (1KB)', () => {
      connection.sendUTF(mediumMessage);
    });

    bench('send binary message (1KB)', () => {
      connection.sendBytes(mediumBuffer);
    });

    bench('send ping frame', () => {
      connection.ping();
    });

    bench('send pong frame', () => {
      connection.pong();
    });
  });

pnpm run bench

# Compare with previous results
pnpm run bench:compare

Priority: medium

The pnpm run bench:compare command is mentioned here, but it's not defined as a script in package.json. The track-performance.mjs script does accept a compare argument. For consistency and ease of use, you should either add the bench:compare script to package.json or update this documentation to refer to an existing script like bench:check.

Suggested change
pnpm run bench:compare
pnpm run bench:check
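
For reference, the script's argument handling might be wired roughly as follows. This is a hypothetical sketch: the review only confirms that track-performance.mjs accepts a compare argument, so the structure and helper functions below are stand-ins.

  import { readFileSync, writeFileSync, existsSync } from 'node:fs';

  const BASELINE_PATH = 'test/benchmark/baseline.json';

  // Stand-in: the real script runs `vitest bench` and parses its output.
  function runBenchmarks() {
    return {}; // suite name -> { benchmark name -> ops/sec }
  }

  // Stand-in: the real script applies the 15% regression threshold.
  function findRegressions(baseline, current) {
    return []; // names of benchmarks that regressed
  }

  const mode = process.argv[2] ?? 'check';
  if (mode === 'baseline') {
    writeFileSync(BASELINE_PATH, JSON.stringify(runBenchmarks(), null, 2));
  } else if (existsSync(BASELINE_PATH)) {
    const baseline = JSON.parse(readFileSync(BASELINE_PATH, 'utf8'));
    const regressions = findRegressions(baseline, runBenchmarks());
    // 'check' (used in CI) exits non-zero on regressions; 'compare' only reports.
    if (mode === 'check' && regressions.length > 0) process.exit(1);
  }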

Comment on lines +29 to +47
function parseBenchmarkOutput(output) {
  const results = {};
  const lines = output.split('\n');

  let currentSuite = null;
  for (const line of lines) {
    // Detect suite name
    if (line.includes('> WebSocket')) {
      currentSuite = line.match(/> (.*)/)[1].trim();
      results[currentSuite] = {};
    }

    // Parse benchmark results
    const benchMatch = line.match(/^\s*[·•]\s+(.+?)\s+(\d+(?:,\d+)*(?:\.\d+)?)\s/);
    if (benchMatch && currentSuite) {
      const [, name, hz] = benchMatch;
      results[currentSuite][name.trim()] = parseFloat(hz.replace(/,/g, ''));
    }
  }

Priority: medium

The current parsing logic for benchmark output can create empty suite entries in baseline.json if the vitest output contains suite headers without immediate benchmark results. This can happen if vitest prints a summary before the detailed results.

You can make the parser more robust by lazily initializing the suite in the results object only when the first benchmark for that suite is found. This will keep the baseline.json file clean and prevent potential issues.

function parseBenchmarkOutput(output) {
  const results = {};
  const lines = output.split('\n');

  let currentSuite = null;
  for (const line of lines) {
    // Detect suite name
    if (line.includes('> WebSocket')) {
      currentSuite = line.match(/> (.*)/)[1].trim();
    }

    // Parse benchmark results
    const benchMatch = line.match(/^\s*[·]\s+(.+?)\s+(\d+(?:,\d+)*(?:\.\d+)?)\s/);
    if (benchMatch && currentSuite) {
      if (!results[currentSuite]) {
        results[currentSuite] = {};
      }
      const [, name, hz] = benchMatch;
      results[currentSuite][name.trim()] = parseFloat(hz.replace(/,/g, ''));
    }
  }

Comment on lines +4 to +10
test: {
  include: ['test/benchmark/**/*.bench.mjs'],
  benchmark: {
    include: ['test/benchmark/**/*.bench.mjs'],
    exclude: ['node_modules/', 'test/unit/', 'test/integration/'],
  },
},

Priority: medium

The include property at test.include is redundant. When running vitest bench, the test.benchmark.include path is used to find benchmark files. The top-level test.include is for regular test files and is not needed in this benchmark-specific configuration. Removing it simplifies the config and avoids potential confusion.

  test: {
    benchmark: {
      include: ['test/benchmark/**/*.bench.mjs'],
      exclude: ['node_modules/', 'test/unit/', 'test/integration/'],
    },
  },

theturtle32 pushed a commit that referenced this pull request Oct 6, 2025
High Priority:
- Separate setup logic from measured operations in connection benchmarks
- Create shared connection once, reuse across all benchmark iterations
- Dramatically improved benchmark accuracy (2.4M ops/sec for ping vs 33K before)

Medium Priority:
- Lazily initialize suite results in track-performance.mjs to prevent empty entries
- Remove redundant test.include from vitest.bench.config.mjs
- Add bench:compare script to package.json to match README documentation

Updated baseline.json with new accurate performance measurements:
- Connection operations now measure actual send performance (not setup)
- Frame serialization remains consistent at 4M+ ops/sec
- Ping/pong operations: 2.4M / 1.9M ops/sec (previously 33K)
- Message sending: 900K / 220K / 108K ops/sec (previously 25-28K)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
theturtle32 added a commit that referenced this pull request Oct 6, 2025
* Address Gemini code review comments on PR #494

High Priority:
- Separate setup logic from measured operations in connection benchmarks
- Create shared connection once, reuse across all benchmark iterations
- Dramatically improved benchmark accuracy (2.4M ops/sec for ping vs 33K before)

Medium Priority:
- Lazily initialize suite results in track-performance.mjs to prevent empty entries
- Remove redundant test.include from vitest.bench.config.mjs
- Add bench:compare script to package.json to match README documentation

Updated baseline.json with new accurate performance measurements:
- Connection operations now measure actual send performance (not setup)
- Frame serialization remains consistent at 4M+ ops/sec
- Ping/pong operations: 2.4M / 1.9M ops/sec (previously 33K)
- Message sending: 900K / 220K / 108K ops/sec (previously 25-28K)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add explanation for not using beforeAll in benchmarks

Vitest's benchmark runner doesn't execute hooks (beforeAll/beforeEach)
before benchmarks in the same way as test(). Direct initialization at
module scope ensures the shared connection is available when benchmarks run.

Attempted using beforeAll() but it resulted in 0 hz for all benchmarks
using sharedConnection, indicating the hook wasn't executed before
benchmark iterations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude Code <[email protected]>
Co-authored-by: Claude <[email protected]>
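
A minimal sketch of the module-scope setup pattern this commit describes. The import paths and the MockSocket helper location are assumptions; the sharedConnection name and constructor arguments come from the benchmarks and commit message quoted above.

  import { bench, describe } from 'vitest';
  import WebSocketConnection from '../../lib/WebSocketConnection.js'; // assumed path
  import { MockSocket } from './mock-socket.mjs';                     // hypothetical helper

  // Module-scope setup runs when the file is imported, before vitest executes
  // any benchmark iterations. beforeAll() was observed not to run ahead of
  // bench() iterations (all shared-connection benches read 0 hz).
  const socket = new MockSocket();
  const sharedConnection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  sharedConnection._addSocketEventListeners();
  sharedConnection.state = 'open';

  describe('WebSocket connection operations', () => {
    bench('send ping frame', () => {
      sharedConnection.ping();
    });
  });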