Add performance benchmarking and regression detection #494
Conversation
Implements Phase 6.3 Performance Regression Detection with:
- Frame serialization benchmarks (4M+ ops/sec)
- Connection operations benchmarks (25-35K ops/sec)
- Vitest bench infrastructure with dedicated config
- README documentation for benchmark interpretation

Benchmarks cover:
- Small text frames (17 bytes)
- Medium binary frames (1KB)
- Large binary frames (64KB)
- Connection lifecycle and message sending
- Ping/Pong frames

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
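For context on what such a benchmark file looks like, here is a minimal sketch of a frame-serialization bench in the same Vitest style used elsewhere in this PR. The serializeTextFrame helper is a hypothetical stand-in that builds a plain RFC 6455 text frame; it is not the library's internal WebSocketFrame API, and the payload sizes simply mirror the ones listed above.

// Illustrative sketch only - not the file added in this PR.
import { bench, describe } from 'vitest';

// Hypothetical helper: serializes an unmasked FIN text frame per RFC 6455.
function serializeTextFrame(payload) {
  const len = payload.length;
  let header;
  if (len < 126) {
    header = Buffer.from([0x81, len]);
  } else if (len < 65536) {
    header = Buffer.alloc(4);
    header[0] = 0x81;
    header[1] = 126;
    header.writeUInt16BE(len, 2);
  } else {
    header = Buffer.alloc(10);
    header[0] = 0x81;
    header[1] = 127;
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  return Buffer.concat([header, payload]);
}

describe('WebSocket frame serialization (sketch)', () => {
  const small = Buffer.from('Hello, WebSocket!'); // 17 bytes
  const medium = Buffer.alloc(1024);              // 1KB
  const large = Buffer.alloc(64 * 1024);          // 64KB

  bench('serialize small text frame (17 bytes)', () => {
    serializeTextFrame(small);
  });

  bench('serialize medium frame (1KB)', () => {
    serializeTextFrame(medium);
  });

  bench('serialize large frame (64KB)', () => {
    serializeTextFrame(large);
  });
});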
- Automated baseline save/compare script
- 15% regression threshold for CI failures
- npm scripts for baseline management
- Initial performance baseline saved

Commands:
- pnpm bench:baseline - Save current perf as baseline
- pnpm bench:check - Check for regressions (exits 1 if found)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
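To make the 15% threshold concrete, the comparison step could look roughly like the sketch below. Only the baseline.json location, the suite/benchmark/ops-per-second result shape, and the threshold come from this PR; the function name and control flow are assumptions, not the actual track-performance.mjs code.

import { readFileSync } from 'node:fs';

const REGRESSION_THRESHOLD = 0.15; // fail bench:check when >15% slower than baseline

// baseline and current share the { suiteName: { benchName: opsPerSec } } shape
function findRegressions(baseline, current) {
  const regressions = [];
  for (const [suite, benches] of Object.entries(baseline)) {
    for (const [name, baselineHz] of Object.entries(benches)) {
      const currentHz = current[suite]?.[name];
      if (currentHz === undefined) continue; // benchmark renamed or removed
      const drop = (baselineHz - currentHz) / baselineHz;
      if (drop > REGRESSION_THRESHOLD) {
        regressions.push({ suite, name, baselineHz, currentHz, drop });
      }
    }
  }
  return regressions;
}

const baseline = JSON.parse(readFileSync('test/benchmark/baseline.json', 'utf8'));

// bench:check would run the suite, parse its output into `current`, then
// report findRegressions(baseline, current) and process.exit(1) if any exist.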
- Runs benchmarks on PRs that modify lib/ or benchmarks
- Manual workflow_dispatch trigger available
- Regression check is informational (warning only)
- Uses Node.js 20.x for consistent results

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
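A plausible shape for the .github/workflows/performance.yml described above is sketched below. The job name, action versions, and pnpm setup steps are assumptions; the path filters, workflow_dispatch trigger, Node.js 20.x, and informational regression check come from the commit description.

name: Performance

on:
  pull_request:
    paths:
      - 'lib/**'
      - 'test/benchmark/**'
  workflow_dispatch:

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.x'
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm run bench
      # Informational only: a regression prints a warning but does not fail the job
      - run: pnpm run bench:check
        continue-on-error: true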
Summary of Changes

Hello @theturtle32, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request establishes a robust performance monitoring system for the project, designed to proactively identify and prevent performance regressions in critical WebSocket operations. By integrating a comprehensive benchmarking suite with automated baseline tracking and CI checks, it ensures that future code changes do not inadvertently degrade the library's speed and responsiveness.
Code Review
This pull request introduces a comprehensive performance benchmarking and regression detection system, which is a great addition for maintaining performance. My review focuses on improving the accuracy of the benchmarks, increasing the robustness of the tracking script, and clarifying the configuration. I've suggested refactoring the connection operation benchmarks to separate setup logic from the measured operations for more precise results. I've also proposed a fix to the performance tracking script to handle benchmark output more reliably and a simplification to the vitest configuration.
bench('send small UTF-8 message', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  connection.sendUTF('Hello, WebSocket!');
});

bench('send medium UTF-8 message (1KB)', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  const message = 'x'.repeat(1024);
  connection.sendUTF(message);
});

bench('send binary message (1KB)', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  const buffer = Buffer.alloc(1024);
  connection.sendBytes(buffer);
});

bench('send ping frame', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  connection.ping();
});

bench('send pong frame', () => {
  const socket = new MockSocket();
  const connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
  connection._addSocketEventListeners();
  connection.state = 'open';
  connection.pong();
});
The current implementation of these benchmarks includes the setup time (creating MockSocket and WebSocketConnection) in the measurement for each operation. This can lead to inaccurate performance metrics for the operations you intend to measure (e.g., sendUTF).
To get more accurate results, you should separate the setup logic from the code being benchmarked using a beforeEach hook. This ensures only the operation itself is timed.
Here is a suggested refactoring that groups the connection operation benchmarks and uses beforeEach for setup. Note that this will change the benchmark suite names in the output (by nesting them), so you will need to regenerate the baseline.json file by running pnpm bench:baseline.
describe('operations on an open connection', () => {
let connection;
const mediumMessage = 'x'.repeat(1024);
const mediumBuffer = Buffer.alloc(1024);
beforeEach(() => {
const socket = new MockSocket();
connection = new WebSocketConnection(socket, [], 'echo-protocol', false, {});
connection._addSocketEventListeners();
connection.state = 'open';
});
bench('send small UTF-8 message', () => {
connection.sendUTF('Hello, WebSocket!');
});
bench('send medium UTF-8 message (1KB)', () => {
connection.sendUTF(mediumMessage);
});
bench('send binary message (1KB)', () => {
connection.sendBytes(mediumBuffer);
});
bench('send ping frame', () => {
connection.ping();
});
bench('send pong frame', () => {
connection.pong();
});
});

pnpm run bench

# Compare with previous results
pnpm run bench:compare
The pnpm run bench:compare command is mentioned here, but it's not defined as a script in package.json. The track-performance.mjs script does accept a compare argument. For consistency and ease of use, you should either add the bench:compare script to package.json or update this documentation to refer to an existing script like bench:check.
- pnpm run bench:compare
+ pnpm run bench:check
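If the maintainer instead takes the first option and defines the missing script, the scripts section of package.json might look roughly like this. The exact CLI arguments accepted by track-performance.mjs (baseline / check / compare) and the vitest flags are assumptions based on the commands mentioned in this PR:

{
  "scripts": {
    "bench": "vitest bench --run --config vitest.bench.config.mjs",
    "bench:baseline": "node test/benchmark/track-performance.mjs baseline",
    "bench:check": "node test/benchmark/track-performance.mjs check",
    "bench:compare": "node test/benchmark/track-performance.mjs compare"
  }
}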
function parseBenchmarkOutput(output) {
  const results = {};
  const lines = output.split('\n');

  let currentSuite = null;
  for (const line of lines) {
    // Detect suite name
    if (line.includes('> WebSocket')) {
      currentSuite = line.match(/> (.*)/)[1].trim();
      results[currentSuite] = {};
    }

    // Parse benchmark results
    const benchMatch = line.match(/^\s*[·•]\s+(.+?)\s+(\d+(?:,\d+)*(?:\.\d+)?)\s/);
    if (benchMatch && currentSuite) {
      const [, name, hz] = benchMatch;
      results[currentSuite][name.trim()] = parseFloat(hz.replace(/,/g, ''));
    }
  }
The current parsing logic for benchmark output can create empty suite entries in baseline.json if the vitest output contains suite headers without immediate benchmark results. This can happen if vitest prints a summary before the detailed results.
You can make the parser more robust by lazily initializing the suite in the results object only when the first benchmark for that suite is found. This will keep the baseline.json file clean and prevent potential issues.
function parseBenchmarkOutput(output) {
const results = {};
const lines = output.split('\n');
let currentSuite = null;
for (const line of lines) {
// Detect suite name
if (line.includes('> WebSocket')) {
currentSuite = line.match(/> (.*)/)[1].trim();
}
// Parse benchmark results
const benchMatch = line.match(/^\s*[·•]\s+(.+?)\s+(\d+(?:,\d+)*(?:\.\d+)?)\s/);
if (benchMatch && currentSuite) {
if (!results[currentSuite]) {
results[currentSuite] = {};
}
const [, name, hz] = benchMatch;
results[currentSuite][name.trim()] = parseFloat(hz.replace(/,/g, ''));
}
}

test: {
  include: ['test/benchmark/**/*.bench.mjs'],
  benchmark: {
    include: ['test/benchmark/**/*.bench.mjs'],
    exclude: ['node_modules/', 'test/unit/', 'test/integration/'],
  },
},
The include property at test.include is redundant. When running vitest bench, the test.benchmark.include path is used to find benchmark files. The top-level test.include is for regular test files and is not needed in this benchmark-specific configuration. Removing it simplifies the config and avoids potential confusion.
test: {
benchmark: {
include: ['test/benchmark/**/*.bench.mjs'],
exclude: ['node_modules/', 'test/unit/', 'test/integration/'],
},
},
* Address Gemini code review comments on PR #494

  High Priority:
  - Separate setup logic from measured operations in connection benchmarks
  - Create shared connection once, reuse across all benchmark iterations
  - Dramatically improved benchmark accuracy (2.4M ops/sec for ping vs 33K before)

  Medium Priority:
  - Lazily initialize suite results in track-performance.mjs to prevent empty entries
  - Remove redundant test.include from vitest.bench.config.mjs
  - Add bench:compare script to package.json to match README documentation

  Updated baseline.json with new accurate performance measurements:
  - Connection operations now measure actual send performance (not setup)
  - Frame serialization remains consistent at 4M+ ops/sec
  - Ping/pong operations: 2.4M / 1.9M ops/sec (previously 33K)
  - Message sending: 900K / 220K / 108K ops/sec (previously 25-28K)

* Add explanation for not using beforeAll in benchmarks

  Vitest's benchmark runner doesn't execute hooks (beforeAll/beforeEach) before benchmarks in the same way as test(). Direct initialization at module scope ensures the shared connection is available when benchmarks run.

  Attempted using beforeAll(), but it resulted in 0 hz for all benchmarks using sharedConnection, indicating the hook wasn't executed before benchmark iterations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Code <[email protected]>
Co-authored-by: Claude <[email protected]>
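The module-scope setup described in that second commit would look roughly like the sketch below. The sharedConnection name, MockSocket, and the connection setup calls are taken from the benchmark code quoted earlier in this review; the import paths are assumptions.

import { bench, describe } from 'vitest';
import WebSocketConnection from '../../lib/WebSocketConnection.js';
import { MockSocket } from './mock-socket.mjs'; // hypothetical location of the mock

// Initialized at module scope instead of beforeAll()/beforeEach(), since the
// hooks were not run before benchmark iterations in this setup.
const sharedSocket = new MockSocket();
const sharedConnection = new WebSocketConnection(sharedSocket, [], 'echo-protocol', false, {});
sharedConnection._addSocketEventListeners();
sharedConnection.state = 'open';

describe('WebSocket connection operations', () => {
  bench('send small UTF-8 message', () => {
    sharedConnection.sendUTF('Hello, WebSocket!');
  });

  bench('send ping frame', () => {
    sharedConnection.ping();
  });
});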
Summary

Implements Phase 6.3 Performance Regression Detection with a comprehensive benchmark suite and automated tracking.

Key Features:
- Vitest bench infrastructure with a dedicated config
- Automated baseline save/compare script with a 15% regression threshold
- CI workflow that runs benchmarks on PRs touching lib/ or the benchmarks

Benchmarks Implemented

Frame Operations
- Small text frames (17 bytes)
- Medium binary frames (1KB)
- Large binary frames (64KB)

Connection Operations
- Connection lifecycle and message sending (UTF-8 and binary)
- Ping/Pong frames

Usage
- pnpm run bench - Run the benchmark suite
- pnpm bench:baseline - Save current performance as the baseline
- pnpm bench:check - Check for regressions (exits 1 if found)

Regression Detection

The tracking script automatically detects performance regressions: benchmarks that fall more than 15% below the saved baseline cause bench:check to fail, while the CI workflow treats the check as informational (warning only).

Files Added
- test/benchmark/frame-operations.bench.mjs - Frame serialization benchmarks
- test/benchmark/connection-operations.bench.mjs - Connection operation benchmarks
- test/benchmark/track-performance.mjs - Baseline tracking script
- test/benchmark/baseline.json - Initial performance baseline
- test/benchmark/README.md - Documentation
- .github/workflows/performance.yml - CI workflow
- vitest.bench.config.mjs - Benchmark configuration

Test plan
🤖 Generated with Claude Code