Skip to content

Conversation

@ItzNotABug
Copy link
Member

@ItzNotABug ItzNotABug commented Nov 12, 2025

What does this PR do?

Screenshot 2025-11-16 at 12 52 51 PM

Test Plan

Manual, Unit and E2Es.

Related PRs and Issues

N/A.

Have you read the Contributing Guidelines on issues?

Yes.

Summary by CodeRabbit

  • Chores
    • Updated container base and runtime layout, added explicit startup commands, adjusted browser runtime and permissions, and ignored build artifacts to reduce image size.
    • Added an automated cleanup utility to prune unnecessary package files and shrink images.
  • New Features
    • Introduced routed endpoints for screenshots, reports, health and test.
    • Changed default screenshot delay to zero.
  • Tests
    • Added container-side benchmarking and PR reporting for startup, memory and screenshot metrics; updated tests to reflect new defaults.

✏️ Tip: You can customize this high-level summary in your review settings.

@ItzNotABug ItzNotABug self-assigned this Nov 12, 2025
@coderabbitai
Copy link

coderabbitai bot commented Nov 12, 2025

Walkthrough

Replaces Docker build stages to use a Debian-based Bun builder and a headless-shell runtime, adds a chrome user/group, switches APK Chromium steps to apt fonts and headless-shell, updates PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH and adds ENTRYPOINT/CMD. Adds src/utils/clean-modules.ts: a standalone Bun script that prunes node_modules (docs, tests, TS sources, dev files, selected locales, and other heavy paths) and reports space saved. Adds a Router utility and refactors src/server.ts to delegate routing to it. Changes default screenshot sleep from 3000 to 0, updates corresponding unit test, updates .dockerignore to include lighthouse/, and extends CI workflow to collect Docker stats and comment PRs.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Dockerfile: verify base image swap, added clean-modules step, chown-based COPY semantics, new chrome user, executable path change, ENTRYPOINT/CMD, and lighthouse-related install/permissions.
  • src/utils/clean-modules.ts: review deletion patterns, concurrency, safety checks, preserved files, and accurate space calculation.
  • src/utils/router.ts and src/server.ts: validate routing matching logic, RegExp path construction, ordering, and error/edge-case behavior.
  • src/schemas/screenshot.schema.ts and tests: confirm default change and test alignment.
  • .github/workflows/test.yml: review new metric collection steps, outputs, and PR comment step for correctness and secrets/permissions handling.
  • .dockerignore: confirm lighthouse/ exclusion and implications for build context.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'Add: aggressive cleanup' is vague and does not clearly summarize the main changes; it uses non-descriptive phrasing without conveying what was actually cleaned up or improved. Use a more specific title that clearly describes the primary change, such as 'Optimize Docker image with aggressive node_modules cleanup and Debian migration' or 'Reduce image size with node_modules cleanup and headless-shell base.'
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch aggressive-improvements

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ItzNotABug ItzNotABug requested a review from Meldiron November 13, 2025 16:47
@ItzNotABug ItzNotABug changed the base branch from tests to main November 13, 2025 16:48
@ItzNotABug
Copy link
Member Author

ItzNotABug commented Nov 16, 2025

DO NOT MERGE, this is has a bit more memory usage due to UPX.
Fixed by removing UPX and moving to other formats.

@ItzNotABug ItzNotABug marked this pull request as ready for review November 16, 2025 07:22
@ItzNotABug ItzNotABug mentioned this pull request Nov 16, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/utils/clean-modules.ts (4)

5-10: Consider avoiding the hard‑coded node_modules path

NODE_MODULES is hard‑coded to /app/node_modules, which is fine for this Docker image but makes the script less reusable elsewhere (e.g., different WORKDIR or local runs).

You might consider deriving it from process.cwd() or an environment variable for flexibility:

-const NODE_MODULES = "/app/node_modules";
+const NODE_MODULES = process.env.NODE_MODULES_DIR ?? `${process.cwd()}/node_modules`;

37-98: Aggressive glob deletions can break runtime dependencies; consider tightening or allow‑listing

The patterns here (*.txt, all *.ts/*.tsx/*.jsx, various config files, etc.) are very aggressive across the entire node_modules tree. While this squeezes size, it risks breaking packages that ship required runtime data or scripts in these forms (e.g., .txt resource files or .sh scripts invoked at runtime).

Recommendation:

  • Start from stricter, clearly “dev‑only” targets (tests, docs, coverage, examples, .github, etc.).
  • For generic patterns like *.txt, *.sh, or *.ts[x], consider:
    • Restricting them to known-safe directories (**/test/**, **/docs/**, **/examples/**), or
    • Introducing an allow‑list of packages that must be left untouched.
  • Ensure your full e2e/regression suite runs against an image built with this cleaner to catch any subtle breakages.

108-160: Locale stubbing effectively disables non‑English support—confirm product requirements

The locale removals for trace_engine, Lighthouse, and axe-core leave only en-US.json/en.json and replace all other locales with {} stubs. That’s a sensible space optimization if the container is strictly English‑only, but it will break or degrade i18n behavior if non‑English output is ever requested.

If non‑English Lighthouse/axe‑core output might matter (now or in future), consider:

  • Keeping a small set of supported locales instead of nuking all non‑English ones, or
  • Making the list of preserved locales configurable via env.

180-195: Parallel cleanup is fast but can be heavy; consider serializing if build resources are tight

Running all cleanup steps inside Promise.all will kick off multiple find/rm traversals over node_modules concurrently. That’s good for speed, but can spike I/O and memory on constrained CI runners.

If you see build instability or throttling, consider executing the phases sequentially (or in smaller batches) instead of all at once. For example:

-  await Promise.all([
-    removeDocumentationFiles(),
-    removeTypeScriptSources(),
-    ...
-  ]);
+  await removeDocumentationFiles();
+  await removeTypeScriptSources();
+  ...
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75d9505 and e7ba28c.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • .dockerignore (1 hunks)
  • Dockerfile (1 hunks)
  • src/utils/clean-modules.ts (1 hunks)
🔇 Additional comments (5)
.dockerignore (1)

19-19: Ignoring lighthouse/ in build context is appropriate

Adding lighthouse/ next to .lighthouse/ keeps local Lighthouse reports out of the Docker build context and helps keep images lean. No changes needed.

Dockerfile (4)

1-12: Base stage + cleanup wiring looks coherent; be aware of --production if reusing the stage

Using oven/bun:1.3.2-debian as base, running bun install --frozen-lockfile --production, and then bun run ./src/utils/clean-modules.ts is a clean way to produce a minimized production node_modules tree for the final image.

One implication: if you later reuse this base stage for building/running tests inside Docker, devDependencies won’t be available. In that case you may want a separate “dev-deps” stage or to drop --production in that specific pipeline.


14-25: Final headless-shell stage and font/tini setup look fine

Switching the runtime to chromedp/headless-shell and installing only tini + fonts + certs is lean and appropriate for a headless browser container. Cleaning apt caches in the same layer keeps the image size in check.

No issues spotted here.


39-45: Ownership and lighthouse reports directory align with .dockerignore

Using COPY --chown=chrome:chrome for src/, package.json, and node_modules ensures the chrome user can run everything without extra chown layers. Creating the lighthouse directory owned by chrome matches the .dockerignore rules that keep any host-side lighthouse/ out of the build context, while still providing a runtime reports directory inside the container.

Looks good.


28-35: Path is correct—no action needed

The chromedp/headless-shell:143.0.7445.3 image does expose the headless binary at /headless-shell/headless-shell, which matches the PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH environment variable set in the Dockerfile. The configuration is correct.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/utils/clean-modules.ts (3)

5-5: Hard-coded path limits script portability.

The /app/node_modules path assumes the Docker environment layout. This prevents running the script locally for testing or in other environments.

Consider parameterizing via an environment variable with a default:

-const NODE_MODULES = "/app/node_modules";
+const NODE_MODULES = process.env.NODE_MODULES_PATH ?? "/app/node_modules";

40-40: *Minor: .d.ts deletion is partially redundant with later cleanup.

Line 40 deletes *.d.ts files, which would match most TypeScript declarations. Lines 104-105 later delete *.d.cts and *.d.mts, which are CommonJS and ES Module specific declaration files. The overlap is minor since the patterns are slightly different, but this could be consolidated.


179-201: Consider adding existence check before cleanup.

The script assumes NODE_MODULES exists when calling getDirSize. If the directory doesn't exist (e.g., due to misconfiguration or running in wrong context), du will fail with a cryptic error.

Add validation at the start of cleanModules:

 async function cleanModules(): Promise<void> {
 	console.log("🧹 Starting node_modules cleanup...");
+	try {
+		await $`test -d ${NODE_MODULES}`.quiet();
+	} catch {
+		console.error(`❌ Directory ${NODE_MODULES} does not exist`);
+		process.exit(1);
+	}
 	const startSize = await getDirSize(NODE_MODULES);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7ba28c and f999d75.

📒 Files selected for processing (1)
  • src/utils/clean-modules.ts (1 hunks)
🔇 Additional comments (5)
src/utils/clean-modules.ts (5)

7-35: Helper functions are well-structured.

The utility functions properly handle errors and use shell commands safely with no injection risks. The .quiet() calls and error suppression ensure cleanup doesn't fail on missing files.


108-160: Locale cleanup strategy is sound.

The approach of stubbing non-English locale files for Lighthouse and axe-core (rather than deleting them) prevents potential runtime errors if code attempts to load those locales. The error handling ensures missing directories don't cause failures.


203-206: Entry point error handling is appropriate.

The top-level error handler properly logs failures and exits with a non-zero code, ensuring Docker build fails if cleanup encounters issues.


162-177: I need to verify two remaining concerns about dependencies that couldn't be checked in the sandbox. Let me search for documentation on these packages.

Based on the verification results, all flagged dependencies are safely deleted and do not appear to be used at runtime:

  • @sentry and @opentelemetry are not used anywhere in the application code and are not listed in package.json
  • devtools-protocol is imported only as types within puppeteer-core and is a nested dependency; removing it from node_modules won't affect puppeteer's functionality
  • domain-map.csv is used for Lighthouse data aggregation to BigQuery, not for runtime audits

No critical issues found with the deleted dependencies.

All deletions in the removeUnnecessaryFiles() function are safe:

  • @sentry (line 171) and @opentelemetry (line 172) are unused transitive dependencies not present in package.json or application code
  • devtools-protocol (line 169) is a type-only import within puppeteer-core and won't break Puppeteer
  • domain-map.csv (line 167) is for Lighthouse data infrastructure, not runtime audits

52-59: Verification confirms the dependency tree ships compiled JavaScript—no issues found.

The verification script found zero .ts files in node_modules and no packages declaring TypeScript sources as entry points. The removeTypeScriptSources() function safely deletes only source files that don't break the current dependency tree. The concern is theoretically valid but does not manifest as a practical issue in this codebase.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/utils/clean-modules.ts (1)

173-173: CRITICAL: Line 173 still deletes axe-core/axe.js despite previous review.

A previous review comment flagged line 173 as a critical issue, stating that axe-core/axe.js is the main entrypoint for the axe-core package and is required by Lighthouse's accessibility audits at runtime. The comment was marked as "✅ Addressed in commit f999d75", but this line is still present in the current code.

According to the past review and AI summary, the application uses Lighthouse audits in production (via src/routes/reports.ts), which depends on axe-core. Deleting this file will cause runtime failures when Lighthouse attempts to run accessibility audits.

Action required: Remove or comment out line 173:

-  await deletePath(`${NODE_MODULES}/axe-core/axe.js`);

If this was intentionally kept for some reason, please clarify why the dependency is no longer needed at runtime.
</review_comment_end -->

🧹 Nitpick comments (1)
src/utils/clean-modules.ts (1)

37-50: Verify that deleting all .d.ts files won't break runtime or tooling.

Line 40 removes all TypeScript declaration files (*.d.ts). While this is generally safe for production containers that only run JavaScript, it could cause issues if:

  • Any runtime tooling depends on type definitions (e.g., dynamic type checking, code generation)
  • The container image is used for development or debugging with TypeScript tooling

If this container is strictly for runtime execution (no tsc, no type checking), this is fine. Otherwise, consider preserving declarations for libraries that only ship .d.ts + .js.

Note: Line 40 overlaps with lines 104-105 in removeScriptsAndDeclarations, which also delete .d.cts and .d.mts files. The logic is consistent, but the duplication is worth noting.

</review_comment_end -->

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68decdf and 77cd303.

📒 Files selected for processing (1)
  • src/utils/clean-modules.ts (1 hunks)
🔇 Additional comments (3)
src/utils/clean-modules.ts (3)

180-207: LGTM: Clean orchestration and error handling.

The cleanModules function properly coordinates all cleanup tasks in parallel, measures size reduction, and reports results. The top-level error handler ensures the script exits with a non-zero code on failure, which is appropriate for a build-time utility.

The parallel execution via Promise.all is efficient, though it may make debugging individual task failures slightly harder. This is an acceptable trade-off for performance.

</review_comment_end -->


171-172: Verification complete: Deletions are safe.

Neither @sentry nor @opentelemetry appear in package.json dependencies, and neither package exists in the codebase. Lines 171-172 are safe—these deletion calls are effectively no-ops, and the silent error handling in deletePath() will gracefully handle the missing directories.


12-35: All flagged concerns are either intentional design choices or speculative risks that do not apply to the current code.

The shell injection pattern in deleteFiles, deleteDirectories, and deletePath is a code hygiene consideration, but all inputs are hardcoded string literals controlled by developers—no external or user input flows into these functions. The error suppression via 2>/dev/null || true and empty catch blocks is intentional for a best-effort cleanup utility where missing paths should not halt execution. The .d.ts deletion is part of aggressive cleanup appropriate for a runtime container with no build-time compilation. Additionally, @sentry and @opentelemetry are not listed as project dependencies, confirming the deletions are safe. Line 173's axe-core/axe.js deletion remains in the code as intended and does not contradict prior reviews.

@ItzNotABug ItzNotABug mentioned this pull request Nov 17, 2025
@github-actions
Copy link

Docker Image Stats

Metric Value
Image Size 494MB
Memory Usage 157.3MiB
Cold Start Time 1.06s
Screenshot Time 3.74s

Screenshot benchmark: Average of 3 runs on https://appwrite.io

@eldadfux eldadfux merged commit ba6df8e into main Nov 20, 2025
3 of 4 checks passed
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
.github/workflows/test.yml (2)

34-44: Startup timing measurement is fine; consider avoiding bc dependency

The startup timing logic is correct, but it depends on bc being present on the runner to format seconds with decimals. If you want to reduce external dependencies, you could keep everything in milliseconds and avoid bc, e.g.:

-              STARTUP_TIME_MS=$((END_TIME - START_TIME))
-              STARTUP_TIME=$(echo "scale=2; $STARTUP_TIME_MS / 1000" | bc)
-              echo "startup_time=$STARTUP_TIME" >> $GITHUB_OUTPUT
-              echo "Container is ready in ${STARTUP_TIME}s!"
+              STARTUP_TIME_MS=$((END_TIME - START_TIME))
+              echo "startup_time_ms=$STARTUP_TIME_MS" >> $GITHUB_OUTPUT
+              echo "Container is ready in ${STARTUP_TIME_MS}ms!"

(and adjust the later message to use startup_time_ms accordingly).


54-85: Docker stats/benchmark step works; minor hardening possible

The stats and 3‑run screenshot benchmark are a nice addition and correctly isolated with continue-on-error: true. You could make the step slightly more robust by handling a missing container ID before calling docker stats:

-          CONTAINER_ID=$(docker compose ps -q appwrite-browser)
-          MEMORY_USAGE=$(docker stats $CONTAINER_ID --no-stream --format "{{.MemUsage}}" | cut -d'/' -f1 | xargs)
+          CONTAINER_ID=$(docker compose ps -q appwrite-browser)
+          if [ -z "$CONTAINER_ID" ]; then
+            echo "No appwrite-browser container found; skipping Docker stats." >&2
+            exit 0
+          fi
+          MEMORY_USAGE=$(docker stats "$CONTAINER_ID" --no-stream --format "{{.MemUsage}}" | cut -d'/' -f1 | xargs)

Optionally, you might also add -f to the benchmark curl to ensure only successful responses are timed, letting this step fail (but not break the job) when the endpoint is actually broken.

src/server.ts (1)

8-19: Router-based server wiring is clean and straightforward

Instantiating a single Router, registering the four routes, and delegating fetch to router.handle(request) is a nice simplification over manual routing and keeps the server entrypoint minimal. If /v1/test is intended only for diagnostics, consider whether it should remain exposed in production or be gated behind config, but the wiring itself looks solid.

src/utils/router.ts (1)

1-40: Router implementation is solid; regex use can be simplified

The Router is small and clear, and centralizing add/handle like this reads well. Given you only register literal paths like "/v1/screenshots", the new RegExp(\^${path}$`)` indirection isn’t buying much and is what triggered the static-analysis warning.

If you don’t need regex semantics for routes, you can simplify and avoid the warning by matching on strings directly:

-type Route = {
-	method: HTTPMethod;
-	pattern: RegExp;
-	handler: RouteHandler;
-};
+type Route = {
+	method: HTTPMethod;
+	path: string;
+	handler: RouteHandler;
+};

 export class Router {
 	private routes: Route[] = [];

 	add(method: HTTPMethod, path: string, handler: RouteHandler): void {
 		this.routes.push({
 			method,
-			pattern: new RegExp(`^${path}$`),
+			path,
 			handler,
 		});
 	}

 	async handle(req: Request): Promise<Response> {
 		const url = new URL(req.url);

 		for (const route of this.routes) {
-			if (route.method === req.method && route.pattern.test(url.pathname)) {
+			if (route.method === req.method && route.path === url.pathname) {
 				return await route.handler(req);
 			}
 		}

This keeps behavior the same for the current call sites and removes any future ambiguity about whether path should be a regex or a literal. As per static analysis hints.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77cd303 and b913791.

📒 Files selected for processing (5)
  • .github/workflows/test.yml (3 hunks)
  • src/schemas/screenshot.schema.ts (1 hunks)
  • src/server.ts (1 hunks)
  • src/utils/router.ts (1 hunks)
  • tests/unit/screenshot.schema.test.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/server.ts (5)
src/utils/router.ts (1)
  • Router (18-40)
src/routes/screenshots.ts (1)
  • handleScreenshotsRequest (9-98)
src/routes/reports.ts (1)
  • handleReportsRequest (6-92)
src/routes/health.ts (1)
  • handleHealthRequest (3-12)
src/routes/test.ts (1)
  • handleTestRequest (4-23)
🪛 ast-grep (0.40.0)
src/utils/router.ts

[warning] 23-23: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${path}$)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (4)
src/schemas/screenshot.schema.ts (1)

7-7: Default sleep of 0 is consistent with handler logic

Changing the default to 0 aligns with the if (body.sleep > 0) guard in the screenshots route and removes unnecessary delay by default while keeping the same bounds.

tests/unit/screenshot.schema.test.ts (1)

26-26: Test expectation now matches schema default

Updating the default sleep expectation to 0 keeps this test in sync with the schema and the screenshots handler behavior.

.github/workflows/test.yml (2)

12-14: Permissions are correctly scoped for PR commenting

contents: read plus pull-requests: write is the minimal set needed for checkout and sticky-pull-request-comment, and keeps the job permissions tight.


86-104: PR stats comment wiring looks correct

The sticky PR comment step is correctly gated on pull requests and steps.docker-stats.outcome == 'success', and it references the outputs (image_size, memory_usage, startup_time, screenshot_time) consistently with how they’re set earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants