codeflash-ai[bot]
@codeflash-ai codeflash-ai bot commented Oct 2, 2025

📄 103% (1.03x) speedup for _should_be_included in sentry_sdk/tracing_utils.py

⏱️ Runtime: 112 milliseconds → 55.0 milliseconds (best of 80 runs)
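
The headline percentage can be reproduced from the two runtimes. This is a minimal sketch that assumes the speedup is defined as original / optimized - 1; the helper name `speedup_percent` is hypothetical and not part of Codeflash or the SDK.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup in percent: (original / optimized - 1) * 100."""
    return (original / optimized - 1.0) * 100.0


# Runtimes reported above, in milliseconds (already rounded in the report).
print(f"{speedup_percent(112.0, 55.0):.1f}%")  # prints 103.6%
```

With the rounded runtimes shown, the result lands within a point of the reported 103%; the small gap is presumably due to rounding of the published figures.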

📝 Explanation and details

The optimized code achieves a 103% speedup through two key optimizations:

1. Precompiled Regex Pattern
The regex pattern r"[\\/](?:dist|site)-packages[\\/]" is compiled once at module load time as _DIST_SITE_PACKAGES_RE, so _is_external_source() no longer hands a pattern string to the re module on every call. (The re module caches compiled patterns, so the saving is the per-call cache lookup rather than a full recompile.) This provides consistent ~30-40% improvements for external source detection cases.
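
As a rough illustration of the precompiled-pattern approach, the sketch below uses the pattern and constant name quoted above. The function `is_external_source` is a hypothetical stand-in written for this example, not the SDK's `_is_external_source()` itself.

```python
import re
from typing import Optional

# Compiled once at import time; pattern and constant name are quoted from the description.
_DIST_SITE_PACKAGES_RE = re.compile(r"[\\/](?:dist|site)-packages[\\/]")


def is_external_source(abs_path: Optional[str]) -> bool:
    """Hypothetical stand-in: True if the path contains a site-packages or dist-packages directory."""
    if abs_path is None:
        return False
    # Reuse the module-level compiled pattern instead of passing a pattern string.
    return _DIST_SITE_PACKAGES_RE.search(abs_path) is not None
```

The character class `[\\/]` accepts both forward slashes and backslashes, so the same check covers POSIX and Windows-style paths.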

2. Optimized _module_in_list() Function

  • Set-based exact matching: Converts the list to a set for O(1) exact lookups instead of O(n) linear search
  • Tuple-based prefix matching: Creates a tuple of prefixes (e.g., "myapp.") and uses str.startswith(tuple), which is C-optimized
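
The two matching steps described above might look roughly like the following. This is a sketch under assumptions: the name `module_in_list` and the exact handling of `None` and empty inputs are illustrative and not taken from the PR diff.

```python
from typing import List, Optional


def module_in_list(name: Optional[str], items: Optional[List[str]]) -> bool:
    """Hypothetical sketch: exact match via a set, then prefix match via str.startswith(tuple)."""
    if name is None or not items:
        return False

    # Step 1: exact match. Membership in a set is O(1) on average,
    # although building the set from the list is itself O(n).
    if name in set(items):
        return True

    # Step 2: submodule match. "myapp" matches "myapp.sub" but not "myappx",
    # because each prefix ends with a dot.
    prefixes = tuple(item + "." for item in items)
    return name.startswith(prefixes)


print(module_in_list("myapp.module.sub", ["myapp.module"]))  # True
print(module_in_list("myapp.modulex", ["myapp.module"]))     # False
```

Because the set and the tuple are rebuilt on each call in this sketch, the benefit depends on how often the exact-match step succeeds before the prefix tuple has to be built.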

Performance Impact by Test Case Type:

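  • Large list scenarios: Large improvements (roughly 180-230% faster) when the namespace exactly matches an entry in a list of about 1000 entries, due to set lookup efficiency; when only a prefix matches or nothing matches, the same lists range from about 6% slower to 25% faster
  • Basic operations: Moderate improvements (10-45% faster) for typical use cases with missing, empty, or exactly matching small lists
  • Prefix matching: Slowdowns of roughly 5-26% in the tests below when a small non-empty list has no exact match (submodule and no-match cases), due to set and tuple creation overhead; this is offset overall by gains in other scenarios
  • External source detection: Consistent 30-40% improvements from precompiled regex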

The optimization trades a small upfront cost (set/tuple creation) for significant gains when dealing with larger lists or repeated calls, making it particularly effective for real-world tracing scenarios where these functions are called frequently.
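
To check this trade-off on your own inputs, a small `timeit` harness along these lines can help. It is only a sketch: `linear_match` and `set_tuple_match` are hypothetical re-implementations of the two strategies described above, not the SDK's functions, and the timings will vary by machine and Python version.

```python
import timeit
from typing import List, Optional, Tuple


def linear_match(name: Optional[str], items: Optional[List[str]]) -> bool:
    """Baseline strategy: one pass, testing exact and prefix match per item."""
    if name is None or not items:
        return False
    for item in items:
        if item == name or name.startswith(item + "."):
            return True
    return False


def set_tuple_match(name: Optional[str], items: Optional[List[str]]) -> bool:
    """Set lookup for exact matches, then str.startswith with a tuple of prefixes."""
    if name is None or not items:
        return False
    if name in set(items):
        return True
    return name.startswith(tuple(item + "." for item in items))


def compare(name: Optional[str], items: Optional[List[str]], number: int = 200) -> Tuple[float, float]:
    """Return (linear_seconds, set_tuple_seconds) for `number` calls of each strategy."""
    if linear_match(name, items) != set_tuple_match(name, items):
        raise AssertionError("the two strategies disagree")
    linear = timeit.timeit(lambda: linear_match(name, items), number=number)
    optimized = timeit.timeit(lambda: set_tuple_match(name, items), number=number)
    return linear, optimized


large = [f"pkg{i}" for i in range(999)] + ["target.module"]
cases = [
    ("small list, exact match", "myapp.module", ["myapp.module"]),
    ("small list, prefix match", "myapp.module.sub", ["myapp.module"]),
    ("large list, exact match", "target.module", large),
    ("large list, no match", "not_in_list", large),
]
for label, name, items in cases:
    linear, optimized = compare(name, items)
    print(f"{label}: linear {linear:.6f}s, set/tuple {optimized:.6f}s")
```

The harness raises if the two strategies ever return different results, so it doubles as a quick equivalence check before any timing is reported.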

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 8 Passed |
| 🌀 Generated Regression Tests | 2086 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Existing Unit Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| test_tracing_utils.py::test_should_be_included | 28.6μs | 27.4μs | 4.41%✅ |
🌀 Generated Regression Tests and Runtime
import re

# imports
import pytest  # used for our unit tests
from sentry_sdk.tracing_utils import _should_be_included

# unit tests

# --- BASIC TEST CASES ---

def test_basic_in_app_include_exact_match():
    # namespace matches in_app_include exactly
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=["myapp.module"],
        in_app_exclude=None,
        abs_path="/project/myapp/module.py",
        project_root="/project",
    ) # 4.66μs -> 4.18μs (11.3% faster)

def test_basic_in_app_include_prefix_match():
    # namespace is a submodule of something in in_app_include
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module.sub",
        in_app_include=["myapp.module"],
        in_app_exclude=None,
        abs_path="/project/myapp/module/sub.py",
        project_root="/project",
    ) # 5.14μs -> 5.69μs (9.64% slower)

def test_basic_in_app_exclude_exact_match():
    # namespace matches in_app_exclude exactly
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=["myapp.module"],
        abs_path="/project/myapp/module.py",
        project_root="/project",
    ) # 5.05μs -> 4.21μs (19.8% faster)

def test_basic_in_app_exclude_prefix_match():
    # namespace is a submodule of something in in_app_exclude
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module.sub",
        in_app_include=None,
        in_app_exclude=["myapp.module"],
        abs_path="/project/myapp/module/sub.py",
        project_root="/project",
    ) # 5.54μs -> 6.04μs (8.27% slower)

def test_basic_project_root_inclusion():
    # abs_path is inside project_root, not excluded, not external
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/project/myapp/module.py",
        project_root="/project",
    ) # 4.61μs -> 3.42μs (35.0% faster)

def test_basic_external_source_exclusion():
    # abs_path is in site-packages, should be excluded
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="external.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/usr/lib/site-packages/external/module.py",
        project_root="/project",
    ) # 4.68μs -> 3.29μs (42.1% faster)

def test_basic_sentry_sdk_frame_exclusion():
    # is_sentry_sdk_frame disables inclusion
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=True,
        namespace="myapp.module",
        in_app_include=["myapp.module"],
        in_app_exclude=None,
        abs_path="/project/myapp/module.py",
        project_root="/project",
    ) # 4.34μs -> 3.69μs (17.5% faster)

# --- EDGE TEST CASES ---

def test_edge_none_namespace():
    # None namespace, should not match include/exclude, but path is in project root
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace=None,
        in_app_include=["myapp.module"],
        in_app_exclude=["myapp.module"],
        abs_path="/project/myapp/module.py",
        project_root="/project",
    ) # 4.51μs -> 3.23μs (39.4% faster)

def test_edge_none_in_app_lists():
    # None for both in_app_include and in_app_exclude
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/project/myapp/module.py",
        project_root="/project",
    ) # 4.59μs -> 3.16μs (45.3% faster)

def test_edge_empty_in_app_lists():
    # Empty lists for both in_app_include and in_app_exclude
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=[],
        in_app_exclude=[],
        abs_path="/project/myapp/module.py",
        project_root="/project",
    ) # 4.54μs -> 3.25μs (39.8% faster)

def test_edge_none_abs_path():
    # None abs_path, should not match external or project root
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path=None,
        project_root="/project",
    ) # 1.64μs -> 1.68μs (2.38% slower)

def test_edge_none_project_root():
    # None project_root, should not match project root
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/project/myapp/module.py",
        project_root=None,
    ) # 4.27μs -> 3.17μs (34.8% faster)

def test_edge_external_source_with_in_app_include():
    # External source but explicitly included by in_app_include
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="external.module",
        in_app_include=["external.module"],
        in_app_exclude=None,
        abs_path="/usr/lib/site-packages/external/module.py",
        project_root="/project",
    ) # 4.26μs -> 3.87μs (10.1% faster)

def test_edge_external_source_with_in_app_exclude():
    # External source and excluded by in_app_exclude
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="external.module",
        in_app_include=None,
        in_app_exclude=["external.module"],
        abs_path="/usr/lib/site-packages/external/module.py",
        project_root="/project",
    ) # 4.57μs -> 3.14μs (45.3% faster)

def test_edge_conflicting_include_exclude():
    # Namespace in both include and exclude, include wins
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=["myapp.module"],
        in_app_exclude=["myapp.module"],
        abs_path="/project/myapp/module.py",
        project_root="/project",
    ) # 4.32μs -> 3.82μs (12.9% faster)

def test_edge_abs_path_not_in_project_root():
    # abs_path not in project root, not included
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/otherdir/myapp/module.py",
        project_root="/project",
    ) # 4.57μs -> 3.26μs (40.2% faster)

def test_edge_namespace_submodule_of_exclude():
    # namespace is a submodule of one in exclude list
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module.sub",
        in_app_include=None,
        in_app_exclude=["myapp.module"],
        abs_path="/project/myapp/module/sub.py",
        project_root="/project",
    ) # 5.25μs -> 6.24μs (15.8% slower)

def test_edge_namespace_submodule_of_include():
    # namespace is a submodule of one in include list
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module.sub",
        in_app_include=["myapp.module"],
        in_app_exclude=None,
        abs_path="/project/myapp/module/sub.py",
        project_root="/project",
    ) # 4.76μs -> 5.39μs (11.7% slower)

def test_edge_namespace_not_in_any_list():
    # namespace not in any list, abs_path in project_root
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="other.module",
        in_app_include=["myapp.module"],
        in_app_exclude=["external.module"],
        abs_path="/project/other/module.py",
        project_root="/project",
    ) # 5.50μs -> 6.57μs (16.3% slower)

def test_edge_namespace_not_in_any_list_abs_path_external():
    # namespace not in any list, abs_path is external
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="other.module",
        in_app_include=["myapp.module"],
        in_app_exclude=["external.module"],
        abs_path="/usr/lib/site-packages/other/module.py",
        project_root="/project",
    ) # 5.14μs -> 5.48μs (6.13% slower)

def test_edge_namespace_none_and_abs_path_none():
    # Both namespace and abs_path are None
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace=None,
        in_app_include=None,
        in_app_exclude=None,
        abs_path=None,
        project_root="/project",
    ) # 1.64μs -> 1.64μs (0.305% slower)

def test_edge_namespace_none_and_abs_path_in_project_root():
    # namespace None, abs_path in project root
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace=None,
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/project/other/module.py",
        project_root="/project",
    ) # 4.67μs -> 3.46μs (34.9% faster)

def test_edge_namespace_none_and_abs_path_external():
    # namespace None, abs_path external
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace=None,
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/usr/lib/site-packages/other/module.py",
        project_root="/project",
    ) # 4.62μs -> 3.44μs (34.6% faster)

# --- LARGE SCALE TEST CASES ---

def test_large_in_app_include_list():
    # Large in_app_include list, one match
    include_list = [f"pkg{i}" for i in range(999)] + ["target.module"]
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="target.module",
        in_app_include=include_list,
        in_app_exclude=None,
        abs_path="/project/target/module.py",
        project_root="/project",
    ) # 134μs -> 43.1μs (212% faster)

def test_large_in_app_exclude_list():
    # Large in_app_exclude list, one match
    exclude_list = [f"pkg{i}" for i in range(999)] + ["target.module"]
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="target.module",
        in_app_include=None,
        in_app_exclude=exclude_list,
        abs_path="/project/target/module.py",
        project_root="/project",
    ) # 134μs -> 42.6μs (216% faster)

def test_large_no_match_in_app_include_list():
    # Large in_app_include list, no match, but abs_path in project_root
    include_list = [f"pkg{i}" for i in range(1000)]
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="not_in_list",
        in_app_include=include_list,
        in_app_exclude=None,
        abs_path="/project/not_in_list.py",
        project_root="/project",
    ) # 136μs -> 145μs (6.05% slower)

def test_large_no_match_in_app_exclude_list():
    # Large in_app_exclude list, no match, abs_path in project_root
    exclude_list = [f"pkg{i}" for i in range(1000)]
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="not_in_list",
        in_app_include=None,
        in_app_exclude=exclude_list,
        abs_path="/project/not_in_list.py",
        project_root="/project",
    ) # 133μs -> 106μs (25.3% faster)

def test_large_external_source_with_large_lists():
    # Large lists, abs_path is external, should be excluded
    include_list = [f"pkg{i}" for i in range(1000)]
    exclude_list = [f"pkg{i}" for i in range(1000)]
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="not_in_list",
        in_app_include=include_list,
        in_app_exclude=exclude_list,
        abs_path="/usr/lib/site-packages/not_in_list.py",
        project_root="/project",
    ) # 143μs -> 151μs (5.57% slower)

def test_large_namespace_submodule_in_large_include():
    # namespace is a submodule of one in large include list
    include_list = [f"pkg{i}" for i in range(999)] + ["myapp"]
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.submodule",
        in_app_include=include_list,
        in_app_exclude=None,
        abs_path="/project/myapp/submodule.py",
        project_root="/project",
    ) # 135μs -> 124μs (9.10% faster)

def test_large_namespace_submodule_in_large_exclude():
    # namespace is a submodule of one in large exclude list
    exclude_list = [f"pkg{i}" for i in range(999)] + ["myapp"]
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.submodule",
        in_app_include=None,
        in_app_exclude=exclude_list,
        abs_path="/project/myapp/submodule.py",
        project_root="/project",
    ) # 143μs -> 121μs (18.5% faster)

def test_large_all_none():
    # Large scale: all None/empty, should not crash, should return False
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace=None,
        in_app_include=None,
        in_app_exclude=None,
        abs_path=None,
        project_root=None,
    ) # 1.70μs -> 1.82μs (6.66% slower)

def test_large_project_root_long_path():
    # Large project root and abs_path strings
    root = "/project/" + "a" * 900
    abs_path = root + "/module.py"
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path=abs_path,
        project_root=root,
    ) # 8.84μs -> 7.31μs (20.9% faster)

def test_large_external_source_long_path():
    # Large abs_path string, external source detection
    abs_path = "/usr/lib/site-packages/" + "a" * 900 + "/module.py"
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="external.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path=abs_path,
        project_root="/project",
    ) # 4.73μs -> 3.54μs (33.4% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import re
from typing import List, Optional

# imports
import pytest  # used for our unit tests
from sentry_sdk.tracing_utils import _should_be_included

# unit tests

# --- Basic Test Cases ---

def test_basic_include_by_namespace():
    # Should be included because namespace is in in_app_include, sentry frame is False
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=["myapp"],
        in_app_exclude=["other"],
        abs_path="/project/myapp/module.py",
        project_root="/project"
    ) # 5.32μs -> 6.87μs (22.6% slower)

def test_basic_exclude_by_namespace():
    # Should NOT be included because namespace is in in_app_exclude
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="other.module",
        in_app_include=["myapp"],
        in_app_exclude=["other"],
        abs_path="/project/other/module.py",
        project_root="/project"
    ) # 5.66μs -> 6.81μs (17.0% slower)

def test_basic_exclude_by_external_source():
    # Should NOT be included because abs_path is in site-packages
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="random.module",
        in_app_include=["myapp"],
        in_app_exclude=["other"],
        abs_path="/usr/lib/site-packages/random/module.py",
        project_root="/project"
    ) # 5.15μs -> 5.43μs (5.12% slower)

def test_basic_include_by_project_root():
    # Should be included because abs_path is in project_root and not excluded
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/project/myapp/module.py",
        project_root="/project"
    ) # 4.46μs -> 3.32μs (34.1% faster)

def test_basic_exclude_sentry_sdk_frame():
    # Should NOT be included because is_sentry_sdk_frame is True
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=True,
        namespace="myapp.module",
        in_app_include=["myapp"],
        in_app_exclude=["other"],
        abs_path="/project/myapp/module.py",
        project_root="/project"
    ) # 5.13μs -> 5.94μs (13.7% slower)

# --- Edge Test Cases ---

def test_edge_namespace_none():
    # namespace is None, so neither list can match; abs_path is under project_root
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace=None,
        in_app_include=["myapp"],
        in_app_exclude=["other"],
        abs_path="/project/other/module.py",
        project_root="/project"
    ) # 4.28μs -> 3.20μs (33.9% faster)

def test_edge_in_app_include_none():
    # Should be included because abs_path is in project root, in_app_include is None
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/project/myapp/module.py",
        project_root="/project"
    ) # 4.45μs -> 3.27μs (35.9% faster)

def test_edge_in_app_exclude_none():
    # Should be included because in_app_exclude is None, so not excluded
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/project/myapp/module.py",
        project_root="/project"
    ) # 4.43μs -> 3.12μs (42.0% faster)

def test_edge_abs_path_none():
    # Should NOT be included because abs_path is None and not covered by in_app_include
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path=None,
        project_root="/project"
    ) # 1.55μs -> 1.67μs (7.11% slower)

def test_edge_project_root_none():
    # Should NOT be included because project_root is None
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/project/myapp/module.py",
        project_root=None
    ) # 4.20μs -> 3.15μs (33.0% faster)

def test_edge_namespace_in_app_include_and_exclude():
    # Should be included because in_app_include takes precedence over in_app_exclude
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=["myapp"],
        in_app_exclude=["myapp"],
        abs_path="/project/myapp/module.py",
        project_root="/project"
    ) # 5.00μs -> 6.61μs (24.3% slower)

def test_edge_namespace_submodule_in_app_include():
    # Should be included because namespace is a submodule of in_app_include
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.sub.module",
        in_app_include=["myapp.sub"],
        in_app_exclude=["other"],
        abs_path="/project/myapp/sub/module.py",
        project_root="/project"
    ) # 5.08μs -> 6.35μs (20.1% slower)

def test_edge_namespace_submodule_in_app_exclude():
    # Should NOT be included because namespace is a submodule of in_app_exclude
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="other.sub.module",
        in_app_include=["myapp"],
        in_app_exclude=["other.sub"],
        abs_path="/project/other/sub/module.py",
        project_root="/project"
    ) # 5.69μs -> 6.59μs (13.6% slower)

def test_edge_abs_path_external_and_in_project_root():
    # Should NOT be included because abs_path is both in project_root and external source
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/project/site-packages/myapp/module.py",
        project_root="/project"
    ) # 4.61μs -> 3.40μs (35.6% faster)

def test_edge_abs_path_dist_packages():
    # Should NOT be included because abs_path is in dist-packages
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/usr/lib/dist-packages/myapp/module.py",
        project_root="/project"
    ) # 4.36μs -> 3.27μs (33.4% faster)

def test_edge_empty_in_app_include_and_exclude():
    # Should be included because abs_path is in project_root, include/exclude are empty lists
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="myapp.module",
        in_app_include=[],
        in_app_exclude=[],
        abs_path="/project/myapp/module.py",
        project_root="/project"
    ) # 4.38μs -> 3.26μs (34.5% faster)

def test_edge_include_and_exclude_none_and_namespace_none():
    # Should NOT be included because namespace is None, abs_path is not in project_root
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace=None,
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/elsewhere/module.py",
        project_root="/project"
    ) # 4.29μs -> 3.06μs (40.2% faster)

def test_edge_include_and_exclude_none_and_namespace_none_but_in_project_root():
    # Should be included because abs_path is in project_root
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace=None,
        in_app_include=None,
        in_app_exclude=None,
        abs_path="/project/module.py",
        project_root="/project"
    ) # 4.41μs -> 3.14μs (40.3% faster)

def test_edge_namespace_empty_string():
    # Should NOT be included because namespace is empty string and not in project root
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="",
        in_app_include=["myapp"],
        in_app_exclude=["other"],
        abs_path="/elsewhere/module.py",
        project_root="/project"
    ) # 5.20μs -> 7.02μs (26.0% slower)

def test_edge_namespace_empty_string_but_in_project_root():
    # Should be included because abs_path is in project_root
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="",
        in_app_include=["myapp"],
        in_app_exclude=["other"],
        abs_path="/project/module.py",
        project_root="/project"
    ) # 5.04μs -> 6.30μs (20.1% slower)

# --- Large Scale Test Cases ---

def test_large_many_in_app_include():
    # Should be included because namespace is in a large in_app_include list
    in_app_include = [f"mod{i}" for i in range(1000)]
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="mod999",
        in_app_include=in_app_include,
        in_app_exclude=["other"],
        abs_path="/project/mod999/module.py",
        project_root="/project"
    ) # 141μs -> 45.4μs (212% faster)

def test_large_many_in_app_exclude():
    # Should NOT be included because namespace is in a large in_app_exclude list
    in_app_exclude = [f"mod{i}" for i in range(1000)]
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="mod888",
        in_app_include=["myapp"],
        in_app_exclude=in_app_exclude,
        abs_path="/project/mod888/module.py",
        project_root="/project"
    ) # 129μs -> 45.7μs (184% faster)

def test_large_many_in_app_include_and_exclude():
    # Should be included because in_app_include takes precedence over in_app_exclude
    in_app_include = [f"mod{i}" for i in range(500, 1000)]
    in_app_exclude = [f"mod{i}" for i in range(1000)]
    codeflash_output = _should_be_included(
        is_sentry_sdk_frame=False,
        namespace="mod999",
        in_app_include=in_app_include,
        in_app_exclude=in_app_exclude,
        abs_path="/project/mod999/module.py",
        project_root="/project"
    ) # 217μs -> 66.7μs (226% faster)

def test_large_all_external_sources():
    # Every abs_path is in site-packages, but each namespace is also listed in in_app_include
    for i in range(10):
        codeflash_output = _should_be_included(
            is_sentry_sdk_frame=False,
            namespace=f"mod{i}",
            in_app_include=[f"mod{i}" for i in range(10)],
            in_app_exclude=None,
            abs_path=f"/usr/lib/site-packages/mod{i}/module.py",
            project_root="/project"
        ) # 27.9μs -> 17.7μs (57.3% faster)

def test_large_all_in_project_root():
    # Should be included for all modules in project_root
    for i in range(10):
        codeflash_output = _should_be_included(
            is_sentry_sdk_frame=False,
            namespace=f"mod{i}",
            in_app_include=None,
            in_app_exclude=None,
            abs_path=f"/project/mod{i}/module.py",
            project_root="/project"
        ) # 19.8μs -> 15.2μs (29.9% faster)

def test_large_none_namespace_and_project_root():
    # Should NOT be included for large number of cases with None namespace and project_root
    for i in range(10):
        codeflash_output = _should_be_included(
            is_sentry_sdk_frame=False,
            namespace=None,
            in_app_include=None,
            in_app_exclude=None,
            abs_path=f"/elsewhere/mod{i}/module.py",
            project_root=None
        ) # 19.8μs -> 13.5μs (46.4% faster)

def test_large_mixed_inclusion():
    # Mix of included and excluded modules
    in_app_include = [f"mod{i}" for i in range(500)]
    in_app_exclude = [f"mod{i}" for i in range(500, 1000)]
    for i in range(1000):
        expected = i < 500
        codeflash_output = _should_be_included(
            is_sentry_sdk_frame=False,
            namespace=f"mod{i}",
            in_app_include=in_app_include,
            in_app_exclude=in_app_exclude,
            abs_path=f"/project/mod{i}/module.py",
            project_root="/project"
        ) # 108ms -> 52.7ms (106% faster)
        assert codeflash_output == expected

def test_large_mixed_external_and_project_root():
    # Half modules are in site-packages, half in project_root
    for i in range(500):
        codeflash_output = _should_be_included(
            is_sentry_sdk_frame=False,
            namespace=f"mod{i}",
            in_app_include=None,
            in_app_exclude=None,
            abs_path=f"/usr/lib/site-packages/mod{i}/module.py",
            project_root="/project"
        ) # 788μs -> 565μs (39.3% faster)
    for i in range(500, 1000):
        codeflash_output = _should_be_included(
            is_sentry_sdk_frame=False,
            namespace=f"mod{i}",
            in_app_include=None,
            in_app_exclude=None,
            abs_path=f"/project/mod{i}/module.py",
            project_root="/project"
        ) # 845μs -> 631μs (33.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, git checkout codeflash/optimize-_should_be_included-mg9lx66s and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 2, 2025 16:05
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 2, 2025