@codeflash-ai codeflash-ai bot commented Oct 2, 2025

📄 245% (2.45x) speedup for should_propagate_trace in sentry_sdk/tracing_utils.py

⏱️ Runtime : 202 milliseconds → 58.5 milliseconds (best of 28 runs)
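
The headline figure and the per-test percentages in this report appear consistent with `(original / optimized - 1) * 100`. That formula is inferred from the numbers shown here, not taken from Codeflash documentation, and `speedup_percent` is a hypothetical helper name used only for illustration.

```python
def speedup_percent(original, optimized):
    """Relative speedup in percent; both runtimes must use the same unit."""
    return (original / optimized - 1) * 100


# Headline runtime: 202 ms before, 58.5 ms after.
print(round(speedup_percent(202, 58.5)))  # 245

# tracing/test_misc.py::test_should_propagate_trace: 409 μs before, 68.6 μs after.
print(round(speedup_percent(409, 68.6)))  # 496
```

Under the same assumption, a negative result corresponds to the entries marked "slower" in the generated tests.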

📝 Explanation and details

The optimization introduces regex compilation caching to match_regex_list, which provides dramatic performance improvements when the same regex patterns are matched repeatedly.

Key Changes:

  • Regex Compilation Caching: Instead of calling re.search() with raw strings (which internally compiles patterns each time), the code now pre-compiles all patterns using re.compile() and caches them based on the regex_list identity and substring_matching flag.
  • Pattern Preparation: The logic for appending `$` anchors is moved to the caching phase, avoiding repeated string operations during matching.

Why This Speeds Up:

  • Eliminates Redundant Compilation: The original code recompiled the same regex patterns on every call. With caching, patterns are compiled once and reused across multiple invocations.
  • Reduces String Operations: Pattern modification (adding `$`) happens only during cache creation, not on every match attempt.
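
As a rough way to observe this effect locally, the sketch below times module-level `re.search()` on pattern strings against pre-compiled pattern objects. The pattern list, URL, and function names are illustrative assumptions, not code from the PR. CPython's `re` module keeps its own bounded cache of recently compiled patterns, so the gap is expected to be largest when the list holds more patterns than that cache retains; absolute timings vary by machine and Python version.

```python
import re
import timeit

# Illustrative inputs: 1000 patterns, none of which match the URL.
patterns = [rf"site{i}\.com" for i in range(1000)]
url = "https://example.com/api"

# Compile once up front, as a cache would after its first fill.
compiled_patterns = [re.compile(p) for p in patterns]


def search_with_strings():
    """Pass raw pattern strings to re.search on every call."""
    return any(re.search(p, url) for p in patterns)


def search_with_compiled():
    """Reuse the pre-compiled pattern objects."""
    return any(c.search(url) for c in compiled_patterns)


def time_both(repeats=3):
    """Return (seconds_with_strings, seconds_with_compiled) for `repeats` full scans."""
    with_strings = timeit.timeit(search_with_strings, number=repeats)
    with_compiled = timeit.timeit(search_with_compiled, number=repeats)
    return with_strings, with_compiled


string_time, compiled_time = time_both()
print(f"raw strings: {string_time:.4f}s, pre-compiled: {compiled_time:.4f}s")
```

Both functions return the same boolean; only the cost of obtaining the compiled pattern differs.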

Performance Benefits by Test Case:

  • Massive gains for repeated pattern usage: Tests with large regex lists show speedups of roughly 380,000-770,000% (e.g., test_large_many_trace_targets_one_match: 27.3ms → 3.77μs)
  • Excellent for complex patterns: Regex-heavy tests like test_edge_targets_with_special_regex show 2,295% speedup
  • Minimal overhead for simple cases: Basic single-regex tests show a slight slowdown (7-8%) due to caching overhead, and a few microsecond-scale cases such as empty target lists slow down by up to about 40%; this is negligible compared to the gains in realistic usage scenarios

The optimization is particularly effective for Sentry's trace propagation use case, where the same trace_propagation_targets list is likely checked against many different URLs throughout an application's lifetime.

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 38 Passed |
| 🌀 Generated Regression Tests | 56 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|--------------------------|-------------|--------------|---------|
| `tracing/test_misc.py::test_should_propagate_trace` | 409μs | 68.6μs | 496%✅ |
| `tracing/test_misc.py::test_should_propagate_trace_to_sentry` | 24.4μs | 21.9μs | 11.7%✅ |

🌀 Generated Regression Tests and Runtime
import re

# imports
import pytest  # used for our unit tests
from sentry_sdk.tracing_utils import should_propagate_trace

# --- Minimal stand-in implementations for testing ---
# (lightweight stand-ins for the Sentry client, transport, and parsed DSN)

class DummyParsedDSN:
    def __init__(self, netloc):
        self.netloc = netloc

class DummyTransport:
    def __init__(self, parsed_dsn=None):
        self.parsed_dsn = parsed_dsn

class DummyClient:
    def __init__(self, options=None, transport=None):
        self.options = options or {}
        self.transport = transport

# --- Unit Tests ---

# ========== BASIC TEST CASES ==========

def test_basic_match_single_regex():
    """Test with a single regex that matches the URL."""
    client = DummyClient(options={"trace_propagation_targets": [r"example\.com"]})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 60.0μs -> 65.1μs (7.87% slower)

def test_basic_no_match_single_regex():
    """Test with a single regex that does not match the URL."""
    client = DummyClient(options={"trace_propagation_targets": [r"example\.org"]})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 52.1μs -> 56.6μs (7.86% slower)

def test_basic_multiple_regexes_one_matches():
    """Test with multiple regexes where one matches the URL."""
    client = DummyClient(options={"trace_propagation_targets": [r"foo\.com", r"example\.com"]})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 47.8μs -> 49.7μs (3.99% slower)

def test_basic_multiple_regexes_none_match():
    """Test with multiple regexes where none match the URL."""
    client = DummyClient(options={"trace_propagation_targets": [r"foo\.com", r"bar\.org"]})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 47.6μs -> 46.5μs (2.45% faster)

def test_basic_empty_trace_targets():
    """Test with empty trace_propagation_targets."""
    client = DummyClient(options={"trace_propagation_targets": []})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 1.53μs -> 2.56μs (40.3% slower)

def test_basic_none_trace_targets():
    """Test with None as trace_propagation_targets."""
    client = DummyClient(options={"trace_propagation_targets": None})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 1.34μs -> 1.37μs (2.19% slower)

# ========== EDGE TEST CASES ==========

def test_edge_url_is_sentry_url():
    """Test where the URL matches the Sentry DSN netloc, should always return False."""
    dsn_netloc = "sentry.io"
    client = DummyClient(
        options={"trace_propagation_targets": [r"sentry\.io", r"example\.com"]},
        transport=DummyTransport(parsed_dsn=DummyParsedDSN(dsn_netloc))
    )
    url = "https://sentry.io/api"
    codeflash_output = should_propagate_trace(client, url) # 1.35μs -> 1.45μs (7.24% slower)

def test_edge_url_contains_sentry_netloc_but_not_exact():
    """Test where the URL contains the Sentry netloc as a substring but not as a host."""
    dsn_netloc = "sentry.io"
    client = DummyClient(
        options={"trace_propagation_targets": [r"sentry\.io", r"example\.com"]},
        transport=DummyTransport(parsed_dsn=DummyParsedDSN(dsn_netloc))
    )
    url = "https://api.sentry.io.com/api"
    # Should not be considered a sentry url, so matching proceeds
    codeflash_output = should_propagate_trace(client, url) # 1.40μs -> 1.31μs (6.25% faster)

def test_edge_url_is_sentry_url_with_port():
    """Test where the Sentry netloc includes a port."""
    dsn_netloc = "sentry.io:443"
    client = DummyClient(
        options={"trace_propagation_targets": [r"sentry\.io:443", r"example\.com"]},
        transport=DummyTransport(parsed_dsn=DummyParsedDSN(dsn_netloc))
    )
    url = "https://sentry.io:443/api"
    codeflash_output = should_propagate_trace(client, url) # 1.30μs -> 1.35μs (3.86% slower)

def test_edge_client_none_transport():
    """Test with client.transport as None."""
    client = DummyClient(
        options={"trace_propagation_targets": [r"example\.com"]},
        transport=None
    )
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 3.73μs -> 5.15μs (27.5% slower)

def test_edge_client_none_parsed_dsn():
    """Test with client.transport.parsed_dsn as None."""
    client = DummyClient(
        options={"trace_propagation_targets": [r"example\.com"]},
        transport=DummyTransport(parsed_dsn=None)
    )
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 3.71μs -> 5.07μs (26.8% slower)

def test_edge_client_none():
    """Test with client as None should raise AttributeError."""
    url = "https://example.com/api"
    with pytest.raises(AttributeError):
        should_propagate_trace(None, url) # 1.70μs -> 1.70μs (0.118% slower)

def test_edge_url_empty_string():
    """Test with empty url string."""
    client = DummyClient(options={"trace_propagation_targets": [r"example\.com"]})
    url = ""
    codeflash_output = should_propagate_trace(client, url) # 3.58μs -> 3.83μs (6.76% slower)

def test_edge_regex_empty_string():
    """Test with an empty regex string in trace_propagation_targets."""
    client = DummyClient(options={"trace_propagation_targets": [""]})
    url = "https://example.com/api"
    # Empty regex matches every string, so should return True
    codeflash_output = should_propagate_trace(client, url) # 4.31μs -> 5.76μs (25.1% slower)

def test_edge_regex_special_characters():
    """Test with regexes containing special regex characters."""
    client = DummyClient(options={"trace_propagation_targets": [r"example\.com/\w+"]})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 76.2μs -> 79.5μs (4.13% slower)

def test_edge_regex_partial_match():
    """Test where regex would match only a substring of the URL."""
    client = DummyClient(options={"trace_propagation_targets": [r"api"]})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 38.2μs -> 39.8μs (4.06% slower)

def test_edge_regex_no_match_due_to_case():
    """Test case sensitivity of regex matching."""
    client = DummyClient(options={"trace_propagation_targets": [r"EXAMPLE\.COM"]})
    url = "https://example.com/api"
    # Regex is case sensitive, so should not match
    codeflash_output = should_propagate_trace(client, url) # 51.2μs -> 3.53μs (1352% faster)

def test_edge_regex_with_end_anchor():
    """Test regex with $ anchor, should still match due to substring matching."""
    client = DummyClient(options={"trace_propagation_targets": [r"api$"]})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 41.2μs -> 42.7μs (3.49% slower)

# ========== LARGE SCALE TEST CASES ==========

def test_large_many_trace_targets_one_match():
    """Test with a large number of trace_propagation_targets, only one matches."""
    targets = [f"site{i}.com" for i in range(999)] + [r"example\.com"]
    client = DummyClient(options={"trace_propagation_targets": targets})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 27.3ms -> 3.77μs (722423% faster)

def test_large_many_trace_targets_none_match():
    """Test with a large number of trace_propagation_targets, none match."""
    targets = [f"site{i}.com" for i in range(1000)]
    client = DummyClient(options={"trace_propagation_targets": targets})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 27.2ms -> 27.6ms (1.14% slower)

def test_large_url_length():
    """Test with a very long URL string."""
    long_url = "https://example.com/" + "a" * 950
    client = DummyClient(options={"trace_propagation_targets": [r"example\.com"]})
    codeflash_output = should_propagate_trace(client, long_url) # 45.1μs -> 4.65μs (870% faster)

def test_large_all_targets_are_empty_strings():
    """Test with a large number of empty string regexes (should always match)."""
    targets = [""] * 1000
    client = DummyClient(options={"trace_propagation_targets": targets})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 25.3μs -> 3.48μs (625% faster)

def test_large_all_targets_no_match():
    """Test with a large number of regexes that cannot possibly match."""
    targets = [r"foo" + str(i) for i in range(1000)]
    client = DummyClient(options={"trace_propagation_targets": targets})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 20.6ms -> 3.96μs (520460% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import re

# imports
import pytest  # used for our unit tests
from sentry_sdk.tracing_utils import should_propagate_trace

# --- Test helpers/mocks ---

class DummyParsedDSN:
    def __init__(self, netloc):
        self.netloc = netloc

class DummyTransport:
    def __init__(self, parsed_dsn=None):
        self.parsed_dsn = parsed_dsn

class DummyClient:
    def __init__(self, options=None, transport=None):
        self.options = options or {}
        self.transport = transport

# --- Unit tests ---

# 1. Basic Test Cases

def test_basic_match_exact_substring():
    # Should propagate if url contains a substring from the regex list
    client = DummyClient(options={"trace_propagation_targets": ["example.com"]})
    url = "https://example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 44.8μs -> 5.31μs (744% faster)

def test_basic_no_match():
    # Should not propagate if url does not match any regex
    client = DummyClient(options={"trace_propagation_targets": ["foo.com"]})
    url = "https://bar.com/api"
    codeflash_output = should_propagate_trace(client, url) # 40.0μs -> 43.3μs (7.78% slower)

def test_basic_multiple_targets_one_matches():
    # Should propagate if any of the regexes matches
    client = DummyClient(options={"trace_propagation_targets": ["foo.com", "bar.com"]})
    url = "https://bar.com/api"
    codeflash_output = should_propagate_trace(client, url) # 43.2μs -> 46.1μs (6.31% slower)

def test_basic_multiple_targets_none_match():
    # Should not propagate if none of the regexes match
    client = DummyClient(options={"trace_propagation_targets": ["foo.com", "baz.com"]})
    url = "https://bar.com/api"
    codeflash_output = should_propagate_trace(client, url) # 43.2μs -> 43.8μs (1.46% slower)

def test_basic_regex_match():
    # Should propagate if regex matches (e.g. wildcard)
    client = DummyClient(options={"trace_propagation_targets": [r"foo\..*"]})
    url = "https://foo.bar.com/api"
    codeflash_output = should_propagate_trace(client, url) # 55.2μs -> 56.0μs (1.40% slower)

# 2. Edge Test Cases

def test_edge_empty_targets_list():
    # Should not propagate if the target list is empty
    client = DummyClient(options={"trace_propagation_targets": []})
    url = "https://example.com"
    codeflash_output = should_propagate_trace(client, url) # 1.51μs -> 2.52μs (40.3% slower)

def test_edge_targets_list_is_none():
    # Should not propagate if the target list is None
    client = DummyClient(options={"trace_propagation_targets": None})
    url = "https://example.com"
    codeflash_output = should_propagate_trace(client, url) # 1.36μs -> 1.32μs (3.26% faster)

def test_edge_url_is_empty_string():
    # Should not propagate if url is empty string
    client = DummyClient(options={"trace_propagation_targets": ["foo.com"]})
    url = ""
    codeflash_output = should_propagate_trace(client, url) # 3.57μs -> 4.56μs (21.8% slower)

def test_edge_url_is_none():
    # url is None: expect an exception rather than a boolean result
    client = DummyClient(options={"trace_propagation_targets": ["foo.com"]})
    url = None
    # match_regex_list expects a string, so should raise TypeError
    with pytest.raises(TypeError):
        should_propagate_trace(client, url) # 4.83μs -> 4.31μs (12.0% faster)

def test_edge_client_is_none():
    # Should raise AttributeError if client is None
    url = "https://foo.com"
    with pytest.raises(AttributeError):
        should_propagate_trace(None, url) # 1.60μs -> 1.57μs (2.17% faster)

def test_edge_client_options_missing():
    # Should raise AttributeError if client.options is missing
    class NoOptionsClient:
        pass
    client = NoOptionsClient()
    url = "https://foo.com"
    with pytest.raises(AttributeError):
        should_propagate_trace(client, url) # 1.85μs -> 1.88μs (1.80% slower)

def test_edge_targets_list_with_empty_string():
    # Should propagate if url contains empty string (always true)
    client = DummyClient(options={"trace_propagation_targets": [""]})
    url = "https://foo.com"
    codeflash_output = should_propagate_trace(client, url) # 33.2μs -> 4.36μs (663% faster)

def test_edge_targets_list_with_special_regex():
    # Should propagate if url matches special regex
    client = DummyClient(options={"trace_propagation_targets": [r"https://foo\.com/api\?id=\d+"]})
    url = "https://foo.com/api?id=123"
    codeflash_output = should_propagate_trace(client, url) # 91.1μs -> 3.80μs (2295% faster)
    url2 = "https://foo.com/api?id=abc"
    codeflash_output = should_propagate_trace(client, url2) # 2.90μs -> 1.40μs (107% faster)

def test_edge_is_sentry_url_returns_false_when_no_transport():
    # is_sentry_url is False when client.transport is None, so the result depends only on the target match
    client = DummyClient(options={"trace_propagation_targets": ["foo.com"]}, transport=None)
    url = "https://foo.com"
    codeflash_output = should_propagate_trace(client, url) # 4.05μs -> 5.15μs (21.3% slower)

def test_edge_is_sentry_url_returns_false_when_transport_has_no_parsed_dsn():
    # is_sentry_url is False when transport.parsed_dsn is None, so the result depends only on the target match
    transport = DummyTransport(parsed_dsn=None)
    client = DummyClient(options={"trace_propagation_targets": ["foo.com"]}, transport=transport)
    url = "https://foo.com"
    codeflash_output = should_propagate_trace(client, url) # 3.94μs -> 5.25μs (25.0% slower)

def test_edge_is_sentry_url_returns_true_and_blocks_propagation():
    # Should not propagate if url matches sentry DSN netloc
    parsed_dsn = DummyParsedDSN(netloc="sentry.example.com")
    transport = DummyTransport(parsed_dsn=parsed_dsn)
    client = DummyClient(options={"trace_propagation_targets": ["sentry.example.com"]}, transport=transport)
    url = "https://sentry.example.com/api"
    codeflash_output = should_propagate_trace(client, url) # 1.39μs -> 1.32μs (5.78% faster)

def test_edge_is_sentry_url_partial_match_does_not_block():
    # Should propagate if url contains netloc as substring but not exactly
    parsed_dsn = DummyParsedDSN(netloc="sentry.example.com")
    transport = DummyTransport(parsed_dsn=parsed_dsn)
    client = DummyClient(options={"trace_propagation_targets": ["foo.com"]}, transport=transport)
    url = "https://foo.com/sentry.example.com"
    codeflash_output = should_propagate_trace(client, url) # 1.42μs -> 1.29μs (10.3% faster)

def test_edge_targets_with_dollar_sign():
    # Should propagate if regex with dollar sign matches end of url
    client = DummyClient(options={"trace_propagation_targets": [r"foo.com/api$"]})
    url = "https://foo.com/api"
    codeflash_output = should_propagate_trace(client, url) # 57.7μs -> 4.17μs (1285% faster)
    url2 = "https://foo.com/api/extra"
    codeflash_output = should_propagate_trace(client, url2) # 2.89μs -> 1.65μs (75.8% faster)

def test_edge_targets_with_escaped_characters():
    # Should propagate if regex with escaped characters matches
    client = DummyClient(options={"trace_propagation_targets": [r"foo\.com\/api"]})
    url = "https://foo.com/api"
    codeflash_output = should_propagate_trace(client, url) # 53.4μs -> 3.41μs (1466% faster)

def test_edge_targets_with_unicode():
    # Should propagate if url contains unicode substring in target
    client = DummyClient(options={"trace_propagation_targets": ["例子.com"]})
    url = "https://例子.com/api"
    codeflash_output = should_propagate_trace(client, url) # 43.9μs -> 3.27μs (1244% faster)

def test_edge_targets_with_case_sensitivity():
    # Should propagate if regex is case sensitive and matches
    client = DummyClient(options={"trace_propagation_targets": ["FOO.com"]})
    url = "https://FOO.com/api"
    codeflash_output = should_propagate_trace(client, url) # 44.3μs -> 3.33μs (1232% faster)
    url2 = "https://foo.com/api"
    codeflash_output = should_propagate_trace(client, url2) # 2.42μs -> 1.51μs (60.2% faster)

def test_edge_targets_with_long_regex():
    # Should propagate if long regex matches
    long_regex = "a" * 100 + "b"
    client = DummyClient(options={"trace_propagation_targets": [long_regex]})
    url = "https://example.com/" + ("a" * 100) + "b"
    codeflash_output = should_propagate_trace(client, url) # 173μs -> 3.36μs (5062% faster)

# 3. Large Scale Test Cases

def test_large_scale_many_targets_one_matches():
    # Should propagate if one of many targets matches
    targets = [f"site{i}.com" for i in range(999)]
    targets.append("mytestsite.com")
    client = DummyClient(options={"trace_propagation_targets": targets})
    url = "https://mytestsite.com/api"
    codeflash_output = should_propagate_trace(client, url) # 27.3ms -> 3.54μs (769953% faster)

def test_large_scale_many_targets_none_match():
    # Should not propagate if none of many targets match
    targets = [f"site{i}.com" for i in range(1000)]
    client = DummyClient(options={"trace_propagation_targets": targets})
    url = "https://notfound.com/api"
    codeflash_output = should_propagate_trace(client, url) # 27.3ms -> 4.52μs (604251% faster)

def test_large_scale_long_url_matches():
    # Should propagate if long url matches a target
    url = "https://foo.com/" + "a" * 900
    client = DummyClient(options={"trace_propagation_targets": ["foo.com"]})
    codeflash_output = should_propagate_trace(client, url) # 39.4μs -> 42.2μs (6.56% slower)

def test_large_scale_long_url_no_match():
    # Should not propagate if long url does not match any target
    url = "https://bar.com/" + "a" * 900
    client = DummyClient(options={"trace_propagation_targets": ["foo.com"]})
    codeflash_output = should_propagate_trace(client, url) # 3.78μs -> 4.92μs (23.2% slower)

def test_large_scale_targets_with_regex_patterns():
    # Should propagate if url matches a complex regex in a large list
    targets = [f"site{i}.com" for i in range(500)]
    targets += [r"foo\..*\.com"]
    client = DummyClient(options={"trace_propagation_targets": targets})
    url = "https://foo.bar.com"
    codeflash_output = should_propagate_trace(client, url) # 13.6ms -> 3.54μs (383907% faster)

def test_large_scale_targets_all_empty_strings():
    # Should propagate if all targets are empty strings (always matches)
    targets = [""] * 1000
    client = DummyClient(options={"trace_propagation_targets": targets})
    url = "https://whatever.com"
    codeflash_output = should_propagate_trace(client, url) # 26.6μs -> 3.18μs (736% faster)

def test_large_scale_targets_with_dollar_signs():
    # Should propagate if one of many regexes with $ matches end of url
    targets = [f"site{i}.com$" for i in range(999)]
    targets.append("bar.com/api$")
    client = DummyClient(options={"trace_propagation_targets": targets})
    url = "https://bar.com/api"
    codeflash_output = should_propagate_trace(client, url) # 29.8ms -> 30.1ms (0.712% slower)

def test_large_scale_targets_with_unicode():
    # Should propagate if url matches one of many unicode targets
    targets = [f"site{i}.com" for i in range(999)] + ["例子.com"]
    client = DummyClient(options={"trace_propagation_targets": targets})
    url = "https://例子.com/api"
    codeflash_output = should_propagate_trace(client, url) # 27.0ms -> 4.29μs (630381% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-should_propagate_trace-mg9mvvsh` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 2, 2025 16:32
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 2, 2025