Skip to content

Conversation

carterkozak
Copy link
Contributor

This prevents threadlocal map entry churn when traces are added
and cleared, however it comes at the potential cost of larger
threadlocal maps when tracing is not used. We generally expect
everything to be traced, so this shouldn't impact us.

==COMMIT_MSG==
Trace threadlocal entry is never removed, only nulled
==COMMIT_MSG==

Benchmarks on a single thread:

BEFORE

Benchmark                             (observability)  Mode  Cnt    Score    Error  Units
TracingBenchmark.traceWithSingleSpan           SAMPLE  avgt    3  551.519 ± 57.893  ns/op
TracingBenchmark.traceWithSingleSpan    DO_NOT_SAMPLE  avgt    3  317.826 ± 14.467  ns/op
TracingBenchmark.traceWithSingleSpan        UNDECIDED  avgt    3  342.436 ± 38.777  ns/op
Benchmark                             (observability)  Mode  Cnt    Score     Error  Units
TracingBenchmark.traceWithSingleSpan           SAMPLE  avgt    3  540.388 ±  69.014  ns/op
TracingBenchmark.traceWithSingleSpan    DO_NOT_SAMPLE  avgt    3  359.699 ± 208.046  ns/op
TracingBenchmark.traceWithSingleSpan        UNDECIDED  avgt    3  362.240 ±  96.292  ns/op

AFTER

Benchmark                             (observability)  Mode  Cnt    Score    Error  Units
TracingBenchmark.traceWithSingleSpan           SAMPLE  avgt    3  503.538 ± 20.695  ns/op
TracingBenchmark.traceWithSingleSpan    DO_NOT_SAMPLE  avgt    3  301.762 ± 16.082  ns/op
TracingBenchmark.traceWithSingleSpan        UNDECIDED  avgt    3  323.679 ± 82.884  ns/op
Benchmark                             (observability)  Mode  Cnt    Score    Error  Units
TracingBenchmark.traceWithSingleSpan           SAMPLE  avgt    3  513.711 ± 21.862  ns/op
TracingBenchmark.traceWithSingleSpan    DO_NOT_SAMPLE  avgt    3  297.156 ± 24.606  ns/op
TracingBenchmark.traceWithSingleSpan        UNDECIDED  avgt    3  319.645 ± 33.003  ns/op

Benchmarks with 20 threads

20 threads BEFORE

Benchmark                             (observability)  Mode  Cnt     Score     Error  Units
TracingBenchmark.traceWithSingleSpan           SAMPLE  avgt    3  1134.473 ± 305.985  ns/op
TracingBenchmark.traceWithSingleSpan    DO_NOT_SAMPLE  avgt    3  1432.571 ± 409.387  ns/op
TracingBenchmark.traceWithSingleSpan        UNDECIDED  avgt    3  1608.718 ± 335.243  ns/o

20 threads AFTER

Benchmark                             (observability)  Mode  Cnt     Score      Error  Units
TracingBenchmark.traceWithSingleSpan           SAMPLE  avgt    3  1910.558 ±  121.566  ns/op
TracingBenchmark.traceWithSingleSpan    DO_NOT_SAMPLE  avgt    3  1161.873 ± 1048.964  ns/op
TracingBenchmark.traceWithSingleSpan        UNDECIDED  avgt    3  1522.187 ±  375.752  ns/op

GCProfiler benchmarks (gc profiling appears to impact the results)

Before:

Benchmark                                                              (observability)  Mode  Cnt     Score      Error   Units
TracingBenchmark.traceWithSingleSpan                                            SAMPLE  avgt    3   583.999 ±  168.060   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                             SAMPLE  avgt    3  8685.304 ± 2522.891  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                        SAMPLE  avgt    3  1552.026 ±    0.012    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space                    SAMPLE  avgt    3  8623.289 ± 4266.756  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm               SAMPLE  avgt    3  1540.850 ±  478.658    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space                SAMPLE  avgt    3     0.014 ±    0.067  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm           SAMPLE  avgt    3     0.002 ±    0.012    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                                  SAMPLE  avgt    3    64.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                                   SAMPLE  avgt    3    67.000                 ms
TracingBenchmark.traceWithSingleSpan                                     DO_NOT_SAMPLE  avgt    3   865.820 ±  345.378   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                      DO_NOT_SAMPLE  avgt    3  3837.916 ± 1524.525  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                 DO_NOT_SAMPLE  avgt    3  1016.018 ±    0.026    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space             DO_NOT_SAMPLE  avgt    3  3943.084 ± 4456.074  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm        DO_NOT_SAMPLE  avgt    3  1043.879 ± 1104.363    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space         DO_NOT_SAMPLE  avgt    3     0.007 ±    0.012  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm    DO_NOT_SAMPLE  avgt    3     0.002 ±    0.003    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                           DO_NOT_SAMPLE  avgt    3    28.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                            DO_NOT_SAMPLE  avgt    3    32.000                 ms
TracingBenchmark.traceWithSingleSpan                                         UNDECIDED  avgt    3   765.661 ±   59.218   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                          UNDECIDED  avgt    3  4363.201 ±  336.582  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                     UNDECIDED  avgt    3  1021.521 ±    0.202    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space                 UNDECIDED  avgt    3  4361.652 ±   12.487  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm            UNDECIDED  avgt    3  1021.169 ±   76.230    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space             UNDECIDED  avgt    3     0.008 ±    0.019  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm        UNDECIDED  avgt    3     0.002 ±    0.005    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                               UNDECIDED  avgt    3    30.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                                UNDECIDED  avgt    3    29.000                 ms

After:

Benchmark                                                              (observability)  Mode  Cnt      Score      Error   Units
TracingBenchmark.traceWithSingleSpan                                            SAMPLE  avgt    3    873.098 ±  290.113   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                             SAMPLE  avgt    3   5690.967 ± 1876.023  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                        SAMPLE  avgt    3   1520.026 ±    0.006    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space                    SAMPLE  avgt    3   5774.303 ± 4458.832  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm               SAMPLE  avgt    3   1542.644 ± 1313.700    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space                SAMPLE  avgt    3      0.013 ±    0.074  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm           SAMPLE  avgt    3      0.003 ±    0.019    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                                  SAMPLE  avgt    3     41.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                                   SAMPLE  avgt    3     45.000                 ms
TracingBenchmark.traceWithSingleSpan                                     DO_NOT_SAMPLE  avgt    3    309.315 ±   17.596   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                      DO_NOT_SAMPLE  avgt    3  10394.631 ±  607.990  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                 DO_NOT_SAMPLE  avgt    3    984.013 ±    0.004    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space             DO_NOT_SAMPLE  avgt    3  10309.117 ±  123.584  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm        DO_NOT_SAMPLE  avgt    3    975.923 ±   48.984    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space         DO_NOT_SAMPLE  avgt    3      0.015 ±    0.037  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm    DO_NOT_SAMPLE  avgt    3      0.001 ±    0.004    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                           DO_NOT_SAMPLE  avgt    3     60.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                            DO_NOT_SAMPLE  avgt    3     67.000                 ms
TracingBenchmark.traceWithSingleSpan                                         UNDECIDED  avgt    3    829.603 ±  237.847   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                          UNDECIDED  avgt    3   3898.852 ± 1146.074  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                     UNDECIDED  avgt    3    989.532 ±    0.041    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space                 UNDECIDED  avgt    3   3974.473 ± 5468.992  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm            UNDECIDED  avgt    3   1008.222 ± 1166.486    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space             UNDECIDED  avgt    3      0.010 ±    0.119  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm        UNDECIDED  avgt    3      0.002 ±    0.029    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                               UNDECIDED  avgt    3     23.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                                UNDECIDED  avgt    3     27.000                 ms

This prevents threadlocal map entry churn when traces are added
and cleared, however it comes at the potential cost of larger
threadlocal maps when tracing is not used. We generally expect
everything to be traced, so this shouldn't impact us.
@changelog-app
Copy link

changelog-app bot commented Feb 17, 2022

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Trace threadlocal entry is never removed, only nulled

Check the box to generate changelog(s)

  • Generate changelog entry

@policy-bot policy-bot bot requested a review from tpetracca February 17, 2022 18:14
@carterkozak
Copy link
Contributor Author

Hmm, benchmark data isn't helpful because it's using an MDC map implementation which allocates. Using -Dlog4j2.garbagefreeThreadContextMap=true we get:

Before

Benchmark                                                              (observability)  Mode  Cnt     Score      Error   Units
TracingBenchmark.traceWithSingleSpan                                            SAMPLE  avgt    3   347.288 ±   47.650   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                             SAMPLE  avgt    3  1317.659 ±  179.190  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                        SAMPLE  avgt    3   560.023 ±    0.029    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space                    SAMPLE  avgt    3  1296.084 ± 1778.947  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm               SAMPLE  avgt    3   550.672 ±  685.866    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space                SAMPLE  avgt    3     0.005 ±    0.019  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm           SAMPLE  avgt    3     0.002 ±    0.008    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                                  SAMPLE  avgt    3    23.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                                   SAMPLE  avgt    3    26.000                 ms
TracingBenchmark.traceWithSingleSpan                                     DO_NOT_SAMPLE  avgt    3   163.359 ±    6.863   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                      DO_NOT_SAMPLE  avgt    3   920.359 ±   38.607  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                 DO_NOT_SAMPLE  avgt    3   184.008 ±    0.015    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space             DO_NOT_SAMPLE  avgt    3   901.539 ± 1780.372  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm        DO_NOT_SAMPLE  avgt    3   180.275 ±  363.839    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space         DO_NOT_SAMPLE  avgt    3     0.008 ±    0.169  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm    DO_NOT_SAMPLE  avgt    3     0.002 ±    0.034    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                           DO_NOT_SAMPLE  avgt    3    16.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                            DO_NOT_SAMPLE  avgt    3    19.000                 ms
TracingBenchmark.traceWithSingleSpan                                         UNDECIDED  avgt    3   176.826 ±   18.086   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                          UNDECIDED  avgt    3   868.412 ±   89.363  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                     UNDECIDED  avgt    3   187.935 ±    0.235    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space                 UNDECIDED  avgt    3   845.183 ±    0.200  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm            UNDECIDED  avgt    3   182.912 ±   18.705    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space             UNDECIDED  avgt    3     0.007 ±    0.151  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm        UNDECIDED  avgt    3     0.002 ±    0.032    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                               UNDECIDED  avgt    3    15.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                                UNDECIDED  avgt    3    17.000                 ms

After

Benchmark                                                              (observability)  Mode  Cnt     Score      Error   Units
TracingBenchmark.traceWithSingleSpan                                            SAMPLE  avgt    3   313.471 ±   67.044   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                             SAMPLE  avgt    3  1376.501 ±  295.484  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                        SAMPLE  avgt    3   528.017 ±    0.025    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space                    SAMPLE  avgt    3  1403.523 ± 2335.557  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm               SAMPLE  avgt    3   538.074 ±  783.768    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space                SAMPLE  avgt    3     0.009 ±    0.174  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm           SAMPLE  avgt    3     0.003 ±    0.066    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                                  SAMPLE  avgt    3    19.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                                   SAMPLE  avgt    3    21.000                 ms
TracingBenchmark.traceWithSingleSpan                                     DO_NOT_SAMPLE  avgt    3   111.628 ±   37.147   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                      DO_NOT_SAMPLE  avgt    3  1112.801 ±  370.462  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                 DO_NOT_SAMPLE  avgt    3   152.007 ±    0.009    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space             DO_NOT_SAMPLE  avgt    3  1126.925 ± 1779.805  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm        DO_NOT_SAMPLE  avgt    3   153.824 ±  200.768    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space         DO_NOT_SAMPLE  avgt    3     0.009 ±    0.160  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm    DO_NOT_SAMPLE  avgt    3     0.001 ±    0.022    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                           DO_NOT_SAMPLE  avgt    3    20.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                            DO_NOT_SAMPLE  avgt    3    22.000                 ms
TracingBenchmark.traceWithSingleSpan                                         UNDECIDED  avgt    3   138.785 ±   46.854   ns/op
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate                          UNDECIDED  avgt    3   918.204 ±  310.462  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.alloc.rate.norm                     UNDECIDED  avgt    3   155.932 ±    0.193    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space                 UNDECIDED  avgt    3   901.572 ± 1781.446  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Eden_Space.norm            UNDECIDED  avgt    3   153.125 ±  301.941    B/op
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space             UNDECIDED  avgt    3     0.009 ±    0.112  MB/sec
TracingBenchmark.traceWithSingleSpan:·gc.churn.G1_Survivor_Space.norm        UNDECIDED  avgt    3     0.001 ±    0.018    B/op
TracingBenchmark.traceWithSingleSpan:·gc.count                               UNDECIDED  avgt    3    16.000             counts
TracingBenchmark.traceWithSingleSpan:·gc.time                                UNDECIDED  avgt    3    19.000                 ms

currentTrace.set(null);
MDC.remove(Tracers.TRACE_ID_KEY);
MDC.remove(Tracers.TRACE_SAMPLED_KEY);
MDC.remove(Tracers.REQUEST_ID_KEY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copying comment from #850

I didn't touch these as I didn't see these pop in profiles, and log4j2 uses org.apache.logging.log4j.spi.GarbageFreeSortedArrayThreadContextMap & org.apache.logging.log4j.util.SortedArrayStringMap where the MDC.remove & MDC.put don't directly free/allocate

@bulldozer-bot bulldozer-bot bot merged commit 43e835a into develop Feb 17, 2022
@bulldozer-bot bulldozer-bot bot deleted the ckozak/threadlocal_entry_management branch February 17, 2022 18:51
@svc-autorelease
Copy link
Collaborator

Released 6.8.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants