Document overhead of Performance monitoring #7262

@danielkhan

Core or SDK?

Platform/SDK

Which part? Which one?

SDKs

Description

We are frequently asked about performance overhead. While it is hard to put a number on it, we should help our users understand the possible performance implications and what we are doing to mitigate them.

Customer Feedback (paraphrased)

“I don’t need benchmarking statistics. I want to know how the SDK works under the hood, and then I can analyze my app and decide if it’s something to be worried about, because only I know the intricacies and architecture of my app.”

“I’m scared to let the Sentry SDK run because I don’t know what it’s doing. As an analogy, if your trusted auto technician said your vehicle needs 5 hours of maintenance work and problem fixing, you’d ask for a description of the required work, just so you’re aware. And then you’d agree (or not). I trust you, Sentry, but I’m still looking for the same awareness regarding your SDK for Performance Monitoring: what is it doing?”

Suggested Solution

  • Focus on explaining how the SDK is designed and how it works, rather than referencing lab-like benchmarking experiments, which are expensive to run.
  • Something similar to our existing documentation on the performance impact of Profiling and Session Replay; see the examples below.
  • Something that addresses the concerns in the two customer feedback quotes above.
  • Consider including examples of harmful things the SDK is not doing, to reassure users that it is safe to use.

Examples:

  • https://docs.sentry.io/product/profiling/performance-overhead mentions "1 to 5%" at the bottom, but even if you remove this line, the documentation is still helpful.
  • https://docs.sentry.io/product/session-replay/performance-overhead makes no mention of a quantified performance impact like "3% CPU", and it is still a helpful page of documentation.
  • “Trace generation overhead” in Google’s Dapper tracing whitepaper, sections 4.1, 4.2, 4.3.
  • New Relic's browser monitoring and performance impact documentation. Notice that it speaks in general terms about web apps and SDKs. You could remove the "overhead of less than 15ms per page." at the bottom and it would still be helpful documentation.
  • UXCam's documentation also gives no CPU or benchmarking statistics for its Session Replay product.
  • Datadog's documentation does not mention CPU or benchmarking statistics for its Session Replay, apart from a note on network bandwidth. It does explain: "Datadog also reduces the load on a browser’s UI thread by delegating most of the CPU-intensive work (such as compression) to a dedicated web worker."
  • Fullstory's documentation divides the topic into four points: 1. the script is small, 2. the script is edge-cached by a CDN to reduce load time, 3. UI thread utilization, 4. upload bandwidth. Do we do the following at Sentry? "FullStory uses Event Handlers that simply add all user interaction events to an in-memory queue to be processed later, rather than processing them in real time."
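
To make the "in-memory queue, processed later" pattern from the Fullstory quote concrete, here is a minimal TypeScript sketch of the kind of illustration such a docs page could carry. It is not Sentry's or Fullstory's implementation; whether the Sentry SDK works this way is exactly the open question above. The names `DeferredEventQueue` and `RecordedEvent` are hypothetical and chosen only for this example.

```typescript
// Hypothetical sketch of "enqueue now, process later".
// None of these names belong to a real Sentry or Fullstory API.

interface RecordedEvent {
  type: string;
  timestamp: number;
}

class DeferredEventQueue {
  private pending: RecordedEvent[] = [];
  private readonly processBatch: (batch: RecordedEvent[]) => void;

  // processBatch is where any expensive work (serialization, upload) would live.
  constructor(processBatch: (batch: RecordedEvent[]) => void) {
    this.processBatch = processBatch;
  }

  // Hot path: called from the event handler, so it only appends to an array.
  record(type: string): void {
    this.pending.push({ type, timestamp: Date.now() });
  }

  // Cold path: called later, away from the user interaction.
  // Returns how many events were handed to processBatch.
  flush(): number {
    const batch = this.pending;
    this.pending = [];
    if (batch.length > 0) {
      this.processBatch(batch);
    }
    return batch.length;
  }

  get size(): number {
    return this.pending.length;
  }
}

// Usage: two interactions are recorded cheaply, then processed as one batch.
const processed: RecordedEvent[][] = [];
const queue = new DeferredEventQueue((batch) => {
  processed.push(batch);
});
queue.record("click");
queue.record("scroll");
const flushed = queue.flush();
```

In a browser, `flush` would typically be scheduled with a timer or an idle callback so that the batch work stays off the interaction path. The sketch leaves scheduling out so that it runs anywhere.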

Not The Solution / What Is Not Being Asked

Benchmarking experiments, scientific analysis, or answers like “3% of CPU”. The Session Replay docs don’t mention this. The Profiling docs do at the bottom, but you could remove that line and it would still be a useful page of docs.

cc @lizokm

Metadata

Labels

No labels

Projects

Status

Done
