docs(dev-docs): Add metrics telemetry specs #15178
base: master
Co-authored-by: David Herberth <[email protected]>
At minimum the SDK needs to implement the following methods for each metric type:

- `Sentry.metrics.increment(name, value, options)` - Increment a counter

Suggested change:

- `Sentry.metrics.count(name, value, options)` - Increment a counter
Can count take a negative integer/float?
I think counters in metrics should always be increasing, but OpenTelemetry, for example, has a separate up/down counter. We don't prevent passing negative integers from the SDK, though.
`name`

: **String, required**. The name of the metric. This should follow a hierarchical naming convention using dots as separators (e.g., `api.response_time`, `db.query.duration`).
This should follow a hierarchical naming convention using dots as separators
Will this be enforced by the backend? Or just recommended to users? Are spaces in the name valid?
`span_id`

: **String, optional**. The span id for the metric.
Is this the span ID of the parent span, or does it represent the metric item itself? If the latter, we can probably remove it from the client protocol.
: **Number, required**. The numeric value of the metric. The interpretation depends on the metric `type`:

- For `counter` metrics: the count to increment by (should default to 1)
- For `gauge` metrics: the current value
- For `distribution` metrics: a single measured value

Integers should be a 64-bit signed integer, while doubles should be a 64-bit floating point number.
Can you supply both an integer and a float to a metric? So call `increment(3)` and then `increment(4.5)`?
I suggest we stick to integers for counters. @k-fish I assume the value eventually ends up in an EAP attribute where integers and floats go into separate columns. Mixing both types might then make queries tricky, plus you'd have to deal with precision loss for very large numbers.
Tagging on to the discussion, is the SDK expected to truncate large integers? If so, how do we truncate? Or do we just reject integers that are beyond 64 bits?
In particular, 64-bit floats are natural in Python, but integers are arbitrary precision (under the hood more memory is allocated if the user provides a large integer).
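To make the range question concrete for Python, here is a minimal sketch of checks an SDK could run before deciding what to do with an oversized integer. The helper names `fits_int64` and `survives_float64` are hypothetical and not part of any Sentry SDK. Whether the SDK should truncate, reject, or pass such values through is exactly what this thread has not settled; the sketch only detects the condition.

```python
# Hypothetical helpers for illustration only; not part of any Sentry SDK.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_int64(value: int) -> bool:
    """Return True if a Python int is representable as a 64-bit signed integer."""
    return INT64_MIN <= value <= INT64_MAX


def survives_float64(value: int) -> bool:
    """Return True if converting the int to a 64-bit float and back is lossless."""
    try:
        return int(float(value)) == value
    except OverflowError:
        # Integers beyond the float64 range cannot be converted at all.
        return False


print(fits_int64(2**63 - 1))      # True: largest signed 64-bit value
print(fits_int64(2**63))          # False: one past the limit
print(survives_float64(2**53))    # True
print(survives_float64(2**53 + 1))  # False: rounded to 2**53 as a float
```

The second helper illustrates the precision-loss concern raised earlier in the thread: an integer can fit in 64 bits and still change value once it is stored as a double.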
- `name` **String, required**: The name of the metric
- `value` **Number, required**: The value of the metric
- `options` **Object, optional**: An object containing the following properties:
  - `unit` **String, optional**: The unit of measurement (distribution and gauge only)
Why can't we put units on counts?
Thanks for this @chargome 👍
A couple notes:
- This API might change before a beta takes place, so we may want to leave it up in the PR for a while.
- Metrics may have different limits than logs (e.g., the metric size limit is currently 2 KiB and we might lower it), so we may want to make note of that when it comes to adding attributes. We probably also need to indicate what the failure mode should be if the scope holds too much attribute data to add to each metric.
### Buffering

Metrics should be buffered and aggregated before being sent. SDKs should keep a buffer of metrics on the client that flushes out based on some kind of condition. We recommend following the [batch processor specification](/sdk/telemetry/spans/batch-processor/) outlined in the develop docs, but you should choose the approach that works best for your platform.
Careful with the term `aggregated` here, since trace metrics may generally have a span ID, timestamp, etc. that we want to preserve, which makes aggregation on the SDK side undesirable, although we may offer special aggregation functions in the future.
Also, as an aside, the metrics and log buffers are currently set to the same size in the JS SDK (100 items), but we may consider changing that if we keep the low metric size limit (~1-2 KiB).
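As an illustration of buffering without aggregation, here is a minimal sketch. The `MetricBuffer` class and its `flush` callback are hypothetical names, and the default of 100 items is only borrowed from the JS SDK figure mentioned above. Each metric is stored as its own item, so per-metric fields such as the span ID and timestamp are preserved. A real batch processor would likely also flush on a timer and guard against concurrent access; neither is shown here.

```python
from typing import Any, Callable


class MetricBuffer:
    """Hypothetical buffer: keeps metric payloads unmodified and flushes when full."""

    def __init__(
        self,
        flush: Callable[[list[dict[str, Any]]], None],
        max_items: int = 100,  # assumption: mirrors the JS SDK figure from the thread
    ) -> None:
        self._flush = flush
        self._max_items = max_items
        self._items: list[dict[str, Any]] = []

    def add(self, metric: dict[str, Any]) -> None:
        # No aggregation: two metrics with the same name stay separate items.
        self._items.append(metric)
        if len(self._items) >= self._max_items:
            self.flush()

    def flush(self) -> None:
        if not self._items:
            return
        # Swap the list out first so the callback sees a stable batch.
        batch, self._items = self._items, []
        self._flush(batch)
```

A caller would pass whatever function hands a batch to the transport, for example `MetricBuffer(send_batch)`, where `send_batch` is again a placeholder.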
Item type `"trace_metric"` contains an array of metric payloads encoded as JSON. This allows for multiple metric payloads to be sent in a single envelope item.

Only a single trace_metric container is allowed per envelope. The `item_count` field in the envelope item header must match the amount of metrics sent, it's not optional. A `content_type` field in the envelope item header must be set to `application/vnd.sentry.items.trace_metric+json`.
Suggested change:

Only a single trace_metric container is allowed per envelope. The `item_count` field in the envelope item header must match the amount of metrics sent. A `content_type` field in the envelope item header must be set to `application/vnd.sentry.items.trace_metric+json`. Both the `item_count` and `content_type` fields are required.
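For reference, a minimal sketch of an item header that satisfies both requirements. The function name `build_trace_metric_item` is hypothetical, and wrapping the array under an `"items"` key is an assumption; the quoted text only says the item contains an array of metric payloads encoded as JSON.

```python
import json
from typing import Any

TRACE_METRIC_CONTENT_TYPE = "application/vnd.sentry.items.trace_metric+json"


def build_trace_metric_item(
    metrics: list[dict[str, Any]],
) -> tuple[dict[str, Any], bytes]:
    """Return (item header, encoded payload) for one trace_metric envelope item."""
    header = {
        "type": "trace_metric",
        # Required: must equal the number of metrics in the payload.
        "item_count": len(metrics),
        # Required: fixed content type for this item.
        "content_type": TRACE_METRIC_CONTENT_TYPE,
    }
    # ASSUMPTION: the "items" wrapper key is illustrative, not confirmed here.
    payload = json.dumps({"items": metrics}).encode("utf-8")
    return header, payload


header, payload = build_trace_metric_item(
    [
        {"name": "button.clicks", "type": "counter", "value": 1},
        {"name": "page.load_time", "type": "distribution", "value": 245.7},
    ]
)
print(header["item_count"])  # 2
```

Deriving `item_count` from the same list that is serialized keeps the two from drifting apart.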
It's okay to mix metrics from different traces into the same trace_metric envelope item, but if you do, you MUST not attach a DSC (dynamic sampling context) to the envelope header.
Suggested change:

It's okay to mix metrics from different traces into the same `trace_metric` envelope item, but if you do, you MUST NOT attach a DSC (dynamic sampling context) to the envelope header.
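One way an SDK might apply this rule is sketched below. Several things here are assumptions made for illustration: that each metric payload carries a `trace_id` field, that the DSC is attached under a `trace` key in the envelope header, and the names `build_envelope_header` and `dsc_by_trace`. The sketch attaches a DSC only when every metric in the batch belongs to the same trace.

```python
from typing import Any


def build_envelope_header(
    metrics: list[dict[str, Any]],
    dsc_by_trace: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Return an envelope header, attaching a DSC only for single-trace batches."""
    # ASSUMPTION: each metric payload exposes its trace under "trace_id".
    trace_ids = {metric.get("trace_id") for metric in metrics}
    header: dict[str, Any] = {}
    if len(trace_ids) == 1:
        (trace_id,) = trace_ids
        dsc = dsc_by_trace.get(trace_id)
        if dsc is not None:
            # ASSUMPTION: the DSC is placed under the "trace" header key.
            header["trace"] = dsc
    # Mixed traces (or an empty batch): no DSC is attached.
    return header
```

With metrics from two different traces the function returns an empty header; with a single trace it returns the DSC looked up for that trace, if one is known.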
LGTM!
Metrics are sent to Sentry via the `trace_metric` envelope item. Each `trace_metric` envelope item contains a batch of metric payloads encoded as JSON, allowing for transmission of multiple metrics in a single envelope.

### `trace_metric` Envelope Item

The `trace_metric` envelope item is an object that contains an array of metric payloads encoded as JSON. This allows for multiple metric payloads to be sent in a single envelope item. See [Appendix A](#appendix-a-example-trace_metric-envelope) for an example `trace_metric` envelope.
nitpick: These paragraphs could be consolidated.
```python
# Increment a counter
Sentry.metrics.count('button.clicks', 1, {
    'attributes': {'button_id': 'submit', 'page': 'checkout'}
})

# Set a gauge value
Sentry.metrics.gauge('db.connection_pool.active', 42, {
    'unit': 'connection',
    'attributes': {'pool_name': 'main_db', 'database': 'postgres'}
})

# Record a distribution
Sentry.metrics.distribution('page.load_time', 245.7, {
    'unit': 'millisecond',
    'attributes': {'page': '/dashboard', 'browser': 'chrome'}
})
```
Non-blocking.
Suggested change:

```python
# Increment a counter
Sentry.metrics.count(
    'button.clicks',
    1,
    attributes={'button_id': 'submit', 'page': 'checkout'},
)

# Set a gauge value
Sentry.metrics.gauge(
    'db.connection_pool.active',
    42,
    unit='connection',
    attributes={'pool_name': 'main_db', 'database': 'postgres'},
)

# Record a distribution
Sentry.metrics.distribution(
    'page.load_time',
    245.7,
    unit='millisecond',
    attributes={'page': '/dashboard', 'browser': 'chrome'},
)
```
- `Sentry.metrics.count(name, value, options)` - Increment a counter
- `Sentry.metrics.gauge(name, value, options)` - Set a gauge value
- `Sentry.metrics.distribution(name, value, options)` - Add a distribution value
From an API perspective, would any of these changes make it clearer?
Suggested change:

- `Sentry.metrics.increment(name, count, options)` - Increment a counter
- `Sentry.metrics.gauge(name, value, options)` - Set a gauge value
- `Sentry.metrics.distribution(name, sample, options)` - Add a distribution value
#### Method Signatures

The parameters for these methods are:
This sounds prescriptive right now.
Suggested change:

The parameters for these methods should be the following, or an equivalent form that is more natural for the SDK's language:
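To illustrate what "an equivalent form" could look like in Python, here is a minimal sketch in which the `options` object becomes keyword-only arguments. The `Metrics` class and its in-memory `recorded` list are hypothetical stand-ins and not the Sentry Python SDK's implementation. The counter value defaults to 1, as the spec text suggests, and `unit` is omitted from `count` only because the draft lists it as distribution and gauge only, which is still being questioned in this thread.

```python
from typing import Any, Mapping, Optional, Union

Number = Union[int, float]


class Metrics:
    """Hypothetical in-memory stand-in for an SDK's metrics namespace."""

    def __init__(self) -> None:
        self.recorded: list[dict[str, Any]] = []

    def _record(
        self,
        metric_type: str,
        name: str,
        value: Number,
        unit: Optional[str],
        attributes: Optional[Mapping[str, Any]],
    ) -> None:
        metric: dict[str, Any] = {"type": metric_type, "name": name, "value": value}
        if unit is not None:
            metric["unit"] = unit
        if attributes:
            metric["attributes"] = dict(attributes)
        self.recorded.append(metric)

    def count(self, name: str, value: Number = 1, *,
              attributes: Optional[Mapping[str, Any]] = None) -> None:
        self._record("counter", name, value, None, attributes)

    def gauge(self, name: str, value: Number, *, unit: Optional[str] = None,
              attributes: Optional[Mapping[str, Any]] = None) -> None:
        self._record("gauge", name, value, unit, attributes)

    def distribution(self, name: str, value: Number, *, unit: Optional[str] = None,
                     attributes: Optional[Mapping[str, Any]] = None) -> None:
        self._record("distribution", name, value, unit, attributes)
```

Keyword-only arguments keep call sites readable and let new options be added later without breaking positional callers.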
`unit`

: **String, optional**. The unit of measurement for the metric value.
Can we clarify in the spec who modifies the input or drops the metric if the user provides malformed arguments of the wrong type?
For example, say I provide an integer in the `unit` field when calling `metrics.count()`. Does the SDK
- drop the metric; and/or
- coerce it into a string; and/or
- send an integer and rely on downstream systems to discard the metric?
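To make the three options easier to compare, here is a minimal sketch that models each as a policy applied to a metric with a non-string `unit`. The names `BadUnitPolicy` and `apply_unit_policy` are hypothetical, and the sketch does not imply which behavior the spec should choose.

```python
from enum import Enum
from typing import Any, Optional


class BadUnitPolicy(Enum):
    DROP = "drop"                  # SDK discards the metric
    COERCE = "coerce"              # SDK converts the unit to a string
    PASS_THROUGH = "pass_through"  # SDK sends it as-is; downstream decides


def apply_unit_policy(
    metric: dict[str, Any], policy: BadUnitPolicy
) -> Optional[dict[str, Any]]:
    """Return the metric to send, or None if it should be dropped."""
    unit = metric.get("unit")
    if unit is None or isinstance(unit, str):
        return metric  # nothing malformed, no policy needed
    if policy is BadUnitPolicy.DROP:
        return None
    if policy is BadUnitPolicy.COERCE:
        return {**metric, "unit": str(unit)}
    return metric  # PASS_THROUGH


bad = {"name": "button.clicks", "type": "counter", "value": 1, "unit": 5}
print(apply_unit_policy(bad, BadUnitPolicy.DROP))            # None
print(apply_unit_policy(bad, BadUnitPolicy.COERCE)["unit"])  # 5 (as the string '5')
```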
- `sdk/telemetry` for metrics
- `./sdk/data-model/envelope-items/` for the trace metric item

Took the logs specs as a template.
ref getsentry/sentry-javascript#17883