Description
What version of OpenTelemetry are you using?
@opentelemetry/[email protected]@opentelemetry/[email protected]
What version of Node are you using?
Reproduced on:
- 18.18.2
- 20.9.0
What did you do?
When instrumented code uses fetch in a non-idiomatic manner, the tls instrumentation attempts to do an action on a closed tls.connect span.
For example, instrumenting the following code:
fetch('https://example.com').then(() => {
// A
console.log('got a response');
});
// let the fetch timeout, hitting B
setTimeout(() => {}, 10_000);

The fetch times out since the response is not properly handled.
Point A is called when the CONNECT event is emitted, handled inside otel here:
opentelemetry-js-contrib/plugins/node/opentelemetry-instrumentation-net/src/instrumentation.ts
Lines 122 to 143 in de6156a
const otelTlsSpanListener = () => {
  const peerCertificate = socket.getPeerCertificate(true);
  const cipher = socket.getCipher();
  const protocol = socket.getProtocol();
  const attributes = {
    [TLSAttributes.PROTOCOL]: String(protocol),
    [TLSAttributes.AUTHORIZED]: String(socket.authorized),
    [TLSAttributes.CIPHER_NAME]: cipher.name,
    [TLSAttributes.CIPHER_VERSION]: cipher.version,
    [TLSAttributes.CERTIFICATE_FINGERPRINT]: peerCertificate.fingerprint,
    [TLSAttributes.CERTIFICATE_SERIAL_NUMBER]: peerCertificate.serialNumber,
    [TLSAttributes.CERTIFICATE_VALID_FROM]: peerCertificate.valid_from,
    [TLSAttributes.CERTIFICATE_VALID_TO]: peerCertificate.valid_to,
    [TLSAttributes.ALPN_PROTOCOL]: '',
  };
  if (socket.alpnProtocol) {
    attributes[TLSAttributes.ALPN_PROTOCOL] = socket.alpnProtocol;
  }
  tlsSpan.setAttributes(attributes);
  tlsSpan.end();
};
Point B is called when the ERROR event is emitted, handled here:
opentelemetry-js-contrib/plugins/node/opentelemetry-instrumentation-net/src/instrumentation.ts
Lines 145 to 151 in de6156a
const otelTlsErrorListener = (e: Error) => {
  tlsSpan.setStatus({
    code: SpanStatusCode.ERROR,
    message: e.message,
  });
  tlsSpan.end();
};
First A is emitted, setting some attributes and ending the span. Then B is hit, which attempts to set a status and close the span. That's invalid, since the span has already ended. With OTEL_LOG_LEVEL set to info, it prints something like the following:
Can not execute the operation on ended Span {traceId: 05b3d14d94bff06cd612b28e3df51afe, spanId: 0474f505a765a41f}
Can not execute the operation on ended Span {traceId: 05b3d14d94bff06cd612b28e3df51afe, spanId: 0474f505a765a41f}
tls.connect 05b3d14d94bff06cd612b28e3df51afe-0474f505a765a41f - You can only call end() on a span once.
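The ordering problem can be illustrated without any network traffic. The following is only a minimal sketch: `RecordingSpan` and `replayConnectThenError` are hypothetical names, the span is a stand-in rather than the real OpenTelemetry `Span`, and `'secureConnect'` is assumed to be the event that corresponds to point A. It registers both listeners the way the snippets above do, then emits the two events in the order described.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical minimal span: records operations attempted after end().
class RecordingSpan {
  ended = false;
  lateOperations: string[] = [];

  setStatus(_message: string): void {
    if (this.ended) {
      this.lateOperations.push('setStatus');
    }
  }

  end(): void {
    if (this.ended) {
      this.lateOperations.push('end');
      return;
    }
    this.ended = true;
  }
}

// Replays "A, then B" on a plain EventEmitter standing in for the TLS socket.
function replayConnectThenError(): string[] {
  const socket = new EventEmitter();
  const span = new RecordingSpan();

  // Point A: the handshake completes and the span is ended.
  socket.once('secureConnect', () => span.end());

  // Point B: the error listener is still attached and touches the same span.
  socket.once('error', (e: Error) => {
    span.setStatus(e.message);
    span.end();
  });

  socket.emit('secureConnect');
  socket.emit('error', new Error('timeout'));
  return span.lateOperations;
}
```

In this sketch both calls made by the error listener land on an already-ended span, because nothing detaches that listener once the connect listener has run.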
What did you expect to see?
The tls instrumentation handles these scenarios without stepping on its own toes.
What did you see instead?
When a fetch times out, the tls instrumentation attempts operations on a span that has already ended.
Possible solutions
A couple of options (of course, more are possible):
- Clear the event listeners inside the connect handler
- Wrap the entire tls connection in a span of its own, where tls.connect is a child span. Subsequent errors hit the longer parent span
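The first option could look roughly like the sketch below. It is not the instrumentation's actual code: `FakeSpan` and `attachTlsListeners` are hypothetical names, the span is a stand-in that only counts calls, and `'secureConnect'` is assumed to be the relevant event. Each listener detaches the other before ending the span, so whichever fires first is the only one that touches it.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical stand-in for a span; it only counts calls.
class FakeSpan {
  endCalls = 0;
  status?: string;

  setStatus(message: string): void {
    this.status = message;
  }

  end(): void {
    this.endCalls += 1;
  }
}

// Attaches connect/error listeners that clear each other on first use.
function attachTlsListeners(socket: EventEmitter, span: FakeSpan): void {
  const onSecureConnect = () => {
    // Handshake finished: later socket errors no longer belong to this span.
    socket.removeListener('error', onError);
    span.end();
  };

  const onError = (e: Error) => {
    // Handshake failed: the connect listener must not fire afterwards.
    socket.removeListener('secureConnect', onSecureConnect);
    span.setStatus(e.message);
    span.end();
  };

  socket.once('secureConnect', onSecureConnect);
  socket.once('error', onError);
}
```

With this shape, an error emitted after a successful handshake is left to whatever other error handlers the socket has. The second option (a longer-lived parent span) would instead record such late errors on the parent.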
I'm more than willing to create a follow-up PR for this.