tls instrumentation in @opentelemetry/instrumentation-net tries to operate on closed span #1775

@Zirak

What version of OpenTelemetry are you using?

What version of Node are you using?

Reproduced on:

  • 18.18.2
  • 20.9.0

What did you do?

When instrumented code uses fetch in a non-idiomatic manner, the tls instrumentation attempts to operate on a tls.connect span that has already ended.

For example, instrumenting the following code:

fetch('https://example.com').then(() => {
    // A
    console.log('got a response');
});

// keep the process alive until the fetch times out, hitting B
setTimeout(() => {}, 10_000);

The fetch times out because the response is not properly handled: the callback never reads the response body.

Point A is reached when the CONNECT event is emitted, which otel handles here:

const otelTlsSpanListener = () => {
  const peerCertificate = socket.getPeerCertificate(true);
  const cipher = socket.getCipher();
  const protocol = socket.getProtocol();
  const attributes = {
    [TLSAttributes.PROTOCOL]: String(protocol),
    [TLSAttributes.AUTHORIZED]: String(socket.authorized),
    [TLSAttributes.CIPHER_NAME]: cipher.name,
    [TLSAttributes.CIPHER_VERSION]: cipher.version,
    [TLSAttributes.CERTIFICATE_FINGERPRINT]: peerCertificate.fingerprint,
    [TLSAttributes.CERTIFICATE_SERIAL_NUMBER]: peerCertificate.serialNumber,
    [TLSAttributes.CERTIFICATE_VALID_FROM]: peerCertificate.valid_from,
    [TLSAttributes.CERTIFICATE_VALID_TO]: peerCertificate.valid_to,
    [TLSAttributes.ALPN_PROTOCOL]: '',
  };
  if (socket.alpnProtocol) {
    attributes[TLSAttributes.ALPN_PROTOCOL] = socket.alpnProtocol;
  }
  tlsSpan.setAttributes(attributes);
  tlsSpan.end();
};

Point B is reached when the ERROR event is emitted, which is handled here:

const otelTlsErrorListener = (e: Error) => {
  tlsSpan.setStatus({
    code: SpanStatusCode.ERROR,
    message: e.message,
  });
  tlsSpan.end();
};

First A is hit, setting some attributes and ending the span. Then B is hit, which attempts to set a status and end the span a second time. That's invalid, since the span has already ended. With OTEL_LOG_LEVEL set to info, it prints something like the following:

Can not execute the operation on ended Span {traceId: 05b3d14d94bff06cd612b28e3df51afe, spanId: 0474f505a765a41f}
Can not execute the operation on ended Span {traceId: 05b3d14d94bff06cd612b28e3df51afe, spanId: 0474f505a765a41f}
tls.connect 05b3d14d94bff06cd612b28e3df51afe-0474f505a765a41f - You can only call end() on a span once.
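The sequence can be sketched without OpenTelemetry or a network connection. This is only an illustration under assumptions: `FakeSpan` and `simulateUnguardedListeners` are hypothetical names, the fake span merely records a warning for each operation attempted after `end()`, and the event names `secureConnect` and `error` stand in for the socket events the instrumentation listens to.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical stand-in for a span: it records a warning for every
// operation attempted after end(), loosely mimicking the log output above.
class FakeSpan {
  ended = false;
  warnings: string[] = [];

  setStatus(_status: { code: number; message?: string }): void {
    if (this.ended) {
      this.warnings.push('Can not execute the operation on ended Span');
    }
  }

  end(): void {
    if (this.ended) {
      this.warnings.push('You can only call end() on a span once.');
      return;
    }
    this.ended = true;
  }
}

// Registers two independent listeners, as in the snippets above, then
// emits the success event followed by the error event.
function simulateUnguardedListeners(): string[] {
  const socket = new EventEmitter();
  const span = new FakeSpan();

  // A: the connection succeeds and the span is ended.
  socket.once('secureConnect', () => span.end());

  // B: a later error still reaches its listener, which touches the ended span.
  socket.once('error', (e: Error) => {
    span.setStatus({ code: 2, message: e.message }); // 2 is SpanStatusCode.ERROR
    span.end();
  });

  socket.emit('secureConnect');
  socket.emit('error', new Error('simulated timeout'));
  return span.warnings;
}
```

Calling `simulateUnguardedListeners()` returns one warning for the late `setStatus` call and one for the second `end()` call, because nothing removes the error listener once the span has ended.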

What did you expect to see?

The tls instrumentation handles these scenarios without stepping on its own toes.

What did you see instead?

When a fetch times out, the tls instrumentation tries to operate on an ended span.

Possible solutions

A couple of options (of course, more are possible):

  1. Clear the event listeners inside the connect handler
  2. Wrap the entire tls connection in a span of its own, with tls.connect as a child span. Subsequent errors are then recorded on the longer-lived parent span

I'm more than willing to create a follow-up PR for this.
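Option 1 could look roughly like the following. It is a minimal sketch, not the instrumentation's actual code: `SpanLike`, `attachGuardedTlsListeners` and `demoGuardedListeners` are hypothetical names, and the event names `secureConnect` and `error` are assumptions about which socket events are involved. Each listener removes the other before touching the span, so the span is ended by whichever fires first.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical minimal span shape, just enough for this sketch.
interface SpanLike {
  setStatus(status: { code: number; message?: string }): void;
  end(): void;
}

// Attaches success and error listeners that detach each other, so only
// the first event to fire operates on the span.
function attachGuardedTlsListeners(socket: EventEmitter, span: SpanLike): void {
  const onSecureConnect = () => {
    socket.removeListener('error', onError);
    span.end();
  };
  const onError = (e: Error) => {
    socket.removeListener('secureConnect', onSecureConnect);
    span.setStatus({ code: 2, message: e.message }); // 2 is SpanStatusCode.ERROR
    span.end();
  };
  socket.once('secureConnect', onSecureConnect);
  socket.once('error', onError);
}

// Small demonstration: counts the calls that reach the span when the
// success event is followed by an error.
function demoGuardedListeners(): { endCalls: number; statusCalls: number } {
  const counts = { endCalls: 0, statusCalls: 0 };
  const span: SpanLike = {
    setStatus: () => { counts.statusCalls += 1; },
    end: () => { counts.endCalls += 1; },
  };
  const socket = new EventEmitter();
  attachGuardedTlsListeners(socket, span);

  // Stands in for the application's own error handling; an EventEmitter
  // throws when 'error' is emitted with no listeners left.
  socket.on('error', () => {});

  socket.emit('secureConnect');
  socket.emit('error', new Error('simulated timeout'));
  return counts;
}
```

With this arrangement a late error is still delivered to the application's own listeners, but it no longer reaches the ended span. The trade-off is that errors occurring after a successful handshake are not recorded on the tls.connect span at all, which is the gap option 2 is meant to cover.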

Metadata


Labels

  • bug: Something isn't working
  • pkg:instrumentation-net
  • priority:p4: Bugs and spec inconsistencies which do not fall into a higher prioritization
