carterkozak
Contributor

This allows us to track more specific context through the platform

==COMMIT_MSG==
Conjure User-Agents supports arbitrary comment data
==COMMIT_MSG==

I need to put some more thought into this. A few areas where I think we may want to do things differently:

  • I've replaced nodeId with comments, which can take multiple arbitrary values. nodeId was set on the top-level UserAgent, not on a per-agent basis, so the new comments are set on the top-level type and encoded just after the primary agent as well. I think we should move the comments to Agent instead, allowing each component to provide context.
  • I'm encoding the comments as a list of strings. In practice the user-agent rfc considers the comment a single value, but we should be more prescriptive. We could require key-value pairs, as I expect that will be how most of these comments are used (much like nodeId), however there are quite a few examples in the wild where segments don't map nicely to key/value pairs. Each segment may have its own preferred delimiters/sequences/etc.

In general, I like that we retain more of the original info from browser user-agents, which might not be possible if we force key/value pairs. We could bias the API toward key/value pairs in a way that doesn't exclude other strings (e.g. an empty value to represent a single string), but that would make parsing more complex and risk mutations when we parse and re-encode (e.g. a comment key containing the delimiter character would be split, with part of the key moving into the value).
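
To make the round-trip concern concrete, here is a minimal sketch, not the library's actual encoder. The class and method names are hypothetical, and the `name/version (c1; c2)` layout with a naive split on the first `:` is an assumption for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and format are assumptions, not the real implementation. */
final class CommentRoundTrip {
    /** Formats "name/version (c1; c2)", omitting the parentheses when there are no comments. */
    static String encode(String name, String version, List<String> comments) {
        String agent = name + "/" + version;
        return comments.isEmpty() ? agent : agent + " (" + String.join("; ", comments) + ")";
    }

    /** Extracts the trimmed, non-empty segments of the first parenthesized group, if any. */
    static List<String> decodeComments(String userAgent) {
        List<String> comments = new ArrayList<>();
        int open = userAgent.indexOf('(');
        int close = userAgent.indexOf(')', open + 1);
        if (open < 0 || close < 0) {
            return comments;
        }
        for (String segment : userAgent.substring(open + 1, close).split(";")) {
            String trimmed = segment.trim();
            if (!trimmed.isEmpty()) {
                comments.add(trimmed);
            }
        }
        return comments;
    }

    /** Naive key/value split on the first ':'; a key that itself contains ':' is mutated. */
    static String[] splitKeyValue(String comment) {
        int idx = comment.indexOf(':');
        return idx < 0
                ? new String[] {comment, ""}
                : new String[] {comment.substring(0, idx), comment.substring(idx + 1)};
    }
}
```

With plain strings, the comment list survives encode and decode unchanged. With the naive key/value split, a comment whose intended key was `a:b` and value `c` comes back as key `a` and value `b:c`, which is the kind of mutation described above.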

@changelog-app

changelog-app bot commented Jan 30, 2025

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
    and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the
    operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
    performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Conjure User-Agents support arbitrary comment metadata for observability, similar to the existing nodeId parameter.

@fsamuel-bs

Some context:

  • the main goal is to add metadata about the resource we're accessing in this request. Often we need more than one resource type/identifier pair to accurately identify the resource. Currently we encode it as ... <resource-type>/<resource-identifier> <resource-type>/<resource-identifier>
  • We already have infrastructure (see "Introduce ForUserAgent", tracing-java#790) to propagate this user-agent in calls with the same trace metadata.

A few comments:

I'm encoding the comments as a list of strings. In practice the user-agent rfc considers the comment a single value, but we should be more prescriptive. We could require key-value pairs, as I expect that will be how most of these comments are used (much like nodeId), however there are quite a few examples in the wild where segments don't map nicely to key/value pairs.

if we go forward with key-value pairs then I'd propose we model it as a list of key-value pairs, and not as a map. I.e. don't require the keys to be unique. It's quite common for us to have the same "key" with different values in our internal requests.

I don't have a strong opinion if we should make this key-value pairs or just strings though - we don't currently parse them, and it is easy enough to parse <resource-type>:() if I want to parse all my resource identifiers, as long as we agree on delimiters (e.g. white-space).
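
As a rough illustration of the "list of pairs, not a map" idea, the sketch below parses whitespace-delimited `<resource-type>/<resource-identifier>` tokens into a list that tolerates repeated keys. The class name and the `dataset`/`folder` resource types in the usage note are made up for the example; nothing here is an existing API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical consumer-side parser; the token format is taken from the encoding described above. */
final class ResourceComments {
    /** Splits on whitespace, then on the first '/'; tokens without both halves are skipped. */
    static List<Map.Entry<String, String>> parse(String comment) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String token : comment.trim().split("\\s+")) {
            int slash = token.indexOf('/');
            if (slash > 0 && slash < token.length() - 1) {
                // A list (not a map) keeps every pair, even when the same key repeats.
                pairs.add(Map.entry(token.substring(0, slash), token.substring(slash + 1)));
            }
        }
        return pairs;
    }
}
```

For example, `ResourceComments.parse("dataset/abc dataset/def folder/xyz")` yields three entries, two of which share the key `dataset`; a map keyed on resource type would have kept only one of them.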

I like that we retain more of the original info from browser user-agents

Note that for my use-case, I don't really care about the browser user-agents. In fact, they're noise. I care about which resource is being accessed, not by which client or browser (which kind of indicates we're extending the usability of user-agent to maybe not what it was designed to do).

Each segment may have its own preferred delimiters/sequences/etc.

In fact, we're currently using internally /, which in fact is not supported by the user-agent RFC. 🤦

I think we should move the comments to Agent instead, allowing each component to provide context.

I agree.

@carterkozak
Contributor Author

if we go forward with key-value pairs then I'd propose we model it as a list of key-value pairs, and not as a map. I.e. don't require the keys to be unique. It's quite common for us to have the same "key" with different values in our internal requests.
I don't have a strong opinion if we should make this key-value pairs or just strings though - we don't currently parse them, and it is easy enough to parse <resource-type>:() if I want to parse all my resource identifiers, as long as we agree on delimiters (e.g. white-space).

I tend to avoid using maps in apis, for uniqueness constraints as well -- I like the flexibility of a simple list of strings instead of dealing with key/value pair delimiters which may not be sufficient for all types of key/value pair. Let's stick with List<String> and allow consumers to handle the data however best suits them.

Note that for my use-case, I don't really care about the browser user-agents.

I'm not suggesting that the additional metadata is useful for your use case, but rather that by implementing a feature that is helpful for you, we may be able to more closely align with how user-agents are constructed and used more broadly. This gives me confidence that the feature will be supportable without causing issues regardless of whether/how it's used for this particular use-case.

In fact, they're noise. I care about which resource is being accessed, not by which client or browser (which kind of indicates we're extending the usability of user-agent to maybe not what it was designed to do).

This is a fair caveat -- the use-case that we discussed for this metadata a few weeks ago did have better alignment, describing state/intent of a caller rather than what specifically they were attempting to access.

In fact, we're currently using internally /, which in fact is not supported by the user-agent RFC. 🤦

That's not how I interpreted the RFC; slashes seem entirely valid (and common) within both the agent and comment components. See examples at https://useragents.io/

Comment on lines +48 to +49
private static final Splitter SEMICOLON_SPLITTER =
        Splitter.on(CharMatcher.is(';').precomputed()).trimResults().omitEmptyStrings();
Contributor Author

In the most recent commit, I've updated this to no longer split on commas. I haven't found examples where the comma is used as a delimiter like semicolon, but it is used within the common string KHTML, like Gecko.
Everywhere that we encoded nodeId was a single component, and that wouldn't be impacted (though I don't think we actually use the nodeId component anywhere these days): (nodeId:value)
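
For readers without the diff at hand, here is a standard-library approximation of what splitting on semicolons only does to a comment. It is a sketch, not the code in this PR: the class name is hypothetical, and `String.trim()` is only roughly equivalent to Guava's `trimResults()`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the semicolon splitter shown in the diff. */
final class SemicolonSplit {
    /** Splits on ';' only, trims each segment, and drops empty segments. */
    static List<String> splitComment(String comment) {
        List<String> segments = new ArrayList<>();
        for (String part : comment.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                segments.add(trimmed);
            }
        }
        return segments;
    }
}
```

Because commas are not delimiters here, `KHTML, like Gecko` stays together as one segment, and a single-component comment such as `nodeId:value` is returned unchanged as a one-element list.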

@carterkozak carterkozak marked this pull request as ready for review February 3, 2025 21:50
* for additional diagnostic information. Note that this library provides a much stricter set of allowed
* characters within comments than the linked RFCs to reduce complexity.
*/
List<@Safe String> comments();
Contributor

do we need to validate comments in the check() method as well?

Contributor Author

I don't think so, there are two factory methods, only one takes comments, and it validates them before passing args. This way we avoid unnecessary work in the common case.

Contributor

@bjlaub bjlaub Feb 4, 2025

sure, but the same could be said about name and version, no? should we just remove the check() method? i think this is more of a question of if anyone does/will use ImmutableAgent.builder() directly.

This way we avoid unnecessary work in the common case.

what's the "common case" here? using the factory that does not accept comments?

Contributor Author

i think this is more of a question of if anyone does/will use ImmutableAgent.builder() directly.

Right, ImmutableAgent is package private, so it can't be used from outside of this module (assuming no package-clobbering hackery)

what's the "common case" here? using the factory that does not accept comments?

Correct, the existing factory method that folks already use today, which does not accept comments.

Contributor

avoid unnecessary work

isn't this just a conditional on comments().size()?

Contributor

ImmutableAgent is package private

ah okay, i missed that

Contributor Author

Yep!

In general I'm a bit biased against Immutables @Check methods as well, since they add a public method to the interface's API which isn't meant to be called by consumers of the interface.
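
The shape discussed in this thread can be sketched roughly as follows: two factory methods, only one of which accepts comments and validates them before construction, with a package-private implementation so that callers cannot bypass the factories. All names and the validation rule are hypothetical, and validation of name and version is omitted; this is not the code merged in the PR.

```java
import java.util.List;

/**
 * Hypothetical stand-in for the agent type. In a real library the interface would be
 * public; it is left non-public here only so the sketch fits in one file.
 */
interface SketchAgent {
    String name();

    String version();

    List<String> comments();

    /** Common case: no comments, so there is nothing extra to validate. */
    static SketchAgent of(String name, String version) {
        return new SketchAgentImpl(name, version, List.of());
    }

    /** Less common case: each comment is validated once, before construction. */
    static SketchAgent of(String name, String version, List<String> comments) {
        for (String comment : comments) {
            SketchAgentImpl.checkComment(comment);
        }
        return new SketchAgentImpl(name, version, List.copyOf(comments));
    }
}

/** Package-private implementation: only reachable through the factories above. */
final class SketchAgentImpl implements SketchAgent {
    private final String name;
    private final String version;
    private final List<String> comments;

    SketchAgentImpl(String name, String version, List<String> comments) {
        this.name = name;
        this.version = version;
        this.comments = comments;
    }

    /** Assumed rule for illustration only: non-empty, and no '(', ')' or ';'. */
    static void checkComment(String comment) {
        if (comment.isEmpty()
                || comment.indexOf('(') >= 0
                || comment.indexOf(')') >= 0
                || comment.indexOf(';') >= 0) {
            throw new IllegalArgumentException("Invalid comment: " + comment);
        }
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public String version() {
        return version;
    }

    @Override
    public List<String> comments() {
        return comments;
    }
}
```

Keeping validation in the factory rather than in a check method on the interface avoids adding a public method that consumers are not meant to call, at the cost of relying on the implementation staying package-private.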

@bulldozer-bot bulldozer-bot bot merged commit e77c751 into develop Feb 4, 2025
5 checks passed
@bulldozer-bot bulldozer-bot bot deleted the ckozak/agent_arbitrary_comment_support branch February 4, 2025 16:38
@autorelease3

autorelease3 bot commented Feb 4, 2025

Released 2.59.0

4 participants