@colega colega commented Jul 14, 2025

What this PR does:

This adds a new MultiPartitionInstanceRing as an alternative to PartitionInstanceRing, in which instances can own more than one partition. It uses the same PartitionRing desc, and we leverage that by setting a different instance ID as the owner of each partition.

The convention is encapsulated in this implementation and consists of adding a /<partition> suffix to the instance ID when it owns a partition. We initially added this as an option to PartitionInstanceRing; however, most of the code actually differs, and the fact that you could instantiate a PartitionInstanceRing and use it in a multi-partition context introduced subtle bugs in the code. By adding a specific type for this, we enforce that each implementation consistently uses one or the other.
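
As an illustration of the suffix convention only, here is a minimal sketch. The helper name `multiPartitionOwnerID` and the `int32` partition ID type are assumptions made for this example, not the names used in this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// multiPartitionOwnerID is a hypothetical helper (not part of this PR's API).
// It appends "/<partition>" to the instance ID, so that one instance can be
// registered as the owner of several partitions under distinct owner IDs.
func multiPartitionOwnerID(instanceID string, partitionID int32) string {
	return instanceID + "/" + strconv.Itoa(int(partitionID))
}

func main() {
	// The same instance owning partitions 0 and 3 yields two owner IDs.
	for _, p := range []int32{0, 3} {
		fmt.Println(multiPartitionOwnerID("ingester-1", p))
	}
}
```

Because each owner ID is unique per partition, the existing desc can presumably stay unchanged while a single instance appears as the owner of multiple partitions.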

We do add this as an option to PartitionInstanceLifecycler, because there most of the code is shared.

PartitionRing exposes a new method MultiPartitionOwnerIDs that will list the real (non-suffixed) instance IDs of partition owners ready to be used by any code.

Finally, we also add a method to PartitionRingEditor to remove an instance ID as an owner of a partition, required to perform the periodic cleanup of the stale instances in the ring by users of this library.

Checklist

  • Tests updated

@colega colega changed the title from "Feat ring support for owning multiple partitions" to "feat(ring): support for owning multiple partitions" Jul 14, 2025
@colega colega force-pushed the feat-ring-support-for-owning-multiple-partitions branch from 578c87f to f922d4b Compare July 14, 2025 15:09
Co-authored-by: Marco Pracucci <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>
@colega colega force-pushed the feat-ring-support-for-owning-multiple-partitions branch from f922d4b to e312311 Compare July 14, 2025 15:11
@pracucci pracucci left a comment

Good job! I only left nits.

func (r *MultiPartitionInstanceRing) PartitionRing() *PartitionRing {
	return r.partitionsRingReader.PartitionRing()
}

Contributor

[nit] I suggest adding InstanceRing(), as in PartitionInstanceRing. It's legitimate to get the instances ring from MultiPartitionInstanceRing.

Suggested change
func (r *MultiPartitionInstanceRing) InstanceRing() InstanceRingReader {
	return r.instancesRing
}

Contributor Author

I explicitly left it unimplemented because we don't use it in the implementation. If I added it, I'd feel forced to use it, and I think that could make unit testing harder.

I'd rather leave it like this for now.


// this method expects instanceIDs to be in the same order as instances.
// instanceIDs should hold the parsed multi-partition owner IDs.
func highestPreferrablyNonReadOnlyFromEachZone(instances []InstanceDesc, instanceIDs []string, instanceZones []string) []InstanceDesc {
Contributor

[nit] Typo

Suggested change
- func highestPreferrablyNonReadOnlyFromEachZone(instances []InstanceDesc, instanceIDs []string, instanceZones []string) []InstanceDesc {
+ func highestPreferablyNonReadOnlyFromEachZone(instances []InstanceDesc, instanceIDs []string, instanceZones []string) []InstanceDesc {

Contributor Author

Done in d2b489d

}, nil
}

// this method expects instanceIDs to be in the same order as instances.
Contributor

[nit] Please also comment that instances is updated in-place and then returned.
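
To illustrate why such a comment helps callers, here is a generic sketch of the in-place filtering idiom in Go. The `instance` type and `keepNonReadOnly` function are hypothetical and do not reproduce the method under review; they only show how a slice that is updated in place and then returned affects the caller's original slice.

```go
package main

import "fmt"

// instance is a hypothetical stand-in type for this example only.
type instance struct {
	ID       string
	ReadOnly bool
}

// keepNonReadOnly filters the slice in place: it reuses the backing array of
// in and returns a shortened slice that aliases it. The caller's original
// slice is overwritten as a side effect.
func keepNonReadOnly(in []instance) []instance {
	out := in[:0]
	for _, inst := range in {
		if !inst.ReadOnly {
			out = append(out, inst)
		}
	}
	return out
}

func main() {
	in := []instance{{"a", true}, {"b", false}, {"c", false}}
	out := keepNonReadOnly(in)
	fmt.Println(out) // [{b false} {c false}]
	fmt.Println(in)  // [{b false} {c false} {c false}]
}
```

After the call, only the returned slice is meaningful, which is the kind of contract a doc comment should state.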

}

for i, ownerID := range ids {
	if p := strings.IndexByte(ownerID, '/'); p != -1 {
Contributor

This searches for the first / occurrence, but so far I don't see a reason why / couldn't also be part of the instance ID. I think using strings.LastIndexByte() would be more correct.

Contributor Author

Right, that's also more efficient. Fixed and added a test with an instance ID with a slash. 1c09d6b
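
The difference discussed in this thread can be sketched as follows. `splitOwnerID` is a hypothetical helper written for this example and is not the code from the PR. It assumes the partition suffix is whatever follows the last slash.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOwnerID is a hypothetical helper: it splits "<instance>/<partition>"
// at the LAST '/', so instance IDs that themselves contain slashes survive.
// ok is false when the owner ID carries no suffix at all.
func splitOwnerID(ownerID string) (instanceID, suffix string, ok bool) {
	p := strings.LastIndexByte(ownerID, '/')
	if p == -1 {
		return ownerID, "", false
	}
	return ownerID[:p], ownerID[p+1:], true
}

func main() {
	const ownerID = "zone-a/ingester-1/3"

	id, suffix, ok := splitOwnerID(ownerID)
	fmt.Println(id, suffix, ok) // zone-a/ingester-1 3 true

	// Cutting at the first slash would truncate the instance ID instead.
	first := strings.IndexByte(ownerID, '/')
	fmt.Println(ownerID[:first]) // zone-a
}
```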

@colega colega merged commit 7fac058 into main Jul 15, 2025
13 of 14 checks passed
@colega colega deleted the feat-ring-support-for-owning-multiple-partitions branch July 15, 2025 11:58
salvacorts added a commit that referenced this pull request Aug 4, 2025
… arch compatibility (#730)

Loki is failing to build for the linux/arm platform with ([see][1]):
```
59.88 # github.com/grafana/dskit/ring
59.88 vendor/github.com/grafana/dskit/ring/multi_partition_instance_ring.go:151:10: cannot use math.MaxInt64 (untyped int constant 9223372036854775807) as int value in return statement (overflows)
59.88 vendor/github.com/grafana/dskit/ring/multi_partition_instance_ring.go:155:10: cannot use math.MaxInt64 (untyped int constant 9223372036854775807) as int value in return statement (overflows)
```

linux/arm is a 32-bit platform, so int is 32 bits wide, but
`indexFromInstanceSuffix` returns `math.MaxInt64` instead of
`math.MaxInt`.

**What this PR does**:

**Which issue(s) this PR fixes**:

Fixes PR - #725

**Checklist**
- [ ] Tests updated

[1]:
https://github.com/grafana/loki/actions/runs/16723281173/job/47333518579#step:9:486