Skip to content

Conversation

@Fredi-raspall
Copy link
Contributor

@Fredi-raspall Fredi-raspall commented Oct 17, 2025

This rides on top of #903

Fixes #909

Note: this PR includes a couple of commits that extend the left-right readhandle thread-local cache with a refresh method and iterator that I thought would be needed to inspect the FIBs (e.g. from the CLI). In the end, those methods are not needed (currently), but I've decided to keep them even if not currently used. These extension require implementing a very simple additional trait method for the Fibtable.

@Fredi-raspall Fredi-raspall self-assigned this Oct 17, 2025
@Fredi-raspall Fredi-raspall requested a review from a team as a code owner October 17, 2025 16:26
@Fredi-raspall Fredi-raspall requested review from sergeymatov and removed request for a team October 17, 2025 16:26
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/fix_concurrency_fibtable branch from c42743c to 202bdad Compare October 17, 2025 16:41
@Fredi-raspall Fredi-raspall linked an issue Oct 17, 2025 that may be closed by this pull request
@Fredi-raspall Fredi-raspall added this to the GW R1 milestone Oct 17, 2025
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/tl_left_right_readhandles branch from c772a0d to 79683ab Compare October 17, 2025 17:11
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/fix_concurrency_fibtable branch from 202bdad to 021f73f Compare October 17, 2025 17:11
@mvachhar mvachhar requested review from Copilot and mvachhar October 17, 2025 17:34
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes concurrency issues in the FIB table by implementing a thread-local cache for read handles, replacing the direct lookup mechanism. The primary change is transitioning from the previous concurrent access pattern to using left-right-tlcache for efficient thread-local caching of FIB readers.

Key changes:

  • Replace FibId with FibKey throughout the codebase for better type semantics
  • Implement thread-local caching for FIB read handles using left-right-tlcache
  • Simplify FIB table API by removing direct Arc returns and using factories instead

Reviewed Changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
routing/src/rib/vrftable.rs Updated VRF table to use FibKey instead of FibId, simplified FIB access patterns
routing/src/fib/fibtype.rs Renamed FibId to FibKey, added Hash trait and Identity implementation for thread-local caching
routing/src/fib/fibtable.rs Major refactoring to use thread-local cache with FibReaderFactory pattern
routing/src/interfaces/*.rs Updated interface attachment logic to use FibKey
left-right-tlcache/src/lib.rs Extended cache with refresh and iterator functionality
dataplane/src/packet_processor/ipforward.rs Simplified packet forwarding by using cached FIB readers
routing/src/errors.rs Added new error types for FIB table access failures

Comment on lines 102 to 104
if !matches!(id, FibKey::Id(_)) {
panic!("Attempting to set invalid Id of {} to fib", id);
}
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The panic message contains a formatting placeholder {} but the id parameter is not passed to the format string. This will cause a runtime panic. Should be: panic!(\"Attempting to set invalid Id of {id} to fib\");

Copilot generated this review using guidance from repository custom instructions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move id to be inline here. Co-pilot is wrong about this causing a panic, but there is no reason not to inline this here.

assert_eq!(
fibtr.enter().unwrap().len(),
1,
"only defaylt fib should remain"
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'defaylt' to 'default'

Suggested change
"only defaylt fib should remain"
"only default fib should remain"

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address this comment too.

debug!("Vrf with Id {vrfid} no longer has a VNI associated");
debug!("Vrf with Id {vrfid} no longer has a vni {vni} associated");
} else {
debug!("Vrf {vrfid} has no vni configured")
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing semicolon at the end of the debug statement. Should be: debug!(\"Vrf {vrfid} has no vni configured\");

Suggested change
debug!("Vrf {vrfid} has no vni configured")
debug!("Vrf {vrfid} has no vni configured");

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix this co-pilot comment.

@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/tl_left_right_readhandles branch from 79683ab to eda6dab Compare October 17, 2025 21:03
A FibReaderFactory object allows creating FibReaders.

Signed-off-by: Fredi Raspall <[email protected]>
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/fix_concurrency_fibtable branch from 021f73f to 39dcde0 Compare October 17, 2025 21:03
Copy link
Contributor

@mvachhar mvachhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the unsafe coerced pointer casts in favor of a checked conversion.

Ok(())
}
}
impl Display for FibTable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the fields are owned, aren't they properly protected by left-right? And if so, can't this stay as is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can, but we're not using it, so I removed that Display impl.

debug!("Vrf with Id {vrfid} no longer has a VNI associated");
debug!("Vrf with Id {vrfid} no longer has a vni {vni} associated");
} else {
debug!("Vrf {vrfid} has no vni configured")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix this co-pilot comment.

assert_eq!(
fibtr.enter().unwrap().len(),
1,
"only defaylt fib should remain"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address this comment too.

This commit contains 3 changes:

1) Rename FibId -> FibKey and clarify usage. The old FibId is
   used in two ways: one, to identify a Fib; another, to look up
   fibs in the Fibtable. Rename it to FibKey since, while still
   used as an Id, the main purpose is to act as a key.
   Also, change some internal APIs to make sure that Fibs are
   identified by a single FibKey (based on the vrf id).

2) Restructure FibTable: FibReaders (which wrap ReadHandles<Fib>)
   cannot be shared among threads. Instead of storing a FibReader
   for each Fib, store a FibReaderFactory from which readers
   can be obtained on a per-thread basis. In this patch, every
   time a Fib needs to be accessed, a new FibReader is created,
   which is very inefficient due to the need for locking
   when creating ReadHandles. Subsequent commits will fix this.
   We also keep the Id of a Fib along with its factory. This is
   needed to be able to efficiently implement the traits needed
   for the readhandle cache. On this, note that:
   - the fibtable could keep fibreaders as it did, since new
     can be obtained via cloning. However, keeping instead
     factories protects us from errorneously handing off a reference
     to a FibReader to more than a thread.
   - if we would store fibreaders (and clone them on demand),
     we would not need to store the Id (since it could be obtained
     entering the fib via the readhandle), but the number of
     readers would be increased.

3) Elements in the new FibTable are stored within an Arc. This is
   to ensure that the compiler automatically determines that
   FibTable is Sync.

4) Avoid needing to derive/implement Clone() for FibTable by adding
   a dummy sync_from() Absorb implementation, by publishing on
   FibTable creation.

Signed-off-by: Fredi Raspall <[email protected]>
The thread-local cache of prior commit stores ReadHandle<Fib>.
However, the rest of the code uses transparent wrapper FibReader,
with additional methods. Implement AsRef so that we can have
FibReaders with zero cost.

Signed-off-by: Fredi Raspall <[email protected]>
Add left-right-tlcache as dependency to implement the fibtable
cache.

Implement trait Identity for Fib so that the thread-local cache of
ReadHandle<Fib>s can determine if a read handle points to the right
object or the entry needs to be invalidated.

Signed-off-by: Fredi Raspall <[email protected]>
Versioning the fibtable will allow us to determine the validity
of cached readhandles without needing to ask the provider, which
would add an extra lookup (e.g. hash or similar), on every
cache lookup. This patch adds the version for the FibTable and
the logic to monotonically increment its value anytime it is
updated. Note that version is u64 instead of AtomicU64 since
we don't need to synchronize multiple threads on its value.

Signed-off-by: Fredi Raspall <[email protected]>
Implement AsRef<ReadHandleFactory<Fib>> for FibReaderFactory so
that it can be treated as a ReadHandleFactory<Fib>.

Signed-off-by: Fredi Raspall <[email protected]>
Declare thread-local ReadHandleCache `FIBTABLE_CACHE`.
Implement trait ReadHandleProvider for FibTable so that it can
be polled by the cache, per thread.

We could implement ReadHandleProvider for FibTableReader instead.
However, implementing it for the FibTable is more general in that
it would also work if the FibTable itself was not wrapped in left-
right.

Signed-off-by: Fredi Raspall <[email protected]>
... from Rc<ReadHandle<Fib>>.

The cache stores ReadHandle<T>'s for some T. In the case of the
FibTable, T=Fib and we want to hand off FibReaders (a thin wrapper
of ReadHandle<Fib>) to the threads using the cache. Add a method
to perform such a conversion.

Signed-off-by: Fredi Raspall <[email protected]>
Add the main method to lookup FibReaders from a FibKey from the
thread-local cache and a FibTableReader. The method will check
the local cache to return FibReaders owned by the calling thread.
On failure, it will pull data from provider (fibtable) accessible
from the same FibTableReader.

Signed-off-by: Fredi Raspall <[email protected]>
Let the ipforward NF look up FibReader from thread-local cache.

Signed-off-by: Fredi Raspall <[email protected]>
Extend the thread-local cache with full refresh logic and
iterator. This requires adding a new method to trait
ReadHandleProvider.

Signed-off-by: Fredi Raspall <[email protected]>
impl ReadHandleProvider get_iter() for FibTable. This is needed to
allow the thread handling the cli to safely iterate over the fibs.

Signed-off-by: Fredi Raspall <[email protected]>
The existing iterator implementation is subobtimal in that each
call to get_iter() needs to request the provider a full iterator
for the whole set of read handle factories, unless refresh arg is
set to false, and then refresh the local cache with read handles
built from each factory. This is inefficient in case the set
of objects in the provider rarely changes (which may be the case
in most setups).

Avoid pulling an iterator from the provider if the version did
not change, as this means that no objects have been added or
removed. If anything, we remove unusable readers.

Signed-off-by: Fredi Raspall <[email protected]>
Fix the code so that a cache keeps the minimal number of rhandles
after a refresh. The previous code would keep an independent
rhandle for aliases, meaning that if there are N objects each with
one allias, 2 x N rhandles would exist after a refresh.
Solve this by letting aliases use the same rhandle as their
primary.

Signed-off-by: Fredi Raspall <[email protected]>
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/fix_concurrency_fibtable branch from 39dcde0 to 9a9c275 Compare October 20, 2025 18:04
@mvachhar mvachhar merged commit c0accdb into pr/fredi/tl_left_right_readhandles Oct 21, 2025
19 checks passed
@mvachhar mvachhar deleted the pr/fredi/fix_concurrency_fibtable branch October 21, 2025 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix fibtable concurrency

3 participants