Skip to content

Conversation

@mauropasse
Copy link
Collaborator

@mauropasse mauropasse commented Mar 7, 2024

Fixes for intra-process actions:

  • Don't throw in a normal situation of expired IPC Action client
  • Check if data in buffers before extract
  • Fix IPC Actions QoS depth
  • Fix data race on IPC Actions callbacks
  • Correct intra-process actions is_ready()
  • Use Goal ID as key to store client callbacks and responses from server
  • Use the FNV-1a hash algorithm for Goal UUID
  • Remove RCL_RET_CLIENT_TAKE_FAILED error log - add comment instead

The following (simplified) flowchart represents what's happening when the Action Client sends a goal request go the Action Server, until the server accepts and responds to the client:

auto goal_handle_future = action_client->async_send_goal(goal_msg, send_goal_options);
auto goal_handle = goal_handle_future.get();

image

The process is almost exactly the same for:

auto result_future = action_client->async_get_result(goal_handle);
auto wrapped_result = result_future.get();

and for cancel:

    auto cancel_result_future = action_client->async_cancel_goal(goal_handle);
    auto cancel_result = cancel_result_future.get();

In the following chart I show part of action_client->async_get_result but focusing on the server logic, which sends the result only if the client has requested for it:
image

{
feedback_buffer_->add(std::move(feedback));
gc_.trigger();
is_feedback_ready_ = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why these are removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the flag here was breaking the SingleThreadedExecutor, which already sets the is_*_ready_ flags on the is_ready() API.
For EventsExecutor, the flags are set on the take_data_by_entity_id() API.

@mauropasse
Copy link
Collaborator Author

mauropasse commented Mar 11, 2024

Some comments about how things work, I'll try to make it simple and not very long. (see new flowchart in comment below)

Every time the Client requests something to the Server, a callback is created: it will be called when the server sends back the response (using the response as argument).
After the callback is called, must be removed (in some cases), since they maintain in scope shared pointers.

We have 5 types of Action Client events (EventType), all of them with their ResponseCallback callback:

  • FeedbackSubscription -> Callback created on client constructor - no need to remove after call
  • StatusSubscription -> Callback created on client constructor - no need to remove after call.

For every individual goal sent we have:

  • GoalClient -> Callback created when async_send_goal - must be removed after call
  • ResultClient -> Callback created when async_get_result - must be removed after call
  • CancelClient -> Callback created when async_cancel_goal - must be removed after call

Since a client can make multiple requests, we need a storage for all the individual goal IDs, event types, their callbacks, and the unread_count.

I created a structure to hold all the info and data to process the different events, mapped with their respective "Goal ID". So we have in the map:

< Goal_ID, {EventType, ResponseCallback, unread_count} >

Besides this map, we have all the IPC ring-buffers to hold the responses from the server. They look like:

<Goal_ID, ServerResponse>

So when we extract an element (response) form the ring buffer, we get the Goal_ID and use it as key to extract the ResponseCallback and call it (and then remove it, if necessary).

@bpwilcox
Copy link

bpwilcox commented Mar 13, 2024

For a simple interaction client/server, there are usually 4 threads involved. That's an extra complication that required attention avoiding races

Could you elaborate which threads are involved and what each are doing for a client/server interaction?

@mauropasse
Copy link
Collaborator Author

Could you elaborate which threads are involved and what each are doing for a client/server interaction?

The threads involved are:

  1. Client app thread sending goal requests
  2. Client executor thread (sets the "on_ready_callback" and executes action client work)
  3. Server executor thread (puts responses into the client's ring buffers)
  4. Server thread (usually created to execute the goal, also puts responses into the client's ring buffers)

// Intra-process version of execute_goal_request_received_
// Missing: Deep comparison of functionality betwen IPC on/off
void
ipc_execute_goal_request_received(GoalRequestDataPairSharedPtr data)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this ipc function I still see calls to rcl such as rcl_action_get_zero_initialized_goal_info(). What is the goal_info? How is this relevant or necessary to call into rcl when going through ipc?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the rcl_action_goal_info_t goal_info is just used to obtain the rcl_goal_handle, to then update the goal state.
The "bookkeeping" of the goal state is still performed in rcl.

auto ipm = lock_intra_process_manager();

// Convert c++ message to C message
rcl_action_cancel_request_t cancel_request = rcl_action_get_zero_initialized_cancel_request();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question to my other comment, but why do we need to get cancel_request from rcl layer while doing ipc? I know this PR is built on top of previous work but I am missing the rationale for the interaction with the rcl layer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the communication (send request / responses / etc) goes through intra-process.
But the rest of the logic still lives in rcl, that is, we still use the rcl_handle which controls the goal state, etc.

In summary, all the rcl_action_send_* have their parallel intra_process_action_send_* versions. But the rest of the code is common.

@mauropasse mauropasse merged commit 9177002 into irobot-ros:irobot/humble Apr 9, 2024
@mauropasse mauropasse deleted the mauro/fix-ipc-actions-squashed branch April 9, 2024 14:56
apojomovsky pushed a commit to apojomovsky/rclcpp that referenced this pull request Jun 20, 2024
* Fixes for intra-process Actions

* Fixes for Clang builds

* Fix deadlock

* Server to store results until client requests them

* Fix feedback/result data race

See ros2#2451

* Add missing mutex

* Check return value of intra_process_action_send

---------

Co-authored-by: Mauro Passerino <[email protected]>
apojomovsky pushed a commit to apojomovsky/rclcpp that referenced this pull request Jun 21, 2024
* Fixes for intra-process Actions

* Fixes for Clang builds

* Fix deadlock

* Server to store results until client requests them

* Fix feedback/result data race

See ros2#2451

* Add missing mutex

* Check return value of intra_process_action_send

---------

Co-authored-by: Mauro Passerino <[email protected]>
apojomovsky pushed a commit to apojomovsky/rclcpp that referenced this pull request Aug 12, 2024
* Fixes for intra-process Actions

* Fixes for Clang builds

* Fix deadlock

* Server to store results until client requests them

* Fix feedback/result data race

See ros2#2451

* Add missing mutex

* Check return value of intra_process_action_send

---------

Co-authored-by: Mauro Passerino <[email protected]>
apojomovsky pushed a commit that referenced this pull request Aug 13, 2024
* Fixes for intra-process actions (#144)

* Fixes for intra-process Actions

* Fixes for Clang builds

* Fix deadlock

* Server to store results until client requests them

* Fix feedback/result data race

See ros2#2451

* Add missing mutex

* Check return value of intra_process_action_send

---------

Co-authored-by: Mauro Passerino <[email protected]>

* Fix IPC Actions data race (#147)

* Check if goal was sent through IPC before send responses
* Add intra_process_action_server_is_available API to intra-process Client


---------

Co-authored-by: Mauro Passerino <[email protected]>

* Fix data race in Actions: Part 2 (#148)

* Fix data race in Actions: Part 2

* Fix warning - copy elision

---------

Co-authored-by: Mauro Passerino <[email protected]>

* fix: Fixed race condition in action server between is_ready and take"… (ros2#2531)

* fix: Fixed race condition in action server between is_ready and take" (ros2#2495)

Some background information: is_ready, take_data and execute data
may be called from different threads in any order. The code in the old
state expected them to be called in series, without interruption.
This lead to multiple race conditions, as the state of the pimpl objects
was altered by the three functions in a non thread safe way.

Co-authored-by: William Woodall <[email protected]>
Signed-off-by: Janosch Machowinski <[email protected]>

* fix: added workaround for call to double calls to take_data

This adds a workaround for a known bug in the executor in iron.

Signed-off-by: Janosch Machowinski <[email protected]>

---------

Signed-off-by: Janosch Machowinski <[email protected]>
Co-authored-by: Janosch Machowinski <[email protected]>
Co-authored-by: William Woodall <[email protected]>

---------

Signed-off-by: Janosch Machowinski <[email protected]>
Co-authored-by: Mauro Passerino <[email protected]>
Co-authored-by: jmachowinski <[email protected]>
Co-authored-by: Janosch Machowinski <[email protected]>
Co-authored-by: William Woodall <[email protected]>
skyegalaxy added a commit that referenced this pull request Apr 3, 2025
action client / server ipc decrustification
skyegalaxy added a commit that referenced this pull request May 14, 2025
action client / server ipc decrustification
skyegalaxy added a commit that referenced this pull request May 19, 2025
action client / server ipc decrustification

whitespace / line length / uncrustify - service_intra_process

const correctness
skyegalaxy added a commit that referenced this pull request May 20, 2025
* Add logs on failed take response/request (#107)

* Ignore local endpoints (#131)

* Refs #18846: PoC ignore local endpoints: extend RCLCPP API

Signed-off-by: JLBuenoLopez-eProsima <[email protected]>

* Refs #18846: PoC ignore local endpoints: modify RCLCPP publish logic

Signed-off-by: JLBuenoLopez-eProsima <[email protected]>

---------

Signed-off-by: JLBuenoLopez-eProsima <[email protected]>
Co-authored-by: JLBuenoLopez-eProsima <[email protected]>
(cherry picked from commit 106c03a)

style

* Support intra-process communication: Clients & Services (ros2#1847)

Signed-off-by: Mauro Passerino <[email protected]>
(cherry picked from commit 58d2a04)

Remove redundant `add_subscription` cherry pick artifact

use const ref instead of ptr, add missing capacity fn

more style

* Jazzy recreation of "Add action client/server IPC support" (aeacde9)

* add override attribute to some ipc methods

Signed-off-by: Alberto Soragna <[email protected]>

* clear intra-process manager on client/server destructors (#94)

(cherry picked from commit 378223d)

* move ipc lock to appropriate position in client.hpp

(cherry picked from commit 9de603e)

* Actions: Use ipc_setting = rclcpp::IntraProcessSetting::NodeDefault (#133)

Co-authored-by: Mauro Passerino <[email protected]>

* Allow for deferred responses with ipc (#135)

* allow for deferred responses with ipc

* fix send response

* fix use member service ipc process

* add map to store CallbackInfoVariant

* add send_response to base class, set function ptr to get handle

* move service intra process outside base

* copy response into shared pointer for ipc

* add typename for service template

* remove ref signature

* try without std::ref

* try without ref wrapper

* try emplace

* make pair with variant

* some cleanup

* erase callback info if client invalid

* add post_init_setup for services

* fix extra comma

* add post init setup after lifecycle node services

* add documentation for ServiceIntraProcess template change

* use weak ptr to service in ServiceIntraProcess

* move check for valid service handle to beginning of function

* add comment for callback info map

(cherry picked from commit aa95a48)

* check request header for intraprocess (#139)

* check request header for intraprocess

* set request header intraprocess to false in execute_service

(cherry picked from commit a617f93)

* Include namespaces in service names (#140)

* Include node namespace in IPC Action service name
* Include node namespace in IPC Client/Service service name
---------
Co-authored-by: Mauro Passerino <[email protected]>

(cherry picked from commit f5b2001)

remove whitespace in service.hpp

* Fix mutltiple client requests (#142)

* store map of unique request id to client id and callback info pair

* fix map end check

* fix undefined reference

* remove unnecessary request id erase, remove/fix unique id comment

* improve unique id comment

(cherry picked from commit b865383)

* RECREATION OF Fixes for intra-process actions (#144)

action client / server ipc decrustification

whitespace / line length / uncrustify - service_intra_process

const correctness

* add logs and minor fixes (#146)

* add logs and minor fixes

Signed-off-by: Alberto Soragna <[email protected]>

* use >0 rather than ==1 in comparison

Signed-off-by: Alberto Soragna <[email protected]>

---------

Signed-off-by: Alberto Soragna <[email protected]>

* correct template syntax

Signed-off-by: Alberto Soragna <[email protected]>
(cherry picked from commit d4dd4e4)

* avoid adding notify waitable twice to events-executor collection (ros2#2564)

* avoid adding notify waitable twice to events-executor entities collection

Signed-off-by: Alberto Soragna <[email protected]>

* remove redundant mutex lock

Signed-off-by: Alberto Soragna <[email protected]>

---------

Signed-off-by: Alberto Soragna <[email protected]>
(cherry picked from commit f27bdbf)

* Fix bug in timers lifecycle for events executor (ros2#2586)

* Remove expired timers before updating the collection

Signed-off-by: Alexis Pojomovsky <[email protected]>

* Add regression test for reinitialized timers bug

Signed-off-by: Alexis Pojomovsky <[email protected]>

* Add missing includes

Signed-off-by: Alexis Pojomovsky <[email protected]>

* Relocate test under the executors directory

Signed-off-by: Alexis Pojomovsky <[email protected]>

* Extend test to run with all supported executors

Signed-off-by: Alexis Pojomovsky <[email protected]>

* Adjust comment in fix to make it more generic

Signed-off-by: Alexis Pojomovsky <[email protected]>

* Apply ament clang format to test

Signed-off-by: Alexis Pojomovsky <[email protected]>

* Fix uncrustify findings

Signed-off-by: Alexis Pojomovsky <[email protected]>

---------

Signed-off-by: Alexis Pojomovsky <[email protected]>
Co-authored-by: Alexis Pojomovsky <[email protected]>
(cherry picked from commit 9ef9646)

* Bring lock_free_events_queue to rclcpp from the events_executor repo (#149)

Co-authored-by: Alexis Pojomovsky <[email protected]>
(cherry picked from commit 30050c1)

exclude lock free queue from linting

actually lint lock_free_events_queue

lock_free_events_queue copyright

whitespace / line length / uncrustify - lock_free_events_queue

* Add test_actions (#150)

Co-authored-by: Alexis Pojomovsky <[email protected]>
(cherry picked from commit aef928d)

* Jazzy - Action IPC Fixes (See 7a51f00)

whitespace / line length / uncrustify - rclcpp_action client

whitespace / line length / uncrustify - rclcpp_action server

fix duplicate member definitions in rclcpp_action server.hpp

fixes for IPC action server

delete log

whitespace / line length / uncrustify - rclcpp action server again

minor action client fixes

whitespace / line length / uncrustify - rclcpp action client again

* Always publish inter-process on TRANSIENT_LOCAL pubs (#152)

* Mauro/irobot iron fixes (#155)

whitespace / line length / uncrustify - publisher

proper inter_process_publish_needed

whitespace / line length / uncrustify - rclcpp publisher

* Add test_actions & test_transient_local (#157)

Co-authored-by: Mauro Passerino <[email protected]>
(cherry picked from commit cf182e0)

whitespace / line length / uncrustify - test_actions and test_transient_local

whitespace / line length / uncrustify - test actions and transient_local again

remove redundant test_actions

* Call service post_init_setup in test_service.cpp

---------

Signed-off-by: Alberto Soragna <[email protected]>
Signed-off-by: Alexis Pojomovsky <[email protected]>
Co-authored-by: Mauro Passerino <[email protected]>
Co-authored-by: mauropasse <[email protected]>
Co-authored-by: Alberto Soragna <[email protected]>
Co-authored-by: Jeffery Hsu <[email protected]>
Co-authored-by: bpwilcox <[email protected]>
Co-authored-by: Alexis Pojomovsky <[email protected]>
Co-authored-by: Alexis Pojomovsky <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants