Add stochastic taxi (rainy+fickle) #1315

Merged

Conversation

foreverska
Contributor

Description

Adds rainy transition probabilities and fickle passenger to align environment with paper.

Fixes #161

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Hey @foreverska, thanks for the PR.

Generally the PR looks good.
Could you change np.random to self.np_random and revert the environment version increment? New features that don't affect default behaviour shouldn't require a version bump.

I also have a couple of questions before approving and merging:

  1. Is this backward compatible with default parameters? If it isn't currently, could we make it backward compatible?
  2. Is the probability of the behaviour reported in step and reset correctly?
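The request to use self.np_random instead of np.random comes down to reproducibility: a generator owned by the environment can be reseeded through reset(seed=...), while module-level random state is shared by everything in the process. The sketch below illustrates the idea with the standard library only. `ToyEnv`, `rollout`, and the `rng` attribute are hypothetical names, and `random.Random` is just a stand-in for the environment's own generator, not the Gymnasium implementation.

```python
import random


class ToyEnv:
    """Hypothetical environment; self.rng stands in for self.np_random."""

    def __init__(self):
        self.rng = random.Random()

    def reset(self, seed=None):
        # Reseeding replaces only this environment's generator.
        if seed is not None:
            self.rng = random.Random(seed)

    def sample_per_env(self):
        return self.rng.random()

    def sample_global(self):
        # Module-level state: not controlled by reset(seed=...).
        return random.random()


def rollout(env, seed, use_global, n=5):
    env.reset(seed=seed)
    draw = env.sample_global if use_global else env.sample_per_env
    return [draw() for _ in range(n)]


env = ToyEnv()
per_env_a = rollout(env, seed=42, use_global=False)
per_env_b = rollout(env, seed=42, use_global=False)
global_a = rollout(env, seed=42, use_global=True)
global_b = rollout(env, seed=42, use_global=True)

print(per_env_a == per_env_b)  # same seed, same draws
print(global_a == global_b)    # seed is ignored by the global generator
```

With the per-environment generator, two rollouts with the same seed produce identical draws; with the global generator, the seed passed to reset has no effect.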

@foreverska
Contributor Author

@pseudo-rnd-thoughts

Addressed Comments.

Is this backward compatible with default parameters

Yes, the default values for rainy and fickle are both False, so the default behavior is the same as it was before this commit.

Is the probability of the behaviour reported in step and reset correctly?

It matches the other ToyText environments (Cliff/Frozen) with step returning the probability of the taken transition and reset always returning 1. I added lines in the unit test to guard against regression.
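A minimal sketch of the convention described above, assuming a transition table of (probability, next_state, reward, terminated) tuples like the one in the diff and an info key named "prob". `TinyEnv` and its three-state table are hypothetical and exist only to show where the probability is reported.

```python
import random

# P[state][action] -> list of (probability, next_state, reward, terminated)
P = {
    0: {0: [(0.8, 1, -1, False), (0.1, 0, -1, False), (0.1, 2, -1, False)]},
    1: {0: [(1.0, 1, 0, True)]},
    2: {0: [(1.0, 2, 0, True)]},
}


class TinyEnv:
    """Hypothetical three-state environment, not the Taxi implementation."""

    def __init__(self):
        self.rng = random.Random()
        self.s = 0

    def reset(self, seed=None):
        if seed is not None:
            self.rng = random.Random(seed)
        self.s = 0
        # The initial state is not the result of a sampled transition here.
        return self.s, {"prob": 1.0}

    def step(self, action):
        transitions = P[self.s][action]
        weights = [t[0] for t in transitions]
        p, s, r, terminated = self.rng.choices(transitions, weights=weights, k=1)[0]
        self.s = s
        # Report the probability of the transition that was actually taken.
        return s, r, terminated, False, {"prob": p}


env = TinyEnv()
start_obs, reset_info = env.reset(seed=0)
obs, reward, terminated, truncated, step_info = env.step(0)
```

A regression test in this style can assert that the reset info always carries 1.0 and that the value returned by step matches one of the entries in the table for the state and action taken.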

Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

With the tests and documentation updates, this should be good to merge.
Thanks for making the changes

Collaborator

@Kallinteris-Andreas Kallinteris-Andreas left a comment

Looks good to me.

@foreverska
Contributor Author

@pseudo-rnd-thoughts I think this needs one more review since there was a commit after your last one. Thanks.

@pseudo-rnd-thoughts
Member

Hey @foreverska, sorry I'm on holiday currently. Looking over the PR again, I'm a tad worried about the is_rainy change.
Is there a way of making the code only run if is_rainy is true?

@foreverska
Contributor Author

Hey @foreverska, sorry I'm on holiday currently. Looking over the PR again, I'm a tad worried about the is_rainy change. Is there a way of making the code only run if is_rainy is true?

Please enjoy vacation and ignore this until you're good and rested.

Is the ask to restore the original code and switch to it when it's not rainy and switch to this new code if it is? Or is there a more pointed change you'd like to see?

@pseudo-rnd-thoughts
Member

Hi @foreverska, I'm back. Looking over the PR after a week or two of rest, I agree that I would prefer the new code to be "disabled" by default so that it doesn't run at all, unlike the current solution.
I know this produces arguably less elegant code, but I think it will be better for maintenance and for people trying to understand the code.
Could you make that change?

@foreverska
Contributor Author

@pseudo-rnd-thoughts Not a problem at all. Pushed up a change that restores the old code when dry. Let me know what you think.

@@ -220,11 +212,148 @@ def __init__(self, render_mode: Optional[str] = None):
self.P[state][action].append(
    (1.0, new_state, reward, terminated)
)

def __build_rainy_transitions(

To minimise changes, could this function take row, col, pass_idx, dest_idx and action as arguments and return the transition data? We would then run the original code unless is_rainy is set, in which case we call this function.
That way we don't need to copy and paste the massive for loop, and it is clear what the differences between the two code paths are.
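The structure being asked for can be sketched roughly as follows: a single loop builds the table, the original deterministic branch stays as it was, and a helper is called only when is_rainy is true. This is a simplified illustration, not the merged code. The state is reduced to (row, col) on a wall-free grid, the names `_move`, `_rainy_moves`, `build_transitions`, `DELTAS`, and `SIDEWAYS` are hypothetical, and the 0.8/0.1/0.1 split is an assumed illustrative value.

```python
# Hypothetical wall-free grid: 0=south, 1=north, 2=east, 3=west
DELTAS = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1)}
# For each intended action, the two sideways actions it may slip to (assumption)
SIDEWAYS = {0: (2, 3), 1: (3, 2), 2: (0, 1), 3: (1, 0)}
MAX_ROW, MAX_COL = 4, 4


def _move(row, col, action):
    """Apply one move, clamping at the grid border."""
    d_row, d_col = DELTAS[action]
    return (min(max(row + d_row, 0), MAX_ROW), min(max(col + d_col, 0), MAX_COL))


def _rainy_moves(row, col, action):
    """Called only when is_rainy is True; probabilities are illustrative."""
    left, right = SIDEWAYS[action]
    return [
        (0.8, _move(row, col, action)),
        (0.1, _move(row, col, left)),
        (0.1, _move(row, col, right)),
    ]


def build_transitions(is_rainy=False):
    P = {}
    for row in range(MAX_ROW + 1):
        for col in range(MAX_COL + 1):
            for action in DELTAS:
                if is_rainy:
                    P[(row, col, action)] = _rainy_moves(row, col, action)
                else:
                    # Original deterministic path, untouched by the new feature.
                    P[(row, col, action)] = [(1.0, _move(row, col, action))]
    return P


dry = build_transitions()
rainy = build_transitions(is_rainy=True)
```

Keeping the branch inside the one loop means the default table is produced by exactly the same statements as before, and the helper isolates everything that differs when it rains.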

Contributor Author

I believe I have done a reasonable job at making the code as reusable as possible. Please let me know if you had something else in mind.

Looks good @foreverska.

One last request is to change the function names to _{name} rather than a double underscore, to match the style of the rest of the project.
Then we should be good to merge
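The rename is requested for style reasons, but it also has a practical side in Python: names starting with two underscores inside a class body are name-mangled, whereas a single leading underscore is only a convention. The small sketch below shows the difference. The classes `Env` and `SubEnv` and the method names are hypothetical.

```python
class Env:
    def __build(self):
        # Stored on the class as _Env__build because of name mangling.
        return "double"

    def _build(self):
        # Stored under its own name; private by convention only.
        return "single"


class SubEnv(Env):
    pass


sub = SubEnv()
print("_Env__build" in vars(Env))  # mangled name is what the class stores
print(hasattr(sub, "__build"))     # the unmangled name is not an attribute
print(sub._build())                # single underscore works as written
```

With a single underscore, subclasses and tests can call the helper under the name that appears in the source.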

Contributor Author

Roger, adjusted the function names, ready for final review.

@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit 69471be into Farama-Foundation:main Mar 25, 2025
12 checks passed
Development

Successfully merging this pull request may close these issues.

[Proposal] Add transitional probabilities to Taxi and Cliff Walking toy text environments
3 participants