Fix a bug of unintentionally using same process indices #455
I noticed that some of the batch training examples have a bug that unintentionally uses the same process indices (and thus the same random seeds!) across env processes. This PR fixes the bug.
You can see the current and new behaviors by running the code below.
This code outputs:
Even when the same random seeds are used in all env processes, the actions sent by the agent usually differ because of the stochasticity of the policy or explorer, so this may not cause a noticeable difference in learning results. It is still a bug that needs to be fixed.
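For readers unfamiliar with how this kind of bug can arise, here is a minimal sketch of one common cause in Python: late binding of a loop variable inside a `lambda`. This is an illustration under assumptions, not the code from this PR. `make_env`, `base_seed`, and `num_envs` are hypothetical names, and `make_env` simply returns the seed an env would receive instead of building a real env.

```python
import functools


def make_env(process_idx, base_seed=0):
    # Hypothetical stand-in for an env factory: instead of building an env,
    # return the seed that the env for this process index would use.
    return base_seed + process_idx


num_envs = 4  # hypothetical number of env processes

# Buggy pattern: each lambda looks up `idx` when it is called, not when it is
# created, so every factory sees the final value of the loop variable.
buggy_factories = [(lambda: make_env(idx)) for idx in range(num_envs)]
buggy_seeds = [factory() for factory in buggy_factories]

# One possible fix: functools.partial binds the current value of `idx`
# at the time each factory is created.
fixed_factories = [functools.partial(make_env, idx) for idx in range(num_envs)]
fixed_seeds = [factory() for factory in fixed_factories]

print(buggy_seeds)  # [3, 3, 3, 3]
print(fixed_seeds)  # [0, 1, 2, 3]
```

In the buggy variant every env would be seeded identically, which matches the symptom described above. Binding the index eagerly (with `functools.partial` or a default argument) gives each process its own index.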