Chore(cgroup): Add support for cgroup version2 in stress-chaos experiment #490
Merged
ksatchit previously approved these changes on Mar 14, 2022.
…ment
Signed-off-by: uditgaurav
uditgaurav added a commit that referenced this pull request on Jun 13, 2022:
* Chore(stress-chaos): Run CPU chaos with percentage of cpu cores (#482)
* Chore(stress-chaos): Run CPU chaos with percentage of cores
  Signed-off-by: uditgaurav
* Fixing Alpine CVEs by upgrading the version (#486)
* Chore(vulnerability): Remove openebs retry module and update pkgs (#488)
* Chore(vulnerability): Fix some vulnerabilities by updating the pkgs
  Signed-off-by: uditgaurav
* Chore(vulnerability): Remove openebs retry module and update pkgs
  Signed-off-by: udit
* Chore(cgroup): Add support for cgroup version2 in stress-chaos experiment (#490)
  Signed-off-by: uditgaurav
* Chore(snyk): Fix snyk security scan on litmus-go (#492)
  Signed-off-by: uditgaurav
* Chore(network-chaos): Randomize Chaos Tunables for Network Chaos Experiment (#491)
* Chore(network-chaos):
  Signed-off-by: uditgaurav
* Chore(network-chaos): Randomize Chaos Tunables for Network Chaos Experiment
  Signed-off-by: uditgaurav
  Co-authored-by: Karthik Satchitanand
* Chore(randomize): Randomize stress-chaos tunables (#487)
* Chore(randomize): Randomize stress-chaos tunables
  Signed-off-by: uditgaurav
* Update stress-chaos.go
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill (#493)
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill
  Signed-off-by: uditgaurav
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill
  Signed-off-by: uditgaurav
* (enhancement)experiment: add node label filter for pod network and stress chaos (#494)
  Signed-off-by: uditgaurav
* Fix(targetContainer): Incorrect target container passed in the helper pod for pod level experiments (#496)
* Fix target container issue
  Signed-off-by: uditgaurav
* Fix target container issue
  Signed-off-by: uditgaurav
* Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#498)
  Signed-off-by: uditgaurav
* Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#499)
  Signed-off-by: uditgaurav
* Chore(warn): Remove warning Neither --kubeconfig nor --master was specified for InClusterConfig (#507)
  Signed-off-by: uditgaurav
* Chore(ssm): Update the ssm file path in the Dockerfile (#508)
  Signed-off-by: uditgaurav
* GCP Experiments Refactor, New Label Selector Experiments and IAM Integration (#495)
* experiment init
  Signed-off-by: neelanjan00
* updated experiment file
  Signed-off-by: neelanjan00
* updated experiment lib
  Signed-off-by: neelanjan00
* updated post chaos validation
  Signed-off-by: neelanjan00
* updated empty slices to nil, updated experiment name in environment.go
  Signed-off-by: neelanjan00
* removed experiment charts
  Signed-off-by: neelanjan00
* bootstrapped gcp-vm-disk-loss-by-label artifacts
  Signed-off-by: neelanjan00
* removed device-names input for gcp-vm-disk-loss experiment, added API calls to derive device name internally
  Signed-off-by: neelanjan00
* removed redundant condition check in gcp-vm-disk-loss experiment pre-requisite checks
  Signed-off-by: neelanjan00
* reformatted error messages
  Signed-off-by: neelanjan00
* replaced the SetTargetInstances function
  Signed-off-by: neelanjan00
* added settargetdisk function for getting target disk names using label
  Signed-off-by: neelanjan00
* refactored Target Disk Attached VM Instance memorisation, updated vm-disk-loss and added lib logic for vm-disk-loss-by-label experiment
  Signed-off-by: neelanjan00
* added experiment to bin and cleared default experiment name in environment.go
  Signed-off-by: neelanjan00
* removed charts
  Signed-off-by: neelanjan00
* updated test.yml
  Signed-off-by: neelanjan00
* updated AutoScalingGroup to ManagedInstanceGroup; updated logic for checking InstanceStop recovery for ManagedInstanceGroup VMs; Updated log and error messages with VM names
  Signed-off-by: neelanjan00
* removed redundant computeService code snippets
  Signed-off-by: neelanjan00
* removed redundant computeService code snippets in gcp-disk-loss experiments
  Signed-off-by: neelanjan00
* updated logic for deriving default gcp sa credentials for computeService
  Signed-off-by: neelanjan00
* updated logging for IAM integration
  Signed-off-by: neelanjan00
* refactored log and error messages and wait for start/stop instances logic
  Signed-off-by: neelanjan00
* fixed logs, optimised control statements, added comments, corrected experiment names
  Signed-off-by: neelanjan00
* fixed file exists check logic
  Signed-off-by: Neelanjan Manna
* updated instance and device name fetch logic for disk loss
  Signed-off-by: Neelanjan Manna
* updated logs
  Signed-off-by: Neelanjan Manna
* update(sdk): updating litmus sdk for the defaultAppHealthCheck (#513)
  Signed-off-by: shubhamc
  Co-authored-by: shubhamc
* fix: updated release workflow (#512)
  Signed-off-by: Soumya Ghosh Dastidar
* Added Active Node Count Check using AWS APIs (#500)
* Added node count check using aws apis
  Signed-off-by: Akash Shrivastava
* Added node count check using aws apis to instance terminate by tag experiment
  Signed-off-by: Akash Shrivastava
* Log improvements; Code improvement in findActiveNodeCount function;
  Signed-off-by: Akash Shrivastava
* Added log for instance status check failed in find active node count
  Signed-off-by: Akash Shrivastava
* Added check if active node count is less than provided instance ids
  Signed-off-by: Akash Shrivastava
* updated appns podlist filtering error handling (#515)
  Signed-off-by: Neelanjan Manna
  Co-authored-by: Udit Gaurav
  Co-authored-by: Vedant Shrotria
* return error if node not present (#516)
  Signed-off-by: Akash Shrivastava
* Chore(helper pod): Make setHelper data as tunable (#519)
  Signed-off-by: uditgaurav

Co-authored-by: Udit Gaurav
Co-authored-by: Raj Babu Das
Co-authored-by: Karthik Satchitanand
Co-authored-by: Shubham Chaudhary
Co-authored-by: shubhamc
Co-authored-by: Soumya Ghosh Dastidar
Co-authored-by: Akash Shrivastava
Co-authored-by: Vedant Shrotria
uditgaurav added a commit that referenced this pull request on Jun 14, 2022:
* modified the cmdProbe for inline mode of execution to accommodate litmusd
  Signed-off-by: neelanjan00
* go mod tidy
  Signed-off-by: neelanjan00
* bootstrapped process-kill experiment files
  Signed-off-by: neelanjan00
* updated types.go and environment.go
  Signed-off-by: neelanjan00
* updated secret envs
  Signed-off-by: neelanjan00
* updated experiment logic and added steady state validation steps
  Signed-off-by: neelanjan00
* removed action from probe refactor function parameters
  Signed-off-by: neelanjan00
* added serial and parallel chaos execution steps
  Signed-off-by: neelanjan00
* added conn parameter to probe
  Signed-off-by: neelanjan00
* added logic for closing websocket in the end of the experiment
  Signed-off-by: neelanjan00
* added experiment to bin
  Signed-off-by: neelanjan00
* corrected the agent endpoint
  Signed-off-by: neelanjan00
* corrected environment.go
  Signed-off-by: neelanjan00
* updated logs, removed close message and added parallel sequence as default
  Signed-off-by: neelanjan00
* updated experiment charts
  Signed-off-by: neelanjan00
* updated experiment charts
  Signed-off-by: neelanjan00
* updated authorization header, replaced Processes struct with int slice of pids
  Signed-off-by: neelanjan00
* restored experiment image
  Signed-off-by: neelanjan00
* updated test.yml
  Signed-off-by: neelanjan00
* added rbac, README, exported charts
  Signed-off-by: neelanjan00
* added websocket connection to chaos details struct, restored probe functions params
  Signed-off-by: neelanjan00
* removed websocket connection in chaoslib params
  Signed-off-by: neelanjan00
* updated code function
  Signed-off-by: neelanjan00
* updated readme
  Signed-off-by: neelanjan00
* restructured directories, added m-agent tag
  Signed-off-by: neelanjan00
* updated workflow branch
  Signed-off-by: neelanjan00
* removed guest-os pkg
  Signed-off-by: neelanjan00
* Chore(stress-chaos): Run CPU chaos with percentage of cpu cores (#482)
* Chore(stress-chaos): Run CPU chaos with percentage of cores
  Signed-off-by: uditgaurav
* updated client side m-agent design; added channelised message sending
  Signed-off-by: neelanjan00
* added liveness check for process kill
  Signed-off-by: neelanjan00
* updated mutex lock to an RWMutex lock, locked read operations on the map
  Signed-off-by: neelanjan00
* Fixing Alpine CVEs by upgrading the version (#486)
* updated WaitForDurationAndCheckLiveness function
  Signed-off-by: neelanjan00
* updated cpu-stress experiment and steady-state condition
  Signed-off-by: neelanjan00
* corrected probe format
  Signed-off-by: neelanjan00
* added functionality for multiple websocket connections
  Signed-off-by: neelanjan00
* updated liveness check to test for all the connections and added parallel chaos injection
  Signed-off-by: neelanjan00
* updated m-agent cmd probe for only one agent endpoint
  Signed-off-by: neelanjan00
* updated underChaosEndpoints for abort
  Signed-off-by: neelanjan00
* optimised make connections logic
  Signed-off-by: neelanjan00
* removed redundant check and comments
  Signed-off-by: neelanjan00
* updated comments for function
  Signed-off-by: neelanjan00
* updated chaosInterval timer for fixing infinitely running chaosInterval
  Signed-off-by: neelanjan00
* added CLOSE_CONNECTION action for closure of websocket connections
  Signed-off-by: neelanjan00
* Chore(vulnerability): Remove openebs retry module and update pkgs (#488)
* Chore(vulnerability): Fix some vulnerabilities by updating the pkgs
  Signed-off-by: uditgaurav
* Chore(vulnerability): Remove openebs retry module and update pkgs
  Signed-off-by: udit
* added chaos revert logic
  Signed-off-by: neelanjan00
* updated connection close on ERROR functionality and return on Read error
  Signed-off-by: neelanjan00
* added log for chaos revert
  Signed-off-by: neelanjan00
* reverted env params
  Signed-off-by: neelanjan00
* added abort log info, added defer close statement to message listener, added load percentage validation
  Signed-off-by: neelanjan00
* updated probe error feedback, removed charts
  Signed-off-by: neelanjan00
* updated mutex locks for RLock and RUnlock, updated connect agent function parameters
  Signed-off-by: neelanjan00
* Chore(cgroup): Add support for cgroup version2 in stress-chaos experiment (#490)
  Signed-off-by: uditgaurav
* updated mutex locks
  Signed-off-by: neelanjan00
* Chore(snyk): Fix snyk security scan on litmus-go (#492)
  Signed-off-by: uditgaurav
* Chore(network-chaos): Randomize Chaos Tunables for Network Chaos Experiment (#491)
* Chore(network-chaos):
  Signed-off-by: uditgaurav
* Chore(network-chaos): Randomize Chaos Tunables for Network Chaos Experiment
  Signed-off-by: uditgaurav
  Co-authored-by: Karthik Satchitanand
* Chore(randomize): Randomize stress-chaos tunables (#487)
* Chore(randomize): Randomize stress-chaos tunables
  Signed-off-by: uditgaurav
* Update stress-chaos.go
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill (#493)
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill
  Signed-off-by: uditgaurav
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill
  Signed-off-by: uditgaurav
* (enhancement)experiment: add node label filter for pod network and stress chaos (#494)
  Signed-off-by: uditgaurav
* Fix(targetContainer): Incorrect target container passed in the helper pod for pod level experiments (#496)
* Fix target container issue
  Signed-off-by: uditgaurav
* Fix target container issue
  Signed-off-by: uditgaurav
* Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#498)
  Signed-off-by: uditgaurav
* Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#499)
  Signed-off-by: uditgaurav
* Chore(warn): Remove warning Neither --kubeconfig nor --master was specified for InClusterConfig (#507)
  Signed-off-by: uditgaurav
* Chore(ssm): Update the ssm file path in the Dockerfile (#508)
  Signed-off-by: uditgaurav
* GCP Experiments Refactor, New Label Selector Experiments and IAM Integration (#495)
* experiment init
  Signed-off-by: neelanjan00
* updated experiment file
  Signed-off-by: neelanjan00
* updated experiment lib
  Signed-off-by: neelanjan00
* updated post chaos validation
  Signed-off-by: neelanjan00
* updated empty slices to nil, updated experiment name in environment.go
  Signed-off-by: neelanjan00
* removed experiment charts
  Signed-off-by: neelanjan00
* bootstrapped gcp-vm-disk-loss-by-label artifacts
  Signed-off-by: neelanjan00
* removed device-names input for gcp-vm-disk-loss experiment, added API calls to derive device name internally
  Signed-off-by: neelanjan00
* removed redundant condition check in gcp-vm-disk-loss experiment pre-requisite checks
  Signed-off-by: neelanjan00
* reformatted error messages
  Signed-off-by: neelanjan00
* replaced the SetTargetInstances function
  Signed-off-by: neelanjan00
* added settargetdisk function for getting target disk names using label
  Signed-off-by: neelanjan00
* refactored Target Disk Attached VM Instance memorisation, updated vm-disk-loss and added lib logic for vm-disk-loss-by-label experiment
  Signed-off-by: neelanjan00
* added experiment to bin and cleared default experiment name in environment.go
  Signed-off-by: neelanjan00
* removed charts
  Signed-off-by: neelanjan00
* updated test.yml
  Signed-off-by: neelanjan00
* updated AutoScalingGroup to ManagedInstanceGroup; updated logic for checking InstanceStop recovery for ManagedInstanceGroup VMs; Updated log and error messages with VM names
  Signed-off-by: neelanjan00
* removed redundant computeService code snippets
  Signed-off-by: neelanjan00
* removed redundant computeService code snippets in gcp-disk-loss experiments
  Signed-off-by: neelanjan00
* updated logic for deriving default gcp sa credentials for computeService
  Signed-off-by: neelanjan00
* updated logging for IAM integration
  Signed-off-by: neelanjan00
* refactored log and error messages and wait for start/stop instances logic
  Signed-off-by: neelanjan00
* fixed logs, optimised control statements, added comments, corrected experiment names
  Signed-off-by: neelanjan00
* fixed file exists check logic
  Signed-off-by: Neelanjan Manna
* updated instance and device name fetch logic for disk loss
  Signed-off-by: Neelanjan Manna
* updated logs
  Signed-off-by: Neelanjan Manna
* update(sdk): updating litmus sdk for the defaultAppHealthCheck (#513)
  Signed-off-by: shubhamc
  Co-authored-by: shubhamc
* fix: updated release workflow (#512)
  Signed-off-by: Soumya Ghosh Dastidar
* Added Active Node Count Check using AWS APIs (#500)
* Added node count check using aws apis
  Signed-off-by: Akash Shrivastava
* Added node count check using aws apis to instance terminate by tag experiment
  Signed-off-by: Akash Shrivastava
* Log improvements; Code improvement in findActiveNodeCount function;
  Signed-off-by: Akash Shrivastava
* Added log for instance status check failed in find active node count
  Signed-off-by: Akash Shrivastava
* Added check if active node count is less than provided instance ids
  Signed-off-by: Akash Shrivastava
* updated appns podlist filtering error handling (#515)
  Signed-off-by: Neelanjan Manna
  Co-authored-by: Udit Gaurav
  Co-authored-by: Vedant Shrotria
* go mod tidy
  Signed-off-by: neelanjan00
* return error if node not present (#516)
  Signed-off-by: Akash Shrivastava
* Chore(helper pod): Make setHelper data as tunable (#519)
  Signed-off-by: uditgaurav
* added CPUs check in prerequisites check
  Signed-off-by: Neelanjan Manna
* removed .DS_Store
  Signed-off-by: Neelanjan Manna
* removed .DS_Store
  Signed-off-by: Neelanjan Manna
* updated rbac and readme
  Signed-off-by: Neelanjan Manna
* removed .DS_Store
  Signed-off-by: Neelanjan Manna
* updated qemu github action
  Signed-off-by: Neelanjan Manna
* updated qemu action version
  Signed-off-by: Neelanjan Manna
* updated m-agent go-runner tag to 2.10.0-Beta1
  Signed-off-by: Neelanjan Manna
* updated target names
  Signed-off-by: Neelanjan Manna
* updated machine=>Machine targets, removed .DS_Store
  Signed-off-by: Neelanjan Manna

Co-authored-by: Udit Gaurav
Co-authored-by: Raj Babu Das
Co-authored-by: Karthik Satchitanand
Co-authored-by: Shubham Chaudhary
Co-authored-by: shubhamc
Co-authored-by: Soumya Ghosh Dastidar
Co-authored-by: Akash Shrivastava
Co-authored-by: Vedant Shrotria
Signed-off-by: uditgaurav

What this PR does / why we need it:

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Checklist:
* breaking-changes tag
* requires-upgrade tag