Skip to content

Conversation

cndoit18
Copy link
Contributor

@cndoit18 cndoit18 commented Jul 14, 2021

Signed-off-by: cndoit18 [email protected]

fixes: #5056
fixes: #5072

Description of the change:

append #5042

Motivation for the change:

Checklist

If the pull request includes user-facing changes, extra documentation is required:

@openshift-ci openshift-ci bot requested review from asmacdo and fabianvf July 14, 2021 08:48
@cndoit18
Copy link
Contributor Author

I hope to finally solve this problem, so I need to rigorously test

@cndoit18 cndoit18 marked this pull request as draft July 14, 2021 08:59
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 14, 2021
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 14, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 14, 2021
@cndoit18 cndoit18 force-pushed the fix-issue-5056 branch 5 times, most recently from 8ec4959 to 04f1630 Compare July 14, 2021 11:18
@cndoit18 cndoit18 marked this pull request as ready for review July 15, 2021 04:06
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 15, 2021
@cndoit18
Copy link
Contributor Author

can you test it, this should solve the problem. thank you @wscheele @chlam4

@wscheele
Copy link

Unfortunately still reproducible

@cndoit18
Copy link
Contributor Author

Unfortunately still reproducible

I may need more information so I can troubleshoot, can you provide a bug demo?

@wscheele
Copy link

I'm away for couple of days, when back I'll create a minimal repro case that I can share.

@cndoit18
Copy link
Contributor Author

I'm away for couple of days, when back I'll create a minimal repro case that I can share.

thank you very much

@wscheele
Copy link

upgrade-loop-operator.zip

this operator-sdk helm setup reproduces the issue.

@cndoit18
Copy link
Contributor Author

cndoit18 commented Jul 21, 2021

upgrade-loop-operator.zip

this operator-sdk helm setup reproduces the issue.

I verified this in my local area and found that I could not reproduce the problem

image

@wscheele

@cndoit18
Copy link
Contributor Author

cndoit18 commented Jul 21, 2021

image

It does not trigger an upgrade

@wscheele
Copy link

image
hmm

@cndoit18
Copy link
Contributor Author

image
hmm

#!/usr/bin/env bash
set -exuo pipefail
IFS=$'\n\t'

SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

cd "${SCRIPT_DIR}"

# Clone and build the operator sdk
if [[ ! -d operator-sdk ]]; then
  git clone [email protected]:operator-framework/operator-sdk.git
fi
cd operator-sdk

# Modify the following to check out the revision you want to test.
# Make sure we're in sync with head of the branch we're trackingt || true
git remote add cndoit18 [email protected]:cndoit18/operator-sdk.git || true
git checkout --track cndoit18/fix-issue-5056 || true
git fetch cndoit18
git reset --hard cndoit18/fix-issue-5056
cd ..

# Create the chart, clearing out old versions.
if [[ -d upgrade-loop-operator ]]; then
  rm -fr upgrade-loop-operator
fi

wget -qO - https://github.com/operator-framework/operator-sdk/files/6850929/upgrade-loop-operator.zip | bsdtar xvf -

cd upgrade-loop-operator
# Overwrite the dockerfile, in effect just updating the FROM. This allows us
# to inject our previously built base image.
rm -rf ./bin
cat <<EOF >./Dockerfile
# Build the manager binary
FROM quay.io/operator-framework/helm-operator:dev
# FROM quay.io/operator-framework/helm-operator:v1.8.0

ENV HOME=/opt/helm
COPY watches.yaml ${HOME}/watches.yaml
COPY helm-charts  ${HOME}/helm-charts
WORKDIR ${HOME}
EOF

cat <<EOF >>./config/default/kustomization.yaml
patches:
- patch: |-
    - op: add
      path: /spec/template/spec/containers/1/imagePullPolicy
      value: IfNotPresent
  target:
    kind: Deployment
EOF

# Build the operator-sdk cli and the base-image we'll need for the operator.
cd ../operator-sdk
make build
make image/helm-operator

cd ../upgrade-loop-operator
# Build the operator docker image
make docker-build

# Setup the cluster
kind delete cluster --name helmloop || true
kind create cluster --name helmloop

# Load our operator docker image
kind load docker-image --name helmloop controller:latest

# Deploy, wait for it to be ready
make install deploy
kubectl wait --namespace upgrade-loop-operator-system \
  --for=condition=ready pod \
  --timeout="60s" \
  --selector="control-plane=controller-manager"

# Install a single instance of our crd.
kubectl create namespace loop-install
kubectl apply -n loop-install -f config/samples/repro_v1alpha1_upgradelooptest.yaml

# Monitor the managers logs for looping.
kubectl -n upgrade-loop-operator-system logs deployment/upgrade-loop-operator-controller-manager -c manager -f

@cndoit18
Copy link
Contributor Author

cndoit18 commented Jul 21, 2021

image
hmm

May you use the script above to test it? @wscheele

@wscheele
Copy link

hi @cndoit18 ,

thanks, i think there was something fishy in my test setup indeed.
have cleaned all, and can confirm that it now also works for me with your branch on this repro case.
will retest my full helm chart setup.
best,
Wouter

@wscheele
Copy link

the full chart also worked after clean deploy, but initially it appeared to continue "upgrading" after i upgraded the controller.
however, i also tested upgrade path separately and that does appear to work as well. not sure why it initially appears not to work.
so far, looks very promissing.

@cndoit18
Copy link
Contributor Author

cndoit18 commented Jul 21, 2021

cc @estroz @joelanford

The parameters we pass in are the value and the chart, so when there is a change in these two parameters it can be assumed that an upgrade is needed. WDYT

Thank you

@joelanford
Copy link
Member

@cndoit18 The problem is that the code that actually executes the templates to build the release manifest can be configured in other ways. As I mentioned in #5042 (comment), there are other "inputs" such as the version of helm, the version of sprig, and any post-renderer that's being used. There are probably others I'm not aware of.

IMO, I think we should revert #4937 and any of its follow-ons. At the end of the day, it seems like there isn't a foolproof way to check whether an upgrade is required without just comparing the final release manifest. The limitation is that it is not possible to use any non-deterministic/random templating functions. This has been a well-known and well-understood limitation (see #1069, #1106, #1291, #1862, ) since the helm-operator was introduced, so I think a better solution would be to just visibly document this limitation in the operator-sdk docs. WDYT?

@cndoit18
Copy link
Contributor Author

cndoit18 commented Jul 21, 2021

@cndoit18 The problem is that the code that actually executes the templates to build the release manifest can be configured in other ways. As I mentioned in #5042 (comment), there are other "inputs" such as the version of helm, the version of sprig, and any post-renderer that's being used. There are probably others I'm not aware of.

IMO, I think we should revert #4937 and any of its follow-ons. At the end of the day, it seems like there isn't a foolproof way to check whether an upgrade is required without just comparing the final release manifest. The limitation is that it is not possible to use any non-deterministic/random templating functions. This has been a well-known and well-understood limitation (see #1069, #1106, #1291, #1862, ) since the helm-operator was introduced, so I think a better solution would be to just visibly document this limitation in the operator-sdk docs. WDYT?

Thank you.

We can check if it contains randAlphaNum and if so will issue a warning event and also reject any update operation? Or can we continue along the current lines and keep updating until the edge cases are all resolved?

installedRelease, err := install.Run(m.chart, m.values)

In the case stated by #5042 (comment), other information is already lost during installation.(So they may not be too important)

I still prefer to compare charts and values directly. WDYT?

@wscheele
Copy link

should the operator even attempt to negate the effect of non-deterministic input like randAlphaNum?
it is predictable instability of the manifest/values, and for many cases I think there are alternative options that lead to stable results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Helm based operator constantly upgrading release helm operator for a chart with some dependencies keeps upgrading every sync loop

3 participants