kaovilai (Member) commented Sep 9, 2025

Why the changes were made

This PR adds Azure workload identity support for the image registry component, enabling Azure AD authentication when using workload identity federation. This is required to support image backup/restore operations in Azure environments using workload identity instead of service principal credentials.

Related to the standardized STS authentication workflow for Azure.

How to test the changes made

Prerequisites

  • OpenShift cluster with OIDC enabled
  • Azure CLI installed and logged in
  • OADP operator built from this branch

Setup Azure Workload Identity

  1. Set environment variables:
export API_URL=$(oc whoami --show-server)
export CLUSTER_NAME=$(echo "$API_URL" | sed 's|https://api\.||' | sed 's|\..*||')
export CLUSTER_RESOURCE_GROUP="${CLUSTER_NAME}-rg"
export AZURE_SUBSCRIPTION_ID=$(az account show --query id -o tsv)
export AZURE_TENANT_ID=$(az account show --query tenantId -o tsv)
export IDENTITY_NAME="velero"
export APP_NAME="velero-${CLUSTER_NAME}"
export STORAGE_ACCOUNT_NAME=$(echo "velero${CLUSTER_NAME}" | tr -d '-' | tr '[:upper:]' '[:lower:]' | cut -c1-24)
export CONTAINER_NAME="velero"
  2. Follow the Azure workload identity setup in oadp-azure-sts-cloud-authentication.adoc

  3. Install OADP operator with Azure workload identity:

# Get the Azure managed identity client ID from previous setup
export AZURE_CLIENT_ID=<your-managed-identity-client-id>

# Deploy OADP with Azure workload identity
make deploy-olm-stsflow-azure \
  AZURE_CLIENT_ID=${AZURE_CLIENT_ID} \
  AZURE_TENANT_ID=${AZURE_TENANT_ID} \
  AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}
  4. Create DataProtectionApplication with Azure BSL:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa
  namespace: openshift-adp
spec:
  backupLocations:
    - velero:
        provider: azure
        config:
          storageAccount: ${STORAGE_ACCOUNT_NAME}
          resourceGroup: ${CLUSTER_RESOURCE_GROUP}
        objectStorage:
          bucket: ${CONTAINER_NAME}
  5. Create test application with OpenShift Build:
# Create test namespace
oc new-project test-images

# Create a BuildConfig that produces an image in internal registry
cat <<YAML | oc apply -f -
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: test-app
  namespace: test-images
spec:
  source:
    type: Git
    git:
      uri: https://github.com/openshift/ruby-hello-world
  strategy:
    type: Source
    sourceStrategy:
      from:
        kind: ImageStreamTag
        namespace: openshift
        name: ruby:2.7-ubi8
  output:
    to:
      kind: ImageStreamTag
      name: test-app:latest
YAML

# Trigger build
oc start-build test-app -n test-images --wait

# Create deployment using the built image
oc new-app test-app -n test-images

# Verify the image is in internal registry
oc get imagestream test-app -n test-images -o yaml
  6. Verify workload identity is detected:
# Check operator logs for "Azure workload identity detected"
oc logs -n openshift-adp deployment/oadp-operator-controller-manager -c manager | grep -i "workload identity"

# Verify secret created with correct credentials_type
oc get secret -n openshift-adp oadp-<bsl-name>-azure-registry-secret -o yaml | grep credentials_type
# Should show: credentials_type: ZGVmYXVsdF9jcmVkZW50aWFscw== (base64 for "default_credentials")

# Verify Azure workload identity env vars are present in Velero
oc get deployment velero -n openshift-adp -o yaml | grep -A5 envFrom
# Should show reference to azure-workload-identity-env secret

# Verify the secret contains required env vars
oc get secret -n openshift-adp azure-workload-identity-env -o yaml
# Should contain: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE
  7. Test backup and restore with internal registry images:
# Create backup including the namespace with builds and images
velero backup create test-backup --include-namespaces test-images

# Monitor backup progress
velero backup describe test-backup --details

# Verify registry operations in logs
oc logs -n openshift-adp -l component=registry

# Delete the test namespace
oc delete project test-images

# Restore from backup
velero restore create --from-backup test-backup

# Verify restored resources
oc get all -n test-images
oc get imagestream test-app -n test-images
oc get pods -n test-images

# Verify the restored app is running with image from internal registry
oc describe pod -n test-images -l app=test-app | grep Image
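
The base64 strings quoted in the verification step are easier to compare once decoded. A minimal shell sketch, assuming a `base64` that accepts `-d` (GNU coreutils does; some older macOS releases use `-D` instead). The name `decode_b64` is a hypothetical helper, not part of OADP:

```shell
# Hypothetical helper: decode one base64-encoded Secret value passed as $1.
decode_b64() {
  printf '%s' "$1" | base64 -d
}
```

For example, `decode_b64 ZGVmYXVsdF9jcmVkZW50aWFscw==` prints `default_credentials`. The encoded value can be read directly from the secret with `oc get secret -n openshift-adp oadp-<bsl-name>-azure-registry-secret -o jsonpath='{.data.credentials_type}'`.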

Verification Points

  • AzureIsWorkloadIdentity() returns true when env vars are set
  • Registry secret contains credentials_type: default_credentials
  • Velero deployment has envFrom reference to azure-workload-identity-env secret
  • Azure workload identity environment variables are injected into Velero pod
  • No SPN environment variables in registry pod
  • BuildConfig images are backed up from internal registry
  • ImageStreams are properly restored
  • Deployed applications using internal registry images work after restore
  • Registry authenticates using workload identity (check registry pod logs)

Technical Details

Azure Workload Identity Authentication Flow

When Azure workload identity is detected (via the AzureIsWorkloadIdentity() function), the operator:

  1. Creates a secret (azure-workload-identity-env) containing:

    • AZURE_CLIENT_ID: The managed identity client ID
    • AZURE_TENANT_ID: The Azure tenant ID
    • AZURE_FEDERATED_TOKEN_FILE: Set to /var/run/secrets/openshift/serviceaccount/token
  2. Injects the secret into Velero (see internal/controller/velero.go:642-655):

    if stsflow.AzureIsWorkloadIdentity() {
        // Use envFrom to reference the secret containing Azure workload identity env vars
        veleroContainer.EnvFrom = append(veleroContainer.EnvFrom, corev1.EnvFromSource{
            SecretRef: &corev1.SecretEnvSource{
                LocalObjectReference: corev1.LocalObjectReference{
                    Name: stsflow.AzureWorkloadIdentitySecretName,
                },
            },
        })
    }
  3. Azure SDK authentication: The Azure SDK's DefaultAzureCredential automatically:

    • Detects these environment variables in the Velero container
    • Reads the federated token from the path specified in AZURE_FEDERATED_TOKEN_FILE
    • Exchanges it for Azure AD tokens using the federated identity
    • Authenticates to Azure services transparently

This design ensures that both Velero and the registry components have the necessary environment variables for workload identity authentication, while the Azure SDK handles the actual token exchange and authentication flow.
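
To confirm that a process environment actually carries the three variables listed above, a small check along these lines may help. This is only a sketch: the function name `check_wi_env` is hypothetical, it tests that the variables are non-empty, and it does not validate the token or contact Azure.

```shell
# Hypothetical helper: print the name of each workload identity variable
# that is unset or empty. Returns 0 only when all three are set.
check_wi_env() {
  _missing=0
  for _name in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE; do
    # Indirect lookup of the variable named in $_name (names are fixed above).
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "missing: $_name"
      _missing=1
    fi
  done
  return "$_missing"
}
```

In a pod whose image provides a shell, the function can be sourced and run; empty output and exit status 0 mean all three variables are set.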

Changes Summary

  • Added AzureIsWorkloadIdentity() helper function to detect Azure workload identity configuration
  • Created ReconcileAzureWorkloadIdentitySecret() to manage the workload identity secret
  • Injected Azure workload identity environment variables into Velero container via envFrom
  • Replaced deprecated SPN environment variables with new CREDENTIALS_* format matching docker-distribution expectations
  • Added support for default_credentials authentication type when workload identity is detected
  • Refactored repeated workload identity detection pattern into a reusable function
  • Removed legacy SPN environment variable constants

Dependencies

This PR requires corresponding changes in the openshift-velero-plugin repository to consume the new secret format.

🤖 Generated with Claude Code

- Add AzureIsWorkloadIdentity() helper function to check for Azure workload identity
- Replace deprecated SPN environment variables with new CREDENTIALS_* format
- Support workload identity authentication with default_credentials type
- Refactor repeated workload identity detection pattern into reusable function
- Remove legacy SPN environment variable constants

This enables Azure AD authentication for the image registry when using
workload identity, aligning with the standardized STS authentication flow.

Note: Requires corresponding changes in openshift-velero-plugin repository
to consume the new secret format with CREDENTIALS_* environment variables.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 9, 2025

openshift-ci bot commented Sep 9, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot commented Sep 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 9, 2025
kaovilai (Member, Author) commented

Additional Testing: Service Principal Credentials (Backward Compatibility)

To ensure backward compatibility with existing Service Principal authentication, please also test with traditional Azure credentials:

Test with Service Principal

  1. Create Service Principal credentials:
# Create service principal
az ad sp create-for-rbac \
  --name "velero-sp-${CLUSTER_NAME}" \
  --role "Storage Blob Data Contributor" \
  --scopes "/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${CLUSTER_RESOURCE_GROUP}/providers/Microsoft.Storage/storageAccounts/${STORAGE_ACCOUNT_NAME}"

# Save the output values:
# - appId (client_id)
# - password (client_secret)
# - tenant (tenant_id)
  2. Create credentials secret:
cat <<EOF > /tmp/credentials-velero
AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}
AZURE_TENANT_ID=<tenant-from-sp-output>
AZURE_CLIENT_ID=<appId-from-sp-output>
AZURE_CLIENT_SECRET=<password-from-sp-output>
AZURE_RESOURCE_GROUP=${CLUSTER_RESOURCE_GROUP}
AZURE_STORAGE_ACCOUNT_ID=${STORAGE_ACCOUNT_NAME}
AZURE_CLOUD_NAME=AzurePublicCloud
EOF

oc create secret generic cloud-credentials-azure \
  -n openshift-adp \
  --from-file=cloud=/tmp/credentials-velero
  3. Deploy OADP operator WITHOUT workload identity env vars:
# Deploy using standard OLM without STS flow
make deploy-olm
  4. Create DPA with Service Principal:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-sp
  namespace: openshift-adp
spec:
  backupLocations:
    - velero:
        provider: azure
        config:
          storageAccount: ${STORAGE_ACCOUNT_NAME}
          resourceGroup: ${CLUSTER_RESOURCE_GROUP}
        credential:
          name: cloud-credentials-azure
          key: cloud
        objectStorage:
          bucket: ${CONTAINER_NAME}
  5. Verify Service Principal authentication:
# Check registry secret has client_secret credentials_type
# (replace <bsl-name> with the name shown by: oc get backupstoragelocations -n openshift-adp)
oc get secret -n openshift-adp oadp-<bsl-name>-azure-registry-secret -o yaml | grep credentials_type
# Should show: credentials_type: Y2xpZW50X3NlY3JldA== (base64 for "client_secret")

# Verify NO azure-workload-identity-env secret is created
oc get secret -n openshift-adp azure-workload-identity-env 2>&1 | grep "NotFound"

# Test backup/restore with SP credentials
velero backup create sp-test-backup --include-namespaces test-images

Verification for Backward Compatibility

  • Service Principal credentials continue to work
  • Registry secret contains credentials_type: client_secret for SP
  • No workload identity secret created when env vars not present
  • Image backup/restore works with SP authentication
  • Existing DPAs with SP credentials are not affected by upgrade

kaovilai (Member, Author) commented

/test unit-test images

openshift-ci bot commented Sep 10, 2025

@kaovilai: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@kaovilai kaovilai changed the title from "OADP-XXXX: Add Azure workload identity support for image registry" to "OADP-6675: Add Azure workload identity support for image registry" Sep 10, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 10, 2025

openshift-ci-robot commented Sep 10, 2025

@kaovilai: This pull request references OADP-6675 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

weshayutin (Contributor) commented

@kaovilai let's keep pushing and get this out of draft :)
