add core logic to support access token in postgres scaler #5589
base: main
Signed-off-by: Ferdinand de Baecque <45566171+Ferdinanddb@users.noreply.github.com>
Dear reviewers, I have some doubts regarding the following topics and would appreciate assistance / guidance, please: (1) Should I change the logic of how an Azure Access Token is retrieved to be able to mock it and write some specific PodIdentityProviderAzureWorkload tests?
(2) I used regexp pattern matching and replacement to find and replace the connection string and the DB connection, is it robust?
(3) To be honest, I got inspired by both the Azure Blob and Azure pipelines scalers. The latter also uses an Access Token but with a different scope, so I am wondering if it could be a good idea to deduplicate and generalize the logic of generating an Azure Access Token to have it in one place.
(1) Should I change the logic of how an Azure Access Token is retrieved to be able to mock it and write some specific PodIdentityProviderAzureWorkload tests? If yes, I am thinking about the following tests, based on what I wrote:
I don't think so, because this is quite difficult to test with unit tests, but maybe we could introduce an e2e test for this scenario, WDYT? We can spin up a PostgreSQL database in Azure with a managed identity user (we have another repo, testing-infrastructure, where we manage the infra).
Check that the Access Token (i.e. the password) is updated when it is expired or about to expire. This might be difficult because this part happens when performing the query, so it happens at "runtime" and it seems that the tests are not covering "runtime" behaviors, right?
I don't think that this is a "real problem". Of course, handling it is always better, but in the worst case the scaler will fail, and that will trigger the scaler regeneration (during the same loop, without printing any error), which will regenerate the token (although, as I said, handling it properly is better).
(2) I used regexp pattern matching and replacement to find and replace the connection string and the DB connection, is it robust?
I could also split the connection string into an array, replace the password entry, and then reconstruct the string, but I felt like regexp could do the same job or even better.
I don't have any preference tbh. To be 100% sure that it will always work regardless of the token, we could use a placeholder for the regex and, instead of updating s.metadata.connection, use a variable scoped to the function. Using this approach, we can ensure that the regex will work (or even better, maybe not setting any password at all in case of pod identity).
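The placeholder idea could look roughly like this: the stored connection string keeps a fixed marker, and each connection attempt builds a fresh string in a function-scoped variable, so the template is never overwritten and the pattern always matches, whatever the token contains. The %PASSWORD% marker and the function names are hypothetical, and the quoting assumes a libpq-style key/value connection string (value in single quotes, with ' and \ escaped by a backslash):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// passwordPlaceholder is a hypothetical marker stored instead of a real password.
const passwordPlaceholder = "%PASSWORD%"

// placeholderPattern matches only the placeholder entry, never a real token.
var placeholderPattern = regexp.MustCompile(`password=` + regexp.QuoteMeta(passwordPlaceholder))

// quoteValue quotes a libpq key/value value: backslashes first, then single quotes.
func quoteValue(v string) string {
	v = strings.ReplaceAll(v, `\`, `\\`)
	v = strings.ReplaceAll(v, `'`, `\'`)
	return "'" + v + "'"
}

// connectionWithToken returns a new connection string with the token injected;
// the template itself is left untouched so it can be reused on every refresh.
func connectionWithToken(template, token string) string {
	// ReplaceAllLiteralString does not expand "$" sequences, unlike ReplaceAllString.
	return placeholderPattern.ReplaceAllLiteralString(template, "password="+quoteValue(token))
}

func main() {
	template := "host=example.postgres.database.azure.com user=myuser password=" +
		passwordPlaceholder + " dbname=mydb sslmode=require"
	conn := connectionWithToken(template, "eyJ0eXAi.fake.token")
	fmt.Println(conn)
}
```

Using the literal replacement variant matters for robustness: with ReplaceAllString, a token containing `$1` would be expanded as a capture-group reference instead of being copied as-is.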
(3) To be honest, I got inspired by both the Azure Blob and Azure pipelines scalers. The latter also uses an Access Token but with a different scope, so I am wondering if it could be a good idea to deduplicate and generalize the logic of generating an Azure Access Token to have it in one place.
If it makes sense, I would say that this should be done in another PR, so that this one remains focused on the Postgres scaler.
It's a good idea, but I don't think it makes sense, because we are migrating the Azure scalers from their current implementations to the new azure-sdk-for-go, and that SDK uses a unified authentication approach (this PR is working on that).
@JorTurFer Thank you for your review! Regarding your answers to my questions: (1)
(2)
Hey @JorTurFer, I think I understand what you meant with your regexp placeholder idea (which is really nice btw), and I just pushed a change that takes it into account. I feel like some of the code I am adding / updating can still be written in a cleaner way, and I am still missing some unit tests for the changes, but I would like your opinion on whether what I changed is better than before, please :). Thanks!
I think that the changes look nice! ❤️
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
@Ferdinanddb any update?
…nnection before recreating it Signed-off-by: Ferdinand de Baecque <45566171+Ferdinanddb@users.noreply.github.com>
…atement Signed-off-by: Ferdinand de Baecque <45566171+Ferdinanddb@users.noreply.github.com>
Hi @JorTurFer, sorry, I was busy with other things, but I found the time to change some things in the PR and test it. I tested my change in my own Azure subscription over the weekend and it works, so I would say that the PR is ready to be reviewed :). I deployed the KEDA resources using the Helm chart plus custom container images built from this branch's code, and let the resources run for more than 24 hours, because in my case the Access Token expires after 24 hours; it works. One observation is that, during my test, I tried to use the PgBouncer feature offered by the Azure Postgres Flexible Server resource, and it does not work; I think it is somehow related to this issue. Another observation concerns how Azure provides Access Tokens: if there is already an active Access Token that has not yet expired, Azure will keep returning this Access Token until it expires. So I modified the part where we renew the Postgres connection to:
What do you think about this? TODO:
Thank you!
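Given the observation that Azure keeps returning the same active token until it expires, one option is to compare each freshly fetched token with the one the current connection was opened with, and to reconnect only when they differ. This is a minimal sketch, not the change made in the PR; connManager, closer and the open callback are hypothetical names, and a fake connection stands in for *sql.DB so the example runs without a database:

```go
package main

import "fmt"

// closer is the only method of *sql.DB this sketch needs.
type closer interface {
	Close() error
}

// connManager is a hypothetical holder for the connection and the token it was opened with.
type connManager struct {
	token string
	conn  closer
	open  func(token string) (closer, error) // in real code, something like sql.Open with the token as password
}

// renew reconnects only when the freshly fetched token differs from the current one.
func (m *connManager) renew(token string) error {
	if m.conn != nil && token == m.token {
		return nil // same still-valid token: keep the existing connection
	}
	if m.conn != nil {
		// Close the old connection before recreating it.
		if err := m.conn.Close(); err != nil {
			return fmt.Errorf("closing old connection: %w", err)
		}
		m.conn = nil
	}
	c, err := m.open(token)
	if err != nil {
		return fmt.Errorf("opening connection: %w", err)
	}
	m.conn, m.token = c, token
	return nil
}

// fakeConn counts Close calls, standing in for a real database handle.
type fakeConn struct{ closed *int }

func (f fakeConn) Close() error {
	*f.closed++
	return nil
}

func main() {
	opened, closed := 0, 0
	m := &connManager{open: func(string) (closer, error) {
		opened++
		return fakeConn{closed: &closed}, nil
	}}
	_ = m.renew("token-1")
	_ = m.renew("token-1") // same token: no reconnect
	_ = m.renew("token-2") // new token: close, then reopen
	fmt.Println(opened, closed) // 2 1
}
```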
The implementation looks nice!
Could you add an e2e test for this new scenario? I think that it'll be super useful for the future.
The new infra (the PostgreSQL server) can be created via a PR to this repo using Terraform: https://github.com/kedacore/testing-infrastructure, and the test itself can be a copy of any of these (adding the change in the TriggerAuthentication): https://github.com/kedacore/keda/tree/main/tests/scalers/postgresql
@JorTurFer I added an e2e test in this PR which is very similar to the Postgres standalone e2e test. The logic is the following:
I did that because the test needs to run Please let me know if that makes sense! Remark:
/run-e2e azure_postgresql
https://github.com/kedacore/keda/blob/main/tests/README.md#specific-test
Custom images are needed when the e2e tests use some custom applications or something that we need to build; in this case we want to have the source code in our repo: https://github.com/kedacore/test-tools/tree/main/e2e
@zroubalik thank you, that is good to know! The e2e test might not work because this PR has not been merged yet.
I was able to run the e2e test locally after creating the Terraform resources and setting up KEDA with workload identity using my own image (thanks to the doc links in @zroubalik's message)! Once this PR is merged, the e2e tests in this CI/CD should work. Here is the output I got when running this new e2e test locally:
I have a remark regarding my e2e test and I would like your opinions, please: the way I grant permissions on the table to the Azure identity is subject to SQL injection if someone is able to change the value of If the value of this becomes
grantPrivilegesSQL := fmt.Sprintf(`GRANT ALL ON task_instance TO \"%s\";`, azurePostgreSQLUamiName)
// This will become GRANT ALL ON task_instance TO ""; DROP DATABASE postgres ;"";
The solutions I see for this:
The benefit of this last solution is that basic authentication isn't needed anymore, and imho this is a good way of tackling the problem at scale. But it is more complex.
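Whichever option is chosen, the identity name can be made injection-safe by quoting it as a PostgreSQL identifier: wrap it in double quotes and double every embedded double quote. A minimal Go sketch; quoteIdentifier is a hypothetical helper, and drivers ship equivalents (pq.QuoteIdentifier in lib/pq, pgx.Identifier.Sanitize in pgx). It assumes the statement reaches PostgreSQL directly, for example through database/sql; if it is passed through a shell command line, as the \" escapes in the snippet may indicate, shell quoting is needed on top:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// quoteIdentifier wraps name in double quotes and doubles any embedded double
// quote, so the value can never close the identifier early and start a new statement.
func quoteIdentifier(name string) (string, error) {
	if name == "" || strings.ContainsRune(name, 0) {
		return "", errors.New("identifier must be non-empty and must not contain NUL bytes")
	}
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`, nil
}

func main() {
	malicious := `"; DROP DATABASE postgres ;"`
	ident, err := quoteIdentifier(malicious)
	if err != nil {
		panic(err)
	}
	// The whole payload stays inside a single (odd-looking) role name.
	fmt.Println("GRANT ALL ON task_instance TO " + ident + ";")
}
```

With this quoting, the payload from the example above becomes one role name instead of a second statement, so the GRANT simply fails if no such role exists.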
/run-e2e azure_postgresql
The error from the previous Another possibility could be that the And another possibility is that the values should be referenced inside the CI/CD pipeline definition file:
...
- name: Run end to end tests
  env:
    TF_AZURE_SUBSCRIPTION: ${{ secrets.TF_AZURE_SUBSCRIPTION }}
    TF_AZURE_RESOURCE_GROUP: ${{ secrets.TF_AZURE_RESOURCE_GROUP }}
    TF_AZURE_SP_APP_ID: ${{ secrets.TF_AZURE_SP_APP_ID }}
    AZURE_SP_KEY: ${{ secrets.AZURE_SP_KEY }}
    TF_AZURE_SP_TENANT: ${{ secrets.TF_AZURE_SP_TENANT }}
    TF_AZURE_STORAGE_CONNECTION_STRING: ${{ secrets.TF_AZURE_STORAGE_CONNECTION_STRING }}
    TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID: ${{ secrets.TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID }}
    # Add the following
    TF_AZURE_POSTGRES_ADMIN_USERNAME: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_USERNAME }}
    TF_AZURE_POSTGRES_ADMIN_PASSWORD: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_PASSWORD }}
    TF_AZURE_POSTGRES_FQDN: ${{ secrets.TF_AZURE_POSTGRES_FQDN }}
    TF_AZURE_POSTGRES_DB_NAME: ${{ secrets.TF_AZURE_POSTGRES_DB_NAME }}
  run: make e2e-test
What do you think?
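Independently of where the secrets are defined, the e2e test could fail fast with a clear message when one of them is missing or empty, instead of failing later inside psql. A minimal sketch, assuming the four TF_AZURE_POSTGRES_* variable names from the workflow snippet above; missingEnv is a hypothetical helper that takes os.LookupEnv (or a fake) as its lookup function:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredEnv lists the Azure PostgreSQL variables named in the workflow snippet.
var requiredEnv = []string{
	"TF_AZURE_POSTGRES_ADMIN_USERNAME",
	"TF_AZURE_POSTGRES_ADMIN_PASSWORD",
	"TF_AZURE_POSTGRES_FQDN",
	"TF_AZURE_POSTGRES_DB_NAME",
}

// missingEnv returns the names that are unset or blank according to lookup.
func missingEnv(names []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range names {
		if v, ok := lookup(name); !ok || strings.TrimSpace(v) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingEnv(requiredEnv, os.LookupEnv); len(missing) > 0 {
		fmt.Println("missing environment variables:", strings.Join(missing, ", "))
		return
	}
	fmt.Println("all required environment variables are set")
}
```

In a Go test, the same check would typically end in t.Fatalf (or t.Skip, depending on the desired behaviour) listing the missing names.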
/run-e2e azure_postgresql
I would personally prefer a simple solution where e2e tests are concerned. In general, we would like to have reproducible, isolated e2e test runs. But I'd love to hear @JorTurFer's opinion here.
I see that the e2e test failed again :/. The error is:
There are 2 things:
I thought about the following:
...
- name: Run end to end tests
  env:
    TF_AZURE_SUBSCRIPTION: ${{ secrets.TF_AZURE_SUBSCRIPTION }}
    TF_AZURE_RESOURCE_GROUP: ${{ secrets.TF_AZURE_RESOURCE_GROUP }}
    TF_AZURE_SP_APP_ID: ${{ secrets.TF_AZURE_SP_APP_ID }}
    AZURE_SP_KEY: ${{ secrets.AZURE_SP_KEY }}
    TF_AZURE_SP_TENANT: ${{ secrets.TF_AZURE_SP_TENANT }}
    TF_AZURE_STORAGE_CONNECTION_STRING: ${{ secrets.TF_AZURE_STORAGE_CONNECTION_STRING }}
    TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID: ${{ secrets.TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID }}
    # Add the following
    TF_AZURE_POSTGRES_ADMIN_USERNAME: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_USERNAME }}
    TF_AZURE_POSTGRES_ADMIN_PASSWORD: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_PASSWORD }}
    TF_AZURE_POSTGRES_FQDN: ${{ secrets.TF_AZURE_POSTGRES_FQDN }}
    TF_AZURE_POSTGRES_DB_NAME: ${{ secrets.TF_AZURE_POSTGRES_DB_NAME }}
  run: make e2e-test
What do you think about this?
I just ran the whole test again on my personal machine and it works. Here is what I did:
# At the root folder of my KEDA GitHub repo, build and publish the images
IMAGE_REGISTRY=docker.io IMAGE_REPO=ferdi7 make publish-multiarch
# In the meantime, inside my testing-infrastructure GitHub repo, I did what was necessary to create the resources via Terraform
# (i.e. an AKS cluster and the Azure Postgres Flexible Server)
# Login to Azure and get AKS creds
az login
az aks get-credentials --resource-group MY-RG-NAME --name AKS-CLUSTER-NAME --overwrite-existing
# Export the following env variables to install KEDA and run the e2e test (I did not use any quotes)
export TF_AZURE_POSTGRES_ADMIN_USERNAME=USERNAME
export TF_AZURE_POSTGRES_ADMIN_PASSWORD=PASSWRD
export TF_AZURE_POSTGRES_FQDN=SERVER-NAME.postgres.database.azure.com
export TF_AZURE_POSTGRES_DB_NAME=DB_NAME
export TF_AZURE_SP_TENANT=AZURE-TENANT-ID
export TF_AZURE_IDENTITY_1_APP_ID=UAMI-IDENTITY-CLIENT-ID
export TF_AZURE_IDENTITY_1_NAME=UAMI-IDENTITY-NAME
export AZURE_RUN_WORKLOAD_IDENTITY_TESTS=true
# Deploy KEDA and run e2e test
cd tests
IMAGE_REGISTRY=docker.io IMAGE_REPO=ferdi7 go test -v -tags e2e ./utils/setup_test.go
go test -v -tags e2e ./scalers/postgresql/azure_postgresql_flex_server_aad_wi/azure_postgresql_flex_server_aad_wi_test.go
# Uninstall KEDA
go test -v -tags e2e ./utils/cleanup_test.go
This works on my end, so I have 2 remarks which could explain the error:
Remark 1: The environment variables are provisioned by Terraform as follows, for example:
...
  {
    name  = "TF_AZURE_POSTGRES_ADMIN_USERNAME"
    value = module.azurerm_postgres_flexible_server.admin_username
  },
  {
    name  = "TF_AZURE_POSTGRES_ADMIN_PASSWORD"
    value = module.azurerm_postgres_flexible_server.admin_password
  },
...
But I just realized that the
...
  {
    name  = "TF_AZURE_POSTGRES_ADMIN_USERNAME"
    value = module.azurerm_postgres_flexible_server.admin_username
  },
  {
    name  = "TF_AZURE_POSTGRES_ADMIN_PASSWORD"
    value = nonsensitive(module.azurerm_postgres_flexible_server.admin_password)
  },
...
Remark 2: The other thing (or something in addition to the previous remark) that could explain the error is what I said in a previous comment:
...
- name: Run end to end tests
  env:
    TF_AZURE_SUBSCRIPTION: ${{ secrets.TF_AZURE_SUBSCRIPTION }}
    TF_AZURE_RESOURCE_GROUP: ${{ secrets.TF_AZURE_RESOURCE_GROUP }}
    TF_AZURE_SP_APP_ID: ${{ secrets.TF_AZURE_SP_APP_ID }}
    AZURE_SP_KEY: ${{ secrets.AZURE_SP_KEY }}
    TF_AZURE_SP_TENANT: ${{ secrets.TF_AZURE_SP_TENANT }}
    TF_AZURE_STORAGE_CONNECTION_STRING: ${{ secrets.TF_AZURE_STORAGE_CONNECTION_STRING }}
    TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID: ${{ secrets.TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID }}
    # Add the following
    TF_AZURE_POSTGRES_ADMIN_USERNAME: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_USERNAME }}
    TF_AZURE_POSTGRES_ADMIN_PASSWORD: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_PASSWORD }}
    TF_AZURE_POSTGRES_FQDN: ${{ secrets.TF_AZURE_POSTGRES_FQDN }}
    TF_AZURE_POSTGRES_DB_NAME: ${{ secrets.TF_AZURE_POSTGRES_DB_NAME }}
  run: make e2e-test
What do you think about this?
@zroubalik @JorTurFer I just pushed a new commit that references the environment variables used in the e2e tests in the file v1-build.yml. Could you please write the comment "/run-e2e azure_postgresql" to execute the e2e tests? Thank you!
...s/postgresql/azure_postgresql_flex_server_aad_wi/azure_postgresql_flex_server_aad_wi_test.go
/run-e2e azure_postgresql
I guess that something is wrong with the string generation, because there are a lot of errors like this during the e2e tests: https://github.com/kedacore/keda/actions/runs/9645629374/job/26600554311#step:9:278
helper.go:***87: Waiting for successful execution of command on Pod; Output: psql: warning: extra command-line argument "54***" ignored
psql: could not translate host name "-p" to address: Name or service not known
, Error:
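This output is consistent with an empty host value: if the psql command is assembled as a single shell string and the host variable is empty, word splitting drops it, so -h takes -p as its value and the port becomes an extra argument. That would fit the missing TF_AZURE_POSTGRES_FQDN mentioned further down. A minimal sketch that rejects an empty host before building the argument list; psqlArgs and its parameters are hypothetical, not the e2e helper's actual API:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// psqlArgs builds the psql argument list. When a command like this is joined into
// one shell string, an empty host vanishes after word splitting, so "-h" would
// consume "-p" as the host name; rejecting an empty host surfaces the real cause.
func psqlArgs(host, port, user, db, query string) ([]string, error) {
	if strings.TrimSpace(host) == "" {
		return nil, errors.New("postgres host is empty; is TF_AZURE_POSTGRES_FQDN set?")
	}
	return []string{"psql", "-h", host, "-p", port, "-U", user, "-d", db, "-c", query}, nil
}

func main() {
	if _, err := psqlArgs("", "5432", "admin", "mydb", "SELECT 1;"); err != nil {
		fmt.Println("error:", err)
	}
	args, err := psqlArgs("example.postgres.database.azure.com", "5432", "admin", "mydb", "SELECT 1;")
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(args, " "))
}
```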
@JorTurFer Thank you for your comments, I just pushed a commit that takes what you wrote into account. However, I still don't understand why the e2e test fails here but succeeds locally from my laptop. Do you have an idea by any chance? One thing that might differ is that the Azure Postgres Flexible Server used by the e2e test is not hosted in the same Azure region as the AKS cluster (because of some quota issues, if I remember correctly), whereas when I ran the e2e tests locally, everything was in the same region. I feel like the problem is more related to some environment variables not being picked up, and to be honest I am quite lost on this one!
Let me trigger the test again and check the env vars used.
/run-e2e azure_postgresql
@JorTurFer @zroubalik I think I have a little clue regarding why the test is failing. At this stage of the GitHub Actions run, during the But what I don't understand is that the other secrets related to the Azure Postgres (TF_AZURE_POSTGRES_ADMIN_USERNAME, TF_AZURE_POSTGRES_ADMIN_PASSWORD and TF_AZURE_POSTGRES_DB_NAME) are listed. Do you know why this could be the case? Maybe it would be worth rerunning the KEDA testing-infrastructure's GitHub Actions pipeline, even though I just verified the
Provide a description of what has been changed
The purpose of this PR is to add the necessary code for the Postgres scaler to support Azure Access Token authentication.
Checklist
Fixes #5823
Relates to #
TODO: