Add apply-time-mutation feature #400

Merged


@karlkfi karlkfi commented Aug 28, 2021

New feature: apply-time-mutation

Mutation algorithm:

  1. Detect config.kubernetes.io/apply-time-mutation annotation
  2. Add dependency wait edges to wait until referenced source resource is reconciled (using existing depends-on order & wait)
  3. Before applying the annotated target resource, loop through and apply substitutions specified in the annotation. Substitute the token value in the target field of the target resource with the latest value from the source field of the source resource.

Mutation details:

  • Mutator interface mirrors Validator interface for injection, granular testing, and extensibility.
  • ApplyTimeMutator implements Mutator interface
  • ApplyTimeMutator uses JSONPath expression language for both getting and setting values.
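
As a rough illustration of the injection pattern described above, here is a minimal sketch of a mutator interface and a loop that runs mutators before apply. The names `Object`, `Mutator`, `labelMutator`, and `applyMutators` and the method signatures are assumptions made for this example, not the types from the PR; a plain map stands in for an unstructured Kubernetes object.

```go
package main

import "fmt"

// Object is a stand-in for an unstructured Kubernetes object (assumption).
type Object = map[string]interface{}

// Mutator is a hypothetical interface for illustration only; the real
// interface in the PR may have different methods and signatures.
type Mutator interface {
	Name() string
	// Mutate changes obj in place and reports whether anything changed.
	Mutate(obj Object) (mutated bool, reason string, err error)
}

// labelMutator is a toy implementation used only to exercise the interface.
type labelMutator struct{ key, value string }

func (m labelMutator) Name() string { return "labelMutator" }

func (m labelMutator) Mutate(obj Object) (bool, string, error) {
	meta, ok := obj["metadata"].(map[string]interface{})
	if !ok {
		return false, "", fmt.Errorf("object has no metadata map")
	}
	labels, ok := meta["labels"].(map[string]interface{})
	if !ok {
		labels = map[string]interface{}{}
		meta["labels"] = labels
	}
	if labels[m.key] == m.value {
		return false, "", nil // already set, nothing to do
	}
	labels[m.key] = m.value
	return true, fmt.Sprintf("set label %s", m.key), nil
}

// applyMutators runs every injected mutator, in order, before the object
// would be sent to the cluster.
func applyMutators(obj Object, mutators []Mutator) error {
	for _, m := range mutators {
		if _, _, err := m.Mutate(obj); err != nil {
			return fmt.Errorf("%s: %w", m.Name(), err)
		}
	}
	return nil
}

func main() {
	obj := Object{"metadata": map[string]interface{}{"name": "example"}}
	mutators := []Mutator{labelMutator{key: "app", value: "demo"}}
	if err := applyMutators(obj, mutators); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(obj)
}
```

Because the apply step only sees the interface, a test can inject a fake mutator without touching a cluster.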

Expression Language:

  • JSONPath chosen, primarily for the principle of least surprise.
    Kubernetes is using JSONPath elsewhere, but the existing implementations do not support mutation, just retrieval.
  • Add sigs.k8s.io/cli-utils/pkg/jsonpath library with Get/Set on map[string]interface{}.
  • Used for specifying target and source fields.
  • Wraps http://github.com/spyzhov/ajson
  • Alternatives Considered
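
To make the Get/Set-on-`map[string]interface{}` idea concrete, the following is a minimal sketch that uses only the standard library. It is not JSONPath and does not use ajson: the path is passed as a list of segments (a string is a map key, an int is a list index), and the function names `get` and `set` are assumptions for this example. It shows the array indexing that plain nested-field helpers lack.

```go
package main

import "fmt"

// get walks obj along path. A string segment selects a map key and an int
// segment selects a list index.
func get(obj interface{}, path ...interface{}) (interface{}, error) {
	cur := obj
	for _, seg := range path {
		switch s := seg.(type) {
		case string:
			m, ok := cur.(map[string]interface{})
			if !ok {
				return nil, fmt.Errorf("%q: parent is not a map", s)
			}
			next, found := m[s]
			if !found {
				return nil, fmt.Errorf("%q: key not found", s)
			}
			cur = next
		case int:
			l, ok := cur.([]interface{})
			if !ok {
				return nil, fmt.Errorf("[%d]: parent is not a list", s)
			}
			if s < 0 || s >= len(l) {
				return nil, fmt.Errorf("[%d]: index out of range", s)
			}
			cur = l[s]
		default:
			return nil, fmt.Errorf("unsupported path segment %v", seg)
		}
	}
	return cur, nil
}

// set replaces the value at path in place. The parent of the final segment
// must already exist; intermediate maps and lists are not created.
func set(obj interface{}, value interface{}, path ...interface{}) error {
	if len(path) == 0 {
		return fmt.Errorf("empty path")
	}
	parent, err := get(obj, path[:len(path)-1]...)
	if err != nil {
		return err
	}
	switch s := path[len(path)-1].(type) {
	case string:
		m, ok := parent.(map[string]interface{})
		if !ok {
			return fmt.Errorf("%q: parent is not a map", s)
		}
		m[s] = value
	case int:
		l, ok := parent.([]interface{})
		if !ok {
			return fmt.Errorf("[%d]: parent is not a list", s)
		}
		if s < 0 || s >= len(l) {
			return fmt.Errorf("[%d]: index out of range", s)
		}
		l[s] = value
	default:
		return fmt.Errorf("unsupported path segment %v", s)
	}
	return nil
}

func main() {
	pod := map[string]interface{}{
		"spec": map[string]interface{}{
			"containers": []interface{}{
				map[string]interface{}{"name": "nginx", "image": "nginx:1.0"},
			},
		},
	}
	if err := set(pod, "nginx:2.0", "spec", "containers", 0, "image"); err != nil {
		fmt.Println("error:", err)
		return
	}
	image, err := get(pod, "spec", "containers", 0, "image")
	fmt.Println(image, err)
}
```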

Resource Caching:

  • Add ResourceCache to avoid extra GETs
  • Cache is single-threaded and only used by the mutator.
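
A minimal sketch of such a cache, assuming a fetch callback in place of a real cluster client. The names `resourceID`, `resourceCache`, `Get`, and `Set` are hypothetical and chosen for this example; like the cache described above, it is not safe for concurrent use.

```go
package main

import "fmt"

// resourceID identifies an object without its version (assumption).
type resourceID struct {
	Group, Kind, Namespace, Name string
}

// fetchFunc stands in for a GET against the cluster.
type fetchFunc func(id resourceID) (map[string]interface{}, error)

// resourceCache is a hypothetical in-memory cache. It has no locking, so it
// must only be used from a single goroutine.
type resourceCache struct {
	objects map[resourceID]map[string]interface{}
	fetch   fetchFunc
}

func newResourceCache(fetch fetchFunc) *resourceCache {
	return &resourceCache{
		objects: map[resourceID]map[string]interface{}{},
		fetch:   fetch,
	}
}

// Get returns the cached object, falling back to fetch on a miss.
func (c *resourceCache) Get(id resourceID) (map[string]interface{}, error) {
	if obj, ok := c.objects[id]; ok {
		return obj, nil
	}
	obj, err := c.fetch(id)
	if err != nil {
		return nil, err
	}
	c.objects[id] = obj
	return obj, nil
}

// Set stores an object, for example the result returned by an apply call.
func (c *resourceCache) Set(id resourceID, obj map[string]interface{}) {
	c.objects[id] = obj
}

func main() {
	gets := 0
	cache := newResourceCache(func(id resourceID) (map[string]interface{}, error) {
		gets++ // count simulated API GETs
		return map[string]interface{}{"kind": id.Kind, "name": id.Name}, nil
	})
	id := resourceID{Kind: "Pod", Namespace: "default", Name: "pod-a"}
	for i := 0; i < 3; i++ {
		if _, err := cache.Get(id); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("GETs issued:", gets)
}
```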

Other included changes:

  • Move some testutil files to fix dependency cycles when using in new packages
  • Add testutil.NestedField that supports array indexing to help test ApplyTimeMutator
  • Added a few MainTest files to allow logging with klog while testing.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 28, 2021
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Aug 28, 2021
@k8s-ci-robot
Contributor

Hi @karlkfi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 28, 2021
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 28, 2021
@karlkfi
Contributor Author

karlkfi commented Aug 28, 2021

/hold (still WIP)

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 28, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 31, 2021
@karlkfi
Contributor Author

karlkfi commented Sep 2, 2021

/check-cla

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Sep 2, 2021
@karlkfi
Contributor Author

karlkfi commented Sep 2, 2021

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 2, 2021

// Interface decouples apply-time-mutation
// from the concrete structs used for applying.
type Interface interface {

nit: consider a different name, like "Mutator"

Contributor Author

Golint complains about stutter on exported functions and types:

EX: sort.Interface instead of sort.Sort: https://pkg.go.dev/sort#Interface

I think this used to be in Effective Go, but now that I'm looking for it, I can't find it in the latest copy.

It was in older versions tho:

Member

I kinda like the name here since the package is already called mutator.

newValue = sourceValue
} else {
// token specified, substitute token for source field value in target field value
targetValueString, ok := targetValue.(string)

wdyt about doing some type assertions for templated destinations?
they should probably all be strings... or just let it fall through

Contributor Author

The impl here errors if the token is specified and the target field isn't a string.

Having token be optional allows for maps, lists, ints, floats, and bools to be set wholesale, without string substitution.
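
The behavior described in this reply can be sketched as a small function. This is a simplified illustration, not the code from the PR: the function name `substitute` is hypothetical, and converting the source value to a string with `fmt.Sprint` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// substitute computes the new value for the target field.
// With no token, the source value replaces the target wholesale, so maps,
// lists, numbers, and bools pass through unchanged. With a token, the target
// must be a string and every occurrence of the token is replaced.
func substitute(targetValue, sourceValue interface{}, token string) (interface{}, error) {
	if token == "" {
		return sourceValue, nil
	}
	targetString, ok := targetValue.(string)
	if !ok {
		return nil, fmt.Errorf("token %q specified, but target value has type %T, not string", token, targetValue)
	}
	// Assumption: the source value is rendered with its default formatting.
	return strings.ReplaceAll(targetString, token, fmt.Sprint(sourceValue)), nil
}

func main() {
	v, err := substitute("http://${pod-ip}:80", "10.0.0.1", "${pod-ip}")
	fmt.Println(v, err)

	v, err = substitute(int64(1), int64(3), "")
	fmt.Println(v, err)

	_, err = substitute(int64(1), "10.0.0.1", "${pod-ip}")
	fmt.Println(err)
}
```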

@ash2k
Member

ash2k commented Sep 3, 2021

From the description:

pkg.go.dev/k8s.io/client-go/util/jsonpath (no set API)

[Set]NestedField in github.com/kubernetes/apimachinery/blob/master/pkg/apis/meta/v1/unstructured/helpers.go#L208 (no array indexing)

I'm curious to learn more about these options. Maybe we can improve (one of) them instead of adding the yq dependency?

p.s. This feature is something I've built before in Smith. See the docs. I didn't need a jq/yq library, just used unstructured and helpers (I'm sure it's less sophisticated, but it worked ok).

Member

@mortent mortent left a comment

I like the direction and structure here.

@karlkfi
Contributor Author

karlkfi commented Sep 3, 2021

I'm curious to learn more about these options. Maybe we can improve (one of) them instead of adding the yq dependency?

p.s. This feature is something I've built before in Smith. See the docs. I didn't need a jq/yq library, just used unstructured and helpers (I'm sure it's less sophisticated, but it worked ok).

@ash2k I spent most of the week trying to fork and modify jsonpath impls to add setter capability and failed miserably. It may be possible, but it's beyond my ability to do quickly.

This apply-time-mutation is a big blocker for using infrastructure operators, like Config Connector, which need a little hand-holding to orchestrate ordering. And it should work well as a fallback for orchestrating multiple operators that haven't been designed to integrate. So I didn't want to block this on writing a new jsonpath impl.

My initial implementation here did use [Set]NestedFields from apimachinery, but I realized that while it worked for basic use cases, it wouldn't work for anything related to Pod containers, because it can't index arrays. So I needed something more powerful.

yq was my 2nd choice, because everyone loves jq, and the new yq v4 uses a very similar (tho poorly documented) dialect.

That said, I also found https://github.com/spyzhov/ajson yesterday and it does support jsonpath setters. But it's less popular and less well maintained than yq. Also, JSONPath has as many variations as implementations. On the plus side, its syntax is meticulously tested (https://cburgmer.github.io/json-path-comparison/).

I'm looking through your field references doc and I don't see any path syntax. Do you have time to jump on a Meet/Zoom and explain what you mean?

sourceRef := sub.SourceRef

// lookup source resource from cache or cluster
sourceObj, err := atm.getObject(sourceRef)

Maybe I missed it, but I don't see a check for object status in getObject or here. Is this fully dependent on the dependsOn implementation to assert readiness prior to fetching? Will that make it more challenging in the future to support mutation references to objects outside the apply set?

Contributor Author

The depends-on impl added a graph.SortObjs that handles ordering of apply stages with wait tasks in between. I extended the SortObjs impl to inject the source resources from apply-time-mutation into the graph:
https://github.com/karlkfi/cli-utils/blob/karl-apply-time-mutation/pkg/object/graph/sort.go#L33

So by the time the mutator is executed in the apply task, the waits should have already happened.

There might be some edge cases with resources not in the inventory. So I'll need to write some tests that exercise both pieces together. But I'm not sure where to put them yet in cli-utils.
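
To illustrate the ordering idea in this reply, here is a minimal sketch that groups objects into apply stages so that each object lands in a later stage than everything it depends on, with a wait implied between stages. It is a simplified stand-in, not the `graph.SortObjs` implementation: the function name `sortIntoStages` and the use of plain string IDs are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// sortIntoStages takes a map from object ID to the IDs it depends on
// (explicit depends-on edges plus mutation source references) and returns
// stages in apply order. It returns an error if the edges form a cycle,
// which includes an object that references itself.
func sortIntoStages(deps map[string][]string) ([][]string, error) {
	// remaining[id] holds the dependencies of id that are not yet applied.
	remaining := map[string]map[string]bool{}
	for id, ds := range deps {
		if remaining[id] == nil {
			remaining[id] = map[string]bool{}
		}
		for _, d := range ds {
			remaining[id][d] = true
			if remaining[d] == nil {
				remaining[d] = map[string]bool{}
			}
		}
	}

	var stages [][]string
	for len(remaining) > 0 {
		// Everything with no unapplied dependencies goes into this stage.
		var stage []string
		for id, ds := range remaining {
			if len(ds) == 0 {
				stage = append(stage, id)
			}
		}
		if len(stage) == 0 {
			return nil, fmt.Errorf("cyclic dependency detected")
		}
		sort.Strings(stage) // deterministic output
		for _, id := range stage {
			delete(remaining, id)
		}
		for _, ds := range remaining {
			for _, id := range stage {
				delete(ds, id)
			}
		}
		stages = append(stages, stage)
	}
	return stages, nil
}

func main() {
	// pod-b has a mutation whose source is pod-a, so pod-a must be applied
	// and reconciled first.
	stages, err := sortIntoStages(map[string][]string{
		"pod-b":       {"pod-a"},
		"configmap-c": nil,
	})
	fmt.Println(stages, err)
}
```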

@karlkfi
Contributor Author

karlkfi commented Sep 4, 2021

Alright, this is ready for serious review.

I've added tests for ApplyTimeMutation that I'm satisfied with.

I haven't added any Mutate tests to the ApplyTask, but there aren't any for Filter either. So we'll probably need to backfill those later. It might require abstracting/refactoring the ApplyTask some more.

The biggest open question is whether we're satisfied with yq as the path selector expression language.

If yq is too big of a lift, I can try experimenting with github.com/spyzhov/ajson as an alternative, but it's a one person project with no commits in the last year.

There's also an option to support JSON Path later, in addition to yq, requiring JSON Path expressions to start with the $. prefix. This would at least allow some overlap, if we decide to deprecate yq later.

@karlkfi
Contributor Author

karlkfi commented Sep 16, 2021

I've gone back and reverted a bunch of the minor refactors that are in other PRs or aren't critical to the functionality.

@karlkfi
Contributor Author

karlkfi commented Sep 17, 2021

Alright, the general consensus talking to stakeholders seems to be that everyone expects JSONPath, even if no one can really justify why JSONPath is better than yq.

So following the principle of least surprise, I've refactored this to use https://github.com/spyzhov/ajson as the expression language, because that's the only existing Go JSONPath lib that supports both retrieval and mutation.

The one significant downside of this approach is that ajson does not auto-generate intermediate arrays/maps when setting a value. So the value needs to exist in the input object before it can be mutated. I've filed a feature request for that: spyzhov/ajson#35

@karlkfi
Contributor Author

karlkfi commented Sep 17, 2021

One benefit of using ajson over yq is that ajson doesn't use any logging library (except in the main.go, which we aren't using). So this means I can drop the logutil for translation.

@karlkfi karlkfi changed the title [WIP] Add apply-time-mutation feature Add apply-time-mutation feature Sep 18, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 18, 2021
@karlkfi
Contributor Author

karlkfi commented Sep 18, 2021

This is ready to review again.

The biggest outstanding question, IMO, is the ResourceReference syntax.

  1. Should it exactly match ObjMetadata, without the option for apiVersion?
    This might make it hard to set the value with a kpt setter, which doesn't allow for processing the user's input value.
  2. Should it use group or apiGroup? There are competing precedents in Kubernetes.

Prior Art:

Member

@mortent mortent left a comment

This looks good to me. I'll give others a chance to review it too.

I think it is ok to use the simple cache for now, but we should eventually think about how we can leverage the resources we already fetch from the cluster as part of determining reconcile status. But we might want to look at ways to improve that in other ways too.

// Group is generally preferred, to avoid needing to update the version in lock
// step with the referenced resource.
// If neither is provided, the empty group is used.
type ResourceReference struct {
Member

I agree that keeping the ObjMetadata as an internal type and defining a new one here makes sense. I'm a bit skeptical about supporting apiVersion here, as we are otherwise trying to avoid using the version. If I specify a version here, would we then also require that we fetch the resource from the apiserver with that exact version?

I think we have used just Group pretty consistently rather than apiGroup.

Contributor

@seans3 seans3 left a comment

I think this looks good and it is ready to merge. I disagree on the two types which are basically the same thing (ObjMetadata and ResourceReference), but I will not block the PR on that. The only thing missing appears to be an end-to-end test.

As far as the version in the ResourceReference, I think it will not break anything as long as there is only one ResourceReference per mutation. But if we ever have to identify if two ResourceReferences point to the same resource, then we must not use version (the Equal method implements this correctly). I also think the versioned fetch may be a more fragile approach, since fetching with a version has a higher likelihood of returning an error. A version would only be required if the referenced field in the resource is dependent on the version (i.e. the field only exists in a particular version of the resource).

}

// lookup resource using group api version, if specified
sourceGvk := ref.GroupVersionKind()
Contributor

If the field referenced in the resource depends on the version (e.g. the field is only present for a particular version of the resource), then the version is required. But adding the version makes the field substitution more fragile, since the API Server may return an error if there is no translation for that resource for that particular version. Allowing the API Server to choose the returned version may be more robust. Let's go with this for now--I'm sure we'll run into subtle corner cases as we run it.

Contributor Author

adding the version makes the field substitution more fragile

Yes. This is intentional. I want to allow apiVersion without parsing, but you're right that it's more fragile. If users run into problems, they have the option to switch to using group instead.

Let's go with this for now--I'm sure we'll run into subtle corner cases as we run it.

If this becomes a problem users complain about, I'm ok with re-evaluating later and removing apiVersion, or accepting apiVersion and just discarding the version (which would be less disruptive, but also less intuitive).


// Equal returns true if the ResourceReference sets are equivalent, ignoring version.
// Fulfills Equal interface from github.com/google/go-cmp
func (r ResourceReference) Equal(b ResourceReference) bool {
Contributor

Equal will mostly work when ignoring version. I think this will probably work for what it is being used for. The only true way to compare resources is by comparing their API Server assigned UID. For example, the following corner case will incorrectly say the two resources are not the same when they are:

Resource 1:
Group/Version: extensions/v1beta1
Kind: Deployment
Namespace: foo
Name: bar

Resource 2 (same resource):
Group/Version: apps/v1
Kind: Deployment
Namespace: foo
Name: bar

Most of these corner cases have been deprecated by version 1.16 and later.

Contributor Author

The only two uses of Equal right now are for detecting self-references and test comparisons. In these cases, we don't really care about the api version.

But you're right that group mismatch would break the self-reference detection if the group changed between versions. This is an interesting edge case. I suspect that it would cause the resource to be treated like two separate resources, with one depending on the other, and both being added to the dependency graph. The graph doesn't look like it checks for duplicates. I'm not even really sure how we would do client-side deduplication when the group differs like this.
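
The version-agnostic comparison discussed in this thread can be sketched as follows. The struct below is a simplified, hypothetical stand-in for ResourceReference (the field names and the `group` helper are assumptions), but it reproduces both behaviors described above: references that differ only by version compare equal, while the same Deployment referenced through two different groups does not.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceReference is a hypothetical, simplified reference type. Either
// Group or APIVersion may be set; if neither is, the empty group is used.
type resourceReference struct {
	Group      string
	APIVersion string
	Kind       string
	Namespace  string
	Name       string
}

// group returns Group if set, otherwise the group part of APIVersion
// ("apps/v1" -> "apps", "v1" -> "").
func (r resourceReference) group() string {
	if r.Group != "" {
		return r.Group
	}
	if i := strings.Index(r.APIVersion, "/"); i >= 0 {
		return r.APIVersion[:i]
	}
	return ""
}

// Equal compares group, kind, namespace, and name, ignoring version.
func (r resourceReference) Equal(b resourceReference) bool {
	return r.group() == b.group() &&
		r.Kind == b.Kind &&
		r.Namespace == b.Namespace &&
		r.Name == b.Name
}

func main() {
	v1 := resourceReference{APIVersion: "apps/v1", Kind: "Deployment", Namespace: "foo", Name: "bar"}
	byGroup := resourceReference{Group: "apps", Kind: "Deployment", Namespace: "foo", Name: "bar"}
	legacy := resourceReference{APIVersion: "extensions/v1beta1", Kind: "Deployment", Namespace: "foo", Name: "bar"}

	fmt.Println(v1.Equal(byGroup)) // same group, version ignored
	fmt.Println(v1.Equal(legacy))  // different group, treated as different resources
}
```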

// Group is generally preferred, to avoid needing to update the version in lock
// step with the referenced resource.
// If neither is provided, the empty group is used.
type ResourceReference struct {
Contributor

I still think we should stick with ObjMetadata, but I will not block on this. Both of these types should not be part of the API, and they should not be exported. The API is the annotation--not these two reference types.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 20, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2021
@karlkfi
Contributor Author

karlkfi commented Sep 21, 2021

Alright, I added new e2e tests. I don't have any interesting use cases that use the default K8s resources, so it's a bit of a toy case. But I wasn't sure if it would be a good idea to copy in the Config Connector CRDs.

I also made the jsonpath lib a little more useful by adding support for multi-get and multi-set, but I kept the apply-time-mutation requiring exactly one match. We can add fan-out and/or fan-in later if users ask for it.
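
A minimal sketch of the "exactly one match" rule layered on top of a multi-match lookup. The helper names `getAll` and `requireSingleMatch` are hypothetical, and `getAll` only imitates a wildcard query over a list; it is not the jsonpath library added in this PR.

```go
package main

import "fmt"

// getAll returns the value of field from every map element of list, which
// imitates a wildcard query that can return several matches.
func getAll(list []interface{}, field string) []interface{} {
	var matches []interface{}
	for _, item := range list {
		if m, ok := item.(map[string]interface{}); ok {
			if v, found := m[field]; found {
				matches = append(matches, v)
			}
		}
	}
	return matches
}

// requireSingleMatch rejects both zero matches and fan-out.
func requireSingleMatch(matches []interface{}, path string) (interface{}, error) {
	if len(matches) != 1 {
		return nil, fmt.Errorf("expected exactly 1 match for %q, found %d", path, len(matches))
	}
	return matches[0], nil
}

func main() {
	containers := []interface{}{
		map[string]interface{}{"name": "nginx", "image": "nginx:1.0"},
		map[string]interface{}{"name": "sidecar", "image": "envoy:1.0"},
	}
	_, err := requireSingleMatch(getAll(containers, "image"), "containers[*].image")
	fmt.Println(err)

	v, err := requireSingleMatch(getAll(containers[:1], "image"), "containers[*].image")
	fmt.Println(v, err)
}
```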

@karlkfi
Contributor Author

karlkfi commented Sep 21, 2021

Do I need to squash the commits myself? Probably don't want all the revert history in there.

@seans3
Contributor

seans3 commented Sep 21, 2021

Do I need to squash the commits myself? Probably don't want all the revert history in there.

I usually squash them myself, although I think there is a github merge option to squash them too. We do not want the merge history. Almost always we merge with one commit.

- Detect `config.kubernetes.io/apply-time-mutation` annotation
- Parse annotation string value as YAML
- Treat source resource as a dependency
- Before applying, apply specified substitutions
- Each mutation may include one or more substitutions
- Substitutions may optionally replace a token in the existing
  string value, or replace the whole value (e.g. non-strings)
- Source and target fields are specified with JSONPath expressions
- Using github.com/spyzhov/ajson because it supports mutation, and
  not just retrieval
- ApplyTimeMutator uses an in-memory ResourceCache to reduce GETs
@karlkfi
Contributor Author

karlkfi commented Sep 21, 2021

squashed

Contributor

@seans3 seans3 left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 21, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: karlkfi, seans3


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 21, 2021
@k8s-ci-robot k8s-ci-robot merged commit 61a4552 into kubernetes-sigs:master Sep 21, 2021
@karlkfi karlkfi deleted the karl-apply-time-mutation branch September 22, 2021 21:44