Skip to content

Commit

Permalink
fix issues related to prometheus relabel configs when target allocato…
Browse files Browse the repository at this point in the history
…r is enabled (open-telemetry#1712)

* Code fix for issues open-telemetry#1623, open-telemetry#1622, open-telemetry#958

* updated the PR with an alternative logic to fi the relabel config issue

* fixed codelint issues

* reverted the logic to update scrape_configs

* fixed unit test case failures

* fixed code lint issues

* changes to replace_config to handle unamrshalling issue when tarallocatorewrite flag is enabled

* fixed unit test case failure

* updated e2e test to cover labelkeep usecase

* updated unit test to escape $ signs

* updated unit test to cover missing and invalig configs

* resolved merge conflicts

* updated tests, refactored logic based on the new commits

* uncommented labeldrop e2e test

* added back target allocator attribute

* comment for unescaping TA config

* updated the README document to reflect the traslations

* updated the README document to reflect the traslations

* updated the TA config section in the README document

* addressed the review comment on README doc

---------

Co-authored-by: Jacob Aronoff <jaronoff97@users.noreply.github.com>
  • Loading branch information
anunarapureddy and jaronoff97 committed Jun 1, 2023
1 parent 7043fe2 commit 29c2d5e
Show file tree
Hide file tree
Showing 12 changed files with 698 additions and 144 deletions.
16 changes: 16 additions & 0 deletions .chloggen/fix_prometheus_relabel-configs.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: bug_fix

# The name of the component, or a single word describing the area of concern, (e.g. operator, target allocator, github action)
component: pkg/collector, pkg/targetallocator

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: fix issues related to prometheus relabel configs when target allocator is enabled

# One or more tracking issues related to the change
issues: [958, 1622, 1623]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
49 changes: 37 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -418,6 +418,13 @@ spec:
scrape_interval: 10s
static_configs:
- targets: [ '0.0.0.0:8888' ]
metric_relabel_configs:
- action: labeldrop
regex: (id|name)
replacement: $$1
- action: labelmap
regex: label_(.+)
replacement: $$1
exporters:
logging:
Expand All @@ -429,28 +436,27 @@ spec:
processors: []
exporters: [logging]
```
The usage of `$$` in the replacement keys in the example above is based on the information provided in the Prometheus receiver [README](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/README.md) documentation, which states:
`Note: Since the collector configuration supports env variable substitution $ characters in your prometheus configuration are interpreted as environment variables. If you want to use $ characters in your prometheus configuration, you must escape them using $$.`

Behind the scenes, the OpenTelemetry Operator will convert the Collector’s configuration after the reconciliation into the following:

```yaml
receivers:
prometheus:
config:
global:
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
scrape_configs:
- job_name: otel-collector
honor_timestamps: true
scrape_interval: 10s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
follow_redirects: true
http_sd_configs:
- follow_redirects: false
url: http://collector-with-ta-targetallocator:80/jobs/otel-collector/targets?collector_id=$POD_NAME
- url: http://collector-with-ta-targetallocator:80/jobs/otel-collector/targets?collector_id=$POD_NAME
metric_relabel_configs:
- action: labeldrop
regex: (id|name)
replacement: $$1
- action: labelmap
regex: label_(.+)
replacement: $$1

exporters:
logging:
Expand All @@ -463,7 +469,26 @@ Behind the scenes, the OpenTelemetry Operator will convert the Collector’s con
exporters: [logging]
```

Note how the Operator added a `global` section and a new `http_sd_configs` to the `otel-collector` scrape config, pointing to a Target Allocator instance it provisioned.
Note how the Operator removes any existing service discovery configurations (e.g., `static_configs`, `file_sd_configs`, etc.) from the `scrape_configs` section and adds an `http_sd_configs` configuration pointing to a Target Allocator instance it provisioned.

The OpenTelemetry Operator will also convert the Target Allocator's promethueus configuration after the reconciliation into the following:

```yaml
config:
scrape_configs:
- job_name: otel-collector
scrape_interval: 10s
static_configs:
- targets: [ '0.0.0.0:8888' ]
metric_relabel_configs:
- action: labeldrop
regex: (id|name)
replacement: $1
- action: labelmap
regex: label_(.+)
replacement: $1
```
Note that in this case, the Operator replaces "$$" with a single "$" in the replacement keys. This is because the collector supports environment variable substitution, whereas the TA (Target Allocator) does not. Therefore, to ensure compatibility, the TA configuration should only contain a single "$" symbol.

More info on the TargetAllocator can be found [here](cmd/otel-allocator/README.md).

Expand Down
57 changes: 26 additions & 31 deletions pkg/collector/reconcile/config_replace.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,13 +15,9 @@
package reconcile

import (
"fmt"
"net/url"
"time"

promconfig "github.com/prometheus/prometheus/config"
"github.com/prometheus/prometheus/discovery"
"github.com/prometheus/prometheus/discovery/http"
_ "github.com/prometheus/prometheus/discovery/install" // Package install has the side-effect of registering all builtin.
"gopkg.in/yaml.v2"

Expand All @@ -47,12 +43,14 @@ type Config struct {
}

func ReplaceConfig(instance v1alpha1.OpenTelemetryCollector) (string, error) {
// Check if TargetAllocator is enabled, if not, return the original config
if !instance.Spec.TargetAllocator.Enabled {
return instance.Spec.Config, nil
}
config, getStringErr := adapters.ConfigFromString(instance.Spec.Config)
if getStringErr != nil {
return "", getStringErr

config, err := adapters.ConfigFromString(instance.Spec.Config)
if err != nil {
return "", err
}

promCfgMap, getCfgPromErr := ta.ConfigToPromConfig(instance.Spec.Config)
Expand All @@ -65,42 +63,39 @@ func ReplaceConfig(instance v1alpha1.OpenTelemetryCollector) (string, error) {
return "", validateCfgPromErr
}

// yaml marshaling/unsmarshaling is preferred because of the problems associated with the conversion of map to a struct using mapstructure
promCfg, marshalErr := yaml.Marshal(promCfgMap)
if marshalErr != nil {
return "", marshalErr
}
if featuregate.EnableTargetAllocatorRewrite.IsEnabled() {
// To avoid issues caused by Prometheus validation logic, which fails regex validation when it encounters
// $$ in the prom config, we update the YAML file directly without marshaling and unmarshalling.
updPromCfgMap, getCfgPromErr := ta.AddTAConfigToPromConfig(promCfgMap, naming.TAService(instance))
if getCfgPromErr != nil {
return "", getCfgPromErr
}

var cfg Config
if marshalErr = yaml.UnmarshalStrict(promCfg, &cfg); marshalErr != nil {
return "", fmt.Errorf("error unmarshaling YAML: %w", marshalErr)
}
// type coercion checks are handled in the AddTAConfigToPromConfig method above
config["receivers"].(map[interface{}]interface{})["prometheus"] = updPromCfgMap

for i := range cfg.PromConfig.ScrapeConfigs {
escapedJob := url.QueryEscape(cfg.PromConfig.ScrapeConfigs[i].JobName)
cfg.PromConfig.ScrapeConfigs[i].ServiceDiscoveryConfigs = discovery.Configs{
&http.SDConfig{
URL: fmt.Sprintf("http://%s:80/jobs/%s/targets?collector_id=$POD_NAME", naming.TAService(instance), escapedJob),
},
out, updCfgMarshalErr := yaml.Marshal(config)
if updCfgMarshalErr != nil {
return "", updCfgMarshalErr
}

return string(out), nil
}

if featuregate.EnableTargetAllocatorRewrite.IsEnabled() {
cfg.TargetAllocConfig = &targetAllocator{
Endpoint: fmt.Sprintf("http://%s:80", naming.TAService(instance)),
Interval: 30 * time.Second,
CollectorID: "${POD_NAME}",
}
// we don't need the scrape configs here anymore with target allocator enabled
cfg.PromConfig.ScrapeConfigs = []*promconfig.ScrapeConfig{}
// To avoid issues caused by Prometheus validation logic, which fails regex validation when it encounters
// $$ in the prom config, we update the YAML file directly without marshaling and unmarshalling.
updPromCfgMap, err := ta.AddHTTPSDConfigToPromConfig(promCfgMap, naming.TAService(instance))
if err != nil {
return "", err
}

// type coercion checks are handled in the ConfigToPromConfig method above
config["receivers"].(map[interface{}]interface{})["prometheus"] = cfg
config["receivers"].(map[interface{}]interface{})["prometheus"] = updPromCfgMap

out, err := yaml.Marshal(config)
if err != nil {
return "", err
}

return string(out), nil
}
58 changes: 44 additions & 14 deletions pkg/collector/reconcile/config_replace_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,8 @@
package reconcile

import (
"os"
"testing"
"time"

colfeaturegate "go.opentelemetry.io/collector/featuregate"

Expand Down Expand Up @@ -68,28 +68,28 @@ func TestPrometheusParser(t *testing.T) {
err := colfeaturegate.GlobalRegistry().Set(featuregate.EnableTargetAllocatorRewrite.ID(), true)
param.Instance.Spec.TargetAllocator.Enabled = true
assert.NoError(t, err)

// Set up the test scenario
param.Instance.Spec.TargetAllocator.Enabled = true
actualConfig, err := ReplaceConfig(param.Instance)
assert.NoError(t, err)

// prepare
var cfg Config
// Verify the expected changes in the config
promCfgMap, err := ta.ConfigToPromConfig(actualConfig)
assert.NoError(t, err)

promCfg, err := yaml.Marshal(promCfgMap)
assert.NoError(t, err)
prometheusConfig := promCfgMap["config"].(map[interface{}]interface{})

err = yaml.UnmarshalStrict(promCfg, &cfg)
assert.NoError(t, err)
assert.NotContains(t, prometheusConfig, "scrape_configs")

// test
assert.Len(t, cfg.PromConfig.ScrapeConfigs, 0)
expectedTAConfig := &targetAllocator{
Endpoint: "http://test-targetallocator:80",
Interval: 30 * time.Second,
CollectorID: "${POD_NAME}",
expectedTAConfig := map[interface{}]interface{}{
"endpoint": "http://test-targetallocator:80",
"interval": "30s",
"collector_id": "${POD_NAME}",
}
assert.Equal(t, expectedTAConfig, cfg.TargetAllocConfig)
assert.Equal(t, expectedTAConfig, promCfgMap["target_allocator"])

// Disable the feature flag
err = colfeaturegate.GlobalRegistry().Set(featuregate.EnableTargetAllocatorRewrite.ID(), false)
assert.NoError(t, err)
})
Expand Down Expand Up @@ -128,3 +128,33 @@ func TestPrometheusParser(t *testing.T) {
})

}

func TestReplaceConfig(t *testing.T) {
param, err := newParams("test/test-img", "../testdata/relabel_config_original.yaml")
assert.NoError(t, err)

t.Run("should not modify config when TargetAllocator is disabled", func(t *testing.T) {
param.Instance.Spec.TargetAllocator.Enabled = false
expectedConfigBytes, err := os.ReadFile("../testdata/relabel_config_original.yaml")
assert.NoError(t, err)
expectedConfig := string(expectedConfigBytes)

actualConfig, err := ReplaceConfig(param.Instance)
assert.NoError(t, err)

assert.Equal(t, expectedConfig, actualConfig)
})

t.Run("should rewrite scrape configs with SD config when TargetAllocator is enabled and feature flag is not set", func(t *testing.T) {
param.Instance.Spec.TargetAllocator.Enabled = true

expectedConfigBytes, err := os.ReadFile("../testdata/relabel_config_expected_with_sd_config.yaml")
assert.NoError(t, err)
expectedConfig := string(expectedConfigBytes)

actualConfig, err := ReplaceConfig(param.Instance)
assert.NoError(t, err)

assert.Equal(t, expectedConfig, actualConfig)
})
}
4 changes: 3 additions & 1 deletion pkg/collector/reconcile/configmap.go
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,9 @@ func desiredTAConfigMap(params Params) (corev1.ConfigMap, error) {
labels["app.kubernetes.io/version"] = "latest"
}

prometheusReceiverConfig, err := ta.ConfigToPromConfig(params.Instance.Spec.Config)
// Collector supports environment variable substitution, but the TA does not.
// TA ConfigMap should have a single "$", as it does not support env var substitution
prometheusReceiverConfig, err := ta.UnescapeDollarSignsInPromConfig(params.Instance.Spec.Config)
if err != nil {
return corev1.ConfigMap{}, err
}
Expand Down
49 changes: 10 additions & 39 deletions pkg/collector/reconcile/configmap_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -94,23 +94,11 @@ receivers:
grpc: null
prometheus:
config:
global:
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
scrape_configs:
- job_name: otel-collector
honor_timestamps: true
- http_sd_configs:
- url: http://test-targetallocator:80/jobs/otel-collector/targets?collector_id=$POD_NAME
job_name: otel-collector
scrape_interval: 10s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
follow_redirects: true
enable_http2: true
http_sd_configs:
- follow_redirects: false
enable_http2: false
url: http://test-targetallocator:80/jobs/otel-collector/targets?collector_id=$POD_NAME
service:
pipelines:
metrics:
Expand Down Expand Up @@ -145,29 +133,16 @@ processors: null
receivers:
prometheus:
config:
global:
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
scrape_configs:
- job_name: serviceMonitor/test/test/0
honor_timestamps: true
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
follow_redirects: true
enable_http2: true
http_sd_configs:
- follow_redirects: false
enable_http2: false
url: http://test-targetallocator:80/jobs/serviceMonitor%2Ftest%2Ftest%2F0/targets?collector_id=$POD_NAME
- http_sd_configs:
- url: http://test-targetallocator:80/jobs/serviceMonitor%2Ftest%2Ftest%2F0/targets?collector_id=$POD_NAME
job_name: serviceMonitor/test/test/0
target_allocator:
endpoint: http://test-targetallocator:80
interval: 30s
collector_id: ${POD_NAME}
endpoint: http://test-targetallocator:80
http_sd_config:
refresh_interval: 60s
interval: 30s
service:
pipelines:
metrics:
Expand Down Expand Up @@ -206,15 +181,11 @@ service:
processors: null
receivers:
prometheus:
config:
global:
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
config: {}
target_allocator:
collector_id: ${POD_NAME}
endpoint: http://test-targetallocator:80
interval: 30s
collector_id: ${POD_NAME}
service:
pipelines:
metrics:
Expand Down
Loading

0 comments on commit 29c2d5e

Please sign in to comment.