Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix issues related to prometheus relabel configs when target allocator is enabled #1712

Merged
merged 29 commits into from
Jun 1, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
e88ddf8
Code fix for issues #1623, #1622, #958
anunarapureddy May 5, 2023
290c477
updated the PR with an alternative logic to fi the relabel config issue
anunarapureddy May 10, 2023
0ea7de3
Merge branch 'main' into main
anunarapureddy May 10, 2023
b975941
fixed codelint issues
anunarapureddy May 11, 2023
30be792
reverted the logic to update scrape_configs
anunarapureddy May 11, 2023
f753292
fixed unit test case failures
anunarapureddy May 11, 2023
559c3aa
fixed code lint issues
anunarapureddy May 11, 2023
0c72aad
changes to replace_config to handle unamrshalling issue when taralloc…
anunarapureddy May 16, 2023
3f20cae
Merge branch 'main' into main
anunarapureddy May 16, 2023
c0dda37
fixed unit test case failure
anunarapureddy May 16, 2023
f618c3d
updated e2e test to cover labelkeep usecase
anunarapureddy May 17, 2023
4d5fc8c
updated unit test to escape $ signs
anunarapureddy May 17, 2023
0d0a431
Merge branch 'open-telemetry:main' into main
anunarapureddy May 24, 2023
83e644e
updated unit test to cover missing and invalig configs
anunarapureddy May 25, 2023
7998dc8
Merge branch 'main' into main
anunarapureddy May 25, 2023
21313ce
resolved merge conflicts
anunarapureddy May 25, 2023
f3089b6
updated tests, refactored logic based on the new commits
anunarapureddy May 25, 2023
0d8d50e
uncommented labeldrop e2e test
anunarapureddy May 25, 2023
7282929
Merge branch 'open-telemetry:main' into main
anunarapureddy May 25, 2023
b6cd5fe
added back target allocator attribute
anunarapureddy May 26, 2023
6461823
comment for unescaping TA config
anunarapureddy May 31, 2023
77c03d9
Merge branch 'main' into main
anunarapureddy May 31, 2023
fa82422
Merge branch 'main' into main
anunarapureddy May 31, 2023
f8369b8
updated the README document to reflect the traslations
anunarapureddy May 31, 2023
a50b2e8
updated the README document to reflect the traslations
anunarapureddy May 31, 2023
c9ccd22
updated the TA config section in the README document
anunarapureddy May 31, 2023
526c6a3
addressed the review comment on README doc
anunarapureddy May 31, 2023
37ac0b1
Merge branch 'main' into main
anunarapureddy Jun 1, 2023
436a0c7
Merge branch 'main' into main
jaronoff97 Jun 1, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions .chloggen/fix_prometheus_relabel-configs.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: bug_fix

# The name of the component, or a single word describing the area of concern, (e.g. operator, target allocator, github action)
component: pkg/collector, pkg/targetallocator

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: fix issues related to prometheus relabel configs when target allocator is enabled

# One or more tracking issues related to the change
issues: [958, 1622, 1623]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
49 changes: 37 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -418,6 +418,13 @@ spec:
scrape_interval: 10s
static_configs:
- targets: [ '0.0.0.0:8888' ]
metric_relabel_configs:
- action: labeldrop
regex: (id|name)
replacement: $$1
- action: labelmap
regex: label_(.+)
replacement: $$1

exporters:
logging:
Expand All @@ -429,28 +436,27 @@ spec:
processors: []
exporters: [logging]
```
The usage of `$$` in the replacement keys in the example above is based on the information provided in the Prometheus receiver [README](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/README.md) documentation, which states:
`Note: Since the collector configuration supports env variable substitution $ characters in your prometheus configuration are interpreted as environment variables. If you want to use $ characters in your prometheus configuration, you must escape them using $$.`

Behind the scenes, the OpenTelemetry Operator will convert the Collector’s configuration after the reconciliation into the following:

```yaml
receivers:
prometheus:
config:
global:
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
scrape_configs:
- job_name: otel-collector
honor_timestamps: true
scrape_interval: 10s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
follow_redirects: true
Aneurysm9 marked this conversation as resolved.
Show resolved Hide resolved
http_sd_configs:
- follow_redirects: false
url: http://collector-with-ta-targetallocator:80/jobs/otel-collector/targets?collector_id=$POD_NAME
- url: http://collector-with-ta-targetallocator:80/jobs/otel-collector/targets?collector_id=$POD_NAME
metric_relabel_configs:
- action: labeldrop
regex: (id|name)
replacement: $$1
- action: labelmap
regex: label_(.+)
replacement: $$1

exporters:
logging:
Expand All @@ -463,7 +469,26 @@ Behind the scenes, the OpenTelemetry Operator will convert the Collector’s con
exporters: [logging]
```

Note how the Operator added a `global` section and a new `http_sd_configs` to the `otel-collector` scrape config, pointing to a Target Allocator instance it provisioned.
Note how the Operator removes any existing service discovery configurations (e.g., `static_configs`, `file_sd_configs`, etc.) from the `scrape_configs` section and adds an `http_sd_configs` configuration pointing to a Target Allocator instance it provisioned.

The OpenTelemetry Operator will also convert the Target Allocator's promethueus configuration after the reconciliation into the following:

```yaml
config:
scrape_configs:
- job_name: otel-collector
scrape_interval: 10s
static_configs:
- targets: [ '0.0.0.0:8888' ]
metric_relabel_configs:
- action: labeldrop
regex: (id|name)
replacement: $1
- action: labelmap
regex: label_(.+)
replacement: $1
```
Note that in this case, the Operator replaces "$$" with a single "$" in the replacement keys. This is because the collector supports environment variable substitution, whereas the TA (Target Allocator) does not. Therefore, to ensure compatibility, the TA configuration should only contain a single "$" symbol.

More info on the TargetAllocator can be found [here](cmd/otel-allocator/README.md).

Expand Down
57 changes: 26 additions & 31 deletions pkg/collector/reconcile/config_replace.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,13 +15,9 @@
package reconcile

import (
"fmt"
"net/url"
"time"

promconfig "github.com/prometheus/prometheus/config"
"github.com/prometheus/prometheus/discovery"
"github.com/prometheus/prometheus/discovery/http"
_ "github.com/prometheus/prometheus/discovery/install" // Package install has the side-effect of registering all builtin.
"gopkg.in/yaml.v2"

Expand All @@ -47,12 +43,14 @@ type Config struct {
}

func ReplaceConfig(instance v1alpha1.OpenTelemetryCollector) (string, error) {
// Check if TargetAllocator is enabled, if not, return the original config
if !instance.Spec.TargetAllocator.Enabled {
return instance.Spec.Config, nil
}
config, getStringErr := adapters.ConfigFromString(instance.Spec.Config)
if getStringErr != nil {
return "", getStringErr

config, err := adapters.ConfigFromString(instance.Spec.Config)
if err != nil {
return "", err
}

promCfgMap, getCfgPromErr := ta.ConfigToPromConfig(instance.Spec.Config)
Expand All @@ -65,42 +63,39 @@ func ReplaceConfig(instance v1alpha1.OpenTelemetryCollector) (string, error) {
return "", validateCfgPromErr
}

// yaml marshaling/unsmarshaling is preferred because of the problems associated with the conversion of map to a struct using mapstructure
promCfg, marshalErr := yaml.Marshal(promCfgMap)
if marshalErr != nil {
return "", marshalErr
}
if featuregate.EnableTargetAllocatorRewrite.IsEnabled() {
// To avoid issues caused by Prometheus validation logic, which fails regex validation when it encounters
// $$ in the prom config, we update the YAML file directly without marshaling and unmarshalling.
updPromCfgMap, getCfgPromErr := ta.AddTAConfigToPromConfig(promCfgMap, naming.TAService(instance))
if getCfgPromErr != nil {
return "", getCfgPromErr
}

var cfg Config
if marshalErr = yaml.UnmarshalStrict(promCfg, &cfg); marshalErr != nil {
return "", fmt.Errorf("error unmarshaling YAML: %w", marshalErr)
}
// type coercion checks are handled in the AddTAConfigToPromConfig method above
config["receivers"].(map[interface{}]interface{})["prometheus"] = updPromCfgMap
anunarapureddy marked this conversation as resolved.
Show resolved Hide resolved

for i := range cfg.PromConfig.ScrapeConfigs {
escapedJob := url.QueryEscape(cfg.PromConfig.ScrapeConfigs[i].JobName)
cfg.PromConfig.ScrapeConfigs[i].ServiceDiscoveryConfigs = discovery.Configs{
&http.SDConfig{
URL: fmt.Sprintf("http://%s:80/jobs/%s/targets?collector_id=$POD_NAME", naming.TAService(instance), escapedJob),
},
out, updCfgMarshalErr := yaml.Marshal(config)
if updCfgMarshalErr != nil {
return "", updCfgMarshalErr
}

return string(out), nil
}

if featuregate.EnableTargetAllocatorRewrite.IsEnabled() {
cfg.TargetAllocConfig = &targetAllocator{
Endpoint: fmt.Sprintf("http://%s:80", naming.TAService(instance)),
Interval: 30 * time.Second,
CollectorID: "${POD_NAME}",
}
// we don't need the scrape configs here anymore with target allocator enabled
cfg.PromConfig.ScrapeConfigs = []*promconfig.ScrapeConfig{}
// To avoid issues caused by Prometheus validation logic, which fails regex validation when it encounters
// $$ in the prom config, we update the YAML file directly without marshaling and unmarshalling.
updPromCfgMap, err := ta.AddHTTPSDConfigToPromConfig(promCfgMap, naming.TAService(instance))
if err != nil {
return "", err
}

// type coercion checks are handled in the ConfigToPromConfig method above
config["receivers"].(map[interface{}]interface{})["prometheus"] = cfg
config["receivers"].(map[interface{}]interface{})["prometheus"] = updPromCfgMap

out, err := yaml.Marshal(config)
if err != nil {
return "", err
}

return string(out), nil
}
58 changes: 44 additions & 14 deletions pkg/collector/reconcile/config_replace_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,8 @@
package reconcile

import (
"os"
"testing"
"time"

colfeaturegate "go.opentelemetry.io/collector/featuregate"

Expand Down Expand Up @@ -68,28 +68,28 @@ func TestPrometheusParser(t *testing.T) {
err := colfeaturegate.GlobalRegistry().Set(featuregate.EnableTargetAllocatorRewrite.ID(), true)
param.Instance.Spec.TargetAllocator.Enabled = true
assert.NoError(t, err)

// Set up the test scenario
param.Instance.Spec.TargetAllocator.Enabled = true
actualConfig, err := ReplaceConfig(param.Instance)
assert.NoError(t, err)

// prepare
var cfg Config
// Verify the expected changes in the config
promCfgMap, err := ta.ConfigToPromConfig(actualConfig)
assert.NoError(t, err)

promCfg, err := yaml.Marshal(promCfgMap)
assert.NoError(t, err)
prometheusConfig := promCfgMap["config"].(map[interface{}]interface{})

err = yaml.UnmarshalStrict(promCfg, &cfg)
assert.NoError(t, err)
assert.NotContains(t, prometheusConfig, "scrape_configs")

// test
assert.Len(t, cfg.PromConfig.ScrapeConfigs, 0)
expectedTAConfig := &targetAllocator{
Endpoint: "http://test-targetallocator:80",
Interval: 30 * time.Second,
CollectorID: "${POD_NAME}",
expectedTAConfig := map[interface{}]interface{}{
"endpoint": "http://test-targetallocator:80",
"interval": "30s",
"collector_id": "${POD_NAME}",
}
assert.Equal(t, expectedTAConfig, cfg.TargetAllocConfig)
assert.Equal(t, expectedTAConfig, promCfgMap["target_allocator"])

// Disable the feature flag
err = colfeaturegate.GlobalRegistry().Set(featuregate.EnableTargetAllocatorRewrite.ID(), false)
assert.NoError(t, err)
})
Expand Down Expand Up @@ -128,3 +128,33 @@ func TestPrometheusParser(t *testing.T) {
})

}

func TestReplaceConfig(t *testing.T) {
param, err := newParams("test/test-img", "../testdata/relabel_config_original.yaml")
assert.NoError(t, err)

t.Run("should not modify config when TargetAllocator is disabled", func(t *testing.T) {
param.Instance.Spec.TargetAllocator.Enabled = false
expectedConfigBytes, err := os.ReadFile("../testdata/relabel_config_original.yaml")
assert.NoError(t, err)
expectedConfig := string(expectedConfigBytes)

actualConfig, err := ReplaceConfig(param.Instance)
assert.NoError(t, err)

assert.Equal(t, expectedConfig, actualConfig)
})

t.Run("should rewrite scrape configs with SD config when TargetAllocator is enabled and feature flag is not set", func(t *testing.T) {
param.Instance.Spec.TargetAllocator.Enabled = true

expectedConfigBytes, err := os.ReadFile("../testdata/relabel_config_expected_with_sd_config.yaml")
assert.NoError(t, err)
expectedConfig := string(expectedConfigBytes)

actualConfig, err := ReplaceConfig(param.Instance)
assert.NoError(t, err)

assert.Equal(t, expectedConfig, actualConfig)
})
}
4 changes: 3 additions & 1 deletion pkg/collector/reconcile/configmap.go
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,9 @@ func desiredTAConfigMap(params Params) (corev1.ConfigMap, error) {
labels["app.kubernetes.io/version"] = "latest"
}

prometheusReceiverConfig, err := ta.ConfigToPromConfig(params.Instance.Spec.Config)
// Collector supports environment variable substitution, but the TA does not.
// TA ConfigMap should have a single "$", as it does not support env var substitution
prometheusReceiverConfig, err := ta.UnescapeDollarSignsInPromConfig(params.Instance.Spec.Config)
if err != nil {
return corev1.ConfigMap{}, err
}
Expand Down
49 changes: 10 additions & 39 deletions pkg/collector/reconcile/configmap_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -94,23 +94,11 @@ receivers:
grpc: null
prometheus:
config:
global:
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
scrape_configs:
- job_name: otel-collector
honor_timestamps: true
- http_sd_configs:
- url: http://test-targetallocator:80/jobs/otel-collector/targets?collector_id=$POD_NAME
job_name: otel-collector
scrape_interval: 10s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
follow_redirects: true
enable_http2: true
http_sd_configs:
- follow_redirects: false
enable_http2: false
url: http://test-targetallocator:80/jobs/otel-collector/targets?collector_id=$POD_NAME
jaronoff97 marked this conversation as resolved.
Show resolved Hide resolved
service:
pipelines:
metrics:
Expand Down Expand Up @@ -145,29 +133,16 @@ processors: null
receivers:
prometheus:
config:
global:
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
scrape_configs:
- job_name: serviceMonitor/test/test/0
honor_timestamps: true
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
follow_redirects: true
enable_http2: true
http_sd_configs:
- follow_redirects: false
enable_http2: false
url: http://test-targetallocator:80/jobs/serviceMonitor%2Ftest%2Ftest%2F0/targets?collector_id=$POD_NAME
- http_sd_configs:
- url: http://test-targetallocator:80/jobs/serviceMonitor%2Ftest%2Ftest%2F0/targets?collector_id=$POD_NAME
job_name: serviceMonitor/test/test/0
target_allocator:
endpoint: http://test-targetallocator:80
interval: 30s
collector_id: ${POD_NAME}
endpoint: http://test-targetallocator:80
http_sd_config:
refresh_interval: 60s
interval: 30s
service:
pipelines:
metrics:
Expand Down Expand Up @@ -206,15 +181,11 @@ service:
processors: null
receivers:
prometheus:
config:
global:
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
config: {}
target_allocator:
collector_id: ${POD_NAME}
endpoint: http://test-targetallocator:80
interval: 30s
collector_id: ${POD_NAME}
service:
pipelines:
metrics:
Expand Down
Loading
Loading