Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scope name parsing doesn't work. #23387

Closed
bin-shi-mulesoft opened this issue Jun 14, 2023 · 9 comments · Fixed by #32033
Closed

Scope name parsing doesn't work. #23387

bin-shi-mulesoft opened this issue Jun 14, 2023 · 9 comments · Fixed by #32033
Assignees
Labels
bug Something isn't working never stale Issues marked with this label will be never staled and automatically removed pkg/stanza

Comments

@bin-shi-mulesoft
Copy link

Component(s)

pkg/stanza

What happened?

Description

The Scope Name Parsing doesn't add the scope name to the ScopeLog.

Steps to Reproduce

  1. Prepare the example input log file. echo "com.example.Foo - some message" > /tmp/example.log
  2. Download the OTel Collector (contrib) v0.79.0, e.g. otelcol-contrib_0.79.0_darwin_amd64
  3. Start the Collector with the provided config file as below otelcol-contrib --config otel.yaml

Actual Result

Note: The output is one line, and I format it in json here for pretty looking.

{
    "resourceLogs": [
        {
            "resource": {},
            "scopeLogs": [
                {
                    "scope": {},
                    "logRecords": [
                        {
                            "observedTimeUnixNano": "1686785112950071000",
                            "body": {
                                "stringValue": "com.example.Foo - some message"
                            },
                            "attributes": [
                                {
                                    "key": "scope_name_field",
                                    "value": {
                                        "stringValue": "com.example.Foo"
                                    }
                                },
                                {
                                    "key": "message",
                                    "value": {
                                        "stringValue": "some message"
                                    }
                                }
                            ],
                            "traceId": "",
                            "spanId": ""
                        }
                    ]
                }
            ]
        }
    ]
}

Expected Result

The name of scope is set to com.example.Foo.

{
    "resourceLogs": [
        {
            "resource": {},
            "scopeLogs": [
                {
                    "scope": {
                        "name": "com.example.Foo"
                    },
                    "logRecords": [
                        {
                            "observedTimeUnixNano": "1686785320986954000",
                            "body": {
                                "stringValue": "com.example.Foo - some message"
                            },
                            "attributes": [
                                {
                                    "key": "scope_name_field",
                                    "value": {
                                        "stringValue": "com.example.Foo"
                                    }
                                },
                                {
                                    "key": "message",
                                    "value": {
                                        "stringValue": "some message"
                                    }
                                }
                            ],
                            "traceId": "",
                            "spanId": ""
                        }
                    ]
                }
            ]
        }
    ]
}

Collector version

v0.79.0

Environment information

Environment

OS: macOS Version 13.4

OpenTelemetry Collector configuration

receivers:
  filelog:
    include:
      - /tmp/example.log
    include_file_name: false
    start_at: beginning
    operators:
      - type: regex_parser
        regex: '^(?P<scope_name_field>\S*)\s-\s(?P<message>.*)'
        scope_name:
          parse_from: attributes.scope_name_field

processors:

exporters:
  file:
    path: /tmp/scope-name-parsing.jsonl

service:
  telemetry:
    logs:
      level: debug
  pipelines:
    logs:
      receivers: [filelog]
      processors: []
      exporters: [file]

Log output

2023-06-14T16:25:12.745-0700    info    service/telemetry.go:104        Setting up own telemetry...
2023-06-14T16:25:12.745-0700    info    service/telemetry.go:127        Serving Prometheus metrics      {"address": ":8888", "level": "Basic"}
2023-06-14T16:25:12.745-0700    debug   exporter@v0.79.0/exporter.go:273        Alpha component. May change in the future.      {"kind": "exporter", "data_type": "logs", "name": "file"}
2023-06-14T16:25:12.746-0700    debug   receiver@v0.79.0/receiver.go:294        Beta component. May change in the future.       {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:12.747-0700    info    service/service.go:131  Starting otelcol-contrib...     {"Version": "0.79.0", "NumCPU": 16}
2023-06-14T16:25:12.747-0700    info    extensions/extensions.go:30     Starting extensions...
2023-06-14T16:25:12.747-0700    info    adapter/receiver.go:45  Starting stanza receiver        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:12.747-0700    debug   pipeline/directed.go:59 Starting operator       {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:12.747-0700    debug   pipeline/directed.go:63 Started operator        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:12.747-0700    debug   pipeline/directed.go:59 Starting operator       {"kind": "receiver", "name": "filelog", "data_type": "logs", "operator_id": "regex_parser", "operator_type": "regex_parser"}
2023-06-14T16:25:12.747-0700    debug   pipeline/directed.go:63 Started operator        {"kind": "receiver", "name": "filelog", "data_type": "logs", "operator_id": "regex_parser", "operator_type": "regex_parser"}
2023-06-14T16:25:12.748-0700    debug   adapter/converter.go:109        Starting log converter  {"kind": "receiver", "name": "filelog", "data_type": "logs", "worker_count": 4}
2023-06-14T16:25:12.748-0700    info    service/service.go:148  Everything is ready. Begin running and processing data.
2023-06-14T16:25:12.948-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:12.948-0700    info    fileconsumer/file.go:192        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/tmp/example.log"}
2023-06-14T16:25:13.949-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:15.148-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:16.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:17.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:18.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:19.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:20.548-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:21.748-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:22.748-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:23.949-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:25.148-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:26.148-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:27.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:28.548-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:29.548-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:30.748-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:31.948-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:32.948-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:34.148-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:35.347-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:36.348-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:37.548-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2023-06-14T16:25:38.748-0700    debug   fileconsumer/file.go:129        Consuming files {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
^C2023-06-14T16:25:39.253-0700  info    otelcol/collector.go:227        Received signal from OS {"signal": "interrupt"}
2023-06-14T16:25:39.253-0700    info    service/service.go:157  Starting shutdown...
2023-06-14T16:25:39.253-0700    info    adapter/receiver.go:139 Stopping stanza receiver        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   pipeline/directed.go:74 Stopping operator       {"kind": "receiver", "name": "filelog", "data_type": "logs", "operator_id": "file_input", "operator_type": "file_input"}
2023-06-14T16:25:39.253-0700    debug   fileconsumer/reader.go:161      Problem closing reader  {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/tmp/example.log", "error": "close /tmp/example.log: file already closed"}
2023-06-14T16:25:39.253-0700    debug   pipeline/directed.go:78 Stopped operator        {"kind": "receiver", "name": "filelog", "data_type": "logs", "operator_id": "file_input", "operator_type": "file_input"}
2023-06-14T16:25:39.253-0700    debug   pipeline/directed.go:74 Stopping operator       {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   pipeline/directed.go:78 Stopped operator        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   adapter/receiver.go:90  Receive loop stopped    {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   adapter/receiver.go:120 Converter channel got closed    {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   adapter/receiver.go:120 Converter channel got closed    {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    debug   adapter/receiver.go:115 Consumer loop stopped   {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-06-14T16:25:39.253-0700    info    extensions/extensions.go:44     Stopping extensions...
2023-06-14T16:25:39.253-0700    info    service/service.go:171  Shutdown complete.

Additional context

  • If this is confirmed as a defect, I may work on the fix since I already get some clue.
  • The parse_from in the example Configuration on Scope Name Parsing page seem not right. The value should be attributes.scope_name_field rather than body.scope_name_field
@bin-shi-mulesoft bin-shi-mulesoft added bug Something isn't working needs triage New item requiring triage labels Jun 14, 2023
@github-actions
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Copy link
Member

I agree this is a problem. From what I can tell, the scope_name parser itself is working correctly, but we are not considering the scope when translating to plogs. We just assume a single scope here.

@bin-shi-mulesoft, thanks for reporting this. Let me know if you want to take the issue and I'll assign it to you.

@bin-shi-mulesoft
Copy link
Author

@djaglowski Sure. I can work on this.

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Aug 29, 2023
@djaglowski djaglowski added never stale Issues marked with this label will be never staled and automatically removed and removed Stale labels Aug 29, 2023
@someshkoli
Copy link
Contributor

Facing the same issue, not sure if the root cause described above still valid or not (@djaglowski can ya confirm ?). If yes, @bin-shi-mulesoft should I go ahead and pick it up ?

@djaglowski
Copy link
Member

@someshkoli, my assessment of the cause is unchanged. I appreciate the help and will reassign this to you.

The biggest challenge here will be to preserve performance scope is not present. Grouping by scope will potentially introduce some overhead but we should be able to optimize for the case where scope is unused. Performance benchmarks will be helpful here. You may need to evaluate if the existing benchmarks in the package are sufficient to demonstrate the difference.

@someshkoli
Copy link
Contributor

@someshkoli, my assessment of the cause is unchanged. I appreciate the help and will reassign this to you.

The biggest challenge here will be to preserve performance scope is not present. Grouping by scope will potentially introduce some overhead but we should be able to optimize for the case where scope is unused. Performance benchmarks will be helpful here. You may need to evaluate if the existing benchmarks in the package are sufficient to demonstrate the difference.

@djaglowski have benchmark numbers with me,
On main branch: BenchmarkEmitterToConsumer: ~3.12
On my branch: BenchmarkEmitterToConsumer: ~3.26
on my branch: BenchmarkEmitterToConsumerScopeGroupping (with 2 host variation and 2 scope variation): ~4.2s

rough implementation

resourceIdx, ok := resourceHashToIdx[resourceID]
if !ok {
  resourceHashToIdx[resourceID] = pLogs.ResourceLogs().Len()
  
  rl = pLogs.ResourceLogs().AppendEmpty()
  upsertToMap(e.Resource, rl.Resource().Attributes())
  
  scopeIdxByResource[resourceID] = make(map[string]int)
  scopeIdxByResource[resourceID][scopeName] = rl.ScopeLogs().Len()
  sl = rl.ScopeLogs().AppendEmpty()
  sl.Scope().SetName(scopeName)
} else {
  rl = pLogs.ResourceLogs().At(resourceIdx)
  scopesInThisResource, _ := scopeIdxByResource[resourceID]
  scopeInResource, ok := scopesInThisResource[scopeName]
  if !ok {
    scopesInThisResource[scopeName] = rl.ScopeLogs().Len()
    sl = rl.ScopeLogs().AppendEmpty()
    sl.Scope().SetName(scopeName)
  } else {
    sl = pLogs.ResourceLogs().At(resourceIdx).ScopeLogs().At(scopeInResource)
  }
}

@djaglowski
Copy link
Member

@someshkoli, thanks that seems reasonable.

@someshkoli
Copy link
Contributor

This groups the records by records -> scopes
with resource R1, R2 and scope S1, S2 -> total max possible log combination -> 4 (L1, L2, L3, L4)
grouping
[R1 -> (S1 -> L1, S2 -> L2)]
[R2 -> (S1 -> L3, S2 -> L4)]

djaglowski added a commit that referenced this issue Apr 1, 2024
**Description:** <Describe what has changed.>
Fix:   #23387 
Adds grouping of logrecords based on scope

 - adds scope grouping along with scope name
 - adds integration test for benchmark

**Link to tracking Issue:**  #23387 

**Testing:**
1. Did some benchmark on this build, results below:
On main branch: `BenchmarkEmitterToConsumer`: ~3.12
On my branch: `BenchmarkEmitterToConsumer`: ~3.26
on my branch: `BenchmarkEmitterToConsumerScopeGroupping` (with 2 host
variation and 2 scope variation): ~4.2s
2. Added unit tests

**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: Dan Jaglowski <jaglows3@gmail.com>
ycombinator pushed a commit to ycombinator/opentelemetry-collector-contrib that referenced this issue Apr 9, 2024
**Description:** <Describe what has changed.>
Fix:   open-telemetry#23387 
Adds grouping of logrecords based on scope

 - adds scope grouping along with scope name
 - adds integration test for benchmark

**Link to tracking Issue:**  open-telemetry#23387 

**Testing:**
1. Did some benchmark on this build, results below:
On main branch: `BenchmarkEmitterToConsumer`: ~3.12
On my branch: `BenchmarkEmitterToConsumer`: ~3.26
on my branch: `BenchmarkEmitterToConsumerScopeGroupping` (with 2 host
variation and 2 scope variation): ~4.2s
2. Added unit tests

**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: Dan Jaglowski <jaglows3@gmail.com>
LokeshOpsramp pushed a commit to opsramp/opentelemetry-collector-contrib that referenced this issue May 5, 2024
**Description:** <Describe what has changed.>
Fix:   open-telemetry#23387 
Adds grouping of logrecords based on scope

 - adds scope grouping along with scope name
 - adds integration test for benchmark

**Link to tracking Issue:**  open-telemetry#23387 

**Testing:**
1. Did some benchmark on this build, results below:
On main branch: `BenchmarkEmitterToConsumer`: ~3.12
On my branch: `BenchmarkEmitterToConsumer`: ~3.26
on my branch: `BenchmarkEmitterToConsumerScopeGroupping` (with 2 host
variation and 2 scope variation): ~4.2s
2. Added unit tests

**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: Dan Jaglowski <jaglows3@gmail.com>
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this issue May 8, 2024
**Description:** <Describe what has changed.>
Fix:   open-telemetry#23387 
Adds grouping of logrecords based on scope

 - adds scope grouping along with scope name
 - adds integration test for benchmark

**Link to tracking Issue:**  open-telemetry#23387 

**Testing:**
1. Did some benchmark on this build, results below:
On main branch: `BenchmarkEmitterToConsumer`: ~3.12
On my branch: `BenchmarkEmitterToConsumer`: ~3.26
on my branch: `BenchmarkEmitterToConsumerScopeGroupping` (with 2 host
variation and 2 scope variation): ~4.2s
2. Added unit tests

**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: Dan Jaglowski <jaglows3@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working never stale Issues marked with this label will be never staled and automatically removed pkg/stanza
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants