Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filelog receiver panics when delete_after_read is true #31383

Closed
prigio opened this issue Feb 22, 2024 · 2 comments · Fixed by #31384
Closed

filelog receiver panics when delete_after_read is true #31383

prigio opened this issue Feb 22, 2024 · 2 comments · Fixed by #31384
Labels
bug Something isn't working receiver/filelog

Comments

@prigio
Copy link

prigio commented Feb 22, 2024

Component(s)

receiver/filelog

What happened?

Description

When setting "delete_after_read: true" within the filelogreceiver, the collector panics with

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1013b6521]

Steps to Reproduce

  1. Create a folder data
  2. Start the opentelemetry contrib collector: ./otelcol --config config.yaml --feature-gates filelog.allowFileDeletion
  3. Collector logs fileconsumer/file.go:132 finding files: no files match the configured criteria
  4. Create file data/file.ndjson with following contents:
{"time":1707821387.123,"event":"This is a test: 7","host":"mbp","source":"/Users/user/somefile.log","sourcetype":"someval","index":"idx2","fields":{"orig_sourcetype":"test_messages"}}
{"time":1707821387.123,"event":"This is a test:: 8","host":"mbp","source":"/Users/user/somefile.log","sourcetype":"someval","index":"idx1"}
  1. Collector logs fileconsumer/file.go:261 Started watching file...
  2. Collector deletes the file data/file.ndjson as expected
  3. Panic
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1013c07e1]

goroutine 23 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/reader.(*Reader).ReadToEnd(0xc00041a070, {0x1017a6cb0, 0xc000518af0})
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.94.0/fileconsumer/internal/reader/reader.go:49 +0x61
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).preConsume.func1(0xc00041a070)
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.94.0/fileconsumer/file_other.go:47 +0xc7
created by github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).preConsume in goroutine 12
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.94.0/fileconsumer/file_other.go:45 +0x393

Expected Result

The monitored data file is deleted, execution continues.

Actual Result

The monitored data file is deleted, a panic due to a nil pointer dereference.

This happens within Reader.ReadToEnd() as the Reader.file pointer is nil due to the operations of Reader.delete() calling Reader.Close()

Collector version

v0.94.0

Environment information

Environment

MacOS Sonoma 14.2.1
go version go1.21.7 darwin/amd64

OpenTelemetry Collector configuration

receivers:
  filelog:
    include: [ 'data/*.ndjson' ] # Adjust path as needed
    include_file_name: false
    include_file_path: true
    max_log_size: 10Mib
    encoding: utf-8
    delete_after_read: true
    start_at: beginning
    operators:
      - type: json_parser
        parse_from: body
        timestamp:
          parse_from: attributes.time
          layout_type: epoch
          layout: s.us

exporters:
  debug:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]

Log output

2024-02-22T15:04:10.667+0100	info	service@v0.94.0/telemetry.go:59	Setting up own telemetry...
2024-02-22T15:04:10.667+0100	info	service@v0.94.0/telemetry.go:104	Serving metrics	{"address": ":8888", "level": "Basic"}
2024-02-22T15:04:10.667+0100	info	exporter@v0.94.1/exporter.go:275	Development component. May change in the future.	{"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-02-22T15:04:10.668+0100	info	service@v0.94.0/service.go:140	Starting otelcol-custom...	{"Version": "1.0.0", "NumCPU": 8}
2024-02-22T15:04:10.668+0100	info	extensions/extensions.go:34	Starting extensions...
2024-02-22T15:04:10.668+0100	info	adapter/receiver.go:45	Starting stanza receiver	{"kind": "receiver", "name": "filelog", "data_type": "logs"}
2024-02-22T15:04:10.669+0100	warn	fileconsumer/file.go:46	finding files: no files match the configured criteria	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-02-22T15:04:10.669+0100	info	service@v0.94.0/service.go:166	Everything is ready. Begin running and processing data.
2024-02-22T15:04:10.669+0100	warn	localhostgate/featuregate.go:63	The default endpoints for all servers in components will change to use localhost instead of 0.0.0.0 in a future version. Use the feature gate to preview the new default.	{"feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-02-22T15:04:10.869+0100	warn	fileconsumer/file.go:132	finding files: no files match the configured criteria	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-02-22T15:04:11.070+0100	warn	fileconsumer/file.go:132	finding files: no files match the configured criteria	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-02-22T15:04:11.270+0100	warn	fileconsumer/file.go:132	finding files: no files match the configured criteria	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-02-22T15:04:11.470+0100	warn	fileconsumer/file.go:132	finding files: no files match the configured criteria	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-02-22T15:04:11.670+0100	warn	fileconsumer/file.go:132	finding files: no files match the configured criteria	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-02-22T15:04:11.870+0100	warn	fileconsumer/file.go:132	finding files: no files match the configured criteria	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-02-22T15:04:12.070+0100	warn	fileconsumer/file.go:132	finding files: no files match the configured criteria	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-02-22T15:04:12.269+0100	warn	fileconsumer/file.go:132	finding files: no files match the configured criteria	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-02-22T15:04:12.469+0100	warn	fileconsumer/file.go:132	finding files: no files match the configured criteria	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-02-22T15:04:19.070+0100	info	fileconsumer/file.go:261	Started watching file	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "data/file.ndjson"}
2024-02-22T15:04:19.170+0100	info	LogsExporter	{"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 2}
2024-02-22T15:04:19.170+0100	info	ResourceLog #0
Resource SchemaURL:
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2024-02-22 14:04:19.071309 +0000 UTC
Timestamp: 2024-02-13 10:49:47.123 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str({"time":1707821387.123,"event":"THAT IS PAOLOS ia-lx-test DATA NUMBER: 7 - The data string must be larger so that a data set generates more bytes","host":"mbp-de-cric","source":"/Users/crichter/Applications/splunk/var/spool/splunk/9750b17f362275ad_4f7a27434c189849_events.stash_new","sourcetype":"ia-lx-test_test_messages","index":"ia-lx-test-ingest-actions-test-p","fields":{"orig_sourcetype":"test_messages"}})
Attributes:
     -> fields: Map({"orig_sourcetype":"test_messages"})
     -> time: Double(1707821387.123)
     -> log.file.path: Str(data/file.ndjson)
     -> event: Str(THAT IS PAOLOS ia-lx-test DATA NUMBER: 7 - The data string must be larger so that a data set generates more bytes)
     -> host: Str(mbp-de-cric)
     -> source: Str(/Users/crichter/Applications/splunk/var/spool/splunk/9750b17f362275ad_4f7a27434c189849_events.stash_new)
     -> sourcetype: Str(ia-lx-test_test_messages)
     -> index: Str(ia-lx-test-ingest-actions-test-p)
Trace ID:
Span ID:
Flags: 0
LogRecord #1
ObservedTimestamp: 2024-02-22 14:04:19.071815 +0000 UTC
Timestamp: 2024-02-13 10:49:47.123 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str({"time":1707821387.123,"event":"THAT IS PAOLOS lx9114 DATA NUMBER: 8 - The data string must be larger so that a data set generates more bytes","host":"mbp-de-cric","source":"/Users/crichter/Applications/splunk/var/spool/splunk/9750b17f362275ad_4f7a27434c189849_events.stash_new","sourcetype":"lx9114_test_messages","index":"lx9114"})
Attributes:
     -> log.file.path: Str(data/file.ndjson)
     -> source: Str(/Users/crichter/Applications/splunk/var/spool/splunk/9750b17f362275ad_4f7a27434c189849_events.stash_new)
     -> sourcetype: Str(lx9114_test_messages)
     -> index: Str(lx9114)
     -> time: Double(1707821387.123)
     -> event: Str(THAT IS PAOLOS lx9114 DATA NUMBER: 8 - The data string must be larger so that a data set generates more bytes)
     -> host: Str(mbp-de-cric)
Trace ID:
Span ID:
Flags: 0
	{"kind": "exporter", "data_type": "logs", "name": "debug"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1013c07e1]

goroutine 23 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/reader.(*Reader).ReadToEnd(0xc00041a070, {0x1017a6cb0, 0xc000518af0})
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.94.0/fileconsumer/internal/reader/reader.go:49 +0x61
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).preConsume.func1(0xc00041a070)
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.94.0/fileconsumer/file_other.go:47 +0xc7
created by github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).preConsume in goroutine 12
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.94.0/fileconsumer/file_other.go:45 +0x393

Additional context

I was able to fix the issue by patching Reader.ReadToEnd(ctx context.Context) with

// ReadToEnd will read until the end of the file
func (r *Reader) ReadToEnd(ctx context.Context) {
    if r.file == nil {
        return
    }

However, I am not well versed in the otel source code and am not sure this is the correct thing to do.

@prigio prigio added bug Something isn't working needs triage New item requiring triage labels Feb 22, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Copy link
Member

crobert-1 commented Feb 26, 2024

Removing needs triage based on an open PR being posted by the code owner.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label Feb 26, 2024
djaglowski added a commit that referenced this issue Feb 26, 2024
Fixes #31383

This enhances `TestDeleteAfterRead` in a way that replicates the
problem, and fixes the issue by retaining metadata on a reader until the
exported `Close` function is called. Previously, we were removing the
metadata when `delete` called `Close`, but this precluded the
opportunity for the caller of `ReadToEnd` to properly manage file
metadata after deletion.
XinRanZhAWS pushed a commit to XinRanZhAWS/opentelemetry-collector-contrib that referenced this issue Mar 13, 2024
Fixes open-telemetry#31383

This enhances `TestDeleteAfterRead` in a way that replicates the
problem, and fixes the issue by retaining metadata on a reader until the
exported `Close` function is called. Previously, we were removing the
metadata when `delete` called `Close`, but this precluded the
opportunity for the caller of `ReadToEnd` to properly manage file
metadata after deletion.
LokeshOpsramp pushed a commit to opsramp/opentelemetry-collector-contrib that referenced this issue May 5, 2024
Fixes open-telemetry#31383

This enhances `TestDeleteAfterRead` in a way that replicates the
problem, and fixes the issue by retaining metadata on a reader until the
exported `Close` function is called. Previously, we were removing the
metadata when `delete` called `Close`, but this precluded the
opportunity for the caller of `ReadToEnd` to properly manage file
metadata after deletion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working receiver/filelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants