Python autoinstrumentation for musl libc based application containers #2264

Open
ilyamochalov opened this issue Oct 24, 2023 · 8 comments
Labels
area:auto-instrumentation Issues for auto-instrumentation auto-instrumentation:python enhancement New feature or request

Comments

@ilyamochalov

Component(s)

instrumentation

Is your feature request related to a problem? Please describe.

Python auto-instrumentation fails in musl-libc-based application containers with the following error:

#16 2.190 ImportError: Error relocating /autoinstrumentation/psutil/_psutil_linux.abi3.so: __sched_cpufree: symbol not found
#16 2.191 Failed to auto initialize opentelemetry
#16 2.191 Traceback (most recent call last):
#16 2.191   File "/autoinstrumentation/opentelemetry/instrumentation/auto_instrumentation/sitecustomize.py", line 39, in initialize
#16 2.191     _load_instrumentors(distro)
#16 2.191   File "/autoinstrumentation/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 91, in _load_instrumentors
#16 2.191     raise exc
#16 2.191   File "/autoinstrumentation/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 87, in _load_instrumentors
#16 2.191     distro.load_instrumentor(entry_point, skip_dep_check=True)
#16 2.191   File "/autoinstrumentation/opentelemetry/instrumentation/distro.py", line 62, in load_instrumentor
#16 2.191     instrumentor: BaseInstrumentor = entry_point.load()
#16 2.191   File "/autoinstrumentation/pkg_resources/__init__.py", line 2518, in load
#16 2.191     return self.resolve()
#16 2.191   File "/autoinstrumentation/pkg_resources/__init__.py", line 2524, in resolve
#16 2.191     module = __import__(self.module_name, fromlist=['__name__'], level=0)
#16 2.191   File "/autoinstrumentation/opentelemetry/instrumentation/system_metrics/__init__.py", line 79, in <module>
#16 2.191     import psutil
#16 2.191   File "/autoinstrumentation/psutil/__init__.py", line 102, in <module>
#16 2.191     from . import _pslinux as _psplatform
#16 2.191   File "/autoinstrumentation/psutil/_pslinux.py", line 25, in <module>
#16 2.191     from . import _psutil_linux as cext
#16 2.191 ImportError: Error relocating /autoinstrumentation/psutil/_psutil_linux.abi3.so: __sched_cpufree: symbol not found

Root cause: the current auto-instrumentation build is packaged for glibc, so its native extensions (such as psutil's) cannot load on musl.
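To confirm which C library an application container actually uses, a minimal stdlib-only sketch like the following can be run with the container's own interpreter. It assumes that glibc answers `os.confstr("CS_GNU_LIBC_VERSION")` and that musl's dynamic loader is installed as `/lib/ld-musl-<arch>.so.1`, as on Alpine; the function names are hypothetical.

```python
from __future__ import annotations

import glob
import os


def libc_from_loader(path: str) -> str:
    """Classify a dynamic-loader path as "musl", "glibc", or "unknown"."""
    name = os.path.basename(path)
    if name.startswith("ld-musl-"):  # e.g. /lib/ld-musl-x86_64.so.1 on Alpine
        return "musl"
    if name.startswith("ld-linux"):  # e.g. /lib64/ld-linux-x86-64.so.2
        return "glibc"
    return "unknown"


def detect_libc() -> str:
    """Best-effort guess of the C library used by the current container."""
    # glibc answers this query (e.g. "glibc 2.36"); musl and non-Linux
    # platforms raise or return None, which is treated as "not glibc".
    try:
        version = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, OSError, ValueError):
        version = None
    if version and version.startswith("glibc"):
        return "glibc"
    # Fall back to looking for a known dynamic loader in /lib.
    for loader in sorted(glob.glob("/lib/ld-*.so*")):
        kind = libc_from_loader(loader)
        if kind != "unknown":
            return kind
    return "unknown"


if __name__ == "__main__":
    print(detect_libc())
```

A "musl" result in the application container combined with a glibc-built `/autoinstrumentation` directory matches the failure in the traceback above.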

Describe the solution you'd like

  1. Add an extra build stage based on an Alpine (musl) image to the Dockerfile at https://github.com/open-telemetry/opentelemetry-operator/blob/v0.87.0/autoinstrumentation/python/Dockerfile#L12
  2. Copy the musl-built instrumentation libraries into the final image under a separate path: https://github.com/open-telemetry/opentelemetry-operator/blob/v0.87.0/autoinstrumentation/python/Dockerfile#L22
  3. Add an extra annotation: instrumentation.opentelemetry.io/otel-python-auto-runtime: "linux-musl-x64"
  4. Update https://github.com/open-telemetry/opentelemetry-operator/blob/main/pkg/instrumentation/python.go so that the operator copies and loads the correct set of dependencies
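The selection in steps 3 and 4 could look roughly like the sketch below. The operator itself is written in Go; this Python version only illustrates the lookup. The `linux-x64` default value (mirroring the .NET runtime annotation), the `/autoinstrumentation-musl` path, and the function name are assumptions, not the operator's actual code.

```python
from __future__ import annotations

# Annotation name from step 3.
ANNOTATION = "instrumentation.opentelemetry.io/otel-python-auto-runtime"

# Hypothetical: which directory of the auto-instrumentation image holds the
# libraries built for each runtime. Values mirror the .NET runtime annotation.
RUNTIME_DIRS = {
    "linux-x64": "/autoinstrumentation",            # glibc build (existing path)
    "linux-musl-x64": "/autoinstrumentation-musl",  # musl build (hypothetical path)
}
DEFAULT_RUNTIME = "linux-x64"


def select_runtime_dir(annotations: dict[str, str]) -> str:
    """Return the artifact directory to inject for a pod's annotations."""
    runtime = annotations.get(ANNOTATION, DEFAULT_RUNTIME)
    if runtime not in RUNTIME_DIRS:
        raise ValueError(f"unsupported {ANNOTATION} value: {runtime!r}")
    return RUNTIME_DIRS[runtime]
```

For example, `select_runtime_dir({ANNOTATION: "linux-musl-x64"})` returns `"/autoinstrumentation-musl"`. Rejecting unknown values, rather than silently falling back to the glibc build, avoids injecting libraries that would fail at import time.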

Describe alternatives you've considered

No response

Additional context

A similar change was made for .NET.

@TylerHelmuth
Member

Unlike with .NET, I believe this is a fault of the Docker image we supply, not of the instrumentation itself.

@open-telemetry/operator-approvers I think we need to make a concrete decision on which auto-instrumentation images we supply. For all appropriate languages, will we supply both musl- and glibc-based images? Or is .NET a one-off case because of how the .NET agent is supplied?

@ilyamochalov
Author

ilyamochalov commented Oct 25, 2023

@TylerHelmuth thank you for checking this issue.

_psutil_linux.abi3.so: __sched_cpufree: symbol not found and similar error messages indicate that the psutil package (a dependency of the Python OTel packages) was installed on a system with a different C library implementation (glibc vs. musl). When pip installs psutil, its C extension is either compiled or picked as a prebuilt wheel for the C library of the build environment. __sched_cpufree is a glibc-specific symbol, so native extensions built against glibc won't load on musl systems.
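Which prebuilt binary pip accepts is governed by wheel platform tags: `manylinux` tags (PEP 600) promise compatibility with a minimum glibc version, `musllinux` tags (PEP 656) with a minimum musl version. The sketch below is a simplified, hypothetical re-implementation of that check (pip's real logic lives in the `packaging` library and also weighs Python and ABI tags); the wheel filenames in the test are illustrative, not copied from PyPI. It shows why an artifact selected on a glibc builder is not valid for a musl target.

```python
from __future__ import annotations

import re

# Legacy manylinux aliases and the glibc version each implies (PEP 600).
_LEGACY_MANYLINUX = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}
# PEP 600 (glibc) and PEP 656 (musl) tags: <family>_<major>_<minor>_<arch>.
_TAG_RE = re.compile(r"^(manylinux|musllinux)_(\d+)_(\d+)_(.+)$")


def parse_platform_tag(tag: str):
    """Return (libc, (major, minor), arch) for a Linux platform tag, or None."""
    for legacy, version in _LEGACY_MANYLINUX.items():
        if tag.startswith(legacy + "_"):
            return "glibc", version, tag[len(legacy) + 1:]
    match = _TAG_RE.match(tag)
    if match is None:
        return None
    libc = "glibc" if match.group(1) == "manylinux" else "musl"
    return libc, (int(match.group(2)), int(match.group(3))), match.group(4)


def wheel_supports(filename: str, libc: str, libc_version: tuple[int, int], arch: str) -> bool:
    """True if any platform tag of the wheel accepts this libc, version, and arch."""
    stem = filename[:-4] if filename.endswith(".whl") else filename
    for tag in stem.split("-")[-1].split("."):  # platform tags may be dot-compressed
        if tag == "any":  # pure-Python wheel: no native code, works everywhere
            return True
        parsed = parse_platform_tag(tag)
        if parsed is None:
            continue
        tag_libc, tag_version, tag_arch = parsed
        # A tag promises compatibility with that libc version or newer.
        if tag_libc == libc and tag_arch == arch and tag_version <= libc_version:
            return True
    return False
```

A `manylinux` wheel is never accepted for a musl target, which is why the glibc-built `_psutil_linux.abi3.so` has to be rebuilt (or re-selected) on a musl base image.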

The final auto-instrumentation images for .NET, Python, and other languages are simply one way to distribute language-specific auto-instrumentation libraries. For languages whose runtime depends on the system C library, I think we need to build the auto-instrumentation libraries against both glibc and musl and ship both sets of artifacts to the application. The OTel Kubernetes operator should then decide which artifact set to inject into the app container.
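Shipping both sets and injecting only one could be sketched as a copy step in the init container. This assumes a hypothetical image layout with one directory per libc; the mapping, paths, and function name are illustrative, not how the operator's init container is actually implemented.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def inject_artifacts(sources: dict[str, Path], libc: str, shared_volume: Path) -> Path:
    """Copy the artifact set built for `libc` into the volume shared with the app.

    `sources` maps a libc name ("glibc", "musl") to the directory holding the
    auto-instrumentation libraries built against it (hypothetical layout).
    """
    if libc not in sources:
        raise ValueError(f"no auto-instrumentation build for libc {libc!r}")
    # dirs_exist_ok (Python 3.8+) allows copying into an already-mounted volume.
    shutil.copytree(sources[libc], shared_volume, dirs_exist_ok=True)
    return shared_volume
```

The `libc` argument would come from the user-facing setting (such as a pod annotation), not from detection inside the init container: the init container's own C library says nothing about the application container's.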

@TylerHelmuth
Member

We discussed this issue during the SIG call today. We'd like to have a clean solution that auto-detects which libs to use and handles everything for the user, but we think finding a solution like that is unlikely.

Most likely we have to implement a dotnet-like solution where the user can specify the libs they need.

@srikanthccv do you or any other Python maintainers have any advice on how to handle this?

@srikanthccv
Member

I took a brief look at the dotnet solution. I think the same should work for Python as well. I will take some time to review the instrumentation side and see if there are any cases that require special handling.

@ilyamochalov
Author

@srikanthccv thank you for taking a look. I will proceed with my PR proposing changes to the operator and the instrumentation Docker image (please review the Dockerfile in the PR linked above).

@ilyamochalov
Author

@open-telemetry/operator-approvers PR #2266 is ready; can someone please review it?

@pmcollins
Member

pmcollins commented Jul 2, 2024

I bumped into the psutil stack trace issue while exploring Python auto-instrumentation as defined by the files in the e2e-instrumentation/instrumentation-python directory.

It looks like the Dockerfiles for the default init container and for the test app (published at ghcr.io/open-telemetry/opentelemetry-operator/e2e-test-app-python:main) use binary-incompatible base images: one uses python3.11 (glibc) and the other alpine3.18 (musl).

@pmcollins
Member

Also, the collector configs defined in the instrumentation directories (e.g. tests/e2e-instrumentation/instrumentation-python/00-install-collector.yaml) don't define a metrics pipeline, but Python auto-instrumentation sends metrics, so the failed metric exports show up as 404 errors in the logs. Adding a metrics pipeline that uses the OTLP receiver to the collector config solves the problem.
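A minimal collector config with both pipelines can be sketched as a Python dict; dumped as JSON, it should be accepted as a collector config file, since YAML 1.2 is a superset of JSON. The `debug` exporter and the exact layout are assumptions, not the e2e test's actual config, and `missing_pipelines` is a hypothetical helper that flags signals without a pipeline.

```python
from __future__ import annotations

import json

# Hypothetical minimal collector config: the OTLP receiver feeds both a
# traces and a metrics pipeline.
COLLECTOR_CONFIG = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "exporters": ["debug"]},
            # Without a metrics pipeline, the SDK's metric exports are rejected
            # (the 404s described above).
            "metrics": {"receivers": ["otlp"], "exporters": ["debug"]},
        }
    },
}


def missing_pipelines(config: dict, signals: tuple[str, ...] = ("traces", "metrics")) -> list[str]:
    """List signals with no pipeline; pipeline names may be "<signal>/<suffix>"."""
    pipelines = config.get("service", {}).get("pipelines", {})
    return [
        signal
        for signal in signals
        if not any(name == signal or name.startswith(signal + "/") for name in pipelines)
    ]


if __name__ == "__main__":
    print(json.dumps(COLLECTOR_CONFIG, indent=2))
```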
