RFileOp and PythonFileOp log stdout and stderr output to stdout instead of S3 file #3227
Conversation
… appears in system logs, catch non-zero return/exit code in exception, no logging of output to file in S3 COS. Signed-off-by: Sven Thoms <21118431+shalberd@users.noreply.github.com>
Great work on the feature. I tested this; some suggestions:
```python
print("Processing file:", python_script)
try:
    result = subprocess.run(run_args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=True)
    output = result.stdout.decode('utf-8')
    logger.info(f"Output: {output}")
    logger.info(f"Return code: {result.returncode}")
except subprocess.CalledProcessError as e:
    logger.error("Output: %s", e.output.decode("utf-8"))
    logger.error("Return code: %s", e.returncode)
```
After testing this code, it looks like the pipeline node is green even on failure. For example:
```
[I 06:29:14.545] 'demo':'nb' - dependencies processed (0.173 secs)
[I 06:29:14.545] 'demo':'nb' - executing python script using 'python3 nb.py'
Processing file: nb.py
[E 06:29:14.579] Output: Traceback (most recent call last):
  File "/opt/app-root/src/nb.py", line 10, in <module>
    c = a/b
ZeroDivisionError: division by zero
[E 06:29:14.579] Return code: 1
```
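The green-on-failure behavior can be reproduced outside Elyra with a minimal sketch (`run_script` and the inline failing script are illustrative, not Elyra code): when the `except` block swallows `CalledProcessError`, the caller sees success; re-raising lets the failure propagate to Airflow/KFP.

```python
import subprocess
import sys

def run_script(args, reraise):
    """Run a child process, logging combined stdout/stderr.

    When reraise is False, the CalledProcessError is swallowed, so the
    caller (and hence the pipeline node) sees success even on failure.
    """
    try:
        result = subprocess.run(args, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, check=True)
        print("Output:", result.stdout.decode("utf-8"))
    except subprocess.CalledProcessError as e:
        print("Output:", e.output.decode("utf-8"))
        print("Return code:", e.returncode)
        if reraise:
            raise

# Hypothetical failing script (stands in for nb.py above):
fail = [sys.executable, "-c", "1/0"]

run_script(fail, reraise=False)  # error is logged, but the caller sees success
```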
Suggested change:

```python
print("Processing file:", python_script)
try:
    result = subprocess.run(run_args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=True)
    output = result.stdout.decode('utf-8')
    logger.info(f"Output: {output}")
    logger.info(f"Return code: {result.returncode}")
except subprocess.CalledProcessError as e:
    logger.error("Output: %s", e.output.decode("utf-8"))
    logger.error("Return code: %s", e.returncode)
    raise subprocess.CalledProcessError(e.returncode, run_args)
```
Ah, the raise, of course :-) Thank you. Yes, I just discussed this with a coworker in the context of Airflow pipeline node green/red status, and we were like "oh wow, the more we catch explicitly, the less the status changes to red in Airflow." I was busy with other stuff but promised him I'd look into it, and you just gave me the way to trickle this up correctly to Airflow/KFP. Thanks a bunch, will change that.
By the way, NotebookFileOp / ipynb logs currently do not get written/output correctly either. As I found out, they are only logged when the base log level of the system (Airflow, KFP) is DEBUG. I had to set log_output=True, changing
https://github.com/elyra-ai/elyra/blob/main/elyra/airflow/bootstrapper.py#L229 and
https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L379 to:

```python
papermill.execute_notebook(notebook, notebook_output, kernel_name=kernel_name, log_output=True, stdout_file=sys.stdout, stderr_file=sys.stderr)
```

Will change to that in this PR, too. Independently of that, I will check whether the error is raised correctly there as well; it looks like it is:
https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L379
Fixes #3222

I am marking this WIP because of a pending answer regarding log output as a file uploaded to S3, which causes the current lint error in the checks. However, the change as it stands already solves the issue of no logs appearing in Airflow or KFP for R and Python generic pipeline nodes.
What changes were proposed in this pull request?
Hello! I have a problem with my Elyra pipelines not showing logs. I looked for issues on GitHub but couldn't find anything related, and I am not sure if I am missing some needed config or if this is a bug.

When I run an Elyra pipeline with two nodes, one a Jupyter notebook and one a Python script, only the Jupyter notebook one shows the logging. I am not sure why this is the case, as they are both managed the same way.

This error happens in both KFP and Airflow pipeline runs. It is related to the bootstrapper.py file that executes the scripts behind generic pipeline nodes; PythonFileOp and RFileOp use similar syntax (subprocess.run) in both the KFP and Airflow bootstrapper.py:
https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L521
https://github.com/elyra-ai/elyra/blob/main/elyra/airflow/bootstrapper.py#L332

i.e.

```python
with open(r_script_output, "w") as log_file:
    subprocess.run(run_args, stdout=log_file, stderr=subprocess.STDOUT, check=True)
```
We only see the name of the processed file on stdout and no logs in the case of Python or R nodes.

This change uses subprocess.run in such a way that all stderr and stdout output is captured in a variable whose content (i.e. the script run output) is then logged. The stdout content is no longer written to the log output file in S3 bucket storage; or, better said, that file is now empty. The question remains why Notebook, Python, and R script run stdout was even put into a file and uploaded to S3 storage before this change. Right now, a file with empty content is uploaded to S3.
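If keeping the S3-uploaded log file populated is desired, one option is to both log the captured output and write it back to the on-disk log file, so the existing upload step still has content to push. This is a sketch, not Elyra's implementation; `run_and_tee` and `output_path` are hypothetical names:

```python
import logging
import subprocess

logger = logging.getLogger(__name__)

def run_and_tee(run_args, output_path):
    """Capture combined stdout/stderr, log it, and also write it to the
    on-disk log file (output_path) so a subsequent S3 upload is not
    empty. Re-raises on failure so the pipeline node turns red."""
    output = ""
    try:
        result = subprocess.run(run_args, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, check=True)
        output = result.stdout.decode("utf-8")
        logger.info("Output: %s", output)
    except subprocess.CalledProcessError as e:
        output = e.output.decode("utf-8")
        logger.error("Output: %s", output)
        logger.error("Return code: %s", e.returncode)
        raise
    finally:
        # Always persist whatever output was captured, success or failure.
        with open(output_path, "w") as log_file:
            log_file.write(output)
```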
How was this pull request tested?
I added the changed bootstrapper.py file for Airflow to a runtime image and used it in conjunction with Elyra and Airflow 2.8.2 to see whether the logs now appear in the container log output / in the Airflow Logs tab. Results were positive and as intended, documented here:
#3222 (comment)

The question as to why script run log output is written back to S3 is posed there and in the Elyra general Slack channel as well.