
Refactor benchmark script for LLM benchmark integration #2897

Merged: 19 commits merged into master, Jan 29, 2024

Conversation

@mreso (Collaborator) commented Jan 18, 2024

Description

This PR refactors the benchmark script for easier integration of an LLM benchmark

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • pytest test/pytest/test_benchmark.py
================================================================================================================================ test session starts =================================================================================================================================
platform linux -- Python 3.10.13, pytest-7.3.1, pluggy-1.0.0
rootdir: /home/ubuntu/serve
plugins: mock-3.12.0, cov-4.1.0
collecting ... This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

collected 1 item

test/pytest/test_benchmark.py .

================================================================================================================================= 1 passed in 5.16s ==================================================================================================================================
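For readers unfamiliar with how such a test can pass without a live load test, here is a minimal, purely illustrative sketch of stubbing out the load-generator call; `run_benchmark` and the command table below are assumptions for illustration, not the contents of test/pytest/test_benchmark.py.

```python
# Illustrative sketch only: stub the external load-generator call so the test
# runs without ApacheBench or locust installed. run_benchmark and the command
# table are hypothetical, not the real test/pytest/test_benchmark.py.
import subprocess
from unittest import mock


def run_benchmark(backend: str = "ab") -> None:
    # Stand-in for the benchmark entry point: dispatch to the chosen tool.
    cmd = {"ab": ["ab", "-V"], "locust": ["locust", "--version"]}[backend]
    subprocess.run(cmd, capture_output=True, check=True)


def test_run_benchmark_invokes_ab():
    # Patch subprocess.run so no external binary is actually executed.
    with mock.patch("subprocess.run") as mocked_run:
        run_benchmark(backend="ab")
        invoked_cmd = mocked_run.call_args[0][0]
        assert invoked_cmd[0] == "ab"
```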

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?

@mreso requested review from @agunapal and @lxning on January 18, 2024 01:33
@mreso marked this pull request as ready for review on January 19, 2024 05:51
@lxning (Collaborator) left a comment:

Thanks @mreso for the great work. Just one question:
Will this PR break the existing benchmark dashboard pipeline, since auto-benchmark.py is built on top of benchmark-ab.py? Can we test it to see whether any changes are needed in auto-benchmark.py?

@mreso (Collaborator, Author) commented Jan 24, 2024

@lxning auto-benchmark.py uses benchmark-ab.py as a script. The refactor does not alter the script's external behavior; it only adds the option to use locust instead of ab, and the format of the final ab-report.txt is equivalent. We discussed the format of the intermediate results in our meeting last week and concluded they are not used anywhere. Did you find a place where they are used? I've asked @agunapal to test some of his use cases for benchmark-ab.py to make sure the external behavior is the same.
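For context, here is a minimal sketch of what selecting a backend while keeping a single report format could look like; the `--backend` flag and function names are hypothetical illustrations, not benchmark-ab.py's actual interface.

```python
# Hypothetical sketch: choose a load-generation backend but funnel results
# through one report writer, so downstream consumers such as auto-benchmark.py
# keep seeing the same ab-report.txt layout. Names are illustrative only.
import argparse


def run_ab(url: str, requests: int) -> dict:
    # Would shell out to ApacheBench and parse its summary into common fields.
    return {"tool": "ab", "requests": requests, "url": url}


def run_locust(url: str, requests: int) -> dict:
    # Would drive locust and collect the same fields as the ab path.
    return {"tool": "locust", "requests": requests, "url": url}


BACKENDS = {"ab": run_ab, "locust": run_locust}


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", default="http://127.0.0.1:8080/predictions/model")
    parser.add_argument("--requests", type=int, default=100)
    parser.add_argument("--backend", choices=BACKENDS, default="ab")
    args = parser.parse_args()

    result = BACKENDS[args.backend](args.url, args.requests)
    # A single report writer shared by both backends keeps the output equivalent.
    with open("ab-report.txt", "w") as report:
        for key, value in result.items():
            report.write(f"{key}: {value}\n")


if __name__ == "__main__":
    main()
```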

@agunapal (Collaborator) replied, quoting the comment above:
Sorry, haven't been able to find time to test this yet.

@agunapal (Collaborator) commented:

Verified that the auto_benchmark script works as before.

/tmp/benchmark/
/tmp/benchmark/gpu_memory_percentage.txt
/tmp/benchmark/handler_time.txt
/tmp/benchmark/input
/tmp/benchmark/result.txt
/tmp/benchmark/predict.txt
/tmp/benchmark/cpu_percentage.txt
/tmp/benchmark/worker_thread.txt
/tmp/benchmark/logs/
/tmp/benchmark/logs/stats_metrics.json
/tmp/benchmark/logs/model_metrics.log
/tmp/benchmark/gpu_percentage.txt
/tmp/benchmark/conf/
/tmp/benchmark/conf/config.properties
/tmp/benchmark/gpu_memory_used.txt
/tmp/benchmark/memory_percentage.txt
/tmp/benchmark/waiting_time.txt
execute: tar -cvzf /tmp/ts_benchmark/scripted_mode_vgg16_w4_b8/logs.tar.gz /home/ubuntu/serve/logs
tar: Removing leading `/' from member names
/home/ubuntu/serve/logs/
/home/ubuntu/serve/logs/config/
/home/ubuntu/serve/logs/config/20240126184344680-shutdown.cfg
/home/ubuntu/serve/logs/config/20240126184146147-snapshot.cfg
/home/ubuntu/serve/logs/config/20240126184343852-snapshot.cfg
/home/ubuntu/serve/logs/config/20240126184134358-startup.cfg
/home/ubuntu/serve/logs/ts_log.log
/home/ubuntu/serve/logs/model_metrics.log
/home/ubuntu/serve/logs/ts_metrics.log
/home/ubuntu/serve/logs/access_log.log
/home/ubuntu/serve/logs/model_log.log
finish benchmark scripted_mode_vgg16_w4_b8
report.md is generated
benchmark_serving.sh finished successfully.
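One hedged way to make such a before/after check repeatable is to diff the artifact listings from two runs; the `before/` and `after/` directories below are hypothetical snapshots of /tmp/benchmark, not something the benchmark scripts produce themselves.

```python
# Illustrative: compare the artifacts produced by the old and new scripts.
# before/ and after/ are hypothetical snapshots of /tmp/benchmark.
from pathlib import Path


def artifact_names(root: str) -> set:
    # Relative paths of every file a benchmark run produced under `root`.
    base = Path(root)
    return {str(p.relative_to(base)) for p in base.rglob("*") if p.is_file()}


if __name__ == "__main__":
    before = artifact_names("before")
    after = artifact_names("after")
    print("missing after refactor:", sorted(before - after))
    print("new after refactor:", sorted(after - before))
```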


report.md (excerpt):

TorchServe Benchmark on gpu
===========================

# Date: 2024-01-26 18:43:45

# TorchServe Version: 0.9.0

## eager_mode_mnist

@agunapal (Collaborator) left a comment:
LGTM. As discussed, we need to monitor the dashboard to check nothing breaks once this is merged.

@mreso added this pull request to the merge queue on Jan 29, 2024
Merged via the queue into master with commit 1a567db Jan 29, 2024
13 checks passed