
Refactor benchmark script for LLM benchmark integration #2897

Merged: 19 commits merged into master, Jan 29, 2024

Conversation

@mreso (Collaborator) commented Jan 18, 2024

Description

This PR refactors the benchmark script for easier integration of an LLM benchmark

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • pytest test/pytest/test_benchmark.py
================================================================================================================================ test session starts =================================================================================================================================
platform linux -- Python 3.10.13, pytest-7.3.1, pluggy-1.0.0
rootdir: /home/ubuntu/serve
plugins: mock-3.12.0, cov-4.1.0
collecting ... This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

collected 1 item

test/pytest/test_benchmark.py .

================================================================================================================================= 1 passed in 5.16s ==================================================================================================================================
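For readers unfamiliar with how such a test can pass without a live load test, here is a minimal, purely illustrative sketch of stubbing out the load-generator call; `run_benchmark` and the command table below are assumptions for illustration, not the contents of test/pytest/test_benchmark.py.

```python
# Illustrative sketch only: stub the external load-generator call so the test
# runs without ApacheBench or locust installed. run_benchmark and the command
# table are hypothetical, not the real test/pytest/test_benchmark.py.
import subprocess
from unittest import mock


def run_benchmark(backend: str = "ab") -> None:
    # Stand-in for the benchmark entry point: dispatch to the chosen tool.
    cmd = {"ab": ["ab", "-V"], "locust": ["locust", "--version"]}[backend]
    subprocess.run(cmd, capture_output=True, check=True)


def test_run_benchmark_invokes_ab():
    # Patch subprocess.run so no external binary is actually executed.
    with mock.patch("subprocess.run") as mocked_run:
        run_benchmark(backend="ab")
        invoked_cmd = mocked_run.call_args[0][0]
        assert invoked_cmd[0] == "ab"
```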

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?

@mreso requested review from @agunapal and @lxning on January 18, 2024 01:33
@mreso marked this pull request as ready for review on January 19, 2024 05:51
@lxning (Collaborator) left a comment:

Thanks @mreso for the great work. Just one question:
Will this PR break the existing benchmark dashboard pipeline, since auto-benchmark.py is built on top of benchmark-ab.py? Can we test it to see whether any changes are needed in auto-benchmark.py?

@mreso (Collaborator, Author) commented Jan 24, 2024

@lxning auto-benchmark.py uses benchmark-ab.py as a script. The refactor does not alter the script's external behavior; it only adds the option to use locust instead of ab, and the format of the final ab-report.txt is equivalent. We discussed the format of the intermediate results in our meeting last week and concluded they are not used anywhere. Did you find a place where they are used? I've asked @agunapal to test some of his use cases for benchmark-ab.py to make sure the external behavior is the same.
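For context, here is a minimal sketch of what selecting a backend while keeping a single report format could look like; the `--backend` flag and function names are hypothetical illustrations, not benchmark-ab.py's actual interface.

```python
# Hypothetical sketch: choose a load-generation backend but funnel results
# through one report writer, so downstream consumers such as auto-benchmark.py
# keep seeing the same ab-report.txt layout. Names are illustrative only.
import argparse


def run_ab(url: str, requests: int) -> dict:
    # Would shell out to ApacheBench and parse its summary into common fields.
    return {"tool": "ab", "requests": requests, "url": url}


def run_locust(url: str, requests: int) -> dict:
    # Would drive locust and collect the same fields as the ab path.
    return {"tool": "locust", "requests": requests, "url": url}


BACKENDS = {"ab": run_ab, "locust": run_locust}


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", default="http://127.0.0.1:8080/predictions/model")
    parser.add_argument("--requests", type=int, default=100)
    parser.add_argument("--backend", choices=BACKENDS, default="ab")
    args = parser.parse_args()

    result = BACKENDS[args.backend](args.url, args.requests)
    # A single report writer shared by both backends keeps the output equivalent.
    with open("ab-report.txt", "w") as report:
        for key, value in result.items():
            report.write(f"{key}: {value}\n")


if __name__ == "__main__":
    main()
```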

@agunapal (Collaborator) replied, quoting the comment above:
Sorry, haven't been able to find time to test this yet.

@agunapal (Collaborator) commented:

Verified that the auto_benchmark script works as before.

/tmp/benchmark/
/tmp/benchmark/gpu_memory_percentage.txt
/tmp/benchmark/handler_time.txt
/tmp/benchmark/input
/tmp/benchmark/result.txt
/tmp/benchmark/predict.txt
/tmp/benchmark/cpu_percentage.txt
/tmp/benchmark/worker_thread.txt
/tmp/benchmark/logs/
/tmp/benchmark/logs/stats_metrics.json
/tmp/benchmark/logs/model_metrics.log
/tmp/benchmark/gpu_percentage.txt
/tmp/benchmark/conf/
/tmp/benchmark/conf/config.properties
/tmp/benchmark/gpu_memory_used.txt
/tmp/benchmark/memory_percentage.txt
/tmp/benchmark/waiting_time.txt
execute: tar -cvzf /tmp/ts_benchmark/scripted_mode_vgg16_w4_b8/logs.tar.gz /home/ubuntu/serve/logs
tar: Removing leading `/' from member names
/home/ubuntu/serve/logs/
/home/ubuntu/serve/logs/config/
/home/ubuntu/serve/logs/config/20240126184344680-shutdown.cfg
/home/ubuntu/serve/logs/config/20240126184146147-snapshot.cfg
/home/ubuntu/serve/logs/config/20240126184343852-snapshot.cfg
/home/ubuntu/serve/logs/config/20240126184134358-startup.cfg
/home/ubuntu/serve/logs/ts_log.log
/home/ubuntu/serve/logs/model_metrics.log
/home/ubuntu/serve/logs/ts_metrics.log
/home/ubuntu/serve/logs/access_log.log
/home/ubuntu/serve/logs/model_log.log
finish benchmark scripted_mode_vgg16_w4_b8
report.md is generated
benchmark_serving.sh finished successfully.
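One hedged way to make such a before/after check repeatable is to diff the artifact listings from two runs; the `before/` and `after/` directories below are hypothetical snapshots of /tmp/benchmark, not something the benchmark scripts produce themselves.

```python
# Illustrative: compare the artifacts produced by the old and new scripts.
# before/ and after/ are hypothetical snapshots of /tmp/benchmark.
from pathlib import Path


def artifact_names(root: str) -> set:
    # Relative paths of every file a benchmark run produced under `root`.
    base = Path(root)
    return {str(p.relative_to(base)) for p in base.rglob("*") if p.is_file()}


if __name__ == "__main__":
    before = artifact_names("before")
    after = artifact_names("after")
    print("missing after refactor:", sorted(before - after))
    print("new after refactor:", sorted(after - before))
```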


report.md (excerpt):

TorchServe Benchmark on gpu
===========================

# Date: 2024-01-26 18:43:45

# TorchServe Version: 0.9.0

## eager_mode_mnist

@agunapal (Collaborator) left a comment:
LGTM. As discussed, we need to monitor the dashboard to check nothing breaks once this is merged.

@mreso added this pull request to the merge queue on Jan 29, 2024
Merged via the queue into master with commit 1a567db Jan 29, 2024
13 checks passed