Enable opt-6.7b benchmark on inf2 #2400

Merged

namannandan merged 6 commits into pytorch:master on Jun 29, 2023

Conversation

@namannandan (Collaborator) commented on Jun 8, 2023

Description

Enables benchmarking for the opt-6.7b model on Inferentia2, based on the inf2 example: #2399
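As background, TorchServe's automated benchmark runs are driven by per-model YAML configs. The block below is a hypothetical sketch of such an entry, reconstructed from the parameters in the result tables (workers, batch delay, concurrency, requests); the exact keys, file name, and archive URL are assumptions, not the actual file from this PR.

```yaml
# Hypothetical benchmark config entry (a sketch, not the file added by this PR).
# Values mirror the batch-1 run reported below.
opt_6.7b_neuronx_batch_1:
  scripted_mode:
    benchmark_engine: "ab"   # Apache Bench; the "AB" value in the Benchmark column
    url: "https://example.com/model_store/opt_6.7b_neuronx_batch_1.mar"  # assumed
    workers: 1
    batch_delay: 100         # ms
    batch_size: 1
    input: "./sample_input.txt"  # assumed payload path
    requests: 2000
    concurrency: 10
```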

Model archives:

Type of change

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

Benchmark results

TorchServe Benchmark on neuronx

Date: 2023-06-22 08:44:16

TorchServe Version: inf2-opt-benchmark-test

scripted_mode_opt_6.7b_neuronx_batch_1

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| inf2-opt-benchmark-test | AB | 1 | 100 | 1 | .mar | 10 | input | 2000 | 1946 | 0.63 | 15945 | 16075 | 16132 | 15974.07 | 97.3 | 1591.6 | 1593.65 | 1594.07 | 1596.79 | 1596.7 | 14332.83 | 0.28 | 2.88 | 6.76 | 0.0 | 0.0 | 0.0 |

scripted_mode_opt_6.7b_neuronx_batch_2

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| inf2-opt-benchmark-test | AB | 2 | 100 | 1 | .mar | 10 | input | 2000 | 1934 | 1.13 | 8860 | 8938 | 8953 | 8881.404 | 96.7 | 1769.37 | 1770.75 | 1770.97 | 1773.37 | 1773.28 | 7075.18 | 0.49 | 0.0 | 6.8 | 0.0 | 0.0 | 0.0 |

scripted_mode_opt_6.7b_neuronx_batch_4

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| inf2-opt-benchmark-test | AB | 4 | 100 | 1 | .mar | 10 | input | 2000 | 1955 | 2.19 | 3666 | 5483 | 5493 | 4566.966 | 97.75 | 1819.03 | 1822.06 | 1822.97 | 1821.48 | 1821.39 | 2725.47 | 0.65 | 5.0 | 7.4 | 0.0 | 0.0 | 0.0 |

scripted_mode_opt_6.7b_neuronx_batch_8

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| inf2-opt-benchmark-test | AB | 8 | 100 | 1 | .mar | 10 | input | 2000 | 1938 | 4.28 | 1863 | 3724 | 3732 | 2337.482 | 96.9 | 1857.53 | 1859.36 | 1859.8 | 1859.7 | 1859.61 | 463.83 | 1.11 | 0.0 | 7.3 | 0.0 | 0.0 | 0.0 |
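Two relationships make these tables easier to read: TS error rate is failed requests as a percentage of total requests, and TS throughput (requests/second) is approximately concurrency divided by mean latency in seconds. A minimal sketch that re-derives both columns, assuming the latency columns are in milliseconds:

```python
# Sanity-check sketch for the tables above (assumes latencies are in ms).
# (requests, failed, latency_mean_ms) per batch size, copied from the tables.
rows = {
    1: (2000, 1946, 15974.07),
    2: (2000, 1934, 8881.404),
    4: (2000, 1955, 4566.966),
    8: (2000, 1938, 2337.482),
}
CONCURRENCY = 10  # the Concurrency column

for batch, (requests, failed, latency_ms) in rows.items():
    error_rate = 100 * failed / requests            # reproduces "TS error rate"
    throughput = CONCURRENCY / (latency_ms / 1000)  # ~ "TS throughput" (req/s)
    print(f"batch {batch}: error rate {error_rate:.2f}%, "
          f"throughput {throughput:.2f} req/s")
```

Running this reproduces the reported values (e.g. batch 1: 97.30% and 0.63 req/s). Note that most requests are counted as failed; with mean latencies of 2 to 16 seconds, this plausibly reflects the load generator's response handling rather than server-side errors, though the PR does not say so explicitly.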

@codecov (codecov bot) commented on Jun 8, 2023

Codecov Report

Merging #2400 (307ac65) into master (ec3b992) will not change coverage.
The diff coverage is n/a.

❗ Current head 307ac65 differs from pull request most recent head 06ea628. Consider uploading reports for the commit 06ea628 to get more accurate results

```diff
@@           Coverage Diff           @@
##           master    #2400   +/-   ##
=======================================
  Coverage   71.89%   71.89%
=======================================
  Files          78       78
  Lines        3654     3654
  Branches       58       58
=======================================
  Hits         2627     2627
  Misses       1023     1023
  Partials        4        4
```


@namannandan marked this pull request as ready for review on June 20, 2023 at 20:54
@agunapal (Collaborator) left a comment:

Why do we have different mar files for each batch size?

@namannandan (Collaborator, Author) replied:

For Inferentia2, the model must be traced separately for each batch size. Here, the model is traced at load time via model-config.yaml, and since each batch size requires a different model-config.yaml, I've packaged them into separate .mar files.
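To make the per-batch-size packaging concrete, here is a hypothetical model-config.yaml along the lines of the inf2 example (#2399). The top-level keys (minWorkers, maxWorkers, batchSize, maxBatchDelay, responseTimeout) are standard TorchServe model-config options; the handler section is an assumption and would vary with the actual handler in the archive.

```yaml
# Hypothetical model-config.yaml for the batch-size-4 archive (a sketch;
# the real files are inside the .mar archives referenced by this PR).
minWorkers: 1
maxWorkers: 1
batchSize: 4          # the field that differs across the four archives
maxBatchDelay: 100    # ms; matches the "Batch delay" column in the tables
responseTimeout: 900  # generous timeout for multi-second LLM inference

handler:              # assumed handler settings, patterned after the inf2 example
  model_checkpoint_dir: "opt-6.7b-split"
  amp: "bf16"
  tp_degree: 2
```

Packaging one config per batch size follows from how Neuron tracing works: the model is compiled for a fixed batch size, so a single archive cannot serve several batch sizes at once.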

@msaroufim (Member) left a comment:

unblocking

@namannandan merged commit b260776 into pytorch:master on Jun 29, 2023. 13 checks passed.