Enable fx_graph_cache in gpt-fast example #2935

mreso · 2024-02-09T21:41:47Z

Description

This PR enables TorchInductor's FX graph cache (fx_graph_cache) in the gpt-fast example to reduce torch.compile time on subsequent runs.
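The description does not include the diff, so the following is only a minimal sketch of one way to opt in to the cache. It assumes the flag is set through environment variables before the process imports torch. TORCHINDUCTOR_FX_GRAPH_CACHE and TORCHINDUCTOR_CACHE_DIR are PyTorch environment variables. The helper name enable_fx_graph_cache and the cache path are hypothetical. Once torch has been imported, the in-process equivalent is torch._inductor.config.fx_graph_cache = True. The PR may use either approach.

```python
import os


# Hypothetical helper: opt in to TorchInductor's FX graph cache via the
# environment. Assumption: this runs before the process imports torch, so
# Inductor reads these values when its config module is loaded.
# In-process alternative after `import torch`:
#     torch._inductor.config.fx_graph_cache = True
def enable_fx_graph_cache(cache_dir=None):
    # "1" switches the FX graph cache on.
    os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"] = "1"
    if cache_dir is not None:
        # Optional: point all runs/workers at one on-disk cache directory.
        os.environ["TORCHINDUCTOR_CACHE_DIR"] = cache_dir
    # Return the Inductor-related settings now visible to this process.
    return {k: v for k, v in os.environ.items() if k.startswith("TORCHINDUCTOR_")}


if __name__ == "__main__":
    # Hypothetical cache location; any writable directory works.
    print(enable_fx_graph_cache("/tmp/gpt_fast_inductor_cache"))
```

The cache only helps when the same graphs are compiled again, for example on a second server start or in another worker that shares the cache directory. The first compilation still pays the full cost.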


Type of change


New feature (non-breaking change which adds functionality)
This change requires a documentation update

Feature/Issue validation/testing


[X] pytest test/pytest/test_example_gpt_fast.py -k test_handler

===================================================================================================================== test session starts =====================================================================================================================
platform linux -- Python 3.10.12, pytest-7.3.1, pluggy-1.3.0
rootdir: /home/ubuntu/serve
plugins: mock-3.10.0, cov-4.1.0
collected 5 items / 3 deselected / 2 selected

test/pytest/test_example_gpt_fast.py ..                                                                                                                                                                                                                 [100%]

====================================================================================================================== warnings summary =======================================================================================================================
test/pytest/test_example_gpt_fast.py::test_handler[false]
  /home/ubuntu/serve/ts/torch_handler/base_handler.py:13: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    from pkg_resources import packaging

test/pytest/test_example_gpt_fast.py::test_handler[false]
test/pytest/test_example_gpt_fast.py::test_handler[false]
  /home/ubuntu/miniconda3/envs/serve/lib/python3.10/site-packages/pkg_resources/__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('zope')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

test/pytest/test_example_gpt_fast.py::test_handler[false]
test/pytest/test_example_gpt_fast.py::test_handler[true]
test/pytest/test_example_gpt_fast.py::test_handler[true]
test/pytest/test_example_gpt_fast.py::test_handler[true]
  /home/ubuntu/miniconda3/envs/serve/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:321: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================================================================== 2 passed, 3 deselected, 7 warnings in 79.21s (0:01:19) ====================================================================================================

pytest test/pytest/test_example_gpt_fast.py -k test_gpt_fast_mar[mar_file_path0]

===================================================================================================================== test session starts =====================================================================================================================
platform linux -- Python 3.10.12, pytest-7.3.1, pluggy-1.3.0
rootdir: /home/ubuntu/serve
plugins: mock-3.10.0, cov-4.1.0
collected 5 items / 4 deselected / 1 selected

test/pytest/test_example_gpt_fast.py 2024-02-09T20:28:14,022 [INFO ] W-29500-gpt_fast_handler_1.0 TS_METRICS - ts_inference_latency_microseconds.Microseconds:6.3719938016E7|#model_name:gpt_fast_handler,model_version:default|#hostname:ip-172-31-15-101,timestamp:1707510494
2024-02-09T20:28:14,022 [INFO ] W-29500-gpt_fast_handler_1.0 TS_METRICS - ts_queue_latency_microseconds.Microseconds:144.753|#model_name:gpt_fast_handler,model_version:default|#hostname:ip-172-31-15-101,timestamp:1707510494
2024-02-09T20:28:14,022 [INFO ] W-29500-gpt_fast_handler_1.0-stdout MODEL_METRICS - HandlerTime.ms:63717.68|#ModelName:gpt_fast_handler,Level:Model|#hostname:ip-172-31-15-101,requestID:69282a5f-3532-497e-863f-4824e81e505a,timestamp:1707510494
.                                                                                                                                                                                                                  [100%]

========================================================================================================= 1 passed, 4 deselected in 75.21s (0:01:15) ==========================================================================================================
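In the log above, ts_inference_latency and HandlerTime both time the single inference request, but in different units (microseconds and milliseconds). The sketch below parses the two excerpted lines and converts each value to seconds, which shows that the two agree at about 63.7 s. The regex assumes the name.Unit:value|#dimensions layout seen in these lines only. metric_seconds is a hypothetical helper, not a TorchServe API.

```python
import re

# Two metric lines excerpted from the test log above (dimension tails trimmed).
LOG_LINES = [
    "TS_METRICS - ts_inference_latency_microseconds.Microseconds:6.3719938016E7|#model_name:gpt_fast_handler",
    "MODEL_METRICS - HandlerTime.ms:63717.68|#ModelName:gpt_fast_handler,Level:Model",
]

# Assumed layout, inferred from these lines: "- <name>.<unit>:<value>|#..."
METRIC_RE = re.compile(r"- (?P<name>\w+)\.(?P<unit>\w+):(?P<value>[0-9.Ee+-]+)\|")

# Divisors converting each unit that appears in the log to seconds.
TO_SECONDS = {"Microseconds": 1e6, "ms": 1e3}


def metric_seconds(line):
    """Return (metric_name, value_in_seconds) for one metric log line."""
    m = METRIC_RE.search(line)
    if m is None:
        raise ValueError(f"no metric found in: {line!r}")
    value = float(m.group("value"))
    return m.group("name"), value / TO_SECONDS[m.group("unit")]


if __name__ == "__main__":
    for line in LOG_LINES:
        name, seconds = metric_seconds(line)
        # Prints "ts_inference_latency_microseconds: 63.72 s", then "HandlerTime: 63.72 s"
        print(f"{name}: {seconds:.2f} s")
```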

Checklist:

Did you have fun?
Have you added tests that prove your fix is effective or that this feature works?
Has code been commented, particularly in hard-to-understand areas?
Have you made corresponding changes to the documentation?


agunapal

LGTM.
Can you please add a section to the README highlighting that this example works with multiple GPUs?
The README mentions it in the context of TP (tensor parallelism), but it's not very obvious.

mreso added 3 commits February 9, 2024 21:32

Enable fx_graph_cache in gpt-fast example

19fe734

mention fx_graph_cache in readme

2bf66ba

Merge remote-tracking branch 'origin/master' into feature/gpt_fast_enable_fx_cache

9e6e94d

mreso marked this pull request as ready for review February 9, 2024 21:42

mreso requested review from agunapal and lxning February 9, 2024 21:42

Fix spellcheck

c33e53e

agunapal approved these changes Feb 10, 2024


Update README.md

94a5c3c

mreso enabled auto-merge February 10, 2024 01:10

mreso added this pull request to the merge queue Feb 10, 2024

github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Feb 10, 2024

msaroufim self-requested a review February 10, 2024 04:48

msaroufim approved these changes Feb 10, 2024


Merge branch 'master' into feature/gpt_fast_enable_fx_cache

e245d87

msaroufim enabled auto-merge February 10, 2024 04:49

msaroufim added this pull request to the merge queue Feb 10, 2024

Merged via the queue into master with commit e6654ec Feb 10, 2024
15 checks passed

chauhang added this to the v0.10.0 milestone Feb 27, 2024

agunapal added the torch.compile label Jun 28, 2024
