Increase test timeout for test_handler_traceback_logging #3113

Merged
merged 2 commits into from
Apr 25, 2024

@namannandan commented Apr 24, 2024

Description

Regression test test_handler_traceback_logging passes but then fails during teardown in Docker with the following traceback:

================================================================================================================ ERRORS ================================================================================================================
_________________________________________________________________________________________ ERROR at teardown of test_handler_traceback_logging __________________________________________________________________________________________


>       test_utils.torchserve_cleanup()

test/pytest/test_handler_traceback_logging.py:91: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test/pytest/test_utils.py:110: in torchserve_cleanup
    stop_torchserve()
test/pytest/test_utils.py:93: in stop_torchserve
    subprocess.run(["torchserve", "--stop", "--foreground"])
/usr/lib/python3.9/subprocess.py:507: in run
    stdout, stderr = process.communicate(input, timeout=timeout)
/usr/lib/python3.9/subprocess.py:1126: in communicate
    self.wait()
/usr/lib/python3.9/subprocess.py:1189: in wait
    return self._wait(timeout=timeout)
/usr/lib/python3.9/subprocess.py:1933: in _wait
    (pid, sts) = self._try_wait(0)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Popen: returncode: -9 args: ['torchserve', '--stop', '--foreground']>, wait_flags = 0

    def _try_wait(self, wait_flags):
        """All callers to this function MUST hold self._waitpid_lock."""
        try:
>           (pid, sts) = os.waitpid(self.pid, wait_flags)
E           Failed: Timeout >60.0s

The cause is that test_utils.torchserve_cleanup() takes longer when running in Docker, so the test exceeds its 60-second timeout during teardown:

def torchserve_cleanup():
    stop_torchserve()
    delete_model_store()
    delete_all_snapshots()
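
To see which of the three cleanup steps dominates in Docker, each step could be timed on its own. This is a minimal sketch and not part of this PR: `run_timed` and the `fake_*` stand-ins are hypothetical names, and in the real suite the callables would be the `test_utils` functions shown above.

```python
import time


def run_timed(steps):
    """Run each (name, callable) pair in order and return [(name, seconds), ...]."""
    timings = []
    for name, step in steps:
        start = time.perf_counter()
        step()
        timings.append((name, time.perf_counter() - start))
    return timings


# Hypothetical stand-ins for the real cleanup functions in test_utils.
def fake_stop_torchserve():
    time.sleep(0.05)


def fake_delete_model_store():
    pass


def fake_delete_all_snapshots():
    pass


if __name__ == "__main__":
    report = run_timed(
        [
            ("stop_torchserve", fake_stop_torchserve),
            ("delete_model_store", fake_delete_model_store),
            ("delete_all_snapshots", fake_delete_all_snapshots),
        ]
    )
    for name, seconds in report:
        print(f"{name}: {seconds:.3f}s")
```

Comparing the per-step numbers inside and outside the container would show whether the extra time comes from stopping the server or from the filesystem cleanup.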

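Increasing the test timeout from 60 seconds to 120 seconds fixes the issue: the manual Docker run below finishes in 74.03s, which is over the old limit but well within the new one.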
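
The traceback also shows that `subprocess.run` is called without a `timeout` of its own, so the only bound on the stop command is the per-test limit. As a complementary idea, and not something this PR changes, the call could carry its own deadline so a hang surfaces as `subprocess.TimeoutExpired` at the call site. A minimal sketch: `run_with_deadline` is a hypothetical helper, and the Python one-liners stand in for `torchserve --stop --foreground`.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd and return its exit code, or None if it ran longer than deadline_s."""
    try:
        completed = subprocess.run(cmd, timeout=deadline_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return None
    return completed.returncode


# Stand-in commands (assumptions for illustration, not the real torchserve CLI call).
FAST_CMD = [sys.executable, "-c", "pass"]
SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(5)"]

if __name__ == "__main__":
    print("fast:", run_with_deadline(FAST_CMD, 30))
    print("slow:", run_with_deadline(SLOW_CMD, 0.5))
```

With a per-command deadline, a slow stop would be reported against `stop_torchserve()` itself rather than as a generic test timeout during teardown.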

Fixes #3106

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Feature/Issue validation/testing

  • Manual test
$ docker run -it -v /Volumes/workplace/pytorch/serve:/home/serve pytorch/torchserve:ci
========================================================================================================= test session starts ==========================================================================================================
platform linux -- Python 3.9.19, pytest-7.3.1, pluggy-1.5.0
rootdir: /home/serve
configfile: pytest.ini
plugins: timeout-2.3.1, mock-3.14.0, cov-4.1.0
collected 1 item                                                                                                                                                                                                                       

test/pytest/test_handler_traceback_logging.py::test_handler_traceback_logging 
------------------------------------------------------------------------------------------------------------ live log setup ------------------------------------------------------------------------------------------------------------
INFO     root:model_packaging.py:54 Successfully exported model test_model to file /tmp/pytest-of-root/pytest-0/test_model0
DEBUG    urllib3.connectionpool:connectionpool.py:244 Starting new HTTP connection (1): localhost:8081
DEBUG    urllib3.connectionpool:connectionpool.py:549 http://localhost:8081 "POST /models?model_name=test_model&url=test_model.mar&initial_workers=1&synchronous=false&batch_size=1 HTTP/1.1" 202 47
2024-04-24T22:58:02,013 [INFO ] W-9000-test_model_1.0-stdout MODEL_LOG - AssertionErrorPASSED
                                                                                                                                                                                                                           [100%]2024-04-24T22:58:02,014 [INFO ] W-9000-test_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-test_model_1.0-stdout

---------------------------------------------------------------------------------------------------------- live log teardown -----------------------------------------------------------------------------------------------------------
DEBUG    urllib3.connectionpool:connectionpool.py:244 Starting new HTTP connection (1): localhost:8081
DEBUG    urllib3.connectionpool:connectionpool.py:549 http://localhost:8081 "DELETE /models/test_model HTTP/1.1" 200 52


===================================================================================================== 1 passed in 74.03s (0:01:14) =====================================================================================================

@namannandan marked this pull request as ready for review April 24, 2024 23:54

@agunapal left a comment

LGTM

@agunapal added this pull request to the merge queue Apr 24, 2024
Merged via the queue into master with commit 3d23fc3 Apr 25, 2024
11 of 14 checks passed
Development

Successfully merging this pull request may close these issues.

Docker regression failure: test_handler_traceback_logging.py
2 participants