Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wildcard search in model_archiver --extra-files #2142

Merged
merged 4 commits into from
Feb 27, 2023

Conversation

gustavhartz
Copy link
Contributor

Description

Fixes #2111 (Feature request)

Implementation of wildcard filepaths for the model_archiver

Motivation
"Large Language Models like Bloom, for example, consist of hundreds of sharded model files. Integrating all the model files as extra-files parameters is a cumbersome process. Wildcard support would help to solve this quickly."

Implementation
The change uses the same logic as before for how the extra_files are passed to the model archiver, but now also allows for passing wildcard paths like. /project/models/**.bin using the glob python module.

This does not require any new dependencies, but an update to the documentation is needed to inform about the new feature.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Running torchserve_sanity.py passes with no issues. The model_archiver PyTests already require the model_archiver to function properly. I did not add any new test to assert the wildcard functionality is operating as expected. Should I?

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@agunapal
Copy link
Collaborator

@gustavhartz Thanks for submitting the PR. Could you please include logs of a test with wildcard matching

@gustavhartz
Copy link
Contributor Author

gustavhartz commented Feb 17, 2023

@agunapal I looked into creating a unit test for it to allow for testing using the torchserve_sanity script, but since "model_archiver.model_packaging.ModelExportUtils" is Mocked this isn't really possible.

  • Rewriting the logic to allow for a unit test would not make much sense, as the utils are not being tested anyways
  • An alternative is to add it as a nightly test in /test/pytest/test_model_archiver.py. Would thinks something like
cmd = test_utils.model_archiver_command_builder(
        None,
        "1.0",
        os.path.join(test_utils.CODEBUILD_WD, "examples", "image_classifier", "resnet_18", "model.py"),
        os.path.join(test_utils.MODEL_STORE, "resnet18-f37072fd.pth"),
        "image_classifier",
        os.path.join(test_utils.CODEBUILD_WD, "examples", "image_classifier", "*.json")
    )

would do the job as it is basically the same as many of the existing tests.

What do you think?

@agunapal
Copy link
Collaborator

@agunapal I looked into creating a unit test for it to allow for testing using the torchserve_sanity script, but since "model_archiver.model_packaging.ModelExportUtils" is Mocked this isn't really possible.

  • Rewriting the logic to allow for a unit test would not make much sense, as the utils are not being tested anyways
  • An alternative is to add it as a nightly test in /test/pytest/test_model_archiver.py. Would thinks something like
cmd = test_utils.model_archiver_command_builder(
        None,
        "1.0",
        os.path.join(test_utils.CODEBUILD_WD, "examples", "image_classifier", "resnet_18", "model.py"),
        os.path.join(test_utils.MODEL_STORE, "resnet18-f37072fd.pth"),
        "image_classifier",
        os.path.join(test_utils.CODEBUILD_WD, "examples", "image_classifier", "*.json")
    )

would do the job as it is basically the same as many of the existing tests.

What do you think?

@gustavhartz Sorry, I just meant, can you please attach the TorchServe logs/ commands showing an example of this implementation working. It doesn't have to be a unit test.

@gustavhartz
Copy link
Contributor Author

gustavhartz commented Feb 18, 2023

@agunapal Sure. Running

touch wildcardfilname.txt
torch-model-archiver --version 1.0 --model-file ./serve/examples/image_classifier/resnet_18/model.py --serialized-file ./densenet161-8d451a50.pth --handler image_classifier --extra-files ",*.txt" --export-path ./ --model-name test
# Note that "*.txt" then picks up all textfiles in the current directory

And then unzipping the .mar file reveals the wildcardfilname.txt file.

PyTest output for running the model-archiver tests.

============================= test session starts ==============================
platform darwin -- Python 3.10.9, pytest-7.2.1, pluggy-1.0.0
rootdir: /Users/gustavhartz/Projects/serve/model-archiver
plugins: mock-3.10.0, cov-4.0.0
collected 28 items

model_archiver/tests/integ_tests/test_integration_model_archiver.py ..   [  7%]
model_archiver/tests/unit_tests/test_model_packaging.py ....             [ 21%]
model_archiver/tests/unit_tests/test_model_packaging_utils.py .......... [ 57%]
...........                                                              [ 96%]
model_archiver/tests/unit_tests/test_version.py .                        [100%]

============================== 28 passed in 0.93s ==============================

@codecov
Copy link

codecov bot commented Feb 21, 2023

Codecov Report

Merging #2142 (cb9e64f) into master (87359c4) will increase coverage by 0.01%.
The diff coverage is 7.14%.

❗ Current head cb9e64f differs from pull request most recent head 5bc7c56. Consider uploading reports for the commit 5bc7c56 to get more accurate results

@@            Coverage Diff             @@
##           master    #2142      +/-   ##
==========================================
+ Coverage   53.36%   53.37%   +0.01%     
==========================================
  Files          71       71              
  Lines        3225     3226       +1     
  Branches       56       57       +1     
==========================================
+ Hits         1721     1722       +1     
  Misses       1504     1504              
Impacted Files Coverage Δ
...l-archiver/model_archiver/model_packaging_utils.py 54.60% <7.14%> (+0.30%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@agunapal agunapal requested a review from lxning February 21, 2023 23:16
@msaroufim msaroufim merged commit b05c0ea into pytorch:master Feb 27, 2023
morgandu pushed a commit to morgandu/pytorch-serve that referenced this pull request Mar 8, 2023
…nts using glob (pytorch#2142)

Co-authored-by: Ankith Gunapal <agunapal@meta.com>
Co-authored-by: Mark Saroufim <marksaroufim@fb.com>
@gustavhartz gustavhartz deleted the issue/2111 branch September 12, 2023 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wildcard support for torch-model-archiver --extra-file parameter (better Large Language Model support)
3 participants