
Pippy deferred init #2310

Merged: 33 commits merged into master on May 10, 2023
Conversation

HamidShojanazeri (Collaborator)
Description

This PR adds deferred init to PiPPy for large model inference.

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] New feature (non-breaking change which adds functionality)
- [x] This change requires a documentation update
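For context, here is a minimal sketch of the deferred-init pattern this PR builds on. It assumes the torchdistx package (which provides deferred_init/materialize_module); this is an illustration of the technique, not the code added in this PR:

```python
# Illustrative sketch only -- not the exact code in this PR.
# Assumes torchdistx, which provides deferred_init/materialize_module.
import torch
from torchdistx.deferred_init import deferred_init, materialize_module

# Build the (large) model without allocating real storage: parameters are
# created on a "fake" device, so construction is cheap even for huge models.
model = deferred_init(torch.nn.Linear, 4096, 4096)

# Materialize the weights only when (and where) they are actually needed,
# e.g. on the pipeline stage that owns this module.
materialize_module(model)
```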

codecov bot commented on May 3, 2023

Codecov Report

Merging #2310 (2e05a27) into master (614bfc0) will decrease coverage by 0.37%.
The diff coverage is 0.00%.

❗ Current head 2e05a27 differs from the pull request's most recent head 2082720. Consider uploading reports for commit 2082720 to get more accurate results.

@@            Coverage Diff             @@
##           master    #2310      +/-   ##
==========================================
- Coverage   69.82%   69.45%   -0.37%     
==========================================
  Files          77       77              
  Lines        3420     3438      +18     
  Branches       57       57              
==========================================
  Hits         2388     2388              
- Misses       1029     1047      +18     
  Partials        3        3              
Impacted Files Coverage Δ
ts/handler_utils/distributed/pt_pippy.py 0.00% <0.00%> (ø)


@@ -46,37 +39,42 @@ pippy:
input_names: ['input_ids'] # input arg names to the model, this is required for FX tracing
model_type: "HF" # set the model type to HF if you are using Huggingface model other wise leave it blank or any other model you use.
rpc_timeout: 1800
num_worker_threads: 512 # number of threads for the RPC worker; 512 is usually a good number
Contributor
Do we need such a large value for PiPPy to work? It would be good to verify; only override the default value of 16 for RPC num_worker_threads if there is a special requirement.

Collaborator Author

Right, this number has been set mostly for training, to avoid deadlocks caused by draining the RPC thread pool. For inference it seems it can be lowered by about 4x. There is no clear guidance on an exact number; I checked with Ke.
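For reference, a hypothetical sketch of lowering the thread pool for inference; the value 128 is an assumption based on the "about 4x" remark above, not a tested setting:

```python
# Hypothetical sketch: lower the RPC thread pool for inference.
# 128 (~4x below the 512 used for training) is an assumption, not a tested value.
import torch.distributed.rpc as rpc

options = rpc.TensorPipeRpcBackendOptions(
    num_worker_threads=128,  # PyTorch's default is 16; raise only if needed
    rpc_timeout=1800,        # matches rpc_timeout in the YAML above
)
rpc.init_rpc("worker0", rank=0, world_size=1, rpc_backend_options=options)
```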

@chauhang (Contributor) left a comment

Thanks Hamid. I have left a few review comments based on the testing.

@@ -46,6 +46,6 @@ def hf_model(model_str):
repo_id=args.model_name,
revision=args.revision,
cache_dir=args.model_path,
use_auth_token=True,
use_auth_token=False,
Contributor

Restrict the download to PyTorch-only files by passing the allow_patterns param with the values below:

allow_patterns = ["*.json", "*.pt", "*.bin", "*.txt", "*.model"]
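A minimal sketch of that suggestion, assuming the example's download script uses huggingface_hub.snapshot_download; repo_id and cache_dir below are placeholders:

```python
# Sketch of the suggested filter; repo_id/cache_dir are placeholders.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="facebook/opt-30b",
    cache_dir="model",
    use_auth_token=False,
    # Download only PyTorch weights, tokenizer, and config files:
    allow_patterns=["*.json", "*.pt", "*.bin", "*.txt", "*.model"],
)
```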


handler:
max_length: 80 # max length of tokens for tokenizer in the handler
model_name: "/home/ubuntu/serve/examples/large_models/Huggingface_pippy/model/models--facebook--opt-30b/snapshots/ceea0a90ac0f6fae7c2c34bcb40477438c152546" #the path to the checkpoints, in this example downloaded file. Please change to your model path.
Collaborator

There is a hard-coded model path here. Can we update it to use the HF cache dir, to make it more generic for users to follow? We can set the env variables HUGGINGFACE_HUB_CACHE and TRANSFORMERS_CACHE.
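A short sketch of that suggestion; the cache directory below is an arbitrary example, not a path from this PR:

```python
# Sketch: point both HF caches at one shared directory so the handler can
# resolve models from the cache instead of a hard-coded snapshot path.
# The directory below is an arbitrary example.
import os

os.environ["HUGGINGFACE_HUB_CACHE"] = "/home/ubuntu/hf_cache"
os.environ["TRANSFORMERS_CACHE"] = "/home/ubuntu/hf_cache"
```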

Collaborator Author

This is specific to this example: if someone uses the download script here, they will end up with this path. Overall, we expect the user to pass the checkpoint path.


handler:
model_path: "/home/ubuntu/serve/examples/large_models/Huggingface_pippy/model/models--facebook--opt-30b/snapshots/ceea0a90ac0f6fae7c2c34bcb40477438c152546"
Collaborator

ditto


# Check that the required keys are present in the "pippy" section
assert (
"chunks" in ctx.model_yaml_config["pippy"]
Contributor

Can we use a default value of 1 for chunks if it is not specified?
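A one-line sketch of that suggestion, assuming ctx.model_yaml_config behaves as a plain dict in the handler:

```python
# Sketch: fall back to 1 chunk instead of asserting the key exists.
chunks = ctx.model_yaml_config["pippy"].get("chunks", 1)
```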

@chauhang (Contributor) left a comment

Left a few comments.

@agunapal (Collaborator) left a comment

LGTM

@lxning merged commit 2f1f52f into master on May 10, 2023
8 of 9 checks passed