❗ANNOUNCEMENT: Security Changes❗

TorchServe now enables token authorization and disables model API control by default. These security features address the concern of unauthorized API calls and help prevent potentially malicious code from being introduced to the model server. Refer to the following documentation for more information: Token Authorization, Model API control

TorchServe

TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production.

Requires python >= 3.8

curl http://127.0.0.1:8080/predictions/bert -T input.txt
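The same request can be sketched in Python using only the standard library. This is a minimal illustration, not part of TorchServe: the helper name `build_prediction_request` is hypothetical, and the optional `Authorization: Bearer <key>` header assumes the inference key that TorchServe generates when token authorization is enabled. `curl -T` uploads the file with an HTTP PUT, so the sketch does the same.

```python
import urllib.request
from typing import Optional


def build_prediction_request(
    model_name: str,
    payload: bytes,
    token: Optional[str] = None,
    base_url: str = "http://127.0.0.1:8080",
) -> urllib.request.Request:
    """Build (but do not send) an inference request for a served model.

    Hypothetical helper for illustration only.
    """
    headers = {}
    if token is not None:
        # Assumption: the inference key is passed as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    # `curl -T` performs an upload via PUT, so PUT is used here as well.
    return urllib.request.Request(
        url=f"{base_url}/predictions/{model_name}",
        data=payload,
        headers=headers,
        method="PUT",
    )


# Usage (requires a running server with a model registered as "bert"):
#   with open("input.txt", "rb") as f:
#       request = build_prediction_request("bert", f.read(), token="<inference key>")
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the URL, method, and headers easy to inspect before any network call is made.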

🚀 Quick start with TorchServe

# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
pip install torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly

🚀 Quick start with TorchServe (conda)

# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
conda install -c pytorch torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver

Getting started guide

🐳 Quick Start with Docker

# Latest release
docker pull pytorch/torchserve

# Nightly build
docker pull pytorch/torchserve-nightly

Refer to torchserve docker for details.

🤖 Quick Start LLM Deployment

#export token=<HUGGINGFACE_HUB_TOKEN>
docker build . -f docker/Dockerfile.llm -t ts/llm

docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token

curl -X POST -d '{"prompt":"Hello, my name is", "max_new_tokens": 50}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model"
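For callers that prefer Python over curl, the request above could be built roughly as follows. This is a sketch under stated assumptions: the helper name `build_llm_request` is hypothetical, the endpoint and JSON fields are copied from the curl command, and no authorization header is set because the container above is started with `--disable_token`.

```python
import json
import urllib.request


def build_llm_request(
    prompt: str,
    max_new_tokens: int = 50,
    url: str = "http://localhost:8080/predictions/model",
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST mirroring the curl command.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(
        {"prompt": prompt, "max_new_tokens": max_new_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        url=url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (requires the ts/llm container from the previous step to be running):
#   request = build_llm_request("Hello, my name is")
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```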

Refer to LLM deployment for details and other methods.

⚑ Why TorchServe

🤔 How does TorchServe work

🏆 Highlighted Examples

For more examples

🛡️ TorchServe Security Policy

SECURITY.md

🤓 Learn More

https://pytorch.org/serve

🫂 Contributing

We welcome all contributions!

To learn more about how to contribute, see the contributor guide here.

📰 News

💖 All Contributors

Made with contrib.rocks.

⚖️ Disclaimer

This repository is jointly operated and maintained by Amazon, Meta and a number of individual contributors listed in the CONTRIBUTORS file. For questions directed at Meta, please send an email to opensource@fb.com. For questions directed at Amazon, please send an email to torchserve@amazon.com. For all other questions, please open up an issue in this repository here.

TorchServe acknowledges the Multi Model Server (MMS) project, from which it was derived.