
Fix dangling pointer error in cpp worker #2975

Merged: 1 commit into master on Feb 28, 2024
Conversation

mreso (Collaborator) commented Feb 28, 2024

Description

This PR fixes a dangling pointer issue in our cpp worker.
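Since the diff itself isn't rendered in this thread, here is a hedged illustration of the bug class only, not the actual change: dangling pointers in worker code typically come from holding a raw pointer or reference to an object that its owner destroys first. A minimal sketch of the pattern and a common ownership-based fix, with all names hypothetical:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch of the bug class, not the actual TorchServe change.
struct Request {
  std::string payload;
};

// Buggy pattern: the handler keeps a raw pointer to a caller-owned object.
// Once the caller's Request goes out of scope, req_ dangles and any later
// dereference is undefined behavior.
struct BuggyHandler {
  const Request* req_ = nullptr;
  void Bind(const Request& req) { req_ = &req; }
};

// Common fix: take shared ownership so the Request lives at least as long
// as the handler that uses it.
struct FixedHandler {
  std::shared_ptr<Request> req_;
  void Bind(std::shared_ptr<Request> req) { req_ = std::move(req); }
};
```

AddressSanitizer (-fsanitize=address) flags the buggy variant at the first dereference as a use-after-free or stack-use-after-scope, which is a quick way to verify this kind of fix.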

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Ran the llama2 AOT Inductor example: registered the model, then sent a prediction request (full transcript below).
curl -v -X POST "http://localhost:8081/models?initial_workers=1&url=llm.mar&batch_size=2&max_batch_delay=5000"
*   Trying 127.0.0.1:8081...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8081 (#0)
> POST /models?initial_workers=1&url=llm.mar&batch_size=2&max_batch_delay=5000 HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.68.0
> Accept: */*
>
2024-02-28T05:23:06,282 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model llm
2024-02-28T05:23:06,282 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model llm
2024-02-28T05:23:06,282 [INFO ] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - Model llm loaded.
2024-02-28T05:23:06,283 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - updateModel: llm, count: 1
2024-02-28T05:23:06,287 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - LSP startWorker
2024-02-28T05:23:06,288 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/ubuntu/serve/ts/cpp/bin/model_worker_socket, --sock_type, unix, --sock_name, /tmp/.ts.sock.9000, --runtime_type, LSP, --model_dir, /tmp/models/2a5ffb5360594e438b9e3f1c8062cb5a, --logger_config_path, /home/ubuntu/serve/ts/cpp/resources/logging.config, --metrics_config_path, /home/ubuntu/serve/ts/configs/metrics.yaml]
2024-02-28T05:23:06,531 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0228 05:23:06.531028 52767 model_worker.cc:43] Listening on /tmp/.ts.sock.9000
2024-02-28T05:23:06,533 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0228 05:23:06.531931 52767 model_worker.cc:67] Binding to unix socket
2024-02-28T05:23:06,534 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0228 05:23:06.531967 52767 model_worker.cc:91] Socket bind successful
2024-02-28T05:23:06,534 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0228 05:23:06.531971 52767 model_worker.cc:92] [PID]52767
2024-02-28T05:23:06,534 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0228 05:23:06.531974 52767 model_worker.cc:94] INFO Torch worker started.
2024-02-28T05:23:06,534 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-llm_1.0 State change null -> WORKER_STARTED
2024-02-28T05:23:06,536 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2024-02-28T05:23:06,541 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0228 05:23:06.540905 52767 model_worker.cc:103] Connection accepted: /tmp/.ts.sock.9000
2024-02-28T05:23:06,541 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0228 05:23:06.540938 52767 model_worker.cc:121] Handle connection
2024-02-28T05:23:06,543 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709097786543
2024-02-28T05:23:06,545 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709097786545
2024-02-28T05:23:06,555 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0228 05:23:06.555172 52767 model_worker.cc:141] LOAD request received
2024-02-28T05:23:06,559 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 14
2024-02-28T05:23:06,559 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-llm_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-02-28T05:23:06,559 [INFO ] W-9000-llm_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:273.0|#WorkerName:W-9000-llm_1.0,Level:Host|#hostname:ip-172-31-55-226,timestamp:1709097786
2024-02-28T05:23:06,560 [INFO ] W-9000-llm_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:3.0|#Level:Host|#hostname:ip-172-31-55-226,timestamp:1709097786
2024-02-28T05:23:06,564 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /127.0.0.1:45312 "POST /models?initial_workers=1&url=llm.mar&batch_size=2&max_batch_delay=5000 HTTP/1.1" 200 328
2024-02-28T05:23:06,565 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:ip-172-31-55-226,timestamp:1709097786
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-type: application/json
< x-request-id: 98ecf890-c2ef-4764-ba21-f49abf7444bd
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< content-length: 79
< connection: keep-alive
<
{
  "status": "Model \"llm\" Version: 1.0 registered with 1 initial workers"
}
* Connection #0 to host localhost left intact
(serve) ubuntu@ip-172-31-55-226:~/serve/examples/cpp/aot_inductor/llama2$ curl http://localhost:8080/predictions/llm -T prompt1.txt
2024-02-28T05:23:14,821 [INFO ] epollEventLoopGroup-3-2 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:llm,model_version:default|#hostname:ip-172-31-55-226,timestamp:1709097794
2024-02-28T05:23:19,823 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT repeats 1 to backend at: 1709097799822
2024-02-28T05:23:19,823 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709097799823
2024-02-28T05:23:19,824 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0228 05:23:19.824458 52767 model_worker.cc:126] INFER request received
2024-02-28T05:23:21,344 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0228 05:23:21.344223 52767 log_metric.cc:92] [METRICS]HandlerTime.Milliseconds:1519.099771|#ModelName:llm,Level:Model|#hostname:ip-172-31-55-226,1709097801,d89d4e00-d308-4b5b-8170-971197422df3
2024-02-28T05:23:21,345 [INFO ] W-9000-llm_1.0-stdout MODEL_LOG - I0228 05:23:21.344261 52767 log_metric.cc:92] [METRICS]PredictionTime.Milliseconds:1519.099771|#ModelName:llm,Level:Model|#hostname:ip-172-31-55-226,1709097801,d89d4e00-d308-4b5b-8170-971197422df3
2024-02-28T05:23:21,345 [INFO ] W-9000-llm_1.0 ACCESS_LOG - /127.0.0.1:54658 "PUT /predictions/llm HTTP/1.1" 200 6525
2024-02-28T05:23:21,346 [INFO ] W-9000-llm_1.0 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:ip-172-31-55-226,timestamp:1709097801
2024-02-28T05:23:21,346 [INFO ] W-9000-llm_1.0 TS_METRICS - ts_inference_latency_microseconds.Microseconds:6522767.808|#model_name:llm,model_version:default|#hostname:ip-172-31-55-226,timestamp:1709097801
2024-02-28T05:23:21,346 [INFO ] W-9000-llm_1.0 TS_METRICS - ts_queue_latency_microseconds.Microseconds:5000505.114|#model_name:llm,model_version:default|#hostname:ip-172-31-55-226,timestamp:1709097801
2024-02-28T05:23:21,346 [DEBUG] W-9000-llm_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 5000505114, Backend time ns: 1523652292
2024-02-28T05:23:21,346 [INFO ] W-9000-llm_1.0 TS_METRICS - QueueTime.Milliseconds:5000.0|#Level:Host|#hostname:ip-172-31-55-226,timestamp:1709097801
2024-02-28T05:23:21,346 [INFO ] W-9000-llm_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1522
2024-02-28T05:23:21,346 [INFO ] W-9000-llm_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:2.0|#Level:Host|#hostname:ip-172-31-55-226,timestamp:1709097801
Hello my name is Dan
The sun shone brightly in the sky. Dan was feeling very happy. He wanted to go outside and play.
He asked his mom, "Can I go outside and play?"
His mom smiled and said, "Yes, but be careful. Don't go too far."
Dan ran outside and saw a big tree. He wanted to climb it. He started to climb, but he was too scared.
Suddenly, he heard a voice. It was his mom. She said, "Don't worry, Dan. I'm here to help you."
Dan was so happy. He said, "Thank you, Mommy!"
His mom smiled and said, "You're welcome, Dan. Now, let's go back inside."
Dan and his mom went back inside. Dan was so happy to be back in his warm, sunny yard.
<s>
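
(Aside on the timing above: QueueTime.Milliseconds:5000.0 is expected rather than a regression. The model was registered with batch_size=2 and max_batch_delay=5000, so a single request waits the full 5000 ms batch delay before being dispatched to the backend.)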

Checklist:

  • Did you have fun?

mreso requested a review from lxning February 28, 2024 05:25
mreso marked this pull request as ready for review February 28, 2024 05:25
lxning added this pull request to the merge queue Feb 28, 2024
Merged via the queue into master with commit fa2b0d2 Feb 28, 2024
15 checks passed
muthuraj-i2i pushed a commit to muthuraj-i2i/serve that referenced this pull request Mar 1, 2024