Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
Describe the bug
I used this code to start an OpenAI-compatible server. It loaded successfully, but when I tried to generate text, it first produced some output and then raised the error "an illegal memory access was encountered".
The output is shown in the screenshots below.
I looked through other issues and tried setting use_context_fmha=0 and cache-max-entry-count to 0.1; neither worked. The only workaround I have found is setting quant-policy to 0, after which it runs successfully.
I really want to use 8-bit KV. Can anybody help me?
terminate called after throwing an instance of 'std::runtime_error'
  what(): [TM][ERROR] CUDA runtime error: an illegal memory access was encountered /lmdeploy/src/turbomind/models/llama/LlamaBatch.h:134
lmdeploy-v1.5-110b-awq.sh: line 10: 699775 Aborted
![image](https://private-user-images.githubusercontent.com/140953076/344778659-79f0aef5-9865-42b9-943d-a85858370430.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA4MDAwMDAsIm5iZiI6MTcyMDc5OTcwMCwicGF0aCI6Ii8xNDA5NTMwNzYvMzQ0Nzc4NjU5LTc5ZjBhZWY1LTk4NjUtNDJiOS05NDNkLWE4NTg1ODM3MDQzMC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzEyJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcxMlQxNTU1MDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zZTBlZDMzN2RjZjE0YWY4ODA1OWVhYzVjYzVkZGM1MTI3MGY3NjgyNmVhMzI5ZTRiYWExYWZmNjIwNmFhZWM3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.0It1gPp56dS03VXswVZxn7eAsFNO0GxM7aMA2_7d1Jc)
![image](https://private-user-images.githubusercontent.com/140953076/344779803-44cf886e-aab0-4239-b2c8-91b4e13dec9b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA4MDAwMDAsIm5iZiI6MTcyMDc5OTcwMCwicGF0aCI6Ii8xNDA5NTMwNzYvMzQ0Nzc5ODAzLTQ0Y2Y4ODZlLWFhYjAtNDIzOS1iMmM4LTkxYjRlMTNkZWM5Yi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzEyJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcxMlQxNTU1MDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xYzBmMzFmNGVmMjAzOGU1NjgzNzhhODc5ZjBmNDVhNjkyNzE5YzRhZTFkZmYzMDQxOTVkZTI3NmNhNTUwNWI2JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.ElziRGrjcFfXlwY93FltDP2Y8gN3Gl0-NG3lpMQUA6k)
Reproduction
Environment
Error traceback