What's the equivalent of create_chat_completion in llama-cpp-python #14

Open · LondonX opened this issue Feb 21, 2024 · 1 comment
Labels: question (Further information is requested)

LondonX commented Feb 21, 2024

Hi,

Some of the models on Hugging Face show support for create_chat_completion, but this plugin currently seems to support only simple inference. Will chat completion be supported in a future version?

https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF

from llama_cpp import Llama

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = Llama(
  model_path="./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",  # Download the model file first
  n_ctx=2048,  # The max sequence length to use - note that longer sequence lengths require much more resources
  n_threads=8,            # The number of CPU threads to use, tailor to your system and the resulting performance
  n_gpu_layers=35         # The number of layers to offload to GPU, if you have GPU acceleration available
)

# Simple inference example
output = llm(
  "<|system|>\n{system_message}</s>\n<|user|>\n{prompt}</s>\n<|assistant|>", # Prompt
  max_tokens=512,  # Generate up to 512 tokens
  stop=["</s>"],   # Example stop token - not necessarily correct for this specific model! Please check before using.
  echo=True        # Whether to echo the prompt
)

# Chat Completion API

llm = Llama(model_path="./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using
llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a story writing assistant."},
        {"role": "user", "content": "Write a story about llamas."},
    ]
)
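
The chat-completion call above can be emulated with the simple inference this plugin already exposes: render the messages into the model's prompt template by hand, then run a plain completion. A minimal sketch, assuming the Zephyr-style template from the simple-inference example above (verify the template against the model card for your model):

# Equivalent of create_chat_completion using only simple inference:
# render the chat messages into the model's prompt template manually.
messages = [
    {"role": "system", "content": "You are a story writing assistant."},
    {"role": "user", "content": "Write a story about llamas."},
]

prompt = ""
for m in messages:
    prompt += f"<|{m['role']}|>\n{m['content']}</s>\n"
prompt += "<|assistant|>"  # leave the assistant turn open for the model to complete

output = llm(prompt, max_tokens=512, stop=["</s>"])
print(output["choices"][0]["text"])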
@BrutalCoding self-assigned this on Feb 24, 2024
@BrutalCoding added the good first issue and question labels, then removed the good first issue label, on Feb 24, 2024
BrutalCoding (Owner) commented

Hi,

Sorry, I don't have an answer ready yet; I just wanted to let you know I've seen your question.

I'd like to say that I'll get back to you soon, but I honestly have no idea when.

It's only fair that I resolve (or answer) the earlier reported issues first before coming back to you with an answer. I hope you understand.

I will update aub_ai to sync with the latest llama.cpp changes this week, mainly to support Google's new Gemma model. I'm not sure whether aub_ai will have bindings for create_chat_completion; it depends on whether this method comes from llama.cpp directly rather than from llama-cpp-python.
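
For what it's worth, if create_chat_completion turns out to live in llama-cpp-python rather than in llama.cpp itself, the same behaviour can still be built on top of a plain completion call by rendering the chat template on the caller's side. A rough sketch of the idea (hypothetical helper names; the single-turn llama-2 template is shown, and templates vary per model):

# Hypothetical sketch: chat completion emulated over any prompt-in/text-out
# simple-inference function, e.g. what a binding could expose today.
def llama2_chat_prompt(system_message, user_message):
    # Single-turn llama-2 chat template; other models use different
    # templates, so check the model card before reusing this.
    return f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n{user_message} [/INST]"

def chat_completion_via_prompt(generate, system_message, user_message):
    # 'generate' is the binding's simple-inference call: str -> str.
    return generate(llama2_chat_prompt(system_message, user_message))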

Thanks,
Daniel
