About region caption #48

Open
mu-jin-meng opened this issue May 8, 2024 · 2 comments
Comments

@mu-jin-meng

The generated results only describe the content of the region; they do not answer the specified prompt.
[image: 1715157993410]
Result:
[image: 1552a25d96b1424058997d306117c77]

@kanguyen-vn

The model wasn't really trained to perform region-level reasoning; it was only trained to do region-level captioning. If you look in these region-level dataset classes, they only use the REGION_QUESTIONS and REGION_GROUP_QUESTIONS prompt templates from here as questions for LLM training, and they're all captioning questions. If you want region-level reasoning capabilities, GLaMM might not be the best solution for you. If you don't really need segmentation masks in the output, I'd try something like Shikra, for example.
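
To make the point about templates concrete, here is a minimal sketch of how a captioning-only training sample might be assembled. It is not the repository's actual code: the template strings, the `<region>` placeholder, and the helper names `build_region_caption_sample` and `build_samples` are all assumptions made for illustration, and the real REGION_QUESTIONS list may be worded differently.

```python
import random

# Hypothetical stand-ins for captioning templates. The real REGION_QUESTIONS
# list in the repository may use different wording and placeholders.
REGION_QUESTIONS_EXAMPLE = [
    "Can you provide a brief description of the region <region>?",
    "Please describe the region <region> in the image.",
]


def build_region_caption_sample(caption, rng):
    """Pair one region caption with a randomly chosen captioning question."""
    question = rng.choice(REGION_QUESTIONS_EXAMPLE)
    return {"question": question, "answer": caption}


def build_samples(captions, seed=0):
    """Build one question/answer pair per caption, using a seeded RNG."""
    rng = random.Random(seed)
    return [build_region_caption_sample(c, rng) for c in captions]


if __name__ == "__main__":
    for sample in build_samples(["a red car parked by the curb", "a brown dog"]):
        print(sample)
```

If every training question is drawn from a fixed pool like this, the model only ever sees "describe this region" style inputs, which would be consistent with it describing the region regardless of what the prompt actually asks.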

@hanoonaR
Member

Hi @mu-jin-meng,

Could you please share the prompt you are using? If you can explain the task, I can help. Thank you.
