Hi,
I'm trying to reproduce the results reported in "InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning", but I'm having difficulty reproducing the InstructBLIP (Vicuna-7B) results on the Flickr30K test set for the image captioning task.
I'm using the model from Hugging Face and running the code snippet below, and I'm getting a CIDEr score of 60.9, while the reported score is 82.4.
I'm using the prompt reported in the paper, "A short image description: ", and the decoding hyperparameters from the Hugging Face example (see the sketch below). Am I using the correct hyperparameters and prompt?
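For reference, here's a minimal sketch of what I'm running per image — essentially the Hugging Face example with the caption prompt swapped in. The model id and decoding values follow the HF model card; the image path is a placeholder, and this is a sketch rather than my exact evaluation loop:

```python
import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

model_id = "Salesforce/instructblip-vicuna-7b"
processor = InstructBlipProcessor.from_pretrained(model_id)
model = InstructBlipForConditionalGeneration.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# One Flickr30K test image (hypothetical path); the real loop
# iterates over the full test split.
image = Image.open("flickr30k/example.jpg").convert("RGB")
prompt = "A short image description: "
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)

# Decoding hyperparameters taken from the Hugging Face example.
outputs = model.generate(
    **inputs,
    do_sample=False,
    num_beams=5,
    max_length=256,
    min_length=1,
    repetition_penalty=1.5,
    length_penalty=1.0,
)
caption = processor.batch_decode(outputs, skip_special_tokens=True)[0].strip()
print(caption)
```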
PS: using the same hyperparameters with the prompt "A short image caption." increases the CIDEr score to 83.1.
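And in case the scoring step itself matters, this is roughly how I'm computing CIDEr — a minimal sketch assuming pycocoevalcap's Cider scorer, with toy captions standing in for the real Flickr30K references:

```python
from pycocoevalcap.cider.cider import Cider

# Dicts keyed by image id; values are lists of caption strings.
# In practice the captions are tokenized first (e.g. with
# pycocoevalcap's PTBTokenizer) — toy data shown here.
gts = {
    "1": ["a man riding a horse on a beach",
          "a person rides a horse along the shore"],
    "2": ["two dogs playing in the snow"],
}
res = {
    "1": ["a man rides a horse on the beach"],
    "2": ["dogs playing in snow"],
}

scorer = Cider()
corpus_score, per_image = scorer.compute_score(gts, res)
print(f"CIDEr: {corpus_score * 100:.1f}")  # the paper reports CIDEr x 100
```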