Cannot use community contributed BERT model? #196

JoKerDii · 2021-01-03T14:08:01Z

Hi!

I want to use PubMedBERT, a community-contributed BERT model pre-trained on biomedical literature.

I ran the following code:

import nlpaug.augmenter.word as naw

text = 'The quick brown fox jumps over the lazy dog .'
pretrained_model_path = "/my/path/to/pytorch_model.bin"

aug = naw.ContextualWordEmbsAug(
    model_path=pretrained_model_path, action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

and I got:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-6144635fd2bd> in <module>
----> 1 aug = naw.ContextualWordEmbsAug(
      2     model_path=pretrained_model_path, action="insert")
      3 augmented_text = aug.augment(text)
      4 print("Original:")
      5 print(text)

~/anaconda3/lib/python3.8/site-packages/nlpaug/augmenter/word/context_word_embs.py in __init__(self, model_path, action, temperature, top_k, top_p, name, aug_min, aug_max, aug_p, stopwords, device, force_reload, optimize, stopwords_regex, verbose, silence)
     98 
     99         self._init()
--> 100         self.model = self.get_model(
    101             model_path=model_path, device=device, force_reload=force_reload, temperature=temperature, top_k=top_k,
    102             top_p=top_p, optimize=optimize, silence=silence)

~/anaconda3/lib/python3.8/site-packages/nlpaug/augmenter/word/context_word_embs.py in get_model(cls, model_path, device, force_reload, temperature, top_k, top_p, optimize, silence)
    447     def get_model(cls, model_path, device='cuda', force_reload=False, temperature=1.0, top_k=None, top_p=0.0,
    448                   optimize=None, silence=True):
--> 449         return init_context_word_embs_model(model_path, device, force_reload, temperature, top_k, top_p, optimize, silence)

~/anaconda3/lib/python3.8/site-packages/nlpaug/augmenter/word/context_word_embs.py in init_context_word_embs_model(model_path, device, force_reload, temperature, top_k, top_p, optimize, silence)
     39             silence=silence)
     40     else:
---> 41         raise ValueError('Model name value is unexpected. Only support BERT, DistilBERT, RoBERTa and XLNet model.')
     42 
     43     CONTEXT_WORD_EMBS_MODELS[model_name] = model

ValueError: Model name value is unexpected. Only support BERT, DistilBERT, RoBERTa and XLNet model.

Is this because 'ContextualWordEmbsAug' does not support 'PubMedBERT', or is my code wrong?
If it is not supported, could you add support for it?

Thank you.

JoKerDii · 2021-01-03T14:56:41Z

I also tried passing the model name instead of a local path:

pretrained_model_path = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"

aug = naw.ContextualWordEmbsAug(
    model_path=pretrained_model_path, action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

but got:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-11-3e41e97ce15e> in <module>
      4 aug = naw.ContextualWordEmbsAug(
      5     model_path=pretrained_model_path, action="insert")
----> 6 augmented_text = aug.augment(text)
      7 print("Original:")
      8 print(text)

~/anaconda3/lib/python3.8/site-packages/nlpaug/base_augmenter.py in augment(self, data, n, num_thread)
     93                 # https://discuss.pytorch.org/t/using-cuda-multiprocessing-with-single-gpu/7300
     94                 for _ in range(aug_num):
---> 95                     result = action_fx(clean_data)
     96                     if isinstance(result, list):
     97                         augmented_results.extend(result)

~/anaconda3/lib/python3.8/site-packages/nlpaug/augmenter/word/context_word_embs.py in insert(self, data)
    242                     masked_text = self.model.tokenizer.convert_tokens_to_string(head_doc.get_augmented_tokens()).strip()
    243 
--> 244                 masked_texts.append(masked_text)
    245 
    246             if not len(masked_texts):

UnboundLocalError: local variable 'masked_text' referenced before assignment
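
The traceback above shows `masked_text` being appended before it was ever assigned. A minimal sketch of how that can happen, assuming the assignment lives only inside branches keyed on the model type: `build_masked_text` and its branch layout are hypothetical and are not taken from nlpaug's source. The mask tokens themselves (`[MASK]` for BERT and DistilBERT, `<mask>` for RoBERTa and XLNet) are the ones those tokenizers use.

```python
def build_masked_text(tokens, model_type):
    """Hypothetical stand-in for the masking step of an 'insert' action."""
    if model_type in ("bert", "distilbert"):
        masked_text = " ".join(tokens + ["[MASK]"])
    elif model_type in ("roberta", "xlnet"):
        masked_text = " ".join(tokens + ["<mask>"])
    # There is no else branch, so an unrecognized model_type
    # leaves masked_text unassigned when the return is reached.
    return masked_text


# A recognized type works as expected.
print(build_masked_text(["quick", "fox"], "bert"))

# An unrecognized (here: empty) type triggers the same class of error.
try:
    build_masked_text(["quick", "fox"], "")
except UnboundLocalError as err:
    print(type(err).__name__)
```

Under this assumption, the error is a symptom of the model type not being recognized rather than a problem with the input text.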

JoKerDii · 2021-01-03T15:11:12Z

I just saw your upgrade note from Sep 2020: "Upgraded to use AutoModel and AutoTokenizer for ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug and AbstSummAug."
It seems to address this problem. Could you give an example of exactly how it works?

makcedward · 2021-01-04T09:29:49Z

NLPAug uses the model name to determine the model type, because different model types behave differently. When the name does not identify a supported type, "Model name value is unexpected. Only support BERT, DistilBERT, RoBERTa and XLNet model." is raised. The second exception is also caused by the model type.
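
To illustrate name-based type detection, here is a rough sketch, assuming a simple case-insensitive substring check. `infer_model_type` and `KNOWN_FAMILIES` are hypothetical names, and the exact matching rules nlpaug applies may differ. It shows why a local path such as "/my/path/to/pytorch_model.bin" cannot be classified: nothing in the path names a model family.

```python
# Hypothetical sketch; not nlpaug's actual implementation.
# More specific names come first because "distilbert" and "roberta"
# both contain the substring "bert".
KNOWN_FAMILIES = ("distilbert", "roberta", "xlnet", "bert")


def infer_model_type(model_path: str) -> str:
    """Guess the model family from a substring of the path or model name."""
    lowered = model_path.lower()
    for family in KNOWN_FAMILIES:
        if family in lowered:
            return family
    raise ValueError(
        "Model name value is unexpected. "
        "Only support BERT, DistilBERT, RoBERTa and XLNet model."
    )


print(infer_model_type("bert-base-uncased"))
print(infer_model_type("roberta-base"))

# A local checkpoint path carries no family name, so detection fails.
try:
    infer_model_type("/my/path/to/pytorch_model.bin")
except ValueError as err:
    print("ValueError:", err)
```

Any scheme of this kind depends on how the file or directory happens to be named, which is why passing the type explicitly is more robust.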

Fix #196

makcedward · 2021-01-04T10:04:07Z

You can try the dev version (1.1.2dev) first:
pip install git+https://github.com/makcedward/nlpaug.git

Providing the "model_type" parameter ("bert" in your case) should fix the problem:

import nlpaug.augmenter.word as naw

text = 'The quick brown fox jumps over the lazy dog .'
pretrained_model_path = "/my/path/to/pytorch_model.bin"

aug = naw.ContextualWordEmbsAug(
    model_path=pretrained_model_path, model_type='bert', action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

JoKerDii · 2021-01-04T11:12:28Z

It works, thanks a lot!

JoKerDii changed the title ~~Cannot use community contributed BERT model~~ Cannot use community contributed BERT model? Jan 3, 2021

makcedward closed this as completed in b06037b Jan 4, 2021

makcedward added a commit that referenced this issue Jan 4, 2021

Merge pull request #197 from makcedward/dev

3df5ed2

Fix #196

makcedward reopened this Jan 4, 2021

makcedward closed this as completed Jan 5, 2021
