Cannot use community contributed BERT model? #196

Closed
JoKerDii opened this issue Jan 3, 2021 · 5 comments

JoKerDii commented Jan 3, 2021

Hi!

I want to use PubMedBERT, a community-contributed BERT model pre-trained on biomedical literature.

I ran the following code:

import nlpaug.augmenter.word as naw

text = 'The quick brown fox jumps over the lazy dog .'
# Local path to the downloaded PubMedBERT weights
pretrained_model_path = "/my/path/to/pytorch_model.bin"

aug = naw.ContextualWordEmbsAug(
    model_path=pretrained_model_path, action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

and I got:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-6144635fd2bd> in <module>
----> 1 aug = naw.ContextualWordEmbsAug(
      2     model_path=pretrained_model_path, action="insert")
      3 augmented_text = aug.augment(text)
      4 print("Original:")
      5 print(text)

~/anaconda3/lib/python3.8/site-packages/nlpaug/augmenter/word/context_word_embs.py in __init__(self, model_path, action, temperature, top_k, top_p, name, aug_min, aug_max, aug_p, stopwords, device, force_reload, optimize, stopwords_regex, verbose, silence)
     98 
     99         self._init()
--> 100         self.model = self.get_model(
    101             model_path=model_path, device=device, force_reload=force_reload, temperature=temperature, top_k=top_k,
    102             top_p=top_p, optimize=optimize, silence=silence)

~/anaconda3/lib/python3.8/site-packages/nlpaug/augmenter/word/context_word_embs.py in get_model(cls, model_path, device, force_reload, temperature, top_k, top_p, optimize, silence)
    447     def get_model(cls, model_path, device='cuda', force_reload=False, temperature=1.0, top_k=None, top_p=0.0,
    448                   optimize=None, silence=True):
--> 449         return init_context_word_embs_model(model_path, device, force_reload, temperature, top_k, top_p, optimize, silence)

~/anaconda3/lib/python3.8/site-packages/nlpaug/augmenter/word/context_word_embs.py in init_context_word_embs_model(model_path, device, force_reload, temperature, top_k, top_p, optimize, silence)
     39             silence=silence)
     40     else:
---> 41         raise ValueError('Model name value is unexpected. Only support BERT, DistilBERT, RoBERTa and XLNet model.')
     42 
     43     CONTEXT_WORD_EMBS_MODELS[model_name] = model

ValueError: Model name value is unexpected. Only support BERT, DistilBERT, RoBERTa and XLNet model.

Is this because ContextualWordEmbsAug does not support PubMedBERT, or is my code wrong?
If it is not supported, could you add support for it?

Thank you.

JoKerDii commented Jan 3, 2021

I then tried passing the Hugging Face model name instead of a local path:

text = 'The quick brown fox jumps over the lazy dog .'
pretrained_model_path = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"

aug = naw.ContextualWordEmbsAug(
    model_path=pretrained_model_path, action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

but got:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-11-3e41e97ce15e> in <module>
      4 aug = naw.ContextualWordEmbsAug(
      5     model_path=pretrained_model_path, action="insert")
----> 6 augmented_text = aug.augment(text)
      7 print("Original:")
      8 print(text)

~/anaconda3/lib/python3.8/site-packages/nlpaug/base_augmenter.py in augment(self, data, n, num_thread)
     93                 # https://discuss.pytorch.org/t/using-cuda-multiprocessing-with-single-gpu/7300
     94                 for _ in range(aug_num):
---> 95                     result = action_fx(clean_data)
     96                     if isinstance(result, list):
     97                         augmented_results.extend(result)

~/anaconda3/lib/python3.8/site-packages/nlpaug/augmenter/word/context_word_embs.py in insert(self, data)
    242                     masked_text = self.model.tokenizer.convert_tokens_to_string(head_doc.get_augmented_tokens()).strip()
    243 
--> 244                 masked_texts.append(masked_text)
    245 
    246             if not len(masked_texts):

UnboundLocalError: local variable 'masked_text' referenced before assignment

@JoKerDii JoKerDii changed the title Cannot use community contributed BERT model Cannot use community contributed BERT model? Jan 3, 2021

JoKerDii commented Jan 3, 2021

I just saw your Sep 2020 update: "Upgraded to use AutoModel and AutoTokenizer for ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug and AbstSummAug."
It seems to address this problem. Could you give an example of exactly how it works?

makcedward (Owner) commented

NLPAug uses the model name to determine the model type, because different model types behave differently. When the name does not match a supported type, it raises "Model name value is unexpected. Only support BERT, DistilBERT, RoBERTa and XLNet model." The second exception is caused by the model type as well.
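
To illustrate the explanation above, here is a minimal sketch of name-based type detection and of how branch-only assignment can end in an UnboundLocalError. It is not nlpaug's actual implementation: `KNOWN_TYPES`, `infer_model_type` and `build_masked_text` are hypothetical names invented for this example, and the real matching rules may differ.

```python
from typing import Optional

# Hypothetical list; more specific names come first so that "distilbert"
# and "roberta" are not swallowed by the plain "bert" substring check.
KNOWN_TYPES = ("distilbert", "roberta", "xlnet", "bert")


def infer_model_type(model_path: str, model_type: Optional[str] = None) -> str:
    """Return the model family, preferring an explicit model_type."""
    if model_type is not None:
        explicit = model_type.lower()
        if explicit not in KNOWN_TYPES:
            raise ValueError(f"Unsupported model_type: {model_type!r}")
        return explicit
    lowered = model_path.lower()
    for candidate in KNOWN_TYPES:
        if candidate in lowered:
            return candidate
    raise ValueError(f"Cannot infer model type from {model_path!r}")


def build_masked_text(tokens: list, model_type: str) -> str:
    """Join tokens in a type-specific way (assumed behavior, for illustration)."""
    if model_type in ("bert", "distilbert"):
        masked_text = " ".join(tokens)
    elif model_type in ("roberta", "xlnet"):
        masked_text = "".join(tokens)
    # No else branch: an unrecognized type leaves masked_text unassigned,
    # so the return below raises UnboundLocalError.
    return masked_text
```

Under these assumptions, a path such as "/my/path/to/pytorch_model.bin" contains no recognizable family name and fails unless the type is passed explicitly, which is the gap an explicit type argument closes.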

makcedward added a commit that referenced this issue Jan 4, 2021
@makcedward makcedward reopened this Jan 4, 2021

makcedward commented Jan 4, 2021

You can try the dev version (1.1.2dev) first:
pip install git+https://github.com/makcedward/nlpaug.git

Providing the "model_type" parameter ("bert" in your case) should fix the problem:

import nlpaug.augmenter.word as naw

text = 'The quick brown fox jumps over the lazy dog .'
pretrained_model_path = "/my/path/to/pytorch_model.bin"

# model_type tells nlpaug which model family to use instead of guessing from the name
aug = naw.ContextualWordEmbsAug(
    model_path=pretrained_model_path, model_type='bert', action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)
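
Because the "model_type" parameter comes from the dev version, it may help to confirm that the installed nlpaug accepts it before relying on it. The sketch below uses the standard-library `inspect` module; `accepts_parameter` and `supports_model_type` are hypothetical helper names, not part of nlpaug, and the check assumes the parameter is declared explicitly in the constructor signature.

```python
import inspect


def accepts_parameter(callable_obj, name: str) -> bool:
    """Return True if callable_obj declares a parameter called name."""
    return name in inspect.signature(callable_obj).parameters


def supports_model_type() -> bool:
    """Check the installed nlpaug, if any, for the model_type argument."""
    try:
        import nlpaug.augmenter.word as naw
    except ImportError:
        return False  # nlpaug is not installed in this environment
    return accepts_parameter(naw.ContextualWordEmbsAug.__init__, "model_type")
```

If `supports_model_type()` returns False with nlpaug installed, the installed build probably predates the change, and reinstalling from the repository as described above would be the next step.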

JoKerDii commented Jan 4, 2021

It works, thanks a lot!
