Same Analyzer detects entity in text but not in image #1317

NuiMrme · 2024-02-28T10:15:11Z

Describe the bug
The same analyzer detects a LOCATION entity token in text but fails to detect the same token in an image.

To Reproduce

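from presidio_analyzer import AnalyzerEngine
from presidio_image_redactor import ImageAnalyzerEngine, ImageRedactorEngine

# nlp_engine_with_french is assumed to be an NLP engine created earlier (not shown)
analyzer = AnalyzerEngine(nlp_engine=nlp_engine_with_french,
    log_decision_process=True, supported_languages=["fr", "en"])

print(analyzer.analyze(text="VALENCE", language="en"))

# Reuse the same analyzer for image analysis and redaction
image_analyzer = ImageAnalyzerEngine(analyzer_engine=analyzer)
engine = ImageRedactorEngine(image_analyzer_engine=image_analyzer)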

Expected behavior
In text, VALENCE is detected as LOCATION, even if I change the language, the surrounding text, the casing, etc. If I use the same analyzer to create an ImageAnalyzerEngine, VALENCE should also be detected as LOCATION when the word appears in the image.

omri374 · 2024-02-28T14:09:10Z

Could it be that the OCR engine doesn't recognize this text? Have you tried running Tesseract on it to see the output?
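A minimal sketch of that check, assuming `pytesseract` and Pillow are installed and the Tesseract binary is on the PATH. The helper names, the file name, and the `lang` default are illustrative and not part of Presidio:

```python
import re


def ocr_image(image_path, lang="eng"):
    """Return the raw text Tesseract reads from the image at image_path."""
    # Imported lazily so token_in_text() below works without OCR dependencies.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path), lang=lang)


def token_in_text(text, token):
    """Case-insensitive whole-word search for token in the OCR output."""
    pattern = rf"\b{re.escape(token)}\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


# Usage ("document.png" is a placeholder file name):
# text = ocr_image("document.png")
# print(repr(text))                      # inspect exactly what the OCR produced
# print(token_in_text(text, "VALENCE"))
```

Printing the `repr` of the OCR output makes stray line breaks or misread characters around the word visible.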

NuiMrme · 2024-02-28T14:19:32Z

Since I have log_decision_process enabled, I can see that the word "VALENCE" is there in the log.

Edit: That being said, I created an empty image with just the word "VALENCE" on it, and it was detected as LOCATION. Does the detection depend on the words before and after?
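One way to test this is to run the same analyzer on the isolated word and on the full text extracted from the image, then compare the results. The sketch below assumes only that `analyzer` exposes `analyze(text=..., language=...)` and returns results with `entity_type`, `start`, and `end`, as `AnalyzerEngine` does. The function names are hypothetical:

```python
def entity_types_for_token(analyzer, text, token, language="en"):
    """Return the sorted entity types assigned to spans equal to token."""
    results = analyzer.analyze(text=text, language=language)
    return sorted(
        {
            r.entity_type
            for r in results
            if text[r.start:r.end].lower() == token.lower()
        }
    )


def compare_contexts(analyzer, token, full_text, language="en"):
    """Compare detection of token on its own and inside full_text."""
    return {
        "isolated": entity_types_for_token(analyzer, token, token, language),
        "in_context": entity_types_for_token(analyzer, full_text, token, language),
    }


# Usage (ocr_text would be the text the OCR step extracted from the image):
# print(compare_contexts(analyzer, "VALENCE", ocr_text))
```

If "isolated" contains LOCATION but "in_context" is empty, the surrounding words are the likely cause rather than the OCR step.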

omri374 · 2024-02-29T15:28:27Z

Yes, location is detected using a named entity recognition model. Context words could certainly change the output.
If you have a finite list of locations, you can create a deny list and pass it to the analyzer engine.
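A minimal sketch of the deny-list approach, assuming `presidio_analyzer` is installed. `clean_deny_list` and `register_location_deny_list` are hypothetical helper names, while `PatternRecognizer` and `registry.add_recognizer` come from Presidio:

```python
def clean_deny_list(raw_terms):
    """Strip whitespace, drop empty entries, and de-duplicate, keeping order."""
    seen, cleaned = set(), []
    for term in raw_terms:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            cleaned.append(term)
    return cleaned


def register_location_deny_list(analyzer, locations, language="en"):
    """Add a deny-list recognizer for LOCATION to an existing AnalyzerEngine."""
    # Imported lazily so clean_deny_list() is usable without Presidio installed.
    from presidio_analyzer import PatternRecognizer

    recognizer = PatternRecognizer(
        supported_entity="LOCATION",
        deny_list=clean_deny_list(locations),
        supported_language=language,
    )
    analyzer.registry.add_recognizer(recognizer)
    return recognizer


# Usage (the location names are examples):
# register_location_deny_list(analyzer, ["Valence", "Lyon"], language="en")
```

A recognizer is tied to one language, so it should be registered for the language passed to `analyze`.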

NuiMrme · 2024-02-29T16:38:32Z

Thanks for your reply. I actually have that already in my code, but I still don't detect this one location. I think the problem is deeper: something about that exact document makes it problematic. In my deny list, though, the locations are written with only the first letter capitalized, whereas in the document they are written in all capital letters. I'm not sure whether this is a problem.
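Whether the casing matters depends on how the deny-list entries are matched. The plain-Python sketch below is not Presidio's actual implementation. It only assumes the entries are compiled into a whole-word regular expression and shows how the `re.IGNORECASE` flag changes the outcome. The sample OCR line is made up:

```python
import re


def build_deny_list_pattern(terms, ignore_case):
    """Compile deny-list terms into one whole-word alternation."""
    alternation = "|".join(re.escape(term) for term in terms)
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(rf"\b(?:{alternation})\b", flags)


deny_list = ["Valence"]        # capitalized, as in the deny list
ocr_line = "26000 VALENCE"     # all caps, as in the document (made-up sample)

strict = build_deny_list_pattern(deny_list, ignore_case=False)
relaxed = build_deny_list_pattern(deny_list, ignore_case=True)

print(bool(strict.search(ocr_line)))   # False: case-sensitive match misses it
print(bool(relaxed.search(ocr_line)))  # True: case-insensitive match finds it
```

If the matching turns out to be case-sensitive in your setup, adding the upper-case variants to the deny list is a simple workaround.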

NuiMrme changed the title ~~Analyzer detects entity in text but not in image~~ Same Analyzer detects entity in text but not in image Feb 29, 2024
