Same Analyzer detects entity in text but not in image #1317

Open
NuiMrme opened this issue Feb 28, 2024 · 4 comments

Comments

NuiMrme commented Feb 28, 2024

Describe the bug
The same Analyzer detects a LOCATION entity token in text but fails to detect the same token in an image.

To Reproduce

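from presidio_analyzer import AnalyzerEngine
from presidio_image_redactor import ImageAnalyzerEngine, ImageRedactorEngine

# nlp_engine_with_french is defined elsewhere (its setup is not shown in this report)
analyzer = AnalyzerEngine(nlp_engine=nlp_engine_with_french,
    log_decision_process=True, supported_languages=["fr", "en"])

print(analyzer.analyze(text="VALENCE", language="en"))

# Reuse the same analyzer for image analysis and redaction
image_analyzer = ImageAnalyzerEngine(analyzer_engine=analyzer)
engine = ImageRedactorEngine(image_analyzer_engine=image_analyzer)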

Expected behavior
In plain text, VALENCE is detected as LOCATION, and it still is even if I change the language, the text, or the casing (lower-case, etc.). If I use the same Analyzer to create an ImageAnalyzer, VALENCE should also be detected as LOCATION when the word appears in the image.

Contributor

omri374 commented Feb 28, 2024

Could it be that the OCR engine doesn't recognize this text? Have you tried running Tesseract on it to see the output?

Author

NuiMrme commented Feb 28, 2024

Since I have log_decision_process set to "true", I can see that the word "VALENCE" is there in the log.

Edit: That being said, I created an empty image with just the word "VALENCE" on it, and it was detected as LOCATION. Does the detection depend on the words before and after?

NuiMrme changed the title from "Analyzer detects entity in text but not in image" to "Same Analyzer detects entity in text but not in image" on Feb 29, 2024
Contributor

omri374 commented Feb 29, 2024

Yes, location is detected using a named entity recognition model. Context words could certainly change the output.
If you have a finite list of locations, you can create a deny list and pass it to the analyzer engine.

Author

NuiMrme commented Feb 29, 2024

> Yes, location is detected using a named entity recognition model. Context words could certainly change the output. If you have a finite list of locations, you can create a deny list and pass it to the analyzer engine.

Thanks for your reply. I actually have that in my code already, but this one location is still not detected. I think the problem is deeper: something about that exact document makes it problematic. One difference, though: in my deny list the locations are written with only the first letter capitalized, while in the document the word is written in all capital letters. I'm not sure whether this is a problem.
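
One way to rule capitalization in or out is to test the deny-list match in isolation, outside the image pipeline. The sketch below is not Presidio code. It uses only Python's `re` module to show how a whole-word deny-list match behaves with and without case-insensitive matching. The helper names `build_deny_list_pattern` and `find_deny_list_matches`, and the sample OCR string, are hypothetical and exist only for illustration.

```python
import re


def build_deny_list_pattern(terms, ignore_case=True):
    """Compile one regex that matches any deny-list term as a whole word."""
    # Longest terms first so longer entries win over their own prefixes.
    escaped = [re.escape(t) for t in sorted(terms, key=len, reverse=True)]
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", flags)


def find_deny_list_matches(text, terms, ignore_case=True):
    """Return (start, end, matched_text) for each deny-list hit in text."""
    pattern = build_deny_list_pattern(terms, ignore_case)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


deny_list = ["Valence"]         # first letter capitalized, as in the deny list described above
ocr_text = "MAIRIE DE VALENCE"  # hypothetical OCR output, all capitals

case_sensitive = find_deny_list_matches(ocr_text, deny_list, ignore_case=False)
case_insensitive = find_deny_list_matches(ocr_text, deny_list, ignore_case=True)

print(case_sensitive)    # []
print(case_insensitive)  # [(10, 17, 'VALENCE')]
```

If the case-sensitive result is the one that mirrors the actual setup, adding the upper-case spelling to the deny list, or matching case-insensitively, would be the first thing to try. If the term already matches regardless of case, the cause is more likely elsewhere, for example in what the OCR step actually returns for that document.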
