More robust parser result handling #1120

Open
myrix opened this issue May 6, 2024 · 1 comment
Labels
backend (bug is related to backend), enhancement (resolving the issue would improve some part of the system), frontend (bug is related to frontend)

Comments

@myrix
Contributor

myrix commented May 6, 2024

The current implementation of parser result processing is problematic.

Parser results with disambiguation info are stored as plain-text HTML (see the content attribute of the parserresult DB table) and are displayed in the interface as is:

```jsx
dangerouslySetInnerHTML={{ __html: this.content }}
```

They are modified by directly taking the interface HTML source and saving it as is:

```javascript
this.docToSave.getElementsByTagName("body")[0].innerHTML = document.getElementById("markup-content").innerHTML;
```

This is unsafe and causes problems whenever the interface HTML is modified unintentionally, e.g. by translation extensions or built-in browser translation, which can mess up the HTML markup structure of parser results.

We need to fix this by storing parser result data in an explicit internal representation format, e.g. JSON, on both the backend and the frontend, so that the interface explicitly displays, modifies and saves this representation, ensuring its integrity.
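A minimal sketch of the idea, assuming a hypothetical JSON schema (paragraphs made of plain-text runs and parsed words; the field names, CSS class names and the docToHtml/escapeHtml helpers are illustrative, not Lingvodoc's actual format). Markup is always generated from the data, so anything a translation extension injects into the live page cannot end up in the saved result.

```javascript
// Hypothetical internal representation of a parser result (not Lingvodoc's
// actual schema): paragraphs made of plain-text runs and parsed words.
const doc = {
  paragraphs: [
    {
      id: "p1",
      items: [
        { type: "text", text: "Kat " },
        {
          type: "word",
          id: "w1",
          text: "koda",
          approvedVariant: null,
          variants: [{ id: "v1", lemma: "koda", gram: "N" }],
        },
        { type: "text", text: " (A&B)" },
      ],
    },
  ],
};

// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Generate markup from the explicit representation. The output depends only
// on the data, never on the current state of the page's DOM.
function docToHtml(doc) {
  return doc.paragraphs
    .map((p) => {
      const body = p.items
        .map((item) => {
          if (item.type === "text") return escapeHtml(item.text);
          const cls = item.approvedVariant ? "verified" : "unverified";
          return `<span id="${escapeHtml(item.id)}" class="${cls}">${escapeHtml(item.text)}</span>`;
        })
        .join("");
      return `<p id="${escapeHtml(p.id)}">${body}</p>`;
    })
    .join("\n");
}

console.log(docToHtml(doc));
// <p id="p1">Kat <span id="w1" class="unverified">koda</span> (A&amp;B)</p>
```

Because text is always escaped, characters such as & or < in the source stay text and cannot alter the markup structure.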

Naturally, all functionality that uses parser results as source data, in particular valency example extraction, should be updated accordingly. It might also be beneficial to store parser results not as whole large JSON documents but separately by paragraph, or even by paragraph and sentence, to simplify processing and editing; in particular, this would minimize data exchange between frontend and backend when saving disambiguation updates. However, that would require more extensive modifications to the parserresult DB table (and perhaps the introduction of additional helper tables) and to the source code of the corresponding functionality, so it should be carefully considered before deciding whether to go for it.
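If results are kept per paragraph, the frontend only needs to send the paragraphs touched since the last save. A sketch under a hypothetical schema, assuming state is updated immutably (as is idiomatic for React state), so unchanged paragraphs keep their object identity and a reference comparison is enough to find the changed ones. approveVariant and changedParagraphs are hypothetical helpers.

```javascript
// Immutably set the approved variant of one word. Untouched paragraphs and
// items keep their object identity (structural sharing).
function approveVariant(doc, paragraphId, wordId, variantId) {
  return {
    ...doc,
    paragraphs: doc.paragraphs.map((p) =>
      p.id !== paragraphId
        ? p
        : {
            ...p,
            items: p.items.map((item) =>
              item.type === "word" && item.id === wordId
                ? { ...item, approvedVariant: variantId }
                : item
            ),
          }
    ),
  };
}

// Paragraphs that are new or changed since the last save. Relies on the
// immutable-update assumption above: a changed paragraph is a new object.
function changedParagraphs(savedDoc, currentDoc) {
  const saved = new Map(savedDoc.paragraphs.map((p) => [p.id, p]));
  return currentDoc.paragraphs.filter((p) => saved.get(p.id) !== p);
}

const saved = {
  paragraphs: [
    {
      id: "p1",
      items: [
        { type: "word", id: "w1", text: "koda", approvedVariant: null, variants: [{ id: "v1" }, { id: "v2" }] },
      ],
    },
    { id: "p2", items: [{ type: "text", text: "unchanged" }] },
  ],
};
const current = approveVariant(saved, "p1", "w1", "v2");
console.log(changedParagraphs(saved, current).map((p) => p.id)); // logs [ 'p1' ]
```

Deleted paragraphs are not detected by this sketch; a real implementation would also have to report removed paragraph ids.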

To some extent, work on this issue may be better done concurrently with other current issues concerning the handling of parser results and their derivatives.

myrix added the enhancement, backend and frontend labels May 6, 2024
vmonakhov self-assigned this May 6, 2024
vmonakhov added a commit to ispras/lingvodoc that referenced this issue Jun 25, 2024
* init

* get paragraph id

* get dedoc data

* text from dedoc

* full results structure

* fixes

* fixes

* undo doc_parser.py

* handling several bold words

* fix

* next

* fixes

* better bold font and refactoring

* refactoring

* save to_json

* most correct version

* use json any way

* cleanup

* cleanup

* next changes

* some fixes

* json_to_html

* next steps

* fixes after testing

* fixes

* fixed for strange parsers
vmonakhov added a commit that referenced this issue Jun 25, 2024
* get paragraph id

* fixes

* next

* fix

* fixes

* fixes

* fixes

* fixes

* it works

* better bold font and refactoring

* refactoring

* save to_json

* most correct version

* cleanup

* cleanup

* get_by_id

* next steps

* next steps

* right components

* selection

* selection

* next changes

* some fixes

* show results

* toggle variants

* toggle unverified

* correct setting approved

* cleanup

* count highlighted

* best solution for add markup

* refactoring

* refactoring

* new removeFromMarkup

* more smart code

* pasteMarkup

* paste results but keep prefix

* save

* parse element

* minor

* almost complete

* UserVariantModal

* next steps

* fixes after testing

* fixes

* thin fixes

* update on delete key
@vmonakhov

vmonakhov commented Jun 25, 2024

The issue is mostly resolved. Main points:

  • The OdtMarkupModal component now has an internal JSON representation of the parser result. It is generated on the backend and transferred over the network.
  • Parsers generate HTML, and other components (e.g. valency) use HTML. So the database still stores parser results as HTML, and the backend converts them to JSON and back every time.
  • All elements in the browser are rendered as React components, and all actions on them are reflected in the JSON state.
  • DOM manipulations are still used for browser selection and for finding a text node's index within the whole text.
  • Many side effects were fixed; in particular, bold and/or italic formatting in the text is now preserved after markup manipulations.
  • Throughout, I tried to avoid code duplication and to eliminate such cases in the old code.
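With JSON state, an operation such as adding markup to a selected word reduces to offset arithmetic over a paragraph's items; the DOM is only needed to turn the browser selection into a character offset. A sketch under a hypothetical schema (locateOffset and addMarkup are illustrative names, not functions from the actual commits):

```javascript
// Hypothetical schema: a paragraph is an array of items, each with a text
// field; "text" items are plain runs, "word" items carry parser variants.

// Map an offset in the paragraph's full text to the item containing that
// character and the offset inside the item (the JSON-side analogue of
// finding a text node's index within the whole text).
function locateOffset(items, offset) {
  if (offset < 0) return null;
  let start = 0;
  for (let index = 0; index < items.length; index++) {
    const len = items[index].text.length;
    if (offset < start + len) return { index, local: offset - start };
    start += len;
  }
  return null; // offset is at or past the end of the text
}

// Wrap the selected range [start, end) as a new unverified word, provided
// the range lies inside a single plain-text item. Returns a new items array.
function addMarkup(items, start, end, wordId) {
  if (end <= start) throw new Error("empty selection");
  const a = locateOffset(items, start);
  const b = locateOffset(items, end - 1);
  if (!a || !b || a.index !== b.index || items[a.index].type !== "text") {
    throw new Error("selection must lie within a single plain-text item");
  }
  const { text } = items[a.index];
  const localEnd = b.local + 1;
  const pieces = [
    { type: "text", text: text.slice(0, a.local) },
    { type: "word", id: wordId, text: text.slice(a.local, localEnd), approvedVariant: null, variants: [] },
    { type: "text", text: text.slice(localEnd) },
  ].filter((item) => item.text.length > 0); // drop empty text runs
  return [...items.slice(0, a.index), ...pieces, ...items.slice(a.index + 1)];
}

const paragraphItems = [{ type: "text", text: "Kat koda tol" }];
const marked = addMarkup(paragraphItems, 4, 8, "w1");
console.log(marked.map((item) => `${item.type}:${item.text}`));
// logs [ 'text:Kat ', 'word:koda', 'text: tol' ]
```

The input array is never mutated, which fits the immutable React state updates described above.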
