Add support for CRLF line endings or improve documentation and error message #5508

Closed
js850 opened this issue Sep 26, 2022 · 4 comments

@js850
Contributor

js850 commented Sep 26, 2022

LightGBM should handle CRLF (\r\n) line endings, or at the very least fail gracefully with a clear error message.

Summary

I have examples of LightGBM model files (.lgb files) that crash Python when I try to load them with LightGBM. If the line endings are changed from CRLF (\r\n) to LF (\n), the same files load without a problem.
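
The conversion described above can be sketched as a small pre-processing step. This is only an illustration: the helper name `convert_crlf_to_lf` and the file names are hypothetical, and it assumes the model is a plain-text file small enough to read into memory.

```python
from pathlib import Path


def convert_crlf_to_lf(src_path, dst_path):
    """Copy a text model file, replacing every CRLF pair with a single LF.

    Hypothetical helper for illustration only. Returns the number of
    CRLF sequences that were replaced.
    """
    data = Path(src_path).read_bytes()
    n_crlf = data.count(b"\r\n")
    # Work on bytes so that Python's own newline translation cannot interfere.
    Path(dst_path).write_bytes(data.replace(b"\r\n", b"\n"))
    return n_crlf
```

The converted copy could then be loaded in the usual way, for example with `lightgbm.Booster(model_file="model_lf.lgb")`, where the file name is again just a placeholder.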

According to a comment in #3589 this is expected behavior since lightgbm only supports LF (\n) line endings. That is why I am putting it here as a feature request rather than a bug.

Ideally LightGBM should support CRLF line endings. But even if failing on CRLF is the expected behavior, the current failure is far from ideal:

  1. Error message: The error message is [LightGBM] [Fatal] Model format error, expect a tree here. met 200 1298 1149 12880 ... which does not say what the actual problem is.
  2. Crashing: The error is not handled. Because it occurs in low-level code (not in Python), it crashes Python entirely with the message *** buffer overflow detected ***: python terminated
  3. Documentation: If loading is expected to fail for CRLF line endings, that should be documented somewhere. I have not found such documentation.
  4. Consistency: I have many examples of LightGBM model files with CRLF line endings that load just fine. Only certain ones crash Python.

I think LightGBM should either add support for CRLF line endings or at least handle failures caused by line endings gracefully, returning a useful error message and not crashing Python.
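
Until something like that exists in the library, a caller-side guard can give the kind of explicit failure asked for here. The sketch below is an assumption about how such a check might look, not LightGBM behavior; the function name `check_line_endings` and the wording of the message are made up for illustration.

```python
def check_line_endings(path, chunk_size=1 << 16):
    """Raise ValueError with an explicit message if the file contains CR bytes.

    Hypothetical pre-load check; it is not part of LightGBM. The file is
    scanned in chunks for a bare b"\\r", so a CRLF pair that happens to
    straddle a chunk boundary is still detected.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return  # reached end of file without finding a carriage return
            if b"\r" in chunk:
                raise ValueError(
                    f"{path} contains carriage returns (CRLF line endings?); "
                    "convert the file to LF line endings before loading it"
                )
```

Calling this before loading a model turns the low-level crash into an ordinary Python exception that names the likely cause.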

Motivation

Many people on Windows use LightGBM. Git also has a standard feature that converts line endings when cloning a repo, so even if the model is checked in with \n line endings, loading it may still fail on Windows machines.

@StrikerRUS
Collaborator

Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place.

Contributions for this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.

@StrikerRUS
Collaborator

XGBoost solution:
https://github.com/dmlc/xgboost/blob/a1bcd33a3b74f2e80b870c8f4d4f13e94375a8e4/src/common/config.h#L55-L75

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023

@jameslamb
Collaborator

Sorry, this was locked accidentally. Just unlocked it. We'd still love help with this feature!

@microsoft microsoft unlocked this conversation Aug 17, 2023