Feature request: density estimation #2056

Closed
zkurtz opened this issue Mar 19, 2019 · 1 comment

Comments

@zkurtz
Contributor

zkurtz commented Mar 19, 2019

I've started work on a density estimation package, including a classifier-adjusted density estimation (CADE) routine that currently uses LightGBM for its default classifier. However, it occurs to me that this is almost certainly suboptimal. It would be far more efficient to directly build boosted density estimation trees, and I wonder if this could be done on top of LightGBM's existing codebase without much additional effort (for someone with more C++ skill than me).

Density estimation trees are not a new concept. For example, https://arxiv.org/abs/1607.06635 points out their computational advantages. I suspect that a boosted tree framework could overcome some of the accuracy limitations of single-tree implementations.

Unlike all(?) current LightGBM-supported learners, density estimation is an unsupervised learning method. Given a node, choosing where (and whether) to split requires keeping track of the min/max value of each feature in that node, and efficiently tallying the number of observations to the left and right of each potential split. A split provides the greatest gain when it produces child nodes with very unequal densities but not-too-unequal total mass. Several specific loss functions have been proposed; I would likely start with those encoded in astropy.
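
To make the bookkeeping above concrete, here is a minimal single-feature sketch in Python with NumPy. It is not LightGBM code and not the loss from astropy: the criterion `-count^2 / (n_total^2 * volume)` is an assumed integrated-squared-error style loss chosen only for illustration, and the names `node_loss`, `best_split_1d`, and `min_count` are hypothetical. The node bounds `lo` and `hi` play the role of the tracked min/max, and `left` / `right` are the tallies on each side of a candidate threshold.

```python
import numpy as np


def node_loss(count, volume, n_total):
    # Assumed ISE-style loss of a constant density count / (n_total * volume)
    # on one node. This criterion is an illustrative assumption.
    return -(count ** 2) / (n_total ** 2 * volume)


def best_split_1d(x, lo, hi, n_total, min_count=1):
    """Return (gain, threshold) for the best split of the node [lo, hi].

    Assumes lo <= min(x) and hi >= max(x). Returns (0.0, None) when no
    candidate split leaves at least min_count points on each side.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n < 2:
        return 0.0, None
    parent = node_loss(n, hi - lo, n_total)

    # Candidate thresholds: midpoints between consecutive distinct values.
    distinct = x[:-1] < x[1:]
    thresholds = ((x[:-1] + x[1:]) / 2.0)[distinct]
    left = np.arange(1, n)[distinct]  # points to the left of each threshold
    right = n - left                  # points to the right of each threshold

    # Guard against children holding too little mass.
    ok = (left >= min_count) & (right >= min_count)
    thresholds, left, right = thresholds[ok], left[ok], right[ok]
    if thresholds.size == 0:
        return 0.0, None

    children = (node_loss(left, thresholds - lo, n_total)
                + node_loss(right, hi - thresholds, n_total))
    gains = parent - children
    best = int(np.argmax(gains))
    return float(gains[best]), float(thresholds[best])


# Evenly spread points versus points concentrated near the lower bound.
uniform = (np.arange(10) + 0.5) / 10
clustered = np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.5, 0.9])
gain_uniform, _ = best_split_1d(uniform, 0.0, 1.0, n_total=10)
gain_clustered, thr_clustered = best_split_1d(clustered, 0.0, 1.0, n_total=10)
```

Under this assumed loss, evenly spread data yields essentially no gain from any split, while the clustered data yields a clearly positive gain. The criterion on its own favors small, dense children, so `min_count` stands in here for the "not-too-unequal total mass" requirement; a real implementation would presumably work on binned counts per feature rather than sorted raw values.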

Update: I hope to eventually be able to endorse a particular loss function for training. Until then, anyone who tackles this may find my general notes on performance evaluation to be relevant.

@StrikerRUS
Collaborator

Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place.

Contributions of this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.
