Effect Plot for Linear Models #604

Open
mattharrison opened this issue Sep 6, 2018 · 11 comments
Labels: level: novice (good for beginners or new contributors), type: feature (a new visualizer or utility for yb)

Comments

@mattharrison

Describe the solution you'd like
Would love to have an Effect Plot for aiding with interpreting linear models. I realize that a feature importance plot does some of this. An effect plot shows the weights as a bar plot so you can see whether the impact is positive or negative and also how large the variance is.

Examples

There is a great example here
https://christophm.github.io/interpretable-ml-book/limo.html#visual-parameter-interpretation

My scouring has not turned up any Python code to generate this plot in the wild.

@bbengfort added the type: feature (a new visualizer or utility for yb), hacktoberfest, and level: novice (good for beginners or new contributors) labels on Sep 6, 2018
@smile2snail

Hi Mattharrison,

Have you tried Random Forest? It can help you create a feature importance chart in Python.

I found a good example to create the feature importance chart:

http://www.agcross.com/2015/02/random-forests-scikit-learn/

Is this the kind of solution you are asking about? If not, please clarify and I can continue to help with this issue.

@bbengfort
Member

@mattharrison great suggestion. I think an effect plot would be a very interesting feature to add to yellowbrick.regressor for any estimator that has a learned coef_ attribute. I was a bit confused about how to determine the variance in the weight plot, but it looks like this is not required, since the effects can be computed directly from the training data.
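As a rough illustration of computing the effects from the training data, here is a minimal sketch. It assumes a fitted scikit-learn linear regressor with a single target; the helper name feature_effects and the synthetic data are made up for this example and are not part of Yellowbrick.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def feature_effects(model, X):
    """Return per-instance, per-feature effects of a fitted linear model.

    effects[i, j] = coef_[j] * X[i, j]
    (hypothetical helper; assumes a single-target regressor)
    """
    X = np.asarray(X, dtype=float)
    coef = np.ravel(model.coef_)
    return X * coef  # broadcasts the weights across every row


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
effects = feature_effects(model, X)

# Sanity check: one row's effects plus the intercept give that row's prediction.
reconstructed = effects.sum(axis=1) + model.intercept_
```

Because the effects are just weight times feature value, no separate variance estimate for the weights is needed; the spread in each column comes from the data itself.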

Matplotlib has a box plot implementation, so it would be straightforward to pass a 2D array of effects to produce this plot. However, I'm especially intrigued about the possibility of also including a single point as in 5.1.5 (or points). Perhaps we could provide this functionality by having the user pass in the point data to be plotted as test data?
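A minimal sketch of that idea, assuming the effects are already available as a 2D array of shape (n_samples, n_features). The function name plot_effects, its parameters, and the random demo data are hypothetical and only meant to show how a box plot with an overlaid single instance could be drawn with Matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_effects(effects, feature_names, instance_effects=None, ax=None):
    """Draw one box per feature from a 2D array of effects.

    effects: array of shape (n_samples, n_features)
    instance_effects: optional array of shape (n_features,) for one instance
    """
    effects = np.asarray(effects, dtype=float)
    if ax is None:
        _, ax = plt.subplots()
    positions = np.arange(1, effects.shape[1] + 1)
    # Given a 2D array, boxplot draws one box per column.
    ax.boxplot(effects, positions=positions)
    if instance_effects is not None:
        # Overlay the effects of a single instance on top of the boxes.
        ax.scatter(positions, instance_effects, color="red", marker="x",
                   zorder=3, label="selected instance")
        ax.legend()
    ax.set_xticks(positions)
    ax.set_xticklabels(feature_names)
    ax.axhline(0.0, color="grey", linewidth=0.8)  # separates positive and negative effects
    ax.set_ylabel("feature effect")
    return ax


# Random stand-in effects, for illustration only.
rng = np.random.default_rng(0)
demo_effects = rng.normal(size=(100, 3)) * np.array([2.0, -1.0, 0.5])
ax = plot_effects(demo_effects, ["x0", "x1", "x2"], instance_effects=demo_effects[0])
```

Passing the instance as an optional argument keeps the plain box plot as the default; whether a visualizer should take it as test data instead is the open design question here.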

@smile2snail thank you for chiming in here. I think what @mattharrison is looking for is a new visualizer that can create this visualization for regression models. Please also note that Yellowbrick already has a FeatureImportances visualizer that does something very similar to the plot you suggested!

@mattharrison as always, thank you for being an excellent resource for new visualizers!

@bbengfort added the priority: high (should be done before next release) label on Oct 13, 2018
@souravsingh
Contributor

@bbengfort I am interested in working on the issue.

@bbengfort
Member

@souravsingh that'd be great - feel free to open a PR when you're ready to discuss it!

@naresh-bachwani
Contributor

Hello @bbengfort ,
I have been working on this issue for quite a few days and have built a class (beta version) to address it. The output looks like this:
[Image: Screenshot (126)]

The code snippet looks like this

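from sklearn.linear_model import LinearRegression

model = LinearRegression()
viz = effect(model=model)  # effect: the beta visualizer class described above (not part of Yellowbrick)
viz.fit(dataset, Y)        # dataset: feature matrix, Y: target values
viz.finalize()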

I would appreciate your review of this; any suggestions would be valuable.

@lwgray
Contributor

lwgray commented Mar 26, 2019

@naresh-bachwani Thanks for commenting on this issue. We are just coming off a hiatus, so it might take a bit to get to this, but we will as soon as we can. I encourage you to open a PR. Our contributing guide can be found at http://www.scikit-yb.org/en/latest/contributing.html

@naresh-bachwani
Contributor

naresh-bachwani commented Mar 30, 2019

Dear @bbengfort @mattharrison @rebeccabilbro @lwgray,
I have been working with effect plots and PCA for some time. I have my GSoC'19 proposal ready and would appreciate reviews and help from the mentors. I have opened a PR related to the proposal, and the link for the PR is this.

@naresh-bachwani
Contributor

Hello @lwgray,
I have done some work on the effect plot and wanted to open a PR, but I have a question: in which directory should I put my effect plot file?
I think it should go in yellowbrick/features. Correct me if I am wrong!

@bbengfort
Member

Hi @naresh-bachwani, actually I propose that this plot should go into yellowbrick/regressor/effect.py and that it should extend RegressionScoreVisualizer. I completely understand your point about the similarity of this to the FeatureImportances visualizer. However, my feeling is that this plot is more directly about the analysis and interpretation of a linear model, and is coupled more deeply to this type of model than the importances plots are (which might be about classification, clustering, etc.).

Why don't you go ahead and start with it there, and in the course of reviewing the PR we can see if it continues to make sense in the regressor module?
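To make the discussion a bit more concrete, here is a stand-alone sketch of the fit/finalize call pattern used in the snippet posted earlier in this thread. It deliberately does not import Yellowbrick or subclass RegressionScoreVisualizer; the class name EffectPlotSketch, its constructor arguments, and the effects_ attribute are all hypothetical, and a real visualizer in yellowbrick/regressor/effect.py would need to follow the library's own base-class conventions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression


class EffectPlotSketch:
    """Hypothetical stand-in for an effect plot visualizer (not Yellowbrick code)."""

    def __init__(self, model, ax=None, feature_names=None):
        self.model = model
        self.ax = ax
        self.feature_names = feature_names

    def fit(self, X, y):
        # Fit the wrapped linear model, then derive effects from the training data.
        X = np.asarray(X, dtype=float)
        self.model.fit(X, y)
        self.effects_ = X * np.ravel(self.model.coef_)  # assumes a single target
        self.draw()
        return self

    def draw(self):
        # One box per feature (one per column of the effects array).
        if self.ax is None:
            _, self.ax = plt.subplots()
        self.ax.boxplot(self.effects_)
        return self.ax

    def finalize(self):
        # Labels and title only; no data is drawn here.
        n_features = self.effects_.shape[1]
        names = self.feature_names
        if names is None:
            names = [f"x{j}" for j in range(n_features)]
        self.ax.set_xticks(np.arange(1, n_features + 1))
        self.ax.set_xticklabels(names)
        self.ax.set_ylabel("feature effect")
        self.ax.set_title("Effect plot for " + type(self.model).__name__)
        return self.ax


# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 3))
y = X @ np.array([1.5, -2.0, 0.3]) + rng.normal(scale=0.1, size=150)

viz = EffectPlotSketch(LinearRegression(), feature_names=["a", "b", "c"])
viz.fit(X, y)
ax = viz.finalize()
```

Since the sketch reads only coef_ from the fitted model, it is tied to linear estimators, which fits the argument for keeping the plot in the regressor module.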

@bbengfort
Member

bbengfort commented Apr 6, 2019

@naresh-bachwani I'm slowly getting involved with PRs and issues again. I noticed that you currently have two PRs open, #806 and #807. I really appreciate your enthusiasm and desire to contribute to YB, but perhaps we could focus on getting those shipped before opening a new PR for effect plots?

We're quite a small group and we do this in our spare time -- as you can probably tell we don't have a lot of surface area to deal with a large number of PRs!

@naresh-bachwani
Contributor

Hello @bbengfort,
Thank you for the guidance and for clearing up my doubts. I have finished building a simple base class for the effect plot, and we can work through the hyperparameter setup in a GitHub Gist once my two PRs get shipped!

@bbengfort added this to the v1.2 milestone on Jan 8, 2020
@bbengfort removed the priority: high (should be done before next release) label on Apr 9, 2020