Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPW weights bug. Missuse of np.average rather than mean of weighted outcomes #67

Open
marcoBmota8 opened this issue May 16, 2024 · 5 comments

Comments

@marcoBmota8
Copy link

marcoBmota8 commented May 16, 2024

Hi,

I believe there is a mistake in the calculation of outcomes in IPW.
The code is using np.average to compute the expected outcome across treatment groups. This computation sums the weighted outcomes and normalizes by the sum of the weights which is incorrect for IPW:
sum(weights*y)/sum(weights)

Instead it should be the mean of the weighted outcomes:
sum(weights*y)/len(y)

This is an easy but important fix.

@marcoBmota8
Copy link
Author

marcoBmota8 commented May 16, 2024

Additional information to this issue

How the IPW is implemented as of now is not wrong per se. It is just another (better in most cases) way to implement IPW, the Hajek estimator. It reduces variance at the cost of bias (no free lunch) with respect to my proposal (Hortvitz-Thompson estimator) which is unbiased but normally leads to high variance.

Maybe the package would be better if it implemented both and let the user decide.

@ehudkr
Copy link
Collaborator

ehudkr commented May 17, 2024

Hi Marco,
Thank you so much for bringing this up to my attention. This is indeed an important nuance that should be made more explicit.
I agree both options can be implemented, should be simple enough (let me know if you have the time and will to give it a try 🙃).

I'll keep this issue open until solved.

Thanks again.

@ehudkr
Copy link
Collaborator

ehudkr commented May 17, 2024

Note to self for contemplating an elastic-net-like solution for any combination between the Hájek and Horvitz–Thompson estimators a-la equation (1) from Khan and Ugander (2023).

@marcoBmota8
Copy link
Author

Sounds good. I will fork and make a pull request when I get a chance.

That ENet aproach also looks interesting. Might be worth exploring a benchmark sudy on synthetic data ;).

Best,

@marcoBmota8
Copy link
Author

marcoBmota8 commented Jun 15, 2024

Just made the pull request with my fork changes for the HT & Hajek IPW implementations.
I tried to follow the existing code structure and make as few changes as possible but feel free to lmk if there is something that can be improved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants