if filter = NULL test whether data has been filtered, and skip in case #50

Open
stemangiola opened this issue Apr 9, 2024 · 4 comments

Comments

@stemangiola
Collaborator

@susansjy22 sometimes we don't know whether the data has been filtered, so we can set the default of the filter argument to NULL. Within the filter function, you can then test whether the minimum RNA count per cell is at least X and, if so, assume that the data has already been filtered.

That X threshold can be found in the Seurat tutorial

e.g.

nFeature_RNA > 200 # This is the number of RNA features

from https://satijalab.org/seurat/archive/v3.0/pbmc3k_tutorial.html

and

lower = 100, # This is the RNA count

from https://rdrr.io/github/MarioniLab/DropletUtils/man/emptyDrops.html

This will be done within the function that filters empty droplets, so some samples could have been filtered already and some not. The reports will show, as they do now, how many droplets were filtered out, so the user will be able to tell.
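A minimal sketch of the check described above, in base R. The function names (`looks_already_filtered`, `filter_empty_droplets_sketch`), the `filter_fun` argument, and the assumption of a dense genes-by-cells count matrix are all hypothetical and are not taken from the package. The cutoffs simply reuse the two example values quoted above.

```r
# Hypothetical helper: guess whether a dense count matrix (genes x cells)
# has already been filtered for empty droplets.
# Cutoffs reuse the examples quoted in this comment; they are assumptions.
looks_already_filtered <- function(counts, min_counts = 100, min_features = 200) {
  if (ncol(counts) == 0) return(TRUE)   # nothing left to filter
  rna_counts   <- colSums(counts)       # total RNA counts per cell
  rna_features <- colSums(counts > 0)   # detected features per cell
  min(rna_counts) > min_counts && min(rna_features) > min_features
}

# Hypothetical wrapper: filter = NULL means "decide from the data".
# filter_fun stands in for whatever empty-droplet filter is actually used.
filter_empty_droplets_sketch <- function(counts, filter = NULL, filter_fun = identity) {
  if (is.null(filter)) {
    filter <- !looks_already_filtered(counts)
  }
  if (!filter) {
    return(counts)                      # skip: data looks filtered already
  }
  filter_fun(counts)
}
```

With `filter = NULL` the decision is made per sample, so a mix of filtered and unfiltered samples is handled without the user having to know which is which. An explicit `TRUE` or `FALSE` still overrides the check.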

@myushen
Contributor

myushen commented Apr 10, 2024

I wonder if pipeline running time is what motivates this issue? What other impact might there be if users are unaware that empty droplets have already been filtered out of their data, and they continue using filter_empty_droplets=TRUE?

@stemangiola
Collaborator Author

stemangiola commented Apr 10, 2024

This is not an issue of running time. The consequence can be that you filter more cells than needed. If filter_empty_droplets = NULL by default, then an undecided user will use this. If the user changes the default, then they will have put quite a lot of thought into it.

@myushen
Contributor

myushen commented Apr 10, 2024

I don't think applying filter_empty_droplets=TRUE twice will filter out more cells since the threshold is set.
I think a benefit of adding this function is giving users flexibility to choose the default cutoff applied by "filter_empty_droplets", or any other thresholds they prefer.

@stemangiola
Collaborator Author

Our empty droplet calculation is not based on a threshold, but on outliers.
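To illustrate why this distinction matters, here is a toy comparison in base R. It is only a sketch: the numbers are made up, and the "3 MADs below the median" rule is an assumed stand-in for an outlier-based filter, not the package's actual calculation. A fixed cutoff gives the same result when applied twice, whereas an outlier rule is recomputed on whatever data it receives and so can remove further droplets on a second pass.

```r
# Toy total RNA counts per droplet (made-up numbers, not real data).
total_counts <- c(1, 50, 80, 90, 95, 100, 105, 110, 120)

# Fixed cutoff: keep droplets above a constant threshold (assumed value).
filter_fixed <- function(x, lower = 10) x[x > lower]

# Outlier rule (assumption for illustration): drop droplets more than
# n_mads MADs below the median of the data passed in.
filter_outliers <- function(x, n_mads = 3) {
  x[x >= median(x) - n_mads * mad(x)]
}

once_fixed    <- filter_fixed(total_counts)
twice_fixed   <- filter_fixed(once_fixed)
once_outlier  <- filter_outliers(total_counts)
twice_outlier <- filter_outliers(once_outlier)

# Fixed cutoff: the second pass removes nothing.
print(identical(once_fixed, twice_fixed))
# Outlier rule: the second pass removes one more droplet (the one with 50 counts).
print(setdiff(once_outlier, twice_outlier))
```

In this toy case the first outlier pass drops only the droplet with 1 count. Once it is gone, the median and MAD shift, and the droplet with 50 counts falls below the recomputed bound.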
