
LightGBM GPU requires folder permission to compile kernels #2955

Closed
Laurae2 opened this issue Mar 28, 2020 · 7 comments

@Laurae2 (Contributor) commented Mar 28, 2020

Duplicate issue (but closed): #1531

Environment info

Operating System: Ubuntu 18.04

CPU/GPU model: Intel Quad Xeon Platinum 8280 / custom NVIDIA GPU

C++/Python/R version: R 4.0 (devel), Python 3.8, gcc-7.4.0

LightGBM version or commit hash: 03ce02a

Error message

[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 232
[LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 116
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: Unknown model, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 16 bins...
[LightGBM] [Warning] boost::filesystem::create_directory: Permission denied: "/home/hpc_shared/.boost_compute/ef/9dc91d06909cddabe023435952c0ee1f8e8256"
[LightGBM] [Warning] boost::filesystem::create_directory: Permission denied: "/home/hpc_shared/.boost_compute/8f/588239c07d86c10fbbd592dc040cae70b73ada"
terminate called without an active exception
Aborted (core dumped)

Reproducible examples

To reproduce, the user must not have any kernels compiled (and cached under ~/.boost_compute) before running LightGBM on GPU.

Example in R:

library(lightgbm)
library(Matrix)

# Build the training and validation Datasets from the bundled agaricus data.
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
train$data[, 1] <- 1:6513  # overwrite the first feature with 6513 distinct values
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
valids <- list(test = dtest)

# device = "gpu" triggers OpenCL kernel compilation (and the Boost.Compute
# offline cache lookup) on the first training run.
params <- list(objective = "regression",
               metric = "rmse",
               device = "gpu",
               gpu_platform_id = 0,
               gpu_device_id = 0,
               nthread = 1,
               boost_from_average = FALSE,
               max_bin = 32)
model <- lgb.train(params,
                   dtrain,
                   2,  # nrounds
                   valids,
                   min_data = 1,
                   learning_rate = 1,
                   early_stopping_rounds = 10)

Steps to reproduce

  1. Compile LightGBM
  2. Run any GPU model
  3. Observe the "Permission denied" error

Solutions proposed:

Temporary solutions:

  • Set BOOST_COMPUTE_USE_OFFLINE_CACHE=0, at the cost of recompiling the kernels every time a model is trained (not recommended for HPC users: 100k models at 1 second of compilation each means 100k seconds, roughly 28 hours, wasted); a process-level sketch follows this list
  • Run the model as root first, then kill the process and proceed to run as a regular user
  • Change the folder permissions
  • Find a workaround that compiles the kernels into another folder when permissions are missing (difficult with Boost)
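
A minimal sketch of the first workaround, assuming a POSIX system (setting the variable in a shell via export BOOST_COMPUTE_USE_OFFLINE_CACHE=0, or from R via Sys.setenv(), is equivalent):

#include <cstdlib>

int main() {
    // Disable Boost.Compute's on-disk kernel cache (~/.boost_compute) for
    // this process only; kernels are then recompiled on every run, which is
    // the warmup penalty described above.
    setenv("BOOST_COMPUTE_USE_OFFLINE_CACHE", "0", /*overwrite=*/1);
    // ... train the LightGBM GPU model from here ...
    return 0;
}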

Maybe @huanzhang12 has a better recommendation to solve this in environments where writing under ~/.boost_compute is not allowed.

@huanzhang12 (Contributor)

It seems the compute cache path is hard-coded to the $HOME/.boost_compute folder:
https://github.com/boostorg/compute/blob/master/include/boost/compute/detail/path.hpp#L37
This is a limitation of libboost. We cannot easily fix the path unless we change boost source code.

If you have permission on this machine, I think you can create a symbolic link from ~/.boost_compute to a writable folder like /tmp.
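
For illustration, a sketch of that symlink workaround (equivalent to a one-line ln -s in a shell), assuming C++17 <filesystem>, a writable /tmp, and that HOME is set; /tmp/boost_compute_cache is a hypothetical location:

#include <cstdlib>
#include <filesystem>
#include <string>

int main() {
    namespace fs = std::filesystem;
    // Hypothetical writable directory to hold the kernel cache.
    const fs::path target = "/tmp/boost_compute_cache";
    const fs::path link = std::string(std::getenv("HOME")) + "/.boost_compute";

    fs::create_directories(target);  // ensure the target exists
    if (!fs::exists(link)) {
        // Point ~/.boost_compute at the writable directory.
        fs::create_directory_symlink(target, link);
    }
    return 0;
}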

@StrikerRUS (Collaborator)

We cannot easily fix the path unless we change boost source code.

I see boost/compute was last updated more than a year ago; I think we can say the project is abandoned. So it will not get (frequent) updates in the future, and we would not have to worry about synchronizing with the upstream repo or possible merge conflicts. Maybe we can edit this submodule on our side? If I understand correctly, that would preserve backward compatibility with old versions of the boost package, as the changes would only apply to the code users get via git clone ...

@huanzhang12 (Contributor)

To make it more flexible, we can check for an environment variable (something like BOOST_COMPUTE_OFFLINE_CACHE_PATH) here; if it is defined, we use it as the compute cache directory. Otherwise just use the default path.
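
A minimal sketch of what such a change could look like (BOOST_COMPUTE_OFFLINE_CACHE_PATH is the variable proposed above, not an existing upstream option; the fallback branches mirror the hard-coded defaults described earlier):

#include <cstdlib>
#include <string>

// Resolve the offline kernel cache directory, preferring an explicit
// override from the environment over the hard-coded default.
inline std::string offline_cache_path() {
    // Proposed (not yet existing) override variable.
    if (const char* dir = std::getenv("BOOST_COMPUTE_OFFLINE_CACHE_PATH")) {
        return std::string(dir);
    }
#ifdef _WIN32
    return std::string(std::getenv("APPDATA")) + "\\boost_compute";
#else
    return std::string(std::getenv("HOME")) + "/.boost_compute";
#endif
}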

I am not sure about the status of boost/compute, but I feel it has not been abandoned yet (the last commit on the develop branch was less than 1 year ago). We can submit a pull request to boost/compute with our changes and see if they accept it. If not, we can keep it as a patch and apply it before building LightGBM.

@StrikerRUS (Collaborator)

I like the idea!

Speaking about the status: indeed, it is not officially abandoned. But the last commit on the develop branch was 11 months ago, and the last accepted PR is celebrating its 1-year anniversary today: boostorg/compute#829 (comment).

Also, the Help Wanted section doesn't suggest that the project is healthy.

@StrikerRUS (Collaborator)

The boostorg/compute repo hasn't received any updates in two years.

@StrikerRUS (Collaborator)

Closed in favor of #2302; we decided to keep all feature requests in one place.

Contributions of this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.

@github-actions (bot)

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions bot locked this issue as resolved and limited the conversation to collaborators on Aug 23, 2023