
Out of bounds error in run_nms #13

Open
EpipolarWagner opened this issue May 25, 2023 · 0 comments

Hi,

I am trying to get OpenGlue running. I start training like this:

python train.py --config='config/config.yaml' --features_config='config/features_online/sift.yaml'

However, while training SuperGlue with SIFT features, I encounter an error:


Epoch 0:   3%|▉                                 | 313/10851 [05:52<3:18:00,  1.13s/it, loss=7.73, v_num=0s2i, Train NLL loss=8.400, Train Metric loss=0.000]
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [1,0,0], thread: [41,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [1,0,0], thread: [62,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.

Running under the debugger, I get the following error:

CUDA error: device-side assert triggered
  File "/media/ssd4TB/software/OpenGlue/models/matching_module.py", line 81, in training_step
    lafs1, responses1, desc1 = self.local_features_extractor(batch['image1'])
  File "/media/ssd4TB/software/OpenGlue/models/features/base.py", line 79, in forward
    lafs, scores = self.run_nms(lafs, scores, image.size())
  File "/media/ssd4TB/software/OpenGlue/models/features/base.py", line 49, in run_nms
    mask[0, 0, kpts_[:, 1], kpts_[:, 0]] = scores_
RuntimeError: CUDA error: device-side assert triggered

During handling of the above exception, another exception occurred:

  File "/media/ssd4TB/software/OpenGlue/train.py", line 86, in main
    trainer.fit(model, datamodule=dm, ckpt_path=config.get('checkpoint'))
  File "/media/ssd4TB/software/OpenGlue/train.py", line 90, in <module>
    main()
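
The device-side assert does not say which values are out of range. A small helper along the lines below could be called on `kpts_[:, 1]` (against the image height) and `kpts_[:, 0]` (against the image width) just before the failing assignment in `run_nms`. This is only a sketch under assumptions: `describe_bad_indices` is a hypothetical name and not part of OpenGlue, the indices are assumed to be an integer tensor, and the valid range is taken to be `[0, size)`. Negative values are flagged as well, even though PyTorch would silently wrap those that are `>= -size`.

```python
import torch


def describe_bad_indices(idx, size, name="index"):
    """Return a readable report of entries in `idx` outside [0, size), or None.

    Hypothetical debugging helper, not part of OpenGlue.
    """
    # Copy to CPU so the check itself cannot trigger a device-side assert.
    idx = idx.detach().cpu()
    bad = (idx < 0) | (idx >= size)
    if not bad.any():
        return None
    rows = bad.nonzero(as_tuple=True)[0].tolist()
    values = idx[bad].tolist()
    return f"{name}: {len(values)} value(s) outside [0, {size}): {values} at rows {rows}"


# Example with made-up row indices checked against an image height of 480.
y = torch.tensor([12, 479, 480, -3])
report = describe_bad_indices(y, 480, name="y")
print(report)
```

Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` before launching training makes CUDA kernel launches synchronous, which usually helps the Python traceback point at the operation that actually failed.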

I use PyTorch 1.11.0 with CUDA 11.3 and PyTorch Lightning 1.6.
The GPU is a 3090 and the operating system is Ubuntu.

# packages in environment at /home/wga2hi/anaconda3/envs/openGlue_2:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    defaults
_openmp_mutex             5.1                       1_gnu    defaults
absl-py                   1.4.0                    pypi_0    pypi
aiohttp                   3.8.4                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
albumentations            1.3.0                    pypi_0    pypi
antlr4-python3-runtime    4.9.3                    pypi_0    pypi
appdirs                   1.4.4                    pypi_0    pypi
async-timeout             4.0.2                    pypi_0    pypi
asynctest                 0.13.0                   pypi_0    pypi
attrs                     23.1.0                   pypi_0    pypi
bosch-ca                  1.0                           1    defaults
ca-certificates           2022.10.11      boschca_h06a4308_0  [bosch-ca]  defaults
cachetools                5.3.0                    pypi_0    pypi
certifi                   2022.12.7       boschca_py37h06a4308_0  [bosch-ca]  defaults
charset-normalizer        3.1.0                    pypi_0    pypi
click                     8.1.3                    pypi_0    pypi
cycler                    0.11.0                   pypi_0    pypi
deepdish                  0.3.7                    pypi_0    pypi
docker-pycreds            0.4.0                    pypi_0    pypi
fonttools                 4.38.0                   pypi_0    pypi
frozenlist                1.3.3                    pypi_0    pypi
fsspec                    2023.1.0                 pypi_0    pypi
gitdb                     4.0.10                   pypi_0    pypi
gitpython                 3.1.31                   pypi_0    pypi
google-auth               2.18.1                   pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
grpcio                    1.54.2                   pypi_0    pypi
idna                      3.4                      pypi_0    pypi
imageio                   2.29.0                   pypi_0    pypi
importlib-metadata        6.6.0                    pypi_0    pypi
joblib                    1.2.0                    pypi_0    pypi
kiwisolver                1.4.4                    pypi_0    pypi
kornia                    0.6.12                   pypi_0    pypi
kornia-moons              0.2.6                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1    defaults
libffi                    3.4.4                h6a678d5_0    defaults
libgcc-ng                 11.2.0               h1234567_1    defaults
libgomp                   11.2.0               h1234567_1    defaults
libstdcxx-ng              11.2.0               h1234567_1    defaults
markdown                  3.4.3                    pypi_0    pypi
markupsafe                2.1.2                    pypi_0    pypi
matplotlib                3.5.3                    pypi_0    pypi
multidict                 6.0.4                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0    defaults
networkx                  2.6.3                    pypi_0    pypi
numexpr                   2.8.4                    pypi_0    pypi
numpy                     1.21.6                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
omegaconf                 2.3.0                    pypi_0    pypi
opencv-python             4.7.0.72                 pypi_0    pypi
opencv-python-headless    4.7.0.72                 pypi_0    pypi
openssl                   1.1.1t               h7f8727e_0    defaults
packaging                 23.1                     pypi_0    pypi
pathtools                 0.1.2                    pypi_0    pypi
pillow                    9.5.0                    pypi_0    pypi
pip                       22.3.1           py37h06a4308_0    defaults
protobuf                  3.20.3                   pypi_0    pypi
psutil                    5.9.5                    pypi_0    pypi
pyasn1                    0.5.0                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pydeprecate               0.3.2                    pypi_0    pypi
pyparsing                 3.0.9                    pypi_0    pypi
python                    3.7.16               h7a1cb2a_0    defaults
python-dateutil           2.8.2                    pypi_0    pypi
pytorch-lightning         1.6.0                    pypi_0    pypi
pywavelets                1.3.0                    pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
qudida                    0.0.4                    pypi_0    pypi
readline                  8.2                  h5eee18b_0    defaults
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
scikit-image              0.19.3                   pypi_0    pypi
scikit-learn              1.0.2                    pypi_0    pypi
scipy                     1.7.3                    pypi_0    pypi
sentry-sdk                1.24.0                   pypi_0    pypi
setproctitle              1.3.2                    pypi_0    pypi
setuptools                65.6.3           py37h06a4308_0    defaults
shutup                    0.2.0                    pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
smmap                     5.0.0                    pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0    defaults
tables                    3.7.0                    pypi_0    pypi
tensorboard               2.11.2                   pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
tifffile                  2021.11.2                pypi_0    pypi
tk                        8.6.12               h1ccaba5_0    defaults
torch                     1.11.0+cu113             pypi_0    pypi
torchmetrics              0.11.4                   pypi_0    pypi
tqdm                      4.65.0                   pypi_0    pypi
typing-extensions         4.6.1                    pypi_0    pypi
urllib3                   1.26.16                  pypi_0    pypi
wandb                     0.15.3                   pypi_0    pypi
werkzeug                  2.2.3                    pypi_0    pypi
wheel                     0.38.4           py37h06a4308_0    defaults
xz                        5.4.2                h5eee18b_0    defaults
yarl                      1.9.2                    pypi_0    pypi
zipp                      3.15.0                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_0    defaults

I also tried different Lightning and PyTorch versions. The issue still occurs, but at different iterations during training.
Are you aware of this issue? Is there a fix for it?
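
A possible workaround, untested against the repository, would be to drop keypoints whose rounded coordinates fall outside the image before writing their scores into the mask. The sketch below assumes keypoints are stored as `(x, y)` pixel coordinates in an `(N, 2)` tensor and that the mask has shape `(1, 1, H, W)`, as the indexing in the traceback suggests. The function name `scatter_scores_in_bounds` and the rounding step are assumptions for illustration, not the actual `run_nms` code.

```python
import torch


def scatter_scores_in_bounds(kpts, scores, height, width):
    """Write scores into a (1, 1, H, W) mask, skipping out-of-range keypoints.

    kpts:   (N, 2) tensor of (x, y) pixel coordinates (assumed layout).
    scores: (N,) tensor of keypoint scores.
    Returns the mask and a boolean tensor marking which keypoints were kept.
    """
    kpts_ = kpts.round().long()
    x, y = kpts_[:, 0], kpts_[:, 1]
    # Keep only keypoints that land on a valid pixel after rounding.
    valid = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    mask = torch.zeros(1, 1, height, width, dtype=scores.dtype, device=scores.device)
    mask[0, 0, y[valid], x[valid]] = scores[valid]
    return mask, valid


# Made-up example: a 3x4 image where two of three keypoints round out of range.
kpts = torch.tensor([[1.2, 0.4], [3.6, 1.0], [-0.7, 2.0]])
scores = torch.tensor([0.9, 0.8, 0.7])
mask, valid = scatter_scores_in_bounds(kpts, scores, height=3, width=4)
print(valid.tolist())
```

Filtering rather than clamping avoids piling several scores onto the border pixels, but it changes the number of keypoints, so any tensors aligned with `kpts` would need the same `valid` mask applied.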

Could you please help? Thanks!
