Encountering Rare Deadlock #4

Open
Skylion007 opened this issue Apr 6, 2017 · 5 comments

Comments


Skylion007 commented Apr 6, 2017

First of all, excellent job on the repository. I am getting much better results than with other libraries. However, I have noticed that when I process a large number of files (on the order of 10,000+), I occasionally get deadlocks. I do not have much experience debugging OpenMP code, but if you tell me what information you need to track down the bug, I'll gather it. You might want to consider using an automated tool like Helgrind to help find the deadlocks. That said, whatever race condition is causing the deadlock is relatively rare. I compiled with the default settings from the build script on Ubuntu 16.04.

The bug was encountered when running face_collector.rb

Owner

nagadomi commented Apr 7, 2017

This code uses only the omp parallel for construct from OpenMP, so I don't think a deadlock should be possible. But most of this code is written in C, so it might be a memory-related issue.
I fixed one such issue, an overlapping memcpy. (b78f9cd)
In addition, ImageMagick + libpng on Ubuntu 16.04 may crash when loading a PNG image containing chunks of more than 8 MB (large metadata).
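
For readers unfamiliar with the overlapping memcpy problem mentioned above: calling memcpy when the source and destination ranges overlap is undefined behavior in C, while memmove is defined for that case. The following is only a minimal sketch of the general pattern, not the actual change in b78f9cd; the helper name `shift_left` and its parameters are hypothetical and do not come from nvxs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of nvxs): drop the first `shift` elements
 * of buf by moving the remaining n - shift elements to the front. */
void shift_left(float *buf, size_t n, size_t shift)
{
    assert(shift <= n);
    /* The source range starts at buf + shift and the destination at buf,
     * so the two ranges may overlap. memcpy has undefined behavior for
     * overlapping ranges; memmove handles them correctly. */
    memmove(buf, buf + shift, (n - shift) * sizeof *buf);
}
```

With `float v[5] = {1, 2, 3, 4, 5};`, calling `shift_left(v, 5, 1)` leaves 2, 3, 4, 5 in the first four slots. Tools such as Valgrind's memcheck can report overlapping source and destination in memcpy calls at run time.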

You can also check where the code stops with gdb.

  1. Add the -g option to nvxs/configure.ac (around line 36) so that the build includes debug symbols.
CFLAGS+=" -g "
  2. Rebuild.
cd nvxs
./autogen.sh
cd ..
./build.sh
  3. Run with gdb.
$ cd animeface-ruby
$ gdb --args ruby face_collector.rb --src src --dest dest 
(gdb) run
....
.... ( When deadlock occurs, you can stop with CTRL+C, and display the backtrace with `bt` )
^C
(gdb) bt 

@Skylion007
Author

Actually, upon further examination, it doesn't appear to be a deadlock, based on the process status code. It might be a livelock, though. I'll pull the changes and see if I continue to run into the issue.
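
One way to read a process status code on Linux is the state letter in /proc/<pid>/stat, which is the same letter ps and top display. The sketch below assumes that this is the kind of status meant above; the function name `read_proc_state` is hypothetical. As a rough heuristic, a thread blocked on a lock usually shows S (sleeping), while a thread spinning in a livelock or busy loop shows R (running). This is not conclusive for OpenMP programs, because OpenMP runtimes may spin for a while before sleeping at a barrier.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the Linux state letter of process `pid`
 * ('R' running, 'S' sleeping, 'D' uninterruptible sleep, 'Z' zombie, ...)
 * or '\0' if it cannot be read. */
char read_proc_state(pid_t pid)
{
    char path[64];
    char line[1024];

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return '\0';
    char *ok = fgets(line, sizeof line, fp);
    fclose(fp);
    if (ok == NULL)
        return '\0';

    /* Format: "pid (comm) state ...". The command name may itself contain
     * spaces or ')', so look for the last ')' in the line. */
    char *close_paren = strrchr(line, ')');
    if (close_paren == NULL || close_paren[1] != ' ' || close_paren[2] == '\0')
        return '\0';
    return close_paren[2];
}
```

Calling `read_proc_state(getpid())` returns the state of the calling process. The value in /proc/<pid>/stat describes the main thread only; per-thread states are available under /proc/<pid>/task/<tid>/stat.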

@Skylion007
Author

While it is rare, I was unfortunately able to reproduce it again. :( Took 76000 images for it to freeze. Also seems to happen more often when the system is under high load.

@Skylion007
Author

@nagadomi I noticed that the old releases of NVXS on your website still contain the bug that was fixed in commit b78f9cd. I noticed this when I was trying to build Python bindings for this project. You might want to update the source there too.

Owner

nagadomi commented Jul 4, 2017

hmm, I will remove that source code from the website and link to this git repo.
Note that I changed the interface of nv_animeface_detect (a01c02c), so the Python bindings will need to be updated.
