Non-Uniform Photon Launch Distributions from Sources #31

Closed
JBFord opened this issue Mar 1, 2018 · 2 comments

JBFord commented Mar 1, 2018

Hello Dr. Fang,

I am seeing an artifact when running a low-scattering Beer's law validation in mcxlab (Windows 10, NVIDIA GTX 1060, CUDA 9.1, MATLAB R2017a). The artifact is a non-uniform distribution of photon launch positions from the source when using a disk or Gaussian source (and, based on your own diagnostics, a planar-like source). Below is your diagnosis of the problem:

I confirm that I was able to reproduce this issue on Windows 10 with
CUDA 8 using the latest mcx code. I can also confirm that it does not
appear with the Linux-based binaries.

With a little more investigation, I found the issue is more extensive
than just the disk source: it also seems to happen with a Gaussian
source and with planar-like sources (4 equidistant points along the edge,
see attached figure). Yes, this only happens on Windows, and is
only observable with low scattering.

Just by looking at this figure, I think it suggests a random number
generator issue. It looks like one of the two random numbers determining
the x/y position of the photon tends to drop to 0 for some reason,
and there is a period associated with this drop, likely some sort of
self-correlation.

I will investigate this further. The current RNG (xorshift128+) is a
new one compared to the one in use when this was initially reported,
so I am not entirely sure why this issue persists. Because xorshift128+
also uses 64-bit data structures to store the RNG states, there is
also a possibility that the Windows NVIDIA driver has some sort of bug
related to 64-bit data processing.

In the attached .zip file is the MATLAB code I have been using, as well as the raw results generated and some images of the fluence distribution.

Thank you for looking into this problem.

Best regards,
Jeremy
testMCX.zip

@fangq fangq self-assigned this Mar 1, 2018
@fangq fangq added the bug label Mar 1, 2018
@fangq fangq closed this as completed in a0d445b Mar 11, 2018

fangq commented Mar 11, 2018

This issue was originally reported in this mailing list thread back in April 2016

https://groups.google.com/forum/#!topic/mcx-users/nRcAkZbvBaI

and was recently encountered again by @jbfborg.

The reported artifacts are similar, and were later identified as not specific to disk sources.

[Attached images: sumoverz, fluenceimage, rng_defect_windows]


fangq commented Mar 11, 2018

@jbfborg, I just want to let you know that the reported issue has been identified and a fix has been committed.

It turns out that the issue was caused by the low precision of the CPU (host) random number generator (RNG) on Windows.

The default Windows C library rand() function is only a 15-bit RNG. Because the CPU RNG provides the initial RNG states, this low precision leaves those states full of 0s: each random seed from the host is a 32-bit unsigned integer, but its high 17 bits are 0 and only its low 15 bits are random. This is not an issue on Linux and Mac because the default rand() there is a 31-bit RNG.
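
To illustrate the mechanism, here is a minimal sketch in C. It is not the mcx seeding code: `rand15`, `pack_state`, and `naive_state_word` are hypothetical names, and masking `rand()` to 15 bits is only a stand-in for how the Windows `rand()` behaves. It assumes one `rand()` call per 32-bit seed and two seeds per 64-bit state word.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a 15-bit rand(): keep only the low 15 bits,
   so the result is always in the range 0..32767. */
static uint32_t rand15(void) {
    return (uint32_t)rand() & 0x7FFFu;
}

/* Pack two 32-bit host seeds into one 64-bit RNG state word. */
static uint64_t pack_state(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}

/* Build one state word the naive way: one rand() call per 32-bit seed.
   With a 15-bit rand(), the top 17 bits of each half are always zero. */
static uint64_t naive_state_word(void) {
    uint32_t hi = rand15();
    uint32_t lo = rand15();
    return pack_state(hi, lo);
}
```

Under these assumptions only 30 of the 64 bits in each state word can ever be nonzero, no matter how many words are generated.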

I verified this by compiling the debug version of mcx using

make debug

and then running one of the benchmarks using the debug binary:

cd examples/benchmark/
./run_benchmark1.sh -n 2

The printed RNG state vector (two 64-bit integers for the default xorshift128+ RNG) shows a lot of zeros, for example:

init RNG=[E5300006ABC 464400006958]
init RNG=[74600005F6A 2AE400004FBA]

You can see that only the lowest 4 hex digits are nonzero in each 32-bit segment of the 64-bit RNG states. In comparison, on Linux, this prints as

init RNG=[4F87DF05418D7CBF 88D217E43EBC9E3A] 
init RNG=[393E5213593D071E ABCBC0C6FEA11C48] 
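
The same observation can be checked programmatically. The sketch below is only an illustration and is not part of mcx; `looks_15bit_seeded` and `popcount64` are hypothetical helper names. It tests whether the top 17 bits of both 32-bit halves of a state word are zero, which is the signature of a seed built from 15-bit values.

```c
#include <stdint.h>

/* Return 1 if the top 17 bits of both 32-bit halves of a 64-bit
   state word are zero, i.e. each half fits in 15 bits. */
static int looks_15bit_seeded(uint64_t word) {
    return (word & 0xFFFF8000FFFF8000ULL) == 0;
}

/* Count the set bits in a 64-bit word (portable loop, no intrinsics). */
static int popcount64(uint64_t word) {
    int count = 0;
    while (word != 0) {
        word &= word - 1;  /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

The four Windows state words printed above (for example 0xE5300006ABC) pass `looks_15bit_seeded`, while the four Linux state words do not.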

The low-precision rand() has been reported in many places and has caused problems in other projects, for example:

https://www.csie.ntu.edu.tw/~cjlin/liblinear/FAQ.html#windows_binary_files
cjlin1/liblinear#28
https://msdn.microsoft.com/en-us/library/398ax69y.aspx

In the fix below, I used the approach from this post: cjlin1/liblinear#28 (review)

a0d445b
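
For readers who do not want to open the commit, the general idea of widening a narrow rand() can be sketched as follows. This is not the code from a0d445b or from the liblinear post; `rand32` is a hypothetical name, and the sketch assumes only what the C standard guarantees, namely that rand() provides at least 15 random bits per call.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assemble a full 32-bit value from three rand() calls, using only the
   low 15 bits of each call so the result does not depend on RAND_MAX. */
static uint32_t rand32(void) {
    uint32_t a = (uint32_t)rand() & 0x7FFFu;  /* 15 bits -> bits 17..31 */
    uint32_t b = (uint32_t)rand() & 0x7FFFu;  /* 15 bits -> bits 2..16  */
    uint32_t c = (uint32_t)rand() & 0x3u;     /*  2 bits -> bits 0..1   */
    return (a << 17) | (b << 2) | c;
}
```

Seeds produced this way can have any of their 32 bits set, on Windows as well as on Linux and Mac. The statistical quality still depends on the underlying rand(), so this is only a way to fill all the bits, not a substitute for a better host RNG.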

Now running mcxlab/mcx on Windows no longer shows these artifacts.
