Non-Uniform Photon Launch Distributions from Sources #31

JBFord · 2018-03-01T17:52:15Z

Hello Dr. Fang,

I am seeing an artifact when running a low-scattering Beer's law validation in mcxlab on Windows 10 (NVIDIA GTX 1060, CUDA 9.1, MATLAB R2017a). The artifact is a non-uniform distribution of launched photons when using a disk or Gaussian source (and, based on your own diagnostics, a planar-like source). Below is your diagnosis of the problem:

I confirm that I was able to reproduce this issue on windows 10 with
CUDA 8 using the latest mcx code. I can also confirm that this does not
show on Linux based binaries.

with a little bit more investigation, I found the issue is more extensive
than just the disk source; it also seems to happen with the Gaussian
source and planar-like sources (4 equidistant points along the edge,
see attached figure). yes, this only happens on Windows, and is
only observable with low scattering.

just by looking at this figure, I think it suggests some random number
generator issue. It looks like one of the two random numbers determining
the x/y position of the photon tends to drop to 0 for some reason.
and there is a period associated with this drop, likely some sort of
self-correlation.

I will investigate this further. The current RNG (xorshift128+) is newer
than the one in use when this was initially reported,
so I am not entirely sure why this issue persists. Because xorshift128+
also uses 64bit data structures to store RNG states, it is also
possible that the Windows NVIDIA driver has some sort of bug
related to 64bit data processing.

In the attached .zip file is the MATLAB code I have been using, as well as the raw results generated and some images of the fluence distribution.

Thank you for looking into this problem.

Best regards,
Jeremy
testMCX.zip

fangq · 2018-03-11T21:24:24Z

This issue was originally reported in the following mailing list thread in April 2016

https://groups.google.com/forum/#!topic/mcx-users/nRcAkZbvBaI

and was recently encountered again by @jbfborg.

The reported artifacts are similar, and were later found not to be specific to disk sources.

fangq · 2018-03-11T21:34:57Z

@jbfborg, I just want you to know that the reported issue was identified and a fix was committed.

It turns out that the issue was caused by the low precision of the CPU (host) random number generator (RNG) on Windows.

The default Windows C library rand() function is only a 15-bit RNG. This low precision leaves the initial RNG states, which are provided by the CPU RNG, full of zeros: each random seed from the host is a 32-bit unsigned integer whose high 17 bits are 0, so only the low 15 bits are random. This is not an issue on Linux and Mac because the default rand() there is a 31-bit RNG.
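To make the bit pattern concrete, here is a minimal sketch in C. The helper names `rand15` and `high17_are_zero` are hypothetical and are not part of mcx; `rand15` only imitates a 15-bit rand() by masking, so the effect can be reproduced on any platform.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: imitate a 15-bit rand() (RAND_MAX == 0x7FFF)
   on any platform by keeping only the low 15 bits. */
static uint32_t rand15(void) {
    return (uint32_t)rand() & 0x7FFFu;
}

/* Returns 1 if the high 17 bits of a 32-bit seed are all zero,
   i.e. the seed carries at most 15 random bits. */
static int high17_are_zero(uint32_t seed) {
    return (seed >> 15) == 0u;
}
```

Every value returned by `rand15()` passes `high17_are_zero()`, which is the pattern described above for seeds taken directly from a 15-bit rand().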

I verified this by compiling the debug version of mcx using

make debug

and then running one of the benchmarks with the debug binary:

cd examples/benchmark/
./run_benchmark1.sh -n 2

The printed initial RNG state vector (two 64-bit integers for the default xorshift128+ RNG) shows a lot of zeros, for example:

init RNG=[E5300006ABC 464400006958]
init RNG=[74600005F6A 2AE400004FBA]

You can see that only the lower 4 hex digits are nonzero in each 32-bit segment of the 64-bit RNG states. In comparison, on Linux, this prints as

init RNG=[4F87DF05418D7CBF 88D217E43EBC9E3A] 
init RNG=[393E5213593D071E ABCBC0C6FEA11C48]
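The same check can be applied to the printed state words programmatically. The following is only an illustrative sketch; `looks_like_15bit_seed` is a hypothetical name and not an mcx function.

```c
#include <stdint.h>

/* Returns 1 when both 32-bit halves of a 64-bit RNG state word have
   their upper 17 bits clear, which is the pattern a 15-bit host
   rand() would leave behind. */
static int looks_like_15bit_seed(uint64_t state) {
    uint32_t lo = (uint32_t)(state & 0xFFFFFFFFu);
    uint32_t hi = (uint32_t)(state >> 32);
    return (lo >> 15) == 0u && (hi >> 15) == 0u;
}
```

Applied to the values above, the Windows state words such as 0xE5300006ABC match this pattern, while the Linux state words such as 0x4F87DF05418D7CBF do not.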

The low-precision rand() has been reported in many places and has caused problems in other projects, for example

https://www.csie.ntu.edu.tw/~cjlin/liblinear/FAQ.html#windows_binary_files
cjlin1/liblinear#28
https://msdn.microsoft.com/en-us/library/398ax69y.aspx

In the fix below, I used the approach from this post: cjlin1/liblinear#28 (review)

a0d445b
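For readers unfamiliar with this class of workaround, one common way to obtain a full 32-bit seed from a 15-bit rand() is to concatenate several draws. The sketch below is an assumption about the general technique, not a copy of the committed change; the function name `rand32_from_15bit` is hypothetical, and the referenced commit should be consulted for what mcx actually does.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: build a 32-bit seed from three rand() calls,
   using only the low 15 bits of each. The C standard guarantees
   RAND_MAX >= 32767, so these bits exist on every platform. */
static uint32_t rand32_from_15bit(void) {
    uint32_t a = (uint32_t)rand() & 0x7FFFu;          /* seed bits 0..14  */
    uint32_t b = (uint32_t)rand() & 0x7FFFu;          /* seed bits 15..29 */
    uint32_t c = ((uint32_t)rand() >> 13) & 0x3u;     /* seed bits 30..31 */
    return (c << 30) | (b << 15) | a;
}
```

Seeds built this way can have nonzero bits anywhere in the 32-bit word, so the high 17 bits are no longer stuck at zero. Concatenating successive outputs of a simple generator is adequate for seeding, but it does not improve the statistical quality of the underlying rand().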

With this fix, running mcxlab/mcx on Windows no longer produces these artifacts.

fangq self-assigned this Mar 1, 2018

fangq added the bug label Mar 1, 2018

fangq closed this as completed in a0d445b Mar 11, 2018

This was referenced Mar 11, 2018

Bug in source type "disk" #26

Closed

Planar source defects #29

Closed

Fluence artifact / bug when using gaussian source type #17

Closed

jdtatz pushed a commit to jdtatz/mcx that referenced this issue Jul 15, 2020

fix Windows low quality host RNG due to 15bit rand(), fix fangq#31

6537c1e
