G-0K8J8ZR168
검색
검색 팝업 닫기

Ex) Article Title, Author, Keywords

## Article

Curr. Opt. Photon. 2021; 5(6): 617-626

Published online December 25, 2021 https://doi.org/10.3807/COPP.2021.5.6.617

Copyright © Optical Society of Korea.

## GPU-based Monte Carlo Photon Migration Algorithm with Path-partition Load Balancing

Youngjin Jeon1, Jongha Park1, Joonku Hahn2, Hwi Kim1

1Department of Electronics and Information Engineering, Korea University Sejong Campus, Sejong 30019, Korea
2School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Korea

Corresponding author: *hwikim@korea.ac.kr, ORCID 0000-0002-4283-8982

Received: June 23, 2021; Revised: August 3, 2021; Accepted: September 23, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

A parallel Monte Carlo photon migration algorithm for graphics processing units that implements an improved load-balancing strategy is presented. Conventional parallel Monte Carlo photon migration algorithms suffer from a computational bottleneck due to their reliance on a simple load-balancing strategy that does not take into account the different length of the mean free paths of the photons. In this paper, path-partition load balancing is proposed to eliminate this computational bottleneck based on a mathematical formula that parallelizes the photon path tracing process, which has previously been considered non-parallelizable. The performance of the proposed algorithm is tested using three-dimensional photon migration simulations of a human skin model.

Keywords: Blood vessel, Graphics Proccesing Unit, Monte Carlo Method, Numerical modeling, Scattering

OCIS codes: (040.5160) Photodetectors; (170.0170) Scattering; (170.3660) Light propagation in tissues; (170.5280) Photon migration; (290.7050) Turbid media

Monte Carlo (MC) simulation is a standard approach for photon transport simulations in turbid media [1-3]. The probabilistic interpretation of the radiative transfer equation (RTE), which theoretically describes the photon migration process in turbid media, leads to a statistical MC photon migration process [4, 5]. In practice, MC-based photon migration simulations are commonly used in research on in vivo optical measurement and noninvasive biomedical diagnosis technology. A typical biomedical application is near-infrared spectroscopy (NIRS) of the human skin, which is used to assess the characteristics of the skin, including water content, aging, and thickness. NIRS detects absorption bands specific to water, protein, and lipids [6-8], thus it can be used to analyze secondary structural changes in a protein via the secondary derivatives of NIR spectra [9].

In optics, the extraction of valuable information from randomly scattered reflected light is a challenge, but the technology underpinning NIRS has progressed significantly, based primarily on the use of specific computational algorithms. The non-contact, non-destructive measurement of the human skin requires accurate calibration, signal extraction, and refinement algorithms. From a theoretical perspective, the calibration of an NIRS measurement system and the analysis of extracted optical spectra are difficult due to unpredictable changes in tissue samples and a low signal-to-noise ratio [10-12]. A numerical model of target skin layers that simulates photon migration has been seen as an essential component of effective computational algorithms for near-infrared (NIR) signal extraction.

The MC-based modeling of human skin layers has been actively investigated in past research. According to a review paper on MC-based approaches [13], various approaches to MC structural modeling have been developed in terms of how a target structure is numerically modeled. For example, the original MC code was for an MC model for multi-layered (MCML) tissue that was developed for a multilayered medium with spherical, cylindrical, ellipsoidal, or cuboidal objects [14, 15], while tetrahedron-based (TIMOS) [16, 17] and mesh-based MC methods [18] have also been proposed.

Specific ray-tracing algorithms have also been devised for representation models. Since millions of photons are involved in the MC method, the computational efficiency of the ray-tracing calculation should be a critical factor for practical simulation throughput. In this respect, the parallelizable nature of the MC method has been actively researched and as a consequence, it was allowed to be implemented with parallel process ray-tracing algorithm. It can be said that, in practice, the application such as non-destructive reflectance measurement of light from the surface of the human skin can only be numerically simulated using the parallel MC method.

Recently, parallel MC optical photon transport algorithms using graphics processing units (GPUs) have received significant research attention. The basic concept of parallelism is to divide the photons into smaller photon groups and assign each sub-group of photons to a GPU core. Alerstein et al. first reported proof-of-concept GPU-based acceleration of the MC approach in homogenous media [19], followed by Fang and Boas, who reported the first GPU-accelerated MC algorithm to model light transport inside a 3D heterogeneous domain and released the open-source tool Monte Carlo eXtreme (MCX, Massachusetts, USA) [20]. Recent research on the parallelized implementation of MCX [21] has found that specialized load balancing using techniques such as linear programming should be considered for use in heterogeneous computing systems. However, a uniform thread workload and vector-wise coding are necessary to implement the MC process on a homogeneous system with the same type of GPU. Most previous GPU-based MC photon transport frameworks take just mutual independent migration of individual photons for their parallelism. The total photon count is first divided by the number of launched threads and distributed to the threads, which calculate the assigned photon migration. Although these algorithms have been optimized to reduce run-time differences between different threads, a single computation unit is the entire path of a photon, i.e., the migration path of a photon is calculated sequentially.

In this sense, the conventional MC method used in skin optics analysis can be considered a semi-parallel algorithm due to the serialism of the single photon path computation. Although millions of photons are launched into several independent computing cores, the issue is that the computational load required to trace photons with different mean free paths is not equal due to scattering by the human skin. In the MC approach, the mean free paths of individual photons are statistically determined using Beer’s law. Photons with a longer free path require more calculation time than photons with shorter free paths. This means that, when one computational core has finished estimating the propagation loss of the photons assigned to it, other computational cores may still be in operation. In this case, the computation core has to wait for the other cores, thus reducing computational efficiency in the vector-wise coding of a multicore GPU platform. This layoff-induced inefficiency represents a significant burden in 3D MC simulation. It is obvious that the inefficiency is a practical hindrance to the application of 3D MC simulation for the NIR signal analysis of human skin.

In this paper, the algorithmic bottleneck caused by the conventional load-balancing strategy of MC photon migration algorithms is addressed, and a new path-partition load balancing scheme is proposed that parallelizes the photon path tracing process in order to resolve the inefficiency of the serialized component of the current photon migration process. We modify the open simulation C code mcxyz.c from the Oregon medical laser center (OMLC) by Steven Jacques, Ting Li, and Scott Prahl, which was originally designed [3] as a general 3D MC code for use in GPU programming in MATLAB (Mathworks, MA, USA). Full 3D MC optical photon migration simulations based on the proposed algorithm are implemented on a multi-GPU machine. The practical application of the proposed method for the non-destructive NIR reflectance measurement of light from the human skin demonstrates improved algorithm performance when compared to conventional MC simulations.

### II. SEMI-PARALLELISM IN MONTE CARLO PHOTON MIGRATION SIMULATIONS

A basic simulation skin model is presented in Fig. 1(a), consisting of epidermis and dermis layers. The thickness of the epidermis and dermis is set to 60 μm and 84 μm, respectively. In the simulation, the collimated beam illuminates the skin surface normally. The paths of the non-scattered photon that are parallel to the illumination light direction are indicated by the red lines and an optical power detector is placed just above the epidermis layer, 60 μm from the illuminating light source. The computational volume is discretized in the form of a regular hexahedron bin grid, with a bin size of ΔV = ΔxΔyΔz. In Fig. 1, the total number of bins in the computational volume is set to 200 × 200 × 200. The optical flux distribution and the migration paths of the photons that are selectively captured by the detector are visualized in 1(b) and 1(c) respectively. The eight steps of the basic MC photon migration algorithm are explained in [2]. The first step is the determination of the step size of a photon, which is the straight propagation length of a photon without any directional deviation. It is probabilistically determined according to Beer’s law with eμts, which is described by the probabilistic model [2]:

Figure 1.Monte Carlo (MC) simulation results: (a) simulation skin model (x-z plane), (b) 3D visualization of photon migration simulation result inside the skin layer and simulated bin size is 200 by 200 by 200, and (c) selected migration paths simulation result of the photons detected by the detector at the epidermis layer.

s=1nξ/μt,

where s and ξ are the step size and a random variable uniformly distributed between 0 and 1, respectively, μt is the total attenuation coefficient given by 1 / μt = 1 / (μs + μa), and μs and μa are the scattering and the absorption coefficients, respectively. A photon at (x, y, z) moves distance along with the unit direction vector (μx, μy, μz) to new coordinates (x, y, z) given by

(x',y',z')=(x+μxs,y+μys,z+μzs,).

The algorithm accounts for the intermediate bins that the photon meets on the way to the destination (x, y, z) from its initial position (x, y, z). Along the path, the photon gets losing optical power and, that is in turn absorbed at each bin. Usually, dozens of regular hexahedron bins are located on the straight path s. The initial power of a photon (for convenience, the term ‘photon’ is used to represent photon bundles that can have a continuous range of power values) entering a bin is assumed to be W; the energy absorbed by the bin (i.e., that lost by the photon) and that retained by the photon are represented, respectively, by

Pabs=W(1exp(μaΔs)),

WΔs=Wexp(μaΔs),

where Δs is the differential step taken by the photon in a single bin. If the photon does not meet the boundaries of the entire computational volume, the photon will spin or deflect at (x, y, z) to move in a new direction (μ′x, μ′y, μ′z) according to the isotropic coefficient g parameterizing the Henyey-Greenstein phase function p(θ)=(1g2)/4π1+g22gcosθ3/2 [2, 12]. The deflected photon continues to travel along the new direction (μ′x, μ′y, μ′z) for a new step size s′ to reach the new position (x′′, y′′, z′′). In Fig. 2, the practical setups commonly used in the non-destructive optical reflectance measurement of the human skin are modeled and their 3D MC simulation results are presented. Human skin can have a variety of shapes because it can be easily deformed. Therefore, we simulated three types of skin deformation shapes; reflection model, transflective model and transmission model. Reflection model measures reflected photons in the absence of skin deformation and are shown in Fig. 2(a). The transflective model measures transmitted and reflected photons when pulling on thick skin and is shown in Fig. 2(b). Also, the transmission model measures the photons that penetrate the skin when the skin is pulled and can be seen in Fig. 2(c).

Figure 2.Monte Carlo (MC) analysis of photon migration within the human skin for various geometric reflectance measurement models: (a) reflection model, (b) transflective model, and (c) transmission model. Each skin model simulated 50,000 photons at a 1000-nm wavelength. The number of bins is 200 × 200 × 200, with the bin sizes set to 0.005 cm, 0.015 cm, and 0.0025 cm, respectively.

The key to computational efficiency is optimizing the processing of single straight free paths, which are the standard computational unit of conventional MC simulation. Let us consider the calculation for a single straight line using Eqs. (3) and (4). A photon traveling along a straight line of step size s is presented in Fig. 3. The path is assumed to cross six bins, so the step size s is divided into six sub-steps, Δs1, Δs2, ..., Δs6. In this process, the absorbance is stored in each bin through which the photon passes. This process of the bin absorption and the effect on photon power are described, respectively, by the sequence equations:

Figure 3.Schematics of (a) single free path calculation using the conventional GPU-based MC method, (b) serial computation using the conventional MC method, and (c) photon number-partition load balancing of n photons in the conventional GPU-based parallel MC method.

Pabs,n=Wn×(1exp(μan×Δsn)),

Wn+1=Wn×exp(μan×Δsn),

where Wn and μa,n are the weight of the photon and the absorption coefficient at the nth bin on the straight line, respectively.

Given the initial and destination points, the bins on the straight photon path are sequentially calculated using direct photon tracing in conventional MC simulations because, in general, each bin has different physical parameters [Fig. 3(a)]. Without numerical ray tracing, we cannot predict which bin would be the next that needs to be accounted for. Unfortunately, ray tracing is a sequential process. For instance, assume that n photons travel a line-graph route with two spins and three straight subpaths, then the number of bins on each subpath is probabilistically determined as shown in Fig. 3. In simple serial computation scheme, n repetitive computations of n photon migrations can be sequentially performed [Fig. 3(b)]. This sequential calculation can be reorganized as a conventional parallel computation scheme so that n simultaneous calculations of photon migration are performed on several multi-thread cores as represented in Fig. 3(c). A computational bottleneck can clearly be identified. In this multi-thread parallel algorithm, sub-groups of photons are launched, and cores that quickly complete calculations for their photon sub-group will lay idle until the other cores are finished. This non-uniform layoff due to the non-uniform free-path distribution is represented by the white area in Fig. 3(c). This inefficiency cannot be addressed if the single straight photon tracing is not atomically decomposed. In practice, the issue becomes more serious when the MC algorithm is transformed into a GPU-compatible parallel algorithm because the GPU cores need to be synchronously operated in each step of the stepwise calculation of photon migration due to the vector-wise coding mechanism. For instance, in MATLAB GPU programming, photon migration cannot be processed by a GPU core independently and asynchronously to the other GPU that calculates other photon, every step in the migration of all photons, in which a single step means the ray-tracing of a single spin-to-spin linear subpath, should proceed synchronously. Our experiments revealed that this issue is the main obstacle to improving GPU-compatible performance using simply parallelized MC simulations based on photon-number partition load balancing.

### III. FULL PARALLELIZATION OF MONTE CARLO PHOTON MIGRATION SIMULATIONS

In human skin simulations, the reflectance and transmission spectra are measured for hundreds of wavelengths. In general, computation time is linearly proportional to the number of photons, and hundreds of millions of photons are necessary for reliable simulation results. Spectra calculations using the MC method are thus a significant computational problem, and hundreds of cores may be needed to numerically analyze the spectroscopic characteristics of the human skin. As a result, improving the MC method using a new load-balancing strategy that overcomes the inherent limitations of photon-number partition load balancing is required. In this view, we devised a method of parallelizing the single path of a photon. A key component of this approach is the simultaneous determination of all the bins crossed by the photon on its finite path of step size s. We can obtain the coordinates of the bins on the path of the photon from (x, y, z) to (x, y, z) using the following bin-counting formula:

(x',y',z')=x+ΔxtΔyΔz(y'y)(z'z),y+ΔytΔzΔx(z'z)(x'x),z+ΔztΔxΔy(x'x)(y'y),for0t(x'x)(y'y)(z'z)ΔV,

where ΔV is the bin volume ΔxΔyΔz, t is all the integer between 0 and (x′ − x)(y′ − y)(z′ − z) / ΔV, and [a] is the floor integer of a real number a. Because absorption coefficients μa, scattering coefficients μs and isotropic coefficients g of all the bins obtained by Eq. (7) are given by the skin model, the absorption in each bin and the power of the photon can be calculated at the same time in a parallel manner. The photon power in the nth bin is calculated using the formula

Wn=Wn1Wn1(1exp(μa,n1×Δsn1))=Wn1×exp(μa,n1×Δsn1)=W0expμa,0×Δs0+μa,1×Δs1+μa,2×Δs2++μa,n1×Δsn1,

where n and Δsn are the bin index on the path of the photon and the propagation length within the bin, respectively. Equation (8) is no longer a sequence equation but is an explicit solution of the sequence Eq. (6) which can be distributed to multiple cores for parallel computation. The absorption by each bin is calculated using

Pabs,n=WnWn1.

In the conventional MC algorithm, Eqs. (5) and (6) are processed serially because the coordinates of the bins on the photon path are identified using ray tracing, while, in the proposed method, the coordinates of the bins are calculated using Eq. (7). Equations (8) and (9) are considered to be the explicit solutions for Eqs. (6) and (5), respectively. This parallel computation scenario is schematically illustrated in Fig. 4(a) with the depiction of computational load in Fig. 4(b). The above formula is employed in GPU processors as a form of path-partition load balancing. With the proposed algorithm, the calculation of the photons can be partitioned based on the bins, not the entire photon path. In this way, the total bin-based computation load can be easily distributed to GPU cores in a uniform and compact manner [Fig. 4(b)]. For the first step movement of the photons, all of the bin calculations in the volume can be collectively distributed to the involved GPU cores. For longer paths, more GPU cores are assigned to complete the calculation. The GPU cores, the number of which is proportional to the free path length of the photons, are allocated adaptively. Consequently, this path-partitioning load balancing strategy minimizes the idle time, as shown in the right panel of Fig. 4(b).

Figure 4.Schematics of (a) the single free path calculation in the proposed GPU-based MC method and (b) the photon path-partition load balancing of n photons in the proposed GPU-based parallel MC method.

In proof-of-concept analysis, we found that this form of path-partition load balancing is appropriate for the vector-wise coding of a MATLAB 2016b multi-GPU platform. The initial version of the proposed MC algorithm was implemented on a Clic80000 GPU server with eight NVIDIA GeForce TITAN GPUs (CoCoLinK, Seoul, South Korea). We compared the proposed path-partitioning load-balancing strategy with conventional photon number-partition load balancing in MC photon migration simulations of the human skin. In this experiment, the computation time for a different number of photons was estimated. Figure 5(a) presents the target human skin structure, which is composed of an epidermis, dermis, and embedded veins. We set the computational grid resolution at 200 × 200 × 200. The size of each grid was 0.0005 cm and the total simulation size was 100 μm × 100 μm × 100 μm. A finite Gaussian beam of wavelength 1000 nm illuminated the spot above the blood vessels. The simulation parameters of this structure were as follows: μa = 16.6 cm−1, μs = 375.9 cm−1 and g = 0.9 in the epidermis layer, μa = 0.46 cm−1, μs = 356.5 cm−1 and g = 0.9 in the dermis layer, and μa = 230.5 cm−1, μs = 94 cm−1 and g = 0.9 in the blood vessels. Cross-section images of the optical flux distributions obtained with conventional photon number-partition load balancing and path-partitioning load balancing and their respective calculation times are compared in Fig. 3(b), with the computation time for the conventional and proposed methods for photon numbers from 100 to 300,000 denoted by the green and blue, respectively. The simulation results for the two strategies exhibited similar light flux patterns and densities for the same number of photons, but the difference in computation time widened with an increase in the number of photons. The computation time for the conventional method was approximately 22 min for 300,000 photons. In NIR glucose spectrum analysis, reflectance spectra ranging from 1000 nm to 2500 nm are measured. In this context, sampling intervals of 1 nm or finer are necessary to determine the reflection spectra within a skin sample. If 1500 samples are required within that NIR range, the total computation time would be about 22 days. Even with the use of a multiple-GPU system, this calculation time is not feasible for practical applications.

Figure 5.Monte Carlo (MC) simulation results: (a) simulation skin model, (b) the change and saturation of the photon flux distributions with an increasing number of photons, and (c) a comparison of computation times for conventional photon number-partition load balancing and the proposed photon path-partition load-balancing algorithms.

In comparison, the simulation results for the proposed algorithm on the same multiple-GPU system are displayed in Fig. 5(c). In the experiment, the computation time increases linearly by photon number between 100 and 300,000 except for the initial nonlinear increase at the photon number less than about 100. For 300,000 photons, the proposed GPU calculation method was four times faster than the conventional method, even though our code was not completely optimized. It is expected that this increase in calculation speed will be maintained in MC simulations with a larger number of photons. In addition, the optimization of our initial code in MATLAB may enhance overall performance.

In conclusion, we have developed a parallel MC algorithm with path-partition load balancing for vector-wise GPU coding. A computational bottleneck in the conventional MC algorithm when calculating free path migration was identified and a simple path-partitioning parallel algorithm was proposed. Consequently, the computational efficiency of the proposed MATLAB GPU computing system demonstrated a four-fold improvement over the conventional system, validating the proposed algorithm. Although the GPU-based MC simulator requires further improvement in terms of code optimization, it is believed that the proposed MC algorithm will be useful for practical applications in the development of human healthcare technology.

### I. Derivation of Eq. (7), Bin-counting Formula

Figure 6.3D space discretized by cube-bins.

When the initial position and the destination position of a ray are (x, y, z) and (x, y, z) = (x + ΔxM, y + ΔyN, z + ΔzP), a point on the ray between the initial point and the destination, (x¯,y¯,z¯), are obtained by

(x¯,y¯,z¯)=(x,y,z)+t'(ΔxM,ΔyN,ΔzP)=(x,y,z)+(Δxt'M,Δt'yN,Δzt'P)

In counting all passed bins, we should acquire all integer pairs of (tM, tN, tP). Finding the bins on the ray is equivalent to determining the integer pair of (tM, tN, tP). The integer pair is represented, by using the floor function, as ([tM], [tN], [tP]). The range of (tM, tN, tP) is constraint to

0t'MM0t'NN0t'NP0t'NMPMNP.

Let tMNP = t′, then tM, tN, tP are obtained by

t'M=tNP,t'N=1MP,andt'P=1MN.

Substituting Eq. (A3) into Eq. (A1) and using the floor function, we have the bin counting formula indicating the coordinate of the bins on the ray as

(x¯,y¯,z¯)=(x,y,z)+ΔxtNP,ΔytMP,ΔztMN=(x,y,z)+ΔxtΔyΔz(y'y)(z'z),ΔytΔzΔx(z'z)(x'x),+ΔztΔxΔy(x'x)(y'y)

Where 0tMNP=(x'x)(y'y)(z'z)ΔV, and M=(x'-x)/Δx, N = (y′ − y) / Δy, and P = (z′ − z) / Δz.

This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00981, Development of Digital Holographic Metrology Technology for Phase Retrieval).

1. L. V. Wang and H.-I. Wu in Biomedical Optics: Principles and Imaging, (John Wiley & Sons, NJ, USA, 2007).
2. L. Wang and S. L. Jacques in Monte Carlo Modeling of Light Transport in Multi-layered Tissues in Standard C, (M. D. Anderson Cancer Center, University of Texas, USA, 1992).
3. S. Prahl, “Monte Carlo light scattering programs,” (Oregon Medical Laser Center, Published date: 2018) https://omlc.org/software/mc/ (Accessed date: January 2018).
4. N. Cao, M. Ortner and A. Nehorai, “Solutions for diffuse optical tomography using the Feynman-Kac formula and interacting particle method,” Proc. SPIE 6434, 643402 (2007).
5. S. Pauli, R. N. Gantner, P. Arbenz and A. Adelmann, “Multilevel Monte Carlo for the Laplace equation,” BIT Numer. Math. 55, 1125-1143 (2015).
6. M. Qassem and P. Kyriacou, “Reflectance near-infrared measurements for determining changes in skin barrier function and scattering in relation to moisturizer application,” J. Biomed. Opt. 20, 095008 (2015).
7. Y. Miyamae, Y. Yamakawa, M. Kawabata and Y. Ozaki, “A noninvasive method for assessing interior skin damage caused by chronological aging and photoaging based on near-infrared diffuse reflection spectroscopy,” Appl. Spectrosc. 62, 677-681 (2008).
8. Y. Miyamae, M. Kawabata, Y. Yamakawa, J. Tsuchiya and Y. Ozaki, “Non-invasive estimation of skin thickness by near infrared diffuse reflection spectroscopy-separate determination of epidermis and dermis thickness,” J. Near Infrared Spectrosc. 20, 617-622 (2012).
9. M. Miyazawa and M. Sonoyama, “Second derivative near infrared studies on the structural characterisation of proteins,” J. Near Infrared Spectrosc. 6, A253-A257 (1998).
10. A. Lorber, K. Faber and B. R. Kowalski, “Net analyte signal calculation in multivariate calibration,” Anal. Chem. 69, 1620-1626 (1997).
11. B. Nadler and R. Coifman, “Partial least squares, Beer's law and the net analyte signal: statistical modeling and analysis,” J. Chemom. 19, 45-54 (2005).
12. S. A. Prahl, “A Monte Carlo model of light propagation in tissue,” Proc. SPIE 10305, 1030509 (1989).
13. T. J. Farrell, M. S. Patterson and B. Wilson, “A diffusion theory model of spatially resolved, steady-state diffuse reflectance for the noninvasive determination of tissue optical properties in vivo,” Med. Phys. 19, 879-888 (1992).
14. L. Wang, S. Ren and X. Chen, “Comparative evaluations of the Monte Carlo-based light propagation simulation packages for optical imaging,” J. Innov. Opt. Health Sci. 11, 1750017 (2018).
15. V. Periyasamy and M. Pramanik, “Monte Carlo simulation of light transport in turbid medium with embedded object-spherical, cylindrical, ellipsoidal or cuboidal objects embedded within multilayered tissues,” J. Biomed. Opt. 19, 045003 (2014).
16. H. Shen and G. Wang, “A tetrahedron-based inhomogeneous Monte Carlo optical simulator,” Phys. Med. Biol. 55, 947-962 (2010).
17. H. Shen and G. Wang, “A study on tetrahedron-based inhomogeneous Monte Carlo optical simulation,” Biomed. Opt. Express 2, 44-57 (2011).
18. Q. Fang, “Mesh-based Monte Carlo method using fast ray-tracing in Plücker coordinates,” Biomed. Opt. Express 1, 165-175 (2010).
19. E. Alerstam, T. Svensson and S. Andersson-Engel, “Parallel computing with graphics processing units for high-speed Monte Carlo simulation of photon migration,” J. Biomed. Opt. 13, 060504 (2008).
20. E. Alerstam, W. C. Y. Lo, T. D. Han, J. Ross, S. Andersson-Engels and L. Lilge, “Next-generation acceleration and code optimization for light transport in turbid media using GPUs,” Biomed. Opt. Express 1, 658-675 (2010).
21. Q. Fang and D. A. Boas, “Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units,” Opt. Express 17, 20178-20190 (2009).
22. L. Yu, F. Nina-Paravecino, D. R. Kaeli and Q. Fang, “Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms,” J. Bio. Opt. 23, 010504 (2018).

### Article

#### Article

Curr. Opt. Photon. 2021; 5(6): 617-626

Published online December 25, 2021 https://doi.org/10.3807/COPP.2021.5.6.617

Copyright © Optical Society of Korea.

## GPU-based Monte Carlo Photon Migration Algorithm with Path-partition Load Balancing

Youngjin Jeon1, Jongha Park1, Joonku Hahn2, Hwi Kim1

1Department of Electronics and Information Engineering, Korea University Sejong Campus, Sejong 30019, Korea
2School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Korea

Correspondence to:*hwikim@korea.ac.kr, ORCID 0000-0002-4283-8982

Received: June 23, 2021; Revised: August 3, 2021; Accepted: September 23, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Abstract

A parallel Monte Carlo photon migration algorithm for graphics processing units that implements an improved load-balancing strategy is presented. Conventional parallel Monte Carlo photon migration algorithms suffer from a computational bottleneck due to their reliance on a simple load-balancing strategy that does not take into account the different length of the mean free paths of the photons. In this paper, path-partition load balancing is proposed to eliminate this computational bottleneck based on a mathematical formula that parallelizes the photon path tracing process, which has previously been considered non-parallelizable. The performance of the proposed algorithm is tested using three-dimensional photon migration simulations of a human skin model.

Keywords: Blood vessel, Graphics Proccesing Unit, Monte Carlo Method, Numerical modeling, Scattering

### I. INTRODUCTION

Monte Carlo (MC) simulation is a standard approach for photon transport simulations in turbid media [1-3]. The probabilistic interpretation of the radiative transfer equation (RTE), which theoretically describes the photon migration process in turbid media, leads to a statistical MC photon migration process [4, 5]. In practice, MC-based photon migration simulations are commonly used in research on in vivo optical measurement and noninvasive biomedical diagnosis technology. A typical biomedical application is near-infrared spectroscopy (NIRS) of the human skin, which is used to assess the characteristics of the skin, including water content, aging, and thickness. NIRS detects absorption bands specific to water, protein, and lipids [6-8], thus it can be used to analyze secondary structural changes in a protein via the secondary derivatives of NIR spectra [9].

In optics, the extraction of valuable information from randomly scattered reflected light is a challenge, but the technology underpinning NIRS has progressed significantly, based primarily on the use of specific computational algorithms. The non-contact, non-destructive measurement of the human skin requires accurate calibration, signal extraction, and refinement algorithms. From a theoretical perspective, the calibration of an NIRS measurement system and the analysis of extracted optical spectra are difficult due to unpredictable changes in tissue samples and a low signal-to-noise ratio [10-12]. A numerical model of target skin layers that simulates photon migration has been seen as an essential component of effective computational algorithms for near-infrared (NIR) signal extraction.

The MC-based modeling of human skin layers has been actively investigated in past research. According to a review paper on MC-based approaches [13], various approaches to MC structural modeling have been developed in terms of how a target structure is numerically modeled. For example, the original MC code was for an MC model for multi-layered (MCML) tissue that was developed for a multilayered medium with spherical, cylindrical, ellipsoidal, or cuboidal objects [14, 15], while tetrahedron-based (TIMOS) [16, 17] and mesh-based MC methods [18] have also been proposed.

Specific ray-tracing algorithms have also been devised for representation models. Since millions of photons are involved in the MC method, the computational efficiency of the ray-tracing calculation should be a critical factor for practical simulation throughput. In this respect, the parallelizable nature of the MC method has been actively researched and as a consequence, it was allowed to be implemented with parallel process ray-tracing algorithm. It can be said that, in practice, the application such as non-destructive reflectance measurement of light from the surface of the human skin can only be numerically simulated using the parallel MC method.

Recently, parallel MC optical photon transport algorithms using graphics processing units (GPUs) have received significant research attention. The basic concept of parallelism is to divide the photons into smaller photon groups and assign each sub-group of photons to a GPU core. Alerstein et al. first reported proof-of-concept GPU-based acceleration of the MC approach in homogenous media [19], followed by Fang and Boas, who reported the first GPU-accelerated MC algorithm to model light transport inside a 3D heterogeneous domain and released the open-source tool Monte Carlo eXtreme (MCX, Massachusetts, USA) [20]. Recent research on the parallelized implementation of MCX [21] has found that specialized load balancing using techniques such as linear programming should be considered for use in heterogeneous computing systems. However, a uniform thread workload and vector-wise coding are necessary to implement the MC process on a homogeneous system with the same type of GPU. Most previous GPU-based MC photon transport frameworks take just mutual independent migration of individual photons for their parallelism. The total photon count is first divided by the number of launched threads and distributed to the threads, which calculate the assigned photon migration. Although these algorithms have been optimized to reduce run-time differences between different threads, a single computation unit is the entire path of a photon, i.e., the migration path of a photon is calculated sequentially.

In this sense, the conventional MC method used in skin optics analysis can be considered a semi-parallel algorithm due to the serialism of the single photon path computation. Although millions of photons are launched into several independent computing cores, the issue is that the computational load required to trace photons with different mean free paths is not equal due to scattering by the human skin. In the MC approach, the mean free paths of individual photons are statistically determined using Beer’s law. Photons with a longer free path require more calculation time than photons with shorter free paths. This means that, when one computational core has finished estimating the propagation loss of the photons assigned to it, other computational cores may still be in operation. In this case, the computation core has to wait for the other cores, thus reducing computational efficiency in the vector-wise coding of a multicore GPU platform. This layoff-induced inefficiency represents a significant burden in 3D MC simulation. It is obvious that the inefficiency is a practical hindrance to the application of 3D MC simulation for the NIR signal analysis of human skin.

In this paper, the algorithmic bottleneck caused by the conventional load-balancing strategy of MC photon migration algorithms is addressed, and a new path-partition load balancing scheme is proposed that parallelizes the photon path tracing process in order to resolve the inefficiency of the serialized component of the current photon migration process. We modify the open simulation C code mcxyz.c from the Oregon medical laser center (OMLC) by Steven Jacques, Ting Li, and Scott Prahl, which was originally designed [3] as a general 3D MC code for use in GPU programming in MATLAB (Mathworks, MA, USA). Full 3D MC optical photon migration simulations based on the proposed algorithm are implemented on a multi-GPU machine. The practical application of the proposed method for the non-destructive NIR reflectance measurement of light from the human skin demonstrates improved algorithm performance when compared to conventional MC simulations.

### II. SEMI-PARALLELISM IN MONTE CARLO PHOTON MIGRATION SIMULATIONS

A basic simulation skin model is presented in Fig. 1(a), consisting of epidermis and dermis layers. The thickness of the epidermis and dermis is set to 60 μm and 84 μm, respectively. In the simulation, the collimated beam illuminates the skin surface normally. The paths of the non-scattered photon that are parallel to the illumination light direction are indicated by the red lines and an optical power detector is placed just above the epidermis layer, 60 μm from the illuminating light source. The computational volume is discretized in the form of a regular hexahedron bin grid, with a bin size of ΔV = ΔxΔyΔz. In Fig. 1, the total number of bins in the computational volume is set to 200 × 200 × 200. The optical flux distribution and the migration paths of the photons that are selectively captured by the detector are visualized in 1(b) and 1(c) respectively. The eight steps of the basic MC photon migration algorithm are explained in [2]. The first step is the determination of the step size of a photon, which is the straight propagation length of a photon without any directional deviation. It is probabilistically determined according to Beer’s law with eμts, which is described by the probabilistic model [2]:

Figure 1. Monte Carlo (MC) simulation results: (a) simulation skin model (x-z plane), (b) 3D visualization of photon migration simulation result inside the skin layer and simulated bin size is 200 by 200 by 200, and (c) selected migration paths simulation result of the photons detected by the detector at the epidermis layer.

$s=−1nξ/μt,$

where s and ξ are the step size and a random variable uniformly distributed between 0 and 1, respectively, μt is the total attenuation coefficient given by 1 / μt = 1 / (μs + μa), and μs and μa are the scattering and the absorption coefficients, respectively. A photon at (x, y, z) moves distance along with the unit direction vector (μx, μy, μz) to new coordinates (x, y, z) given by

$(x',y',z')=(x+μxs, y+μys, z+μzs,).$

The algorithm accounts for the intermediate bins that the photon meets on the way to the destination (x, y, z) from its initial position (x, y, z). Along the path, the photon gets losing optical power and, that is in turn absorbed at each bin. Usually, dozens of regular hexahedron bins are located on the straight path s. The initial power of a photon (for convenience, the term ‘photon’ is used to represent photon bundles that can have a continuous range of power values) entering a bin is assumed to be W; the energy absorbed by the bin (i.e., that lost by the photon) and that retained by the photon are represented, respectively, by

$Pabs=W(1−exp(−μaΔs)),$

$WΔs=Wexp(−μaΔs),$

where Δs is the differential step taken by the photon in a single bin. If the photon does not meet the boundaries of the entire computational volume, the photon will spin or deflect at (x, y, z) to move in a new direction (μ′x, μ′y, μ′z) according to the isotropic coefficient g parameterizing the Henyey-Greenstein phase function $p(θ)=(1−g2)/4π1+g2−2gcosθ3/2$ [2, 12]. The deflected photon continues to travel along the new direction (μ′x, μ′y, μ′z) for a new step size s′ to reach the new position (x′′, y′′, z′′). In Fig. 2, the practical setups commonly used in the non-destructive optical reflectance measurement of the human skin are modeled and their 3D MC simulation results are presented. Human skin can have a variety of shapes because it can be easily deformed. Therefore, we simulated three types of skin deformation shapes; reflection model, transflective model and transmission model. Reflection model measures reflected photons in the absence of skin deformation and are shown in Fig. 2(a). The transflective model measures transmitted and reflected photons when pulling on thick skin and is shown in Fig. 2(b). Also, the transmission model measures the photons that penetrate the skin when the skin is pulled and can be seen in Fig. 2(c).

Figure 2. Monte Carlo (MC) analysis of photon migration within the human skin for various geometric reflectance measurement models: (a) reflection model, (b) transflective model, and (c) transmission model. Each skin model simulated 50,000 photons at a 1000-nm wavelength. The number of bins is 200 × 200 × 200, with the bin sizes set to 0.005 cm, 0.015 cm, and 0.0025 cm, respectively.

The key to computational efficiency is optimizing the processing of single straight free paths, which are the standard computational unit of conventional MC simulation. Let us consider the calculation for a single straight line using Eqs. (3) and (4). A photon traveling along a straight line of step size s is presented in Fig. 3. The path is assumed to cross six bins, so the step size s is divided into six sub-steps, Δs1, Δs2, ..., Δs6. In this process, the absorbance is stored in each bin through which the photon passes. This process of the bin absorption and the effect on photon power are described, respectively, by the sequence equations:

Figure 3. Schematics of (a) single free path calculation using the conventional GPU-based MC method, (b) serial computation using the conventional MC method, and (c) photon number-partition load balancing of n photons in the conventional GPU-based parallel MC method.

$Pabs,n=Wn×(1−exp(−μan×Δsn)),$

$Wn+1=Wn×exp(−μan×Δsn),$

where Wn and μa,n are the weight of the photon and the absorption coefficient at the nth bin on the straight line, respectively.

Given the initial and destination points, the bins on the straight photon path are sequentially calculated using direct photon tracing in conventional MC simulations because, in general, each bin has different physical parameters [Fig. 3(a)]. Without numerical ray tracing, we cannot predict which bin would be the next that needs to be accounted for. Unfortunately, ray tracing is a sequential process. For instance, assume that n photons travel a line-graph route with two spins and three straight subpaths, then the number of bins on each subpath is probabilistically determined as shown in Fig. 3. In simple serial computation scheme, n repetitive computations of n photon migrations can be sequentially performed [Fig. 3(b)]. This sequential calculation can be reorganized as a conventional parallel computation scheme so that n simultaneous calculations of photon migration are performed on several multi-thread cores as represented in Fig. 3(c). A computational bottleneck can clearly be identified. In this multi-thread parallel algorithm, sub-groups of photons are launched, and cores that quickly complete calculations for their photon sub-group will lay idle until the other cores are finished. This non-uniform layoff due to the non-uniform free-path distribution is represented by the white area in Fig. 3(c). This inefficiency cannot be addressed if the single straight photon tracing is not atomically decomposed. In practice, the issue becomes more serious when the MC algorithm is transformed into a GPU-compatible parallel algorithm because the GPU cores need to be synchronously operated in each step of the stepwise calculation of photon migration due to the vector-wise coding mechanism. For instance, in MATLAB GPU programming, photon migration cannot be processed by a GPU core independently and asynchronously to the other GPU that calculates other photon, every step in the migration of all photons, in which a single step means the ray-tracing of a single spin-to-spin linear subpath, should proceed synchronously. Our experiments revealed that this issue is the main obstacle to improving GPU-compatible performance using simply parallelized MC simulations based on photon-number partition load balancing.

### III. FULL PARALLELIZATION OF MONTE CARLO PHOTON MIGRATION SIMULATIONS

In human skin simulations, the reflectance and transmission spectra are measured for hundreds of wavelengths. In general, computation time is linearly proportional to the number of photons, and hundreds of millions of photons are necessary for reliable simulation results. Spectra calculations using the MC method are thus a significant computational problem, and hundreds of cores may be needed to numerically analyze the spectroscopic characteristics of the human skin. As a result, improving the MC method using a new load-balancing strategy that overcomes the inherent limitations of photon-number partition load balancing is required. In this view, we devised a method of parallelizing the single path of a photon. A key component of this approach is the simultaneous determination of all the bins crossed by the photon on its finite path of step size s. We can obtain the coordinates of the bins on the path of the photon from (x, y, z) to (x, y, z) using the following bin-counting formula:

$(x',y',z')=x+ΔxtΔyΔz(y'−y)(z'−z),y+ΔytΔzΔx(z'−z)(x'−x),z+ΔztΔxΔy(x'−x)(y'−y),for0≤t≤(x'−x)(y'−y)(z'−z)ΔV,$

where ΔV is the bin volume ΔxΔyΔz, t is all the integer between 0 and (x′ − x)(y′ − y)(z′ − z) / ΔV, and [a] is the floor integer of a real number a. Because absorption coefficients μa, scattering coefficients μs and isotropic coefficients g of all the bins obtained by Eq. (7) are given by the skin model, the absorption in each bin and the power of the photon can be calculated at the same time in a parallel manner. The photon power in the nth bin is calculated using the formula

$Wn=Wn−1−Wn−1(1−exp(−μa,n−1×Δsn−1))=Wn−1×exp(−μa,n−1×Δsn−1)=W0exp−μa,0×Δs0+μa,1×Δs1+μa,2×Δs2+⋯+μa,n−1×Δsn−1,$

where n and Δsn are the bin index on the path of the photon and the propagation length within the bin, respectively. Equation (8) is no longer a sequence equation but is an explicit solution of the sequence Eq. (6) which can be distributed to multiple cores for parallel computation. The absorption by each bin is calculated using

$Pabs,n=Wn−Wn−1.$

In the conventional MC algorithm, Eqs. (5) and (6) are processed serially because the coordinates of the bins on the photon path are identified using ray tracing, while, in the proposed method, the coordinates of the bins are calculated using Eq. (7). Equations (8) and (9) are considered to be the explicit solutions for Eqs. (6) and (5), respectively. This parallel computation scenario is schematically illustrated in Fig. 4(a) with the depiction of computational load in Fig. 4(b). The above formula is employed in GPU processors as a form of path-partition load balancing. With the proposed algorithm, the calculation of the photons can be partitioned based on the bins, not the entire photon path. In this way, the total bin-based computation load can be easily distributed to GPU cores in a uniform and compact manner [Fig. 4(b)]. For the first step movement of the photons, all of the bin calculations in the volume can be collectively distributed to the involved GPU cores. For longer paths, more GPU cores are assigned to complete the calculation. The GPU cores, the number of which is proportional to the free path length of the photons, are allocated adaptively. Consequently, this path-partitioning load balancing strategy minimizes the idle time, as shown in the right panel of Fig. 4(b).

Figure 4. Schematics of (a) the single free path calculation in the proposed GPU-based MC method and (b) the photon path-partition load balancing of n photons in the proposed GPU-based parallel MC method.

In proof-of-concept analysis, we found that this form of path-partition load balancing is appropriate for the vector-wise coding of a MATLAB 2016b multi-GPU platform. The initial version of the proposed MC algorithm was implemented on a Clic80000 GPU server with eight NVIDIA GeForce TITAN GPUs (CoCoLinK, Seoul, South Korea). We compared the proposed path-partitioning load-balancing strategy with conventional photon number-partition load balancing in MC photon migration simulations of the human skin. In this experiment, the computation time for a different number of photons was estimated. Figure 5(a) presents the target human skin structure, which is composed of an epidermis, dermis, and embedded veins. We set the computational grid resolution at 200 × 200 × 200. The size of each grid was 0.0005 cm and the total simulation size was 100 μm × 100 μm × 100 μm. A finite Gaussian beam of wavelength 1000 nm illuminated the spot above the blood vessels. The simulation parameters of this structure were as follows: μa = 16.6 cm−1, μs = 375.9 cm−1 and g = 0.9 in the epidermis layer, μa = 0.46 cm−1, μs = 356.5 cm−1 and g = 0.9 in the dermis layer, and μa = 230.5 cm−1, μs = 94 cm−1 and g = 0.9 in the blood vessels. Cross-section images of the optical flux distributions obtained with conventional photon number-partition load balancing and path-partitioning load balancing and their respective calculation times are compared in Fig. 3(b), with the computation time for the conventional and proposed methods for photon numbers from 100 to 300,000 denoted by the green and blue, respectively. The simulation results for the two strategies exhibited similar light flux patterns and densities for the same number of photons, but the difference in computation time widened with an increase in the number of photons. The computation time for the conventional method was approximately 22 min for 300,000 photons. In NIR glucose spectrum analysis, reflectance spectra ranging from 1000 nm to 2500 nm are measured. In this context, sampling intervals of 1 nm or finer are necessary to determine the reflection spectra within a skin sample. If 1500 samples are required within that NIR range, the total computation time would be about 22 days. Even with the use of a multiple-GPU system, this calculation time is not feasible for practical applications.

Figure 5. Monte Carlo (MC) simulation results: (a) simulation skin model, (b) the change and saturation of the photon flux distributions with an increasing number of photons, and (c) a comparison of computation times for conventional photon number-partition load balancing and the proposed photon path-partition load-balancing algorithms.

In comparison, the simulation results for the proposed algorithm on the same multiple-GPU system are displayed in Fig. 5(c). In the experiment, the computation time increases linearly by photon number between 100 and 300,000 except for the initial nonlinear increase at the photon number less than about 100. For 300,000 photons, the proposed GPU calculation method was four times faster than the conventional method, even though our code was not completely optimized. It is expected that this increase in calculation speed will be maintained in MC simulations with a larger number of photons. In addition, the optimization of our initial code in MATLAB may enhance overall performance.

### IV. CONCLUSION

In conclusion, we have developed a parallel MC algorithm with path-partition load balancing for vector-wise GPU coding. A computational bottleneck in the conventional MC algorithm when calculating free path migration was identified and a simple path-partitioning parallel algorithm was proposed. Consequently, the computational efficiency of the proposed MATLAB GPU computing system demonstrated a four-fold improvement over the conventional system, validating the proposed algorithm. Although the GPU-based MC simulator requires further improvement in terms of code optimization, it is believed that the proposed MC algorithm will be useful for practical applications in the development of human healthcare technology.

### I. Derivation of Eq. (7), Bin-counting Formula

Figure 6. 3D space discretized by cube-bins.

When the initial position and the destination position of a ray are (x, y, z) and (x, y, z) = (x + ΔxM, y + ΔyN, z + ΔzP), a point on the ray between the initial point and the destination, $(x¯,y¯,z¯)$, are obtained by

$(x¯,y¯,z¯)=(x,y,z)+t'(ΔxM, ΔyN, ΔzP)=(x,y,z)+(Δxt'M, Δt'yN, Δzt'P)$

In counting all passed bins, we should acquire all integer pairs of (tM, tN, tP). Finding the bins on the ray is equivalent to determining the integer pair of (tM, tN, tP). The integer pair is represented, by using the floor function, as ([tM], [tN], [tP]). The range of (tM, tN, tP) is constraint to

$0≤t'M≤M0≤t'N≤N0≤t'N≤P ⇔ 0≤t'NMP≤MNP.$

Let tMNP = t′, then tM, tN, tP are obtained by

$t'M=tNP, t'N=1MP, and t'P=1MN.$

Substituting Eq. (A3) into Eq. (A1) and using the floor function, we have the bin counting formula indicating the coordinate of the bins on the ray as

$(x¯,y¯,z¯)=(x,y,z)+ΔxtNP, ΔytMP, ΔztMN=(x,y,z)+ΔxtΔyΔz(y'−y)(z'−z), ΔytΔzΔx(z'−z)(x'−x), +ΔztΔxΔy(x'−x)(y'−y)$

Where $0≤t≤MNP=(x'−x)(y'−y)(z'−z)ΔV$, and M=(x'-x)/Δx, N = (y′ − y) / Δy, and P = (z′ − z) / Δz.

### ACKNOWLEDGMENT

This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00981, Development of Digital Holographic Metrology Technology for Phase Retrieval).

### Fig 1.

Figure 1.Monte Carlo (MC) simulation results: (a) simulation skin model (x-z plane), (b) 3D visualization of photon migration simulation result inside the skin layer and simulated bin size is 200 by 200 by 200, and (c) selected migration paths simulation result of the photons detected by the detector at the epidermis layer.
Current Optics and Photonics 2021; 5: 617-626https://doi.org/10.3807/COPP.2021.5.6.617

### Fig 2.

Figure 2.Monte Carlo (MC) analysis of photon migration within the human skin for various geometric reflectance measurement models: (a) reflection model, (b) transflective model, and (c) transmission model. Each skin model simulated 50,000 photons at a 1000-nm wavelength. The number of bins is 200 × 200 × 200, with the bin sizes set to 0.005 cm, 0.015 cm, and 0.0025 cm, respectively.
Current Optics and Photonics 2021; 5: 617-626https://doi.org/10.3807/COPP.2021.5.6.617

### Fig 3.

Figure 3.Schematics of (a) single free path calculation using the conventional GPU-based MC method, (b) serial computation using the conventional MC method, and (c) photon number-partition load balancing of n photons in the conventional GPU-based parallel MC method.
Current Optics and Photonics 2021; 5: 617-626https://doi.org/10.3807/COPP.2021.5.6.617

### Fig 4.

Figure 4.Schematics of (a) the single free path calculation in the proposed GPU-based MC method and (b) the photon path-partition load balancing of n photons in the proposed GPU-based parallel MC method.
Current Optics and Photonics 2021; 5: 617-626https://doi.org/10.3807/COPP.2021.5.6.617

### Fig 5.

Figure 5.Monte Carlo (MC) simulation results: (a) simulation skin model, (b) the change and saturation of the photon flux distributions with an increasing number of photons, and (c) a comparison of computation times for conventional photon number-partition load balancing and the proposed photon path-partition load-balancing algorithms.
Current Optics and Photonics 2021; 5: 617-626https://doi.org/10.3807/COPP.2021.5.6.617

### Fig 6.

Figure 6.3D space discretized by cube-bins.
Current Optics and Photonics 2021; 5: 617-626https://doi.org/10.3807/COPP.2021.5.6.617

### References

1. L. V. Wang and H.-I. Wu in Biomedical Optics: Principles and Imaging, (John Wiley & Sons, NJ, USA, 2007).
2. L. Wang and S. L. Jacques in Monte Carlo Modeling of Light Transport in Multi-layered Tissues in Standard C, (M. D. Anderson Cancer Center, University of Texas, USA, 1992).
3. S. Prahl, “Monte Carlo light scattering programs,” (Oregon Medical Laser Center, Published date: 2018) https://omlc.org/software/mc/ (Accessed date: January 2018).
4. N. Cao, M. Ortner and A. Nehorai, “Solutions for diffuse optical tomography using the Feynman-Kac formula and interacting particle method,” Proc. SPIE 6434, 643402 (2007).
5. S. Pauli, R. N. Gantner, P. Arbenz and A. Adelmann, “Multilevel Monte Carlo for the Laplace equation,” BIT Numer. Math. 55, 1125-1143 (2015).
6. M. Qassem and P. Kyriacou, “Reflectance near-infrared measurements for determining changes in skin barrier function and scattering in relation to moisturizer application,” J. Biomed. Opt. 20, 095008 (2015).
7. Y. Miyamae, Y. Yamakawa, M. Kawabata and Y. Ozaki, “A noninvasive method for assessing interior skin damage caused by chronological aging and photoaging based on near-infrared diffuse reflection spectroscopy,” Appl. Spectrosc. 62, 677-681 (2008).
8. Y. Miyamae, M. Kawabata, Y. Yamakawa, J. Tsuchiya and Y. Ozaki, “Non-invasive estimation of skin thickness by near infrared diffuse reflection spectroscopy-separate determination of epidermis and dermis thickness,” J. Near Infrared Spectrosc. 20, 617-622 (2012).
9. M. Miyazawa and M. Sonoyama, “Second derivative near infrared studies on the structural characterisation of proteins,” J. Near Infrared Spectrosc. 6, A253-A257 (1998).
10. A. Lorber, K. Faber and B. R. Kowalski, “Net analyte signal calculation in multivariate calibration,” Anal. Chem. 69, 1620-1626 (1997).
11. B. Nadler and R. Coifman, “Partial least squares, Beer's law and the net analyte signal: statistical modeling and analysis,” J. Chemom. 19, 45-54 (2005).
12. S. A. Prahl, “A Monte Carlo model of light propagation in tissue,” Proc. SPIE 10305, 1030509 (1989).
13. T. J. Farrell, M. S. Patterson and B. Wilson, “A diffusion theory model of spatially resolved, steady-state diffuse reflectance for the noninvasive determination of tissue optical properties in vivo,” Med. Phys. 19, 879-888 (1992).
14. L. Wang, S. Ren and X. Chen, “Comparative evaluations of the Monte Carlo-based light propagation simulation packages for optical imaging,” J. Innov. Opt. Health Sci. 11, 1750017 (2018).
15. V. Periyasamy and M. Pramanik, “Monte Carlo simulation of light transport in turbid medium with embedded object-spherical, cylindrical, ellipsoidal or cuboidal objects embedded within multilayered tissues,” J. Biomed. Opt. 19, 045003 (2014).
16. H. Shen and G. Wang, “A tetrahedron-based inhomogeneous Monte Carlo optical simulator,” Phys. Med. Biol. 55, 947-962 (2010).
17. H. Shen and G. Wang, “A study on tetrahedron-based inhomogeneous Monte Carlo optical simulation,” Biomed. Opt. Express 2, 44-57 (2011).
18. Q. Fang, “Mesh-based Monte Carlo method using fast ray-tracing in Plücker coordinates,” Biomed. Opt. Express 1, 165-175 (2010).
19. E. Alerstam, T. Svensson and S. Andersson-Engel, “Parallel computing with graphics processing units for high-speed Monte Carlo simulation of photon migration,” J. Biomed. Opt. 13, 060504 (2008).
20. E. Alerstam, W. C. Y. Lo, T. D. Han, J. Ross, S. Andersson-Engels and L. Lilge, “Next-generation acceleration and code optimization for light transport in turbid media using GPUs,” Biomed. Opt. Express 1, 658-675 (2010).
21. Q. Fang and D. A. Boas, “Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units,” Opt. Express 17, 20178-20190 (2009).
22. L. Yu, F. Nina-Paravecino, D. R. Kaeli and Q. Fang, “Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms,” J. Bio. Opt. 23, 010504 (2018).

Wonshik Choi,
Editor-in-chief