Ex) Article Title, Author, Keywords
Current Optics
and Photonics
Ex) Article Title, Author, Keywords
Curr. Opt. Photon. 2021; 5(6): 617-626
Published online December 25, 2021 https://doi.org/10.3807/COPP.2021.5.6.617
Copyright © Optical Society of Korea.
Youngjin Jeon^{1}, Jongha Park^{1}, Joonku Hahn^{2}, Hwi Kim^{1}
Corresponding author: ^{*}hwikim@korea.ac.kr, ORCID 0000-0002-4283-8982
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
A parallel Monte Carlo photon migration algorithm for graphics processing units that implements an improved load-balancing strategy is presented. Conventional parallel Monte Carlo photon migration algorithms suffer from a computational bottleneck due to their reliance on a simple load-balancing strategy that does not take into account the different length of the mean free paths of the photons. In this paper, path-partition load balancing is proposed to eliminate this computational bottleneck based on a mathematical formula that parallelizes the photon path tracing process, which has previously been considered non-parallelizable. The performance of the proposed algorithm is tested using three-dimensional photon migration simulations of a human skin model.
Keywords: Blood vessel, Graphics Proccesing Unit, Monte Carlo Method, Numerical modeling, Scattering
OCIS codes: (040.5160) Photodetectors; (170.0170) Scattering; (170.3660) Light propagation in tissues; (170.5280) Photon migration; (290.7050) Turbid media
Monte Carlo (MC) simulation is a standard approach for photon transport simulations in turbid media [1-3]. The probabilistic interpretation of the radiative transfer equation (RTE), which theoretically describes the photon migration process in turbid media, leads to a statistical MC photon migration process [4, 5]. In practice, MC-based photon migration simulations are commonly used in research on
In optics, the extraction of valuable information from randomly scattered reflected light is a challenge, but the technology underpinning NIRS has progressed significantly, based primarily on the use of specific computational algorithms. The non-contact, non-destructive measurement of the human skin requires accurate calibration, signal extraction, and refinement algorithms. From a theoretical perspective, the calibration of an NIRS measurement system and the analysis of extracted optical spectra are difficult due to unpredictable changes in tissue samples and a low signal-to-noise ratio [10-12]. A numerical model of target skin layers that simulates photon migration has been seen as an essential component of effective computational algorithms for near-infrared (NIR) signal extraction.
The MC-based modeling of human skin layers has been actively investigated in past research. According to a review paper on MC-based approaches [13], various approaches to MC structural modeling have been developed in terms of how a target structure is numerically modeled. For example, the original MC code was for an MC model for multi-layered (MCML) tissue that was developed for a multilayered medium with spherical, cylindrical, ellipsoidal, or cuboidal objects [14, 15], while tetrahedron-based (TIMOS) [16, 17] and mesh-based MC methods [18] have also been proposed.
Specific ray-tracing algorithms have also been devised for representation models. Since millions of photons are involved in the MC method, the computational efficiency of the ray-tracing calculation should be a critical factor for practical simulation throughput. In this respect, the parallelizable nature of the MC method has been actively researched and as a consequence, it was allowed to be implemented with parallel process ray-tracing algorithm. It can be said that, in practice, the application such as non-destructive reflectance measurement of light from the surface of the human skin can only be numerically simulated using the parallel MC method.
Recently, parallel MC optical photon transport algorithms using graphics processing units (GPUs) have received significant research attention. The basic concept of parallelism is to divide the photons into smaller photon groups and assign each sub-group of photons to a GPU core. Alerstein
In this sense, the conventional MC method used in skin optics analysis can be considered a semi-parallel algorithm due to the serialism of the single photon path computation. Although millions of photons are launched into several independent computing cores, the issue is that the computational load required to trace photons with different mean free paths is not equal due to scattering by the human skin. In the MC approach, the mean free paths of individual photons are statistically determined using Beer’s law. Photons with a longer free path require more calculation time than photons with shorter free paths. This means that, when one computational core has finished estimating the propagation loss of the photons assigned to it, other computational cores may still be in operation. In this case, the computation core has to wait for the other cores, thus reducing computational efficiency in the vector-wise coding of a multicore GPU platform. This layoff-induced inefficiency represents a significant burden in 3D MC simulation. It is obvious that the inefficiency is a practical hindrance to the application of 3D MC simulation for the NIR signal analysis of human skin.
In this paper, the algorithmic bottleneck caused by the conventional load-balancing strategy of MC photon migration algorithms is addressed, and a new path-partition load balancing scheme is proposed that parallelizes the photon path tracing process in order to resolve the inefficiency of the serialized component of the current photon migration process. We modify the open simulation C code
A basic simulation skin model is presented in Fig. 1(a), consisting of epidermis and dermis layers. The thickness of the epidermis and dermis is set to 60 μm and 84 μm, respectively. In the simulation, the collimated beam illuminates the skin surface normally. The paths of the non-scattered photon that are parallel to the illumination light direction are indicated by the red lines and an optical power detector is placed just above the epidermis layer, 60 μm from the illuminating light source. The computational volume is discretized in the form of a regular hexahedron bin grid, with a bin size of Δ
where
The algorithm accounts for the intermediate bins that the photon meets on the way to the destination (
where Δ
The key to computational efficiency is optimizing the processing of single straight free paths, which are the standard computational unit of conventional MC simulation. Let us consider the calculation for a single straight line using Eqs. (3) and (4). A photon traveling along a straight line of step size
where
Given the initial and destination points, the bins on the straight photon path are sequentially calculated using direct photon tracing in conventional MC simulations because, in general, each bin has different physical parameters [Fig. 3(a)]. Without numerical ray tracing, we cannot predict which bin would be the next that needs to be accounted for. Unfortunately, ray tracing is a sequential process. For instance, assume that
In human skin simulations, the reflectance and transmission spectra are measured for hundreds of wavelengths. In general, computation time is linearly proportional to the number of photons, and hundreds of millions of photons are necessary for reliable simulation results. Spectra calculations using the MC method are thus a significant computational problem, and hundreds of cores may be needed to numerically analyze the spectroscopic characteristics of the human skin. As a result, improving the MC method using a new load-balancing strategy that overcomes the inherent limitations of photon-number partition load balancing is required. In this view, we devised a method of parallelizing the single path of a photon. A key component of this approach is the simultaneous determination of all the bins crossed by the photon on its finite path of step size
where Δ
where
In the conventional MC algorithm, Eqs. (5) and (6) are processed serially because the coordinates of the bins on the photon path are identified using ray tracing, while, in the proposed method, the coordinates of the bins are calculated using Eq. (7). Equations (8) and (9) are considered to be the explicit solutions for Eqs. (6) and (5), respectively. This parallel computation scenario is schematically illustrated in Fig. 4(a) with the depiction of computational load in Fig. 4(b). The above formula is employed in GPU processors as a form of path-partition load balancing. With the proposed algorithm, the calculation of the photons can be partitioned based on the bins, not the entire photon path. In this way, the total bin-based computation load can be easily distributed to GPU cores in a uniform and compact manner [Fig. 4(b)]. For the first step movement of the photons, all of the bin calculations in the volume can be collectively distributed to the involved GPU cores. For longer paths, more GPU cores are assigned to complete the calculation. The GPU cores, the number of which is proportional to the free path length of the photons, are allocated adaptively. Consequently, this path-partitioning load balancing strategy minimizes the idle time, as shown in the right panel of Fig. 4(b).
In proof-of-concept analysis, we found that this form of path-partition load balancing is appropriate for the vector-wise coding of a MATLAB 2016b multi-GPU platform. The initial version of the proposed MC algorithm was implemented on a Clic80000 GPU server with eight NVIDIA GeForce TITAN GPUs (CoCoLinK, Seoul, South Korea). We compared the proposed path-partitioning load-balancing strategy with conventional photon number-partition load balancing in MC photon migration simulations of the human skin. In this experiment, the computation time for a different number of photons was estimated. Figure 5(a) presents the target human skin structure, which is composed of an epidermis, dermis, and embedded veins. We set the computational grid resolution at 200 × 200 × 200. The size of each grid was 0.0005 cm and the total simulation size was 100 μm × 100 μm × 100 μm. A finite Gaussian beam of wavelength 1000 nm illuminated the spot above the blood vessels. The simulation parameters of this structure were as follows: μ_{a} = 16.6 cm^{−1}, μ_{s} = 375.9 cm^{−1} and g = 0.9 in the epidermis layer, μ_{a} = 0.46 cm^{−1}, μ_{s} = 356.5 cm^{−1} and g = 0.9 in the dermis layer, and μ_{a} = 230.5 cm^{−1}, μ_{s} = 94 cm^{−1} and g = 0.9 in the blood vessels. Cross-section images of the optical flux distributions obtained with conventional photon number-partition load balancing and path-partitioning load balancing and their respective calculation times are compared in Fig. 3(b), with the computation time for the conventional and proposed methods for photon numbers from 100 to 300,000 denoted by the green and blue, respectively. The simulation results for the two strategies exhibited similar light flux patterns and densities for the same number of photons, but the difference in computation time widened with an increase in the number of photons. The computation time for the conventional method was approximately 22 min for 300,000 photons. In NIR glucose spectrum analysis, reflectance spectra ranging from 1000 nm to 2500 nm are measured. In this context, sampling intervals of 1 nm or finer are necessary to determine the reflection spectra within a skin sample. If 1500 samples are required within that NIR range, the total computation time would be about 22 days. Even with the use of a multiple-GPU system, this calculation time is not feasible for practical applications.
In comparison, the simulation results for the proposed algorithm on the same multiple-GPU system are displayed in Fig. 5(c). In the experiment, the computation time increases linearly by photon number between 100 and 300,000 except for the initial nonlinear increase at the photon number less than about 100. For 300,000 photons, the proposed GPU calculation method was four times faster than the conventional method, even though our code was not completely optimized. It is expected that this increase in calculation speed will be maintained in MC simulations with a larger number of photons. In addition, the optimization of our initial code in MATLAB may enhance overall performance.
In conclusion, we have developed a parallel MC algorithm with path-partition load balancing for vector-wise GPU coding. A computational bottleneck in the conventional MC algorithm when calculating free path migration was identified and a simple path-partitioning parallel algorithm was proposed. Consequently, the computational efficiency of the proposed MATLAB GPU computing system demonstrated a four-fold improvement over the conventional system, validating the proposed algorithm. Although the GPU-based MC simulator requires further improvement in terms of code optimization, it is believed that the proposed MC algorithm will be useful for practical applications in the development of human healthcare technology.
When the initial position and the destination position of a ray are (
In counting all passed bins, we should acquire all integer pairs of (
Let
Substituting Eq. (A3) into Eq. (A1) and using the floor function, we have the bin counting formula indicating the coordinate of the bins on the ray as
Where
This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00981, Development of Digital Holographic Metrology Technology for Phase Retrieval).
Curr. Opt. Photon. 2021; 5(6): 617-626
Published online December 25, 2021 https://doi.org/10.3807/COPP.2021.5.6.617
Copyright © Optical Society of Korea.
Youngjin Jeon^{1}, Jongha Park^{1}, Joonku Hahn^{2}, Hwi Kim^{1}
^{1}Department of Electronics and Information Engineering, Korea University Sejong Campus, Sejong 30019, Korea
^{2}School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Korea
Correspondence to:^{*}hwikim@korea.ac.kr, ORCID 0000-0002-4283-8982
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
A parallel Monte Carlo photon migration algorithm for graphics processing units that implements an improved load-balancing strategy is presented. Conventional parallel Monte Carlo photon migration algorithms suffer from a computational bottleneck due to their reliance on a simple load-balancing strategy that does not take into account the different length of the mean free paths of the photons. In this paper, path-partition load balancing is proposed to eliminate this computational bottleneck based on a mathematical formula that parallelizes the photon path tracing process, which has previously been considered non-parallelizable. The performance of the proposed algorithm is tested using three-dimensional photon migration simulations of a human skin model.
Keywords: Blood vessel, Graphics Proccesing Unit, Monte Carlo Method, Numerical modeling, Scattering
Monte Carlo (MC) simulation is a standard approach for photon transport simulations in turbid media [1-3]. The probabilistic interpretation of the radiative transfer equation (RTE), which theoretically describes the photon migration process in turbid media, leads to a statistical MC photon migration process [4, 5]. In practice, MC-based photon migration simulations are commonly used in research on
In optics, the extraction of valuable information from randomly scattered reflected light is a challenge, but the technology underpinning NIRS has progressed significantly, based primarily on the use of specific computational algorithms. The non-contact, non-destructive measurement of the human skin requires accurate calibration, signal extraction, and refinement algorithms. From a theoretical perspective, the calibration of an NIRS measurement system and the analysis of extracted optical spectra are difficult due to unpredictable changes in tissue samples and a low signal-to-noise ratio [10-12]. A numerical model of target skin layers that simulates photon migration has been seen as an essential component of effective computational algorithms for near-infrared (NIR) signal extraction.
The MC-based modeling of human skin layers has been actively investigated in past research. According to a review paper on MC-based approaches [13], various approaches to MC structural modeling have been developed in terms of how a target structure is numerically modeled. For example, the original MC code was for an MC model for multi-layered (MCML) tissue that was developed for a multilayered medium with spherical, cylindrical, ellipsoidal, or cuboidal objects [14, 15], while tetrahedron-based (TIMOS) [16, 17] and mesh-based MC methods [18] have also been proposed.
Specific ray-tracing algorithms have also been devised for representation models. Since millions of photons are involved in the MC method, the computational efficiency of the ray-tracing calculation should be a critical factor for practical simulation throughput. In this respect, the parallelizable nature of the MC method has been actively researched and as a consequence, it was allowed to be implemented with parallel process ray-tracing algorithm. It can be said that, in practice, the application such as non-destructive reflectance measurement of light from the surface of the human skin can only be numerically simulated using the parallel MC method.
Recently, parallel MC optical photon transport algorithms using graphics processing units (GPUs) have received significant research attention. The basic concept of parallelism is to divide the photons into smaller photon groups and assign each sub-group of photons to a GPU core. Alerstein
In this sense, the conventional MC method used in skin optics analysis can be considered a semi-parallel algorithm due to the serialism of the single photon path computation. Although millions of photons are launched into several independent computing cores, the issue is that the computational load required to trace photons with different mean free paths is not equal due to scattering by the human skin. In the MC approach, the mean free paths of individual photons are statistically determined using Beer’s law. Photons with a longer free path require more calculation time than photons with shorter free paths. This means that, when one computational core has finished estimating the propagation loss of the photons assigned to it, other computational cores may still be in operation. In this case, the computation core has to wait for the other cores, thus reducing computational efficiency in the vector-wise coding of a multicore GPU platform. This layoff-induced inefficiency represents a significant burden in 3D MC simulation. It is obvious that the inefficiency is a practical hindrance to the application of 3D MC simulation for the NIR signal analysis of human skin.
In this paper, the algorithmic bottleneck caused by the conventional load-balancing strategy of MC photon migration algorithms is addressed, and a new path-partition load balancing scheme is proposed that parallelizes the photon path tracing process in order to resolve the inefficiency of the serialized component of the current photon migration process. We modify the open simulation C code
A basic simulation skin model is presented in Fig. 1(a), consisting of epidermis and dermis layers. The thickness of the epidermis and dermis is set to 60 μm and 84 μm, respectively. In the simulation, the collimated beam illuminates the skin surface normally. The paths of the non-scattered photon that are parallel to the illumination light direction are indicated by the red lines and an optical power detector is placed just above the epidermis layer, 60 μm from the illuminating light source. The computational volume is discretized in the form of a regular hexahedron bin grid, with a bin size of Δ
where
The algorithm accounts for the intermediate bins that the photon meets on the way to the destination (
where Δ
The key to computational efficiency is optimizing the processing of single straight free paths, which are the standard computational unit of conventional MC simulation. Let us consider the calculation for a single straight line using Eqs. (3) and (4). A photon traveling along a straight line of step size
where
Given the initial and destination points, the bins on the straight photon path are sequentially calculated using direct photon tracing in conventional MC simulations because, in general, each bin has different physical parameters [Fig. 3(a)]. Without numerical ray tracing, we cannot predict which bin would be the next that needs to be accounted for. Unfortunately, ray tracing is a sequential process. For instance, assume that
In human skin simulations, the reflectance and transmission spectra are measured for hundreds of wavelengths. In general, computation time is linearly proportional to the number of photons, and hundreds of millions of photons are necessary for reliable simulation results. Spectra calculations using the MC method are thus a significant computational problem, and hundreds of cores may be needed to numerically analyze the spectroscopic characteristics of the human skin. As a result, improving the MC method using a new load-balancing strategy that overcomes the inherent limitations of photon-number partition load balancing is required. In this view, we devised a method of parallelizing the single path of a photon. A key component of this approach is the simultaneous determination of all the bins crossed by the photon on its finite path of step size
where Δ
where
In the conventional MC algorithm, Eqs. (5) and (6) are processed serially because the coordinates of the bins on the photon path are identified using ray tracing, while, in the proposed method, the coordinates of the bins are calculated using Eq. (7). Equations (8) and (9) are considered to be the explicit solutions for Eqs. (6) and (5), respectively. This parallel computation scenario is schematically illustrated in Fig. 4(a) with the depiction of computational load in Fig. 4(b). The above formula is employed in GPU processors as a form of path-partition load balancing. With the proposed algorithm, the calculation of the photons can be partitioned based on the bins, not the entire photon path. In this way, the total bin-based computation load can be easily distributed to GPU cores in a uniform and compact manner [Fig. 4(b)]. For the first step movement of the photons, all of the bin calculations in the volume can be collectively distributed to the involved GPU cores. For longer paths, more GPU cores are assigned to complete the calculation. The GPU cores, the number of which is proportional to the free path length of the photons, are allocated adaptively. Consequently, this path-partitioning load balancing strategy minimizes the idle time, as shown in the right panel of Fig. 4(b).
In proof-of-concept analysis, we found that this form of path-partition load balancing is appropriate for the vector-wise coding of a MATLAB 2016b multi-GPU platform. The initial version of the proposed MC algorithm was implemented on a Clic80000 GPU server with eight NVIDIA GeForce TITAN GPUs (CoCoLinK, Seoul, South Korea). We compared the proposed path-partitioning load-balancing strategy with conventional photon number-partition load balancing in MC photon migration simulations of the human skin. In this experiment, the computation time for a different number of photons was estimated. Figure 5(a) presents the target human skin structure, which is composed of an epidermis, dermis, and embedded veins. We set the computational grid resolution at 200 × 200 × 200. The size of each grid was 0.0005 cm and the total simulation size was 100 μm × 100 μm × 100 μm. A finite Gaussian beam of wavelength 1000 nm illuminated the spot above the blood vessels. The simulation parameters of this structure were as follows: μ_{a} = 16.6 cm^{−1}, μ_{s} = 375.9 cm^{−1} and g = 0.9 in the epidermis layer, μ_{a} = 0.46 cm^{−1}, μ_{s} = 356.5 cm^{−1} and g = 0.9 in the dermis layer, and μ_{a} = 230.5 cm^{−1}, μ_{s} = 94 cm^{−1} and g = 0.9 in the blood vessels. Cross-section images of the optical flux distributions obtained with conventional photon number-partition load balancing and path-partitioning load balancing and their respective calculation times are compared in Fig. 3(b), with the computation time for the conventional and proposed methods for photon numbers from 100 to 300,000 denoted by the green and blue, respectively. The simulation results for the two strategies exhibited similar light flux patterns and densities for the same number of photons, but the difference in computation time widened with an increase in the number of photons. The computation time for the conventional method was approximately 22 min for 300,000 photons. In NIR glucose spectrum analysis, reflectance spectra ranging from 1000 nm to 2500 nm are measured. In this context, sampling intervals of 1 nm or finer are necessary to determine the reflection spectra within a skin sample. If 1500 samples are required within that NIR range, the total computation time would be about 22 days. Even with the use of a multiple-GPU system, this calculation time is not feasible for practical applications.
In comparison, the simulation results for the proposed algorithm on the same multiple-GPU system are displayed in Fig. 5(c). In the experiment, the computation time increases linearly by photon number between 100 and 300,000 except for the initial nonlinear increase at the photon number less than about 100. For 300,000 photons, the proposed GPU calculation method was four times faster than the conventional method, even though our code was not completely optimized. It is expected that this increase in calculation speed will be maintained in MC simulations with a larger number of photons. In addition, the optimization of our initial code in MATLAB may enhance overall performance.
In conclusion, we have developed a parallel MC algorithm with path-partition load balancing for vector-wise GPU coding. A computational bottleneck in the conventional MC algorithm when calculating free path migration was identified and a simple path-partitioning parallel algorithm was proposed. Consequently, the computational efficiency of the proposed MATLAB GPU computing system demonstrated a four-fold improvement over the conventional system, validating the proposed algorithm. Although the GPU-based MC simulator requires further improvement in terms of code optimization, it is believed that the proposed MC algorithm will be useful for practical applications in the development of human healthcare technology.
When the initial position and the destination position of a ray are (
In counting all passed bins, we should acquire all integer pairs of (
Let
Substituting Eq. (A3) into Eq. (A1) and using the floor function, we have the bin counting formula indicating the coordinate of the bins on the ray as
Where
This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00981, Development of Digital Holographic Metrology Technology for Phase Retrieval).