Current Optics and Photonics
Curr. Opt. Photon. 2022; 6(3): 260-269
Published online June 25, 2022 https://doi.org/10.3807/COPP.2022.6.3.260
Copyright © Optical Society of Korea.
Yan Sun^{1}, Chunling Liu^{2}, Hongliu Ma^{1}, Wang Zhang^{1}
Corresponding author: wangzhang@jlu.edu.cn, ORCID 0000-0001-9029-1320
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Segmented planar imaging detector for electro-optical reconnaissance (SPIDER) is an emerging technology for optical imaging. However, this novel detection approach is faced with degraded imaging quality. In this study, a 6 × 6 planar waveguide is used after each lenslet to expand the field of view. The imaging principles of field-plane waveguide structures are described in detail. The local multiple sampling simulation mode is adopted to process the simulation of the improved imaging system. A novel image-reconstruction algorithm based on deep learning is proposed, which can effectively address the defects in imaging quality that arise during image reconstruction. The proposed algorithm is compared to a conventional algorithm to verify its better reconstruction results. The comparison of different scenarios confirms the suitability of the algorithm to the system in this paper.
Keywords: Deep learning, Image reconstruction, Optical imaging, Optical interferometry, Photonic integrated circuits
OCIS codes: (040.5160) Photodetectors; (110.3010) Image reconstruction techniques; (110.3175) Interferometric imaging
With the continuous development of space technology and increasing demand, in-orbit space detection technology has become more widely used. Interferometric imaging technology, due to its advantage of rich information content, is widely used in satellite remote sensing, mineral investigation, land measurement, material analysis, and other fields, and thus this technology has become the focus of the field of imaging [1, 2]. With more and more high-precision application requirements, technologies for high-resolution imaging usually have been designed with large apertures, leading to large volume, heavy weight, and high power consumption. The development of small-scale, low-weight, high-resolution space detection technology has become the key to promoting remote sensing.
With the rapid development of photonic integrated circuit (PIC) and interferometric imaging technologies, a small-scale interferometric optical imaging instrument known as the segmented planar imaging detector for electro-optical reconnaissance (SPIDER) was proposed [3]. In the SPIDER system, light from a scene is collected by a lenslet array and coupled into the optical waveguide on a PIC chip to form interferometric baselines. PICs sample the complex visibility of the target, and then the image is generated by processing the complex visibility information [4–6].
In actual cases, sampling coverage tends to be incomplete, and direct Fourier transformation of frequency-domain information exhibits problems, such as poor imaging quality and artifacts. Furthermore, a single optical waveguide following each lenslet on a PIC only acquires narrow field-of-view information about the object [7]. It is therefore important to expand the field of view to increase the sampling information. Adopting an appropriate reconstruction algorithm to process the output images of the SPIDER introduced above is also essential.
Traditional algorithms applied to image reconstruction include the total-variation algorithm TVAL3 [8], the alternating direction method of multipliers (ADMM) [9], approximate message passing (AMP), and others [10]. However, these algorithms have two main drawbacks. First, because they solve an optimization problem iteratively and each iteration is computationally expensive, reconstruction takes a long time. Second, they exhibit low reconstruction quality at low sampling rates. Traditional reconstruction algorithms therefore still need to be improved. With growing computational power, deep learning has once again received widespread attention and application in various fields of computer vision [11–14]. Some scholars have introduced deep learning into image and video reconstruction; the primary methods include the stacked denoising autoencoder (SDA) and the convolutional neural network (CNN) [15]. A CNN is particularly attractive because it can exploit the spatial correlation present in natural images, and each convolutional layer has far fewer learnable parameters than a fully connected layer.
In this paper, a planar waveguide structure is added to extend the field of view. A local multiple-sampling simulation mode for the SPIDER optical interferometric system is proposed. A novel image-reconstruction algorithm for the SPIDER is presented, which can directly learn a mapping from image-block proxies to image blocks. The multiple-sampling simulation mode inevitably produces artifacts between different fields of view, so a denoising method is introduced to reduce the blocky artifacts. A comparison of the traditional algorithm to the CNN algorithm for image quality confirms the superiority and feasibility of the presented reconstruction algorithm. This paper is organized as follows: Section 2 discusses the basic structure and imaging process of the SPIDER optical interferometric system, and describes how planar waveguides are used to broaden the field of view of the SPIDER. Section 3 introduces a novel image-reconstruction framework based on a deep CNN. Section 4 shows simulated image-reconstruction results of SPIDER observations. The conclusion is given in Section 5.
This section describes the structure of the SPIDER optical interferometric system and its imaging process, including the design of the added planar waveguides. The system design used in this study is also described.
As shown in Fig. 1, the structure of the SPIDER system puts a linear array of lenslets onto a PIC card, with the PIC cards mounted as radial spokes on a disc [16]. The critical part of the SPIDER is the PIC, as shown in Fig. 2. The PIC integrates various optical waveguide devices, including arrayed waveguide gratings (AWGs), optical phase shifters, multi-mode interferometers (MMIs), and balanced detectors, to realize optical transmission, interference, light separation, photoelectric conversion, and other functions. AWGs are used to disperse broadband light from pairs of lenslets into different spectral channels, followed by 2 × 2 MMIs which combine light from corresponding input optical waveguide arrays after phase adjustment [17]. The balanced detectors receive interferometric information, and the measured complex visibility for each spatial frequency is calculated from the interferometric information.
The waveguide plane consists of multiple waveguides, integrated at the focal plane of the corresponding lenslet. Since light beams from different fields of view converge at different positions on the focal plane of the lenslet, each waveguide guides the optical signal from its assigned field of view. Pairing the waveguides at the same location in the focal planes of two different lenslets matches the light beams that come from the same field of view. Therefore, the more waveguides integrated on the focal plane, the larger the field of view obtained. In this paper, a 6 × 6 optical waveguide array is placed behind each lenslet to extend the field of view, instead of a single optical waveguide, as shown in Fig. 3.
The SPIDER operating process is shown in Fig. 3. The beam from the target is coupled into the PIC by the lenslet array, and light transmission, interferometry, and imaging are carried out in the PIC.
First, the lenslet array converges the spatial light into the optical waveguide, which couples the beam into the PIC. The light from different fields of view passes through the lenslet and falls on different positions of the planar waveguide, as shown in Fig. 3. A larger waveguide area is advantageous for receiving more beams, so the 6 × 6 planar waveguide deployed in this study receives a more comprehensive field of view. Each piece of the planar waveguide receives light from a different field of view of the target. Since the lenslet array adopts a planar distribution, all lenslet orientations remain consistent, and the scene is the same for all lenslets. Therefore, the beams accepted by the planar waveguides behind different lenslets are also the same. Observing a given field of view requires matching the waveguides at the same corresponding positions behind the paired lenslets.
After each piece of the planar waveguide, an AWG divides the whole band into narrower bands for interference. As shown in Fig. 3, the narrow bands from the waveguide blocks that observe the same field of view behind the two lenslets of a baseline pair are coupled into the same MMI. Complex coherence information for a spatial-frequency point can be obtained through the MMI and the balanced detectors. According to the van Cittert-Zernike theorem, the intensity distribution of the target source can be obtained from the complex coherence information. Different baseline lengths and different bands constitute different spatial-frequency points, and each field of view is reconstructed from multiple spatial-frequency points by digital signal processing (DSP). Finally, the multiple pieces of field-of-view (FOV) information are stitched together to obtain the complete image information for the target.
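As a rough illustration of how baseline lengths and spectral bands combine into spatial-frequency points, the following sketch uses the wavelength range, number of spectral segments, longest baseline, and scene distance from Table 1. The equal-width spectral segments, the three example baseline lengths, the angular spatial frequency B/λ, and all function names are assumptions made for the example; they are not the lenslet-pairing scheme of this paper.

```python
import numpy as np

# Values taken from Table 1.
WAVELENGTH_MIN_M = 500e-9
WAVELENGTH_MAX_M = 1000e-9
NUM_SEGMENTS = 10
LONGEST_BASELINE_M = 0.5
SCENE_DISTANCE_M = 500e3


def segment_centers(lam_min, lam_max, n):
    """Centre wavelengths of n spectral segments (equal widths are an assumption)."""
    edges = np.linspace(lam_min, lam_max, n + 1)
    return 0.5 * (edges[:-1] + edges[1:])


def spatial_frequencies(baselines_m, wavelengths_m):
    """Angular spatial frequency B / lambda (cycles per radian) for every pair."""
    return np.outer(baselines_m, 1.0 / wavelengths_m)


wavelengths = segment_centers(WAVELENGTH_MIN_M, WAVELENGTH_MAX_M, NUM_SEGMENTS)
baselines = np.array([0.05, 0.25, LONGEST_BASELINE_M])  # hypothetical baseline lengths
freqs = spatial_frequencies(baselines, wavelengths)

# Finest period resolved on the ground at the mean wavelength: (lambda / B_max) * z.
ground_period_m = wavelengths.mean() / LONGEST_BASELINE_M * SCENE_DISTANCE_M
print(freqs.shape, ground_period_m)
```

Each (baseline, band) pair contributes one point, so three baselines and ten bands give thirty spatial-frequency samples along one spoke direction. Under these assumptions the finest ground period is on the order of the sub-metre to metre ground sample distances of the satellites that supplied the training images.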
Furthermore, the coupling efficiency of the planar waveguides is discussed. Coupling efficiency is defined as the ratio of the optical power coupled into the optical waveguide to the average optical power at the focal plane; it describes how efficiently a beam is coupled into an optical waveguide. When the incident light wave can be regarded as a plane wave and the waveguide’s center coincides with the focus, without relative inclination or offset, the coupling efficiency can be expressed as
where η is the coupling efficiency, β is the coupling coefficient,
Meanwhile, according to the relationship between the coupling efficiency and the field-of-view angle, the coupling efficiency of the system decreases rapidly as the field-of-view angle increases. To ensure the incidence efficiency of the light, the absolute angle between the target beam and the optical axis is limited to the range of 0.5 λ /
where
The parameters of the lenslet are correlated to the coupling efficiency. Lenslet F/# is determined by the following:
According to a survey of visible-light near-infrared waveguides, the mode-field radius is generally about 3.5 μm. To maximize the coupling efficiency, the coupling coefficient is 1.12, and the F/# is about 6.54, as calculated from Eq. (5).
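The quoted F/# can be checked numerically. The sketch below assumes the commonly used relation β = π·ω₀ / (2·λ·F/#) between the coupling coefficient, the mode-field radius ω₀, and the lenslet F/#, and it assumes that the design wavelength is the midpoint of the 500–1000 nm band. Neither assumption is stated in the surviving text, and the function name is hypothetical.

```python
import math


def f_number(mode_field_radius_m, coupling_coefficient, wavelength_m):
    """Solve beta = pi * w0 / (2 * lambda * F#) for F# (assumed relation)."""
    return math.pi * mode_field_radius_m / (2.0 * coupling_coefficient * wavelength_m)


MODE_FIELD_RADIUS_M = 3.5e-6   # from the text
COUPLING_COEFFICIENT = 1.12    # from the text
CENTER_WAVELENGTH_M = 750e-9   # assumption: midpoint of the 500-1000 nm band

f_num = f_number(MODE_FIELD_RADIUS_M, COUPLING_COEFFICIENT, CENTER_WAVELENGTH_M)
print(round(f_num, 2))
```

Under these assumptions the result agrees with the value of about 6.54 given above, and it is close to the ratio of the lenslet focal length to the lenslet diameter listed in Table 1 (32 mm / 5 mm = 6.4).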
Therefore, the SPIDER system’s parameters presented in this paper are shown in Table 1.
TABLE 1 The parameters used for the simulations
Parameter | Value |
---|---|
Wavelength (nm) | 500–1000 |
Number of Spectral Segments | 10 |
Lenslet Diameter (mm) | 5 |
Longest Baseline (m) | 0.5 |
Number of PIC Spokes | 37 |
Number of Lenslets per PIC Spoke | 26 |
Scene Distance (km) | 500 |
Lenslet Focal Length (mm) | 32 |
The line field of view is
where
where λmin is the minimum working wavelength and
Spatial-frequency coverage is affected by the lenslet pairing method, which has a direct influence on the image quality. The pairing method of the lenslets can be described as follows. Supposing each interferometric arm consists of
This section presents the overall simulation of the imaging process of the SPIDER and the image-reconstruction method based on deep learning for the SPIDER optical interferometric system suggested in this paper. For the image-reconstruction algorithm, the design and training process of the reconstruction model are analyzed in detail.
Because a 6 × 6 optical waveguide array is placed behind each lenslet, a local multiple-sampling simulation mode is used for this imaging system. The 6 × 6 optical waveguide array divides the scene into nonoverlapping narrow scene blocks. However, the simulation cannot sample multiple narrow scenes and model the PIC’s internal processes simultaneously. The planar waveguide can be regarded as a set of waveguides that each sample a different field of view. The simulation therefore samples the different local scenes of the target one at a time and processes their frequency-domain information separately, so that the 6 × 6 planar waveguide is simulated through multiple sampling. After the local scenes have been sampled and imaged, each narrow scene image is reconstructed by feeding it into the pretrained reconstruction model. The reconstructed blocks of all of the local narrow scenes are arranged to form an intermediate reconstruction image of the target. The blocky artifacts of the intermediate reconstruction image are then removed by the denoising algorithm to obtain the final target image.
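The block handling in this simulation mode can be sketched as follows. This is a minimal illustration that assumes the scene dimensions are divisible by 6 and that blocks are ordered row by row; the function names are hypothetical, and the per-block sampling and reconstruction steps are omitted.

```python
import numpy as np

GRID = 6  # 6 x 6 waveguide array -> 36 local fields of view


def split_into_blocks(image, grid=GRID):
    """Split a 2-D image into grid x grid nonoverlapping blocks (row-major order)."""
    h, w = image.shape
    if h % grid or w % grid:
        raise ValueError("image size must be divisible by the grid size")
    bh, bw = h // grid, w // grid
    return (image.reshape(grid, bh, grid, bw)
                 .swapaxes(1, 2)
                 .reshape(grid * grid, bh, bw))


def stitch_blocks(blocks, grid=GRID):
    """Inverse of split_into_blocks: arrange the blocks back into one image."""
    _, bh, bw = blocks.shape
    return (blocks.reshape(grid, grid, bh, bw)
                  .swapaxes(1, 2)
                  .reshape(grid * bh, grid * bw))


scene = np.arange(36 * 36, dtype=float).reshape(36, 36)
blocks = split_into_blocks(scene)      # one block per local field of view
intermediate = stitch_blocks(blocks)   # here identical to the scene
```

In the full pipeline each block would be sampled and reconstructed before stitching, which is why seams can appear at block boundaries in the intermediate image.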
The sampling and imaging process of the local fields of view is described in detail, as shown in Fig. 5. According to the van Cittert-Zernike theorem, the two-dimensional intensity distribution of the target can be obtained by applying an inverse Fourier transform to the phase and amplitude of the sampled spatial-spectrum points of the target. The imaging process can be described by the following:
where
The quality of the local fields of view of the target for direct imaging is poor, due to incomplete sampling [18]. Therefore, images from direct imaging are input into the pretrained reconstruction model to reconstruct images. The intermediate reconstruction image is acquired by appropriately arranging the reconstruction-image blocks of all the narrow scenes.
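The effect of incomplete sampling on direct imaging can be illustrated with a short sketch: the spectrum of a local scene block is kept only at the sampled spatial-frequency points and then inverse-transformed. The radial-spoke mask below is a hypothetical stand-in for the real frequency coverage, which depends on the lenslet pairing and spectral bands, and the function names are assumptions.

```python
import numpy as np


def radial_spoke_mask(size, num_spokes):
    """Hypothetical coverage: samples along radial spokes through the DC term."""
    mask = np.zeros((size, size), dtype=bool)
    centre = size // 2
    radii = np.arange(centre)
    for theta in np.linspace(0.0, np.pi, num_spokes, endpoint=False):
        dx = np.rint(radii * np.cos(theta)).astype(int)
        dy = np.rint(radii * np.sin(theta)).astype(int)
        mask[centre + dy, centre + dx] = True
        mask[centre - dy, centre - dx] = True  # conjugate-symmetric counterpart
    return mask


def direct_image(block, mask):
    """Inverse-transform only the spatial frequencies selected by the mask."""
    spectrum = np.fft.fftshift(np.fft.fft2(block))
    sampled = spectrum * mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(sampled)))


rng = np.random.default_rng(0)
block = rng.random((32, 32))                  # stand-in for one local scene
mask = radial_spoke_mask(32, num_spokes=37)   # 37 spokes as in Table 1
proxy = direct_image(block, mask)             # degraded image fed to the CNN
```

With a mask of all ones the function returns the original block, so any degradation in `proxy` comes from the missing spatial frequencies alone.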
Due to the multiple-sampling processing, blocky artifacts exist in the intermediate reconstruction image. The block-matching and 3D filtering (BM3D) algorithm is chosen in the reconstruction framework to remove the artifacts and obtain the final reconstruction image, due to the superior compromise of the BM3D algorithm between time complexity and reconstruction quality [19].
Here we describe the architecture of the deep CNN and select the optimal model structure by comparing models with different numbers of channels and layers. A fully convolutional structure is employed; every layer except the final convolutional layer uses the ReLU activation function (discussed later). The first convolutional layer expands the number of channels, and the subsequent convolutional layers gradually reduce it. Each feature map produced by the convolutional layers has the same size as the image block. The last convolutional layer uses a 3 × 3 kernel and generates a single feature map, which is also the output of the network. Appropriate zero padding keeps the feature-map size constant in all layers.
The number of layers has a large influence on the reconstructed-image quality. Lighter models train faster, and each epoch takes less time. Different models reach different losses at convergence, which affects the quality of the trained model. Model selection therefore has to consider both the convergence rate and the loss at convergence. Using the same model structure, we compare models that expand to different maximum numbers of channels; as the number of channels increases, the number of layers increases accordingly. We compare the convergence of models expanding to 64, 128, and 256 channels on a partial training set. The convergence process for the different models is shown in Fig. 6.
Model 1, with 64 channels, converges fastest (78 epochs, 69.2 min), but its loss at convergence (0.036) is clearly worse than that of model 2, and its loss changes little in subsequent epochs, as shown in Table 2. Too many layers can cause slow and unstable convergence: for model 3, with 256 channels, each epoch takes longer, convergence is slowest, and the loss at convergence (0.038) is the highest. Model 2, with 128 channels, offers the best compromise. It converges to a lower loss than the smaller model and faster than the larger model, and its loss at convergence (0.026) is the lowest of the three. Therefore, model 2 is selected as the reconstruction model, and the reconstruction process is shown in Fig. 7.
TABLE 2 Comparison of different models
Model | Maximum Channels | Number of Layers | Epoch of Convergence | Time in Total (min) | Loss at Convergence |
---|---|---|---|---|---|
Model 1 | 64 | 8 | 78 | 69.2 | 0.036 |
Model 2 | 128 | 10 | 84 | 102.9 | 0.026 |
Model 3 | 256 | 12 | 125 | 163.2 | 0.038 |
As shown in Fig. 7, the input to the network is an image block proxy
where
In this section, we introduce the training process of the image-reconstruction model based on deep learning. The network architecture used is shown in Fig. 7.
The images used to train the reconstruction model were taken from the iSAID dataset, which consists of 1411 remote-sensing images [21]. The images range in size from 800 × 800 to 20,000 × 20,000 pixels and contain objects exhibiting a wide variety of scales, orientations, and shapes. They were collected from different sensors and platforms, including Google Earth and the GF-2 and JL-1 satellites. For GF-2, the ground sample distance (GSD) is 1 m, the wavelength range is 500–800 nm, and the orbit altitude is 656 km. For JL-1, the GSD is 0.72 m, the wavelength range is 450–900 nm, and the orbit altitude is 645 km. The images from Google Earth were collected from different platforms; their GSD ranges from 0.096 to 4.496 m (mainly within 0.1–0.2 m), and the wavelength range is the visible-light band. Given these dataset sources, our system parameters were chosen to be as close as possible to those of the corresponding satellites. To simplify training of the reconstruction model, the images were converted to grayscale, so that only the luminance component was retained. The conversion maps the RGB values of a color image to a grayscale value, according to the formula
where Grey is the grayscale value and R, G, and B represent the three channel values of the color image.
To train the reconstruction model, the training set is obtained from multiple local sampling and imaging of the target. According to the 6 × 6 planar waveguide structure, 36 samplings are needed to cover the different positions of the target; each sampling goes through the system mask and the frequency-domain conversion process. The target images are divided into image blocks of the same size as the training-set inputs to form the labels of the training set. Thus, an input-label pair in the training set can be represented as (
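The conversion formula itself is not reproduced in this copy. As an illustration only, the sketch below uses the widely used ITU-R BT.601 luminance weights (0.299, 0.587, 0.114); whether the paper uses exactly these weights is an assumption, and the function name is hypothetical.

```python
import numpy as np

# Assumed weights (ITU-R BT.601 luma); the paper's exact coefficients are not shown here.
BT601_WEIGHTS = np.array([0.299, 0.587, 0.114])


def rgb_to_grey(rgb):
    """Convert an (H, W, 3) RGB array to an (H, W) grayscale array."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return rgb @ BT601_WEIGHTS


white = np.full((2, 2, 3), 255.0)
green = np.zeros((2, 2, 3))
green[..., 1] = 255.0
grey_white = rgb_to_grey(white)
grey_green = rgb_to_grey(green)
```

Because the weights sum to one, a white pixel keeps its full value, while a pure green pixel maps to a brighter grey than pure red or blue would.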
This section presents the loss function for training the reconstruction model. The loss function evaluates the difference between the ground-truth images and the prediction images generated by the deep-learning network. The training of the reconstruction model is driven by the error between the label and the reconstructed image. The mean square error (MSE) is chosen as the loss function in this paper. The loss function can be represented as
where {Ω} indicates back propagation,
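A minimal sketch of an MSE loss over a batch of image blocks follows. Averaging over every pixel of every block is an assumption; the normalization used in this paper is not recoverable from this copy, and the function name is hypothetical.

```python
import numpy as np


def mse_loss(predicted_blocks, label_blocks):
    """Mean squared error between reconstructed blocks and their labels."""
    predicted = np.asarray(predicted_blocks, dtype=np.float64)
    labels = np.asarray(label_blocks, dtype=np.float64)
    if predicted.shape != labels.shape:
        raise ValueError("predictions and labels must have the same shape")
    # Average over all blocks and all pixels (assumed normalization).
    return float(np.mean((predicted - labels) ** 2))


labels = np.zeros((36, 8, 8))      # 36 label blocks, hypothetical 8 x 8 size
predictions = np.ones((36, 8, 8))  # every pixel off by one
loss = mse_loss(predictions, labels)
```

The loss is zero only when every reconstructed block matches its label exactly, and large pixel errors are penalized quadratically.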
In this section, we discuss the result of image reconstruction simulation on the SPIDER system. Moreover, the improved CNN is compared to the traditional image-reconstruction algorithm, showing that the improved CNN is more suitable for image reconstruction in this imaging system.
The peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are used as the evaluation parameters of image reconstruction, while estimating the performance of the reconstruction framework in terms of time complexity. For a test image
where MAX is the maximum possible pixel value of the image, μX and μY are the averages of
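The two metrics can be sketched as follows. PSNR uses the standard definition 10·log10(MAX²/MSE). The SSIM shown here is a simplified single-window version computed from global image statistics, with the customary constants k1 = 0.01 and k2 = 0.03; whether this paper uses a windowed SSIM or different constants is not known from this copy, so those choices and the function names are assumptions.

```python
import numpy as np


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


def global_ssim(x, y, max_value=255.0, k1=0.01, k2=0.03):
    """SSIM from whole-image means, variances, and covariance (single window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (k1 * max_value) ** 2, (k2 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


rng = np.random.default_rng(1)
truth = rng.integers(0, 255, size=(64, 64)).astype(float)
noisy = truth + 1.0  # uniform error of one grey level
```

A higher PSNR indicates a smaller pixel-wise error, whereas SSIM compares luminance, contrast, and structure and equals 1 only for identical images.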
The computer used in this simulated experiment is an x64-compatible desktop computer; the operating system of the computer is Windows 10 64-bit, the computer processor is Xeon Silver 4210R, the memory of the computer is 64 GB, and the graphics card is NVIDIA 1080Ti 11 GB.
For our simulated experiments, we chose a total of 137 grayscale images covering six scenarios from the iSAID dataset as the test set. The scenarios are ports, airports, transportation hubs, open areas, parking lots, and residential areas (listed in Table 3 as Port, Airport, Traffic Hub, Open Space, Depot, and Uptown, respectively), each of which contains more than 20 images. The test-set images are not included in the training set. The background area, the number of objects, and the size of each object differ between scenarios. Reconstructing different scenarios illustrates the suitability of the reconstruction model for the imaging system. To further verify the feasibility of our algorithm, it is also compared to the traditional reconstruction algorithm TVAL3 and to the CNN-based ReconNet network, in terms of reconstructed-image quality and running speed. For ReconNet and our improved CNN algorithm, we use model simulations to generate measurement images, sample and divide the training-set images into blocks according to the proposed optical waveguide array, train the reconstruction model on the training set, and reconstruct the test images using the corresponding model. For TVAL3, the test images are sampled and divided into blocks, each image block is reconstructed, and the reconstructed image blocks are combined into intermediate images.
For the intermediate images produced by the learning-based algorithm, we use the BM3D denoiser to eliminate interblock artifacts and obtain the final reconstructed image. Figure 8 shows the reconstruction results of the various algorithms on selected images from different scenarios in the test set. The reconstruction result for the intermediate image is also shown. The calculated PSNRs and SSIMs for the different scenarios and reconstruction algorithms are listed in Table 3. The average times for the different algorithms are compared in Table 4.
TABLE 3 Average imaging quality of the test set under different algorithms
Scenario | Algorithm | PSNR (dB) | SSIM |
---|---|---|---|
Port | Improved | 17.6567 | 0.6471 |
Port | BM3D | 17.2318 | 0.6018 |
Port | ReconNet | 16.4819 | 0.6010 |
Port | TVAL3 | 14.6388 | 0.3674 |
Airport | Improved | 15.8698 | 0.6478 |
Airport | BM3D | 15.6813 | 0.6198 |
Airport | ReconNet | 14.1134 | 0.5093 |
Airport | TVAL3 | 10.8388 | 0.3039 |
Traffic Hub | Improved | 15.8111 | 0.5743 |
Traffic Hub | BM3D | 15.3484 | 0.4787 |
Traffic Hub | ReconNet | 15.4325 | 0.4493 |
Traffic Hub | TVAL3 | 13.8840 | 0.2005 |
Open Space | Improved | 17.7446 | 0.5706 |
Open Space | BM3D | 17.4190 | 0.5125 |
Open Space | ReconNet | 17.4009 | 0.5130 |
Open Space | TVAL3 | 16.0246 | 0.3085 |
Depot | Improved | 16.3730 | 0.5218 |
Depot | BM3D | 15.8724 | 0.4556 |
Depot | ReconNet | 15.4824 | 0.4005 |
Depot | TVAL3 | 14.3906 | 0.2533 |
Uptown | Improved | 16.4682 | 0.6043 |
Uptown | BM3D | 16.1081 | 0.5283 |
Uptown | ReconNet | 15.8715 | 0.5142 |
Uptown | TVAL3 | 13.8552 | 0.2096 |
TABLE 4 Average running times of different algorithms
Algorithm | Training Time (min) | Reconstruction Time (s) | Total Time (min) |
---|---|---|---|
Improved | 1578 | 0.1939 | 1578.003 |
ReconNet | 1216 | 0.1343 | 1216.002 |
TVAL3 | 0 | 25.24005 | 0.337 |
Image reconstruction was carried out using the novel reconstruction algorithm. According to Table 3, the improved algorithm gives the best imaging quality of all the algorithms in every scenario. The reconstructed-image quality varies between scenarios. For a detail-rich scene, such as uptown, deep-learning-based algorithms have significant advantages, as shown in Table 3. As can be seen from Fig. 8, the images restored using the learning-based algorithms show better noise removal than those from the traditional image-reconstruction algorithm. Meanwhile, the improved CNN reconstruction algorithm further improves the reconstructed-image quality compared to ReconNet. In addition, the BM3D algorithm smooths the image as a whole, which lowers the image-quality indices. After BM3D smoothing, the image quality is still better than that of ReconNet in most scenarios; the exceptions in Table 3 are the PSNR for Traffic Hub and the SSIM for Open Space, which are marginally lower. The improved CNN also performs well across the different scenarios. This shows that the reconstruction algorithm is well suited to the system proposed above.
In terms of overall time, the learning-based algorithms take longer than the traditional algorithm because of the long training process, as shown in Table 4. However, a model trained by a learning-based algorithm can be applied to various scenarios without retraining, and the reconstruction step alone takes far less time with learning-based algorithms than with traditional algorithms. Therefore, when many images have to be reconstructed, learning-based methods save time overall. In actual observations a large amount of data usually has to be processed, so the computing time of the neural-network algorithm is greatly reduced compared to the traditional algorithm. The improved CNN algorithm takes longer than ReconNet because it uses a more complex network structure to improve the quality of the reconstructed image.
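The trade-off between training time and per-image reconstruction time can be made concrete with the figures from Table 4. The sketch below estimates how many images must be reconstructed before a trained model becomes cheaper in total time than TVAL3. It assumes that the per-image times in Table 4 stay constant and that no retraining is needed; the function names are hypothetical.

```python
import math

# Values from Table 4.
TRAINING_MIN = {"Improved": 1578.0, "ReconNet": 1216.0, "TVAL3": 0.0}
RECON_S = {"Improved": 0.1939, "ReconNet": 0.1343, "TVAL3": 25.24005}


def total_seconds(algorithm, num_images):
    """One-off training time plus reconstruction time for num_images images."""
    return TRAINING_MIN[algorithm] * 60.0 + RECON_S[algorithm] * num_images


def break_even_images(learned, baseline="TVAL3"):
    """Smallest image count at which the learned model is no slower in total."""
    extra_training_s = (TRAINING_MIN[learned] - TRAINING_MIN[baseline]) * 60.0
    saved_per_image_s = RECON_S[baseline] - RECON_S[learned]
    return math.ceil(extra_training_s / saved_per_image_s)


n_improved = break_even_images("Improved")
n_reconnet = break_even_images("ReconNet")
print(n_improved, n_reconnet)
```

Under these assumptions the training cost is recovered after a few thousand reconstructed images, which is consistent with the statement that learning-based methods pay off when large amounts of data are processed.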
In this study, a 6 × 6 optical waveguide array was used following each lenslet to expand the imaging field of view. A novel algorithm based on deep learning for the blockwise processing of SPIDER image reconstruction was proposed, and the BM3D algorithm was applied to denoise the intermediate reconstruction images produced by the blockwise processing. Based on deep-learning theory, the image-reconstruction framework for the SPIDER system was established. Simulated results show that the proposed reconstruction algorithm achieves a significant increase in reconstructed-image quality under different scenarios, which shows that the algorithm is suitable for the system proposed above. Because a trained model is widely applicable, learning-based algorithms achieve high speed when processing large amounts of data.
Although the results of this paper show a marked increase in image-reconstruction quality for the SPIDER system, optical interferometric imaging still has shortcomings, such as blocky artifacts and degraded image quality. In the future, we will continue to research image-reconstruction algorithms based on deep learning that can increase the reconstruction efficiency.
The authors declare no conflicts of interest.
Data underlying the results presented in this paper are not publicly available at the time of publication, but may be obtained from the authors upon reasonable request.
The authors would like to thank the Editor in Chief, the Associate Editor, and the reviewers for their insightful comments and suggestions.
The author(s) received no financial support for the research, authorship, and/or publication of this article.
^{1}School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130025, China
^{2}Meteorological Service Center, Henan Meteorological Administration, Zhengzhou 450003, China
Correspondence to:wangzhang@jlu.edu.cn, ORCID 0000-0001-9029-1320
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Segmented planar imaging detector for electro-optical reconnaissance (SPIDER) is an emerging technology for optical imaging. However, this novel detection approach is faced with degraded imaging quality. In this study, a 6 × 6 planar waveguide is used after each lenslet to expand the field of view. The imaging principles of field-plane waveguide structures are described in detail. The local multiple sampling simulation mode is adopted to process the simulation of the improved imaging system. A novel image-reconstruction algorithm based on deep learning is proposed, which can effectively address the defects in imaging quality that arise during image reconstruction. The proposed algorithm is compared to a conventional algorithm to verify its better reconstruction results. The comparison of different scenarios confirms the suitability of the algorithm to the system in this paper.
Keywords: Deep learning, Image reconstruction, Optical imaging, Optical interferometry, Photonic integrated circuits
With the continuous development of space technology and increasing demand, in-orbit space detection technology has become more widely used. Interferometric imaging technology, due to its advantage of rich information content, is widely used in satellite remote sensing, mineral investigation, land measurement, material analysis, and other fields, and thus this technology has become the focus of the field of imaging [1, 2]. With more and more high-precision application requirements, technologies for high-resolution imaging usually have been designed with large apertures, leading to large volume, heavy weight, and high power consumption. The development of small-scale, low-weight, high-resolution space detection technology has become the key to promoting remote sensing.
With the rapid development of photonic integrated circuit (PIC) and interferometric imaging technologies, a small-scale interferometric optical imaging instrument known as the segmented planar imaging detector for electro-optical reconnaissance (SPIDER) was proposed [3]. In the SPIDER system, light from a scene is collected by a lenslet array and coupled into the optical waveguide on a PIC chip to form interferometric baselines. PICs sample the complex visibility of the target, and then the image is generated by processing the complex visibility information [4^{–}6].
In actual cases, sampling coverage tends to be incomplete, and direct Fourier transformation of frequency-domain information exhibits problems, such as poor imaging quality and artifacts. Furthermore, a single optical waveguide following each lenslet on a PIC only acquires narrow field-of-view information about the object [7]. It is therefore important to expand the field of view to increase the sampling information. Adopting an appropriate reconstruction algorithm to process the output images of the SPIDER introduced above is also essential.
Traditional reconstruction algorithms applied in the field of image reconstruction include the total variational algorithm (TVAL3) [8], the alternate direction method of multiply (ADMM) [9], the approximate messaging algorithm (AMP) and others [10]. However, the traditional reconstruction algorithms still have some problems. On the one hand, due to the iterative nature of the traditional algorithm solving the optimization problem, each iteration requires a large computational expenditure, thus yeliding a long reconstruction time. On the other hand, the traditional algorithms exhibit low reconstruction quality with a low sampling rate. Therefore, the traditional reconstruction algorithms still need to be improved. Along with enhanced computational power, deep learning has once again received widespread attention and application within various fields of computer vision [11^{–}14]. Some scholars have introduced deep learning into image and video reconstruction; the primary methods include the superimposed denoising autoencoder (SDA) and the convolutional neural network (CNN) [15]. A CNN is particularly attractive and more powerful, it is able to exploit the spatial correlation present in natural images, and each convolutional layer has many fewer learnable parameters than a fully connected layer.
In this paper, a planar waveguide structure is added to extend the field of view. A local multiple-sampling simulation mode for the SPIDER optical interferometric system is proposed. A novel image-reconstruction algorithm for the SPIDER is presented, which can directly learn a mapping from image-block proxies to image blocks. The multiple-sampling simulation mode inevitably produces artifacts between different fields of view, so a denoising method is introduced to reduce the blocky artifacts. A comparison of the traditional algorithm to the CNN algorithm for image quality confirms the superiority and feasibility of the presented reconstruction algorithm. This paper is organized as follows: Section 2 discusses the basic structure and imaging process of the SPIDER optical interferometric system, and describes how planar waveguides are used to broaden the field of view of the SPIDER. Section 3 introduces a novel image-reconstruction framework based on a deep CNN. Section 4 shows simulated image-reconstruction results of SPIDER observations. The conclusion is given in Section 5.
This section describes the structure of the SPIDER optical interferometric system and its imaging process, which includes a discussion of the design of the increased planar waveguides. Moreover, the system design of this study is described.
As shown in Fig. 1, the structure of the SPIDER system puts a linear array of lenslets onto a PIC card, with the PIC cards mounted as radial spokes on a disc [16]. The critical part of the SPIDER is the PIC, as shown in Fig. 2. The PIC integrates various optical waveguide devices, including arrayed waveguide gratings (AWGs), optical phase shifters, multi-mode interferometers (MMIs), and balanced detectors, to realize optical transmission, interference, light separation, photoelectric conversion, and other functions. AWGs are used to disperse broadband light from pairs of lenslets into different spectral channels, followed by 2 × 2 MMIs which combine light from corresponding input optical waveguide arrays after phase adjustment [17]. The balanced detectors receive interferometric information, and the measured complex visibility for each spatial frequency is calculated from the interferometric information.
The waveguide plane consists of multiple waveguides, integrated at the focal plane of the corresponding lenslet. Since light beams from different fields of view will converge at the focal plane of the lens, each single waveguide will guide the optical signal from its assigned field of view. The matching waveguides from the same location of two different focal planes can accomplish the goal of matching the light beams from corresponding field-of-view beams. Therefore, the more waveguides integrated on the focal plane, the larger the field of view obtained. In this paper, 6 × 6 optical waveguide arrays are designed behind each lenslet for extending the field of view, instead of a separate piece of optical waveguide, as shown in Fig. 3.
The SPIDER operating process is shown in Fig. 3: the beam from the target is coupled into the PIC by the lenslet array, and beam transmission, interference, and imaging are carried out in the PIC.
First, the lenslet array converges the spatial light into the optical waveguide, which couples the beam into the PIC. Light from different fields of view passes through the lenslet and falls on different positions of the planar waveguide, as shown in Fig. 3. A larger waveguide area is advantageous for receiving more beams, so the 6 × 6 planar waveguide deployed in this study receives a more comprehensive field of view. Each piece of the planar waveguide receives light from a different field of view of the target. Since the lenslet array adopts a planar distribution, all lenslets share the same orientation and view the same scene. Therefore, the beams accepted by the planar waveguides behind different lenslets are also the same. Observing a given field of view requires matching the waveguides at the same corresponding positions behind the paired lenslets.
After each piece of the planar waveguide, an AWG divides the whole band into narrower bands for interference. As shown in Fig. 3, the narrow bands from the waveguide blocks that view the same field of view behind a baseline pair are coupled into the same MMI. Complex coherence information for a spatial-frequency point can be obtained through the MMI and the balanced detectors. According to the Van Cittert-Zernike theorem, the characteristic distribution of the target source can be obtained from the complex coherence information. Different baseline lengths and different bands constitute different spatial-frequency points, and the same field of view is reconstructed from multiple spatial-frequency points by digital signal processing (DSP). Finally, the multiple pieces of field-of-view (FOV) information are stitched together to obtain the complete image information for the target.
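The statement that baseline lengths and spectral bands together define the spatial-frequency points can be illustrated with a short sketch. It assumes the common interferometric convention that a baseline B observed at wavelength λ samples the angular spatial frequency B/λ; the channel centres and baseline lengths below are hypothetical placeholders, not the pairing scheme of this system.

```python
def spatial_frequencies(baselines_m, wavelengths_m):
    """Return the sorted spatial frequencies B / lambda in cycles per radian."""
    return sorted(b / lam for b in baselines_m for lam in wavelengths_m)

# Ten hypothetical channel centres spread across a 500-1000 nm band.
wavelengths = [(525 + 50 * k) * 1e-9 for k in range(10)]
# Hypothetical baseline lengths in metres (illustration only).
baselines = [0.02, 0.1, 0.25, 0.5]

points = spatial_frequencies(baselines, wavelengths)
print(len(points), "spatial-frequency points")
```

Each (baseline, channel) pair contributes one point, so adding spectral channels fills in frequencies between those set by the baselines alone.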
Furthermore, the coupling efficiency of the planar waveguides is discussed. Coupling efficiency is defined as the ratio of the optical power coupled into the optical waveguide to the average power at the focal plane, and describes how efficiently a beam is coupled into an optical waveguide. When the incident light wave can be regarded as a plane wave and the waveguide’s center coincides with the focus, without relative inclination or offset, the coupling efficiency can be expressed as
where η is the coupling efficiency, β is coupling coefficient,
Meanwhile, according to the relationship between the coupling efficiency and the field-of-view angle, the coupling efficiency of the system decreases rapidly as the field-of-view angle increases. To ensure the incidence efficiency of the light, the absolute angle between the target beam and the optical axis is limited to the range of 0.5 λ /
where
The parameters of the lenslet are correlated to the coupling efficiency. Lenslet F/# is determined by the following:
According to a survey of visible-light near-infrared waveguides, the mode-field radius is generally about 3.5 μm. To maximize the coupling efficiency, the coupling coefficient is 1.12, and the F/# is about 6.54, as calculated from Eq. (5).
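The quoted values can be checked numerically with a minimal sketch. It assumes the expression commonly used in the fiber-coupling literature for a plane wave focused onto a Gaussian mode, η = 2[(1 − e^{−β²})/β]² with β = πω₀/(2λ·F/#); this form and the 0.75 μm band-centre wavelength are assumptions, not taken from the text above.

```python
import math

def coupling_efficiency(beta):
    """Assumed form: eta = 2 * ((1 - exp(-beta^2)) / beta)^2."""
    return 2.0 * ((1.0 - math.exp(-beta ** 2)) / beta) ** 2

def f_number(mode_radius_um, beta, wavelength_um):
    """Solve the assumed relation beta = pi * w0 / (2 * lambda * F/#) for F/#."""
    return math.pi * mode_radius_um / (2.0 * beta * wavelength_um)

eta = coupling_efficiency(1.12)
f_num = f_number(3.5, 1.12, 0.75)  # 0.75 um: assumed centre of the working band

print(f"eta = {eta:.4f}, F/# = {f_num:.2f}")
```

Under these assumptions the efficiency peaks near β = 1.12, and the resulting F/# is close to the value quoted above.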
Therefore, the SPIDER system’s parameters presented in this paper are shown in Table 1.
TABLE 1. The parameters used for the simulations.
Parameter | Symbol | Value |
---|---|---|
Wavelength (nm) | | 500–1000 |
Number of Spectral Segments | | 10 |
Lenslet Diameter (mm) | | 5 |
Longest Baseline (m) | | 0.5 |
Number of PIC Spokes | | 37 |
Number of Lenslets per PIC Spoke | | 26 |
Scene Distance (km) | | 500 |
Lenslet Focal Length (mm) | | 32 |
The line field of view is
where
where λmin is the minimum working wavelength and
Spatial-frequency coverage is affected by the lenslet pairing method, which has a direct influence on the image quality. The pairing method of the lenslets can be described as follows. Supposing each interferometric arm consists of
This section presents the overall simulation of the imaging process of the SPIDER and the image-reconstruction method based on deep learning for the SPIDER optical interferometric system suggested in this paper. For the image-reconstruction algorithm, the design and training process of the reconstruction model are analyzed in detail.
According to the design of the 6 × 6 optical waveguide array behind each lenslet, a local multiple-sampling simulation mode is used for this imaging system. The scene is divided into nonoverlapping narrow scene blocks by the 6 × 6 optical waveguide arrays. However, the simulation cannot realize the simultaneous sampling of multiple narrow scenes and the PIC’s internal processes. The planar waveguides can be considered to be composed of multiple waveguides that sample different fields of view. Therefore, the simulation samples the different local scenes of the target and processes their frequency-domain information separately, so that the 6 × 6 planar waveguide is simulated through multiple sampling. After the local scenes are sampled and imaged, each narrow scene image is reconstructed by feeding it into the pretrained reconstruction model. The reconstructed blocks of all of the local narrow scenes are arranged to form an intermediate reconstruction image of the target. The blocky artifacts of the intermediate reconstruction image are then removed by the denoising algorithm to obtain the final target image.
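The block-wise bookkeeping of this simulation mode can be sketched as follows. The function names and the 36 × 36 test scene are hypothetical; the sketch covers only the division of a scene into a 6 × 6 grid of nonoverlapping blocks and their reassembly, not the sampling or reconstruction of each block.

```python
import numpy as np

def split_blocks(scene, grid=6):
    """Divide a 2-D scene into grid x grid nonoverlapping blocks.

    Returns an array of shape (grid, grid, block_h, block_w); rows and
    columns that do not fill a whole block are cropped.
    """
    h, w = scene.shape
    bh, bw = h // grid, w // grid
    cropped = scene[:bh * grid, :bw * grid]
    return cropped.reshape(grid, bh, grid, bw).swapaxes(1, 2)

def stitch_blocks(blocks):
    """Arrange blocks of shape (rows, cols, block_h, block_w) into one image."""
    rows, cols, bh, bw = blocks.shape
    return blocks.swapaxes(1, 2).reshape(rows * bh, cols * bw)

scene = np.arange(36 * 36, dtype=float).reshape(36, 36)
blocks = split_blocks(scene)           # 36 local scenes, one per waveguide
restored = stitch_blocks(blocks)       # intermediate image before denoising
```

In a full simulation each entry `blocks[i, j]` would be sampled and reconstructed independently before stitching, which is why block boundaries can show up as artifacts.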
The sampling and imaging process of the local fields of view is described in detail, as shown in Fig. 5. According to the Van Cittert-Zernike theorem, the two-dimensional intensity distribution of the target can be obtained by applying an inverse Fourier transform to the phase and amplitude of the target’s spatial spectral points. The imaging process can be described by the following:
where
The quality of the local fields of view of the target for direct imaging is poor, due to incomplete sampling [18]. Therefore, images from direct imaging are input into the pretrained reconstruction model to reconstruct images. The intermediate reconstruction image is acquired by appropriately arranging the reconstruction-image blocks of all the narrow scenes.
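Why incomplete sampling degrades direct imaging can be seen in a small numerical sketch. It assumes the measured visibilities can be modelled as the discrete Fourier transform of the scene block multiplied by a binary coverage mask; the random block and the random 30 % mask are hypothetical stand-ins for a real scene and the system's actual frequency coverage.

```python
import numpy as np

def direct_image(block, mask):
    """Keep only the sampled spatial frequencies and invert the transform."""
    spectrum = np.fft.fft2(block)            # stand-in for the complex visibilities
    sampled = spectrum * mask                # unsampled frequencies are set to zero
    # The real part is taken because an arbitrary mask need not be Hermitian-symmetric.
    return np.real(np.fft.ifft2(sampled))

rng = np.random.default_rng(0)
block = rng.random((32, 32))                 # hypothetical narrow scene block
full_mask = np.ones((32, 32))                # complete frequency coverage
sparse_mask = rng.random((32, 32)) < 0.3     # hypothetical 30 % coverage

exact = direct_image(block, full_mask)
dirty = direct_image(block, sparse_mask)
print("MSE with sparse coverage:", np.mean((dirty - block) ** 2))
```

With full coverage the block is recovered exactly; with sparse coverage the directly inverted image deviates from the scene, and that degraded image is what the reconstruction model receives as input.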
Due to the multiple-sampling processing, blocky artifacts exist in the intermediate reconstruction image. The block-matching and 3D filtering (BM3D) algorithm is chosen in the reconstruction framework to remove the artifacts and obtain the final reconstruction image, due to the superior compromise of the BM3D algorithm between time complexity and reconstruction quality [19].
Here we describe the architecture of the deep CNN, and by comparing the numbers of different channels and layers, the optimal model structure is selected. The architecture of the CNN can be described as follows. A structure of a fully convolutional layer is employed; apart from the final convolutional layer, all of the other layers adopt the ReLU function (discussed later). The first convolutional layer is used to expand the channels, and the latter convolution layers gradually reduce the number of channels. Each feature map produced by the convolutional layers is equal in image block size. The last convolutional layer uses the kernel of size 3 × 3 and generates a single feature map, which in the case of the last layer is also the output of the network. We used appropriate zero padding to keep the feature-map size constant in all layers.
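The effect of zero padding on feature-map size can be shown with a single-channel sketch. It is not the network itself: it applies one 3 × 3 kernel to one feature map (as a cross-correlation, the convention used by deep-learning frameworks), and the function name and test values are hypothetical.

```python
import numpy as np

def conv3x3_same(feature_map, kernel, relu=True):
    """Apply one 3x3 kernel with zero padding so the output keeps the input size."""
    h, w = feature_map.shape
    padded = np.pad(feature_map, 1)          # one ring of zeros on every side
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            # Accumulate the shifted input weighted by the matching kernel tap.
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.maximum(out, 0.0) if relu else out

x = np.arange(-4.0, 5.0).reshape(3, 3)       # small map with negative values
identity = np.zeros((3, 3))
identity[1, 1] = 1.0                         # kernel that passes the input through

hidden = conv3x3_same(x, identity)               # hidden layers: ReLU applied
final = conv3x3_same(x, identity, relu=False)    # final layer: no activation
```

Because the padded input is two pixels larger in each dimension, every layer returns a map of the same size as the image block, so layers can be chained without cropping.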
The number of layers of the model has a large influence on reconstructed-image quality. For light models the training speed is faster, and each epoch takes less time. Different models show different loss at convergence, and have corresponding effects on the quality of model training. Therefore, the model selection needs to be considered in terms of the convergence rate as well as the loss at convergence. Based on the same model structure, we discuss models extending to different channel numbers. When the number of channels increases, the number of layers increases accordingly. We compare the convergence of models extending to 64, 128, and 256 channels with a partial training set. The convergence process for the different models is shown in Fig. 6.
Compared to the other models, model 1 with 64 channels converges faster. The final convergence loss of model 1 is poor, as shown in Table 2, and the subsequent epoch loss changes less. Too many layers can cause slow and unstable convergence. For model 3 with 256 channels, each epoch takes a longer time and converges more slowly. The model with 128 channels (model 2) is optimized compared to the others. Model 2 converges better than the smaller model and faster than the larger model. Meanwhile, the loss of model 2 at convergence is optimal. Therefore, model 2 is selected as the reconstruction model, and the reconstruction process is shown in Fig. 7.
TABLE 2. Comparison of different models.
Model | Maximum Channels | Number of Layers | Epoch of Convergence | Time in Total (min) | Loss at Convergence |
---|---|---|---|---|---|
Model 1 | 64 | 8 | 78 | 69.2 | 0.036 |
Model 2 | 128 | 10 | 84 | 102.9 | 0.026 |
Model 3 | 256 | 12 | 125 | 163.2 | 0.038 |
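
The trade-off in Table 2 can be made explicit with a little arithmetic. The sketch below assumes that "Time in Total" is the training time accumulated up to the epoch of convergence, so that dividing it by that epoch gives an average time per epoch; that reading of the table is an assumption.

```python
# Values copied from Table 2.
models = {
    "Model 1": {"epochs": 78, "total_min": 69.2, "loss": 0.036},
    "Model 2": {"epochs": 84, "total_min": 102.9, "loss": 0.026},
    "Model 3": {"epochs": 125, "total_min": 163.2, "loss": 0.038},
}

# Average minutes per epoch under the assumption stated above.
minutes_per_epoch = {
    name: m["total_min"] / m["epochs"] for name, m in models.items()
}
# Model with the lowest loss at convergence.
best = min(models, key=lambda name: models[name]["loss"])

for name, value in minutes_per_epoch.items():
    print(f"{name}: {value:.2f} min/epoch")
print("lowest loss at convergence:", best)
```

The per-epoch cost grows with the channel count, while the lowest loss belongs to the 128-channel model.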
As shown in Fig. 7, the input to the network is an image block proxy
where
In this section, we introduce the training process of the image-reconstruction model based on deep learning. The network architecture used is shown in Fig. 7.
The images used to train the reconstruction model were taken from the iSAID dataset, consisting of 1411 remote-sensing images [21]. Each image ranges in size from 800 × 800 to 20,000 × 20,000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. The images are collected from different sensors and platforms, including Google Earth and the GF-2 and JL-1 satellites. The ground sampling distance (GSD) of GF-2 is 1 m, the wavelength range is 500–800 nm, and the orbit altitude is 656 km. The GSD of JL-1 is 0.72 m, the wavelength range is 450–900 nm, and the orbit altitude is 645 km. The images from Google Earth are collected from different platforms; the GSD ranges from 0.096 to 4.496 m (mainly distributed within 0.1–0.2 m), and the wavelength range is the visible-light band. Therefore, according to the dataset source, our system parameters are as close as possible to the parameters of the corresponding satellites. We retained only the luminance component of the images: to simplify training of the reconstruction model, the images were converted to grayscale. The color images are converted from RGB values to grayscale values according to the formula
where Grey is the grayscale value and R, G, and B represent the three channel values of the color plot.
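A weighted sum of the three channels is the usual way to perform such a conversion. The sketch below assumes the widely used ITU-R BT.601 luma weights (0.299, 0.587, 0.114); these weights are an assumption and may differ from the coefficients used in this work.

```python
import numpy as np

# Assumed ITU-R BT.601 luma weights for R, G, and B.
BT601_WEIGHTS = np.array([0.299, 0.587, 0.114])

def rgb_to_grey(rgb):
    """Convert an (..., 3) RGB array to grayscale by a weighted channel sum."""
    return np.asarray(rgb, dtype=float) @ BT601_WEIGHTS

# Hypothetical 1 x 2 image: one white pixel and one pure-green pixel.
image = np.array([[[255, 255, 255], [0, 255, 0]]])
grey = rgb_to_grey(image)
```

The weights sum to one, so a white pixel keeps its full value while a pure-green pixel maps to 58.7 % of full scale.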
To train the reconstruction model, the training set is obtained from multiple local sampling and imaging of the target. According to the 6 × 6 planar waveguide structure, 36 samples are needed to sample the different positions of the target; each sampling goes through the system mask and frequency-domain conversion process. The target images are divided into image blocks of the same size as the training inputs to form the labels of the training set. Thus, an input-label pair in the training set can be represented as (
This section presents the loss function for training the reconstruction model. The loss function evaluates the difference between the ground-truth images and the prediction images generated by the deep-learning network. The training of the reconstruction model is driven by the error between the label and the reconstructed image. The mean square error (MSE) is chosen as the loss function in this paper. The loss function can be represented as
where {Ω} indicates back propagation,
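A minimal sketch of an MSE loss over a batch of image blocks is given here. It averages the squared error over every pixel of every block; the exact normalization used in training (for example, an additional factor of 1/2) is not assumed, and the function name is hypothetical.

```python
import numpy as np

def mse_loss(predictions, labels):
    """Mean squared error between predicted blocks and their labels."""
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean((predictions - labels) ** 2))

# Hypothetical two-pixel "block": errors of 1 and 3 give (1 + 9) / 2 = 5.
loss = mse_loss([[0.0, 0.0]], [[1.0, 3.0]])
```

Because the error is squared, a few badly reconstructed pixels dominate the loss, which pushes training toward uniformly small pixel errors.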
In this section, we discuss the result of image reconstruction simulation on the SPIDER system. Moreover, the improved CNN is compared to the traditional image-reconstruction algorithm, showing that the improved CNN is more suitable for image reconstruction in this imaging system.
The peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are used as the evaluation parameters of image reconstruction, while estimating the performance of the reconstruction framework in terms of time complexity. For a test image
where MAX is the maximum possible pixel value of the image, μX and μY are the averages of
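The two metrics can be sketched as follows. The PSNR follows its standard definition, 10·log10(MAX²/MSE). The SSIM is computed here from global image statistics with the commonly used constants C1 = (0.01·MAX)² and C2 = (0.03·MAX)²; these constants and the single-window form are assumptions, since many implementations instead average SSIM over local windows.

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """Peak signal-to-noise ratio in dB between images x and y."""
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")                  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

def global_ssim(x, y, max_val=255.0):
    """SSIM from whole-image means, variances, and covariance (single window)."""
    c1 = (0.01 * max_val) ** 2               # assumed stabilizing constants
    c2 = (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

reference = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical test image
shifted = reference + 1.0                              # every pixel off by one
```

For `shifted`, the MSE is exactly 1, so the PSNR reduces to 20·log10(255); an image compared with itself gives an SSIM of 1.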
The simulations were run on an x64 desktop computer running 64-bit Windows 10, with a Xeon Silver 4210R processor, 64 GB of memory, and an NVIDIA 1080Ti graphics card with 11 GB of memory.
For our simulated experiments, we choose a total of 137 grayscale images covering six scenarios from the iSAID dataset as the test set. The scenes include ports, airports, transportation hubs, open areas, parking lots, and residential areas, each of which contains more than 20 images. The test-set images are not included in the training set. The background area, the number of objects, and the size of each object differ between scenes. The reconstruction of different scenarios is used to illustrate the suitability of the reconstruction model for the imaging system. To further verify the feasibility of our algorithm, it is also compared to the traditional reconstruction algorithm TVAL3 and the CNN-based ReconNet network, in terms of both reconstructed-image quality and running speed. For the ReconNet network and our improved CNN algorithm, we use model simulations to generate measurement images, sample and divide the training-set images into blocks according to the proposed optical waveguide array, train the reconstruction model on the training set, and reconstruct the test images using the corresponding model. For the TVAL3 algorithm, the test images are sampled and divided into blocks, each image block is reconstructed, and the reconstructed image blocks are combined into intermediate images.
For the intermediate images from the algorithm based on the learning method we use the BM3D denoiser to eliminate interblock artifacts and obtain the final reconstructed image. Figure 8 shows the reconstruction effect of the various algorithms on the partial images from different scenarios in the test set. The reconstruction effect of the middle image is also shown. The calculated PSNRs and SSIMs for different scenarios by different reconstruction algorithms are listed in Table 3. The comparison of average times for the different algorithms is presented in Table 4.
TABLE 3. Average imaging quality of the test set under different algorithms.
Scenario | Algorithm | PSNR (dB) | SSIM |
---|---|---|---|
Port | Improved | 17.6567 | 0.6471 |
Port | BM3D | 17.2318 | 0.6018 |
Port | ReconNet | 16.4819 | 0.6010 |
Port | TVAL3 | 14.6388 | 0.3674 |
Airport | Improved | 15.8698 | 0.6478 |
Airport | BM3D | 15.6813 | 0.6198 |
Airport | ReconNet | 14.1134 | 0.5093 |
Airport | TVAL3 | 10.8388 | 0.3039 |
Traffic Hub | Improved | 15.8111 | 0.5743 |
Traffic Hub | BM3D | 15.3484 | 0.4787 |
Traffic Hub | ReconNet | 15.4325 | 0.4493 |
Traffic Hub | TVAL3 | 13.8840 | 0.2005 |
Open Space | Improved | 17.7446 | 0.5706 |
Open Space | BM3D | 17.4190 | 0.5125 |
Open Space | ReconNet | 17.4009 | 0.5130 |
Open Space | TVAL3 | 16.0246 | 0.3085 |
Depot | Improved | 16.3730 | 0.5218 |
Depot | BM3D | 15.8724 | 0.4556 |
Depot | ReconNet | 15.4824 | 0.4005 |
Depot | TVAL3 | 14.3906 | 0.2533 |
Uptown | Improved | 16.4682 | 0.6043 |
Uptown | BM3D | 16.1081 | 0.5283 |
Uptown | ReconNet | 15.8715 | 0.5142 |
Uptown | TVAL3 | 13.8552 | 0.2096 |
TABLE 4. Average running times of different algorithms.
Algorithm | Training Time (min) | Reconstruction Time (s) | Total Time (min) |
---|---|---|---|
Improved | 1578 | 0.1939 | 1578.003 |
ReconNet | 1216 | 0.1343 | 1216.002 |
TVAL3 | 0 | 25.24005 | 0.337 |
Image reconstruction was carried out using the novel reconstruction algorithm. According to Table 3, the improved algorithm outperforms the other algorithms in terms of imaging quality. The reconstructed-image quality varied for the different scenarios. For a detail-rich scene, such as uptown, deep-learning-based algorithms have significant advantages, as shown in Table 3. As can be seen from Fig. 8, the images restored using the learning algorithms show better noise removal than those from the traditional image-reconstruction algorithm. Meanwhile, the improved CNN reconstruction algorithm further improves the reconstructed-image quality, compared to ReconNet. In addition, the BM3D algorithm smooths the image as a whole, resulting in a decrease in the image quality index. The image quality is still better than that for ReconNet after BM3D smoothing, as shown in Table 3. Meanwhile, the improved CNN performs well for different scenarios. This shows that the reconstruction algorithm is more suitable for the system proposed above.
Comparing the overall time, the learning-based algorithms take longer than the traditional algorithm because of the training process, as shown in Table 4. However, the models trained by learning-based algorithms may be adapted for various scenarios without retraining. The time for the reconstruction process alone is far less for learning-based algorithms than for traditional algorithms. Therefore, when many images must be reconstructed, learning-based methods save time overall. In actual observation, a large amount of data usually needs to be processed, so the computing time of the neural-network algorithm will be much shorter than that of the traditional algorithm. The improved CNN algorithm takes longer than ReconNet because of the more complex network structure used to improve the quality of the reconstructed image.
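The point at which training cost is amortized can be estimated from Table 4. The sketch below assumes that a model is trained once, that the per-image reconstruction times in Table 4 hold for every image, and that no other overhead is involved; the function name is hypothetical.

```python
import math

def break_even_images(train_min, recon_s, baseline_recon_s):
    """Smallest image count for which training plus learned reconstruction
    takes less total time than the baseline's per-image reconstruction."""
    saved_per_image = baseline_recon_s - recon_s
    if saved_per_image <= 0:
        return None                          # the baseline is never overtaken
    return math.floor(train_min * 60.0 / saved_per_image) + 1

TVAL3_RECON_S = 25.24005                     # per-image time from Table 4

improved_n = break_even_images(1578, 0.1939, TVAL3_RECON_S)
reconnet_n = break_even_images(1216, 0.1343, TVAL3_RECON_S)
print("Improved CNN breaks even after", improved_n, "images")
print("ReconNet breaks even after", reconnet_n, "images")
```

Under these assumptions both learned models overtake TVAL3 after a few thousand images, which is consistent with the argument that learning-based methods pay off for large observation campaigns.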
In this study, a 6 × 6 optical waveguide array was used following each lenslet to expand the imaging field of view. A novel algorithm based on deep learning for the blockwise processing of SPIDER image reconstruction was proposed. The BM3D algorithm was applied to the blockwise processing to denoise intermediate reconstruction images. Based on deep-learning theory, the image-reconstruction framework for the SPIDER system was established. Simulated results show that the proposed reconstruction algorithm in this paper has achieved a significant increase in quality for reconstructing images under different scenarios. This shows that the reconstruction algorithm is suitable to the system proposed above. Based on the characteristics of extensive applicability, learning-based algorithms obtain high speed when processing large amounts of data.
Although the quality of image reconstruction for the SPIDER system is dramatically increased based on the results of this paper, there are still shortcomings for optical interferometric imaging, such as blocky artifacts and degraded image quality. In the future, the image-reconstruction algorithm based on deep learning, which can increase the reconstruction efficiency, will continue to be researched.
The authors declare no conflicts of interest.
Data underlying the results presented in this paper are not publicly available at the time of publication, but may be obtained from the authors upon reasonable request.
The authors would like to thank the Editor in Chief, the Associate Editor, and the reviewers for their insightful comments and suggestions.
The author(s) received no financial support for the research, authorship, and/or publication of this article.