
Curr. Opt. Photon. 2022; 6(3): 260-269

Published online June 25, 2022 https://doi.org/10.3807/COPP.2022.6.3.260

Copyright © Optical Society of Korea.

Image Reconstruction Based on Deep Learning for the SPIDER Optical Interferometric System

Yan Sun1, Chunling Liu2, Hongliu Ma1, Wang Zhang1

1School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130025, China
2Meteorological Service Center, Henan Meteorological Administration, Zhengzhou 450003, China

Corresponding author: wangzhang@jlu.edu.cn, ORCID 0000-0001-9029-1320

Received: December 20, 2021; Revised: March 19, 2022; Accepted: March 31, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Segmented planar imaging detector for electro-optical reconnaissance (SPIDER) is an emerging technology for optical imaging. However, this novel detection approach is faced with degraded imaging quality. In this study, a 6 × 6 planar waveguide is used after each lenslet to expand the field of view. The imaging principles of field-plane waveguide structures are described in detail. The local multiple-sampling simulation mode is adopted to process the simulation of the improved imaging system. A novel image-reconstruction algorithm based on deep learning is proposed, which can effectively address the defects in imaging quality that arise during image reconstruction. The proposed algorithm is compared to a conventional algorithm to verify its better reconstruction results. The comparison of different scenarios confirms the suitability of the algorithm to the system in this paper.

Keywords: Deep learning, Image reconstruction, Optical imaging, Optical interferometry, Photonic integrated circuits

OCIS codes: (040.5160) Photodetectors; (110.3010) Image reconstruction techniques; (110.3175) Interferometric imaging

I. INTRODUCTION

With the continuous development of space technology and growing demand, in-orbit space detection technology has become more widely used. Because of the rich information it provides, interferometric imaging is widely used in satellite remote sensing, mineral investigation, land surveying, material analysis, and other fields, and has therefore become a focus of the imaging community [1, 2]. As requirements for high-precision applications grow, high-resolution imaging systems have usually been designed with large apertures, leading to large volume, heavy weight, and high power consumption. The development of small-scale, lightweight, high-resolution space detection technology has therefore become the key to advancing remote sensing.

With the rapid development of photonic integrated circuit (PIC) and interferometric imaging technologies, a small-scale interferometric optical imaging instrument known as the segmented planar imaging detector for electro-optical reconnaissance (SPIDER) was proposed [3]. In the SPIDER system, light from a scene is collected by a lenslet array and coupled into the optical waveguide on a PIC chip to form interferometric baselines. PICs sample the complex visibility of the target, and then the image is generated by processing the complex visibility information [4–6].

In actual cases, sampling coverage tends to be incomplete, and direct Fourier transformation of frequency-domain information exhibits problems, such as poor imaging quality and artifacts. Furthermore, a single optical waveguide following each lenslet on a PIC only acquires narrow field-of-view information about the object [7]. It is therefore important to expand the field of view to increase the sampling information. Adopting an appropriate reconstruction algorithm to process the output images of the SPIDER introduced above is also essential.

Traditional reconstruction algorithms applied in the field of image reconstruction include the total variation algorithm (TVAL3) [8], the alternating direction method of multipliers (ADMM) [9], approximate message passing (AMP), and others [10]. However, the traditional reconstruction algorithms still have some problems. On the one hand, because traditional algorithms solve the optimization problem iteratively, each iteration requires a large computational expenditure, yielding a long reconstruction time. On the other hand, the traditional algorithms exhibit low reconstruction quality at low sampling rates. Therefore, the traditional reconstruction algorithms still need to be improved. Along with enhanced computational power, deep learning has once again received widespread attention and application within various fields of computer vision [11–14]. Some scholars have introduced deep learning into image and video reconstruction; the primary methods include the stacked denoising autoencoder (SDA) and the convolutional neural network (CNN) [15]. A CNN is particularly attractive and more powerful: it can exploit the spatial correlation present in natural images, and each convolutional layer has many fewer learnable parameters than a fully connected layer.

In this paper, a planar waveguide structure is added to extend the field of view. A local multiple-sampling simulation mode for the SPIDER optical interferometric system is proposed. A novel image-reconstruction algorithm for the SPIDER is presented, which can directly learn a mapping from image-block proxies to image blocks. The multiple-sampling simulation mode inevitably produces artifacts between different fields of view, so a denoising method is introduced to reduce the blocky artifacts. A comparison of the traditional algorithm to the CNN algorithm in terms of image quality confirms the superiority and feasibility of the presented reconstruction algorithm. This paper is organized as follows: Section 2 discusses the basic structure and imaging process of the SPIDER optical interferometric system, and describes how planar waveguides are used to broaden the view of the SPIDER. Section 3 introduces a novel image-reconstruction framework based on a deep CNN. Section 4 shows simulated image-reconstruction results of SPIDER observations. The conclusion is given in Section 5.

II. STRUCTURE AND THEORY

This section describes the structure of the SPIDER optical interferometric system and its imaging process, including a discussion of the design of the added planar waveguides. Moreover, the system design of this study is described.

2.1. The Structure

As shown in Fig. 1, the structure of the SPIDER system puts a linear array of lenslets onto a PIC card, with the PIC cards mounted as radial spokes on a disc [16]. The critical part of the SPIDER is the PIC, as shown in Fig. 2. The PIC integrates various optical waveguide devices, including arrayed waveguide gratings (AWGs), optical phase shifters, multi-mode interferometers (MMIs), and balanced detectors, to realize optical transmission, interference, light separation, photoelectric conversion, and other functions. AWGs are used to disperse broadband light from pairs of lenslets into different spectral channels, followed by 2 × 2 MMIs which combine light from corresponding input optical waveguide arrays after phase adjustment [17]. The balanced detectors receive interferometric information, and the measured complex visibility for each spatial frequency is calculated from the interferometric information.

Figure 1. The structure of the SPIDER system.

Figure 2. The working principle of a PIC.

The waveguide plane consists of multiple waveguides, integrated at the focal plane of the corresponding lenslet. Since light beams from different fields of view converge at the focal plane of the lens, each single waveguide guides the optical signal from its assigned field of view. Waveguides at the same location on two different focal planes can therefore be matched to pair the light beams from the corresponding field of view. Thus, the more waveguides integrated on the focal plane, the larger the field of view obtained. In this paper, 6 × 6 optical waveguide arrays are designed behind each lenslet to extend the field of view, instead of a single optical waveguide, as shown in Fig. 3.

Figure 3. The schematic of the SPIDER’s operating process.

2.2. The Optical Imaging Process

The SPIDER operating process is shown in Fig. 3: the beam from the target is coupled into the PIC by the lenslet array, and light-beam transmission, interference, and imaging are carried out in the PIC.

First, the lenslet array converges the spatial light into the optical waveguide, which couples the beam into the PIC. Light from different fields of view passes through the lenslet and covers different positions of the planar waveguide, as shown in Fig. 3. A larger waveguide area is advantageous for receiving more beams, so the 6 × 6 planar waveguide deployed in this study receives a more comprehensive field of view. Each piece of the planar waveguide receives light from a different field of view of the target. Since the lenslet array adopts a planar distribution, all lenslet orientations remain consistent, and the scene is the same for every lenslet. Therefore, the beams received by the planar waveguides behind different lenslets are also the same. Observing the same field of view requires matching the waveguides at the same corresponding positions behind the paired lenslets.

After each piece of the planar waveguide, an AWG is used to divide the whole band into narrower bands for the interference processes. As shown in Fig. 3, the narrow bands from the waveguide blocks corresponding to the same field of view behind a baseline pair are coupled into the same MMI. Complex coherence information for a spatial-frequency point can be obtained through the MMI and the balanced detectors. According to the Van Cittert-Zernike principle, the characteristic distribution of the target source can be obtained from the complex-coherence information. Different baseline lengths and different bands constitute different spatial-frequency points, and each field of view is reconstructed from multiple spatial-frequency points by digital signal processing (DSP). Finally, the multiple pieces of field-of-view information are stitched together to obtain the complete image information for the target.

Furthermore, the coupling efficiency of the planar waveguides is discussed. Coupling efficiency is defined as the ratio of the optical power coupled into the waveguide to the average power at the focal plane; it describes how efficiently a beam is coupled into an optical waveguide. When the incident light wave can be regarded as a plane wave and the waveguide’s center coincides with the focus, without relative inclination or offset, the coupling efficiency can be expressed as

$$\eta = \frac{2\left[ 1 - \exp\left( -\beta^{2} \right) \right]^{2}}{\beta^{2}} \tag{1}$$

$$\beta = \frac{\pi R w_{0}}{\lambda f} \tag{2}$$

where η is the coupling efficiency, β is the coupling coefficient, R is the radius of the lenslet, w0 is the mode-field radius of a single-mode waveguide, λ is the wavelength, and f is the focal length of the lenslet. According to Eq. (1), in the case with no aberration and no turbulence, the maximum coupling efficiency is 81.45%, obtained for a coupling-coefficient value of 1.12.
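As a quick numerical check of Eq. (1), the sketch below (our own illustration, not code from the paper) scans the coupling coefficient β and locates the maximum of η; it reproduces the quoted optimum of roughly 81.45% near β ≈ 1.12.

```python
# Sketch: numerically check eta(beta) = 2[1 - exp(-beta^2)]^2 / beta^2 from Eqs. (1)-(2).
import numpy as np

def coupling_efficiency(beta):
    """Eq. (1): coupling efficiency as a function of the coupling coefficient beta."""
    return 2.0 * (1.0 - np.exp(-beta**2))**2 / beta**2

betas = np.linspace(0.01, 3.0, 100000)
etas = coupling_efficiency(betas)
i = np.argmax(etas)
print(f"optimum beta ~ {betas[i]:.3f}, eta_max ~ {etas[i]*100:.2f}%")
# Expected output (approximately): optimum beta ~ 1.121, eta_max ~ 81.45%
```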

Meanwhile, according to the relationship between the coupling efficiency and the field-of-view angle, the coupling efficiency of the system decreases rapidly as the field-of-view angle increases. To ensure the incidence efficiency of the light, the absolute angle between the target beam and the optical axis is limited to the range of 0.5 λ / d. Then the full field of view (FOV) is

$$\mathrm{FOV} = \frac{\lambda}{d} \tag{3}$$

where d is the lenslet’s diameter. Deploying an M × M planar waveguide after each lenslet is equivalent to setting multiple response waveguides, and each part of the field of view is received independently by a waveguide block of the planar waveguide, so that the expanded field of view is

$$\mathrm{FOV} = \frac{M \lambda}{d} \tag{4}$$

2.3. Design of the Optical System

The parameters of the lenslet are correlated to the coupling efficiency. Lenslet F/# is determined by the following:

$$F/\# = \frac{\pi w_{0}}{2 \lambda \beta} \tag{5}$$

According to a survey of visible-light and near-infrared waveguides, the mode-field radius is generally about 3.5 μm. To maximize the coupling efficiency, the coupling coefficient is set to 1.12, and the F/# is then about 6.54, as calculated from Eq. (5).
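The short sketch below (our own illustration) evaluates Eq. (5) with the quoted mode-field radius and coupling coefficient; the band-center wavelength of 750 nm is an assumption on our part, and with it the calculation reproduces F/# ≈ 6.54.

```python
# Sketch: evaluate Eq. (5) for the quoted design values.
import math

w0 = 3.5e-6      # mode-field radius (m)
lam = 750e-9     # wavelength (m); using the center of the 500-1000 nm band is an assumption
beta = 1.12      # coupling coefficient at maximum coupling efficiency

f_number = math.pi * w0 / (2.0 * lam * beta)   # Eq. (5)
print(f"F/# ~ {f_number:.2f}")                 # ~ 6.54
```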

Therefore, the SPIDER system’s parameters presented in this paper are shown in Table 1.

TABLE 1 The parameters used for the simulations

Parameter                          Symbol   Value
Wavelength (nm)                    λ        500–1000
Number of Spectral Segments        n        10
Lenslet Diameter (mm)              d        5
Longest Baseline (m)               Bmax     0.5
Number of PIC Spokes               P        37
Number of Lenslets per PIC Spoke   N        26
Scene Distance (km)                z        500
Lenslet Focal Length (mm)          f        32


The line field of view is

$$\varepsilon = \frac{M \lambda Z}{d} \tag{6}$$

where Z is the observational distance. The maximum resolution of the system is obtained at the maximum baseline length and minimum working wavelength, which can be represented as

$$R_{\max} = \frac{\lambda_{\min} Z}{B_{\max}} \tag{7}$$

where λmin is the minimum working wavelength and Bmax is the maximum baseline length. Therefore, when a planar waveguide of dimension 6 × 6 is configured, the line field of view of the system can reach 450 m and the maximum resolution is 0.75 m.
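The sketch below (our own illustration) evaluates Eqs. (3), (4), and (6) with the Table 1 parameters. The 750 nm band-center wavelength is an assumption on our part; with that choice the line field of view comes out to the quoted 450 m.

```python
# Sketch: field-of-view quantities for the Table 1 design.
lam = 750e-9        # assumed band-center wavelength (m)
d = 5e-3            # lenslet diameter (m)
M = 6               # planar-waveguide dimension (6 x 6)
Z = 500e3           # scene distance (m)

fov_single = lam / d            # Eq. (3), single-waveguide FOV (rad)
fov_expanded = M * lam / d      # Eq. (4), expanded FOV (rad)
line_fov = M * lam * Z / d      # Eq. (6), line field of view at distance Z (m)

print(f"single FOV    : {fov_single*1e6:.0f} urad")    # 150 urad
print(f"expanded FOV  : {fov_expanded*1e6:.0f} urad")  # 900 urad
print(f"line FOV at Z : {line_fov:.0f} m")             # 450 m
```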

Spatial-frequency coverage is affected by the lenslet pairing method, which has a direct influence on the image quality. The pairing method of the lenslets can be described as follows. Supposing each interferometric arm consists of N lenslets, the lenslet pairing follows a symmetrical scheme: (1, N), (2, N − 1), (3, N − 2) … (N / 2, N / 2 + 1), as shown in Fig. 4.

Figure 4. The lenslet pairing method for a PIC.
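A minimal sketch of the pairing scheme (our own illustration; Table 1 does not list the lenslet pitch, so baseline lengths are reported in units of that pitch):

```python
# Sketch: symmetrical lenslet pairing (1, N), (2, N-1), ..., (N/2, N/2+1) for one arm.
def lenslet_pairs(n_lenslets: int):
    """Symmetric pairing scheme for an even number of lenslets on one interferometric arm."""
    return [(i, n_lenslets + 1 - i) for i in range(1, n_lenslets // 2 + 1)]

N = 26  # lenslets per PIC spoke (Table 1)
for a, b in lenslet_pairs(N):
    print(f"pair ({a:2d}, {b:2d})  baseline = {b - a} lenslet pitches")
# Pairs range from (1, 26), the longest baseline, down to (13, 14), the shortest.
```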

III. METHODS

This section presents the overall simulation of the imaging process of the SPIDER and the image-reconstruction method based on deep learning for the SPIDER optical interferometric system suggested in this paper. For the image-reconstruction algorithm, the design and training process of the reconstruction model are analyzed in detail.

3.1. Imaging Process Simulation

According to the design of the 6 × 6 optical waveguide array behind each lenslet, a local multiple-sampling simulation mode is used for this imaging system. The scene is divided into nonoverlapping narrow scene blocks by the 6 × 6 optical waveguide arrays. However, the simulation cannot realize the simultaneous sampling of multiple narrow scenes together with the PIC’s internal processes. The planar waveguides can be considered to be composed of multiple waveguides that sample different fields of view; therefore, the simulation lets the system sample the different local scenes of the target and process their frequency-domain information separately, and the 6 × 6 planar waveguide is simulated through multiple samplings. After sampling and obtaining the images of the local scenes, each narrow scene image is reconstructed by feeding it into the pretrained reconstruction model. The reconstructed blocks of all of the local narrow scenes are arranged to form an intermediate reconstruction image of the target. The blocky artifacts of the intermediate reconstruction image are removed by the denoising algorithm to obtain the final target image.

The sampling and imaging process of the local fields of view is described in detail, as shown in Fig. 5. According to the Van Cittert-Zernike principle, the two-dimensional intensity distribution of the target can be obtained by applying an inverse Fourier transform to the phase and amplitude of the target’s spatial-spectrum points. The imaging process can be described by the following:

Figure 5. The sampling and imaging process.

$$f^{*}(x, y) = F^{-1}\left\{ F\left[ f(x, y) \right] \cdot \mathrm{Sample}(u, v) \right\} \tag{8}$$

where f*(x, y) is the image obtained directly by the inverse Fourier transformation of the sampled spectrum image, f(x, y) is the local field of view of the target, F and F−1 are respectively the Fourier transformation and the inverse Fourier transformation, and Sample(u, v) is the frequency coverage of the interference arrays. Therefore, the frequency coverage of the interference array masks the spectrum image obtained by the Fourier transform of the local field of view, and the inverse Fourier transform is then applied to image the target’s spatial-spectrum points. The frequency coverage of the interference arrays is related to the baselines and wavelengths; since the system is divided into ten spectral segments, the frequency coverage for a given baseline consists of a superposition of the central wavelengths of the spectral segments, and the overall spectrum coverage is composed of the contributions of all baselines. Moreover, the masking process retains the target’s spectrum value at each sampled location and sets the value at all other locations to 0.
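A minimal sketch of Eq. (8) (our own illustration): the local scene block is Fourier transformed, masked by the (u, v) coverage, and inverse transformed to give the degraded direct image. The random mask and random scene used here are placeholders for the actual baseline/wavelength coverage and the real scene blocks.

```python
# Sketch of Eq. (8): f* = F^-1{ F[f] . Sample(u, v) } on a placeholder scene block.
import numpy as np

rng = np.random.default_rng(0)

def direct_image(f_local: np.ndarray, sample_mask: np.ndarray) -> np.ndarray:
    """Return the real-valued direct image of one local field of view."""
    spectrum = np.fft.fftshift(np.fft.fft2(f_local))   # centered spectrum F[f]
    masked = spectrum * sample_mask                    # keep sampled (u, v) points, zero elsewhere
    return np.real(np.fft.ifft2(np.fft.ifftshift(masked)))

f_local = rng.random((64, 64))                        # placeholder local scene block
mask = (rng.random((64, 64)) < 0.15).astype(float)    # placeholder sparse (u, v) coverage
f_star = direct_image(f_local, mask)
print(f_star.shape)                                   # (64, 64) degraded block fed to the CNN
```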

The quality of the local fields of view of the target for direct imaging is poor, due to incomplete sampling [18]. Therefore, images from direct imaging are input into the pretrained reconstruction model to reconstruct images. The intermediate reconstruction image is acquired by appropriately arranging the reconstruction-image blocks of all the narrow scenes.

Due to the multiple-sampling processing, blocky artifacts exist in the intermediate reconstruction image. The block-matching and 3D filtering (BM3D) algorithm is chosen in the reconstruction framework to remove the artifacts and obtain the final reconstruction image, due to the superior compromise of the BM3D algorithm between time complexity and reconstruction quality [19].
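The overall block-wise flow can be sketched as follows (our own illustration): reconstruct each of the 6 × 6 local blocks, stitch them into the intermediate image, then denoise. The per-block reconstruction and the denoising call are placeholders here; in the paper the pretrained CNN and BM3D fill these roles, and the commented call assumes the API of the publicly available bm3d Python package.

```python
# Sketch of the block-wise reconstruction pipeline described above.
import numpy as np

def stitch_blocks(blocks: np.ndarray) -> np.ndarray:
    """blocks has shape (M, M, h, w); return the (M*h, M*w) stitched image."""
    M, _, h, w = blocks.shape
    rows = [np.hstack(blocks[i]) for i in range(M)]
    return np.vstack(rows)

M, h, w = 6, 64, 64
recon_blocks = np.zeros((M, M, h, w))
for i in range(M):
    for j in range(M):
        degraded = np.zeros((h, w))        # placeholder for the direct image f*(x, y) of block (i, j)
        recon_blocks[i, j] = degraded      # placeholder for the CNN reconstruction of that block

intermediate = stitch_blocks(recon_blocks) # intermediate reconstruction image
# final = bm3d.bm3d(intermediate, sigma_psd=0.1)   # BM3D denoising step (API of the bm3d package assumed)
print(intermediate.shape)                  # (384, 384)
```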

3.2. Image-reconstruction Algorithm

3.2.1. Designed Reconstruction Model

Here we describe the architecture of the deep CNN; by comparing different numbers of channels and layers, the optimal model structure is selected. The architecture of the CNN can be described as follows. A fully convolutional structure is employed; apart from the final convolutional layer, all layers adopt the ReLU activation function (discussed later). The first convolutional layer expands the number of channels, and the later convolutional layers gradually reduce it. Each feature map produced by the convolutional layers is the same size as the image block. The last convolutional layer uses a kernel of size 3 × 3 and generates a single feature map, which is also the output of the network. We used appropriate zero padding to keep the feature-map size constant in all layers.
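The description above can be read as the following PyTorch sketch of the 128-channel variant (model 2). This is our own reading, not the authors' code: only the expand-then-reduce channel structure, the 3 × 3 size-preserving convolutions with ReLU on all but the last layer, and the single-channel output come from the text; the intermediate kernel sizes and the exact channel schedule are assumptions.

```python
# Sketch of the fully convolutional reconstruction network (model 2: 10 layers, 128 channels max).
import torch
import torch.nn as nn

class ReconCNN(nn.Module):
    def __init__(self, channels=(1, 128, 128, 96, 96, 64, 64, 32, 16, 8, 1)):
        super().__init__()
        layers = []
        for i in range(len(channels) - 1):
            # 3 x 3 kernels with zero padding keep every feature map at the image-block size
            layers.append(nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, padding=1))
            if i < len(channels) - 2:          # ReLU on every layer except the last
                layers.append(nn.ReLU(inplace=True))
        self.net = nn.Sequential(*layers)

    def forward(self, x):                      # x: (batch, 1, H, W) image-block proxy
        return self.net(x)

model = ReconCNN()
x0 = torch.zeros(1, 1, 64, 64)                 # placeholder image-block proxy
print(model(x0).shape)                         # torch.Size([1, 1, 64, 64])
```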

The number of layers of the model has a large influence on reconstructed-image quality. Lighter models train faster, and each epoch takes less time. Different models show different losses at convergence, which correspondingly affects the quality of the trained model. Therefore, model selection needs to consider both the convergence rate and the loss at convergence. Based on the same model structure, we consider models that expand to different numbers of channels; as the number of channels increases, the number of layers increases accordingly. We compare the convergence of models expanding to 64, 128, and 256 channels on a partial training set. The convergence process for the different models is shown in Fig. 6.

Figure 6. Loss variation for models with different numbers of layers.

Compared to the other models, model 1 with 64 channels converges faster, but its final convergence loss is poor, as shown in Table 2, and its loss changes little in subsequent epochs. Too many layers can cause slow and unstable convergence: for model 3 with 256 channels, each epoch takes longer and the model converges more slowly. The model with 128 channels (model 2) offers the best trade-off: it converges to a lower loss than the smaller model and faster than the larger model, and its loss at convergence is the best of the three. Therefore, model 2 is selected as the reconstruction model, and the reconstruction process is shown in Fig. 7.

TABLE 2 Comparison of different models

Model     Maximum Channels   Number of Layers   Epoch of Convergence   Time in Total (min)   Loss at Convergence
Model 1   64                 8                  78                     69.2                  0.036
Model 2   128                10                 84                     102.9                 0.026
Model 3   256                12                 125                    163.2                 0.038


Figure 7. The image-reconstruction framework.

As shown in Fig. 7, the input to the network is an image-block proxy x0. The kth feature map in the first convolutional layer receives x0 as its input, and its output can be represented as

$$x_{c1}^{k} = \mathrm{ReLU}\left( \left( W_{1}^{k} * x_{0} \right) + B_{1}^{k} \right) \tag{9}$$

where W1k and B1k are the filter and bias values corresponding to the kth feature map of the first convolutional layer, and ReLU(x) = max(0, x) [20]. The feature maps of the other convolutional layers are obtained in a similar manner, except for the final convolutional layer. Although the filter shapes and biases could be different in other layers of the network, the principles for these layers are the same as for the first layer.

3.2.2. Training Reconstruction Model

In this section, we introduce the training process of the image-reconstruction model based on deep learning. The network architecture used is shown in Fig. 7.

Images for training the reconstruction model were taken from the iSAID dataset, which consists of 1411 remote-sensing images [21]. Each image has a size from 800 × 800 to 20,000 × 20,000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. The images are collected from different sensors and platforms, including Google Earth and the GF-2 and JL-1 satellites. The ground sample distance (GSD) of GF-2 is 1 m, its wavelength range is 500–800 nm, and its orbit altitude is 656 km. The GSD of JL-1 is 0.72 m, its wavelength range is 450–900 nm, and its orbit altitude is 645 km. The images from Google Earth are collected from different platforms; the GSD ranges from 0.096 to 4.496 m (mainly distributed within 0.1–0.2 m), and the wavelength range is the visible-light band. Therefore, according to the dataset source, our system parameters are chosen as close as possible to the parameters of the corresponding satellites. We retained only the luminance component of the images: to facilitate reconstruction during training, the images were converted to grayscale. The conversion of a color image is a conversion between RGB and grayscale values, according to the formula

$$\mathrm{Grey} = 0.299R + 0.587G + 0.114B \tag{10}$$

where Grey is the grayscale value and R, G, and B represent the three channel values of the color image.
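A one-line implementation of Eq. (10) (our own sketch) for an RGB image stored as an (H, W, 3) array:

```python
# Sketch of Eq. (10): luminance-weighted RGB-to-grayscale conversion.
import numpy as np

def rgb_to_grey(rgb: np.ndarray) -> np.ndarray:
    """rgb has shape (H, W, 3); returns the (H, W) grayscale image of Eq. (10)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights            # per-pixel dot product over the channel axis

rgb = np.random.default_rng(1).random((4, 4, 3))
print(rgb_to_grey(rgb).shape)       # (4, 4)
```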

To train the reconstruction model, the training set is obtained from multiple local samplings and imagings of the target. According to the 6 × 6 planar-waveguide structure, 36 samplings are needed to cover the different positions of the target; each sampling goes through the system mask and the frequency-domain conversion process. The target images are divided into image blocks of the same size as the training inputs to form the labels of the training set. Thus, an input-label pair in the training set can be represented as (x0, x).

Next, the loss function for training the reconstruction model is presented. The loss function evaluates the difference between the ground-truth images and the prediction images generated by the deep-learning network, so the training of the reconstruction model is driven by the error between the label and the reconstructed image. The mean square error (MSE) is chosen as the loss function in this paper. The loss function can be represented as

$$L(\{\Omega\}) = \frac{1}{K} \sum_{i=1}^{K} \left\| f\left( y_{i}, \{\Omega\} \right) - x_{i} \right\|^{2} \tag{11}$$

where {Ω} is the set of network parameters, K is the total number of image blocks in the training set, xi is the ith patch, and f(yi, {Ω}) is the network output for the ith patch. The loss function is minimized using stochastic gradient descent with standard back propagation. For gradient descent, the batch size is set to 512 for all networks, and the learning rate is set to 0.0001.
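The training setup described above (MSE loss, stochastic gradient descent with back propagation, batch size 512, learning rate 0.0001) can be sketched in PyTorch as follows; the random tensors stand in for the real (x0, x) block pairs, and the small stand-in network replaces the full reconstruction model.

```python
# Sketch of the training loop with the hyperparameters quoted in the text.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder (x0, x) pairs: degraded block proxies and their ground-truth blocks.
x0 = torch.rand(2048, 1, 64, 64)
x = torch.rand(2048, 1, 64, 64)
loader = DataLoader(TensorDataset(x0, x), batch_size=512, shuffle=True)

model = nn.Sequential(                      # small stand-in for the reconstruction CNN
    nn.Conv2d(1, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 1, 3, padding=1),
)
criterion = nn.MSELoss()                    # Eq. (11)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

for epoch in range(2):                      # a real run uses far more epochs
    for proxy, target in loader:
        optimizer.zero_grad()
        loss = criterion(model(proxy), target)
        loss.backward()                     # standard back propagation
        optimizer.step()
    print(f"epoch {epoch}: loss = {loss.item():.4f}")
```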

IV. RESULTS AND DISCUSSION

In this section, we discuss the results of the image-reconstruction simulations for the SPIDER system. Moreover, the improved CNN is compared to traditional image-reconstruction algorithms, showing that the improved CNN is more suitable for image reconstruction in this imaging system.

The peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are used as the evaluation parameters for image reconstruction, and the performance of the reconstruction framework is also estimated in terms of time complexity. For a test image Y and its original image X, the PSNR and SSIM metrics are defined as [22]:

$$\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^{2}}{\mathrm{MSE}} \tag{12}$$

$$\mathrm{SSIM} = \frac{\left( 2\mu_{X}\mu_{Y} + C_{1} \right)\left( 2\sigma_{XY} + C_{2} \right)}{\left( \mu_{X}^{2} + \mu_{Y}^{2} + C_{1} \right)\left( \sigma_{X}^{2} + \sigma_{Y}^{2} + C_{2} \right)} \tag{13}$$

where MAX is the maximum possible pixel value of the image, μX and μY are the means of X and Y, σX² and σY² are the variances of X and Y, σXY is the covariance of X and Y, and C1 and C2 are two variables used to stabilize the division when the denominator is weak. The larger the PSNR and the SSIM, the better the acquired image quality.
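Equations (12) and (13) can be implemented directly as below (our own sketch). Note that this follows the global, single-window form of SSIM as written above, for images scaled to [0, 1] so that MAX = 1; the particular C1 and C2 values are common choices and are an assumption here.

```python
# Sketch of Eqs. (12)-(13): PSNR and global SSIM for images in [0, 1].
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, max_val: float = 1.0) -> float:
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)

def ssim_global(x: np.ndarray, y: np.ndarray, max_val: float = 1.0) -> float:
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2   # common choices for C1, C2 (assumption)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                         # sigma_X^2, sigma_Y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()               # sigma_XY
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))

rng = np.random.default_rng(2)
x = rng.random((64, 64))
y = np.clip(x + 0.05 * rng.standard_normal((64, 64)), 0, 1)
print(f"PSNR = {psnr(x, y):.2f} dB, SSIM = {ssim_global(x, y):.3f}")
```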

The simulations were run on an x64 desktop computer with Windows 10 64-bit, a Xeon Silver 4210R processor, 64 GB of memory, and an NVIDIA 1080 Ti graphics card with 11 GB of memory.

For our simulated experiments, we choose a total of 137 grayscale images covering six scenarios from the iSAID dataset as the test set. The scenes include ports, airports, transportation hubs, open areas, parking lots, and residential areas, each of which contains more than 20 images. The test-set images are not included in the training set. The background area, the number of objects, and the size of each object differ among scenes, and the reconstruction of different scenarios is used to illustrate the suitability of the reconstruction model for the imaging system. To further verify the feasibility of our algorithm, it is also compared to the traditional reconstruction algorithm TVAL3 and the CNN-based ReconNet network, in terms of reconstructed-image quality and running speed respectively. For the ReconNet network and our improved CNN algorithm, we use model simulations to generate measurement images, sample and block the training-set images according to the proposed optical waveguide array, generate the reconstruction models from the training set, and reconstruct the test images using the corresponding models. For the TVAL3 algorithm, the test images are sampled and divided into blocks, each image block is reconstructed, and the reconstructed blocks are combined into intermediate images.

For the intermediate images from the learning-based algorithms, we use the BM3D denoiser to eliminate interblock artifacts and obtain the final reconstructed images. Figure 8 shows the reconstruction results of the various algorithms on some images from different scenarios in the test set; the intermediate reconstruction result is also shown. The PSNRs and SSIMs calculated for different scenarios by the different reconstruction algorithms are listed in Table 3. A comparison of average running times for the different algorithms is presented in Table 4.

TABLE 3 Average imaging quality of the test set under different algorithms

Scenario      Algorithm   PSNR      SSIM
Port          Improved    17.6567   0.6471
              BM3D        17.2318   0.6018
              ReconNet    16.4819   0.6010
              TVAL3       14.6388   0.3674
Airport       Improved    15.8698   0.6478
              BM3D        15.6813   0.6198
              ReconNet    14.1134   0.5093
              TVAL3       10.8388   0.3039
Traffic Hub   Improved    15.8111   0.5743
              BM3D        15.3484   0.4787
              ReconNet    15.4325   0.4493
              TVAL3       13.8840   0.2005
Open Space    Improved    17.7446   0.5706
              BM3D        17.4190   0.5125
              ReconNet    17.4009   0.5130
              TVAL3       16.0246   0.3085
Depot         Improved    16.3730   0.5218
              BM3D        15.8724   0.4556
              ReconNet    15.4824   0.4005
              TVAL3       14.3906   0.2533
Uptown        Improved    16.4682   0.6043
              BM3D        16.1081   0.5283
              ReconNet    15.8715   0.5142
              TVAL3       13.8552   0.2096


TABLE 4 Average running times of different algorithms

Algorithm   Training (min)   Reconstruction (s)   Total (min)
Improved    1578             0.1939               1578.003
ReconNet    1216             0.1343               1216.002
TVAL3       0                25.2400              50.337


Figure 8. Restored images from different algorithms under different scenarios: (a) the original image, (b) improved convolutional neural network (CNN), (c) image after BM3D, (d) ReconNet, and (e) total variation algorithm (TVAL3).

Image reconstruction was carried out using the novel reconstruction algorithm. According to Table 3, the improved algorithm is better than the other algorithms in terms of imaging quality. The reconstructed-image quality varies across scenarios: for a detail-rich scene such as uptown, the deep-learning-based algorithms have significant advantages, as shown in Table 3. As can be seen from Fig. 8, the images restored using the learning-based algorithms remove noise better than the traditional image-reconstruction algorithm. Meanwhile, the improved CNN reconstruction algorithm further improves the reconstructed-image quality compared to ReconNet. In addition, the BM3D algorithm smooths the image as a whole, resulting in a decrease in the image-quality indices; nevertheless, the image quality after BM3D smoothing is still better than that of ReconNet, as shown in Table 3. Meanwhile, the improved CNN performs well for the different scenarios, which shows that the reconstruction algorithm is well suited to the system proposed above.

Comparing the overall reconstruction time, the learning-based algorithms take longer than the traditional algorithm, due to the long training process, as shown in Table 4. However, the models trained by learning-based algorithms can be applied to various scenarios without retraining, and the time for the reconstruction process alone is far less than that needed by the traditional algorithm. Therefore, for larger image-reconstruction tasks, learning-based methods are relatively time-saving. In an actual observation process one usually needs to process a large amount of data, and the computing time of the neural-network algorithm is then greatly reduced compared to the traditional algorithm. For the improved CNN algorithm, the time consumption is higher than for ReconNet, due to the more complex network structure used to improve the quality of the reconstructed images.

V. CONCLUSION

In this study, a 6 × 6 optical waveguide array was used behind each lenslet to expand the imaging field of view. A novel algorithm based on deep learning for the blockwise processing of SPIDER image reconstruction was proposed, and the BM3D algorithm was applied within the blockwise processing to denoise the intermediate reconstruction images. Based on deep-learning theory, an image-reconstruction framework for the SPIDER system was established. Simulation results show that the proposed reconstruction algorithm achieves a significant increase in quality when reconstructing images of different scenarios, which shows that the reconstruction algorithm is well suited to the system proposed above. Owing to their broad applicability, learning-based algorithms also achieve high speed when processing large amounts of data.

Although the quality of image reconstruction for the SPIDER system is dramatically increased by the approach presented in this paper, shortcomings of optical interferometric imaging remain, such as blocky artifacts and degraded image quality. In the future, image-reconstruction algorithms based on deep learning that can further increase reconstruction efficiency will continue to be researched.

Data underlying the results presented in this paper are not publicly available at the time of publication, but may be obtained from the authors upon reasonable request.

The authors would like to thank the Editor in Chief, the Associate Editor, and the reviewers for their insightful comments and suggestions.

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. W. Hasbi, Kamirul, M. Mukhayadi, and U. Renner, “The impact of space-based AIS antenna orientation on in-orbit AIS detection performance,” Appl. Sci. 9, 3319 (2019).
2. C. Saunders, D. Lobb, M. Sweeting, and Y. Gao, “Building large telescopes in orbit using small satellites,” Acta Astronaut. 141, 183-195 (2017).
3. R. P. Scott, T. Su, C. Ogden, S. T. Thurman, R. L. Kendrick, A. Duncan, R. Yu, and S. J. B. Yoo, “Demonstration of a photonic integrated circuit for multi-baseline interferometric imaging,” in Proc. IEEE Photonics Conference (San Diego, CA, USA, Oct. 12-16, 2014), pp. 1-2.
4. G.-M. Lv, Q. Li, Y.-T. Chen, H.-J. Feng, and J. Mu, “An improved scheme and numerical simulation of segmented planar imaging detector for electro-optical reconnaissance,” Opt. Rev. 26, 664-675 (2019).
5. W. Gao, Y. Yuan, X. Wang, L. Ma, Z. Zhao, and H. Yuan, “Quantitative analysis and optimization design of segmented planar integrated optical imaging system based on inhomogeneous multistage sampling lens array,” Opt. Express 29, 11869-11884 (2021).
6. H. Hu, C. Liu, Y. Zhang, Q. Feng, and S. Liu, “Optimal design of segmented planar imaging for dense azimuthal sampling lens array,” Opt. Express 29, 24300-24314 (2021).
7. O. Guyon, “Wide field interferometric imaging with single-mode fibers,” Astron. Astrophys. 387, 366-378 (2002).
8. C. Li, W. Yin, H. Jiang, and Y. Zhang, “An efficient augmented Lagrangian method with applications to total variation minimization,” Comput. Optim. Appl. 56, 507-530 (2013).
9. L. Pratley, J. D. McEwen, M. d’Avezac, R. E. Carrillo, A. Onose, and Y. Wiaux, “Robust sparse image reconstruction of radio interferometric observations with PURIFY,” Mon. Not. R. Astron. Soc. 473, 1038-1058 (2018).
10. C. A. Metzler, A. Maleki, and R. G. Baraniuk, “From denoising to compressed sensing,” IEEE Trans. Inform. Theory 62, 5117-5144 (2014).
11. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Boston, MA, USA, Jun. 8-10, 2015), pp. 3431-3440.
12. J. Xue, Y.-Q. Zhao, Y. Bu, W. Liao, J. C.-W. Chan, and W. Philips, “Spatial-spectral structured sparse low-rank representation for hyperspectral image super-resolution,” IEEE Trans. Image Process. 30, 3084-3097 (2021).
13. D. Chang, Y. Ding, J. Xie, A. K. Bhunia, X. Li, Z. Ma, M. Wu, J. Guo, and Y. Z. Song, “The devil is in the channels: mutual-channel loss for fine-grained image classification,” IEEE Trans. Image Process. 29, 4683-4695 (2020).
14. Z. Ren, W. Luo, J. Yan, W. Liao, X. Yang, A. Yuille, and H. Zha, “STFlow: self-taught optical flow estimation using pseudo labels,” IEEE Trans. Image Process. 29, 9113-9124 (2020).
15. K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “ReconNet: non-iterative reconstruction of images from compressively sensed measurements,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Las Vegas, NV, USA, Jun. 26-Jul. 1, 2016), pp. 449-458.
16. T. Su, G. Liu, K. E. Badham, S. T. Thurman, R. L. Kendrick, A. Duncan, D. Wuchenich, C. Ogden, G. Chriqui, S. Feng, J. Chun, and S. J. B. Yoo, “Interferometric imaging using Si3N4 photonic integrated circuits for a SPIDER imager,” Opt. Express 26, 12801-12812 (2018).
17. Z. Leihong, Y. Xiao, Z. Dawei, and C. Jian, “Research on multiple-image encryption scheme based on Fourier transform and ghost imaging algorithm,” Curr. Opt. Photonics 2, 315-323 (2018).
18. Y. Zhang, J. Deng, G. Liu, J. Fei, and H. Yang, “Simultaneous estimation of spatial frequency and phase based on an improved component cross-correlation algorithm for structured illumination microscopy,” Curr. Opt. Photonics 4, 317-325 (2020).
19. K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process. 16, 2080-2095 (2007).
20. X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proc. 14th International Conference on Artificial Intelligence and Statistics (Fort Lauderdale, FL, USA, Apr. 11-13, 2011), pp. 315-323.
21. S. W. Zamir, A. Arora, A. Gupta, S. Khan, G. Sun, F. S. Khan, F. Zhu, L. Shao, G.-S. Xia, and X. Bai, “iSAID: a large-scale dataset for instance segmentation in aerial images,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (Virtual Conference, Jun. 19-25, 2019), pp. 28-37.
22. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13, 600-612 (2004).

Article

Article

Curr. Opt. Photon. 2022; 6(3): 260-269

Published online June 25, 2022 https://doi.org/10.3807/COPP.2022.6.3.260

Copyright © Optical Society of Korea.

Image Reconstruction Based on Deep Learning for the SPIDER Optical Interferometric System

Yan Sun1, Chunling Liu2, Hongliu Ma1, Wang Zhang1

1School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130025, China
2Meteorological Service Center, Henan Meteorological Administration, Zhengzhou 450003, China

Correspondence to:wangzhang@jlu.edu.cn, ORCID 0000-0001-9029-1320

Received: December 20, 2021; Revised: March 19, 2022; Accepted: March 31, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Segmented planar imaging detector for electro-optical reconnaissance (SPIDER) is an emerging technology for optical imaging. However, this novel detection approach is faced with degraded imaging quality. In this study, a 6 × 6 planar waveguide is used after each lenslet to expand the field of view. The imaging principles of field-plane waveguide structures are described in detail. The local multiple sampling simulation mode is adopted to process the simulation of the improved imaging system. A novel image-reconstruction algorithm based on deep learning is proposed, which can effectively address the defects in imaging quality that arise during image reconstruction. The proposed algorithm is compared to a conventional algorithm to verify its better reconstruction results. The comparison of different scenarios confirms the suitability of the algorithm to the system in this paper.

Keywords: Deep learning, Image reconstruction, Optical imaging, Optical interferometry, Photonic integrated circuits

I. INTRODUCTION

With the continuous development of space technology and increasing demand, in-orbit space detection technology has become more widely used. Interferometric imaging technology, due to its advantage of rich information content, is widely used in satellite remote sensing, mineral investigation, land measurement, material analysis, and other fields, and thus this technology has become the focus of the field of imaging [1, 2]. With more and more high-precision application requirements, technologies for high-resolution imaging usually have been designed with large apertures, leading to large volume, heavy weight, and high power consumption. The development of small-scale, low-weight, high-resolution space detection technology has become the key to promoting remote sensing.

With the rapid development of photonic integrated circuit (PIC) and interferometric imaging technologies, a small-scale interferometric optical imaging instrument known as the segmented planar imaging detector for electro-optical reconnaissance (SPIDER) was proposed [3]. In the SPIDER system, light from a scene is collected by a lenslet array and coupled into the optical waveguide on a PIC chip to form interferometric baselines. PICs sample the complex visibility of the target, and then the image is generated by processing the complex visibility information [46].

In actual cases, sampling coverage tends to be incomplete, and direct Fourier transformation of frequency-domain information exhibits problems, such as poor imaging quality and artifacts. Furthermore, a single optical waveguide following each lenslet on a PIC only acquires narrow field-of-view information about the object [7]. It is therefore important to expand the field of view to increase the sampling information. Adopting an appropriate reconstruction algorithm to process the output images of the SPIDER introduced above is also essential.

Traditional reconstruction algorithms applied in the field of image reconstruction include the total variational algorithm (TVAL3) [8], the alternate direction method of multiply (ADMM) [9], the approximate messaging algorithm (AMP) and others [10]. However, the traditional reconstruction algorithms still have some problems. On the one hand, due to the iterative nature of the traditional algorithm solving the optimization problem, each iteration requires a large computational expenditure, thus yeliding a long reconstruction time. On the other hand, the traditional algorithms exhibit low reconstruction quality with a low sampling rate. Therefore, the traditional reconstruction algorithms still need to be improved. Along with enhanced computational power, deep learning has once again received widespread attention and application within various fields of computer vision [1114]. Some scholars have introduced deep learning into image and video reconstruction; the primary methods include the superimposed denoising autoencoder (SDA) and the convolutional neural network (CNN) [15]. A CNN is particularly attractive and more powerful, it is able to exploit the spatial correlation present in natural images, and each convolutional layer has many fewer learnable parameters than a fully connected layer.

In this paper, a planar waveguide structure is added to extend the field of view. A local multiple-sampling simulation mode for the SPIDER optical interferometric system is proposed. A novel image-reconstruction algorithm for the SPIDER is presented, which can directly learn a mapping from image-block proxies to image blocks. The multiple-sampling simulation mode inevitably produces artifacts between different fields of views, so a denoising method is introduced to reduce the blocky artifacts. A comparison of the traditional algorithm to the CNN algorithm for image quality confirms the superiority and feasibility of the presented reconstruction algorithm. This paper is organized as follows: Section 2 discusses the basic structure and imaging process of the SPIDER optical interferometric system, and describes the process of planar waveguides used to broaden the view of the SPIDER. Section 3 introduces a novel image-reconstruction framework based on a deep CNN. Section 4 shows simulated image-reconstruction results of SPIDER observations. The conclusion is given in section 5.

II. Structure and Theory

This section describes the structure of the SPIDER optical interferometric system and its imaging process, which includes a discussion of the design of the increased planar waveguides. Moreover, the system design of this study is described.

2.1. The Structure

As shown in Fig. 1, the structure of the SPIDER system puts a linear array of lenslets onto a PIC card, with the PIC cards mounted as radial spokes on a disc [16]. The critical part of the SPIDER is the PIC, as shown in Fig. 2. The PIC integrates various optical waveguide devices, including arrayed waveguide gratings (AWGs), optical phase shifters, multi-mode interferometers (MMIs), and balanced detectors, to realize optical transmission, interference, light separation, photoelectric conversion, and other functions. AWGs are used to disperse broadband light from pairs of lenslets into different spectral channels, followed by 2 × 2 MMIs which combine light from corresponding input optical waveguide arrays after phase adjustment [17]. The balanced detectors receive interferometric information, and the measured complex visibility for each spatial frequency is calculated from the interferometric information.

Figure 1. The structure of the SPIDER system.

Figure 2. The working principle of a PIC.

The waveguide plane consists of multiple waveguides, integrated at the focal plane of the corresponding lenslet. Since light beams from different fields of view will converge at the focal plane of the lens, each single waveguide will guide the optical signal from its assigned field of view. The matching waveguides from the same location of two different focal planes can accomplish the goal of matching the light beams from corresponding field-of-view beams. Therefore, the more waveguides integrated on the focal plane, the larger the field of view obtained. In this paper, 6 × 6 optical waveguide arrays are designed behind each lenslet for extending the field of view, instead of a separate piece of optical waveguide, as shown in Fig. 3.

Figure 3. The schematic of the SPIDER’s operating process.

2.2. The Optical Imaging Process

The SPIDER operating process is shown in Fig. 3, the beam from the target couple into PIC by lenslet array. Light beam transmission, interferometry and imaging are operated in PIC.

First, the lenslet array converges the spatial light into the optical waveguide, which couples the beam into the PIC. The light from different fields of view pass through the lenslet and cover the different position of planar waveguide as shown in Fig. 3. A larger waveguide area is advantageous for receiving more beams. The 6 × 6 planar waveguide deployed in this study receives a more comprehensive field of view. Each piece on the planar waveguide receives light from different fields of view of the target. Since the lenslet array adopts a planar distribution, all lenslet orientations remain consistent. The scenes are consistent for all lenslet. Therefore, the beam accepted by the planar waveguide after a different lenslet is also the same. Observations of the same field of view require the matching of the waveguides at the same corresponding positions after the paired lenslet.

After each pieces of the planar waveguide, the AWG is used to divide the whole band into narrower bands for interference processes. As shown in Fig. 3, the narrow interference band divided by the same field-of-view waveguide block after a baseline pair is coupled into the same MMI. Complex coherence information for a spatial-frequency point can be obtained through the MMI and the balanced detectors. According to the Van Cittert-Zernike principle, the characteristic distribution of the target source can be obtained from the information of complex coherence. Different baseline lengths and different bands constitute different spatial-frequency points, and the same field of view is reconstructed from multiple spatial-frequency points by digital signal processing (DSP). Finally, multiple pieces of FOV information are stitched together to obtain the complete image information for the target.

Furthermore, the coupling efficiency of the planar waveguides is discussed. Coupling efficiency is defined as the ratio of the optical power of the coupled incoming optical waveguide to the average power of the focal plane, which describes the efficiency problem of a beam coupled to an optical waveguide. When the incident light wave can be regarded as a plane wave and the waveguide’s center coincides with the focus, without relative inclination or offset, the coupling efficiency can be expressed as

η=2[1exp(β2)]2β2

β=πRw0λf

where η is the coupling efficiency, β is coupling coefficient, R is the radius of the lenslet, w0 is the mode-field radius of a single-mode waveguide, λ is the wavelength, and f is the focal length of lenslet. According to Eq. (1), in the case with no aberration and no turbulence, the maximum coupling efficiency for a coupling-coefficient value of 1.12 is 81.45%.

Meanwhile, according to the relationship between the coupling efficiency and the field-of-view angle, the coupling efficiency of the system decreases rapidly as the field-of-view angle increases. To ensure the incidence efficiency of the light, the absolute angle between the target beam and the optical axis is limited to the range of 0.5 λ / d. Then the full field of view (FOV) is

FOV=λd

where d is the lenslet’s diameter. Deploying an M × M planar waveguide after each microlensing is equivalent to setting multiple response waveguides, and each part of the field of view is received independently by the waveguide block of the planar waveguide, so that the expanded field of view is

FOV=Mλd

2.3. Design of the Optical System

The parameters of the lenslet are correlated to the coupling efficiency. Lenslet F/# is determined by the following:

F/#=πw02λβ

According to a survey of visible-light near-infrared waveguides, the mode-field radius is generally about 3.5 μm. To maximize the coupling efficiency, the coupling coefficient is 1.12, and the F/# is about 6.54, as calculated from Eq. (5).

Therefore, the SPIDER system’s parameters presented in this paper are shown in Table 1.

TABLE 1. The parameters used for the simulations.

ParameterSymbolValue
Wavelength (nm)λ500–1000
Number of Spectral Segmentsn10
Lenslet Diameter (mm)d5
Longest Baseline (m)Bmax0.5
Number of PIC SpokesP37
Number of Lenslets per PIC SpokeN26
Scene Distance (km)z500
Lenslet Focal Length (mm)f32


The line field of view is

ε=MλZd

where Z is the observational distance. The maximum resolution of the system is obtained at the maximum baseline length and minimum working wavelength, which can be represented as

Rmax=λminZBmax

where λmin is the minimum working wavelength and Bmax is the maximum baseline length. Therefore, when a planar waveguide of dimension 6 × 6 is configured, the line field of view of the system can reach 450 m and the maximum resolution is 0.75 m.

Spatial-frequency coverage is affected by the lenslet pairing method, which has a direct influence on the image quality. The pairing method of the lenslets can be described as follows. Supposing each interferometric arm consists of N lenslets, the lenslet pairing follows a symmetrical scheme: (1, N), (2, N − 1), (3, N − 2) … (N / 2, N / 2 + 1), as shown in Fig. 4.

Figure 4. The lenslet pairing method for a PIC.

III. METHODS

This section presents the overall simulation of the imaging process of the SPIDER and the image-reconstruction method based on deep learning for the SPIDER optical interferometric system suggested in this paper. For the image-reconstruction algorithm, the design and training process of the reconstruction model are analyzed in detail.

3.1. Imaging Process Simulation

According to the design of the 6 × 6 optical waveguide array behind each lenslet, a local multiple-sampling simulation mode is used for this imaging system. The scene is divided into nonoverlapped narrow scene blocks using 6 × 6 optical waveguide arrays. However, the simulation cannot realize the simultaneous sampling of multiple narrow scenes and the PIC’s internal processes. The planar waveguides can be considered to be composed of multiple waveguides that sample different fields of view. Therefore, the simulation enables the system to sample the different local scenes of the target and process their frequency-domain information separately, and the simulation of the 6 × 6 planar waveguide is achieved through multiple sampling. After the process of sampling and obtaining the images of the local scenes, each narrow scene image is reconstructed by feeding in the pretrained reconstruction model. The reconstructed blocks of all of the local narrow scenes are arranged to form an intermediate reconstruction image of the target. The blocky artifacts of the intermediate reconstruction image are removed by the denoising algorithm to obtain the final target image.

The sampling and imaging process of the local fields of view is described in detail, as shown in Fig. 5. According to the Van Cittert-Zernike principle, the two-dimensional intensity distribution of the target can be obtained by inverse Fourier transform to the phase and amplitude of target spatial spectral points. The imaging process can be described by the following:

Figure 5. The sampling and imaging process.

f*x,y=F1{F[f(x,y)]Sample(u,v)}

where f *(x, y) is the image obtained directly by the reverse Fourier transformation of the spectrum image after sampling, f (x, y) is the local field of view of the target, F and F−1 are respectively the process of Fourier transformation and reverse Fourier transformation, and Sample(u, v) is the frequency coverage of the interference arrays. Therefore, the frequency coverage of the interference array masks the spectrum image obtaining by the Fourier transform of the local field of view, and then the reverse Fourier transform is used for the imaging of target spatial spectral points. The frequency coverage of the interference arrays is related to the baseline and wavelength; since the system is divided into ten spectral segments, frequency coverage of the interference arrays of the same baseline consists of a superposition of the central wavelengths for each spectral segment. Meanwhile, the overall spectrum coverage is composed of all of the baselines. Moreover, the masking process sets the value of sample location to target spatial spectral and the value of other location set to 0.

The quality of the local fields of view of the target for direct imaging is poor, due to incomplete sampling [18]. Therefore, images from direct imaging are input into the pretrained reconstruction model to reconstruct images. The intermediate reconstruction image is acquired by appropriately arranging the reconstruction-image blocks of all the narrow scenes.

Due to the multiple-sampling processing, blocky artifacts exist in the intermediate reconstruction image. The block-matching and 3D filtering (BM3D) algorithm is chosen in the reconstruction framework to remove the artifacts and obtain the final reconstruction image, due to the superior compromise of the BM3D algorithm between time complexity and reconstruction quality [19].

3.2. Image-reconstruction Algorithm

3.2.1. Designed Reconstruction Model

Here we describe the architecture of the deep CNN, and by comparing the numbers of different channels and layers, the optimal model structure is selected. The architecture of the CNN can be described as follows. A structure of a fully convolutional layer is employed; apart from the final convolutional layer, all of the other layers adopt the ReLU function (discussed later). The first convolutional layer is used to expand the channels, and the latter convolution layers gradually reduce the number of channels. Each feature map produced by the convolutional layers is equal in image block size. The last convolutional layer uses the kernel of size 3 × 3 and generates a single feature map, which in the case of the last layer is also the output of the network. We used appropriate zero padding to keep the feature-map size constant in all layers.

The number of layers of the model has a large influence on reconstructed-image quality. For light models the training speed is faster, and each epoch takes less time. Different models show different loss at convergence, and have corresponding effects on the quality of model training. Therefore, the model selection needs to be considered in terms of the convergence rate as well as the loss at convergence. Based on the same model structure, we discuss models extending to different channel numbers. When the number of channels increases, the number of layers increases accordingly. We compare the convergence of models extending to 64, 128, and 256 channels with a partial training set. The convergence process for the different models is shown in Fig. 6.

Figure 6. Loss variation for models with different numbers of layers.

Compared to the other models, model 1 with 64 channels converges faster. The final convergence loss of model 1 is poor, as shown in Table 2, and the subsequent epoch loss changes less. Too many layers can cause slow and unstable convergence. For model 3 with 256 channels, each epoch takes a longer time and converges more slowly. The model with 128 channels (model 2) is optimized compared to the others. Model 2 converges better than the smaller model and faster than the larger model. Meanwhile, the loss of model 2 at convergence is optimal. Therefore, model 2 is selected as the reconstruction model, and the reconstruction process is shown in Fig. 7.

TABLE 2. Comparison of different models.

Model      Maximum Channels    Number of Layers    Epoch of Convergence    Time in Total (min)    Loss at Convergence
Model 1    64                  8                   78                      69.2                   0.036
Model 2    128                 10                  84                      102.9                  0.026
Model 3    256                 12                  125                     163.2                  0.038


Figure 7. The image-reconstruction framework.

As shown in Fig. 7, the input to the network is an image-block proxy x0. The kth feature map of the first convolutional layer then takes x0 as its input, and its output can be represented as

x_{c1}^{k} = \mathrm{ReLU}\left( W_{1}^{k} * x_{0} + B_{1}^{k} \right)

where W_1^k and B_1^k are the filter and bias corresponding to the kth feature map of the first convolutional layer, * denotes convolution, and ReLU(x) = max(0, x) [20]. The feature maps of the other convolutional layers are obtained in a similar manner, except for the final convolutional layer. Although the filter shapes and biases may differ in the other layers of the network, the principle for those layers is the same as for the first layer.

3.2.2. Training Reconstruction Model

In this section, we introduce the training process of the image-reconstruction model based on deep learning. The network architecture used is shown in Fig. 7.

The images used to train the reconstruction model were drawn from the iSAID dataset, which consists of 1411 remote-sensing images [21]. Each image ranges in size from 800 × 800 to 20,000 × 20,000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. The images were collected from different sensors and platforms, including Google Earth and the GF-2 and JL-1 satellites. The ground sample distance (GSD) of GF-2 is 1 m, its wavelength range is 500–800 nm, and its orbit altitude is 656 km. The GSD of JL-1 is 0.72 m, its wavelength range is 450–900 nm, and its orbit altitude is 645 km. The images from Google Earth come from different platforms; the GSD ranges from 0.096 to 4.496 m (mainly distributed within 0.1–0.2 m), and the wavelength range is the visible-light band. According to these dataset sources, our system parameters were chosen to be as close as possible to the parameters of the corresponding satellites. To facilitate training of the reconstruction model, the images were converted to grayscale, retaining only the luminance component. The conversion from a color image to grayscale combines the RGB channel values according to the formula

Grey = 0.299R + 0.587G + 0.114B

where Grey is the grayscale value and R, G, and B are the three channel values of the color image.
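This conversion can be expressed in a few lines of NumPy; the function name is an illustrative choice.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB image to grayscale using
    Grey = 0.299 R + 0.587 G + 0.114 B."""
    rgb = rgb.astype(np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
```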

To train the reconstruction model, the training set is obtained by repeated local sampling and imaging of the target. According to the 6 × 6 planar waveguide structure, 36 samplings are needed to cover the different positions of the target, and each sampling goes through the system mask and the frequency-domain conversion process. The target images are divided into image blocks of the same size as the training inputs to form the labels of the training set. Thus, an input-label pair in the training set can be represented as (x0, x).
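The construction of the input-label pairs might be sketched as follows, assuming a non-overlapping tiling of each grayscale target image and a fixed block size; both are illustrative assumptions rather than the exact sampling procedure of the system.

```python
import numpy as np

def make_training_pairs(target, sample_mask, block_size=64):
    """Build (x0, x) input-label pairs from one grayscale target image.

    target      : 2-D grayscale image.
    sample_mask : frequency-coverage mask of shape (block_size, block_size).
    Each label x is a ground-truth block of the target; the matching input
    x0 is the same block passed through the system mask and the
    frequency-domain conversion, as in the masking sketch above.
    """
    pairs = []
    h, w = target.shape
    for i in range(0, h - block_size + 1, block_size):
        for j in range(0, w - block_size + 1, block_size):
            x = target[i:i + block_size, j:j + block_size]
            spectrum = np.fft.fftshift(np.fft.fft2(x))
            x0 = np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum * sample_mask)))
            pairs.append((x0, x))
    return pairs
```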

This section presents the loss function for training the reconstruction model. The loss function evaluates the difference between the ground-truth images and the prediction images generated by the deep-learning network. The training of the reconstruction model is driven by the error between the label and the reconstructed image. The mean square error (MSE) is chosen as the loss function in this paper. The loss function can be represented as

L(\{\Omega\}) = \frac{1}{K} \sum_{i=1}^{K} \left\| f(y_i, \{\Omega\}) - x_i \right\|^2

where {Ω} denotes the set of network parameters, K is the total number of image blocks in the training set, x_i is the ith label patch, and f(y_i, {Ω}) is the network output for the ith input patch y_i. The loss function is minimized using stochastic gradient descent with standard back propagation. For gradient descent, the batch size is set to 512 for all networks, and the learning rate is set to 0.0001.
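A training loop consistent with these settings could look like the following PyTorch sketch. The number of epochs and the data-loading details are assumptions; the MSE loss, the stochastic-gradient-descent optimizer, the batch size of 512, and the learning rate of 0.0001 follow the text.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_model(model, x0_blocks, x_blocks, epochs=100, device="cuda"):
    """Train the reconstruction model with the MSE loss defined above.

    x0_blocks, x_blocks : float tensors of shape (K, 1, H, W) holding the
                          degraded input blocks and their ground-truth labels.
    """
    loader = DataLoader(TensorDataset(x0_blocks, x_blocks), batch_size=512, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)   # stochastic gradient descent
    criterion = nn.MSELoss()                                   # mean square error loss
    model.to(device).train()
    for _ in range(epochs):
        for x0, x in loader:
            x0, x = x0.to(device), x.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x0), x)   # error between network output and label
            loss.backward()                  # standard back propagation
            optimizer.step()
    return model
```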

IV. RESULTS

In this section, we discuss the results of the image-reconstruction simulations for the SPIDER system. Moreover, the improved CNN is compared to a traditional image-reconstruction algorithm, showing that the improved CNN is more suitable for image reconstruction in this imaging system.

The peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are used as the evaluation metrics for image reconstruction, and the performance of the reconstruction framework is also assessed in terms of time complexity. For a test image Y and its original image X, the PSNR and SSIM metrics are defined as [22]:

\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^{2}}{\mathrm{MSE}}

\mathrm{SSIM} = \frac{\left( 2\mu_X \mu_Y + C_1 \right)\left( 2\sigma_{XY} + C_2 \right)}{\left( \mu_X^2 + \mu_Y^2 + C_1 \right)\left( \sigma_X^2 + \sigma_Y^2 + C_2 \right)}

where MAX is the maximum possible pixel value of the image, MSE is the mean square error between X and Y, μ_X and μ_Y are the means of X and Y, σ_X^2 and σ_Y^2 are the variances of X and Y, σ_XY is the covariance of X and Y, and C_1 and C_2 are two constants used to stabilize the division when the denominator is weak. The larger the PSNR and the SSIM, the better the quality of the acquired image.
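For reference, the two metrics can be computed as follows; the use of scikit-image for SSIM is an assumption of this sketch (its default constants play the role of C_1 and C_2 above).

```python
import numpy as np
from skimage.metrics import structural_similarity  # scikit-image, assumed available

def psnr(x, y, max_val=255.0):
    """PSNR = 10 log10(MAX^2 / MSE) between original image x and test image y."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim(x, y, max_val=255.0):
    """SSIM between original image x and test image y."""
    return structural_similarity(x, y, data_range=max_val)
```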

The computer used in these simulated experiments is an x64-compatible desktop running 64-bit Windows 10, with an Intel Xeon Silver 4210R processor, 64 GB of memory, and an NVIDIA GTX 1080 Ti graphics card with 11 GB of memory.

For the simulated experiments, we choose a total of 137 grayscale images covering six scenarios from the iSAID dataset as the test set. The scenarios include ports, airports, transportation hubs, open areas, parking lots, and residential areas, each of which contains more than 20 images. The test-set images are not included in the training set, and the background area, the number of objects, and the size of each object differ between scenarios. Reconstruction across these different scenarios is used to illustrate the suitability of the reconstruction model for the imaging system. To further verify the feasibility of our algorithm, it is also compared to the traditional reconstruction algorithm TVAL3 and the CNN-based ReconNet network, in terms of both reconstructed-image quality and running speed. For the ReconNet network and our improved CNN, the measurement images are generated by model simulation, the training-set images are sampled and divided into blocks according to the proposed optical-waveguide array, the reconstruction models are trained on this set, and the test images are reconstructed with the corresponding models. For the TVAL3 algorithm, the test images are sampled and divided into blocks, each image block is reconstructed, and the reconstructed blocks are combined into intermediate images.

For the intermediate images from the learning-based algorithms, we use the BM3D denoiser to eliminate interblock artifacts and obtain the final reconstructed images. Figure 8 shows the reconstruction results of the various algorithms on representative images from the different scenarios in the test set; the intermediate images are also shown. The PSNR and SSIM values calculated for the different scenarios and reconstruction algorithms are listed in Table 3, and the average running times of the different algorithms are compared in Table 4.

TABLE 3. Average imaging quality of the test set under different algorithms.

Scenario       Algorithm   PSNR      SSIM
Port           Improved    17.6567   0.6471
               BM3D        17.2318   0.6018
               ReconNet    16.4819   0.6010
               TVAL3       14.6388   0.3674
Airport        Improved    15.8698   0.6478
               BM3D        15.6813   0.6198
               ReconNet    14.1134   0.5093
               TVAL3       10.8388   0.3039
Traffic Hub    Improved    15.8111   0.5743
               BM3D        15.3484   0.4787
               ReconNet    15.4325   0.4493
               TVAL3       13.8840   0.2005
Open Space     Improved    17.7446   0.5706
               BM3D        17.4190   0.5125
               ReconNet    17.4009   0.5130
               TVAL3       16.0246   0.3085
Depot          Improved    16.3730   0.5218
               BM3D        15.8724   0.4556
               ReconNet    15.4824   0.4005
               TVAL3       14.3906   0.2533
Uptown         Improved    16.4682   0.6043
               BM3D        16.1081   0.5283
               ReconNet    15.8715   0.5142
               TVAL3       13.8552   0.2096


TABLE 4. Average running times of different algorithms.

Algorithm   Training (min)   Reconstruction (s)   Total (min)
Improved    1578             0.1939               1578.003
ReconNet    1216             0.1343               1216.002
TVAL3       0                25.2400              50.337


Figure 8. Restored images from different algorithms under different scenarios: (a) the original image, (b) improved convolutional neural network (CNN), (c) image after BM3D, (d) ReconNet, and (e) total variational algorithm (TVAL3).

Image reconstruction was carried out using the novel reconstruction algorithm. According to Table 3, the improved algorithm outperforms the other algorithms in terms of imaging quality. The reconstructed-image quality varies across scenarios; for detail-rich scenes such as uptown, the deep-learning-based algorithms have a significant advantage, as shown in Table 3. As can be seen from Fig. 8, the images restored using the learning-based algorithms show better noise removal than those from the traditional image-reconstruction algorithm. Meanwhile, the improved CNN reconstruction algorithm further improves reconstructed-image quality compared to ReconNet. The BM3D algorithm smoothes the image as a whole, which lowers the image-quality indices somewhat; even so, the image quality after BM3D smoothing is still better than that of ReconNet, as shown in Table 3. The improved CNN also performs well across the different scenarios, which shows that the reconstruction algorithm is well suited to the system proposed above.

Comparing the overall reconstruction time, the learning-based algorithms take longer than the traditional algorithm, because of the time spent on training, as shown in Table 4. However, the models trained by the learning-based algorithms can be applied to various scenarios without retraining, and the reconstruction step alone is far faster than the traditional algorithm. Therefore, for larger image-reconstruction tasks, the learning-based methods are relatively time-saving. In practical observation, a large amount of data usually needs to be processed, and the computing time of the neural-network algorithms is then greatly reduced compared to the traditional algorithm. For the improved CNN, the time consumption is higher than for ReconNet, because a more complex network structure is used to improve reconstructed-image quality.

V. CONCLUSION

In this study, a 6 × 6 optical waveguide array was used after each lenslet to expand the imaging field of view. A novel deep-learning-based algorithm for the blockwise processing of SPIDER image reconstruction was proposed, and the BM3D algorithm was applied in the blockwise processing to denoise the intermediate reconstruction images. Based on deep-learning theory, an image-reconstruction framework for the SPIDER system was established. Simulation results show that the proposed reconstruction algorithm achieves a significant increase in reconstructed-image quality across different scenarios, which shows that the algorithm is well suited to the system proposed above. Owing to their broad applicability, learning-based algorithms also achieve high speed when processing large amounts of data.

Although the quality of image reconstruction for the SPIDER system is greatly increased by the approach presented in this paper, optical interferometric imaging still suffers from shortcomings such as blocky artifacts and degraded image quality. In future work, deep-learning-based image-reconstruction algorithms that further increase reconstruction efficiency will continue to be investigated.

DISCLOSURES

The authors declare no conflicts of interest.

DATA AVAILABILITY

Data underlying the results presented in this paper are not publicly available at the time of publication, but may be obtained from the authors upon reasonable request.

ACKNOWLEDGMENT

The authors would like to thank the Editor in Chief, the Associate Editor, and the reviewers for their insightful comments and suggestions.

FUNDING

The author(s) received no financial support for the research, authorship, and/or publication of this article.


TABLE 1. The parameters used for the simulations.

Parameter                          Symbol   Value
Wavelength (nm)                    λ        500–1000
Number of Spectral Segments        n        10
Lenslet Diameter (mm)              d        5
Longest Baseline (m)               Bmax     0.5
Number of PIC Spokes               P        37
Number of Lenslets per PIC Spoke   N        26
Scene Distance (km)                z        500
Lenslet Focal Length (mm)          f        32


References

  1. W. Hasbi, Kamirul, M. Mukhayadi, and U. Renner, “The impact of space-based AIS antenna orientation on in-orbit AIS detection performance,” Appl. Sci. 9, 3319 (2019).
  2. C. Saunders, D. Lobb, M. Sweeting, and Y. Gao, “Building large telescopes in orbit using small satellites,” Acta Astronaut. 141, 183-195 (2017).
  3. R. P. Scott, T. Su, C. Ogden, S. T. Thurman, R. L. Kendrick, A. Duncan, R. Yu, and S. J. B. Yoo, “Demonstration of a photonic integrated circuit for multi-baseline interferometric imaging,” in Proc. IEEE Photonics Conference (San Diego, CA, USA, Oct. 12-16, 2014), pp. 1-2.
  4. G.-M. Lv, Q. Li, Y.-T. Chen, H.-J. Feng, and J. Mu, “An improved scheme and numerical simulation of segmented planar imaging detector for electro-optical reconnaissance,” Opt. Rev. 26, 664-675 (2019).
  5. W. Gao, Y. Yuan, X. Wang, L. Ma, Z. Zhao, and H. Yuan, "Quantitative analysis and optimization design of segmented planar integrated optical imaging system based on inhomogeneous multistage sampling lens array," Opt. Express 29, 11869-11884 (2021).
  6. H. Hu, C. Liu, Y. Zhang, Q. Feng, and S. Liu, “Optimal design of segmented planar imaging for dense azimuthal sampling lens array,” Opt. Express 29, 24300-24314 (2021).
  7. O. Guyon, “Wide field interferometric imaging with single-mode fibers,” Astron. Astrophys. 387, 366-378 (2002).
  8. C. Li, W. Yin, H. Jiang, and Y. Zhang, “An efficient augmented Lagrangian method with applications to total variation minimization,” Comput. Optim. Appl. 56, 507-530 (2013).
  9. L. Pratley, J. D. McEwen, M. d’Avezac, R. E. Carrillo, A. Onose, and Y. Wiaux, “Robust sparse image reconstruction of radio interferometric observations with PURIFY,” Mon. Not. R. Astron. Soc. 473, 1038-1058 (2018).
  10. C. A. Metzler, A. Maleki, and R. G. Baraniuk, “From denoising to compressed sensing,” IEEE Trans. Inform. Theory 62, 5117-5144 (2014).
  11. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition-CVPR (Boston, MA, USA, Jun. 8-10, 2015), pp. 3431-3440.
  12. J. Xue, Y.-Q. Zhao, Y. Bu, W. Liao, J. C.-W. Chan, and W. Philips, “Spatial-spectral structured sparse low-rank representation for hyperspectral image super-resolution,” IEEE Trans. Image Process 30, 3084-3097 (2021).
  13. D. Chang, Y. Ding, J. Xie, A. K. Bhunia, X. Li, Z. Ma, M. Wu, J. Guo, and Y. Z. Song, “The devil is in the channels: mutual-channel loss for fine-grained image classification,” IEEE Trans. Image Process 29, 4683-4695 (2020).
  14. Z. Ren, W. Luo, J. Yan, W. Liao, X. Yang, A. Yuille, and H. Zha, “STFlow: self-taught optical flow estimation using pseudo labels,” IEEE Trans. Image Process 29, 9113-9124 (2020).
  15. K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “ReconNet: non-iterative reconstruction of images from compressively sensed measurements,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition-CVPR (Las Vegas, USA, Jun. 26-Jul. 1, 2016), pp. 449-458.
  16. T. Su, G. Liu, K. E. Badham, S. T. Thurman, R. L. Kendrick, A. Duncan, D. Wuchenich, C. Ogden, G. Chriqui, S. Feng, J. Chun, and S. J. B. Yoo, “Interferometric imaging using Si3N4 photonic integrated circuits for a SPIDER imager,” Opt. Express 26, 12801-12812 (2018).
  17. Z. Leihong, Y. Xiao, Z. Dawei, and C. Jian, “Research on multiple-image encryption scheme based on Fourier transform and ghost imaging algorithm,” Curr. Opt. Photonics 2, 315-323 (2018).
  18. Y. Zhang, J. Deng, G. Liu, J. Fei, and H. Yang, “Simultaneous estimation of spatial frequency and phase based on an improved component cross-correlation algorithm for structured illumination microscopy,” Curr. Opt. Photonics 4, 317-325 (2020).
  19. K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process. 16, 2080-2095 (2007).
  20. X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. 14th International Conference on Artificial Intelligence and Statistics (Fort Lauderdale, FL, USA, Apr. 11-13, 2011), pp. 315-323.
  21. S. W. Zamir, A. Arora, A. Gupta, S. Khan, G. Sun, F. S. Khan, F. Zhu, L. Shao, G.-S. Xia, and X. Bai, “iSAID: a large-scale dataset for instance segmentation in aerial images,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (Virtual Conference, Jun. 19-25, 2019), pp. 28-37.
  22. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13, 600-612 (2004).