
Research Paper

Curr. Opt. Photon. 2024; 8(4): 391-398

Published online August 25, 2024; https://doi.org/10.3807/COPP.2024.8.4.391

Copyright © Optical Society of Korea.

Image Reconstruction Method for Photonic Integrated Interferometric Imaging Based on Deep Learning

Qianchen Xu1, Weijie Chang2, Feng Huang2, Wang Zhang1

1School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130025, China
2College of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350108, China

Correspondence to: *huangf@fzu.edu.cn, ORCID 0000-0003-4652-4312
**wangzhang@jlu.edu.cn, ORCID 0000-0001-9029-1320

Received: April 4, 2024; Revised: July 15, 2024; Accepted: July 22, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

An image reconstruction algorithm is vital to the image quality of a photonic integrated interferometric imaging (PIII) system. However, existing image reconstruction algorithms have limitations that often degrade the reconstructed images. In this paper, a novel image reconstruction algorithm based on deep learning is proposed. First, the principle of optical signal transmission through the PIII system is investigated, and a dataset suitable for image reconstruction in the PIII system is constructed. Key components such as the network architecture and the loss function are designed and compared to address image blurring and the influence of noise. Comparisons with other algorithms verify that the proposed algorithm achieves good reconstruction results both qualitatively and quantitatively.

Keywords: Deep learning, Image reconstruction, Optical imaging, Optical interferometry, Photonic integrated circuits

OCIS codes: (100.3010) Image reconstruction techniques; (100.3020) Image reconstruction-restoration; (110.3175) Interferometric imaging; (150.1135) Algorithms

I. INTRODUCTION

A photonic integrated circuit (PIC)-based interferometric imaging system has been developed by the Lockheed Martin Center and UC Davis for a new generation of optical imaging systems [1]. The photonic integrated interferometric imaging (PIII) system has many outstanding features, such as light weight, small size, high resolution, and low power consumption [1, 2]. In contrast with traditional optical imaging systems, the PIII system incorporates photonic integrated devices. By modulating and processing the optical signals through a PIC card, the size of the optical system can be greatly reduced, and the system can be highly integrated [2].

Sampling coverage is incomplete due to the structure of the PIII system, so the image directly restored by the system suffers from problems such as blurring and noise. Some classical algorithms have been applied to image reconstruction, such as the CLEAN algorithm (Högbom CLEAN) and the total variation algorithm based on compressed sensing (TVAL3) [3, 4]. Although traditional image reconstruction algorithms can improve image quality, they need many iterations to reconstruct images, which leads to long reconstruction times, and they still have other limitations. With rapid developments in computer science, image processing algorithms based on deep learning, such as super-resolution and denoising, have shown strong advantages. In this paper, a novel image reconstruction algorithm for the PIII system is proposed. The algorithm uses a novel architecture that balances the restoration of image details against denoising, and we design a loss function to ensure the smoothness of the reconstructed images. A comparison of image quality between traditional algorithms and the proposed algorithm confirms the superiority of the proposed reconstruction algorithm. The structure of the PIII system is shown in Fig. 1.

Figure 1. The structure of the PIII system.

II. FUNDAMENTALS OF THE PIII SYSTEM

2.1. Imaging Principle

The imaging principle of the PIII system is based on the Van Cittert–Zernike theorem. The PIII system uses sub-aperture interferometry of light emitted from incoherent target sources. The measured interference fringes can be used to extract the amplitude and phase information of the complex visibility [5]. Due to the constraints imposed by the arrangement of lenslets in the structure, only a sparse sampling of spatial frequencies is acquired. As a result, the image restored directly in this way suffers from low resolution and missing content [6].

Spatial frequency has a linear relationship with the baselines [5-7]:

$$u = \frac{\Delta x}{z\bar{\lambda}}, \qquad v = \frac{\Delta y}{z\bar{\lambda}}, \qquad f = \frac{B}{z\bar{\lambda}},$$

$$B = \sqrt{\Delta x^{2} + \Delta y^{2}}, \qquad f = \sqrt{u^{2} + v^{2}},$$

where Δx and Δy represent the separations between the paired apertures along the two axes, λ̄ represents the central wavelength, z represents the distance from the target to the lenslet array, B represents the length of the baseline, and f represents the spatial frequency.

The distance between the paired lenslets equals the length of the baseline B, which determines the resolution of the system. The longer the baseline B, the higher the spatial frequency f that is collected, and the more detailed information about the target can be obtained. Paired lenslets forming short baselines collect lower spatial frequencies, which capture the overall contour of the target [8, 9].
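
For illustration, the relations above can be evaluated directly. The following Python sketch uses hypothetical lenslet separations and a hypothetical central wavelength; none of the numerical values are taken from the paper.

```python
import numpy as np

# Illustrative evaluation of the baseline / spatial-frequency relations above.
# The separations, distance, and wavelength are hypothetical example values.
dx, dy = 0.3, 0.4          # aperture separations along the two axes (m), assumed
z = 100e3                  # observation distance (m), cf. Table 1
wavelength = 1.2e-6        # central wavelength lambda-bar (m), assumed

u = dx / (z * wavelength)  # spatial frequency component u
v = dy / (z * wavelength)  # spatial frequency component v
B = np.hypot(dx, dy)       # baseline length B = sqrt(dx^2 + dy^2)
f = B / (z * wavelength)   # spatial frequency f, equal to sqrt(u^2 + v^2)

assert np.isclose(f, np.hypot(u, v))
print(f"B = {B:.3f} m, f = {f:.4e}")
```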

2.2. Functions and Features of the System

Figure 2 shows the function and pairing method of the PIII system. The lenslets located at the top of the system collect light from the observed scene. A pair of lenslets on the same interference arm constitutes an interference baseline. The light signal is then coupled into the optical waveguides, and the waveguide array collects light from different fields of view. A silicon-based photonic integrated circuit (PIC) chip is used for signal transmission [9]. The PIC integrates different optical devices, including an arrayed waveguide grating (AWG), a phase modulator, and a 90° optical hybrid. The AWG decomposes a wide spectral band into multiple narrow spectral bands. The phase modulator adjusts the phase of the light signals in the two input waveguides to ensure the coherence of the beams [10]. A 90° hybrid mixer combines the two input light signals for interference.

Figure 2. Schematic diagram of initial SPIDER imaging system.

The balanced detector denoises and differentially amplifies the I/Q signals output by the 90° hybrid mixer. Then, the complex coherence coefficient of the interference fringes is calculated. According to the Van Cittert–Zernike theorem, the mutual coherence intensity is equivalent to the complex amplitude of the interference fringes. By applying an inverse Fourier transform to the complex amplitude, the brightness distribution of the observed target can be restored [11-15]. This is the theoretical basis of the PIII system.

2.3. Design of the PIII System

The wavelength range will affect the imaging resolution of the PIII system. The relationship between resolution, wavelength, and the maximum baseline of the system can be represented as

$$r = \frac{\lambda z}{B},$$

where r represents the resolution of the system, z is the observation distance, λ is the wavelength, and B is the maximum baseline.
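
As a brief numerical illustration of this relation, the sketch below assumes a maximum baseline of 0.5 m (a hypothetical value, since the maximum baseline is not listed in Table 1) together with the observation distance from Table 1.

```python
# Minimal sketch of the resolution relation r = lambda * z / B.
z = 100e3            # observation distance (m), cf. Table 1
wavelength = 1.2e-6  # wavelength (m), within the 800-1,600 nm range
B_max = 0.5          # assumed maximum baseline (m), illustrative only

r = wavelength * z / B_max
print(f"achievable resolution r = {r:.3f} m at z = {z / 1e3:.0f} km")
```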

To simulate the PIII system, necessary parameters need to be configured. Table 1 lists the parameters required for the imaging simulation of the PIII system.

TABLE 1. System parameters for simulation.

Parameters                    | Value
Wavelength Range (nm)         | 800–1,600
Number of Spectral Segments   | 8
Observation Distance (km)     | 100
Lenslet Size (mm)             | 6
Number of Interference Arms   | 35
Number of Single-arm Lenslets | 38


Images with strong contrast can be generated in the short-wave infrared band. The band is suitable for long-distance observation and has strong penetration power, which is consistent with the use scenario of the PIII system. Therefore, the wavelength range should be set within the short-wave infrared band. For components in a silicon-based photonic chip, the operating loss is lower around the center wavelength of 1,550 nm, which enhances the system’s imaging performance. Considering the practical operating conditions of the system and the optimal operating wavelength of silicon photonic devices, the wavelength range is ultimately defined as 800–1,600 nm [10].

III. METHODS

Based on the previous analysis of the PIII system, the arrangement of the lenslets determines the acquisition of spatial frequency information, that is, the u-v coverage.

This paper provides a simulation of the imaging process of the PIII system and a novel image reconstruction method for it based on deep learning. The direct image restoration method, the design of the model, and the training process are described and analyzed in detail.

3.1. Imaging Simulation

The u-v coverage describes the acquisition of spatial frequency information by the integrated interference system. In the formed u-v coverage, low-frequency points are distributed in the inner region, and the frequency gradually increases from the center outward. The system collects light signals into planar waveguides through the lenslets. In interferometry theory, two paired apertures produce interference fringes in the image plane. The system processes the optical signals and calculates the amplitude and phase information extracted from the interference fringes, so each point in the u-v coverage collected by paired lenslets contains both amplitude and phase information. The image can then be restored by an inverse Fourier transform under the limited u-v coverage.

The imaging process can be described as

$$l(x, y) = \mathcal{F}^{-1}\left\{\mathcal{F}\left[f(x, y)\right] \cdot S(u, v)\right\},$$

where l(x, y) is the image directly restored by the inverse Fourier transform, F and F⁻¹ represent the Fourier transform and inverse Fourier transform, respectively, f(x, y) is the field of view captured by the system, and S(u, v) represents the u-v coverage associated with the lenslet array. The u-v coverage of the lenslet array can be considered a mask that is applied to the spectrum formed by the Fourier transform of the observed field; the restored image is then obtained by applying the inverse Fourier transform to the retained spectral information. The AWG decomposes the input wide-spectrum optical signal into multiple narrow-spectrum signals, so the number of spectral segments equals the number of output channels of the AWG. The central wavelength of each spectral segment is used to calculate the frequency coverage.
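
As an illustrative sketch of this imaging equation (not the authors' simulation code), the NumPy snippet below masks the Fourier spectrum of a scene with a u-v coverage array and applies the inverse transform; the toy scene and mask are placeholders.

```python
import numpy as np

def simulate_direct_restoration(scene: np.ndarray, uv_mask: np.ndarray) -> np.ndarray:
    """Sketch of l(x, y) = F^-1{ F[f(x, y)] . S(u, v) } with a binary u-v mask."""
    spectrum = np.fft.fftshift(np.fft.fft2(scene))       # F[f(x, y)], zero frequency centered
    sampled = spectrum * uv_mask                         # apply the u-v coverage S(u, v)
    restored = np.fft.ifft2(np.fft.ifftshift(sampled))   # inverse transform of the masked spectrum
    return np.abs(restored)                              # directly restored (blurred) image

# Toy usage: a random scene and a sparse, roughly circular u-v mask.
scene = np.random.rand(160, 160)
yy, xx = np.mgrid[-80:80, -80:80]
uv_mask = (np.hypot(xx, yy) < 40) * (np.random.rand(160, 160) < 0.3)
blurred = simulate_direct_restoration(scene, uv_mask)
```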

In the u-v coverage, each point reflects the complex coherence information of the original observed target. The PIII system relies on lenslets to collect the light. Compared to a single large-aperture optical telescope, the PIII system has limited capacity for collecting light signals. Due to the incomplete sampling of spatial frequency information, the quality of the directly restored image is poor. Additionally, noise interference during imaging further degrades the image clarity and resolution. Therefore, the directly restored blurry images need to be processed by using an image reconstruction method to obtain high-quality images. The directly restored images are input into a pretrained model to reconstruct images.

3.2. Proposed Model

CNN architectures have proved applicable to image reconstruction, but a purely convolutional structure performs only limited feature learning and cannot resolve problems such as noise and over-smoothing. Although a large convolution kernel can capture more information, it greatly increases the computational burden; in pursuit of a lightweight model, many papers show that stacking small convolution kernels over multiple layers can replace a large kernel [16]. A residual network structure is highly modular, with repeated building blocks stacked together, and by enabling the training of deeper networks, ResNet significantly increases the representational power of the architecture [17]. Due to the incomplete sampling of the lenslet array and the processing in the PIC, the reconstructed image is influenced by optical components and environmental factors.

The new architecture can be divided into three parts: a coarse-level network, a finer-level network, and a fusion learning network.

3.2.1. Coarse-level Network

A stack of convolutional layers is employed in the coarse-level network. The first convolutional layer uses a 7 × 7 filter; a large kernel provides a larger receptive field, enabling the network to capture broader context. The second layer uses a 5 × 5 filter to further learn complex image features.
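
A minimal PyTorch sketch of such a coarse-level network is given below; only the 7 × 7 and 5 × 5 kernel sizes come from the description above, while the channel width and ReLU activations are assumptions.

```python
import torch.nn as nn

class CoarseLevelNet(nn.Module):
    """Sketch of the coarse-level network: a 7 x 7 then a 5 x 5 convolution."""
    def __init__(self, in_channels: int = 1, features: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, features, kernel_size=7, padding=3),  # large receptive field
            nn.ReLU(inplace=True),                                       # activation (assumed)
            nn.Conv2d(features, features, kernel_size=5, padding=2),     # further feature learning
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)
```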

3.2.2. Finer-level Network

The finer-level network consists of two parts. At this stage, the coarse information begins to be refined. The residual network structure is modified to fulfill the lightweight requirements of the algorithm [18]: the dropout layer leads to better generalization and robustness of the network, while batch normalization layers are removed to enhance training speed. The modified residual block is shown in Fig. 3.

Figure 3. Modified residual block (MRB).
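
The sketch below shows one plausible form of the MRB; only the use of a dropout layer, the removal of batch normalization, and the residual (skip) connection are taken from the description above, while the kernel size, channel count, and dropout rate are assumptions.

```python
import torch.nn as nn

class ModifiedResidualBlock(nn.Module):
    """Sketch of the modified residual block (MRB) without batch normalization."""
    def __init__(self, channels: int = 64, p_drop: float = 0.1):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.dropout = nn.Dropout2d(p_drop)  # improves generalization and robustness
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.dropout(out)
        out = self.conv2(out)
        return self.relu(out + x)            # residual (skip) connection
```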

Following the residual network, an attention mechanism network is designed to extract more detailed information and remove certain noise. Block matching and 3D filtering (BM3D) has exceptional denoising capability and preserves fine details and edges. However, BM3D is computationally intensive, which makes it unsuitable for real-time applications [19]. A significant advantage of the PIII system is real-time, high-resolution imaging, and the newly developed attention-mechanism architecture fulfills this requirement. The architecture is shown in Fig. 4.

Figure 4. Multi-scale convolutional attention module (MCA).

By applying a 1 × 1 convolutional filter, the network reduces the number of output channels. A max-pooling layer is used to reduce the spatial dimensions, suppressing less informative or noisy responses. Multiscale feature learning is then applied to the extracted features, which are sequentially subjected to upsampling and channel restoration. A residual-style connection to the initial features enables the capture of more comprehensive features and the suppression of a certain amount of noise. In this finer network, the features of the original image are learned.
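
A hedged PyTorch sketch of the MCA module follows. The 1 × 1 channel reduction, max-pooling, multiscale convolutions, upsampling with channel restoration, and residual-style connection follow the description above; the particular scales (3, 5, 7) and channel numbers are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleConvAttention(nn.Module):
    """Sketch of the multi-scale convolutional attention module (MCA)."""
    def __init__(self, channels: int = 64, reduced: int = 16):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, kernel_size=1)        # 1 x 1 channel reduction
        self.pool = nn.MaxPool2d(2)                                      # spatial reduction
        self.scales = nn.ModuleList([
            nn.Conv2d(reduced, reduced, kernel_size=k, padding=k // 2)   # multiscale branches
            for k in (3, 5, 7)
        ])
        self.restore = nn.Conv2d(reduced * 3, channels, kernel_size=1)   # channel restoration

    def forward(self, x):
        y = self.pool(self.reduce(x))
        y = torch.cat([F.relu(conv(y)) for conv in self.scales], dim=1)
        y = F.interpolate(self.restore(y), size=x.shape[-2:],
                          mode="bilinear", align_corners=False)          # upsample to input size
        return x + y                                                     # residual-style connection
```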

3.2.3. Fusion Learning Network

Features obtained from MRB and MCA in the finer feature learning network can be fused appropriately to preserve as many features as possible while reducing the impact of noise. Further learning is performed through multiple convolutional layers with a 3 × 3 filter size while gradually reducing the number of channels. The final two convolutional layers are designed to concurrently learn features and approximate a filtering operation.
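
The following sketch shows one way the fusion stage could be realized under the description above; the channel widths are assumptions, while the concatenation of MRB and MCA features, the 3 × 3 convolutions with gradually decreasing channels, and the final two layers follow the text.

```python
import torch
import torch.nn as nn

class FusionLearningNet(nn.Module):
    """Sketch of the fusion learning network operating on MRB and MCA features."""
    def __init__(self, channels: int = 64, out_channels: int = 1):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1),      # fuse MRB + MCA features
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels // 2, kernel_size=3, padding=1),     # reduce channels
            nn.ReLU(inplace=True),
            # final two layers jointly learn features and approximate a filtering operation
            nn.Conv2d(channels // 2, channels // 4, kernel_size=3, padding=1),
            nn.Conv2d(channels // 4, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, mrb_features, mca_features):
        return self.fuse(torch.cat([mrb_features, mca_features], dim=1))
```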

3.3. Training

The comprehensive network architecture is shown in Fig. 5. Earth observation matches the intended application of this system, so the proposed model is trained on the iSAID dataset [20]. By simulating the undersampling operation, the original images are transformed into blurred, undersampled representations, and a total of 212 image pairs were prepared for training. Due to device constraints, the images are segmented into 160 × 160 patches [21], so the training dataset ultimately comprises 63,716 pairs. Color images are converted to grayscale to facilitate training and image reconstruction. The learning rate is adaptively tuned, starting from 1 × 10⁻⁴.

Figure 5. Comprehensive network architecture. Ui denotes a blurry image, Ri denotes a reconstructed image, and Ti denotes an original image.
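
As a rough sketch of this data preparation (not the authors' pipeline), the snippet below converts an image to grayscale and cuts it into non-overlapping 160 × 160 patches; the file handling and stride are assumptions.

```python
import numpy as np
from PIL import Image

PATCH = 160  # patch size used for training

def extract_patches(path: str) -> list:
    """Convert one image to grayscale and split it into 160 x 160 patches."""
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
    patches = []
    for top in range(0, gray.shape[0] - PATCH + 1, PATCH):
        for left in range(0, gray.shape[1] - PATCH + 1, PATCH):
            patches.append(gray[top:top + PATCH, left:left + PATCH])
    return patches
```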

MSE provides a clear measure of the intensity differences between corresponding pixels in the original and processed images. However, one significant limitation of MSE is its sensitivity to absolute pixel intensity differences, which can result in a lack of correlation with perceived visual quality. Unlike MSE, L1 loss tends to preserve edges and fine details. L1 loss provides robustness against outliers or noise. The loss function is defined as

$$l_{\mathrm{total}} = \lambda\, l_{L1} + (1 - \lambda)\, l_{\mathrm{MSE}},$$

$$l_{L1} = \frac{1}{k} \sum_{i=1}^{k} \left| f(x_i) - \gamma_i \right|,$$

$$l_{\mathrm{MSE}} = \frac{1}{k} \sum_{i=1}^{k} \left[ f(x_i) - \gamma_i \right]^{2}.$$

Here f (xi) is the output of the network, and γi is the original image label.

The L1 loss function is robust against outliers, but it may not be as sensitive as MSE in reducing high-frequency errors, whereas MSE provides smoother results when reconstructing detailed features. The L1 loss therefore contributes 10% of the total loss (λ = 0.1), which appropriately enhances the model's robustness.
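
A minimal PyTorch sketch of this combined loss is given below; the weight λ = 0.1 reflects the 10% L1 contribution stated above, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(output: torch.Tensor, target: torch.Tensor,
                        lam: float = 0.1) -> torch.Tensor:
    """l_total = lam * l_L1 + (1 - lam) * l_MSE, with lam = 0.1 (10% L1)."""
    l1 = F.l1_loss(output, target)     # preserves edges, robust to outliers
    mse = F.mse_loss(output, target)   # penalizes large pixel errors, smoother results
    return lam * l1 + (1.0 - lam) * mse
```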

3.4. Evaluation Index

To accurately evaluate the quality of restored images, quantitative analysis indicators are introduced.

The peak signal-to-noise ratio (PSNR) is an index for evaluating image quality based on pixel errors; the higher the PSNR, the better the image quality. It is computed from the mean square error (MSE) of the image pixels, where peak represents the maximum possible pixel value. PSNR does not comprehensively account for human visual characteristics, so it can only be regarded as a rough estimate and cannot fully reflect perceived image quality.

$$\mathrm{PSNR} = 10 \log_{10} \frac{peak^{2}}{\mathrm{MSE}}.$$

The structural similarity index (SSIM) is an index based on the human visual system. SSIM focuses on the perception of images and complies with human visual characteristics. It considers information such as image structure, color, brightness, etc., and can comprehensively reflect the quality of images. If the test image is denoted as I, and the original reference image as G, the definition is as follows:

$$\mathrm{SSIM} = \frac{\left(2\mu_{I}\mu_{G} + c_{1}\right)\left(2\gamma_{IG} + c_{2}\right)}{\left(\mu_{I}^{2} + \mu_{G}^{2} + c_{1}\right)\left(\gamma_{I}^{2} + \gamma_{G}^{2} + c_{2}\right)},$$

where μI and μG are the mean values of I and G, γI² and γG² denote the variances of I and G, respectively, and γIG denotes the covariance of I and G; c1 and c2 are constants that prevent the denominator from being zero and maintain the stability of the evaluation. The closer the value of SSIM is to 1, the more similar the recovered image is to the original image.
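
For reference, a minimal sketch of the PSNR computation defined above is shown below; the SSIM comment points to a common library implementation rather than the authors' code.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 1.0) -> float:
    """PSNR = 10 * log10(peak^2 / MSE); peak is the maximum possible pixel value."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# SSIM involves local means, variances, and covariances; in practice a library
# implementation can be used, for example:
#   from skimage.metrics import structural_similarity as ssim
#   score = ssim(reference, test, data_range=1.0)
```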

IV. RESULTS

We evaluate the performance of the proposed model on the iSAID dataset. The test set consists of 36 images of different scenarios, none of which appear in the training set. The experiments were performed on a desktop with a Xeon Silver 4210R CPU and an NVIDIA 1080 Ti GPU. To test the performance of the model, the TVAL3 algorithm and a CNN model previously proposed by our team are compared with the proposed model. The CNN reconstructs images through a series of convolutional layers and ReLU activation functions, after which the BM3D denoiser is used to reduce noise, smooth the images, and remove artifacts. The TVAL3 algorithm can be effectively applied to imaging with the PIII system, although it requires a number of iterations. Several sets of images are chosen to compare the reconstruction results, as shown in Fig. 6. The PSNRs and SSIMs for the different scenarios are listed in Table 2.

Figure 6. Comparison of the images reconstructed by the TVAL3 algorithm, the CNN model, and the proposed model with the original images: (a) Original images, (b) images reconstructed by TVAL3, (c) images reconstructed by the CNN model, (d) images reconstructed by the proposed model.

TABLE 2. Comparison of different algorithms on the constructed dataset.

Figure 6      | TVAL3              | CNN Model          | Ours
              | PSNR (dB) | SSIM   | PSNR (dB) | SSIM   | PSNR (dB) | SSIM
i(a)–i(d)     | 16.24     | 0.5250 | 17.78     | 0.6274 | 18.10     | 0.6894
ii(a)–ii(d)   | 15.19     | 0.4452 | 15.65     | 0.4953 | 16.33     | 0.6111
iii(a)–iii(d) | 14.83     | 0.3672 | 15.18     | 0.6066 | 19.19     | 0.7098
iv(a)–iv(d)   | 13.28     | 0.4233 | 19.54     | 0.5487 | 20.40     | 0.6010
v(a)–v(d)     | 13.90     | 0.5328 | 16.07     | 0.4523 | 16.14     | 0.6465
vi(a)–vi(d)   | 14.81     | 0.4580 | 15.00     | 0.4796 | 17.29     | 0.5897


The algorithms based on deep learning achieve better detail restoration. Training a deep learning algorithm requires a substantial amount of time to learn how to restore image features and reduce noise, but the trained weights can then be applied to various scenarios without extensive iteration. In contrast, traditional algorithms typically require numerous iterations to reconstruct each image, so their reconstruction time is much longer than that of the newly proposed algorithm. The images reconstructed by the CNN model are too smooth and fail to capture effective detail information, and their brightness and contrast are the lowest among the reconstruction methods. When trained for the same number of epochs, the proposed model requires 404.412 minutes, while the CNN model needs 449.038 minutes. The maximum number of channels in the model is limited to 64; by limiting the number of channels and reducing unnecessary training operations, the lightweight requirement of the model is ensured. Additionally, the trained model can be transferred to other devices without retraining, which greatly reduces the operational burden on those devices. The new image reconstruction method therefore has advantages in both training time and reconstructed image quality.

V. CONCLUSION

By designing a multi-scale convolutional attention module and a modified residual block, a novel image reconstruction algorithm based on deep learning has been proposed. The model was trained in three distinct phases, effectively enhancing its feature restoration and noise reduction capabilities. In line with the imaging process, an undersampled image dataset was created to facilitate efficient feature learning and evaluation. The results show that the deep-learning-based image reconstruction method for the PIII system performs well both qualitatively and quantitatively.

Acknowledgments

The authors thank the Editor-in-Chief, the reviewers, the School of Mechanical and Aerospace Engineering of Jilin University, and the College of Mechanical Engineering and Automation of Fuzhou University for their support of this work.

FUNDING

The authors received no financial support for the research, authorship, and/or publication of this paper.

DISCLOSURES

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

DATA AVAILABILITY

Data that support the findings of this study are available upon request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.



References

1. R. P. Scott, T. Su, C. Ogden, S. T. Thurman, R. L. Kendrick, A. Duncan, R. Yu, and S. J. B. Yoo, “Demonstration of a photonic integrated circuit for multi-baseline interferometric imaging,” in Proc. IEEE Photonics Conference (San Diego, USA, Oct. 12-16, 2014), pp. 1-2.
2. T. Su, R. P. Scott, C. Ogden, S. T. Thurman, R. L. Kendrick, A. Duncan, and S. J. B. Yoo, “Experimental demonstration of interferometric imaging using photonic integrated circuits,” Opt. Express 25, 12653-12665 (2017).
3. C. Li, W. Yin, H. Jiang, and Y. Zhang, “An efficient augmented Lagrangian method with applications to total variation minimization,” Comput. Optim. Appl. 56, 507-530 (2013).
4. L. Pratley, J. D. McEwen, M. d'Avezac, R. E. Carrillo, A. Onose, and Y. Wiaux, “Robust sparse image reconstruction of radio interferometric observations with PURIFY,” Mon. Not. R. Astron. Soc. 473, 1038-1058 (2018).
5. W. P. Gao, X. R. Wang, L. Ma, Y. Yuan, and D. F. Guo, “Quantitative analysis of segmented planar imaging quality based on hierarchical multistage sampling lens array,” Opt. Express 27, 7955-7967 (2019).
6. W. Zhang, H. Ma, and K. Huang, “Spatial frequency coverage and image reconstruction for photonic integrated interferometric imaging system,” Curr. Opt. Photonics 5, 606-616 (2021).
7. S. T. Thurman, R. L. Kendrick, A. Duncan, D. Wuchenich, and C. Ogden, “System design for a SPIDER imager,” in Frontiers in Optics (Optical Society of America, 2015), paper FM3E.3.
8. Y. Sun, C. Liu, H. Ma, and W. Zhang, “Image reconstruction based on deep learning for the SPIDER optical interferometric system,” Curr. Opt. Photonics 6, 260-269 (2022).
9. Q. Chu, Y. Shen, M. Yuan, and M. Gong, “Numerical simulation and optimal design of segmented planar imaging detector for electro-optical reconnaissance,” Opt. Commun. 405, 288-296 (2017).
10. G.-M. Lv, Q. Li, Y.-T. Chen, H.-J. Feng, Z.-H. Xu, and J. Mu, “An improved scheme and numerical simulation of segmented planar imaging detector for electro-optical reconnaissance,” Opt. Rev. 26, 664-675 (2019).
11. J. Yong, Z. Feng, Z. Wu, S. Ye, M. Li, J. Wu, and C. Cao, “Photonic integrated interferometric imaging based on main and auxiliary nested microlens arrays,” Opt. Express 30, 29472-29484 (2022).
12. Q. Yu, B. Ge, Y. Li, Y. Yue, F. Chen, and S. Sun, “System design for a “checkerboard” imager,” Appl. Opt. 57, 10218-10223 (2018).
13. T. Su, G. Liu, K. E. Badham, S. T. Thurman, R. L. Kendrick, A. Duncan, D. Wuchenich, C. Ogden, G. Chriqui, S. Feng, J. Chun, W. Lai, and S. J. B. Yoo, “Interferometric imaging using Si3N4 photonic integrated circuits for a SPIDER imager,” Opt. Express 26, 12801-12812 (2018).
14. G. Liu, D. Wen, Z. Song, and T. Jiang, “System design of an optical interferometer based on compressive sensing: An update,” Opt. Express 28, 19349-19361 (2020).
15. W. Gao, Y. Yuan, X. Wang, L. Ma, Z. Zhao, and H. Yuan, “Quantitative analysis and optimization design of the segmented planar integrated optical imaging system based on an inhomogeneous multistage sampling lens array,” Opt. Express 29, 11869-11884 (2021).
16. S. Nah, T. H. Kim, and K. M. Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” arXiv:1612.02177v2 (2018).
17. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv:1512.03385v1 (2015).
18. J. Liu, W. Zhang, Y. Tang, J. Tang, and G. Wu, “Residual feature aggregation network for image super-resolution,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Seattle, WA, USA, Jun. 13-19, 2020), pp. 2356-2365.
19. K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process. 16, 2080-2095 (2007).
20. S. W. Zamir, A. Arora, A. Gupta, S. Khan, G. Sun, F. S. Khan, F. Zhu, L. Shao, G.-S. Xia, and X. Bai, “iSAID: A large-scale dataset for instance segmentation in aerial images,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (Long Beach, CA, USA, Jun. 15-20, 2019), pp. 28-37.
21. K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “ReconNet: Non-iterative reconstruction of images from compressively sensed measurements,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Las Vegas, USA, Jun. 26-Jul. 1, 2016), pp. 449-458.