Current Optics and Photonics
Curr. Opt. Photon. 2024; 8(4): 391-398
Published online August 25, 2024 https://doi.org/10.3807/COPP.2024.8.4.391
Copyright © Optical Society of Korea.
Qianchen Xu^{1}, Weijie Chang^{2}, Feng Huang^{2} , Wang Zhang^{1}
^{1}School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130025, China
^{2}College of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350108, China
Corresponding author: *huangf@fzu.edu.cn, ORCID 0000-0003-4652-4312
**wangzhang@jlu.edu.cn, ORCID 0000-0001-9029-1320
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
An image reconstruction algorithm is vital to the image quality of a photonic integrated interferometric imaging (PIII) system. However, existing image reconstruction algorithms have limitations that often degrade the reconstructed images. In this paper, a novel image reconstruction algorithm based on deep learning is proposed. First, the principle of optical signal transmission through the PIII system is investigated, and a dataset suitable for image reconstruction in the PIII system is constructed. Key components such as the network model and the loss function are designed and compared to address image blurring and the influence of noise. Comparison with other algorithms verifies that the proposed algorithm achieves good reconstruction results both qualitatively and quantitatively.
Keywords: Deep learning, Image reconstruction, Optical imaging, Optical interferometry, Photonic integrated circuits
OCIS codes: (100.3010) Image reconstruction techniques; (100.3020) Image reconstruction-restoration; (110.3175) Interferometric imaging; (150.1135) Algorithms
A photonic integrated circuit (PIC)-based interferometric imaging system has been developed by the Lockheed Martin Advanced Technology Center and UC Davis for a new generation of optical imaging systems [1]. The photonic integrated interferometric imaging (PIII) system has many outstanding features, such as light weight, small size, high resolution, and low power consumption [1, 2]. In contrast with traditional optical imaging systems, the PIII system incorporates photonic integrated devices. By modulating and processing the optical signals through a PIC card, the size of the optical system can be greatly reduced, and the system can be highly integrated [2].
Sampling coverage is incomplete due to the structure of the PIII system, so the image directly restored by the system suffers from problems such as blurring and noise. Some classical algorithms have been applied in the field of image reconstruction, such as the CLEAN algorithm (Högbom CLEAN) and the total variation algorithm based on compressed sensing (TVAL3) [3, 4]. Although these traditional algorithms can improve image quality, they need many iterations to reconstruct images, which leads to long reconstruction times. With rapid developments in computer science, image-processing algorithms based on deep learning, such as super-resolution and denoising, have shown strong advantages. In this paper, a novel image reconstruction algorithm for the PIII system is proposed. The algorithm employs a novel architecture that balances the restoration of image details against denoising, and we design a loss function that ensures the smoothness of the reconstructed images. A comparison of image quality between traditional algorithms and the proposed algorithm confirms the superiority of the proposed reconstruction algorithm. The structure of the PIII system is shown in Fig. 1.
The imaging principle of the PIII system is based on the Van Cittert–Zernike theorem. The PIII system performs sub-aperture interferometry on light emitted from incoherent target sources. The measured interference fringes can be used to extract the amplitude and phase information of the complex visibility [5]. Due to the constraints imposed by the arrangement of lenslets in the structure, only a sparse sampling of spatial frequencies is acquired. Consequently, the image restored in this way suffers from low resolution and missing content [6].
Spatial frequency has a linear relationship with the baseline [5–7]:

(u, v) = (Δx / (λ̄z), Δy / (λ̄z)),  f = B / (λ̄z)

where Δx and Δy represent the separations between paired apertures along the two axes, λ̄ represents the center wavelength, z represents the distance from the target to the lenslets, B = √(Δx^{2} + Δy^{2}) represents the length of the baseline, and f represents the spatial frequency.
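As a quick numerical check of this relation, the snippet below evaluates the spatial-frequency components for a hypothetical aperture pair (the separations and center wavelength are illustrative values, not parameters taken from the paper):

```python
import numpy as np

# Hypothetical geometry (illustrative values, not from the paper):
dx, dy = 0.3, 0.4      # aperture separations along x and y (m)
z = 100e3              # observation distance (m), as in Table 1
lam = 1200e-9          # assumed center wavelength lambda-bar (m)

u = dx / (lam * z)     # spatial-frequency component along x
v = dy / (lam * z)     # spatial-frequency component along y
B = np.hypot(dx, dy)   # baseline length B = sqrt(dx^2 + dy^2) = 0.5 m
f = B / (lam * z)      # spatial-frequency magnitude

print(f)               # the longer the baseline, the higher the frequency
```

Note that f equals the magnitude of the (u, v) vector, so lengthening the baseline raises the sampled spatial frequency proportionally.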
The distance between the paired lenslets equals the length of the baseline B, which determines the resolution of the system. The longer the baseline B, the higher the spatial frequency f that is collected, and the more detailed the information that can be obtained about the target. Paired lenslets forming short baselines collect lower spatial frequencies, which capture the overall contour information of the target [8, 9].
Figure 2 shows the function and pairing method of the PIII system. The lenslets located at the top of the system collect light from the observed scene. A pair of lenslets on the same interference arm constitutes an interference baseline. The light signal is then coupled into the optical waveguide, and the waveguide array collects light from different fields of view. A silicon-based PIC chip is used for signal transmission [9]. The PIC integrates different optical devices, including an arrayed waveguide grating (AWG), a phase modulator, and a 90° optical hybrid. The AWG decomposes a wide spectral band into multiple narrow spectral bands. The phase modulator adjusts the phase of the light signals in the two input waveguides to ensure the coherence of the beams [10]. The 90° hybrid mixer combines the two input light signals for interference.
The balanced detector denoises and differentially amplifies the IQ signals output by the 90° hybrid mixer. Then, the complex coherence coefficient of the interference fringes is calculated. According to the Van Cittert–Zernike theorem, the mutual coherence intensity is equivalent to the complex amplitude of the interference fringes. By applying an inverse Fourier transform to the complex amplitude, the brightness distribution of the observed target can be restored [11–15]. This is the theoretical basis of the PIII system.
The wavelength range affects the imaging resolution of the PIII system. The relationship between resolution, wavelength, and the maximum baseline of the system can be represented as

r = λz / (2B_{max})

where r represents the resolution of the system, z is the observation distance, λ is the wavelength, and B_{max} is the maximum baseline.
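For a rough sense of scale, the resolution formula can be evaluated with the Table 1 observation distance and an assumed maximum baseline (the paper does not state B_max, so the 0.5 m below is purely illustrative):

```python
# Illustrative resolution estimate; B_max is an assumed value, since
# Table 1 does not list the maximum baseline of the simulated system.
lam = 1550e-9              # wavelength near the silicon-photonics optimum (m)
z = 100e3                  # observation distance from Table 1 (m)
B_max = 0.5                # assumed maximum baseline (m)

r = lam * z / (2 * B_max)  # achievable resolution at the target plane (m)
print(r)                   # about 0.155 m at 100 km
```

Doubling the assumed baseline halves r, consistent with the discussion of long baselines above.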
To simulate the PIII system, necessary parameters need to be configured. Table 1 lists the parameters required for the imaging simulation of the PIII system.
TABLE 1 System parameters for simulation

| Parameter | Value |
|---|---|
| Wavelength Range (nm) | 800–1,600 |
| Number of Spectral Segments | 8 |
| Observation Distance (km) | 100 |
| Lenslet Size (mm) | 6 |
| Number of Interference Arms | 35 |
| Number of Single-arm Lenslets | 38 |
Images with strong contrast can be generated in the short-wave infrared band. This band is suitable for long-distance observation and has strong penetration power, which matches the usage scenario of the PIII system. Therefore, the wavelength range should be set within the short-wave infrared band. For components in a silicon-based photonic chip, the operating loss is lower around the center wavelength of 1,550 nm, which enhances the system's imaging performance. Considering the practical operating conditions of the system and the optimal operating wavelength of silicon photonic devices, the wavelength range is ultimately set to 800–1,600 nm [10].
Based on the previous analysis of the PIII system, the lenslet arrangement determines the acquisition of spatial frequency information, that is, the u-v coverage.
This paper provides a simulation of the imaging process of the PIII system and a novel image reconstruction method for it based on deep learning. The direct image-restoration method, the design of the model, and the training process are described and analyzed in detail.
The u-v coverage describes the acquisition of spatial frequency information by the integrated interference system. In the formed u-v coverage, low-frequency points are distributed in the inner region, and the frequency gradually increases from the center outward. The system collects light signals into planar waveguides through lenslets. In interferometry theory, two paired apertures produce interference fringes in the image plane. The system processes the optical signals and calculates the amplitude and phase information extracted from the interference fringes. Each point in the u-v coverage collected by paired lenslets contains both amplitude and phase information. The image can be restored by inverse Fourier transform under limited u-v coverage.
The imaging process can be described as

l(x, y) = F^{−1}{S(u, v) · F[f(x, y)]}

where l(x, y) is the image directly restored by inverse Fourier transformation, F and F^{−1} represent the Fourier transform and the inverse Fourier transform, respectively, f(x, y) is the field of view captured by the system, and S(u, v) represents the u-v coverage associated with the lenslet array. The u-v coverage of the lenslet array can be considered a mask: it masks the spectrum formed by the Fourier transform of the observation field, and the restored image is then obtained by applying an inverse Fourier transform to the retained spectral information. The AWG decomposes the input wide-spectrum optical signal into multiple narrow-spectrum signals. The number of spectral segments equals the number of output channels of the AWG, and the central wavelength of each spectral segment is used to calculate the frequency coverage.
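This masking pipeline can be sketched in a few lines. The circular low-pass mask below is a hypothetical stand-in for the true sparse u-v coverage of the lenslet array:

```python
import numpy as np

def direct_restore(image, mask):
    """Direct restoration l = F^-1[S(u, v) * F[f(x, y)]]: mask the centered
    spectrum with the u-v coverage, then inverse-transform."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))         # F[f], low freq centered
    masked = mask * spectrum                               # apply S(u, v)
    return np.abs(np.fft.ifft2(np.fft.ifftshift(masked)))  # F^-1, magnitude image

# Toy 64 x 64 scene and a circular low-pass mask (hypothetical u-v coverage).
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
yy, xx = np.mgrid[-32:32, -32:32]
mask = (np.hypot(xx, yy) < 12).astype(float)  # keep only low spatial frequencies

blurred = direct_restore(scene, mask)  # degraded image, input to the network
```

With a full mask (all ones) the original scene is recovered exactly; with the sparse mask, high frequencies are discarded and the output is blurred, mimicking the directly restored PIII image.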
In the u-v coverage, each point reflects the complex coherence information of the original observed target. The PIII system relies on lenslets to collect the light. Compared to a single large-aperture optical telescope, the PIII system has limited capacity for collecting light signals. Due to the incomplete sampling of spatial frequency information, the quality of the directly restored image is poor. Additionally, noise interference during imaging further degrades the image clarity and resolution. Therefore, the directly restored blurry images need to be processed by using an image reconstruction method to obtain high-quality images. The directly restored images are input into a pretrained model to reconstruct images.
CNN architectures have proven applicable in the field of image reconstruction, but a purely convolutional structure performs only limited feature learning and cannot address problems such as noise and over-smoothing. Although a large convolution kernel can capture more information, it greatly increases the computational burden. In the pursuit of a lightweight model, many papers have shown that stacked small convolution kernels can replace a large convolution kernel [16]. A residual network structure is highly modular, with repeated building blocks stacked together. By enabling the training of deeper networks, ResNet significantly increases the representational power of the architecture [17]. Due to incomplete sampling by the lenslet array and processing by the PIC cards, the reconstructed image is influenced by optical components and environmental factors.
The new architecture can be divided into three parts: a coarse-level network, a finer-level network, and a fusion-learning network.
Stacked convolutional layers are employed in the coarse-level network. The first convolutional layer uses a 7 × 7 filter; this large kernel provides a larger receptive field, enabling the network to capture broader context. The second layer uses a 5 × 5 filter to further learn complex image features.
The finer-level network consists of two parts. At the start of this stage, coarse features are progressively refined. The residual network structure is modified to fulfill the lightweight requirements of the algorithm [18]. A dropout layer leads to better generalization and robustness of the network, while batch normalization layers are removed to increase training speed. The modified residual block is shown in Fig. 3.
Following the residual network, an attention-mechanism network is designed to extract more detailed information and remove some of the noise. Block matching and 3D filtering (BM3D) has exceptional denoising capability and preserves fine details and edges. However, BM3D is computationally intensive, which makes it difficult to use in real-time applications [19]. A significant advantage of the PIII system is real-time, high-resolution imaging, and the newly developed attention-mechanism architecture fulfills this requirement. The architecture is shown in Fig. 4.
By applying a 1 × 1 convolutional filter, the network reduces the number of output channels. A max-pooling layer reduces the spatial dimensions, suppressing less informative or noisy responses. Multiscale feature learning is then applied to the extracted features, which are sequentially subjected to upsampling and channel restoration. A residual-style connection to the initial features enables the capture of more comprehensive features and the suppression of some noise. In the finer network, the features of the original image are learned.
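A 1 × 1 convolution is simply a per-pixel linear map across channels, which is how the channel reduction above works. A minimal sketch (the channel counts are illustrative assumptions, not the paper's exact layer widths):

```python
import numpy as np

def conv1x1(features, weights):
    """1 x 1 convolution: the same linear map across channels at every pixel.
    features: (H, W, C_in); weights: (C_in, C_out)."""
    return features @ weights

rng = np.random.default_rng(0)
feat = rng.random((160, 160, 64))   # assumed 64-channel feature map
w = rng.random((64, 16))            # reduce 64 channels down to 16
out = conv1x1(feat, w)
print(out.shape)  # (160, 160, 16)
```

Because no spatial neighborhood is involved, a 1 × 1 convolution changes only the channel dimension, making it a cheap way to compress features before the more expensive multiscale stages.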
Features obtained from the modified residual block (MRB) and the multiscale convolutional attention (MCA) module in the finer-level network are fused to preserve as many features as possible while reducing the impact of noise. Further learning is performed through multiple convolutional layers with 3 × 3 filters while gradually reducing the number of channels. The final two convolutional layers are designed to concurrently learn features and approximate a filtering operation.
The comprehensive network architecture is shown in Fig. 5. The intended application of the system is Earth observation, so the proposed model is trained on the iSAID dataset [20]. By simulating the undersampling operation, the original images are transformed into blurred, undersampled representations, yielding 212 image pairs for training. Due to device constraints, the images are segmented into 160 × 160 patches [21], so the training dataset ultimately comprises 63,716 pairs. Color images are converted to grayscale to facilitate training and image reconstruction. The learning rate is tuned adaptively, starting from 1 × 10^{−4}.
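The grayscale conversion and patch segmentation can be sketched as follows (the non-overlapping stride is an assumption, since the paper does not specify the patch overlap):

```python
import numpy as np

def to_gray(rgb):
    """Convert an H x W x 3 color image to grayscale (ITU-R BT.601 weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def extract_patches(img, size=160, stride=160):
    """Cut a grayscale image into size x size patches (non-overlapping here)."""
    h, w = img.shape
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, stride)
            for j in range(0, w - size + 1, stride)]

gray = to_gray(np.zeros((480, 640, 3)))  # placeholder image
patches = extract_patches(gray)
print(len(patches))  # 3 rows x 4 columns = 12 patches
```

Applying the same cropping to a blurred image and its sharp original keeps the training pairs aligned patch by patch.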
MSE provides a clear measure of the intensity differences between corresponding pixels in the original and processed images. However, one significant limitation of MSE is its sensitivity to absolute pixel-intensity differences, which can result in a lack of correlation with perceived visual quality. Unlike MSE, L1 loss tends to preserve edges and fine details and provides robustness against outliers and noise. The loss function is defined as

L = (1/N) Σ_{i=1}^{N} [0.9 (f(x_{i}) − γ_{i})^{2} + 0.1 |f(x_{i}) − γ_{i}|]

where f(x_{i}) is the output of the network, and γ_{i} is the original image label.
The L1 loss function is robust against outliers, but it may not be as sensitive as MSE in reducing high-frequency errors, whereas MSE provides smoother results when reconstructing detailed features. The L1 term therefore contributes 10% of the total loss, which appropriately enhances the model's robustness.
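A minimal sketch of such a weighted loss, assuming the 90/10 MSE-to-L1 split stated above:

```python
import numpy as np

def mixed_loss(pred, target, l1_weight=0.1):
    """Weighted combination of MSE and L1; L1 contributes 10% by default."""
    mse = np.mean((pred - target) ** 2)
    l1 = np.mean(np.abs(pred - target))
    return (1.0 - l1_weight) * mse + l1_weight * l1

pred = np.array([0.0, 2.0])
target = np.array([0.0, 0.0])
print(mixed_loss(pred, target))  # 0.9 * 2.0 + 0.1 * 1.0 = 1.9
```

Because MSE grows quadratically with the error while L1 grows linearly, the small L1 weight tempers the influence of outlier pixels without sacrificing the smoothing behavior of MSE.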
To accurately evaluate the quality of restored images, quantitative analysis indicators are introduced.
The peak signal-to-noise ratio (PSNR) is an index for evaluating image quality based on pixel errors; the higher the PSNR, the better the imaging quality. It is computed from the mean square error (MSE) of the image pixels as

PSNR = 10 log_{10}(peak^{2} / MSE)

where peak represents the maximum brightness value. PSNR does not comprehensively account for human visual characteristics, so it can only be regarded as a rough estimate and cannot fully reflect perceived image quality.
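A direct implementation of the PSNR definition above (peak defaults to 255 for 8-bit images):

```python
import numpy as np

def psnr(img, ref, peak=255.0):
    """PSNR = 10 log10(peak^2 / MSE), in dB; higher is better."""
    mse = np.mean((np.asarray(img, float) - np.asarray(ref, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 16.0                 # uniform error of 16 gray levels -> MSE = 256
print(round(psnr(noisy, ref), 2))  # about 24.05 dB
```

Since the metric is logarithmic, halving the pixel error raises the PSNR by roughly 6 dB.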
The structural similarity index (SSIM) is an index based on the human visual system. SSIM focuses on the perception of images and complies with human visual characteristics. It considers information such as image structure, brightness, and contrast, and can comprehensively reflect the quality of images. If the test image is denoted as I and the original reference image as G, the definition is as follows:

SSIM(I, G) = [(2μ_{I}μ_{G} + c_{1})(2σ_{IG} + c_{2})] / [(μ_{I}^{2} + μ_{G}^{2} + c_{1})(σ_{I}^{2} + σ_{G}^{2} + c_{2})]

where μ_{I} and μ_{G} are the average values of I and G, σ_{I}^{2} and σ_{G}^{2} are their variances, σ_{IG} is the covariance of I and G, and c_{1} and c_{2} are small constants that stabilize the division.
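As a sketch, the SSIM statistic can be computed over a whole image in one shot (the published index averages this statistic over local windows, so this single-window version is a simplification):

```python
import numpy as np

def ssim_global(I, G, peak=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM using the global means, variances, and covariance."""
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2   # stabilizing constants
    mu_i, mu_g = I.mean(), G.mean()
    var_i, var_g = I.var(), G.var()
    cov = ((I - mu_i) * (G - mu_g)).mean()
    return ((2 * mu_i * mu_g + c1) * (2 * cov + c2)) / (
        (mu_i ** 2 + mu_g ** 2 + c1) * (var_i + var_g + c2))

rng = np.random.default_rng(0)
img = rng.random((32, 32)) * 255.0
print(ssim_global(img, img))  # identical images give SSIM = 1
```

Identical images score 1, and structurally dissimilar images (e.g. an inverted copy) score lower, matching the intent of the definition above.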
We evaluate the performance of the proposed model on the iSAID dataset. The test set consists of 36 images containing scenarios different from those in the training set. The experiments were performed on a desktop with a Xeon Silver 4210R CPU and an NVIDIA GTX 1080 Ti GPU. To test the performance of the model, the TVAL3 algorithm and a CNN model previously proposed by our team are compared with the proposed model. The CNN reconstructs images through a series of convolutional layers and ReLU activation functions, after which a BM3D denoiser is used to reduce noise, smooth the images, and remove artifacts. The TVAL3 algorithm can be effectively applied to imaging with the PIII system, although it requires a number of iterations. Several sets of images are chosen to compare the reconstruction results, shown in Fig. 6. The PSNRs and SSIMs for the different scenarios are listed in Table 2.
TABLE 2 Comparison of different algorithms on the constructed dataset

| Figure 6 | TVAL3 PSNR | TVAL3 SSIM | CNN Model PSNR | CNN Model SSIM | Ours PSNR | Ours SSIM |
|---|---|---|---|---|---|---|
| i(a)–i(d) | 16.24 | 0.5250 | 17.78 | 0.6274 | 18.10 | 0.6894 |
| ii(a)–ii(d) | 15.19 | 0.4452 | 15.65 | 0.4953 | 16.33 | 0.6111 |
| iii(a)–iii(d) | 14.83 | 0.3672 | 15.18 | 0.6066 | 19.19 | 0.7098 |
| iv(a)–iv(d) | 13.28 | 0.4233 | 19.54 | 0.5487 | 20.40 | 0.6010 |
| v(a)–v(d) | 13.90 | 0.5328 | 16.07 | 0.4523 | 16.14 | 0.6465 |
| vi(a)–vi(d) | 14.81 | 0.4580 | 15.00 | 0.4796 | 17.29 | 0.5897 |
Algorithms based on deep learning achieve better detail restoration. Training a deep learning algorithm requires a substantial amount of time to learn how to restore image features and reduce noise, but the trained weights can then be applied to various scenarios without extensive iteration. In contrast, traditional algorithms typically require numerous iterations during image reconstruction, so their reconstruction time is much longer than that of the newly proposed algorithm. The images reconstructed by the CNN model are too smooth and fail to capture effective detail, and their brightness and contrast are the lowest among the reconstruction methods. When trained for the same number of epochs, the proposed model requires 404.412 minutes, while the CNN model needs 449.038 minutes. The maximum number of channels in the model is limited to 64; by limiting the number of channels and reducing unnecessary training operations, the lightweight requirements of the model are ensured. Additionally, the trained model can be transferred to other devices without retraining, which greatly reduces the operational burden on those devices. The new image reconstruction method thus has advantages in both training time and reconstructed image quality.
By designing a multiscale convolutional attention module and a modified residual block, a novel image reconstruction algorithm based on deep learning has been proposed. The model is trained in three distinct phases and effectively enhances feature restoration and noise reduction. In line with the imaging process, an undersampled image dataset was created to facilitate efficient feature learning and evaluation. The results show that the deep learning-based image reconstruction method for the PIII system performs well both qualitatively and quantitatively.
The authors thank the Editor-in-Chief, the reviewers, the School of Mechanical and Aerospace Engineering of Jilin University, and the College of Mechanical Engineering and Automation of Fuzhou University for their support of this work.
The authors received no financial support for the research, authorship, and/or publication of this paper.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data that supports the findings of this study is available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
Curr. Opt. Photon. 2024; 8(4): 391-398
Published online August 25, 2024 https://doi.org/10.3807/COPP.2024.8.4.391
Copyright © Optical Society of Korea.
Qianchen Xu^{1}, Weijie Chang^{2}, Feng Huang^{2} , Wang Zhang^{1}
^{1}School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130025, China
^{2}College of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350108, China
Correspondence to:*huangf@fzu.edu.cn, ORCID 0000-0003-4652-4312
**wangzhang@jlu.edu.cn, ORCID 0000-0001-9029-1320
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
An image reconstruction algorithm is vital for the image quality of a photonic integrated interferometric imaging (PIII) system. However, image reconstruction algorithms have limitations that always lead to degraded image reconstruction. In this paper, a novel image reconstruction algorithm based on deep learning is proposed. Firstly, the principle of optical signal transmission through the PIII system is investigated. A dataset suitable for image reconstruction of the PIII system is constructed. Key aspects such as model and loss functions are compared and constructed to solve the problem of image blurring and noise influence. By comparing it with other algorithms, the proposed algorithm is verified to have good reconstruction results not only qualitatively but also quantitatively.
Keywords: Deep learning, Image reconstruction, Optical imaging, Optical interferometry, Photonic integrated circuits
A photonic integrated circuit (PIC)-based interferometric imaging system has been developed by the Lockheed Martin Center and UC Davis for a new generation of optical imaging systems [1]. The photonic integrated interferometric imaging (PIII) system has many outstanding features, such as light weight, small size, high resolution, and low power consumption [1, 2]. In contrast with traditional optical imaging systems, the PIII system incorporates photonic integrated devices. By modulating and processing the optical signals through a PIC card, the size of the optical system can be greatly reduced, and the system can be highly integrated [2].
Sampling coverage is incomplete due to the structure of the PIII system, and the image directly restored by the system has some problems, such as imaging blurring and noise. Some classical algorithms are applied in the field of image reconstruction, such as the clean algorithm (Högbom CLEAN), and the total variational algorithm based on compressed sensing (TVAL3) [3, 4]. Although traditional image reconstruction algorithms can improve image quality, they need many iterations to reconstruct images, which leads to a long reconstruction time. The traditional algorithms still have limitations. With rapid developments in the field of computer science, image processing algorithms such as super-resolution and denoising based on deep learning have shown strong advantages. In this paper, a novel image-reconstruction algorithm for the PIII system is proposed. The algorithm proposes a novel architecture that balances the restoration of image details and denoising. We design a loss function to ensure the smoothness of the reconstructed images. A comparison of image quality between the traditional algorithm and the proposed algorithm confirms the superiority of the reconstruction algorithm. The structure of the PIII system is shown in Fig. 1.
The imaging principle of the PIII system is based on the Van Cittert–Zernike theorem. The PIII system uses sub-aperture interferometry of light emitted from incoherent target sources. The measured interference fringes can be used to extract the amplitude and phase information of the complex visibility [5]. Due to the constraints imposed by the arrangement of lenslets in the structure, a sparse sampling of spatial frequency will be acquired. However, the image restored in this way has defects of low resolution and missing content [6].
Spatial frequency has a linear relationship with the baselines [5–7].
where Δx and Δy represent the distance between apertures. λ̅ represents wavelengths, and z represents the distance from the target to the lenslet. B represents the length of the baseline. f represents spatial frequency.
The distance between the paired lenslets equals the length of the baseline B, which determines the resolution of the system. The longer the baseline B, the higher the spatial frequency f will be collected. Moreover, detailed information about the target can be obtained. The paired lenslets used to form short baselines will collect lower spatial frequencies, which can obtain the overall contour information of the target [8, 9].
Figure 2 shows the function and pairing method of the PIII system. The lenslets located at the top of the system collects light from the observed scene. A pair of lenslets on the same interference arm constitutes an interference baseline. Then, the light signal is coupled into the optical waveguide. The waveguide array collects light from a different field of view. A silicon-based photonic integrated circuit (PIC) chip is used for signal transmission [9]. The PIC integrates different optical devices, including an arrayed waveguide grating (AWG), phase modulator, and 90° optical hybrid. The AWG decomposes a wide spectral band to output multiple narrow spectral bands. The phase modulator adjusts the phase of the light signals in the two input waveguides to ensure the coherence of the beam [10]. A 90° hybrid mixer combines the two input light signals for interference.
The balanced detector denoises and differentially amplifies the IQ signals input from the 90° hybrid mixer. Then, the complex coherence coefficient of the interference fringes is calculated. According to the Van Cittert–Zernike theory, the mutual coherent intensity is equivalent to the complex amplitude of the interference fringe. By applying inverse Fourier transform to the complex amplitude, the brightness distribution of the observed target can be restored [11–15]. This is the theoretical basis of the PIII system.
The wavelength range will affect the imaging resolution of the PIII system. The relationship between resolution, wavelength, and the maximum baseline of the system can be represented as
where r represents the resolution of the system, z is the observation distance, and λ is wavelength range.
To simulate the PIII system, necessary parameters need to be configured. Table 1 lists the parameters required for the imaging simulation of the PIII system.
TABLE 1. System parameters for simulation.
Parameters | Value |
---|---|
Wavelength Range (nm) | 800–1,600 |
Number of Spectral Segments | 8 |
Observation Distance (km) | 100 |
Lenslets Size (mm) | 6 |
Number of Interference Arms | 35 |
Number of Single-arm Lenslets | 38 |
Images with strong contrast can be generated in the short-wave infrared band. The band is suitable for long-distance observation and has strong penetration power, which is consistent with the use scenario of the PIII system. Therefore, the wavelength range should be set within the short-wave infrared band. For components in a silicon-based photonic chip, the operating loss is lower around the center wavelength of 1,550 nm, which enhances the system’s imaging performance. Considering the practical operating conditions of the system and the optimal operating wavelength of silicon photonic devices, the wavelength range is ultimately defined as 800–1,600 nm [10].
Based on the previous analysis of the PIII system, the lenslets affect the acquisition of spatial frequency information, that is, u-v coverage.
This paper provides a simulation of the imaging process of the PIII system and a novel image reconstruction method based on deep learning for the PIII system. The method of direct image restoration, the design of the model and the training process are performed and analyzed in detail.
The u-v coverage describes the acquisition of spatial frequency information by the integrated interference system. In the formed u-v coverage, low-frequency points are distributed in the inner region. The frequency gradually increases from the center outward. The system collects light signals into planar waveguides through lenslets. In interferometry theory, two paired apertures will produce interference fringes in the image plane. The system processes the optical signals and calculates the amplitude and phase information extracted from the interference fringes. Each point in the u-v coverage collected by paired lenslets contains both amplitude and phase information. The image can be restored by reverse Fourier transform under limited u-v coverage.
The imaging process can be described as
where l (x, y) is the image directly restored by reverse Fourier transformation. F and F ^{−1} represent the process of Fourier transformation and reverse Fourier transformation, respectively. f (x, y) is the field of view captured by the system. S (u, v) represents the u-v coverage associated with the lenslets array. The u-v coverage of the lenslets array can be considered as a mask. The u-v coverage masks the spectrum formed by the Fourier transform of the observation field. Then, the restored image is obtained by applying an inverse Fourier transform to the retained spectral information. The AWG decomposes the input wide-spectrum optical signal into multiple narrow-spectrum signals. The number of spectral segments is equal to the number of output channels of the AWG. The central wavelengths of each spectral segment are used to calculate the frequency coverage.
In the u-v coverage, each point reflects the complex coherence information of the original observed target. The PIII system relies on lenslets to collect the light. Compared to a single large-aperture optical telescope, the PIII system has limited capacity for collecting light signals. Due to the incomplete sampling of spatial frequency information, the quality of the directly restored image is poor. Additionally, noise interference during imaging further degrades the image clarity and resolution. Therefore, the directly restored blurry images need to be processed by using an image reconstruction method to obtain high-quality images. The directly restored images are input into a pretrained model to reconstruct images.
The architecture of the CNN proved to be applicable in the field of image reconstruction, but the structure of a fully convolutional layer only carries out some feature learning and cannot solve problems such as noise and image smoothing. Although the use of a large convolution kernel can capture more information, it will greatly increase the computational burden. In the pursuit of a lightweight model, many papers show that a small convolution kernel can be used for multilayer training instead of a large convolution kernel [16]. Using a residual network structure is highly modular, with repeated building blocks stacked together. By enabling the training of deeper networks, ResNet can significantly increase the representational power of the architecture [17]. Due to incomplete sampling by lenslets array and processing by PIC cards, the reconstructed image is influenced by optical components and environmental factors.
The new architecture can be divided into three parts: a coarse-level network, a finer-level network, and a fusion learning network.
The coarse-level network employs a stack of convolutional layers. The first convolutional layer uses a 7 × 7 filter; the large kernel provides a larger receptive field, enabling the network to capture broader context. The second layer uses a 5 × 5 filter to learn more complex image features.
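The coarse stage described above can be sketched in PyTorch. The kernel sizes (7 × 7 then 5 × 5) follow the text; the channel counts and activations are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Sketch of the coarse-level network; channel counts are assumptions.
coarse_net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=7, padding=3),  # large receptive field
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=5, padding=2), # refine complex features
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 1, 160, 160)  # one grayscale 160 x 160 patch
y = coarse_net(x)
```

Padding is chosen so the spatial size of the patch is preserved through both layers.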
The finer-level network consists of two parts. At this stage, the coarse information begins to be refined. The residual network structure is modified to meet the lightweight requirements of the algorithm [18]: a dropout layer improves the generalization and robustness of the network, and the batch normalization layers are removed to increase training speed. The modified residual block is shown in Fig. 3.
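A minimal sketch of the modified residual block follows. It reflects the two changes named in the text (dropout added, batch normalization removed); the exact layer ordering and dropout rate are assumptions, since they are given only in Fig. 3.

```python
import torch
import torch.nn as nn

class ModifiedResBlock(nn.Module):
    """Residual block without batch normalization (faster training)
    and with dropout (better generalization); layout is an assumption."""
    def __init__(self, channels, p=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # identity skip connection

block = ModifiedResBlock(64)
out = block(torch.randn(1, 64, 40, 40))
```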
Following the residual network, an attention mechanism network is designed to extract more detailed information and remove certain noise. Block matching and 3D filtering (BM3D) has exceptional denoising capability and preserves fine details and edges, but it is computationally intensive, which makes it unsuitable for real-time applications [19]. Since a key advantage of the PIII system is real-time, high-resolution imaging, the newly developed attention mechanism architecture is designed to meet this requirement. The architecture is shown in Fig. 4.
By applying a 1 × 1 convolutional filter, the network reduces the number of output channels. A max-pooling layer reduces the spatial dimensions, suppressing less informative or noisy responses. Multiscale feature learning is then applied to the extracted features, which are sequentially upsampled and restored to the original channel count. A residual-style connection to the initial features enables the capture of more comprehensive features and suppresses a certain amount of noise. In this way the finer network learns the features of the original image.
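The attention branch just described can be sketched as a single module. The sequence (1 × 1 channel reduction, max pooling, multiscale convolutions, upsampling, channel restoration, residual connection) follows the text; the reduced channel count and the particular kernel sizes used for the multiscale stage are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleConvAttention(nn.Module):
    """Sketch of the multiscale convolutional attention branch;
    all channel counts and kernel sizes are assumptions."""
    def __init__(self, channels, reduced=16):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, 1)  # 1x1 channel reduction
        self.pool = nn.MaxPool2d(2)                    # suppress noisy responses
        self.scales = nn.ModuleList([
            nn.Conv2d(reduced, reduced, k, padding=k // 2) for k in (1, 3, 5)
        ])
        self.restore = nn.Conv2d(reduced * 3, channels, 1)  # channel restoration

    def forward(self, x):
        h = self.pool(self.reduce(x))
        h = torch.cat([torch.relu(s(h)) for s in self.scales], dim=1)
        h = F.interpolate(h, size=x.shape[-2:],
                          mode='bilinear', align_corners=False)  # upsampling
        return x + self.restore(h)  # residual-style connection to the input

att = MultiscaleConvAttention(64)
out = att(torch.randn(1, 64, 40, 40))
```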
Features obtained from the MRB and MCA in the finer feature learning network are fused to preserve as many features as possible while reducing the impact of noise. Further learning is performed through multiple convolutional layers with 3 × 3 filters while the number of channels is gradually reduced. The final two convolutional layers are designed to learn features and approximate a filtering operation simultaneously.
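A sketch of the fusion stage under these constraints: the two 64-channel branch outputs are concatenated and passed through 3 × 3 convolutions that gradually reduce the channel count down to a single output image. The intermediate channel counts are assumptions.

```python
import torch
import torch.nn as nn

# Fusion learning network: concatenated MRB + MCA features are reduced
# step by step; the last two layers learn features and approximate a filter.
fusion = nn.Sequential(
    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(32, 16, 3, padding=1),
    nn.Conv2d(16, 1, 3, padding=1),
)

mrb_feat = torch.randn(1, 64, 160, 160)  # residual-branch features
mca_feat = torch.randn(1, 64, 160, 160)  # attention-branch features
out = fusion(torch.cat([mrb_feat, mca_feat], dim=1))
```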
The comprehensive network architecture is shown in Fig. 5. Earth observation is a representative application of this system, so the proposed model is trained on the iSAID dataset [20]. By simulating the undersampling operation, the original images are transformed into blurred, undersampled representations; a total of 212 image pairs were used for training. Due to device constraints, the images are segmented into 160 × 160 patches [21], so the training dataset ultimately comprises 63,716 pairs. Color images are converted to grayscale to facilitate training and image reconstruction. The learning rate is tuned adaptively, starting from 1 × 10^{−4}.
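The patch-segmentation step can be sketched as below. How the original paper handles image borders that do not divide evenly into 160 × 160 is not stated, so discarding the ragged border here is an assumption.

```python
import numpy as np

def to_patches(img, size=160):
    """Split a grayscale image into non-overlapping size x size patches,
    discarding any ragged border (border handling is an assumption)."""
    h, w = img.shape
    patches = [img[i:i + size, j:j + size]
               for i in range(0, h - size + 1, size)
               for j in range(0, w - size + 1, size)]
    return np.stack(patches)

img = np.random.rand(480, 640)   # stand-in for one grayscale iSAID image
patches = to_patches(img)        # 3 x 4 grid of 160 x 160 patches
```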
MSE provides a clear measure of the intensity differences between corresponding pixels of the original and processed images. However, a significant limitation of MSE is its sensitivity to absolute pixel intensity differences, which can make it correlate poorly with perceived visual quality. Unlike MSE, L1 loss tends to preserve edges and fine details and is robust against outliers and noise. The loss function combines the two terms:

L = (1/N) Σ_{i=1}^{N} [0.9 (f(x_{i}) − y_{i})² + 0.1 |f(x_{i}) − y_{i}|],

where f(x_{i}) is the output of the network and y_{i} is the original image label.
The L1 loss function is robust against outliers, but it is not as effective as MSE at reducing high-frequency errors, while MSE produces smoother reconstructions of detailed features. The L1 term therefore contributes 10% of the total loss, which appropriately enhances the model's robustness.
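The weighted loss can be written compactly in PyTorch; the 0.9/0.1 split follows the stated 10% contribution of the L1 term.

```python
import torch
import torch.nn.functional as F

def combined_loss(pred, target, l1_weight=0.1):
    """Weighted loss: MSE dominates for smooth recovery of detail,
    while the L1 term (10% per the text) adds robustness to outliers."""
    return (1 - l1_weight) * F.mse_loss(pred, target) \
         + l1_weight * F.l1_loss(pred, target)

# With a uniform error of 1, both MSE and L1 equal 1, so the loss is 1.0.
loss = combined_loss(torch.zeros(4), torch.ones(4))
```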
To accurately evaluate the quality of restored images, quantitative analysis indicators are introduced.
The peak signal-to-noise ratio (PSNR) is an index for evaluating image quality based on pixel errors; the higher the PSNR, the better the imaging quality. It is computed from the mean square error (MSE) of the image pixels as PSNR = 10 log_{10}(peak² / MSE), where peak represents the maximum brightness value. PSNR does not account for human visual characteristics, so it can only be regarded as a rough estimate and cannot fully reflect perceived image quality.
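As a concrete illustration of the metric:

```python
import numpy as np

def psnr(img, ref, peak=255.0):
    """PSNR = 10 * log10(peak^2 / MSE); peak is the maximum brightness."""
    mse = np.mean((img.astype(float) - ref.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

# A uniform error of 16 grey levels gives MSE = 256, i.e. PSNR ~ 24 dB.
a = np.zeros((8, 8))
b = np.full((8, 8), 16.0)
```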
The structural similarity index (SSIM) is an index based on the human visual system. SSIM focuses on the perception of images and complies with human visual characteristics; it considers image structure, brightness, and contrast, and can therefore reflect image quality more comprehensively. If the test image is denoted as I and the original reference image as G, the definition is as follows:

SSIM(I, G) = (2μ_{I}μ_{G} + c₁)(2σ_{IG} + c₂) / [(μ_{I}² + μ_{G}² + c₁)(σ_{I}² + σ_{G}² + c₂)],

where μ_{I} and μ_{G} are the average values of I and G, σ_{I}² and σ_{G}² are their variances, σ_{IG} is their covariance, and c₁ and c₂ are small constants that stabilize the division.
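The formula can be checked with a single-window implementation computed from global image statistics (standard SSIM implementations instead average the index over local sliding windows):

```python
import numpy as np

def ssim_global(I, G, peak=1.0):
    """Single-window SSIM from global statistics; the customary
    constants c1 = (0.01*peak)^2 and c2 = (0.03*peak)^2 are used."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mu_i, mu_g = I.mean(), G.mean()
    var_i, var_g = I.var(), G.var()
    cov = ((I - mu_i) * (G - mu_g)).mean()
    return ((2 * mu_i * mu_g + c1) * (2 * cov + c2)) / \
           ((mu_i**2 + mu_g**2 + c1) * (var_i + var_g + c2))

img = np.random.rand(32, 32)
```

For identical images the index is exactly 1, its maximum value.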
We evaluate the performance of the proposed model on the iSAID dataset. The test set consists of 36 images containing different scenarios, none of which appear in the training set. The experiments were performed on a desktop with a Xeon Silver 4210R CPU and an NVIDIA 1080Ti GPU. To test the performance of the model, the TVAL3 algorithm and a CNN model previously proposed by our team are compared with the proposed model. The CNN reconstructs images through a series of convolutional layers and ReLU activation functions, after which a BM3D denoiser is used to reduce noise, smooth the images, and remove artifacts. The TVAL3 algorithm can be applied effectively to imaging with the PIII system, but it requires a number of iterations. Several sets of images are chosen to compare the reconstruction results, shown in Fig. 6. The PSNRs and SSIMs for the different scenarios are listed in Table 2.
TABLE 2. Comparison of different algorithms on the constructed dataset.

| Figure 6 | TVAL3 PSNR | TVAL3 SSIM | CNN Model PSNR | CNN Model SSIM | Ours PSNR | Ours SSIM |
|---|---|---|---|---|---|---|
| i(a)–i(d) | 16.24 | 0.5250 | 17.78 | 0.6274 | 18.10 | 0.6894 |
| ii(a)–ii(d) | 15.19 | 0.4452 | 15.65 | 0.4953 | 16.33 | 0.6111 |
| iii(a)–iii(d) | 14.83 | 0.3672 | 15.18 | 0.6066 | 19.19 | 0.7098 |
| iv(a)–iv(d) | 13.28 | 0.4233 | 19.54 | 0.5487 | 20.40 | 0.6010 |
| v(a)–v(d) | 13.90 | 0.5328 | 16.07 | 0.4523 | 16.14 | 0.6465 |
| vi(a)–vi(d) | 14.81 | 0.4580 | 15.00 | 0.4796 | 17.29 | 0.5897 |
Deep learning-based algorithms achieve better detail restoration. Training such an algorithm requires a substantial amount of time to learn how to restore image features and reduce noise, but the trained weights can then be applied to various scenarios without further iteration. During image reconstruction, traditional algorithms typically require numerous iterations to produce a result, so their reconstruction time is much longer than that of the newly proposed algorithm. The images reconstructed by the CNN model are too smooth and fail to capture effective detail; their brightness and contrast are also the lowest among the compared methods. When trained for the same number of epochs, the proposed model requires 404.412 minutes while the CNN model needs 449.038 minutes. The maximum number of channels in the model is limited to 64; by limiting the channel count and removing unnecessary training operations, the model remains lightweight. Moreover, the trained model can be transferred to other devices without retraining, which greatly reduces the operational burden on those devices. The new image reconstruction method thus has advantages in both training time and reconstructed image quality.
By designing a multiscale convolutional attention module and a modified residual block, a novel image reconstruction algorithm based on deep learning has been proposed. The model was trained in three distinct phases and effectively enhances feature restoration and noise reduction. In line with the imaging process, an undersampled image dataset was created to facilitate efficient feature learning and better evaluation. The results show that the deep learning-based image reconstruction method for the PIII system performs well both qualitatively and quantitatively.
The authors thank the Editor-in-Chief, the reviewers, the School of Mechanical and Aerospace Engineering of Jilin University, and the College of Mechanical Engineering and Automation of Fuzhou University for their support of this work.
The authors received no financial support for the research, authorship, and/or publication of this paper.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data that support the findings of this study are available from the corresponding author upon request. The data are not publicly available due to privacy or ethical restrictions.
TABLE 1. System parameters for simulation.

| Parameter | Value |
|---|---|
| Wavelength Range (nm) | 800–1,600 |
| Number of Spectral Segments | 8 |
| Observation Distance (km) | 100 |
| Lenslet Size (mm) | 6 |
| Number of Interference Arms | 35 |
| Number of Single-arm Lenslets | 38 |