Research Paper

Curr. Opt. Photon. 2024; 8(3): 215-224

Published online June 25, 2024 https://doi.org/10.3807/COPP.2024.8.3.215

Copyright © Optical Society of Korea.

Restoring Turbulent Images Based on an Adaptive Feature-fusion Multi-input—Multi-output Dense U-shaped Network

Haiqiang Qian, Leihong Zhang, Dawei Zhang, Kaimin Wang

Engineering Research Center of Optical Instrument and System, Ministry of Education and Shanghai Key Lab of Modern Optical System, University of Shanghai for Science and Technology, 200093 Shanghai, China

Corresponding author: *km_wang@usst.edu.cn, ORCID 0000-0001-8424-0332

Received: November 3, 2023; Revised: January 17, 2024; Accepted: February 23, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In medium- and long-range optical imaging systems, atmospheric turbulence causes blurring and distortion of images, resulting in loss of image information. An image-restoration method based on an adaptive feature-fusion multi-input multi-output (MIMO) dense U-shaped network (Unet) is proposed, to restore a single image degraded by atmospheric turbulence. The network model is based on the MIMO-Unet framework and incorporates patch-embedding shallow-convolution modules. These modules help in extracting shallow image features and facilitate the processing of the multi-input dense encoding modules that follow. The combination of these modules improves the model’s ability to analyze and extract features effectively. An asymmetric feature-fusion module is utilized to combine encoded features at varying scales, facilitating the feature reconstruction of the subsequent multi-output decoding modules for restoration of turbulence-degraded images. Based on experimental results, the adaptive feature-fusion MIMO dense U-shaped network outperforms traditional restoration methods, the CMFNet network model, and the standard MIMO-Unet network model in terms of restored image quality. It effectively minimizes geometric deformation and blurring of images.

Keywords: Asymmetric feature fusion, Image reconstruction, Multi-scale features, Turbulence image restoration, U-shaped network

OCIS codes: (010.1290) Atmospheric optics; (010.1330) Atmospheric turbulence; (100.3020) Image reconstruction-restoration

I. INTRODUCTION

In recent years, astronomical imaging systems [1], space target recognition [2], and ghost imaging [3] have developed rapidly. However, the disturbance effect of the atmosphere on an optical system has not been completely resolved. Atmospheric motion affects the results of remote optical imaging systems, since it lies along the transmission path of any such system. Atmospheric turbulence is a significant form of this motion: Factors such as solar radiation, surface radiation, thermal convection, and human activity cause irregular fluctuations in atmospheric temperature, which in turn lead to random variations in the physical properties of the atmosphere, such as pressure, density, speed, and direction. Atmospheric turbulence is characterized by this irregular motion. Under turbulent conditions, the propagation of light through an optical system becomes unstable, resulting in blurry images of the target object. The turbulence can also cause uneven distortions [4] in the image, making it challenging to extract information or identify features. To improve imaging quality, it is essential to process images that have been degraded by atmospheric turbulence.

Early research focused on minimizing the impact of atmospheric turbulence on optical systems by altering the transmission path of light waves. To achieve this, observation systems were established at high altitudes and regions with thin clouds, but the issue of atmospheric turbulence persisted, and deployment of the observation system was limited by its environment. With the development of hardware-based adaptive-optics technology, an adaptive-optics system can now dynamically compensate and correct wavefront distortion caused by atmospheric turbulence, by connecting the wavefront sensor to a deformable mirror. Although adding more wavefront sensors and deformable mirrors can compensate for atmospheric-turbulence-induced wavefront distortion, it has drawbacks such as increased system cost and reduced reliability, and it cannot completely eliminate wavefront distortion.

The maturity of software-based digital image-processing technology is steadily increasing. Numerous classical image-restoration algorithms have the capability to partially restore original images affected by atmospheric turbulence. Furthermore, newer and more advanced image-restoration algorithms are consistently being developed. In 2013, Zhu and Milanfar [5] proposed a B-spline-based nonrigid image-registration technology capable of correcting geometric distortion in images. In 2018, Lau et al. [6] proposed a variational model that combines frame subsampling and clear image extraction, which significantly improves the quality of reconstructed images. Despite achieving some results, these image-restoration algorithms still fall short in effectively restoring images degraded by atmospheric turbulence.

In recent years, the rapid development of deep learning has led to significant advancements in the field of image-information extraction, particularly through the utilization of deep convolutional neural networks. Consequently, numerous researchers have started employing these networks to process images that have been degraded by atmospheric turbulence. In 2021, Lau et al. [7] introduced a novel single-frame-recovery generation algorithm that utilizes two GAN generators to address turbulence-induced deformation and blur. In that same year, Jin et al. [8] introduced TSR-WGAN, a generative adversarial network specifically designed to restore sequence data degraded by turbulence. In 2022, Rai and Jawahar [9] designed two two-level deep learning networks, namely DT-GAN+ and DTD-GAN+, specifically focusing on addressing geometric distortion and blur in turbulent images. The first-level network employs WarpNet to rectify geometric distortion, while the second-level network utilizes ColorNet to eliminate image blurriness, thereby enhancing image quality. The training process of a GAN-type network is frequently challenging, leading to issues in the generated images such as unnatural blur and distortion. In recent years, Unet-type neural networks have demonstrated promising outcomes in image restoration, offering the potential to effectively address atmospheric turbulence and reconstruct degraded images.

This paper presents an adaptive feature-fusion multi-input multi-output dense U-shaped network (AFMIMO-DenseUnet). The proposed network leverages an end-to-end processing mechanism: It takes the simulated turbulence-degraded image and the corresponding ground-truth image as input. The input image is scaled to three different sizes and then encoded individually by encoders operating at distinct scales. Global and local features are extracted, followed by fusion of the encoded features. By performing encoder processing at multiple scales, the network suppresses atmospheric turbulence at a finer level, resulting in enhanced image restoration. This makes the proposed network model very attractive for suppressing turbulence effects in optical systems.

II. BASIC PRINCIPLES

2.1. Model for Degradation due to Atmospheric Turbulence

Geometric distortion and blurring of long-distance imaging are primarily induced by atmospheric turbulence. The model that describes the degradation process of atmospheric turbulence is expressed as follows [10]:

$$H = D(G(I)) + \varepsilon.$$

Among these elements, H represents the image degraded by atmospheric turbulence, I represents the original clear image, G represents the optical-blur operator, D represents the geometric distortion operator, and ε represents various additive noises.
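To make the degradation model above concrete, the following minimal Python sketch applies its three stages to a grayscale image, assuming a Gaussian point-spread function for G, a smooth random displacement field for D, and additive Gaussian noise for ε; all parameter values and function names are illustrative, not part of the original work.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def degrade(image, blur_sigma=1.5, warp_strength=2.0, noise_std=0.01, rng=None):
    """Toy version of H = D(G(I)) + eps: optical blur, geometric warp, additive noise."""
    rng = np.random.default_rng() if rng is None else rng
    # G: optical blur, approximated by a Gaussian point-spread function
    blurred = gaussian_filter(image, sigma=blur_sigma)
    # D: geometric distortion, approximated by a smooth random displacement field
    h, w = image.shape
    dy = gaussian_filter(rng.standard_normal((h, w)), sigma=8) * warp_strength
    dx = gaussian_filter(rng.standard_normal((h, w)), sigma=8) * warp_strength
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    warped = map_coordinates(blurred, [yy + dy, xx + dx], order=1, mode="reflect")
    # eps: additive noise
    return warped + rng.normal(0.0, noise_std, size=image.shape)

H = degrade(np.random.rand(256, 256))
```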

2.2. Turbulence Dataset

Traditional turbulence-restoration methods primarily involve manual parameter adjustments to achieve an approximately optimal solution. The requirement for real turbulence data is minimal, with low acquisition costs. A deep neural network can automatically discover an approximately optimal solution for turbulence restoration, without the need for manual parameter adjustments, saving time in the process. However, deep learning necessitates a significant amount of real turbulence data for training, resulting in additional costs. To mitigate the lack of available data, a viable solution is to train the network using simulated turbulence datasets, and validate its performance with real turbulence data.

2.2.1. Simulated Turbulence Dataset

The performance of a turbulence-restoration method using deep learning is intricately linked to the input simulation data. Greater similarity between simulated and actual turbulence data results in improved network performance. Currently, numerous scholars have conducted extensive research on simulating turbulence. Notably, the turbulence simulator developed by Chimitt and Chan [11] is eminently suitable for meeting the training requirements of turbulence-restoration networks. This is mainly because Chimitt and Chan [11] utilize a formula grounded in physical laws to simulate turbulence, and their work focuses on establishing the equivalence between the arrival-angle correlation proposed by Basu et al. [12] and the multiaperture correlation introduced by Chanan [13]. Table 1 presents the primary parameter configurations utilized in the turbulence simulator.

TABLE 1. Main parameters of the turbulence simulator

Parameter | Value
Aperture Diameter (m) | 0.054
Wavelength (nm) | 525
Refractive-index Structure Constant (m−2/3) | 7 × 10−16, 1 × 10−15, 3 × 10−15
Focal Length (m) | 0.3
Path Length (m) | 1,000
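For convenience, the Table 1 settings can be gathered into a single configuration object, as in the short sketch below; the key names are illustrative and do not correspond to the turbulence simulator's actual parameter names.

```python
# Table 1 settings as a plain dictionary; key names are illustrative only.
turbulence_params = {
    "aperture_diameter_m": 0.054,
    "wavelength_nm": 525,
    "cn2_m_minus_2_3": [7e-16, 1e-15, 3e-15],  # refractive-index structure constants
    "focal_length_m": 0.3,
    "path_length_m": 1000.0,
}
```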


Once the parameters of the turbulence simulator have been defined, input data must be provided to generate the dataset required to train the turbulence-restoration network. The input data are sourced from the Places2 dataset developed by Zhou et al. [14], which consists of over 10 million images belonging to more than 400 distinct scene categories. Images from the selected scene category of outdoor buildings are fed into the turbulence simulator; the boundary lines of the buildings help to characterize the geometric distortion caused by turbulence. The turbulence simulator is limited to processing grayscale images, so these data must first be converted to grayscale before being input to the simulator with the specified parameters. Figure 1 displays turbulent images obtained from the simulation process.

Figure 1. Turbulent-degradation images of three different refractive-index structure constants. (a) Original, (b) Cn2 = 7 × 10−16 m−2/3, (c) Cn2 = 1 × 10−15 m−2/3, and (d) Cn2 = 3 × 10−15 m−2/3.
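A preprocessing step along the following lines converts the selected building images to grayscale before they are fed to the simulator; this is a sketch using Pillow, and the directory paths, file pattern, and function name are placeholders rather than the authors' pipeline.

```python
from pathlib import Path
from PIL import Image

def prepare_grayscale(src_dir: str, dst_dir: str, size: int = 256) -> None:
    """Convert images to single-channel grayscale and resize to the network input size."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.jpg")):   # placeholder file pattern
        img = Image.open(path).convert("L")            # "L" = 8-bit grayscale
        img = img.resize((size, size), Image.BICUBIC)
        img.save(out / path.name)
```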

2.2.2. Real Turbulence Dataset

This study utilizes real turbulence data from the BVI-CLEAR dataset developed by Anantrasirichai and Bull [15], which consists of two types of atmospheric-turbulence sequences: synthetic and real. The real data sequences are used as input for evaluating the performance of the turbulence-restoration network, and are categorized into three turbulence scenes: mirage, monument, and moving car. They were captured under arid desert conditions at a temperature of 46 ℃, using a camera equipped with a 400-mm lens (Canon EOS-1D Mark IV; Canon, Tokyo, Japan). The real data sequences additionally require cropping and grayscale preprocessing; the processed output is depicted in Fig. 2.

Figure 2. Real data sequences from the BVI-CLEAR dataset. (a) Mirage, (b) monument, and (c) moving car.

III. PROPOSED METHOD

3.1. Overall Structure of the Network Model

This paper presents the AFMIMO-DenseUnet model as a solution to the simultaneous removal of geometric distortion and blur in the task of turbulence restoration. The overall architecture of AFMIMO-DenseUnet is illustrated in Fig. 3. The AFMIMO-DenseUnet architecture is based on Cho et al.’s MIMO-UNet [16], and is specifically designed to efficiently remove multiscale turbulence. AFMIMO-DenseUnet consists of essential components, including a multi-input single dense encoder (MISDE), a multi-output single dense decoder (MOSDD), a patch-embedding shallow-convolution module (PESCM), a feature-attention module (FAM), and an asymmetric feature-fusion module (ASFF). The input of AFMIMO-DenseUnet is single-channel grayscale turbulence-degraded images with a resolution of 256 × 256. The output is single-channel turbulence-restored images of the same size.

Figure 3. Structure of the AFMIMO-DenseUnet model.

The AFMIMO-DenseUnet architecture consists of multiscale dense-encoder and multiscale dense-decoder modules, incorporating three sets of dense encoding blocks (DEBs) and dense decoding blocks (DDBs) at different scales. This design enables improved handling of turbulence distortion in various regions of the image.

Initially, the network performs downsampling on the input 256 × 256 images to generate turbulence-degradation images at two different scales: 128 × 128 and 64 × 64. The original turbulence-degradation image with a scale of 256 × 256 is input into the DEB that corresponds to the original large scale, enabling the extraction of encoded features with 32 channels and 256 × 256 scale. The PESCM is utilized to extract 64 channels of turbulence-degraded image features at a scale of 128 × 128 from the original image. The FAM is employed to combine the features extracted from PESCM with those generated by the original large-scale encoder.

Subsequently, the combined features are input into the corresponding medium-scale DEB in the network, to process and extract 64 channels of encoded features at a scale of 128 × 128. The PESCM is also applied to the original large-scale image to extract 128-channel turbulence-degraded image features at a scale of 64 × 64. The features extracted by this PESCM and the features generated by the medium-scale DEB are combined using the FAM, and the combined features are then input into the corresponding small-scale DEB, which extracts encoded features with 128 channels and a scale of 64 × 64.

To achieve effective multiscale feature fusion in the network, this study initially feeds the features processed by three DEBs into three ASFFs that correspond to various scales. The features obtained from ASFF processing at various scales exhibit consistency with the output features of the respective DEB scale. By combining the output features of ASFF and DEB at small scales along the channel dimension, the resultant merged features are subsequently fed into a small-scale DDB for further processing. This allows for the extraction of decoding features with 128 channels and a scale of 64 × 64.

The features generated by the small-scale DDB are combined with those produced by the medium-scale ASFF. Subsequently, these concatenated features are fed into the medium-scale DDB for processing, allowing the extraction of decoding features with 64 channels and a scale of 128 × 128. Similar to the medium-scale DDB, the large-scale DDB follows analogous input and output processing, resulting in the extraction of decoding features with 32 channels and a scale of 256 × 256. The output features of the three DDBs are individually subjected to convolution to extract detail-restoration features at three different scales: 64 × 64, 128 × 128, and 256 × 256, all represented by a single channel. Subsequently, these features are incorporated into the respective scaled, blurred images to generate turbulence-restored images at the aforementioned three scales.
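The multi-input/multi-output idea described above can be illustrated with a short PyTorch fragment: the input is downsampled to the two additional scales, and each decoder scale predicts a single-channel detail map that is added back to the blurred image at that scale. The DDB outputs below are random stand-in tensors, and the 3 × 3 "detail heads" are placeholders for the convolutions applied to the DDB outputs; this is a simplified reading of the description, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.rand(1, 1, 256, 256)   # turbulence-degraded input
x_m = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)   # 128 x 128
x_s = F.interpolate(x, scale_factor=0.25, mode="bilinear", align_corners=False)  # 64 x 64

# Stand-ins for the decoder (DDB) outputs at the three scales described above.
d_l = torch.rand(1, 32, 256, 256)
d_m = torch.rand(1, 64, 128, 128)
d_s = torch.rand(1, 128, 64, 64)

# One single-channel detail map per scale, added back to the blurred input at that scale.
head_l, head_m, head_s = (nn.Conv2d(c, 1, kernel_size=3, padding=1) for c in (32, 64, 128))
restored_l = x + head_l(d_l)      # 256 x 256 output
restored_m = x_m + head_m(d_m)    # 128 x 128 output
restored_s = x_s + head_s(d_s)    # 64 x 64 output
```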

3.2. Multi-input Single Dense Encoder and Multi-output Single Dense Decoder

The multi-input multi-output architecture of AFMIMO-DenseUnet enables improved capture of the correlations between input data and output predictions, thereby enhancing network performance. To further optimize the neural network’s performance, it is essential to redefine the basic unit module, ensuring that it effectively leverages the advantages of the network architecture. The multiscale encoder and multiscale decoder of AFMIMO-DenseUnet are collectively called the multi-input multi-output dense encoder and decoder (MIMODED). Although their functions are distinct, the basic unit modules inside them are identical. The conventional basic unit module typically consists of a residual block, which employs a standard convolution. However, this approach struggles to capture the global differences in information between blurred and clear image pairs, predominantly emphasizing local details. In this study, the devised fundamental building block is referred to as a Res dense block, depicted in Fig. 4. It is primarily derived from the residual dense block of the RDN proposed by Zhang et al. [17], with some modifications. Nevertheless, dense connections are retained, due to their ability to effectively mitigate the issue of vanishing gradients and enhance feature propagation. At the same time, the conventional convolution operation is substituted with depthwise over-parameterized convolution (DO-Conv) [18]. Over-parametrization is a technique employed to expedite the training of deep networks, leading to performance gains in the restoration of turbulent images without introducing additional parameters at inference.

Figure 4. Structure of a single dense encoding block and decoding block.
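A minimal PyTorch sketch of such a block is given below, assuming three densely connected 3 × 3 convolutions, a 1 × 1 local-fusion layer, and a residual connection; plain Conv2d layers stand in for DO-Conv, and the growth rate and layer count are illustrative choices rather than the authors' settings.

```python
import torch
import torch.nn as nn

class ResDenseBlock(nn.Module):
    """Sketch of a Res dense block: densely connected conv layers, 1x1 fusion, residual."""
    def __init__(self, channels: int, growth: int = 32, n_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
        # 1x1 local feature fusion back to the block's channel count
        self.fuse = nn.Conv2d(channels + n_layers * growth, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # dense connections
        return x + self.fuse(torch.cat(feats, dim=1))      # residual output

# Example: block = ResDenseBlock(32); y = block(torch.rand(1, 32, 256, 256))
```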

3.3. Asymmetric Feature-fusion Module

The feature-fusion method plays a crucial role in maximizing the performance of the MISDE and the MOSDD. A conventional feature-fusion module first performs upsampling and downsampling on the output features of encoders at different scales, then combines the processed features along the channel dimension, and finally obtains fused features through convolutional processing. However, this method fails to adequately consider the relative importance of the different-scale features in the fused representation. This study draws on the adaptive feature-fusion module introduced by Liu et al. [19] for YOLOv3, and proposes three ASFFs of varying scales. As illustrated in Fig. 5, the conventional convolution in the module is substituted with DO-Conv. The ASFF first resizes the feature dimensions of the dense-encoder outputs at the three scales, to align them with the dense-decoder input of the corresponding scale. Then, through training, the model learns the weight allocation among the features of different scales during fusion, aiming to identify the optimal feature-fusion strategy and to effectively leverage the capabilities of both the dense encoder and decoder modules.

Figure 5. Structure of the multiscale asymmetric feature-fusion module (ASFF). (a) ASFF1, (b) ASFF2, and (c) ASFF3.
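The following PyTorch sketch captures the fusion idea under stated assumptions: the three encoder outputs are projected to a common channel width, resized to a target scale, and blended with learned, softmax-normalized spatial weights. The channel counts, the 1 × 1 weight head, and the use of plain Conv2d in place of DO-Conv are illustrative, not the module's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASFFSketch(nn.Module):
    """Sketch of one asymmetric feature-fusion module (learned per-pixel scale weights)."""
    def __init__(self, channels=(32, 64, 128), out_channels=64):
        super().__init__()
        self.align = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in channels)
        self.weight = nn.Conv2d(3 * out_channels, 3, kernel_size=1)  # one weight map per input

    def forward(self, e_l, e_m, e_s, target_hw):
        # Resize every encoder output to the target scale after channel alignment.
        feats = [F.interpolate(proj(e), size=target_hw, mode="bilinear", align_corners=False)
                 for proj, e in zip(self.align, (e_l, e_m, e_s))]
        # Softmax over the three scales gives spatially varying fusion weights.
        w = torch.softmax(self.weight(torch.cat(feats, dim=1)), dim=1)   # (B, 3, H, W)
        return sum(w[:, i:i + 1] * feats[i] for i in range(3))

# Example: ASFFSketch()(torch.rand(1, 32, 256, 256), torch.rand(1, 64, 128, 128),
#                       torch.rand(1, 128, 64, 64), target_hw=(128, 128))
```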

3.4. Patch-embedding Shallow-convolution Module

The input turbulence-degraded image of AFMIMO-DenseUnet needs to undergo shallow feature extraction using convolution to generate medium-scale and small-scale features specific to the turbulence degradation, which are subsequently fed to the medium-scale and small-scale DEBs. This study draws on the patch-embedding method employed in the Swin Transformer by Liu et al. [20] and designs a PESCM that is applicable to images, as illustrated in Fig. 6. The PESCM utilizes a two-dimensional convolution with equal kernel size and stride to downsample the image. Additionally, layer normalization is applied to process the channel dimension of the image. The resulting image features are well suited to processing by the DEBs.

Figure 6. Structure of the patch-embedding shallow-convolution module (PESCM). (a) PESCM2, (b) PESCM1.
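A compact PyTorch sketch of this module follows: a convolution whose kernel size equals its stride carves the image into non-overlapping patches (downsampling it), and layer normalization is applied over the channel dimension. The 64-channel, patch-size-2 setting is inferred from the 128 × 128, 64-channel case described in Section 3.1; the class name and defaults are illustrative.

```python
import torch
import torch.nn as nn

class PESCMSketch(nn.Module):
    """Sketch of a patch-embedding shallow-convolution module (kernel size == stride)."""
    def __init__(self, in_channels=1, embed_channels=64, patch=2):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_channels, kernel_size=patch, stride=patch)
        self.norm = nn.LayerNorm(embed_channels)

    def forward(self, x):
        x = self.proj(x)              # (B, C, H/patch, W/patch)
        x = x.permute(0, 2, 3, 1)     # move channels last for LayerNorm
        x = self.norm(x)
        return x.permute(0, 3, 1, 2)  # back to (B, C, H', W')

# Example: PESCMSketch()(torch.rand(1, 1, 256, 256)).shape -> (1, 64, 128, 128)
```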

3.5. Loss Function

In this study, k ∈ {0, ..., K − 1} indexes the output scale of AFMIMO-DenseUnet, Ŝk denotes the k-th reconstructed image, Sk the k-th real clear image, and ε a constant set to 10−3. Three loss terms are employed to train AFMIMO-DenseUnet:

(1) Charbonnier loss [21]:

$$L_{\mathrm{char}} = \sqrt{\left\| \hat{S}_k - S_k \right\|^2 + \varepsilon^2}.$$

(2) Edge loss [21]:

$$L_{\mathrm{edge}} = \sqrt{\left\| \Delta \hat{S}_k - \Delta S_k \right\|^2 + \varepsilon^2},$$

where Δ represents the Laplacian operator.

(3) Frequency-reconstruction loss [16], which assesses the frequency discrepancy between reconstructed images and real clear images:

$$L_{\mathrm{freq}} = \left\| \mathrm{FT}(\hat{S}_k) - \mathrm{FT}(S_k) \right\|_1,$$

where FT represents the fast Fourier transform (FFT) operation. Ultimately, the overall loss function for AFMIMO-DenseUnet can be defined as

$$L = \sum_{k=0}^{K-1} \left( L_{\mathrm{char}} + \alpha_1 L_{\mathrm{edge}} + \alpha_2 L_{\mathrm{freq}} \right),$$

where α1 and α2 are tradeoff parameters that are empirically set to 0.05 and 0.01 respectively.
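A PyTorch sketch of the three loss terms and their weighted sum over the K output scales is shown below, assuming single-channel image tensors of shape (B, 1, H, W), the per-pixel mean form commonly used in practice, and a 3 × 3 Laplacian kernel for the edge term; it is not the authors' released code.

```python
import torch
import torch.nn.functional as F

EPS = 1e-3  # the constant epsilon from the text
LAPLACIAN = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]).view(1, 1, 3, 3)

def charbonnier(pred, target):
    """Charbonnier loss, averaged over pixels."""
    return torch.sqrt((pred - target) ** 2 + EPS ** 2).mean()

def edge_loss(pred, target):
    """Charbonnier distance between Laplacian-filtered images."""
    return charbonnier(F.conv2d(pred, LAPLACIAN, padding=1),
                       F.conv2d(target, LAPLACIAN, padding=1))

def freq_loss(pred, target):
    """L1 distance between the FFT spectra of the two images."""
    return (torch.fft.fft2(pred) - torch.fft.fft2(target)).abs().mean()

def total_loss(preds, targets, alpha1=0.05, alpha2=0.01):
    """Sum of the three weighted terms over the K output scales."""
    return sum(charbonnier(p, t) + alpha1 * edge_loss(p, t) + alpha2 * freq_loss(p, t)
               for p, t in zip(preds, targets))
```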

IV. EXPERIMENTS

4.1. Datasets and Implementation Details

This study prepares two datasets for the experiments, based on data generated by the aforementioned turbulence simulator for three values of the refractive-index structure constant. The first dataset consists of 5,000 original, clear grayscale images and 5,000 simulated turbulence images with a refractive-index structure constant of 1 × 10−15 m−2/3, giving 5,000 image pairs of size 256 × 256 pixels. Of these, 3,000 pairs are allocated for training, 1,000 pairs for validation, and the remaining 1,000 pairs for testing. The second dataset consists of 5,000 original, clear grayscale images and 5,000 simulated images with mixed turbulence intensities: 1,000 images at 7 × 10−16 m−2/3, 2,000 at 1 × 10−15 m−2/3, and 2,000 at 3 × 10−15 m−2/3. As with the first dataset, 3,000 pairs are selected for training; among these, 600 images have a turbulence intensity of 7 × 10−16 m−2/3, 1,200 of 1 × 10−15 m−2/3, and 1,200 of 3 × 10−15 m−2/3. The validation and test sets again consist of 1,000 pairs each, with the distribution of turbulence intensities matching that of the training set.

The network-training hyperparameters in this study are as follows: The image-block size is set to 256 × 256, matching the input image size, to effectively capture global information. The batch size is set to 3, the number of training epochs to 3,000, and the optimizer uses the Adam [22] strategy. The learning rate is initially set to 2 × 10−4 and decreases steadily to 1 × 10−6 during training, following a cosine-annealing schedule [23]. Data augmentation is applied in the form of random horizontal or vertical flipping of each image patch. The experiments are performed on a Windows 10 system equipped with an NVIDIA GeForce RTX 4070 Ti graphics card and an Intel Core i5-13600K processor.
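The optimizer and learning-rate schedule described above can be set up roughly as follows; the placeholder model and the empty loop body are stand-ins for AFMIMO-DenseUnet and the actual training code, and stepping the scheduler once per epoch is just one simple way to realize the 2 × 10−4 to 1 × 10−6 cosine decay.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Conv2d(1, 1, 3, padding=1)   # placeholder for AFMIMO-DenseUnet
optimizer = Adam(model.parameters(), lr=2e-4)
epochs = 3000
scheduler = CosineAnnealingLR(optimizer, T_max=epochs, eta_min=1e-6)

for epoch in range(epochs):
    # One pass over the 256 x 256 patches with batch size 3 would go here, applying
    # random horizontal/vertical flips for augmentation, then loss.backward() and
    # optimizer.step() per batch.
    scheduler.step()   # cosine-annealed decay from 2e-4 down to 1e-6
```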

4.2. Evaluation Metrics

This study employs the peak signal-to-noise ratio (PSNR) as a metric to assess the quality of turbulent-image restoration. PSNR is the ratio of the maximum possible pixel value of the image to the mean square error, expressed in decibels; a higher PSNR value indicates a closer resemblance between the restored image and the original image. Additionally, to account for human perception, this paper incorporates the structural similarity index (SSIM) as an evaluation metric, which quantifies the similarity between two images. SSIM takes values in [0, 1], with a larger value indicating greater similarity between the images. In addition to PSNR and SSIM, this work also utilizes two further image-quality-assessment metrics, the information fidelity criterion (IFC) [24] and the feature similarity index (FSIM) [25]. IFC, based on information theory, measures the information loss between the processed image and the original image; a higher IFC value indicates less information loss. FSIM combines low- and high-level image features to evaluate the extent of feature preservation in the processed image; a higher FSIM value indicates greater feature similarity between the evaluated image and the original image.
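PSNR and SSIM can be computed with scikit-image as sketched below (IFC and FSIM require dedicated implementations and are omitted); the function name and the assumption of grayscale arrays scaled to [0, 1] are illustrative.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored: np.ndarray, reference: np.ndarray):
    """PSNR (dB) and SSIM for one grayscale image pair with values in [0, 1]."""
    psnr = peak_signal_noise_ratio(reference, restored, data_range=1.0)
    ssim = structural_similarity(reference, restored, data_range=1.0)
    return psnr, ssim

# Usage: psnr, ssim = evaluate_pair(restored_img, ground_truth_img)
```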

4.3. Ablation Study

This work experimentally analyzes the impact of the AFMIMO-DenseUnet network architecture on the simulated turbulence datasets. First, an appropriate dataset is selected: The first dataset, which contains turbulence data of a single intensity, is more appropriate for analyzing the architectural performance of AFMIMO-DenseUnet. Second, this study adopts MIMO-UNet [16] with the conventional convolution replaced by DO-Conv as the reference baseline. The work first examines the performance of the PESCM on the baseline network, followed by an evaluation of the ASFF on the same baseline; the effectiveness of the MIMODED on the baseline network is likewise examined individually. Next, this paper evaluates the performance of AFMIMO-Unet, a variation of the MIMO-Unet model in which the architecture between encoder and decoder is modified by replacing the two feature-fusion modules with three ASFF modules. AFMIMO-Unet can be regarded as AFMIMO-DenseUnet without the improved MIMODED and PESCM. Additionally, the performance of the PESCM and the MIMODED is individually validated on the AFMIMO-Unet framework. Finally, the performance of the proposed AFMIMO-DenseUnet is comprehensively verified. The turbulence-removal and image-restoration results of all experiments are shown in Fig. 7, while the performance-evaluation results of each experiment are presented in Table 2.

Figure 7. The original turbulent images and zoomed-in patches are shown. For each image, from top left to bottom right are (a) the turbulent image, (b) results obtained through MIMO-UNet+DO-Conv, (c) MIMO-UNet+PESCM, (d) MIMO-UNet+ASFF, (e) MIMO-UNet+MIMODED, (f) AFMIMO-Unet, (g) AFMIMO-Unet+PESCM, (h) AFMIMO-Unet+MIMODED, (i) AFMIMO-DenseUnet, and (j) the original image.

TABLE 2. Testing results from the ablation experiment

Model | PSNR (dB) | SSIM (%) | FSIM (%) | IFC
MIMO-Unet+DO-Conv | 29.4334 | 93.80 | 95.12 | 5.6957
MIMO-Unet+PESCM | 29.6334 | 94.06 | 94.06 | 5.7379
MIMO-Unet+ASFF | 29.5517 | 93.95 | 95.20 | 5.6813
MIMO-Unet+MIMODED | 30.7224 | 94.90 | 96.12 | 6.3908
AFMIMO-Unet | 30.0524 | 94.41 | 95.67 | 5.8581
AFMIMO-Unet+PESCM | 30.2053 | 94.62 | 95.86 | 6.0705
AFMIMO-Unet+MIMODED | 31.1881 | 95.27 | 96.60 | 6.8127
AFMIMO-DenseUnet | 31.7857 | 95.82 | 97.06 | 7.2675


From Fig. 7 and Table 2, it can be observed that using the standard MIMO-Unet for image restoration already improves image quality. Incorporating the designed blocks, such as the PESCM and the ASFF, into the standard MIMO-Unet leads to a marginal further improvement. Implementing the dense encoding and decoding modules in the MIMO-Unet architecture leads to a significant improvement in performance: PSNR improves by 1.289 dB and SSIM increases by 1.1%, compared to the baseline network. In comparison to the MIMO-Unet+ASFF model, AFMIMO-Unet (an intermediate model proposed in this study) exhibits a performance improvement: PSNR increases by 0.501 dB and SSIM by 0.46%. The complete model designed in this study, AFMIMO-DenseUnet, combines the patch-embedding shallow-convolution modules with the MIMO dense encoding and decoding modules. It effectively processes the input features and reasonably fuses the output features of each scale's encoder, thereby improving the performance of the model. Comparative analysis with the baseline network reveals a PSNR increase of 2.352 dB and a 2.02% improvement in SSIM. The efficacy of our method is also validated by the FSIM and IFC metrics. The effectiveness of this approach in turbulent-image restoration is evident.

4.4. Performance Comparison

This paper presents a comparative analysis between the proposed AFMIMO-DenseUnet method and other methods for turbulence removal in image restoration. In this study, the second dataset comprising data with three different turbulence intensities is used to evaluate and compare the performance of different methods, which can better simulate the uneven turbulence intensity under real atmospheric conditions. The restoration results for the different methods are visually presented in Fig. 8, while the specific performance indicators for the restoration process can be found in Table 3.

Figure 8. The original turbulent images and zoomed-in patches are shown. For each image, from top left to bottom right are (a) the turbulent image, (b) results obtained through TurbRecon-TCI, (c) CMFNet, (d) MIMO-Unet, (e) AFMIMO-DenseUnet, and (f) the original image.

TABLE 3. Testing results for different models in turbulent-image restoration

Model | PSNR (dB) | SSIM (%) | FSIM (%) | IFC
TurbRecon-TCI | 19.9691 | 65.17 | 77.58 | 1.1338
CMFNet | 24.4657 | 83.16 | 87.68 | 2.7442
MIMO-Unet | 27.2552 | 89.78 | 91.98 | 4.0138
AFMIMO-DenseUnet | 29.1360 | 92.53 | 94.25 | 5.3472


In Fig. 8 and Table 3, the first method is the traditional turbulence-restoration method proposed by Mao et al. [26]. This method removes a certain degree of turbulence, but not uniformly: Its processing of low-frequency information in the image is unnatural and that of high-frequency information too aggressive, resulting in pronounced distortions of object contours. The second method is the CMFNet deep-learning deblurring method proposed by Fan et al. [27]. It processes turbulence fairly uniformly on a global scale, producing a relatively natural effect, but it may introduce distortion in certain high-frequency details. The third method, proposed by Cho et al. [16], is the MIMO-UNet deep-learning deblurring method (without DO-Conv). It removes turbulence relatively uniformly and maintains the naturalness of the high-frequency information in the image, without apparent distortion. The fourth method, proposed in this study, is the AFMIMO-DenseUnet deep-learning turbulence-removal method, which processes the global information of the image more uniformly and minimizes distortion of high-frequency details, leading to improved turbulence removal. The restoration results show that AFMIMO-DenseUnet outperforms the traditional turbulence-restoration method TurbRecon-TCI, with a PSNR that is 9.1669 dB higher and an SSIM that is 27.36% higher. Compared to MIMO-Unet, AFMIMO-DenseUnet also exhibits improved performance, with a PSNR that is 1.8808 dB higher and an SSIM that is 2.77% higher.

4.5. Image Restoration of Real-world Turbulent Scenes

This experiment verifies the image-restoration performance of the proposed AFMIMO-DenseUnet network model under real turbulence conditions, using the BVI-CLEAR real-world turbulence dataset. The restoration results, shown in Fig. 9, demonstrate improvements in both the overall clarity and the local details of the turbulence-degraded images, further validating that the proposed model effectively removes turbulence effects and enhances image quality.

Figure 9. Comparison of real-world turbulent images and restoration results: (a) Real turbulent image, (b) restored image.

V. CONCLUSION

In this study, a novel image-restoration method based on an adaptive feature-fusion MIMO dense U-shaped network is proposed, to address the problem of turbulence degradation in single-image restoration. The proposed method adopts a data-driven approach to overcome the limitations of traditional methods, such as excessive reliance on prior information and unsatisfactory restoration results. The network model builds on the MIMO-Unet architecture and incorporates patch-embedding shallow-convolution modules to enhance low-level image features. The intermediate image features are processed and extracted by multi-input multi-output dense encoding and decoding modules. To promote rational fusion of features, ASFFs are utilized between the encoding and decoding modules. Experimental results substantiate the effectiveness of the proposed method in improving image quality, rectifying geometric distortion, and achieving heightened clarity in the restored images. The method proposed in this paper is applicable to optical imaging systems, where it is significant for reducing hardware costs and improving imaging quality.

National Natural Science Foundation of China (Grant Nos. 61805144, 61875125, 61775140, and 61405115); Natural Science Foundation of Shanghai (Grant No. 18ZR1425800).

Data underlying the results presented in this paper are not publicly available at the time of publication, but may be obtained from the authors upon reasonable request.

1. B. L. Ellerbroek, “First-order performance evaluation of adaptive-optics systems for atmospheric-turbulence compensation in extended-field-of-view astronomical telescopes,” J. Opt. Soc. Am. A 11, 783-805 (1994).
2. J. Zhang and X. Zhou, “Research on feature recognition algorithm for space target,” Proc. SPIE 6786, 678616 (2007).
3. P.-A. Moreau, E. Toninelli, T. Gregory, and M. J. Padgett, “Ghost imaging using optical correlations,” Laser Photonics Rev. 12, 1700143 (2018).
4. C. P. Lau, Y. H. Lai, and L. M. Lui, “Restoration of atmospheric turbulence-distorted images via RPCA and quasiconformal maps,” Inverse Probl. 35, 074002 (2019).
5. X. Zhu and P. Milanfar, “Removing atmospheric turbulence via space-invariant deconvolution,” IEEE Trans. Pattern Anal. Mach. Intell. 35, 157-170 (2013).
6. C. P. Lau, Y. H. Lai, and L. M. Lui, “Variational models for joint subsampling and reconstruction of turbulence-degraded images,” J. Sci. Comput. 78, 1488-1525 (2019).
7. C. P. Lau, C. D. Castillo, and R. Chellappa, “ATFaceGAN: Single face semantic aware image restoration and recognition from atmospheric turbulence,” IEEE Trans. Biom. Behav. Identity Sci. 3, 240-251 (2021).
8. D. Jin, Y. Chen, Y. Lu, J. Chen, P. Wang, Z. Liu, S. Guo, and X. Bai, “Neutralizing the impact of atmospheric turbulence on complex scene imaging via deep learning,” Nat. Mach. Intell. 3, 876-884 (2021).
9. S. N. Rai and C. V. Jawahar, “Removing atmospheric turbulence via deep adversarial learning,” IEEE Trans. Image Process. 31, 2633-2646 (2022).
10. Y. Xie, W. Zhang, D. Tao, W. Hu, Y. Qu, and H. Wang, “Removing turbulence effect via hybrid total variation and deformation-guided kernel regression,” IEEE Trans. Image Process. 25, 4943-4958 (2016).
11. N. Chimitt and S. H. Chan, “Simulating anisoplanatic turbulence by sampling intermodal and spatially correlated Zernike coefficients,” Opt. Eng. 59, 083101 (2020).
12. S. Basu, J. E. McCrae, and S. T. Fiorino, “Estimation of the path-averaged atmospheric refractive index structure constant from time-lapse imagery,” Proc. SPIE 9465, 94650T (2015).
13. G. A. Chanan, “Calculation of wave-front tilt correlations associated with atmospheric turbulence,” J. Opt. Soc. Am. A 9, 298-301 (1992).
14. B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 million image database for scene recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 40, 1452-1464 (2018).
15. N. Anantrasirichai and D. Bull, “BVI-CLEAR,” (University of Bristol, Published date: Apr 11, 2022), https://doi.org/10.5523/bris.1yh1e51t7tg2g2q9cwv96sdfc2 (Accessed date: May 11, 2023).
16. S.-J. Cho, S.-W. Ji, J.-P. Hong, S.-W. Jung, and S.-J. Ko, “Rethinking coarse-to-fine approach in single image deblurring,” in Proc. IEEE/CVF International Conference on Computer Vision (Montreal, QC, Canada, Oct. 10-17, 2021), pp. 4621-4630.
17. Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (Salt Lake City, UT, USA, Jun. 18-23, 2018), pp. 2472-2481.
18. J. Cao, Y. Li, M. Sun, Y. Chen, D. Lischinski, D. Cohen-Or, B. Chen, and C. Tu, “DO-Conv: Depthwise over-parameterized convolutional layer,” IEEE Trans. Image Process. 31, 3726-3736 (2022).
19. S. Liu, D. Huang, and Y. Wang, “Learning spatial fusion for single-shot object detection,” arXiv:1911.09516 (2019).
20. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin Transformer: Hierarchical vision transformer using shifted windows,” in Proc. IEEE/CVF International Conference on Computer Vision (Montreal, QC, Canada, Oct. 10-17, 2021), pp. 9992-10002.
21. S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M.-H. Yang, and L. Shao, “Multi-stage progressive image restoration,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (Nashville, TN, USA, Jun. 20-25, 2021), pp. 14816-14826.
22. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 (2014).
23. I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” arXiv:1608.03983 (2016).
24. H. R. Sheikh, A. C. Bovik, and G. de Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Trans. Image Process. 14, 2117-2128 (2005).
25. L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Trans. Image Process. 20, 2378-2386 (2011).
26. Z. Mao, N. Chimitt, and S. H. Chan, “Image reconstruction of static and dynamic scenes through anisoplanatic turbulence,” IEEE Trans. Comput. Imaging 6, 1415-1428 (2020).
27. C.-M. Fan, T.-J. Liu, and K.-H. Liu, “Compound multi-branch feature fusion for real image restoration,” arXiv:2206.02748 (2022).

Article

Research Paper

Curr. Opt. Photon. 2024; 8(3): 215-224

Published online June 25, 2024 https://doi.org/10.3807/COPP.2024.8.3.215

Copyright © Optical Society of Korea.

Restoring Turbulent Images Based on an Adaptive Feature-fusion Multi-input—Multi-output Dense U-shaped Network

Haiqiang Qian, Leihong Zhang, Dawei Zhang, Kaimin Wang

Engineering Research Center of Optical Instrument and System, Ministry of Education and Shanghai Key Lab of Modern Optical System, University of Shanghai for Science and Technology, 200093 Shanghai, China

Correspondence to:*km_wang@usst.edu.cn, ORCID 0000-0001-8424-0332

Received: November 3, 2023; Revised: January 17, 2024; Accepted: February 23, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In medium- and long-range optical imaging systems, atmospheric turbulence causes blurring and distortion of images, resulting in loss of image information. An image-restoration method based on an adaptive feature-fusion multi-input—multi-output (MIMO) dense U-shaped network (Unet) is proposed, to restore a single image degraded by atmospheric turbulence. The network’s model is based on the MIMO-Unet framework and incorporates patch-embedding shallow-convolution modules. These modules help in extracting shallow features of images and facilitate the processing of the multi-input dense encoding modules that follow. The combination of these modules improves the model’s ability to analyze and extract features effectively. An asymmetric feature-fusion module is utilized to combine encoded features at varying scales, facilitating the feature reconstruction of the subsequent multi-output decoding modules for restoration of turbulence-degraded images. Based on experimental results, the adaptive feature-fusion MIMO dense U-shaped network outperforms traditional restoration methods, CMFNet network models, and standard MIMO-Unet network models, in terms of image-quality restoration. It effectively minimizes geometric deformation and blurring of images.

Keywords: Asymmetric feature fusion, Image reconstruction, Multi-scale features, Turbulence image restoration, U-shaped network

I. INTRODUCTION

In recent years, astronomical imaging systems [1], space target recognition [2], and ghost imaging [3] have developed rapidly. However, the disturbance effect of the atmosphere on an optical system has not been completely resolved. Atmospheric motion affects the results of remote optical imaging systems, since it involves the transmission path of any optical system. Atmospheric turbulence is a significant type of motion due to various factors, such as solar radiation, surface radiation, thermal convection, and human activities that can cause irregular fluctuations in the temperature of the atmosphere. Consequently, these changes in temperature lead to random variations in the physical properties of the atmosphere, such as pressure, density, speed, and direction. Therefore, atmospheric turbulence is characterized by this irregular motion. When there is atmospheric turbulence, the way that light moves through an optical system becomes unstable, resulting in blurry images of the target object. Additionally, the turbulence can cause uneven distortions [4] in the image, making it challenging to extract information or identify features. To improve the quality of imaging, it is essential to process these images that have been degraded by atmospheric turbulence.

Early research focused on minimizing the impact of atmospheric turbulence on optical systems by altering the transmission path of light waves. To achieve this, observation systems were established at high altitudes and regions with thin clouds, but the issue of atmospheric turbulence persisted, and deployment of the observation system was limited by its environment. With the development of hardware-based adaptive-optics technology, an adaptive-optics system can now dynamically compensate and correct wavefront distortion caused by atmospheric turbulence, by connecting the wavefront sensor to a deformable mirror. Although adding more wavefront sensors and deformable mirrors can compensate for atmospheric-turbulence-induced wavefront distortion, it has drawbacks such as increased system cost and reduced reliability, and it cannot completely eliminate wavefront distortion.

The maturity of software-based digital image-processing technology is steadily increasing. Numerous classical image-restoration algorithms have the capability to partially restore original images affected by atmospheric turbulence. Furthermore, newer and more advanced image-restoration algorithms are consistently being developed. In 2013, Zhu and Milanfar [5] proposed a B-spline-based nonrigid image-registration technology capable of correcting geometric distortion in images. In 2018, Lau et al. [6] proposed a variational model that combines frame subsampling and clear image extraction, which significantly improves the quality of reconstructed images. Despite achieving some results, these image-restoration algorithms still fall short in effectively restoring images degraded by atmospheric turbulence.

In recent years, the rapid development of deep learning has led to significant advancements in the field of image-information extraction, particularly through the utilization of deep convolutional neural networks. Consequently, numerous researchers have started employing these networks to process images that have been degraded by atmospheric turbulence. In 2021, Lau et al. [7] introduced a novel single-frame-recovery generation algorithm that utilizes two GAN generators to address turbulence-induced deformation and blur. In that same year, Jin et al. [8] introduced TSR-WGAN, a generative adversarial network specifically designed to restore sequence data degraded by turbulence. In 2022, Rai and Jawahar [9] designed two two-level deep learning networks, namely DT-GAN+ and DTD-GAN+, specifically focusing on addressing geometric distortion and blur in turbulent images. The first-level network employs WarpNet to rectify geometric distortion, while the second-level network utilizes ColorNet to eliminate image blurriness, thereby enhancing image quality. The training process of a GAN-type network is frequently challenging, leading to issues in the generated images such as unnatural blur and distortion. In recent years, Unet-type neural networks have demonstrated promising outcomes in image restoration, offering the potential to effectively address atmospheric turbulence and reconstruct degraded images.

This paper presents an adaptive feature-fusion multi-input—multi-output dense U-shaped network (AFMIMO-DenseUnet). The proposed network leverages an end-to-end processing mechanism. It takes both the simulated turbulence-degradation image and the corresponding true-value image as input. The input image is scaled to three different sizes, and then encoded individually by encoders operating at distinct scales. Global and local features are extracted, followed by the fusion of the encoding features. By performing encoder processing at multiple scales, the network effectively suppresses atmospheric turbulence at a finer level, resulting in enhanced image restoration. This makes the network model proposed in this article very attractive for suppressing turbulence effects in optical systems.

II. BASIC PRINCIPLES

2.1. Model for Degradation due to Atmospheric Turbulence

Geometric distortion and blurring of long-distance imaging are primarily induced by atmospheric turbulence. The model that describes the degradation process of atmospheric turbulence is expressed as follows [10]:

H=D(G(I))+ε.

Among these elements, H represents the image degraded by atmospheric turbulence, I represents the original clear image, G represents the optical-blur operator, D represents the geometric distortion operator, and ε represents various additive noises.

2.2. Turbulence Dataset

Traditional turbulence-restoration methods primarily involve manual parameter adjustments to achieve an approximately optimal solution. The requirement for real turbulence data is minimal, with low acquisition costs. A deep neural network can automatically discover an approximately optimal solution for turbulence restoration, without the need for manual parameter adjustments, saving time in the process. However, deep learning necessitates a significant amount of real turbulence data for training, resulting in additional costs. To mitigate the lack of available data, a viable solution is to train the network using simulated turbulence datasets, and validate its performance with real turbulence data.

2.2.1. Simulated Turbulence Dataset

The performance of a turbulence-restoration method using deep learning is intricately linked to the input simulation data. Greater similarity between simulated and actual turbulence data results in improved network performance. Currently, numerous scholars have conducted extensive research on simulating turbulence. Notably, the turbulence simulator developed by Chimitt and Chan [11] is eminently suitable for meeting the training requirements of turbulence-restoration networks. This is mainly because Chimitt and Chan [11] utilize a formula grounded in physical laws to simulate turbulence, and their work focuses on establishing the equivalence between the arrival-angle correlation proposed by Basu et al. [12] and the multiaperture correlation introduced by Chanan [13]. Table 1 presents the primary parameter configurations utilized in the turbulence simulator.

TABLE 1. Main parameters of the turbulence simulator.

ParameterValue
Aperture Diameter (m)0.054
Wavelength (nm)525
Refractive-index Structure Constant7 × 10−16, 1 × 10−15, 3 × 10−15 m−2/3
Focal Length (m)0.3
Path Length (m)1,000


Once the parameters of the turbulence simulator have been defined, the input data must be provided to generate the required dataset to train the turbulence-restoration network. The input data is sourced from the dataset called Place2, which was developed by Zhou et al. [14]. The Place2 dataset consists of over 10 million images, belonging to more than 400 distinct scene categories. The data belonging to the selected scene category of outdoor buildings are fed into the turbulence simulator. The presence of boundary lines on the buildings aids in the characterization of the geometric distortion caused by turbulence. The turbulence simulator is limited to processing grayscale images, so prior to input into the turbulence simulator with the specified parameters, these data must undergo preprocessing to convert them to grayscale. Figure 1 displays the resulting turbulent image obtained from the simulation process.

Figure 1. Turbulent-degradation images of three different refractive-index structure constants. (a) Original, (b) Cn2 = 7 × 10−16 m−2/3, (c) Cn2 = 1 × 10−15 m−2/3, and (d) Cn2 = 3 × 10−15 m−2/3.

2.2.2. Real Turbulence Dataset

The study utilizes real turbulence data from the BVI-CLEAR dataset developed by Anantrasirichar and Bull [15], which consists of two types of atmospheric turbulence sequences: synthetic and real. Real data sequences are utilized as input data for evaluating the performance of the turbulence-restoration network. The real data sequences are categorized into three turbulence scenes: mirage, monument, and moving car. The real datasets were captured under arid desert conditions using a camera equipped with a 400-mm lens (Canon EOS-1D Mark IV; Canon, Tokyo, Japan), at a temperature of 46 ℃. Additionally, the actual data sequence requires clipping and grayscale preprocessing. The processed output is depicted in Fig. 2.

Figure 2. Real data sequences from the BVI-CLEAR dataset. (a) Mirage, (b) monument, and (c) moving car.

III. THE CONTENT OF THE PROPOSED METHOD

3.1. Overall Structure of the Network Model

This paper presents the AFMIMO-DenseUnet model as a solution to the simultaneous removal of geometric distortion and blur in the task of turbulence restoration. The overall architecture of AFMIMO-DenseUnet is illustrated in Fig. 3. The AFMIMO-DenseUnet architecture is based on Cho et al.’s MIMO-UNet [16], and is specifically designed to efficiently remove multiscale turbulence. AFMIMO-DenseUnet consists of essential components, including a multi-input single dense encoder (MISDE), a multi-output single dense decoder (MOSDD), a patch-embedding shallow-convolution module (PESCM), a feature-attention module (FAM), and an asymmetric feature-fusion module (ASFF). The input of AFMIMO-DenseUnet is single-channel grayscale turbulence-degraded images with a resolution of 256 × 256. The output is single-channel turbulence-restored images of the same size.

Figure 3. Structure of the AFMIMO-DenseUnet model.

The AFMIMO-DenseUnet architecture consists of multiscale dense-encoder and multiscale dense-decoder modules, incorporating three sets of dense encoding blocks (DEBs) and dense decoding blocks (DDBs) at different scales. This design enables improved handling of turbulence distortion in various regions of the image.

Initially, the network performs downsampling on the input 256 × 256 images to generate turbulence-degradation images at two different scales: 128 × 128 and 64 × 64. The original turbulence-degradation image with a scale of 256 × 256 is input into the DEB that corresponds to the original large scale, enabling the extraction of encoded features with 32 channels and 256 × 256 scale. The PESCM is utilized to extract 64 channels of turbulence-degraded image features at a scale of 128 × 128 from the original image. The FAM is employed to combine the features extracted from PESCM with those generated by the original large-scale encoder.

Subsequently, the combined features are input into the corresponding medium-scale DEB in the network, to process and extract 64 channels of encoded features at a scale of 128 × 128. The original large-scale image applies the PESCM to extract the features of turbulence degraded images with 128 channels and a scale of 64 × 64. The features extracted from the PESCM and the features generated from the medium-scale DEB are combined using FAM. Subsequently, the combined features are input into the corresponding small-scale DEB for processing, extracting encoded features with 128 channels and a scale of 64 × 64.

To achieve effective multiscale feature fusion in the network, this study initially feeds the features processed by three DEBs into three ASFFs that correspond to various scales. The features obtained from ASFF processing at various scales exhibit consistency with the output features of the respective DEB scale. By combining the output features of ASFF and DEB at small scales along the channel dimension, the resultant merged features are subsequently fed into a small-scale DDB for further processing. This allows for the extraction of decoding features with 128 channels and a scale of 64 × 64.

The features generated by the small-scale DDB are combined with those produced by the medium-scale ASFF. Subsequently, these concatenated features are fed into the medium-scale DDB for processing, allowing the extraction of decoding features with 64 channels and a scale of 128 × 128. Similar to the medium-scale DDB, the large-scale DDB follows analogous input and output processing, resulting in the extraction of decoding features with 32 channels and a scale of 256 × 256. The output features of the three DDBs are individually subjected to convolution to extract detail-restoration features at three different scales: 64 × 64, 128 × 128, and 256 × 256, all represented by a single channel. Subsequently, these features are incorporated into the respective scaled, blurred images to generate turbulence-restored images at the aforementioned three scales.

3.2. Multi-input Single Dense Encoder and Multi-output Single Dense Decoder

The multi-input-multi-output architecture of AFMIMO-DenseUnet enables improved capture of the correlations between input data and output predictions, thereby enhancing network performance. To further optimize the neural network’s performance, it is essential to redefine the basic unit module, ensuring that it effectively leverages the advantages of the network architecture. The multiscale encoder and multiscale decoder of AFMIMO-DenseUnet are collectively called the multi-input—multi-output dense encoder and decoder (MIMODED). Despite that their functions are distinct, the basic unit modules inside them are the identical. The conventional basic unit module typically consists of a residual block, which employs a standard convolution. However, this approach struggles to capture the global differences in information between blurred and clear image pairs, predominantly emphasizing local details. In this study, the devised fundamental building block is referred to as a Res dense block, depicted in Fig. 4. It is primarily derived from RDN residual dense block proposed by of Zhang et al. [17], with some modifications implemented. Nevertheless dense connections are retained, due to their ability to effectively mitigate the issue of vanishing gradients and enhance feature propagation. Simultaneously, the conventional convolution operation is substituted with a deep over-parametrized convolution (Do-Conv) [18]. Over-parametrization is a technique employed to expedite the training process of deep networks, leading to performance enhancements in the restoration of turbulent images without introducing additional parameters.

Figure 4. Structure of a single dense encoding block and decoding block.

3.3. Asymmetric Feature-fusion Module

The feature-fusion method plays a crucial role in maximizing the performance of the MISDE and the MOSDD. A conventional feature-fusion module initially performs upsampling and downsampling on the output features of encoders at different scales, subsequently combines the processed features along the channel dimension, and finally obtains fused features through convolutional processing. However, this feature-fusion method fails to adequately consider the relative importance of different scale features in the fused representation. This study investigates the adaptive feature-fusion module introduced by Liu et al. [19] in YOLOv3, and proposes three ASFFs of varying scales. As illustrated in Fig. 5, the conventional convolution in the module is substituted with Do-Conv. The ASFF initially resizes the feature dimensions of the dense-encoder outputs at three different scales, to align them with the dense-decoder inputs of corresponding scale. Subsequently, by means of training, the model learns the weight allocations among the features at different scales during fusion, aiming to identify the optimal feature-fusion method and to effectively leverage the capabilities of both dense encoder and decoder modules.

Figure 5. Structure of the multiscale asymmetric feature-fusion module (ASFF). (a) ASFF1, (b) ASFF2, and (c) ASFF3.
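A simplified fusion module of this kind might look as follows (PyTorch sketch). It assumes the three inputs have already been projected to a common channel width, and the weight-prediction layer and bilinear resizing are assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleASFF(nn.Module):
    """Fuse three encoder feature maps at one target scale using learned,
    spatially varying weights."""
    def __init__(self, channels: int):
        super().__init__()
        # Predict one weight map per input scale, normalized with softmax.
        self.weight_pred = nn.Conv2d(3 * channels, 3, kernel_size=1)
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, f_small, f_mid, f_large, target_hw):
        # Resize every scale to the decoder's target resolution.
        feats = [F.interpolate(f, size=target_hw, mode="bilinear", align_corners=False)
                 for f in (f_small, f_mid, f_large)]
        w = torch.softmax(self.weight_pred(torch.cat(feats, dim=1)), dim=1)  # (B, 3, H, W)
        fused = sum(w[:, i:i + 1] * feats[i] for i in range(3))
        return self.refine(fused)
```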

3.4. Patch-embedding Shallow-convolution Module

The input turbulence-degraded image of AFMIMO-DenseUnet needs to undergo shallow feature extraction using convolution, to generate medium-scale and small-scale features specific to the turbulence degradation that are subsequently fed to the medium-scale and small-scale DEBs. This study draws on the patch-embedding method employed in the Swin Transformer by Liu et al. [20] and designs a PESCM applicable to images, as illustrated in Fig. 6. The PESCM uses a two-dimensional convolution with equal kernel size and stride to downsample the image, and layer normalization is applied along the channel dimension. The resulting image features are well suited to processing by the DEBs.

Figure 6. Structure of the patch-embedding shallow-convolution module (PESCM). (a) PESCM2, (b) PESCM1.
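A minimal PESCM-style sketch is given below, assuming PyTorch; the patch size and channel width are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class PatchEmbedConv(nn.Module):
    """Strided convolution (kernel_size == stride) followed by LayerNorm
    over the channel dimension."""
    def __init__(self, in_ch: int = 1, embed_ch: int = 64, patch: int = 2):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_ch, kernel_size=patch, stride=patch)
        self.norm = nn.LayerNorm(embed_ch)  # normalizes the channel dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)              # (B, C, H/p, W/p)
        x = x.permute(0, 2, 3, 1)     # channels last for LayerNorm
        x = self.norm(x)
        return x.permute(0, 3, 1, 2)  # back to (B, C, H/p, W/p)

# A 256 x 256 input yields 128 x 128 features for the medium-scale encoder.
feats = PatchEmbedConv()(torch.randn(1, 1, 256, 256))
```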

3.5. Loss Function

In this study, k ∈ {0, ..., K − 1}, Ŝk, Sk, and ε denote, respectively, the k-th level of AFMIMO-DenseUnet, the k-th reconstructed image, the k-th real clear image, and a constant set to 10−3. Three loss functions are employed to train AFMIMO-DenseUnet:

(1) Charbonnier loss [21]:

$L_{\text{char}} = \sqrt{\|\hat{S}_k - S_k\|^2 + \varepsilon^2}$.

(2) Edge loss [21]:

$L_{\text{edge}} = \sqrt{\|\Delta \hat{S}_k - \Delta S_k\|^2 + \varepsilon^2}$,

where ∆ represents the Laplacian operator, and

(3) Frequency-reconstruction loss [16], which assesses the frequency discrepancy between reconstructed images and real clear images:

$L_{\text{freq}} = \|\mathrm{FT}(\hat{S}_k) - \mathrm{FT}(S_k)\|_1$,

where FT represents the fast Fourier transform (FFT) operation. Ultimately, the loss function for AFMIMO-DenseUnet can be defined as

$L = \sum_{k=0}^{K-1} \left( L_{\text{char}} + \alpha_1 L_{\text{edge}} + \alpha_2 L_{\text{freq}} \right)$,

where α1 and α2 are tradeoff parameters, empirically set to 0.05 and 0.01, respectively.
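Under these definitions, the combined objective could be sketched as follows (PyTorch, single-channel grayscale images assumed; the per-pixel averaging and the particular Laplacian kernel are implementation assumptions consistent with the formulas above).

```python
import torch
import torch.nn.functional as F

EPS = 1e-3
LAPLACIAN = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]).view(1, 1, 3, 3)

def charbonnier(pred, target):
    return torch.sqrt((pred - target) ** 2 + EPS ** 2).mean()

def edge_loss(pred, target):
    # Laplacian of single-channel images via a fixed 3x3 kernel.
    k = LAPLACIAN.to(pred.device, pred.dtype)
    d_pred = F.conv2d(pred, k, padding=1)
    d_target = F.conv2d(target, k, padding=1)
    return torch.sqrt((d_pred - d_target) ** 2 + EPS ** 2).mean()

def freq_loss(pred, target):
    # L1 distance between 2-D Fourier spectra.
    return (torch.fft.fft2(pred) - torch.fft.fft2(target)).abs().mean()

def total_loss(preds, targets, a1=0.05, a2=0.01):
    # preds/targets: lists of K multi-scale outputs and matching ground truths.
    return sum(charbonnier(p, t) + a1 * edge_loss(p, t) + a2 * freq_loss(p, t)
               for p, t in zip(preds, targets))
```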

IV. EXPERIMENTS

4.1. Datasets and Implementation Details

This study prepares two datasets for the experiments, based on data generated by the turbulence simulator described above for three values of the refractive-index structure constant. The first dataset consists of 5,000 original, clear grayscale images and 5,000 simulated turbulence images with a refractive-index structure constant of 1 × 10−15 m−2⁄3, yielding 5,000 image pairs of size 256 × 256 pixels. Of these, 3,000 pairs are allocated for training, 1,000 pairs for validation, and the remaining 1,000 pairs for testing. The second dataset consists of 5,000 original, clear grayscale images and 5,000 simulated images with varying turbulence intensity: 1,000 images with a turbulence intensity of 7 × 10−16 m−2⁄3, 2,000 images with 1 × 10−15 m−2⁄3, and 2,000 images with 3 × 10−15 m−2⁄3. As with the first dataset, 3,000 pairs are selected as the training set; among these, 600 images have a turbulence intensity of 7 × 10−16 m−2⁄3, 1,200 images have 1 × 10−15 m−2⁄3, and 1,200 images have 3 × 10−15 m−2⁄3. The validation set and test set each consist of 1,000 pairs, with the distribution of turbulence intensities matching that of the training set.

The network-training hyperparameters in this study are as follows: The image-patch size is set to 256 × 256, matching the input image size so as to capture global information effectively. The batch size is set to 3, the number of training epochs is set to 3,000, and the optimizer uses the Adam [22] strategy. The learning rate is initially set to 2 × 10−4 and decreases steadily to 1 × 10−6 during training, using the cosine annealing strategy [23]. Data augmentation is applied in the form of random horizontal or vertical flipping of each image patch. The experiments are performed on a Windows 10 system equipped with an NVIDIA GeForce RTX 4070 Ti graphics card and an Intel Core i5-13600K processor.
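A sketch of this optimizer and schedule in PyTorch is given below; `model` is a placeholder for the network, and the training-loop body is elided.

```python
import torch

model = torch.nn.Conv2d(1, 1, 3, padding=1)   # placeholder for AFMIMO-DenseUnet
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=3000, eta_min=1e-6)      # cosine decay 2e-4 -> 1e-6 over 3,000 epochs

for epoch in range(3000):
    # ... training iterations: forward pass, total loss, backward(), optimizer.step() ...
    scheduler.step()                          # advance the cosine schedule once per epoch
```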

4.2. Evaluation Metrics

This study employs the peak signal-to-noise ratio (PSNR) as a metric to assess the quality of turbulent-image restoration. PSNR is defined as the ratio of the square of the maximum possible pixel value to the mean square error between the images, expressed in decibels; a higher PSNR value indicates a closer resemblance between the restored image and the original image. Additionally, to account for human perception, this paper adopts the structural similarity index (SSIM) as an evaluation metric, which quantifies the similarity between two images; SSIM takes values in [0, 1], with larger values indicating greater similarity. In addition to PSNR and SSIM, this work also uses two further image-quality-assessment metrics, namely the information fidelity criterion (IFC) [24] and the feature similarity index (FSIM) [25]. IFC, based on information theory, measures the information loss between the processed image and the original image; a higher IFC value indicates less information loss. FSIM combines low- and high-level image features to evaluate the extent of feature preservation in the processed image; a higher FSIM value indicates greater feature similarity between the evaluated image and the original image.
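For reference, a PSNR computation matching this definition is sketched below for 8-bit images (NumPy); SSIM, FSIM, and IFC require dedicated implementations and are omitted here.

```python
import numpy as np

def psnr(restored: np.ndarray, reference: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB: ratio of the squared maximum pixel value to the MSE."""
    mse = np.mean((restored.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```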

4.3. Ablation Study

This work aims to experimentally analyze the impact of the AFMIMO-DenseUnet network architecture on the simulated turbulence datasets. First, an appropriate dataset is selected: the first dataset, which consists of turbulence data with a uniform intensity and is therefore better suited to analyzing the architectural performance of AFMIMO-DenseUnet. Second, this study adopts MIMO-Unet [16], with the conventional convolution replaced by DO-Conv, as the reference baseline. The work first examines the performance of the PESCM on the baseline network, followed by an evaluation of the ASFF on the same baseline; the effectiveness of the MIMODED on the baseline network is likewise examined individually. Next, this paper evaluates AFMIMO-Unet, a variant of MIMO-Unet in which the two feature-fusion modules between the encoder and decoder are replaced by three ASFF modules; AFMIMO-Unet can be regarded as AFMIMO-DenseUnet without the improved MIMODED and PESCM. The performance of the PESCM and the MIMODED is then individually validated on the AFMIMO-Unet framework. Finally, the performance of the proposed AFMIMO-DenseUnet is comprehensively verified. The turbulence-removal and image-restoration results of all experiments are shown in Fig. 7, and the performance-evaluation results of each experiment are presented in Table 2.

Figure 7. The original turbulent images and zoomed-in patches are shown. For each image, from top left to bottom right are (a) the turbulent image, (b) results obtained through MIMO-UNet+DO-Conv, (c) MIMO-UNet+PESCM, (d) MIMO-UNet+ASFF, (e) MIMO-UNet+MIMODED, (f) AFMIMO-Unet, (g) AFMIMO-Unet+PESCM, (h) AFMIMO-Unet+MIMODED, (i) AFMIMO-DenseUnet, and (j) the original image.

TABLE 2. Testing results from the ablation experiment.

Model | PSNR (dB) | SSIM (%) | FSIM (%) | IFC
MIMO-Unet+DO-Conv | 29.4334 | 93.80 | 95.12 | 5.6957
MIMO-Unet+PESCM | 29.6334 | 94.06 | 94.06 | 5.7379
MIMO-Unet+ASFF | 29.5517 | 93.95 | 95.20 | 5.6813
MIMO-Unet+MIMODED | 30.7224 | 94.90 | 96.12 | 6.3908
AFMIMO-Unet | 30.0524 | 94.41 | 95.67 | 5.8581
AFMIMO-Unet+PESCM | 30.2053 | 94.62 | 95.86 | 6.0705
AFMIMO-Unet+MIMODED | 31.1881 | 95.27 | 96.60 | 6.8127
AFMIMO-DenseUnet | 31.7857 | 95.82 | 97.06 | 7.2675


From Fig. 7 and Table 2, it can be observed that using the standard MIMO-Unet for image restoration improves image quality. Incorporating the designed blocks, such as the PESCM and the ASFF, into the standard MIMO-Unet leads to a marginal further improvement. Implementing the dense encoding and decoding modules in the MIMO-Unet architecture leads to a significant improvement in performance: PSNR improves by 1.289 dB and SSIM increases by 1.1%, compared to the baseline network. In comparison to the MIMO-Unet+ASFF model, AFMIMO-Unet (an intermediate model proposed in this study) also exhibits a performance improvement: PSNR increases by 0.501 dB and SSIM by 0.46%. The complete model designed in this study, AFMIMO-DenseUnet, combines patch-embedding shallow-convolution modules with MIMO dense encoding and decoding modules; it effectively processes the input features and reasonably fuses the output features of the encoders at each scale, thereby improving the performance of the model. Comparative analysis with the baseline network reveals a PSNR increase of 2.352 dB and a 2.02% improvement in SSIM. The efficacy of our method is also confirmed by the FSIM and IFC metrics, demonstrating the effectiveness of this approach in turbulent-image restoration.

4.4. Performance Comparison

This paper presents a comparative analysis between the proposed AFMIMO-DenseUnet method and other methods for turbulence removal in image restoration. In this study, the second dataset, comprising data with three different turbulence intensities, is used to evaluate and compare the performance of the different methods, since it better simulates the uneven turbulence intensity found under real atmospheric conditions. The restoration results for the different methods are presented visually in Fig. 8, while the corresponding performance indicators are listed in Table 3.

Figure 8. The original turbulent images and zoomed-in patches are shown. For each image, from top left to bottom right are (a) the turbulent image, (b) results obtained through TurbRecon-TCI, (c) CMFNet, (d) MIMO-Unet, (e) AFMIMO-DenseUnet, and (f) the original image.

TABLE 3. Testing results for different models in turbulent-image restoration.

Model | PSNR (dB) | SSIM (%) | FSIM (%) | IFC
TurbRecon-TCI | 19.9691 | 65.17 | 77.58 | 1.1338
CMFNet | 24.4657 | 83.16 | 87.68 | 2.7442
MIMO-Unet | 27.2552 | 89.78 | 91.98 | 4.0138
AFMIMO-DenseUnet | 29.1360 | 92.53 | 94.25 | 5.3472


In Fig. 8 and Table 3, the first method is the traditional turbulence-restoration method proposed by Mao et al. [26]. This method is capable of removing a certain degree of turbulence, but not uniformly: Its processing of low-frequency information in the image is unnatural and its processing of high-frequency information is too aggressive, resulting in pronounced distortions in object contours. The second method is the CMFNet deep-learning deblurring method proposed by Fan et al. [27]. It processes turbulence fairly uniformly on a global scale, producing a relatively natural result, but it may introduce distortion in certain high-frequency details. The third method, proposed by Cho et al. [16], is the MIMO-Unet deep-learning deblurring method (without DO-Conv). It removes turbulence relatively uniformly and maintains the naturalness of the high-frequency information in the image, without apparent distortion. The fourth method, proposed in this study, is the AFMIMO-DenseUnet deep-learning turbulence-removal method, which processes the global information of the image more uniformly and minimizes distortion in high-frequency details, leading to improved turbulence removal. The restoration results show that AFMIMO-DenseUnet outperforms the traditional turbulence-restoration method TurbRecon-TCI, with a PSNR that is 9.1669 dB higher and an SSIM that is 27.36% higher. Compared to MIMO-Unet, AFMIMO-DenseUnet also exhibits improved performance, with a PSNR that is 1.8808 dB higher and an SSIM that is 2.75% higher.

4.5. Image Restoration of Real-world Turbulent Scenes

This subsection verifies the image-restoration performance of the proposed AFMIMO-DenseUnet network model under real turbulence conditions, using the BVI-CLEAR real-world turbulence dataset. The restoration results, shown in Fig. 9, demonstrate improvements in both the overall clarity and the local details of the turbulence-degraded images. These results further validate that the proposed model effectively removes turbulence effects and enhances image quality.

Figure 9. Comparison of real-world turbulent images and their restoration results: (a) Real turbulent image, (b) restored image.

V. CONCLUSION

In this study, a novel image-restoration method based on an adaptive feature-fusion MIMO dense U-shaped network is proposed, to address the problem of turbulence degradation in single-image restoration. The proposed method adopts a data-driven approach to overcome the limitations of traditional methods, such as excessive reliance on prior information and unsatisfactory restoration results. The network model is built on the MIMO-Unet architecture and incorporates patch-embedding shallow-convolution modules to enhance low-level image features. The intermediate features of the image are processed and extracted using multi-input-multi-output dense encoding and decoding modules, and ASFFs are employed between the encoding and decoding modules to fuse features more rationally. Experimental results substantiate the effectiveness of the proposed method in improving image quality, rectifying geometric distortion, and achieving heightened clarity in the restored images. The proposed method is applicable to optical imaging systems, which is significant for reducing hardware costs and improving imaging quality.

FUNDING

National Natural Science Foundation of China (Grant Nos. 61805144, 61875125, 61775140, and 61405115); Natural Science Foundation of Shanghai (Grant No. 18ZR1425800).

DISCLOSURES

The authors declare no conflict of interest.

DATA AVAILABILITY

Data underlying the results presented in this paper are not publicly available at the time of publication, but may be obtained from the authors upon reasonable request.

References

  1. B. L. Ellerbroek, “First-order performance evaluation of adaptive-optics systems for atmospheric-turbulence compensation in extended-field-of-view astronomical telescopes,” J. Opt. Soc. Am. A 11, 783-805 (1994).
  2. J. Zhang and X. Zhou, “Research on feature recognition algorithm for space target,” Proc. SPIE 6786, 678616 (2007).
  3. P.-A. Moreau, E. Toninelli, T. Gregory, and M. J. Padgett, “Ghost imaging using optical correlations,” Laser Photonics Rev. 12, 1700143 (2018).
  4. C. P. Lau, Y. H. Lai, and L. M. Lui, “Restoration of atmospheric turbulence-distorted images via RPCA and quasiconformal maps,” Inverse Probl. 35, 074002 (2019).
  5. X. Zhu and P. Milanfar, “Removing atmospheric turbulence via space-invariant deconvolution,” IEEE Trans. Pattern Anal. Mach. Intell. 35, 157-170 (2013).
  6. C. P. Lau, Y. H. Lai, and L. M. Lui, “Variational models for joint subsampling and reconstruction of turbulence-degraded images,” J. Sci. Comput. 78, 1488-1525 (2019).
  7. C. P. Lau, C. D. Castillo, and R. Chellappa, “ATFaceGAN: Single face semantic aware image restoration and recognition from atmospheric turbulence,” IEEE Trans. Biom. Behav. Identity Sci. 3, 240-251 (2021).
  8. D. Jin, Y. Chen, Y. Lu, J. Chen, P. Wang, Z. Liu, S. Guo, and X. Bai, “Neutralizing the impact of atmospheric turbulence on complex scene imaging via deep learning,” Nat. Mach. Intell. 3, 876-884 (2021).
  9. S. N. Rai and C. V. Jawahar, “Removing atmospheric turbulence via deep adversarial learning,” IEEE Trans. Image Process. 31, 2633-2646 (2022).
  10. Y. Xie, W. Zhang, D. Tao, W. Hu, Y. Qu, and H. Wang, “Removing turbulence effect via hybrid total variation and deformation-guided kernel regression,” IEEE Trans. Image Process. 25, 4943-4958 (2016).
  11. N. Chimitt and S. H. Chan, “Simulating anisoplanatic turbulence by sampling intermodal and spatially correlated Zernike coefficients,” Opt. Eng. 59, 083101 (2020).
  12. S. Basu, J. E. McCrae, and S. T. Fiorino, “Estimation of the path-averaged atmospheric refractive index structure constant from time-lapse imagery,” Proc. SPIE 9465, 94650T (2015).
  13. G. A. Chanan, “Calculation of wave-front tilt correlations associated with atmospheric turbulence,” J. Opt. Soc. Am. A 9, 298-301 (1992).
  14. B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 million image database for scene recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 40, 1452-1464 (2018).
  15. N. Anantrasirichar and D. Bull, “BVI-CLEAR,” (University of Bristol, Published date: Apr 11, 2022), https://doi.org/10.5523/bris.1yh1e51t7tg2g2q9cwv96sdfc2 (Accessed date: May 11, 2023).
  16. S.-J. Cho, S.-W. Ji, J.-P. Hong, S.-W. Jung, and S.-J. Ko, “Rethinking coarse-to-fine approach in single image deblurring,” in Proc. IEEE/CVF International Conference on Computer Vision (Montreal, QC, Canada, Oct. 10-17, 2021), pp. 4621-4630.
  17. Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (Salt Lake City, UT, USA, Jun. 18-23, 2018), pp. 2472-2481.
  18. J. Cao, Y. Li, M. Sun, Y. Chen, D. Lischinski, D. Cohen-Or, B. Chen, and C. Tu, “DO-Conv: Depthwise over-parameterized convolutional layer,” IEEE Trans. Image Process. 31, 3726-3736 (2022).
  19. S. Liu, D. Huang, and Y. Wang, “Learning spatial fusion for single-shot object detection,” arXiv:1911.09516 (2019).
  20. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proc. IEEE/CVF International Conference on Computer Vision (Montreal, QC, Canada, Oct. 10-17, 2021), pp. 9992-10002.
  21. S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M.-H. Yang, and L. Shao, “Multi-stage progressive image restoration,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (Nashville, TN, USA, Jun. 20-25, 2021), pp. 14816-14826.
  22. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 (2014).
  23. I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” arXiv:1608.03983 (2016).
  24. H. R. Sheikh, A. C. Bovik, and G. de Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Trans. Image Process. 14, 2117-2128 (2005).
  25. L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Trans. Image Process. 20, 2378-2386 (2011).
  26. Z. Mao, N. Chimitt, and S. H. Chan, “Image reconstruction of static and dynamic scenes through anisoplanatic turbulence,” IEEE Trans. Comput. Imaging 6, 1415-1428 (2020).
  27. C.-M. Fan, T.-J. Liu, and K.-H. Liu, “Compound multi-branch feature fusion for real image restoration,” arXiv:2206.02748 (2022).