## Article

Curr. Opt. Photon. 2021; 5(5): 514-523

Published online October 25, 2021 https://doi.org/10.3807/COPP.2021.5.5.514

## Absolute Depth Estimation Based on a Sharpness-assessment Algorithm for a Camera with an Asymmetric Aperture

Beomjun Kim, Daerak Heo, Woonchan Moon, Joonku Hahn

School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Korea

Corresponding author: jhahn@knu.ac.kr, ORCID 0000-0002-5038-7253

Received: June 14, 2021; Revised: July 19, 2021; Accepted: July 29, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Abstract

Methods for absolute depth estimation have received considerable interest, and most algorithms focus on minimizing the difference between an input defocused image and an estimated defocused image. These approaches may increase the complexity of the algorithms, since the defocused image must be calculated from the estimate of the focused image. In this paper, we present a new method to recover the depth of a scene based on a sharpness-assessment algorithm. The proposed algorithm estimates the depth of the scene by calculating the sharpness of images deconvolved with a specific point-spread function (PSF). While most depth-estimation studies evaluate the depth of the scene only behind the focal plane, the proposed method evaluates a broad depth range both nearer and farther than the focal plane. This is accomplished using an asymmetric aperture, so that the PSF at a position nearer than the focal plane differs from that at a position farther than the focal plane. From an image taken with the focal plane at 160 cm, the depth of an object over the broad range from 60 to 350 cm is estimated at 10-cm resolution. With an asymmetric aperture, we demonstrate the feasibility of the sharpness-assessment algorithm for recovering the absolute depth of a scene from a single defocused image.

Keywords: Coded aperture, Depth estimation, Image reconstruction

OCIS codes: (100.2000) Digital image processing; (100.3008) Image recognition, algorithms and filters; (100.3020) Image reconstruction-restoration; (120.3940) Metrology; (170.1630) Coded aperture imaging

### I. Introduction

Depth estimation is one of the most important areas of study in three-dimensional (3D) metrology. It occupies a crucial position in a variety of industries, as it has been actively used for self-driving cars, defect inspection, and 3D recognition, for example. Time-of-flight and structured-light illumination are regarded as representative depth-estimation technologies, in which a specific light is projected onto a target and the reflected signal is collected by a detector. These technologies provide a depth map with high resolution and accuracy, but they usually require expensive and complicated optical instruments.

On the other hand, depth estimation from defocused images has the great advantage that it requires only one or two images from a single conventional camera, and the system has a small form factor compared to the aforementioned ones. This method originates from the relation between the level of defocus (LOD) and the depth profile. When an object is placed outside the depth of field, the captured image is defocused, and the LOD increases in proportion to the distance from the focal plane. Based on this fact, many studies of depth recovery have concentrated on obtaining the LOD from defocused images [1-3]. Therefore, evaluation of the LOD of a scene plays a significant role in various applications, such as image deblurring, image segmentation, and depth estimation.

In general, methods for depth recovery from defocused images are classified into relative depth estimation (RDE) and absolute depth estimation (ADE). In RDE studies, the LOD is evaluated by computing the standard deviation inferred from edge detection. The standard deviation is related to the size of the point-spread function (PSF) and can be obtained from the high-frequency content using a derivative operator [4, 5]. Pentland [6] proposed a framework for RDE that obtains a focal-disparity map by utilizing the high-frequency content in the difference between two defocused images. However, this approach has an inconvenience: It needs to capture several images of the same scene with different camera parameters. Bae and Durand [7] recovered the depth map from a single defocused image by employing the methods of Elder and Zucker [1] and Levin et al. [8]. Zhuo and Sim [9] used the ratio of gradients between the defocused input image and an estimated image to calculate the depth map. The above methods evaluate the LOD of a scene from edge detection on a defocused image modeled as the convolution of a Gaussian PSF with a sharp image. Thus they require little computational time and uncomplicated processing, but it is difficult to recover absolute depth without a known relation between standard deviation and depth layer [10, 11].

Most ADE methods are based on deconvolution to recover the depth from a defocused image. The features of the PSF are important for estimating the depth precisely, and it is necessary to specially design an optical stop with a coded pattern. Levin et al. [8] proposed a framework for ADE that recovers the absolute depth of a scene and an all-focus image using a camera with a centrosymmetric coded aperture. However, the centrosymmetric coded aperture makes it difficult to distinguish whether an object is placed in front of or behind the focal plane. Zhou et al. [12] proposed a method using two asymmetric apertures that are optimized for reconstruction of the defocused image. They recover the depth map over a broad range and obtain an all-focus image of high quality, but they need to take two defocused images of the same scene with different coded apertures.

In this paper, we propose a new ADE method to recover the depth of scene by using an asymmetric aperture based on the sharpness-assessment algorithm. We estimate the depth of scene in a broad range from 60 to 350 cm by using an asymmetric aperture. The proposed algorithm recovers the depth of scene by calculating the sharpness of the deconvolved images. In our experiments, we present the estimated depth and the difference in estimation compared to the ground truth. Also, we show the absolute depth map textured with an all-focus image for several objects placed at different distances. Therefore, we demonstrate that our algorithm provides a feasible solution to recover absolute depth of scene from a single defocused image.

This paper is organized as follows. In Section II, the camera with the asymmetric aperture is modeled. In Section III, the sharpness-assessment algorithm is described along with the entire procedure. In Section IV we show experimental results to demonstrate the proposed algorithm, and in Section V the conclusion is given.

### II. Modeling for a camera with an asymmetric aperture

When we take a picture, parts of the scene away from the focal plane are defocused, and this blurring becomes more dominant as the scene gets farther from the focal plane. Figure 1 shows a simple camera model representing the circle of confusion (CoC) and its radius when half of the aperture size is 7 mm. The radius is related to the position of a point source relative to the focal plane. As the distance from the focal plane to the point source increases, the size of the CoC also increases. In this thin-lens model, the radius of the blur circle σ is given by

Figure 1.Defocus model. (a) Geometry of the camera, and (b) the radius of the CoC when half of the aperture size is 7 mm.

$\sigma = \pm A \left( \dfrac{1 - f/z_{\mathrm{obj}}}{1 - f/z_1} - 1 \right),$ (1)

where A is the aperture radius, f is the focal length, zobj is the object’s distance, and z1 is the focal distance. When an object is located closer than the focal plane, the sign of Eq. (1) becomes negative.

Figure 1(b) shows the radius of the CoC when the distance ranges from 60 to 350 cm. Here the focal plane of the camera is at 160 cm. The radius of the CoC changes rapidly when the distance of the object is closer than the focal plane. Depth is better distinguished when the object is closer than the focal plane. On the other hand, when the object is farther than the focal plane, the variation of the CoC is very small and it is difficult to determine the depth of the object. So, we use these facts to determine the appropriate depth range.
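The thin-lens relation above is easy to probe numerically. The sketch below (plain Python; the 50-mm focal length is a hypothetical parameter chosen for illustration, while the 7-mm aperture radius and 160-cm focal plane come from the text) evaluates the signed CoC radius over the working range:

```python
# Hypothetical parameters mirroring the setup described in the text.
f = 0.050   # focal length [m]; a 50-mm lens is an assumption for illustration
A = 0.007   # aperture radius [m] (half of the aperture size, 7 mm)
z1 = 1.60   # focal distance [m] (focal plane at 160 cm)

def coc_radius(z_obj):
    """Signed radius of the circle of confusion from Eq. (1).

    The result is negative when the object is closer than the focal
    plane and positive when it is farther away.
    """
    return A * ((1 - f / z_obj) / (1 - f / z1) - 1)

for z in (0.6, 1.0, 1.6, 2.5, 3.5):
    print(f"z_obj = {z:4.1f} m -> sigma = {coc_radius(z) * 1e3:+.3f} mm")
```

Running this reproduces the behavior of Fig. 1(b): the magnitude of σ changes much faster on the near side of the focal plane than on the far side, which is why near-side depths are easier to discriminate.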

In this study, the concept of ADE comes from the fact that the PSF differs according to the capture distance. The CoC captured using a circular aperture has a simple symmetric shape, so it is impossible to distinguish whether the point source of light is located closer or farther than the focal plane. To solve this problem, we use a camera with an asymmetric aperture shaped like the numeral 7, as shown in Fig. 2(a). Since the camera's aperture is asymmetric, we obtain the PSF by photographing a point light source with the camera. To easily move the point light source, it is displayed on a panel. The PSFs are obtained by shifting the display panel from 60 to 350 cm in 10-cm increments. Figure 2(b) shows several examples of captured PSFs when the focal plane is set to 160 cm.

Figure 2.Camera with an asymmetric aperture. (a) The shape of an aperture like a “7”, and (b) several examples of captured PSFs, from 60 to 350 cm.

The image of a simple planar scene is computed as the convolution of the scene x and the PSF of the camera P:

$I = x \otimes P + n,$

where ⊗ is the convolution operation and n is noise, including the camera's shot noise and aberrations. Since the image of the scene is formed by convolution of the PSF with the depth layer, the defocused image is recovered by deconvolution using the PSF corresponding to the plane of the scene. On the other hand, the defocused image is not recovered clearly with a PSF at the wrong depth. Therefore, the depth of the defocused image is estimated as the depth of the PSF that produces the sharpest image.
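The image-formation model above can be sketched directly. The snippet below (Python with NumPy; the scene, box-shaped PSF, and noise level are toy stand-ins rather than the paper's data) forms a defocused image as the convolution of a scene with a PSF plus noise:

```python
import numpy as np

def convolve2d(x, p):
    """'Same'-size 2-D convolution (the x (x) P term), using only NumPy.
    The kernel is flipped, so this is true convolution, not correlation."""
    pf = p[::-1, ::-1]
    kh, kw = p.shape
    xp = np.pad(x, ((kh // 2, kh - 1 - kh // 2), (kw // 2, kw - 1 - kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for di in range(kh):
        for dj in range(kw):
            out += pf[di, dj] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

rng = np.random.default_rng(0)
scene = np.zeros((32, 32))
scene[16, 16] = 1.0              # a single point source
psf = np.ones((5, 5)) / 25.0     # toy PSF; the real PSF is the captured "7" shape
blurred = convolve2d(scene, psf)                           # x (x) P
image = blurred + 1e-3 * rng.standard_normal(scene.shape)  # I = x (x) P + n
```

Because the scene here is a single point, the noiseless blurred image is just the PSF itself, which is exactly the property the PSF-calibration step in this section exploits.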

### III. Absolute depth estimation based on the sharpness-assessment algorithm

In this section, we present a new method of ADE based on the sharpness-assessment algorithm. The process is shown in Fig. 3. First, the defocused image is deconvolved using the PSF set. Second, its high-frequency contents are obtained using edge-detection operators. Then, the defocused image is segmented with respect to the objects in the scene. To estimate the depths of the objects, the regions are segmented using a set of masks {O_k}, where the order of segmentation is indexed by k. Third, the depth is estimated from the pre-processed images using the sharpness-assessment algorithm, which is composed of total-sum scaling normalization, denoising based on the cumulative distribution function (CDF), and scoring of the sharpness of the deconvolved image. The absolute depth map and the all-focus image are obtained by applying this algorithm to every segment respectively.

Figure 3.Flow chart for absolute depth estimation (ADE) based on the sharpness-assessment algorithm.

### 3.1. Image Deconvolution

For deconvolution of a defocused image, we use the Richardson-Lucy method, a well-known non-linear iterative deconvolution algorithm [13, 14]. It is useful for retrieving a focused image when the PSF of the depth layer is known. In our experiments, the number of iterations is set to 20. The image deconvolved using the PSF is obtained by

$J_t = \mathrm{deconv}_{\mathrm{RL}}(I, P_t),$

where t represents the order of the depth layer, ranging from 60 to 350 cm in 10-cm increments, so that t is an integer from 1 to 30.

Figure 4(a) shows the defocused image of the scene where a resolution chart is located 290 cm from the camera. As shown in Fig. 4(b), we compute the reconstructed image by deconvolution with the PSF at the corresponding depth.

Figure 4.Reconstruction of a defocused resolution chart. (a) Defocused image and (b) reconstructed image when the objects are located 290 cm from the camera.

### 3.2. Edge-detection

There are many gradient operators for edge detection, such as Sobel, Prewitt, and Roberts, but these methods are too sensitive to allow choosing proper parameters across the various features of an image. Therefore, we use four derivative edge-detection operators in the x and y directions [15]. In the x direction, the first and second derivative operators are defined by

$\partial_x J_t(i,j) = J_t(i,j) - J_t(i-1,j),$

$\partial_{xx} J_t(i,j) = \left[ J_t(i+2,j) - J_t(i,j) \right] - \left[ J_t(i,j) - J_t(i-2,j) \right].$

To expand the width of the edge, the window summations with respect to ∂xJt and ∂xxJt are defined by

$W_{x,t,k}(i,j) = O_k(i,j) \sum_{m=i-w}^{i+w} \left| \partial_x J_t(m,j) \right|,$

$W_{xx,t,k}(i,j) = O_k(i,j) \sum_{m=i-w}^{i+w} \left| \partial_{xx} J_t(m,j) \right|.$

W_{x,t,k} represents the window summation of ∂xJt from i − w to i + w when the x-axis coordinate i is given, for each segment masked by O_k. W_{y,t,k} and W_{yy,t,k} are computed in the same manner. After applying these window summations, the features of the edges stand out. In our experiments we set the window size w to 2.
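A minimal NumPy rendering of these operators may help. The helper names are ours, and summing absolute derivative values inside the window is our reading of the energy-based sharpness measure:

```python
import numpy as np

def d1x(J):
    """First derivative along x: J(i,j) - J(i-1,j)."""
    D = np.zeros_like(J, dtype=float)
    D[1:, :] = J[1:, :] - J[:-1, :]
    return D

def d2x(J):
    """Second derivative along x:
    [J(i+2,j) - J(i,j)] - [J(i,j) - J(i-2,j)]."""
    D = np.zeros_like(J, dtype=float)
    D[2:-2, :] = (J[4:, :] - J[2:-2, :]) - (J[2:-2, :] - J[:-4, :])
    return D

def window_sum(D, mask, w=2):
    """Window summation of derivative magnitudes over i-w..i+w,
    restricted to a segment mask O_k (w = 2 in the paper)."""
    A = np.abs(D)
    W = np.zeros_like(A)
    n = A.shape[0]
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        W[i, :] = A[lo:hi, :].sum(axis=0)
    return mask * W
```

The y-direction versions follow by transposing. The window sum widens each edge response from a single row of pixels to a band of width 2w + 1, which is what makes the edge features stand out.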

Figure 5 shows the window summations of the 1st and 2nd derivatives along the x-axis and y-axis respectively. The images are pre-processed for the region of interest containing the resolution chart in the scene. The first-derivative operator is used to compute changes in image intensity, and the second derivative is used to localize the edges.

Figure 5.Window summations using (a) 1st derivative along the x-axis, (b) 1st derivative along the y-axis, (c) 2nd derivative along the x-axis, and (d) 2nd derivative along the y-axis.

### 3.3. Total-sum Scaling Normalization

The purpose of our depth-estimation algorithm is to find the sharpest image among the deconvolved images. In general, a sharp edge has a greater energy density than a defocused edge under the same conditions, but unfortunately some defocused edges have more energy than sharp edges. This comes from ringing artifacts, which are caused by deconvolution with an improper convolution kernel of a different size. These ringing artifacts increase the energy of the defocused image, so they obstruct accurate depth estimation. Therefore, we use total-sum scaling normalization to reduce the effect of these artifacts. The normalized window summations are defined by

$NW_{x,t,k} = \dfrac{W_{x,t,k}}{\sum_{i,j} W_{x,t,k}(i,j)},$

$NW_{xx,t,k} = \dfrac{W_{xx,t,k}}{\sum_{i,j} W_{xx,t,k}(i,j)}.$

### 3.4. Denoising Based on the Cumulative Distribution Function

In the high-frequency parts of the edge image there is a lot of noise, which interferes with accurately assessing the sharpness. Therefore, we remove the noise contained in the high-frequency content of each image. We discover the features of the noise from the histogram. Usually the noise in edge images lies in the region of small magnitude, and there are many such pixels, as shown in Fig. 6(a). Figure 6(b) shows the CDF, which is useful for determining a threshold point. We divide the CDF graph into 10 sections along the y-axis, and then count the number of magnitudes in each section. The threshold is set in the section for which the increment in magnitude of the next section is at least twice that of the current section, and the representative value is the maximum value of that section's range. In most cases the threshold point is the magnitude at which the CDF value is 0.9. Thus, the denoised normalized window summations are defined by

Figure 6.Denoising based on the cumulative distribution function (CDF). (a) Example of the histogram for an edge image, and (b) threshold point determined by the CDF.

$DNW_{x,t,k}(i,j) = \begin{cases} NW_{x,t,k}(i,j) & \text{for } NW_{x,t,k}(i,j) > \mathrm{Threshold} \\ 0 & \text{otherwise}, \end{cases}$

$DNW_{xx,t,k}(i,j) = \begin{cases} NW_{xx,t,k}(i,j) & \text{for } NW_{xx,t,k}(i,j) > \mathrm{Threshold} \\ 0 & \text{otherwise}. \end{cases}$
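The normalization and denoising steps reduce to a few lines. In this sketch (Python with NumPy; the function names are ours, and the section-increment rule for picking the threshold is simplified to the common case the text reports, namely the magnitude at which the CDF reaches 0.9):

```python
import numpy as np

def total_sum_normalize(W):
    """Total-sum scaling: divide a window summation by its sum over all
    pixels, suppressing the energy inflation from ringing artifacts."""
    return W / W.sum()

def cdf_threshold(NW, q=0.9):
    """Zero out small-magnitude pixels, keeping only values above the
    magnitude at which the empirical CDF reaches q."""
    vals = np.sort(NW.ravel())
    thr = vals[int(q * (vals.size - 1))]
    return np.where(NW > thr, NW, 0.0)
```

Applied to a map dominated by many small noise pixels and a few strong edge pixels, the threshold lands just above the noise floor, so only the sharp edge responses survive, as in Fig. 7.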

Figure 7 shows the denoised normalized window summations for the 1st and 2nd derivatives with respect to the x and y axes. Compared to Figs. 5(a)-5(d), the noise is successfully eliminated while the sharp components are preserved.

Figure 7.Denoised normalized window summations from (a) 1st derivative along the x-axis, (b) 1st derivative along the y-axis, (c) 2nd derivative along the x-axis, and (d) 2nd derivative along the y-axis.

### 3.5. Scoring the Sharpness of a Deconvolved Image

Commonly, the energy of a sharp edge is greater than that of a defocused edge. Based on this fact, the scores for sharpness are defined by

$S_{x,t,k} = \dfrac{\sum_{i,j} DNW_{x,t,k}(i,j)}{N_{x,t,k}},$

$S_{xx,t,k} = \dfrac{\sum_{i,j} DNW_{xx,t,k}(i,j)}{N_{xx,t,k}},$

where N_{x,t,k} and N_{xx,t,k} represent the numbers of non-zero pixels in the kth segment.

The scores for sharpness are undesirably affected by the size of the PSF, so we need to compensate for this effect. The compensation factor λt is defined by

$\lambda_t = \dfrac{1}{\sigma_t},$

where σt is the radius of the CoC at the tth depth.

The sharpness of the deconvolved image is obtained by summing the overall scores. The total score for sharpness is defined by

$S_{t,k}^{\mathrm{total}} = \lambda_t \left( S_{x,t,k} + S_{y,t,k} + S_{xx,t,k} + S_{yy,t,k} \right).$

We estimate the absolute depth of the object as the depth of the PSF for which the total score for sharpness has a maximum value within the depth range. The absolute depth for the kth segment is given by

$D_k = \underset{t}{\operatorname{argmax}} \left( S_{t,k}^{\mathrm{total}} \right).$
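Putting the scoring together, the final decision can be sketched as follows (Python with NumPy; the names are ours, and for brevity the per-depth score here stands in for the sum of the four channel scores S_x, S_y, S_xx, S_yy described above):

```python
import numpy as np

def sharpness_score(DNW):
    """Score one channel: mean magnitude over the non-zero pixels of a
    denoised normalized window summation."""
    n = np.count_nonzero(DNW)
    return DNW.sum() / n if n else 0.0

def estimate_depth(total_scores, sigmas, depths):
    """Compensate each depth layer's summed score by 1/sigma_t and take
    the argmax over the layers, as in the final decision rule."""
    total = np.asarray(total_scores, dtype=float) / np.asarray(sigmas, dtype=float)
    return depths[int(np.argmax(total))]

# Toy check: 30 layers from 60 to 350 cm, with a score peak at 290 cm.
depths = np.arange(60, 360, 10)
scores = np.exp(-((depths - 290) / 40.0) ** 2)
print(estimate_depth(scores, np.ones(depths.size), depths))  # -> 290
```

The 1/σ_t weighting matters because larger PSFs spread edge energy over more pixels; without it the raw scores would be biased toward depths near the focal plane.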

Figure 8 plots the normalized total score for sharpness with respect to depth, where each score is normalized by the highest score. The red circle at the peak represents the score at the target position. From the graph, the sharpness score increases as the depth of the PSF approaches the actual depth. The highest value occurs at 290 cm; therefore, the depth of the object is estimated as 290 cm, which matches the ground truth.

Figure 8.Normalized total score for sharpness, when the resolution chart is located at 290 cm.

### IV. Experimental results

Experimentally, we demonstrate the proposed algorithm with two examples. First, the depth of a target is estimated as the target moves from 80 to 350 cm in 30-cm increments. Since the PSFs are captured in 10-cm increments, the depth resolution is 10 cm. The focus of the camera is fixed at 160 cm in this study. We use a Canon EOS 650D DSLR (Canon, Tokyo, Japan) with a Nikon AF-S 50 mm f/1.8G lens (Nikon, Tokyo, Japan).

As shown in Fig. 9(a), the images of the resolution chart are taken by moving the target from 80 to 350 cm in 30-cm increments. The depth of the target is estimated using the sharpness-assessment algorithm. Figure 9(b) shows the normalized scores for different target depths. Figure 9(c) shows the difference between the estimated depth and the ground truth from 80 to 350 cm. When the target is positioned at 170 cm the depth difference is −20 cm, meaning that the estimated depth is 150 cm. The depth difference is explained by the features of the PSF: As mentioned, the radius of the PSF is very small around the focal plane, and the variation of the PSF is somewhat smaller behind the focal plane. Therefore, the accuracy of the depth estimation near and behind the focal plane is relatively low.

Figure 9.Depth estimation of the target. (a) Illustration of the experimental setup. (b) Normalized total scores for sharpness for several depths of target. (c) Depth difference, from 80 to 350 cm.

In Fig. 9(b), the values of the two peaks are slightly different: The total score evaluated at 200 cm is only about 0.2% higher than that at 150 cm. Since our algorithm estimates the absolute depth of an object as the position with the highest total score for sharpness, the position of 200 cm is chosen. However, the double-peak problem is serious, because it can mislead the algorithm into erroneous estimates. We think that this ambiguity of double peaks results from the “dead zone”, the region near the focal plane [8]. Near the focal plane it is relatively difficult to estimate the absolute depth precisely, due to the small variation in the PSF. Therefore, the accuracy of depth estimation near the focal plane is relatively low. In addition, this small PSF brings about ambiguity in the depth estimation and increases the chance of an additional peak appearing around 160 cm.

The second experiment is conducted on a defocused image containing a painting, a cup, a photo frame, and a post box positioned at different depths, as shown in Fig. 10(a). They are positioned at 100, 160, 250, and 320 cm respectively. The focal plane of the camera is again set to 160 cm. Figure 10(b) shows the segmented image, obtained using the region-merging algorithm proposed by Nock and Nielsen [16]. The regions of interest are numbered sequentially according to their depths. For each segmented region, the proposed depth-estimation algorithm is applied. Figure 10(c) shows the combination of all four normalized window summations obtained from the deconvolved images at the corresponding real object depths. In this image each object is outlined by bright borders, which represent the segmented areas. When the boundary of an object overlaps with that of another object placed at a different depth, the segmentation process may distort the edge sharpness of the original image; this is regarded as one of the factors that obstruct accurate depth estimation. For that reason, only the inner features of a segmented area are used for depth estimation.

Figure 10.Image segmentation. (a) Captured image and (b) segmented image, by applying the region merging algorithm. (c) Combination of the normalized window summations of 1st and 2nd derivatives.

Figure 11(a) shows the normalized total scores for sharpness for the four regions. As shown in Fig. 11(b), the estimated depths are identical to the actual depths except for Region 3, where the depth difference is −10 cm. Figure 11(c) shows the depth map textured with the all-focus image. The all-focus image is reconstructed by deconvolution using the PSF at the estimated depth for each region. The remaining regions, containing the table and the wall, are set to 350 cm. Therefore, the proposed algorithm provides a feasible solution for recovering the absolute depth of a scene from a single defocused image.

Figure 11.Depth estimation for a defocused image. (a) Normalized total scores for sharpness, and (b) depth difference. (c) Depth map textured with the all-focus image.

### V. Conclusion

In this paper, we have presented a new method to estimate the depth of a scene by using an asymmetric aperture, based on the sharpness-assessment algorithm. The asymmetric aperture is used to distinguish whether the target is located closer or farther than the focal plane. The sharpness-assessment algorithm is composed of total-sum scaling normalization, denoising based on the CDF, and scoring of the sharpness of the deconvolved image. In our experiments we used an asymmetric aperture shaped like a “7”. With the proposed method, the depth of the scene was estimated over the wide range from 60 to 350 cm, and the depth difference was within −20 cm, even near and behind the focal plane. Therefore, we have demonstrated that our algorithm provides a feasible solution for recovering the absolute depth of a scene from a single defocused image. Even though optimization of the features of an asymmetric aperture is very important for enhancing the performance of depth estimation, we have focused on demonstrating the application of an asymmetric aperture and verifying the feasibility of our sharpness-assessment algorithm. In the future, we plan to replace the conventional camera lens with a multi-aperture lens, to reduce the depth difference near and behind the focal plane. We also plan to optimize the coded-aperture pattern for each aperture, to enhance depth-discrimination resolution and accuracy.

This research was supported by ‘The Cross-Ministry Giga Korea Project’ grant funded by the Korea government (MSIT) (No. 1711116979, Development of Telecommunications Terminal with Digital Holographic Table-top Display).

1. J. H. Elder and S. W. Zucker, “Local scale control for edge detection and blur estimation,” IEEE Trans. Pattern Anal. Mach. Intell. 20, 699-716 (1998).
2. M. Subbarao, T.-C. Wei and G. Surya, “Focused image recovery from two defocused images recorded with different camera settings,” IEEE Trans. Image Process. 4, 1613-1628 (1995).
3. C. Swain and T. Chen, “Defocus-based image segmentation,” in Proc. International Conference on Acoustics, Speech, and Signal Processing-ICASSP (Detroit, MI, USA, 1995), pp. 2403-2406.
4. M. Subbarao and G. Surya, “Depth from defocus: a spatial domain approach,” Int. J. Comput. Vis. 13, 271-294 (1994).
5. D. Ziou and F. Deschênes, “Depth from defocus estimation in spatial domain,” Comput. Vis. Image Underst. 81, 143-165 (2001).
6. A. P. Pentland, “A new sense for depth of field,” IEEE Trans. Pattern Anal. Mach. Intell. 9, 523-531 (1987).
7. S. Bae and F. Durand, “Defocus magnification,” Comput. Graph. Forum 26, 571-579 (2007).
8. A. Levin, R. Fergus, F. Durand and W. T. Freeman, “Image and depth from a conventional camera with a coded aperture,” ACM Trans. Graph. 26, 70-es (2007).
9. S. Zhuo and T. Sim, “Defocus map estimation from a single image,” Pattern Recognit. 44, 1852-1858 (2011).
10. S. H. Lai, C. W. Fu and S. Chang, “A generalized depth estimation algorithm with a single image,” IEEE Trans. Pattern Anal. Mach. Intell. 14, 405-411 (1992).
11. C. Chen and Y. Chen, “Recovering depth from a single image using spectral energy of the defocused step edge gradient,” in Proc. 18th IEEE International Conference on Image Processing-ICIP (Brussels, Belgium, 2011), pp. 1981-1984.
12. C. Zhou, S. Lin and S. K. Nayar, “Coded aperture pairs for depth from defocus,” in Proc. IEEE 12th International Conference on Computer Vision-ICCV (Kyoto, Japan, 2009), pp. 325-332.
13. W. H. Richardson, “Bayesian-based iterative method of image restoration,” J. Opt. Soc. Am. 62, 55-59 (1972).
14. L. B. Lucy, “An iterative technique for the rectification of observed distributions,” Astron. J. 79, 745-754 (1974).
15. J. Kumar, F. Chen and D. Doermann, “Sharpness estimation for document and scene images,” in Proc. 21st International Conference on Pattern Recognition-ICPR 2012 (Tsukuba, Japan, 2012), pp. 3292-3295.
16. R. Nock and F. Nielsen, “Statistical region merging,” IEEE Trans. Pattern Anal. Mach. Intell. 26, 1452-1458 (2004).

### Article

#### Article

Curr. Opt. Photon. 2021; 5(5): 514-523

Published online October 25, 2021 https://doi.org/10.3807/COPP.2021.5.5.514

## Absolute Depth Estimation Based on a Sharpness-assessment Algorithm for a Camera with an Asymmetric Aperture

Beomjun Kim, Daerak Heo, Woonchan Moon, Joonku Hahn

School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Korea

Correspondence to:jhahn@knu.ac.kr, ORCID 0000-0002-5038-7253

Received: June 14, 2021; Revised: July 19, 2021; Accepted: July 29, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Abstract

Methods for absolute depth estimation have received lots of interest, and most algorithms are concerned about how to minimize the difference between an input defocused image and an estimated defocused image. These approaches may increase the complexity of the algorithms to calculate the defocused image from the estimation of the focused image. In this paper, we present a new method to recover depth of scene based on a sharpness-assessment algorithm. The proposed algorithm estimates the depth of scene by calculating the sharpness of deconvolved images with a specific point-spread function (PSF). While most depth estimation studies evaluate depth of the scene only behind a focal plane, the proposed method evaluates a broad depth range both nearer and farther than the focal plane. This is accomplished using an asymmetric aperture, so the PSF at a position nearer than the focal plane is different from that at a position farther than the focal plane. From the image taken with a focal plane of 160 cm, the depth of object over the broad range from 60 to 350 cm is estimated at 10 cm resolution. With an asymmetric aperture, we demonstrate the feasibility of the sharpness-assessment algorithm to recover absolute depth of scene from a single defocused image.

Keywords: Coded aperture, Depth estimation, Image reconstruction

### I. INTRODUCTION

Depth estimation is one of the most important areas of study in three-dimensional (3D) metrology. Depth estimation occupies a crucial position in a variety of industries, as it has been actively used for self-driving cars, inspection of defects, 3D recognition, for example. Time-of-flight and structured-light illumination are regarded as representative technologies of depth estimation, where a specific light is emitted onto a target and the reflected light signal is collected by a detector. These technologies provide a depth map with high resolution and accuracy, but they usually require expensive and complicated optical instruments.

On the other hand, depth estimation from defocused images has the great advantage that it requires only one or two images from a single conventional camera and its system has a small form factor compared to previous ones. This method originates from the relation between level of defocus (LOD) and depth profile. When the object is placed outside of the depth of field, the captured image is defocused and LOD increases proportionally to the distance from the focal plane. Based on this fact, many studies of depth recovery have concentrated on obtaining LOD from defocused images [13]. Therefore, evaluation of LOD from the scene plays a significant role in various applications such as image deblurring, image segmentation and depth estimation.

In general, methods for depth recovery are classified into relative depth estimation (RDE) and absolute depth estimation (ADE) from the defocused images. In RDE studies, the LOD is evaluated by computing the standard deviation, which is inferred from edge-detection. The standard deviation is related to the size of the point-spread function (PSF) and can be obtained from the high-frequency content using the derivative operator [4, 5]. Pentland [6] proposed the framework of RDE to obtain the focal disparity map by utilizing the relation between high frequency content coming from the difference of two defocused images. But this approach has an inconvenience: It needs to capture several images from the same scene, with different camera parameters. Bae and Durand [7] recovered the depth map from a single defocused image by employing the method of Elder and Zucker [1] and Levin et al. [8]. Zhou and Terence [9] use a ratio of gradient between the defocused input image and the estimated image to calculate the depth map. The above methods evaluate LOD of a scene from the edge-detection of a defocused image modeled by convolution of a gaussian PSF and a sharp image. Thus they take little computational time and uncomplicated processing, but it is difficult to recover absolute depth without the relation between standard deviation and depth layer [10, 11].

Most methods of ADE are based on deconvolution to recover the depth from a defocused image. The features of the PSF are important to estimate the depth precisely, and it is necessary to specially design an optical stop with a coded pattern. Levin et al. [8] proposed the framework of ADE to recover absolute depth of scene and all-focus image using a camera with a centrosymmetric coded aperture. However, the centrosymmetric coded aperture makes it difficult to distinguish if the object is placed in front of or behind the focal plane. Zhou et al. [12] proposed a method using two asymmetric apertures that are optimized for reconstruction of defocused image. They recover the depth map over a broad range and obtain an all-focus image in high quality, but they need to take two defocused images with different coded apertures for the same scene.

In this paper, we propose a new ADE method to recover the depth of scene by using an asymmetric aperture based on the sharpness-assessment algorithm. We estimate the depth of scene in a broad range from 60 to 350 cm by using an asymmetric aperture. The proposed algorithm recovers the depth of scene by calculating the sharpness of the deconvolved images. In our experiments, we present the estimated depth and the difference in estimation compared to the ground truth. Also, we show the absolute depth map textured with an all-focus image for several objects placed at different distances. Therefore, we demonstrate that our algorithm provides a feasible solution to recover absolute depth of scene from a single defocused image.

This paper is organized as follows. In Section 2, the camera with an asymmetric aperture is modeled. In Section 3, the sharpness-assessment algorithm is described, along with the entire procedure. In Section 4, we show the experimental results to demonstrate our proposed algorithm, and in Section 5 the conclusion is given.

### II. Modeling for a camera with an asymmetric aperture

When we take a picture, scene content apart from the focal plane is defocused, and this blurring becomes dominant as the scene gets farther from the focal plane. Figure 1 shows a simple camera model representing the circle of confusion (CoC) and its radius when half of the aperture size is 7 mm. The radius is related to the distance of a point source from the focal plane: as that distance increases, the size of the CoC also increases. In this thin-lens model, the radius of the blur circle σ is given by

Figure 1. Defocus model. (a) Geometry of the camera, and (b) the radius of the CoC when half of the aperture size is 7 mm.

$\sigma = \pm A \left( \frac{1 - f/z_{\mathrm{obj}}}{1 - f/z_1} - 1 \right),$

where A is the aperture radius, f is the focal length, zobj is the object’s distance, and z1 is the focal distance. When an object is located closer than the focal plane, the sign of Eq. (1) becomes negative.

Figure 1(b) shows the radius of the CoC when the distance ranges from 60 to 350 cm; here the focal plane of the camera is at 160 cm. The radius of the CoC changes rapidly when the object is closer than the focal plane, so depth is better distinguished there. On the other hand, when the object is farther than the focal plane, the variation of the CoC is very small and it is difficult to determine the depth of the object. We use these facts to determine the appropriate depth range.
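
To make the thin-lens relation of Eq. (1) concrete, the signed CoC radius can be sketched in Python. The parameter values below (focal length 50 mm, focal distance 160 cm, aperture radius 7 mm) are taken from the setup described in this paper; the function itself is our illustrative sketch, not the authors' code.

```python
# Illustrative sketch, not the authors' implementation.
def coc_radius(z_obj, f=0.05, z1=1.6, A=0.007):
    """Signed radius of the circle of confusion, per Eq. (1).

    f: focal length (m), z1: focal distance (m), A: aperture radius (m).
    The result is negative when the object is closer than the focal
    plane, matching the sign convention of Eq. (1).
    """
    return A * ((1 - f / z_obj) / (1 - f / z1) - 1)

# The CoC grows much faster on the near side of the focal plane:
near = coc_radius(0.6)   # object at 60 cm  -> negative radius
far = coc_radius(3.5)    # object at 350 cm -> positive radius
```

For the values above, |near| is roughly three times |far|, which is consistent with the better depth discrimination in front of the focal plane noted here.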

In this study, the concept of ADE comes from the fact that the PSF differs according to capture distance. The captured CoC using a circular aperture has a simple symmetric shape, so it is impossible to distinguish whether the point source of light is located closer or farther than the focal plane. To solve this problem, we use a camera with an asymmetric aperture shaped like the numeral 7, as shown in Fig. 2(a). When the camera’s aperture is asymmetric, we obtain the PSF by capturing a point light source with the camera. To move the point light source easily, it is displayed on a panel. The PSFs are obtained by shifting the display panel from 60 to 350 cm in 10-cm increments. Figure 2(b) shows several examples of captured PSFs when the focal plane is set to 160 cm.

Figure 2. Camera with an asymmetric aperture. (a) The shape of an aperture like a “7”, and (b) several examples of captured PSFs, from 60 to 350 cm.

The image of a simple planar scene is computed as the convolution of the scene x and the PSF of the camera P:

$I = x \otimes P + n,$

where $\otimes$ is the convolution operator and n is noise, including the camera’s shot noise and aberration. Since the image of the scene is formed by convolution of the PSF with the depth layer, the defocused image is recovered by deconvolution with the PSF corresponding to the plane of the scene. On the other hand, the defocused image is not recovered clearly with a PSF at the wrong depth. Therefore, the depth of the defocused image is estimated as the depth of the PSF that produces the sharpest image.
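
The forward model above is just a discrete convolution plus noise. A minimal 1-D sketch (our own illustration, not the paper's code) shows how a sharp edge is spread by a PSF:

```python
# Illustrative sketch, not the authors' implementation.
def convolve1d(x, p):
    """Full discrete convolution, the 1-D analogue of I = x (convolved with) P."""
    out = [0.0] * (len(x) + len(p) - 1)
    for i, xi in enumerate(x):
        for j, pj in enumerate(p):
            out[i + j] += xi * pj
    return out

# A sharp step edge blurred by a normalized 3-tap box "PSF":
edge = [0.0, 0.0, 1.0, 1.0, 1.0]
psf = [1 / 3] * 3
blurred = convolve1d(edge, psf)  # the step is now spread over several pixels
```

Because the kernel is normalized, the total intensity of the edge is preserved; only its sharpness is lost, which is what deconvolution with the correct PSF tries to undo.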

### III. Absolute depth estimation based on the sharpness-assessment algorithm

In this section, we show a new method of ADE based on the sharpness-assessment algorithm. The process is shown in Fig. 3. First, the defocused image is deconvolved using the PSF set. Second, its high-frequency contents are obtained using edge-detection operators. Then the defocused image is segmented with respect to the objects in the scene. To estimate the depths of the objects, the regions are segmented using a set of masks {Ok}, where the order of segmentation is indexed by k. Third, the depth is estimated from the pre-processed images using the sharpness-assessment algorithm, which is composed of total-sum scaling normalization, denoising based on the cumulative distribution function (CDF), and scoring the sharpness of the deconvolved image. The absolute depth map and all-focus image are obtained by applying this algorithm to every segment.

Figure 3. Flow chart for absolute depth estimation (ADE) based on the sharpness-assessment algorithm.

### 3.1. Image Deconvolution

For deconvolution of a defocused image, we use the Richardson-Lucy method, a well-known nonlinear iterative deconvolution algorithm [13, 14]. It is useful for retrieving a focused image when the PSF of the depth layer is known. In our experiments, the number of iterations is set to 20. The image deconvolved with the PSF is obtained by

$J_t = \mathrm{deconv}_{\mathrm{RL}}(I, P_t),$

where t, an integer from 1 to 30, represents the order of the depth layer, corresponding to depths from 60 to 350 cm in 10-cm increments.
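
As a rough illustration of the Richardson-Lucy update used above, here is a minimal 1-D version with circular convolution. This is our simplification for clarity; practical implementations (as used in the paper, with 20 iterations) work on 2-D images and handle boundaries carefully.

```python
# Illustrative sketch, not the authors' implementation.
def _cconv(a, k):
    """Circular convolution of signal a with kernel k."""
    n = len(a)
    return [sum(a[(i - j) % n] * k[j] for j in range(len(k))) for i in range(n)]

def _ccorr(a, k):
    """Circular correlation (convolution with the flipped kernel)."""
    n = len(a)
    return [sum(a[(i + j) % n] * k[j] for j in range(len(k))) for i in range(n)]

def rl_deconv(I, P, iters=20):
    """Richardson-Lucy iterations: x <- x * ((I / (x conv P)) corr P)."""
    x = [1.0] * len(I)                      # flat, non-negative initial estimate
    for _ in range(iters):
        blurred = _cconv(x, P)
        ratio = [o / max(b, 1e-12) for o, b in zip(I, blurred)]
        x = [xi * c for xi, c in zip(x, _ccorr(ratio, P))]
    return x
```

The multiplicative update keeps the estimate non-negative, which is one reason Richardson-Lucy behaves well on photographic intensities.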

Figure 4(a) shows the defocused image of the scene where a resolution chart is located 290 cm from the camera. As shown in Fig. 4(b), we compute the reconstructed image by deconvolution with the PSF at the corresponding depth.

Figure 4. Reconstruction of a defocused resolution chart. (a) Defocused image and (b) reconstructed image when the objects are located 290 cm from the camera.

### 3.2. Edge-detection

There are many gradient operators for edge detection, such as Sobel, Prewitt, and Roberts, but with these methods it is too difficult to choose proper parameters for the various features of an image. Therefore, we use four derivative edge-detection operators along the x and y directions [15]. In the x direction, the first and second derivative operators are defined by

$\partial_x J_t(i,j) = J_t(i,j) - J_t(i-1,j),$

$\partial_{xx} J_t(i,j) = \left[ J_t(i+2,j) - J_t(i,j) \right] - \left[ J_t(i,j) - J_t(i-2,j) \right].$

For expanding the width of the edge, the window summations with respect to ∂xJt and ∂xxJt are defined by

$W_{x,t,k}(i,j) = O_k(i,j) \sum_{i-w \le m \le i+w} \partial_x J_t(m,j),$

$W_{xx,t,k}(i,j) = O_k(i,j) \sum_{i-w \le m \le i+w} \partial_{xx} J_t(m,j).$

W_{x,t,k} represents the window summation of ∂_x J_t from i − w to i + w when the x-axis coordinate i is given, for each segment masked by O_k. W_{y,t,k} and W_{yy,t,k} are computed in the same manner. After applying these window summations, the features of the edges stand out. In our experiments, we set the window size w to 2.
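
The derivative and window-summation definitions above can be sketched literally on images stored as nested lists. This is our illustration; the mask O below plays the role of a segmentation mask O_k and is hypothetical.

```python
# Illustrative sketch, not the authors' implementation.
def dx(J, i, j):
    """First x-derivative: J(i,j) - J(i-1,j)."""
    return J[i][j] - J[i - 1][j]

def dxx(J, i, j):
    """Second x-derivative: [J(i+2,j) - J(i,j)] - [J(i,j) - J(i-2,j)]."""
    return (J[i + 2][j] - J[i][j]) - (J[i][j] - J[i - 2][j])

def window_sum_x(J, O, w=2):
    """W_x(i,j) = O(i,j) * sum_{m=i-w}^{i+w} dx(J, m, j), with w = 2."""
    h, wd = len(J), len(J[0])
    W = [[0.0] * wd for _ in range(h)]
    for i in range(w + 1, h - w):            # keep all indices in range
        for j in range(wd):
            W[i][j] = O[i][j] * sum(dx(J, m, j) for m in range(i - w, i + w + 1))
    return W
```

The second-derivative summation W_xx is identical with dxx in place of dx (the valid index range then needs a margin of w + 2).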

Figure 5 shows the window summations of 1st and 2nd derivatives along the x-axis and y-axis respectively. Images are pre-processed for the region of interest of the resolution chart from the scene. The first derivative operator is used to compute the changes in image intensity, and the second derivative is used to localize the edges.

Figure 5. Window summations using (a) 1st derivative along the x-axis, (b) 1st derivative along the y-axis, (c) 2nd derivative along the x-axis, and (d) 2nd derivative along the y-axis.

### 3.3. Total-sum Scaling Normalization

The purpose of our depth-estimation algorithm is to find the sharpest image among the deconvolved images. In general, a sharp edge has a greater energy density than a defocused edge under the same conditions, but unfortunately some defocused edges have more energy than sharp edges. This comes from ringing artifacts, which are caused by deconvolution with an improper convolution kernel of a different size. These ringing artifacts increase the energy of the defocused image and thus obstruct accurate depth estimation. Therefore, we use total-sum scaling normalization to reduce their effect. The normalized window summations are defined by

$NW_{x,t,k} = \frac{W_{x,t,k}}{\sum_i \sum_j W_{x,t,k}},$

$NW_{xx,t,k} = \frac{W_{xx,t,k}}{\sum_i \sum_j W_{xx,t,k}}.$
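
Total-sum scaling simply divides each window-summation map by its grand total, so that the extra energy injected by ringing no longer inflates one depth hypothesis over another. A sketch:

```python
# Illustrative sketch, not the authors' implementation.
def total_sum_normalize(W):
    """Divide a window-summation map by its grand total (total-sum scaling)."""
    total = sum(v for row in W for v in row)
    if total == 0:
        return [row[:] for row in W]   # empty segment: nothing to normalize
    return [[v / total for v in row] for row in W]
```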

### 3.4. Denoising Based on the Cumulative Distribution Function

In the high-frequency parts of the edge image, there is a lot of noise that interferes with accurately assessing the sharpness. Therefore, we remove the noise contained in the high frequencies of each image. We identify the features of the noise from the histogram. Usually, the noise in edge images is located in the region of small magnitude, and there are many such pixels, as shown in Fig. 6(a). Figure 6(b) shows the CDF, which is useful for determining a threshold point. We divide the CDF graph into 10 sections along the y-axis and then count the magnitudes in each section. The threshold is determined at the section where the magnitude increment of the next section is at least twice that of the current section; the representative value for the section is the maximum value of its range. In most cases, the threshold point is the magnitude where the CDF value is 0.9. Thus, the denoised normalized window summations are defined by

Figure 6. Denoising based on the cumulative distribution function (CDF). (a) Example of the histogram for an edge image, and (b) threshold point determined by the CDF.

$DNW_{x,t,k}(i,j) = \begin{cases} NW_{x,t,k}(i,j) & \text{for } NW_{x,t,k}(i,j) > \mathrm{Threshold} \\ 0 & \text{otherwise}, \end{cases}$

$DNW_{xx,t,k}(i,j) = \begin{cases} NW_{xx,t,k}(i,j) & \text{for } NW_{xx,t,k}(i,j) > \mathrm{Threshold} \\ 0 & \text{otherwise}. \end{cases}$
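
The CDF-based threshold can be sketched as follows. For simplicity we implement only the common outcome noted above (the magnitude at which the empirical CDF reaches 0.9), not the full section-increment rule, so this is a simplified stand-in:

```python
# Illustrative sketch, not the authors' implementation.
def cdf_threshold(values, q=0.9):
    """Magnitude at which the empirical CDF reaches q (simplified rule)."""
    v = sorted(values)
    return v[min(int(q * len(v)), len(v) - 1)]

def denoise(NW, threshold):
    """Keep entries above the threshold, zero out the rest."""
    return [[x if x > threshold else 0.0 for x in row] for row in NW]
```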

Figure 7 shows the denoised normalized window summations for 1st and 2nd derivatives with respect to the x and y axes. Compared to Figs. 5(a)-5(d), the noise is eliminated successfully while the sharp components are preserved.

Figure 7. Denoised normalized window summations from (a) 1st derivative along the x-axis, (b) 1st derivative along the y-axis, (c) 2nd derivative along the x-axis, and (d) 2nd derivative along the y-axis.

### 3.5. Scoring the Sharpness of a Deconvolved Image

Commonly, the energy of a sharp edge is greater than that of a defocused edge. Based on this fact, the scores for sharpness are defined by

$S_{x,t,k} = \frac{\sum_i \sum_j DNW_{x,t,k}(i,j)}{N_{x,t,k}},$

$S_{xx,t,k} = \frac{\sum_i \sum_j DNW_{xx,t,k}(i,j)}{N_{xx,t,k}},$

where N_{x,t,k} and N_{xx,t,k} represent the numbers of non-zero pixels in the kth segment.

The scores for sharpness are undesirably affected by the size of the PSF, so we need to compensate for this effect. The compensation factor λt is defined by

$\lambda_t = 1 - \sigma_t,$

where σt is the radius of the CoC at the tth depth.

The sharpness of the deconvolved image is obtained by summing the overall scores. The total score for sharpness is defined by

$S_{t,k}^{\mathrm{total}} = \lambda_t \left( S_{x,t,k} + S_{y,t,k} + S_{xx,t,k} + S_{yy,t,k} \right).$

We estimate the absolute depth of the object as the depth of the PSF for which the total score for sharpness has a maximum value within the depth range. The absolute depth for the kth segment is given by

$D_k = \operatorname*{arg\,max}_t \left( S_{t,k}^{\mathrm{total}} \right).$
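
Putting the scoring stage together: each denoised map contributes its mean magnitude over non-zero pixels, the four scores are weighted by the CoC-based factor λ_t, and the depth layer with the maximum total score wins. A sketch, consolidating the definitions above:

```python
# Illustrative sketch, not the authors' implementation.
def sharpness_score(DNW):
    """Mean magnitude over the non-zero pixels of a denoised map."""
    flat = [v for row in DNW for v in row]
    nonzero = sum(1 for v in flat if v != 0.0)
    return sum(flat) / nonzero if nonzero else 0.0

def total_score(scores, lam):
    """lambda_t * (S_x + S_y + S_xx + S_yy) for one depth layer."""
    return lam * sum(scores)

def estimate_depth(total_scores, depths):
    """Depth of the PSF whose total sharpness score is maximal."""
    best = max(range(len(total_scores)), key=lambda t: total_scores[t])
    return depths[best]
```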

Figure 8 plots the normalized total score for sharpness with respect to depth, where each score is normalized by the highest score. The red circle at the peak represents the score of the target position. As the graph shows, the sharpness score increases as the depth of the PSF approaches the actual depth, and the highest value occurs at 290 cm. Therefore, the depth of the object is estimated as 290 cm, which matches the ground truth.

Figure 8. Normalized total score for sharpness, when the resolution chart is located at 290 cm.

### IV. EXPERIMENTAL RESULTS

We demonstrate the proposed algorithm experimentally with two examples. First, the depth of the target is estimated as the target moves from 80 to 350 cm in 30-cm increments. Since the PSFs are captured in 10-cm increments, the depth resolution is 10 cm. The focus of the camera is fixed at 160 cm in this study. We use a Canon EOS 650D DSLR (Canon, Tokyo, Japan) with a Nikon AF-S 50 mm f/1.8G lens (Nikon, Tokyo, Japan).

As shown in Fig. 9(a), the images of the resolution chart are taken by moving the target from 80 to 350 cm in 30-cm increments. The depth of the target is estimated using the sharpness-assessment algorithm. Figure 9(b) shows the normalized scores for different target depths. Figure 9(c) shows the difference between the estimated depth and the ground truth from 80 to 350 cm. When the target is positioned at 170 cm, the depth difference is −20 cm; this means that the estimated depth is 150 cm. The depth difference is explained by the features of the PSF: As mentioned, the radius of the PSF is very small near the focal plane, and the PSF changes only slightly behind the focal plane. Therefore, the accuracy of the depth estimation near and behind the focal plane is relatively low.

Figure 9. Depth estimation of the target. (a) Illustration of the experimental setup. (b) Normalized total scores for sharpness for several depths of target. (c) Depth difference, from 80 to 350 cm.

In Fig. 9(b), the values of the two peaks are slightly different. The evaluated total score at 200 cm is only about 0.2% higher than that at 150 cm. Since our algorithm estimates the absolute depth of an object as the position with the highest total score for sharpness, the position of 200 cm is chosen. However, the double-peak problem is serious, as it can mislead the algorithm into erroneous estimates. We think that this ambiguity of double peaks results from the "dead zone", the region near the focal plane [8]. Near the focal plane it is relatively difficult to estimate the absolute depth precisely, due to the small variation in PSF, so the accuracy of depth estimation there is relatively low. In addition, this small PSF brings about ambiguity in the depth estimation and increases the likelihood that an additional peak appears around 160 cm.

The second experiment is conducted on a defocused image containing a painting, a cup, a photo frame, and a post box positioned at different depths, as shown in Fig. 10(a). They are positioned at 100, 160, 250, and 320 cm respectively. The focal plane of the camera is again set to 160 cm. Figure 10(b) shows the segmented image obtained with the region-merging algorithm proposed by Nock and Nielsen [16]. The regions of interest are numbered sequentially according to their depths, and the proposed depth-estimation algorithm is applied to each segmented region. Figure 10(c) shows the combination of all four normalized window summations obtained from the deconvolved images at the corresponding real object depths. In this image each object is indicated by bright borders, which represent the segmented areas. When the boundary of an object overlaps with that of another object placed at a different depth, the segmentation process may distort the edge sharpness of the original image; this is regarded as one of the factors that obstructs accurate depth estimation. For that reason, only the inner features of a segmented area are used for depth estimation.

Figure 10. Image segmentation. (a) Captured image and (b) segmented image, by applying the region merging algorithm. (c) Combination of the normalized window summations of 1st and 2nd derivatives.

Figure 11(a) shows the normalized total scores for sharpness for the four regions. As shown in Fig. 11(b), the estimated depths are identical to the actual depths except for Region 3, where the depth difference is −10 cm. Figure 11(c) shows the depth map textured with the all-focus image. The all-focus image is reconstructed by deconvolution using the PSF at the estimated depth for each region, while the remaining regions, containing the table and wall, are set to 350 cm. Therefore, the proposed algorithm provides a feasible solution to recover the absolute depth of the scene from a single defocused image.

Figure 11. Depth estimation for a defocused image. (a) Normalized total scores for sharpness, and (b) depth difference. (c) Depth map textured with the all-focus image.

### V. CONCLUSION

In this paper, we have presented a new method to estimate the depth of scene by using an asymmetric aperture, based on the sharpness-assessment algorithm. The asymmetric aperture is used to distinguish whether the target is located closer or farther than the focal plane. The sharpness-assessment algorithm is composed of total-sum scaling normalization, denoising based on the CDF, and scoring of the sharpness of the deconvolved image. In our experiments we used an asymmetric aperture shaped like a “7”. With the proposed method, the depth of scene was estimated over the wide range from 60 to 350 cm and the depth difference was within −20 cm, even near and behind the focal plane. Therefore, we have demonstrated that our algorithm provides a feasible solution to recover the absolute depth of scene from a single defocused image. Even though the optimization of the features of an asymmetric aperture is very important to enhance the performance of depth estimation, we have focused on demonstration of the application of an asymmetric aperture and verification of the feasibility of our sharpness-assessment algorithm. In the future, we plan to replace the conventional camera lens with a multi-aperture lens to reduce the depth difference near and behind the focal plane. We also plan to optimize the coded aperture pattern for each aperture to enhance depth discrimination resolution and accuracy.

### ACKNOWLEDGEMENT

This research was supported by ‘The Cross-Ministry Giga Korea Project’ grant funded by the Korea government (MSIT) (No. 1711116979, Development of Telecommunications Terminal with Digital Holographic Table-top Display).


### References

1. J. H. Elder and S. W. Zucker, “Local scale control for edge detection and blur estimation,” IEEE Trans. Pattern Anal. Mach. Intell. 20, 699-716 (1998).
2. M. Subbarao, T.-C. Wei and G. Surya, “Focused image recovery from two defocused images recorded with different camera settings,” IEEE Trans. Image Process. 4, 1613-1628 (1995).
3. C. Swain and T. Chen, “Defocus-based image segmentation,” in Proc. International Conference on Acoustics, Speech, and Signal Processing-ICASSP (Detroit, MI, USA, 1995), pp. 2403-2406.
4. M. Subbarao and G. Surya, “Depth from defocus: a spatial domain approach,” Int. J. Comput. Vis. 13, 271-294 (1994).
5. D. Ziou and F. Deschênes, “Depth from defocus estimation in spatial domain,” Comput. Vis. Image Underst. 81, 143-165 (2001).
6. A. P. Pentland, “A new sense for depth of field,” IEEE Trans. Pattern Anal. Mach. Intell. 9, 523-531 (1987).
7. S. Bae and F. Durand, “Defocus magnification,” Comput. Graph. Forum 26, 571-579 (2007).
8. A. Levin, R. Fergus, F. Durand and W. T. Freeman, “Image and depth from a conventional camera with a coded aperture,” ACM Trans. Graph. 26, 70-es (2007).
9. S. Zhuo and T. Sim, “Defocus map estimation from a single image,” Pattern Recognit. 44, 1852-1858 (2011).
10. S. H. Lai, C. W. Fu and S. Chang, “A generalized depth estimation algorithm with a single image,” IEEE Trans. Pattern Anal. Mach. Intell. 14, 405-411 (1992).
11. C. Chen and Y. Chen, “Recovering depth from a single image using spectral energy of the defocused step edge gradient,” in Proc. 18th IEEE International Conference on Image Processing-ICIP (Brussels, Belgium, 2011), pp. 1981-1984.
12. C. Zhou, S. Lin and S. K. Nayar, “Coded aperture pairs for depth from defocus,” in Proc. IEEE 12th International Conference on Computer Vision-ICCV (Kyoto, Japan, 2009), pp. 325-332.
13. W. H. Richardson, “Bayesian-based iterative method of image restoration,” J. Opt. Soc. Am. 62, 55-59 (1972).
14. L. B. Lucy, “An iterative technique for the rectification of observed distributions,” Astron. J. 79, 745-754 (1974).
15. J. Kumar, F. Chen and D. Doermann, “Sharpness estimation for document and scene images,” in Proc. 21st International Conference on Pattern Recognition-ICPR (Tsukuba, Japan, 2012), pp. 3292-3295.
16. R. Nock and F. Nielsen, “Statistical region merging,” IEEE Trans. Pattern Anal. Mach. Intell. 26, 1452-1458 (2004).
