Current Optics and Photonics
Curr. Opt. Photon. 2025; 9(1): 55-64
Published online February 25, 2025 https://doi.org/10.3807/COPP.2025.9.1.55
Copyright © Optical Society of Korea.
Dong-Ha Shin, Chee-Hyeok Song, Seung-Yeol Lee
School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Korea
Corresponding author: seungyeol@knu.ac.kr, ORCID 0000-0002-8987-9749
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
As demand for high-resolution holographic displays in augmented and virtual reality (AR/VR) increases, the limitations of traditional computer-generated holography (CGH) upscaling methods, including bicubic interpolation and deep learning-based techniques, become apparent. These methods predominantly estimate additional pixels without considering the reduction of pixel pitch, inherently constraining their capacity to effectively expand the viewing angle. Our study introduces a novel approach for viewing angle expansion through light field (LF) extrapolation by applying an object detection algorithm. This process starts by analyzing the object position and depth information of each LF view extracted from CGH patterns with the object detection algorithm. The use of these data allows us to extrapolate LF views beyond their initial viewing angle limit. Subsequently, these expanded LF views are resynthesized into the CGH format to expand the viewing angle. With our approach, the viewing angle was successfully doubled from an initial 3.54 degrees to 7.09 degrees by upscaling a 2K 7.2 μm CGH to a 4K 3.6 μm CGH, which was verified with both numerical simulation and optical experiments.
Keywords: Computer-generated holography, Digital holography upscaling, Light field, Object detection algorithm, Viewing angle
OCIS codes: (090.1705) Color holography; (090.1760) Computer holography; (090.1995) Digital holography
As augmented and virtual reality (AR/VR) technologies advance, the demand for more immersive and realistic visual displays increases [1–3]. In this context, digital holography, particularly computer-generated holography (CGH), stands out as a promising technology due to its ability to reproduce high-quality three-dimensional images [4–6]. CGH offers a sufficiently large depth of field and parallax that greatly enhance the user experience in AR/VR applications and make interaction with virtual elements much more natural [7, 8].
A fundamental challenge in CGH display systems is achieving both a wide viewing angle and an adequate display size simultaneously. The viewing angle of a CGH increases approximately in inverse proportion to the pixel pitch [9], while the physical size is determined by the product of the pixel pitch and the number of pixels. This inherent trade-off between viewing angle and display size is known as the space-bandwidth product (SBP) problem [10, 11].
To illustrate this challenge, consider a CGH with 2,000 × 2,000 pixels and a pixel pitch of 0.1 mm. While this configuration achieves a reasonable display size of 20 cm × 20 cm, it yields a severely limited viewing angle of only 0.36 degrees. Conversely, reducing the pixel pitch to 1 μm would expand the viewing angle to 36 degrees, but at the cost of shrinking the display size to a mere 2 mm × 2 mm. The trade-off becomes even more pronounced for practical applications: Achieving a 45-degree viewing angle at a 633 nm wavelength requires a pixel pitch below 0.8 μm, while maintaining a display size of 5 cm × 5 cm at that pitch would demand more than four billion pixels.
Existing upscaling techniques have primarily focused on increasing the physical size of CGH patterns while maintaining the original pixel pitch [12–14]. These approaches, including deep learning-based super-resolution and traditional bicubic interpolation, aim to generate additional pixels to enlarge the hologram. However, by maintaining the original pixel pitch, these methods inherently preserve the limited viewing angle of the source hologram, constraining the overall viewing experience despite the increased size.
The fundamental limitation of current upscaling approaches lies in their treatment of pixel pitch. While they successfully increase resolution and physical size, they do not address the critical relationship between pixel pitch and viewing angle. Our research introduces a novel paradigm in CGH upscaling by specifically targeting pixel pitch reduction while maintaining physical dimensions. This approach directly addresses the viewing angle limitations that persist in conventional upscaling methods.
In this work, a method for viewing angle expansion using light field (LF) extrapolation is proposed, with the application of object detection algorithms. By analyzing object position and depth information across multiple LF views extracted from CGH patterns, additional views beyond the initial viewing angle limits are effectively extrapolated. These expanded LF views are then resynthesized into CGH format with reduced pixel pitch, enabling true viewing angle expansion. With our approach, we successfully demonstrated a doubling of the viewing angle from 3.54° to 7.09° by reducing the pixel pitch from 7.2 μm to 3.6 μm while maintaining the physical dimensions during the upscaling process from 2K to 4K resolution. These results were verified by both numerical simulations and optical experiments, confirming the practical viability of our method for expanding viewing angles in holographic displays.
Before explaining the key principles of the proposed work, it is necessary to briefly review the relationship between the pixel pitch of a CGH and the viewing angle, as formulated by the diffraction angle equation, Eq. (1). The angular spread of light diffracted from a pixelated structure such as a spatial light modulator (SLM) is given as

θdiff = 2 sin−1(λ / 2p), (1)

where θdiff is the first-order diffraction angle of light diffracted from the SLM, λ is the wavelength of the incident light, and p is the pixel pitch of the SLM. In Fig. 1, the viewing angle characteristics for red (640 nm), green (520 nm), and blue (445 nm) light are plotted as a function of pixel pitch for general reference.
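For reference, this relationship can be evaluated numerically. The short Python helper below is our own illustration (not part of the original work); it assumes the full viewing angle is 2 sin−1(λ / 2p), which is consistent with the values reported in this paper.

```python
import numpy as np

def full_viewing_angle_deg(wavelength_m: float, pixel_pitch_m: float) -> float:
    """Full viewing angle of a pixelated SLM/CGH, taken as 2*asin(lambda / (2*p))."""
    return np.degrees(2.0 * np.arcsin(wavelength_m / (2.0 * pixel_pitch_m)))

print(full_viewing_angle_deg(445e-9, 7.2e-6))   # ~3.54 deg: blue light, 2K CGH, 7.2 um pitch
print(full_viewing_angle_deg(445e-9, 3.6e-6))   # ~7.09 deg: blue light, 4K CGH, 3.6 um pitch
print(full_viewing_angle_deg(633e-9, 0.1e-3))   # ~0.36 deg: the 0.1 mm pitch example above
```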
In Fig. 2(a), the schematic of light field view composition after applying a conventional upscaling method, such as bicubic interpolation, is shown. The conventional CGH upscaling method primarily focuses on increasing the number of pixels without considering the reduction of pixel pitch or expansion of viewing angle. As a result, while the resolution is increased, the viewing angle remains limited because the pixel pitch reduction necessary for broader viewing angles is not achieved.
The upscaled image maintains the original viewing angles and merely increases the resolution by adding more pixels within the same angular spread.
In contrast, Fig. 2(b) demonstrates the proposed upscaling method. Our approach involves transforming CGH patterns into light fields, which consist of a set of 2D images viewed from various angles. By applying object detection algorithms, we manipulate and extrapolate these images effectively. The extrapolated light fields are then resynthesized into the CGH format, achieving both upscaling of resolution and reduction of pixel pitch. This transformation allows for a significant expansion of the viewing angle, as shown by the increased angular spread in the figure. Consequently, this dual enhancement of resolution and viewing angle, along with the ability to extrapolate images and re-synthesize them into CGH, is crucial for producing more immersive holographic displays.
The overall calculation process of the proposed method is depicted in Fig. 3, beginning with the light field (e.g., a 16 × 16 LF grid) extracted from CGH data with 2,000 × 2,000 resolution and 7.2 μm pixel pitch. Upon retrieval, these light fields are subjected to an expansion process with an object detection algorithm, which facilitates the generation of extrapolated light field views beyond the initial field of view (FoV). The augmented light field views (e.g., a 32 × 32 LF grid) enable a broader viewing angle, surpassing the limitations inherent in the original light field data. Subsequently, these extended views are resynthesized into hologram format, resulting in an upscaled CGH (e.g., 4K, 3.6 μm) with an improved angular spectrum.
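This pipeline can be summarized in the following outline. The function names are placeholders introduced here for illustration (they are not the authors' implementation), and the 8 × 8 to 16 × 16 grid sizes match the configuration used in the simulations later in this paper; each stage is detailed, with further sketches, in the subsections that follow.

```python
def upscale_cgh(cgh_2k, pitch=7.2e-6):
    """Illustrative outline of the proposed viewing-angle-expanding upscaling pipeline."""
    # 1. Extract a grid of orthographic LF views by Fourier band-pass filtering (Section 2.2).
    views = extract_orthographic_views(cgh_2k, grid=(8, 8))

    # 2. Detect object positions in every view with the trained YOLOv5l model.
    detections = detect_objects_in_views(views)

    # 3. Extrapolate object trajectories to generate views outside the original
    #    angular range, e.g., growing an 8 x 8 grid to 16 x 16.
    expanded_views = extrapolate_light_field(views, detections, new_grid=(16, 16))

    # 4. Resynthesize the expanded LF into an upscaled CGH with half the pixel
    #    pitch, which doubles the viewing angle.
    return synthesize_cgh(expanded_views, pitch=pitch / 2)
```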
The first step of the proposed method is to extract light fields from the original low-resolution CGH. The light fields are composed of various views captured with either perspective or orthographic projection geometries. We chose orthographic projection for its compatibility with subsequent processing steps and its ability to simplify the computations required for hogel-based CGH techniques. This geometry ensures that the extracted views are suitable for the image processing techniques employed in our approach.
Figure 4 delineates the process of extracting orthographic amplitude view images from a given CGH pattern. The orthographic amplitude view Ufx0,fy0 (x, y) corresponding to a distinct angle (θx0, θy0) is derived by bandpass filtering the hologram H(x, y) in the Fourier domain, which is described as [15]

Ufx0,fy0 (x, y) = F−1[Mfx0,fy0 (fx, fy) ∙ F[H(x, y)]], (2)

where F[•] and F−1[•] denote the Fourier transform and the inverse Fourier transform, respectively. The binary mask Mfx0,fy0 (fx, fy) is designated as “1” within the bandwidth Bp and “0” otherwise, centered at (fx0, fy0) = (sin θx0 / λ, sin θy0 / λ), where λ represents the wavelength.
Since the spatial frequency components near (fx, fy) correspond to light waves near angle (θx, θy) = [sin−1(λfx), sin−1(λfy)], the chosen bandwidth Bp around (fx0, fy0) dictates the angular scope of (λBp, λBp) surrounding the central angle (θx0, θy0) = [sin−1(λfx0), sin−1(λfy0)]. Therefore, broader Bp implies a wider angular coverage for each orthographic view extraction, albeit at the cost of reduced angular specificity.
The pixel pitch of the resultant orthographic amplitude view, (∆xapparent, ∆yapparent), is the same as that of the initial CGH, (∆xH, ∆yH). Nonetheless, the quality and resolution of the orthographic view image depend on Bp.
Here, the apparent pixel pitch refers to the pixel pitch of the extracted view, which remains unchanged from the original CGH. On the other hand, the Nyquist sampling pitch is determined by the bandwidth Bp and is inversely proportional to it: As Bp increases, the Nyquist sampling pitch decreases, enhancing the resolution of the orthographic views.
During the extraction process with Eq. (2), Bp remains less than the hologram’s bandwidth BH, determined by the hologram’s pixel pitch (∆xH, ∆yH), i.e., BH = 1/∆xH = 1/∆yH. It is important to note that the 3D scene’s inherent bandwidth, excluding any random phase carrier wave components in the hologram, might be less than the hologram’s bandwidth BH.
In this work, we set Bp = BH/8. This choice was guided by practical considerations in our simulation and experimental environment. Starting from an original hologram of 2,048 × 2,048 pixels, using one-eighth of the bandwidth reduces each extracted orthographic LF view to a 256 × 256-pixel resolution. Preliminary tests indicated that this resolution is sufficient for reliably detecting and identifying objects (the letters “KNU” and “IPOD” in our case) while maintaining manageable computational complexity. In other words, 256 × 256 was found to be a minimum resolution at which object shapes and positions remain discernible and can be effectively processed by the object detection algorithm.
While we employed Bp = BH/8 as a practical rule of thumb, this ratio is not an absolute standard. Researchers working with different hologram resolutions, object complexities, or computational resources may choose a different ratio. For example, a larger fraction (e.g., Bp = BH/4) could provide finer image details and potentially more accurate object detection, but at the expense of greater computational effort. Conversely, a smaller fraction might still yield recognizable patterns for simpler scenes while reducing computation. Thus, the selection of Bp should be tailored to the specific requirements and constraints of each application.
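A minimal NumPy sketch of this extraction step, under our own assumptions about array layout, is given below. Rather than masking and inverse-transforming at the full hologram resolution as in Eq. (2), it crops each Bp × Bp sub-band and inverse-transforms only that block, which directly yields the 256 × 256 views (the Nyquist-limited resolution discussed above) used as detector input; the information content is the same.

```python
import numpy as np

def extract_orthographic_views(hologram, grid=8):
    """Split the hologram spectrum into grid x grid sub-bands (Bp = BH / grid)
    and return the corresponding orthographic amplitude views.
    For a 2,048 x 2,048 hologram and grid = 8, each view is 256 x 256."""
    n = hologram.shape[0]
    m = n // grid                                   # view resolution (Nyquist pitch 1/Bp)
    spectrum = np.fft.fftshift(np.fft.fft2(hologram))
    views = np.empty((grid, grid, m, m))
    for i in range(grid):                           # horizontal sub-band index (fx0)
        for j in range(grid):                       # vertical sub-band index (fy0)
            sub_band = spectrum[j * m:(j + 1) * m, i * m:(i + 1) * m]
            views[i, j] = np.abs(np.fft.ifft2(np.fft.ifftshift(sub_band)))
    return views

# Example (hologram is the complex field of the original 2K, 7.2 um CGH):
# views = extract_orthographic_views(hologram, grid=8)   # shape (8, 8, 256, 256)
```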
To detect objects in LF images containing numerous projection views, our research uses the YOLOv5 model for automatic and efficient object detection. Unlike classification algorithms that merely categorize images, YOLOv5 object detection models furnish bounding boxes that delineate an object’s location, size, and class, offering a comprehensive understanding of the image content [16, 17].
The architecture of YOLOv5 is tripartite: It consists of a backbone that extracts features at various scales through convolution and pooling layers, a neck that enhances performance by fusing these features through a path aggregation network (PANet), and a head that employs convolutional layers to project the features into the output space. The outputs include the center coordinates, width, and height of each detected object, along with the associated confidence scores. In the context of LF imaging in our research, the inputs are composed of a collection of 16 × 16 orthographic projection views, each with a resolution of 256 × 256, resulting in an aggregate resolution of 4,096 × 4,096. Given the substantial image size, we employed the YOLOv5 large model (YOLOv5l) for effective processing of the input data.
Prior to object detection with YOLOv5l, we prepared virtual objects such as “KNU” and “IPOD” from a single view of the generated LF images, as illustrated in Fig. 5. For the training process, the prepared objects were resized to between 0.8 and 1.2 times their original dimensions and superimposed with random noise of up to 30% of the maximum signal contrast. These objects were then randomly positioned on a 256 × 256 resolution background to create a dataset of 1,000 training images. YOLOv5l was trained with this dataset to efficiently detect each object across all LF views and thereby acquire the positional information of the objects in the different views.
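The snippet below sketches how such a synthetic training set can be generated, following the augmentation described above (random 0.8–1.2× scaling, random placement on a 256 × 256 background, and additive noise up to 30% of the signal contrast). The helper names, the nearest-neighbor rescaling, and the YOLO-format label layout are our illustrative choices rather than the authors' exact code.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_sample(templates, canvas=256, noise_level=0.3):
    """Compose one training image from object templates (2D float arrays, one per
    class, smaller than the canvas) and return the image together with YOLO-style
    labels (class, x_center, y_center, width, height, all normalized to [0, 1])."""
    img = np.zeros((canvas, canvas), dtype=np.float32)
    labels = []
    for cls, tmpl in enumerate(templates):
        scale = rng.uniform(0.8, 1.2)                       # random 0.8-1.2x resize
        h = max(1, int(tmpl.shape[0] * scale))
        w = max(1, int(tmpl.shape[1] * scale))
        rows = np.linspace(0, tmpl.shape[0] - 1, h).astype(int)
        cols = np.linspace(0, tmpl.shape[1] - 1, w).astype(int)
        patch = tmpl[rows][:, cols]                         # nearest-neighbor rescale
        y = rng.integers(0, canvas - h)                     # random placement
        x = rng.integers(0, canvas - w)
        img[y:y + h, x:x + w] = np.maximum(img[y:y + h, x:x + w], patch)
        labels.append((cls, (x + w / 2) / canvas, (y + h / 2) / canvas,
                       w / canvas, h / canvas))
    img += rng.uniform(0, noise_level, img.shape).astype(np.float32) * img.max()
    return img, labels

# 1,000 such images (with their label files) form the dataset used to fine-tune
# YOLOv5l, e.g. via the ultralytics/yolov5 training script (file names illustrative):
#   python train.py --img 256 --data lf_letters.yaml --weights yolov5l.pt
```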
Although we used a custom dataset generator for the specific recognition of letter objects in our pipeline, adopting widely used large-scale datasets such as ImageNet or Common Objects in Context (COCO, a large-scale object detection, segmentation, and captioning dataset), applying zero-shot segmentation methods such as “Segment Anything,” or employing zero-shot object detection algorithms could extend our methodology to a broader range of general objects. This would increase the versatility and applicability of our technique, making it possible to detect and analyze various objects in LF images without predefined or extensive training datasets specific to our experimental setup.
While our current framework effectively handles relatively simple synthetic objects and moderate levels of overlap, the approach faces inherent limitations in more complex scenarios. In the case of substantial occlusions where one object significantly obscures another, the ability of object detection algorithms such as YOLOv5 to accurately identify all objects may deteriorate. Holographic reconstructions inherently represent the coherent superposition of fields, and in some cases, partial overlap may still yield distinguishable intensity patterns that enable detection of individual objects. However, severe occlusions remove critical visual cues, restricting recognition performance and hindering accurate positional and depth estimation for the obscured objects.
To mitigate this challenge, the application of multiple LF views is necessary. Even if one view fails to detect a highly occluded object, other views where the object is partially visible can still provide pertinent positional and depth information. By fusing and interpolating data across multiple successfully detected views, it is possible to infer the object’s location and shape in the obscured regions and improve robustness against overlaps. Moreover, integrating advanced segmentation techniques such as zero-shot segmentation (e.g., “Segment Anything”) or employing richer training datasets could further enhance the system’s capability to handle intricate and diverse object configurations. Nonetheless, fully resolving heavily overlapped scenes or objects with very complex 3D structures remains an open challenge, as the extrapolation process inherently relies on observed intensity patterns. Without direct measurements of the hidden surfaces, the reconstruction of their geometry and textures becomes speculative. Future research may explore incorporating 3D shape priors, richer datasets, or other advanced computational holography techniques to extend the applicability of our method to more complex, real-world holographic scenes.
The orthographic projection geometry employed in our method ensures that object displacements across neighboring LF views follow approximately linear trajectories over small angular intervals. This linearity enables us to extrapolate object positions into new LF views that lie beyond the original boundary of the captured viewing angles.
First, we apply an object detection algorithm (YOLOv5) to each of the original LF views (e.g., an 8 × 8 LF grid, 64 views). Each view provides a set of object coordinates (x(i, j), y(i, j)) for the detected objects, where (i, j) indexes the view within the LF grid.
To extend beyond the initial 8 × 8 grid, we estimate how objects shift when moving from one view to an adjacent view. Specifically, we compute the mean horizontal and vertical displacements

Δxavg = (1/M) Σ [x(i + 1, j) − x(i, j)], Δyavg = (1/M) Σ [y(i, j + 1) − y(i, j)],

where (i, j) indexes the LF grid views, the sums run over all pairs of neighboring views in which the same object is detected, and M is the count of all valid object pairs considered. By using averages rather than single measurements, we mitigate noise introduced by potential detection errors at lower resolutions.
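A sketch of this step is shown below, under our assumption that the fine-tuned YOLOv5l weights are loaded through the standard torch.hub interface; the weights filename and the rule of matching objects across views by class are illustrative. It runs the detector on every view of the original grid and averages the per-object shifts between horizontally and vertically adjacent views.

```python
import numpy as np
import torch

# Fine-tuned detector from the previous subsection (weights path is illustrative).
model = torch.hub.load('ultralytics/yolov5', 'custom', path='lf_letters_yolov5l.pt')

def detect_centers(view):
    """Return {class_id: (x_center, y_center)} for one 256 x 256 LF view.
    (A float-valued view may need to be scaled to an 8-bit image before inference.)"""
    det = model(view).xywh[0].cpu().numpy()     # columns: xc, yc, w, h, conf, class
    return {int(c): (xc, yc) for xc, yc, w, h, conf, c in det}

def mean_displacements(views):
    """Average object shift between neighboring views of an N x N grid."""
    n = views.shape[0]
    centers = [[detect_centers(views[i, j]) for j in range(n)] for i in range(n)]
    dx, dy = [], []
    for i in range(n):
        for j in range(n):
            for cls, (xc, yc) in centers[i][j].items():
                if i + 1 < n and cls in centers[i + 1][j]:      # horizontal neighbor
                    dx.append(centers[i + 1][j][cls][0] - xc)
                if j + 1 < n and cls in centers[i][j + 1]:      # vertical neighbor
                    dy.append(centers[i][j + 1][cls][1] - yc)
    return float(np.mean(dx)), float(np.mean(dy))               # dx_avg, dy_avg
```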
Once Δxavg and Δyavg are obtained, we place objects in new, extrapolated views outside the original LF boundary. For example, to create a new column of views extending to the right, we assign

x(N + 1, j) = x(N, j) + Δxavg, y(N + 1, j) = y(N, j),

where N is the original number of views along one dimension (e.g., N = 8). Similarly, expanding upward or downward involves shifting object positions by Δyavg.
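The placement of objects in a new column can be written as the short helper below (again with illustrative names), which shifts the detections of the outermost original column by Δxavg; the extrapolated views themselves are then rendered by translating the corresponding object patches to these predicted positions.

```python
def extrapolate_column(centers, dx_avg, n_orig=8):
    """Predict object centers for the new column of views to the right of the
    original N x N grid by shifting the last original column by dx_avg."""
    new_column = []
    for j in range(n_orig):
        shifted = {cls: (xc + dx_avg, yc)
                   for cls, (xc, yc) in centers[n_orig - 1][j].items()}
        new_column.append(shifted)
    return new_column
```

Columns further to the right (and rows above or below) are obtained by repeating this shift, which is how the grid grows from 8 × 8 toward 16 × 16 in Fig. 6.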
The extrapolated results are shown in Figs. 6(b) and 6(c). In Fig. 6(b), the LF has been partially expanded to a 10 × 10 grid, showing the object coordinates at an intermediate extrapolation step. By applying the same procedure iteratively, we eventually obtain the object coordinates of a fully extrapolated LF with a 16 × 16 grid, as illustrated in Fig. 6(c). In Figs. 6(d) and 6(e), intermediate and final scenes of the extrapolated LF are shown, respectively. For comparison, ground truth LF data, i.e., LF data extracted from a higher-resolution CGH with half the pixel pitch, are also shown in Fig. 6(f). The final LF array corresponds to a doubled overall viewing angle, allowing a much broader perspective on the reconstructed holographic scene.
The process of synthesizing a hologram from the light field essentially involves the inverse operation of the method described in Section 2.2, where light fields were extracted from a hologram. In this synthesis phase, the expanded light fields, now containing enhanced spatial information from the expanded viewing angles, are converted back into the CGH format.
To synthesize a hologram from the light field views, we first create a new CGH pattern using the inverse Fourier transform techniques outlined earlier. Each light field view, represented by the orthographic amplitude views obtained in Section 2.2, is back-projected into the hologram space. This back-projection is performed by reversing the Fourier transform process described in Eq. (6).
This equation synthesizes the hologram by aggregating all orthographic views and applying an inverse Fourier transform to project them back into the spatial domain. By applying this method, we effectively reconstruct the hologram with improved resolution and an expanded viewing angle, as illustrated by the initial 2K resolution CGH (7.2 μm pixel pitch) being upscaled to a 4K resolution CGH (3.6 μm pixel pitch).
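Because this synthesis is the formal inverse of the extraction step, it can be sketched as tiling the spectra of the expanded views back into a larger Fourier plane, as below. This is our own simplified rendering of the procedure; in particular, the random phase attached to each amplitude view before transforming is our assumption, mirroring the random-phase carrier used when the original hologram was generated.

```python
import numpy as np

def synthesize_cgh(views, seed=0):
    """Inverse of the extraction sketch: place the spectrum of each (e.g., 256 x 256)
    extrapolated view into its sub-band of a larger Fourier plane (16 x 16 views
    -> 4,096 x 4,096) and inverse-transform to obtain the upscaled hologram,
    whose pixel pitch is half that of the original 2K CGH."""
    grid, _, m, _ = views.shape                     # e.g., (16, 16, 256, 256)
    rng = np.random.default_rng(seed)
    spectrum = np.zeros((grid * m, grid * m), dtype=complex)
    for i in range(grid):
        for j in range(grid):
            # Attach a random phase to the amplitude view (our assumption, see text).
            field = views[i, j] * np.exp(2j * np.pi * rng.random((m, m)))
            spectrum[j * m:(j + 1) * m, i * m:(i + 1) * m] = \
                np.fft.fftshift(np.fft.fft2(field))
    return np.fft.ifft2(np.fft.ifftshift(spectrum))
```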
The efficacy of our proposed methodology was validated through the synthesis of color holograms from light field data regarded as being captured at various angles. In all our simulations, the wavelengths were set to 660 nm for red, 521 nm for green, and 445 nm for blue. Figure 7 presents an image of the red letters “KNU” located 8 cm from the SLM plane, and images of the white, red, green, and blue letters “IPOD” positioned 12 cm from the SLM plane. Each plane is propagated onto the SLM plane using the angular spectrum method with random phase imposition, which forms a multi-depth full-color hologram with a depth difference of 4 cm between the two planes. Here, the SLM has a resolution of 2,048 × 2,048 pixels with a pixel pitch of 7.2 μm. This configuration corresponds to Case 1 in Fig. 8, where the first row shows images of “KNU” reconstructed at 8 cm, observed at varying viewing angles. According to Fig. 1(a), the full viewing angle is confirmed to be 3.54 degrees. In Case 1, the light field data were extracted from orthographic views at z = 10 cm in an 8 × 8 grid, with each light field view having a resolution of 256 × 256. With the pipeline of the proposed method, these light fields were expanded to a 16 × 16 grid of orthographic views with a viewing angle of 7.09 degrees, which was then synthesized into the hologram shown as Case 2. Case 3 in Fig. 8 reflects the same setup as Case 1 but uses an SLM plane enhanced to a resolution of 4,096 × 4,096 pixels and a pixel pitch of 3.6 μm, approximately doubling the viewing angle compared to Case 1; it therefore serves as a ground truth CGH with a wider viewing angle. The simulation results indicate that Cases 2 and 3 in Fig. 8 share the same viewing angle, demonstrating the successful expansion and synthesis capabilities of our methodology.
To complement our numerical simulations and validate our holographic synthesis methodology in real-world conditions, we conducted optical reconstruction experiments as shown in Fig. 9. These experiments aimed to directly observe the behavior of the synthesized holograms when illuminated and projected using physical optics setups. The experimental setup included a high-precision reflective SLM with a resolution of 3,840 × 2,160 pixels and a pixel pitch of 3.6 μm.
In our experiments, we employed the near-eye display system schematically illustrated in Fig. 10. We adapted CGH patterns originally synthesized at a higher resolution of 4,096 × 4,096 pixels by cropping them to fit the 3,840 × 2,160 resolution of our SLM. This modification ensured that the CGH patterns were precisely matched to the capabilities of the SLM while corresponding to the higher-resolution simulation scenario (Case 3 in Fig. 8). A coherent laser source was employed, emitting wavelengths that corresponded to those specified in our simulations: 660 nm for red, 521 nm for green, and 445 nm for blue.
The optical reconstructions successfully demonstrated clear and accurate color rendering, and the depth cues precisely matched the intended distances. The observed images showed high fidelity to the simulated outcomes, with no noticeable aberrations or distortions, confirming the effectiveness of our light field expansion and CGH synthesis approach.
Significantly, the expanded viewing angles observed during these experiments closely matched those predicted by our simulations. Observations made from varying angles showed consistent image quality and depth accuracy, underlining the robustness of our holographic display technology in real-world applications.
In conclusion, a viewing-angle-expandable CGH upscaling method based on LF extrapolation is proposed. Using the YOLOv5 object detection algorithm, the object position and depth information of each LF view is analyzed automatically. The LF views are then extrapolated beyond their initial viewing angle limit and resynthesized into the CGH format, producing angular views beyond the original viewing angle condition. We verified the proposed scheme in both numerical simulation and optical reconstruction, and the viewing angle was successfully doubled from an initial 3.54° to 7.09° by upscaling a 2K resolution, 7.2 μm pixel pitch CGH to a 4K resolution, 3.6 μm pixel pitch CGH.
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean Government (MSIT) (Grant no. 2019-0-00001, Development of Holo-TV Core Technologies for Hologram Media Services) and the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the MSIT (Grant Number RS-2024-00411892).
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.