Current Optics and Photonics
Curr. Opt. Photon. 2024; 8(2): 127-137
Published online April 25, 2024 https://doi.org/10.3807/COPP.2024.8.2.127
Copyright © Optical Society of Korea.
Zhenlu Liu^{1,2}, Xiaolei Yu^{1,2} , Lin Li^{1}, Weichun Zhang^{1} , Xiao Zhuang^{1}, Zhimin Zhao^{1}
Corresponding author: ^{*}nuaaxiaoleiyu@126.com, ORCID 0009-0006-7837-4813
^{**}nuaa_wchzhang@126.com, ORCID 0009-0001-8418-8084
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The distribution of tags is an important factor that affects the performance of radio-frequency identification (RFID). To study RFID performance, it is necessary to obtain the coordinates of the RFID tags. However, RFID-based positioning methods have large errors and are easily affected by the environment. Therefore, a new method using optical measurement is proposed for RFID performance analysis. First, because images may be blurred during acquisition, the paper derives a new image prior for removing blur, and a nonlocal means-based method for image deconvolution is proposed. Experimental results show that the PSNR and SSIM indicators of our algorithm are better than those of a learning deep convolutional neural network and fast total variation. Second, an RFID dynamic testing system based on photoelectric sensing technology is designed. The reading distance of RFID and the three-dimensional coordinates of the tags are obtained. Finally, deep learning is used to model the relationship between the RFID reading distance and the tag distribution. The mean absolute percentage error is 3.02%, which is lower than that of other algorithms such as a particle-swarm optimization back-propagation neural network, an extreme learning machine, and a deep neural network. The paper proposes the use of optical methods to measure and collect RFID data, and to analyze and predict RFID performance. This provides a new method for testing RFID performance.
Keywords: Deep learning, Denoise, Non-local means, Optical measurement, RFID tag
OCIS codes: (000.2190) Experimental physics; (100.0100) Image processing; (100.3008) Image recognition, algorithms and filters; (350.4600) Optical engineering
As a noncontact communication technology for automatic object recognition, radio-frequency identification (RFID) helps to reduce complex labor and to improve labor efficiency in warehouses, logistics, and access-control systems. RFID tags need to be effectively identified to obtain relevant information on time, but studies show that RFID’s performance in applications is lower than expected [1, 2].
Pandey et al. [3] focused on using RFID to solve hostel security issues. They performed facial recognition and compared the results to facial data stored in RFID tags to confirm the entry and exit of personnel in hostels. Bertoncini et al. [4] utilized a dynamic wavelet fingerprint for feature extraction and classification, to prevent unauthorized access and ensure the security of radio-frequency identification. Wong et al. [5] combined a background-subtraction method for moving-object detection with RFID technology to detect traffic violations. However, there has been little research on the performance of RFID, especially its reading distance.
Many factors affect RFID performance, including multi-tag channel collisions, packaging methods of goods, tag distribution, and reader power [1, 6, 7]. Traditional methods have mostly improved RFID performance by resolving multi-tag collisions through anticollision algorithms. Few studies have analyzed the impact on RFID reading performance from the perspective of multi-tag distribution [8]. This paper establishes a multi-tag optimal distribution model, with the reading distance as the evaluation indicator.
In the past, RFID positioning technology based on the LANDMARC algorithm was mostly used for tag positioning [9, 10]. This method requires complex system deployment, has large positioning errors, and is easily affected by the environment, so the multi-tag coordinates it yields are difficult to use for modeling. Therefore, we propose a novel method based on image processing and optical measurement to measure the three-dimensional coordinates and reading distances of tags. Template matching is used to calculate the tag coordinates from the collected RFID tag images. A laser-ranging sensor is used to measure the maximum reading distance at which all tags are read by the RFID system. However, during the imaging process the RFID tag images become blurry, due to the relative displacement of the optical system or camera system.
At the same time, the captured images can also be affected by noise. The captured images are represented as:

$$y = Hx + n = h \otimes x + n \tag{1}$$
where y denotes the collected image, H is a linear blur operator, h is the point-spread function (PSF), ⊗ denotes the convolution operator, x is the clear image, and n is zero-mean white Gaussian noise, n ~ N(0, σ^{2}).
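As an illustration of this degradation model, the following minimal sketch simulates a blurred, noisy observation. The horizontal motion PSF, the periodic boundary handling, the synthetic test image, and all function names are assumptions made for the example; they are not taken from the paper.

```python
import numpy as np

def motion_psf(length: int) -> np.ndarray:
    """Horizontal linear-motion PSF h: a 1 x length row that sums to 1 (assumed shape)."""
    return np.full((1, length), 1.0 / length)

def blur(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Circular convolution of x with h via the FFT (periodic boundaries assumed)."""
    Hf = np.fft.fft2(h, s=x.shape)            # zero-pad the PSF to the image size
    return np.real(np.fft.ifft2(Hf * np.fft.fft2(x)))

def degrade(x, h, sigma, rng):
    """Observation y = h (*) x + n with n ~ N(0, sigma^2)."""
    return blur(x, h) + rng.normal(0.0, sigma, size=x.shape)

rng = np.random.default_rng(0)
x = np.zeros((32, 32))
x[12:20, 8:24] = 1.0                           # synthetic stand-in for a tag image
y = degrade(x, motion_psf(5), sigma=0.01, rng=rng)
```

A motion blur at an arbitrary angle would need a rotated PSF; only the horizontal case is shown here to keep the sketch short.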
The purpose of image deblurring is to restore the original image from the degraded observation. Image deblurring is an important preprocessing task in system design. A commonly used method for restoring sharp images is maximum a posteriori (MAP) estimation:

$$\hat{x} = \arg\max_x p(x \mid y) = \arg\max_x p(y \mid x)\, p_x(x) \tag{2}$$
Because image deblurring is an ill-posed problem, image restoration can yield many vastly different solutions, so a regularization term is required. Here Φ(x) = −log p_{x}(x) is the negative logarithm of the prior distribution p_{x}(x), which represents prior knowledge of the acquired image and regularizes the image-deblurring problem. With Gaussian noise of variance σ^{2}, the optimization problem becomes:

$$\hat{x} = \arg\min_x \left\{ \frac{1}{2\sigma^{2}} \lVert y - Hx \rVert_2^{2} + \Phi(x) \right\} \tag{3}$$
Image smoothness is an important assumption for image denoising: adjacent pixel values in an image vary continuously, which helps to remove noise. However, images also contain a large amount of non-smooth information, which is of great help for image deblurring, and smoothing makes it difficult to preserve the details and textural features of the image. At present, regularization methods based on local smoothing are widely applied. Among them the total variation (TV) model based on local gradients is particularly popular, because it can effectively restore the edges of piecewise-smooth images [11–15]. However, TV regularization suffers from blocky (staircase) effects, which may smooth out non-smooth information.
Buades et al. [16] proposed a non-local means (NLM) filter for denoising, which uses feature blocks to capture the local structure of images, and later work has confirmed its effectiveness in image processing [17, 18]. Image deconvolution is more complex than denoising, because NLM needs a reference image from which to calculate the weights. In [19], the blurred image is used as the reference. However, weights estimated from a blurred image are ineffective, because a blurred image contains far fewer detailed features than a clear one. Tikhonov regularization has also been used to deconvolve images. However, its weight parameters are largely insensitive to the input noise, so the deconvolution is affected by noise and the reconstructed image may exhibit noise. Shi et al. [20] proposed a fidelity term based on the L1 norm to stabilize ill-posed problems, and introduced a soft-threshold operator to enhance robustness to noise. The method of Chen et al. [21], which uses singular values to construct regularization matrices, reduces the variance of parameter estimation while keeping the bias constant. Using a different approach, Lefkimmiatis [22] used neural networks to improve the nonlocal regularization strategy, which effectively improves denoising performance.
In this paper, an effective method for motion-blur removal is used for image restoration. RFID tag coordinates and reading distance are obtained based on optical measurement and image matching. The relationship model is established between tag distribution and reading distance. The main contributions of this paper are as follows:
1. A new tag-distribution optimization model is proposed. Based on image processing and optical measurement, the distribution of tags is measured, which avoids positioning errors caused by external interference.
2. Non-local means based deconvolution (NLM-D) is used to remove motion blur and noise from RFID tag images, while preserving the non-smooth information of each image.
3. A deep neural network based on dense-connection improvement is used to model the nonlinear relationship between RFID tag coordinates and reading distance.
Section 2 briefly reviews NLM filters and the construction of NLM-based image priors. The study deblurs the images with our method and compares it to other algorithms. Section 3 designs an RFID testing system and obtains tag coordinates and reading distance. In Section 4, we propose the method and process of an improved network, and model the relationship between tag coordinates and reading distance through the network. The paper concludes with a summary in Section 5.
In the NLM method, image feature blocks are used to characterize the local structural information of the image. Image feature blocks refer to a collection of pixels contained in a fixed shape, and these feature blocks have similar local structural information. NLM filtering can effectively reduce noise while preserving detailed information, such as image edges and textures. The filtering process of NLM can be expressed by the following weighted average:
where Ω is the search window and ω(i;i′) is the weight that characterizes the degree of similarity between the two feature blocks centered at i and i′. The weight ω(i;i′) can be written as:
where S is the image feature block, u is a position within the feature block, h is the smoothing-kernel width parameter, and Z_{ω}(i) is the normalization constant.
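The weighted average and its weights can be sketched as follows. This assumes the standard NLM form of Buades et al. [16], in which the weight is the exponential of the negative squared distance between two feature blocks divided by h^{2}; the function names, the reflective padding, and the default parameter values are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def nlm_weights(y, i, search_radius, patch_radius, h):
    """Normalized weights w(i; i') for every i' in the search window around pixel i."""
    pad = patch_radius
    yp = np.pad(y, pad, mode="reflect")              # so border pixels have full patches
    r0, c0 = i
    ref = yp[r0:r0 + 2 * pad + 1, c0:c0 + 2 * pad + 1]   # feature block centred on i
    weights = {}
    for r in range(max(0, r0 - search_radius), min(y.shape[0], r0 + search_radius + 1)):
        for c in range(max(0, c0 - search_radius), min(y.shape[1], c0 + search_radius + 1)):
            patch = yp[r:r + 2 * pad + 1, c:c + 2 * pad + 1]
            d2 = np.sum((ref - patch) ** 2)          # squared distance between blocks
            weights[(r, c)] = np.exp(-d2 / h ** 2)
    z = sum(weights.values())                        # normalization constant Z(i)
    return {k: v / z for k, v in weights.items()}

def nlm_pixel(y, i, search_radius=5, patch_radius=1, h=0.1):
    """NLM estimate at pixel i: weighted average of the pixels in the search window."""
    w = nlm_weights(y, i, search_radius, patch_radius, h)
    return sum(wk * y[k] for k, wk in w.items())
```

Pixels whose surrounding blocks resemble the block around i receive large weights, so averaging follows image structure rather than spatial proximity alone.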
Markov random field (MRF) theory is an effective tool for analyzing and modeling image spatial-structure information. With the help of the Hammersley-Clifford theorem, also known as the MRF-Gibbs equivalence theorem, the MRF model can be widely applied. The theorem proves that the joint probability distribution of an MRF can be described by a Gibbs distribution with a concise mathematical expression. Gibbs random field theory uses the concept of a clique (called a cluster in what follows) to describe the possible interactions between a location and its neighborhood. For image description, clusters represent a basic component of image texture.
A random field X defined on the grid set is a Gibbs random field with respect to its neighborhood system if and only if the joint distribution function of X has the following form:

$$P(X = x) = \frac{1}{Z_{MRF}} \exp\left( -\frac{U(x)}{T} \right) \tag{6}$$
where Z_{MRF} is the normalization constant, also called the partition function; T is the temperature parameter; and U(x) is the energy function, which can be expressed as:

$$U(x) = \sum_{c \in C} V_c(x) \tag{7}$$

where C is the set of all clusters, c is a cluster, which represents a basic component of texture, and V_{c}(x) is the cluster potential function associated with cluster c.
Obviously, the value of V_{c}(x) is related to the local structure of cluster c. A potential function V_{c̃}(x) is nonzero if and only if a particular cluster c̃ exists; in other words, if and only if a set of positions forms the cluster c̃.
From the above description, using the Markov-Gibbs random field model to construct an image prior based on NLM can be achieved through the following two steps:
(1) Define neighborhood systems and clusters based on NLM; and
(2) Define the corresponding cluster potential function.
Let L = {i | i ∈ M} denote the set of grid points of an image, where the two-dimensional coordinate vector i is the position of a grid point and M is the image size. Usually the neighborhood system is defined by spatial position, that is, by the distance from the center pixel. For example, the neighborhood system of i can be defined as:
where r is the search radius.
Similar to the above definition, the neighborhood system based on the NLM is defined here as all pixels in the search window Ω:
For the NLM neighborhood system defined above, its cluster can be expressed as:
where
The structural information of the considered image can be expressed by defining the cluster potential functions on these clusters. For simplicity, only potentials up to second order are considered here (that is, only C_{1} and C_{2}), so Eq. (7) can be expressed as:

$$U(x) = \sum_{c \in C_1} V_c(x) + \sum_{c \in C_2} V_c(x) \tag{11}$$
According to the smoothness assumption, x(i) should be as close to x(i′) as possible, so the cluster potential function can be written as:

$$V_c(x) = G(i; i')\, \rho\big(x(i) - x(i')\big), \quad c = \{i, i'\} \in C_2 \tag{12}$$
where ρ(∙) and G(i;i′) are the potential function and the weight function, respectively.
The potential function ρ(∙) measures the size of the difference between two pixel values. In this paper, the potential function ρ(x) = |x| is selected.
The weight function G(i;i′) expresses the degree of correlation between two pixel positions. In this paper, the weight function is defined as G(i;i′) = ω(i;i′). It should be pointed out that when the two feature blocks are completely different, the value of the weight function G(i;i′) is zero.
Therefore, the energy function (image prior) based on the NLM can be written as:

$$U(x) = \sum_{k} \lVert W_k D_k x \rVert_1 \tag{13}$$
where D_{k} represents the difference operator between two points separated by k (for example, D_{(1,0)}x represents the horizontal gradient of x), and W_{k} is a block diagonal matrix, W_{k} = diag{G(i; i + k)}_{i ∈ M}.
Using the NLM image prior of Eq. (13) as the regularization term, the NLM-regularized image-deblurring model J(x) can be written as:

$$J(x) = \lVert y - Hx \rVert_2^{2} + \alpha\, U(x) \tag{14}$$
where α is the regularization parameter, α = 2σ^{2}/T.
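The value α = 2σ^{2}/T follows from combining the Gaussian noise model with the Gibbs prior. A short derivation, assuming the data term is the squared ℓ2 norm of y − Hx and dropping terms that do not depend on x:

```latex
\begin{aligned}
-\log p(y \mid x) &= \frac{1}{2\sigma^{2}} \lVert y - Hx \rVert_2^{2} + \text{const}, \\
-\log p_x(x) &= \frac{1}{T}\, U(x) + \log Z_{MRF}, \\
\hat{x} &= \arg\min_x \left\{ \frac{1}{2\sigma^{2}} \lVert y - Hx \rVert_2^{2} + \frac{1}{T}\, U(x) \right\}
         = \arg\min_x \left\{ \lVert y - Hx \rVert_2^{2} + \frac{2\sigma^{2}}{T}\, U(x) \right\}.
\end{aligned}
```

Multiplying the objective by the positive constant 2σ^{2} does not change the minimizer, which is why the noise variance and the temperature appear only through their ratio.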
Since the constructed NLM-based energy function U(x) is a differentiable convex function, Eq. (14) has a unique minimum. The minimum can be found by the steepest-descent method, iterating along the negative gradient direction. The basic iteration format is:
where λ is the iteration step size, normally 0 < λ < 2, and ∂U(x)/∂x is the gradient of U(x), which can be obtained by taking the partial derivative of U(x) with respect to x(m):
In the above formula,
where, t = m − u;
For concise expression, Eq. (17) is written in matrix representation:
In this paper, to simplify the calculation we ignore the influence of
Substituting (18) into (14):
It should be noted that Eq. (20) can be considered the steepest-descent process for J(x) under the given weight function {W_{k}}. Since J(x) is a convex function, the steepest-descent process of Eq. (20) converges to a minimum.
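A steepest-descent iteration of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the weights W_{k} are held fixed and supplied by the caller, the absolute value in the prior is smoothed with a small constant `eps` so that a gradient exists everywhere, convolution uses periodic boundaries, the blurred image serves as the initial value, and the demo uses uniform placeholder weights instead of NLM weights. All names and parameter values are hypothetical.

```python
import numpy as np

def make_blur_ops(h, shape):
    """Return H and its adjoint H^T for circular convolution with the PSF h."""
    Hf = np.fft.fft2(h, s=shape)
    H = lambda x: np.real(np.fft.ifft2(Hf * np.fft.fft2(x)))
    Ht = lambda r: np.real(np.fft.ifft2(np.conj(Hf) * np.fft.fft2(r)))
    return H, Ht

def prior_gradient(x, weights, eps=1e-3):
    """Gradient of U(x) = sum_k ||W_k D_k x||_1 with |.| smoothed and W_k held fixed."""
    g = np.zeros_like(x)
    for k, Wk in weights.items():
        d = np.roll(x, (-k[0], -k[1]), axis=(0, 1)) - x   # (D_k x)(i) = x(i + k) - x(i)
        v = Wk * d / np.sqrt(d * d + eps)                 # W_k times a smoothed sign
        g += np.roll(v, k, axis=(0, 1)) - v               # apply the adjoint D_k^T
    return g

def deblur(y, h, weights, alpha=0.002, lam=1.0, n_iter=50):
    """Descend on J(x) = ||y - Hx||^2 + alpha * U(x); the factor 2 is absorbed into lam."""
    H, Ht = make_blur_ops(h, y.shape)
    x = y.copy()                                          # assumed initial value
    for _ in range(n_iter):
        x = x + lam * (Ht(y - H(x)) - 0.5 * alpha * prior_gradient(x, weights))
    return x

x_true = np.zeros((32, 32))
x_true[12:20, 8:24] = 1.0                                 # synthetic stand-in image
h = np.full((1, 5), 0.2)                                  # horizontal motion PSF
H, _ = make_blur_ops(h, x_true.shape)
y = H(x_true)
weights = {(0, 1): np.ones_like(y), (1, 0): np.ones_like(y)}  # placeholder weights
x_hat = deblur(y, h, weights)
```

With a normalized PSF the blur operator has norm at most 1, which is consistent with the step-size range 0 < λ < 2 given above.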
First, use the image obtained in this paper as the initial value
To verify the effectiveness and practicability of the proposed deblurring algorithm, it is applied to images with different degrees of motion blur. Linear-motion-blurred images are deblurred first. Then, to reflect the complexity and diversity of real scenes, clear images are convolved with complex blur kernels to obtain complex-motion-blurred images, which are deconvolved with the proposed method. Finally, to verify the superiority of the algorithm, its results are compared to those of the learning deep convolutional neural network (IRCNN) [23] and fast total variation (fast TV) [24] methods, using peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as quantitative indicators. The relevant experimental results are as follows:
Fig. 1 and Table 1 show that the proposed algorithm effectively removes the motion blur in the tag images. Compared to IRCNN and fast TV, the images processed by this method retain better details and edges and are clearer after deblurring. To quantify the deblurring results, we use the PSNR and SSIM indicators. The PSNR of the proposed method is the highest in all four cases, and its SSIM is the highest in three of the four cases; for Tag 2 with length 5 and angle 5°, the SSIM of IRCNN (0.9677) is slightly higher than ours (0.9646). In addition, the SSIM values of the proposed method are all above 0.96, close to 1, which shows that the structural information of the image is well preserved after processing (Table 1).
TABLE 1 The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) results for different methods, with different motion lengths and angles
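Image | Motion blur (length, angle in °) | Metric | IRCNN | Fast TV | Our Method |
---|---|---|---|---|---|
Tag 1 | Length = 5, Angle = 5 | PSNR (dB) | 36.1674 | 35.9781 | 42.694 |
Tag 1 | Length = 5, Angle = 5 | SSIM | 0.9615 | 0.8588 | 0.9649 |
Tag 1 | Length = 10, Angle = 10 | PSNR (dB) | 31.1276 | 34.7498 | 40.5456 |
Tag 1 | Length = 10, Angle = 10 | SSIM | 0.9272 | 0.8924 | 0.9696 |
Tag 2 | Length = 5, Angle = 5 | PSNR (dB) | 39.5463 | 36.7709 | 42.8876 |
Tag 2 | Length = 5, Angle = 5 | SSIM | 0.9677 | 0.8631 | 0.9646 |
Tag 2 | Length = 10, Angle = 10 | PSNR (dB) | 34.8557 | 35.8469 | 41.7843 |
Tag 2 | Length = 10, Angle = 10 | SSIM | 0.9489 | 0.8961 | 0.9707 |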
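For reference, PSNR can be computed as in the minimal sketch below. It assumes 8-bit images with a peak value of 255; the function name and the `peak` parameter are illustrative. SSIM is more involved and is usually taken from an image-processing library.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and an estimate."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)           # mean squared error over all pixels
    if mse == 0:
        return float("inf")                   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A higher PSNR means the restored image is closer, pixel by pixel, to the clear reference image.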
The designed RFID measurement system is shown in Fig. 2. The system simulates the steps of goods entering and exiting a warehouse. After the RFID reader has read all tags, the control computer instructs the laser-ranging sensor to measure the distance from the sensor to the tag group. The reading distance of RFID is obtained through indirect measurement methods:
where d represents the reading distance of the RFID system, l is the distance measured by the laser-ranging sensor, s represents the distance from the laser-ranging sensor to the antenna bracket, and h is the height difference between the antenna and the tags. Both s and h are fixed values and have been measured.
To obtain the three-dimensional coordinates of the tags, the image of the tags is obtained from two different angles, horizontal and vertical. Then the image-matching method is used to locate the tags:
where g(a, b) is the correlation coefficient, t(x, y) and f(x, y) represent the template image and the tag image respectively, and J and K are the numbers of horizontal and vertical pixels in the template image.
Then the ratio method is used to calculate the tag coordinates:

$$\frac{N}{M} = \frac{n}{m}, \qquad N = \frac{n}{m}\, M$$

where n and m are, respectively, the number of pixels from the tag to the coordinate origin and the tag length in pixels in the image, and N and M are the actual distance of the tag from the origin and the actual tag length.
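The two steps above can be sketched as follows. The matching score used here is the zero-mean normalized cross-correlation, which is an assumption about the exact form of g(a, b); the brute-force search, the function names, and the numbers in the usage note are illustrative only.

```python
import numpy as np

def match_template(f, t):
    """Return the top-left offset (a, b) where template t best matches image f, and the score."""
    rows, cols = t.shape
    t0 = t - t.mean()
    best_score, best_pos = -np.inf, (0, 0)
    for a in range(f.shape[0] - rows + 1):
        for b in range(f.shape[1] - cols + 1):
            window = f[a:a + rows, b:b + cols]
            w0 = window - window.mean()
            denom = np.sqrt((w0 ** 2).sum() * (t0 ** 2).sum())
            score = (w0 * t0).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (a, b)
    return best_pos, best_score

def pixel_to_physical(n, m, M):
    """Ratio method: N / M = n / m, so the physical distance is N = n * M / m."""
    return n * M / m
```

For example, if a tag known to be 8 cm long spans 40 pixels and lies 120 pixels from the origin in the image, `pixel_to_physical(120, 40, 8.0)` gives 24 cm.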
The reading distance of the tags is obtained by laser-ranging sensors. The three-dimensional coordinates and reading distance of the tags are shown in Table 2.
TABLE 2 Three-dimensional coordinates and reading distance of the RFID tags
x_{1} (cm) | y_{1} (cm) | z_{1} (cm) | … | x_{5} (cm) | y_{5} (cm) | z_{5} (cm) | Reading Distance d (m) |
---|---|---|---|---|---|---|---|
209.1 | 33.6 | 309.1 | … | 118.4 | −160.3 | 292.3 | 1.97 |
76 | 210.4 | 294.4 | … | 207 | 25.6 | 307.2 | 1.63 |
199 | −10.9 | 309.3 | … | 47 | −112.5 | 298.3 | 2.18 |
︙ | ︙ | ︙ | … | ︙ | ︙ | ︙ | ︙ |
74.9 | 173 | 296.1 | … | 93.2 | −180.3 | 294.5 | 2.38 |
115.4 | 160.2 | 263.5 | … | 232.9 | −69.7 | 269.3 | 1.1 |
−95.4 | 140.5 | 292.8 | … | 196.7 | 31.7 | 241.2 | 1.28 |
The deeper the network, the more complex the data mapping it can represent, and the stronger its expressive ability [25]. However, excessively deep networks suffer from vanishing and exploding gradients, and are also more difficult to train. The dense-connection method proposed for densely connected convolutional networks (DenseNet) enhances gradient back-propagation and makes the network easier to train [26, 27]. DenseNet directly concatenates feature maps from different layers, which achieves feature reuse and improves efficiency.
The output of a traditional network at layer l is:

$$X_l = H_l(X_{l-1})$$

In DenseNet, the outputs of all previous layers are used as inputs to layer l, so the output of layer l is:

$$X_l = H_l([X_0, X_1, \ldots, X_{l-1}])$$

Here H_{l}(∙) represents the nonlinear transformation function, X_{i} is the output of the i-th layer of the network, and the brackets [ ] denote the concatenation of all feature maps along the channel dimension.
DenseNet’s bypass connections enhance feature reuse, which enables the network to extract features better. The convolutional layers of the network learn information about adjacent positions and extract local features of the image. The pooling layers compress features, expand the receptive field, and obtain global features. Adjacent pixel values in image data are highly correlated, so convolution operations can extract key features from adjacent related information, but most non-image data are relatively independent and unrelated. In nonlinear regression on such data, convolutional layers cannot extract associated features and instead lose the basic features of the previous layer. Similarly, pooling layers cannot obtain useful features. Therefore, for nonlinear regression, this study replaces the convolutional and pooling layers with fully connected layers.
Similar to DenseNet, we propose a dense-connection method for deep neural networks, with many building blocks. The network-framework structure of the algorithm is shown in Fig. 3. This is a neural network composed of four blocks and fully connected layers, which can be used for regression. The initial input of the model passes through one fully connected layer, followed by four bottleneck layers, and finally the output of the network is obtained through a linear layer. Each building block concatenates the output of this layer with the output of the previous layers. Dense connections connect each building block to the last layer, allowing each layer to directly access gradients from loss functions and inputs. This model can preserve the initial input features while obtaining deeper data-representation results. Therefore, the model has better nonlinear regression ability.
The composition of the algorithm’s building blocks is shown in Fig. 4. A block contains no convolutional or pooling layers; it consists of two layers, each comprising batch normalization, a fully connected layer, and an activation function. Batch normalization is retained to accelerate convergence and to alleviate the scattered feature distributions that arise in deep networks. ReLU, an activation function widely used in deep learning, is adopted because it mitigates the vanishing-gradient problem and is simple to compute.
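The forward pass of such a densely connected, fully connected network can be sketched in NumPy as follows. The input dimension of 15 (five tags with three coordinates each, as in Table 2), the layer widths, the random initialization, and the batch normalization without learned scale and shift are all assumptions made for illustration; the number of blocks is a parameter.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch (no learned scale or shift in this sketch)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def fc_layer(x, W, b):
    """One layer of a building block: batch normalization, fully connected, ReLU."""
    return np.maximum(batch_norm(x) @ W + b, 0.0)

def init_fc(n_in, n_out, rng):
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

def build(n_in, stem, growth, n_blocks, rng):
    """Create parameters: one input layer, n_blocks two-layer blocks, one linear output."""
    params = {"stem": init_fc(n_in, stem, rng), "blocks": []}
    width = stem
    for _ in range(n_blocks):
        params["blocks"].append([init_fc(width, growth, rng), init_fc(growth, growth, rng)])
        width += growth                       # concatenation widens the features
    params["out"] = init_fc(width, 1, rng)
    return params

def forward(x, params):
    W, b = params["stem"]
    feats = np.maximum(x @ W + b, 0.0)
    for l1, l2 in params["blocks"]:
        h = fc_layer(fc_layer(feats, *l1), *l2)
        feats = np.concatenate([feats, h], axis=1)   # dense connection
    W, b = params["out"]
    return feats @ W + b                      # linear output: predicted reading distance

rng = np.random.default_rng(0)
params = build(n_in=15, stem=32, growth=16, n_blocks=5, rng=rng)
pred = forward(rng.uniform(size=(8, 15)), params)
```

Because every block's output is concatenated onto the running feature vector, the final linear layer sees the features of all blocks as well as those of the first layer.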
This section uses a total of 1,980 samples, each consisting of the tags’ three-dimensional coordinates and the corresponding reading distance. The data set is randomly divided into a training set and a test set: 1,780 samples are used to train the network, and the remaining 200 samples are used to test its performance.
Before training, the data need to be preprocessed. Data preprocessing is a fundamental task in data mining. Different types of data have different dimensions. To reduce the impact of different magnitudes and dimensions and to increase the convergence rate, the data are rescaled to the range 0–1 by

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$$

where x′ represents the processed data, x is the original data, and x_{max} and x_{min} represent the maximum and minimum values of the obtained data respectively.
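A minimal sketch of this 0–1 scaling, applied column by column. The function name is illustrative, and the sample values are the first three coordinates of the first three rows of Table 2.

```python
import numpy as np

def min_max_scale(data):
    """Scale each column to [0, 1]; also return the column minima and maxima."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / (hi - lo), lo, hi    # assumes no column is constant

sample = np.array([[209.1, 33.6, 309.1],
                   [76.0, 210.4, 294.4],
                   [199.0, -10.9, 309.3]])
scaled, lo, hi = min_max_scale(sample)
```

Returning the minima and maxima allows the same scaling to be applied to the test set with the statistics of the training set.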
In terms of hardware, we use an Intel i7-7700HQ CPU (2.81 GHz) with an NVIDIA GTX 1050 Ti GPU. The network is implemented in the open-source framework TensorFlow. The learning rate is 0.001, the batch size is 32, and the output layer’s dimension is 1.
To determine the optimal number of building blocks, networks with different numbers of blocks are trained on the data set; the results are shown in Table 3. The errors are smallest with 4 or 5 building blocks. A network with 3 building blocks has larger MAE and MAPE, which indicates that it is too simple to solve this complex nonlinear regression problem. With more than 5 building blocks, the MSE and MAPE increase, as do the number of parameters and the computational cost. The 5-block network gives the lowest MAE (0.0485) and MAPE (3.02%), although its MSE (0.0156) is slightly higher than that of the 4-block network (0.0136). Therefore, the number of blocks used in this paper is 5.
TABLE 3 Training results for different numbers of blocks
Blocks | MSE | MAE | MAPE (%) |
---|---|---|---|
3 | 0.0151 | 0.0589 | 3.78 |
4 | 0.0136 | 0.0524 | 3.24 |
5 | 0.0156 | 0.0485 | 3.02 |
6 | 0.0214 | 0.0582 | 3.66 |
7 | 0.0224 | 0.0578 | 3.77 |
8 | 0.0249 | 0.0563 | 3.80 |
9 | 0.0316 | 0.0676 | 4.19 |
By using deep-learning methods to model the nonlinear relationship between the three-dimensional coordinates of the tags and the corresponding reading distances, we predict the reading distance for different distributions of the tag group.
To assess the training stability of the algorithm, the training loss is shown in Fig. 5. The loss decreases very quickly and then stabilizes. The loss on the validation set follows the same trend as that on the training set, indicating that the algorithm converges quickly and trains stably.
To study the algorithm’s nonlinear-modeling ability for the three-dimensional spatial distribution of tags and the corresponding recognition distance, a particle-swarm optimization back-propagation neural network (PSO-BP) [28], an extreme learning machine (ELM) [29], and a deep neural network (DNN) [30] are compared to the algorithm of this paper. Figure 6 and Table 4 show the prediction results for different algorithms.
TABLE 4 Prediction error for different algorithms
Method | MAE (m) | MAPE (%) | MSE (m^{2}) | RMSE (m) |
---|---|---|---|---|
PSO-BP | 0.3832 | 21.5 | 0.2457 | 0.6190 |
ELM | 0.3477 | 22.4 | 0.1696 | 0.5897 |
DNN | 0.0794 | 5.40 | 0.0281 | 0.2818 |
Ours | 0.0485 | 3.02 | 0.0156 | 0.2202 |
Figure 6 shows that the prediction error of the proposed algorithm is smaller than that of the other algorithms. Table 4 shows that the mean absolute percentage error of our algorithm is 3.02%, while the values for PSO-BP, ELM, and DNN are 21.5%, 22.4%, and 5.40% respectively, so our model’s predictions are closer to the measured values. In particular, the proposed algorithm outperforms the plain DNN, which suggests that dense connections alleviate the vanishing-gradient problem and improve prediction accuracy. The mean square error, mean absolute error, and root-mean-square error of the proposed algorithm are also smaller than those of the other algorithms. Consequently, the proposed algorithm can effectively model the nonlinear relationship between tag coordinates and reading distance.
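The error measures in Table 4 can be computed as in the following minimal sketch, which uses their standard definitions; the function name and the example values in the usage note are illustrative.

```python
import numpy as np

def regression_errors(d_true, d_pred):
    """MAE, MAPE (%), MSE, and RMSE between measured and predicted reading distances."""
    d_true = np.asarray(d_true, dtype=float)
    d_pred = np.asarray(d_pred, dtype=float)
    err = d_pred - d_true
    mse = np.mean(err ** 2)
    return {
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err / d_true)),   # assumes no zero distances
        "MSE": mse,
        "RMSE": np.sqrt(mse),
    }
```

For measured distances of 2.0 m and 1.0 m with predictions of 2.2 m and 0.9 m, this gives an MAE of 0.15 m and a MAPE of 10%.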
In this paper, we have proposed a method for RFID data collection and analysis based on optical measurement. Aiming at the problem of motion blur in the collected images, the paper has proposed a deconvolution method based on nonlocal means. An RFID testing system was designed to use optical measurement to obtain the reading distance of RFID and the coordinate information of tags. The paper used a deep neural network based on dense connection to model the relationship between the tag coordinates and the reading distance. Experimental results showed that the error for this approach was 3.02%, which was lower than that for PSO-BP, ELM, and DNN algorithms. This shows that our algorithm can effectively predict the reading distance of the tags. The method proposed in this paper can remove blur from the image and predict the reading distance of the tag. This provides an effective method for analyzing tag performance.
The National Natural Science Foundation of China (NSFC) (61771240); China Postdoctoral Science Foundation (2022M711620).
The authors declare no conflict of interest.
The data that support the findings of this study are available from the corresponding author upon reasonable request.
^{1}College of Physics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
^{2}National Quality Supervision and Testing Center for RFID Product (Jiangsu), Nanjing 210029, China
Correspondence to:^{*}nuaaxiaoleiyu@126.com, ORCID 0009-0006-7837-4813
^{**}nuaa_wchzhang@126.com, ORCID 0009-0001-8418-8084
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The distribution of tags is an important factor that affects the performance of radio-frequency identification (RFID). To study RFID performance, it is necessary to obtain RFID tags’ coordinates. However, the positioning method of RFID technology has large errors, and is easily affected by the environment. Therefore, a new method using optical measurement is proposed to achieve RFID performance analysis. First, due to the possibility of blurring during image acquisition, the paper derives a new image prior to removing blurring. A nonlocal means-based method for image deconvolution is proposed. Experimental results show that the PSNR and SSIM indicators of our algorithm are better than those of a learning deep convolutional neural network and fast total variation. Second, an RFID dynamic testing system based on photoelectric sensing technology is designed. The reading distance of RFID and the three-dimensional coordinates of the tags are obtained. Finally, deep learning is used to model the RFID reading distance and tag distribution. The error is 3.02%, which is better than other algorithms such as a particle-swarm optimization back-propagation neural network, an extreme learning machine, and a deep neural network. The paper proposes the use of optical methods to measure and collect RFID data, and to analyze and predict RFID performance. This provides a new method for testing RFID performance.
Keywords: Deep learning, Denoise, Non-local means, Optical measurement, RFID tag
As a noncontact communication technology for automatic object recognition, radio-frequency identification (RFID) helps to reduce complex labor and to improve labor efficiency in warehouses, logistics, and access-control systems. RFID tags need to be effectively identified to obtain relevant information on time, but studies show that RFID’s performance in applications is lower than expected [1, 2].
Pandey et al. [3] focused on using RFID to solve hostel security issues. They performed facial recognition and compared the results to facial data in RFID to confirm the entry and exit of personnel in hostels. Bertoncini et al. [4] utilized a dynamic wavelet fingerprint for feature extraction and classification, to prevent unauthorized access and ensure the security of radio-frequency identification. Wong et al. [5] proposed a background-subtraction method for motion object detection combined with RFID technology, to detect traffic violations. But there has been little research on the performance of RFID, especially its reading distance.
Many factors affect RFID performance, including multi-tag channel collisions, packaging methods of goods, tag distribution, and power of readers [1, 6, 7]. Traditional methods have mostly solved the conflict problem of multiple tags through anticollision algorithms, to improve RFID performance. Few studies have analyzed the impact on RFID reading performance from the perspective of multitag distribution [8]. This paper establishes a multi-tag optimal distribution model, with the reading distance as the evaluation indicator.
In the past, RFID positioning technology based on the LANDMARC algorithm was mostly used for tag positioning [9, 10]. This method requires complex system deployment, has large positioning errors, and is easily affected by the environment, so the multi-tag coordinates it produces are difficult to use for modeling. Therefore, we propose a novel method based on image processing and optical measurement to measure the three-dimensional coordinates and reading distances of tags. Template matching is used to calculate the tag coordinates from the collected RFID tag images. A laser-ranging sensor is used to measure the maximum distance at which all tags are read by the RFID system. However, during the imaging process the RFID tag images become blurry, due to relative displacement between the scene and the optical or camera system.
At the same time, the captured images can also be affected by noise. A captured image is represented as:
y = Hx + n = h ⊗ x + n, (1)
where y denotes the collected image, H is a linear blur operator, h is the point-spread function (PSF), ⊗ denotes the convolution operator, x is the clear image, and n is zero-mean white Gaussian noise, n ~ N(0, σ^{2}).
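The degradation model described above (blur by a PSF followed by additive zero-mean Gaussian noise) can be simulated in a few lines. The following is a minimal sketch, not the authors' implementation: it assumes a horizontal box-shaped motion PSF, circular boundary handling, and a small list-of-lists image, and the names `motion_blur_rows` and `degrade` are hypothetical.

```python
import random


def motion_blur_rows(image, length):
    """Circularly convolve each row with a horizontal box PSF (assumed PSF shape)."""
    blurred = []
    for row in image:
        n = len(row)
        blurred.append([
            sum(row[(j - k) % n] for k in range(length)) / length
            for j in range(n)
        ])
    return blurred


def degrade(image, length=5, sigma=0.01, seed=0):
    """Return y = h (*) x + n, with n drawn from N(0, sigma^2) per pixel."""
    rng = random.Random(seed)
    blurred = motion_blur_rows(image, length)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in blurred]


# A single bright pixel is spread over `length` pixels by the PSF.
x = [[0.0] * 8 for _ in range(3)]
x[1][2] = 1.0
y_noise_free = degrade(x, length=4, sigma=0.0)
y_noisy = degrade(x, length=4, sigma=0.05)
```

With `sigma=0.0` the output shows the PSF itself: the unit pixel becomes four pixels of value 0.25, and the row sum is preserved.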
The purpose of image deblurring is to restore the original image from the degraded observation. Image deblurring is an important preprocessing task in system design. A commonly used method for restoring sharp images is maximum a posteriori (MAP) estimation:
x̂ = arg max_{x} p(x | y) = arg max_{x} p(y | x) p_{x}(x). (2)
Because image deblurring is an ill-posed problem, image restoration can yield many vastly different solutions. Therefore, a regularization term is required. Φ(x) = −log p_{x}(x) is the regularization term derived from the prior distribution p_{x}(x), which represents prior knowledge about the acquired image. This prior knowledge regularizes the image-deblurring problem. For Gaussian noise with variance σ^{2}, the optimization problem becomes:
x̂ = arg min_{x} { (1/(2σ^{2})) ‖y − Hx‖_{2}^{2} + Φ(x) }. (3)
Image smoothness is an important assumption for image denoising. Adjacent pixels in an image vary continuously, which helps to remove noise from the image. But images also contain a large amount of non-smooth information, which is of great help for image deblurring, and smoothing an image makes it difficult to preserve its details and textural features. At present, regularization methods based on local smoothing have been widely applied. Among them, the total variation (TV) model based on local gradients is particularly popular, because it can effectively restore the edges of piecewise-smooth images [11–15]. However, TV regularization suffers from blocky (staircase) effects, which may smooth out non-smooth information.
Buades et al. [16] proposed a non-local means (NLM) filter for denoising, which utilizes feature blocks to capture the local structure of images. The contribution of NLM in the field of image processing proves its effectiveness [17, 18]. Image deconvolution is more complex than denoising, because NLM requires a reference image to calculate the weights. In [19], blurred images are used as reference images. However, weights estimated from blurred images are ineffective, because blurred images contain far fewer detailed features than clear images. There are experiments that use Tikhonov regularization to deconvolve images. However, the noise input has little impact on the weight parameters. Therefore, deconvolution by Tikhonov regularization is affected by noise, and the reconstructed image may exhibit noise. Shi et al. [20] proposed a fidelity term based on the L1 norm to stabilize ill-posed problems, and introduced a soft-threshold operator to enhance robustness to noise. The method of Chen et al. [21], which uses singular values to construct regularization matrices, reduces the variance of parameter estimation while keeping the bias constant. Using a different approach, Lefkimmiatis [22] used neural networks to improve the nonlocal regularization strategy, which effectively improves the denoising performance.
In this paper, an effective method for motion-blur removal is used for image restoration. RFID tag coordinates and reading distance are obtained based on optical measurement and image matching. The relationship model is established between tag distribution and reading distance. The main contributions of this paper are as follows:
1. A new tag-distribution optimization model is proposed. Based on image processing and optical measurement, the distribution of tags is measured, which avoids positioning errors caused by external interference.
2. Non-local means based deconvolution (NLM-D) is used to remove motion blur and noise from RFID tag images, while preserving the non-smooth information of each image.
3. A deep neural network based on dense-connection improvement is used to model the nonlinear relationship between RFID tag coordinates and reading distance.
Section 2 briefly reviews NLM filters and the construction of NLM-based image priors. The study deblurs the images with our method and compares it to other algorithms. Section 3 designs an RFID testing system and obtains tag coordinates and reading distance. In Section 4, we propose the method and process of an improved network, and model the relationship between tag coordinates and reading distance through the network. The paper concludes with a summary in Section 5.
In the NLM method, image feature blocks are used to characterize the local structural information of the image. Image feature blocks refer to a collection of pixels contained in a fixed shape, and these feature blocks have similar local structural information. NLM filtering can effectively reduce noise while preserving detailed information, such as image edges and textures. The filtering process of NLM can be expressed by the following weighted average:
where Ω is the search window, and ω(i;i′) is the weight that characterizes the degree of similarity between the two feature blocks centered at i and i′. The weight ω(i;i′) can be written as:
where S is the image feature block, u is the position within the feature block, h is the width parameter of the smoothing kernel, and Z_{ω}(i) is the normalization constant.
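The weighted average and the block-similarity weights can be illustrated on a one-dimensional signal. This is a minimal sketch under stated assumptions rather than the paper's exact formula: the weight is taken to be the classic NLM form exp(−Σ_{u∈S} [x(i+u) − x(i′+u)]^{2} / h^{2}) divided by a normalization constant, boundaries are handled circularly, and the function name and default parameters are hypothetical.

```python
import math


def nlm_denoise_1d(x, patch_radius=1, search_radius=3, h=0.1):
    """Non-local means on a 1-D signal (assumed classic exponential weights)."""
    n = len(x)

    def patch_distance(i, j):
        # Squared difference between the feature blocks centered at i and j.
        return sum(
            (x[(i + u) % n] - x[(j + u) % n]) ** 2
            for u in range(-patch_radius, patch_radius + 1)
        )

    out = []
    for i in range(n):
        window = [(i + d) % n for d in range(-search_radius, search_radius + 1)]
        raw = [math.exp(-patch_distance(i, j) / h ** 2) for j in window]
        z = sum(raw)  # normalization constant for position i
        out.append(sum(w * x[j] for w, j in zip(raw, window)) / z)
    return out


flat = nlm_denoise_1d([0.5] * 10)
step = [0.0] * 8 + [1.0] * 8
step_filtered = nlm_denoise_1d(step, h=0.1)
```

Because dissimilar blocks receive weights close to zero, the step edge in `step` survives the filtering essentially unchanged, which is the edge-preserving behavior the text describes.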
Markov random field (MRF) theory is an effective tool for analyzing and modeling the spatial-structure information of images. The Hammersley-Clifford theorem, also known as the MRF-Gibbs equivalence theorem, makes the MRF model practical to use: it proves that the joint probability distribution of an MRF can be described by a Gibbs distribution with a concise mathematical expression. Gibbs random field theory uses the concept of a clique (called a cluster in the rest of this paper) to describe the various possibilities of interaction between a location and its neighborhood. For image description, clusters represent a basic composition of image texture.
A random field X defined on the grid set is a Gibbs random field with respect to its neighborhood system if and only if the joint distribution of X has the following form:
p(x) = (1/Z_{MRF}) exp(−U(x)/T), (6)
where Z_{MRF} is the normalization constant, also called the partition function, T is the temperature parameter, and U(x) is the energy function, which can be expressed as:
U(x) = Σ_{c∈C} V_{c}(x), (7)
where c is a cluster, which represents a basic composition of texture, C is the set of all clusters, and V_{c}(x) is the cluster potential function associated with cluster c.
Obviously, the value of V_{c}(x) is related to the local structure of cluster c. A potential function V_{c̃}(x) is nonzero if and only if a particular cluster c̃ exists; in other words, if and only if a set of positions forms a cluster c̃.
From the above description, using the Markov-Gibbs random field model to construct an image prior based on NLM can be achieved through the following two steps:
(1) Define neighborhood systems and clusters based on NLM; and
(2) Define the corresponding cluster potential function.
Note that L = {i | i ∈ M} represents a set of grid points of an image, where the two-dimensional vector coordinate i represents the position of the grid point, and M represents the image size. Usually the neighborhood system can be defined by spatial position, that is, by the distance from the center pixel. For example, the neighborhood system of i can be defined as:
where r is the search radius.
Similar to the above definition, the neighborhood system based on the NLM is defined here as all pixels in the search window Ω:
For the NLM neighborhood system defined above, its cluster can be expressed as:
where
The structural information of the considered image can be expressed by defining the cluster potential functions on these clusters. For simplicity, only clusters up to second order are considered here (that is, only C_{1} and C_{2}); then formula (7) can be expressed as:
According to the smoothness assumption, x(i) should be as close to x(i′) as possible; the cluster potential function can then be written as:
where ρ(∙) and G(i;i′) are the potential function and the weight function, respectively.
The potential function ρ(∙) measures the size of the difference between two pixel values. In this paper, the potential function ρ(x) = |x| is selected.
The weight function G(i;i′) expresses the degree of correlation between two pixel positions. In this paper it is defined as G(i;i′) = ω(i;i′). It should be pointed out that when the two feature blocks are completely different, the value of the weight function G(i;i′) approaches zero.
Therefore, the energy function (image prior) based on the NLM can be written as:
where D_{k} represents the difference operator between two points separated by an offset k (for example, D_{(1,0)}x represents the horizontal gradient of x), and W_{k} is a block diagonal matrix, W_{k} = diag{G(i; i + k)}_{i∈M}.
Using the NLM image prior in Eq. (13) as the regularization term, the NLM-regularized image deblurring model J(x) can be written as:
J(x) = ‖y − Hx‖_{2}^{2} + αU(x), (14)
where α is the regularization parameter, α = 2σ^{2}/T.
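The value α = 2σ^{2}/T follows from combining the Gaussian noise model with the Gibbs prior. The derivation below is a short sketch, assuming the usual least-squares data term for n ~ N(0, σ^{2}) and a prior of the form p(x) ∝ exp(−U(x)/T); constants that do not depend on x are dropped.

```latex
\begin{aligned}
-\log p(y \mid x) &= \frac{1}{2\sigma^{2}} \lVert y - Hx \rVert_2^2 + \text{const}, \\
-\log p(x)        &= \frac{1}{T}\, U(x) + \log Z_{\mathrm{MRF}}, \\
\hat{x} &= \arg\min_x \; \frac{1}{2\sigma^{2}} \lVert y - Hx \rVert_2^2 + \frac{1}{T}\, U(x) \\
        &= \arg\min_x \; \lVert y - Hx \rVert_2^2 + \underbrace{\frac{2\sigma^{2}}{T}}_{\alpha}\, U(x).
\end{aligned}
```

Multiplying the objective by the positive constant 2σ^{2} does not change the minimizer, so a noisier image (larger σ) or a "colder" prior (smaller T) both increase the weight given to the regularization term.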
Since the constructed energy function based on the NLM value U(x) is a differentiable convex function, there is a unique minimum in Eq. (14). The minimum value can be solved by using the steepest-descent method based on negative-gradient-direction iteration. The basic iteration format is:
where λ is the iteration step size (under normal circumstances, 0 < λ < 2), and ∂U(x)/∂x is the gradient of U(x), which can be obtained by taking the partial derivative of U(x) with respect to x(m):
In the above formula,
where t = m − u.
For concise expression, Eq. (17) is written in matrix representation:
In this paper, to simplify the calculation we ignore the influence of
Substituting (18) into (14):
It should be noted that Eq. (20) can be considered the steepest-descent process for J(x) under the given weight function {W_{k}}. Since J(x) is a convex function, the steepest-descent process of Eq. (20) can converge to a minimum.
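The steepest-descent process with fixed weights can be sketched on a one-dimensional signal. This is an illustration under several assumptions, not the paper's Eq. (20): the objective is taken to be J(x) = ‖y − Hx‖^{2} + α Σ_{k} Σ_{i} w_{k}(i) |x(i+k) − x(i)|, the absolute value is replaced by the smooth surrogate sqrt(d^{2} + eps) so that the gradient exists everywhere, blur is circular, and the weights are uniform placeholders standing in for the NLM weights G(i; i+k). All function names and parameter values are hypothetical.

```python
def blur(x, psf):
    """Circular convolution H x of a 1-D signal with the PSF."""
    n = len(x)
    return [sum(p * x[(j - k) % n] for k, p in enumerate(psf)) for j in range(n)]


def blur_adjoint(r, psf):
    """Adjoint operator H^T r (circular correlation with the PSF)."""
    n = len(r)
    return [sum(p * r[(j + k) % n] for k, p in enumerate(psf)) for j in range(n)]


def objective(x, y, psf, weights, alpha=0.01, eps=1e-2):
    """Data term plus weighted, smoothed absolute differences at each offset k."""
    n = len(x)
    data = sum((yi - hi) ** 2 for yi, hi in zip(y, blur(x, psf)))
    prior = sum(
        w[i] * ((x[(i + k) % n] - x[i]) ** 2 + eps) ** 0.5
        for k, w in weights.items()
        for i in range(n)
    )
    return data + alpha * prior


def gradient(x, y, psf, weights, alpha=0.01, eps=1e-2):
    """Gradient of the objective above with the weights held fixed."""
    n = len(x)
    residual = [hi - yi for hi, yi in zip(blur(x, psf), y)]
    grad = [2.0 * g for g in blur_adjoint(residual, psf)]
    for k, w in weights.items():
        for i in range(n):
            d = x[(i + k) % n] - x[i]
            slope = alpha * w[i] * d / (d * d + eps) ** 0.5
            grad[(i + k) % n] += slope
            grad[i] -= slope
    return grad


def steepest_descent(y, psf, weights, step=0.2, iterations=200):
    """Iterate x <- x - step * dJ/dx, starting from the blurred observation."""
    x = list(y)
    for _ in range(iterations):
        g = gradient(x, y, psf, weights)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


psf = [0.25, 0.25, 0.25, 0.25]
x_true = [0.0] * 6 + [1.0] * 6 + [0.0] * 4
y = blur(x_true, psf)
uniform = {1: [1.0] * len(y), 2: [1.0] * len(y)}  # placeholder for NLM weights
x_hat = steepest_descent(y, psf, uniform)
```

The step size 0.2 is chosen conservatively for this toy objective so that each iteration lowers J(x); it is not the λ range quoted in the text, which applies to the authors' own scaling of the iteration.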
First, use the image obtained in this paper as the initial value
To verify the effectiveness and practicability of the proposed deblurring algorithm, the method is applied to images with different degrees of motion blur. This study first deblurs an image with linear motion blur. Then, to reflect the diversity of blur in real scenes, a clear image is convolved with a complex blur kernel to obtain an image with complex motion blur, and the proposed method is used to deconvolve it. Finally, to verify the superiority of the algorithm, its results are compared to those of the learning deep convolutional neural network (IRCNN) [23] and fast total variation (fast TV) [24] methods, with the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) used as quantitative indicators. The experimental results are as follows:
Figure 1 and Table 1 show that the proposed algorithm effectively removes the motion blur in the tag images. Compared to IRCNN and fast TV, the images processed by this method retain better details and edges, and the deblurred images are clearer. To quantify the results, we use the PSNR and SSIM indicators. The proposed method achieves the highest PSNR in all four test cases and the highest SSIM in three of them; the exception is Tag 2 with length 5 and angle 5, where the SSIM of IRCNN (0.9677) is slightly higher than that of our method (0.9646). In addition, the SSIM values of the proposed method all exceed 0.96, which shows that the structural information of the image is well preserved after processing (Table 1).
TABLE 1. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) results for different methods, with different motion lengths and angles.
Image | Motion blur (length, angle in °) | Metric | IRCNN | Fast TV | Our Method |
---|---|---|---|---|---|
Tag 1 | Length = 5, Angle = 5 | PSNR | 36.1674 | 35.9781 | 42.694 |
Tag 1 | Length = 5, Angle = 5 | SSIM | 0.9615 | 0.8588 | 0.9649 |
Tag 1 | Length = 10, Angle = 10 | PSNR | 31.1276 | 34.7498 | 40.5456 |
Tag 1 | Length = 10, Angle = 10 | SSIM | 0.9272 | 0.8924 | 0.9696 |
Tag 2 | Length = 5, Angle = 5 | PSNR | 39.5463 | 36.7709 | 42.8876 |
Tag 2 | Length = 5, Angle = 5 | SSIM | 0.9677 | 0.8631 | 0.9646 |
Tag 2 | Length = 10, Angle = 10 | PSNR | 34.8557 | 35.8469 | 41.7843 |
Tag 2 | Length = 10, Angle = 10 | SSIM | 0.9489 | 0.8961 | 0.9707 |
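For reference, the PSNR indicator used in Table 1 can be computed as follows. This is a minimal sketch using the standard definition, assuming 8-bit images (peak value 255) stored as lists of rows; the function name `psnr` and the toy images are hypothetical, and SSIM is omitted because it requires local statistics that are better taken from an established image-processing library.

```python
import math


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (r - s) ** 2
        for ref_row, res_row in zip(reference, restored)
        for r, s in zip(ref_row, res_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


reference = [[100.0, 120.0], [140.0, 160.0]]
off_by_one = [[101.0, 121.0], [141.0, 161.0]]
psnr_value = psnr(reference, off_by_one)
```

An error of one gray level at every pixel gives a mean squared error of 1, so the PSNR equals 20·log10(255), roughly 48.13 dB.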
The designed RFID measurement system is shown in Fig. 2. The system simulates the steps of goods entering and exiting a warehouse. After the RFID reader has read all tags, the control computer instructs the laser-ranging sensor to measure the distance from the sensor to the tag group. The reading distance of RFID is obtained through indirect measurement methods:
where d represents the reading distance of the RFID system, l is the distance measured by the laser-ranging sensor, s represents the distance from the laser-ranging sensor to the antenna bracket, and h is the height difference between the antenna and the tags. Both s and h are fixed values and have been measured.
To obtain the three-dimensional coordinates of the tags, the image of the tags is obtained from two different angles, horizontal and vertical. Then the image-matching method is used to locate the tags:
where g(a, b) is the correlation coefficient, t(x, y) and f(x, y) represent the template image and the tag image respectively, and J and K are the numbers of horizontal and vertical pixels in the template image.
Then the ratio method is used to calculate the tag coordinates:
N = nM/m,
where n and m represent respectively the number of pixels from the tag to the coordinate origin and the tag length in pixels in the image, and N and M represent the actual distance of the tag from the origin and the actual tag length.
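The two steps above, locating the tag by template matching and converting pixels to physical distance by the ratio method, can be sketched as follows. The correlation measure used here is zero-mean normalized cross-correlation, which is an assumption and may differ from the paper's exact coefficient g(a, b); the ratio step assumes N/M = n/m. The function names and the toy data are hypothetical.

```python
import math


def ncc_match(image, template):
    """Return (row, col, score) of the best zero-mean normalized cross-correlation."""
    k_rows, j_cols = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    best = (0, 0, -1.0)
    for a in range(len(image) - k_rows + 1):
        for b in range(len(image[0]) - j_cols + 1):
            window = [image[a + y][b + x] for y in range(k_rows) for x in range(j_cols)]
            w_mean = sum(window) / len(window)
            w_dev = [v - w_mean for v in window]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            if w_norm == 0.0 or t_norm == 0.0:
                continue  # flat window or flat template: correlation undefined
            score = sum(p * q for p, q in zip(w_dev, t_dev)) / (w_norm * t_norm)
            if score > best[2]:
                best = (a, b, score)
    return best


def pixels_to_length(n_pixels, tag_length_pixels, tag_length_actual):
    """Ratio method: actual distance N = n * M / m."""
    return n_pixels * tag_length_actual / tag_length_pixels


template = [[1.0, 2.0], [3.0, 4.0]]
image = [[0.0] * 6 for _ in range(6)]
for dy in range(2):
    for dx in range(2):
        image[2 + dy][3 + dx] = template[dy][dx]

row, col, score = ncc_match(image, template)
distance_cm = pixels_to_length(n_pixels=120, tag_length_pixels=40, tag_length_actual=5.0)
```

In this toy example the template is found at row 2, column 3, and a tag that is 5 cm long and spans 40 pixels implies that 120 pixels correspond to 15 cm.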
The reading distance of the tags is obtained by laser-ranging sensors. The three-dimensional coordinates and reading distance of the tags are shown in Table 2.
TABLE 2. Three-dimensional coordinates and reading distance of the RFID tags.
x_{1} (cm) | y_{1} (cm) | z_{1} (cm) | … | x_{5} (cm) | y_{5} (cm) | z_{5} (cm) | Reading distance d (m) |
---|---|---|---|---|---|---|---|
209.1 | 33.6 | 309.1 | … | 118.4 | −160.3 | 292.3 | 1.97 |
76 | 210.4 | 294.4 | … | 207 | 25.6 | 307.2 | 1.63 |
199 | −10.9 | 309.3 | … | 47 | −112.5 | 298.3 | 2.18 |
︙ | ︙ | ︙ | … | ︙ | ︙ | ︙ | ︙ |
74.9 | 173 | 296.1 | … | 93.2 | −180.3 | 294.5 | 2.38 |
115.4 | 160.2 | 263.5 | … | 232.9 | −69.7 | 269.3 | 1.1 |
−95.4 | 140.5 | 292.8 | … | 196.7 | 31.7 | 241.2 | 1.28 |
The deeper the network, the more complex the data mapping that can be achieved, and the stronger the expressive ability [25]. However, excessively deep networks can lead to vanishing and exploding gradients, and can also increase the difficulty of training. The dense-connection method proposed for densely connected convolutional networks (DenseNet) enhances gradient back-propagation and makes the network easier to train [26, 27]. DenseNet directly concatenates feature maps from different layers, which achieves feature reuse and improves efficiency.
The output of a traditional network at layer l is:
X_{l} = H_{l}(X_{l−1}).
In DenseNet, the outputs of all previous layers are used as inputs to layer l, and the output of layer l is:
X_{l} = H_{l}([X_{0}, X_{1}, …, X_{l−1}]).
Here, H_{l}(∙) represents the nonlinear transformation function, X_{i} is the output of the i-th layer of the network, and the brackets [∙] indicate that all feature maps are concatenated along the channel dimension.
DenseNet’s bypass connections enhance feature reuse, which enables the network to extract features better. The convolutional layers of the network learn information about adjacent positions and extract local features of the image. The pooling layers compress features, expand the receptive field, and obtain global features. Adjacent pixel values in image data are highly correlated, so convolution operations can extract key features from adjacent related information. Most non-image data, however, are relatively independent and unrelated. In nonlinear regression on such data, convolutional layers cannot extract associated features, and instead lose the basic features of the previous layer. Similarly, pooling layers cannot obtain useful features in nonlinear regression. Therefore, for nonlinear regression this study replaces the convolutional and pooling layers with fully connected layers.
Similar to DenseNet, we propose a dense-connection method for deep neural networks, with many building blocks. The network-framework structure of the algorithm is shown in Fig. 3. This is a neural network composed of four blocks and fully connected layers, which can be used for regression. The initial input of the model passes through one fully connected layer, followed by four bottleneck layers, and finally the output of the network is obtained through a linear layer. Each building block concatenates the output of this layer with the output of the previous layers. Dense connections connect each building block to the last layer, allowing each layer to directly access gradients from loss functions and inputs. This model can preserve the initial input features while obtaining deeper data-representation results. Therefore, the model has better nonlinear regression ability.
The composition of the algorithm’s building blocks is shown in Fig. 4. There are no convolutional or pooling layers in this block, only 2 fully connected layers. Each building block has two layers of neural networks. Each layer includes batch normalization, fully connected layers, and activation functions. To accelerate the convergence speed of the model and alleviate the problem of scattered feature distribution in deep networks, this study retains batch normalization. ReLU is adopted as the activation function; as an activation function widely used in deep learning, it not only avoids the problems of vanishing and exploding gradients, but also simplifies the calculation process.
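The forward pass of such a densely connected, fully connected network can be sketched in plain Python. This is an illustration only: the layer widths, the growth size, the random initialization, and the 15-dimensional input (five tags with three coordinates each, as in Table 2) are assumptions, batch normalization is omitted for brevity, and the function names are hypothetical. It is not the authors' TensorFlow model.

```python
import random


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Random weights with He-style scaling and zero biases (illustrative only)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_dense_net(n_in, width, growth, n_blocks, seed=0):
    rng = random.Random(seed)
    params = {"stem": init_layer(n_in, width, rng), "blocks": []}
    n_features = width
    for _ in range(n_blocks):
        fc1 = init_layer(n_features, width, rng)
        fc2 = init_layer(width, growth, rng)
        params["blocks"].append((fc1, fc2))
        n_features += growth  # concatenation enlarges the next block's input
    params["head"] = init_layer(n_features, 1, rng)
    return params


def forward(params, x):
    features = relu(linear(x, *params["stem"]))
    for fc1, fc2 in params["blocks"]:
        hidden = relu(linear(features, *fc1))
        new_features = relu(linear(hidden, *fc2))
        features = features + new_features  # dense connection: list concatenation
    return linear(features, *params["head"])[0]  # linear output layer, dimension 1


params = build_dense_net(n_in=15, width=32, growth=16, n_blocks=4)
prediction = forward(params, [0.5] * 15)
```

Each block sees the concatenation of the stem output and the outputs of all earlier blocks, so the input to the final linear layer has 32 + 4 × 16 = 96 features in this configuration.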
This section uses a total of 1,980 samples, each consisting of the tags’ three-dimensional coordinates and the corresponding reading distance. The data set is randomly divided by record into a training set and a test set: 1,780 samples are used to train the network, and the remaining 200 samples are used to test the performance of the network.
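A random split of this kind can be reproduced with a fixed seed. The sketch below only generates index lists; the seed value and the function name `split_indices` are hypothetical choices, not details from the paper.

```python
import random


def split_indices(n_samples, n_test, seed=0):
    """Shuffle record indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(n_samples=1980, n_test=200)
```

Fixing the seed keeps the same 200 test records across runs, which makes comparisons between different networks fair.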
Before training, we need to preprocess the data. Data preprocessing is a fundamental task in data mining. Different types of data have different dimensions. To reduce the impact of different magnitudes and dimensions and to increase the convergence rate, the data are preprocessed with 0–1 (min-max) normalization:
x^{*} = (x − x_{min}) / (x_{max} − x_{min}),
where x^{*} represents the processed data, x is the original data, and x_{max} and x_{min} represent the maximum and minimum values of the obtained data respectively.
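The 0–1 scaling can be written in a few lines. This is a minimal sketch applied to the six reading distances listed in Table 2; the function name is hypothetical, and the guard for a constant column is an added assumption. In practice the minimum and maximum would normally be taken from the training set and reused for the test set.

```python
def min_max_scale(values):
    """Map values linearly to [0, 1] using their minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to normalize
    return [(v - lo) / (hi - lo) for v in values]


reading_distances_m = [1.97, 1.63, 2.18, 2.38, 1.1, 1.28]
scaled = min_max_scale(reading_distances_m)
```

The smallest distance (1.1 m) maps to 0 and the largest (2.38 m) maps to 1, with the other values placed proportionally between them.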
In terms of hardware, we use an Intel i7-7700HQ 2.81 GHz CPU with an NVIDIA GTX 1050 Ti GPU. The GRU networks are implemented using the open-source framework TensorFlow. The learning rate is 0.001, the batch size is 32, and the dimension of the output layer is 1.
To determine the optimal number of building blocks, networks with different numbers of blocks are trained on the dataset; the results are shown in Table 3. When the number of building blocks is 4 or 5, the training error is relatively small: four blocks give the lowest MSE (0.0136), while five blocks give the lowest MAE (0.0485) and MAPE (3.02%). The network with 3 building blocks has a larger MAE and MAPE, which indicates that it is too simple for this regression task and cannot capture the complex nonlinear relationship. When the number of building blocks exceeds 5, the training error begins to increase, as do the number of parameters and the computational complexity. Therefore, the number of blocks in this paper is set to 5.
TABLE 3. Training results for different numbers of blocks.
Blocks | MSE | MAE | MAPE (%) |
---|---|---|---|
3 | 0.0151 | 0.0589 | 3.78 |
4 | 0.0136 | 0.0524 | 3.24 |
5 | 0.0156 | 0.0485 | 3.02 |
6 | 0.0214 | 0.0582 | 3.66 |
7 | 0.0224 | 0.0578 | 3.77 |
8 | 0.0249 | 0.0563 | 3.80 |
9 | 0.0316 | 0.0676 | 4.19 |
By using deep-learning methods to model the nonlinear relationship between the three-dimensional coordinates of the tags and the corresponding reading distances, we predict the reading distance for different distributions of the tag groups.
To prove the training stability of the algorithm, the training loss is shown in Fig. 5. As can be seen, the loss value decreases very quickly and then begins to stabilize. The decreasing trend of loss in the validation set is consistent with that in the training set, indicating that the algorithm in this paper converges quickly and has strong stability.
To study the algorithm’s nonlinear-modeling ability for the three-dimensional spatial distribution of tags and the corresponding recognition distance, a particle-swarm optimization back-propagation neural network (PSO-BP) [28], an extreme learning machine (ELM) [29], and a deep neural network (DNN) [30] are compared to the algorithm of this paper. Figure 6 and Table 4 show the prediction results for different algorithms.
TABLE 4. Prediction error for different algorithms.
Method | MAE (m) | MAPE (%) | MSE (m^{2}) | RMSE (m) |
---|---|---|---|---|
PSO-BP | 0.3832 | 21.5 | 0.2457 | 0.6190 |
ELM | 0.3477 | 22.4 | 0.1696 | 0.5897 |
DNN | 0.0794 | 5.40 | 0.0281 | 0.2818 |
Ours | 0.0485 | 3.02 | 0.0156 | 0.2202 |
Figure 6 shows that the prediction error for this algorithm is smaller than for the other algorithms. From Table 4, it can be seen that the mean absolute percentage error of our algorithm is 3.02%, while the values for PSO-BP, ELM, and DNN are 21.5%, 22.4%, and 5.40% respectively. This indicates that our model’s predicted results are closer to the actual results. In particular, the prediction results for the algorithm in this paper are better than those for DNN, which suggests that the use of dense connections alleviates the vanishing-gradient problem and improves the algorithm’s prediction accuracy. Meanwhile, the mean square error, mean absolute error, and other metrics for the algorithm in this paper are smaller than those for the other algorithms. Consequently, the proposed algorithm can effectively model the nonlinear relationship between tag coordinates and reading distance.
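The error metrics reported in Table 4 follow their standard definitions, which can be computed as in the sketch below. The function name and the two-sample toy data are hypothetical and are not taken from the experiments.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, MAPE (in %), MSE, and RMSE for paired values."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": math.sqrt(mse)}


# Toy reading distances in metres: true values and model predictions.
errors = regression_errors(actual=[2.0, 1.0], predicted=[1.8, 1.1])
```

For the toy data the absolute errors are 0.2 m and 0.1 m, giving an MAE of 0.15 m and a MAPE of 10%. MAPE weights each error by the true distance, so the same absolute error counts more for tags with short reading distances.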
In this paper, we have proposed a method for RFID data collection and analysis based on optical measurement. To address motion blur in the collected images, the paper has proposed a deconvolution method based on nonlocal means. An RFID testing system was designed that uses optical measurement to obtain the reading distance of RFID and the coordinates of the tags. A deep neural network based on dense connections was used to model the relationship between the tag coordinates and the reading distance. Experimental results showed that the mean absolute percentage error for this approach was 3.02%, which was lower than that for the PSO-BP, ELM, and DNN algorithms. This shows that our algorithm can effectively predict the reading distance of the tags. The method proposed in this paper can remove blur from the tag images and predict the reading distance of the tags, which provides an effective method for analyzing tag performance.
The National Natural Science Foundation of China (NSFC) (61771240); China Postdoctoral Science Foundation (2022M711620).
The authors declare no conflict of interest.
The data that support the findings of this study are available from the corresponding author upon reasonable request.