Biomedical Research


Ear biometrics for human classification based on region features mining

Mohd Shafry Mohd Rahim1*, Amjad Rehman1, Fajri Kurniawan1 and Tanzila Saba2

1MaGIC-X UTM-IRDA Digital Media Centre, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia

2College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia

*Corresponding Author:
Mohd Shafry Mohd Rahim
MaGIC-X UTM-IRDA Digital Media Centre
Universiti Teknologi Malaysia, Malaysia

Accepted date: March 7, 2017



Abstract

This paper presents an ear biometric approach to classifying humans. An improved local feature extraction technique based on ear region features is proposed: the ear image is segmented into several regions, and an eigenvector is extracted from each region. The extracted features are normalized and fed to a neural network classifier. The approach is evaluated on the benchmark database from the University of Science and Technology Beijing (USTB), using mutually exclusive sets for training and testing. The results are promising and comparable to the state of the art. However, a few region features exhibited low accuracy, which will be addressed in subsequent research.


Keywords

Feature extraction, Region-based features, Eigenvector, Ear biometrics


Introduction

Biometrics refers to verifying or identifying a person from behavioral or physical characteristics that are specific to the individual. A person can therefore be identified by inherent traits rather than by external credentials such as an ID card or password [1,2].

Numerous automated biometric systems are now used for facial recognition, fingerprint recognition, hand identification, ear identification, and gait analysis, and such systems could significantly improve the effectiveness of forensic work carried out by security forces. The literature confirms that the ear can be used for human identification [3]. Iannarelli [4] showed that the ear is a stable anatomical feature that does not vary significantly throughout human life. Moreover, the ear is usually visible and uncovered to allow good hearing. During biometric enrolment, people are more comfortable having an ear image taken than fingerprint or face images [5]. Unlike the face, the ear is not affected by emotional expression.

Burge and Burger [6] construct a neighborhood graph by transforming the detected edges into Voronoi diagrams. Hurley et al. [7] extract energy features from the ear image. Ansari et al. [8] locate the outer ear from the convex curves of edge regions. Although it yields accurate registrations, their method produces some false-positive matches because occlusions are mistakenly included in the convex region of the ear. Arbab et al. [9] exploit the elliptical shape of the outer ear and achieve promising results when dealing with occlusion. However, the registration accuracy is the same as that of manual registration. Viola et al. [10] propose a robust approach to object detection using Haar-like features. Nevertheless, the method produces inaccurate localization. Abate et al. [11] enhance the method by using the centroid of the ear edges for localization. Unfortunately, it is excessively sensitive to occlusion. In addition, Abdel et al. [12] apply Hausdorff edge template matching to skin-colored regions of an image. However, it requires good lighting conditions to detect the skin region accurately, and hair around the ear can disturb edge detection of the outer ear [13-16].

Although numerous studies have been reported in the literature, unconstrained ear recognition still needs to be improved [17,18]. This study proposes a new method to recognize ear images at different poses. The article focuses on the region-based feature extraction methods developed for ear biometrics applications. Section 2 presents the region-based methods of ear feature extraction, section 3 presents results and analysis, and section 4 draws the conclusion.

Proposed Method

Overview of proposed method

This section presents a robust region-based approach to extracting features from an ear image cropped from a human head. The unconstrained ear images are captured at various angles and include natural occlusions (hair and earrings) and different illumination. However, ear images captured from the back are excluded. The USTB ear images have a standard size of 300 × 400 pixels. Some samples from the USTB ear database [19] are presented in Figure 1.


Figure 1: Ear image samples from USTB database [19].

Images captured under high illumination are too bright, so the edges of the ear are unclear. Pre-processing is therefore performed to enhance image quality: the image contrast is normalized to obtain an ear image with clear edges [20-23]. The methods cited in the previous section are suitable only for the side pose of the ear. Thus, this paper proposes local features that adapt to pose variation of the ear. Accordingly, the ear image is sliced into several regions to extract local and independent features, mainly for the following reasons [24,25].

1. Decomposing the image into regions gives better textural information and therefore contributes to a more accurate recognition system.

2. Partitioning the ear texture into several blocks isolates unclear regions, which reduces the influence of noisy regions to a minimum.

3. Local and global information from every region provides both important texture features and a high level of robustness.

Two geometrical features, the Triangle Ratio Method (TRM) and the Shape Ratio Method (SRM), are used because they have been shown to yield promising accuracy. Furthermore, an eigenvector feature is computed for each region. The proposed method works on small regions to handle pose variance: a different pose makes some parts of the ear unclear or not fully captured, and in such cases only some regions of the ear have significant features [26-30]. Finally, the extracted features are used as training data for the neural classifier. Samples not included in the training data are used as testing data, and the output of the ANN indicates whether the ear image can be identified or not [31-33].


Pre-processing

The ear images differ in contrast, so local rather than global contrast enhancement is applied. Hence, Contrast Limited Adaptive Histogram Equalization (CLAHE), which adapts to local rather than global contrast, is used [34,35]. This method limits the slope of the gray-level mapping to avoid saturation. It uses a clip limit factor to avoid over-saturation of the image, particularly in homogeneous areas that produce high peaks in the histogram [36].
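The clip-limit idea can be illustrated with a minimal sketch for a single image tile. It is a simplification written for this text, not the implementation used in the paper: full CLAHE also interpolates the mappings of neighboring tiles, which is omitted here, and the function names, the one-pass redistribution of the clipped excess, and the plain list-of-rows image format are all assumptions.

```python
def clip_histogram(hist, clip_limit):
    """Clip each bin at clip_limit and spread the excess evenly over all bins."""
    clipped = [min(count, clip_limit) for count in hist]
    excess = sum(hist) - sum(clipped)
    share, remainder = divmod(excess, len(hist))
    # Every bin receives an equal share; the first `remainder` bins get one extra.
    # (Assumption: a single redistribution pass, so a bin may end slightly above the limit.)
    return [c + share + (1 if i < remainder else 0) for i, c in enumerate(clipped)]


def equalize_tile(tile, clip_limit, levels=256):
    """Equalize one tile (rows of ints in 0..levels-1) using a clipped histogram."""
    hist = [0] * levels
    for row in tile:
        for value in row:
            hist[value] += 1
    hist = clip_histogram(hist, clip_limit)
    total = sum(hist)
    # The cumulative distribution of the clipped histogram is the gray-level mapping.
    mapping, running = [], 0
    for count in hist:
        running += count
        mapping.append(round((levels - 1) * running / total))
    return [[mapping[value] for value in row] for row in tile]
```

Clipping flattens the histogram peaks before the cumulative mapping is built, which bounds the slope of the mapping and so limits how strongly a nearly uniform area is stretched.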

Slicing image into region

The following steps are implemented to cut the ear image into several regions.

1. The input image is denoted by an m × n matrix A.


where m and n are the width and height of the image, respectively. The size of a region matrix (mslice and nslice) is calculated as below:



2. Then, six region matrices are derived from matrix A according to the reference sizes mslice and nslice.


3. Finally, each region matrix is expanded by k (k < m, n), which produces overlapping matrices.


The result of the slicing procedure is depicted in Figure 2. It shows the six ear image regions before expansion (Figure 2a) and after expansion with k=10 (Figure 2b). Expanded regions have an overlapping portion taken from their neighbors. Eigenvector features are then extracted from these regions.


Figure 2: Six regions of the ear image.
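The slicing steps above can be sketched as follows. The equations for mslice and nslice are not reproduced in this text, so the sketch assumes a grid of 2 columns by 3 rows, integer division for the region size, and expansion by k pixels on every side with clamping at the image border. The function names and the box format are hypothetical.

```python
def slice_regions(width, height, cols=2, rows=3, k=10):
    """Return six (left, top, right, bottom) boxes; right and bottom are exclusive."""
    m_slice = width // cols   # assumed region width
    n_slice = height // rows  # assumed region height
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * m_slice, r * n_slice
            # The last column and row absorb any remainder pixels.
            right = width if c == cols - 1 else left + m_slice
            bottom = height if r == rows - 1 else top + n_slice
            # Expand by k pixels on every side, clamped to the image border.
            boxes.append((max(left - k, 0), max(top - k, 0),
                          min(right + k, width), min(bottom + k, height)))
    return boxes


def crop(image, box):
    """Cut one region out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Boxes for a 300 x 400 image, the USTB image size stated in the text.
regions = slice_regions(300, 400, k=10)
```

With k=0 the six boxes tile the image exactly; with k=10 each box grows into its neighbors, which gives the overlap described for Figure 2b.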

Feature extraction

In the proposed method, local and global features are extracted from ear images to identify humans. Global features are based on geometrical parameters and properties of the ear edges. In this regard, edge detection is performed to produce edges from the ear images; the extracted edges contain unique information about the geometrical and shape properties of a given ear. The feature vector is then constructed from the extracted edges on the basis of the adopted geometrical parameters [37-39].

Local features are extracted from parts of the ear image. Hence, the slicing procedure explained in the section "Slicing image into region" is performed to split the ear image into parts. Both kinds of features are described below:

1. Two global features originated by Choras [12] are briefly described as follows:

a) Triangle ratio method: This feature represents a contour in the ear image by considering invariant geometrical features. The critical step is selecting the longest contour from the ear edges. Afterward, the triangle ratio is computed using Equation 3.


where hm and hw both represent heights of the triangle, and

w1 and w2 both correspond to the sum of the lengths of two triangle sides.

b) Shape ratio method: This feature is the shape ratio of the main contour. The shape ratio, denoted kk, is given in Equation 4.


where Lc is the contour length given by Equation 5 and dkp is the length of the line connecting the end points of each contour, given by Equation 6.



Here Q is the number of contour points, c is the contour number (c = 1, ..., C), (x, y) are the coordinates of contour points, and q is the index of the current contour point.
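As an illustration of the shape ratio, the sketch below computes it for one contour given as a list of (x, y) points. Equations 4 to 6 are not reproduced in this text, so two assumptions are made and should be checked against the original: the contour length is the sum of Euclidean distances between consecutive points, and the ratio is the contour length divided by the end-point distance. The function names are hypothetical.

```python
import math


def contour_length(points):
    """Sum of Euclidean distances between consecutive contour points (assumed Lc)."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def endpoint_distance(points):
    """Length of the straight line joining the first and last contour points (dkp)."""
    (x1, y1), (x2, y2) = points[0], points[-1]
    return math.hypot(x2 - x1, y2 - y1)


def shape_ratio(points):
    """Assumed form of the shape ratio: contour length over end-point distance."""
    d = endpoint_distance(points)
    if d == 0:
        raise ValueError("end points coincide, so the ratio is undefined")
    return contour_length(points) / d


# A bent contour: two segments of length 5 whose end points are 6 apart.
ratio = shape_ratio([(0, 0), (3, 4), (6, 0)])
```

Under these assumptions a straight contour has a ratio of 1, and the value grows as the contour becomes more curved.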

2. Local features: Since the early 1990s, eigenvectors have been commonly adopted in the field of face recognition [26] and have been successfully implemented in various systems [27-29]. Hence, this study uses eigenvectors as features for the local ear image regions. In general, the eigenvector approach transforms a high-dimensional image into a lower-dimensional space in which it is recognized, and in which the variance becomes clear and measurable. The eigenvalues and eigenvectors are defined in Equation 7.


where C denotes the average covariance matrix of the images, λ denotes the eigenvalues of C, and u denotes the eigenvectors. The eigenvector of each slice is extracted as follows:

a) The ear slices are grouped into six image sets, one per region. For example, region I of every ear image in the training set belongs to the same set, and likewise for the other regions.

b) The average vector of each region image set is calculated by summing all the column vectors and dividing by the number of images. Equation 8 describes the average vector.


c) Using the above vector ψ, the mean-normalized image can be derived using Equation 9, where τi represents the difference between image i and the set average. Afterward, C can be calculated using Equation 10.
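Equations 7 to 10 do not appear in this text. For orientation, the standard eigenface formulation that matches the description is sketched below. It is an assumed reconstruction rather than the authors' exact notation: Γ_i (the column vector of the i-th image in a region set) and M (the number of images in the set) are symbols introduced here, while ψ, τ_i, C, λ and u follow the surrounding text.

```latex
\psi = \frac{1}{M}\sum_{i=1}^{M}\Gamma_i, \qquad
\tau_i = \Gamma_i - \psi, \qquad
C = \frac{1}{M}\sum_{i=1}^{M}\tau_i\,\tau_i^{T}, \qquad
C\,u = \lambda\,u
```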



The eigenvectors with the highest eigenvalues represent the significant features of the image set and can therefore be used to approximate any image in the set. In this study, each slice has an eigenvector feature that consists of six eigenvalues.
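The procedure can be illustrated with a small pure-Python sketch that builds the covariance matrix of a set of vectors and finds its dominant eigenpair by power iteration. The tiny two-element vectors stand in for flattened region images, and the function names, the seeded random start vector, and the choice of power iteration are assumptions made for illustration. The sketch returns only the largest eigenvalue, whereas the study keeps six per slice; in practice a library routine such as `numpy.linalg.eigh` would be used to obtain all of them.

```python
import math
import random


def mean_vector(vectors):
    """Average vector of the set (element-wise mean)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def covariance(vectors):
    """Average of the outer products of the mean-subtracted vectors."""
    psi = mean_vector(vectors)
    diffs = [[x - m for x, m in zip(v, psi)] for v in vectors]
    dim, n = len(psi), len(vectors)
    return [[sum(d[i] * d[j] for d in diffs) / n for j in range(dim)]
            for i in range(dim)]


def dominant_eigenpair(matrix, iterations=200, seed=0):
    """Power iteration: returns the largest eigenvalue and a unit eigenvector."""
    dim = len(matrix)
    rng = random.Random(seed)  # seeded so the result is repeatable
    u = [rng.random() for _ in range(dim)]
    for _ in range(iterations):
        w = [sum(row[j] * u[j] for j in range(dim)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        u = [x / norm for x in w]
    # Rayleigh quotient of the unit vector u gives the eigenvalue estimate.
    lam = sum(u[i] * sum(matrix[i][j] * u[j] for j in range(dim))
              for i in range(dim))
    return lam, u


samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]  # toy data lying along (1, 2)
C = covariance(samples)
lam, u = dominant_eigenpair(C)
```

Because the toy samples lie on one line, all of the variance falls on a single eigenvector, which points along (1, 2).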


Classification

The training set is normalized and fed to the neural classifier for the training phase. A number of experiments were conducted with the USTB ear database to obtain the optimal ANN structure [40-42]. The optimal structure of the ANN is shown in Table 1.

Input         Hidden        Output       Momentum/learning rate
64 neurons    31 neurons    8 neurons    0.05

Table 1. Neural network optimal structure.
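To make the layer sizes in Table 1 concrete, the sketch below runs one forward pass through a 64-31-8 network. Only the layer sizes come from the table. The sigmoid activation, the bias terms, the min-max normalization, and the random untrained weights are assumptions, and training with the listed momentum/learning rate is not shown.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """One weight row per output neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, inputs):
    """Sigmoid layer (assumed activation): weighted sum plus bias for each neuron."""
    outputs = []
    for weights in layer:
        # zip stops at the end of `inputs`, so the trailing bias is added separately.
        z = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


def min_max_normalize(features):
    """Scale a feature vector to the range 0..1 (assumed normalization)."""
    lo, hi = min(features), max(features)
    if hi == lo:
        return [0.0] * len(features)
    return [(x - lo) / (hi - lo) for x in features]


rng = random.Random(0)
hidden = make_layer(64, 31, rng)  # 64 inputs -> 31 hidden neurons
output = make_layer(31, 8, rng)   # 31 hidden -> 8 output neurons
features = min_max_normalize([float(i) for i in range(64)])  # placeholder features
scores = forward(output, forward(hidden, features))
```

With these sizes the network has 31 × 65 + 8 × 32 = 2271 trainable parameters, counting biases.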

Result and Discussion

The proposed identification technique is tested on the University of Science and Technology Beijing (USTB) ear database [11]. The USTB database contains a multi-pose ear data set composed of 79 subjects. The ear images are taken at angles in steps of 5° over the range 0° to 45°, under different lighting conditions. In this study, only a small set of 750 ear images of 142 individuals with various poses or views is considered for the experiment. Of these, 600 ear images are used for training and the remaining images are used for testing and evaluation. The experiments are conducted in two parts. In the first experiment, ear identification is performed with global features only. In the second, identification is performed with a fusion of the global features and the eigenvector features extracted from the sliced regions (local features). Table 2 shows that the fused local and global features achieve a better recognition rate. However, the additional features also increase computation cost.

Extracted features     ANN structure    Number of features    Accuracy
TRM-SRM                I                64                    86.67%
TRM-SRM-Eigenvector    II               64+36                 93.33%

Table 2. Recognition accuracy of the extracted feature sets.
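The figures in the table are consistent with a simple calculation, sketched below. The counts of 130 and 140 correctly identified images are inferred here from the reported percentages and the 150 remaining test images; they are not stated in the paper. The 36 additional features match six regions with six eigenvalues each.

```python
total_images = 750
training_images = 600
testing_images = total_images - training_images  # 150 images left for testing

# Six regions with six eigenvalues each give the 36 local features of structure II.
local_features = 6 * 6
fused_features = 64 + local_features

# Inferred (not reported) numbers of correct identifications.
for label, correct in [("TRM-SRM", 130), ("TRM-SRM-Eigenvector", 140)]:
    accuracy = 100.0 * correct / testing_images
    print(f"{label}: {correct}/{testing_images} = {accuracy:.2f}%")
```

Under this reading, adding the local eigenvector features corrects 10 more of the 150 test images.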


Conclusion

This paper has presented a feature extraction method based on local and global features. The local features are extracted from small regions rather than the whole image in order to handle pose variation. To this end, a simple and effective slicing procedure is presented. Following the pre-processing stage, an edge detection algorithm is applied and global features, namely the Triangle Ratio Method (TRM) and the Shape Ratio Method (SRM), are extracted from the detected edges. To extract local features, the image is sliced into six regions. The combined feature vector is fed to the neural classifier to identify the human ear.