Feature Extraction Techniques for Leukocyte Classification - A Review

This paper reviews the feature extraction techniques used to classify leukocytes in blood sample images. Identifying and counting the different classes of leukocytes plays a vital role in the early detection of various diseases, yet manual estimation of white blood cells (WBCs) by pathologists is error-prone and time-consuming. The paper concentrates on the leukocyte classification methodology and on feature extraction techniques for four classes of leukocytes, namely neutrophils, lymphocytes, monocytes and eosinophils; the extracted features can then be fed to an SVM or a neural network for classification.


Introduction
Blood is the body fluid that delivers necessary substances such as nutrients and oxygen to the cells. It consists of several components: plasma, platelets, red blood cells (RBCs) and white blood cells (WBCs). Plasma, the main component, consists of water with ions, nutrients and other dissolved substances, and makes up about 55% of blood volume. Platelets are responsible for blood clotting. RBCs carry oxygen and carbon dioxide. WBCs, also called leukocytes, are an important part of the immune system, which is essential for the survival of the human body, and they ensure that the immune system works correctly. WBCs constitute less than 1% of blood volume. They are larger than RBCs and have a normal nucleus and mitochondria. There are five main types of WBCs, which fall into two groups: granulocytes and agranulocytes. The granulocytes are neutrophils, eosinophils and basophils, which show granules in their cytoplasm when viewed under a microscope after staining. The agranulocytes are monocytes and lymphocytes, which do not show such granules. Maintaining the desired WBC count is important. A low WBC count can be caused by conditions such as aplastic anemia, HIV/AIDS, hypersplenism, Kostmann's syndrome, leukemia, lupus, malnutrition and vitamin deficiencies, rheumatoid arthritis and tuberculosis [20]. The classification and counting of WBCs are performed manually by pathologists, which is a tedious and time-consuming process. This paper covers the feature extraction methods for leukocyte classification that various researchers have adopted on the basis of digital image processing techniques. The image dataset is the one provided by Sarrafzadeh et al. [3].

Related Work
Muhammad Sajjad et al. [1] concentrated on the classification of leukocytes (WBCs), which are regarded as the basic building blocks of the human immune system. Their work deals with multi-class classification using features derived from textural, wavelet-transform and statistical properties. These features are fed to an ensemble multi-class SVM that assigns leukocytes to the classes neutrophil, lymphocyte, monocyte, eosinophil and basophil. The dataset used in their experiments was gathered from HMC, and the work reports an accuracy of 94.3% for WBC classification. Lin He et al. [6] used a Discriminative Low-Rank Gabor Filter (DLRGF) method for the spectral-spatial classification of images; the methodology is based on classification using DLRGF_SVM and DLRGF-LS. Lata A. Bhavnani et al. [5] counted white blood cells and red blood cells (RBCs) using digital image processing and compared methods to identify the most accurate one. The techniques studied are segmentation using Otsu's thresholding, erosion, edge detection, watershed and the Hough transform. Margarita Gamarra et al. [7] carried out a detailed study of trends in cell image processing, with a detailed comparison of feature extraction and segmentation techniques. Their work compares segmentation techniques such as edge-based segmentation, thresholding and clustering, together with color-based feature extraction for cell identification. Xi Yin et al. [8] proposed an approach for predicting TMB segments using a sparse coding algorithm; it relies on the Position Specific Scoring Matrix and the Z-coordinate score for feature selection.
Another relevant application in which image processing techniques play a vital role is the detection of dengue fever by estimating the platelet count from microscopic blood image samples. J. Poornima et al. [4] conducted research in this area using segmentation and morphological techniques to count the platelets in blood samples.
The automated approach to leukocyte classification is depicted in Fig. 1 and Fig. 2. The input to the system is a microscopic blood sample image. The system consists of two phases, a training phase and a testing phase, and the first three steps are the same in both: preprocessing of the image, followed by segmentation and feature extraction. In the training phase, the features extracted from the image are the parameters used to train a neural network for leukocyte classification, and the system is trained on the different classes of leukocytes. In the testing phase, the extracted features are the parameters used to identify the class of a given leukocyte. Preprocessing is mainly used for noise removal and image enhancement; histogram equalization and various types of filters are the commonly used techniques [9]. Segmentation identifies borders so that the leukocytes can be isolated from the other components of the blood. The most commonly used segmentation methods are edge and border detection algorithms, region growing, filtering, mathematical morphology and watershed segmentation [10]. Segmentation is also used to separate the nucleus and the cytoplasm of a leukocyte [11]. Feature extraction is the next step after segmentation. Fig. 3 shows examples of the subtypes of leukocytes.
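As an illustration of the preprocessing step, histogram equalization can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the image is a list of rows of integer gray levels, and the function name equalize_histogram and the toy patch are hypothetical, not taken from any of the surveyed systems.

```python
def equalize_histogram(image, levels=256):
    """Spread the gray levels of `image` (rows of ints in [0, levels)) over the full range."""
    histogram = [0] * levels
    for row in image:
        for value in row:
            histogram[value] += 1

    # Cumulative distribution of the gray levels.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # constant image: nothing to stretch
        return [list(row) for row in image]

    # Map each gray level through the normalised cumulative distribution.
    mapping = [
        round(max(c - cdf_min, 0) / (total - cdf_min) * (levels - 1))
        for c in cdf
    ]
    return [[mapping[value] for value in row] for row in image]


# Hypothetical low-contrast 2x3 patch with 8 gray levels.
patch = [[2, 2, 3],
         [3, 3, 4]]
equalized = equalize_histogram(patch, levels=8)
```

The three occupied gray levels of the toy patch are stretched to the ends and interior of the available range, which is the contrast enhancement effect sought before segmentation.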

Existing feature extraction techniques for the Classification of Leukocytes
Feature extraction is the most crucial step in the classification process, since the features are the descriptors of an image and capture the natural similarities between images. The classifier uses the features together with their labels to match different images and then assign those images to their corresponding classes [1]. Three main types of features can be extracted from an image: color features, shape features and texture features. The most commonly used feature extraction techniques are as follows:

Gray Level Co-Occurrence Matrix (GLCM)
The texture features of an image can be obtained with the GLCM. In this technique, texture is modelled as a two-dimensional variation of gray levels, which can be represented as a two-dimensional array. The frequency with which different combinations of pixel values occur in an image is tabulated as a co-occurrence matrix [12]. The GLCM texture features that describe the texture of the image include energy, correlation, contrast, variance, sum average, homogeneity (inverse difference moment), sum variance, difference variance, sum entropy, difference entropy, entropy and the information measures of correlation [2]. For leukocytes, the texture features are generally computed separately for the nucleus and the cytoplasm.
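The tabulation of the co-occurrence matrix and a few of the listed features can be sketched as follows. This is a minimal illustration, assuming a single pixel offset (one step to the right), a non-symmetric matrix and a toy patch with four gray levels; the function names glcm and glcm_features are hypothetical, and only four of the features are computed.

```python
import math


def glcm(image, dx=1, dy=0, levels=4):
    """Normalised co-occurrence matrix for pixel pairs separated by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def glcm_features(p):
    """A subset of the texture features, computed from a normalised GLCM."""
    levels = len(p)
    cells = [(i, j, p[i][j]) for i in range(levels) for j in range(levels)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }


# Toy 4x4 patch with gray levels 0..3 (illustrative values only).
toy_patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(toy_patch)
features = glcm_features(p)
```

In practice the matrix is usually accumulated over several offsets and directions, and the same computation would be run once on the nucleus region and once on the cytoplasm region.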

Geometric Feature Extraction
The shape of a leukocyte can be described by extracting geometric features such as area, perimeter, convex area, solidity, major axis length, minor axis length, orientation, filled area, eccentricity, the ratio between the cell and nucleus areas, rectangularity of the nucleus, circularity of the cell, number of lobes and mean gray-level intensity of the cytoplasm [13][14].
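A few of these geometric features can be computed directly from binary masks of the segmented cell and nucleus. The sketch below is illustrative only: the masks are hypothetical toy data, the perimeter is approximated by counting exposed pixel edges, circularity is taken as 4*pi*area/perimeter^2, and rectangularity as the area divided by the bounding-box area. Other definitions of these quantities are in use.

```python
import math


def area(mask):
    """Number of foreground pixels."""
    return sum(sum(row) for row in mask)


def perimeter(mask):
    """Count pixel edges that border background or the image boundary."""
    rows, cols = len(mask), len(mask[0])
    edges = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if not inside or not mask[rr][cc]:
                    edges += 1
    return edges


def circularity(mask):
    """Equals 1 for an ideal disc and decreases for irregular shapes."""
    return 4 * math.pi * area(mask) / perimeter(mask) ** 2


def rectangularity(mask):
    """Foreground area divided by the area of the bounding box."""
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    height = max(r for r, _ in points) - min(r for r, _ in points) + 1
    width = max(c for _, c in points) - min(c for _, c in points) + 1
    return area(mask) / (height * width)


# Hypothetical masks: a roughly round cell and its 2x2 nucleus.
cell = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
nucleus = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
cell_to_nucleus_ratio = area(cell) / area(nucleus)
```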

RST Moments
RST moments are used to extract shape features. The technique uses region-based moments that are invariant to the geometric transformations of rotation, scaling and translation (RST). The regular moment invariants were proposed by Hu [19] and are based on algebraic moments [12].
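The invariance can be illustrated with the first two of Hu's invariants, phi1 = eta20 + eta02 and phi2 = (eta20 - eta02)^2 + 4*eta11^2, where eta_pq are the normalised central moments. The sketch below is a minimal example on a hypothetical binary shape; the function names are illustrative, and only translation and a 90-degree rotation are checked, since scale invariance holds only approximately on a discrete pixel grid.

```python
def raw_moment(img, p, q):
    """Raw geometric moment m_pq, with x as column index and y as row index."""
    return sum(
        (x ** p) * (y ** q) * v
        for y, row in enumerate(img)
        for x, v in enumerate(row)
    )


def hu_first_two(img):
    """First two Hu invariants from normalised central moments."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00

    def eta(p, q):
        mu = sum(
            ((x - xc) ** p) * ((y - yc) ** q) * v
            for y, row in enumerate(img)
            for x, v in enumerate(row)
        )
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2


def shift(img, dy, dx):
    """Translate the shape by padding zeros on the top and left."""
    width = len(img[0]) + dx
    return [[0] * width for _ in range(dy)] + [[0] * dx + list(row) for row in img]


# Hypothetical L-shaped region standing in for a segmented nucleus.
shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
shifted = shift(shape, 2, 3)
rotated = [list(row) for row in zip(*shape[::-1])]  # 90 degrees clockwise
```

Calling hu_first_two on shape, shifted and rotated gives the same pair of values up to floating-point rounding.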

Pseudo Zernike (PZ) Moments
PZ moments are orthogonal, complex-valued moments that can represent a two-dimensional function within the unit circle. They are derived from the PZ polynomials, which are orthogonal to each other.
The magnitudes of the PZ moments are invariant to image rotation. These moments have low sensitivity to noise and are mostly used for pattern recognition [15].
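The rotation invariance of the magnitudes can be checked numerically. The sketch below assumes the commonly used form of the pseudo-Zernike radial polynomial, R_nm(rho) = sum over s from 0 to n-|m| of (-1)^s (2n+1-s)! rho^(n-s) / (s! (n-|m|-s)! (n+|m|+1-s)!), and a simple mapping of pixel centres of a square image into the unit circle. The function names and the toy shape are hypothetical.

```python
import cmath
import math
from math import factorial


def pz_radial(n, m, rho):
    """Pseudo-Zernike radial polynomial R_nm(rho), valid for |m| <= n."""
    m = abs(m)
    return sum(
        (-1) ** s * factorial(2 * n + 1 - s)
        / (factorial(s) * factorial(n - m - s) * factorial(n + m + 1 - s))
        * rho ** (n - s)
        for s in range(n - m + 1)
    )


def pz_moment(img, n, m):
    """Discrete approximation of the PZ moment of a square image."""
    size = len(img)
    total = 0j
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            # Map the pixel centre into [-1, 1] x [-1, 1].
            x = (2 * c + 1 - size) / size
            y = (2 * r + 1 - size) / size
            rho = math.hypot(x, y)
            if v and rho <= 1.0:  # pixels outside the unit circle are ignored
                theta = math.atan2(y, x)
                total += v * pz_radial(n, m, rho) * cmath.exp(-1j * m * theta)
    pixel_area = (2 / size) ** 2
    return (n + 1) / math.pi * total * pixel_area


# Hypothetical binary shape and its 90-degree rotation.
shape = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
rotated = [list(row) for row in zip(*shape[::-1])]

magnitude = abs(pz_moment(shape, 2, 1))
magnitude_rotated = abs(pz_moment(rotated, 2, 1))
```

The rotation changes only the phase of each moment, so the two magnitudes agree up to rounding. For rotations that are not multiples of 90 degrees, resampling error makes the agreement approximate.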

Discrete Wavelet Transform (DWT)
The DWT is used to extract features from the frequency domain. It is applied along each dimension of the two-dimensional blood image, which produces four sub-bands: LL, LH, HL and HH.
Since the statistically rich features are present in the LL band, a level-3 decomposition is used for feature extraction; that is, after the first decomposition the DWT is applied two more times to the LL band [1].
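The repeated decomposition of the LL band can be sketched with a Haar wavelet. The choice of wavelet, the unnormalised average/difference filters, the sub-band naming order and the use of mean and standard deviation as band statistics are all assumptions made for illustration; the surveyed work may use a different wavelet and different statistics.

```python
import statistics


def haar_1d(values):
    """One Haar step: pairwise averages (low) and half-differences (high)."""
    half = len(values) // 2
    low = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    high = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return low, high


def haar_2d(img):
    """Apply the Haar step along rows, then along columns; returns LL, LH, HL, HH."""
    lows, highs = zip(*(haar_1d(row) for row in img))

    def along_columns(block):
        cols = [haar_1d(col) for col in zip(*block)]
        low = [list(r) for r in zip(*(lo for lo, _ in cols))]
        high = [list(r) for r in zip(*(hi for _, hi in cols))]
        return low, high

    ll, lh = along_columns(lows)
    hl, hh = along_columns(highs)
    return ll, lh, hl, hh


def decompose(img, levels=3):
    """Re-apply the 2-D transform to the LL band `levels` times in total."""
    ll, details = img, []
    for _ in range(levels):
        ll, lh, hl, hh = haar_2d(ll)
        details.append((lh, hl, hh))
    return ll, details


def band_stats(band):
    """Mean and population standard deviation of one sub-band."""
    values = [v for row in band for v in row]
    return statistics.mean(values), statistics.pstdev(values)


# Hypothetical 8x8 gray image (side length must be divisible by 2**levels).
image = [[r * 8 + c for c in range(8)] for r in range(8)]
ll, details = decompose(image, levels=3)
features = [band_stats(ll)] + [band_stats(b) for level in details for b in level]
```

With these unnormalised filters the final LL band of the 8x8 toy image is a single value equal to the image mean.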

Local Directional Pattern (LDP)
The LDP is computed at each pixel position from the relative edge response values of the pixel in all eight directions. The LDP code is an 8-bit binary code. In LDP-based feature extraction, the LDP operator transforms a gray image into an LDP-labelled image in which the value of each pixel is the LDP code computed for the pixel at the same position in the original image. LDP was proposed to overcome the disadvantages of the Local Binary Pattern (LBP) [16].
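The code computation for a single pixel can be sketched as follows. The sketch assumes the common formulation in which the eight edge responses are obtained with Kirsch masks and the bits of the k = 3 responses with the largest absolute value are set to 1. The mask ordering, the bit ordering, the tie-breaking rule and the function names are assumptions made for this example.

```python
# Neighbour offsets (row, col) in the order E, NE, N, NW, W, SW, S, SE.
RING = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
# Kirsch east mask written along the same ring; the other seven are rotations.
EAST = [5, 5, -3, -3, -3, -3, -3, 5]


def kirsch_responses(img, r, c):
    """Edge responses of pixel (r, c) in the eight compass directions."""
    ring = [img[r + dr][c + dc] for dr, dc in RING]
    responses = []
    for i in range(8):
        weights = EAST[-i:] + EAST[:-i]  # rotate the mask by i positions
        responses.append(sum(w * v for w, v in zip(weights, ring)))
    return responses


def ldp_code(img, r, c, k=3):
    """8-bit code with a 1 for each of the k strongest directional responses."""
    magnitudes = [abs(m) for m in kirsch_responses(img, r, c)]
    # Stable sort: ties are broken in favour of the lower direction index.
    top = sorted(range(8), key=lambda i: magnitudes[i], reverse=True)[:k]
    return sum(1 << i for i in top)


def ldp_image(img, k=3):
    """LDP-labelled image for all interior pixels."""
    return [
        [ldp_code(img, r, c, k) for c in range(1, len(img[0]) - 1)]
        for r in range(1, len(img) - 1)
    ]


# Illustrative 3x3 gray-level neighbourhood.
patch = [
    [85, 32, 26],
    [53, 50, 10],
    [60, 38, 45],
]
code = ldp_code(patch, 1, 1)
```

A histogram of the codes in the labelled image is then typically used as the feature vector.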

Gradient Inverse Coefficient of Variation (GICOV)
In this technique, feature extraction is based on the GICOV score. The GICOV function assigns a score to each candidate contour, and the contour with the highest score is taken as the true boundary of the leukocyte. GICOV is an edge-based scoring statistic that relies on the edge strength of a leukocyte remaining roughly constant around its boundary. The score is the ratio of the mean to the standard deviation of the image gradient measured in the outward normal direction along the contour [17].
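The scoring of a candidate contour can be sketched as below. This is a simplified illustration under several assumptions: the contour is a circle, the outward normal derivative is estimated by a central difference with nearest-pixel sampling, the cell is darker than its background, and the score is the plain mean-to-standard-deviation ratio described above. The synthetic image and all function names are hypothetical.

```python
import math
import statistics


def gicov(image, centre, radius, samples=32):
    """Mean / standard deviation of the outward normal derivative on a circle."""
    centre_row, centre_col = centre
    derivatives = []
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        n_row, n_col = math.sin(angle), math.cos(angle)  # unit outward normal
        outer = image[round(centre_row + (radius + 1) * n_row)][
            round(centre_col + (radius + 1) * n_col)
        ]
        inner = image[round(centre_row + (radius - 1) * n_row)][
            round(centre_col + (radius - 1) * n_col)
        ]
        derivatives.append((outer - inner) / 2)  # central difference
    mean = statistics.mean(derivatives)
    spread = statistics.stdev(derivatives)
    if spread == 0:  # perfectly uniform response: avoid division by zero
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / spread


def synthetic_cell(size=40, centre=(20, 20), radius=8):
    """Dark disc with a smooth edge on a bright background (toy data)."""
    return [
        [
            1 / (1 + math.exp(-(math.hypot(r - centre[0], c - centre[1]) - radius)))
            for c in range(size)
        ]
        for r in range(size)
    ]


image = synthetic_cell()
score_true = gicov(image, centre=(20, 20), radius=8)    # contour on the boundary
score_offset = gicov(image, centre=(20, 26), radius=8)  # contour shifted sideways
```

The contour that follows the true boundary sees a strong and nearly constant gradient, so it receives a much higher score than the shifted contour.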

Linear Discriminant Analysis (LDA)
In linear discriminant analysis, the multidimensional dataset is reduced to six dimensions using Fisher's linear discriminant. LDA computes the linear combinations of the features that best separate the WBCs into their different classes [18].
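The principle behind Fisher's linear discriminant can be shown on a deliberately reduced case: two classes, two features and a single projection axis w = Sw^-1 (m_b - m_a), where Sw is the within-class scatter matrix and m_a, m_b are the class means. This is only a sketch of the idea, not the six-dimensional multi-class reduction described above; the feature values and function names are hypothetical.

```python
def mean_vector(points):
    """Component-wise mean of a list of 2-D feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def scatter(points, mean):
    """2x2 scatter matrix of the points around their mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mean[0], p[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Fisher's discriminant direction for two classes in two dimensions."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(points, w):
    """Project each feature vector onto the direction w."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]


# Hypothetical two-feature measurements for two leukocyte classes.
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.5)]
class_b = [(6.0, 5.0), (7.0, 7.5), (8.0, 6.0), (7.0, 6.5)]

w = fisher_direction(class_a, class_b)
proj_a, proj_b = project(class_a, w), project(class_b, w)
```

On this toy data the projections of the two classes do not overlap, so a single threshold on the projected value separates them.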

Conclusion
This survey covers various image processing techniques for the statistical estimation of human blood properties such as WBC counts. The leukocyte detection and classification system and the feature extraction methods adopted by several researchers are discussed, including the techniques for shape and texture feature extraction. GLCM and geometric feature extraction are the most commonly used feature extraction methods for leukocyte classification.