Coset decomposition method for storing and decoding fingerprint data

Biometrics such as fingerprints, irises, faces, voice, gait and hands are often used for access control, authentication and encryption instead of PINs and passwords. In this paper a syndrome decoding technique is proposed to provide a secure means of storing and matching biometric data. We apply an algebraic coding technique called coset decomposition to a model of fingerprint biometrics. The algorithm, which reveals whether registered and probe fingerprints match, is modeled and implemented in MATLAB.


Introduction
With the increasing integration of computers and the internet into individuals' lives, it is essential to protect sensitive and personal data. Biometric technologies are becoming the foundation of a wide array of highly secure identification and personal verification solutions; see for instance [1] and [3]. Biometrics is a largely automatic method of distinguishing a person based on a physiological or behavioral characteristic. Examples of physiological characteristics include finger images, hands, facial characteristics and irises. Behavioral characteristics are traits which are learned or acquired, such as dynamic signature verification, speaker verification, gait, and keystroke dynamics. Biometric identification has several advantages over traditional methods such as ID cards (tokens) or PINs (passwords) [8]: the user to be identified has to be physically present at the point of identification, and identification based on biometric techniques avoids the need to carry a token or to remember a password. There are many types of biometrics currently in use, and more are expected to come, such as DNA and holograms.
Several important issues have to be taken into consideration in order to design a practical biometric system. For instance, a user must be enrolled in the system so that his or her biometric template (or reference) can be captured. This template is securely stored in a central database or on a smart card issued to the user. The template is used for matching when a user needs to be identified or authorized to log in to a system. It should be noted that a biometric system can operate in either verification (authentication) mode or identification mode.
Fingerprints, the patterns of friction ridges and valleys on an individual's fingertips, are unique for each finger of every person, even for identical twins [4]. For decades, identity has been determined and classified by matching certain points of ridge endings and bifurcations. For example, low-cost fingerprint recognition devices for laptop, desktop and cell phone access are now widely available from various vendors. Users no longer need to type passwords; instead, the user's fingerprint provides immediate entry.
Error-correcting codes, on the other hand, provide a systematic way to send messages with some additional information (check digits) in such a manner that an error occurring in the original message will not only be noticed (detected) by the receiver but, in many instances, may also be corrected. For error correction to be effective, the decoding problem must be efficiently solvable. For security and matching reasons, it is desirable that the biometric information is stored in encrypted form rather than in plain text. Similar to the problem of receiving and verifying a message through a noisy channel, when a user wants to access the system, the access device should allow access as long as the biometric does not vary by more than a fixed number of binary digits. Many techniques which employ coding theory to tackle the 'secure biometric storage problem' have been proposed. For instance, Martinian et al. [5] suggested an information-theoretic solution based on the Slepian-Wolf theorem.
The purpose of this paper is to develop an algorithmic method (called the coset decomposition method) that uses syndrome bits for the secure storage and decoding of fingerprint data. In general, the syndrome bits should contain sufficient information about the preserved fingerprint (or template) to infer the user's information. Thus, the syndrome bits should have enough attributes that the fingerprint is securely stored and then matched in a later phase. The implemented algorithm is intended to overcome the difficulty that a person who has access to the system may not match any of the stored fingerprint information. The algorithm should therefore compute a bit string which grants access to the system even though the bit string is not close to any fingerprint information. For additional information about the syndrome decoding of biometric information the reader is referred to [2], [7], and [15].

Fingerprints model
A fingerprint, as the name indicates, is the impression left by a finger because of the pattern formed on the skin of the palms and fingers. It is completely formed at about seven months of fetal development, and finger ridge configurations remain unchanged throughout life. With age these marks become more prominent, but the pattern and the structures present in those fine lines do not undergo any change. Moreover, each of an individual's ten fingerprints is different from the others as well as from those of every other person. Because of their persistence and uniqueness, fingerprints have long been used not only for identification but also in the field of security, for example in criminal and forensic investigation [17]. In general, every fingerprint consists of ridges and furrows, where the ridges are thick lines and the furrows are the spaces between two ridges. The biological principles of fingerprints therefore rest on the individual epidermal ridges and furrows, which have different characteristics for different fingers. It should be noted that the configurations and types differ only within limits that allow for systematic classification. Here we use a minutiae-based representation, which might contain more global attributes such as the position and orientation of the finger, the fingerprint class, etc. As we will see below, a prevalent technique for working with fingerprint data is to extract a set of minutiae points and to perform some operations on them.
Each biometric identifier has its own distinguishing features that can be exploited for identification purposes. In the case of fingerprints, the most important features are the ridge configurations: the way the ridge lines and the valleys between them are arranged. The configuration of the ridge lines can be analyzed at three different levels: global, local and micro. At the global level, attention is directed to regions in which ridge lines take shapes of high curvature. These are called singularities or singular regions and can be divided into three essential types: loop, delta and whorl. The patterns on the ten fingertips should all be different, but they might also have some similar features. We can find loops, whorls and arches on our fingertips. Some fingertips have only one singularity, but some have two types of singularity [11]. At the local level, attention is paid to the individual ridge lines. A ridge line can be discontinuous in various ways. For example, it can come to an end suddenly, or it can divide into two ridges. The aim is to identify the points where a ridge line is discontinuous. These points are called minutiae. Many types of minutiae can be identified from fingerprints, but the most common ones are termination, bifurcation, lake, independent ridge, island or point, spur and crossover.
At present, fingerprints are captured digitally by scanning the user's fingertip. The scanning process is simple and rapid. Fingerprint sensors, which all work in a similar way, are specifically designed to capture details of the fingertip. The sensing elements are ordinarily arranged in a two-dimensional array and covered by a transparent coat of glass or plastic [10]. The most common sensor types are optical and solid state. Optical sensors work by shining light onto the fingertip, which is placed on the transparent sensing surface of the sensor, and detecting the light that is reflected back onto the light-sensitive elements. The ridges, which are in contact with the sensing surface, either scatter or absorb the light and consequently appear dark. In contrast, the valleys, which are the gaps between ridges, appear lighter because they are at a distance from the surface and so allow the light to be reflected to the light-sensitive elements. Solid-state sensors, on the other hand, were primarily designed to reduce the physical size as well as the cost of the sensors. The concept was to build an all-in-one silicon chip with a 2-D sensor array placed directly on the chip. To provide fingerprint images, users simply touch the sensing surface of the chip. The idea of solid-state sensors is to transform thermal, capacitive, piezoelectric or electric field information into electrical signals. Because of their simplicity and low cost, capacitive sensors are the most common type used [6].
Fingerprint identification is the oldest biometric method that has been successfully used in various computer systems. Fingerprint matching is the process of evaluating the degree of similarity (or difference) of two given fingerprints. One difficulty faced in the matching process is that fingerprints from different fingers can be similar. The differences between fingerprints from different fingers are known as inter-class differences, so problems occur whenever inter-class differences are small. Another difficulty is that fingerprints from the same finger can differ. These differences are known as intra-class differences, so problems occur whenever intra-class differences are large. Intra-class variations are particularly problematic, as they are much more likely to occur. There are several reasons for intra-class variations: displacement (different parts of the fingertip are presented to the sensor); rotation (the fingertip is presented to the sensor at a different angle); pressure of the impression (the finger is pressed on the sensor with a different force); skin condition (on different occasions the fingertip may be dry, wet, scratched or dirty); condition of the sensor surface (on different occasions the surface may be clean, dirty or greasy); and feature extraction accuracy. In general, matching procedures for fingerprints are categorized into minutiae-based matching, correlation-based matching and ridge-feature-based matching. For instance, correlation-based matching works by superimposing one image on another and changing their alignment until the correlation between the corresponding pixels of the two images is maximized. This is an intuitive method of matching fingerprints, but the time and resources required to match the images pixel by pixel are huge. Sometimes, when the quality of the fingerprint images is poor, minutiae extraction is difficult.
The outcome of the matching process could be a similarity value, or it could be a decision of either match or no match. Either way, an algorithm is needed to evaluate the overall difference between the two fingerprints [9]. When the outcome of the matching is required to be a decision (match or no match), a threshold is required: the degree of similarity between two fingerprints has to be higher than the threshold for the system to consider them a match. The threshold is usually set according to the required security level: the higher the threshold, the more difficult it is for two fingerprints to be considered a match; the lower the threshold, the easier it is. The threshold and the acceptable difference level are crucial in determining whether two prints are a match, and their values need to be considered carefully in all situations where fingerprints are used for identification or authentication. The sensitivity of a fingerprint recognition system is determined by its thresholds, which set the balance point between security and convenience. For example, when a threshold is set too low, different biometric data can appear to match when they are not the same. This is known as a false match. Conversely, when a threshold is set too high, biometric data from the same person can appear not to match because of slight variations. This is known as a false non-match.
Identification is the process of identifying an individual from a population of individuals, so if the population is large, it may take a very long time to search through the database. To enhance system performance, a common strategy is to divide the database into many bins; see [14]. Each bin contains only fingerprints of the same class. When a fingerprint is to be identified, it is compared only with those in the bin of the same class. One simple and intuitive method is to classify the fingerprints using singularities. However, dividing the database into only five bins does not help much in improving performance. Many real systems make use of other ridge information, such as the ridge count between two distinctive features, to divide the database into more bins. Other systems tag fingerprints with a number of attributes and classify them according to the tags.
To conclude this section, we have to raise some important issues. A false match refers to incorrectly believing that two given sets of biometric data match. The consequence of this error is that impostors could gain access to resources they are not allowed to access. A false non-match refers to incorrectly believing that two given sets of biometric data do not match. The consequence of this error is that legitimate users could be refused access to resources they are entitled to access. In practice, these two types of error are unavoidable with current technologies but, ideally, both should be kept to a minimum.
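The trade-off between false matches and false non-matches can be illustrated with a minimal Python sketch. The similarity scores below are hypothetical values chosen for illustration only, not measured data, and the function names `decide` and `error_rates` are assumptions of this sketch rather than part of any particular system.

```python
def decide(similarity, threshold):
    """Declare a match when the similarity score is higher than the threshold."""
    return similarity > threshold


def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false match rate, false non-match rate) at the given threshold."""
    false_matches = sum(1 for s in impostor_scores if decide(s, threshold))
    false_non_matches = sum(1 for s in genuine_scores if not decide(s, threshold))
    return (false_matches / len(impostor_scores),
            false_non_matches / len(genuine_scores))


# Hypothetical similarity scores in [0, 1]; illustrative only.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]   # same finger, different impressions
impostor = [0.12, 0.35, 0.48, 0.70, 0.22]  # different fingers

for threshold in (0.3, 0.6, 0.9):
    fmr, fnmr = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold:.1f}  FMR={fmr:.2f}  FNMR={fnmr:.2f}")
```

With these toy scores, a low threshold admits impostors, a high threshold rejects genuine users, and an intermediate value balances the two error types.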

Modeling approach
As mentioned above, fingerprint templates acquired from the same person are most probably different and need error correction. Therefore, for the same person, we attempt to match the preserved (enrolled) fingerprint and the probe (verification) fingerprint, where the relation between the two is modeled as a noisy channel. Once an image of a user's fingerprint is scanned, the positions of the minutiae are first detected, and the torus is then unwrapped into a rectangular region. Next, a bank of Gabor filters is used to extract a bit sequence. The Gabor filters act as a directed smoothing process which removes residual random noise. A MATLAB implementation can be used to carry out this most critical step. Then, the extracted feature vector $w$ is produced by discarding bits at certain specified positions that were found to be unreliable. Lastly, the bit string $w$ is mapped (encoded) into the secure biometric by computing the syndrome of $w$ with respect to a low-density parity-check (LDPC) code. In fact, error-correcting codes provide a robust technique for overcoming the variations in biometric data. The same schemes that have been proposed in the context of fingerprint data can also handle iris, face, signature and voice information. Some schemes that make use of multi-biometrics are also starting to emerge; see [12], [16], [19], [20].
A prevalent technique for dealing with fingerprint data is to extract a set of "minutiae points" and to carry out subsequent operations on these minutiae. A minutia is a discontinuity in the ridge map of a fingerprint which is described by its location in two dimensions $(x, y)$ and its angular orientation $\theta$; see [18]. We define the minutiae map of a fingerprint by $\mathrm{Min}(x, y) = \theta$ if there exists a minutia point at $(x, y)$. A minutiae map can be regarded as a feature extraction function. The minutiae map, which acts on the fingerprint image, is represented by a binary matrix, where a 1-bit indicates the presence of a minutia at the given coordinate and a 0-bit its absence. Note that different fingerprints normally have different numbers of minutiae. Furthermore, the number as well as the location of minutiae can vary slightly depending on the extraction algorithm that is used.
In addition to minutiae extraction, a feature transformation procedure that converts the two-dimensional minutiae maps into binary feature vectors is used. The goal is to generate binary feature vectors that are independent across different users and such that different measurements of the same user are related by a binary symmetric channel. This is one of the principal channel models for low-density parity-check codes, and therefore these standard codes can be used for Slepian-Wolf coding of the feature vectors. To this end, we measure the number of minutiae points in a selected, relatively small region across a training set of fingerprints. The threshold is then defined as the median of the number of minutiae points in the chosen region. The threshold value may differ for each region based on its location and intensity. If the number of minutiae points in a region exceeds the threshold, a '1' is appended to the feature vector; otherwise a '0' is appended. Eventually, we obtain an $n$-bit feature vector.
In summary, and to conclude this section, we should consider two significant questions: what type of error-correcting codes should be used in the biometrics problem, and what happens if templates of biometric data come with redundancy? These two questions lead us to look for error-correcting codes with low rate and large minimum distance. These codes, which have the required length, should also come with an efficient decoding algorithm.
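The feature transformation described in this section (count the minutiae in each region, compare with the median count over a training set) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's MATLAB implementation: the 4 x 4 binary minutiae maps are toy data, the four quadrant regions are an arbitrary choice, and the helper names `count_in_region`, `train_thresholds` and `feature_vector` are hypothetical.

```python
from statistics import median


def count_in_region(minutiae_map, region):
    """Count the 1-bits of a binary minutiae map inside a rectangular region.

    region = (row_start, row_end, col_start, col_end), end indices exclusive.
    """
    r0, r1, c0, c1 = region
    return sum(minutiae_map[r][c] for r in range(r0, r1) for c in range(c0, c1))


def train_thresholds(training_maps, regions):
    """One threshold per region: the median minutiae count over the training set."""
    return [median(count_in_region(m, reg) for m in training_maps)
            for reg in regions]


def feature_vector(minutiae_map, regions, thresholds):
    """Append '1' when a region's count exceeds its threshold, otherwise '0'."""
    return [1 if count_in_region(minutiae_map, reg) > t else 0
            for reg, t in zip(regions, thresholds)]


# Toy 4 x 4 minutiae maps (assumed data) and four quadrant regions.
map_a = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
map_b = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0], [1, 0, 0, 0]]
map_c = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
regions = [(0, 2, 0, 2), (0, 2, 2, 4), (2, 4, 0, 2), (2, 4, 2, 4)]

thresholds = train_thresholds([map_a, map_b, map_c], regions)
print(thresholds)
print(feature_vector(map_a, regions, thresholds))
```

Each region contributes one bit, so the length $n$ of the feature vector equals the number of regions chosen.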

Coding solution for fingerprint matching problem
Consider the problems of securely storing and matching fingerprints with the help of linear coding theory since, as mentioned above, biometric data is stored in binary form. The author's syndrome decoding coset decomposition algorithm in [13] is revisited to provide reliable and secure storage and authentication of fingerprint data.

Preliminaries
In this part of the article we briefly recall a few classical notions needed in the construction of the decoding algorithm which will be used in the fingerprint matching problem.
Let $F = \{0, 1\}$ be the group of two elements and let $n, k \in \mathbb{N}$. As mentioned above, the biometric data is given in the form of words (vectors) of length $n$, as members of $F^n$, the direct product of $n$ copies of $F$. This direct product is an Abelian group under the addition operation. The weight $\mathrm{wt}(v)$ of $v \in F^n$ is defined to be the number of nonzero entries in the vector $v$. The distance $d(u, v)$ between two words $u, v \in F^n$ is the number of positions in which they differ, so that $d(u, v) = \mathrm{wt}(u - v) = \mathrm{wt}(u + v)$, as we are working in a product of copies of $F$ in which every element is its own additive inverse.
We define a coding function $f : F^k \to F^n$, $k \le n$, and instead of storing a word $w$, we store the word $f(w)$. There is an obvious constraint on the selected coding function $f$: it must be injective; otherwise there would be two distinct words of length $k$ that would be stored as the same word of length $n$. We say that an $(n, k)$-code is a linear code over $F$ if the images of $f$ form a subgroup of $F^n$; the elements of such a code are called codewords. For $d \in \mathbb{N}$, an $(n, k, d)$-code is an $(n, k)$-code for which $d$ is the minimum distance between two different codewords. One advantage of linear codes is that the minimum distance between codewords is comparatively easy to find, since it equals the minimum weight of a nonzero codeword. We assume that there is an efficient algorithm that is capable of decoding up to $t$ errors, where $d \ge 2t + 1$.
Let $C$ be an $(n, k)$-code for some $n$ and $k$. A generator matrix for the code $C$ is a matrix $G \in F^{k \times n}$ whose rows are an $F$-basis of $C$. The matrix $G$ which generates the code $C$ should have rank $k$. A vector $w \in F^k$ is encoded as the vector $z = wG$. It is possible that during the identification some bits of $z$ are changed, so that the system receives the incorrect message $y$. The system then solves the decoding problem, that is, it calculates the codeword $x \in C$ closest to $y$. If $d(y, z) \le (d - 1)/2$, where $d$ is the minimum distance between any two distinct codewords, then $x$ is equal to the original vector $z$.
In general, any syndrome decoding technique which is used to correct $t$ errors in a codeword of length $n$ consists of a main table listing every binary $n$-tuple and the codeword into which it is to be decoded. The rule for constructing this table is to decode an $n$-tuple into the nearest codeword. However, table lookup decoding (the coset decoding table) is feasible only for rather small codes. Therefore, one should keep looking for algorithmic decoding techniques which are considerably faster and require far less storage.
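The notions recalled above can be made concrete on a small example. The following Python sketch uses a $(7, 4)$ Hamming code, which is an assumption chosen for illustration and not the code used in the paper; it encodes with $z = wG$, finds the minimum distance as the minimum weight of a nonzero codeword, and builds the full decoding table that maps every 7-tuple to its nearest codeword.

```python
from itertools import product

# Generator matrix of a (7, 4) Hamming code in systematic form (assumed example).
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
K, N = len(G), len(G[0])


def add(u, v):
    """Componentwise addition in F^n, i.e. bitwise XOR."""
    return tuple(a ^ b for a, b in zip(u, v))


def weight(v):
    """Number of nonzero entries of v."""
    return sum(v)


def distance(u, v):
    """Hamming distance, computed as the weight of u + v."""
    return weight(add(u, v))


def encode(w):
    """z = wG: the sum of the rows of G selected by the 1-bits of w."""
    z = (0,) * N
    for bit, row in zip(w, G):
        if bit:
            z = add(z, row)
    return z


codewords = [encode(w) for w in product((0, 1), repeat=K)]
d = min(weight(c) for c in codewords if any(c))  # minimum distance
t = (d - 1) // 2                                 # number of correctable errors

# Main decoding table: every binary n-tuple and its nearest codeword.
table = {y: min(codewords, key=lambda c: distance(c, y))
         for y in product((0, 1), repeat=N)}

z = encode((1, 0, 1, 1))
y = z[:2] + (z[2] ^ 1,) + z[3:]  # flip one bit of z
print(d, t, len(table), table[y] == z)
```

Even for this tiny code the table has $2^7 = 128$ entries; the table size doubles with every additional bit of length, which is why table lookup is feasible only for small codes.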

Syndrome decoding and coset decomposition
In this work, the preserved biometric $w$ is binary and we use a linear code for the encoding function. Given the $k \times n$ binary generator matrix $G$ and the corresponding $(n - k) \times n$ parity check matrix $H$, the syndrome of a word $w \in F^n$ is the product $H w^{T} \in F^{n-k}$. This product is also referred to as the "coset" or "equivalence class" of $w$. It should be noted that any codeword $w$ produced from the system generator matrix $G$ should satisfy the condition $H w^{T} = 0$. It is convenient to set up a two-column decoding table which contains just the column of coset leaders and the column of syndromes. Given a word $w$ to decode, compute its syndrome and add to (or, equivalently, subtract from) $w$ the coset leader $u$ which has the same syndrome; the word $w + u$ will then be the corrected version of $w$. Finally, read off the first $k$ digits to reconstruct the original word (this assumes a generator matrix in systematic form).
Let $v$ stand for the received word when a codeword $w$ of a $t$-error-correcting code is transmitted over a channel corrupted by additive noise. Now $v = w + e$, where $e$ is a linear combination of some elements from the set $\{e_i : 0 < i \le n\}$, and $e_i$ is the word of length $n$ which has all digits 0 except the $i$-th digit, which is 1. To find the codeword $w$, the syndrome of the corrupted word $v$ is calculated; this syndrome is then expressed as a combination of the known syndromes, the error $e$ is obtained as the same combination of the corresponding coset leaders, and in the end the corrected codeword is obtained as $w = v + e$.
Table-based fingerprint matching is onerous for codeword lengths of several thousand bits and tens of errors per word. Consequently we need, as mentioned above, an algorithmic decoder which demands less storage. Suppose that we have a coding function for which the associated code is a linear code. Assume that the error rate is relatively high, so that more parity check bits are added. The number $n - k$ then becomes large, leading to a longer decoding table. In [13] we showed that the number of coset leaders in this table is reduced from $2^{n-k}$ to $n + 1$, which is an advantage over other coding algorithms. In addition to its low demand for storage, the advantage of the method lies in two facts: a high capability of correcting random errors and notable simplicity of calculation. The operations, which are combinations of codewords, are all XOR operations and can therefore easily be implemented in hardware to build a fast decoder.
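One possible reading of the decoding step described in this section is sketched below in Python. It is an illustration under explicit assumptions and should not be taken as the algorithm of [13]: the code is a toy $(5, 1)$ repetition code with $t = 2$, only the syndromes of the $n$ unit words are stored (the zero word has the zero syndrome), and the syndrome of the received word is matched by exhaustive search over XOR combinations of at most $t$ stored syndromes. The names `find_error` and `correct` are hypothetical.

```python
from itertools import combinations

# Parity check matrix of the (5, 1) repetition code (assumed toy example).
H = [
    (1, 1, 0, 0, 0),
    (1, 0, 1, 0, 0),
    (1, 0, 0, 1, 0),
    (1, 0, 0, 0, 1),
]
N = 5  # code length
T = 2  # number of correctable errors (minimum distance 5)


def xor(u, v):
    """Addition in F^m, i.e. bitwise XOR."""
    return tuple(a ^ b for a, b in zip(u, v))


def syndrome(v):
    """Syndrome H v^T over F."""
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)


def unit(i):
    """Word with all digits 0 except digit i, which is 1."""
    return tuple(1 if j == i else 0 for j in range(N))


# Short table: one syndrome per unit word instead of one entry per coset.
unit_syndromes = [syndrome(unit(i)) for i in range(N)]


def find_error(s):
    """Express s as a XOR of at most T unit syndromes; return the matching error word."""
    for r in range(T + 1):
        for idx in combinations(range(N), r):
            acc = (0,) * len(H)
            for i in idx:
                acc = xor(acc, unit_syndromes[i])
            if acc == s:
                return tuple(1 if j in idx else 0 for j in range(N))
    return None  # more than T errors: not decodable by this sketch


def correct(v):
    """Return the corrected codeword v + e, or None when no error word is found."""
    e = find_error(syndrome(v))
    return None if e is None else xor(v, e)


received = (1, 0, 1, 1, 0)  # codeword (1, 1, 1, 1, 1) with two flipped bits
print(correct(received))
```

In this toy case the stored table has $n + 1 = 6$ entries (counting the zero word) rather than $2^{n-k} = 16$ coset leaders, and every step is an XOR. The exhaustive search over combinations is used here only for clarity; it grows quickly with $t$ and is not meant as an efficient decoder.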

Fingerprint matching algorithm
The major problem to overcome is the fact that the scheme requires the preserved fingerprint to be compared (or matched) with the probe fingerprint, and that matching is hard (or practically impossible) when the preserved fingerprint is presented with feature vectors from a different user or an attacker. There are also some other aspects that have to be addressed, for instance the possibility of erasures in an unordered collection of fingerprint features.
In our fingerprint matching algorithm, the generator and parity check matrices that define an error-correcting code are first fixed. Next, a fingerprint is scanned, and the minutiae are extracted and mapped to a binary feature vector. The redundantly encoded vector is obtained by applying the coset decomposition encoder. Finally, this encrypted vector of the fingerprint is retained (registered) in a secure storage medium for subsequent comparison during the authentication phase.
The verification (or authentication) procedure is similar to the enrollment phase. A freshly taken fingerprint sample is captured (and processed) at the time of access. In order to compare this fingerprint sample against a previously stored sample, if it exists, feature extraction is performed. The feature vector resulting from this process usually differs from the corresponding encoded vector (secret) which was calculated in the enrollment steps. Here we have a problem that is identical to the problem of transmitting an encoded secret through a noisy channel. To fix this error (or to match the two vectors), we use the coset decomposition decoder. In this way, the system attempts to identify the individual from a database of stored fingerprint samples.
A set of fingerprint measurements (acquired with a Biometrika HiScan PIV optical fingerprint scanner) has been used to evaluate the MATLAB implementation of the algorithm. All measurements were successfully enrolled; measurements (authorized and unauthorized) were then used as probes. The percentage of successful identification for unauthorized users is 100%, while the percentage of successful identification for authorized users exceeded 96%.
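The enrollment and verification phases can be sketched as follows. This Python fragment is a simplified illustration, not the evaluated MATLAB implementation: it assumes that the stored secret is the syndrome of the enrolled feature vector, uses an extended $(8, 4)$ Hamming code that corrects a single bit error, and works on made-up 8-bit feature vectors. The function names `enroll` and `verify` are hypothetical.

```python
# Parity check matrix of the extended (8, 4) Hamming code (assumed toy example).
H = [
    (1, 1, 0, 1, 1, 0, 0, 0),
    (1, 0, 1, 1, 0, 1, 0, 0),
    (0, 1, 1, 1, 0, 0, 1, 0),
    (1, 1, 1, 1, 1, 1, 1, 1),
]
N = 8  # length of the feature vector


def xor(u, v):
    """Addition over F, i.e. bitwise XOR."""
    return tuple(a ^ b for a, b in zip(u, v))


def syndrome(v):
    """Syndrome H v^T over F."""
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)


def unit(i):
    """Word with a single 1 in position i."""
    return tuple(1 if j == i else 0 for j in range(N))


# Syndromes of the single-bit error patterns.
UNIT_SYNDROMES = {syndrome(unit(i)) for i in range(N)}


def enroll(feature_vector):
    """Enrollment: keep only the syndrome of the feature vector."""
    return syndrome(feature_vector)


def verify(stored_syndrome, probe_vector):
    """Accept when the probe is explained by at most one bit error.

    By linearity, the XOR of the two syndromes is the syndrome of the
    difference between the enrolled vector and the probe.
    Limitation of this sketch: a zero difference syndrome also occurs when the
    two vectors differ by a nonzero codeword, which is not checked here.
    """
    diff = xor(stored_syndrome, syndrome(probe_vector))
    return not any(diff) or diff in UNIT_SYNDROMES


enrolled = (1, 0, 1, 1, 0, 0, 1, 0)       # made-up feature vector
stored = enroll(enrolled)

noisy_probe = (1, 0, 1, 0, 0, 0, 1, 0)    # same user, one bit changed
other_probe = (0, 1, 1, 1, 0, 0, 1, 0)    # two bits changed

print(verify(stored, noisy_probe), verify(stored, other_probe))
```

The acceptance rule plays the role of the threshold discussed earlier: the number of correctable errors determines how much a probe may deviate from the enrolled vector and still be accepted.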

Conclusion
We developed an algorithm which is suited to matching fingerprints when several minutiae are missing or when some spurious minutia is detected. The algorithm could also handle translations, rotations and further affine transformations. A number of biometric matching algorithms for various modalities, comparable to the introduced algorithm, have been proposed in the literature. However, the model for the secure biometrics problem based on the coset decomposition algebraic technique is, to the best of my knowledge, entirely new. My next aim is to apply the same algorithm to different biometric systems, for instance iris biometrics, which seems very promising. In particular, iris matching could be easier in terms of achieving high true match rates because of the large amount of information that can be extracted from an iris.