A new high capacity steganography based on bit-inverting method in DWT domain

Steganography is a technique that embeds secret messages in a cover image and transmits them in such a way that the existence of the information is undetectable. In this paper, we present a new technique that combines two encoding algorithms: Huffman encoding and a bit-inverting algorithm. With this technique, we propose a modified, secure, high-capacity steganography scheme for hiding a secret message, such as text or an image, in a cover image. Our simulation results show that the algorithm has good perceptual invisibility and high capacity, in addition to being secure.


Introduction
There has been considerable progress in all areas of communication and technology. Because of these advances, secure and safe methods for data communication are needed, and information security has become one of the most important requirements in digital communication. Information hiding is one of the major solutions for confidential communication, and several methods provide security for transmitting and receiving information. Cryptography, watermarking, and steganography are some of the useful methods in this area [1]. Cryptography protects information by transforming it into a new format called "ciphertext"; its purpose is to protect the contents of messages from unauthorized access. Cryptographic methods use keys to transform data. Some algorithms use a single shared secret key, while others use two keys, a public key and a private key. The latter type is considered robust against many types of attacks [2]. Watermarking is another related technique, used mainly for copyright protection: it embeds a watermark carrying copyright information in digital media. Watermarking has major features such as imperceptibility, security, robustness, and blind detection. For instance, imperceptibility means that the human eye cannot distinguish between the watermarked picture and the original. The most important purpose of watermarking is the identification of copyright [1], [3]. Steganography refers to algorithms that hide a message inside a carrier medium [2]; it is a technique for hiding a message, image, or file inside another message, image, or file. A steganography system is characterized by three interrelated parameters.
Capacity, security, and robustness are the parameters that matter in a steganography method. Capacity refers to the amount of data that can be reliably hidden inside the medium. Security refers to an eavesdropper's inability to detect the hidden information; ideally, attackers can neither detect nor extract the confidential information. Robustness is the amount of modification the stego medium can undergo without destroying the confidential information; in practice, it is the ability of the embedded data to remain intact when the stego image undergoes transformations, such as those caused by intelligent stego attacks [1]. Cryptography obscures the content of the information, whereas steganography attempts to hide the very existence of the message, so that the human eye cannot distinguish between the original medium and the stego medium.
The primary goal of steganography is to avoid detection, or even suspicion, that a secret message is being passed. Steganography is applicable to (i) confidential communication and secret data storage, (ii) protection against data alteration, (iii) access control systems for digital content distribution, (iv) media database systems, etc. In image steganography, the information is hidden in an image called the cover image; after the secret message is embedded, the image is called the stego image. A steganography system is useful only if it embeds information invisibly and the hidden messages remain meaningful after extraction. The hidden messages may be text or images [4]. Image steganography schemes can be divided into two categories: spatial-domain [5] and transform-domain [1], [5]. In the spatial domain, the message is embedded directly into the pixel values, without transforming the cover or the hidden data. One of the simplest and most widely used methods in this group is Least Significant Bit (LSB) substitution, which has served as the basis of more advanced methods [5], [6]. The main advantages of this group are better quality and higher capacity, but these methods are weak against attacks. Because embedding is performed directly on the pixels, the stego image is sensitive to, and not robust against, operations such as lossy compression, cropping, and blurring [7]. In transform-domain techniques, the first stage is converting the cover image and the hidden message into a set of frequency-domain coefficients. After the message is embedded in the new domain, the inverse transform is applied to the modified coefficients to produce the stego image.
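The LSB substitution idea mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not part of the proposed method: it assumes a grayscale image flattened into a list of 8-bit integers, and the function names and the parameter `k` (number of low bits replaced per pixel) are hypothetical choices for this sketch.

```python
def lsb_embed(pixels, bits, k=1):
    """Replace the k least significant bits of successive 8-bit pixels with message bits.

    pixels: flat list of ints in 0..255 (a flattened grayscale cover image)
    bits:   list of 0/1 ints; the last chunk is zero-padded to k bits
    """
    if len(bits) > k * len(pixels):
        raise ValueError("message too long for this cover")
    stego = list(pixels)
    mask = (1 << k) - 1                        # e.g. k=4 -> 0b1111
    for p, i in enumerate(range(0, len(bits), k)):
        chunk = bits[i:i + k] + [0] * (k - len(bits[i:i + k]))
        value = 0
        for b in chunk:                        # pack the chunk, most significant bit first
            value = (value << 1) | b
        stego[p] = (stego[p] & ~mask) | value  # clear the k LSBs, then write the chunk
    return stego


def lsb_extract(stego, n_bits, k=1):
    """Read n_bits message bits back from the k least significant bits of each pixel."""
    mask = (1 << k) - 1
    bits = []
    for pixel in stego:
        if len(bits) >= n_bits:
            break
        value = pixel & mask
        bits.extend((value >> s) & 1 for s in range(k - 1, -1, -1))
    return bits[:n_bits]
```

For example, `lsb_embed([200, 201, 202, 203], [1, 0, 1, 1])` changes each pixel by at most 1. Setting `k=4` gives the 4-bit LSB rule that the bit-inverting example later builds on, at the cost of larger pixel changes.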
Several transforms are used in the transform domain, the most important being the Discrete Cosine Transform (DCT), the Fourier Transform (FT), and the Discrete Wavelet Transform (DWT). Transform-domain techniques overcome the weaknesses of spatial-domain techniques in robustness and imperceptibility. Among them, the DWT is particularly suitable because it represents a signal jointly in time and frequency. We propose a DWT-based steganography technique that achieves considerable capacity and PSNR. The rest of the paper is organized as follows: Section 2 reviews existing work by evaluating several algorithms in the spatial domain and especially in the transform domain. Section 3 describes the proposed steganography method in detail, and Section 4 presents the experimental results and analysis. Finally, Section 5 summarizes the work and suggests ideas for future research.

Related work
In this section we review the background of steganography algorithms, especially in the transform domain. As mentioned before, there are two major types of steganography methods: spatial-domain and transform-domain. Since the method presented in this paper is transform-based, we do not review previous spatial-domain work. In earlier work, most researchers used spatial-domain techniques, but the advantages of transform-domain techniques, especially the Discrete Wavelet Transform (DWT), have led them to adopt these instead. As mentioned in the previous section, transform-domain techniques offer considerable advantages over spatial-domain techniques, such as improved robustness and imperceptibility. Several methods use the DWT to implement steganography, but disadvantages such as low robustness against attacks and low security have led researchers to combine it with other information-hiding techniques such as encoding, cryptography, and compression, which can provide better robustness, improved quality, and higher capacity. A wavelet is a localized oscillating function, so the wavelet transform captures signal information in both the time and frequency domains. This property distinguishes it from other transform-domain techniques such as the Fourier (DFT) and cosine (DCT) transforms, whose basis functions are not localized in time. Therefore, in recent research, the DWT has become the main choice among transform-domain techniques [3], [5]. Depending on the data format of the cover objects and hidden messages, either the 1-D or the 2-D DWT can be used. The main part of DWT computation involves the wavelet filters, which determine the computations required for the transform. One of the most important, simplest, and most common transforms in this domain is the Haar DWT.
If the data is a vector (a 1-D array), the 1-D DWT is used. In one step of the 1-D Haar DWT, the sums of adjacent pairs of samples are stored in the first half of the output and their differences in the second half. Fig. 1 shows a diagram of the 1-D DWT [8]. In the 2-D DWT, this step is applied in both the horizontal and vertical directions. Fig. 2 shows the 2-D DWT of an image. The result has four sub-bands: LL, LH, HL, and HH. The LL sub-band contains the main features of the data and is left unchanged by the embedding method, whereas the other sub-bands contain high-frequency coefficients and are the targets of embedding [9], [10]. The 2-D DWT can be single-level or multi-level. In a single-level DWT, the transform is applied to the whole image; in a multi-level DWT, each further level is applied to the LL sub-band of the previous level. Fig. 3 shows 1-, 2-, and 3-level 2-D DWTs. Adding levels improves robustness and capacity but reduces the quality of the result. In general, a single level is used, although some algorithms use two levels. Fig. 4 shows a standard picture decomposed by a single-level 2-D DWT [11].
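One step of the sum/difference Haar transform described above, and its single-level 2-D extension, can be sketched in Python. This is an illustration under assumptions: pixels are integers in a list of lists with even dimensions, no normalization (such as division by 2 or by the square root of 2) is applied, matching the description above, and the labels of the two mixed quadrants follow one common convention (texts differ on which is called LH and which HL). The function names are hypothetical.

```python
def haar_1d(x):
    """One step of the unnormalized 1-D Haar DWT:
    pairwise sums in the first half, pairwise differences in the second half."""
    if len(x) % 2:
        raise ValueError("length must be even")
    sums = [x[i] + x[i + 1] for i in range(0, len(x), 2)]
    diffs = [x[i] - x[i + 1] for i in range(0, len(x), 2)]
    return sums + diffs


def haar_2d(image):
    """Single-level 2-D Haar DWT: transform every row, then every column,
    and split the result into its four quadrants (sub-bands)."""
    rows = [haar_1d(r) for r in image]
    cols = [haar_1d(list(c)) for c in zip(*rows)]   # transform the columns
    out = [list(r) for r in zip(*cols)]             # transpose back to rows
    h, w = len(out) // 2, len(out[0]) // 2
    return {
        "LL": [r[:w] for r in out[:h]],  # sums in both directions
        "HL": [r[w:] for r in out[:h]],  # row differences, column sums
        "LH": [r[:w] for r in out[h:]],  # row sums, column differences
        "HH": [r[w:] for r in out[h:]],  # differences in both directions
    }
```

For a multi-level decomposition, `haar_2d` would simply be applied again to the returned `"LL"` sub-band, provided its dimensions are still even.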

Proposed method
In this section we introduce our proposed steganography method, which hides a large amount of data with high security and good visual quality without losing any hidden information. The method combines two techniques: 1) the Huffman algorithm and 2) the bit-inverting method. The following subsections explain the principles, process, and diagrams of these methods.

Huffman encoding
Huffman encoding is one of the most popular encoding algorithms for data compression. Its main idea is to find optimal code words for compressing a set of data: variable-length codes are used instead of fixed-length ones, and the length of each code is determined by the frequency of its character. For example, a fixed-length code uses 8 bits for every symbol (such as a pixel in an image), whereas a variable-length code assigns shorter codes to more frequent characters and longer codes to less frequent characters in the string. As a result, the total size of the encoded data decreases. Huffman encoding consists of two major stages: constructing the Huffman table and tree, and computing the code words. The algorithm starts by constructing the Huffman table: the characters of the data and their frequencies are stored in a list sorted in ascending order of frequency. The same step is then repeated until the process finishes: at each step, the two symbols with the lowest probability (frequency) are selected and replaced by an auxiliary node whose probability is the sum of the two. The process ends when only one auxiliary symbol remains; its probability is 1.0. The tree is constructed bottom-up during this process: the two selected symbols become children, and the auxiliary node is added above them as their parent, until the input list is exhausted and the binary tree, which reflects the frequencies of all characters in the file, is complete. Finally, the code word of each symbol is obtained by traversing the tree from the root to that symbol's leaf, assigning 0 to each left branch and 1 to each right branch.
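The construction just described can be sketched with Python's `heapq` module acting as the sorted list of nodes. This is a minimal illustration, not the implementation used in this paper; the function name `huffman_code` and the tie-breaking counter are our own choices, and symbols are assumed not to be tuples (tuples represent auxiliary nodes here). Because ties are broken by insertion order, the resulting code words may differ from a hand-worked example while having the same average length.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freqs):
    """Build a Huffman code table {symbol: bitstring} from {symbol: frequency}."""
    tiebreak = count()  # keeps heap entries comparable when frequencies are equal
    # Each heap entry is (frequency, tiebreak, tree); a tree is either a
    # symbol (leaf) or a (left, right) tuple (auxiliary node).
    heap = [(f, next(tiebreak), s) for s, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # the two lowest-frequency nodes
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))  # auxiliary parent

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")  # 0 for the left branch
            walk(tree[1], prefix + "1")  # 1 for the right branch
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


text = "abracadabra"
codes = huffman_code(Counter(text))
encoded = "".join(codes[ch] for ch in text)
```

Here `encoded` is 23 bits long, compared with 88 bits for the same 11 characters at a fixed 8 bits per character.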
Suppose that we have five symbols, a1 to a5, whose probabilities are given in Table 1 (0.4, 0.2, 0.2, 0.1, and 0.1, respectively). The Huffman tree is built as follows:
1) a4 and a5 are combined to create an auxiliary symbol a45 with probability 0.2. The symbols a4 and a5 are dropped in this step and a45 is added instead.
2) Four symbols remain in the table: a1 with probability 0.4, and a2, a3, and a45, each with probability 0.2. Two of the least probable symbols, a3 and a45, are arbitrarily selected and combined, and an auxiliary symbol a345 with probability 0.4 replaces them in the table.
3) Three symbols remain: a1, a2, and a345, with probabilities 0.4, 0.2, and 0.4, respectively. We arbitrarily select a2 and a345, combine them, and put an auxiliary symbol a2345 with probability 0.6 in the table.
4) Finally, the two remaining symbols a1 and a2345 are combined, and an auxiliary symbol a12345 with probability 1 replaces them. When only one symbol with probability 1 remains, the tree is complete.
To identify the code word of each symbol, we arbitrarily assign 0 to the right edge and 1 to the left edge. The resulting code words for a1 to a5 are 0, 10, 111, 1101, and 1100, respectively. Note that the assignment of 0 and 1 to the edges is arbitrary. The mean code word length is therefore:
0.4×1 + 0.2×2 + 0.2×3 + 0.1×4 + 0.1×4 = 2.2 bits/symbol (a)
More importantly, the Huffman code is not unique. The arbitrariness arises when two or more symbols have the same minimal frequency (probability), because they can then be combined in different orders. In the previous example, the five symbols can be combined differently to obtain other Huffman codes (e.g., 01, 11, 00, 101, and 100). The mean code word length does not change and remains the same as for the previous code:
0.4×2 + 0.2×2 + 0.2×2 + 0.1×3 + 0.1×3 = 2.2 bits/symbol (b)
A fixed-length code for five symbols needs at least 3 bits per symbol (since 2^2 = 4 < 5), whereas the Huffman code needs only 2.2 bits on average. The size of the Huffman-coded data relative to the fixed-length scheme is therefore (2.2/3) × 100 ≈ 73%, a saving of about 27%. It can be shown that Huffman coding is optimal among prefix codes for a given set of symbol probabilities, so it never requires more bits on average than a fixed-length code [12], [13].
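The arithmetic in equations (a) and (b) can be checked with a short Python sketch. It assumes the code words 0, 10, 111, 1101, 1100 for the first tree (lengths 1, 2, 3, 4, 4, as used in equation (a)) and 01, 11, 00, 101, 100 for the alternative tree; the helper names are hypothetical. It also checks the prefix property, which is what makes a Huffman bit stream decodable without separators.

```python
import math


def average_length(probs, codes):
    """Expected code word length in bits/symbol: sum of p_i * len(code_i)."""
    return sum(p * len(c) for p, c in zip(probs, codes))


def is_prefix_free(codes):
    """True if no code word is a prefix of another (required for unique decoding)."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


probs = [0.4, 0.2, 0.2, 0.1, 0.1]            # a1..a5
code_a = ["0", "10", "111", "1101", "1100"]   # first tree
code_b = ["01", "11", "00", "101", "100"]     # alternative tree

fixed_bits = math.ceil(math.log2(len(probs)))  # 3 bits for five symbols
for codes in (code_a, code_b):
    avg = average_length(probs, codes)
    # average length, prefix property, size relative to the fixed-length code (%)
    print(round(avg, 2), is_prefix_free(codes), round(100 * avg / fixed_bits, 1))
```

Both iterations print `2.2 True 73.3`, matching equations (a) and (b) and the 73% figure.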

Bit-inverting method
The key technique of the proposed method is bit inverting. Bit inverting is a simple and effective method for reducing power consumption in digital systems. Its most important feature is that it writes the value that gives the smaller difference: either the difference between the cover pixel value and the message pixel value, or the difference between the cover pixel value and the complemented message pixel value [14]. Let D denote the distance between the current message value and the corresponding cover pixel value, and let D̄ denote the distance between the complement of the message value and the corresponding cover pixel value. In the second stage, we choose between D and D̄: the smaller one is selected, and the corresponding value (the message value or its complement) is embedded in the carrier pixel.
Consider the following example. Suppose that data is to be embedded in an image whose pixels are 8 bits wide, and that 4 bits of secret data are to be placed in each pixel. Following the basic LSB rule, these 4 bits are placed in the 4 least significant bits, so the 4 least significant bits of each pixel are compared with the 4 bits to be embedded. Suppose that the 4 least significant bits of the cover pixel are 1010 (10) and the 4 bits to be embedded are 1001 (9). We perform the following steps:
1) D computation. First, the message is written without change: the difference (deviation) between the message value (9) and the cover pixel value (10) is D = 10 - 9 = 1.
2) Complement computation. Next, the complement of the message data is computed. Since the message has 4 bits, the maximum value is 15 (1111); subtracting the message value from this maximum gives the complement, 15 - 9 = 6 (0110).
3) D̄ computation. D̄ = 10 - 6 = 4.
4) Comparison of D and D̄. D = 1 and D̄ = 4.
As can be seen, D gives the smaller difference, so writing the message unchanged produces a better result. Now suppose that the 4 bits to be embedded are 0110 (6). In this case the difference for the actual value is 10 - 6 = 4, and the difference for the complement (15 - 6 = 9) is 10 - 9 = 1, so writing the complement is better.
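The selection rule in this example can be written as a small Python function. This is a sketch under assumptions: distances are absolute differences of the 4-bit values (matching the arithmetic of the example), ties keep the message unchanged, and an `inverted` flag is returned because the text does not state how the decoder learns whether a chunk was complemented. The names `bit_invert_choose`, `embed_pixel`, and `recover` are hypothetical.

```python
MAX4 = 0b1111  # largest 4-bit value (15)


def bit_invert_choose(cover_nibble, msg_nibble):
    """Return (value_to_write, inverted) for one 4-bit message chunk.

    d     = |cover - message|      (the D of the text)
    d_bar = |cover - complement|   (the distance to the complemented message)
    The smaller distance wins; on a tie the message is kept unchanged (assumption).
    """
    complement = MAX4 - msg_nibble        # equals msg_nibble ^ 0b1111
    d = abs(cover_nibble - msg_nibble)
    d_bar = abs(cover_nibble - complement)
    if d <= d_bar:
        return msg_nibble, False
    return complement, True


def embed_pixel(pixel, msg_nibble):
    """Write a 4-bit chunk into the 4 LSBs of an 8-bit pixel using bit inverting."""
    value, inverted = bit_invert_choose(pixel & 0x0F, msg_nibble)
    return (pixel & 0xF0) | value, inverted


def recover(written_nibble, inverted):
    """Undo the optional inversion when extracting."""
    return MAX4 - written_nibble if inverted else written_nibble


print(bit_invert_choose(0b1010, 0b1001))  # (9, False): D = 1, complement distance = 4
print(bit_invert_choose(0b1010, 0b0110))  # (9, True): D = 4, complement distance = 1
```

By construction, the written nibble is never farther from the cover nibble than the unmodified message would have been, which is the property the bit-inverting step relies on.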

Encoding algorithm
The block diagram of the proposed steganography system is depicted in Fig. 6. According to this figure, the process of embedding the secret data can be described as follows:
1) Read the secret object (text or image).
2) Convert the secret object into a bit stream.
3) Apply Huffman encoding to the bit stream.
4) Read the cover image.
5) Decompose the cover image using the Haar wavelet transform.
6) Apply the bit-inverting method to the output of the Huffman encoder and embed the message bits in the 4 least significant bits of each coefficient of one of the detail sub-bands, namely HH.
7) Apply the inverse Haar wavelet transform to reconstruct the stego image.
8) Prepare the stego image for display.
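Steps 5 and 6 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the cover is a list of lists of integers with even dimensions, the Haar transform is the unnormalized sum/difference version described in the Related work section, the Huffman output is a string of '0'/'1' characters, and all function names are hypothetical. The per-chunk inversion flags are returned separately because the text does not say how the decoder learns which chunks were inverted; step 7 (the inverse transform and conversion back to 8-bit pixels) is omitted.

```python
def haar_1d(x):
    """One unnormalized Haar step: pairwise sums, then pairwise differences."""
    sums = [x[i] + x[i + 1] for i in range(0, len(x), 2)]
    diffs = [x[i] - x[i + 1] for i in range(0, len(x), 2)]
    return sums + diffs


def haar_2d(image):
    """Single-level 2-D Haar DWT; HH ends up in the bottom-right quadrant."""
    rows = [haar_1d(r) for r in image]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def bits_to_nibbles(bits):
    """Split a '0'/'1' string into 4-bit integers, zero-padding the tail."""
    bits += "0" * (-len(bits) % 4)
    return [int(bits[i:i + 4], 2) for i in range(0, len(bits), 4)]


def hh_positions(coeffs):
    """(row, col) indices of the HH quadrant, in row-major order."""
    h, w = len(coeffs) // 2, len(coeffs[0]) // 2
    return [(r, c) for r in range(h, 2 * h) for c in range(w, 2 * w)]


def embed_in_hh(image, bits):
    """Decompose the cover, then write 4-bit chunks of the encoded message into
    the 4 least significant bits of HH coefficients, choosing each chunk or its
    complement by bit inverting. Returns (coefficients, flags)."""
    coeffs = haar_2d(image)
    nibbles = bits_to_nibbles(bits)
    slots = hh_positions(coeffs)
    if len(nibbles) > len(slots):
        raise ValueError("message does not fit in the HH sub-band")
    flags = []
    for (r, c), m in zip(slots, nibbles):
        cover = coeffs[r][c] & 0xF  # Python's & also works on negative ints
        comp = 0xF - m
        inverted = abs(cover - comp) < abs(cover - m)  # tie keeps the message
        coeffs[r][c] = (coeffs[r][c] & ~0xF) | (comp if inverted else m)
        flags.append(inverted)
    return coeffs, flags


def extract_from_hh(coeffs, flags):
    """Read the 4-bit chunks back and undo the inversion where flagged."""
    out = []
    for (r, c), inverted in zip(hh_positions(coeffs), flags):
        v = coeffs[r][c] & 0xF
        out.append(format(0xF - v if inverted else v, "04b"))
    return "".join(out)
```

The extracted string may end with padding zeros added to fill the last 4-bit chunk, so the decoder needs the original message length (or a self-delimiting code) to discard them.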

Decoding algorithm
The decoding process consists of the following steps:
1) Read the stego image.
2) Decompose the stego image using the Haar wavelet transform.
3) Extract the embedded message bits from the HH sub-band.
4) Apply the inverse bit-inverting method to the encoded message.
5) Apply inverse Huffman encoding (Huffman decoding) to the output of the inverse bit-inverting step.
6) Apply the inverse bitwise-input algorithm, i.e., convert the bit stream back into the secret object.
7) Prepare the secret object for display.
A schematic representation of the decoding process is given in Fig. 7.
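Step 5 (inverse Huffman encoding) amounts to decoding a prefix code, which can be sketched in Python as follows. This is a minimal illustration, not the authors' code; the code table is assumed to be available to the decoder, and the optional `n_symbols` parameter is a hypothetical addition for ignoring padding bits added when the bit stream is split into 4-bit chunks.

```python
def huffman_decode(bits, codes, n_symbols=None):
    """Decode a '0'/'1' string using a prefix code table {symbol: code word}.

    n_symbols (optional) stops decoding after that many symbols, so zero
    padding added to fill the last 4-bit chunk is ignored.
    """
    lookup = {word: sym for sym, word in codes.items()}
    out, current = [], ""
    for b in bits:
        if n_symbols is not None and len(out) == n_symbols:
            break
        current += b
        # Prefix property: the first complete code word matched is the symbol.
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit stream ends in the middle of a code word")
    return out


codes = {"a1": "0", "a2": "10", "a3": "111", "a4": "1101", "a5": "1100"}
print(huffman_decode("0101111101", codes))  # ['a1', 'a2', 'a3', 'a4']
```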

Experimental result
The parameters of a steganographic system, such as the number of data bits that can be hidden, the invisibility of the message, and its resistance to removal, can be related to characteristics of a communication system such as capacity and peak signal-to-noise ratio (PSNR) [15].
In our study, we use the peak signal-to-noise ratio (PSNR) to measure the distortion between the original cover image and the stego image. The PSNR of the cover image versus the stego image is defined as

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ \text{(dB)},$$

where 255 is the maximum value of an 8-bit pixel and the mean square error (MSE) is defined as

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( a_{ij} - b_{ij} \right)^2 .$$

MSE represents the difference between the original cover image a and the stego image b, both of size M×N, and a_{ij} and b_{ij} are the pixels located at the ith row and jth column of images a and b, respectively. A large PSNR value means that the stego image is very similar to the original image, and vice versa. It is hard for human eyes to distinguish between the original cover image and the stego image when the PSNR is larger than 30 dB [16]. To evaluate the performance of the proposed method, we implemented it in MATLAB.

Fig. 8 (a)

In the simulations, 14 images with different characteristics were used to examine and compare the performance of the proposed algorithm. The message was hidden in each cover image to study the influence of the nature of the cover image. Table 2 shows the PSNR between the cover image and the stego image for both a secret text and a secret image. From Table 2 we observe that the PSNR values of the proposed method are within the acceptable range (higher than 30 dB), which indicates good quality. The other parameter examined in our experiments is capacity, i.e., the number of data bits that can be hidden. As mentioned in the section on Huffman encoding, this algorithm is an effective and popular method for variable-length data compression. Our method produces a set of variable-length code words with the lowest average length, which are used instead of fixed-length symbols.
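The two definitions can be computed directly as follows. This is a minimal Python sketch, not the MATLAB code used in our experiments; it assumes 8-bit grayscale images stored as equally sized lists of lists, and the function names `mse` and `psnr` are hypothetical.

```python
import math


def mse(a, b):
    """Mean square error between two equally sized M x N images."""
    m, n = len(a), len(a[0])
    total = sum((a[i][j] - b[i][j]) ** 2 for i in range(m) for j in range(n))
    return total / (m * n)


def psnr(a, b, peak=255):
    """PSNR in dB; peak = 255 assumes 8-bit pixels. Identical images give infinity."""
    e = mse(a, b)
    return math.inf if e == 0 else 10 * math.log10(peak ** 2 / e)


cover = [[100, 101], [102, 103]]
stego = [[101, 101], [102, 102]]     # two pixels changed by 1
print(mse(cover, stego))             # 0.5
print(round(psnr(cover, stego), 2))  # 51.14
```

Changing half of the pixels by one gray level still gives a PSNR well above the 30 dB threshold mentioned above.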
Table 3 shows the size of the embedded data for a message or an image after conversion into a bit string (without compression), the size after applying the Huffman algorithm, and the percentage of improvement. These results show that the proposed method improves capacity by more than 35%. Usually, a high-capacity requirement conflicts with a high-PSNR requirement, because embedding more bits lowers the PSNR. Reducing the size of the message increases the effective capacity, so a trade-off must be made between the capacity and PSNR requirements. From the results in Tables 2 and 3, we conclude that reducing the size of the secret message, and thereby achieving higher embedding capacity, justifies accepting the obtained PSNR values.

Conclusion
In this paper, a secure image steganography technique is proposed for hiding a secret object, either text or an image, and the way the data bits are hidden is described. The experimental results show that the technique produces good-quality stego images with good PSNR values while reducing the size of the secret object.