Iraqi Commission for Computers & Informatics (ICCI), Iraqi Journal for Computers and Informatics (IJCI), Vol. 1, Issue 1, 2014
Using Genetic Algorithm to Break Knapsack Cipher with Sequence Size 16

With the growth of networked systems and applications such as e-commerce, the demand for effective Internet security is increasing. Cryptology is the science and study of systems for secret communication. It consists of two complementary fields of study: cryptography and cryptanalysis. The genetic algorithm is a search method that seeks an optimal solution, and it is one of the methods that can be used to break ciphers. This work focuses on using Genetic Algorithms to cryptanalyse the knapsack cipher. The knapsack cipher uses a knapsack sequence of size 16 to encrypt two characters together. Different values were used for the parameters population size, mutation rate, and number of generations.


Introduction
With continuing developments in computer networks and the Internet, the need for network, computer, and information security is also increasing. There are different ways to secure information passed over a network, and one such technique is cryptology. Cryptology is the science and study of systems for secret communication [1]. Cryptography is the science of building new, powerful, and efficient encryption and decryption methods. It deals with techniques for conveying information securely. The basic aim of cryptography is to allow the intended recipients of a message to receive it properly while preventing eavesdroppers from understanding it. Cryptanalysis is the science and study of methods for breaking cryptographic techniques, i.e., ciphers. In other words, it can be described as the process of searching for flaws or oversights in the design of ciphers [2].
Knapsack ciphers are among the useful cipher systems [1]. One of the first knapsack ciphers was proposed by Merkle and Hellman in 1978 [3], and it was one of the first attempts at a public-key cryptosystem. The cipher is based on an NP-complete problem [4]. This paper focuses on an attack on the knapsack cipher with a knapsack sequence of size 16 using a Genetic Algorithm (GA). Genetic Algorithms are optimization and search techniques based on the principles of genetics and natural selection [5]. They use three main operators: selection, crossover, and mutation [6].

Knapsack Ciphers
The knapsack cipher was proposed by Merkle and Hellman in 1978 and is based on the NP-complete knapsack (subset-sum) problem [2,7]. Given n objects, each with a known volume, and a knapsack of fixed volume, is there a subset of the n objects that exactly fills the knapsack? Another way to express the problem is this: given, say, the 7 numbers (3, 7, 12, 8, 22, 31, 16), is there some combination of these numbers that adds up to exactly 46? The list of numbers corresponds to the volumes of the 7 objects, and 46 is the fixed volume of the knapsack. In this case the answer is yes: 3 + 12 + 31 = 46. In general, however, no efficient method is known for finding such an answer. The straightforward approach is trial and error: search the possible combinations of the numbers until one that produces the target sum is found. With 7 numbers there are only 2^7 = 128 possible combinations [4], but the number of combinations doubles with every additional number.
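To make the trial-and-error search concrete, the following Python sketch enumerates all 2^7 = 128 include/exclude choices for the example numbers and collects every combination that sums to 46. The function name `subset_sums` is ours and is for illustration only.

```python
from itertools import product


def subset_sums(numbers, target):
    """Brute force: try every 0/1 choice vector and keep those hitting target."""
    tried, solutions = 0, []
    for choice in product((0, 1), repeat=len(numbers)):
        tried += 1
        chosen = tuple(x for x, take in zip(numbers, choice) if take)
        if sum(chosen) == target:
            solutions.append(chosen)
    return tried, solutions


tried, solutions = subset_sums([3, 7, 12, 8, 22, 31, 16], 46)
```

Besides 3 + 12 + 31, this search also finds 7 + 8 + 31, 8 + 22 + 16, and 3 + 7 + 12 + 8 + 16. For arbitrary numbers, therefore, a sum need not determine a unique combination.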
The knapsack cryptosystem belongs to the category of public/private-key (public-key) cryptosystems [2]. Its public/private-key aspect lies in the fact that there are actually two different knapsack problems, referred to as the easy knapsack and the hard knapsack. The Merkle-Hellman algorithm is based on this property [2,7]. The easy knapsack is the private key, and the hard knapsack is the public key [2].
An easy knapsack is a sequence of numbers that is superincreasing, meaning that each number is greater than the sum of all the previous numbers. Such a sequence is (…). The knapsack problem for a superincreasing sequence is solved as follows. The target sum is compared with the largest number in the sequence. If the target sum is smaller than this number, that number is not in the knapsack. Otherwise, it is in the knapsack, and the number is subtracted from the target sum. The result of the subtraction is then compared with the next largest number, and so on, until the smallest number in the sequence has been processed. If the target sum has been reduced to 0, a solution exists. Otherwise, no solution exists [2,7].
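The greedy procedure just described can be sketched in Python as follows. The sketch assumes that the sequence is given in increasing order. The sequence (2, 3, 7, 14, 30, 57, 120, 251) in the example is a small illustrative choice and is not taken from the paper.

```python
def solve_superincreasing(seq, target):
    """Return the 0/1 list selecting `target` from superincreasing `seq`, or None."""
    bits = [0] * len(seq)
    # Work from the largest element down to the smallest.
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] <= target:      # the element must be in the knapsack
            bits[i] = 1
            target -= seq[i]      # continue with the remainder
    return bits if target == 0 else None   # remainder 0 means a solution exists


example = [2, 3, 7, 14, 30, 57, 120, 251]
bits = solve_superincreasing(example, 39)   # 39 = 2 + 7 + 30
```

Only one pass over the sequence is needed, so decoding takes time linear in n. This is why the superincreasing knapsack is "easy".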
The superincreasing knapsack is easy to decipher, which means that it does not protect the message. Anyone who knows the elements of a superincreasing knapsack can recover the bit pattern from the target sum. Merkle and Hellman therefore suggested converting such a simple knapsack into a trapdoor knapsack, which is not superincreasing and so is difficult to break.

A. Converting Easy to Hard Knapsack
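Each entity creates a public key and a corresponding private key [8].
1- An integer $n$ is fixed as a common system parameter.
2- Alice should perform steps 3-7.
3- Choose a superincreasing sequence $(b_1, b_2, \ldots, b_n)$ and a modulus $M$ such that $M > b_1 + b_2 + \cdots + b_n$.
4- Select a random integer $W$, $1 \le W \le M - 1$, such that $\gcd(W, M) = 1$.
5- Select a random permutation $\pi$ of the integers $\{1, 2, \ldots, n\}$.
6- Compute $a_i = W b_{\pi(i)} \bmod M$ for $i = 1, 2, \ldots, n$.
7- Alice's public key is $(a_1, a_2, \ldots, a_n)$, and her private key is $(\pi, M, W, (b_1, b_2, \ldots, b_n))$.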
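The key-generation steps can be sketched in Python as follows. This is an illustrative sketch and not the authors' code. The permutation is 0-based and is passed in, not drawn at random, so that the example is reproducible. The toy parameters b = (2, 3, 7, 14, 30, 57, 120, 251), M = 491, and W = 41 are hypothetical choices that satisfy M > sum(b) = 484 and gcd(W, M) = 1.

```python
import math


def is_superincreasing(seq):
    """True if every element exceeds the sum of all previous elements."""
    total = 0
    for x in seq:
        if x <= total:
            return False
        total += x
    return True


def generate_keys(b, M, W, perm):
    """Steps 3-7: return (public key, private key) for a 0-based permutation."""
    if not is_superincreasing(b):
        raise ValueError("b must be superincreasing")
    if M <= sum(b):
        raise ValueError("M must exceed sum(b)")
    if not (1 <= W <= M - 1 and math.gcd(W, M) == 1):
        raise ValueError("W must lie in [1, M-1] and be coprime to M")
    if sorted(perm) != list(range(len(b))):
        raise ValueError("perm must be a permutation of 0..n-1")
    public = [(W * b[perm[i]]) % M for i in range(len(b))]   # step 6
    private = (perm, M, W, b)                                 # step 7
    return public, private


public, private = generate_keys([2, 3, 7, 14, 30, 57, 120, 251], 491, 41,
                                list(range(8)))
```

With the identity permutation used here, the public key is simply each b_i multiplied by 41 modulo 491. The resulting values are no longer superincreasing.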

B. Encryption and Decryption
Bob encrypts a message m for Alice, which Alice decrypts [8].
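Encryption. Bob should do the following:
1- Obtain Alice's authentic public key $(a_1, a_2, \ldots, a_n)$.
2- Represent the message $m$ as a binary string of length $n$, $m = m_1 m_2 \cdots m_n$.
3- Compute the integer $c = m_1 a_1 + m_2 a_2 + \cdots + m_n a_n$.
4- Send the ciphertext $c$ to Alice.
Decryption. To recover the plaintext $m$ from $c$, Alice should do the following:
1- Compute $d = W^{-1} c \bmod M$.
2- By solving a superincreasing subset sum problem, find integers $r_1, r_2, \ldots, r_n$, $r_i \in \{0, 1\}$, such that $d = r_1 b_1 + r_2 b_2 + \cdots + r_n b_n$.
3- The message bits are $m_i = r_{\pi(i)}$, $i = 1, 2, \ldots, n$.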
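Encryption and decryption can be sketched in Python as follows. The sketch uses hypothetical toy parameters (b = (2, 3, 7, 14, 30, 57, 120, 251), M = 491, W = 41) and a hypothetical 0-based permutation. It relies on `pow(W, -1, M)` for the modular inverse, which requires Python 3.8 or later. Decryption works because $d = W^{-1}c \bmod M$ equals $\sum_i m_i b_{\pi(i)}$ exactly, since that sum is smaller than M.

```python
B = [2, 3, 7, 14, 30, 57, 120, 251]      # superincreasing, sum 484
M, W = 491, 41                           # M > 484 and gcd(41, 491) = 1
PERM = [2, 0, 7, 1, 4, 6, 3, 5]          # hypothetical permutation pi (0-based)
PUBLIC = [(W * B[PERM[i]]) % M for i in range(len(B))]
PRIVATE = (PERM, M, W, B)


def solve_superincreasing(b, d):
    """Greedy solution of the easy knapsack, from the largest element down."""
    r = [0] * len(b)
    for j in range(len(b) - 1, -1, -1):
        if b[j] <= d:
            r[j] = 1
            d -= b[j]
    if d != 0:
        raise ValueError("not a valid ciphertext")
    return r


def encrypt(bits, public):
    """c = m_1 a_1 + ... + m_n a_n, an ordinary integer sum (no reduction)."""
    return sum(m * a for m, a in zip(bits, public))


def decrypt(c, private):
    perm, M, W, b = private
    d = (pow(W, -1, M) * c) % M             # step 1: d = W^-1 c mod M
    r = solve_superincreasing(b, d)         # step 2: easy knapsack
    return [r[perm[i]] for i in range(len(b))]   # step 3: m_i = r_pi(i)
```

For example, `decrypt(encrypt(bits, PUBLIC), PRIVATE)` returns `bits` for every 8-bit message.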

Genetic Algorithms
Genetic algorithms were developed by John Holland as a modification of what is called evolutionary programming [4].
Holland's idea was to construct a search algorithm modeled on the concepts of natural selection in the biological sciences. The result is a directed random search procedure. The process begins by constructing a random population of possible solutions. This population is used to create a new generation of possible solutions, which is then used to create another generation of solutions, and so on. The best elements of the current generation are used to create the next generation, in the hope that the new generation will contain "better" solutions than the previous one [4,9]. The steps of a genetic algorithm are as follows:
1- A random population of chromosomes is generated.
2- A fitness value is determined for each chromosome in the population.
3- The selection operation is performed.
4- The crossover operation is performed.
5- The mutation process is executed to produce a new population.
6- Step 2 is repeated for the new population [2,7].
Three processes that have a parallel in human genetics are used to make the transition from one generation to the next: selection, mating, and mutation. The basic genetic algorithm cycle based on these three processes is shown in Figure (1). The selection process determines which strings in the current generation will be used to create the next generation. The mating process determines the actual form of the strings in the next generation. At this point, pairs of selected parents are mated. The final step is mutation. A small, fixed mutation probability is set at the start of the algorithm, and bits in all the new strings are then subject to change according to this mutation probability [4].
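The six steps above can be illustrated with a generic genetic-algorithm loop. In this sketch, the "count the 1 bits" fitness is a toy objective assumed only for illustration, and it is not the knapsack fitness used later. The fitness-proportional selection and the one-chromosome elitism are also our choices for the sketch.

```python
import random


def onemax(bits):
    """Toy fitness (assumed for illustration): the number of 1 bits."""
    return sum(bits)


def run_ga(n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    # Step 1: a random population of bit-string chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Step 2: the fitness of every chromosome.
        fits = [onemax(c) for c in pop]
        best = max(pop, key=onemax)
        history.append(onemax(best))
        # Step 3: fitness-proportional selection (+1 keeps all weights positive).
        parents = rng.choices(pop, weights=[f + 1 for f in fits], k=pop_size)
        children = [best[:]]          # keep the best chromosome unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Step 4: single-point crossover (mating).
            cut = rng.randint(1, n_bits - 1)
            child = a[:cut] + b[cut:]
            # Step 5: flip each bit with a small fixed probability.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children                # Step 6: repeat with the new generation.
    return max(pop, key=onemax), history
```

Because the best chromosome is carried over unchanged, the best fitness recorded in `history` never decreases from one generation to the next.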

Cryptanalysis of knapsack Cipher Using Genetic Algorithm
Spillman [4] suggested using a genetic algorithm to solve the knapsack problem. Figure (2) shows the Merkle-Hellman cryptosystem and its cryptanalysis by means of a genetic algorithm. The cryptanalysis starts from the ciphertext, which is in integer form, and each number is the target sum of a hard knapsack problem. The goal of the genetic algorithm is to translate each number into the correct knapsack selection, i.e., the bit pattern that represents the ASCII codes of the plaintext characters.

Encoding
The following restrictions were made for encoding:
(1) Only ASCII codes are encrypted.
(2) The superincreasing sequence has 16 elements. This number of elements guarantees that two characters have a unique encoding.
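A possible Python sketch of this encoding packs two 8-bit ASCII codes into one 16-bit chromosome. The bit order (first character first, most significant bit first) and the padding of an odd final character with NUL are our assumptions, because the paper does not state them.

```python
def pair_to_bits(pair):
    """Encode two 8-bit characters as 16 bits, most significant bit first."""
    if len(pair) != 2 or any(ord(ch) > 255 for ch in pair):
        raise ValueError("expected exactly two 8-bit characters")
    return [(ord(ch) >> (7 - i)) & 1 for ch in pair for i in range(8)]


def bits_to_pair(bits):
    """Inverse of pair_to_bits."""
    return "".join(chr(int("".join(map(str, bits[k:k + 8])), 2)) for k in (0, 8))


def split_into_pairs(text, pad="\x00"):
    """Hypothetical padding: an odd final character is paired with NUL."""
    if len(text) % 2:
        text += pad
    return [text[i:i + 2] for i in range(0, len(text), 2)]
```

For the word "macro", this gives the blocks "ma", "cr", and "o" plus padding.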

Initialization
A random population of chromosomes (binary strings of 0s and 1s) is generated. The number of bits in each chromosome equals the number of elements in the key (i.e., 16).
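Initialization amounts to drawing each bit independently at random. In this sketch, the default population size of 1000 is the value listed in the Results section, and the `seed` parameter is added only to make runs reproducible.

```python
import random


def random_population(size=1000, n_bits=16, seed=None):
    """Generate `size` random binary chromosomes of `n_bits` genes each."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(size)]
```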

Evaluation
In our work we used the following fitness function [1] to evaluate the generated individuals. Based on the fitness value obtained, it can be determined whether the optimal solution has been reached.
Where MaxDifference = max(Target, FullSum − Target), Target is the ciphertext, Sum is the sum of the knapsack elements selected by the current chromosome, and FullSum is the sum of all components in the knapsack.
According to this fitness function, the fitness value measures how close the chromosome's sum is to the target value of the knapsack. The fitness value lies in the range 0 to 1, and a value of 1 indicates an exact match with the target sum. A chromosome whose sum is greater than the target is an infeasible solution and receives a lower fitness value. A chromosome whose sum is less than the target is a feasible solution and receives a higher fitness value. Feasible solutions therefore have a greater chance of being followed by the algorithm. Small differences between the current chromosome's sum and the target sum should be amplified.
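The fitness equation itself is not reproduced in this text. The following sketch therefore assumes a Spillman-style form that matches the description and the MaxDifference definition. For Sum ≤ Target it uses Fitness = 1 − (|Sum − Target| / MaxDifference)^(1/2), and otherwise Fitness = 1 − (|Sum − Target| / MaxDifference)^(1/6). Treat it as an assumption and not as the authors' exact formula.

```python
import math


def fitness(bits, public_key, target):
    """Assumed Spillman-style fitness in [0, 1]; 1.0 means an exact match."""
    s = sum(a for bit, a in zip(bits, public_key) if bit)   # Sum
    if s == target:
        return 1.0
    full_sum = sum(public_key)                              # FullSum
    max_diff = max(target, full_sum - target)               # MaxDifference
    r = abs(s - target) / max_diff                          # lies in (0, 1]
    # Roots amplify small differences, and the sixth root gives oversized
    # (infeasible) sums a lower fitness than undersized ones.
    return 1 - (math.sqrt(r) if s < target else r ** (1 / 6))
```

Dividing by MaxDifference keeps the ratio at or below 1 in both branches. An undersized sum differs by at most Target, and an oversized sum differs by at most FullSum − Target.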

Selection
An important part of the algorithm is the selection of a new population. Individuals are selected according to their fitness values. In our work we used stochastic universal sampling selection, which may be implemented as follows:
1- The fitness function is evaluated for each individual, and the resulting fitness values are normalized. Normalization means dividing the fitness value of each individual by the sum of all fitness values, so that the normalized values sum to 1.
2- The population is sorted by descending fitness value.
3- Accumulated normalized fitness values are computed. The accumulated fitness value of an individual is its own fitness value plus the fitness values of all previous individuals. The accumulated fitness of the last individual should be 1; otherwise, something went wrong in the normalization step.
4- A random number R between 0 and 1 is chosen.
5- The selected individual is the first one whose accumulated normalized value is greater than R.
Steps 4 and 5 describe a single draw. Stochastic universal sampling selects all N individuals in one pass: a single random offset R between 0 and 1/N is drawn, and each of the N equally spaced pointers R, R + 1/N, ..., R + (N − 1)/N is resolved as in step 5.
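The procedure can be sketched in Python with one shared random offset and N equally spaced pointers. The function name and the round-off guard are our own additions for the sketch.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def sus_select(population, fitnesses, n, rng=random):
    """Select n individuals by stochastic universal sampling."""
    # Steps 1-2: sort by descending fitness and normalize to sum 1.
    order = sorted(range(len(population)), key=lambda i: fitnesses[i],
                   reverse=True)
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("total fitness must be positive")
    # Step 3: accumulated normalized fitness (last value is 1 up to round-off).
    cumulative = list(accumulate(fitnesses[i] / total for i in order))
    # Steps 4-5 with n equally spaced pointers sharing one offset in [0, 1/n].
    start = rng.uniform(0, 1 / n)
    selected = []
    for k in range(n):
        pointer = start + k / n
        idx = bisect_right(cumulative, pointer)   # first entry > pointer
        idx = min(idx, len(cumulative) - 1)       # guard against round-off
        selected.append(population[order[idx]])
    return selected
```

Because the pointers are evenly spaced, an individual with a normalized fitness f is selected either floor(N·f) or ceil(N·f) times. This avoids the larger sampling spread of N independent single draws.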

Elite
Elite children are the individuals in the current generation with the best fitness values.These individuals automatically survive to the next generation [10].
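Elitism can be sketched as copying the best fraction of the population into the next generation unchanged. With the population size of 1000 and the elite fraction of 0.01 listed in the Results section, this keeps 10 individuals. The rounding and the minimum of one elite are our assumptions.

```python
def elite(population, fitnesses, fraction=0.01):
    """Return copies of the best round(fraction * N) chromosomes (at least one)."""
    k = max(1, round(fraction * len(population)))
    ranked = sorted(zip(fitnesses, range(len(population))), reverse=True)
    return [list(population[i]) for _, i in ranked[:k]]
```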

Crossover
Single-point crossover is applied in the algorithm. A crossover point is chosen, and the two parents exchange the bits that follow it. Single-point crossover is shown in Table (1).
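Single-point crossover on two chromosomes can be sketched in Python as follows. By default, the cut point is drawn at random strictly inside the chromosome, which matches the random crossover point described in the Results section. The function name is illustrative.

```python
import random


def single_point_crossover(parent1, parent2, point=None, rng=random):
    """Swap the tails of two equal-length chromosomes after a cut point."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if point is None:
        point = rng.randint(1, len(parent1) - 1)   # cut strictly inside
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]
```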

Mutation
After crossover is performed, mutation takes place. The type of mutation used in this work is bit inversion, in which selected bits are flipped (0 becomes 1 and 1 becomes 0). The bit-inversion mutation process is shown in Table (2).
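Bit-inversion mutation can be sketched in Python as follows. The defaults reflect our reading of the Results section: with probability 0.3, a chromosome has 6 distinct, randomly chosen bits out of its 16 inverted. This interpretation is an assumption.

```python
import random


def invert_bits(chromosome, n_bits=6, rng=random):
    """Return a copy with n_bits distinct, randomly chosen positions inverted."""
    child = list(chromosome)
    for i in rng.sample(range(len(child)), n_bits):
        child[i] ^= 1          # 0 -> 1, 1 -> 0
    return child


def mutate(chromosome, p_mut=0.3, n_bits=6, rng=random):
    """Apply bit inversion with probability p_mut, else return an unchanged copy."""
    if rng.random() < p_mut:
        return invert_bits(chromosome, n_bits, rng)
    return list(chromosome)
```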

Results
Genetic algorithms have been applied successfully, and in a short time, to the cryptanalysis of the knapsack cipher. This paper used genetic algorithms to cryptanalyse a knapsack cipher with a knapsack sequence of size 16.
The following parameters were used in this paper: 60 generations; a population size of 1000; stochastic universal sampling selection; single-point crossover with a randomly selected crossover point and a crossover probability of 0.69; bit-inversion mutation with a mutation probability of 0.3, in which 6 of the 16 bits are inverted; and an elite fraction of 0.01.
The superincreasing sequence used for the knapsack cryptosystem was (3, 5, 9, 18, 38, 75, 155, 312, 628, 1265, 2536, 5077, 10157, 20317, 40639, 81280). Figure (3) shows the best and mean fitness for each character in the word "macro". The best fitness for the characters "ma" becomes 1 in generation 4, and the characters "ma" are obtained in generation 25, as shown in Figure (3-a). The characters "cr" are obtained in generation 46, as shown in Figure (3-b). The best fitness for the character "o" becomes 1 in generation 12. Figure (4) shows the best fitness for all characters in the word "macro". The character "o" is obtained first, then the characters "ma", and then the characters "cr".
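Putting the pieces together, the following Python sketch runs the attack on one 16-bit block. It uses the superincreasing sequence and the parameter values listed above. Several parts are assumptions and not details from the paper:

- the multiplier W = 3001 and modulus M = 163000, chosen so that M exceeds the sequence sum 162514 and gcd(W, M) = 1;
- the identity permutation;
- the Spillman-style fitness form;
- our reading of the mutation as inverting 6 random bits of a chromosome with probability 0.3.

The attacker only needs the public knapsack, so the permutation does not affect the search. Because the run is stochastic, it is not guaranteed to reach fitness 1 within 60 generations. It is also not expected to reproduce the generation counts reported above.

```python
import random
from bisect import bisect_right
from itertools import accumulate

B = [3, 5, 9, 18, 38, 75, 155, 312, 628, 1265, 2536, 5077,
     10157, 20317, 40639, 81280]          # private sequence from the paper
M, W = 163000, 3001                       # hypothetical modulus and multiplier
PUBLIC = [(W * b) % M for b in B]         # hard knapsack (identity permutation)
FULL_SUM = sum(PUBLIC)


def to_bits(pair):
    """Two 8-bit ASCII characters -> 16 bits, most significant bit first."""
    return [(ord(ch) >> (7 - i)) & 1 for ch in pair for i in range(8)]


def to_chars(bits):
    return "".join(chr(int("".join(map(str, bits[k:k + 8])), 2)) for k in (0, 8))


def knapsack_sum(bits):
    return sum(a for bit, a in zip(bits, PUBLIC) if bit)


def fitness(bits, target):
    """Assumed Spillman-style fitness in [0, 1]; 1 means an exact match."""
    s = knapsack_sum(bits)
    if s == target:
        return 1.0
    r = abs(s - target) / max(target, FULL_SUM - target)
    return 1 - (r ** 0.5 if s < target else r ** (1 / 6))


def sus(pop, fits, n, rng):
    """Stochastic universal sampling: n equally spaced pointers, one offset."""
    order = sorted(range(len(pop)), key=fits.__getitem__, reverse=True)
    total = sum(fits)
    if total == 0:
        return [rng.choice(pop) for _ in range(n)]
    cumulative = list(accumulate(fits[i] / total for i in order))
    start = rng.uniform(0, 1 / n)
    picks = (bisect_right(cumulative, start + k / n) for k in range(n))
    return [pop[order[min(i, len(pop) - 1)]] for i in picks]


def attack(target, pop_size=1000, generations=60, p_cross=0.69,
           p_mut=0.3, flip_bits=6, elite_frac=0.01, seed=None):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(16)] for _ in range(pop_size)]
    n_elite = max(1, round(elite_frac * pop_size))
    for _ in range(generations):
        fits = [fitness(c, target) for c in pop]
        ranked = sorted(range(pop_size), key=fits.__getitem__, reverse=True)
        if fits[ranked[0]] == 1.0:
            break                                        # exact sum found
        children = [pop[i][:] for i in ranked[:n_elite]]  # elite survive
        parents = sus(pop, fits, pop_size - n_elite, rng)
        for k in range(0, len(parents) - 1, 2):
            a, b = parents[k][:], parents[k + 1][:]
            if rng.random() < p_cross:                   # single-point crossover
                cut = rng.randint(1, 15)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a, b]
        if len(parents) % 2:
            children.append(parents[-1][:])
        for child in children[n_elite:]:                 # bit-inversion mutation
            if rng.random() < p_mut:
                for i in rng.sample(range(16), flip_bits):
                    child[i] ^= 1
        pop = children
    best = max(pop, key=lambda c: fitness(c, target))
    return best, fitness(best, target)


if __name__ == "__main__":
    target = knapsack_sum(to_bits("ma"))   # the ciphertext for "ma"
    best, fit = attack(target, seed=1)
    print(to_chars(best), fit)
```

Because M exceeds the sum of the sequence and W is invertible modulo M, each 16-bit pattern has a distinct knapsack sum. A fitness of exactly 1 therefore always corresponds to the correct plaintext pair.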

Conclusions
This paper presents an attack on the knapsack cipher with a knapsack sequence of size 16 using a genetic algorithm. The attack recovers plaintext encrypted with a knapsack cipher whose 16-element sequence encrypts two characters at the same time (an 8-bit ASCII code for each character).
This paper indicates that the efficiency of the genetic algorithm attack on the knapsack cipher can be improved by varying the mutation, the crossover operation, and the population size. The results are worse when the population size decreases, and the number of generations required decreases as the initial population size increases. The genetic algorithm offers a powerful tool for the cryptanalysis of the knapsack cipher.

Figure 1: The basic genetic algorithm cycle.

Figure 2: Merkle-Hellman cryptosystem and cryptanalysis by means of Genetic Algorithm.

Figure 3: Best and mean fitness for each character in the word "macro".
Figure 4: Best fitness of all characters in the word "macro".