Week 13 (4-23-06)
The GUI and my presentation are FINISHED!! The website is (almost) finished, with only one page left. Tomorrow I will be practicing my presentation and concluding my analysis. The big presentation is Tuesday afternoon, and it will be such a relief to have it done.
 
Weeks 11 and 12 (4-17-06)
These past two weeks I have been busy tying up loose ends. I continued developing the GUI and finished my final two algorithms for finding the plaintext/key. Coding the matrix reduction algorithm proved to be the last difficult part. I admit, though, that if I had done my math correctly on paper, remembering that all addition and inverses must be computed mod 26 when verifying that my algorithm worked, it would have taken less time and caused less confusion.
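To make the mod 26 point concrete, here is a minimal C++ sketch of Gauss-Jordan elimination in which every addition, subtraction, and inverse is taken mod 26. It assumes the system arrives as an n×(n+1) augmented matrix of integers; `mod26`, `inverseMod26`, and `solveMod26` are hypothetical names, not the project's code. Because 26 is not prime, only values coprime to 26 have inverses, so this sketch simply gives up when a column has no invertible pivot.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Reduce any integer into the range [0, 26).
int mod26(long long x) {
    int r = static_cast<int>(x % 26);
    return r < 0 ? r + 26 : r;
}

// Multiplicative inverse mod 26 via the extended Euclidean algorithm.
// Only values coprime to 26 (odd and not 13) have an inverse.
std::optional<int> inverseMod26(int a) {
    int r0 = mod26(a), r1 = 26;  // successive remainders
    int s0 = 1, s1 = 0;          // coefficients of a in those remainders
    while (r1 != 0) {
        const int q = r0 / r1;
        const int r2 = r0 - q * r1;
        r0 = r1;
        r1 = r2;
        const int s2 = s0 - q * s1;
        s0 = s1;
        s1 = s2;
    }
    if (r0 != 1) return std::nullopt;  // gcd(a, 26) > 1: no inverse
    return mod26(s0);
}

// Solve A x = b (mod 26) by Gauss-Jordan elimination on the augmented matrix
// [A | b] (n rows, n + 1 columns). Returns std::nullopt when a column has no
// invertible pivot; since 26 is not prime, that can happen even when a
// solution exists, so a complete solver would need more care.
std::optional<std::vector<int>> solveMod26(std::vector<std::vector<int>> m) {
    const std::size_t n = m.size();
    for (auto& row : m) {
        assert(row.size() == n + 1);
        for (int& v : row) v = mod26(v);
    }
    for (std::size_t col = 0; col < n; ++col) {
        // Find a row at or below `col` whose entry in this column is invertible.
        std::size_t pivot = col;
        while (pivot < n && !inverseMod26(m[pivot][col])) ++pivot;
        if (pivot == n) return std::nullopt;
        std::swap(m[pivot], m[col]);
        // Scale the pivot row so the pivot becomes 1.
        const int inv = *inverseMod26(m[col][col]);
        for (int& v : m[col]) v = mod26(static_cast<long long>(v) * inv);
        // Clear this column from every other row, reducing mod 26 as we go.
        for (std::size_t r = 0; r < n; ++r) {
            const int factor = m[r][col];
            if (r == col || factor == 0) continue;
            for (std::size_t c = 0; c <= n; ++c)
                m[r][c] = mod26(m[r][c] - static_cast<long long>(factor) * m[col][c]);
        }
    }
    std::vector<int> x(n);
    for (std::size_t i = 0; i < n; ++i) x[i] = m[i][n];
    return x;
}
```

For example, the system 3x + y ≡ 22 and x + 2y ≡ 24 (mod 26) reduces to x = 4, y = 10; checking such a result by hand is exactly where forgetting the mod 26 step causes trouble.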
 
The basic algorithm, which attempts to match characters purely on frequency counts, is the least accurate. This makes sense, since the other two algorithms go through EVERY possibility and select the best fit. I was quite pleased with the other two algorithms.
 
Next week is the big presentation. Before then, my to-do list is: finish the GUI, do the analysis on the various methods, and prepare my presentation.
 
Week 10 (4-02-06)
I have started integrating my algorithms for finding the key length into one C++ program for use by my GUI. While it was a slow process, it also encouraged me to look over my code again and clean it up. I also started my GUI. I am having difficulty executing my GUI from the classes folder, so until I can clear this up with Dawn Rohm, it can be found at compsci.snc.edu/~webemm/CrackCode.pl.
 
My to-do list: implement a matrix reduction algorithm, complete the final two algorithms for decrypting a ciphertext, perform the analysis on the various algorithms, adjust the function that combines all the key-length algorithms according to that analysis, finish the GUI, and get ready for the presentation. Needless to say, I will be busy these next two weeks.
 
 
Week 9 (3-27-06)
I decided to put the algorithm I was implementing on hold and move on to a new one. Because the statistical analysis happens only down a column, the only information I can initially take advantage of is single-letter frequency analysis. I cannot use digram or trigram frequencies in English such as 'th' and 'the', because consecutive plaintext letters fall in different columns. My plan of attack for this algorithm is to assume that the most frequently occurring letter in a column is the encryption of the plaintext letter 'e'. Then decrypt the column with the resulting shift, compute its letter frequencies, and determine whether they are close enough to the standard English frequencies. If they are not, choose another plausible shift and repeat the process. Finally, after reaching a guess for each column, allow the user to select a specific column and change it to a perhaps more feasible guess. I'll work on this after I complete the other two algorithms I want to implement.
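A minimal C++ sketch of this plan for a single column, assuming the column contains only lowercase letters. The "close enough" test here is a sum of squared differences against a commonly tabulated table of approximate English letter frequencies; the table values, the `threshold` parameter, and the function names are illustrative assumptions, not the project's. Candidate letters are tried in order of decreasing frequency, and if none passes the threshold the closest guess is returned.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <string>

// Approximate English letter frequencies for a..z (a commonly used table).
const std::array<double, 26> ENGLISH = {
    0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
    0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
    0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001};

// Sum of squared differences between the letter frequencies of `column`
// decrypted with `shift` and the English frequencies (smaller = closer).
double distanceFromEnglish(const std::string& column, int shift) {
    std::array<double, 26> freq{};
    for (char c : column) freq[(c - 'a' - shift + 26) % 26] += 1.0;
    double d = 0.0;
    for (int i = 0; i < 26; ++i) {
        const double diff = freq[i] / column.size() - ENGLISH[i];
        d += diff * diff;
    }
    return d;
}

// Guess one column's shift: assume its most frequent letter is plaintext 'e',
// accept that guess if the decryption is within `threshold` of English, and
// otherwise try the next most frequent letter. Falls back to the closest guess.
int guessShiftByMostFrequent(const std::string& column, double threshold) {
    assert(!column.empty());
    std::array<int, 26> counts{};
    for (char c : column) ++counts[c - 'a'];
    std::array<int, 26> order{};
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&counts](int a, int b) { return counts[a] > counts[b]; });
    int best = 0;
    double bestDist = distanceFromEnglish(column, 0);
    for (int letter : order) {
        if (counts[letter] == 0) break;            // remaining letters never occur
        const int shift = (letter - 4 + 26) % 26;  // 4 is the index of 'e'
        const double d = distanceFromEnglish(column, shift);
        if (d <= threshold) return shift;
        if (d < bestDist) { bestDist = d; best = shift; }
    }
    return best;
}
```

Letting the user override a column, as planned above, would amount to replacing one returned shift by hand.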
 
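I have finished coding an algorithm that does quite well at decrypting a text. This algorithm goes through the text column by column and takes the dot product of each column's letter frequencies with the standard English letter frequencies under every possible shift of the alphabet. The dot product closest to 0.065 (roughly the sum of the squares of the English letter frequencies, which is what the correct shift should reproduce) indicates the probable shift of the column. This worked on all of my test cases with no difficulties (hurray!).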
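The dot-product method can be sketched in C++ as follows, assuming a lowercase-only column and an approximate English frequency table (the values and names are illustrative assumptions). This version picks the largest dot product; for the correct shift that value lands near the sum of the squared English frequencies, roughly 0.065, so it normally agrees with the "closest to 0.065" rule.

```cpp
#include <array>
#include <cassert>
#include <string>

// Approximate English letter frequencies for a..z (a commonly used table).
const std::array<double, 26> ENGLISH = {
    0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
    0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
    0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001};

// Dot product of the column's letter frequencies w with the English
// frequencies shifted by `shift`: sum over c of w[c] * ENGLISH[c - shift].
double shiftedDot(const std::array<double, 26>& w, int shift) {
    double dot = 0.0;
    for (int c = 0; c < 26; ++c) dot += w[c] * ENGLISH[(c - shift + 26) % 26];
    return dot;
}

// Probable shift of one column (0 means key letter 'a', 1 means 'b', ...):
// the shift with the largest dot product.
int probableShift(const std::string& column) {
    assert(!column.empty());
    std::array<double, 26> w{};
    for (char c : column) w[c - 'a'] += 1.0 / column.size();
    int best = 0;
    double bestDot = shiftedDot(w, 0);
    for (int s = 1; s < 26; ++s) {
        const double dot = shiftedDot(w, s);
        if (dot > bestDot) { bestDot = dot; best = s; }
    }
    return best;
}
```

Wrong shifts line up frequent letters with rare ones, which is why their dot products fall well below the correct one.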
 
Now I am working on implementing a third algorithm, which attempts to determine the relative shift between two columns. I have only a superficial understanding of the algorithm, so to make sure I am not making up my own reasons for why it works, I am going to confer with Dawn Rohm tomorrow. This algorithm requires solving a system of equations; I picked up references from Dr. McVey today on coding matrix reduction. I hope to have this implemented before Thursday.
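One standard way to get the relative shift between two columns is the mutual index of coincidence; that it is exactly the algorithm meant here is an assumption. The C++ sketch below, with hypothetical names and lowercase-only columns, undoes each candidate shift g on the second column and measures how often a letter drawn from each column matches. The largest value (near 0.065 for English) suggests k2 - k1 ≡ g (mod 26), and each such relation is one equation in the system that matrix reduction would solve.

```cpp
#include <array>
#include <cassert>
#include <string>

// Letter counts of a lowercase string.
std::array<long long, 26> letterCounts(const std::string& s) {
    std::array<long long, 26> counts{};
    for (char c : s) ++counts[c - 'a'];
    return counts;
}

// Mutual index of coincidence of columns x and y after shifting every letter
// of y back by g: the chance that a random letter of x equals a random letter
// of the shifted y.
double mutualIC(const std::string& x, const std::string& y, int g) {
    const auto fx = letterCounts(x);
    const auto fy = letterCounts(y);
    double sum = 0.0;
    for (int i = 0; i < 26; ++i) sum += static_cast<double>(fx[i]) * fy[(i + g) % 26];
    return sum / (static_cast<double>(x.size()) * y.size());
}

// Relative shift g (key letter of y minus key letter of x, mod 26) that
// maximizes the mutual index of coincidence.
int relativeShift(const std::string& x, const std::string& y) {
    assert(!x.empty() && !y.empty());
    int best = 0;
    double bestValue = mutualIC(x, y, 0);
    for (int g = 1; g < 26; ++g) {
        const double value = mutualIC(x, y, g);
        if (value > bestValue) { bestValue = value; best = g; }
    }
    return best;
}
```

Relative shifts only pin down differences between key letters, so one key letter still has to be fixed (or guessed) before the remaining ones follow.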
 
On a side note, following Dawn Rohm's suggestion, I altered the final program I was working on last week to find the GCD of all repeating strings. Instead of considering all repeating strings of length three or greater, I increased the minimum length a string must have to qualify as repeating to five. This resulted in great output (a single GCD), which was correct for all test cases but one. That test case failed because it had no repeating strings of length five, but at length four it also worked correctly. These results make sense because the longer minimum length eliminates most of the coincidental repeating strings. Hence, in my final implementation I plan to use a default length of five and allow users to change it.
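A minimal C++ sketch of the adjustable minimum length, with hypothetical names: it slides a window of `minLen` letters (default 5) over the ciphertext, records the distance each time a window repeats, and folds every distance into one GCD using `std::gcd` (C++17). Every repeating string of `minLen` or more letters contains a repeated window of exactly `minLen` letters at the same distance, so windows of that one length are enough.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <unordered_map>

// Kasiski-style key-length guess: every time a window of exactly `minLen`
// letters repeats, take the distance back to its previous occurrence and fold
// it into a running GCD. Returns 0 when nothing repeats.
std::size_t kasiskiGcd(const std::string& text, std::size_t minLen = 5) {
    assert(minLen > 0);
    std::unordered_map<std::string, std::size_t> lastSeen;
    std::size_t g = 0;
    for (std::size_t i = 0; i + minLen <= text.size(); ++i) {
        const std::string window = text.substr(i, minLen);
        const auto it = lastSeen.find(window);
        if (it != lastSeen.end()) g = std::gcd(g, i - it->second);
        lastSeen[window] = i;
    }
    return g;
}
```

On the toy string "abcdexyzwvabcdeqrstuabcdekxyzm", a minimum length of 3 picks up a coincidental "xyz" 21 letters apart and drags the GCD down to 1, while the default of 5 returns 10.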
 
 
Week 7/8 (3-20-06)
I have the final algorithm for finding the key length working. The initial program simply recorded the distances between repeating strings and attempted to find the GCD of ALL these distances. The biggest issue I had with implementing the Kasiski Test this way was that the overall GCD came out to be 1, which yields no useful information. I tried various ways of "throwing out" pieces of data to get a meaningful GCD, but none seemed to help. The solution I decided to implement was to keep track of the distances for each specific string and compute the GCD over all distances associated with that single string. If this GCD was 1, I threw out the string, reasoning that it appeared coincidentally in various places, which would throw off the algorithm. If the GCD was greater than 1, I stored the result. Finally, I took the GCDs for the various strings and attempted to compute the GCD of the compiled data. This proved futile as well: all of my test cases resulted in an overall GCD of 1. Rather than trying to identify which numbers must be thrown out to get a useful GCD, I resorted to printing out all the GCDs computed. While this method may not be effective by itself, I think (and hope) that the resulting list will prove beneficial when combining the various methods for finding the key length.
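The per-string filtering can be sketched in C++ like this, with hypothetical names and a fixed string length `len`. Distances are taken between consecutive occurrences, which gives the same GCD as using every pair of occurrences. Strings whose GCD is 1 are discarded as coincidental, and the surviving GCDs are returned so they can be printed or combined with other methods.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// For each string of `len` letters that repeats, compute the GCD of the
// distances between its consecutive occurrences. A GCD of 1 is treated as a
// sign of coincidental repeats and dropped; the surviving GCDs are returned
// keyed by the repeating string.
std::map<std::string, std::size_t> perStringGcds(const std::string& text, std::size_t len) {
    assert(len > 0);
    std::map<std::string, std::vector<std::size_t>> positions;
    for (std::size_t i = 0; i + len <= text.size(); ++i)
        positions[text.substr(i, len)].push_back(i);
    std::map<std::string, std::size_t> result;
    for (const auto& [piece, where] : positions) {
        if (where.size() < 2) continue;  // appears once: not repeating
        std::size_t g = 0;
        for (std::size_t k = 1; k < where.size(); ++k)
            g = std::gcd(g, where[k] - where[k - 1]);
        if (g > 1) result[piece] = g;
    }
    return result;
}
```

For instance, "abc" at positions 0, 4, and 9 of "abcdabcefabc" has distances 4 and 5, so its GCD is 1 and it is dropped.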
 
At last I have moved on to the second stage of my project: working with the ciphertext to identify the plaintext. This requires knowing the key length, which, for the moment, I am assuming the user can identify. The first algorithm I am attempting divides the ciphertext into rows of the same length as the key and then performs statistical analysis down each column (i.e., compares the standard English letter frequencies with the letter frequencies of the ciphertext in that column). This method works because every letter in a column was encrypted with the same key letter, so a column is essentially a monoalphabetic cipher (i.e., a shift cipher). I am still working on this algorithm. I have been able to get portions of the key, but I cannot obtain the full key, even on a simple example.
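Splitting the ciphertext into columns is the step that turns one polyalphabetic problem into several shift ciphers. A minimal C++ sketch, assuming the ciphertext has already been reduced to letters only (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Write the ciphertext in rows of `keyLen` letters and read it back column by
// column: column j holds letters j, j + keyLen, j + 2*keyLen, ... Every letter
// in a column was shifted by the same key letter, so each column is a shift cipher.
std::vector<std::string> splitIntoColumns(const std::string& text, std::size_t keyLen) {
    assert(keyLen > 0);
    std::vector<std::string> columns(keyLen);
    for (std::size_t i = 0; i < text.size(); ++i) columns[i % keyLen] += text[i];
    return columns;
}
```

For example, `splitIntoColumns("abcdefg", 3)` gives the columns "adg", "be", and "cf".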
 
This week I am going to continue working on this algorithm. Anticipating some frustration/difficulty, I may start coding another method for finding the plaintext and come back to this algorithm later.
 
 
Week 6 (3-05-06)
The third algorithm works! The problem was not a rounding error; rather, the algorithm I was using was faulty. I was splitting my ciphertext into substrings of various lengths and computing the IC (index of coincidence) across each substring. However, after consulting another source (and Dawn Rohm), I realized that the ICs need to be computed DOWN a column (i.e., take the first letter of each substring and compute the IC, then take the second letter of each substring and compute the IC, and so on). This makes perfect sense because if the substring length matches the key length used, each column is a simple monoalphabetic shift cipher.
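The corrected computation can be sketched in C++ as below; the names are hypothetical and the text is assumed to be lowercase letters only. `indexOfCoincidence` uses the usual formula, the sum of n_i(n_i - 1) over all letters divided by N(N - 1), and `averageColumnIC` averages it down the columns for one candidate key length. Candidate lengths whose average sits near 0.065 are the promising ones, while values near 1/26 ≈ 0.038 suggest the columns are still mixed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Index of coincidence of a lowercase string: the probability that two
// letters drawn without replacement are equal.
double indexOfCoincidence(const std::string& s) {
    if (s.size() < 2) return 0.0;
    std::array<long long, 26> counts{};
    for (char c : s) ++counts[c - 'a'];
    long long matches = 0;
    for (long long n : counts) matches += n * (n - 1);
    return static_cast<double>(matches) /
           (static_cast<double>(s.size()) * (s.size() - 1));
}

// Average IC taken DOWN the columns for a candidate key length `len`:
// column j is letters j, j + len, j + 2*len, ...
double averageColumnIC(const std::string& text, std::size_t len) {
    assert(len > 0);
    double total = 0.0;
    for (std::size_t j = 0; j < len; ++j) {
        std::string column;
        for (std::size_t i = j; i < text.size(); i += len) column += text[i];
        total += indexOfCoincidence(column);
    }
    return total / len;
}
```

On the toy string "abababab", length 2 splits it into "aaaa" and "bbbb" with an average IC of 1, while length 1 gives only 24/56.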
 
I have also been working on the Kasiski Test this week, yet another method for finding the key length, although I have not actually implemented anything yet. The algorithm appears easy: identify repeating strings, determine the distances between them, and compute the GCD of the distances. I will be using Euclid's Algorithm to determine the GCD, but the algorithm for finding the repeating strings has required more thought. At first I was going to use a 3-D array of dimensions 26x26x26, go through the text, and make a tally mark for each three-letter combination found. Then any cell with two or more tallies would be a repeating string of length three. I decided to throw this algorithm out because it would require a great deal more work to determine the full repeating string. For example, if the ciphertext included two copies of 'abcd', the algorithm would identify 'abc' AND 'bcd' as repeating sequences, whereas for my purposes I need to identify a single repeating string of four characters. I toyed with the idea of combining repeating strings located at positions x and x+1, but this would either be inefficient or have data integrity issues, especially in the case where 'abc' is a valid repeating string but so is 'abcd'. Needless to say, I threw this algorithm out. The next idea to try is to take my existing shifting algorithm and adapt it to identify not just single matching characters but sequences of three or more characters.
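One way to adapt a shifting comparison for this, sketched in C++ with hypothetical names and a default minimum length of 3: compare the text with itself displaced by every distance d and look for runs of consecutive matching positions. A maximal run is one repeated string, so two copies of 'abcd' come out as a single repeat of length 4 rather than as 'abc' and 'bcd'. Euclid's Algorithm is included for folding the distances into a GCD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Repeat {
    std::size_t start;     // position of the earlier occurrence
    std::size_t distance;  // gap to the later occurrence
    std::size_t length;    // length of the repeated string
};

// Euclid's Algorithm for the greatest common divisor.
std::size_t euclidGcd(std::size_t a, std::size_t b) {
    while (b != 0) {
        const std::size_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Compare the text with itself displaced by every distance d. A maximal run of
// positions i with text[i] == text[i + d] is one repeated string.
std::vector<Repeat> findRepeats(const std::string& text, std::size_t minLen = 3) {
    assert(minLen > 0);
    std::vector<Repeat> repeats;
    for (std::size_t d = 1; d < text.size(); ++d) {
        std::size_t run = 0;
        // i goes one step past the last comparison so a run at the end is flushed.
        for (std::size_t i = 0; i + d <= text.size(); ++i) {
            if (i + d < text.size() && text[i] == text[i + d]) {
                ++run;
                continue;
            }
            if (run >= minLen) repeats.push_back({i - run, d, run});
            run = 0;
        }
    }
    return repeats;
}
```

A string that occurs three or more times is reported once per pair of occurrences, which does not change the GCD of the distances.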
 
I have updated the downloadable version of the slide show to include my algorithms for identifying the key length. I have realized that I am behind on my posted schedule and that it needs some modification. However, I do not feel too bad about this because I know I have been putting the effort into the project. My goals for this week are to get this last key-length algorithm coded and working, and to begin designing and developing an algorithm that takes the key length as input and then uses the IC to attempt to decode the ciphertext.
 
 
Week 5 (2-23-06)
Here is an equation for you: Debugging logic errors = No fun. Of the three algorithms I have coded to find the key length, two work properly, leaving one to fix. After a short debugging session with Dawn Rohm, I have an idea of where to go: the error in the algorithm appears to be a rounding error. So instead of taking a complete text and cutting it into substrings, I am going to take substrings and work toward the complete text.
 
Next on my agenda is to fix the problem program and code one last algorithm for finding the key length. This will require developing an algorithm to find all repeating strings in the text. As Dr. McVey suggested, I am going to look for a dynamic programming solution.
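A dynamic programming formulation along those lines, sketched in C++ with hypothetical names: for each pair of positions i < j it computes the length of the longest common prefix of the suffixes starting at i and j from the value for (i + 1, j + 1), keeping only two rows of the table in memory. A pair is reported only when the match cannot be extended to the left, so each repeating string appears once at its full length. It takes O(n²) time, which is fine for short test ciphertexts but would be slow for very long ones.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct RepeatPair {
    std::size_t first;   // start of the earlier occurrence
    std::size_t second;  // start of the later occurrence
    std::size_t length;  // length of the shared string
};

// Recurrence, for j > i:
//     common[i][j] = (s[i] == s[j]) ? common[i + 1][j + 1] + 1 : 0
// `next` holds row i + 1 while `cur` is filled in for row i.
std::vector<RepeatPair> repeatsByDP(const std::string& s, std::size_t minLen = 3) {
    assert(minLen > 0);
    const std::size_t n = s.size();
    std::vector<std::size_t> next(n + 1, 0), cur(n + 1, 0);
    std::vector<RepeatPair> found;
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t j = i + 1; j < n; ++j) {
            cur[j] = (s[i] == s[j]) ? next[j + 1] + 1 : 0;
            // Skip pairs that are just the tail of a longer match starting at (i-1, j-1).
            const bool leftMaximal = (i == 0) || s[i - 1] != s[j - 1];
            if (cur[j] >= minLen && leftMaximal) found.push_back({i, j, cur[j]});
        }
        std::swap(cur, next);
    }
    return found;
}
```

The distance between the two occurrences is `second - first`, which is what the Kasiski Test needs.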
 
 
Week 4 (2-19-06)
I finally got some code done! I have an application to encrypt and decrypt a text file given the appropriate key, and another short program to find the key length. While I still have fine-tuning and testing to do, it is a good feeling to be working with code. I have also started programming a second method for finding the key length. This week I will complete all the algorithms for calculating the key length so that next week I can focus on revising my data structures and testing the short programs. I was thinking we should meet this week (Wednesday, Thursday, or Friday would work for me).
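A minimal C++ sketch of an encrypt/decrypt pair. It assumes, as the mod 26 arithmetic in later entries suggests, that each plaintext letter is shifted by the corresponding key letter mod 26, cycling through the key (a Vigenère-style scheme), rather than a bitwise XOR. The function names are hypothetical; non-letters are dropped, output is lowercase, and the key is assumed to contain only letters.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Shift each letter of `text` by the matching key letter (a = 0, ..., z = 25),
// cycling through the key: direction +1 encrypts, -1 decrypts.
std::string applyKey(const std::string& text, const std::string& key, int direction) {
    assert(!key.empty());
    std::string out;
    std::size_t k = 0;
    for (char ch : text) {
        const unsigned char uc = static_cast<unsigned char>(ch);
        if (!std::isalpha(uc)) continue;  // drop spaces and punctuation
        const int p = std::tolower(uc) - 'a';
        const int s = std::tolower(static_cast<unsigned char>(key[k % key.size()])) - 'a';
        out += static_cast<char>('a' + ((p + direction * s) % 26 + 26) % 26);
        ++k;
    }
    return out;
}

std::string vigenereEncrypt(const std::string& plain, const std::string& key) {
    return applyKey(plain, key, +1);
}

std::string vigenereDecrypt(const std::string& cipher, const std::string& key) {
    return applyKey(cipher, key, -1);
}
```

For example, `vigenereEncrypt("attack at dawn", "lemon")` gives "lxfopvefrnhr", and decrypting that with the same key returns "attackatdawn".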
 
 
Week 3 (2-12-06)
You can now find the list of references and the tutorial on cryptography/introduction to my project posted under the project section of the website. The initial assumptions I am using for the project can be found in the tutorial. In addition, the project timeline has been updated. This coming week I will be focusing on producing a functional algorithm for finding the key length.
 
 
Week 3 (2-8-06)
This week I spent time really digging into the sources I've found and trying to understand the various algorithms. On Monday I met with Dawn Rohm to talk about how to determine the keyword used in an encryption. The details of the function used to do the frequency analysis were unclear in the source, so we concluded that it may be easier to implement the algorithm on the computer, fine-tune it as progress is made, and then return to understanding the function.
 
The meeting on Tuesday with DCP, McVey, and Rohm resulted in some clear objectives and subtasks for my project. The two primary subtasks are: find the length of the key used to encrypt the message, and, once the key length is known, use frequency analysis to determine the plaintext. Within each subtask, there are multiple techniques to consider.
 
My goal for Monday is to post a revised schedule, the (initial) assumptions I am using for the project, a list of references, and a small tutorial on cryptography on my website. In addition, I am aiming to have a functional algorithm for finding the key length, as well as a few test cases, ready by the end of the week.
 
 
Week 2 (2-5-06)
This past week has been spent primarily on research and preparing for a discussion with the professors. I know the steps to perform cryptanalysis on a polyalphabetic substitution ciphertext (as in an XOR-encrypted text): in a ciphertext-only attack, first determine the key length (by means of the Kasiski test, Friedman test, index of coincidence, or contact analysis), then assume the calculated key length, and finally use frequency analysis combined with guess-and-check to determine the key, thus cracking the code. While researching, I was not clear on the details of some of the mathematics and theory behind the index of coincidence, so I am planning to visit with Dawn Rohm on Monday to resolve some of this confusion. As I get further into my project, I wish I had taken a mathematics course in Probability and Statistics. In addition, I now recognize that my algorithm will require more user interaction than I initially thought necessary.
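The Friedman test listed among the key-length methods above turns the index of coincidence of the whole ciphertext into a rough key-length estimate. A minimal C++ sketch using the common simple form k ≈ (κp - κr) / (κo - κr), with κp ≈ 0.065 for English and κr = 1/26 for uniformly random letters; the function name is hypothetical and the ciphertext is assumed to be lowercase letters only.

```cpp
#include <array>
#include <cassert>
#include <string>

// Friedman test, simple form: estimate the key length from the index of
// coincidence kappa_o of the whole ciphertext as
//     k ~= (kappa_p - kappa_r) / (kappa_o - kappa_r)
// with kappa_p ~= 0.065 for English and kappa_r = 1/26 for random letters.
// The estimate is rough and blows up as kappa_o approaches kappa_r.
double friedmanKeyLength(const std::string& cipher) {
    assert(cipher.size() >= 2);
    std::array<long long, 26> counts{};
    for (char c : cipher) ++counts[c - 'a'];
    long long matches = 0;
    for (long long n : counts) matches += n * (n - 1);
    const double total = static_cast<double>(cipher.size());
    const double kappaO = static_cast<double>(matches) / (total * (total - 1.0));
    const double kappaP = 0.065;
    const double kappaR = 1.0 / 26.0;
    return (kappaP - kappaR) / (kappaO - kappaR);
}
```

Because the estimate is noisy on short texts, it is best treated as a way to narrow down candidate lengths for the other tests rather than as a final answer.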
 
Looking ahead to next week, my plans are to talk with Dawn Rohm on Monday and consult with DCP and McVey on Tuesday (as discussed this week) to complete the proof of concept requirement. I will also get my first program up and running and, pending a firm grasp of how to apply the index of coincidence, begin developing algorithms to cryptanalyze the XOR encryption scheme.
 
 
Week 1 (1-29-06)
After receiving my senior capstone project I was excited to dig in and get started, and that enthusiasm has still not worn off. This week I focused on gaining a general understanding of my project requirements, including research to acquire some background knowledge in the field of cryptography. Now that my feet are wet, I am acquainted with the terminology surrounding the XOR Encryption scheme, as well as which topics to research further in relation to cracking the system. Before moving on, however, I plan to develop a web-based program that allows a user to enter plaintext to encrypt and then outputs the key and the resulting encrypted message. This will ensure that I have a comprehensive understanding of XOR encryption before attempting to break it.