Martina's Senior Capstone Home Page

Week 13 (4-23-06)
The GUI and my presentation are FINISHED!! The website is (almost) finished, with only one page left. Tomorrow I will be practicing my presentation and concluding my analysis. The big presentation is Tuesday afternoon - it will be such a relief to have it done.

Weeks 11 and 12 (4-17-06)
These past two weeks I have been busy tying up loose ends. I continued development of the GUI and finished my final two algorithms for finding the plaintext/key. The coding of the matrix reduction algorithm proved to be the last difficult part. I admit, though, that had I done my math correctly on paper, and remembered that all additions and inverses had to be done mod 26 when verifying that my algorithm worked, it would have taken less time and caused less confusion.
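
The mod 26 bookkeeping mentioned above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the helper names (mod26, addInverse26, mulInverse26, subtractRow26) are hypothetical, and the row operation is just one step a matrix reduction might use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reduce any integer into the range 0..25 (the C++ '%' operator can
// return a negative result when the left operand is negative).
int mod26(int a) {
    int r = a % 26;
    return r < 0 ? r + 26 : r;
}

// Additive inverse mod 26: the value that sums with a to 0 (mod 26).
int addInverse26(int a) {
    return mod26(-a);
}

// Multiplicative inverse mod 26 by brute force. Returns -1 when no
// inverse exists (only values coprime to 26 have one).
int mulInverse26(int a) {
    a = mod26(a);
    for (int x = 1; x < 26; ++x) {
        if (mod26(a * x) == 1) return x;
    }
    return -1;
}

// One row operation of a reduction: dst = dst - src, entry by entry, mod 26.
void subtractRow26(std::vector<int>& dst, const std::vector<int>& src) {
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i) {
        dst[i] = mod26(dst[i] - src[i]);
    }
}
```

For example, mulInverse26(3) returns 9 because 3 * 9 = 27, which is 1 mod 26, while mulInverse26(2) returns -1 because 2 shares a factor with 26.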

The basic algorithm that attempts to match characters purely on frequency counts is the least accurate. This is logical considering the other two algorithms go through EVERY possibility and select the best fit. I was quite pleased with the other two algorithms.

Next week is the big presentation. Before then my to do list is: finish the GUI, do the analysis on the various methods, and prepare my presentation.

Week 10 (4-02-06)
I have started to integrate my algorithms to find the key length into one C++ program for use by my GUI. While it was a bit of a slow process, it also encouraged me to look over my code again and clean it up. I also started my GUI. I am having difficulties executing my GUI from the classes folder so until I can clear this up with Dawn Rohm, it can be found at compsci.snc.edu/~webemm/CrackCode.pl

My to do list: implement a matrix reduction algorithm, complete the final two algorithms for decrypting a ciphertext, perform the analysis on the various algorithms, make alterations to the function that combines all algorithms for finding the key lengths according to the analysis, finish the GUI, and get ready for the presentation. Needless to say, these next two weeks I will be busy.

Week 9 (3-27-06)
I decided to put the algorithm I was implementing on hold and move on to a new one. Because the statistical analysis occurs only down a column, the only information I can take advantage of initially is frequency analysis on single letters (meaning I cannot consider digram or trigram frequencies in the English language such as 'th' and 'the'). My plan of attack for this algorithm is to treat the most frequently occurring letter in a column as the encryption of the letter 'e'. Then I compute the frequencies of the letters based on this encryption and determine whether they are close enough to the standard frequencies. If they are not, I determine another logical encryption and continue the method. Finally, after reaching a guess on each column, I allow the user to select a specific column to change, perhaps arriving at a more feasible guess for the resulting text. I'll work on this after I complete the other two algorithms I want to implement.
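
The first step of that plan, guessing a column's shift from its most frequent letter, might look like the sketch below. It assumes the column holds only lowercase letters a..z and that encryption adds the shift mod 26. The function name is hypothetical, and ties are broken in favor of the earlier letter, which is an arbitrary choice.

```cpp
#include <cassert>
#include <string>

// Guess a column's shift by assuming that its most frequent ciphertext
// letter stands for the plaintext letter 'e'.
int guessShiftFromMostFrequent(const std::string& column) {
    assert(!column.empty());
    int counts[26] = {0};
    for (char c : column) {
        assert(c >= 'a' && c <= 'z');
        ++counts[c - 'a'];
    }
    // Find the most frequent letter (earliest letter wins a tie).
    int top = 0;
    for (int i = 1; i < 26; ++i) {
        if (counts[i] > counts[top]) top = i;
    }
    // Shift that maps plaintext 'e' onto the most frequent letter.
    return (top - ('e' - 'a') + 26) % 26;
}
```

A column whose most common letter is 'h' yields a guess of 3, since 'e' shifted forward by 3 is 'h'. A wrong guess would then be caught by comparing the decrypted frequencies with the standard ones, as described above.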

I have finished coding an algorithm which does quite well in decrypting a text. This algorithm goes through the text column by column and takes the dot product of each column with every possible shift of the alphabet. The dot product closest to 0.065 is the probable shift of the column. This worked on all of my test cases with no difficulties (hurray!).
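
A minimal sketch of this dot product test, assuming lowercase input and an approximate table of English letter frequencies (the table values and the names kEnglish and probableShift are illustrative, not taken from the project code):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <string>

// Approximate relative frequencies of a..z in English text (assumed values).
const std::array<double, 26> kEnglish = {
    0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
    0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
    0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
    0.00978, 0.02360, 0.00150, 0.01974, 0.00074};

// Returns the shift (0..25) whose dot product with the column's letter
// frequencies lies closest to 0.065. Expects lowercase a..z only.
int probableShift(const std::string& column) {
    assert(!column.empty());
    std::array<double, 26> freq{};
    for (char c : column) {
        assert(c >= 'a' && c <= 'z');
        freq[c - 'a'] += 1.0 / column.size();
    }
    int best = 0;
    double bestGap = 1e9;
    for (int shift = 0; shift < 26; ++shift) {
        double dot = 0.0;
        for (int i = 0; i < 26; ++i) {
            // Plaintext letter i appears as ciphertext letter (i + shift) mod 26.
            dot += kEnglish[i] * freq[(i + shift) % 26];
        }
        double gap = std::fabs(dot - 0.065);
        if (gap < bestGap) {
            bestGap = gap;
            best = shift;
        }
    }
    return best;
}
```

The value 0.065 is roughly the sum of the squared English letter frequencies, so the dot product only gets near it when the shifted alphabet lines up with the column. On very short columns the frequencies are noisy and the guess can be wrong.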

Now I am working on implementing a third algorithm which attempts to determine the relative shift between two columns. I have a superficial understanding of the algorithm, but to ensure that I am not making up my own reasons for it working, I am going to confer with Dawn Rohm tomorrow. This algorithm requires solving a system of equations - I picked up references from Dr. McVey today on coding matrix reduction. I hope to have this implemented before Thursday.
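
One textbook way to estimate a relative shift is the mutual index of coincidence, sketched below. Whether this matches the algorithm described in this entry is an assumption; the names letterCounts and relativeShift are hypothetical, and the input is assumed to be lowercase a..z.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <string>

// Count how often each letter a..z occurs in a lowercase column.
std::array<int, 26> letterCounts(const std::string& column) {
    std::array<int, 26> counts{};
    for (char c : column) {
        assert(c >= 'a' && c <= 'z');
        ++counts[c - 'a'];
    }
    return counts;
}

// Returns g in 0..25 such that column b looks like column a shifted
// forward by g, judged by the mutual index of coincidence closest to 0.065.
int relativeShift(const std::string& a, const std::string& b) {
    assert(!a.empty() && !b.empty());
    const std::array<int, 26> ca = letterCounts(a);
    const std::array<int, 26> cb = letterCounts(b);
    const double pairs = static_cast<double>(a.size()) * b.size();
    int best = 0;
    double bestGap = 1e9;
    for (int g = 0; g < 26; ++g) {
        double sum = 0.0;
        for (int x = 0; x < 26; ++x) {
            // Pair letter x in column a with letter x + g in column b.
            sum += static_cast<double>(ca[x]) * cb[(x + g) % 26];
        }
        double gap = std::fabs(sum / pairs - 0.065);
        if (gap < bestGap) {
            bestGap = gap;
            best = g;
        }
    }
    return best;
}
```

Each result gives one equation of the form k_j - k_i = g (mod 26) between two unknown key letters, which is presumably where the system of equations and the matrix reduction come in.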

On a side note, as a result of Dawn Rohm's suggestion I altered the final program I was working on last week to find the GCD of all repeating strings. Instead of considering all repeating strings of length three or greater, I increased the minimum length a string must have to qualify as repeating to five. This resulted in great output - one GCD - which was correct for all test cases but one. That one test case failed because it had no repeating strings of length five; with a minimum length of four, it also worked correctly. These results make sense because the longer minimum length eliminates most of the coincidental repeating strings. Hence, in my final implementation I plan on having a default length of five and allowing users to alter this length.

Week 7/8 (3-20-06)
I have the final algorithm working to find the key length. The initial program simply recorded the distances between repeating strings and attempted to find the GCD of ALL these distances. The biggest issue I had with implementing the Kasiski Test in this manner was that the overall GCD came out to be 1, which yields no knowledge about the key length. I attempted various ways of "throwing out" pieces of data to get a meaningful GCD; however, none seemed to help. The solution I decided to implement was to keep track of the distances for a specific string and compute the GCD over all distances associated with that single string. If this GCD was 1, I threw out the string, reasoning that there were coincidental appearances of the string in various places, thus throwing off the algorithm. However, if the GCD was greater than 1, I stored the result. Finally, I took all of the GCDs for the various strings and attempted to compute the GCD of the compiled data. This proved to be a futile attempt as well: all of my test cases resulted in an overall GCD of 1. Rather than attempting to identify which numbers must be thrown out to get a useful GCD, I resorted to printing out all GCDs computed. While this method may not be effective by itself, I think (and hope) that the resulting list will prove to be beneficial when combining the various methods for finding the key length.
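
The per-string GCD idea can be sketched as follows. To keep the example short it only looks at substrings of one fixed length rather than finding the full repeating strings, which is a simplification; the names euclidGcd and gcdPerRepeatedString are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Euclid's algorithm for the greatest common divisor.
int euclidGcd(int a, int b) {
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// For every substring of exactly `len` characters that occurs more than
// once, compute the GCD of the distances between consecutive occurrences.
// Strings whose GCD is 1 are dropped as probable coincidences.
std::map<std::string, int> gcdPerRepeatedString(const std::string& text,
                                                std::size_t len) {
    assert(len > 0);
    std::map<std::string, std::vector<int>> positions;
    for (std::size_t i = 0; i + len <= text.size(); ++i) {
        positions[text.substr(i, len)].push_back(static_cast<int>(i));
    }
    std::map<std::string, int> result;
    for (const auto& entry : positions) {
        const std::vector<int>& pos = entry.second;
        if (pos.size() < 2) continue;  // not a repeating string
        int g = 0;                     // gcd(0, d) == d, so 0 is a neutral start
        for (std::size_t k = 1; k < pos.size(); ++k) {
            g = euclidGcd(g, pos[k] - pos[k - 1]);
        }
        if (g > 1) result[entry.first] = g;
    }
    return result;
}
```

Printing every entry of the returned map corresponds to the "print out all GCDs" fallback described above; the candidate key lengths are the divisors of the values that show up.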

At last I have moved on to the second stage of my project: working with the cipher text to identify the plaintext. This requires knowing the key length, which, for the moment, I am assuming the user can identify. The first algorithm I am attempting divides the cipher text into rows of the same length as the key and then uses statistical analysis down each column (i.e., it compares the letter frequencies of the cipher text with the standard English letter frequencies). This method works because a column is essentially a monoalphabetic cipher (i.e., a shift cipher). I am still working on this algorithm. I have been able to get portions of the key; however, I cannot obtain the full key, even on a simple example.

This week I am going to continue working on this algorithm. Anticipating some frustration/difficulty, I may start coding another method for finding the plaintext and come back to this algorithm later.

Week 6 (3-05-06)
The third algorithm works! The problem was not a rounding error; rather, the algorithm I was using was faulty. I was splitting my cipher text into substrings of various lengths and computing the IC across a substring. However, after consulting another source (and Dawn Rohm), I realized that the ICs need to be computed DOWN a column (i.e., take the first letter of each substring and compute the IC, then take the second letter of each substring and compute the IC...). This makes perfect sense because if the substring length matches the key length used, each column is a simple monoalphabetic shift cipher.
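
Computing the IC down the columns might look roughly like this. It is a sketch under the assumption that the cipher text contains only lowercase letters a..z; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Index of coincidence of one string: the sum over letters of
// n_i * (n_i - 1), divided by N * (N - 1).
double indexOfCoincidence(const std::string& s) {
    if (s.size() < 2) return 0.0;
    long counts[26] = {0};
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');
        ++counts[c - 'a'];
    }
    double sum = 0.0;
    for (long n : counts) {
        sum += static_cast<double>(n) * (n - 1);
    }
    return sum / (static_cast<double>(s.size()) * (s.size() - 1));
}

// Column j holds characters j, j + keyLength, j + 2 * keyLength, ...
std::vector<std::string> splitIntoColumns(const std::string& text,
                                          std::size_t keyLength) {
    assert(keyLength > 0);
    std::vector<std::string> columns(keyLength);
    for (std::size_t i = 0; i < text.size(); ++i) {
        columns[i % keyLength] += text[i];
    }
    return columns;
}

// Average IC taken DOWN the columns for one candidate key length.
double averageColumnIC(const std::string& text, std::size_t keyLength) {
    const std::vector<std::string> columns = splitIntoColumns(text, keyLength);
    double total = 0.0;
    for (const std::string& column : columns) {
        total += indexOfCoincidence(column);
    }
    return total / columns.size();
}
```

Trying averageColumnIC for candidate lengths 1, 2, 3, ... and looking for values near the English figure of about 0.065 is one way to use this; a wrong length tends to give values closer to that of random text (about 0.038).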

I have also been working on the Kasiski Test this week - yet another method for finding the key length - although I have not actually done any implementing yet. The algorithm appears easy: identify repeating strings, determine the distance between the repeating strings, and compute the GCD of the distances. I will be using Euclid's Algorithm for determining the GCD, but the algorithm to find the repeating strings has required more thought. At first I was going to use a 3-D array of dimensions 26x26x26, go through the text, and make a tally mark at each three-letter combination found. Then, any cell with two tallies or more holds a repeating string of length three. I decided to throw this algorithm out because it would require a great deal more work to determine the full repeating string. For example, if the cipher text included two substrings of 'abcd', the algorithm would identify 'abc' AND 'bcd' as repeating sequences. However, for my purposes, I need to identify it as a single repeating string of four characters. I toyed with the idea of combining repeating strings that are located at positions x and x+1; however, this would either prove to be inefficient or have data integrity issues, especially in the case where 'abc' is a valid repeating string, but so is 'abcd'. The next idea to try is to take my existing shifting algorithm and adapt it to identify not just one character matching, but sequences of three characters or more.
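
For illustration, the discarded tally idea could be sketched like this, assuming lowercase a..z input (the function name is hypothetical). It shows the drawback described above: two copies of 'abcd' are reported as the separate trigrams 'abc' and 'bcd'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Tally every three-letter window in a 26x26x26 table (stored flat) and
// return the trigrams that were seen at least twice, in alphabetical order.
std::vector<std::string> repeatedTrigrams(const std::string& text) {
    std::vector<int> tally(26 * 26 * 26, 0);
    for (std::size_t i = 0; i + 3 <= text.size(); ++i) {
        int a = text[i] - 'a';
        int b = text[i + 1] - 'a';
        int c = text[i + 2] - 'a';
        assert(a >= 0 && a < 26 && b >= 0 && b < 26 && c >= 0 && c < 26);
        ++tally[(a * 26 + b) * 26 + c];
    }
    std::vector<std::string> repeats;
    for (int idx = 0; idx < 26 * 26 * 26; ++idx) {
        if (tally[idx] >= 2) {
            std::string s;
            s += static_cast<char>('a' + idx / 676);
            s += static_cast<char>('a' + (idx / 26) % 26);
            s += static_cast<char>('a' + idx % 26);
            repeats.push_back(s);
        }
    }
    return repeats;
}
```

Calling repeatedTrigrams("abcdxxabcd") returns "abc" and "bcd" as two entries, with no record of where they occurred or that they belong to one four-letter repeat.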

I have updated the downloadable version of the slide show to include my algorithms for identifying key length. I have realized that I am behind on my posted schedule and that it requires some modification. However, I do not feel too badly about this because I know I have been putting the effort into the project. My goals for this week are to get this last algorithm coded and working for determining the key length and begin design and development of an algorithm that gets the key length as input and then uses the IC to attempt to decode the cipher text.

Week 5 (2-23-06)
Here is an equation for you: Debugging logic errors = No fun. With three algorithms coded to find the key length, two working properly, I have one left to work with. After having a short debugging session with Dawn Rohm, I have an idea of where to go - the error in the algorithm appears to be a rounding error. So instead of taking a complete text and cutting it into substrings, I am going to take substrings and make progress toward the complete text.

Next on my agenda is to fix the problem program and code one last algorithm for finding the key length. This will require developing an algorithm to find all repeating strings in the text. As Dr. McVey suggested, I am going to check for a dynamic programming solution.

Week 4 (2-19-06)
I finally got some code done! I have an application to encrypt and decrypt a text file given the appropriate key and another short program to find the key length. While I have fine tuning and testing to do yet, it is a good feeling to be working with code. I have also started programming a second method for finding the key length. This week, I will complete all the algorithms for calculating the key length so that next week I can focus on revising my data structures and testing the short programs. I was thinking we should meet this week. (Wednesday, Thursday, or Friday would work for me.)

Week 3 (2-12-06)
You can now find the list of references and the tutorial on cryptography/introduction to my project posted under the project section of the website. The initial assumptions I am using for the project can be found in the tutorial. In addition, the project timeline has been updated. This coming week I will be focusing on producing a functional algorithm for finding the key length.

Week 3 (2-8-06)
I spent this week really digging into the sources I've found and trying to understand the various algorithms. Monday I met with Dawn Rohm to talk about how to determine the keyword used in an encryption. The details of the function used to do the frequency analysis were unclear in the source. Thus, we came to the conclusion that it may be easier to try the algorithm on the computer, fine-tune it as progress is made, and then return to understanding the function.

The meeting on Tuesday with DCP, McVey, and Rohm resulted in some clear objectives and subtasks for my project. The two primary subtasks are: find the length of the key used to encrypt the message, and after knowing the key length, use a frequency analysis to determine the plaintext. Within each subtask, there are multiple techniques to consider.

My goal for Monday is to post a revised schedule, the (initial) assumptions I am using for the project, a list of references, and a small tutorial on cryptography on my website. In addition, I am aiming to have a functional algorithm for finding the key length as well as a few test cases ready by the end of the week.

Week 2 (2-5-06)
This past week has been spent primarily on research and preparing for a discussion with the professors. I know the steps to perform cryptanalysis on a polyalphabetic substitution cipher text (as in an XOR encrypted text): in a ciphertext-only attack, first determine the key length (by means of the Kasiski test, Friedman test, index of coincidence, or contact analysis), then assume the calculated key length, and finally use frequency analysis combined with guess-and-check to determine the key, thus cracking the code. While researching, I was not clear on the details of certain parts of the mathematics and theory behind the index of coincidence, and as such, I am planning on visiting with Dawn Rohm Monday to resolve some of this confusion. As I get further into my project, I wish I had taken a mathematics course in probability and statistics. In addition, I now recognize that my algorithm will require more user interaction than I had initially thought necessary.

Looking forward to next week, my plans are to talk with Dawn Rohm Monday and consult with DCP and McVey Tuesday (as discussed this week) to complete the proof of concept requirement. Also, I will get my first program up and running and, pending my firm comprehension of applying the index of coincidence, begin to develop algorithms to do a cryptanalysis of the XOR encryption scheme.

Week 1 (1-29-06)
After receiving my senior capstone project I was excited to dig in and get started, and that enthusiasm has still not worn off. This week I focused on gaining a general understanding of my project requirements, including researching to acquire some background knowledge in the field of cryptography. Now that my feet are wet, I am acquainted with the terminology surrounding the XOR Encryption scheme as well as which topics to research further in relation to cracking the system. Before moving on, however, I plan on developing a web-based program to allow a user to enter plaintext to encrypt and then output the key and resulting encrypted message. This will ensure that I have a comprehensive understanding of XOR encryption prior to attempting to break the encryption.