Week 13 (4-23-06)
The GUI and my presentation are FINISHED!! The website is (almost) finished, with only one page left.
Tomorrow I will be practicing my presentation and concluding my analysis. The big presentation is
Tuesday afternoon - it will be such a relief to have it done.
Weeks 11 and 12 (4-17-06)
These past two weeks I have been busy gathering loose ends. I continued development of the GUI
and I have finished my final two algorithms for finding the plaintext/key. The coding of the
matrix reduction algorithm proved to be the last difficult part. Although I admit, had I done
my math correctly on paper and remembered all addition and inverses had to be done in mod 26
when I was verifying my algorithm worked, it would have taken me less time and saved on confusion.
The basic algorithm that attempts to match characters purely on frequency counts is the least
accurate. This is logical considering the other two algorithms go through EVERY possibility
and select the best fit. I was quite pleased with the other two algorithms.
Next week is the big presentation. Before then my to do list is: finish the GUI, do the
analysis on the various methods, and prepare my presentation.
Week 10 (4-02-06)
I have started to
integrate my algorithms to find the key length into one C++ program for use by
my GUI. While it was a bit of a slow process, it also encouraged me to look
over my code again and clean it up. I also started my GUI. I am having
difficulties executing my GUI from the classes folder so until I can clear this
up with Dawn Rohm, it can be found at compsci.snc.edu/~webemm/CrackCode.pl
My to do list: implement a matrix reduction algorithm, complete the final two
algorithms for decrypting a ciphertext, perform the analysis on the various
algorithms, make alterations to the function that combines all algorithms for
finding the key lengths according to the analysis, finish the GUI, and get
ready for the presentation. Needless to say, these next two weeks I will be
busy.
Week 9 (3-27-06)
I decided to put
the algorithm I was implementing on hold and move on to a new one. As a result
of statistical analysis occuring only down a column, the only information I can
take advantage of initially is frequency analysis on single letters (meaning I
cannot consider digram or trigram frequencies in the English language such as
'th' and 'the'). My plan of attack for this algorithm is to associate the most
frequently occuring letter in a column with having been encrypted by the letter
'e'. Then compute the frequencies of the letters based on this encryption and
determine if it is close enough to the standard frequencies. If it is not,
determine another logical encryption and continue the method. Finally, after
reaching a guess on each column, allow the user to select a specific column to
change the resulting text to perhaps a more feasible guess. I'll work on this
after I complete the other two algorithms I want to implement.
I have finished coding an algorithm which does quite well in decrypting a text.
This algorithm goes through the text column by column and takes the dot product
of each column with every possible shift of the alphabet. The dot product
closest to 0.065 is the probable shift of the column. This worked on all of my
test cases with no difficulties (hurray!).
Now I am working on implementing a third algorithm which attempts to determine
the relative shift between two columns. I have an artificial understanding of
the algorithm, but to ensure that I am not making up my own reasons for it
working, I am going to confer with Dawn Rohm tomorrow. This algorithm requires
solving a system of equations - I picked up references from Dr. McVey today on
coding matrix reduction. I hope to have this implemented before Thursday.
On a side note, as a result of Dawn Rohm's suggestion I altered the final
program I was working on last week to find the GCD of all repeating strings.
Instead of considering all repeating strings of length three or greater, I
increased the length a string must have to qualify as repeating to a length of
five. This resulted in great output - one GCD - which was correct for all test
cases but one. The one test case failed because there were no repeating strings
of length five, but at length four, it also worked correctly. These results
make sense because it eliminates (most) of the coincidental repeating strings.
Hence, in my final implementation I plan on having a defaulted length of five
and allowing users to alter this length.
Week 7/8 (3-20-06)
I have the final
algorithm working to find the keylength. The initial program simply recorded
the distances between repeating strings and attempted to find the GCD of ALL
these distances. The biggest issue I had with implementing the Kasiski Test in
this manner was a resulting overall GCD comming out to be 1. This results in no
gained knowledge. I attempted various ways of "throwing out" pieces of data to
get a meaningful GCD, however, none seemed to help. The solution I decided to
implement was to keep track of the distances for a specific string and compute
the GCD over all distances associated with a single string. If this GCD was
calculated to be 1, I threw out the string reasoning that there were
coincidental appearances of the string in various places, thus throwing off the
algorithm. However, if the GCD was greater than 1, I stored the result.
Finally, I took all of the GCD's for various strings and attempted to compute
the GCD of the compiled data. This proved to be a futile attempt as well: all
of my test cases resulted in an overall GCD of 1. Rather than attempting to
identify which numbers must be thrown out to get a useful GCD, I resorted to
printing out all GCD's computed. While this method may not be effective by
itself, I think (and hope) that the the resulting list will prove to be
beneficial when combining the various methods for finding the key length.
At last I have moved on to the second stage of my project: working with the
cipher text to identify the plaintext. This requires knowing the keylength,
which for the moment, I am assuming the user can identify. The first algorithm
I am attempting is to divide the cipher text into rows of the same length as
the key. Then, using statistical analysis down a column (ie use the standard
english letter frequencies in comparison with the letter frequencies of the
cipher text). This method works because a column is essentially a
monoalphabetic cipher (ie a shift cipher). I am still working on this
algorithm. I have been able to get portions of the key, however, I cannot
obtain the full key, even on a simple example.
This week I am going to continue working on this algorithm. Anticipating some
frustration/difficulty, I may start coding another method for finding the
plaintext and come back to this algorithm later.
Week 6 (3-05-06)
The third
algorithm works! The problem was not a rounding error; rather the algorithm I
was using was faulty. I was splitting my cipher text into substrings of various
lengths and computing the IC across a substring. However, after consulting
another source (and Dawn Rohm, I realized that the IC's need to be computed
DOWN a column (i.e. take the first letter of each substring and compute the IC,
then take the second letter of each substring and compute the IC...). This
makes perfect sense because if the substring lengths match the key length used,
down a column is a simple monoalphabetic shift cipher.
I have also been working on implementing the Kasiski Test this week - yet
another method for finding the key length, although I have not actually done
any implementing yet. The algorithm appears easy: identify repeating strings,
determine the distance between the repeating strings, and compute the GCD of
the distances. I will be using Euclid's Algorithm for determining the GCD, but
the algorithm to find the repeating strings has required more thought. At first
I was going to do a 3-D array of dimensions 26x26x26 and go through the text
and make a tally mark at each three letter combination found. Then, any cell
with two tallies or more has a repeating string of length three. I decided to
throw this algorithm out because it would require a great deal more work to
determine the full repeating string. For example, if the cipher text included
two substrings of 'abcd' the algorithm would identify 'abc' AND 'bcd' as
repeating sequences. However, for my purposes, I need to identify it as a
single repeating string of four characters. I toyed with the idea of combining
repeating strings that are located at position x and x+1, however, this would
either prove to be inefficient or have data integrity issues, especially in the
case where 'abc' is a valid repeating string, but so is 'abcd'. Needless to
say, I threw this algorithm out. The next idea to try is to take my existing
shifting algorithm and adapt it to identify not just one character matching,
but sequences of three characters or more.
I have updated the downloadable version of the slide show to include my
algorithms for identifying key length. I have realized that I am behind on my
posted schedule and that it requires some modification. However, I do not feel
too badly about this because I know I have been putting the effort into the
project. My goals for this week are to get this last algorithm coded and
working for determining the key length and begin design and development of an
algorithm that gets the key length as input and then uses the IC to attempt to
decode the cipher text.
Week 5 (2-23-06)
Here is an
equation for you: Debugging logic errors = No fun. With three algorithms coded
to find the key length, two working properly, I have one left to work with.
After having a short debugging session with Dawn Rohm, I have an idea of where
to go - the error in the algorithm appears to be a rounding error. So instead
of taking a complete text and cutting it into substrings, I am going to take
substrings and make progress toward the complete text.
Next on my agenda is to fix the problem program and code one last algorithm for
finding the key length. This will require developing an algorithm to find all
repeating strings in the text. As Dr. McVey suggested, I am going to check for
a dynamic programming solution.
Week 4 (2-19-06)
I finally got
some code done! I have an application to encrypt and decrypt a text file given
the appropriate key and another short program to find the key length. While I
have fine tuning and testing to do yet, it is a good feeling to be working with
code. I have also started programming a second method for finding the key
length. This week, I will complete the all the algorithms for calculating the
key length so that next week I can focus on revising my data structures and
testing the short programs. I was thinking we should meet this week.
(Wednesday, Thursday, or Friday would work for me.)
Week 3 (2-12-06)
You can now find
the list of references and the tutorial on cryptography/introduction to my
project posted under the project section of the website. The initial
assumptions I am using for the project can be found in the tutorial. In
addition, the project timeline has been updated. This comming week I will be
focusing on producing a functional algorithm for finding the key length.
Week 3 (2-8-06)
This week I spent
really digging into the sources I've found and trying to understand the various
algorithms. Monday I met with Dawn Rohm to talk about how to determine the
keyword used in an encryption. The details of the function used to do the
frequency analysis were unclear in the source. Thus, we came to the conclusion
that it may be easier to try to do the algorithm on the computer and fine-tune
it as progress is made and then return to understanding the function.
The meeting on Tuesday with DCP, McVey, and Rohm resulted in some clear
objectives and subtasks for my project. The two primary subtasks are: find the
length of the key used to encrypt the message, and after knowing the key
length, use a frequency analysis to determine the plaintext. Within each
subtask, there are multiple techniques to consider.
My goal for Monday is to post a revised schedule, the (initial) assumptions I
am using for the project, a list of references, and a small tutorial on
cryptography on my website. In addition, I am aiming to have a functional
algorithm for finding the key length as well as a few test cases ready by the
end of the week.
Week 2 (2-5-06)
This past week
has been spent primarily on research and preparing for a discussion with the
professors. I know the steps to perform cryptanalysis on a polyalphabetic
substitution cipher text (as in an XOR encrypted text): in a ciphertext-only
attack, first determine the key length (by means of the Kasiski test, Friedman
test, index of coincidence, or contact analysis), then assume the calculated
key length, and finally use frequency analysis combined with guess-and-check to
determine the key, thus cracking the code. While researching, I was not clear
on the details of certain parts of the mathematics and theory behind the index
of coincidence, and as such, I am planning on visiting with Dawn Rohm Monday to
resolve some of this confusion. As I am getting further into my project, I am
wishing I would have taken a Mathematics course in Probability and Statistics.
In addition, I now recognize that my algorithm will require more user
interaction than what I had initially thought necessary.
Looking forward to next week, my plans are to talk with Dawn Rohm Monday and
consult with DCP and McVey Tuesday (as discussed this week) to complete the
proof of concept requirement. Also, I will get my first program up and running
and pending my firm comprehension of applying the index of coincidence, begin
to develop algorithms to do a cryptanalysis of the XOR encryption scheme.
Week 1 (1-29-06)
After receiving
my senior capstone project I was excited to dig in and get started, and that
enthusiasm has still not worn off. This week I focused on gaining a general
understanding of my project requirements, including researching to acquire some
background knowledge in the field of cryptography. Now that my feet are wet, I
am aquatinted with the terminology surrounding the XOR Encryption scheme as
well as which topics to further research in relation to cracking the system.
Before moving on, however, I plan on developing a web-based program to allow a
user to enter plaintext to encrypt and then output the key and resulting
encrypted message. This will ensure that I have a comprehensive understanding
of XOR encryption prior to attempting to break the encryption.
|