Michael Klosiewski's CSCI460 project weblog

(all year 2013)
Entry Current phase
April 1st

This coming week will be short and hectic, but I will be looking at the Sentence Gatherer code to find out why it is behaving strangely and, with any luck, stop it from doing so. As soon as that is done I will start on a program to tie the first two programs together and offer the user the options they should have. It would be nice in the end if it could launch the visual representation from there, but it can't do that until that part is done, and I need something that a user could actually use sooner than that.

I should start looking at Flash this week. To do that I will need to make sure it is installed.

Debugging Sentence Gatherer
April 6th

So far today has been very productive. I have gotten the sentence gatherer (which is a line gatherer at this point, and may remain so depending on what seems best) to work completely using stringstreams, which I find easier to understand than non-sequential file processing. It seems faster as well. My only real problem is that it still crashes when deallocating the dynamic memory, and I still can't figure out why.
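As a rough illustration of the stringstream approach, here is a minimal sketch. The function names `gather_lines` and `gather_lines_from_file` are made up for this example and are not taken from the actual Sentence Gatherer. It also assumes the lines can live in a `std::vector<std::string>`, which manages its own memory and so leaves nothing to deallocate by hand. Whether that would suit the real program is a guess.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text that is already in memory into lines,
// skipping empty ones. The vector owns its strings, so there is no manual
// new/delete to get wrong.
std::vector<std::string> gather_lines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            lines.push_back(line);
        }
    }
    return lines;
}

// Hypothetical helper: read a whole file into a stringstream in one pass,
// then reuse gather_lines. An unreadable file simply yields no lines.
std::vector<std::string> gather_lines_from_file(const std::string& path) {
    std::ifstream file(path);
    std::ostringstream buffer;
    buffer << file.rdbuf();
    return gather_lines(buffer.str());
}
```

Calling `gather_lines("first line\n\nsecond line\n")` returns the two non-empty lines in order.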

Later tonight I will begin modifying the word counter to include more appropriate data structures. I may run a few runtime experiments on a larger data file to see how fast it is with the linear searching so that I can compare after implementing the splay tree and string hash table. I look forward to publishing this data in the project section of this website (the home page).
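A runtime experiment of that kind could be sketched roughly as follows. Both `count_words_linear` and `time_ms` are hypothetical names for this example. The linear-search counter is only a stand-in for the current word counter, whose real code is not shown here.

```cpp
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a linear-search word counter: every word is looked up by
// scanning the whole list of (word, count) pairs seen so far.
std::vector<std::pair<std::string, int>> count_words_linear(
        const std::vector<std::string>& words) {
    std::vector<std::pair<std::string, int>> counts;
    for (const std::string& w : words) {
        bool found = false;
        for (auto& entry : counts) {
            if (entry.first == w) {
                ++entry.second;
                found = true;
                break;
            }
        }
        if (!found) {
            counts.emplace_back(w, 1);
        }
    }
    return counts;
}

// Run any callable once and return the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Timing the same input file before and after swapping in the splay tree or hash table would give directly comparable numbers, for example `time_ms([&] { count_words_linear(words); })`.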

The fact that I will be presenting this project in just over two weeks is both exciting and nerve-wracking. It is clearly not finished, and the schedule I originally conceived means very little at this point. I still have to create a user interface of some sort, attempt to represent the data visually, and prepare for my presentation, not to mention test the sentence gatherer to make sure that the data it produces is usable. The presentation date has very much snuck up on me. I thought I had more time. As such I have committed myself to working on it every day until it is due. I feel I have made good progress today, and today is not over. If I take care of myself and commit myself to working as much as I need to, I can do this, but it will be a challenge as the end of the semester and the inevitable increase in workload draw nearer.

Debugging/Data Collection/Preparing for presentation
April 8th

About an hour ago I was writing a reflection for the Data Structures lab on hash tables, discussing how I should probably reduce the amount of computation in my hash function (provided I get around to actually using it). I was thinking that I could just work off of the first few characters in a word, since the likelihood of two different words matching past a certain number of characters is fairly small, when an idea hit me. It was based on Dr. Pankratz's suggestion to look at the beginning characters in a word rather than trying to match suffixes when looking for words to combine. So what if I did have two words that matched up to a certain number of characters, and the hash table, by way of reducing its computation, read them as the same word? Wouldn't that solve both the computation problem and the word combination problem, without creating any extra work? I think if properly handled, the answer is YES!
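To make the idea concrete, here is a minimal sketch under some assumptions. The names `prefix_hash` and `same_prefix`, the multiplier 31, and the prefix length are all illustrative choices, not the project's actual hash function. Only the first `prefix_len` characters feed the hash, and the key comparison uses the same prefix, so two words that agree that far are treated as one entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hash only the first prefix_len characters of a word (assumed polynomial
// hash). Less work per word, and words sharing that prefix land together.
std::size_t prefix_hash(const std::string& word,
                        std::size_t prefix_len,
                        std::size_t table_size) {
    assert(table_size > 0);
    std::size_t h = 0;
    std::size_t n = word.size() < prefix_len ? word.size() : prefix_len;
    for (std::size_t i = 0; i < n; ++i) {
        h = h * 31 + static_cast<unsigned char>(word[i]);
    }
    return h % table_size;
}

// Key comparison that matches the hash: two words count as the same entry
// when their first prefix_len characters agree. A word shorter than
// prefix_len only matches words with exactly the same characters.
bool same_prefix(const std::string& a, const std::string& b,
                 std::size_t prefix_len) {
    return a.compare(0, prefix_len, b, 0, prefix_len) == 0;
}
```

With a prefix length of 4, "running" and "runner" hash to the same slot and compare as equal. The "properly handled" part is choosing the length: too short a prefix would also merge unrelated words that merely start alike, such as "common" and "commit".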

Needless to say I was thrilled and will be working extra hard to make sure I have time to implement that before my presentation.

Debugging/Verifying correctness of data
April 14th

I got sick Friday morning, and as a result my weekend has been less productive than I had hoped. I have been doing my utmost to make sure I am well rested and in good shape to think, but I was unable to work on the project at all on Friday night, and today has been only half as productive as I would have liked. It feels like my head is in a cloud, and I have trouble understanding code that I've already written. This was never a problem before now, so the parts that are just barely too hard to understand aren't documented as well as I'd like them to be for my current condition.

I've got part 3 of the process running, the one that accesses the data and displays it, but I don't think searching multiple words is working right yet. I'll have to look at that more when I'm not sick. I'm unsure of whether or not I'll be able to work on any other part before that.

Debugging/Verifying correctness of data
April 17th


The three programs that run the project are now complete and in working order. From here on I can focus on preparing things for the presentation, and afterward on making things run fast and smooth. I would REALLY like to at least try to turn the large array in WordCounter into a modified splay tree, which would likely cut the runtime down to a fraction of what it currently is, not to mention allowing me to reasonably combine words with little effort. I have an algorithm in mind... Now if I can only implement it.

Touching up
April 28th

I whipped up a program to find out the optimal hash table size for the 160 words that are currently included in this list of exceptions, in the process finding two duplicated words (there were actually 162 when I started). 199 resulted in 58 collisions with zero fatal collisions (collisions where a word fell out of a bucket), so that's the size I think I'll use. Implementing it should be much easier than the splay tree, mostly because I have a working class already.
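A search like that might look roughly like the sketch below. It assumes a simple polynomial string hash and fixed-capacity buckets, where a "fatal" collision is a word arriving at a bucket that is already full. `string_hash`, `count_collisions`, `best_table_size`, and the `bucket_capacity` parameter are all hypothetical names, and the real program may count things differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct CollisionReport {
    std::size_t collisions;  // words that landed in an already occupied bucket
    std::size_t fatal;       // words that arrived at an already full bucket
};

// Assumed hash function; the project's own hash is not shown in the weblog.
std::size_t string_hash(const std::string& word, std::size_t table_size) {
    std::size_t h = 0;
    for (unsigned char c : word) {
        h = h * 31 + c;
    }
    return h % table_size;
}

// Simulate inserting every word into a table of the given size.
CollisionReport count_collisions(const std::vector<std::string>& words,
                                 std::size_t table_size,
                                 std::size_t bucket_capacity) {
    assert(table_size > 0);
    std::vector<std::size_t> load(table_size, 0);
    CollisionReport report{0, 0};
    for (const std::string& w : words) {
        std::size_t& n = load[string_hash(w, table_size)];
        if (n > 0) ++report.collisions;
        if (n >= bucket_capacity) ++report.fatal;
        ++n;
    }
    return report;
}

// Try every size in [lo, hi]; return the smallest size with no fatal
// collisions and the fewest ordinary collisions, or 0 if none qualifies.
std::size_t best_table_size(const std::vector<std::string>& words,
                            std::size_t lo, std::size_t hi,
                            std::size_t bucket_capacity) {
    std::size_t best = 0;
    std::size_t best_collisions = 0;
    for (std::size_t size = lo; size <= hi; ++size) {
        CollisionReport r = count_collisions(words, size, bucket_capacity);
        if (r.fatal == 0 && (best == 0 || r.collisions < best_collisions)) {
            best = size;
            best_collisions = r.collisions;
        }
    }
    return best;
}
```

Restricting the candidate sizes to primes is a common refinement for modulo-based hashing, though whether the original search did that is not stated.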

Making things pretty
