Tag Cloud for BIG Data

THIS IS THE BLOG

Week 11 April 10th, 2017
Progress
	- TagCloudGenerator3's clump-finding algorithm is funky: it doesn't work the way it is supposed to and gives me bad clumps. Going to start over with TagCloudGenerator4.
	- Started working on TagCloud4, which uses the algorithm I made to find all possible clumps and return the most efficient one.
	- Can't decide what the clump finder should return. It currently returns a RectangleF object, but how do I use that to determine which words are in the clump? Should the program pick out the words that lie fully inside the RectangleF?
	- Visited Dr. McVey and Dr. Pancratz. Walked back with an idea about crossover, as well as where to "stop" and start preparing for the presentation.
	- Working on a redraw function to be able to recreate any solution. Randomizing the arrays has come back to bite me in the butt.
	- I believe that cntArray is to blame. I think it messes up the font sizes, because the rectangles are still in the correct location.
	- I was wrong. The error came from swapping arrays around in a genome class function. Redraw function works now.
	- Working on mutate function.
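On the RectangleF question: one option is to keep each word's bounding rectangle next to the word and count a word as part of the clump only when its rectangle lies entirely inside the clump rectangle. Below is a minimal sketch in Python rather than the project's C#. `Rect`, `words_in_clump`, and the sample words are hypothetical. In C#, `RectangleF.Contains(RectangleF)` performs the same full-containment test, so the idea should carry over directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical stand-in for RectangleF: top-left corner plus size."""
    x: float
    y: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        return self.y + self.height

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies entirely inside this rectangle (shared edges allowed)."""
        return (self.x <= other.x and other.right <= self.right
                and self.y <= other.y and other.bottom <= self.bottom)


def words_in_clump(clump: Rect, placed: dict[str, Rect]) -> list[str]:
    """Return the words whose bounding boxes sit fully inside the clump rectangle."""
    return [word for word, box in placed.items() if clump.contains(box)]


# Hypothetical layout produced by one genome.
placed = {
    "data": Rect(10, 10, 40, 12),
    "cloud": Rect(52, 10, 30, 10),
    "tag": Rect(90, 40, 20, 8),  # extends past the clump's right edge (90 + 20 > 100)
}
clump = Rect(0, 0, 100, 50)
print(words_in_clump(clump, placed))  # ['data', 'cloud']
```

Partially covered words are excluded here. If half-covered words should count, an intersection test (`RectangleF.IntersectsWith` in C#) would be the looser alternative.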

Week 10 April 3rd, 2017
Progress
	- Still working on the clump-finding function (trying to implement it so that it finds the closest rectangle and expands the clump to include it).
	- Uploaded source code onto website with short description.
	- The clump algorithm can find the word closest to the clump, expand the clump so that it includes that word, then measure how "clumped" it is via areaUsed / totalArea.
	- Finding "close" words is very funky. It does not always return what seems to be the closest word...
	- Still tackling the clump finding algorithm over the weekend, slow but steady progress. Still no idea why clump efficiency numbers can get higher than 1... Distance seems to be working most of the time.
	- Going to start TagCloudGenerator4, where the clump finder goes through all possibilities for each genome and keeps an array of clumps to find the best one. TagCloudGenerator3, by contrast, is happy with any clump.
	- An idea for TagCloudGenerator5: instead of using fixed efficiency thresholds for the clump, check for jumps in efficiency, because adding one more word can sometimes create a much better clump (check a little bit further).
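This is a rough sketch of the clump loop described above, in Python and under my own assumptions. "Closest" is measured as the gap between rectangle edges, not between corners or centers. The clump grows to the bounding box of itself plus the nearest word. Efficiency is areaUsed / totalArea, with areaUsed taken as the plain sum of the member words' areas. All names are hypothetical. With a bounding-box clump, that sum can only exceed the clump's area when word rectangles overlap, which is one possible explanation for efficiencies above 1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        return self.y + self.height

    @property
    def area(self) -> float:
        return self.width * self.height


def gap_distance(a: Rect, b: Rect) -> float:
    """Shortest edge-to-edge distance between two rectangles (0 if they touch or overlap)."""
    dx = max(b.x - a.right, a.x - b.right, 0.0)
    dy = max(b.y - a.bottom, a.y - b.bottom, 0.0)
    return math.hypot(dx, dy)


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that covers both a and b."""
    x, y = min(a.x, b.x), min(a.y, b.y)
    return Rect(x, y, max(a.right, b.right) - x, max(a.bottom, b.bottom) - y)


def grow_clump(words: list[Rect], start: int, steps: int) -> tuple[Rect, list[int]]:
    """Starting from words[start], add the nearest remaining word `steps` times."""
    clump, members = words[start], [start]
    for _ in range(steps):
        rest = [i for i in range(len(words)) if i not in members]
        if not rest:
            break
        nearest = min(rest, key=lambda i: gap_distance(clump, words[i]))
        members.append(nearest)
        clump = union(clump, words[nearest])
    return clump, members


def efficiency(clump: Rect, words: list[Rect], members: list[int]) -> float:
    """areaUsed / totalArea, with areaUsed as the plain sum of member word areas."""
    return sum(words[i].area for i in members) / clump.area


words = [Rect(0, 0, 10, 10), Rect(12, 0, 10, 10), Rect(50, 50, 5, 5)]
clump, members = grow_clump(words, start=0, steps=1)
print(members, round(efficiency(clump, words, members), 3))  # [0, 1] 0.909

# Two overlapping words: their shared area is counted twice, so the ratio passes 1.
overlapping = [Rect(0, 0, 10, 10), Rect(2, 2, 10, 10)]
clump2, members2 = grow_clump(overlapping, start=0, steps=1)
print(round(efficiency(clump2, overlapping, members2), 3))  # 1.389
```

Two changes would keep the ratio at or below 1: scoring with the area of the union of the word rectangles, or rejecting layouts with overlapping words.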

Week 5 February 20th, 2017
Progress
	- Found and read an interesting chapter on tag cloud generation by Jonathan Feinberg, the developer of Wordle.
	- Found, explored, and read articles from ai-junkie, a website written by textbook author Mat Buckland.
	- TO DO: ask prof about filter problems (check), about having a genetic algorithm fit words into preset shapes, and about fitting words into ANY whitespace.
	- input a bunch of text instead of selecting sources, something "more" to do.
	- filter user selected word from tag/word cloud.
	- source filter, different sources, different colors/fonts?
	- clicks produce stats, imgs, context?
	- It seems like the question marks might have been caused by tweets in other languages on Twitter.
	- Learned about encoding, decoding, and deciding on a fitness function. 
	- Looked at the Vega visualization grammar website, but I am leaning towards just using C#.
	- Read about using LINQ to "increased readability, reduced complexity, and shorten code. All these at the price of an ignorable insignificant performance drawback...", but decided against it.
	- Started experimenting with drawing words in a Windows Forms application.
	- Got stuck figuring out how to determine the coordinates of the drawn words. Visited Dr. McVey and came away with an alternate function that takes a rectangle object as a parameter.
	- Made one program write the heap contents to a file, and the Windows Forms application read the heap contents for display.
	- Started working on the application GUI, namely adding exit and minimize buttons (it starts up fullscreen).
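Here is a minimal Python sketch of what encoding, decoding, and a fitness function could look like, under assumptions of my own. A genome is a list of (x, y) top-left positions, one per word. Decoding pairs each position with that word's pre-measured width and height. Fitness is used area over bounding-box area, minus a heavy penalty for overlapping words. The word sizes, the penalty weight of 10, and every name are hypothetical.

```python
import random

# Hypothetical pre-measured (width, height) of each word in pixels.
WORD_SIZES = {"big": (60, 30), "data": (80, 30), "tag": (40, 20), "cloud": (50, 20)}
WORDS = list(WORD_SIZES)

# Encoding: a genome is one (x, y) top-left corner per word, in WORDS order.
Genome = list[tuple[float, float]]
Box = tuple[float, float, float, float]  # (left, top, right, bottom)


def random_genome(width: float, height: float, rng: random.Random) -> Genome:
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in WORDS]


def decode(genome: Genome) -> list[Box]:
    """Decoding: turn each position into the box the word would occupy."""
    boxes = []
    for word, (x, y) in zip(WORDS, genome):
        w, h = WORD_SIZES[word]
        boxes.append((x, y, x + w, y + h))
    return boxes


def overlap_area(a: Box, b: Box) -> float:
    dx = min(a[2], b[2]) - max(a[0], b[0])
    dy = min(a[3], b[3]) - max(a[1], b[1])
    return dx * dy if dx > 0 and dy > 0 else 0.0


def fitness(genome: Genome, overlap_penalty: float = 10.0) -> float:
    """Higher is better: used area / bounding area, minus a penalty for overlaps."""
    boxes = decode(genome)
    used = sum((right - left) * (bottom - top) for left, top, right, bottom in boxes)
    bound_w = max(b[2] for b in boxes) - min(b[0] for b in boxes)
    bound_h = max(b[3] for b in boxes) - min(b[1] for b in boxes)
    overlap = sum(overlap_area(boxes[i], boxes[j])
                  for i in range(len(boxes)) for j in range(i + 1, len(boxes)))
    return (used - overlap_penalty * overlap) / (bound_w * bound_h)


# A random starting population, and its fittest member.
population = [random_genome(800, 600, random.Random(seed)) for seed in range(20)]
best = max(population, key=fitness)

tight = [(0, 0), (60, 0), (0, 30), (50, 30)]  # words packed edge to edge
stacked = [(0, 0)] * 4                         # every word on top of the others
print(round(fitness(tight), 3))                # 0.857
print(fitness(stacked) < fitness(tight))       # True
```

The penalty weight is a tuning knob. With a small enough weight, a pile of overlapping words could still outscore a clean layout.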

Week 1 January 26th, 2017
Progress
	- I got the website and blog up and running. It has a homepage that contains next to nothing, and a blog page that will hold my blog posts (*this*).
	- I have divided the project into smaller sub-tasks.
	- I have looked at past iterations of the project (2014 and 2015) and will be exploring using PHP to retrieve data from a source. Update: since I would like the application to be general, I will not be pursuing the Twitter API path.
	- I have decided that my first task will be to figure out how to retrieve and organize the data.
	- Did some research to come up with potential ways to implement the program: web crawlers, using hash tables and linked lists to store data, what genetic algorithms are, etc.
	- HTTrack seems like a good way to download all the files to be parsed from a particular source. It is software for configuring and downloading desired files from any website (e.g., .html, .js, GIFs, etc.).
	- Exploring what the Boost library for C++ is and how it might help with reading multiple files to store data. Should I use it?
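The hash-table idea for storing data could start as small as this hedged Python sketch. A `Counter` (a dict, i.e. a hash table) tallies word frequencies across several downloaded files. The stop-word list, the tokenizing regex, and `count_words` are placeholders of my own. In C++ the same structure would be a `std::unordered_map<std::string, int>`, and `std::ifstream` can read each file without Boost.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}  # placeholder list
TOKEN = re.compile(r"[a-z']+")  # placeholder tokenizer: runs of lowercase letters/apostrophes


def count_words(paths) -> Counter:
    """Tally word frequencies across all files in a single hash table (Counter)."""
    counts = Counter()
    for path in paths:
        # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
        text = Path(path).read_text(encoding="utf-8", errors="replace").lower()
        counts.update(w for w in TOKEN.findall(text) if w not in STOP_WORDS)
    return counts


# Two small stand-in "downloaded" pages.
with tempfile.TemporaryDirectory() as tmp:
    page1 = Path(tmp, "page1.txt")
    page1.write_text("Big data and the tag cloud.", encoding="utf-8")
    page2 = Path(tmp, "page2.txt")
    page2.write_text("A tag cloud for big data: data, data!", encoding="utf-8")
    counts = count_words([page1, page2])

print(counts.most_common(3))  # [('data', 4), ('big', 2), ('tag', 2)]
```

The top-N words from `most_common` are what a tag cloud would size and place. Ties keep the order in which the words were first seen.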