Visualizing Text - David Ferris - 2014 - How To

This document contains information on program use and modification.

1.) The C++ Application

Use:

Using the C++ side of the project is fairly straightforward. The only files you must supply are sample_in and ignored_words. As long as these files are in the project folder, along with Main.cpp, Hash.h, and Word.h, the project should compile and run.

The program produces two text files, only one of which you need to worry about. The first, sentences.txt, is just for the program's own use; it contains each sentence from the input file, one per line. The second, sample_shared.txt, must be copied into the capstone-visuals project folder on the server.

There is not much to see when running the C++ application, aside from a few dumps of debugging information. Most of what happens at runtime goes unseen.

Modification:

If you wish to change the number of "Top Words" for which the application writes data to the shared text file, change the value of the TOP_WORDS variable. It is much easier to change the output in this half than it is to change the expected input in the web half.

2.) The Web Application

Use:

A little more work is required to run the web-based application for the first time. First, the project folder must be located on a server that runs PHP. This is very important, since the main page of the web portion is written in PHP, and that PHP code reads the sample_shared file. compsci02.snc.edu is a good place to put the project.

The web page must be supplied with a sample_shared.txt file, or the page will be very bare. This file is produced by the C++ application, so after the C++ app has run, upload the file to the project folder on the server. The web app reads the file on page load. Beyond that, there is not much more you need to do to use the web app.
Clicking on a word reveals a sample of the sentences in which it appears in the input text file. If a word appears in 10 or fewer sentences, all of them are shown. If it appears in more than 10 sentences, a random sample of 10 is shown.

Modification:

If you wish to change the number of words whose data is displayed on the page, there are a few steps to take. Most of them are easy to accomplish with copy and paste, but they are more time consuming than the changes made to the C++ application.

To control how much data is read from the shared text file, set the $TOP_WORD_COUNT variable to the number of 'top' words that are expected to be in the text file.

Sections of code will also need to be added to or deleted from the index.php file. These are the sections that create the words on the left side and the divs on the right side (which contain the sentences). The divs will also need to be populated within these blocks. This can be accomplished by copying and slightly modifying the blocks of code that are already in place.

jQuery click events will also need to be added to the scripts.js file for the new words. These changes can likewise be made by copying and slightly modifying existing code.

Finally, small CSS changes will need to be made to alter the font size of the words on the left. This is necessary to continue the effect of word sizes ranging from large to small as use decreases.

3.) Helpful Information

There are some cases that might cause the project to malfunction. These cases involve characters that fall outside of the ASCII range (0-127). Special characters, including non-English alphabetic characters and specially styled punctuation, will be misinterpreted by the C++ application. This is because the C++ application looks at the ASCII values of characters to determine whether they mark the end of a sentence or the end of a word.
These characters have integer values outside of that range, and because of this, they register as sentence or word terminators. The result is oddly fragmented sentences and words, which degrade the quality of the program's output. To avoid this, I recommend removing these characters from the plain-text sample before feeding the file to the program. This takes a little time (especially in the case of punctuation), but it is worth it for the result. This is something that I would have liked to address in the code, had I had enough time to do so.

Also, it is best to compile the C++ project using Microsoft Visual Studio 2010, as this is the version that was used to create and write the project. Using another version of Visual Studio will require the project to be converted. Because of this, I believe that using Visual Studio 2010 is the fastest and safest way to go.

The web application works well in both Google Chrome and Mozilla Firefox. I prefer to use Chrome, as the styles seem to look a bit more crisp. However, the application functions identically in both.
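
The manual character clean-up recommended in section 3 could also be automated with a small preprocessing step. The following is only a rough sketch and is not part of the project: the function name toAscii is hypothetical, and it assumes the input sample is encoded as UTF-8 (a file saved in another encoding, such as Windows-1252, would need a different mapping). It swaps a handful of common styled punctuation marks for plain equivalents and drops every other non-ASCII byte.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the original project).
// Returns a copy of `text` that contains only 7-bit ASCII characters.
// ASSUMPTION: `text` is UTF-8 encoded.
std::string toAscii(const std::string& text) {
    std::string out;
    out.reserve(text.size());

    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);

        if (c < 0x80) {  // plain ASCII: keep unchanged
            out += text[i];
            continue;
        }

        // Common styled punctuation is encoded as the bytes E2 80 xx in UTF-8.
        if (c == 0xE2 && i + 2 < text.size() &&
            static_cast<unsigned char>(text[i + 1]) == 0x80) {
            const char* replacement = nullptr;
            switch (static_cast<unsigned char>(text[i + 2])) {
                case 0x98: case 0x99: replacement = "'";   break;  // curly single quotes
                case 0x9C: case 0x9D: replacement = "\"";  break;  // curly double quotes
                case 0x93: case 0x94: replacement = "-";   break;  // en dash, em dash
                case 0xA6:            replacement = "..."; break;  // ellipsis
                default: break;
            }
            if (replacement != nullptr) {
                out += replacement;
                i += 2;  // skip the two continuation bytes just consumed
                continue;
            }
        }
        // Any other non-ASCII byte is dropped. Every byte of a multi-byte
        // UTF-8 sequence is >= 0x80, so the whole character disappears.
    }
    return out;
}
```

One way to use this would be to read the sample file into a string, pass it through toAscii, and write the result back out before running the C++ application. Note that dropping accented letters shortens words (for example, a word ending in an accented "e" loses that letter), so a manual pass may still be preferable for text that relies heavily on non-English characters.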