The first enhancement I would suggest for future project owners would be using a wide-angle Kinect lens. My camera is only assured to evaluate a width of 40 inches, which is far less than a standard-size vehicle. A wide-angle lens would increase this width to more realistic proportions. Future upgrades could have the Kinect use its sound array to detect audible noises from particular angles to determine potential danger. Finally, the user interface could use more advanced augmented reality techniques to more accurately project vehicle trajectory and make the user display more informative in general. Overall, this was a fun capstone project and I would absolutely recommend assigning it to future seniors.
This project relies heavily on real-time data analysis, and less emphasis is put on storing and saving data than in many other capstone projects. However, as evident in previous posts, time and space complexity analysis is more important for my project than heavy use of data structures. Object-oriented programming is obviously required given the C# environment the application was created in. A final important CS element is finding the correct balance of abstraction: for this project, I assessed the cumulative data I could extract from any given frame and chose what is most important to show on the user interface.
During the final week before presentations, I have been getting my interface and code "production" ready. I had many leftover labels and methods used for testing values and other testing procedures, along with code (and commented-out code) that was once used to quickly test different areas of the project. Although these were useful in the context of my testing methods, keeping them would make my code much more difficult to read and would likely confuse future authors of my code.
Cleaning up the interface can be quite subjective. Because I will be using my interface during demonstrations, I needed to make the console interface uncluttered, yet informative. This means the variable information and the state of processes must be carefully balanced and shown on screen in a clear, concise way. I have removed all variables that I felt were unnecessary to present during my presentation. With deadlines quickly approaching, I was unable to detect physical movement with the Kinect. In my defense, Microsoft did not create the Kinect with the idea of dynamically tracking its own physical position. Microsoft designed the Kinect to remain statically placed, and many Xbox Kinect games include code that conditionally checks whether the Kinect is in motion and refuses to proceed with any computations until it detects that the camera is stationary and stable.
This may be a feature a future CS student could improve upon; however, it is "working against the grain." The Kinect emphasizes statically detecting players (human skeletons) rather than specializing in general depth detection. Much like how the Kinect fails to recognize the full width of a vehicle without any modification, the Kinect hardware was not created to detect its own physical motion. One feature that would have been handy is the ability to recognize whether or not the physical Kinect hardware has been moved. This would allow the software to detect and indicate on the interface that the vehicle is turning, which would help provide more accurate virtual augmentation for visuals on the user interface.
My main approach was to use sound. The Kinect has a sound array that can detect the angle from which it recognizes sound. I wanted to normalize the physical location of the Kinect using the initial direction directly in front of the camera. When the camera changed direction (during a vehicle turn, realistically), I wanted the Kinect to compare the current angle against the original, normalized angle. Through various tests and attempts to get this general idea implemented, I was unsuccessful. As a refresher, I make a temporary array and store every nth pixel in it to calculate the minimum distance, due to computational limitations. Originally, I had been retrieving my distance values from a subset of every 5000th pixel of the frame. I have changed my program to be more accurate and evaluate every 2500th pixel of the frame. Through testing, I have noticed performance significantly decreases if this value is divided further, so I will be keeping the subset interval of 2500. I have created a variable called "pixeloffset" to represent this value. Everything is coded based on this value except for the size of the temporary array; because the array must be an exact size depending on the subset intervals, I was unable to calculate the temp array size from that one value.
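A minimal sketch of this sampling idea looks something like the following. It assumes frame is a DepthImageFrame from the 1.8 SDK and that System.Linq is available for .Min(); the variable names and the zero-depth filtering are my own illustrative choices, not necessarily what the project code looks like.

    // Sample every pixelOffset-th depth pixel into a temporary array, then take the
    // minimum to find the closest object in view (a value of zero means no reading).
    const int pixelOffset = 2500;
    short[] pixelData = new short[frame.PixelDataLength];
    frame.CopyPixelDataTo(pixelData);

    int[] subset = new int[(pixelData.Length + pixelOffset - 1) / pixelOffset];
    for (int i = 0, j = 0; i < pixelData.Length; i += pixelOffset, j++)
    {
        // strip the player-index bits to get the raw depth in millimeters
        subset[j] = pixelData[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;
    }

    int closestMm = subset.Where(d => d > 0).Min();   // ignore unknown (zero) depths
    double closestInches = closestMm / 25.4;          // converted for the interface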
The slider tool in C# is very easy to work with. There is a property to force the value to move in a specified number of intervals. However, as a vehicle accelerates, its speed increases at a steady pace: graphically, a vehicle typically picks up speed along a smooth, linear line rather than in the step increments a piece-wise function would represent. Therefore, the slider bar and its value move in a smooth transition rather than incrementally.
The leftmost side of the slider represents a vehicle that is not moving and holds the value 1. The slider is a scalar multiple between 1 and 2 that is multiplied by the red and yellow boundaries, which implies the rightmost value of the slider (when the car is moving at full speed) is 2. The scalar can be any decimal value between 1 and 2. In a realistic implementation, the slider would begin at the leftmost position and slowly move right (increase in value) as the car changed speed during the entire reverse process. During project demonstrations in class, a classmate suggested that the boundaries of safety change according to how fast the vehicle is backing up. This was a great suggestion because I had not considered the possibility; I assumed every driver carefully backed up at around 5 MPH. However, because the inherent driving nature of each individual is variable, I plan to integrate my backup camera with the speed the vehicle is moving in reverse. Obviously a real implementation would require a signalling system indicating the vehicle is in the reverse gear and actively reversing, but my simulation will use a slider to simulate this relationship.
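As a small sketch of how that scaling might look, assuming a WPF Slider named speedSlider and baseline yellow/red distances in inches (both the names and the numbers are illustrative, not the values from my project):

    // Slider configured for smooth (non-snapping) movement between 1 and 2
    speedSlider.Minimum = 1.0;
    speedSlider.Maximum = 2.0;
    speedSlider.IsSnapToTickEnabled = false;   // continuous values, not step increments

    // Baseline safety boundaries (illustrative values, in inches)
    const double baseYellowIn = 60.0;
    const double baseRedIn = 30.0;

    // Scale the boundaries by the current "vehicle speed" scalar each frame
    double scalar = speedSlider.Value;               // between 1 and 2
    double yellowBoundary = baseYellowIn * scalar;   // caution threshold grows with speed
    double redBoundary = baseRedIn * scalar;         // danger threshold grows with speed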
The user interface has an added feature of taking a real-time snapshot of the camera at the push of a button. This may help drivers document a potential accident for insurance reasons, or simply save the image from the camera in general. It took some manipulation to get the images to save in the local project directory.
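A hedged sketch of how such a snapshot could be written out, assuming the interface keeps the latest frame in a WriteableBitmap called colorBitmap (the name and file-naming scheme are illustrative):

    // Encode the current bitmap as a PNG and save it next to the executable,
    // which is the project's bin\Debug folder when run from Visual Studio.
    string fileName = "snapshot_" + DateTime.Now.ToString("yyyyMMdd_HHmmss") + ".png";
    string path = System.IO.Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileName);

    BitmapEncoder encoder = new PngBitmapEncoder();
    encoder.Frames.Add(BitmapFrame.Create(colorBitmap));

    using (var fs = new System.IO.FileStream(path, System.IO.FileMode.Create))
    {
        encoder.Save(fs);
    }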
It may be a bit unrealistic for an actual implementation, but there is also an added status feed that indicates a picture has been taken and stored in the stated directory. Creating this feature served as good practice with the Kinect bitmap creation and the event handlers within my project. The Kinect requires a wide-angle lens to work appropriately with a standard-size vehicle. Assuming the camera is mounted somewhere above the bumper, it would sit approximately 30 inches off the ground. This implementation is scaled to roughly half the width of a standard vehicle. More details regarding the numerical values of mounting angle, height, and distance measured on camera can be found on the "Technical & Code" page.
*Note: Future projects should consider using a wide-angle Kinect lens for a more realistic implementation. Visual alerting as described in the previous post is currently working based on variable values of safety levels.
Audio alerting corresponds to the same calculations used by the visual alerts, so the foundation is already set. During class walkthroughs, a popular suggestion was to make the audio alerts optional in case they were annoying or unnecessary. To provide this option, I have a button on my interface that serves as a toggle. This button corresponds to an event handler that talks to two boolean variables: one indicates whether noise should be played at all, and the other signals an immediate change of state so that the noise stops or starts as soon as the button is pressed. The audio alarms are referenced from the bin\debug folder so that the references are self-contained within the project folder. Two instances of the C# SoundPlayer are used to loop the alarm based on the current state of danger. Again, it was very important to recognize state changes during every frame iteration to make the sound function correctly. Now that I have the distance of the closest object on screen, I can set variable distance standards for three different levels of safety:
1. Safe - green visual - no audio alerting
2. Caution - yellow visual - medium alarm audio
3. Danger - red visual - urgent alarm audio

The visual alerts are represented by a border on the user interface, with the border changing color based on the current state. Quickly painting in C# (as seen in Professor McVey's 350 class) can be a very computationally expensive task, so it is important to only paint the border when a safety state transitions. For example, if no object is detected in view of the camera, the border should be painted green once initially, not on every frame iteration, or the program will experience extreme lag and will likely crash. As I have seen throughout this project, carefully handling resources and using them efficiently is critical to success. The Kinect resolution is 640 x 480 and the depth of each pixel is stored in an array of size 307200 for every bitmap frame created. To avoid crashing the program by trying to do too much, I create a smaller array of every nth pixel. Then, during every frame iteration, I use the C# array function ".Min()" to return the smallest value of the subset array. This effectively reports the closest object on screen (assuming it is detected by the subset of hotspot pixels), which is crucial information to the person backing up the vehicle.
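A simplified sketch of this per-frame state handling is below. The control and field names are my own illustration, mediumAlarm and urgentAlarm stand in for the two SoundPlayer instances, and it collapses the two audio booleans described above into a single enabled flag:

    // Illustrative safety states and a per-frame transition check: the border is only
    // repainted and the looping alarms only changed when the state actually transitions.
    enum SafetyState { Safe, Caution, Danger }

    SafetyState currentState = SafetyState.Safe;
    bool audioEnabled = true;   // flipped by the toggle button's event handler

    void UpdateSafety(double closestInches, double yellowBoundary, double redBoundary)
    {
        SafetyState newState = closestInches <= redBoundary ? SafetyState.Danger
                             : closestInches <= yellowBoundary ? SafetyState.Caution
                             : SafetyState.Safe;

        if (newState == currentState) return;   // nothing to repaint or restart
        currentState = newState;

        switch (newState)
        {
            case SafetyState.Safe:
                alertBorder.BorderBrush = System.Windows.Media.Brushes.Green;
                mediumAlarm.Stop(); urgentAlarm.Stop();
                break;
            case SafetyState.Caution:
                alertBorder.BorderBrush = System.Windows.Media.Brushes.Yellow;
                urgentAlarm.Stop();
                if (audioEnabled) mediumAlarm.PlayLooping();
                break;
            case SafetyState.Danger:
                alertBorder.BorderBrush = System.Windows.Media.Brushes.Red;
                mediumAlarm.Stop();
                if (audioEnabled) urgentAlarm.PlayLooping();
                break;
        }
    }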
*Note: This ignores partitioning the screen to evaluate different, possibly more important, areas of the view differently. This is something that can be tweaked to customize and improve the software. I receive the distance of the closest object in millimeters and convert this value to inches for the sake of convenience. I intend to base my alerting on comparisons and checks against this value. My program evaluates 30 frames per second, and in doing so it creates an array for a bitmap that holds roughly 300,000 pixels (so for each frame iteration, it iterates through those 300,000 pixels). It follows that I don't have much computational time to evaluate each frame without crashing my program.
Originally, I was intending to evaluate every pixel on every frame iteration, but I'm thinking a series of designated "hotspots" would be much more feasible. I would like to spread the designated hotspot pixels out evenly and have each report its distance, which cuts the 300k checks down to a few hundred at most. Calculations are based on how far away an object is, and the Distance value in the textbox is the real-time distance away in millimeters. I then calculate the danger level based on how far away an object is from the hotspot. I think this will open up a lot of flexibility and allow me to have different danger values for different sub-sections of each frame. For example, objects directly behind the vehicle can be given more attention than objects above or to the side.

Extracting only the distance from the camera is more difficult than I anticipated, mostly because the Kinect is built to recognize people rather than objects. With that said, a significant part of the Kinect SDK is dedicated to tracking multiple players, skeletal joints, hands, etc. The same follows when trying to calculate depth: the depth is stored in a 16-bit value whose low-order bits are used as a player index, which means bit manipulation is needed to extract the depth from the sensor. The following lines of code are used to get the depth at a certain pixel coordinate:

    int pixelIndex = pixelX + (pixelY * frame.Width);
    int depth = pixelData[pixelIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;

Although manipulating the bits gives the programmer more flexibility, it makes transitions between raw depth data and readable depth data quite tedious.

Former SNC CS grad Quang Bui stopped by SNC for an in-person meeting regarding the Kinect and his project last year. His project was quite different from mine, but the meeting was helpful nonetheless. He had used many extra DLLs and helper classes to fit the needs of his project, but I will likely be able to get away with only the tools provided by the standard 1.8 SDK. The key takeaway from the meeting is that I will want to implement a way of evaluating the distance from the GUI. This means sensor data will need to be calculated and displayed in the interface, as testing methods are limited with the real-time camera.

The 1.8 SDK offers various template Visual Studio projects that serve as a model to help new users start programming the Kinect. Some examples include the basics of recognizing audio, creating a green screen effect, and other interfacing examples with the Kinect hardware. The template I have been experimenting with is "Depth Basics - WPF" as I am mostly concerned with detecting objects. The template allows Visual Studio to compile the Kinect output together with a foundation for the user interface.
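To illustrate the sub-section idea mentioned above, here is a rough sketch in which hotspots near the horizontal center of the frame (directly behind the vehicle) use a more cautious danger distance than hotspots near the edges. The names and numbers are assumptions for illustration, not values from my code:

    // Decide a per-hotspot danger distance based on where the pixel sits in the frame.
    int frameWidth = 640;
    int pixelX = pixelIndex % frameWidth;   // column of this hotspot

    // The center half of the view gets a larger (more cautious) danger distance
    bool centerRegion = pixelX > frameWidth / 4 && pixelX < 3 * frameWidth / 4;
    double dangerInches = centerRegion ? 36.0 : 24.0;

    int depth = pixelData[pixelIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;
    double inches = depth / 25.4;
    bool danger = inches > 0 && inches <= dangerInches;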
After backtracking, I have now installed Kinect SDK v1.8 and the correct version of Visual Studio to begin testing. The Kinect has a few different cameras and sensors that parse information together to track distance, the number of people in the room, and even irrelevant information like background scenery. Instead of starting from scratch, the SDK comes with sample VS projects, and I will likely start the final product from one of these free templates.
After installing Visual Studio Professional 2013 (downloaded free through DreamSpark), I also downloaded the most recent Kinect software development kit (v2). However, the v2 SDK is not backwards compatible and requires the most recent Kinect camera. Although the new hardware and SDK have a number of great features, with no easy way to access that hardware and fewer references available, I will be adapting to the v1.8 SDK and the first-generation Kinect.
For testing and demonstration, I will be attaching the camera to a mobile cart to simulate the movement of a vehicle backing up. I have also been advised to contact graduates (such as Quang Bui) who have worked with the Xbox Kinect software previously. I will be reaching out to them for a brief discussion about the possible APIs and open-source libraries the Xbox Kinect features.
For now, I will be evaluating what drivers, software, and tools I will need for the project. Based on my reading of previous years' projects, OpenCV and OpenKinect look like viable options. One of the biggest milestones will be establishing communication between one of these open-source libraries, a compiler, and the Kinect hardware; this will be the main priority in the upcoming days. To plan the project, I will be using two-week sprints. At the beginning of each sprint, I will allocate a number of hours to dedicate to the project for the upcoming two-week period, then list the tasks that need to be completed by the end of the sprint iteration along with an estimated time for each task. This will allow me to be agile in my planning and optimize my advisement sessions. If I begin to drift too far off the correct path, I can easily rebound, since I will be at most two weeks behind.
I will be implementing the software behind a car backup camera, as typically installed in the dash of newer vehicles. The software should display the projected path of the vehicle based on the position of the wheel. Aside from the obvious camera, I will need distance sensors to monitor potentially obstructing objects. I will be utilizing the Xbox Kinect camera and its API, as well as referencing existing projects from past students.