Blog Posts
4/16/2024
When I first made this program I relied only on x coordinates. As I continued to build the software I added some y coordinate checks, but these were still minimal compared to the overwhelming number of computations and checks based on x coordinates. I thought this would give me trouble in videos with more vertical movement, but surprisingly it didn't. I did have to add some y filtering to my BestFitPlayer function, but it was very minimal. I asked Dr. McVey if I should spend the days required to implement additional y coordinate checking and cleaning, but with the program running so well on the right videos we decided not to: it would significantly increase the complexity of my BestFitPlayer function, and there is no guarantee it would help at all given that things already work well.
During my testing of various videos I have discovered that my code doesn't work perfectly "right out of the box." There is some trial and error in configuring the frame resize, MAX_DISTANCE (the maximum distance a player is allowed to move per frame), and potentially other small settings. This is expected, since no two videos are the same. I will document these discoveries for future use of this code. To the best of my knowledge, the code works effectively once these settings are tuned and the video meets the right parameters.
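To make that tuning concrete, here is a minimal sketch of how the per-video settings could be grouped. MAX_DISTANCE is the real knob mentioned above; the resize width, its value, and the helper name are placeholders for illustration:

```python
import cv2

# Hypothetical per-video settings; MAX_DISTANCE is the real knob from
# the project, the resize width and its value are assumed placeholders.
MAX_DISTANCE = 50          # max distance a player may move per frame
RESIZE_WIDTH = 960         # target frame width after resizing

def resize_frame(frame, width=RESIZE_WIDTH):
    """Resize a frame to a fixed width while preserving aspect ratio."""
    h, w = frame.shape[:2]
    return cv2.resize(frame, (width, int(h * width / w)))
```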
Now I am cleaning up my code, updating my website, finding more videos that work well, testing the code and UI overall, and planning for my presentation.
4/13/2024
Stopped using the EDT:
I have stopped using the Euclidean Distance Tracker (EDT) to assign coordinates. The EDT wasn't working well (adding too many ids / not maintaining ids throughout the video). This was a problem because in a 29-second video close to 70 ids were assigned to the 5 players, which made it hard to know whether my data cleaning code was doing an accurate job. Instead of feeding the center coordinates from openCV to the EDT, I now send the center coordinates to my own code to assign ids. This code is almost exactly the same as my clean data file code base. It uses a player dictionary of lists to assign ids based on which list the coordinates are added to on the fly. This actually works well and doesn't slow things down too much. To clarify, it wouldn't matter if it did slow things down, because the coordinates with the updated ids still need to be cleansed (this is not the final product). Since I will still send the coordinates with the new ids to a file to clean, I don't need perfection; I just need to limit the number of useless ids assigned. Now in the 29-second video only 7 ids are assigned to the 5 players (I'm very happy about that).
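Here is a rough sketch of that idea: a dictionary of per-player lists, where each new center point goes to the nearest existing player within MAX_DISTANCE and otherwise starts a new id. The function and variable names are illustrative, not my exact code:

```python
import math

MAX_DISTANCE = 50  # tuned per video, as noted above

player_dict = {}   # id -> list of (frame, x, y) entries
next_id = 0

def assign_id(cx, cy, frame_num):
    """Assign a center point to the nearest existing player within
    MAX_DISTANCE; otherwise start a new id. A sketch of the idea,
    not the project's actual code."""
    global next_id
    best_id, best_dist = None, MAX_DISTANCE
    for pid, coords in player_dict.items():
        _, px, py = coords[-1]             # last known position
        dist = math.hypot(cx - px, cy - py)
        if dist <= best_dist:
            best_id, best_dist = pid, dist
    if best_id is None:                    # nobody close enough: new player
        best_id = next_id
        next_id += 1
        player_dict[best_id] = []
    player_dict[best_id].append((frame_num, cx, cy))
    return best_id
```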
Updated plan for cleaning datafile code:
This code will take the coordinates with the updated ids and make sure all coordinates are correctly assigned to the correct player. I can now trust the ids more, since they have a much higher level of accuracy. This gives me hope that I can more accurately assign coordinates to the correct player.
Path in GUI:
I have implemented a way to show the path a player moves along in my GUI. I also have a button that lets users toggle between showing and hiding the path of the player(s) being tracked.
4/3/2024
The GUI currently supports the following:
Shows the video in a centered box
Fast forward, normal speed, slow-mo in forward and reverse
Checkboxes for all the players from the datafile
Counter showing the current frame number
Restart video button
Set frame button that only allows valid input
The beginnings of video tracking and augmentation
This GUI works really well because I was able to put all of the video frames in a list. This lets me loop forward and backward and jump to any frame I want. I also have the player_dict that stores all of the players' coordinates at every frame. I added a BackFill function to my clean_datafile.py that fills in frame coordinates for all players even when they aren't moving, so the player lists in the player_dict are exactly the same size as the list of video frames. I use a frame_count that increments every frame, which means frame_count works both as an index into the frame list and as an index into each player list to find the corresponding coordinates at that exact frame.
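A minimal sketch of that indexing scheme, with a placeholder filename and an assumed player_dict structure of id -> list of (x, y) tuples:

```python
import cv2

# Load every frame into a list so the GUI can step in either direction.
cap = cv2.VideoCapture("game.mp4")      # placeholder filename
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

# After BackFill, every player list has len(frames) entries, so one
# index addresses both. player_dict structure is assumed: id -> [(x, y)].
player_dict = {0: [(100, 200)] * len(frames)}   # dummy data for the sketch

frame_count = min(120, len(frames) - 1)          # arbitrary example frame
frame = frames[frame_count]
for pid, coords in player_dict.items():
    x, y = coords[frame_count]
    cv2.circle(frame, (int(x), int(y)), 4, (0, 255, 0), -1)
```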
Things to add to GUI:
I will want to add a help button that describes how to use the GUI and any other important information the user may need. I am also considering adding a "show id" button that, when clicked, shows all player ids.
3/26/2024
In addition to those problems, I am looking into a way to create the augmentation and the bounding boxes around the players being tracked. Dr. McVey and I think this will be a layer on top of the video that uses the datafile's cleaned coordinates to draw a box around each tracked player and a line that follows their path. I don't know yet how this is going to work, but I do know that I will have a datafile with player coordinates, which means this layer will need to read from that file and then augment appropriately. I also don't know whether this layer will be able to provide the UI or whether I will need something else for that.
3/23/2024
Player is moving right to left (assigned a direction of -1):
Player stops moving and changes his direction (now direction should be 1):
Player begins to move and starts to be tracked again:
Player's change of direction isn't picked up, so these coordinates won't be added to the player:
3/23/2024
Note:
Even though I am not using the ids to help assign coordinates, I am still using them for debugging purposes. They make it easier for me to see what is happening on the backend of the program.
3/19/2024
Updating frames for all players:
I have fixed the player dictionary so that all players already found are updated, regardless of whether their player list received new coordinates. This means all players are up to date with the most recent frame number at all times. It does not mean all player lists are the same size, because I don't fill in coordinates for a player before they initially start moving (who cares, they weren't discovered yet). I accomplished this with an InitializePlayer function that adds a spot in each player list inheriting the previous entry's player data. Then, when DeterminePlayer figures out who the coordinates belong to, it replaces the last entry in that player's list with the updated coords (if the player doesn't exist yet, it just starts a new list).
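Sketched out, that carry-forward-then-replace pattern looks roughly like this (the function names follow the post, but the bodies are simplified illustrations, not the actual code):

```python
def initialize_players(player_dict):
    """Carry every known player forward one frame by copying the
    previous entry (the InitializePlayer idea, sketched)."""
    for coords in player_dict.values():
        coords.append(coords[-1])  # inherit last known position

def update_player(player_dict, pid, x, y):
    """Replace the carried-forward entry once the coordinates are
    matched to a player; unknown players get a fresh list."""
    if pid not in player_dict:
        player_dict[pid] = [(x, y)]
    else:
        player_dict[pid][-1] = (x, y)
```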
Interpolation addition:
I have completed an interpolating function that fills in coordinates when there is a gap in the tracking of a player. If there is a gap between coordinates (a run of entries that are all the same), it fills in the estimated movement of the player so the coordinates don't have a "jumping" / "teleporting" effect. This function works very well.
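A minimal sketch of the linear version of this, assuming a list of (x, y) entries with a run of repeats between two known positions:

```python
def interpolate_gap(coords, start, end):
    """Linearly fill coords[start+1:end] between two known positions
    so a player glides instead of teleporting. Sketch only; assumes
    coords is a list of (x, y) tuples."""
    x0, y0 = coords[start]
    x1, y1 = coords[end]
    steps = end - start
    for i in range(1, steps):
        t = i / steps
        coords[start + i] = (x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
```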
BestFitPlayer update:
The function BestFitPlayer previously used slope to determine the correct player. This wasn't effective because of the inaccuracy of openCV: a player's slope was typically wrong because the boxes openCV produces bounce around randomly. Put another way, slope was too precise a measure for imprecise tracking software. Instead, I now determine the path a player was moving along the last time they were actually being tracked. This is an important distinction: I don't want the last 10 frames of a player, I want the last 10 frames in which the player was moving (this better captures how they are moving). There are times when the openCV coordinates cut out and show a player as static when they are actually moving; using the last time the player moved avoids this error. A player is given a 1, -1, or 0, identifying moving right, moving left, or static, respectively. If a player is moving in a specific direction and the new coordinates lie along that direction's path, I consider that player a possible candidate for the coordinates. If there is only one possible player, I know it is the correct one and append the coordinates to that player. As of right now I don't have a good way to decide where coordinates belong when there are 2 or more possible players. More to come on that.
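Roughly, the direction logic looks like this. It is a simplified sketch: the window size and the decision to keep static players as candidates are my assumptions, not settled choices:

```python
def direction_of(coords, window=10):
    """Return 1, -1, or 0 from the last `window` positions in which the
    player actually moved (entries that differ from the previous one)."""
    moving = [c for i, c in enumerate(coords[1:], 1) if c != coords[i - 1]]
    recent = moving[-window:]
    if len(recent) < 2:
        return 0
    dx = recent[-1][0] - recent[0][0]     # net x displacement while moving
    return 1 if dx > 0 else (-1 if dx < 0 else 0)

def candidates_for(x, player_dict):
    """Players whose direction is consistent with the new x coordinate.
    Static players (0) are kept as candidates in this sketch."""
    possible = []
    for pid, coords in player_dict.items():
        d = direction_of(coords)
        last_x = coords[-1][0]
        if d == 0 or (d == 1 and x >= last_x) or (d == -1 and x <= last_x):
            possible.append(pid)
    return possible
```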
3/8/2024
An initial concern I had with this idea: what if a player moves irregularly, my guess is incorrect, and the bounding boxes end up way off? DCP assured me this won't be a big problem, because even though there may be a 20-40 frame gap in the tracker, that is only about half a second in real time. A player physically can't move very far in that time, so my guesses at their location in the gap should be almost imperceptible.
3/1/2024
2/29/2024
Documentation of clean_datafile.py
2/19/2024
2/14/2024
Possible solution to too many contour lines (over-identifying one object):
I found some code on the internet that uses an agglomerative clustering algorithm to determine whether the contours' bounding boxes are too close to each other and then merges them. This should combine all of the contour bounding boxes on each player into a single box around that player. The problem with this method is that it is currently super slow and is crashing the program. I will keep investigating this algorithm because I think it could be a fantastic solution to my problem (a sketch of the merging idea follows the images below).
See bounding boxes before and after the agglomerative clustering algorithm's implementation:
Before
After
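Here is the merging idea in miniature, built on scipy's hierarchical (agglomerative) clustering rather than the exact code I found; the pixel threshold is an assumed tuning value:

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

def merge_boxes(boxes, threshold=40.0):
    """Merge bounding boxes whose centers cluster together, aiming for
    one box per player. A sketch of the clustering idea; `threshold`
    (pixels) is an assumed tuning value."""
    if len(boxes) < 2:
        return boxes
    boxes = np.array(boxes)                    # rows of (x, y, w, h)
    centers = boxes[:, :2] + boxes[:, 2:] / 2
    labels = fclusterdata(centers, t=threshold, criterion="distance")
    merged = []
    for label in np.unique(labels):
        group = boxes[labels == label]         # union of the cluster's boxes
        x1, y1 = group[:, 0].min(), group[:, 1].min()
        x2 = (group[:, 0] + group[:, 2]).max()
        y2 = (group[:, 1] + group[:, 3]).max()
        merged.append((int(x1), int(y1), int(x2 - x1), int(y2 - y1)))
    return merged
```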
2/14/2024
Route 1:
Play the video with a player selected to track. If the tracker starts to track the wrong player, freeze the video and reselect the correct player. This prevents the data file from receiving incorrect player coordinates. It is clearly a preprocessing step that requires human interaction and correction (I don't love this option for that reason).
Route 2:
I prefer this option because it involves no human interaction or correction. The idea is to track all players' coordinates and store them in a datafile. Then the user selects a player to track, and while the tracker follows that player, the program monitors the tracked coordinates to make sure they are consistent with what's already stored in the datafile. Since we have data on all of the players' movements, I will be able to identify when the tracker starts tracking the wrong player and make the necessary corrections. This would be awesome!
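The monitoring step might boil down to a distance check like this (a sketch of the plan, not working code; the tolerance is an assumed tuning value):

```python
import math

def consistent_with_datafile(tracked_xy, stored_xy, tolerance=30.0):
    """Compare the live tracker's position against the preprocessed
    datafile entry for the same frame; a large gap suggests the tracker
    jumped to the wrong player."""
    return math.dist(tracked_xy, stored_xy) <= tolerance
```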
2/10/2024
At the moment I have decided to go with cv2.TrackerCSRT_create(), since it tracks a player pretty well and allows the user to pick which player to track (only one at the moment). I noticed that sometimes when players overlap, the tracker picks up the wrong player and begins to track them instead. I would like to eliminate this problem as best I can. At the moment I believe the best way to solve it is to calculate the player's coordinates on the screen at each frame and store them in a datafile. The reason for this is so I can clean the data and eventually make judgments that help the tracker make accurate decisions. It should be noted that by doing this I am requiring a preprocessing step (not on the fly!).
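The logging part of that preprocessing step could look something like this. The datafile name and comma-separated format are assumptions; tracker is an already-initialized openCV tracker and cap an open cv2.VideoCapture:

```python
def log_coordinates(tracker, cap, path="datafile.txt"):
    """While the tracker follows a player, append the box center to a
    datafile each frame. Sketch only; file name/format are assumed."""
    with open(path, "w") as f:
        frame_num = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            found, (x, y, w, h) = tracker.update(frame)
            if found:
                f.write(f"{frame_num},{x + w / 2},{y + h / 2}\n")
            frame_num += 1
```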
Start augmenting video:
Dr. McVey suggested that I begin playing around with the video augmentation part of the project. Eventually I want the program to trace the path of the players. I have found a method in the openCV library that draws a line between coordinates, and I currently have a way to draw a line from a player's starting position to their current position. This isn't exactly following a player's path, but it's a starting point. I will need to find a way to analyze the coordinates in the data file and then draw the path of a player.
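The method in question is cv2.line; connecting consecutive datafile points would turn the start-to-current line into an actual path. A small sketch of both, assuming points are (x, y) tuples read from the datafile:

```python
import cv2

def draw_path(frame, points, color=(0, 0, 255)):
    """Draw a player's path by connecting consecutive datafile points.
    `points` is a list of (x, y) tuples; a sketch, not the final code."""
    for p1, p2 in zip(points, points[1:]):
        cv2.line(frame, tuple(map(int, p1)), tuple(map(int, p2)), color, 2)

# What I have now is just the endpoints, start to current:
# cv2.line(frame, points[0], points[-1], (0, 0, 255), 2)
```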
Static Camera reminder:
This is a perfect example of why a moving camera causes problems. A player's coordinates are based on their exact position in the frame; if the camera moves, the same coordinates now refer to a different location. A static camera seems necessary.
2/9/2024
2/7/2024
cv2.createBackgroundSubtractorMOG2():
Background subtraction is an image processing technique that enables object detection and tracking. It extracts the foreground from a video in order to identify objects in the frame, coloring objects white while the background is set to black. Using this method, I grab the contours from the mask (the black-and-white frame created by cv2.createBackgroundSubtractorMOG2) and then draw boxes around the contours to show the objects in the frame. The big problem with this background subtraction is over-detection: many objects in the frame that I do not want to track get boxes drawn around them. The main reason there are huge contours on the field with no players in them is that as the camera moves and zooms, openCV picks up the lines on the field and thinks they are moving objects.
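A minimal sketch of this pipeline; the filename, the contour-area threshold, and the window handling are placeholder choices:

```python
import cv2

cap = cv2.VideoCapture("game.mp4")            # placeholder filename
subtractor = cv2.createBackgroundSubtractorMOG2()

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)            # white objects, black background
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 100:          # assumed noise threshold
            continue
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(30) & 0xFF == 27:          # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```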
cv2.HOGDescriptor():
The "HOG" (Histogram of Oriented Gradients) is used to detect and track humans in an image or video. This method works really well at identifying the players on the field, but it runs extremely slowly. I am currently not sure whether the performance is something I can improve with a different language (C++) or in some other way.
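For reference, the basic usage is short; the detectMultiScale parameters here are common defaults, not values tuned for this project:

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_players(frame):
    """Detect people with openCV's built-in HOG + SVM person detector.
    Returns rows of (x, y, w, h) bounding boxes."""
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    return boxes
```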
YOLOv8:
YOLOv8 is an object detection and tracking model that I came across while researching. I actually want to avoid using it because I fear it will overcomplicate things and be overkill for this specific task. I have a feeling that a pre-trained model like YOLO will run very slowly and not be worth it for what I am trying to do. The only reason this route is even being considered is that I'm desperately searching for a way to detect the players on the field and the ball. I think YOLO has the capability to do this accurately, but it seems to come at a high cost. If I were to use this method, I would detect all objects on the field and then filter out everything other than players and the ball.
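If I did go this route, the filtering idea might look like this with the ultralytics package, assuming the pretrained yolov8n.pt weights (just a sketch; I have not committed to this):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                   # pretrained COCO weights
results = model("frame.jpg")                 # placeholder image/frame

for box in results[0].boxes:
    cls = results[0].names[int(box.cls)]
    if cls in ("person", "sports ball"):     # keep only players and the ball
        x1, y1, x2, y2 = box.xyxy[0].tolist()
```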
cv2.TrackerCSRT_create():
This tracker works by creating a bounding box to identify a region of interest (ROI). This is a good thing because a part of my project requirement is for the user to choose which players to track. This would kill two birds with one stone (track players and build in GUI to draw a box around desired player to track). Maybe this method seems like a no-brainer, however, I worry about what specifically is taking place behind the scenes. The openCV documentation is vague at best, which worries me. I want to know what is happening on the backend to avoid future troubles. At the end of the day this tracker works very well.