Blog

Week 1 (1/26/26 – 1/31/26)

Coming into the first week, I had yet to decide what exactly the project would be about. At first, I thought a movie or video game suggestion system would be a fun topic, but I decided against it in favor of football because I had forgotten to check recent projects before coming up with that topic. After meeting with the professors, we figured that the best course of action would be to predict the points earned on a play based on different factors.

Week 2 (2/01/26 – 2/07/26)

My first goal for this week was coming up with the project description and getting the website up and running. At this point, the concept was in place, but there wasn’t much of a question to try to answer.

The next step was getting the data from a single game and seeing whether I could get something close to a functional model based on that game’s data. The game that I decided on was the matchup between the GB Packers and LA Rams in Week 5 of the 2024 NFL season.

I also started to code the R programs that help clean the data. I started by using the data from the single game to make sure that the functions were working properly before running them on the data from a whole season.

Week 3 (2/08/26 – 2/14/26)

This week, I focused on getting the data cleaned. I was working on getting the data from the 2021-2024 seasons all into one dataset and then organizing the play-by-play data so that it would be lined up in sequential order. It went smoothly for the most part, the one exception being how to get kickoffs placed correctly. When a kickoff occurs, the clock doesn’t run until the ball is caught by the returner. However, if the ball goes out the back of the end zone, or the returner just takes a knee, no time runs off the clock. This means that the following play will have the same time remaining as the kickoff. At the same time, the clock doesn’t run from the time someone scores until after the ensuing kickoff. The result was a sort order for plays with the same time on the clock: extra points and 2-point conversions at the top, then kickoffs, followed by penalties/timeouts, and then the different offensive play types.
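As a rough sketch of that tie-breaking order (written in Python here rather than the R used for the project), the idea is a sort key that falls back on a play-type rank when the clock is tied. The field names (`game_id`, `quarter`, `seconds_left`, `play_type`) and the play-type labels are assumptions made for illustration, and all offensive play types are lumped into one rank because their internal order isn't listed above.

```python
# Hypothetical play-type labels; the real dataset's labels may differ.
TIE_BREAK_ORDER = {
    "Extra Point": 0,
    "Two-Point Conversion": 0,
    "Kickoff": 1,
    "Penalty": 2,
    "Timeout": 2,
}
OFFENSIVE_RANK = 3  # every other play type (run, pass, ...) sorts after these


def sort_key(play):
    """Order by game, quarter, clock (most time left first), then play-type rank."""
    rank = TIE_BREAK_ORDER.get(play["play_type"], OFFENSIVE_RANK)
    # The clock counts down, so negate it to put plays with more time left first.
    return (play["game_id"], play["quarter"], -play["seconds_left"], rank)


# Toy rows, not taken from the real data.
plays = [
    {"game_id": 1, "quarter": 1, "seconds_left": 600, "play_type": "Run"},
    {"game_id": 1, "quarter": 1, "seconds_left": 600, "play_type": "Kickoff"},
    {"game_id": 1, "quarter": 1, "seconds_left": 600, "play_type": "Extra Point"},
    {"game_id": 1, "quarter": 1, "seconds_left": 612, "play_type": "Pass"},
]
ordered = sorted(plays, key=sort_key)
order_of_types = [p["play_type"] for p in ordered]
```

With these toy rows, the pass at 612 seconds comes first, and the three plays tied at 600 seconds come out as extra point, kickoff, run.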

Week 4 (2/15/26 – 2/21/26)

I started this week by finishing up the data cleaning. That meant writing the few variables that help link plays to each other, even when they are not in sequential order in the dataset. The PlayID was easiest: once the data was sorted, it is just an incrementing value that resets when it is a new game. PreviousPlayID and NextPlayID were a little harder, as I had to consider where to add the break points so that not everything is linked together, and then implement the code to do so properly. The solution that I came up with was to separate the plays that need previous and next IDs from those that don’t, add the values, and then put the two groups back together.
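The split-link-recombine idea could look something like the Python sketch below (the project code itself is in R). The `linkable` flag and the choice of the game boundary as the only break point are assumptions for illustration; the real rules for which plays get links may be different.

```python
def add_links(plays):
    """Add PreviousPlayID / NextPlayID to linkable plays; others get None."""
    # Step 1: separate the plays that need links from those that don't.
    linked = [p for p in plays if p["linkable"]]
    unlinked = [p for p in plays if not p["linkable"]]

    for p in unlinked:
        p["PreviousPlayID"] = None
        p["NextPlayID"] = None

    # Step 2: link each play to its neighbors within the linkable group.
    for i, p in enumerate(linked):
        prev_p = linked[i - 1] if i > 0 else None
        next_p = linked[i + 1] if i + 1 < len(linked) else None
        # Assumed break point: never link across a game boundary.
        same_prev = prev_p is not None and prev_p["GameID"] == p["GameID"]
        same_next = next_p is not None and next_p["GameID"] == p["GameID"]
        p["PreviousPlayID"] = prev_p["PlayID"] if same_prev else None
        p["NextPlayID"] = next_p["PlayID"] if same_next else None

    # Step 3: put the two groups back together in game/play order.
    return sorted(linked + unlinked, key=lambda p: (p["GameID"], p["PlayID"]))


# Toy rows, not taken from the real data.
plays = [
    {"GameID": 1, "PlayID": 1, "linkable": True},
    {"GameID": 1, "PlayID": 2, "linkable": False},
    {"GameID": 1, "PlayID": 3, "linkable": True},
    {"GameID": 2, "PlayID": 1, "linkable": True},
]
result = add_links(plays)
links = [(p["PreviousPlayID"], p["NextPlayID"]) for p in result]
```

Here play 1 of game 1 links forward to play 3 and skips the unlinked play 2, and nothing links from game 1 into game 2.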

Week 5 (2/22/26 – 2/28/26)

This week, there wasn’t much direct progress made on the project as a whole. A lot of work time was taken up by the increased workload that came with a take-home exam and practice for a choir concert. That isn’t to say that no progress was made: the in-class discussions and homework for the week will be important to put into practice in the capstone project as the user interface is developed.

The topic this week was HCI (Human-Computer Interaction). On Tuesday, we looked at some examples in class and discussed what goes into good HCI. The homework was to find an example of good and bad HCI to bring to class on Thursday, and class time on Thursday was spent showing off these examples in small groups. We were then given an assignment over the weekend to think about HCI and apply it to our capstone project. Attached below my blog post for the week are the answers to this homework assignment.

Week 6 (3/01/26 – 3/07/26)

This week had a three-pronged attack to it. First, I wrote up and documented what the values of each variable mean. Second, I started to add a dummy variable to mark the end of each drive. Third, I made the PlayID variable reset for each game, rather than being a single incrementing value.

For writing up the variable descriptions, I took the file that was created by the PlayTypeUpdate and Sort R scripts and listed all of the variables. Then I went through each variable, looking at the values it had and what they meant. Documenting this is important because, up to this point, seeing what a variable meant required repeating this process every time, which quickly got repetitive.

Adding the dummy variable for the end of the drive sounds easy enough, right? A new drive starts when the defense becomes the offense. That is a simple check within R. But what if there is a fumbled punt return that is recovered by the team that punted? What if there is an interception by the defense that is then fumbled and recovered by the original offense? What if the same team that receives the second-half kickoff had the ball to end the first half? All of these cases result in the same team being on offense, but there is a new drive. Starting with the simplest case, I check whether the teams on offense and defense switch from one play to the next. If that happens, I mark the play before the flip as the last play in the drive. The other cases didn’t get covered this week.
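The simple case can be sketched like this in Python (a stand-in for the R check; the `offense` field and `DriveEnd` column name are assumptions for illustration). It only compares each play with the one after it inside a single game, so it deliberately misses the trickier cases listed above.

```python
def mark_simple_drive_ends(game_plays):
    """Flag a play as DriveEnd = 1 when the next play has a different offense.

    Expects the plays of ONE game, already sorted in order.
    """
    for current, following in zip(game_plays, game_plays[1:]):
        current["DriveEnd"] = 1 if current["offense"] != following["offense"] else 0
    if game_plays:
        # The final row has no following play to compare against,
        # so this sketch leaves it unflagged.
        game_plays[-1]["DriveEnd"] = 0
    return game_plays


# Toy possession sequence, not taken from the real data.
game = [{"offense": team} for team in ["GB", "GB", "LA", "LA", "GB"]]
flags = [p["DriveEnd"] for p in mark_simple_drive_ends(game)]
```

In the toy sequence, the second and fourth plays are flagged, since each is the last play before the offense flips.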

With the PlayID, what I had before was just a value that gave the row number. After the first game, this number isn’t very intuitive. My original plan was to use it as the Primary Key for the database that the data will go into. However, there is also a unique ID for games. This means that I can allow the PlayID to reset every game and still have the combination of the two be unique for every row. This also makes the PlayID more intuitive when looking through the data: it says which play of the game is being observed, rather than just which play number of our dataset.
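A small Python sketch of the per-game reset (the project does this in R) is below. It assumes the rows are already sorted and that each row carries a `GameID`; the game ID values are made up for the example.

```python
from collections import defaultdict


def assign_play_ids(sorted_plays):
    """Number plays 1, 2, 3, ... within each game, restarting at every new GameID."""
    counters = defaultdict(int)
    for play in sorted_plays:
        counters[play["GameID"]] += 1
        play["PlayID"] = counters[play["GameID"]]
    return sorted_plays


# Toy rows with made-up game IDs.
plays = [{"GameID": g} for g in ["A", "A", "B", "B", "B"]]
assign_play_ids(plays)

# The (GameID, PlayID) pair acts as a composite key: it should never repeat.
keys = [(p["GameID"], p["PlayID"]) for p in plays]
keys_are_unique = len(set(keys)) == len(keys)
```

PlayID alone repeats across games here (both games have a play 1), but the pair of GameID and PlayID stays unique for every row.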

Week 7 (3/08/26 – 3/14/26)

This week focused on finishing up the different cases for the change in drive. The key comes down to checking the PlayTypeUpdate. If a team is punting, the drive is over regardless of what happens. If the returning team botches the recovery and the punting team recovers the fumble, then it should be considered a new drive, not an extension of the previous one. Likewise with an onside kick: the previous play will always be a score, so I can check whether it was a Field Goal, Safety, Extra Point, or Two-Point Conversion. If there is a turnover, the offense and defense flip, unless there is a turnover by both teams and the original offense keeps possession. Again, I count the turnover as the end of one drive and the start of the next.
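Put together, those rules might look roughly like the Python sketch below (again a stand-in for the R code). PlayTypeUpdate is the column named above, but the label strings, the `turnover` flag, and the `offense` field are assumptions for illustration.

```python
# Assumed labels: play types that end a drive no matter who has the ball next.
DRIVE_ENDING_TYPES = {
    "Punt",
    "Field Goal",
    "Safety",
    "Extra Point",
    "Two-Point Conversion",
}


def is_drive_end(play, next_play):
    """Return True if `play` is the last play of its drive."""
    # Punts and scoring plays always end the drive, even if the same team
    # ends up with the ball (fumbled punt return, recovered onside kick).
    if play["PlayTypeUpdate"] in DRIVE_ENDING_TYPES:
        return True
    # A turnover ends the drive even when the original offense gets it back.
    if play.get("turnover", False):
        return True
    # Otherwise fall back on the simple check: did the offense change?
    # (With no following play, this sketch returns False.)
    return next_play is not None and play["offense"] != next_play["offense"]


# Toy cases, not taken from the real data.
punt_recovered = is_drive_end(
    {"PlayTypeUpdate": "Punt", "offense": "GB"}, {"offense": "GB"}
)
double_turnover = is_drive_end(
    {"PlayTypeUpdate": "Pass", "offense": "GB", "turnover": True}, {"offense": "GB"}
)
ordinary_run = is_drive_end(
    {"PlayTypeUpdate": "Run", "offense": "GB"}, {"offense": "GB"}
)
```

The first two toy cases count as drive ends even though the same team keeps the ball, while the ordinary run does not.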

While writing the code to check this, I noticed an oddity in how offense and defense are listed for kickoffs. 2021 and 2022 have the kicking team listed on offense, but 2023 and 2024 have it listed on defense. This, along with other issues, led Dr. McVey, Dr. Dunbar, and me to decide to remove kickoffs from the dataset.

Week 8 (3/15/26 – 3/21/26)

This is the week of Spring Break for St. Norbert College. Rather than take a week away from the project, I have a few goals that I am trying to work on.

The main goal is to start working on finding the expected points for a given situation. The “test” case being used is “What is the expected point value for a team that decides to run the ball on 1st and 10?” Getting this filter up and running gives my project the framework for filtering by down and distance and by type of play. I know it isn’t much, but it is a starting point. From this, it wouldn’t be much harder to add location on the field or team, or to compare runs to passes and other types of plays.
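A first pass at that filter could be as simple as the Python sketch below (the real version would live in R). How points get attributed to an individual play isn't settled in this post, so the `points` field is a hypothetical placeholder, as are the other field names, and the numbers are toy values rather than results from the dataset.

```python
from statistics import mean


def expected_points(plays, down, distance, play_type):
    """Average the (hypothetical) points value over plays matching the filter.

    Returns None when no play matches, rather than guessing a value.
    """
    matches = [
        p["points"]
        for p in plays
        if p["down"] == down
        and p["distance"] == distance
        and p["play_type"] == play_type
    ]
    return mean(matches) if matches else None


# Toy rows with made-up point values, NOT results from the real data.
plays = [
    {"down": 1, "distance": 10, "play_type": "Run", "points": 0.0},
    {"down": 1, "distance": 10, "play_type": "Run", "points": 7.0},
    {"down": 1, "distance": 10, "play_type": "Run", "points": 2.0},
    {"down": 1, "distance": 10, "play_type": "Pass", "points": 3.0},
    {"down": 3, "distance": 4, "play_type": "Run", "points": 0.0},
]
run_on_first_and_ten = expected_points(plays, down=1, distance=10, play_type="Run")
```

Keeping down, distance, and play type as arguments means that adding field position or team later would just be more conditions in the same filter.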