Adan M.P. Blog

Weekly Progress & Updates

Week 10: Dashboard Update + Visualizations & Happy Easter!

April 5, 2026

This week, I did a lot of work tweaking the front-end of my project, specifically the website and the live dashboard. My model is working great and predicts the median price of coffee based on three different variables, but I wanted to give the user more control.

I spent a lot of time over Easter break changing the user menu to be completely dynamic. I replaced the old, static 20% "Up/Down" dropdowns with custom input boxes. Now, the user can enter the exact dollar or cent amount they want to shock the market by.

Custom User Input Dashboard

(If you have a question about the underlying math of the model, see last week's blog post and technical document!)

LOGIC: If you want a 50-cent increase, you just type 0.50 into the box. The Python code takes the historical average (let's say $4.00) and adds 0.50 directly to it to get the new scaled input.
New Price = $4.00 + $0.50 = $4.50.
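That arithmetic is simple enough to sketch in a couple of lines (a minimal sketch; the function name is my own, and the $4.00 average and $0.50 shock are just the example values above):

```python
# Sketch of the dashboard's custom-shock logic. The historical mean and the
# user's shock amount here are the illustrative values from the example above.
def apply_shock(historical_mean: float, shock: float) -> float:
    """Add a user-entered dollar/cent shock directly to the historical average."""
    return historical_mean + shock

new_price = apply_shock(4.00, 0.50)
print(f"New Price = ${new_price:.2f}")  # New Price = $4.50
```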

However, this also means the user has to input realistic shocks to get a realistic coffee price out of the model!

After upgrading the inputs, I built a time-series graph using Chart.js to visually showcase how the user's custom input impacts the price of coffee over a sustained 3-month period. Here are two test cases showing a low-price shock and a high-price shock:

Low Price Shock Test Case
High Price Shock Test Case

Looking Ahead: Building these new tools took a lot of time over the Easter weekend, but I am happy with the results. Now, I need to go back into the historical data and find exactly where the model stops working (the "Starbucks Era") and try to get more data to validate it. I also have a massive international coffee dataset I need to figure out how to incorporate.

I am presenting what I have so far in class this Thursday, and I'm really excited to share the live dashboard and get ideas from everyone. Overall, it was a very productive week, and I am excited to enter the final stages of this project!

Week 9: Starbucks Era/EDA & Documentation

March 30, 2026

This week was a transitional one. My model is now fully functional, tested, and successfully running in a live virtual environment. With the deployment finished, my next step is diving back into EDA to find the exact cutoff where this model stops working.

I would like to find the year where milk, sugar, and cocoa stop being dominant factors in the median price of a cup of coffee. We came up with the name "Starbucks Era" for everything after that cutoff, which is kinda funny.

I also realized that I haven't fully explained the math of the shock model on this blog. If you are interested in the exact linear regression and how the Z-score scaling works under the hood, I put together a formal technical document. You can read the full breakdown here: AMPmodel1.

Looking Ahead: My immediate goals are to define that "Starbucks Era" timeline and then finally start bridging the gap to the international scene. I want to see if and how global coffee production data can be linked into my current model.

Finally, I want to upgrade the dashboard to give users ultimate freedom. Instead of just static 20% "Up/Down" shocks, I want users to be able to enter exact custom amounts ("What happens if milk drops by 1 cent, but sugar spikes by 70 cents?"). I have a lot of work to do as presentations and the end of the semester loom, but I'm confident in my abilities.

Weeks 7 & 8: Spring Break

March 22, 2026

Getting back to work after Spring Break and a missed week, this is a double update to recap everything I have been working on. These past two weeks, I spent my time working back through the lab we did in class a few weeks ago to build a virtual environment. The goal was to finally get my static HTML website to successfully talk to my dynamic Python machine-learning script.

Unfortunately, the campus VPN wasn't working from my house these past couple of days, which locked me out of the CS server (WinSCP). Instead, I spent the weekend doing local testing on my laptop to build the pipeline.

The App Logic: I built a mini HTML dashboard that connects to a Flask app in Python. This uses the exact linear regression model I validated earlier, but now it acts as a "Shock Simulator." It takes the user's inputs, shocks the current commodity prices by 20% in either direction, scales the math, and returns a real-dollar prediction for a cup of coffee.
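To make the pipeline concrete, here is a hedged sketch of what a Flask "Shock Simulator" endpoint like this can look like. The route name, field names, training statistics, and regression weights below are all made-up placeholders for illustration, not my real trained values:

```python
# Hedged sketch of a Flask shock-simulator endpoint. All statistics and
# weights below are illustrative placeholders, NOT the real trained model.
from flask import Flask, request, jsonify

app = Flask(__name__)

# Illustrative (mean, standard deviation) pairs plus regression weights.
STATS = {"milk": (4.05, 0.08), "sugar": (0.91, 0.03), "coco": (2900.0, 180.0)}
PRICE_MEAN, PRICE_SD = 3.40, 0.15
BETAS = {"milk": 0.30, "sugar": 0.55, "coco": 0.10}
SHOCK = {"up": 1.20, "down": 0.80, "same": 1.00}  # the +/-20% market shock

@app.route("/predict", methods=["POST"])
def predict():
    choices = request.get_json()  # e.g. {"milk": "up", "sugar": "down", "coco": "same"}
    scaled_pred = 0.0
    for var, (mean, sd) in STATS.items():
        shocked = mean * SHOCK[choices[var]]               # shock the commodity
        scaled_pred += BETAS[var] * (shocked - mean) / sd  # z-score scale it
    price = scaled_pred * PRICE_SD + PRICE_MEAN            # unscale to dollars
    return jsonify({"predicted_price": round(price, 2)})
```

During local testing, Flask's built-in test client lets you hit the endpoint without starting a server, which is handy when the campus VPN is down.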

Here are a few screenshots showing the virtual environment running in my terminal and the live HTML predicting the prices:

Python Virtual Environment Running
Testing the HTML Input
Live Dashboard Results

Looking Ahead: This weekend was mainly a massive test to get the script talking to the dashboard and to prove I could build a working virtual environment. Now that the pipeline works flawlessly on my local machine, the next step is to apply this to the live CS server once I am back on campus so it can be accessed online at all times, not just on my local computer!

Week 6: Training/Validation Work

March 8, 2026

This week, I put the model to the test. Instead of letting it see all the data at once, I trained it on the training set but completely hid a random set of 10 months to use as a validation test.

I then fed the trained model the actual sugar, milk, and cocoa values from those 10 hidden months to see what coffee price it would predict. The model uses this formula to calculate the scaled prediction based on the weights (the $\beta$ coefficients) it learned during training:

$$Scaled\_Prediction = \beta_0 + (\beta_1 \times Scaled\_Milk) + (\beta_2 \times Scaled\_Sugar) + (\beta_3 \times Scaled\_Cocoa)$$

To translate that statistical prediction back into real money that actually makes sense on our scorecard, it runs this unscaling equation:

$$Final\ Price = (Scaled\_Prediction \times SD_{price}) + Mean_{price}$$
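Plugging illustrative numbers through both equations looks like this (a minimal sketch; the betas, z-scored inputs, mean, and SD here are placeholders, not the trained model's real values):

```python
# Worked example of the two equations above with made-up numbers;
# the betas, scaled inputs, mean, and SD are NOT the real trained values.
betas = {"b0": 0.05, "milk": 0.30, "sugar": 0.55, "coco": 0.10}
scaled = {"milk": 0.40, "sugar": -0.20, "coco": 0.10}  # z-scored inputs

scaled_pred = (betas["b0"]
               + betas["milk"] * scaled["milk"]
               + betas["sugar"] * scaled["sugar"]
               + betas["coco"] * scaled["coco"])

# Unscale back to dollars: Final Price = scaled_pred * SD_price + Mean_price
sd_price, mean_price = 0.15, 3.40
final_price = scaled_pred * sd_price + mean_price
print(f"${final_price:.2f}")  # $3.41
```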

The Results: I am impressed with how this is coming together day by day. On completely unseen data, the model was off by only about 5 cents on average. Here is the actual scorecard from my Python terminal:

--- VALIDATION SCORECARD (10 Random Months) ---
Actual Price  Predicted Price  Error ($)
        3.33             3.34       0.01
        3.45             3.40       0.05
        3.50             3.56       0.06
        3.50             3.49       0.01
        3.21             3.21       0.00
        3.50             3.46       0.04
        3.47             3.46       0.01
        3.26             3.30       0.04
        3.59             3.34       0.25
        3.23             3.24       0.01

==================================================
MEAN ABSOLUTE ERROR (MAE): $0.048 per cup
==================================================
        

Looking Ahead: Now that we know the logic is sound, Dr. Dunbar and I talked about using this exact same formula and going back 50 years to get even more data. By training the model on that half-century dataset, we are going to try to accurately predict the price of coffee for the upcoming year.

Week 5: Python Model

March 1, 2026

I got a lot of work done this weekend/week, as I finally made the leap into Python! After meeting with my professors, the immediate goal was to build a "Proof of Concept" model that takes actual user inputs and spits out a predicted price.

To do this, I took recent monthly data (Feb 2023 to Dec 2025) tracking the actual Median Price of a cup of coffee and merged it with my raw commodity costs. After doing all the heavy statistical lifting and diagnostics in R, I exported the clean dataset and built an interactive terminal simulator in Python.

The Raw Data: Here is a quick look under the hood at the merged tibble dataset right before it gets scaled and fed into the Python script. You can see how the target variable (Median Price) aligns perfectly with the economic indicators:

#Median Price   milk  sugar    coco
0          3.00  4.163  0.893  2686.2
1          3.04  4.098  0.887  2744.1
2          3.07  4.042  0.900  2927.5
3          3.09  4.042  0.920  2950.8
4          3.13  3.985  0.940  3100.3
5          3.16  3.971  0.950  3150.0
        

How it works: The Python script takes user inputs (simulating a 20% market shock going up or down), automatically scales the data behind the scenes to match the regression model, and then "unscales" the prediction back into real, readable dollars. Here are the results of my extreme stress tests:

========================================
   THE COMMODITY SHOCK SIMULATOR
========================================
Type 'up', 'down', or 'same' for each.

Milk price (up/down/same): down
Sugar price (up/down/same): down
Cocoa price (up/down/same): down

----------------------------------------
SCENARIO: Milk down | Sugar down | Cocoa down
PREDICTED MEDIAN CUP PRICE: $2.46
----------------------------------------

Milk price (up/down/same): up
Sugar price (up/down/same): up
Cocoa price (up/down/same): up

----------------------------------------
SCENARIO: Milk up | Sugar up | Cocoa up
PREDICTED MEDIAN CUP PRICE: $4.24
----------------------------------------
        

Report: The outputs are incredibly realistic. A total commodity crash brings a cup down to $2.46, while a massive inflation spike pushes it over four bucks. It proves the underlying math is solid and works dynamically.
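The up/down/same mapping in that terminal script can be sketched like this (the multipliers match the 20% shock described above, but the historical means are illustrative placeholders, not my real dataset):

```python
# Hedged sketch of the terminal simulator's 20% shock mapping.
# The commodity means below are illustrative, not the real historical data.
SHOCK = {"up": 1.20, "down": 0.80, "same": 1.00}
MEANS = {"milk": 4.05, "sugar": 0.91, "coco": 2900.0}

def shocked_inputs(choices: dict) -> dict:
    """Apply the user's up/down/same choice to each commodity's mean."""
    return {var: MEANS[var] * SHOCK[choices[var]] for var in MEANS}

print(shocked_inputs({"milk": "down", "sugar": "same", "coco": "up"}))
```

The shocked values would then be z-score scaled, run through the regression, and unscaled back into dollars, exactly as described above.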

Looking Ahead: Now that I have a Python "brain" that successfully handles inputs and outputs, the next step is connecting this script to my PHP dashboard so users can run these scenarios directly on my website instead of in a terminal. I'm also looking to expand to US imports from different countries to see how shocks from abroad can affect coffee prices.

Week 4: Model Diagnostics & The 30-Cup Brew

February 23, 2026

This has been a bit of a quieter week as I am in between stages of my project, but there is still a lot of work ahead. I adjusted my methodology: instead of assuming 40 cups of coffee per pound, I recalculated the "Price Per Cup" (PPC) metric to assume a stronger brew of 30 cups per pound. I ran a new regression model and, more importantly, put it through rigorous diagnostic testing.
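The recalculation itself is just a unit conversion; as a quick sketch (the $6.00-per-pound figure is a made-up example, not my data):

```python
# Sketch of the Price Per Cup (PPC) conversion: a per-pound cost divided by
# the assumed brew yield. 30 cups/lb is the "stronger brew" assumption.
CUPS_PER_POUND = 30

def price_per_cup(price_per_pound: float) -> float:
    return price_per_pound / CUPS_PER_POUND

# A hypothetical $6.00-per-pound cost works out to 20 cents a cup:
print(f"${price_per_cup(6.00):.2f}")  # $0.20
```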

The Baseline Results: The model remains incredibly strong. Even with the adjusted price metric, the model explains about 84% of the variance in coffee prices, with Milk and Sugar remaining highly significant.

Call:
lm(formula = PPC ~ milk + sugar + coco + CPI_USA, data = master_clean)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.813e-03  7.866e-03   1.120 0.263309    
milk         1.321e-02  3.364e-03   3.926 0.000104 ***
sugar        2.610e-01  2.042e-02  12.781  < 2e-16 ***
coco         2.948e-07  9.056e-07   0.326 0.744971    
CPI_USA     -2.756e-04  6.300e-05  -4.374  1.6e-05 ***
---
Multiple R-squared:  0.8403,	Adjusted R-squared:  0.8385 
        

Testing: A high R-squared doesn't tell the whole story. I ran a Variance Inflation Factor (VIF) test and a Breusch-Pagan (BP) test to check the model's underlying health.

> vif(model_30)
     milk     sugar      coco   CPI_USA 
 3.172797 15.736831  2.293888 11.276804 

> bptest(model_30)
	studentized Breusch-Pagan test
BP = 55.013, df = 4, p-value = 3.229e-11
        

Why I am dropping CPI:

1. Multicollinearity (VIF): Usually, any VIF score over 5 is problematic. Sugar (15.7) and General CPI (11.3) are severely inflated. They are so highly correlated that they fight each other in the math, which is why the model output a negative coefficient for inflation. My next step is to completely drop CPI_USA from the model.

2. Heteroskedasticity (BP Test): The BP test checks if the variance of errors changes over time. With a p-value near zero, the model failed this test. I will need to address this, potentially by taking the log of my variables or using robust standard errors.
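As an aside, the VIF check is easy to reproduce in Python with NumPy alone, since each predictor's VIF is a diagonal entry of the inverse correlation matrix of the predictors. Here is a sketch on synthetic data (not my real dataset), where one pair is deliberately made collinear the way sugar and CPI are:

```python
import numpy as np

# Sketch: computing VIFs by hand on synthetic data (not the real dataset).
# Each VIF is a diagonal entry of the inverse correlation matrix.
rng = np.random.default_rng(0)
n = 200
cpi = rng.normal(100, 5, n)
sugar = 0.9 * cpi + rng.normal(0, 1, n)  # deliberately collinear with CPI
milk = rng.normal(4, 0.5, n)             # roughly independent of both

X = np.column_stack([milk, sugar, cpi])
corr = np.corrcoef(X, rowvar=False)
vifs = dict(zip(["milk", "sugar", "cpi"], np.diag(np.linalg.inv(corr))))
print({k: round(v, 1) for k, v in vifs.items()})  # sugar and cpi blow past 5
```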

Looking Ahead: Dropping the CPI and adjusting for these diagnostics is my next step. On the technical side, I am starting to build out a mini dashboard in PHP. Coming up, we have a lab scheduled to get more familiar with running a PHP front-end connected to a Python script on the back-end, which will be the exact architecture I need for my final interactive predictive model.

Week 3: Feature Engineering & Initial Models

February 15, 2026

This week was a major turning point. I moved from "Data Collection" to "Feature Engineering." To solve the problem of CPI being an abstract index, I engineered a new variable: "Price Per Cup" (PPC). By converting raw commodity costs (Price per Pound) into a per-cup metric, I can now model the actual dollar cost of a cup of coffee over the last 30 years.

Correlation Matrix of Coffee Factors

Current work: I ran my first Correlation Matrix and Linear Regression models. The results were fascinating—I discovered a massive "Inflation Trap" (Multicollinearity) when looking at short-term data. However, my long-term model (1990–2026) proved that Milk and Sugar prices are actually stronger predictors of coffee costs than general inflation. This validates my decision to use the historical dataset over the short-term one.

Looking ahead: Now that I have proved the "Cost" side of the equation (Milk/Sugar), next week is about the "Supply" side. I plan to incorporate the harvest data from Brazil and Vietnam into the model to see if global production shocks can explain the remaining variance in price. I also plan to start coding the skeleton of the interactive dashboard.

Week 2: Data Auditing & Creating a Plan

February 8, 2026

Now that I have officially decided on my project, I can go full steam ahead and find as much data as I can about world coffee production, milk, chocolate, and other coffee variables. I have spent most of this week searching for data and downloading CSV files.

Current work: I am currently in the stage of auditing my data, trying to figure out columns and the true amount of data that I have. I also have a plan to build an interactive dashboard for my project. It would be an interactive map of the world with trade routes and a slider for predicting different prices.

Looking ahead: For this upcoming week, I want to have my correlation matrix done and start thinking about what model I'm going to use. I also want to clean my website up to make it more coffee-themed.

Week 1: The Pivot to Coffee Data

February 4, 2026

My initial project idea was to track the impact of remote workers on inflation in Mexico City. However, after auditing the data from InsideAirbnb and Indeed, I realized the datasets were too disjointed to build a reliable Time Series model.

The Breakthrough: While reviewing economic data, I discovered a comprehensive 50-year dataset from the USDA and FRED regarding global coffee production. This data is clean, continuous, and allows for a much more rigorous statistical analysis.