🌦️ MISpace Hackathon

Credits & Info

Laker IceLabs Team Members:

Elijah Morgan - Data Reading

Darren Fife - Data Reading

Marcos Sanson - ML Model Development

Diego de Jong - Frontend Development

About This Project

The MISpace Hackathon project focuses on weather data science and machine learning. It includes full data processing pipelines, U-Net forecasting, visualization tools, and a static GitHub Pages dashboard. We work with:

  • NetCDF Files: Industry-standard format for weather and climate data
  • Weather Datasets: Temperature, precipitation, wind speed, and atmospheric pressure
  • Machine Learning: Predictive models for weather forecasting and pattern recognition
  • Data Visualization: Interactive charts and maps for weather data exploration

Technologies

Python NetCDF4 xarray Pandas Scikit-learn Matplotlib Jupyter

Project Write-Up

Our team built a complete data and machine learning pipeline that uses the MISpace Network Common Data Form (NetCDF) dataset to generate accurate, human-readable ice concentration forecasts for the Great Lakes. The system includes data preparation, a forecasting model, a visualization pipeline, and documentation that explains how each part works. Together, these components form an end-to-end tool that aligns with the operational needs of the United States Coast Guard (USCG) ice mission.

We began by working with the provided NetCDF files from January 11 to 31. These files contain daily 1024×1024 environmental grids with latitude, longitude, and a temperature variable that represents ice concentration. Darren analyzed the dataset early in the project and reviewed the related documentation. His work helped the team understand the structure of the historical data, the test data, and the goals of the challenge. He also outlined the steps required for success and continued providing reviews and guidance as the technical system developed. Elijah organized all the raw NetCDF files into a clear directory structure and verified that the 21 days formed a continuous time series. This gave us a clean and consistent dataset for supervised learning.
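The continuity check described above can be sketched in plain Python. The `ice_YYYYMMDD.nc` naming scheme below is a hypothetical stand-in for the actual file names, and `data/raw` for the actual directory; an empty result confirms one file per day across the January 11 to 31 window.

```python
from datetime import date, timedelta
from pathlib import Path


def missing_days(raw_dir: Path, start: date, end: date,
                 pattern: str = "ice_{:%Y%m%d}.nc") -> list[date]:
    """Return any dates in [start, end] without a matching NetCDF file.

    An empty list confirms the daily series is continuous. The file
    naming pattern is an assumption for illustration.
    """
    gaps = []
    for i in range((end - start).days + 1):
        day = start + timedelta(days=i)
        if not (raw_dir / pattern.format(day)).exists():
            gaps.append(day)
    return gaps


# Example: verify the 21-day January window (11th through 31st).
gaps = missing_days(Path("data/raw"), date(2024, 1, 11), date(2024, 1, 31))
```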

Using this organized dataset, we built the forecasting system. Marcos developed the data analysis and machine learning pipeline. He processed the January NetCDF files, created tools to load and visualize the 1024×1024 grids, and assembled the seven-day training sequences. He implemented the U-Net model, a convolutional encoder-decoder architecture, and trained several versions of it over multiple weeks. We reviewed each generation of the model and looked for issues such as drift, loss of detail, or unrealistic melting patterns. Through repeated tuning and validation, the model became stable and produced realistic spatial forecasts. Marcos then used the January 25 to 31 window to generate predictions for February 1 to 4. Each prediction was saved as a high-resolution PNG and added to a GIF animation for easy interpretation.
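The seven-day training sequences can be assembled with a simple sliding window over the stacked daily grids. This is a minimal NumPy sketch, not the project's exact preprocessing code: with 21 January days and a 7-day window it yields 14 (input, target) pairs.

```python
import numpy as np


def make_sequences(grids: np.ndarray, window: int = 7):
    """Build supervised pairs from a stack of daily maps.

    grids: array of shape (T, H, W) holding T consecutive daily
           ice-concentration grids (e.g. T=21, H=W=1024).
    Returns X of shape (T - window, window, H, W) - seven prior days -
    and y of shape (T - window, H, W) - the following day.
    """
    T = grids.shape[0]
    X = np.stack([grids[i:i + window] for i in range(T - window)])
    y = grids[window:]
    return X, y
```

The same pairs can then be saved as the ML-ready `X.npy` / `y.npy` arrays kept under `data/processed/`.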

Diego built the frontend using HyperText Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript (JS). He designed a GitHub Pages dashboard that displays the model outputs clearly and makes the forecasts easy to understand. His work focused on user experience and visual clarity. He also created the project’s demo video to summarize the workflow and display the predictions.

The final forecasting system takes the previous seven days of ice concentration and predicts the next full 1024×1024 map. Multi-day forecasting is done by feeding each prediction back into the next input window. We used a fixed 0-to-6 color scale that matches the dataset so the images remain consistent across days. Darker blue represents higher ice concentration and lighter blue represents lower concentration or open water. The maps are flipped vertically so north appears at the top, which makes them easier for operational users to interpret. The GIF animations show how the predicted ice evolves across several days.
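The autoregressive loop, the fixed 0-to-6 scale, and the vertical flip can be sketched as follows. Here `model` is any callable standing in for the trained U-Net; the helper names are illustrative, not the project's actual API.

```python
import numpy as np


def rollout(model, history: np.ndarray, steps: int = 4) -> list[np.ndarray]:
    """Multi-day forecast by feeding each prediction back into the window.

    model:   callable mapping a (7, H, W) window to an (H, W) map
             (stands in for the trained U-Net).
    history: the last seven observed days, shape (7, H, W).
    Returns one display-ready frame per forecast day.
    """
    window = history.copy()
    frames = []
    for _ in range(steps):
        pred = model(window)
        # Clamp to the dataset's fixed 0-6 scale so frames stay
        # comparable across days (rendered with vmin=0, vmax=6).
        pred = np.clip(pred, 0.0, 6.0)
        # Flip vertically only for display, so north appears at the top.
        frames.append(np.flipud(pred))
        # Slide the window: drop the oldest day, append the new prediction.
        window = np.concatenate([window[1:], pred[None]], axis=0)
    return frames
```

Each frame could then be written out as a PNG (e.g. via Matplotlib with `vmin=0, vmax=6`) and stitched into the GIF animation.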

This project meets the hackathon requirements by using only the provided MISpace NetCDF dataset and transforming it into a working Great Lakes ice forecasting tool. We prepared the data carefully, trained and refined the model over several weeks, produced scientifically meaningful images, and integrated everything into a clear user-facing dashboard. The idea is practical and extends the value of the dataset by converting static historical records into short-term forecasts that could support USCG routing and safety decisions.

Overall, our work demonstrates that machine learning can provide short-term Great Lakes ice prediction using the supplied dataset. The final system produces accurate and readable maps and offers a foundation that can be expanded for future operational use.

βš™οΈ Data Processing Workflow

  1. Data Acquisition:

    Download NetCDF files from weather data sources (NOAA, ECMWF, NASA, etc.)

  2. Data Loading:

    Use xarray or netCDF4 to load and explore the dataset structure

  3. Data Cleaning:

    Handle missing values, outliers, and data quality issues

  4. Feature Engineering:

    Create derived features like temperature gradients, moving averages, seasonal indicators

  5. Model Training:

    Train machine learning models on processed data

  6. Evaluation & Visualization:

    Assess model performance and create visualizations
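Steps 3 and 4 above can be sketched as small helpers. This is a minimal sketch, not the project's actual cleaning code; the 0-to-6 clamp reflects this dataset's value range, and the function names are illustrative.

```python
import numpy as np


def clean(grid: np.ndarray, fill: float = 0.0) -> np.ndarray:
    """Step 3: replace missing values and clamp obvious outliers.

    NaNs become `fill` (open water assumed), and values are clamped to
    the dataset's 0-6 ice-concentration range.
    """
    out = np.nan_to_num(grid, nan=fill)
    return np.clip(out, 0.0, 6.0)


def moving_average(series: np.ndarray, window: int = 3) -> np.ndarray:
    """Step 4: a simple derived feature - a trailing moving average
    over a per-cell daily time series."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")


def load_day(path):
    """Step 2: open one NetCDF file (xarray imported lazily so the
    pure-NumPy helpers above work without it installed)."""
    import xarray as xr
    return xr.open_dataset(path)
```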

πŸ“ Our Project Structure

MISpaceHackathon/
├── assets/
│   └── css/
│       └── style.css
│
├── data/
│   ├── raw/                 # (optional; primary raw data lives outside repo)
│   ├── processed/           # ML-ready arrays (X.npy, y.npy)
│   └── external/
│
├── notebooks/
│   └── 01_data_exploration.ipynb
│
├── src/
│   ├── daily_visualizations/         # Jan 11-Jan 31 PNGs
│   │
│   ├── data_processing/
│   │   ├── load_and_visualize.py        # Generate daily PNGs and cache raw arrays
│   │   ├── inspect_nc.py                # Examine structure and metadata of NetCDF files
│   │   ├── downsample_data.py           # Create lower-resolution datasets for fast experiments
│   │   ├── processor.py                 # Utility class for NetCDF and shapefile preprocessing
│   │   └── nc_visualizer_outputs/       # Saved figures from netCDF visualization scripts
│   │
│   ├── models/
│   │   ├── predict_unet.py              # Run trained U-Net to produce February predictions and GIFs
│   │   ├── train_unet.py                # Train U-Net for 5 epochs using (X, y) processed arrays
│   │   └── checkpoints/                 # Model weights saved after each epoch
│   │
│   ├── utils/
│   │   └── gif_utils.py (optional)
│   │
│   └── predictions_ver_*/            # Feb predictions + GIFs
│
├── index.html
├── data-science.html
└── README.md