Hackathon Experience: Winning the "Hack the Climate" Challenge
In late 2023, I was part of the winning group in the "Hack the Climate" hackathon. We created a machine learning model to predict nutrient runoff from farmland based on rainfall. In this post, I want to share my first hackathon experience.
The Unexpected Invitation
During the fall semester of 2023, I was a teaching assistant for a course called "Introduction to Data Science". The professor of that course, Antorweep Chakravorty, approached me one week before the hackathon and asked me if I would like to participate. As I would learn, he had put together a team consisting of Bikash Agrawal, his friend and a professor at UIO, and Morten Helgaland, a business specialist. Luckily, I could convince him to also take on board Maroof Mushtaq, who is probably the smartest person I know and a good friend.
To be invited to join such a talented group is a great honor for me.
Pre-Hackathon Preparations
During the week leading up to the hackathon, we had several meetings to decide on the problem we wanted to choose and try to figure out what the actual challenge would be. The teams had to apply for one of the four possible problems, and only a brief description was available for each one.
The only challenge about a Norwegian place was the Setesdal + Stavanger challenge. The organizers summarized it as "How to manage water resources intelligently?"
With only one blurry image of a slide from the brainstorming event and the brief description to go off, we used our time before the official kick-off to figure out what the exact problem could be. Our approach was to research and share as much valuable information as possible to then discuss it in several meetings. We were able to narrow the problem down to two possibilities:
- Predict flooding and find ways to mitigate it.
- Predict nutrient and soil runoff and keep polluted water from entering freshwater systems.
As we will see now, our second guess was spot on.
Blurry image showing the slides from the brainstorming event. © Nordic Edge
Day 1: Challenge Reveal and Strategy Formation
On Friday evening, our team met up at Innoasis, a coworking space owned by Nordic Edge, which they kindly let us use as an office for the weekend. The challenges were presented in an online meeting hosted by PFR, the organizer of the hackathon. When they announced the details, we finally got a clear picture of the problem.
Lake Hålandsvannet, like other lakes and fjords, suffers from high concentrations of nitrogen and phosphorus. This is mainly because of fertilizers that get washed off the agricultural fields by rainwater. If the concentration in a freshwater body gets too high, it can have fatal outcomes like extreme algae growth and mass die-off of fish.
In a satellite image, the impact of the high nutrient levels in Lake Hålandsvannet is evident.
Hålandsvannet next to Stokkavannet. The slope of the surrounding hills leads most rainwater from the surrounding fields into the smaller lake.
Our task in the hackathon was to find a way to reduce nutrient pollution in the water by predicting runoff, alerting citizens and farmers, and improving transparency and communication between citizens, farmers, and the city. Additionally, we were asked to propose preventive measures and, if possible, a way to reuse the nutrients.
Together with the presentation of the problem, we received a list of useful datasets and an Excel file with ten records of ten testing places at the shore of the lake. The readings were roughly two weeks apart and indicated some correlation between the different test points.
We quickly realized that this number of data points was not enough to train a reliable ML model, so we started to look into other ways of predicting the runoff using the available data. One idea was to gather information on the soil quality and the slope of the land and try to simulate the runoff manually. This would have been a lot of work with a high risk of failure, since none of us are experts in the field.
While we were researching, Morten posted a link into the chat with words along the lines of "Guys, I found the jackpot". The link was to JOVA. The first paragraph on their site reads: "The Norwegian Agricultural Environmental Monitoring Programme (JOVA) is a national programme for soil and water monitoring in agriculture dominated catchments in Norway." Indeed, jackpot!
JOVA tracks the rainfall, runoff, and nutrient pollution for 13 catchments with differing crops around Norway, and the best thing is that they make all of their data available to download. The finest resolution at which the data can be downloaded is hourly for the rainfall and runoff data and daily for the nutrient pollution data. With some of the data going back to 1997, we had more than enough data to train a predictive model.
JOVA catchment areas. © JOVA
With this big win in mind, we wrapped things up for the first day and went home to sleep...
Well, at least we planned on sleeping, but most of us were too excited about the project. Downloading the JOVA data came with a challenge that nagged some of us enough to keep us awake: JOVA's downloader only allows a user to fetch a maximum of six months at a time. This led us to develop our own downloader that could get the data for any time frame by fetching it in short consecutive chunks. It let us download a large amount of data in a relatively short time.
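To make the chunking idea concrete, here is a minimal sketch of how a date range can be split into short windows and fetched piece by piece. It is not the code from our repository: `fetch` is a hypothetical stand-in for whatever function actually requests one window of data, and the chunk length `days` is left as a parameter.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List, Tuple


def date_chunks(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (chunk_start, chunk_end) pairs that cover start..end."""
    if days < 1:
        raise ValueError("days must be at least 1")
    current = start
    while current <= end:
        # The last chunk is cut short so it never runs past `end`.
        chunk_end = min(current + timedelta(days=days - 1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)


def download_range(
    fetch: Callable[[date, date], List[dict]],  # hypothetical fetch function
    start: date,
    end: date,
    days: int,
) -> List[dict]:
    """Fetch every chunk in order and collect the records in one list."""
    records: List[dict] = []
    for chunk_start, chunk_end in date_chunks(start, end, days):
        records.extend(fetch(chunk_start, chunk_end))
    return records
```

Keeping the chunk length as a parameter makes it cheap to re-run the same download with smaller windows later on.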
Day 2: Technical Breakthrough
On day 2, Maroof and I met up at Innoasis at 08:30 in the morning. Antorweep had finished the data downloader and tasked me with fetching 6 years of data for all locations. I then created a script to take all files and merge them into one. Meanwhile, Maroof worked on creating a GeoJSON file for possible visualization.
Later, the rest of the team arrived, and while Morten was planning the pipeline, joining meetings, and creating presentations, Antorweep and Bikash implemented their first models. The plan was for our pipeline to be triggered n times a day, using the weather forecast for the next ten days.
What I did not mention before is that the first time we fetched the data, we downloaded it one week at a time. When we visualized the data later, we found out that if only one data point in a download is faulty, the whole time frame is returned as null values. This showed up as one-week holes in our data plot.
A plot of the data for the catchment area "Sku". Each color is a separate segment divided by missing data. The data was downloaded daily.
After we noticed the holes in the data, we downloaded it again but only one day at a time. This ensured that a missing data point would only affect one day. In the image above there are still some missing values which were later linearly interpolated.
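Linear interpolation of the remaining gaps can be sketched in a few lines of plain Python. This is an illustration under assumptions, not our actual script: missing values are represented as `None`, and gaps at the very start or end of the series are left untouched because there is no second neighbour to interpolate towards.

```python
from typing import List, Optional


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior runs of None on a straight line between known neighbours."""
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    # Walk over each pair of neighbouring known values and fill what lies between.
    for left, right in zip(known, known[1:]):
        gap = right - left
        if gap > 1:
            step = (result[right] - result[left]) / gap
            for offset in range(1, gap):
                result[left + offset] = result[left] + step * offset
    return result


print(interpolate_gaps([1.0, None, None, 4.0]))  # [1.0, 2.0, 3.0, 4.0]
```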
At this point, Antorweep and Bikash had hit a wall with their models. The accuracy was just not good enough to be presentable and it was already 14:00. At this time, we only considered the average rainfall of one day as an input parameter and the expected runoff as the output. In reality, however, the runoff is not only dependent on the rain today but also on the rainfall of yesterday and the day before that. This is when we had the idea of using an LSTM model instead, using 30 days of rainfall information as "context" to predict one day of runoff.
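The "30 days of context" idea comes down to how the training samples are built. Below is a minimal sketch of that windowing step in plain Python, with the model itself left out. It assumes one rainfall value and one runoff value per day, and that the window ends on the day being predicted; both are assumptions for illustration, as are the function and parameter names.

```python
from typing import List, Tuple


def make_windows(
    rainfall: List[float],
    runoff: List[float],
    context_days: int = 30,
) -> Tuple[List[List[float]], List[float]]:
    """Pair each day's runoff with the rainfall of the `context_days` days ending on it."""
    if len(rainfall) != len(runoff):
        raise ValueError("rainfall and runoff must have the same length")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # The first usable target is the day on which a full window is available.
    for day in range(context_days - 1, len(rainfall)):
        inputs.append(rainfall[day - context_days + 1 : day + 1])
        targets.append(runoff[day])
    return inputs, targets
```

Each input is then one sequence for the LSTM and each target is the single runoff value it should predict.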
We knew we were on to something when Antorweep showed us the output of his new model. It wasn't just random noise anymore but there was a direction visible and it almost matched the actual runoff.
Another key improvement was in the way we divided our data into train and test datasets. In the beginning, we just randomly sampled days to be included in the train data and used the rest to test the model. This does not make a lot of sense when training an LSTM model because it uses a series of consecutive values as the input. For this reason, we created a script to sample by either week of the year or month of the year. For example, it keeps 80% of the June data for training while taking 20% for testing. This ensured that there was a similar amount of data for every month in the training data set.
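One possible reading of the month-based split is sketched below. It is an assumption, not a copy of our script: records are grouped into one block per calendar month of each year, and within every block the chronologically first 80% of days go to training and the rest to testing, so the days inside each part stay consecutive.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Record = Tuple[date, float]  # (day, measured value), an illustrative record shape


def split_by_month(
    records: List[Record],
    train_fraction: float = 0.8,
) -> Tuple[List[Record], List[Record]]:
    """Split every (year, month) block chronologically into train and test parts."""
    blocks: Dict[Tuple[int, int], List[Record]] = defaultdict(list)
    for record in sorted(records):
        day = record[0]
        blocks[(day.year, day.month)].append(record)

    train: List[Record] = []
    test: List[Record] = []
    for key in sorted(blocks):
        block = blocks[key]
        cut = round(len(block) * train_fraction)
        train.extend(block[:cut])   # earlier days of the month
        test.extend(block[cut:])    # remaining days of the month
    return train, test
```

Sampling by week of the year would follow the same pattern with `day.isocalendar()` used for the grouping key.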
Predicted runoff plotted against the actual runoff.
With all optimizations implemented, we achieved an output like the one shown above. We were happy to say that our predicted runoff roughly follows the trend of the actual measurements. This result showed us that even with rainfall as the only input, we could achieve reasonable results in only one day of work.
The final work for the day was predicting the runoff for the coming ten days. Maroof implemented the API call to YR, a Norwegian weather service, to fetch the forecast for the next ten days, while I downloaded records for the last 30 days, matched the format to Maroof's, and ran the prediction models on them. After a long day, we had predicted the runoff for the next ten days.
Runoff prediction ten days ahead. The red and green lines are based on weather forecast accuracy.
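Combining measured and forecast rainfall for such a look-ahead could work roughly as sketched here. This is a hedged illustration and not our pipeline code: `model` is a hypothetical callable that maps one rainfall window to one runoff value, and the window is assumed to slide over the measured history followed by the forecast.

```python
from typing import Callable, List, Sequence


def predict_ahead(
    history: Sequence[float],
    forecast: Sequence[float],
    model: Callable[[List[float]], float],  # hypothetical stand-in for the trained model
    context_days: int = 30,
) -> List[float]:
    """Predict one runoff value per forecast day from a sliding rainfall window."""
    if len(history) < context_days:
        raise ValueError("need at least context_days of measured rainfall")
    # Measured rainfall first, forecast rainfall appended behind it.
    series = list(history[-context_days:]) + list(forecast)
    predictions: List[float] = []
    for step in range(1, len(forecast) + 1):
        # Each window ends on the forecast day that is being predicted.
        window = series[step : step + context_days]
        predictions.append(model(window))
    return predictions
```

With 30 days of history and a ten-day forecast, this produces ten predictions, and the later ones rely more and more on forecast rainfall instead of measurements.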
Day 3: Finalizing and Success
The last day was hectic, of course. It was Sunday, so the buses in Stavanger only ran once in a while. While we had planned to meet up at 08:00, we met at Innoasis at 10:00 and the deadline was 12:00. Two hours to go.
Those last two hours had to be used efficiently. Morten prepared the pitch, Maroof helped with visuals while Antorweep and I were cleaning up the GitHub repository, documenting code, and most importantly writing the readme. The readme had to show at a glance what we had achieved and make it possible for the judges to recreate our results. I don't remember much about the last hour. Morten took the others to the side to get the pitch approved while I was still writing the readme. 11:55. Antorweep takes over the readme and fixes the last bits. 11:59. "Time!"
12:00. We had done it. We managed to create a pipeline containing everything from automated data fetching to prediction models up to creating alerts and GeoJSON objects. The following image shows an overview of the different files in our project and how they contribute to our final prediction of runoff.
Overview of the Python files in our project and their responsibilities.
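The last step of the pipeline, turning predictions into alerts and GeoJSON objects, can be illustrated with a small sketch. The property names and the alert threshold are hypothetical and chosen only for this example; the structure follows the GeoJSON convention of a `Feature` with a `Point` geometry whose coordinates are ordered longitude, latitude.

```python
import json
from typing import Any, Dict, List


def runoff_feature(
    name: str,
    lon: float,
    lat: float,
    predicted_runoff: float,
    alert_threshold: float,  # hypothetical limit above which an alert is raised
) -> Dict[str, Any]:
    """Build one GeoJSON Feature carrying a prediction and its alert flag."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "name": name,
            "predicted_runoff": predicted_runoff,
            "alert": predicted_runoff > alert_threshold,
        },
    }


def feature_collection(features: List[Dict[str, Any]]) -> str:
    """Wrap features in a FeatureCollection and serialize it to a JSON string."""
    return json.dumps({"type": "FeatureCollection", "features": features})
```

A map front end can read such a collection directly and color each point by its `alert` flag.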
After turning in our GitHub repository and the PowerPoint for our pitch, we went out to eat dinner, and as I was eating a great carbonara, I realized how tired I was. I can imagine the others felt the same way. We had some interesting conversations about our backgrounds, work, and the differences between life in Norway and Germany.
The finalists were announced at 15:00. We were thrilled to be among them, but this also meant that we had to present our pitch. When we joined the meeting, Morten did a phenomenal job of making sure everyone understood that what we had created was something special and that this idea has the potential to become something big.
We had to wait another hour or two before the final meeting in which the winning teams were announced. I could feel the tension when they started to read out the names. "The winner of the Bieruń challenge is The Bills". At this point, I was wide awake. "The winner of the Rzeszów challenge is the Green City Initiative". The title of the slide changed to Setesdal + Stavanger.
"The winner of the Setesdal + Stavanger challenge is team Continua." That's us! We won!
Looking Ahead: Implementation Journey
In the coming months, we will turn our proof of concept into a working solution. As a team, we are more than motivated to push this project to great heights. We have until May 2024 when we will present the solution at the Nordic Edge Expo.
We are also here to stay. Everyone on Team Continua believes that our solution has the potential to minimize wasted fertilizer, improve water quality, and enable a new way of ecologically friendly farming. Morten finished our pitch with the words "Let's fix Hålandsvannet" but I believe we can fix much more.
Conclusion
I went into this as a blank page. I had never got around to doing a hackathon before, so I did not know what to expect. This hackathon was everything I thought it could be and much more. I have had a great time working together with this team of bright people, sprinting side by side for one weekend and continuing the work together.
This continues to be one of the best learning experiences I have had in my educational life. As I reflect on the weekend, I am filled with immense gratitude and admiration for my team members. Your dedication, expertise, and collaborative spirit were the driving forces behind our success, and I want to take a moment to express my heartfelt thanks.
Thank you all for the late nights, the brainstorming sessions, the shared laughs, and the unwavering commitment.