How We Won 1st Place at the MIT LLM Hackathon
Three years ago, I was standing in a lecture hall at Harvard, watching a judge react to the audio our AI generated from an old photograph. We won Best Google Cloud that day with ReAlive, and it changed how I thought about what you could build in 36 hours. This September, I was at MIT for the LLM Hackathon for Chemistry and Materials, and we won first place with something far more ambitious.
The project is called Catalyze. It's a multi-agent system that helps chemists go from a research question to a validated experimental protocol.
Why Multi-Agent
The problem with chemistry research workflows is that they span multiple domains. You need to search the literature, design experiments, figure out what instruments to use, and check that nothing will explode. No single LLM prompt handles all of that well. The model either goes too shallow on each step or hallucinates critical details.
Our insight was to split the problem across four specialized agents, each with its own system prompt, tools, and domain focus.
The Architecture
Research Agent. Handles literature search and synthesis. It queries PubChem, arXiv, and a curated chemistry knowledge base to find relevant prior work. Its output is a structured literature summary with key findings, methodologies, and gaps.
Protocol Agent. Takes the research summary and generates a step-by-step experimental protocol. It understands equipment constraints, reagent availability, and common lab procedures. This agent went through the most iteration during the hackathon because generating protocols that are actually executable (not just plausible) is a hard problem.
Automation Agent. Maps protocol steps to specific lab instruments and generates control sequences. This is the most specialized agent, and honestly, the one we had the least time to polish. But even in its rough state, it showed the potential for LLM-driven lab automation.
Safety Agent. Reviews the entire pipeline for hazards: chemical compatibility, reaction exotherms, exposure limits, and PPE requirements. This agent has veto power. If it flags a critical safety issue, the protocol gets sent back to the Protocol Agent for revision.
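To make the division of labor concrete, here is a minimal sketch of how the four roles above could be declared. It is not the code we shipped: the AgentSpec class, the prompt strings, and the tool names are all hypothetical, chosen only to mirror the descriptions in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of one agent: its prompt, tools, and veto right."""
    name: str
    system_prompt: str
    tools: tuple[str, ...] = ()
    can_veto: bool = False


# Illustrative registry. Prompts and tool names are placeholders, not real APIs.
AGENTS = {
    "research": AgentSpec(
        "research",
        "Search and synthesize prior work into a structured summary.",
        ("pubchem_search", "arxiv_search", "knowledge_base_lookup"),
    ),
    "protocol": AgentSpec(
        "protocol",
        "Turn a research summary into an executable step-by-step protocol.",
    ),
    "automation": AgentSpec(
        "automation",
        "Map protocol steps to instruments and control sequences.",
    ),
    "safety": AgentSpec(
        "safety",
        "Review the whole pipeline for hazards.",
        can_veto=True,
    ),
}


def veto_holders(agents: dict[str, AgentSpec]) -> list[str]:
    """Return the names of agents allowed to send a protocol back for revision."""
    return [spec.name for spec in agents.values() if spec.can_veto]
```

Keeping the veto right as data on the agent, rather than burying it in orchestration code, makes it easy to see at a glance that only the Safety Agent can block a protocol.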
The Orchestration Challenge
Getting four agents to work together coherently is harder than building any one of them. We used a simple orchestrator that manages the workflow as a directed graph. Each agent receives structured input from the previous stage, does its work, and passes structured output forward.
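A rough sketch of what "workflow as a directed graph" can look like, using the standard library's graphlib to order the stages. The stage functions here are stubs standing in for real agent calls, and the exact dependency edges are my assumption for illustration, not a record of our hackathon code.

```python
from graphlib import TopologicalSorter
from typing import Callable

# A stage reads the outputs produced so far and returns its own structured output.
Stage = Callable[[dict], dict]

# Assumed edges: each key maps to the set of stages it depends on.
DEPENDS_ON = {
    "research": set(),
    "protocol": {"research"},
    "automation": {"protocol"},
    "safety": {"protocol", "automation"},
}


def run_pipeline(stages: dict[str, Stage], depends_on: dict[str, set], question: str) -> dict:
    """Run every stage once, in an order that respects the dependency graph."""
    outputs: dict = {"question": question}
    for name in TopologicalSorter(depends_on).static_order():
        outputs[name] = stages[name](outputs)
    return outputs


# Stub stages so the sketch runs without any model calls.
STUB_STAGES: dict[str, Stage] = {
    "research": lambda out: {"summary": "notes on " + out["question"]},
    "protocol": lambda out: {"steps": ["step based on " + out["research"]["summary"]]},
    "automation": lambda out: {"commands": len(out["protocol"]["steps"])},
    "safety": lambda out: {"approved": out["automation"]["commands"] > 0},
}

result = run_pipeline(STUB_STAGES, DEPENDS_ON, "a research question")
```

Because the order comes from the graph rather than a hard-coded list, adding a stage means adding one entry to the dependency map and one stage function.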
The key design decision was structured communication, the same principle I later wrote about in multi-agent systems in production. Agents don't pass free text to each other. They pass typed data structures. The Research Agent outputs a ResearchSummary object. The Protocol Agent expects that exact schema as input. This eliminates an entire class of errors where one agent misinterprets another's output.
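Here is a minimal sketch of that idea. The three fields follow the Research Agent's output described earlier (key findings, methodologies, gaps), but the field names, the from_dict helper, and the validation rules are assumptions for illustration rather than our actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchSummary:
    """Assumed shape of the Research Agent's output."""
    key_findings: list[str]
    methodologies: list[str]
    gaps: list[str]

    @classmethod
    def from_dict(cls, raw: dict) -> "ResearchSummary":
        """Parse model output, rejecting anything that does not match the schema."""
        expected = {"key_findings", "methodologies", "gaps"}
        if set(raw) != expected:
            raise ValueError(f"expected keys {sorted(expected)}, got {sorted(raw)}")
        for key in expected:
            value = raw[key]
            if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
                raise ValueError(f"{key} must be a list of strings")
        return cls(**raw)


def protocol_prompt(summary: ResearchSummary) -> str:
    """The Protocol Agent only ever sees a validated ResearchSummary."""
    return "Design a protocol addressing these gaps: " + "; ".join(summary.gaps)
```

The point of validating at the boundary is that a malformed summary fails loudly at the handoff, instead of silently producing a strange protocol two stages later.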
We also built a feedback loop between the Safety Agent and the Protocol Agent. If safety review fails, the Safety Agent's specific concerns get appended to the Protocol Agent's context, and it regenerates. In the demo, we showed this loop catching a reagent incompatibility and automatically revising the protocol. That moment was when the judges leaned forward.
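The loop itself is small. The sketch below shows the control flow under a few assumptions: the generate and review callables are toy stand-ins for the Protocol and Safety Agents, the round cap is something I added for the example, and the bleach and ammonia pair is a textbook incompatibility used for illustration, not the case from our demo.

```python
from typing import Callable

Protocol = list[str]  # toy representation: just a list of reagents


def revise_until_safe(
    generate: Callable[[list[str]], Protocol],
    review: Callable[[Protocol], list[str]],
    max_rounds: int = 3,
) -> tuple[Protocol, int]:
    """Regenerate the protocol until the safety review returns no concerns."""
    feedback: list[str] = []
    for round_number in range(1, max_rounds + 1):
        protocol = generate(feedback)
        concerns = review(protocol)
        if not concerns:
            return protocol, round_number
        # Append the specific concerns to the context for the next attempt.
        feedback.extend(concerns)
    raise RuntimeError(f"still unsafe after {max_rounds} rounds: {feedback}")


# Toy stand-ins for the two agents.
INCOMPATIBLE = [frozenset({"bleach", "ammonia"})]


def toy_generate(feedback: list[str]) -> Protocol:
    if any("ammonia" in concern for concern in feedback):
        return ["bleach", "water"]  # revised after safety feedback
    return ["bleach", "ammonia"]  # unsafe first draft


def toy_review(protocol: Protocol) -> list[str]:
    return [
        "incompatible reagents: " + " + ".join(sorted(pair))
        for pair in INCOMPATIBLE
        if pair <= set(protocol)
    ]


final_protocol, rounds = revise_until_safe(toy_generate, toy_review)
```

Capping the number of rounds matters in practice: without it, a protocol the generator cannot fix would loop forever instead of surfacing the unresolved concerns.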
What Three Years of Hackathons Taught Me
HackHarvard in 2022 taught me how to scope aggressively and build fast. HackUMass taught me that the demo matters as much as the tech. Mentoring at HackGT9 taught me how to explain technical decisions clearly. All of those lessons showed up at MIT.
The biggest one: start with the demo, then build backward. We spent the first two hours sketching exactly what we wanted to show the judges, what inputs we'd use, what the output should look like, what the "wow" moment would be. Then we built toward that specific demo. Every architectural decision was filtered through "does this make the demo better?"
The Result
First place. At MIT. For a project that wasn't just technically interesting but actually addressed a real workflow problem in chemistry research.
I've been doing hackathons since 2022. The technical skills compound, but what really compounds is the judgment. Knowing what to build, how to scope it, when to cut a feature and when to push through. Catalyze worked because we made good decisions fast. The multi-agent architecture was the right call. The structured communication was the right call. Starting from the demo was the right call.
Three years of building under pressure, and the craft finally showed.