“What should we work on next?” sounds like a simple question. It shows up in sprint planning meetings, roadmap discussions, and late-night debugging sessions. But in the era of agentic development, where systems can propose, implement, and even validate changes, the question becomes much harder to answer.
Answering it well no longer depends solely on human intuition or static planning. It requires something deeper: a system you can trust.
That idea sits at the core of why we’re launching the Flexcompute Engineering Blog.
At Flexcompute, we spend a lot of time solving problems that don’t have clean, textbook answers. These are the kinds of challenges that emerge when you’re working at the intersection of high-performance computing, simulation, and AI:
How do you move fast without breaking correctness?
How do you scale systems that are both computationally intensive and highly precise?
How do you build tools that engineers can rely on—even as complexity grows?
This blog is where we’ll share those stories.
Not polished marketing summaries, but real engineering thinking: the trade-offs, failures, breakthroughs, and systems that make it all work.
We’re kicking things off with a deep dive into a deceptively simple question:
“What should we work on next?”
In a traditional workflow, the answer might come from a roadmap or a backlog. But in an agentic system—where code can evolve through automated suggestions and mutations—the answer must be grounded in verification, not just prioritization.
Our first article explores how we make that possible, through three ideas:
1. Harness Design
We build structured environments where changes can be tested rigorously and repeatedly. A good harness doesn’t just check correctness—it defines the boundaries of safe exploration.
2. Diff-Scoped Mutation Testing
Instead of testing everything, everywhere, all at once, we focus on what actually changed. This allows us to move faster while maintaining confidence in the results.
3. A Verification “Floor”
Perhaps most importantly, we establish a baseline level of validation that always holds—whether or not a human is actively monitoring the system. This ensures that progress never comes at the cost of stability.
Together, these ideas form a framework where development can be both autonomous and trustworthy.
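To make the second idea concrete, here is a minimal Python sketch of diff-scoped selection: it reads a unified diff, collects the line numbers that were added or modified, and keeps only the mutants that fall on those lines. It is an illustration under stated assumptions, not the implementation described in the article. The names Mutant, changed_lines, and scope_to_diff, the example file paths, and the example diff are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches a unified-diff hunk header such as "@@ -10,3 +10,4 @@".
# Groups: old line count, new start line, new line count (counts default to 1).
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass(frozen=True)
class Mutant:
    """Hypothetical record for one candidate mutation."""
    path: str
    line: int
    description: str


def changed_lines(unified_diff: str) -> dict[str, set[int]]:
    """Map each file path to the new-side line numbers the diff adds or modifies."""
    changed: dict[str, set[int]] = {}
    path = None
    old_left = new_left = 0  # lines still expected in the current hunk body
    new_line = 0             # current line number on the new side
    for line in unified_diff.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                if path is not None:
                    changed.setdefault(path, set()).add(new_line)
                new_line += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:  # context line, present on both sides
                new_line += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            target = line[4:].strip()
            path = None if target == "/dev/null" else target.removeprefix("b/")
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_line = int(match.group(2))
            new_left = int(match.group(3) or 1)
    return changed


def scope_to_diff(mutants: list[Mutant], changed: dict[str, set[int]]) -> list[Mutant]:
    """Keep only mutants located on lines the diff touched."""
    return [m for m in mutants if m.line in changed.get(m.path, set())]


# Hypothetical diff: two lines replace one inside solver/step.py.
EXAMPLE_DIFF = "\n".join([
    "--- a/solver/step.py",
    "+++ b/solver/step.py",
    "@@ -10,3 +10,4 @@ def advance(state, dt):",
    "     t = state.t",
    "-    return t + dt",
    "+    t_next = t + dt",
    "+    return t_next",
    "     # end",
])

ALL_MUTANTS = [
    Mutant("solver/step.py", 11, "replace + with -"),
    Mutant("solver/step.py", 40, "negate condition"),
    Mutant("solver/mesh.py", 11, "off-by-one in loop bound"),
]

SCOPED = scope_to_diff(ALL_MUTANTS, changed_lines(EXAMPLE_DIFF))
```

In this sketch only the first mutant survives the filter, so the test suite is run against one mutant instead of three. A real harness would also need to decide how to treat deleted lines and renamed files, which this example leaves out.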
As engineering systems become more automated, the bottleneck shifts.
It’s no longer just about generating ideas or writing code quickly; it’s about knowing which changes are safe to trust.
Without that confidence, speed becomes risk.
With it, speed becomes leverage.
That’s the shift we’re exploring, not just in theory but in practice.
This is just the beginning.
In future posts, we’ll dive into more of the hard problems we encounter while building systems for:
High-performance simulation
Scalable cloud computation
AI-driven engineering workflows
If you’re interested in how modern engineering systems are evolving—and what it takes to make them reliable at scale—we’d love for you to follow along.
🔍 “What should we work on next?” — a case study in making agentic development trustworthy
🔗 https://engineering.flexcompute.com/articles/what-should-we-work-on-next