In this article, I don’t mean to prescribe universally applicable advice, to downplay the importance or utility of agile, or even to suggest that more than a tiny fraction of software engineering teams should use waterfall. My point is simply that there’s always a right tool for the job, and that tool is not always agile.
In short, we’re using waterfall at Passenger AI, and it’s working. I’m going to explain our experience and attempt to rationalize it.
For reference, at Passenger AI, we’re building artificial intelligence software for self-driving cars to keep them clean and to protect passengers. Our offering is an edge operating system that uses computer vision, deep learning, and machine learning to track passenger behavior.
From programming’s inception up until the 2000’s, software engineering was existentially difficult because there were no known patterns to execute on and because there was always a looming threat of total project failure. Even building a website was difficult because there were no cloud infrastructure providers or web frameworks.
As we rolled into the 2000’s, IaaS providers like AWS proliferated, web frameworks like Ruby on Rails launched and stabilized, and distributed systems patterns became more widely understood. Now, almost anyone can build a website or an app, the two components that are the bread and butter of most tech businesses.
What happened between these two eras was a shift from technological to sociological risk.
Classes of Risk
A technological risk is one where there’s uncertainty as to whether or not computers can do what’s needed, or it’s questionable that the needed technology can be built in any reasonable amount of time. Suppose that you’re training a deep neural net for an embedded system; then a technological risk is that you may need more computational resources to power that model than are available.
A sociological risk (“politics”) is one where poor communication among people, between departments, or with customers can lead to project failure. Imagine that you’re a product manager whose customers have no need for your software now but will in six months; then a sociological risk is that your product is not what those customers will need in the future.
Technological risk dominated most software projects in the 90’s and prior, whereas sociological risk dominates most projects from the 2000’s onward.
That’s not to say that projects in the past didn’t involve sociological risk. Even books from the 70’s, like The Mythical Man-Month, recognized this peril:
Therefore the most important function that software builders do for their clients is the iterative extraction and refinement of the product requirements. For the truth is, the clients do not know what they want.
Frederick P. Brooks Jr., The Mythical Man-Month: Essays on Software Engineering
Let’s further explore how today’s project management confronts this danger.
Good project management mitigates risk, so modern project management focuses on resolving sociological issues like poor communication, lack of customer feedback, unreliable velocity estimates, and changing requirements. Those problems are exactly what methodologies like scrum, kanban, and XP are designed to solve. (I’ll loosely group these methodologies into “agile” from here on.)
Indeed, these methodologies work exceedingly well for teams building apps or websites using known patterns and frameworks for customers who don’t know what they want. Businesses of this type just so happen to dominate today’s tech industry.
The book Peopleware speaks candidly to this view:
We, along with nearly everyone else involved in the high-tech endeavors, were convinced that technology was all, that whatever your problems were, there had to be a better technology solution to them. But if what you were up against was inherently sociological, better technology seemed unlikely to be much help.
Tom DeMarco & Timothy Lister, Peopleware: Productive Projects and Teams
While sociological risk dominates most projects today, there are still plenty in which the primary hazard is in the technology.
Technological risk is a different beast because it often involves complex dependencies, front-loaded experimentation, totally unknown costs, and focused execution on known requirements.
Think of any deep learning startup, or any company building an operating system, or any team commercializing a research project. In none of these cases is it easy to estimate tasks, nor is there much to show customers between wireframe mocks and the final product.
These problems sound suspiciously similar to those that most pre-2000’s projects faced.
This set of technology-focused projects could operate under agile, but this methodology is not designed to meet their distinct challenges.
Note also that agile took off primarily because it tightened the feedback loop between engineering teams and customers. This loop enables engineering teams to function independently and to respond directly to customers rather than working through managers and product teams.
It goes without saying that engineers work best when given requirements and the freedom to execute on them. But without direct contact with customers, engineers can’t make informed trade-offs on where to budget their time and effort. Managers still need to provide some requirements and direction, but it’s unclear what form those should take.
The question then is: what’s the best way to manage a technology-focused project if not with agile?
If there are no metrics like revenue available to engineers, and customers aren’t providing direct feedback, then the development team needs some other signal to work with. The waterfall methodology answers this challenge by planning the important tasks and milestones ahead and then working backwards to establish the timeline required for success. A popular way of visualizing these tasks and milestones is using a Gantt chart (above).
This process involves talking with the engineers who will be implementing the project to make sure that they understand the scope and the goals, and to get reasonable estimates. This procedure also requires an architect, or small team of architects, to create and maintain a consistent project architecture.
The advantage of this approach is that engineers then know how much time and effort they should allocate for each task. The per-task time budget signals that task’s relative importance. The point isn’t to impose arbitrary deadlines, to be inflexible, or to shame engineers for under- or overshooting, but to know at a glance whether the project is on time or a task is blocked, and to shift resources or descope accordingly.
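To make the “working backwards” step concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the task names, the day estimates, the deadline, and the helper names latest_schedule and is_on_time are all made up, and it counts calendar days rather than working days. It shows the general idea only, not any particular team’s tooling.

```python
from datetime import date, timedelta

# Hypothetical tasks: name -> (estimated days, tasks that must finish first).
TASKS = {
    "collect_data": (10, []),
    "train_model": (15, ["collect_data"]),
    "optimize_for_device": (7, ["train_model"]),
    "integration_test": (5, ["optimize_for_device"]),
}


def latest_schedule(tasks, deadline):
    """Work backwards from the deadline.

    Returns name -> (latest_start, latest_finish) in calendar days.
    """
    # dependents[t] lists the tasks that cannot start until t is done.
    dependents = {name: [] for name in tasks}
    for name, (_, deps) in tasks.items():
        for dep in deps:
            dependents[dep].append(name)

    schedule = {}

    def visit(name):
        if name in schedule:
            return schedule[name]
        days, _ = tasks[name]
        # A task must finish before the earliest latest-start among its
        # dependents; a task with no dependents runs up to the deadline.
        finish = min((visit(d)[0] for d in dependents[name]), default=deadline)
        start = finish - timedelta(days=days)
        schedule[name] = (start, finish)
        return schedule[name]

    for name in tasks:
        visit(name)
    return schedule


def is_on_time(schedule, name, today, started):
    """A task is on time if it has started or its latest start has not passed."""
    latest_start, _ = schedule[name]
    return started or today <= latest_start


if __name__ == "__main__":
    plan = latest_schedule(TASKS, deadline=date(2023, 6, 30))
    for name, (start, finish) in sorted(plan.items(), key=lambda kv: kv[1][0]):
        print(f"{name:22s} {start} -> {finish}")
```

Each row of the output corresponds to one bar on a Gantt chart, and the is_on_time check is the “at a glance” question: has a task slipped past the last date on which it could start without moving the milestone?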
It’s fashionable to resent waterfall project management, and with good reason. Still, it should be no secret from this article’s title that I see things differently.
This pattern is exactly what we’re using at Passenger AI, and it’s working incredibly well. We even use a Gantt chart as described above. We tried agile in various manifestations but found that it had too much ceremony and it didn’t answer at a glance the important questions we asked of it.
I have my concerns, and the system isn’t perfect, but our team is getting as much done as anyone could possibly ask of them, and everyone is happy with our choice.