We interviewed Adam Tornhill, a software architect who combines degrees in Engineering and Psychology to get a different perspective on software. We discuss his book ‘Your Code as a Crime Scene‘, in which he describes using Forensic Psychology techniques to identify high-risk code, defects and bad design. He describes how his techniques are useful by themselves, but can also be used to make other practices like code reviews and unit testing even more effective.
Adam’s Code Maat software is on GitHub.
Adam Tornhill is a Programmer and Software Architect who combines degrees in engineering and psychology to get a different perspective on software. He’s the author of “Lisp for the Web”, “Patterns in C”, as well as “Your Code as a Crime Scene”, in which he describes using forensic techniques to identify defects and bad design in code.
Adam, thank you so much for taking the time to join us today. I really look forward to hearing what you have to say. Why don’t you say a bit about yourself?
Oh yeah, sure. It’s a pleasure to be here. I’m Adam Tornhill and I’m from Sweden, which is why I have this wonderful accent. I’ve been a programmer for a long time. I’ve been doing this for almost two decades now, and I still love what I do.
I wanted to touch on one of your books, “Your Code as a Crime Scene” where you apply forensic psychology techniques to software development. What made you think to apply techniques from what would seem to be such an unrelated field?
Complexity metrics, the old kind, they just didn’t cut it
It actually came a bit surprising to me as well. What happened was some years ago I was in the middle of my psychology studies, working towards my master’s degree and then got hired into architectural roles where I had to prioritize technical debt and identify problematic areas. I found that it was terribly hard to do that because complexity metrics, the old kind, they just didn’t cut it.
When I tried to talk to the different team members and developers, I found out that nobody seemed to have a holistic picture. It was virtually impossible to get that picture out of a large code base.
At the same time, I took a course in criminal psychology. I was just struck by the similarities in the mindset to what we needed to have in the software business. That’s how it all started. I started to think, how can we apply this stuff to software?
I wanted to jump into a couple of the techniques you cover. How can forensics psychology help us to detect problematic code?
The first kind of technique that I would like to introduce, and that’s actually where I started, is a technique I call a Hotspot Analysis. It’s based on a forensic concept called Geographical Offender Profiling.
I find it really fascinating. What Forensic Psychologists do is basically they try to spot patterns in the spatial movement of criminals, to detect our home bases, so we know the area to inspect.
I thought what if we could do the same for software, that would be pretty cool. If we could take a large code base and somehow spot the spatial movement not of criminals, but of programmers, then we could identify the kinds of, the parts of the code that really matter.
That’s where it started, and what I do is, I start to think about where can I find that information, I realized, it was there right in front of us in a version control system. A version control system, basically records every interaction we have with the code.
So I tried to mine source code repositories and then I looked for code with high change frequencies, then I overlaid that with the complexity analysis, which allows us to identify the most complicated code, that is also the code we have to work with often, which is a hotspot. So that’s where it all started. That’s the starting point for the rest of the analysis basically.
You also say that you can use your techniques to help improve the architecture of applications. How is that so?
First we have to consider what an architecture actually is. A fundamental idea in my book is that software is a living entity. It constantly changes, evolves and sometimes degrades.
So I came to architecture more as a set of guiding principles to help that evolution happen in a successful way. Just to give you a complete example, consider microservices, they seem to be all the rage right now.
So that means that tomorrow’s legacy applications will be microservices. In a microservice architecture, a typical one where each microservice is cohesive and independent, so that’s basically one architectural principle.
What you do is you try to measure and identify violations of that principle. Again I look at the evolution of the codebase in the source code repository. And I try to identify, in that case, multiple services that tended to change at the same time because that’s a violation of your architectural principles. That’s one way to do it.
If you want to truly improve software, we need to look beyond technique and look at the social side
You also cover how “most studies on groups of people working together find that they perform below their potential”. Why is this and what are some of the problems teams can experience?
Social Psychologists have basically known for decades that teamwork comes with a cost. The theory that I come to in my book is called Process Loss. The idea behind process loss, it’s pretty much as how a mechanical machine cannot operate at 100% efficiency all the time, so neither can software teams.
That loss often results in the communication and coordination overhead or perhaps motivation loss. The interesting thing is when you talk to software teams about this, they are aware of the problems. Quite often we mis-attribute it to a technical issue, when in reality we have a social problem.
In one team I worked, the developers, they noticed that they had a lot of bugs in certain parts of the codebase. They thought it was a technical problem. What happened in reality was they were having excess parallel development in those parts of the codebase.
There were multiple programmers working in the same parts of the code all the time. So merely all they did, they said that, they complained a lot about potential merge conflicts. They were really scared to do merges of different feature branches.
When we started to look into it, we identified that what actually happened was that, due to the way they were working, they didn’t really have a merge problem. What they had was basically a problem that the architecture just couldn’t support their way of working.
I think that’s really important to understand that if you want to truly improve software, we need to look beyond technique and look at the social side.
What can we do to avoid such problems?
First thing I recommend is always to use something I call Social Hacks. Basically it takes the social situation and tries to tweak some aspect of it, to make social biases less likely because social biases, they are a huge reason why we get process loss.
One of the simple techniques that I recommend is, in every team, assign someone the role of the devil’s advocate. The role of the devil’s advocate is just to take the opposite stance in every discussion to question every decision made. That helps you reduce a bunch of biases because you’re guaranteed that someone will speak up. It also has the nice benefit of making teams more risk averse. There are a bunch of social stuff we can do and I also present a number of analyses that they can apply to investigate and mine social metrics from their codebases.
So what sort of results have you seen from those applying your techniques?
I’ve seen that most people seem to use it for their original purpose, which was to identify the code that matters the most for maintenance. I’ve also seen some interesting uses mostly from managers. Managers seem to love the social aspects of the analysis because it makes it possible for them to suddenly reason about something that they couldn’t measure before.
Another interesting area, is a couple of years ago, I worked with a really good test team. These testers they used to do a bit of exploratory testing at the end of each iteration. So what I did was basically I generated a heat map over the complete source code repository, so we can see where most of the development activity was, and we would start to communicate with the testers. They knew where they should focus a little bit of extra energy. That helped us to identify a lot of bugs much earlier.
We’re not so much writing code. What we do most of the time is actually making a modification to existing code.
How do you see current development practices like unit testing and code reviews, working with those that you’ve described?
Techniques that I present, they don’t replace any current practices, save maybe wild guesses and panic near the deadline. But otherwise, they are there to complement what you already do today.
For example, take code reviews. I often use a hot spot analysis to identify code most in need of review. To prioritize reviews as well. Unit testing is interesting, because I actually have a complete chapter in my book about building a safety net around tests. Automated tests are terribly hard to get right in practice. It’s actually something you can do, by again, measuring the evolution of the codebase, looking at application code and testing them both together.
Ultimately you say developers should optimize for understanding in their code, why is this so important?
It’s important because we programmers, we’re not so much writing code. What we do most of the time is actually making a modification to existing code.
If you look at the research behind those numbers you will see that the majority of that time is spent trying to understand what the code does in the first place. If you optimize for understanding, we optimize for the most important phase of coding.
What are some tools that people use to mine source code data, to use your techniques?
When I started out there weren’t any tools really, there were a bunch of academic tools. They were powerful, but the licences were quite limited, you weren’t allowed to use them on commercial projects, they also focused on one single aspect.
I actually had to write my own set of tools. I actually open-sourced all my tools, because I want the readers of my book to have the possibility to try the techniques and stuff.
If you’re interested in this kind of stuff, you can just go to my GitHub account, Adamtornhill, and download the tools and play around with them. I think we’re going to see a lot of stuff happening in this area now.
Beyond your book, are there any resources you can recommend for those interested in learning more about writing maintainable code and improving their code design?
My favorite book when it comes to software design has to be ‘The Structure and Interpretation of Computer Programs’. It’s just brilliant and it had a tremendous influence on how I approached software design.
When it comes to coding itself, I would say Kent Beck’s ‘Smalltalk: Best Practice Patterns’. It’s a brilliant book and it takes us way beyond Smalltalk. It’s actually something I recommend for everyone.
And the final book I would like to mention is ‘Working Effectively with Legacy Code’ by Michael Feathers. It’s one of the few books that takes this evolutionary perspective on software, so it’s a great book by a great author.
Adam, thank you so much for joining us today.
Thanks so much for having me here, it was a true pleasure.