Does software engineering matter for your research prototype? Johannes and I published a paper in the latest SOAP Workshop at PLDI and tend to answer “yes of course!”. We inspected a data flow analysis using the IFDS framework. Even though we only discuss and inspect this particular case in the paper, the argument itself is, however, more general.
When developing static code analysis, we – as in we the researchers or practitioners – forget everything that we learned in our software engineering classes and write code that is very much sequential and not modularized very well. Given that the domain can be pretty challenging and may require intensive testing it is very surprising to me that we tend to write code that is hard to test in isolation. So what we do most of the time is integration testing with huge chunks of test code that we analyze and then check the results against our expectations for the chunks. But this also means that if your analysis target is large so is your expectation set and tracing and fixing bugs in your implementation can be a pretty tough task.
We faced this problem while implementing FlowTwist using Heros which is an implementation of the IFDS framework/solver by Reps et al. The framework forces you to separate your analysis steps into so called flow functions that represent treatments of data flow facts at instructions. It thus forces you to design along the lines of the framework which is a good first step.
However, a simple flow function implementation has to cover a lot of concerns such as the treatment of assignments, conditionals, field accesses, etc. Therefore, our flow function implementations tended to become incredibly long pieces of code (over 300 LOC) that were hard to read, incredibly hard to explain to other people, hard to test, hard to maintain and impossible to reuse. So plainly a big big mess with a method header.
Given that we actually teach students not to write code in this manner, we were – let’s say – unsatisfied with the situation and refactored. We quickly observed that we were handling multiple concerns in the same flow function. What took a while is the insight that we actually scattered things over the four flow function types.
So we came up with a new design of Propagators that factored out these concerns in a single neat and tidy class. We quickly became aware that we needed a way of expressing a sequence like first filter then do expensive calculations in order to produce something that will actually work and introduced phases as a means to organize propagators. The paper goes into the details here.
In the end, the story is that software engineering matters even in your research prototypes and – YES – you should care about it. Believe it or not sometimes your research actually gets picked up and continued or extended by other people. The likelihood of this happening will dramatically increase if your prototype is easily understandable, testable, maintainable and extendable. Even though I haven’t got any proof for this last very bold sentence, I believe it to be true – and might be supported by the gods of software engineering.