Research Topics

The overall goal of my research is to provide practical, automatic techniques and tools that improve the effectiveness and efficiency of software development and maintenance. In particular, my research to date has drawn on program-analysis-based software engineering, testing, machine learning, and information visualization to aid testing and debugging.

My research addresses the problems of software debugging and maintenance. Software developers commonly have difficulty understanding, diagnosing, and fixing bugs. Whereas many software-engineering researchers attempt to build techniques that identify and locate bugs fully automatically, my research takes a different tack: it addresses the large class of bugs caused by logical inconsistencies, that is, incongruences between the developers’ expectations of how the program should behave and the way it actually behaves. Such logical inconsistencies are common, typically require developer attention and comprehension, and usually are not amenable to fully algorithmic location and repair.

As such, my goal is to assist software developers in maintenance and debugging tasks by facilitating their comprehension of the software and its behavior. In my research, I place a strong emphasis on practicality and efficiency, preferring potential real-world impact over expensive technical wizardry or inflexible, prescriptive workflows. I work to enable efficient and effective software engineering by helping developers and researchers understand software behavior, with the ultimate goal of equipping them to produce higher-quality software more economically and with less frustration.

Fundamentally, the challenges of software maintenance and debugging are challenges of human comprehension, e.g., understanding:

  • where the bugs reside in the codebase,
  • why the code behaves incorrectly,
  • which developers are best equipped to understand and fix problems, and
  • when the changes that introduced bugs were made (and why).

To answer such questions, we study the following research topics.

Fault Comprehension

One of the most difficult debugging tasks for a developer is understanding the nature of a fault. Researchers have proposed techniques that help locate faults, but ways to describe the nature of a fault have been largely neglected. We are developing software models, visualizations, and techniques to aid in diagnosing faults in software.


Software History Mining

Software is dynamic not only while it executes but also in the evolution of its code. That evolution is often captured in its entirety by revision-control systems (such as CVS, Subversion, and Git). By utilizing this rich artifact, as well as other historical artifacts (e.g., bug-tracking systems and mailing lists), we can build techniques that recommend future actions to developers. We have developed techniques that use these artifacts to let developers view the lineage of selected code and to suggest developer assignments for present and future development tasks.
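As a rough illustration of how revision history can drive developer-assignment suggestions, the sketch below ranks authors by how often their past commits touched the files a task involves. This is a deliberately simple, hypothetical heuristic: the function `recommend_developers`, its parameters, and the commit data are illustrative assumptions, not the technique we developed.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def recommend_developers(
    commits: Iterable[tuple[str, list[str]]], task_files: list[str], top_n: int = 3
) -> list[str]:
    """Hypothetical heuristic: rank authors by how many of their past file
    changes overlap the files involved in the task."""
    wanted = set(task_files)
    counts: Counter[str] = Counter()
    for author, files in commits:
        overlap = len(wanted & set(files))
        if overlap:
            counts[author] += overlap
    return [author for author, _ in counts.most_common(top_n)]


# Hypothetical commit history: (author, files changed in that commit).
history = [
    ("alice", ["parser.c", "lexer.c"]),
    ("bob", ["ui.c"]),
    ("alice", ["parser.c"]),
    ("carol", ["lexer.c", "ui.c"]),
]
print(recommend_developers(history, ["parser.c", "lexer.c"]))  # ['alice', 'carol']
```

Here alice ranks first because three of her file changes overlap the task, versus one for carol; bob never touched those files. Real commit records could be extracted from Git, e.g., with `git log --name-only --pretty=format:%an`, though parsing that output is omitted here.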


Software Fault Analysis

To produce effective fault-localization, debugging, failure-clustering, and test-suite-maintenance techniques, researchers would benefit from a deeper understanding of how faults (i.e., bugs) behave and interact with each other. A fault that is executed may or may not propagate to the output, and even when it does, it may or may not influence the output in a way that causes a failure. Furthermore, in the presence of multiple faults, faults may interact to obscure one another or to produce behavior not seen when each occurs in isolation. We have investigated the nature of faults and their behavior.
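To make fault masking concrete, the hypothetical toy program below (not drawn from our studies) contains two seeded faults, each toggled by a flag. Its specification is f(x) = 2x + 1; the names `program` and `count_failures` are illustrative assumptions.

```python
# Hypothetical toy program: the intended behavior is f(x) = 2*x + 1.
def program(x: int, fault_a: bool = False, fault_b: bool = False) -> int:
    y = 2 * x + (2 if fault_a else 1)  # fault A: wrong constant (+2 instead of +1)
    if fault_b:
        y -= 1  # fault B: spurious decrement
    return y


def count_failures(inputs, **faults) -> int:
    """Number of inputs on which the faulty variant disagrees with the specification."""
    return sum(program(x, **faults) != 2 * x + 1 for x in inputs)


if __name__ == "__main__":
    inputs = range(-3, 4)
    for faults in ({}, {"fault_a": True}, {"fault_b": True},
                   {"fault_a": True, "fault_b": True}):
        print(faults or "no faults", "->", count_failures(inputs, **faults), "failing inputs")
```

Each fault on its own makes all seven inputs fail, yet with both enabled the errors cancel and no input fails: a test suite run against the doubly faulty version would reveal neither fault.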


Collaboration for Software Development and Maintenance

One of the many challenges of software development and maintenance is the need for collaboration among many constituents and stakeholders. For example, clients interact with software-development organizations; those organizations consist of many developers and maintainers, both at the same location and across different locations; and the development organization often outsources some of its testing to independent test agencies. These parties may reside in different locations, often across widely separated time zones. And, due to intellectual-property constraints, they often cannot share all code and information. We have investigated this aspect of software engineering and have developed a vision for a system that would enable these parties to interact in a way that overcomes some of these constraints.


Failure Clustering

We have developed techniques for clustering failures. Failure-clustering techniques attempt to categorize failing test cases according to the bugs that caused them. Test cases are clustered using their execution profiles (gathered from instrumented versions of the code) as an encoding of the behavior of those executions. Such techniques can suggest which bug reports are likely duplicates of one another.

Today, bug reports submitted by users (or developers) are identified as duplicates of existing bug reports based on the textual descriptions of the symptoms they report. Alternatively, duplicates are recognized only after the fact: a developer finds and fixes the bug behind one report, and only later, while investigating other reports, discovers that they are no longer valid because their bug was already fixed. Such inaccurate or delayed duplicate identification can cause information overload (e.g., thousands of open bug reports) and bug investigations that use less information than would have been available had the duplicates been correctly identified. The automated techniques would provide heuristic suggestions to help developers find similar bug reports.
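A minimal sketch of the clustering step, assuming each failing test's execution profile is the set of statement IDs it covered, that profiles are compared with Jaccard distance, and that clusters are formed by single-linkage grouping under a distance threshold. These choices, and the names `cluster_failures` and `jaccard_distance`, are illustrative assumptions rather than the specific techniques we developed.

```python
from __future__ import annotations

from itertools import combinations


def jaccard_distance(a: frozenset[int], b: frozenset[int]) -> float:
    """1 - (size of intersection / size of union); two empty profiles count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_failures(profiles: dict[str, frozenset[int]], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: failing tests whose profiles are within
    `threshold` of each other (directly or through a chain) share a cluster."""
    parent = {test: test for test in profiles}

    def find(test: str) -> str:
        # Union-find root lookup with path halving.
        while parent[test] != test:
            parent[test] = parent[parent[test]]
            test = parent[test]
        return test

    for a, b in combinations(profiles, 2):
        if jaccard_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for test in profiles:
        clusters.setdefault(find(test), set()).add(test)
    return list(clusters.values())


# Hypothetical statement-coverage profiles of four failing tests.
failing = {
    "t1": frozenset({1, 2, 3, 4}),
    "t2": frozenset({1, 2, 3, 5}),
    "t3": frozenset({7, 8, 9}),
    "t4": frozenset({7, 8, 9, 10}),
}
print(cluster_failures(failing, threshold=0.5))
```

With a threshold of 0.5, t1 and t2 (distance 0.4) form one cluster and t3 and t4 (distance 0.25) form another, suggesting two underlying bugs. Lowering the threshold to 0.3 would separate t1 from t2; the threshold trades off merging distinct bugs against splitting one bug's failures.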


Test-Suite Maintenance

Test suites often need to adapt to the software they are intended to test. As the core software changes and grows, its test suite also needs to change and grow. However, test suites can grow so large as to become unmaintainable. We have developed techniques to assist in maintaining these test suites, specifically test-suite reduction (while preserving coverage adequacy) and test-suite prioritization.
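The sketch below illustrates both tasks with the classic greedy "additional coverage" heuristic, assuming each test's coverage is given as a set of requirement IDs (e.g., statements). The names `prioritize` and `reduce_suite` are hypothetical, and greedy selection does not guarantee a minimum-size reduced suite; it is offered as an illustration, not as our published technique.

```python
from __future__ import annotations


def prioritize(coverage: dict[str, set[int]]) -> list[str]:
    """Order tests greedily by additional coverage: each step picks the test
    that covers the most requirements not yet covered (ties broken by name)."""
    remaining = dict(coverage)
    covered: set[int] = set()
    order: list[str] = []
    while remaining:
        # max() returns the first maximal item, so sorting first breaks ties by name.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        if not remaining[best] - covered:
            # No remaining test adds coverage; append the rest in name order.
            order.extend(sorted(remaining))
            break
        order.append(best)
        covered |= remaining.pop(best)
    return order


def reduce_suite(coverage: dict[str, set[int]]) -> list[str]:
    """Keep the shortest prefix of the greedy order that still covers every
    requirement covered by the full suite (coverage adequacy is preserved)."""
    required = set().union(*coverage.values())
    reduced: list[str] = []
    covered: set[int] = set()
    for test in prioritize(coverage):
        if covered >= required:
            break
        reduced.append(test)
        covered |= coverage[test]
    return reduced


# Hypothetical coverage: test name -> covered requirement IDs.
suite = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {1, 2, 3, 4, 5}, "t4": {6}}
print(prioritize(suite))    # ['t3', 't4', 't1', 't2']
print(reduce_suite(suite))  # ['t3', 't4']
```

For this hypothetical suite, t3 alone covers five of the six requirements, so it is run first; t4 adds the sixth, after which t1 and t2 contribute nothing new and are dropped from the reduced suite.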


Software Visualization

One method we have employed to help developers understand the complex inner workings of software is information visualization. Software is often so complex that even the developers who initially created it cannot understand all of the runtime behaviors it can exhibit, in particular all of the bugs it may contain. To present large code bases whose components (e.g., instructions, variables, values, and timings) have innumerable characteristics and relationships, we have developed a number of novel software visualizations.


Fault Localization

We developed a fault-localization technique, and an accompanying tool, called Tarantula, which uses correlation-based heuristics. Tarantula uses the pass/fail statuses of test cases and the events that occurred during the execution of each test case to recommend to the developer likely locations of the faults causing test-case failures. The intuition of the approach is to find correlations between execution events and test-case outcomes: the events that correlate most highly with failure are suggested as places to begin investigation. These correlations do not necessarily indicate causes of the failures, but they offer hints that reduce the search space for the fault. Execution event types that have been evaluated include statement execution, branch execution, data flows, dynamic invariants, and performance profiles.
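As a sketch of the correlation idea for the statement-execution event type, the function below computes the suspiciousness score commonly associated with Tarantula: the fraction of failing tests that execute a statement, normalized by the sum of that fraction and the fraction of passing tests that execute it. The function name, input format, and example data are assumptions for illustration; the same computation could apply to other event types by using event identifiers in place of statement numbers.

```python
from __future__ import annotations


def tarantula(coverage: dict[str, set[int]], passed: dict[str, bool]) -> dict[int, float]:
    """Suspiciousness of each executed statement s:

        susp(s) = (failed(s)/total_failed) / (passed(s)/total_passed + failed(s)/total_failed)

    where passed(s) and failed(s) count the passing and failing tests that execute s.
    """
    total_passed = sum(1 for ok in passed.values() if ok)
    total_failed = len(passed) - total_passed
    scores: dict[int, float] = {}
    for s in sorted(set().union(*coverage.values())):
        n_pass = sum(1 for t, cov in coverage.items() if s in cov and passed[t])
        n_fail = sum(1 for t, cov in coverage.items() if s in cov and not passed[t])
        pass_ratio = n_pass / total_passed if total_passed else 0.0
        fail_ratio = n_fail / total_failed if total_failed else 0.0
        denom = pass_ratio + fail_ratio
        scores[s] = fail_ratio / denom if denom else 0.0
    return scores


# Hypothetical statement coverage and outcomes for four tests.
coverage = {"t1": {1, 2}, "t2": {1, 3}, "t3": {1, 2, 4}, "t4": {1, 4}}
passed = {"t1": True, "t2": True, "t3": False, "t4": False}
scores = tarantula(coverage, passed)
ranking = sorted(scores, key=lambda s: (-scores[s], s))  # most suspicious first
print(ranking)  # [4, 1, 2, 3]
```

Statement 4 is executed only by failing tests and ranks first; statement 3 is executed only by a passing test and ranks last. Statements 1 and 2, executed equally often (proportionally) by passing and failing tests, sit in the middle with a score of 0.5.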