Posted by dcoetzee on March 25th, 2011
Citation: A. Barth, Benjamin I. P. Rubinstein, M. Sundararajan, J. C. Mitchell, Dawn Song, and Peter L. Bartlett. A learning-based approach to reactive security. In Proceedings of Financial Cryptography and Data Security (FC10), 2010. (PDF)
Abstract: Despite the conventional wisdom that proactive security is superior to reactive security, we show that reactive security can be competitive with proactive security as long as the reactive defender learns from past attacks instead of myopically overreacting to the last attack. Our game-theoretic model follows common practice in the security literature by making worst-case assumptions about the attacker: we grant the attacker complete knowledge of the defender’s strategy and do not require the attacker to act rationally. In this model, we bound the competitive ratio between a reactive defense algorithm (which is inspired by online learning theory) and the best fixed proactive defense. Additionally, we show that, unlike proactive defenses, this reactive strategy is robust to a lack of information about the attacker’s incentives and knowledge.
Discussion: This 2010 paper, a collaboration between security and machine learning researchers, makes the bold claim that rather invest your resources in making your system as secure as possible up-front, in many scenarios it’s just as good — or even preferable — to fix security problems based on what systems attackers target, a paradigm they call reactive security. It justifies this with a simple game-theoretic model in which a defender with finite resources, typically a high-level security manager, must allocate them among particular lower-level security tasks.
The primary model used by the paper, of interest on its own, is a two-player game taking place on a graph: the attacker begins at a start vertex and moves through the graph by executing successful attacks. Each vertex has a payoff value that the attackers receives once that vertex is reached, and each edge has a cost, indicating how much the attacker must pay to cross it (representing effort invested in an attack). The initial cost of an edge is based on the surface area of the system being attacked, and the defender can invest in defending an edge, temporarily increasing its cost for one round, but has only finite resources to go around during each round. Attackers and defenders alternate in rounds: the defender picks a set of edges to invest in, then the attacker gets to execute a series of attacks, always beginning at the start vertex. Finally, edge costs are hidden to the defender until the attacker uses the edge; this models how it’s difficult to determine a priori where security weaknesses in a system are.
Although the examples in the paper have vertices representing system components like machines on a network, I think vertices in the graph are best thought of not as particular systems being exploited, but rather a set of resources controlled by the attacker, or more generally, the current attacking capabilities of the attacker. At the start vertex, they control nothing but their own system; as each edge is traversed, they add new attacking resources to their collection. This allows systems like the one shown to the right, where an attacker may or may not have full control over a front-end system before attacking a back-end system, to be modelled.
The main result of the paper is this: under the assumption that the defender’s investment in an edge is linearly reflected in the attacker’s cost for that edge, a specific machine learning algorithm for reactive security (based on placing more weight on recently attacked nodes and exponentially decaying the weight over time) performs comparably to a pure proactive approach, in which the defender knows all edge costs a priori and picks a fixed, optimal strategy. This is done essentially by reducing the problem to a standard online learning problem and using known results. Moreover, they argue that in cases where the edge costs are estimated incorrectly or the attacker acts unexpectedly due to incomplete knowledge, the reactive security algorithm is superior to the proactive approach. Besides providing support for reactive security, this model also provides formal support for other pragmatic measures like defense in depth, which is investing in defense measures that are never needed unless some other defensive measure is first overcome.
Although the model is simple, it is quite general: not only can vertices represent machines on a LAN, but also components of an application, or a hybrid thereof. Traversing an edge can correspond to an attack from outside, a privilege elevation on a single machine, or to an attack between machines on a LAN. Two different attacks on the same system can be expressed by distinct edges with distinct costs, if the attacker possesses a different set of resources in each case, which makes sense. Even social engineering can be modelled: once the attacker has invested in tricking or bribing the employee, they can add the employee to their set of resources, lowering the cost of attacking new systems.
On the other hand, the model also has a number of weaknesses. A couple were made explicit in the paper:
- The model assumes that the mapping from defender investment in an edge to attacker cost to overcome it is linear. This is justified heuristically with the claim that the rate of discovering new security defects in software is roughly constant. Experience shows however that exploits that are easy to exploit can be hard to fix (e.g. DoS attacks), and exploits that are hard to exploit can be easy to fix (e.g. a buffer overflow in code that doesn’t directly handle user input). Moreover, the same “investment” in a system can be spent many different ways, and the model offers little insight into which way is most effective. One solution is to successively use the model itself to decompose vertices into subgraphs.
- Reactive security is unsuitable for dealing with situations in which an attack is so devastating to the company that it cannot recover. For example, a company that suffers a high-profile blow to its reputation in the press may see a drop in sales that ultimately drives it into bankruptcy, or may have major investors pull out. Even an optimal response in such a scenario is too little, too late.
Some other weaknesses are less obvious:
- The assumption that edge costs become known as soon as an attacker exploits that edge. In reality, the detection of a single exploit of a system gives very little information about the overall security of a system, particularly after that exploit is repaired. The frequency of exploits of an edge might be a better indicator (the learning may already effectively take this into account).
- The assumption that edges have a fixed cost which only increases temporarily if the defender invests in them. In reality, the cost of an attack depends on many dynamic factors, including the skill set and resources available to the attacker marketplace, and the behavior of the system under attack. As new attackers enter the attacker marketplace, as attackers learn new techniques, as transaction costs among attackers change (or as they form teams), or as new hacking tools become available, the cost of attack can go up or down. Patches, upgrades, installation of new applications, or even changes in load can expose new vulnerabilities or fix old ones (in particular, patches implemented in reaction to particular attacks do not go away after the round is over). All these factors can change quickly and are largely invisible to the defender.
- Similarly, payoff is assumed to be fixed from round to round. Payoff changes by the minute based not only on what new information the attacked resources are storing, but also the marketplace value of the information, which can be rapidly shifting and unpredictable. Reactive investment in defense of a system that, tomorrow, is of little interest to attackers is a bad idea.
- Edge costs are assumed to be independent, in the sense that investing in one edge does not affect the cost of any other edge. In reality, as is easy to see in the diagram above, improvements to one edge may quite directly affect the cost of another related edge as well. Another common example would be rolling out an operating system upgrade in the data center, which could increase the cost of all edges simultaneously.
- Attackers are assumed, after each round, to lose control over all their attacked resources. Long-lived attacks such as rootkits can go undetected during attempts to clean up after an attack, allowing the attacker to start in the next round with more resources in the bag at the beginning. In the worst case, the rootkit itself delivers a positive payoff to the attacker, and the attacker doesn’t need to take any additional action at all!
Although the system provides formal evidence that reactive security can be beneficial, it also provides formal evidence that extreme reactive security, pouring all your resources into the system that was just attacked last week, is a terrible strategy. Simple attack strategies which alternate between systems can exploit this kind of reactionary tactic.
Despite the many weaknesses and limitations enumerated above, as I would expect to find in any nascent research area, I find this work exciting and think it opens up a range of possibilities for software security management and challenges the intuition that reacting to attackers is impulsive or short-sighted. Perhaps future work in this area may provide richer models that will offer more new and surprising strategies to defenders.