Exploitation Demystified: Why, What and How, Part 1

Exploits have been the major enabler of advanced persistent threats for over a decade now, but it is only in recent years that the term has become widely used and that exploits have been acknowledged as a main attack vector.

However, this acknowledgement too often takes for granted that people understand what exploits actually are. Many do not, mainly because of the highly technical nature of exploits. One must be familiar with how applications are processed and executed in order to understand the ways in which this flow can be maliciously manipulated.

Because exploits are an active and growing risk, knowledge of their various patterns and implementations is critical for sound security decision making – even non-technical people can’t be left out of the loop.

This is the first in a series of posts addressed to security-oriented but non-technical readers. It is meant to walk you through the building blocks of exploitation, including basic motivations, various techniques, the evolution of protection mechanisms and counterattack measures. Hopefully this series will clarify many of the terms that you often hear and enable you to answer questions such as:

What is the difference between a vulnerability and an exploit?
What does it mean when a software vendor rates a vulnerability as critical?
Why can’t exploits be detected by signature-based solutions?
What is the difference between stack overflow and use-after-free vulnerabilities?
What is the difference between memory corruption and Java vulnerabilities?

Let’s begin. We will be focusing first on memory corruption exploits due to their critical role in the threat landscape.

The Remote Attacker's Problem

Threat actors can have many motivations, but whatever the motivation is, in order to fulfill it an attacker must execute code on a victim's machine. This code could do anything from silently exfiltrating data to wiping out hard disks or sabotaging a SCADA system. In every case, the attack's objective is fulfilled only once the attacker's code is executed.

The obvious problem is that under normal circumstances the attacker does not have access to the target machine. So, for example, cutting-edge data exfiltration malware is useless until the attacker gains the ability to run that malware remotely on the target's machine.

Exploitation is a solution to the remote attacker's problem. The exploit gives the attacker user privileges on the compromised machine. It is the means by which an attacker obtains a minimal but firm foothold in a victim’s machine, and that foothold enables the main components of an attack – backdoor, command-and-control (C2) communication, key loggers, etc. – to be downloaded and executed.

To achieve that, attackers target the applications through which users normally interact with external content. The most commonly targeted applications are browsers, document readers, media players and Microsoft Office applications. Attackers will attempt to embed their code in the file the targeted application will open. Let us see how this is achieved.

The Essential Building Blocks – Vulnerabilities and Shellcode

Application Input

Applications accept input, typically a file, and process it. In theory this input could be either well-formed and processed or malformed and rejected.

In practice, however, there is also a third possibility in which the file is indeed malformed, but is still accepted by the application. The application will start processing the file, but somewhere along the line the execution flow will deviate from its predesignated path and the application will crash.
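The three possibilities can be illustrated with a small sketch. It assumes a made-up file format (one byte declaring the payload length, followed by the payload) and two hypothetical parsers, `parse_strict` and `parse_trusting`; neither corresponds to any real application. In Python the "crash" surfaces as a harmless exception, whereas in a native application it would be a process crash.

```python
def parse_strict(data: bytes) -> bytes:
    """Reject any input whose declared length does not match its payload."""
    if len(data) < 1:
        raise ValueError("malformed input rejected: empty")
    declared = data[0]
    payload = data[1:]
    if declared != len(payload):
        raise ValueError("malformed input rejected: length mismatch")
    return payload


def parse_trusting(data: bytes) -> bytes:
    """Accept the input and trust the declared length without checking it."""
    declared = data[0]
    out = bytearray()
    for i in range(declared):
        # Reads past the end of the data when the declared length is too large.
        out.append(data[1 + i])
    return bytes(out)


well_formed = bytes([3]) + b"abc"
malformed = bytes([9]) + b"abc"  # claims 9 bytes but carries only 3

# Possibility 1: well-formed input is processed.
print(parse_strict(well_formed), parse_trusting(well_formed))

# Possibility 2: malformed input is rejected up front.
try:
    parse_strict(malformed)
except ValueError as err:
    print("strict parser:", err)

# Possibility 3: malformed input is accepted, and processing fails midway.
try:
    parse_trusting(malformed)
except IndexError as err:
    print("trusting parser crashed:", err)
```

The trusting parser is the interesting case: it never decides that the file is bad, it simply starts working on it and fails somewhere in the middle.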

Memory Address Space

To understand why this happens we should recall what happens when a program is executed. We will refer to an executed program as a process and use this term from now on. The operating system allocates the process a memory address space which contains the process code and data. The process memory space is the playground in which the process is executed, and it will stay with us throughout the entire series. (We will later dive deeper into how this space is structured.)

In the meantime, our interest in the process address space lies in the fact that an executed process can be viewed as a mere sequence of memory addresses. The Central Processing Unit (CPU) fetches the contents of these addresses one at a time according to a predesignated sequence. It executes the instruction at the fetched address, proceeds to the next one, and so on until the final instruction, which exits the process.

This is what happens when the file is well-formed. However, in the scenario we described above, at a certain point in the sequence the execution flow deviates. The CPU is fed an address which does not contain any valid instructions, and this stops the execution. This is, roughly, what happens when a process crashes.
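As a rough illustration of this fetch-and-execute sequence, here is a toy model in which memory is a dictionary mapping addresses to named instructions. The function `run`, the instruction names and the addresses are all invented for this sketch; a real CPU works on machine code, not on strings.

```python
def run(memory: dict[int, str], start: int) -> list[str]:
    """Toy model: execute one instruction per address until exit or crash."""
    trace = []
    address = start
    while True:
        if address not in memory:
            # Nothing valid to execute here: the process crashes.
            trace.append(f"crash: no instruction at address {address}")
            return trace
        instruction = memory[address]
        trace.append(f"{address}: {instruction}")
        if instruction == "exit":
            return trace
        if instruction.startswith("jump "):
            # Continue from the address named in the instruction.
            address = int(instruction.split()[1])
        else:
            # Proceed to the consecutive address.
            address += 1


program = {0: "load file", 1: "parse header", 2: "render page", 3: "exit"}
print(run(program, start=0))

# The same program after a malformed file has made the flow deviate
# to an address that holds no instructions.
deviated = dict(program)
deviated[1] = "jump 40"
print(run(deviated, start=0))
```

The first run walks through every address and ends at the exit instruction. The second run ends as soon as the flow lands on address 40, where nothing is stored.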

Until this point we have not described anything malicious. The fact that applications accept some malformed input files rather than reject them altogether is an inherent side effect of writing code. The application writer simply cannot predict every possible way in which a file can be altered.

The Attacker's Perspective

From the attacker's perspective this is an attractive scenario. If a certain malformed input causes the execution flow to deviate to an empty address, an input could also be crafted to overwrite this empty address with the attacker's instructions. In that case the process will no longer crash; instead, the execution flow will be redirected.

At this point we understand how an attacker can gain control of a process's execution flow. We have not yet understood what there is to be gained. The memory space the attacker is in is still the memory space the process received from the operating system, and in this space the only valid code is the application code.

But what if this memory space also contained dormant malicious code? In this case the attacker could leverage this control and redirect the execution flow to execute that code. The CPU will keep on executing since it will be fed with valid instructions. If this small piece of code could, upon execution, open a connection between the attacker's machine and the victim's machine, the attacker would be able to use the machine remotely with legitimate user privileges.

So, in order to take control of a victim's machine, an attacker must craft an application input (usually a data file) in the following manner:

Embed a piece of code in the file which will open a connection between the attacker and the target machine when executed.
Craft the file to be malformed so the execution flow will deviate from its path (as we have explained above).
Calculate the address the execution flow is expected to deviate to and make sure that address is filled with the location of the attacker's code.
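The three steps above can be sketched with a deliberately simplified model. Everything here is an assumption made for illustration: memory is a short Python list, the input buffer occupies cells 0 to 3, cell 4 holds the address at which execution should continue, and the "embedded code" is just a harmless placeholder string. Real memory layouts are far more complex, and a later part of this series covers how the space is actually structured.

```python
BUFFER_START = 0
BUFFER_SIZE = 4
NEXT_ADDRESS_SLOT = BUFFER_START + BUFFER_SIZE  # the cell right after the buffer
APP_CONTINUE = 10


def new_process() -> list:
    """Build a tiny address space holding only legitimate application code."""
    memory = [None] * 16
    memory[NEXT_ADDRESS_SLOT] = APP_CONTINUE  # where execution should continue
    memory[APP_CONTINUE] = "application code"
    return memory


def load_input(memory: list, file_cells: list) -> None:
    """Copy the file into the buffer. The bug: the file size is never checked."""
    for offset, cell in enumerate(file_cells):
        memory[BUFFER_START + offset] = cell


def resume(memory: list) -> str:
    """Continue execution at whatever address the slot now holds."""
    target = memory[NEXT_ADDRESS_SLOT]
    valid = isinstance(target, int) and 0 <= target < len(memory)
    if not valid or memory[target] is None:
        return "crash"
    return f"executing: {memory[target]}"


def open_file(file_cells: list) -> str:
    memory = new_process()
    load_input(memory, file_cells)
    return resume(memory)


# Well-formed file: fits the buffer, so the flow continues as designed.
print(open_file(["a", "b", "c", "d"]))  # executing: application code

# Malformed file: one cell too long, so the slot is overwritten with an
# address that holds nothing.
print(open_file(["a", "b", "c", "d", 7]))  # crash

# Crafted file: step 1 embeds a placeholder, step 2 makes the file too long,
# step 3 puts the placeholder's own address (0) into the overwritten slot.
print(open_file(["embedded code placeholder", "pad", "pad", "pad", 0]))
# executing: embedded code placeholder
```

The malformed and the crafted file trigger exactly the same bug. The only difference is the value that ends up in the overwritten cell: a meaningless address leads to a crash, a calculated one leads to the embedded content.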

After completing these preparatory stages the attacker needs to make sure that the target user will indeed open the crafted file. This would typically be done through social engineering, by sending an email attachment (the crafted files in that case will be Office documents, files for document readers, etc.) or by compromising a commonly visited web page.

When the user browses the web page or opens the attachment, the chain we have described will be triggered: the application will start executing. At a certain point – without the user ever noticing – the execution flow will deviate, the deviated address will be overwritten, the execution flow will be redirected toward the attacker’s embedded code, the code will be executed and a connection will be established between the attacker's machine and the target user's machine. From this point onwards the endpoint – the user’s machine – is compromised and the attacker is free to download any malware to the targeted endpoint according to their specific needs and objectives.

Common Terminology

We can now define the following terms:

Vulnerability – the bug in the application which causes the execution flow to deviate from its designated path when the application tries to process a malformed data file. Different vulnerabilities correspond to different malformation variants. A vulnerability enables an attacker to hijack a process's execution flow. By itself, the vulnerability is merely the potential for compromise.

Shellcode – the small piece of code the attacker embeds in the malformed input file. The shellcode establishes a connection between the attacker and the targeted machine. This connection is later used to perform reconnaissance and download additional malware.

Exploit – the code which redirects the execution flow from the overwritten address to the dormant shellcode. Successful exploitation is accomplished when the shellcode is executed and a connection between the attacker and the target machine is established.

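The actual exploitation flow can thus be broken down into three parts: the vulnerability is triggered, the exploit redirects the execution flow, and the shellcode is executed.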

In upcoming posts we will present different kinds of vulnerabilities and the main implementations of these three parts.