The gcc C compiler generates its output in the form of assembly code, a textual representation of the machine code giving the individual instructions in the program. gcc then invokes both an assembler and a linker to generate the executable machine code from the assembly code.
Our presentation is based on two related machine languages: Intel IA32 (“Intel Architecture 32-bit”), the dominant language of most computers today, and x86-64, its extension to run on 64-bit machines.
- IA32: the 32-bit machine language of Intel processors
- x86-64: the extension of IA32 to 64 bits, originally developed by Advanced Micro Devices (AMD); AMD processors are capable of running the exact same machine-level programs as Intel processors
Whereas a 32-bit machine can only make use of around 4 gigabytes (2^32 bytes) of random-access memory, current 64-bit machines can use up to 256 terabytes (2^48 bytes). Most of the microprocessors in recent server and desktop machines, as well as in many laptops, support either 32-bit or 64-bit operation. However, most of the operating systems running on these machines support only 32-bit applications, and so the capabilities of the hardware are not fully utilized.
Main topics of this chapter:
- representation and manipulation of data and the implementation of control in IA32.
- how if, while, and switch statements are implemented.
- the implementation of procedures, including how the program maintains a run-time stack to support the passing of data and control between procedures, as well as storage for local variables.
- how data structures such as arrays, structures, and unions are implemented at the machine level.
- the problems of out of bounds memory references and the vulnerability of systems to buffer overflow attacks.
- some tips on using the gdb debugger for examining the run-time behavior of a machine-level program.
Supplementary material:
A Web Aside gives a brief presentation of ways to incorporate assembly code into C programs.
The “SSE” instructions were originally developed to support multimedia applications, but in their more recent versions (version 2 and later), and with recent versions of gcc, SSE has become the preferred method for mapping floating point onto both IA32 and x86-64 machines.
3.1 A Historical Perspective
Linux uses what is referred to as flat addressing, where the entire memory space is viewed by the programmer as a large array of bytes.
A number of formats and instructions have been added to x86 for manipulating vectors of small integers and floating-point numbers. These features were added to allow improved performance on multimedia applications, such as image processing, audio and video encoding and decoding, and three-dimensional computer graphics.
3.2 Program Encodings
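To compile code on an IA32 machine from a Unix command line:
unix> gcc -O1 -o p p1.c p2.c
(The option -O1 instructs the compiler to apply level-one optimizations.)
To generate the assembly-code file code.s from code.c:
unix> gcc -O1 -S code.c
To compile and assemble code.c into the object-code file code.o, without linking:
unix> gcc -O1 -c code.c
To disassemble the machine code in code.o:
unix> objdump -d code.o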
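To try these commands, a small source file is needed. The file below is a hypothetical stand-in for code.c: it defines a procedure sum that updates a global variable accum (the global discussed later in this section), but its exact contents are an assumption rather than the original file. Running gcc -O1 -S on it produces code.s, and gcc -O1 -c followed by objdump -d code.o shows the corresponding disassembly.

```c
/* code.c: hypothetical example file for the gcc/objdump commands above. */
int accum = 0;              /* global variable; its final address is chosen by the linker */

int sum(int x, int y)
{
    int t = x + y;
    accum += t;             /* side effect on the global */
    return t;
}
```

Because accum is a global, the unlinked object code cannot yet contain its final address; comparing objdump output before and after linking makes this visible.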
3.2.1 Machine-Level Code
Two abstractions of a computer system are especially important for machine-level programming.
- First, the format and behavior of a machine-level program is defined by the instruction set architecture, or “ISA,” defining the processor state, the format of the instructions, and the effect each of these instructions will have on the state.
- Second, the memory addresses used by a machine-level program are virtual addresses, providing a memory model that appears to be a very large byte array.
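Parts of the processor state that are visible in IA32 machine code, though normally hidden from the C programmer, include:
- The program counter (commonly referred to as the “PC,” and called %eip in IA32), which indicates the address in memory of the next instruction to be executed.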
- The integer register file contains eight named locations storing 32-bit values.
- The condition code registers hold status information about the most recently executed arithmetic or logical instruction.
- A set of floating-point registers store floating-point data.
3.2.2 Code Examples
Several features about machine code and its disassembled representation are worth noting:
- IA32 instructions are variable-length rather than fixed-length: they can range in length from 1 to 15 bytes. The instruction encoding is designed so that commonly used instructions and those with fewer operands require a smaller number of bytes than do less common ones or ones with more operands.
- The instruction format is designed in such a way that from a given starting position, there is a unique decoding of the bytes into machine instructions. For example, only the instruction pushl %ebp can start with byte value 55 (hexadecimal).
- The disassembler determines the assembly code based purely on the byte sequences in the machine-code file. It does not require access to the source or assembly-code versions of the program.
The linked executable is much larger than the unlinked object file: the file prog has grown to 9,123 bytes, since it contains not just the code for our two procedures but also information used to start and terminate the program as well as to interact with the operating system.
Differences between the disassembled code of the linked executable and that of the unlinked object file:
- One important difference is that the addresses listed along the left are different—the linker has shifted the location of this code to a different range of addresses.
- A second difference is that the linker has determined the location for storing global variable accum.
3.3 Data Formats
As the table indicates, most assembly-code instructions generated by gcc have a single-character suffix denoting the size of the operand. For example, the data movement instruction has three variants: movb (move byte), movw (move word), and movl (move double word).
Note that the assembly code uses the suffix ‘l’ to denote both a 4-byte integer and an 8-byte double-precision floating-point number. This causes no ambiguity, since floating point involves an entirely different set of instructions and registers.
3.4 Accessing Information
An IA32 CPU contains eight registers storing 32-bit values: %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, and %ebp. For the most part, the first six registers can be considered general-purpose registers with no restrictions placed on their use. The final two registers (%esp and %ebp) contain pointers to important places in the program stack.
The low-order 2 bytes of the first four registers (for %eax, these are %al and %ah) can be independently read or written by the byte operation instructions. This provides backward compatibility, allowing older code to continue to work correctly.
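The byte and word registers are views onto parts of a 32-bit register. As a sketch (modeling a register as a uint32_t is an assumption for illustration, and the helper names are hypothetical), %al is bits 0 to 7 of %eax, %ah is bits 8 to 15, and %ax is bits 0 to 15; a byte write to %al leaves the other 24 bits unchanged:

```c
#include <stdint.h>

/* Reading the sub-registers of %eax (modeled as a 32-bit value). */
static uint8_t  get_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* bits 0-7  */
static uint8_t  get_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* bits 8-15 */
static uint16_t get_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* bits 0-15 */

/* A byte write such as movb $v,%al replaces only the low-order byte. */
static uint32_t set_al(uint32_t eax, uint8_t v) { return (eax & ~0xFFu) | v; }
```

For example, with %eax = 0x12345678, %al is 0x78, %ah is 0x56, and %ax is 0x5678.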
3.4.1 Operand Specifiers
Most instructions have one or more operands, specifying the source values to reference in performing an operation and the destination location into which to place the result.
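Operands fall into three types:
- The first type, immediate, is for constant values. In assembly code, an immediate is written as ‘$’ followed by an integer in standard C notation, for example, $-577 or $0x1F. Any value that fits into a 32-bit word can be used.
- The second type, register, denotes the contents of one of the registers. We use the notation E_a to denote an arbitrary register a and indicate its value with the reference R[E_a], viewing the set of registers as an array R indexed by register identifiers.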
- The third type of operand is a memory reference, in which we access some memory location according to a computed address, often called the effective address. We use the notation M[Addr] to denote a reference to the value stored in memory starting at address Addr. There are many different addressing modes allowing different forms of memory references, including absolute, indirect (the familiar pointer dereference), base + displacement, indexed, and scaled indexed.
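The most general form, Imm(E_b,E_i,s), refers to the value at effective address Imm + R[E_b] + R[E_i]·s, with an immediate offset Imm, a base register E_b, an index register E_i, and a scale factor s that must be 1, 2, 4, or 8. This general form is often seen when referencing elements of arrays.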
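Solution approach for the practice problem: use the Form column of Figure 3.3 to identify which form each operand takes, then apply the corresponding formula in the Operand value column to compute its value.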
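To make the operand forms concrete, the sketch below evaluates operands of the general form Imm(E_b,E_i,s), whose effective address is Imm + R[E_b] + R[E_i]·s, against a tiny hypothetical machine state: %eax = 0x100, %ecx = 0x1, %edx = 0x3, and four 4-byte memory words starting at 0x100. The register and memory values and the helper names are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical machine state (assumed values, for illustration only). */
static const uint32_t eax = 0x100, ecx = 0x1, edx = 0x3;
static const uint32_t mem_base = 0x100;
/* 4-byte words stored at addresses 0x100, 0x104, 0x108, 0x10C. */
static const uint32_t mem_words[4] = { 0xFF, 0xAB, 0x13, 0x11 };

/* Effective address of Imm(Eb,Ei,s) = Imm + R[Eb] + R[Ei]*s.
   A missing base or index register is passed as 0; s must be 1, 2, 4, or 8. */
static uint32_t effective_address(int32_t imm, uint32_t rb, uint32_t ri, uint32_t s)
{
    return (uint32_t)imm + rb + ri * s;
}

/* M[Addr]: the 4-byte word at Addr. Only the four modeled words are valid. */
static uint32_t M(uint32_t addr)
{
    return mem_words[(addr - mem_base) / 4];
}
```

With these values, 260(%ecx,%edx) has effective address 260 + 1 + 3 = 0x108, so M(effective_address(260, ecx, edx, 1)) is 0x13, and (%eax,%edx,4) reads M(0x10C) = 0x11.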
3.4.2 Data Movement Instructions
The instructions in the mov class copy their source values to their destinations. In the form mov S,D, S denotes a value and D denotes a location: the source operand designates a value that is immediate, stored in a register, or stored in memory, and the destination operand designates a location that is either a register or a memory address.
IA32 imposes the restriction that a move instruction cannot have both operands refer to memory locations. Copying a value from one memory location to another requires two instructions—the first to load the source value into a register, and the second to write this register value to the destination.
Both the movs and the movz instruction classes serve to copy a smaller amount of source data to a larger data location, filling in the upper bits by either sign extension (movs) or zero extension (movz). With sign extension, the upper bits of the destination are filled in with copies of the most significant bit of the source value.
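A sketch of the two extension rules for a byte moved into a 32-bit register, written in C with hypothetical helper names that mirror movzbl and movsbl:

```c
#include <stdint.h>

/* movzbl: zero extension, the upper 24 bits become 0. */
static uint32_t movzbl(uint8_t src)
{
    return src;
}

/* movsbl: sign extension, the upper 24 bits become copies of bit 7.
   (src ^ 0x80) - 0x80 maps 0..127 to itself and 128..255 to -128..-1
   without relying on an implementation-defined narrowing conversion. */
static int32_t movsbl(uint8_t src)
{
    return (int32_t)(src ^ 0x80) - 0x80;
}
```

For the byte 0xF0, zero extension gives 0x000000F0, while sign extension gives 0xFFFFFFF0, which is -16 as a two's-complement value.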
The final two data movement operations are used to push data onto and pop data from the program stack. With IA32, the program stack is stored in some region of memory. The stack pointer %esp holds the address of the top stack element.
For example, the behavior of the instruction
pushl %ebp
is equivalent to the following pair of instructions:
subl $4,%esp         # Decrement stack pointer
movl %ebp,(%esp)     # Store %ebp on stack
Similarly, the behavior of the instruction
popl %eax
is equivalent to the following pair of instructions:
movl (%esp),%eax     # Read top of stack into %eax
addl $4,%esp         # Increment stack pointer
Since the stack is contained in the same memory as the program code and other forms of program data, programs can access arbitrary positions within the stack using the standard memory addressing methods. For example, assuming the topmost element of the stack is a double word, the instruction movl 4(%esp),%edx will copy the second double word from the stack to register %edx.
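The push/pop equivalences can be modeled in C. The sketch below is a toy model under stated assumptions: a hypothetical 64-byte stack region, an esp variable holding a byte offset into it, values stored in the host's byte order (little-endian on x86), and no overflow or underflow checks.

```c
#include <stdint.h>
#include <string.h>

/* Toy model: the stack grows toward lower addresses, as on IA32. */
enum { STACK_BYTES = 64 };
static uint8_t stack_mem[STACK_BYTES];
static uint32_t esp = STACK_BYTES;            /* empty stack */

static void pushl(uint32_t value)
{
    esp -= 4;                                 /* subl $4,%esp      */
    memcpy(&stack_mem[esp], &value, 4);       /* movl value,(%esp) */
}

static uint32_t popl(void)
{
    uint32_t value;
    memcpy(&value, &stack_mem[esp], 4);       /* movl (%esp),value */
    esp += 4;                                 /* addl $4,%esp      */
    return value;
}

/* Read the double word at offset(%esp) without popping, like movl 4(%esp),%edx. */
static uint32_t read_stack(uint32_t offset)
{
    uint32_t value;
    memcpy(&value, &stack_mem[esp + offset], 4);
    return value;
}
```

After pushing 0x11111111 and then 0x22222222, esp has dropped by 8, read_stack(4) returns the second double word 0x11111111, and two pops return the values in reverse order.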
3.4.3 Data Movement Example
3.5 Arithmetic and Logical Operations
3.5.1 Load Effective Address
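The load effective address instruction leal is a variant of the movl instruction. It has the form of an instruction that reads from memory to a register, but it does not reference memory at all: instead of reading from the designated location, it copies the effective address of S to the destination.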
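Because leal only computes an address and never touches memory, compilers also use it for ordinary arithmetic. A sketch with hypothetical function names: for an operand Imm(E_b,E_i,s), leal stores Imm + R[E_b] + R[E_i]·s itself, so leal 7(%edx,%edx,4),%eax computes 7 + x + 4x = 5x + 7 when %edx holds x.

```c
#include <stdint.h>

/* What leal Imm(Eb,Ei,s),D stores in D: the address Imm + R[Eb] + R[Ei]*s.
   No memory is read. Arithmetic wraps modulo 2^32, as on the hardware. */
static uint32_t leal(int32_t imm, uint32_t rb, uint32_t ri, uint32_t s)
{
    return (uint32_t)imm + rb + ri * s;
}

/* leal 7(%edx,%edx,4),%eax with %edx = x gives 5x + 7. */
static uint32_t times5_plus7(uint32_t x)
{
    return leal(7, x, x, 4);
}
```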
3.5.2 Unary and Binary Operations
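Binary operations: the second operand is used as both a source and the destination, so, for example, subl %eax,%edx decrements register %edx by the value in %eax. The first operand can be either an immediate value, a register, or a memory location. The second can be either a register or a memory location. As with the movl instruction, however, the two operands cannot both be memory locations.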
3.5.3 Shift Operations
The shift amount k is encoded as a single byte, since only shift amounts between 0 and 31 are possible (only the low-order 5 bits of the shift amount are considered). The shift amount is given either as an immediate or in the single-byte register element %cl. (These instructions are unusual in only allowing this specific register as operand.)
The destination operand of a shift operation can be either a register or a memory location.
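A sketch of the three 32-bit shifts in C, with hypothetical helper names. The count is masked to its low-order 5 bits first, which mirrors the hardware and also avoids undefined behavior, since in C shifting a 32-bit value by 32 or more is undefined. The arithmetic right shift is written out explicitly because C leaves right shifts of negative signed values implementation-defined.

```c
#include <stdint.h>

/* Only the low-order 5 bits of the shift count are used. */
static uint32_t sall(uint32_t x, unsigned count) { return x << (count & 31u); }
static uint32_t shrl(uint32_t x, unsigned count) { return x >> (count & 31u); } /* logical: fill with 0 */

/* Arithmetic right shift: fill the vacated high bits with copies of the sign bit. */
static uint32_t sarl(uint32_t x, unsigned count)
{
    unsigned k = count & 31u;
    uint32_t r = x >> k;
    if (x & 0x80000000u)
        r |= ~(UINT32_C(0xFFFFFFFF) >> k);   /* k high-order ones (none when k == 0) */
    return r;
}
```

For example, a shift count of 33 behaves like a count of 1, and shifting 0x80000000 right by 31 gives 1 with shrl but 0xFFFFFFFF with sarl.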
3.5.4 Discussion
3.5.5 Special Arithmetic Operations
IA32 also provides two different “one-operand” multiply instructions to compute the full 64-bit product of two 32-bit values: one for unsigned (mull) and one for two’s-complement (imull) multiplication. For both of these, one argument must be in register %eax, and the other is given as the instruction source operand. The product is then stored in registers %edx (high-order 32 bits) and %eax (low-order 32 bits).
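A sketch of where the two halves of the product end up, using C's 64-bit types to do the widening; the edx_eax struct and the function names are hypothetical. Note that the low-order 32 bits agree for mull and imull, while the high-order 32 bits differ when an operand is negative.

```c
#include <stdint.h>

typedef struct { uint32_t edx; uint32_t eax; } edx_eax;   /* %edx:%eax pair */

/* mull S: unsigned 64-bit product of %eax and S. */
static edx_eax mull(uint32_t eax, uint32_t src)
{
    uint64_t p = (uint64_t)eax * src;
    edx_eax r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}

/* imull S (one-operand form): signed 64-bit product of %eax and S. */
static edx_eax imull(int32_t eax, int32_t src)
{
    int64_t p = (int64_t)eax * src;
    uint64_t u = (uint64_t)p;             /* two's-complement bit pattern (well-defined conversion) */
    edx_eax r = { (uint32_t)(u >> 32), (uint32_t)u };
    return r;
}
```

For example, 0xFFFFFFFF × 0xFFFFFFFF as unsigned values is 0xFFFFFFFE00000001, so mull leaves 0xFFFFFFFE in %edx and 0x00000001 in %eax.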
3.6 Control
Machine code provides two basic low-level mechanisms for implementing conditional behavior: it tests data values and then either alters the control flow or the data flow based on the result of these tests.
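- conditional control transfers: alter the control flow, jumping to different parts of the program depending on the test (Sections 3.6.3 to 3.6.5)
- conditional data transfers: alter the data flow, selecting between data values depending on the test (Section 3.6.6)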
3.6.1 Condition Codes
The CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.
- CF: Carry Flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.
- ZF: Zero Flag. The most recent operation yielded zero.
- SF: Sign Flag. The most recent operation yielded a negative value.
- OF: Overflow Flag. The most recent operation caused a two’s-complement overflow—either negative or positive.
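Positive and negative overflow:
First, adding a positive number and a negative number can never overflow: the magnitude of the result is no greater than the larger of the two operands' magnitudes, so if both operands are representable, so is the result.
Second, positive overflow occurs when two large positive numbers are added and the sign bit of the result becomes 1, as in 0110 + 0011 = 1001 (assuming 4-bit arithmetic). Negative overflow occurs when two negative numbers of large magnitude are added and the sign bit of the result becomes 0. For example, 1011 (-5) + 1011 (-5) = 10110, which truncates to 0110 and overflows, whereas 1111 (-1) + 1111 (-1) = 11110, which truncates to 1110 (-2) and does not overflow.
Therefore:
- Positive overflow is signaled by a carry out of the top bit into the sign bit. (The sign bit itself cannot produce a carry here, since both operands' sign bits are 0.)
- Negative overflow is signaled when exactly one of the sign bit and the top bit produces a carry. If both produce a carry, there is no overflow.
Here the “top bit” means the most significant bit excluding the sign bit, that is, the bit immediately below the sign bit. The negative-overflow examples above illustrate the rule. In both cases, two's-complement overflow occurs exactly when the carry into the sign bit differs from the carry out of the sign bit.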
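How does the processor set the condition codes for a given operation, and by what criteria?
For example, suppose an ADD instruction is used to perform the equivalent of the C assignment t = a + b, where a, b, and t are integers. The condition codes are then set according to the following C expressions:
- CF: (unsigned) t < (unsigned) a  (unsigned overflow)
- ZF: (t == 0)  (zero)
- SF: (t < 0)  (negative)
- OF: (a < 0 == b < 0) && (t < 0 != a < 0)  (signed overflow)
Unsigned overflow shows up as a carry out of the most significant bit, which is what CF records.
OF indicates signed overflow, which requires two conditions: the two operands have the same sign, and the sign of the result differs from theirs.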
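The flag rules can be checked mechanically. The sketch below (the flags struct and add_flags are hypothetical names) computes CF, ZF, SF, and OF for a 32-bit addition according to the rules above. It forms the sum in unsigned arithmetic, because signed overflow is undefined behavior in C, whereas the hardware simply wraps.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } flags;

/* Condition codes after t = a + b on 32-bit operands (as set by addl). */
static flags add_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t t = ua + ub;                              /* wraps modulo 2^32 */
    bool sa = ua >> 31, sb = ub >> 31, st = t >> 31;   /* sign bits */
    flags f;
    f.cf = t < ua;                    /* unsigned overflow: carry out of bit 31 */
    f.zf = t == 0;
    f.sf = st;                        /* result is negative as a signed value */
    f.of = (sa == sb) && (st != sa);  /* same-sign operands, different-sign result */
    return f;
}
```

For instance, adding INT32_MAX and 1 sets SF and OF but not CF (a positive overflow), while adding -1 and 1 sets CF and ZF but not OF, since a positive and a negative operand can never overflow.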
CMP S1,S2 computes S2 - S1, and TEST S1,S2 computes S1 & S2; both set the condition codes according to the result without storing it in a register.
With TEST, typically the same operand is repeated (e.g., testl %eax,%eax to see whether %eax is negative, zero, or positive), or one of the operands is a mask indicating which bits should be tested.
3.6.2 Accessing the Condition Codes
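Rather than reading the condition codes directly, programs usually use the SET instructions, which set a single byte to 0 or 1 depending on some combination of the condition codes, as shown in the figure of SET instructions.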
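An example:
A typical instruction sequence to compute the C expression a<b, where a and b are both of type int, proceeds as follows (a in %edx, b in %eax):
cmpl %eax,%edx       # Compare a:b
setl %al             # Set low-order byte of %eax to 0 or 1
movzbl %al,%eax      # Set remaining bytes of %eax to 0
The cmpl instruction compares by computing %edx (a) - %eax (b), and setl stores the outcome in %al. movzbl then sets the upper 3 bytes of %eax to 0.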
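The SET instructions derive each comparison from the flags. For signed less-than, setl computes SF ^ OF after the comparison: if a - b did not overflow, the sign of the difference answers the question, and if it did overflow, that sign is inverted. The sketch below (the function name is hypothetical) models cmpl %eax,%edx followed by setl %al, with a in %edx and b in %eax, and can be checked against C's own < operator.

```c
#include <stdint.h>

/* Models cmpl b,a (computes a - b) followed by setl: returns SF ^ OF as 0 or 1. */
static int setl_after_cmpl(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t t = ua - ub;                          /* wraps modulo 2^32 */
    unsigned sa = ua >> 31, sb = ub >> 31, st = t >> 31;
    unsigned sf = st;
    unsigned of = (sa != sb) && (st != sa);        /* signed overflow of a - b */
    return (int)(sf ^ of);                         /* movzbl would widen this byte */
}
```

For example, with a = INT32_MIN and b = 1, the difference wraps to INT32_MAX, so SF = 0 but OF = 1, and setl still correctly reports a < b.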
3.6.3 Jump Instructions and Their Encodings
How are the targets of jump instructions encoded?
Understanding how the targets of jump instructions are encoded will become important when we study linking in Chapter 7.
There are several different encodings for jumps, but some of the most commonly used ones are PC relative.
- PC relative: they encode the difference between the address of the target instruction and the address of the instruction immediately following the jump. These offsets can be encoded using 1, 2, or 4 bytes.
- Absolute: a second encoding method is to give an “absolute” address, using 4 bytes to directly specify the target.
The assembler and linker select the appropriate encodings of the jump destinations.
By using a PC-relative encoding of the jump targets, the instructions can be compactly encoded (requiring just 2 bytes), and the object code can be shifted to different positions in memory without alteration (position-independent code).
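A sketch of the 1-byte PC-relative case, assuming a 2-byte jump (one opcode byte plus a signed 8-bit offset); the addresses and function names are hypothetical. The encoder shows why such code is position-independent: moving both the jump and its target by the same amount leaves the encoded byte unchanged.

```c
#include <stdint.h>

/* Target of a 2-byte jump at jump_addr whose offset byte is rel8.
   The offset is relative to the address of the NEXT instruction. */
static uint32_t rel8_target(uint32_t jump_addr, uint8_t rel8)
{
    uint32_t next = jump_addr + 2;                  /* address after the jump */
    int32_t offset = (int32_t)(rel8 ^ 0x80) - 0x80; /* sign-extend the byte */
    return next + (uint32_t)offset;                 /* wraps correctly for negative offsets */
}

/* The inverse: the offset byte an assembler would emit (target assumed in range). */
static uint8_t rel8_encode(uint32_t jump_addr, uint32_t target)
{
    return (uint8_t)(target - (jump_addr + 2));
}
```

For example, a jump at 0x8 with offset byte 0x0d targets 0xa + 0xd = 0x17, and a jump at 0x16 with offset byte 0xf2 (that is, -14) targets 0x18 - 14 = 0xa.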
3.6.4 Translating Conditional Branches
3.6.5 Loops
C provides several looping constructs—namely, do-while, while, and for. Most compilers generate loop code based on the do-while form of a loop, even though this form is relatively uncommon in actual programs. Other loops are transformed into do-while form and then compiled into machine code.
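A sketch of this transformation using a hypothetical factorial function: the while loop becomes an initial test that may skip the loop entirely, followed by a do-while loop whose test sits at the bottom, written here with goto to mirror the structure of the generated assembly. Whether a particular gcc version produces exactly this shape is an assumption; some compilers instead jump straight to a test placed at the bottom of the loop.

```c
/* A while loop as written in C. */
static int fact_while(int n)
{
    int result = 1;
    while (n > 1) {
        result *= n;
        n = n - 1;
    }
    return result;
}

/* The same loop in guarded do-while form, expressed with goto. */
static int fact_while_goto(int n)
{
    int result = 1;
    if (!(n > 1))          /* guard: skip the loop if the test fails initially */
        goto done;
loop:                      /* do-while body */
    result *= n;
    n = n - 1;
    if (n > 1)             /* test at the bottom: one conditional jump per iteration */
        goto loop;
done:
    return result;
}
```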
Do-While Loops
While Loops
For Loops
3.6.6 Conditional Move Instructions
Conditional control transfers have a drawback on modern processors. As we will see in Chapters 4 and 5, processors achieve high performance through pipelining, where an instruction is processed via a sequence of stages, each performing one small portion of the required operations (e.g., fetching the instruction from memory, determining the instruction type, reading from memory, performing an arithmetic operation, writing to memory, and updating the program counter). This approach achieves high performance by overlapping the steps of successive instructions, such as fetching one instruction while performing the arithmetic operations for a previous instruction. To do this, the processor must be able to determine the sequence of instructions to be executed well ahead of time in order to keep the pipeline full. When the machine encounters a conditional jump (referred to as a “branch”), it often cannot yet determine whether or not the jump will be followed. Processors employ sophisticated branch prediction logic to try to guess whether or not each jump instruction will be followed. As long as it can guess reliably (modern microprocessor designs try to achieve success rates on the order of 90%), the instruction pipeline will be kept full of instructions. Mispredicting a jump, on the other hand, requires that the processor discard much of the work it has already done on future instructions and then begin filling the pipeline with instructions starting at the correct location. As we will see, such a misprediction can incur a serious penalty, say, 20 to 40 clock cycles of wasted effort, causing a serious degradation of program performance.
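An alternative strategy is a conditional transfer of data. This approach computes both outcomes of a conditional operation in advance and then selects one based on whether or not the condition holds. Its advantage is that none of the work following a jump ever has to be discarded; its cost is the extra computation of the outcome that ends up unused. The compiler therefore has to weigh the wasted computation against the penalty of a branch misprediction, which in practice it cannot judge well. As a result, it uses a conditional data transfer only when both expressions are very easy to compute, and it sometimes chooses a conditional control transfer even when a misprediction would be more costly. This strategy makes sense only in restricted cases, but it can then be implemented by a simple conditional move instruction that is better matched to the performance characteristics of modern processors.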
For IA32, the source and destination values can be 16 or 32 bits long. Single-byte conditional moves are not supported.
Unlike conditional jumps, the processor can execute conditional move instructions without having to predict the outcome of the test. The processor simply reads the source value (possibly from memory), checks the condition code, and then either updates the destination register or keeps it the same.
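A sketch contrasting the two strategies in C, using a hypothetical absdiff function. The first version branches and computes only one difference; the second computes both outcomes and then selects one, which is the shape a compiler can map onto a cmov instruction. Whether gcc actually emits a cmov or a branch for either function depends on the compiler version and flags, and both functions assume the subtractions do not overflow int.

```c
/* Branching version: only one of the two differences is computed. */
static int absdiff(int x, int y)
{
    if (x < y)
        return y - x;
    else
        return x - y;
}

/* Conditional-move style: compute both outcomes, then select one. */
static int cmovdiff(int x, int y)
{
    int tval = y - x;      /* then-value */
    int rval = x - y;      /* else-value */
    int test = x < y;
    if (test)
        rval = tval;       /* the selection: a candidate for cmovl */
    return rval;
}
```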
Not all conditional expressions can be compiled using conditional moves. Since this approach evaluates both the then-expression and the else-expression, if either of them could possibly generate an error condition or a side effect, this could lead to invalid behavior.
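Two illustrative cases where a conditional move would be invalid, sketched with hypothetical functions. In cread, evaluating the then-expression unconditionally would dereference a null pointer; in absdiff_se, the then-expression has a side effect on a hypothetical global counter lcount that must happen only when the condition holds. A compiler has to keep a conditional jump for both.

```c
#include <stddef.h>

/* Safe as written: *xp is evaluated only when xp is non-null.
   A conditional-move translation would evaluate both arms first, roughly
       int then_val = *xp;   (faults when xp == NULL)
       int else_val = 0;
       return xp ? then_val : else_val;
   so this must be compiled with a branch. */
static int cread(const int *xp)
{
    return xp ? *xp : 0;
}

static int lcount = 0;   /* hypothetical global counter */

/* The then-arm has a side effect (lcount++) that must occur only when x < y. */
static int absdiff_se(int x, int y)
{
    return x < y ? (lcount++, y - x) : x - y;
}
```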
Using conditional moves also does not always improve code efficiency. For example, if either the then-expr or the else-expr evaluation requires a significant computation, then this effort is wasted when the corresponding condition does not hold. Compilers must take into account the relative performance of wasted computation versus the potential for performance penalty due to branch misprediction. In truth, they do not really have enough information to make this decision reliably.
Overall, then, we see that conditional data transfers offer an alternative strategy to conditional control transfers for implementing conditional operations. They can only be used in restricted cases, but these cases are fairly common and provide a much better match to the operation of modern processors.
3.6.7 Switch Statements
A switch statement can often be given an efficient implementation using a data structure called a jump table. A jump table is an array where entry i is the address of a code segment implementing the action the program should take when the switch index equals i. gcc selects the method of translating a switch statement based on the number of cases and the sparsity of the case values. Jump tables are used when there are a number of cases (e.g., four or more) and they span a small range of values.
These locations are defined by labels in the code and are indicated in the entries of jt by code pointers, consisting of the labels prefixed by ‘&&’. (Recall that the operator & creates a pointer for a data value. In making this extension, the authors of gcc created a new operator && to create a pointer for a code location.)
Why is index declared as unsigned?
Answer: It further simplifies the branching possibilities by treating index as an unsigned value, making use of the fact that negative numbers in a two’s-complement representation map to large positive numbers in an unsigned representation. It can therefore test whether index is outside of the range 0–6 by testing whether it is greater than 6.
The key step in executing a switch statement is to access a code location through the jump table. In our assembly-code version, this occurs on line 6, where the jmp instruction's operand is prefixed with ‘*’, indicating an indirect jump, and the operand specifies a memory location indexed by register %eax, which holds the value of index. (We will see in Section 3.8 how array references are translated into machine code.)
Examining all of this code requires careful study, but the key point is to see that the use of a jump table allows a very efficient way to implement a multiway branch.
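gcc's &&label code pointers are a compiler extension rather than standard C, so the sketch below builds the jump table as an array of function pointers instead; the case values (100 to 106, with 101 and 104 missing) and the actions are hypothetical. It also applies the unsigned-index trick: after subtracting the smallest case value, the single comparison index > 6 rejects values above the range as well as values below it, which wrap around to large unsigned numbers.

```c
/* Reference version: a plain switch. */
static int switch_plain(int x, int n)
{
    switch (n) {
    case 100: return x * 13;
    case 102: return x + 10;
    case 103: return x + 11;
    case 105:
    case 106: return x - 1;
    default:  return 0;
    }
}

typedef int (*case_fn)(int);

static int case_def(int x) { (void)x; return 0; }
static int case_100(int x) { return x * 13; }
static int case_102(int x) { return x + 10; }
static int case_103(int x) { return x + 11; }
static int case_105(int x) { return x - 1; }

/* Entry i handles n == 100 + i; the gaps (101, 104) point at the default case. */
static const case_fn jt[7] = {
    case_100, case_def, case_102, case_103, case_def, case_105, case_105
};

static int switch_table(int x, int n)
{
    unsigned index = (unsigned)n - 100u;   /* n < 100 wraps to a huge value */
    if (index > 6u)                        /* one test covers both ends of the range */
        return case_def(x);
    return jt[index](x);                   /* indirect call through the table */
}
```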
3.7 Procedures
3.7.1 Stack Frame Structure
The portion of the stack allocated for a single procedure call is called a stack frame.
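The topmost stack frame is delimited by two pointers: register %ebp serves as the frame pointer, and register %esp serves as the stack pointer. The stack pointer can move while the procedure is executing, and hence most information is accessed relative to the frame pointer.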
Suppose procedure P (the caller) calls procedure Q (the callee). The arguments to Q are contained within the stack frame for P. In addition, when P calls Q, the return address within P where the program should resume execution when it returns from Q is pushed onto the stack, forming the end of P’s stack frame. The stack frame for Q starts with the saved value of the frame pointer (a copy of register %ebp), followed by copies of any other saved register values.
3.7.2 Transferring Control
The effect of a call instruction is to push a return address on the stack and jump to the start of the called procedure. The return address is the address of the instruction immediately following the call in the program, so that execution will resume at this location when the called procedure returns.
The ret instruction pops an address off the stack and jumps to this location. The proper use of this instruction is to have prepared the stack so that the stack pointer points to the place where the preceding call instruction stored its return address.
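The leave instruction can be used to prepare the stack for returning. It is equivalent to the following code sequence:
movl %ebp,%esp       # Set stack pointer to beginning of frame
popl %ebp            # Restore saved %ebp and set stack pointer to end of caller's frame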
3.7.3 Register Usage Conventions
The set of program registers acts as a single resource shared by all of the procedures. Although only one procedure can be active at a given time, we must make sure that when one procedure (the caller) calls another (the callee), the callee does not overwrite some register value that the caller planned to use later. For this reason, IA32 adopts a uniform set of conventions for register usage that must be respected by all procedures, including those in program libraries.
By convention, registers %eax, %edx, and %ecx are classified as caller-save registers. When procedure Q is called by P, it can overwrite these registers without destroying any data required by P. On the other hand, registers %ebx, %esi, and %edi are classified as callee-save registers. This means that Q must save the values of any of these registers on the stack before overwriting them, and restore them before returning, because P (or some higher-level procedure) may need these values for its future computations. In addition, registers %ebp and %esp must be maintained according to the conventions described here.
3.7.4 Procedure Example
3.7.5 Recursive Procedures
3.8 Array Allocation and Access
3.8.1 Basic Principles
For data type T and integer constant N, the declaration
T A[N];
has two effects. First, it allocates a contiguous region of L * N bytes in memory, where L is the size (in bytes) of data type T. Let us denote the starting location as x_A. Second, it introduces an identifier A that can be used as a pointer to the beginning of the array. The value of this pointer will be x_A. The array elements can be accessed using an integer index ranging between 0 and N-1; element i is stored at address x_A + L * i.
3.8.2 Pointer Arithmetic
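C allows arithmetic on pointers, where the computed value is scaled according to the size of the data type referenced by the pointer. That is, if p is a pointer to data of type T, and the value of p is x_p, then the expression p+i has value x_p + L * i, where L is the size of data type T.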
The array subscripting operation can be applied to both arrays and pointers. The array reference A[i] is identical to the expression *(A+i).
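A small check of the address formula x_A + L * i and of the identity A[i] == *(A+i). The helper name is hypothetical, and the sketch assumes a flat address space in which converting a pointer to uintptr_t yields its byte address, as on IA32 and x86-64 Linux; standard C does not guarantee that correspondence.

```c
#include <stddef.h>
#include <stdint.h>

/* x_A + L*i: the byte address of element i of an array starting at x_A
   whose elements are L bytes long. */
static uintptr_t element_address(uintptr_t x_A, size_t L, size_t i)
{
    return x_A + L * i;
}
```

For an int array A, element_address((uintptr_t)A, sizeof(int), 3) matches (uintptr_t)&A[3], and the pointer expression A + 3 advances by 3 * sizeof(int) bytes, not 3 bytes.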
3.8.3 Nested Arrays
The general principles of array allocation and referencing hold even when we create arrays of arrays. For example, the declaration
int A[5][3];
is equivalent to the declaration
typedef int row3_t[3];
row3_t A[5];
Data type row3_t is defined to be an array of three integers. Array A contains five such elements, each requiring 12 bytes to store the three integers. The total array size is then 4 * 5 * 3 = 60 bytes.
In general, for an array declared as
T D[R][C];
array element D[i][j] is at memory address
&D[i][j] = x_D + L(C * i + j),
where x_D is the starting address of D and L is the size of data type T in bytes.
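A check of the row-major formula for a 5 × 3 int array, with a hypothetical helper name. As with one-dimensional arrays, it assumes that uintptr_t values correspond to byte addresses (true on IA32 and x86-64 Linux).

```c
#include <stddef.h>
#include <stdint.h>

enum { R = 5, C = 3 };   /* dimensions of T D[R][C] */

/* &D[i][j] = x_D + L(C*i + j): row i starts C*i elements in, then j more. */
static uintptr_t nested_address(uintptr_t x_D, size_t L, size_t i, size_t j)
{
    return x_D + L * (C * i + j);
}
```

With int D[R][C], nested_address((uintptr_t)D, sizeof(int), i, j) equals (uintptr_t)&D[i][j] for every valid i and j.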
3.8.4 Fixed-Size Arrays
3.8.5 Variable-Size Arrays
3.9 Heterogeneous Data Structures
3.9.1 Structures
3.9.2 Unions
Unions provide a way to circumvent the type system of C, allowing a single object to be referenced according to multiple types.
Rather than having the different fields reference different blocks of memory, they all reference the same block. The overall size of a union equals the maximum size of any of its fields.
Unions can be useful in several contexts. However, they can also lead to nasty bugs, since they bypass the safety provided by the C type system. One application is when we know in advance that the use of two different fields in a data structure will be mutually exclusive. Then, declaring these two fields as part of a union rather than a structure will reduce the total space allocated.
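A sketch of both uses described above, with hypothetical type names. The first union reinterprets the bits of a float as an unsigned integer (assuming a 32-bit IEEE 754 float). The second pair shows the space saving for mutually exclusive fields: a binary-tree node that is either internal (two children) or a leaf (one data value), never both.

```c
#include <stdint.h>

/* Reading the same 4 bytes as two different types. */
union float_bits {
    float f;
    uint32_t u;
};

static uint32_t float_to_bits(float f)
{
    union float_bits fb;
    fb.f = f;
    return fb.u;   /* bit pattern of f, read as an unsigned integer */
}

/* Structure: room for all three fields. */
struct node_s {
    struct node_s *left;
    struct node_s *right;
    double data;
};

/* Union: room only for the larger of the two alternatives. */
union node_u {
    struct {
        union node_u *left;
        union node_u *right;
    } internal;
    double data;
};
```

Note that union node_u by itself records no information about which alternative is in use; a real program would typically pair it with a tag field, which is exactly where the nasty bugs mentioned above tend to appear.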
3.9.3 Data Alignment
Many computer systems place restrictions on the allowable addresses for the primitive data types, requiring that the address for some type of object must be a multiple of some value K (typically 2, 4, or 8). Such alignment restrictions simplify the design of the hardware forming the interface between the processor and the memory system.
The IA32 hardware will work correctly regardless of the alignment of data. However, Intel recommends that data be aligned to improve memory system performance. Linux follows an alignment policy where 2-byte data types (e.g., short) must have an address that is a multiple of 2, while any larger data types (e.g., int, int *, float, and double) must have an address that is a multiple of 4. Note that this requirement means that the least significant bit of the address of an object of type short must equal zero. Similarly, any object of type int, or any pointer, must be at an address having the low-order 2 bits equal to zero.
Alignment is enforced by making sure that every data type is organized and allocated in such a way that every object within the type satisfies its alignment restrictions.
Library routines that allocate memory, such as malloc, must be designed so that they return a pointer that satisfies the worst-case alignment restriction for the machine it is running on, typically 4 or 8. For code involving structures, the compiler may need to insert gaps in the field allocation to ensure that each structure element satisfies its alignment requirement. The structure then has some required alignment for its starting address.
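A sketch of the gaps the compiler inserts, using two hypothetical structures and offsetof. It assumes int is 4 bytes with 4-byte alignment, which holds for both IA32 and x86-64 Linux.

```c
#include <stddef.h>

struct S1 {
    int  i;   /* offset 0 */
    char c;   /* offset 4 */
    int  j;   /* offset 8: 3 bytes of padding after c keep j 4-byte aligned */
};

struct S2 {
    int  i;   /* offset 0 */
    int  j;   /* offset 4 */
    char c;   /* offset 8, followed by 3 bytes of trailing padding so that
                 every element of an array of struct S2 stays aligned */
};
```

Both structures therefore occupy 12 bytes even though their fields need only 9.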
The assembler directive .align 4 ensures that the data following it will start with an address that is a multiple of 4.
3.10 Putting It Together: Understanding Pointers
Here we highlight some key principles of pointers and their mapping into machine code.
- Every pointer has an associated type. This type indicates what kind of object the pointer points to.
- Every pointer has a value. This value is an address of some object of the designated type. The special NULL (0) value indicates that the pointer does not point anywhere.
- Pointers are created with the & operator.
- Pointers are dereferenced with the * operator. The result is a value having the type associated with the pointer.
- Arrays and pointers are closely related.
- Casting from one type of pointer to another changes its type but not its value.
- Pointers can also point to functions. This provides a powerful capability for storing and passing references to code, which can be invoked in some other part of the program.
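A short sketch of two of these principles, with hypothetical names: a pointer to a function, and a pointer cast that changes the type (and therefore how arithmetic on the pointer is scaled) but not the address.

```c
#include <stddef.h>

/* A function to be called through a pointer. */
static int fun(int x, int *p)
{
    *p = 2 * x;
    return x + 1;
}

/* fp is a pointer to a function taking (int, int *) and returning int. */
static int (*fp)(int, int *) = fun;

/* Returns 1 if casting ip to char * keeps its value while changing the scaling. */
static int cast_keeps_value(int *ip)
{
    char *cp = (char *)ip;
    return (void *)cp == (void *)ip                 /* same address */
        && (char *)(ip + 1) == cp + sizeof(int);    /* ip+1 advances sizeof(int) bytes */
}
```

Calling fp(3, &y) runs fun, storing 6 in y and returning 4; cast_keeps_value returns 1 for any pointer into an int array with at least two elements.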
3.11 Life in the Real World: Using the gdb Debugger
It is very helpful to first run objdump to get a disassembled version of the program.
We start gdb with the following command line:
unix> gdb prog
Rather than using the command-line interface to gdb, many programmers prefer using ddd, an extension to gdb that provides a graphical user interface.
3.12 Out-of-Bounds Memory References and Buffer Overflow
A more pernicious use of buffer overflow is to get a program to perform a function that it would otherwise be unwilling to do. This is one of the most common methods to attack the security of a system over a computer network. Typically, the program is fed with a string that contains the byte encoding of some executable code, called the exploit code, plus some extra bytes that overwrite the return address with a pointer to the exploit code. The effect of executing the ret instruction is then to jump to the exploit code.
In one form of attack, the exploit code then uses a system call to start up a shell program, providing the attacker with a range of operating system functions. In another form, the exploit code performs some otherwise unauthorized task, repairs the damage to the stack, and then executes ret a second time, causing an (apparently) normal return to the caller.
3.12.1 Thwarting Buffer Overflow Attacks
The techniques we have outlined (randomization, stack protection, and limiting which portions of memory can hold executable code) are three of the most common mechanisms used to minimize the vulnerability of programs to buffer overflow attacks. Unfortunately, there are still ways to attack computers [81, 94], and so worms and viruses continue to compromise the integrity of many machines.
3.13 x86-64: Extending IA32 to 64 Bits
A shift is underway to a 64-bit version of the Intel instruction set. Originally developed by Advanced Micro Devices (AMD) and named x86-64, it is now supported by most processors from AMD (who now call it AMD64) and by Intel, who refer to it as Intel64. Most people still refer to it as “x86-64,” and we follow this convention. (Some vendors have shortened this to simply “x64”.)
The programming conventions change as well: for example, procedure parameters are now passed via registers rather than on the stack, greatly reducing the number of memory read and write operations.
3.13.1 History and Motivation for x86-64
For applications that involve manipulating large data sets, such as scientific computing, databases, and data mining, the 32-bit word size makes life difficult for programmers. They must write code using out-of-core algorithms, where the data reside on disk and are explicitly read into memory for processing.
In this text, we use “IA32” to refer to the combination of hardware and gcc code found in traditional 32-bit versions of Linux running on Intel-based machines. We use “x86-64” to refer to the hardware and code combination running on the newer 64-bit machines from AMD and Intel. In the worlds of Linux and gcc, these two platforms are referred to as “i386” and “x86_64,” respectively.
3.13.2 An Overview of x86-64
The main features include:
- Pointers and long integers are 64 bits long. Integer arithmetic operations support 8, 16, 32, and 64-bit data types.
- The set of general-purpose registers is expanded from 8 to 16.
- Much of the program state is held in registers rather than on the stack. Integer and pointer procedure arguments (up to 6) are passed via registers. Some procedures do not need to access the stack at all.
- Conditional operations are implemented using conditional move instructions when possible, yielding better performance than traditional branching code.
- Floating-point operations are implemented using the register-oriented instruction set introduced with SSE version 2, rather than the stack-based approach supported by IA32.
Data Types
Assembly-Code Example
3.13.3 Accessing Information
3.13.4 Control
Procedures
3.13.5 Data Structures
One difference is that x86-64 follows a more stringent set of alignment requirements. For any scalar data type requiring K bytes, its starting address must be a multiple of K. Thus, data types long and double, as well as pointers, must be aligned on 8-byte boundaries. In addition, data type long double uses a 16-byte alignment (and size allocation), even though the actual representation requires only 10 bytes. These alignment conditions are imposed to improve memory system performance: the memory interface is designed in most processors to read or write aligned blocks that are 8 or 16 bytes long.
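A sketch that makes the stricter rule visible, using hypothetical structure names. On x86-64 Linux, double and pointers are 8-byte aligned, so each field below lands at offset 8 and each structure occupies 16 bytes; under the IA32 Linux policy described in Section 3.9.3, the double would only need 4-byte alignment and would sit at offset 4 instead.

```c
#include <stddef.h>

/* On x86-64: offsetof(struct char_double, d) == 8 and sizeof == 16. */
struct char_double {
    char   c;
    double d;      /* padded up to a multiple of the double's alignment */
};

/* On x86-64: offsetof(struct char_ptr, p) == 8 and sizeof == 16. */
struct char_ptr {
    char  c;
    void *p;       /* 8-byte pointer, 8-byte aligned */
};
```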
3.13.6 Concluding Observations about x86-64
The formulation of both the x86-64 hardware and the programming conventions changed the processor from one that relied heavily on the stack to hold program state to one where the most heavily used part of the state is held in the much faster and expanded register set.
The biggest drawback in transforming applications from 32 bits to 64 bits is that the pointer variables double in size, and since many data structures contain pointers, this means that the overall memory requirement can nearly double.
3.14 Machine-Level Representations of Floating-Point Programs
The floating-point architecture of a machine is the combination of storage model, instructions, and conventions it uses for floating point. It covers:
- method of storing floating-point data
- additional instructions to operate on floating-point values
- instructions to convert between floating-point and integer values
- instructions to perform comparisons between floating-point values
- conventions on how to pass floating-point values as function arguments and to return them as function results
3.15 Summary
By contrast with C, which is compiled into machine code for the target processor, Java is implemented in an entirely different fashion. The object code of Java is a special binary representation known as Java byte code. This code can be viewed as a machine-level program for a virtual machine. As its name suggests, this machine is not implemented directly in hardware. Instead, software interpreters process the byte code, simulating the behavior of the virtual machine.