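Data structure alignment
Data structure alignment is the way data is laid out and accessed in memory once code has been compiled. It involves three related issues: data alignment, data structure padding, and packing.
Modern CPUs typically access memory most efficiently at addresses aligned to their 32-bit or 64-bit word size; on some architectures, accessing a variable that is not aligned can trigger a bus error.
When data items are smaller than the machine word, several of them can be placed in a single word; this is called packing.
Many programming languages handle data structure alignment automatically. Ada,[1][2] PL/I,[3] Pascal,[4] certain C and C++ implementations, D,[5] Rust,[6] and assembly language allow explicit control over alignment.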
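Definitions
A memory address a is said to be n-byte aligned when a is a multiple of n (where n is a power of 2).
A memory access is said to be aligned when the data being accessed is n bytes long and its address is n-byte aligned. An access that is not aligned is said to be misaligned. By definition, byte accesses are always aligned.
A memory pointer is aligned if the data it refers to is aligned. A pointer to aggregate data (such as a struct or an array) is aligned if and only if each primitive datum in the aggregate is aligned.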
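The definition above can be checked with a bit mask, since n is a power of two. The following is a minimal sketch; the helper names `is_power_of_two`, `is_aligned`, and `align_up` are chosen for illustration and do not come from any particular library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if n is a nonzero power of two. */
static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Returns true if address a is n-byte aligned, i.e. a is a multiple of n.
   n must be a power of two, as in the definition. */
static bool is_aligned(uintptr_t a, size_t n)
{
    assert(is_power_of_two(n));
    return (a & (uintptr_t)(n - 1)) == 0;
}

/* Rounds a up to the nearest multiple of n (n a power of two). */
static uintptr_t align_up(uintptr_t a, size_t n)
{
    assert(is_power_of_two(n));
    return (a + (uintptr_t)(n - 1)) & ~(uintptr_t)(n - 1);
}
```

For example, `is_aligned(0x1004, 4)` holds while `is_aligned(0x1004, 8)` does not, and every address is 1-byte aligned.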
Architectures
RISC
Most RISC processors will generate an alignment fault when a load or store instruction accesses a misaligned address. This allows the operating system to emulate the misaligned access using other instructions. For example, the alignment fault handler might use byte loads or stores (which are always aligned) to emulate a larger load or store instruction.
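The byte-wise emulation mentioned above can be sketched in portable C. This is only an illustration of the idea, not the code of any real fault handler: the function names are hypothetical, and a little-endian byte order is assumed for the 32-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit little-endian value from any address using four byte
   loads, each of which is trivially aligned. */
static uint32_t load_u32_bytes(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Writes a 32-bit value in little-endian order using four byte stores. */
static void store_u32_bytes(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}
```

Because only single bytes are touched, these helpers work at any offset into a buffer, at the cost of several instructions per access.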
Some architectures like MIPS have special unaligned load and store instructions. One unaligned load instruction gets the bytes from the memory word with the lowest byte address and another gets the bytes from the memory word with the highest byte address. Similarly, store-high and store-low instructions store the appropriate bytes in the higher and lower memory words respectively.
The Alpha architecture has a two-step approach to unaligned loads and stores. The first step is to load the upper and lower memory words into separate registers. The second step is to extract or modify the memory words using special low/high instructions similar to the MIPS instructions. An unaligned store is completed by storing the modified memory words back to memory. The reason for this complexity is that the original Alpha architecture could only read or write 32-bit or 64-bit values. This proved to be a severe limitation that often led to code bloat and poor performance. To address this limitation, an extension called the Byte Word Extensions (BWX) was added to the original architecture. It consisted of instructions for byte and word loads and stores.
Because these instructions are larger and slower than the normal memory load and store instructions, they should only be used when necessary. Some C and C++ compilers have an “unaligned” attribute that can be applied to pointers that need the unaligned instructions.
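The two-step scheme described for the Alpha (load the two enclosing aligned words, then extract the wanted bytes) can be modelled in C with shifts. This is a sketch under stated assumptions, not actual Alpha or MIPS code: memory is modelled as an array of 32-bit words, byte k of a word is taken to be bits 8k to 8k+7 (little-endian numbering), and the function name and `nwords` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Loads a 32-bit value starting at byte_offset using only aligned
   word loads from words[0..nwords-1]. */
static uint32_t load_u32_two_step(const uint32_t *words, size_t nwords,
                                  size_t byte_offset)
{
    size_t index = byte_offset / 4;                    /* lower enclosing word */
    unsigned shift = (unsigned)(byte_offset % 4) * 8;  /* misalignment in bits */

    assert(index < nwords);
    uint32_t low = words[index];
    if (shift == 0)
        return low;  /* already aligned: a single word load suffices */

    /* Step 1: load the upper enclosing word as well. */
    assert(index + 1 < nwords);
    uint32_t high = words[index + 1];

    /* Step 2: take the high bytes of the lower word and the low bytes
       of the upper word, and merge them. */
    return (low >> shift) | (high << (32 - shift));
}
```

With words `{0x44332211, 0x88776655}` and a byte offset of 1, the bytes 0x22, 0x33, 0x44, 0x55 are combined into 0x55443322.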
x86
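The x86 architecture originally did not require aligned memory access. However, some SSE2 instructions require their data to be 128-bit (16-byte) aligned. Separate instructions, such as MOVDQU, exist for unaligned access. Reads and writes to memory are guaranteed to be atomic only when they are aligned.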
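In C11, data intended for such 16-byte aligned instructions can be requested with `alignas` and `aligned_alloc`. The sketch below assumes a C11 compiler and library; the type `struct Vec4` and the helper `vec4_alloc` are illustrative names, not part of any standard API.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Four floats (128 bits) forced onto a 16-byte boundary. */
struct Vec4 {
    alignas(16) float lane[4];
};

static_assert(alignof(struct Vec4) == 16, "Vec4 should be 16-byte aligned");

/* Allocates count Vec4 objects on a 16-byte boundary; release with free().
   aligned_alloc expects the size to be a multiple of the alignment, which
   holds here because sizeof(struct Vec4) is a multiple of 16. */
static struct Vec4 *vec4_alloc(size_t count)
{
    return aligned_alloc(alignof(struct Vec4), count * sizeof(struct Vec4));
}
```

Both a local `struct Vec4` and the memory returned by `vec4_alloc` then start at an address that is a multiple of 16.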
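Alignment of C structs on x86
The members of a C structure are laid out in declaration order; the compiler may not reorder them.
With common C compilers on 32-bit x86, a double is 8-byte aligned, except on Linux, where it is 4-byte aligned (the -malign-double compile option gives 8-byte alignment).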
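Because member order is fixed, the compiler inserts padding to keep each member aligned, and the order in which members are declared affects the total size. The following sketch uses `offsetof` and `sizeof` to observe this; the struct names are made up for illustration, and the exact sizes depend on the platform ABI (on typical x86 targets `struct Mixed` is 12 bytes and `struct Reordered` is 8).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* A 4-byte member between two 1-byte members: padding is needed after
   a (to align b) and after c (to keep arrays of Mixed aligned). */
struct Mixed {
    char    a;
    int32_t b;
    char    c;
};

/* Same members, largest first: the two chars share one padded tail. */
struct Reordered {
    int32_t b;
    char    a;
    char    c;
};

/* Each member sits at a multiple of its alignment, and the struct size
   is a multiple of the struct's own alignment. */
static_assert(offsetof(struct Mixed, b) % alignof(int32_t) == 0,
              "b must be aligned within Mixed");
static_assert(sizeof(struct Mixed) % alignof(struct Mixed) == 0,
              "size must be a multiple of alignment");

/* Padding bytes in struct Mixed: total size minus the member sizes. */
static size_t mixed_padding(void)
{
    return sizeof(struct Mixed)
         - (sizeof(char) + sizeof(int32_t) + sizeof(char));
}
```

Sorting members from largest to smallest alignment, as in `struct Reordered`, is a common way to reduce padding without resorting to packing.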
Some compilers (Microsoft,[7] Borland, GNU,[8] and others) use #pragma directives to specify packing inside data structures. For example:
#pragma pack(push) /* push current alignment to stack */
#pragma pack(1) /* set alignment to 1 byte boundary */
struct MyPackedData
{
    char Data1;
    long Data2;
    char Data3;
};
#pragma pack(pop) /* restore original alignment from stack */
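With 1-byte packing, this structure is 6 bytes on a 32-bit system (1 + 4 + 1, assuming a 4-byte long); without the pragma it would typically occupy 12 bytes because of padding.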
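The effect of packing can be verified at compile time. The sketch below is a variant of the example above that uses `int32_t` in place of `long` so that the 6-byte size does not depend on the platform's `long`; it assumes a compiler that supports `#pragma pack(push, 1)` (GCC, Clang, and Microsoft compilers do), and the struct and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)   /* save current packing, then pack to 1 byte */
struct PackedData {
    char    data1;
    int32_t data2;      /* starts at offset 1, so it is misaligned */
    char    data3;
};
#pragma pack(pop)       /* restore the previous packing */

/* Same members with default alignment, for comparison. */
struct NaturalData {
    char    data1;
    int32_t data2;
    char    data3;
};

static_assert(sizeof(struct PackedData) == 6, "1 + 4 + 1 bytes, no padding");
static_assert(offsetof(struct PackedData, data2) == 1, "no padding before data2");

/* Copies the misaligned member out byte-wise instead of forming an
   int32_t pointer to it, which would be a misaligned pointer. */
static int32_t packed_get_data2(const struct PackedData *p)
{
    int32_t v;
    memcpy(&v, (const unsigned char *)p + offsetof(struct PackedData, data2),
           sizeof v);
    return v;
}
```

Packed layouts save space and match external formats, but every access to a misaligned member such as `data2` may be slower, and taking its address yields a pointer that is not safely usable as an ordinary `int32_t *`.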
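Default packing and #pragma pack
With Microsoft compilers, the project default packing (the /Zp compile option) interacts with the #pragma pack directive: #pragma pack can only reduce the packing size of a structure below the project default.[9]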
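References
- Retrieved 2015-08-30. Archived from the original on 2015-10-13.
- (PDF). Retrieved 2015-08-30.
- (PDF). IBM. July 1966: 55–56. C28-6571-3. Retrieved 2017-11-21. Archived (PDF) from the original on 2019-05-29.
- Niklaus Wirth. (PDF): 12. July 1973. Retrieved 2017-11-21. Archived from the original (PDF) on 2015-03-15.
- Retrieved 2012-04-13. Archived from the original on 2012-04-09.
- Retrieved 2016-06-19. Archived from the original on 2016-05-09.
- Retrieved 2017-11-21. Archived from the original on 2017-03-28.
- Retrieved 2017-11-21. Archived from the original on 2017-01-08.
- MSDN Library. Microsoft. 2007-07-09. Retrieved 2011-01-11. Archived from the original on 2012-10-18.
- Bryant, Randal E.; O'Hallaron, David. Computer Systems: A Programmer's Perspective. 2003 ed. Upper Saddle River, NJ: Pearson Education. 2003. ISBN 0-13-034074-X. Retrieved 2017-11-21. Archived from the original on 2007-08-06.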
External links
- IBM developerWorks article on data alignment
- Article on data alignment and performance
- MSDN article on data alignment
- Article on data alignment and data portability
- Byte Alignment and Ordering
- Intel Itanium Architecture Software Developer's Manual
- Data Alignment when Migrating to 64-Bit Intel® Architecture
- PowerPC Microprocessor Family: The Programming Environments for 32-Bit Microprocessors