Technical insights into ARM security feature PA (Pointer Authentication)


Preface

ARMv8.3 adds a hardware-based security feature, PA (Pointer Authentication), along with a new family of instructions, pac* and aut*. A pac* instruction signs the target pointer and stores the signature (the PAC) in the unused upper bits of the pointer. An aut* instruction verifies that the signature stored in those upper bits is correct. Together they provide control-flow integrity (CFI) and data-flow integrity (DFI) protection.
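To make the sign/verify round trip concrete, here is a minimal software model in Python. It is only a sketch under stated assumptions: a 48-bit virtual address with all 16 upper bits used for the PAC, and a truncated HMAC standing in for the hardware cipher. The names pac_sign, pac_auth and compute_pac are made up for this illustration, and raising an exception stands in for the fault that a failed authentication eventually causes on real hardware.

```python
import hashlib
import hmac

VA_BITS = 48                          # assumption: 48-bit virtual addresses
ADDR_MASK = (1 << VA_BITS) - 1        # low bits hold the real address


def compute_pac(key: bytes, addr: int, modifier: int) -> int:
    """Toy 16-bit MAC over (address, modifier); not the real hardware algorithm."""
    msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little")


def pac_sign(key: bytes, ptr: int, modifier: int) -> int:
    """Model of pac*: put the PAC into the unused upper bits of the pointer."""
    addr = ptr & ADDR_MASK
    return (compute_pac(key, addr, modifier) << VA_BITS) | addr


def pac_auth(key: bytes, signed_ptr: int, modifier: int) -> int:
    """Model of aut*: return the plain address, or fail if the PAC is wrong."""
    addr = signed_ptr & ADDR_MASK
    if signed_ptr >> VA_BITS != compute_pac(key, addr, modifier):
        raise ValueError("PAC authentication failed")
    return addr


key = b"demo-key"
signed = pac_sign(key, 0x7FFF12345678, modifier=0x1000)
plain = pac_auth(key, signed, modifier=0x1000)   # round trip gives the address back
```

Because the PAC travels inside the pointer, copying the 64-bit value copies the signature with it; no side table is needed.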

Defense

Strengthening PA's own checks

From the paper PAC it up: Towards Pointer Integrity using ARM Pointer Authentication

The paper points out that the critical weakness of PAC is reuse: reuse of PACs, PA signing gadgets, replay of previously signed pointers, and so on. In response, the authors propose a hardened scheme.

Two PAC attack models are mentioned:

Malicious PAC generation

  • A PAC signing gadget, i.e. an existing code fragment that signs a pointer, lets an attacker produce valid signatures for pointers of their choosing

PAC reuse attack

  • Roll a PAC-signed pointer back to a value that was signed earlier (replay)
  • Replace a signed pointer with another signed pointer that uses the same PA modifier

A reuse attack therefore lets the attacker redirect control flow only within a limited set of targets (pointers that were already validly signed); it does not allow jumps to arbitrary addresses.
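The substitution case can be sketched with a toy software model. Everything here is an assumption made for illustration: a 48-bit address, a 16-bit PAC, a truncated HMAC in place of the hardware cipher, and the helper names sign and auth.

```python
import hashlib
import hmac

VA_BITS = 48                          # assumption: 48-bit virtual addresses
ADDR_MASK = (1 << VA_BITS) - 1


def toy_pac(key: bytes, addr: int, modifier: int) -> int:
    """16-bit toy MAC over (address, modifier); stand-in for the hardware."""
    msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "little")


def sign(key: bytes, ptr: int, modifier: int) -> int:
    addr = ptr & ADDR_MASK
    return (toy_pac(key, addr, modifier) << VA_BITS) | addr


def auth(key: bytes, signed_ptr: int, modifier: int) -> int:
    addr = signed_ptr & ADDR_MASK
    if signed_ptr >> VA_BITS != toy_pac(key, addr, modifier):
        raise ValueError("PAC authentication failed")
    return addr


KEY = b"toy-key"
SHARED_MODIFIER = 0                   # both pointers are signed with the same modifier

# Two pointers that the program itself signed legitimately.
slot_a = sign(KEY, 0x400100, SHARED_MODIFIER)
slot_b = sign(KEY, 0x400800, SHARED_MODIFIER)

# The attacker cannot forge a PAC, but can copy slot_b's value over slot_a.
slot_a = slot_b

# Authentication still succeeds, and control now goes to the other target.
redirected_target = auth(KEY, slot_a, SHARED_MODIFIER)
```

The attacker never computes a signature; the only targets reachable this way are pointers that were already signed under the same key and modifier.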

Enhancement plan

(1) Add function type as signature input value

For the PAC signature of a return address, function type information (a type-id) can be added to the signing input. Then, even if the signed value is swapped for another signed value, verification still fails when the function types do not match.
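A rough sketch of the type-id idea, again in a toy model (48-bit address, 16-bit PAC, truncated HMAC instead of the hardware cipher). The type_id function, which hashes a textual signature, is a hypothetical stand-in for whatever identifier the compiler would actually assign.

```python
import hashlib
import hmac

VA_BITS = 48                          # assumption: 48-bit virtual addresses
ADDR_MASK = (1 << VA_BITS) - 1


def toy_pac(key: bytes, addr: int, modifier: int) -> int:
    msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "little")


def sign(key: bytes, ptr: int, modifier: int) -> int:
    addr = ptr & ADDR_MASK
    return (toy_pac(key, addr, modifier) << VA_BITS) | addr


def auth(key: bytes, signed_ptr: int, modifier: int) -> int:
    addr = signed_ptr & ADDR_MASK
    if signed_ptr >> VA_BITS != toy_pac(key, addr, modifier):
        raise ValueError("PAC authentication failed")
    return addr


def type_id(signature: str) -> int:
    """Hypothetical type-id: a 64-bit hash of the function's type signature."""
    return int.from_bytes(hashlib.sha256(signature.encode()).digest()[:8], "little")


def authenticates(key: bytes, signed_ptr: int, expected_signature: str) -> bool:
    """True if the pointer verifies when the use site expects this type."""
    try:
        auth(key, signed_ptr, type_id(expected_signature))
        return True
    except ValueError:
        return False


KEY = b"toy-key"
handler = sign(KEY, 0x400100, type_id("int(int)"))

same_type_ok = authenticates(KEY, handler, "int(int)")       # types match
cross_type_ok = authenticates(KEY, handler, "void(char*)")   # fails unless the 16-bit PAC happens to collide
```

Pointers of the same type remain interchangeable in this model, so the type-id narrows the set of substitutable pointers rather than eliminating reuse.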

(2) DFI verification scheme for data pointer type

The authors distinguish two ways of handling data pointers: on-load and on-use.

With an on-load pointer, the value stays in a register after it has been dereferenced, so it needs its own PAC signature protection.

With an on-use pointer, the value is discarded after it has been dereferenced, so some optimized combinations of PAC instructions can be used.

(3) Processing of pointer type conversion scenarios

When a pointer is cast to another type, the original PAC signature is removed and the pointer is re-signed using the new type information. Later uses of the pointer verify against the new type and therefore succeed.
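The cast handling can be sketched as "authenticate with the old type, then sign with the new one". This is a toy model with the same assumptions as before (48-bit address, 16-bit PAC, truncated HMAC); the function name recast and the numeric type ids are made up, and checking the old signature before stripping it is my assumption about how the removal is done safely.

```python
import hashlib
import hmac

VA_BITS = 48                          # assumption: 48-bit virtual addresses
ADDR_MASK = (1 << VA_BITS) - 1


def toy_pac(key: bytes, addr: int, modifier: int) -> int:
    msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "little")


def sign(key: bytes, ptr: int, modifier: int) -> int:
    addr = ptr & ADDR_MASK
    return (toy_pac(key, addr, modifier) << VA_BITS) | addr


def auth(key: bytes, signed_ptr: int, modifier: int) -> int:
    addr = signed_ptr & ADDR_MASK
    if signed_ptr >> VA_BITS != toy_pac(key, addr, modifier):
        raise ValueError("PAC authentication failed")
    return addr


def recast(key: bytes, signed_ptr: int, old_type_id: int, new_type_id: int) -> int:
    """Model of a pointer cast: drop the old signature, sign for the new type."""
    raw = auth(key, signed_ptr, old_type_id)   # verify and strip the old PAC
    return sign(key, raw, new_type_id)         # re-sign with the new type-id


KEY = b"toy-key"
OLD_TYPE, NEW_TYPE = 1, 2                      # hypothetical type ids
as_old = sign(KEY, 0x5000, OLD_TYPE)
as_new = recast(KEY, as_old, OLD_TYPE, NEW_TYPE)
address = auth(KEY, as_new, NEW_TYPE)          # verifies against the new type
```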

Use PA to enhance existing defense mechanisms

From the paper Protecting the stack with PACed canaries

As the title suggests, the idea is to use pac instructions to strengthen the stack canary mechanism.

The canary is a core defense against stack overflows, but it is vulnerable to information leaks and brute-force attacks.

The paper points out three weaknesses:

  1. Within a single running program the canary value is fixed (one value per process);
  2. The canary is stored in unprotected memory and can easily be tampered with;
  3. The check is only inserted at the end of the stack frame, so overflows between local variables go undetected.

The authors propose a PAC-based canary scheme with these properties:

  1. It can protect individual local variables on the stack;
  2. The canary does not need to be stored in protected memory;
  3. Every function call gets its own unique canary;
  4. The scheme relies on hardware features and therefore performs better.

The final design is simple and blunt: a canary is inserted directly after each local buffer, and each canary is a PAC-signed value. The modifier for the signature is the low 48 bits of SP combined with the function's ID. Canaries generated this way differ from one another, and tampering with one is detected.
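Here is a small sketch of that layout. The assumptions are mine: a 16-bit function ID packed above the low 48 bits of SP to form the modifier, an 8-byte canary, and a truncated HMAC in place of the PAC instruction. In this toy version the canaries inside one frame share a value and differ only between calls (SP) and functions (ID). Frame and make_canary are hypothetical names.

```python
import hashlib
import hmac

SP_MASK = (1 << 48) - 1               # low 48 bits of SP go into the modifier
CANARY_SIZE = 8                       # assumption: 8-byte canary slots


def make_canary(key: bytes, sp: int, func_id: int) -> bytes:
    """Derive the canary from a modifier built out of function ID and SP."""
    modifier = ((func_id & 0xFFFF) << 48) | (sp & SP_MASK)
    mac = hmac.new(key, modifier.to_bytes(8, "little"), hashlib.sha256)
    return mac.digest()[:CANARY_SIZE]


class Frame:
    """Toy stack frame: every local buffer is followed by its own canary."""

    def __init__(self, key: bytes, sp: int, func_id: int, buffer_sizes):
        self.key, self.sp, self.func_id = key, sp, func_id
        self.memory = bytearray()
        self.buffers = []             # (offset, size) of each local buffer
        for size in buffer_sizes:
            self.buffers.append((len(self.memory), size))
            self.memory += bytes(size) + make_canary(key, sp, func_id)

    def unchecked_write(self, index: int, data: bytes) -> None:
        """Write without a bounds check, like an unsafe copy into the buffer."""
        offset, _ = self.buffers[index]
        data = data[: len(self.memory) - offset]     # stay inside the toy frame
        self.memory[offset : offset + len(data)] = data

    def intact(self):
        """Recompute the canary (nothing is stored) and compare every slot."""
        expected = make_canary(self.key, self.sp, self.func_id)
        return [
            bytes(self.memory[off + size : off + size + CANARY_SIZE]) == expected
            for off, size in self.buffers
        ]


frame = Frame(b"toy-key", sp=0x7FFFFFFFE000, func_id=7, buffer_sizes=[16, 16])
before = frame.intact()               # every canary matches on entry
frame.unchecked_write(0, b"A" * 24)   # overflows buffer 0 into its canary
after = frame.intact()                # the first canary no longer matches
```

The reference value is recomputed from the key, SP and function ID at check time, which is why the canary needs no protected storage, and an overflow from the first buffer is caught before it reaches the second.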

This inserts many extra instructions, yet the overall performance overhead is roughly the same as that of the original stack canary scheme, while the protection is stronger.

Using PA for memory vulnerability detection

From the paper Hardware-based Always-On Heap Memory Safety

This paper implements hardware-based detection of heap out-of-bounds accesses and use-after-free vulnerabilities, similar in purpose to ASan.

The paper opens by listing problems with existing bounds checking:

  1. Additional instruction overhead for bounds checking and metadata storage
  2. Complex metadata handling

The authors identify five major challenges:

  1. Register extension
  2. Bounds-checking operations
  3. Metadata propagation
  4. High memory overhead
  5. Complex metadata handling

For these five challenges the authors propose targeted solutions:

  1. Using PA-style signing and verification, the pointer's signature is stored in the upper bits of the pointer itself, so there is no need to extend registers or propagate metadata by hand; copying the pointer propagates the metadata (addresses challenges 1 and 3);
  2. The bounds check is performed when the pointer is dereferenced. A PAC-signed pointer must be verified before use anyway, so the bounds check can be done at that point (addresses challenge 2);
  3. A hash table lookup quickly finds the bounds of a pointer, so the next access can be checked for out-of-bounds behavior (addresses challenges 4 and 5).

The author then describes more specific details of the solution, which involves both software and hardware parts.

Software modifications include:

  • New special-purpose instructions: the authors add new instructions that perform bounds checking
  • Compiler changes: the compilation process is modified to insert checking instructions around malloc and free
  • Operating system changes: the OS must support the new instructions and handle the exceptions they raise

The hardware part is a bit hard for a software engineer like me to follow. It mainly involves the operation of two state machines.

From the paper PACMem: Enforcing Spatial and Temporal Memory Safety via ARM Pointer Authentication

This paper is more complete than the previous one, and its functionality and design are more mature. The main goal is to use PAC instructions to detect more memory errors at lower overhead.

The authors' starting point is that existing sanitizers have very high memory, code-size, and performance overheads, and that their PAC-based sanitizer improves on all three. (This angle is really good; it lifts the work considerably right from the start.)

The overall idea is similar to the hardware paper above. A table stores each pointer's base address, creation timestamp, and object size. The creation timestamp is the more interesting part: it is a value computed by PAC from two inputs, a random number and the SP register, and can be regarded as a unique identifier for the chunk.

This calculation inevitably has some probability of PAC collisions. The authors' resolution strategy is the simplest possible: decrement the value by one until it no longer collides.
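The decrement-until-free strategy is essentially linear probing. A minimal sketch, assuming a 16-bit identifier space and a plain dict as the metadata table; the wrap-around at zero and the name allocate_id are my own choices for the illustration, not something the text above specifies.

```python
def allocate_id(candidate: int, table: dict, id_bits: int = 16) -> int:
    """Return a free identifier, stepping down by one on every collision."""
    mask = (1 << id_bits) - 1
    ident = candidate & mask
    for _ in range(1 << id_bits):       # at most one pass over the id space
        if ident not in table:
            return ident
        ident = (ident - 1) & mask      # collision: try the next lower value
    raise RuntimeError("metadata table is full")


table = {5: "chunk A", 4: "chunk B"}
new_id = allocate_id(5, table)          # 5 and 4 are taken, so this lands on 3
table[new_id] = "chunk C"
```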

Metadata is created, naturally, at malloc. When a chunk is created, its information is recorded in the table and maintained for the lifetime of the chunk.

The checkpoints are equally natural. When a pointer is dereferenced, a lookup in the hash table is triggered, and two things are checked:

  1. Whether the object still exists, determined by the creation timestamp;
  2. Whether the access is out of bounds, determined by the range given by the base address plus the object size.
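The two checks can be sketched with a toy allocator. This is not PACMem's implementation: the identifier comes from a simple counter rather than from a PAC computation, it sits in the upper 16 bits of an assumed 48-bit address, and the names ToyHeap and Meta are hypothetical.

```python
from dataclasses import dataclass

VA_BITS = 48                          # assumption: 48-bit virtual addresses
ADDR_MASK = (1 << VA_BITS) - 1


@dataclass
class Meta:
    base: int                         # start address of the chunk
    size: int                         # object size in bytes


class ToyHeap:
    def __init__(self):
        self.table = {}               # identifier -> Meta, only for live chunks
        self.next_id = 1              # stand-in for the PAC-derived identifier
        self.next_addr = 0x1000

    def malloc(self, size: int) -> int:
        ident, base = self.next_id, self.next_addr
        self.next_id += 1
        self.next_addr += size
        self.table[ident] = Meta(base, size)
        return (ident << VA_BITS) | base      # identifier rides in the upper bits

    def free(self, ptr: int) -> None:
        ident = ptr >> VA_BITS
        if ident not in self.table:
            raise MemoryError("double free or invalid free")
        del self.table[ident]                 # the chunk no longer exists

    def check(self, ptr: int, width: int = 1) -> int:
        """Run before a dereference; returns the plain address if it is safe."""
        ident, addr = ptr >> VA_BITS, ptr & ADDR_MASK
        meta = self.table.get(ident)
        if meta is None:                      # check 1: does the object exist?
            raise MemoryError("temporal violation: object is not live")
        if not (meta.base <= addr and addr + width <= meta.base + meta.size):
            raise MemoryError("spatial violation: access is out of bounds")
        return addr


heap = ToyHeap()
p = heap.malloc(16)
last_byte = heap.check(p + 15)        # inside the chunk; arithmetic keeps the id bits
```

Because the identifier lives in the pointer's upper bits, pointer arithmetic carries it along, and a dangling pointer fails the lookup as soon as its table entry is gone.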

For compatibility, when a signed pointer is passed to an unprotected module, PACMem checks it for violations, strips the PAC, and turns it into an ordinary pointer. When an unsigned pointer is passed into a protected module, PACMem creates metadata for it as if it were a new chunk, filled in with the minimum base address and the maximum object size to ensure compatibility.

Besides the main design above, several performance optimizations are implemented:

  1. Loop accesses: the check is done once before the loop starts. If the pointer is incremented or decremented in the loop, only the final value is checked;
  2. Repeated addresses: if an address is known to be the same as one already checked, the second check is eliminated;
  3. Write-only checking: by default both reads and writes are checked, but for performance the checks can be restricted to writes.
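The loop optimization amounts to hoisting the check out of the loop. A small sketch of the idea, comparing a per-access check with a single check of the first and last address; the function names and the returned check counter are only there for the illustration.

```python
def in_bounds(base: int, size: int, addr: int) -> bool:
    return base <= addr < base + size


def sum_checked_every_access(data: bytes, base: int, start: int, count: int):
    """Baseline: one bounds check per loop iteration."""
    total, checks = 0, 0
    for i in range(count):
        addr = start + i
        checks += 1
        if not in_bounds(base, len(data), addr):
            raise IndexError(f"out-of-bounds read at {addr:#x}")
        total += data[addr - base]
    return total, checks


def sum_checked_once(data: bytes, base: int, start: int, count: int):
    """Hoisted: validate the first and last address once, before the loop."""
    checks = 0
    if count > 0:
        checks = 1
        last = start + count - 1
        if not (in_bounds(base, len(data), start) and in_bounds(base, len(data), last)):
            raise IndexError("loop would access memory out of bounds")
    total = sum(data[start - base + i] for i in range(count))
    return total, checks


data = bytes(range(10))               # toy chunk that "lives" at address 0x1000
slow = sum_checked_every_access(data, 0x1000, 0x1002, 4)
fast = sum_checked_once(data, 0x1000, 0x1002, 4)
```

Both versions compute the same sum; the hoisted one rejects a bad loop before touching any memory, which is only valid when the pointer moves monotonically through the range.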

In summary, the cost of detecting memory errors such as buffer overflows, use-after-free, and double free goes down. However, PACMem still detects fewer classes of bugs than ASan and its companion sanitizers, which can also detect issues such as memory leaks and use of uninitialized memory.

Attack

Analysis of Apple’s PA Implementation

From the paper Demystifying Pointer Authentication on Apple M1

If PA is enabled at both EL0 and EL1 with the same keys, a signature produced at EL0 will also verify at EL1. This is a cross-layer attack. (To prevent it, the Linux kernel switches the PA keys when transitioning between user mode and kernel mode.)
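Why key switching matters can be shown in a toy model (48-bit address, 16-bit PAC, truncated HMAC standing in for the hardware cipher). The function kernel_accepts is a hypothetical helper: it signs a pointer with the EL0 key and then tries to authenticate it with the EL1 key.

```python
import hashlib
import hmac

VA_BITS = 48                          # assumption: 48-bit virtual addresses
ADDR_MASK = (1 << VA_BITS) - 1


def toy_pac(key: bytes, addr: int, modifier: int) -> int:
    msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "little")


def sign(key: bytes, ptr: int, modifier: int) -> int:
    addr = ptr & ADDR_MASK
    return (toy_pac(key, addr, modifier) << VA_BITS) | addr


def auth(key: bytes, signed_ptr: int, modifier: int) -> int:
    addr = signed_ptr & ADDR_MASK
    if signed_ptr >> VA_BITS != toy_pac(key, addr, modifier):
        raise ValueError("PAC authentication failed")
    return addr


def kernel_accepts(el0_key: bytes, el1_key: bytes, ptr: int, modifier: int = 0) -> bool:
    """Sign at EL0 with el0_key, then authenticate at EL1 with el1_key."""
    user_signed = sign(el0_key, ptr, modifier)
    try:
        auth(el1_key, user_signed, modifier)
        return True
    except ValueError:
        return False


shared = b"one-key-for-both-levels"
forgery_works = kernel_accepts(shared, shared, 0x1000)        # same key: accepted
forgery_blocked = kernel_accepts(b"el0-key", b"el1-key", 0x1000)  # separate keys: rejected unless the PAC collides
```

With separate keys a user-mode signature only passes by accident, with probability about 2^-16 in this model.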

During the analysis of Apple's implementation, the authors found:

  • Apple has a special hardware mechanism that prevents the real PA keys from being observed
  • Apple's PAC instructions behave differently from the standard ARM instructions

Some tips for reverse engineering Apple PA:

  • Use test functions to identify the encodings of Apple's basic proprietary PA registers
  • Resolve register aliasing by mapping between EL1 and EL2 registers
  • With those two steps in hand, keep searching the kernel for further instruction information; whenever something unknown turns up, repeat steps 1-2

Once Apple's own PA mode is enabled, signing instructions such as pacia no longer use the EL1 IA key but the EL2 IA key.

Apple has heavily customized the behavior of the pac instructions at the hardware level, mainly in these respects:

  • Additional key registers
  • Registers for host/virtual machine scenarios
  • A special control register that enables Apple PA mode
  • A PA signing algorithm that differs from Qualcomm's QARMA and includes some salt XOR operations

The paper describes signature-substitution attacks across different layers and environments, for example a PAC generated in user mode being used in kernel mode. These only work where keys are not switched, whereas both Apple and Linux switch keys when changing privilege levels.

  • Cross-VM attack (between host and virtual machine)
  • Cross-key attack (when the A key and B key are the same)
  • Reuse attack: if PA always produces the same signature for a given pointer, the attacker can copy that value and reuse it
  • Cross-EL attack (possible if keys are not changed when switching between ELs)

[Figure: the process dependency of each Apple key, and the usage of each signing instruction in user mode and kernel mode]

Apple uses IB to sign return address pointers, IA to sign other types of function pointers, DA to sign kernel data pointers, DB to sign user-mode data pointers, and GA to sign interrupt vector tables and extremely sensitive data areas.

Attack surface analysis for PAC:

  • Data pointers not protected by PAC
  • Interrupt context data not protected by PAC
  • Signing gadgets
  • Key leaks

Vulnerabilities in Apple's PA implementation

The earliest analyses of flaws in Apple's PA mechanism came from top industry teams: Google Project Zero (P0) and Tencent Keen.

Some of the attack surfaces they analyzed:

  • Leaking the PA keys from memory
  • Cross-EL PAC forgery, e.g. using PAC instructions in user mode to sign kernel pointers
  • Cross-key forgery: if keys are shared, a pointer signed with one PAC key instruction may verify under another
  • PAC signing gadgets, used to sign the attacker's own pointers
  • Brute-forcing the PAC

However, when the researcher actually explored these attack surfaces, it turned out that Apple had thought things through carefully: apart from signing gadgets and brute force, the attack surfaces above were essentially useless. Suspicion then fell on the hardware, where Apple handles the keys specially.

There was also an amusing spot in the code that only performed a comparison, with no branch or any other action afterwards, and then returned directly.

A year after the blog post was published, P0 presented more detailed bypass ideas at Black Hat, including one that abuses the interrupt mechanism.

[Figure: one of the interrupt-based bypasses]

After that, academia also began to analyze Apple PA, which is exactly what the paper above does. It analyzes the unusual behavior of Apple's hardware and demonstrates it through virtual machine experiments, uncovering a signing gadget and some unprotected sensitive data pointers.

The shocking exploit chain revealed at 37C3

One part of the chain involves an Apple PAC bypass. I am not entirely clear on it; the bar is too high and I am not yet at the stage where I can understand it, so I will just point to a diagram of the attack chain. What is more sinister is the chip-level vulnerability: it is unclear how it was discovered, and opinions differ.

Picture from: https://securelist.com/operation-triangulation-the-last-hardware-mystery/111669/

Collection of related articles: https://securelist.com/trng-2023/

Summary

A lot can be done with the PAC instructions: strengthening PAC's own checks, enhancing existing defense mechanisms, and detecting more classes of memory vulnerabilities. Introduced in ARMv8.3 and vigorously promoted by Apple, PA has seen wide commercial deployment. As for the newer MTE, Google has been the first to commercialize it. I wonder if this counts as an arms race?

The most outstanding thing about PA is that it reuses the unused bits at the top of the address for the signature. It is low-overhead, needs no extra storage for signature information, and ordinary pointer copies keep working unchanged. The overall design basically blocks control-flow hijacking: to achieve code execution you must first break PA, but attacking PA often itself requires code execution, which becomes a chicken-and-egg problem. Of course, as the analyses from the various vendors' labs above show, there are still flaws in how PA is implemented. But once it has been tempered for a while, it can definitely become a killer vulnerability mitigation, and it may even nail the coffin lid shut on memory vulnerabilities. Let us wait and see!

Once all memory vulnerabilities are blocked, what comes next? Could logic bugs become the next battleground? How can unpredictable logic flaws be effectively blocked and prevented? These are things security practitioners will need to consider next.

What do you think?
