PHMon: A Programmable Hardware Monitor and Its Security Use Cases

Paper Summary

This paper presents PHMon, a Programmable Hardware Monitor for enforcing flexible security policies using a match-action (match-event) pipeline. The authors modify the CPU to pass a 5-member tuple of (instruction, program_counter, next_pc, addr, data) to the PHMon co-processor. The co-processor’s Match Unit (MU) checks each tuple against the entries in an events table, passing any hits to the Action Unit (AU) for further processing. They evaluate their system on four use cases: protection from Return-Oriented Programming (ROP) attacks via a shadow stack, hardware-accelerated fuzzing, prevention of information leakage, and hardware-accelerated debugging. Evaluation results showed a 0.9% overhead for the shadow stack protection, and performance improved by up to 16x for hardware-accelerated AFL.
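To make the pipeline concrete, below is a minimal C sketch of the commit-log tuple and a mask/match-style MU entry. The struct layouts, field names, and matching rule are illustrative assumptions based on the description above, not the paper’s actual definitions.

```c
/* Illustrative sketch only: a plausible shape for the 5-member tuple the
 * Trace Unit forwards, and a mask/match MU entry that filters it. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t inst;     /* committed instruction bits */
    uint64_t pc;       /* program counter of the committed instruction */
    uint64_t next_pc;  /* program counter of the next instruction */
    uint64_t addr;     /* memory address touched, if any */
    uint64_t data;     /* data value read/written, if any */
} commit_entry_t;

typedef struct {
    uint32_t inst_match, inst_mask;  /* a field matches iff            */
    uint64_t pc_match,   pc_mask;    /* (field & mask) == match;       */
    uint64_t addr_match, addr_mask;  /* next_pc/data would match alike */
} match_entry_t;

/* A tuple 'hits' an entry when every masked field equals its pattern;
 * hits are forwarded to the Action Unit. */
bool mu_hit(const match_entry_t *e, const commit_entry_t *c)
{
    return (c->inst & e->inst_mask) == e->inst_match &&
           (c->pc   & e->pc_mask)   == e->pc_match   &&
           (c->addr & e->addr_mask) == e->addr_match;
}
```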

PHMon makes the following contributions:

  1. The design of a flexible hardware monitor for in-line event monitoring
  2. Four use cases where this architecture can improve performance over the state of the art
  3. An implementation of their design on an FPGA board

The Problem

There is a desire to move more of a system’s security policy enforcement to hardware-level abstractions, e.g. Intel MPK and Arm TrustZone. However, such mechanisms take a considerable amount of time and financial backing to become market-ready products. Furthermore, fixed hardware mechanisms can only enforce the static security policy that was valid at design time, with a restricted set of enforcement actions that lack the flexibility to adapt in the future.

The Solution (or Approach)

This paper proposes a Programmable Hardware Monitor (PHMon) that can enforce flexible security policies able to change with the requirements of an evolving security threatscape. This flexibility comes from a match-action design consisting of a Trace Unit (TU) in the CPU for collecting instruction traces, a Match Unit (MU) for examining traces to see if they require further action, and an Action Unit (AU) for processing the traces that do. System designers can specify and update rules describing the kinds of events they are interested in, e.g. high-level file accesses or low-level instructions like branches, along with a corresponding action to take whenever a tuple triggers a ‘hit’ in the events table (see the sketch below).
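As a concrete, hedged illustration of that workflow, the sketch below mocks up how a designer might program two MU entries for the paper’s shadow stack use case: one matching calls, whose action pushes the return address, and one matching returns, whose action pops and compares. The `phmon_*` helpers and AU opcodes are hypothetical stand-ins (mocked with printf here), not PHMon’s actual software interface.

```c
/* Hedged sketch of rule programming; phmon_* and the AU opcodes are
 * hypothetical stand-ins, not PHMon's real interface. */
#include <stdint.h>
#include <stdio.h>

enum au_op { AU_MEM_PUSH, AU_MEM_POP_CMP };

/* Mocked configuration calls: a real driver would instead write the
 * MU/AU control registers over the co-processor interface. */
static void phmon_set_match(int entry, uint32_t match, uint32_t mask)
{
    printf("MU entry %d: hit iff (inst & 0x%08X) == 0x%08X\n",
           entry, (unsigned)mask, (unsigned)match);
}

static void phmon_set_action(int entry, enum au_op op)
{
    printf("MU entry %d -> AU op %d\n", entry, (int)op);
}

int main(void)
{
    /* Entry 0: match "jal ra, ..." (RISC-V direct call, low bits 0x0EF);
     * on a hit, the AU pushes the tuple's next_pc onto a shadow stack. */
    phmon_set_match(0, 0x000000EF, 0x00000FFF);
    phmon_set_action(0, AU_MEM_PUSH);

    /* Entry 1: match "ret" (jalr x0, 0(ra), encoded as 0x00008067);
     * on a hit, the AU pops the shadow stack and compares with next_pc,
     * flagging a mismatch as a potential ROP attempt. */
    phmon_set_match(1, 0x00008067, 0xFFFFFFFF);
    phmon_set_action(1, AU_MEM_POP_CMP);
    return 0;
}
```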

Evaluation

They evaluate an FPGA prototype of their system on the four use cases. PHMon is implemented as a co-processor attached to a RISC-V Rocket processor via the Rocket Custom Coprocessor (RoCC) interface, targeting the Linux kernel (v4.15).

The testbed is not fully representative of a ‘real’ system: limitations of the Rocket chip mean they are unable to provide the L2 cache often seen in CPUs, and the hardware limitations of their evaluation board cap the frequency of the PHMon co-processor. With this in mind, the results should indicate worst-case performance that could be improved in future iterations.

Performance metrics show a 0.9% overhead for their shadow stack implementation, with the highest overheads seen in applications that do heavy string processing (e.g. compilation, string searching, and parsing). This could be a result of higher memory pressure, as the prototype lacks an L2 cache. More ‘numerical’ workloads, e.g. compression with bzip2, showed better performance relative to the alternatives.

In the AFL use case, they observed a performance improvement of up to 16x, with PHMon finding more vulnerabilities (12) than base AFL (11). They suggest this discrepancy is a result of the probabilistic nature of AFL, concluding that PHMon found more bugs due to the performance increase; they do not, however, expand on the types of bugs found.

Hardware-accelerated debugging with PHMon showed constant overhead, in comparison to the linear growth in overhead for GDB (which must maintain its debugging data in software). Finally, the evaluation showed linear growth in PHMon’s power and area as the number of Match Units increased.

What Do You Think?

Quite an interesting paper, with a good, well-reasoned approach. To me this feels like a recontextualization of a match-action pipeline for security monitoring in an operating system, so not a particularly novel approach (e.g. see P4 in software-defined networking). However, the full-stack implementation shows that this system could be used in a real-world context, demonstrating the impact and fidelity of a research prototype that is expected for USENIX.

The only comment I would raise is that their threat model feels a little simplistic, i.e. the kernel is trusted and micro-architectural attacks are not considered. In their system the L1 cache is used as a store for retrieving data for PHMon, yet caches have been identified as side channels/attack vectors that can influence and/or leak data in the past. However, I understand that this was out of scope (or potentially a plan for future work).

Comments/Questions

  • Are there any caching opportunities for PHMon that could improve performance?
  • Does PHMon deal with branch prediction in monitoring?
  • Can monitoring rules be updated in real time, or does updating require taking PHMon offline?