GKI 16-6.12 android-mainline errata

This page describes important issues and bug fixes found on android-mainline that might be significant to partners.

November 15, 2024

  • Clang is updated to 19.0.1 for android-mainline and android16-6.12

    • Summary: The new version of Clang introduces a bounds sanitizer for arrays, where the array's size is stored in a separate variable linked to the array using the __counted_by attribute. This feature might cause a kernel panic if the array size isn't properly updated. The error message looks like this:
    UBSAN: array-index-out-of-bounds in common/net/wireless/nl80211.c
    index 0 is out of range for type 'struct ieee80211_channel *[] __counted_by(n_channels)' (aka 'struct ieee80211_channel *[]')
    
    • Details: The bounds sanitizer protects the integrity of the kernel by detecting out-of-bounds accesses. With CONFIG_UBSAN_TRAP enabled, the bounds sanitizer triggers a kernel panic on any finding.

      • The previous version of the bounds sanitizer checked only fixed-size arrays and couldn't check dynamically allocated arrays. The new version uses the __counted_by attribute to determine the array bounds at runtime and detects more cases of out-of-bounds access. However, in some cases, the array is accessed before the size variable is set, which triggers the bounds sanitizer and causes a kernel panic. To address this issue, set the array's size immediately after allocating the underlying memory, as illustrated in aosp/3343204.
    • About CONFIG_UBSAN_SIGNED_WRAP: The new version of Clang sanitizes signed integer overflow and underflow despite the -fwrapv compiler flag. The -fwrapv flag makes signed integer overflow well defined: the result wraps around using two's complement arithmetic.

      • While sanitizing signed integer overflow in the Linux kernel can help identify bugs, there are instances where overflow is intentional, for example, with atomic_long_t. As a result, CONFIG_UBSAN_SIGNED_WRAP has been disabled to allow UBSAN to function solely as a bounds sanitizer.
    • About CONFIG_UBSAN_TRAP: To protect the integrity of the kernel, UBSAN is configured to trigger a kernel panic when it detects an issue. However, we disabled this behavior from October 23 to November 12 to unblock the compiler update while we fixed known __counted_by issues.
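The __counted_by fix described above (set the size right after allocation, before the first array access) can be sketched in plain userspace C. This is a minimal illustration, not the change in aosp/3343204: struct channel_list and channel_list_alloc() are hypothetical names, calloc() stands in for the kernel allocator, and the macro fallback is an assumption so that the sketch also builds with compilers that lack the counted_by attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the real attribute when the compiler has it, otherwise a no-op. */
#ifndef __counted_by
# if defined(__has_attribute)
#  if __has_attribute(counted_by)
#   define __counted_by(member) __attribute__((counted_by(member)))
#  endif
# endif
#endif
#ifndef __counted_by
# define __counted_by(member)
#endif

/* Hypothetical structure: the array's bound lives in n_channels. */
struct channel_list {
	size_t n_channels;
	int channels[] __counted_by(n_channels);
};

/* Hypothetical allocator that sets the bound before any array access. */
struct channel_list *channel_list_alloc(size_t count)
{
	struct channel_list *list;

	list = calloc(1, sizeof(*list) + count * sizeof(list->channels[0]));
	if (!list)
		return NULL;

	/*
	 * Set the counter first. If channels[] were written while
	 * n_channels is still 0, a bounds-sanitized build would report
	 * "index 0 is out of range".
	 */
	list->n_channels = count;

	for (size_t i = 0; i < count; i++)
		list->channels[i] = (int)i;

	return list;
}
```

Moving the n_channels assignment below the loop reproduces the ordering that the sanitizer reports, because every index is then checked against a bound of 0.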

November 1, 2024

  • Linux 6.12-rc4 landing
    • Summary: CONFIG_OF_DYNAMIC might cause severe regressions for faulty drivers.
    • Details: While merging Linux 6.12-rc1 into android-mainline, we noticed that some out-of-tree drivers failed to load. The change that exposed the driver bugs was identified as commit 274aff8711b2 ("clk: Add KUnit tests for clks registered with struct clk_parent_data"), and we temporarily reverted it in aosp/3287735. The change selects CONFIG_OF_OVERLAY, which selects CONFIG_OF_DYNAMIC. With !OF_DYNAMIC, ref-counting with of_node_get() and of_node_put() is effectively disabled because both are implemented as no-ops. Enabling OF_DYNAMIC again exposes issues in drivers that implement ref-counting for struct device_node incorrectly. This causes various types of errors, such as memory corruption, use-after-free, and memory leaks.
    • Inspect all uses of OF parsing-related APIs. The following list is partial, but contains the cases we have observed:
      • Use after free (UAF):
        • Reuse of the same device_node argument: The following functions call of_node_put() on the node passed in, so you might need to call of_node_get() before calling them (for example, when calling them repeatedly with the same node as the argument):
          • of_find_compatible_node()
          • of_find_node_by_name()
          • of_find_node_by_path()
          • of_find_node_by_type()
          • of_get_next_cpu_node()
          • of_get_next_parent()
          • of_get_next_child()
          • of_get_next_available_child()
          • of_get_next_reserved_child()
          • of_find_node_with_property()
          • of_find_matching_node_and_match()
        • Use of device_node after any type of exit from certain loops:
          • for_each_available_child_of_node_scoped()
          • for_each_available_child_of_node()
          • for_each_child_of_node_scoped()
          • for_each_child_of_node()
        • Keeping direct pointers to char * properties of a device_node around, for example, pointers obtained using:
          • const char *foo = struct device_node::name
          • of_property_read_string()
          • of_property_read_string_array()
          • of_property_read_string_index()
          • of_get_property()
      • Memory leaks:
        • Getting a device_node and forgetting to drop the reference with of_node_put(). Nodes returned from the following functions need to be released at some point:
          • of_find_compatible_node()
          • of_find_node_by_name()
          • of_find_node_by_path()
          • of_find_node_by_type()
          • of_find_node_by_phandle()
          • of_parse_phandle()
          • of_find_node_opts_by_path()
          • of_get_next_cpu_node()
          • of_get_compatible_child()
          • of_get_child_by_name()
          • of_get_parent()
          • of_get_next_parent()
          • of_get_next_child()
          • of_get_next_available_child()
          • of_get_next_reserved_child()
          • of_find_node_with_property()
          • of_find_matching_node_and_match()
      • Keeping a device_node from a loop iteration. If you return or break from within the following loops, you need to drop the remaining reference at some point:
        • for_each_available_child_of_node()
        • for_each_child_of_node()
        • for_each_node_by_type()
        • for_each_compatible_node()
        • of_for_each_phandle()
    • The previously mentioned change was restored while landing Linux 6.12-rc4 (see aosp/3315251), which enables CONFIG_OF_DYNAMIC again and potentially exposes faulty drivers.
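The "reuse of the same device_node argument" case above can be modeled in userspace to show how the reference count drifts. This is only a sketch under stated assumptions: struct mock_node and every mock_* function are hypothetical stand-ins, not kernel APIs. mock_find_compatible() imitates just the documented contract of of_find_compatible_node(): the returned node has its reference count raised, and the reference on the from node is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device_node, kept in a flat list. */
struct mock_node {
	const char *compatible;
	int refcount;
	struct mock_node *next;
};

/* Stand-in for of_node_get(): take one reference. */
struct mock_node *mock_node_get(struct mock_node *node)
{
	if (node)
		node->refcount++;
	return node;
}

/* Stand-in for of_node_put(): drop one reference. */
void mock_node_put(struct mock_node *node)
{
	if (node)
		node->refcount--;
}

/*
 * Imitates the reference handling of of_find_compatible_node():
 * the search starts after 'from', the match is returned with an extra
 * reference, and the reference on 'from' is dropped.
 */
struct mock_node *mock_find_compatible(struct mock_node *head,
				       struct mock_node *from,
				       const char *compat)
{
	struct mock_node *node = from ? from->next : head;

	while (node && strcmp(node->compatible, compat) != 0)
		node = node->next;

	mock_node_get(node);
	mock_node_put(from);
	return node;
}

/* Buggy: each search silently consumes a reference on 'start'. */
int search_twice_buggy(struct mock_node *head, struct mock_node *start)
{
	mock_node_put(mock_find_compatible(head, start, "vendor,a"));
	mock_node_put(mock_find_compatible(head, start, "vendor,b"));
	return start->refcount;
}

/* Fixed: take a reference before each call that will drop one. */
int search_twice_fixed(struct mock_node *head, struct mock_node *start)
{
	mock_node_put(mock_find_compatible(head, mock_node_get(start),
					   "vendor,a"));
	mock_node_put(mock_find_compatible(head, mock_node_get(start),
					   "vendor,b"));
	return start->refcount;
}
```

In the mock, the buggy variant only drives the counter below its starting value. In a kernel built with CONFIG_OF_DYNAMIC, dropping the last reference can free the node, which is how this pattern turns into a use-after-free.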