[Clang] Emit LLVM flatten attribute instead of per-callsite alwaysinline#188615

Open
grigorypas wants to merge 3 commits into llvm:main from grigorypas:change_clang_flatten

Conversation

@grigorypas
Member

Follow-up to #174899, which added the flatten function attribute to LLVM IR and implemented recursive inlining in the AlwaysInliner pass.

This patch updates Clang to emit the LLVM flatten attribute on functions with __attribute__((flatten)), instead of the previous approach of marking each call site with alwaysinline. This completes the transition to matching GCC's flatten semantics.

Changes:

  • Remove the callsite alwaysinline annotation logic from CGCall.cpp
  • Emit the flatten function attribute in CodeGenModule.cpp
  • Update clang/test/CodeGen/flatten.c to reflect the new IR output
  • Update clang/test/CodeGen/AArch64/sme-inline-callees-streaming-attrs.c to reflect the new behavior
  • Add release notes documenting the behavior change

RFC: https://discourse.llvm.org/t/rfc-function-level-flatten-depth-attribute-for-depth-limited-inlining
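
For illustration, a minimal C sketch of the case this change affects. The functions `leaf`, `middle`, and `top` are hypothetical and do not come from the patch or its tests. The point is that the call to `leaf` only shows up inside `top` after `middle` has been inlined, which is the "calls introduced by inlining" situation the description refers to.

```c
/* Hypothetical helpers for illustration; not part of the patch. */
static int leaf(int x) { return x + 1; }

/* middle() calls leaf(). That nested call only appears in top()
   once middle() itself has been inlined there. */
static int middle(int x) { return leaf(x) * 2; }

/* Old approach: only the direct call to middle() was marked
   alwaysinline at the call site.
   New approach: top() carries the function-level flatten attribute,
   so the call to leaf() exposed by inlining middle() is a candidate
   for inlining as well. */
__attribute__((flatten))
int top(int x) { return middle(x) + 3; }
```

The attribute does not change what `top` computes, only how much of its call tree ends up in its body. At higher optimization levels the regular inliner may well inline `leaf` anyway, so the difference is easiest to see in unoptimized or lightly optimized builds.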

@llvmbot added the clang (Clang issues not falling into any other category) and clang:codegen (IR generation bugs: mangling, exceptions, etc.) labels on Mar 25, 2026
@llvmbot
Member

llvmbot commented Mar 25, 2026

@llvm/pr-subscribers-clang-codegen

Author: Grigory Pastukhov (grigorypas)

Full diff: https://github.com/llvm/llvm-project/pull/188615.diff

5 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+10)
  • (modified) clang/lib/CodeGen/CGCall.cpp (-11)
  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+3)
  • (modified) clang/test/CodeGen/AArch64/sme-inline-callees-streaming-attrs.c (+81-34)
  • (modified) clang/test/CodeGen/flatten.c (+3-14)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 0dbe667e4f07a..23d9fb246977a 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -221,6 +221,16 @@ Attribute Changes in Clang
   foreign language personality with a given function. Note that this does not
   perform any ABI validation for the personality routine.
 
+- The ``__attribute__((flatten))`` attribute behavior has changed to match
+  GCC. Previously, Clang only inlined direct callees of the attributed
+  function. Now, all calls are inlined transitively, including calls
+  introduced by inlining. Calls that cannot be inlined are left as-is:
+  this includes callees marked ``noinline``, callees with incompatible ABI
+  attributes (e.g. SME), callees without a visible definition, and
+  recursive calls where a function already appears in the inlining chain.
+  Flatten also works across ThinLTO module boundaries when callee
+  definitions are available.
+
 - The :doc:`ThreadSafetyAnalysis` attributes ``guarded_by`` and
   ``pt_guarded_by`` now accept multiple capability arguments with refined
   access semantics: *writing* requires all listed capabilities to be held
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 4c0ea9ec3ea9c..5470f33d1ec7e 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5968,17 +5968,6 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   // Apply some call-site-specific attributes.
   // TODO: work this into building the attribute set.
 
-  // Apply always_inline to all calls within flatten functions.
-  // FIXME: should this really take priority over __try, below?
-  if (CurCodeDecl && CurCodeDecl->hasAttr<FlattenAttr>() &&
-      !InNoInlineAttributedStmt &&
-      !(TargetDecl && TargetDecl->hasAttr<NoInlineAttr>()) &&
-      !CGM.getTargetCodeGenInfo().wouldInliningViolateFunctionCallABI(
-          CallerDecl, CalleeDecl)) {
-    Attrs =
-        Attrs.addFnAttribute(getLLVMContext(), llvm::Attribute::AlwaysInline);
-  }
-
   // Disable inlining inside SEH __try blocks.
   if (isSEHTryScope()) {
     Attrs = Attrs.addFnAttribute(getLLVMContext(), llvm::Attribute::NoInline);
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 3fcd6f5f904db..090ff06e1c555 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2946,6 +2946,9 @@ void CodeGenModule::SetLLVMFunctionAttributesForDefinition(const Decl *D,
   if (CodeGenOpts.DisableOutlining || D->hasAttr<NoOutlineAttr>())
     B.addAttribute(llvm::Attribute::NoOutline);
 
+  if (D->hasAttr<FlattenAttr>())
+    B.addAttribute(llvm::Attribute::Flatten);
+
   F->addFnAttrs(B);
 
   llvm::MaybeAlign ExplicitAlignment;
diff --git a/clang/test/CodeGen/AArch64/sme-inline-callees-streaming-attrs.c b/clang/test/CodeGen/AArch64/sme-inline-callees-streaming-attrs.c
index 2071e66e0d652..baa02926d9bf7 100644
--- a/clang/test/CodeGen/AArch64/sme-inline-callees-streaming-attrs.c
+++ b/clang/test/CodeGen/AArch64/sme-inline-callees-streaming-attrs.c
@@ -1,5 +1,5 @@
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -emit-llvm -target-feature +sme -target-feature +sme2 %s -DUSE_FLATTEN -o - | FileCheck %s
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -emit-llvm -target-feature +sme -target-feature +sme2 %s -DUSE_ALWAYS_INLINE_STMT -o - | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -emit-llvm -target-feature +sme -target-feature +sme2 %s -DUSE_FLATTEN -o - | FileCheck %s --check-prefix=CHECK-FLATTEN
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -emit-llvm -target-feature +sme -target-feature +sme2 %s -DUSE_ALWAYS_INLINE_STMT -o - | FileCheck %s --check-prefix=CHECK-ALWAYS-INLINE
 
 // REQUIRES: aarch64-registered-target
 
@@ -31,14 +31,26 @@ void caller(void) {
     STMT_ATTR fn_streaming_new_za();
     STMT_ATTR fn_streaming_new_zt0();
 }
-// CHECK-LABEL: void @caller()
-//  CHECK-NEXT: entry:
-//  CHECK-NEXT:   call void @was_inlined
-//  CHECK-NEXT:   call void @was_inlined
-//  CHECK-NEXT:   call void @fn_streaming
-//  CHECK-NEXT:   call void @fn_locally_streaming
-//  CHECK-NEXT:   call void @fn_streaming_new_za
-//  CHECK-NEXT:   call void @fn_streaming_new_zt0
+// For flatten: fn() and fn_streaming_compatible() are inlined, streaming functions
+// are blocked by TTI (non-streaming caller), new_za/new_zt0 are always blocked.
+// CHECK-FLATTEN-LABEL: void @caller()
+//  CHECK-FLATTEN-NEXT: entry:
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @fn_streaming
+//  CHECK-FLATTEN-NEXT:   call void @fn_locally_streaming
+//  CHECK-FLATTEN-NEXT:   call void @fn_streaming_new_za
+//  CHECK-FLATTEN-NEXT:   call void @fn_streaming_new_zt0
+
+// For always_inline: Clang's wouldInliningViolateFunctionCallABI controls.
+// CHECK-ALWAYS-INLINE-LABEL: void @caller()
+//  CHECK-ALWAYS-INLINE-NEXT: entry:
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @was_inlined
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @was_inlined
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_streaming
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_locally_streaming
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_streaming_new_za
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_streaming_new_zt0
 
 FN_ATTR void caller_streaming_compatible(void) __arm_streaming_compatible {
     STMT_ATTR fn();
@@ -48,14 +60,26 @@ FN_ATTR void caller_streaming_compatible(void) __arm_streaming_compatible {
     STMT_ATTR fn_streaming_new_za();
     STMT_ATTR fn_streaming_new_zt0();
 }
-// CHECK-LABEL: void @caller_streaming_compatible()
-//  CHECK-NEXT: entry:
-//  CHECK-NEXT:   call void @fn
-//  CHECK-NEXT:   call void @was_inlined
-//  CHECK-NEXT:   call void @fn_streaming
-//  CHECK-NEXT:   call void @fn_locally_streaming
-//  CHECK-NEXT:   call void @fn_streaming_new_za
-//  CHECK-NEXT:   call void @fn_streaming_new_zt0
+// For flatten: TTI allows inlining fn(), fn_streaming_compatible(), fn_streaming(),
+// fn_locally_streaming() because they don't have incompatible ops. Only new_za/new_zt0 blocked.
+// CHECK-FLATTEN-LABEL: void @caller_streaming_compatible()
+//  CHECK-FLATTEN-NEXT: entry:
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @fn_streaming_new_za
+//  CHECK-FLATTEN-NEXT:   call void @fn_streaming_new_zt0
+
+// For always_inline: Clang blocks fn() (streaming-compatible caller, non-streaming callee).
+// CHECK-ALWAYS-INLINE-LABEL: void @caller_streaming_compatible()
+//  CHECK-ALWAYS-INLINE-NEXT: entry:
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @was_inlined
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_streaming
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_locally_streaming
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_streaming_new_za
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_streaming_new_zt0
 
 FN_ATTR void caller_streaming(void) __arm_streaming {
     STMT_ATTR fn();
@@ -65,14 +89,26 @@ FN_ATTR void caller_streaming(void) __arm_streaming {
     STMT_ATTR fn_streaming_new_za();
     STMT_ATTR fn_streaming_new_zt0();
 }
-// CHECK-LABEL: void @caller_streaming()
-//  CHECK-NEXT: entry:
-//  CHECK-NEXT:   call void @fn
-//  CHECK-NEXT:   call void @was_inlined
-//  CHECK-NEXT:   call void @was_inlined
-//  CHECK-NEXT:   call void @was_inlined
-//  CHECK-NEXT:   call void @fn_streaming_new_za
-//  CHECK-NEXT:   call void @fn_streaming_new_zt0
+// For flatten: TTI allows all except new_za/new_zt0. fn() is inlined because
+// streaming caller can execute non-streaming callee's code (no incompatible ops).
+// CHECK-FLATTEN-LABEL: void @caller_streaming()
+//  CHECK-FLATTEN-NEXT: entry:
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @fn_streaming_new_za
+//  CHECK-FLATTEN-NEXT:   call void @fn_streaming_new_zt0
+
+// For always_inline: Clang blocks fn() (streaming caller, non-streaming callee).
+// CHECK-ALWAYS-INLINE-LABEL: void @caller_streaming()
+//  CHECK-ALWAYS-INLINE-NEXT: entry:
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @was_inlined
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @was_inlined
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @was_inlined
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_streaming_new_za
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_streaming_new_zt0
 
 FN_ATTR __arm_locally_streaming
 void caller_locally_streaming(void) {
@@ -83,11 +119,22 @@ void caller_locally_streaming(void) {
     STMT_ATTR fn_streaming_new_za();
     STMT_ATTR fn_streaming_new_zt0();
 }
-// CHECK-LABEL: void @caller_locally_streaming()
-//  CHECK-NEXT: entry:
-//  CHECK-NEXT:   call void @fn
-//  CHECK-NEXT:   call void @was_inlined
-//  CHECK-NEXT:   call void @was_inlined
-//  CHECK-NEXT:   call void @was_inlined
-//  CHECK-NEXT:   call void @fn_streaming_new_za
-//  CHECK-NEXT:   call void @fn_streaming_new_zt0
+// For flatten: Similar to caller_streaming - TTI allows all except new_za/new_zt0.
+// CHECK-FLATTEN-LABEL: void @caller_locally_streaming()
+//  CHECK-FLATTEN-NEXT: entry:
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @was_inlined
+//  CHECK-FLATTEN-NEXT:   call void @fn_streaming_new_za
+//  CHECK-FLATTEN-NEXT:   call void @fn_streaming_new_zt0
+
+// For always_inline: Clang blocks fn().
+// CHECK-ALWAYS-INLINE-LABEL: void @caller_locally_streaming()
+//  CHECK-ALWAYS-INLINE-NEXT: entry:
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @was_inlined
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @was_inlined
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @was_inlined
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_streaming_new_za
+//  CHECK-ALWAYS-INLINE-NEXT:   call void @fn_streaming_new_zt0
diff --git a/clang/test/CodeGen/flatten.c b/clang/test/CodeGen/flatten.c
index 4e762223de486..3d3b5928c366d 100644
--- a/clang/test/CodeGen/flatten.c
+++ b/clang/test/CodeGen/flatten.c
@@ -1,19 +1,8 @@
-// RUN: %clang_cc1 -triple=x86_64-linux-gnu %s -emit-llvm -o - | FileCheck %s
-
-void f(void) {}
-
-__attribute__((noinline)) void ni(void) {}
+// RUN: %clang_cc1 -triple=x86_64-linux-gnu -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s
 
+// CHECK: define{{.*}} void @g() [[FLATTEN_ATTR:#[0-9]+]]
 __attribute__((flatten))
-// CHECK: define{{.*}} void @g()
 void g(void) {
-  // CHECK-NOT: call {{.*}} @f
-  f();
-  // CHECK: call {{.*}} @ni
-  ni();
 }
 
-void h(void) {
-  // CHECK: call {{.*}} @f
-  f();
-}
+// CHECK: attributes [[FLATTEN_ATTR]] = {{{.*}}flatten{{.*}}}
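
The release note in this diff lists several kinds of calls that are left as-is. A rough C sketch of two of them follows, using hypothetical functions (`kept_out_of_line`, `countdown`, `flattened`) that are not part of the patch. How many levels of `countdown` get inlined before the recursive call is left alone is an implementation detail this sketch does not try to pin down.

```c
/* Hypothetical functions for illustration; not part of the patch. */

/* Marked noinline: per the release note, calls to this function
   remain calls even inside a flatten function. */
__attribute__((noinline))
static int kept_out_of_line(int x) { return x - 1; }

/* Self-recursive: per the release note, a call to a function that
   already appears in the inlining chain is left as a call. */
static int countdown(int n) { return n <= 0 ? 0 : 1 + countdown(n - 1); }

/* Both callees are reachable from a flatten function, but neither
   can be fully flattened into it. */
__attribute__((flatten))
int flattened(int n) { return kept_out_of_line(n) + countdown(n); }
```

As with any use of the attribute, the observable result of `flattened` is unchanged; only the shape of the generated code differs.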

@llvmbot
Member

llvmbot commented Mar 25, 2026

@llvm/pr-subscribers-clang

