
[Helion]: Develop new optimization to detect unnecessary boundaryCheck on load operations #5335

@etiotto

Description


We have implemented feature #5272, which collapses 3-dim loads on block pointers into 2-dim loads when the outermost dimension of the loaded tensor is 1. The optimization proposed here complements #5272: its goal is to remove unnecessary entries from the boundaryCheck attribute of load operations. For example, consider the load in the loop below:

        // %offset_3: 0, 64, 128, ..., 960; max(%offset_3) = 960
        %acc_32:3 = scf.for %offset_3 = %9 to %10 step %11 iter_args(%m_i_36 = %m_i_20, %l_i_37 = %l_i_21, %acc_38 = %acc_22) -> (tensor<512xf32>, tensor<512xf32>, tensor<512x64xf32>)  : i32 {
          %indices_3 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32>
          %indices_3_39 = tt.splat %offset_3 : i32 -> tensor<64xi32>
          %indices_3_40 = arith.addi %indices_3_39, %indices_3 : tensor<64xi32>
          %mask_4 = arith.constant 1024 : i32
          %mask_4_41 = arith.constant dense<1024> : tensor<64xi32>
          %mask_4_42 = arith.cmpi slt, %indices_3_40, %mask_4_41 : tensor<64xi32>
          %k = arith.constant 512 : i64
          %k_43 = arith.constant 64 : i64
          %k_44 = arith.constant 1024 : i64
          %k_45 = arith.constant 65536 : i64
          %k_46 = arith.constant 1 : i64
          %k_47 = arith.constant 64 : i64
          %k_48 = arith.constant 0 : i32
          %k_49 = tt.make_tensor_ptr %k_view, [%k, %k_43, %k_44], [%k_45, %k_46, %k_47], [%offset_5, %k_48, %offset_3] {order = array<i32: 2, 0, 1>} : <tensor<1x64x64xf16>>
          %k_50 = tt.load %k_49 {boundaryCheck = array<i32: 1, 2>, padding = 1 : i32} : !tt.ptr<tensor<1x64x64xf16>>
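To make concrete what the boundary check on dim 2 guards, the Python sketch below replays, for every loop iteration, the element indices the load covers along dim 2 (the same values the IR computes as %indices_3_40) and evaluates a bounds predicate on them. It rests on two assumptions not stated in the IR: the loop bounds are %9 = 0, %10 = 1024, %11 = 64 (consistent with the comment listing 0, 64, ..., 960), and the block-pointer bounds check tests 0 <= index < shape. The helper name `dim2_in_bounds` is hypothetical.

```python
# Assumed loop bounds for %offset_3: %9 = 0, %10 = 1024, %11 = 64 (not shown in the IR).
LB, UB, STEP = 0, 1024, 64
BLOCK = 64        # size of the loaded tile along dim 2 (tensor<1x64x64xf16>)
DIM2_SIZE = 1024  # shape of %k_49 along dim 2 (%k_44)


def dim2_in_bounds(offset: int) -> list[bool]:
    """Per-element bounds predicate along dim 2 for one iteration (hypothetical helper)."""
    # Same indices as %indices_3_40 = splat(%offset_3) + make_range [0, 64)
    indices = [offset + i for i in range(BLOCK)]
    # Assumption: the boundary check keeps 0 <= index < shape
    return [0 <= idx < DIM2_SIZE for idx in indices]


iv_values = list(range(LB, UB, STEP))
always_in_bounds = all(all(dim2_in_bounds(off)) for off in iv_values)
print(iv_values[-1], always_in_bounds)  # 960 True
```

Every iteration yields an all-true predicate, so under these assumptions the check on dim 2 never masks anything.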
     
   

The boundaryCheck on dim 2 of the load %k_50 is unnecessary because:

  • %k_49 is never modified (the pointer is not advanced in the loop)
  • the loop IV %offset_3 takes the values 0, 64, 128, ..., 960
  • the boundary check condition max(%offset_3) + load_res.getType()[2] - 1 < shape_of_k_49[2] always holds (960 + 64 - 1 = 1023 < 1024)
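A compiler pass cannot enumerate loop iterations in general, so it would evaluate the condition from the last bullet in closed form. The Python sketch below is one minimal way to do that, assuming constant loop bounds, a positive constant step, a loop-invariant tensor pointer, and an offset on the checked dimension that is exactly the loop IV (as %offset_3 is for dim 2). The function names are hypothetical, the values 0, 1024, 64 for %9, %10, %11 are assumptions, and the sketch additionally requires the smallest offset to be non-negative, on the assumption that the check also guards the lower end of the range.

```python
def max_iv(lb: int, ub: int, step: int) -> int:
    """Largest IV value of `scf.for %iv = lb to ub step step` (non-empty loop, step > 0)."""
    if step <= 0 or lb >= ub:
        raise ValueError("expected a non-empty loop with a positive step")
    # The IV takes lb, lb + step, ...; the last value is the largest one below ub.
    return lb + ((ub - lb - 1) // step) * step


def boundary_check_is_redundant(lb: int, ub: int, step: int,
                                block_size: int, dim_size: int) -> bool:
    """True if every block starting at offset = IV stays inside [0, dim_size).

    Assumes the tensor pointer is loop-invariant and that its offset on the
    checked dimension is exactly the loop IV (hypothetical helper).
    """
    lowest_index = lb                                      # first element, first iteration
    highest_index = max_iv(lb, ub, step) + block_size - 1  # last element, last iteration
    return lowest_index >= 0 and highest_index < dim_size


# %offset_3 on dim 2 of %k_49: assumed bounds %9 = 0, %10 = 1024, %11 = 64;
# block size 64 (tensor<1x64x64xf16>), dim size 1024 (%k_44).
print(max_iv(0, 1024, 64))                                 # 960
print(boundary_check_is_redundant(0, 1024, 64, 64, 1024))  # True
# A 1000-wide dimension with the same tiling would still need the check.
print(boundary_check_is_redundant(0, 1000, 64, 64, 1000))  # False
```

When the analysis returns True for a dimension, that index could be dropped from the load's boundaryCheck attribute; for %k_50 this would turn array<i32: 1, 2> into array<i32: 1>.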
