Closed
Feature
Description
We have implemented feature #5272 to collapse 3-dim loads on block pointers into 2-dim loads when the loaded tensor has an outermost dimension equal to 1. This feature complements #5272: the goal here is to remove unnecessary boundaryCheck indices on load operations. For example, consider the load in the loop below:
// offset_3: 0, 64, 128, ..., 960 , max(offset_3) = 960
%acc_32:3 = scf.for %offset_3 = %9 to %10 step %11 iter_args(%m_i_36 = %m_i_20, %l_i_37 = %l_i_21, %acc_38 = %acc_22) -> (tensor<512xf32>, tensor<512xf32>, tensor<512x64xf32>) : i32 {
%indices_3 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32>
%indices_3_39 = tt.splat %offset_3 : i32 -> tensor<64xi32>
%indices_3_40 = arith.addi %indices_3_39, %indices_3 : tensor<64xi32>
%mask_4 = arith.constant 1024 : i32
%mask_4_41 = arith.constant dense<1024> : tensor<64xi32>
%mask_4_42 = arith.cmpi slt, %indices_3_40, %mask_4_41 : tensor<64xi32>
%k = arith.constant 512 : i64
%k_43 = arith.constant 64 : i64
%k_44 = arith.constant 1024 : i64
%k_45 = arith.constant 65536 : i64
%k_46 = arith.constant 1 : i64
%k_47 = arith.constant 64 : i64
%k_48 = arith.constant 0 : i32
%k_49 = tt.make_tensor_ptr %k_view, [%k, %k_43, %k_44], [%k_45, %k_46, %k_47], [%offset_5, %k_48, %offset_3] {order = array<i32: 2, 0, 1>} : <tensor<1x64x64xf16>>
%k_50 = tt.load %k_49 {boundaryCheck = array<i32: 1, 2>, padding = 1 : i32} : !tt.ptr<tensor<1x64x64xf16>>
The boundaryCheck on dim 2 of the load (%k_50) is not necessary because:
- %k_49 is never modified (the pointer is not advanced in the loop)
- the range of the loop IV %offset_3 is [0, 64, 128, ..., 960]
- the boundary check expression max(%offset_3) + load_res.getType()[2] - 1 < shape_of_k_49[2] is always true (960 + 64 - 1 < 1024)
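
The three conditions above can be sketched as a small standalone check. This is only an illustration of the arithmetic, not the pass implementation: the function name and parameters are hypothetical, and the loop bounds (lower 0, upper 1024, step 64) are assumed from the comment on the loop, since the values of %9, %10, and %11 are not shown in the snippet. It also assumes the pointer is not advanced inside the loop, so the only varying offset is the induction variable.

```python
def boundary_check_is_redundant(lower, upper, step, block_extent, dim_extent):
    """Hypothetical helper: return True when a boundary check on one
    dimension can be dropped.

    lower, upper, step: bounds of the loop IV used as the offset
                        (assumed; scf.for iterates lower, lower+step, ... < upper)
    block_extent:       size of the loaded block along this dimension
    dim_extent:         size of the underlying tensor along this dimension
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if lower >= upper:
        # The loop body never executes, so no access can go out of bounds.
        return True
    # Largest value the induction variable takes inside the loop.
    trip_count = (upper - lower + step - 1) // step
    max_offset = lower + (trip_count - 1) * step
    # The last element touched is max_offset + block_extent - 1; it must lie
    # inside the tensor, and the first offset must not be negative.
    return lower >= 0 and max_offset + block_extent - 1 < dim_extent


# Dim 2 of %k_50: offsets 0, 64, ..., 960; block of 64; shape 1024.
print(boundary_check_is_redundant(0, 1024, 64, 64, 1024))
```

For the example load, the maximum offset is 960 and the last element accessed is 960 + 64 - 1 = 1023, which is below 1024, so the check on dim 2 would be reported as redundant.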