JIT Hardware Intrinsics low compiler throughput due to inefficient intrinsic identification algorithm during import phase #13617

@4creators

During the design phase of the JIT Hardware Intrinsics support, an inefficient, temporary algorithm was allowed for looking up intrinsic function names/ids. The naive search currently in use takes O(n) time for each imported intrinsic, whereas the same lookup can be done in O(1) with very low constant overhead. The problem hits particularly hard in functions that use many intrinsics, since the total search time is t = number_of_intrinsics_used * O(n). It is further exacerbated in applications that require fast startup: cloud lambda functions and command-line tools, e.g. PowerShell Core.

The function implementing the algorithm, which slipped into the release, is the following:

https://github.com/dotnet/coreclr/blob/6de88d4f5d291269f82e3dd1aa39cee026725dfe/src/jit/hwintrinsic.cpp#L186

and the algorithm used:

https://github.com/dotnet/coreclr/blob/6de88d4f5d291269f82e3dd1aa39cee026725dfe/src/jit/hwintrinsic.cpp#L210

Previously discussed solutions include using a hash table and/or a fast binary preselection, both of which are feasible because the set of search terms is fixed.
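To make the difference concrete, here is a minimal sketch contrasting a linear scan with a hash table lookup. It is not the JIT's code: `IntrinsicId`, `IntrinsicInfo`, `kTable`, `lookupLinear`, `lookupHashed` and the sample entries are all hypothetical names made up for illustration, and `std::unordered_map` merely stands in for whatever hash table the JIT would actually use.

```cpp
#include <cstring>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for illustration only; not the JIT's real types or table.
enum IntrinsicId { ID_ILLEGAL = 0, ID_SSE_ADD, ID_SSE_SUBTRACT, ID_AVX_ADD, ID_AVX2_SHUFFLE };

struct IntrinsicInfo
{
    IntrinsicId id;
    const char* isa;  // class name, e.g. "Sse"
    const char* name; // method name, e.g. "Add"
};

static const IntrinsicInfo kTable[] = {
    {ID_SSE_ADD, "Sse", "Add"},
    {ID_SSE_SUBTRACT, "Sse", "Subtract"},
    {ID_AVX_ADD, "Avx", "Add"},
    {ID_AVX2_SHUFFLE, "Avx2", "Shuffle"},
};

// Baseline: O(n) string comparisons per lookup.
IntrinsicId lookupLinear(const char* isa, const char* name)
{
    for (const IntrinsicInfo& e : kTable)
    {
        if (std::strcmp(e.isa, isa) == 0 && std::strcmp(e.name, name) == 0)
        {
            return e.id;
        }
    }
    return ID_ILLEGAL;
}

// Builds a combined "Isa.Name" key for the hash table.
static std::string makeKey(const char* isa, const char* name)
{
    return std::string(isa) + "." + name;
}

// Index built once on first use; the table is fixed, so it never changes afterwards.
static const std::unordered_map<std::string, IntrinsicId>& intrinsicIndex()
{
    static const std::unordered_map<std::string, IntrinsicId> map = [] {
        std::unordered_map<std::string, IntrinsicId> m;
        for (const IntrinsicInfo& e : kTable)
        {
            m.emplace(makeKey(e.isa, e.name), e.id);
        }
        return m;
    }();
    return map;
}

// Average O(1) lookup per imported intrinsic.
IntrinsicId lookupHashed(const char* isa, const char* name)
{
    const auto& m  = intrinsicIndex();
    auto        it = m.find(makeKey(isa, name));
    return (it == m.end()) ? ID_ILLEGAL : it->second;
}
```

Both functions return the same id for the same input; only the per-lookup cost differs. The `std::string` key allocates on every call, so a real implementation would more likely hash the existing `const char*` names directly or use a hash function precomputed for the fixed key set.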

@AndyAyersMS @CarolEidt @fiigii @tannergooding

category:implementation
theme:vector-codegen
skill-level:beginner
cost:small
impact:medium

Labels:
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
in-pr (There is an active PR which will close this issue when it is merged)
tenet-performance (Performance related issue)