Optimize pixel blending with integer arithmetic #1433

@antonfirsov

Problem

The current float- and Vector4-based pixel blender API does not leave us much room for the rasterization perf improvements needed for SixLabors/ImageSharp.Drawing#102. With our current bulk API, the most we can do is process 2 pixels in one AVX batch, since only 8 floats (2 pixels × 4 channels) fit into one AVX register. This means the expected speedup for blending is at most about 2x. With this, we would continue to lag significantly behind Skia and GDI.

Idea

We should explore APIs and implementations that work with UInt16-based fixed-point arithmetic. This is technically very similar to the approach taken by the libjpeg decoder SIMD pipelines, which we eventually also want to adopt. In theory, UInt16-based bulk processing should reduce the time spent in pixel blenders by ~4x (or more) when AVX2 is present.

This will require API additions similar to the following:

public abstract class PixelBlender<TPixel> 
{
    public void Blend<TPixelSrc>(
            Configuration configuration,
            Span<TPixel> destination,
            ReadOnlySpan<TPixel> background,
            ReadOnlySpan<TPixelSrc> source,
            // 'amount' is scaled to 0-255. Could be byte, but with UInt16 we will avoid some unnecessary conversions
            ReadOnlySpan<UInt16> amount); 

    protected virtual void BlendFunction(
            Configuration configuration,
            Span<Rgba32> destination,
            ReadOnlySpan<Rgba32> background,
            ReadOnlySpan<Rgba32> source,
            ReadOnlySpan<UInt16> amount);
}


public static class PorterDuffFunctions
{
    /*public*/ static void NormalSrcOver(Span<Rgba32> destination,
            ReadOnlySpan<Rgba32> background,
            ReadOnlySpan<Rgba32> source,
            ReadOnlySpan<UInt16> opacity);
}

Update:
In the first API variant there was a type ScaledUInt16Vector4, but after thinking it through, I realized it is unnecessary. We should work with Rgba32 to maximize perf.
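
To make the fixed-point idea concrete, here is a minimal scalar sketch of blending a single 8-bit channel where every intermediate value fits into a UInt16. The names `FixedPointBlendSketch`, `DivideBy255` and `LerpChannel` are hypothetical and not part of ImageSharp, and the formula is a plain linear interpolation by `amount`, not the full alpha-aware `NormalSrcOver`. The point is only that the weighted sum never exceeds 255 * 255 = 65025, so the same steps could in principle run on 16 UInt16 lanes (4 Rgba32 pixels) of an AVX2 register instead of 8 float lanes.

```csharp
using System;

public static class FixedPointBlendSketch
{
    // Rounded division by 255 using only adds and shifts.
    // Assumes 0 <= x <= 65025 (255 * 255); 'checked' throws if an
    // intermediate value ever overflowed 16 bits.
    public static ushort DivideBy255(ushort x)
    {
        ushort t = checked((ushort)(x + 128));
        ushort u = checked((ushort)(t + (t >> 8)));
        return (ushort)(u >> 8);
    }

    // Linear interpolation of one channel; 'amount' is scaled to 0-255.
    public static byte LerpChannel(byte background, byte source, ushort amount)
    {
        if (amount > 255)
        {
            throw new ArgumentOutOfRangeException(nameof(amount));
        }

        // The two weights sum to 255, so the result is at most 255 * 255.
        ushort weighted = checked((ushort)((source * amount) + (background * (255 - amount))));
        return (byte)DivideBy255(weighted);
    }
}
```

For example, `FixedPointBlendSketch.LerpChannel(100, 200, 128)` computes (200 * 128 + 100 * 127) / 255 with rounding. A SIMD version would apply the same multiply, add and shift steps per lane; how the `amount` values get broadcast across the four channels of each pixel is left open here.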
