Optimize pixel blending with integer arithmetic #1433

### Problem
The current `float`- and `Vector4`-based pixel blender API leaves little room for the rasterization performance improvements needed for SixLabors/ImageSharp.Drawing#102. With our current bulk API, the most we can do is process 2 pixels per AVX batch, since only 8 `float`s fit into one AVX register. This means the expected speedup for blending is around 2x or less. With that, we would keep lagging significantly behind Skia and GDI.
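The lane arithmetic above can be checked with a tiny sketch. It reads `Vector256<T>.Count` from `System.Runtime.Intrinsics` (a fixed lane count that does not depend on AVX2 being available, so it runs anywhere) and assumes one lane per channel, i.e. 4 lanes per pixel, with 8-bit channels widened to 16 bits when `UInt16` lanes are used:

```C#
using System;
using System.Runtime.Intrinsics;

// Lanes per 256-bit register for each element type.
int floatLanes = Vector256<float>.Count;   // 8
int uint16Lanes = Vector256<ushort>.Count; // 16

// Assumption: one lane per channel (R, G, B, A), i.e. 4 lanes per pixel.
const int lanesPerPixel = 4;

int pixelsPerFloatBatch = floatLanes / lanesPerPixel;   // 2
int pixelsPerUInt16Batch = uint16Lanes / lanesPerPixel; // 4

// Prints: float: 2 px, UInt16: 4 px per 256-bit register
Console.WriteLine($"float: {pixelsPerFloatBatch} px, UInt16: {pixelsPerUInt16Batch} px per 256-bit register");
```

Lane count alone accounts for a 2x gain per batch; any speedup beyond that would have to come from other savings.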

### Idea
We should explore APIs and implementations working with `UInt16`-based fixed-point arithmetic. This is technically very similar to the approach taken by the libjpeg decoder SIMD pipelines, which we eventually also want to adapt. In theory, `UInt16`-based bulk processing should reduce the time spent in pixel blenders by ~4x (or more) when AVX2 is present.
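The core building block of such fixed-point blending is multiplying two 8-bit channel values as fractions of 255 without a division. The sketch below uses the well-known add-and-shift trick (the same one pixman uses for its 8-bit multiply); `MulDiv255` is a hypothetical helper name, and the point is that every intermediate value stays below 65536, so the same steps would fit in `UInt16` SIMD lanes:

```C#
using System;

// Fixed-point multiply of two 8-bit channel values treated as fractions of 255.
// Returns round(a * b / 255) for a, b in 0..255 using one multiply, adds and shifts.
// Worst case: 255 * 255 + 128 = 65153, and 65153 + (65153 >> 8) = 65407,
// so every intermediate fits in 16 bits.
static ushort MulDiv255(ushort a, ushort b)
{
    int t = a * b + 128;                  // product plus rounding bias
    return (ushort)((t + (t >> 8)) >> 8); // equals round(a * b / 255) for 8-bit inputs
}

// Scale a source alpha of 200 by an 'amount' of 128 (about 50% opacity).
Console.WriteLine(MulDiv255(200, 128)); // 100
```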

This will require API additions similar to the following:
```C#
public abstract class PixelBlender<TPixel> 
{
    public void Blend<TPixelSrc>(
            Configuration configuration,
            Span<TPixel> destination,
            ReadOnlySpan<TPixel> background,
            ReadOnlySpan<TPixelSrc> source,
            // 'amount' is scaled to 0-255. Could be byte, but with UInt16 we will avoid some unnecessary conversions
            ReadOnlySpan<UInt16> amount); 

    protected virtual void BlendFunction(
            Configuration configuration,
            Span<Rgba32> destination,
            ReadOnlySpan<Rgba32> background,
            ReadOnlySpan<Rgba32> source,
            ReadOnlySpan<UInt16> amount);
}


public static class PorterDuffFunctions
{
    /*public*/ static void NormalSrcOver(Span<Rgba32> destination,
            ReadOnlySpan<Rgba32> background,
            ReadOnlySpan<Rgba32> source,
            ReadOnlySpan<UInt16> opacity);
}
```
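To illustrate what the span-based `NormalSrcOver` above could compute, here is a hypothetical scalar reference. To stay self-contained it works on raw interleaved R, G, B, A bytes instead of `Rgba32` (an `Rgba32` span can be reinterpreted that way with `MemoryMarshal.Cast<Rgba32, byte>`). It assumes straight (non-premultiplied) alpha and `amount` values in 0..255; it is a correctness baseline, not the proposed implementation:

```C#
using System;

// Hypothetical scalar reference for a fixed-point source-over blend.
// Pixels are interleaved R, G, B, A bytes with straight alpha;
// amount[i] is the per-pixel opacity scaled to 0..255.
static void NormalSrcOver(
    Span<byte> destination,
    ReadOnlySpan<byte> background,
    ReadOnlySpan<byte> source,
    ReadOnlySpan<ushort> amount)
{
    for (int i = 0; i < amount.Length; i++)
    {
        int o = i * 4;

        // Effective source alpha: source alpha scaled by amount, rounded (x * y / 255).
        int t = source[o + 3] * amount[i] + 128;
        int sa = (t + (t >> 8)) >> 8;
        int da = background[o + 3];

        // Background weight left after the source covers it, scaled by 255 * 255.
        int dw = da * (255 - sa);

        // Output alpha scaled by 255: sa * 255 + da * (255 - sa), at most 65025.
        int denom = sa * 255 + dw;
        if (denom == 0)
        {
            destination.Slice(o, 4).Clear(); // both inputs fully transparent
            continue;
        }

        // Un-premultiply with a rounded integer division per color channel.
        for (int c = 0; c < 3; c++)
        {
            int num = source[o + c] * sa * 255 + background[o + c] * dw;
            destination[o + c] = (byte)((num + denom / 2) / denom);
        }

        destination[o + 3] = (byte)((denom + 127) / 255); // round(denom / 255)
    }
}

var output = new byte[4];
NormalSrcOver(output,
    background: new byte[] { 0, 0, 255, 255 }, // opaque blue
    source: new byte[] { 255, 0, 0, 255 },     // opaque red
    amount: new ushort[] { 128 });             // about 50% opacity
Console.WriteLine(string.Join(", ", output));   // 128, 0, 127, 255
```

The per-channel division keeps the result exact but is not SIMD-friendly; a vectorized `UInt16` version would likely need to avoid or approximate it, for example by special-casing opaque backgrounds where `denom` is always 65025.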

**Update:** 
In the first API variant there was a type `ScaledUInt16Vector4`, but after thinking it through, I realized it is unnecessary. We should work with `Rgba32` to maximize perf.
