Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@grant-d
Copy link

@grant-d grant-d commented Jan 26, 2019

ExtractBit, InsertBit, ClearBit

[Edited]
This is intended to be the last PR in this series (LZCNT, LOG2, TZCNT, ROTL, POPCNT)
Previously, all the above changes were in this PR; I decided to split them into separate PRs to make CR easier.
This PR is now repurposed to focus on the final set of changes.

There are several implementations of ExtractBit, InsertBit, Clearbit in the stack.

  • This PR consolidates them into a central class
  • The actual implementations are mostly unchanged, ie same standard idioms from original callsites
  • All BitOps methods pass units

The benefits here are:

  • Less proliferation of identical helpers
  • Some optimizations, API consistency
  • Existing code/callsites are more self-documenting
  • New code can use these new pseudo-intrinsics
  • Dependency/profile analysis

All known callers only use byte, int or uint arguments.
We may decide to add long, ulong for consistency.

The know callers are mostly downstream (also see review comments):

cc @tannergooding, @jkotas

@grant-d grant-d changed the title BitOps analysis NOMERGE (WIP) [NO MERGE] BitOps analysis (WIP) Jan 26, 2019
- Change return type to byte
@grant-d
Copy link
Author

grant-d commented Jan 28, 2019

OK @tannergooding per your request I have shaken the tree: removed all unused code, and only left the consolidated methods. We can perf-tune those later on.
Beneath each region I have left the commented-out original call sites for validation.

Let me know if that's suitable, I can make any changes needed.

@tannergooding
Copy link
Member

CC. @GrabYourPitchforks, who I believe was interested in some of this work.

@jkotas
Copy link
Member

jkotas commented Feb 14, 2019

This is adding a bunch of dead methods that are not used from anywhere else in CoreLib (different from previous PRs).

We should have the API review discussion first whether these methods are worth adding to public surface. The value & (1 << bit) != 0 pattern is pretty idiomatic - it is unclear to me whether these methods are useful.

@grant-d
Copy link
Author

grant-d commented Feb 14, 2019

it is unclear to me whether these methods are useful

Makes sense, we can wait for api review.
I am curious - does the JIT turn such idioms into native instructions, eg bts ?

@grant-d
Copy link
Author

grant-d commented Feb 14, 2019

@jkotas would you mind reviewing the single remaining change in this PR, that was a missed optimization from the previous PR?
[Edit] Change moved to separate PR (#22612)

@tannergooding
Copy link
Member

The value & (1 << bit) != 0 pattern is pretty idiomatic - it is unclear to me whether these methods are useful.

I think they are likely useful in the scenario where they can be backed by intrinsics. Whether or not BT, BTS, BTR, and BTC are useful is probably what should be determined.

@jkotas
Copy link
Member

jkotas commented Feb 14, 2019

does the JIT turn such idioms into native instructions, eg bts ?

I do not think it does today. https://github.com/dotnet/coreclr/issues/11471#issue-227453324 tracks it (look for * bt{s|r|c} formation).

@jkotas
Copy link
Member

jkotas commented Feb 14, 2019

BT, BTS, BTR, and BTC are useful is probably what should be determined.

If they are useful, they should be handled as peephole optimization. All code out there is using the idiomatic pattern today, and we should not be asking all of it to be rewritten to do something else to run well.

@grant-d
Copy link
Author

grant-d commented Feb 14, 2019

OK, closing this PR pending API review

@grant-d grant-d closed this Feb 14, 2019
@grant-d grant-d deleted the grant-d.bits branch February 14, 2019 23:40
@tannergooding
Copy link
Member

If they are useful, they should be handled as peephole optimization. All code out there is using the idiomatic pattern today, and we should not be asking all of it to be rewritten to do something else to run well.

I was suggesting that it might be beneficial to both recognize the code pattern and have a standard API for it.

Much like Rotate, it is typical to create a "helper" method for these kind of operations (we've used InsertBit, SetBit, ClearBit, RemoveBit, GetBit, GetFlag, TestBit, FlipBit, and WriteBit ourselves in various dotnet repos). The helper method makes it very clear the intended semantics, at a glance.

@grant-d
Copy link
Author

grant-d commented Feb 14, 2019

The helper method makes it very clear the intended semantics

I agree with that: the callsites I updated in the previous PRs are more self documenting, especially where operator-heavy code is present, like inline rotls.

beneficial to both recognize the code pattern and have a standard API for it
Agree. So we may or may not update the legacy callsites, but either way new code can use the intrinsic.

This also opens the door to analysis - we can then directly measure how many callers use (say) ClearBit both in sourcecode and at runtime (profiling) and thus make a decision to JIT optimize it or not.

@jkotas
Copy link
Member

jkotas commented Feb 15, 2019

Much like Rotate

The difference is that the idiomatic pattern for Rotate is not a simple code. The idiomatic pattern for bit test is compact code (ie the method won't save you much typing).

@tannergooding
Copy link
Member

ie the method won't save you much typing

Right, but the point of the method isn't to save on typing. It is to provide clarity in the code.
var result = SetBit(data, 31) is generally more clear than either var result = data | 0x80000000 or var result = data | (1 << 31). I would guess this is also why we have created helper methods for the cases we are using such a pattern internally.

@jkotas
Copy link
Member

jkotas commented Feb 15, 2019

It is matter of personal preference. I do not think I have ever created/used SetBit and friends methods myself. data | (1 << 31) worked just fine for me.

There are ~500 occurrences of 1 << in CoreLib and even more in CoreFX. I do not see much use of SetBit and friends methods. This suggests that the direct use of this pattern is orders of magnitudes more common than folks wrapping it in helper methods.

@mikedn
Copy link

mikedn commented Feb 15, 2019

The JIT generates BT for a while now. It could generate BTS & co. from code like x | (1 << y) but these instructions have problematic perf characteristics - they're useful on Intels and not so much on AMDs. And the reason why they're useful on Intels is not that these instructions have great performance but rather that variable shift count instructions have bad performance. Also, their memory operand forms are unusable, such that or [mem], reg/imm is preferable.

As such, it's probably more useful to make the JIT use SHLX/SHRX/SARX instead of the traditional shift instructions when available, these are efficient on both AMD and Intel and cover more scenarios.

BTS & co. might still be useful for its "test and set" capability as that saves an additional instruction. But that's more difficult to recognize in the JIT. Perhaps an intrinsic would be useful, though still not extraordinarily valuable.

@grant-d
Copy link
Author

grant-d commented Feb 15, 2019

Here's an example where I personally believe the utility methods make the code more self-documenting and simpler to digest.
The old code is commented out. At a glance I can tell what the new code is doing, vs the old code where I have to read more carefully:

In System.Diagnostics.Tracing.SessionMask

        public bool this[int perEventSourceSessionId]
        {
            get
            {
                return BitOps.ExtractBit(m_mask, perEventSourceSessionId);
                //return (m_mask & (1 << perEventSourceSessionId)) != 0;
            }
            set
            {
                if (value)
                    m_mask = BitOps.InsertBit(m_mask, perEventSourceSessionId);
                    //m_mask |= ((uint)1 << perEventSourceSessionId);
                else
                    m_mask = BitOps.ClearBit(m_mask, perEventSourceSessionId);
                    //m_mask &= ~((uint)1 << perEventSourceSessionId);

                // Or, even better, a one-liner
                BitOps.WriteBit(ref m_mask, perEventSourceSessionId, value);
            }
        }

In XmlQueryType.cs (similar fix, except using the ref based signatures)

            public bool this[int index1, int index2]
            {
                get
                {
                    Debug.Assert(index1 < _bits.Length && index2 < _bits.Length, "Index out of range.");
                    return (_bits[index1] & ((ulong)1 << index2)) != 0;
                }
                set
                {
                    Debug.Assert(index1 < _bits.Length && index2 < _bits.Length, "Index out of range.");
                    if (value == true)
                    {
                        _bits[index1] |= (ulong)1 << index2;
                    }
                    else
                    {
                        _bits[index1] &= ~((ulong)1 << index2);
                    }
                }
            }

@grant-d
Copy link
Author

grant-d commented Feb 15, 2019

[Edited]

I would guess this is also why we have created helper methods for the cases we are using such a pattern internally

It's hard to figure out a narrow search phrase for all these methods, but here's some I came across:

In System.Globalization.TextInfo

private static bool IsWordSeparator(UnicodeCategory category) 
{
    return (c_wordSeparatorMask & (1 << (int) category)) != 0;
}

In XmlSubtreeReader.cs

private bool InAttributeActiveState
{
    get
    {
#if DEBUG
        Debug.Assert(0 == (AttributeActiveStates & (1 << (int)State.Initial)));
        Debug.Assert(0 != (AttributeActiveStates & (1 << (int)State.Interactive)));
        // elided 9 more, which ~might be easier understood as:
        Debug.Assert(BitOps.ExtractBit(AttributeActiveStates, (int)State.Interactive));
#endif
        return 0 != (AttributeActiveStates & (1 << (int)_state));
    }
}

In XmlSerializationReader.cs

private static bool IsTextualNode(XmlNodeType nodeType)
{
    return 0 != (s_isTextualNodeBitmap & (1 << (int)nodeType));
}

In OptimizerPatterns.cs

        public bool MatchesPattern(OptimizerPatternName pattern)
        {
            Debug.Assert(Enum.IsDefined(typeof(OptimizerPatternName), pattern));
            return (_patterns & (1 << (int)pattern)) != 0;
        }

In XmlNavigatorFilter.cs

        public override bool IsFiltered(XPathNavigator navigator)
        {
            return ((1 << (int)navigator.NodeType) & _mask) == 0;
        }

In DefaultBinder.CanConvert.cs

        private static bool CanPrimitiveWiden(Type source, Type target)
        {
            Primitives widerCodes = s_primitiveConversions[(int)(Type.GetTypeCode(source))];
            Primitives targetCode = (Primitives)(1 << (int)(Type.GetTypeCode(target)));

            return 0 != (widerCodes & targetCode);
        }

In Utf8JsonWriter.cs
(I think there were several like this, I didn't think to capture them earlier

        private void SetFlagToAddListSeparatorBeforeNextItem()
        {
            // Surprising this is not manually inlined, but it supports the point about helpers
            _currentDepth |= 1 << 31;
        }

2 more in Roslyn
3 more in IOT
3 meandering ones in WPF
2 more in AD

@jkotas
Copy link
Member

jkotas commented Feb 15, 2019

Several more in Roslyn
And more in IOT
Some meandering ones in WPF
Some more in AD

What's interesting about these examples is that there are no two helpers that are the same. Everybody customizes them for their usage pattern. Here are the variations from these 4 examples.

uint InsertBit(int position, uint bits)
uint RemoveBit(int position, uint bits)
void ClearBit(ref byte data, int bitNumber) // throws when the bit number is invalid
void SetBit(ref byte data, int bitNumber) // throws when the bit number is invalid
bool GetBit(byte data, int bitNumber) // throws when the bit number is invalid
bool? GetFlag(int bitMask, int bitToCheck)
void SetFlag(ref int bitMask, int bitToSet, bool value) // Interlocked operation
void SetBit(ref int flags, int mask) // Interlocked operation
void ClearBit(ref int value, uint bitmask)
void SetBit(ref int value, uint bitmask)

@grant-d
Copy link
Author

grant-d commented Feb 15, 2019

[Edited - also edited previous message, added more inline examples]
Interesting, thanks for the analysis.
So we would not be able to replace all the callsites:

  • 7 inline examples above - we could do 4 of these
  • 2 more in Roslyn - we could do these (and assert from their callsites)
  • 3 more in IOT - per your analysis, probably can't do these
  • 3 meandering ones in WPF - cannot do
  • 2 more in AD - we could do these

However, it does support there being a use-case for helpers - devs don't always want to inline the algebra.
The onus would be to find common-enough signatures that fits most use-cases.

For reference, here's the original PR signatures:

// Overloads exist for byte|uint|int

public static bool ExtractBit(byte value, int bitOffset); // Used in SessionMask example above

public static bool WriteBit(ref byte value, int bitOffset, bool on); // Used in SessionMask
public static byte WriteBit(byte value, int bitOffset, bool on);

public static bool InsertBit(ref byte value, int bitOffset);
public static byte InsertBit(byte value, int bitOffset); // Used in SessionMask

// ClearBit, ComplementBit similar

@jkotas
Copy link
Member

jkotas commented Feb 15, 2019

there being a use-case for helpers - devs don't always want to inline the algebra.

When majority of cases do inline the algebra, is it worth having an API for small minority of cases in the framework?

It feels similar to the situation with "throw helpers". Majority of code throws exceptions directly. Some folks like to have helper methods for throwing exceptions, but there is a lot of variety, styles and trade-offs. We had API suggestions for introducing a helper methods for this in the framework (like https://github.com/dotnet/corefx/issues/17068). None of them made it because of the use case is not significant enough, and it is easy enough for folks who likes to have these to have their own local copy that they like.

@grant-d
Copy link
Author

grant-d commented Feb 15, 2019

Fair point, I agree with you on throw helpers.
If we focus on ExtractBit or InsertBit then the idioms are simple enough that I see your point, but for something like the below methods, I think they self-document easier than the math:

// Devs would likely need to unit this one to make sure it's right
public static uint ComplementBit(uint value, int bitOffset)
    =>  ~(~(1u << bitOffset) ^ value); 

// Easy once you know the idiom but at a glance it certainly doesn't say `BTR`
public static uint ClearBit(uint value, int bitOffset)
    => value & ~(1u << bitOffset);

// True BTR behavior, ie returns original value
public static bool ClearBit(ref uint value, int bitOffset)
{
    uint mask = 1u << bitOffset;
    bool btr = (value & mask) != 0;
    value &= ~mask;
    return btr;
}

@NickStrupat
Copy link

@grant-d Hi Grant,

I'm hunting for the latest version of your BitOps.cs with all the goodies in it. Could you be so kind as to point me to it?

@benaadams
Copy link
Member

benaadams commented Sep 11, 2019

@NickStrupat
Copy link

NickStrupat commented Sep 11, 2019

@benaadams ah yes, thanks! I saw that the other day, but I'm looking for the unapproved ones like Iff, Clear, Set, etc.

I'm working on a BitSpan which wraps Span<Byte> + starting bit offset and bit length.

@grant-d
Copy link
Author

grant-d commented Sep 11, 2019

This might also help:
https://github.com/k2workflow/Clay/blob/526271f8349be49ac0dbbd8be77750bbf950c4e2/src/SourceCode.Clay/Numerics/BitOps.cs
I can't vouch for correctness or perf on any of this, but it has (at least) a bunch of useful ideas.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants