This repository hosts the Berkeley SoftFloat port for float8 support for PULP.
This documentation needs to be expanded. Check README.html for the original SoftFloat documentation. There are only a few additions to what is there.
For now, running make in the root directory is all you need to do to build SoftFloat on a Linux x86 system. No optimizations (x86-64, SSE2 etc.) have been ported yet.
Two defines have been created that affect how float8 division is performed. Define at most one of the following:
- -DSOFTFLOAT_FAST_DIV16TO8 if your architecture supports reasonably fast uint16 by uint8 divisions.
- -DSOFTFLOAT_LUT_DIV8 to pull float8 division results directly from a LUT. This is most likely the fastest option since there are only 16 entries.
- None of the above to pull the reciprocal value of the second operand from a LUT, which is then multiplied with the first one.
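The three division strategies can be illustrated on the significands alone. The following is a minimal sketch, not the port's actual code. It assumes a float8 layout with 2 explicit mantissa bits (which would explain a 16-entry table as 4 x 4 mantissa pairs, but is not stated in this README), and the names significand, div_direct, div_full_lut and div_recip_lut are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration only: 2 explicit mantissa bits, so a
// significand with its implicit leading 1 is an integer in the range 4..7.
const unsigned kMantBits = 2;
const unsigned kMantCount = 1u << kMantBits;  // 4 possible mantissas

// Adds the implicit leading 1 to a stored mantissa.
std::uint8_t significand(std::uint8_t mant) {
    assert(mant < kMantCount);
    return static_cast<std::uint8_t>(kMantCount | mant);
}

// Strategy 1: a single uint16-by-uint8 integer division.
// The result is the significand quotient with 8 fractional bits.
std::uint16_t div_direct(std::uint8_t mant_a, std::uint8_t mant_b) {
    const std::uint16_t num =
        static_cast<std::uint16_t>(significand(mant_a) << 8);
    return static_cast<std::uint16_t>(num / significand(mant_b));
}

// Strategy 2: look the quotient up in a table with 4 * 4 = 16 entries.
std::uint16_t div_full_lut(std::uint8_t mant_a, std::uint8_t mant_b) {
    static std::uint16_t lut[16];
    static bool ready = false;
    if (!ready) {  // filled once here; a real table would be precomputed
        for (unsigned a = 0; a < kMantCount; ++a) {
            for (unsigned b = 0; b < kMantCount; ++b) {
                lut[a * kMantCount + b] =
                    div_direct(static_cast<std::uint8_t>(a),
                               static_cast<std::uint8_t>(b));
            }
        }
        ready = true;
    }
    assert(mant_a < kMantCount && mant_b < kMantCount);
    return lut[mant_a * kMantCount + mant_b];
}

// Strategy 3: look up the reciprocal of the divisor (10 fractional bits)
// and multiply it with the dividend's significand.
std::uint16_t div_recip_lut(std::uint8_t mant_a, std::uint8_t mant_b) {
    // floor(1024 / sig) for sig = 4, 5, 6, 7
    static const std::uint16_t recip[4] = {256, 204, 170, 146};
    assert(mant_b < kMantCount);
    const unsigned product = significand(mant_a) * recip[mant_b];
    return static_cast<std::uint16_t>(product >> 2);  // 8 fractional bits
}
```

In this sketch the first two variants always agree, while the reciprocal variant can be one unit lower in the 8-bit fraction because the stored reciprocal is truncated. It needs only a 4-entry table in exchange.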
In addition to all types and functions in vanilla SoftFloat, this port provides the following:
- The type float8_t for float8 numbers. Its actual size in memory is 8 bits.
- All functions in SoftFloat can also be performed on float8. Most functions have the prefix f8_.
Check softfloat.h for a complete listing of available functions.
In order to use SoftFloat, include softfloat.h from the source/include directory in your source file. When linking, provide softfloat.a from the build/<target> directory (after having built it there once).
A C++ wrapper is provided in order to make use of operator overloading to swap out types without modifying functional source code. It provides:
- A templated softfloat class wrapping the C types. It can be constructed
  - from the SoftFloat C types,
  - from C-native floating-point types (float, double, long double),
  - from another softfloat object (by explicitly or implicitly casting),
  - through explicit casting from a castable integer type.
- Overloaded casts from the softfloat class to
  - C-native floating-point types (float, double, long double),
  - SoftFloat C types. Thus, any C function in SoftFloat can be run on a softfloat object as well.
- The following types that are specializations of the softfloat class for convenience:
  - float8
  - float16
  - float32
  - float64
  - extFloat80
  - float128
- Overloaded functions for:
  - Arithmetic operators (+, -, *, /)
  - Relational operators (==, !=, >, <, >=, <=)
  - Compound assignment operators (+=, -=, *=, /=)
In order to use SoftFloat in C++, include softfloat.hpp from the source/include directory in your source file. When linking, provide softfloat.a from the build/<target> directory (after having built it there once).
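The idea of swapping types without touching functional code can be sketched with a small template. The example below is self-contained and uses only native types so that it compiles without the library. The function name dot is hypothetical. With softfloat.hpp included, the same template could presumably be instantiated with one of the wrapper types such as float16, since it relies only on construction from float, operator* and operator+=, but that is untested here.

```cpp
#include <cassert>
#include <cstddef>

// Written once against a generic numeric type T. It relies only on
// construction from float, operator*, and operator+=.
template <typename T>
T dot(const T* a, const T* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    T acc = T(0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Calling dot with float arrays and later with arrays of another numeric type changes only the element type at the call site, not the body of the function.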
A specific type is provided to enable the analysis of custom floating-point types. Flexfloat values have this format: sign (1 bit) + exponent (E bits) + mantissa (M bits). E and M determine the precision of the format. This format is compliant with IEEE formats, i.e., the encoding of exponents includes a bias and the mantissa representation assumes an implicit 1 bit.
- C interface (flexfloat.h): "flexfloat_t" type and related functions
- C++ interface (flexfloat.hpp): "flexfloat" template class and its methods
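To make the sign/exponent/mantissa layout concrete, here is a minimal sketch that decodes a raw bit pattern for given E and M into a double. It is not part of flexfloat.h, and the name decode_bits is hypothetical. The bias of 2^(E-1) - 1 and the handling of subnormals, infinities and NaN follow the usual IEEE 754 conventions, which is an assumption about how flexfloat treats those encodings.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decodes sign (1 bit) + exponent (exp_bits) + mantissa (man_bits),
// stored in the low bits of `bits`, into a double.
double decode_bits(std::uint64_t bits, unsigned exp_bits, unsigned man_bits) {
    // Limits chosen so that every decoded value fits into a double.
    assert(exp_bits >= 2 && exp_bits <= 11);
    assert(man_bits >= 1 && man_bits <= 52);

    const std::uint64_t man_mask = (std::uint64_t{1} << man_bits) - 1;
    const std::uint64_t exp_mask = (std::uint64_t{1} << exp_bits) - 1;
    const std::uint64_t man = bits & man_mask;
    const std::uint64_t exp = (bits >> man_bits) & exp_mask;
    const bool negative = ((bits >> (man_bits + exp_bits)) & 1u) != 0;

    // IEEE-style exponent bias: 2^(E-1) - 1.
    const int bias = (1 << (exp_bits - 1)) - 1;
    // Fractional part of the significand: man / 2^M.
    const double frac =
        std::ldexp(static_cast<double>(man), -static_cast<int>(man_bits));

    double magnitude;
    if (exp == exp_mask) {
        // All-ones exponent: infinity or NaN (assumed IEEE behaviour).
        magnitude = (man == 0) ? std::numeric_limits<double>::infinity()
                               : std::numeric_limits<double>::quiet_NaN();
    } else if (exp == 0) {
        // Zero exponent: zero or subnormal, no implicit 1 (assumed).
        magnitude = std::ldexp(frac, 1 - bias);
    } else {
        // Normal number: implicit leading 1 and biased exponent.
        magnitude = std::ldexp(1.0 + frac, static_cast<int>(exp) - bias);
    }
    return negative ? -magnitude : magnitude;
}
```

With E = 8 and M = 23 this reproduces the single-precision encoding, so the pattern 0x3F800000 decodes to 1.0.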
Known issues:
- The float8 remainder function, f8_rem, is currently not implemented and always returns positive zero.
- The overloaded relational C++ operators > and >= will return true on NaN inputs instead of false.
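Until the relational operators handle NaN correctly, callers can guard the comparison themselves. The helpers below are a possible workaround, not part of the library, and the names guarded_greater and guarded_greater_equal are hypothetical. They assume that the operand type can be cast to double and that this cast preserves NaN. They are shown here with native types so that the example is self-contained.

```cpp
#include <cassert>
#include <cmath>

// Returns true if either operand is NaN after conversion to double.
template <typename T>
bool either_is_nan(const T& a, const T& b) {
    return std::isnan(static_cast<double>(a)) ||
           std::isnan(static_cast<double>(b));
}

// a > b, but false whenever either operand is NaN.
template <typename T>
bool guarded_greater(const T& a, const T& b) {
    if (either_is_nan(a, b)) {
        return false;
    }
    return a > b;
}

// a >= b, but false whenever either operand is NaN.
template <typename T>
bool guarded_greater_equal(const T& a, const T& b) {
    if (either_is_nan(a, b)) {
        return false;
    }
    return a >= b;
}
```

For native floating-point types the guard is redundant, since their comparisons already return false on NaN. It only matters for types whose operators show the behaviour described above.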