
oprecomp / FloatX, Hacker News




NOTE: This project is under active development, and is not yet ready for use!

FloatX is a header-only C++ library which extends floating point types beyond the native single and double (and on some hardware half) precision types. It provides template types which allow the user to select the number of bits used for the exponent and significand parts of the floating point number. The idea of FloatX is based on the FlexFloat library, but, instead of implementing the functionality in C and providing C++ wrappers, FloatX is written completely in C++, which makes it more natural to the end user. In addition, FloatX provides a superset of FlexFloat's functionalities, and achieves higher performance.

This section lists the functionalities provided by FloatX. Functionalities that are also provided by FlexFloat have (flexfloat) appended to the description. In addition, functionalities that are planned, but are not yet implemented are also listed and have (TODO) appended.

  • header-only library, without a compiled component, with heavy inlining, resulting in relatively high performance

  • floatx<exp_bits, sig_bits, backend_float> class template, which allows emulation of non-native types with exp_bits exponent bits and sig_bits significand bits, using a natively supported backend_float type to perform arithmetic operations (flexfloat - provides similar functionality in the C++ wrapper, but the memory consumption of the flexfloat C++ class is suboptimal. Additionally, the only supported backend native types are double and softfloat float 72)

  • floatxr class template, which provides the same functionality as floatx, but allows the precision of the type to be changed at runtime. This class is easier to experiment with, but is not as efficient as floatx in either performance or memory consumption. (Its performance and memory consumption are comparable to those of the types provided by flexfloat.) (flexfloat - provides this in the C library, but not in the C++ wrapper)
  • conversions between builtin types and floatx (flexfloat - has a bug where NaN can be cast to Inf during conversion)
  • assignments on floatx and floatxr types (flexfloat)
  • relational operations on floatx and floatxr types (flexfloat - does not handle NaN properly)
  • relational operations between different types
  • arithmetic operations on floatx and floatxr types (flexfloat)
  • arithmetic operations between different types with implicit type promotion
  • std::ostream& operator<<(std::ostream&, floatx[r]) (flexfloat)
  • std::istream& operator>>(std::istream&, floatx[r])
  • conversion to std::bitset (flexfloat - can only print a bitwise representation)
  • conversion to std::string

  • optional operation counters (requires a compiled runtime library) (TODO)
  • automatic deduction of the smallest native type which can fit the requested number of exponent and significand bits (TODO)
  • optimized performance in case the requested type matches a natively supported type (flexfloat - only enabled for single, inconsistently with the rest of the library, e.g. flexfloat's equivalent of half uses 4x more memory than the equivalent of float) (TODO)
  • compressed storage which reduces the memory footprint (TODO)
  • rounding modes other than "round to zero" (TODO)
  • CUDA support (TODO - should already work, but not tested)
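To make the idea of emulating a format with exp_bits exponent bits and sig_bits significand bits on a native backend more concrete, here is a minimal sketch that uses double as the backend. The function name truncate_to_format is hypothetical and is not part of FloatX; it does not show how FloatX is implemented, only what a value looks like after being squeezed into the smaller format. It rounds toward zero, in line with the rounding mode mentioned in the list above, and assumes an IEEE-style layout with an implicit leading bit.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, NOT part of FloatX: truncates (rounds toward zero) a
// double to a value representable in an IEEE-style format with `exp_bits`
// exponent bits and `sig_bits` stored significand bits.
double truncate_to_format(double x, int exp_bits, int sig_bits) {
    // The target must be a "subtype" of double (11 exponent, 52 significand bits).
    assert(exp_bits >= 2 && exp_bits <= 11 && sig_bits >= 1 && sig_bits <= 52);
    if (std::isnan(x) || std::isinf(x) || x == 0.0) return x;

    const int bias = (1 << (exp_bits - 1)) - 1;
    const int emax = bias;      // largest exponent of a normal number
    const int emin = 1 - bias;  // smallest exponent of a normal number

    int e = 0;
    std::frexp(x, &e);           // |x| = m * 2^e with 0.5 <= m < 1
    int exp = e - 1;             // |x| = 1.f * 2^exp
    if (exp < emin) exp = emin;  // subnormal range: the spacing stays fixed

    // Spacing between neighbouring representable values around x.
    const double quantum = std::ldexp(1.0, exp - sig_bits);
    const double r = std::trunc(x / quantum) * quantum;

    // Rounding toward zero saturates at the largest finite value
    // instead of overflowing to infinity.
    const double max_finite =
        std::ldexp(2.0 - std::ldexp(1.0, -sig_bits), emax);
    if (std::fabs(r) > max_finite) return std::copysign(max_finite, x);
    return r;
}
```

For example, with the half-precision layout (5 exponent bits, 10 significand bits), truncate_to_format(3.14159265358979, 5, 10) yields 3.140625, and any finite input larger than 65504 saturates to 65504.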

What FloatX is NOT

FloatX does not implement arbitrary floating point types. The only supported types are "subtypes" of those natively supported by the hardware. In case you need implementations of larger types, consider using the SoftFloat library. (With some effort, you should also be able to use SoftFloat types as a backend for FloatX.)
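The "subtype" restriction can be expressed as a compile-time check. The sketch below is an illustration only: fits_in_backend, smallest_backend and exponent_bits_for are hypothetical names that do not come from the FloatX API. It assumes IEEE 754 native types and derives their bit counts from std::numeric_limits. The second template also hints at how the smallest native type for a requested format could be chosen, limited here to float and double.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical helpers, NOT part of the FloatX API.

// Number of exponent bits implied by numeric_limits<T>::max_exponent
// (128 -> 8 for float, 1024 -> 11 for double).
constexpr int exponent_bits_for(int max_exponent) {
    return max_exponent > 0 ? 1 + exponent_bits_for(max_exponent / 2) : 0;
}

// True if a format with exp_bits/sig_bits is a "subtype" of Backend.
template <int exp_bits, int sig_bits, typename Backend>
struct fits_in_backend
    : std::integral_constant<
          bool,
          (std::numeric_limits<Backend>::is_iec559 &&
           exp_bits >= 1 && sig_bits >= 1 &&
           exp_bits <= exponent_bits_for(
                           std::numeric_limits<Backend>::max_exponent) &&
           // digits counts the implicit leading bit, which is not stored
           sig_bits <= std::numeric_limits<Backend>::digits - 1)> {};

// Picks float when the format fits, otherwise double.
template <int exp_bits, int sig_bits>
struct smallest_backend {
    static_assert(fits_in_backend<exp_bits, sig_bits, double>::value,
                  "requested format is not a subtype of double");
    using type = typename std::conditional<
        fits_in_backend<exp_bits, sig_bits, float>::value,
        float, double>::type;
};
```

Under these assumptions a half-like format (5, 10) resolves to float, while a format with 8 exponent bits and 30 significand bits needs double because float stores only 23 significand bits.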

FloatX emulates the types of custom precision, and, while trying to achieve as high performance as possible, it is not capable of magically delivering better performance than natively supported types. Thus, do not expect floatx types to outperform native types.

That being said, it is not likely that FloatX will be useful in production code. On the other hand, it can be handy in research projects which aim to study the effects of using different precisions.

To use the library, just make sure that floatx.hpp from the src/ folder is in your include path.

Alternatively, if you are using CMake, a CMakeLists.txt file is provided. You can download the repository into your project and use the following code to depend on the floatx target:

Building the examples / unit tests

A standard CMake command line sequence should do:

To run all the tests:

This will (hopefully) output a summary of the form:

To run only one of the tests (and see more detailed output):


Acknowledgment

