[Solved] a is a double, printf("%d", a); works differently in IA32 and IA32-64 [closed]

%d is for printing an int. Historically the d stood for “decimal”, in contrast with o for octal and x for hexadecimal. To print a double, use %e, %f or %g. Using the wrong format specifier causes undefined behaviour, which means anything may happen, including unexpected results. … Read more

[Solved] How do I truncate the significand of a floating point number to an arbitrary precision in Java? [duplicate]

Suppose x is the number whose precision you wish to reduce and bits is the number of significant bits you wish to retain. When bits is sufficiently large and the order of magnitude of x is sufficiently close to 0, then x * (1L << (bits - Math.getExponent(x))) will scale x so that the … Read more

[Solved] Problematic understanding of IEEE 754 [closed]

What is precision? It refers to how closely a binary floating point representation can represent a real value. Real values have infinite precision and infinite range. Digital values have finite range and precision. In practice a single-precision IEEE-754 value can represent real values to a precision of about 6 significant decimal figures, while double-precision is good for … Read more

[Solved] Blatant floating point error in C++ program

An 80-bit long double can store around 18 significant decimal digits without loss of precision (in MSVC, long double has the same 64-bit format as double, so it holds fewer). 1300010000000000000144.5700788999 has 32 significant decimal digits and cannot be stored exactly as long double. Read “Number of Digits Required For Round-Trip Conversions” for more details.