Real - Java floating point library for MIDP devices
Current version: 1.13
Copyright: 2003-2009 Roar Lauritzsen, roarl at users.sourceforge.net
Availability: The library is available under the GPL license, or for a
fee for commercial use.
Develop: See http://sourceforge.net/projects/real-java
Feedback: Comments, reports of use, bug reports, bug fixes,
improvements, performance tests and comparisons to similar libraries
are all welcome.
Javadoc: Read the full documentation here.
"Real.java" is a self-contained, single file Java floating
point emulation library for MIDP devices, such as a Java-enabled
cell-phone or PDA. The MIDP/CLDC 1.0 standard does not support the
basic floating point types "float" or "double", so when I set out to
program a scientific calculator for my cell-phone, I had to reinvent
floating point arithmetic from scratch, using only integer basic types.
I found it a very interesting and educational challenge.
Download
For a pure java version, use the following link: http://real-java.sourceforge.net/Real.java.
This file is produced from the file http://real-java.sourceforge.net/Real.jpp
using the C preprocessor like this:
cpp -C -P -DDO_INLINE -DPACKAGE=ral -o Real.java Real.jpp
If you want more human-readable Java code that is about 30% slower *),
don't define the DO_INLINE macro when preprocessing. The size of the
executable code won't change noticeably.
*) Most MIDP devices don't do just-in-time inlining, so inlining has to
be done with preprocessor macros.
Features
- Precise: The internal mantissa is 63-bit, which
corresponds to approximately 19 decimal digits of accuracy. All
operations are performed with maximum precision: basic operations
are all 63 bits precise and correctly rounded, and more advanced
functions have error bounds of a few ULPs (Units in the Last Place),
except the erfc function, which is difficult to get accurate
beyond 44 bits.
- Wide range: The internal exponent is 31-bit, which allows
for numbers up to 4.197·10^323228496.
- Fast: Attention has been given especially to execution
speed, while at the same time keeping the code as simple as possible
to reduce footprint. All advanced operations are implemented using
established and often improved algorithms with rapid convergence.
Performance is nevertheless several orders of magnitude
slower than normal integer performance, but adequate for quite
complex calculator and spreadsheet functions on a typical MIDP
device.
- Small objects: One "Real" object uses 13 bytes of memory,
plus normal "Object" overhead. Full-precision Reals can be packed
in byte arrays with 12 bytes per number. Lower-precision
formats allow saving "double" representations in longs and
"float" representations in ints.
- Small footprint: Obfuscated, the library compresses to about
15KB in a jar file. One important fact: A good obfuscator removes all
unused methods and constants, which leaves you with just the amount
of code that you actually need.
- Clean: The library is contained in a single file and a
single class (plus one inner class).
- Easy-to-use: To encourage object reuse and minimize
garbage production, the interface mimics x86 assembly instructions
with destination and source operands, e.g. the assembly instruction
for adding b to a, "add a,b", becomes "a.add(b);". In Java, this
corresponds to writing expressions using only the operators "+=",
"-=", "*=", etc, in this case "a+=b;".
- No garbage: No temporary values are allocated from the heap
by any of the library functions, and in programs using the library
most garbage production can be avoided by reuse of objects.
- Abnormal numbers: Implements infinities and NaN following
the IEEE754 logic.
- Functions: Exhaustive set of mathematical functions,
e.g. sqrt, sin, acosh, atan2, pow, gamma and erfc.
- Time arithmetic: Conversion to/from DH.MS format using
Gregorian calendar accurate since 1582.
- Random generator: Advanced, arguably cryptographically
strong pseudo-random generator.
- String output: Comprehensive string output formatting
including SCI/FIX/ENG formats and left/right justification, with
configurable separators, precision and maximum width. HEX/OCT/BIN
output works seamlessly with the other formatting options and produces
unambiguous two's-complement output of fractional numbers.
- Tested: Most features have been thoroughly used and tested
by a number of users in my Calc MIDP application. Furthermore, all
arithmetic functions have been compared and calibrated against
William Rossi's rossi.dfp.dfp at 40 decimal digits accuracy.
- Fully documented: The complete javadoc covers all public fields and
methods, supplemented wherever possible with equivalent "double" code,
calculation error bounds and execution time relative to one addition.
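The precision and range figures in the list above can be checked with a little arithmetic. The sketch below is not part of the library and uses ordinary double math. It assumes that the largest magnitude is about 2^(2^30); that is an inference from the quoted maximum, not something stated here. The class and method names are made up for illustration.

```java
final class RealLimitsSketch {
    // Decimal digits carried by a mantissa of the given width: bits * log10(2).
    static double decimalDigits(int mantissaBits) {
        return mantissaBits * Math.log10(2.0);
    }

    // Splits 2^binaryExponent into {leading decimal digits, decimal exponent}
    // without ever forming the number itself, which no double could hold.
    static double[] powerOfTwoInDecimal(long binaryExponent) {
        double log10 = binaryExponent * Math.log10(2.0);
        double decimalExponent = Math.floor(log10);
        double leading = Math.pow(10.0, log10 - decimalExponent);
        return new double[] { leading, decimalExponent };
    }

    public static void main(String[] args) {
        // Just under 19 decimal digits for a 63-bit mantissa.
        System.out.println(decimalDigits(63));
        // Assumed upper limit 2^(2^30), printed as leading digits and decimal exponent.
        double[] max = powerOfTwoInDecimal(1L << 30);
        System.out.println(max[0] + "e" + (long) max[1]);
    }
}
```

Under that assumption the result agrees with the quoted maximum to the four digits given.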
Speed comparison
The speed comparisons were performed on a Sony Ericsson T610. Other,
possibly contradictory, results may be found using other hardware.
38 times faster than William Rossi's rossi.dfp.dfp (dfp accuracy set
to a comparable 20 decimal digits).
13 times faster than Nick Henson's henson.midp.Float
(additionally, Real is about twice as accurate in mul/div).
Runtime comparisons using henson.midp.FloatTest on Sony Ericsson T610 (R3C)
Library | sin, ms per 100 calls | cos, ms per 100 calls | tan, ms per 100 calls | add, ms per 10000 calls | mul, ms per 10000 calls | div, ms per 10000 calls | sqrt, ms per 1000 calls | avg. relative runtime |
Real | 425 | 420 | 855 | 3110 | 2920 | 4820 | 1660 | « 1.0 » |
dfp | 23170 | 25220 | 50320 | 19460 | 26765 | 119280 | 83510 | 37.7 |
Float | 8780 | 5055 | 14515 | 11385 | 30440 | 42120 | 34595 | 13.3 |
Real 1.08 | 720 | 645 | 1400 | 3145 | 2940 | 4750 | 2735 | 1.4 |
Real 1.06 | 1475 | 1260 | 2915 | 3000 | 3475 | 24960 | 3300 | 2.7 |
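The last column appears to be the arithmetic mean of the seven per-benchmark ratios against the current Real. This is an observation from the numbers, not something the page states. The sketch below recomputes it from the table rows for dfp and Float; the class and method names are made up for illustration.

```java
final class RelativeRuntimeSketch {
    // Timings in ms, in table order: sin, cos, tan, add, mul, div, sqrt.
    static final double[] REAL  = { 425, 420, 855, 3110, 2920, 4820, 1660 };
    static final double[] DFP   = { 23170, 25220, 50320, 19460, 26765, 119280, 83510 };
    static final double[] FLOAT = { 8780, 5055, 14515, 11385, 30440, 42120, 34595 };

    // Mean of the per-benchmark ratios other[i] / baseline[i].
    static double averageRatio(double[] other, double[] baseline) {
        double sum = 0.0;
        for (int i = 0; i < baseline.length; i++) {
            sum += other[i] / baseline[i];
        }
        return sum / baseline.length;
    }

    public static void main(String[] args) {
        System.out.println("dfp:   " + averageRatio(DFP, REAL));
        System.out.println("Float: " + averageRatio(FLOAT, REAL));
    }
}
```

Rounded to one decimal, both values match the table. Note that a mean of ratios weights every benchmark equally, regardless of how long it takes in absolute terms.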
Sample program
public class Sample
{
    // Print out 2*PI in hexadecimal form
    public static void main(String [] args)
    {
        Real a = new Real("0.5");
        Real b = new Real();
        a.gamma(); // We all know that gamma(0.5) == sqrt(pi)
        a.sqr();
        b.assign(2);
        a.mul(b);
        Real.NumberFormat format = new Real.NumberFormat();
        format.base = 16;
        System.out.println(a.toString(format));
    }
}
Optimization tips
To keep numbers in a normalized state and to preserve the
integrity of abnormal numbers, modifying the inner representation
of a Real directly is discouraged. However, a number of tricks can be
performed with the functions that are already present.
For maximum performance, keep the following tips in mind while
programming:
- Use integers: Whenever integers are involved, do as much
calculation with the integers as possible (promote the integer to
long, if necessary), before introducing the integer in the
calculation. Additionally, using mul and div with integer arguments
is faster than doing "assign(int)" first and then mul/div.
- Use scalbn: Binary scale can be used instead of
multiplication and division when the factor is a power of
2. "scalbn" works with Reals as right or left-shift works with
integers, and is much faster than "mul" and "div".
- Use mul instead of div: It is better to pre-calculate a
reciprocal using "recip" outside a loop, and then use multiplication
instead of division inside the loop. A division is about as slow
as two multiplications.
- Use sqr: "sqr" is slightly faster than using "mul" to
multiply a number by itself.
- Use pre-calculated constants: The pre-calculated constants
are ready for use and may be more accurate than what you can
calculate yourself. See warning.
- Use functions with integer arguments: The most common
operators have been overloaded to take integer arguments. This can
reduce the number of temporary variables needed. For example,
instead of 'tmp.assign(20); a.add(tmp);', you could do 'a.add(20);'.
- Don't convert to String: Using Strings to store Reals is
inefficient. Converting the internal binary representation into a
decimal representation is time-consuming and may introduce
inaccuracies. It should only be used to present results to the human
eye. Once a string representation of a number is generated, it
should be cached for quick access, to avoid generating the string
over and over again.
- Don't convert from String: Assigning from Strings should
be avoided in time-critical code. Converting from a decimal string
representation to the internal binary representation is
time-consuming. It may be quicker to calculate the number from
pre-calculated constants and integers. For example, instead of
'a.assign("0.25");', you could do 'a.assign(Real.ONE); a.scalbn(-2);'.
- Avoid "new": Allocating temporary Real objects inside
loops should be avoided. For every "abandoned" object, a little
garbage is left in memory. When the garbage adds up, the garbage
collector is run to clean it up, and this causes the application to
stop for a short period of time.
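Two of the tips above ("Use scalbn" and "Use mul instead of div") have direct analogues in ordinary double arithmetic, which may make the idea easier to see. The sketch below uses only java.lang.Math and plain doubles. It is not library code, and the class and method names are made up for illustration.

```java
final class TipsInDoubleTerms {
    // "Use scalbn": scaling by a power of 2 only adjusts the exponent,
    // so no multiplication or division of mantissas is needed.
    static double quarter() {
        return Math.scalb(1.0, -2);            // same value as 1.0 / 4
    }

    // "Use mul instead of div": compute the reciprocal once, outside the loop.
    static void divideAll(double[] values, double divisor) {
        double reciprocal = 1.0 / divisor;     // the only division
        for (int i = 0; i < values.length; i++) {
            // May differ from values[i] / divisor in the last bit,
            // because the reciprocal itself is rounded.
            values[i] *= reciprocal;
        }
    }
}
```

With Real, the corresponding calls would be "scalbn" and "recip" followed by "mul", as named in the tips. The caveat about the last bit presumably applies there too, so the trade is speed against at most a small rounding difference.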
Class interface summary
All functions are declared void unless the return type is specified
Public fields:
long mantissa
int exponent
byte sign
Constructors/assignment:
Real() <== 0
Real(Real) <== Real
Real(int) <== int
Real(long) <== long
Real(String) <== "-1.234e56"
Real(String, int base) <== "-1.234e56" / "/FFF3.2e-10"
Real(int s, int e, long m) <== (-1)**s * 2**(e-62) * m
Real(byte[] data, int offset) <== data[offset]..data[offset+11]
assign(Real)
assign(int)
assign(long)
assign(String)
assign(String, int base)
assign(int s, int e, long m)
assign(byte[] data, int offset)
assignFloatBits(int) <== IEEE754 32-bits float format
assignDoubleBits(long) <== IEEE754 64-bits double format
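To make the (s, e, m) form concrete, the sketch below evaluates the formula (-1)**s * 2**(e-62) * m from the summary as a plain double. It takes the formula at face value; whether the exponent field stored inside a Real follows exactly this convention is not confirmed here. The class and method names are made up for illustration.

```java
final class RealValueSketch {
    // Evaluates (-1)^s * 2^(e-62) * m as a double. Only exact when the result
    // fits in double range and m has at most 53 significant bits.
    static double toDouble(int s, int e, long m) {
        double magnitude = Math.scalb((double) m, e - 62);
        return (s == 0) ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        System.out.println(toDouble(0, 0, 1L << 62));   // mantissa 2^62 with e = 0
        System.out.println(toDouble(1, 1, 1L << 62));   // same mantissa, negative, e = 1
        System.out.println(toDouble(0, -1, 3L << 61));  // mantissa 1.5 * 2^62, e = -1
    }
}
```

Read this way, a mantissa with its leading 1 in bit 62 represents a value in [1, 2) before the exponent is applied.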
Output:
String toString() ==> "-1.234e56"
String toString(int base) ==> "-1.234e56" / "03.FEe56"
String toString(NumberFormat) ==> e.g. "-1'234'567,8900"
int toInteger() ==> int
long toLong() ==> long
void toBytes(byte[] data, int offset) ==> data[offset]..data[offset+11]
int toFloatBits() ==> IEEE754 32-bits float format
long toDoubleBits() ==> IEEE754 64-bits double format
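The "double bits" conversions work on the raw IEEE754 bit pattern rather than on a Java double value. As a rough illustration of what such a conversion has to do, the sketch below splits a double into sign, unbiased exponent and a mantissa whose leading 1 sits in bit 62. That mantissa position is an assumption, chosen to be consistent with a 63-bit mantissa held in a long. This is not the library's code, the names are made up, and zero, subnormals, infinities and NaN are deliberately left out.

```java
final class DoubleBitsSketch {
    // Returns {sign, unbiased exponent, mantissa} for a normal double
    // (finite, non-zero, not subnormal). The mantissa is shifted so that
    // its leading 1 is bit 62.
    static long[] decompose(double value) {
        long bits = Double.doubleToLongBits(value);
        long sign = bits >>> 63;                          // 1 sign bit
        long exponent = ((bits >>> 52) & 0x7FF) - 1023;   // 11 exponent bits, bias 1023
        long fraction = bits & ((1L << 52) - 1);          // 52 explicit fraction bits
        long mantissa = (fraction | (1L << 52)) << 10;    // add implicit 1, move it to bit 62
        return new long[] { sign, exponent, mantissa };
    }
}
```

For example, decompose(1.0) yields sign 0, exponent 0 and mantissa 2^62. The lower 10 bits of the mantissa are always zero here, since a double carries fewer bits than a Real.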
(Error bounds are calculated using William Rossi's rossi.dfp.dfp at
40 decimal digits accuracy. Error bounds may increase when results
approach zero or infinity. ULP = Unit in the Last Place. An error
bound of ½ ULP means that the result is correctly rounded. Relative
execution time is the average from running on Sony Ericsson T610 (R3C)
and K700i, and Nokia 6230i.)
Columns: method : "explanation", approx. error bound (ULPs), execution time (rel. to add)
Binary operators:
x.add(Real y) : x+=y ½ ««« 1.0 »»»
sub(Real) : x-=y ½ 2.0
mul(Real) : x*=y ½ 1.3
div(Real) : x/=y ½ 2.6
rdiv(Real) : x=y/x ½ 3.1
mod(Real) : x=x-y*floor(x/y) 0 27
divf(Real) : x=floor(x/y) 0 22
and(Real) : x&=y 0 1.5
or(Real) : x|=y 0 1.6
xor(Real) : x^=y 0 1.5
bic(Real) : x&=~y 0 1.5
swap(Real) : tmp=x, x=y, y=tmp 0 0.5
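The explanations given for mod and divf describe floored division, which differs from Java's built-in % operator for operands of opposite sign. The sketch below writes out both formulas from the table for plain doubles. It is only an illustration of the arithmetic, not library code, and the class name is made up.

```java
final class FlooredDivisionSketch {
    // divf: x = floor(x / y)
    static double divf(double x, double y) {
        return Math.floor(x / y);
    }

    // mod: x = x - y * floor(x / y); a non-zero result takes the sign of y
    static double mod(double x, double y) {
        return x - y * Math.floor(x / y);
    }
}
```

For example, mod(-7, 3) gives 2 where -7.0 % 3.0 in Java gives -1, because % truncates the quotient toward zero instead of flooring it.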
Functions:
x.abs() : x=|x| 0 0.2
neg() : x=-x 0 0.2
sqr() : x=x*x ½ 1.1
recip() : x=1/x ½ 2.3
sqrt() : (square root) 1 19
rsqrt() : x=1/sqrt(x) 1 21
cbrt() : (cube root) 2 32
exp() : x=e**x 1 31
exp2() : x=2**x 1 27
exp10() : x=10**x 1 31
ln() 2 51
log2() 1 51
log10() 2 53
sin() 1 28
cos() 1 37
tan() 2 70
asin() 3 68
acos() 2 67
atan() 2 37
sinh() 2 67
cosh() 2 66
tanh() 2 70
asinh() 2 77
acosh() 2 75
atanh() 2 57
fact() : x=x! 15 8-190
gamma() : x=(x-1)! 100+ 190
erfc() : (compl error func) 2**19 80-4900
inverfc() : (inverse erfc) 2**19 240-5100
toDHMS() : H -> yyyymmddHH.MMSS ? 19
fromDHMS() : yyyymmddHH.MMSS -> H ? 19
time() : x=HH.MMSS 0 8.9
date() : x=yyyymmdd00 0 30
random() : random number [0,1) - 81
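The HH.MMSS part of the DH.MS format packs minutes and seconds into the decimal digits after the point. The sketch below shows that packing for a non-negative duration in whole seconds, using strings to avoid rounding issues. It ignores the yyyymmdd date part and fractional seconds, and it is not the algorithm used by toDHMS or fromDHMS; the names are made up for illustration.

```java
final class HmsSketch {
    // Packs whole seconds into H.MMSS digits, e.g. 1 h 30 min becomes "1.3000".
    static String toHMS(long totalSeconds) {
        long hours = totalSeconds / 3600;
        long minutes = (totalSeconds % 3600) / 60;
        long seconds = totalSeconds % 60;
        return hours + "." + twoDigits(minutes) + twoDigits(seconds);
    }

    // Inverse: reads H.MMSS digits back into whole seconds.
    static long fromHMS(String hms) {
        int dot = hms.indexOf('.');
        long hours = Long.parseLong(hms.substring(0, dot));
        long minutes = Long.parseLong(hms.substring(dot + 1, dot + 3));
        long seconds = Long.parseLong(hms.substring(dot + 3, dot + 5));
        return hours * 3600 + minutes * 60 + seconds;
    }

    private static String twoDigits(long v) {
        return (v < 10 ? "0" : "") + v;
    }
}
```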
Binary functions:
hypot(Real) : x=sqrt(x*x+y*y) 1 24
y.atan2(Real x) : y=atan(y/x) (-pi,pi] 2 48
pow(Real) : x**=y 2 110
pow(int) : x**=y ½ 84
nroot(Real) : x**=1/y 2 110
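The (-pi,pi] range noted for atan2 is what separates it from a plain atan(y/x), which cannot tell opposite quadrants apart. The sketch below shows the difference with java.lang.Math on doubles. It illustrates the mathematical convention only, and the class and method names are made up.

```java
final class Atan2Sketch {
    // Quadrant-aware angle of the point (x, y), in (-pi, pi].
    static double angle(double y, double x) {
        return Math.atan2(y, x);
    }

    // Naive version: the signs of x and y cancel in the quotient,
    // so the result is confined to [-pi/2, pi/2].
    static double naiveAngle(double y, double x) {
        return Math.atan(y / x);
    }
}
```

For the point (-1, 1), angle returns about 3*pi/4 while naiveAngle returns about -pi/4, which is the angle of the opposite point (1, -1).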
Integral values:
floor() 0 0.5
ceil() 0 1.8
round() 0 1.3
trunc() 0 1.2
frac() 0 1.2
Utility functions:
copysign(Real) : x=|x|*y/|y| 0 0.2
nextafter(Real) : x+=(y-x)*epsilon 0 0.8
scalbn(int) : x<<=y 0 0.3
normalize() : readjust mantissa ½ 0.7
lowPow10() 0 3.6
Make special values:
makeZero() 0 0.2
makeZero(int sign) 0 0.2
makeInfinity(int sign) 0 0.3
makeNan() 0 0.3
Comparisons:
boolean equalTo(Real) : x==y 1.0
boolean notEqualTo(Real) : x!=y 1.0
boolean lessThan(Real) : x<y 1.0
boolean lessEqual(Real) : x<=y 1.0
boolean greaterThan(Real) : x>y 1.0
boolean greaterEqual(Real) : x>=y 1.0
boolean absLessThan(Real) : |x|<|y| 0.5
Query state:
boolean isZero() 0.3
boolean isInfinity() 0.3
boolean isNan() 0.3
boolean isFinite() 0.3
boolean isFiniteNonZero() 0.3
boolean isNegative() 0.3
boolean isIntegral() 0.6
boolean isOdd() 0.6
Overloaded methods, integer arguments:
add(int) ½ 1.8
sub(int) ½ 2.4
mul(int) ½ 1.3
div(int) ½ 3.4
rdiv(int) ½ 3.9
boolean equalTo(int) 1.7
boolean notEqualTo(int) 1.7
boolean lessThan(int) 1.7
boolean lessEqual(int) 1.7
boolean greaterThan(int) 1.7
boolean greaterEqual(int) 1.7
Extended precision methods with 128-bit mantissa:
add128 2**-62 2.0
mul128 2**-60 3.1
recip128 2**-60 17
normalize128 2**-64 0.7
roundFrom128 ½ 1.0
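A 128-bit mantissa product has to be assembled from 64-bit pieces, since Java has no wider primitive integer type. The following is a minimal sketch of one common way to do that, schoolbook multiplication on 32-bit halves. It is not taken from Real.java and says nothing about how mul128 is actually implemented; the class and method names are made up for illustration.

```java
final class WideMultiplySketch {
    private static final long MASK32 = 0xFFFFFFFFL;

    // Multiplies two 64-bit values, treated as unsigned, and returns the
    // 128-bit product as {high 64 bits, low 64 bits}.
    static long[] multiply(long a, long b) {
        long aLo = a & MASK32, aHi = a >>> 32;
        long bLo = b & MASK32, bHi = b >>> 32;

        // Four 32x32 -> 64 bit partial products.
        long ll = aLo * bLo;
        long lh = aLo * bHi;
        long hl = aHi * bLo;
        long hh = aHi * bHi;

        // Middle column: at most three 32-bit values, so it cannot overflow.
        long mid = (ll >>> 32) + (lh & MASK32) + (hl & MASK32);

        long lo = (mid << 32) | (ll & MASK32);
        long hi = hh + (lh >>> 32) + (hl >>> 32) + (mid >>> 32);
        return new long[] { hi, lo };
    }
}
```

For two 63-bit mantissas with their leading 1 in bit 62, the product has its leading 1 in bit 124 or 125 of the 128-bit result, so a real implementation would still need a normalization step afterwards.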
Other methods:
assign(Real) 0 0.3
assign(int) 0 0.6
assign(long) 0 1.0
assign(String, 2) 0 54
assign(String, 8) 0 60
assign(String, 10) ½-1 100
assign(String, 16) 0 60
assign(int s, int e, long m) 0 0.3
assign(byte[] data, int offset) 0 1.2
assignFloatBits(int) 0 0.6
assignDoubleBits(long) 0 0.6
int toInteger() 0.6
long toLong() 0.5
String toString(2) 100
String toString(8) 110
String toString(10) 150
String toString(16) 120
void toBytes(byte[] data, int offset) 1.2
int toFloatBits() 0.7
long toDoubleBits() 0.7
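The error bounds in the tables above are expressed in ULPs. To show what such a figure means, the sketch below measures the error of a double against an exact decimal reference, in units of the double's own ULP. It works on doubles with java.math.BigDecimal as the reference and is not the procedure used to produce the table, which relied on rossi.dfp.dfp. The class and method names are made up for illustration.

```java
import java.math.BigDecimal;

final class UlpErrorSketch {
    // Error of 'approx' against an exact reference, in units of ulp(approx).
    // new BigDecimal(double) is exact, so the only rounding happens when the
    // small difference is converted back to double.
    static double ulpError(double approx, BigDecimal exact) {
        BigDecimal diff = new BigDecimal(approx).subtract(exact).abs();
        return diff.doubleValue() / Math.ulp(approx);
    }
}
```

A correctly rounded result, such as the double literal 0.1 measured against the decimal 0.1, comes out at no more than ½ ULP. Its neighbour Math.nextUp(0.1) comes out above 1 ULP.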
Constants:
ZERO = 0
ONE = 1
TWO = 2
THREE = 3
FIVE = 5
TEN = 10
HUNDRED = 100
HALF = 1/2
THIRD = 1/3
TENTH = 1/10
PERCENT = 1/100
SQRT2 = sqrt(2)
SQRT1_2 = sqrt(1/2)
PI2 = pi*2
PI = pi
PI_2 = pi/2
PI_4 = pi/4
PI_8 = pi/8
E = e
LN2 = ln(2)
LN10 = ln(10)
LOG2E = log2(e) = 1/ln(2)
LOG10E = log10(e) = 1/ln(10)
MAX = max non-infinite positive number = 4.197e323228496
MIN = min non-zero positive number = 2.383e-323228497
NAN = not a number
INF = infinity
INF_N = -infinity
ZERO_N = -0
ONE_N = -1