Dear list,
I just spent several months optimizing some modules of my application, and I would like to share some comments about this interesting experience. Thanks in advance for any comments or corrections, because I always have a lot to learn from you.
Before this experience, I never really understood anything about optimization because, whenever I tested it, (speed 3) generally made no difference.
Now I realize that this impression was completely wrong. The main module I optimized was a drawing module, so it uses many single-float geometric computations (I work on a 32-bit version of LW 5), and the result of the optimization was amazing: with floats, it is possible (in a test environment) to divide the computation time by 50 or even more, and to reduce memory consumption to nothing. In practice, the drawing process of my application now takes 40% of the time it took before (20% being the irreducible drawing itself) and less than 5% of the memory (before optimization, drawing the view completely could consume several MB). The gain is very noticeable, especially during scrolling, resizing, zooming and so on. Another observable result, however, is that the code is at least twice as long and more difficult to read. On the other hand, it is also safer and simpler to maintain because the types are explicit.
During the optimization, my goal was to reduce to zero the memory consumption of the low-level functions used by this module. Generally this was a good way to obtain a drastic reduction in time as well. (float 0) is essential, but it must be combined with (safety 0) to obtain the « optimized environment ». If every variable going into an operator is typed, the compiler uses the typed version of that operator (you can see this when you disassemble the function); the main gain in time is there. The main gain in space comes from avoiding the boxing and unboxing of numbers, so it is not a good idea to switch too often between optimized and non-optimized environments.
To store unboxed floats, the only solution is to use typed arrays (don't trust type declarations in classes or structures: they never store raw values). Passing arguments between functions (even local functions defined with flet) can become difficult because optimized floats are boxed at this point. The only solution is to pass typed arrays between functions (even single-element arrays, which act as a kind of « customized box »), and/or to use inlined functions (and/or macros).
So the type of each float variable should be declared if you plan to use it in a computation (even in loops). The type of typed arrays should be declared too. Numeric constants must be typed (I mean 1.0, not 1). Global variables must be typed with « the » at each use (there is no other solution, even if « the » is really not the best way to declare types); the declaration (defvar *foo* (the single-float 1.0)) is useless.
For instance:
(defvar *increment* 1.0)
NON OPTIMIZED
(defun test () (loop for f = 1.0 then (+ f *increment*) while (< f 20) sum f))
(time (test)) => 190.0, 0ms, 340 bytes. (for 1,000,000 operations the time is 2.8 sec)
OPTIMIZED
(defun test ()
  (declare (optimize (speed 3) (safety 0) (float 0)))
  (let ((fbox (load-time-value (make-array 1 :element-type 'single-float)))) ;could be a macro 'with-fbox'
    (declare (type (simple-array single-float 1) fbox))
    (loop for f of-type single-float = 1.0 then (+ f (the single-float *increment*))
          while (< f 20.0)
          sum f into sum of-type single-float
          finally (progn (setf (aref fbox 0) sum)
                         (return fbox)))))
(time (test)) => #(190.0), 0ms, 0 bytes. (for 1,000,000 operations the time is 0.04 sec: 70 times better)
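The « customized box » idea for passing floats between functions can be sketched as follows. This is only a minimal illustration in portable Common Lisp, not the code of the application: the names scale-into and sum-scaled are hypothetical, and the LW-specific (float 0) and (safety 0) qualities from the example above are deliberately left out so that the sketch stays safe to run anywhere.

```lisp
;; Hypothetical helper: writes its float result into a one-element typed
;; array instead of returning it, so the result need not be boxed.
(declaim (inline scale-into))
(defun scale-into (box x factor)
  "Store (* X FACTOR) in element 0 of BOX and return BOX."
  (declare (optimize (speed 3))
           (type (simple-array single-float (1)) box)
           (type single-float x factor))
  (setf (aref box 0) (* x factor))
  box)

;; Hypothetical caller: reads the intermediate result back out of the box.
(defun sum-scaled (values factor)
  "Return the sum of every element of VALUES multiplied by FACTOR."
  (declare (optimize (speed 3))
           (type (simple-array single-float (*)) values)
           (type single-float factor))
  (let ((box (make-array 1 :element-type 'single-float :initial-element 0.0))
        (total 0.0))
    (declare (type (simple-array single-float (1)) box)
             (type single-float total))
    (dotimes (i (length values) total)
      (scale-into box (aref values i) factor)
      (incf total (aref box 0)))))
```

Here the box is allocated once per call of sum-scaled; the load-time-value trick from the example above would remove even that allocation, at the price of sharing one box between all calls.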
Another source of acceleration (not as impressive as with floats, however) is with fixnums. The operators always have to check that a fixnum does not overflow the 29-bit range (becoming a bignum). If you have fixnum computations and know that they won't overflow, declaring the types and using (hcl:fixnum-safety 0) can add a significant acceleration.
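As a rough sketch of what such fixnum declarations look like (the function name sum-of-squares-below is hypothetical, and the code assumes that no intermediate value leaves the fixnum range): the LW-specific (hcl:fixnum-safety 0) quality is only mentioned in a comment so that the example remains portable.

```lisp
(defun sum-of-squares-below (n)
  "Return 0^2 + 1^2 + ... + (N-1)^2, assuming every intermediate value is a fixnum."
  ;; In LispWorks one would presumably add (hcl:fixnum-safety 0) to this
  ;; OPTIMIZE declaration; it is omitted here to keep the sketch portable.
  (declare (optimize (speed 3))
           (type fixnum n))
  (let ((acc 0))
    (declare (type fixnum acc))
    (dotimes (i n acc)
      ;; THE promises the compiler that each result is still a fixnum.
      (setf acc (the fixnum (+ acc (the fixnum (* i i))))))))
```

The promise made with « the » is only valid if the caller really keeps n small enough; with overflow checks removed, a wrong promise gives wrong results rather than a bignum.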
Another source of optimization is the use of typed arrays. For instance, imagine two arrays of length 4, one with the default element type T and the other with element type single-float.
- the first needs 8 bytes for the array + 16 bytes for the 4 pointers to the boxed floats + 32 bytes for the boxed floats themselves => 56 bytes
- the second needs 8 bytes for the array + 16 bytes for the 4 single-floats => 24 bytes
Note that if you read from the typed array in a non-optimized environment, each read consumes 8 bytes (the float has to be boxed), whereas reading an already boxed float from the T array consumes nothing. In an optimized environment it is more or less the opposite.
So the context, optimized or not, becomes very important when choosing the type of storage.
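The two kinds of array compared above can be created like this. It is a small sketch with hypothetical function names; the byte counts quoted above are those of 32-bit LW and are not something this code measures.

```lisp
(defun make-float-buffer (n)
  "Return a vector of N elements specialised for single-float, filled with 0.0."
  (make-array n :element-type 'single-float :initial-element 0.0))

(defun make-generic-buffer (n)
  "Return a general vector (element type T) of N elements, each holding 0.0."
  (make-array n :initial-element 0.0))

;; ARRAY-ELEMENT-TYPE shows what the implementation really stores:
;;   (array-element-type (make-generic-buffer 4)) is T;
;;   (array-element-type (make-float-buffer 4)) is SINGLE-FLOAT on
;;   implementations that provide specialised float vectors.
```

In optimized code the first kind would then be declared as (simple-array single-float (*)), as in the loop example earlier in this message.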
There is another benefit of typed arrays, but it is a hack and certainly not portable: in an optimized environment, you can read an array of single-floats as integers (the IEEE encoding), you can use an array of (unsigned-byte 32) as an array of (unsigned-byte 8) with 4 indexes instead of one, or even as a bit vector with 32 indexes instead of one... Simply pass the array to a function and declare it there as another (compatible) type. Inside the function the new type is used, while outside it is the real type that is used.
Finally, I am very happy with the result of this experience... but also troubled. After all this evolution (avoiding consing, using typed arrays, declaring the types of local variables, and so on), am I not in a kind of pseudo-C environment? Except that the code is less clear, the work takes at least twice as long, and the compiler is a lot less helpful to the programmer than in, let's say, Xcode 5? Naturally this is for a very specific module, but this module needs to talk to my Lisp model, so it can't be implemented in pure C. So I believe that tools for building optimized, high-performance code in Lisp more simply could be very, very useful and a fantastic evolution for the language.
Best regards
Denis
------------------------------------------------
Denis Pousseur
70 rue de Wansijn
1180 Bruxelles
+ 32 2 219 31 09
http://www.denispousseur.com
------------------------------------------------
Hi,
Is there a precise reference for which combinations of declares, the's, etc. actually achieve improvements in the LW compiler, and what improvements they achieve? The reference manual page on declare mentions that declaring types "removes type checking"; what effects do other declarations have?
Mark