\ runge4.seq Runge-Kutta ODE Solver for systems of ODEs \ \ Forth Scientific Library Algorithm #29 \ )runge_kutta4_init ( 'dsdt n -- ) \ Initialize to use function dsdt() for n equations, \ its stack diagram is: \ dsdt() ( 'u 'dudt -- ) ( F: t -- ) \ (the values in the array dudt get changed) \ \ runge_kutta4_integrate() ( 'u steps -- ) ( F: t dt -- t' ) \ Integrate the equations STEPS time steps of size DT, \ starting at time T and ending at time T'. U is the \ initial condition on input and the new state of the \ system on output. \ runge_kutta4_done ( -- ) \ Release previously allocated space. \ )rk4qc_init ( 'dsdt n 's -- ) ( F: maxstep eps -- ) \ Initialize to use function dsdt() for n equations. \ The initial function values are in s{. The output is also \ in s{. The result is computed with a 5th-order Runge-Kutta \ routine with adaptive step size control. The step size \ controller tries to keep the fractional error of any s{ \ component below eps. The maximum step size is limited to \ maxstep . \ \ rk4qc_done ( -- ) \ Release previously allocated space. \ \ rk4qc_step ( -- flag ) ( F: step t -- step' t' ) \ Do one Runge-Kutta step, using adaptive step size control. \ The flag is FALSE if the routine succeeds, else the step size \ has become too small. The current step size and time are one \ the fp stack and will be updated by the routine. \ This is an ANS Forth program requiring: \ 1. The Floating-Point word set \ 2. Uses words 'Private:', 'Public:' and 'Reset_Search_Order' \ to control visibility of internal code \ 3. The word 'v:' to define function vectors and the \ immediate 'defines' to set them. \ 4. The immediate words 'use(' and '&' to get function addresses \ 5. The words 'DARRAY' and '&!' to alias arrays. \ 6. Uses '}malloc' and '}free' to allocate and release memory \ for dynamic arrays ( 'DARRAY' ). \ 7. The compilation of the test code is controlled by the VALUE \ TEST-CODE? and the conditional compilation words in the \ Programming-Tools wordset. \ 8. To run the code the fp stack needs to be at least 5 deep. \ To run all examples you need 7 positions on the fp stack. \ \ (c) Copyright 1994 Everett F. Carter. Permission is granted \ by the author to use this software for any application provided \ this copyright notice is preserved. \ (The adaptive code was contributed by Marcel Hendrix) 0 [IF] Additional documentation (see Chapter 15. 'Integration of Ordinary Differential Equations', in 'Numerical Recipes in Pascal, The Art of Scientific Computing', by William H. Press, Brian P. Flannery, Saul A. Teukolsky and William T. Vetterling) Problems involving ordinary differential equations (ODEs) can always be reduced to the study of sets of first order differential equations. This is done by introducing new variables that are usually derivatives of each other and of the original variable. Example: 2 dy dy --- + q(x) -- = r(x) can be rewritten to 2 dx dx dy/dx = z(x); dz/dx = r(x) - q(x) z(x). The formula for traditional Euler integration is: y[n+1] = y[n] + h*f'(x[n], y[n]) This advances a solution from x[n] to x[n+1] = x[n] + h. It only uses information (the derivatives f') from the start of the interval. When a trial step to the midpoint of the interval is done we can do better: k1 = h*f'(x[n], y[n]) k2 = h*f'(x[n]+h/2, y[n]+k1/2) y[n+1] = y[n] + k2 + O(h^3) This is called the second-order Runge-Kutta or midpoint method (a method is called of order n when its error term is of order O(h^n+1)). By taking this idea a few steps further along we arrive at the fourth-order Runge-Kutta method: k1 = h*f'(x[n], y[n]) k2 = h*f'(x[n]+h/2, y[n]+k1/2) k3 = h*f'(x[n]+h/2, y[n]+k2/2) k4 = h*f'(x[n]+h/2, y[n]+k3) y[n+1] = y[n] + k1/6 + k2/3 + k3/3 + k4/6 + O(h^5) You'll find that the following Forth code is an exact implementation of these 5 formulas. When is Runge-Kutta(4) useful? Of course only when the increased number of function evaluations (f') is compensated for by the fact that larger steps can be taken than for Euler or Runge-Kutta(2) integration. Usually this is the case. In general, higher-order RK methods are not interesting because the number of evaluations increases faster than the gain produced by the larger stepsize they allow. Press et al highly recommend RK(4) in combination with an adaptive stepsize algorithm ("Runge-Kutta is for ploughing the fields"). For smooth functions they recommend the Bulirsch-Stoer method ("high-strung racehorse"). You should be aware of the fact that a RK(4) routine _without_ adaptive stepsize control can be a waste of computer time. This is how you'd use it: 1. integrate the ODEs between point x1 and x2 with stepsize h. 2. integrate the ODEs between point x1 and x2 with stepsize h/2. 3. compare the results. If they're "sufficiently" equal, you're done, else goto 2. This approach is useful when you just want a low resolution function sketch. (most people would even leave out steps 2 and 3). The fixed grid is exactly what is needed for plotting, an adaptive stepsize would require the use of an interpolation algorithm after RK(4) finishes! [THEN] CR .( RUNGE4.SEQ V1.2 15 November 1994 EFC ) Private: v: dsdt() \ pointer to user function t, u, dudt FLOAT DARRAY dum{ \ scratch space FLOAT DARRAY dut{ FLOAT DARRAY ut{ FLOAT DARRAY dudt{ FVARIABLE h FLOAT DARRAY u{ \ pointer to user array 0 VALUE dim : F2/ 0.5E0 F* ; : F2* 2.0E0 F* ; Public: : )runge_kutta4_init ( &dsdt n -- ) TO dim defines dsdt() & dum{ dim }malloc malloc-fail? ABORT" runge_init failure (1) " & dut{ dim }malloc malloc-fail? ABORT" runge_init failure (2) " & ut{ dim }malloc malloc-fail? ABORT" runge_init failure (3) " & dudt{ dim }malloc malloc-fail? ABORT" runge_init failure (4) " ; : runge_kutta4_done ( -- ) & dum{ }free & dut{ }free & ut{ }free & dudt{ }free ; Private: : runge4_step ( -- ) ( F: t -- t' ) FDUP u{ dudt{ dsdt() h F@ F2/ dim 0 DO dudt{ I } F@ FOVER F* u{ I } F@ F+ ut{ I } F! LOOP FOVER F+ ut{ dut{ dsdt() h F@ F2/ dim 0 DO dut{ I } F@ FOVER F* u{ I } F@ F+ ut{ I } F! LOOP FOVER F+ ut{ dum{ dsdt() h F@ dim 0 DO dum{ I } F@ FOVER F* u{ I } F@ F+ ut{ I } F! dum{ I } DUP F@ dut{ I } F@ F+ F! LOOP F+ \ tos is now t+dt FDUP ut{ dut{ dsdt() h F@ 6.0E0 F/ dim 0 DO dudt{ I } F@ dut{ I } F@ F+ dum{ I } F@ F2* F+ FOVER F* u{ I } DUP F@ F+ F! LOOP FDROP ; Public: : runge_kutta4_integrate() ( &u steps -- ) ( F: t dt -- t' ) SWAP & u{ &! h F! 0 DO runge4_step LOOP ; 0 [IF] Adaptive Step Size Control For Runge-Kutta The technique we'll be using is step doubling. We take each step twice, once as a full step, then, independently, as two half steps. This creates only about 38% of overhead because we'll get the accuracy of the half step. A nice trick described by Press et al is that the results of the 3 operations can be combined in a way that gives us a truncation error of order 6, not 5 as for the standard Runge-Kutta(4) (This is where the fcor factor is needed for). Given the results of Ringe-Kutta with the half and the full steps a new step size is chosen: h0 = h1 * |wanted_accuracy / delta1|^0.2 Here h stands for step size and delta1 is the maximum error of any component of the output vector y, for the present step. If the step failed we know how to choose the new step size h0, if the step was ok, we use h0 as the initial step size for the next integration phase. It is difficult to interpret "wanted_accuracy" correctly. When the dependent variables differ enormously in amplitude you'd want to use fractional errors. With oscillatory functions passing through zero it would be better to set the wanted_accuracy to some part of the amplitude of these functions. Press et al recommend an additional array uscal{ to do all this. One of the arguments of the routine will be the vector of dependent variables at the beginning of the proposed step. Call that u{ 1..n }. Let us require the user to specify for each step another, corresponding, vector argument uscal{ 1..n }, and also an overall tolerance level eps. Then the desired accuracy for the ith equation will be wanted_accuracy = eps * uscal{ i } When constant fractional errors are desired you should plug y into the uscal{ calling slot. When constant absolute errors relative to some maximum values are needed you set uscal{ i } equal to those values. A useful trick for getting constant fractional errors except very near zero crossings is to set uscal{ i } equal to |u{ i }| + |h * dsdt{ i }| . We'll use that in the code below. Further refinements of the step size controller result in exponents being 0.2 or 0.25 and the use of a safety factor S of 0.9: h0 = Sh1 * |wanted_accuracy / delta1|^0.2 wanted_accuracy >= delta1 = Sh1 * |wanted_accuracy / delta1|^0.25 wanted_accuracy < delta1 [THEN] Private: 1E-30 FCONSTANT tiny -0.20E0 FCONSTANT pgrow -0.25E0 FCONSTANT pshrink 1e 15E0 F/ FCONSTANT fcor 0.9E0 FCONSTANT safety 4E0 safety F/ 1E0 pgrow F/ F** FCONSTANT errcon FVARIABLE eps FVARIABLE step FVARIABLE tstart FVARIABLE maxstep FLOAT DARRAY uorig{ FLOAT DARRAY u1{ FLOAT DARRAY u2{ FLOAT DARRAY uscal{ \ Find reasonable scaling values to decide when to shrink step size. : scale'm ( -- ) tstart F@ uorig{ uscal{ dsdt() dim 0 DO uscal{ I } DUP F@ step F@ F* FABS uorig{ I } F@ FABS F+ tiny F+ F! LOOP ; \ With a trick the result of a step can be made accurate to 5th order. : 4th->5th ( -- ) dim 0 DO \ get 5th order truncation error uorig{ I } DUP F@ FDUP u1{ I } F@ F- fcor F* F+ F! LOOP ; \ Test if the step size needs shrinking : shrink? ( -- bool ) ( F: -- diff ) 0.0E0 ( errmax ) dim 0 DO uorig{ I } F@ u1{ I } F@ F- uscal{ I } F@ F/ FABS FMAX LOOP eps F@ F/ FDUP 1e F> ; Public: \ Initialize to use function dsdt() for n equations. The initial function \ values are in s{. The output is also in s{. The result is computed with a \ 5th-order Runge-Kutta routine with adaptive step size control. The step size \ controller tries to keep the fractional error of any s{ component below eps. \ The maximum step size is limited to maxstep . : )rk4qc_init ( 'dsdt n 'u -- ) ( F: maxstep eps -- ) eps F! maxstep F! & uorig{ &! )runge_kutta4_init & u1{ dim }malloc malloc-fail? ABORT" )rk4qc_init :: malloc (1)" & u2{ dim }malloc malloc-fail? ABORT" )rk4qc_init :: malloc (2)" & uscal{ dim }malloc malloc-fail? ABORT" )rk4qc_init :: malloc (3)" ; \ Release previously allocated space. : rk4qc_done ( -- ) runge_kutta4_done & u1{ }free & u2{ }free & uscal{ }free ; \ Do one Runge-Kutta step, using adaptive step size control. The flag is \ FALSE if the routine succeeds, else the step size has become too small. \ The current step size and time are one the fp stack and will be updated \ by the routine. : rk4qc_step ( -- flag ) ( F: step t -- step' t' ) tstart F! step F! scale'm uorig{ u1{ dim }fcopy \ we need a fresh start after a shrink uorig{ u2{ dim }fcopy BEGIN uorig{ 2 tstart F@ step F@ F2/ runge_kutta4_integrate() FDROP u1{ 1 tstart F@ step F@ runge_kutta4_integrate() ( F: -- t' ) FDUP tstart F@ 0.0E0 F~ IF 0.0E0 FSWAP FALSE EXIT THEN shrink? \ maximum difference between these two tries WHILE \ too large, shrink step size FLN pshrink F* FEXP step F@ F* safety F* step F! FDROP u2{ uorig{ dim }fcopy \ a fresh start after a shrink... u2{ u1{ dim }fcopy REPEAT \ ok, grow step size for next time FDUP errcon F< IF FDROP step F@ 4e F* ELSE FLN pgrow F* FEXP step F@ F* safety F* THEN maxstep F@ FMIN \ but don't grow excessively! FSWAP TRUE 4th->5th ; Reset_Search_Order TEST-CODE? [IF] \ test code ========================================== 3 FLOAT ARRAY x{ : do_output ( n 'x -- ) ( F: t -- ) CR F. }fprint ; FVARIABLE _dt \ With this time increment the dampled vibration test 1.0E-2 _dt F! \ will be in error in the 4th decimal place. Smaller \ values will give more accurate results. : dt _dt F@ ; : dt! _dt F! ; 1.0E0 FATAN 8.0E0 F* FCONSTANT PI*2 1.92E0 FCONSTANT cm 960.0E0 FCONSTANT km \ A damped vibration test case: \ u'' + cm u' + km u = 0 \ or, \ u' = v \ v' = - cm v - km u : dvderivs() ( 'u 'dudt -- ) ( F: t -- ) FDROP \ does not use t OVER 1 } F@ DUP 0 } F! OVER 1 } F@ cm F* SWAP 0 } F@ km F* F+ FNEGATE 1 } F! ; : actual ( -- ) ( F: t -- a ) \ just the U value not V for damped vib. cm FOVER F* F2/ FNEGATE FEXP FSWAP cm cm F* 4.0E0 F/ FNEGATE km F+ FSQRT F* FCOS F* PI*2 F/ ; 1 VALUE /steps \ Number of integration steps \ done for a single output value \ One would expect to get better results for high values of /steps, and \ that is indeed the case. : dvib_test_twice ( n -- ) \ n is the number of time steps to run FRAME| a | use( dvderivs() 2 )runge_kutta4_init 2 0 DO 1E PI*2 F/ x{ 0 } F! cm FNEGATE F2/ PI*2 F/ x{ 1 } F! CR 0.0E0 \ initial time 2 x{ FDUP do_output ." : " FDUP actual F. a 0 DO x{ /steps dt /steps S>F F/ runge_kutta4_integrate() 2 x{ FDUP do_output ." : " FDUP actual F. LOOP FDROP I 0= IF CR ." With twice as many steps now ..." THEN /steps 2* TO /steps LOOP runge_kutta4_done 1 TO /steps |FRAME ; : dvib_test ( n -- ) \ n is the number of time steps to run 1.0E PI*2 F/ x{ 0 } F! cm FNEGATE F2/ PI*2 F/ x{ 1 } F! use( dvderivs() 2 )runge_kutta4_init CR 0.0E0 \ initial time 2 x{ FDUP do_output FDUP actual F. 0 DO x{ 1 dt runge_kutta4_integrate() 2 x{ FDUP do_output FDUP actual F. LOOP FDROP CR runge_kutta4_done ; 16.0E0 FCONSTANT sig 45.92E0 FCONSTANT r 4.0E0 FCONSTANT bp \ the Lorenz equations for chaos: \ dx/dt = sig * (y - x) \ dy/dt = r * x - y - x * z \ dz/dt = -bp * z + x * y : derivs() ( 'u 'dudt -- ) ( F: t -- ) FDROP \ does not use t OVER DUP 1 } F@ 0 } F@ F- sig F* DUP 0 } F! OVER DUP DUP 2 } F@ FNEGATE r F+ 0 } F@ F* 1 } F@ F- DUP 1 } F! SWAP DUP DUP 0 } F@ 1 } F@ F* 2 } F@ bp F* F- 2 } F! ; : lorenz_test ( n -- ) \ n is the number of time steps to run 0.0E0 x{ 0 } F! 1.0E0 x{ 1 } F! 0.0E0 x{ 2 } F! use( derivs() 3 )runge_kutta4_init CR 0.0E0 \ initial time 3 x{ FDUP do_output 0 DO x{ 1 dt runge_kutta4_integrate() 3 x{ FDUP do_output LOOP FDROP CR runge_kutta4_done ; \ The RC discharge equation: dVc/dt = 1/RC (Vin-Vc) 100E-3 FCONSTANT tau ( R, C product is 100 ms) 10E FCONSTANT Vin ( charging voltage source is 10 Volts) : Cderiv() ( &u &dudt -- ) ( F: t -- ) FDROP Vin SWAP 0 } F@ F- tau F/ 0 } F! ; : Vc ( F: t -- v ) 1e FSWAP FNEGATE tau F/ FEXP F- Vin F* ; \ Example: 10 cap_test : cap_test ( n -- ) \ n is the number of time steps to run 0E x{ 0 } F! use( Cderiv() 1 )runge_kutta4_init CR 0.0E0 \ initial time 1 x{ FDUP do_output ." " FDUP Vc F. 0 DO x{ /steps dt /steps S>F F/ runge_kutta4_integrate() 1 x{ FDUP do_output ." " FDUP Vc F. LOOP FDROP CR runge_kutta4_done 1 TO /steps ; : .output1 ( F: time -- ) CR FDUP F. 1 x{ }fprint ." | " Vc F. ; \ The charging capacitor problem revisited. \ Example: 200e-3 50e-3 1e-3 cap_test2 : cap_test2 ( F: tend maxstep eps -- ) \ tend is the time to stop FROT FRAME| a | \ maxstep is time between outputs 0e x{ 0 } F! use( Cderiv() 1 x{ FOVER FSWAP )rk4qc_init 0e \ initial step and begin BEGIN FDUP .output1 rk4qc_step 0= FDUP a F> OR UNTIL .output1 FDROP rk4qc_done |FRAME ; : .output2 ( F: time -- ) CR FDUP F. 2 x{ }fprint ." | " actual F. ; \ The vibration problem again. \ Example: 800e-3 50e-3 1e-3 dvib_test2 : dvib_test2 ( F: tend maxstep eps -- ) \ n is the number of time steps to run FROT FRAME| a | \ maxstep is time between outputs 1E PI*2 F/ x{ 0 } F! cm FNEGATE F2/ PI*2 F/ x{ 1 } F! use( dvderivs() 2 x{ FOVER FSWAP )rk4qc_init 0e \ initial step and start time BEGIN FDUP .output2 rk4qc_step 0= FDUP a F> OR UNTIL .output2 FDROP rk4qc_done |FRAME ; [THEN]