How many ways can factorial be written on Scheme?

Evil languages claim that functional programming languages are "languages for writing factorials." Most often, this is what defines Haskell, but we will start with the functional language that strongly influenced Haskell, and a subset of the tools for functional programming in many other languages - the Scheme language. At least, map and for-each , filter and reduce , as well as apply and eval came to our favorite programming languages, if not from Scheme, then from there.

Consider some possible ways to write factorial calculations. At the same time you get a kind of ode to the Scheme programming language. I think this wonderful language deserves it.

I got 10 options for writing definitions of functions that can be reduced to 3 main methods of calculation: the traditional linear-recursive computational process, iteration, generation of a sequence of numbers, followed by convolution by multiplication. I propose to consider these options in more detail. Along the way, we will look at: tail recursion optimization, higher order functions and metaprogramming, deferred calculations, infinite lists, memoization, a way to create a static variable in Scheme, and hygienic macros.

For the experiments, the good old dialect of Scheme R5RS and the popular principle of the fine arts “minimum means - maximum impressions” were used.

All the examples in Scheme were prepared in the DrRacket 6.2 environment in the R5RS mode. Runtime measurements were performed in Guile 2.0 in the openSUSE Leap 15 OS environment.

For a start, you can take a recursive definition of factorial and simply rewrite the formula in Scheme:

 (define (factorial-classic n) (if (zero? n) 1 (* n (factorial-classic (- n 1)))))

It turned out the definition of a function (in terms of Scheme - a procedure, although in essence it is a function) for calculating factorial, which can be seen in countless programming guides, starting with the immortal book of H. Abelson and D. Sassman “Structure and interpretation of computer programs” .

You can read and understand this code like this: factorial $n$ there is $1$ if a $n = 0$ otherwise $n \ cdot (n-1)!$ . Thus, this code corresponds to the recursive definition of factorial adopted in mathematics. The only thing we do not check the identity $n$ nonnegative numbers.

Being recursive, the code above contains an obvious limitation on the value $n$ : recursive call data will accumulate on the stack while $n$ will not reach 0. This can cause stack overflow for large $n$ .

How can I remove this restriction? We need to optimize tail recursion: rewrite the code so that the recursive call becomes tail (that is, the last in the procedure). This will allow the Scheme interpreter to perform the optimization — replace the recursive calculation with an iterative one.

If you use the recommendations of the authors of the above-mentioned book, you can get the following:

 (define (factorial-classic-tco n) (define (iteration product counter) (if (> counter n) product (iteration (* product counter) (+ counter 1)))) (iteration 1 1))

This code is a common example, and, starting with the book "The Structure and Interpretation of Computer Programs," it is here that the optimization of tail recursion is usually explained.

It was a classic. But Scheme is flexibility itself, is it possible to write the same thing in a fundamentally different way? And, preferably, even shorter? For example, according to $n! = 1 \ cdot 2 \ cdot 3 \ cdot ~ \ cdots ~ \ cdot n$ form a sequence from $1$ before $n$ (or from $n$ before $1$ ) and then roll it up by multiplication? Fortunately, in Scheme, this is quite simple due to the built-in procedure apply , which applies a procedure with an arbitrary number of arguments to the list:

 (define (iota n) (define (iteration sequence i) (if (> in) sequence (iteration (cons i sequence) (+ i 1)))) (iteration '() 1)) (define (factorial-fold n) (apply * (iota n)))

Scheme is renowned for its convenience for symbolic computations due to the “unity of code and data” (as is sometimes said about languages of the Lisp family). We use this feature: we will form an expression to calculate the factorial of a number $n$ and then we calculate it:

 (define (factorial-eval n) (define expression `(* ,@(iota n))) (eval expression (interaction-environment)))

The symbol "reverse single quote" means quasiquotation. Without quasi-quoting, an expression for the subsequent calculation could be obtained using the code (cons '* (iota n)) . A single quote (quotation, quotation) means that * it is necessary to substitute the expression as a name (symbol), and not the corresponding value (here - the procedure). So, with $n = 3$ we get (* 3 2 1) . This list is an expression in the Scheme language. Its value can be performed in a suitable environment, best of all in the (interaction-environment) , which contains the built-in procedures and procedures defined by us in the program. Actually, this is what we are doing in the body factorial-eval .

Scheme supports deferred calculations. Haskell, who was greatly influenced by Scheme, uses a lazy computation model, i.e. does not calculate the value of the expression until the result of these calculations is claimed. This allows you to have in the program such original data structures as endless lists. If we take from them only the part necessary for further calculations, the program will not loop:

 ghci> take 4 [1 ..] [1,2,3,4]

The expression [1 ..] generates an infinite list of integers. The take 4 expression gets the 4 first elements from this list. Since the subsequent elements of the list remain unclaimed, they are not calculated.

On haskell getting $n!$ from the endless list you can write this:

 factorials :: [Integer] factorials = next 0 1 where next n fact = let n' = n + 1 in fact : next n' (fact * n') factorial :: Integer -> Integer factorial n = factorials !! fromIntegral n

 ghci> take 7 $ factorials [1,1,2,6,24,120,720] ghci> factorial 6 720

Using the Scheme delay / force form pair, let's try to do something similar. The delay keyword creates a promise to calculate the value of an expression. The force keyword instructs to perform these calculations, the resulting value is calculated and stored. When re-applying new calculations is not performed, and returns the value calculated previously.

In languages of the Lisp family, lists are built from pairs. In order to construct infinite lists, we introduce the type of “lazy pair” - a pair in which the first element is the calculated value, and the second is the promise to calculate the value. To do this, we need to supplement the “holy trinity” of Lisp family languages ( cons , car , cdr ) with their lazy versions:

 (define-syntax lazy-cons (syntax-rules () ((_ first second) (cons first (delay second))))) (define lazy-car car) (define (lazy-cdr lazy-pair) (force (cdr lazy-pair)))

The designer of a lazy pair of lazy-cons implemented as a macro. This is done in order to avoid calculating the value of the second element of the pair when it is created.

The idea is to create an endless list of values, and then take from it the one you need. To do this, we define a lazy version of the procedure for obtaining an element by index:

 (define (lazy-list-ref lazy-list index) (if (zero? index) (lazy-car lazy-list) (lazy-list-ref (lazy-cdr lazy-list) (- index 1)))) (define (generate-factorials) (define (next nn!) (define n+1 (+ n 1)) (lazy-cons n! (next n+1 (* n! n+1)))) (next 0 1))

Here n! and n+1 are the variable names. In Scheme, in comparison with other languages, there are very few characters that cannot be used in identifiers.

Notice that the generator of the endless list generate-factorials contains no exit from recursion. However, it will not get hung up, because when it is called, only the head of the list will be calculated, the tail will be represented by the promise to calculate the value.

Now you can define $n!$ like getting $n$ -th element of the list of factorials:

 (define lazy-factorials (generate-factorials)) (define (factorial-lazy n) (lazy-list-ref lazy-factorials n))

It works. At the same time, if in the same session of the interpreter, factorials of different numbers are calculated, then the calculations will be faster than in strict versions, because some of the values in the lazy list will already be calculated.

By the way, the Scheme code is very close to that of Haskell. So, the receipt operator !! roughly corresponds to the lazy-list-ref procedure lazy-list-ref constructor : matches lazy-cons . It corresponds approximately because Haskell, although it professes a lazy model of calculations, however, unlike Scheme’s delay / force , does not remember the calculated values.

By the way, to improve performance, you can apply the memorization of already calculated values - memoization. We will store the calculated values in an associative list, in which the keys are numbers, and the values are their factorials. When calling, look at the list for the already calculated values. If the value is in the list, we will return this saved value. If there is no value in the list, we will calculate it, put it in the list, and only then return it. To ensure that this list is always with the function being called, and not in the global environment, we place it in a static variable:

 (define factorial-memoized (let ((memo '())) (lambda (n) (let ((memoized (assq n memo))) (if memoized (cadr memoized) (if (zero? n) 1 (let ((computed (* n (factorial-memoized (- n 1))))) (set! memo (cons (list n computed) memo)) computed)))))))

Static Variables in Scheme

View Code

 (define proc (let ((static-var initial-value)) (lambda args ...)))

is a Scheme method of creating a procedure with a static variable. The principle of such an announcement is conveniently explained in a shorter example - a procedure that returns the number of its calls:

 (define count (let ((n 0)) (lambda () (set! n (+ n 1)) n)))

In one interpreter session, the first call (count) returns 1, the second 2, the third 3, and so on. How it works?

Without syntactic sugar, the definition of count looks like this:

 (define count ((lambda (n) (lambda () (set! n (+ n 1)) n)) 0))

Thus, the name count is associated with a procedure with no arguments (lambda () (set! n (+ n 1)) n) , into which n is freely included. It turns out that n defined in the external environment with respect to (lambda () (set! n (+ n 1)) n) , which means that the value of n will be stored between calls to count . The value n initialized to zero when the program is started, since (lambda (n) ...) is applied to the argument 0. Therefore, n absent in the global environment, but is always available from count .

This implementation also promises performance gains when repeatedly calculating the factorials of different numbers in a single interpreter session.

Of course, tail recursion optimization is also possible here:

 (define factorial-memoized-tco (let ((memo '())) (lambda (n) (define (iteration product counter) (cond ((> counter n) product) (else (set! memo (cons (list counter product) memo)) (iteration (* product counter) (+ counter 1))))) (iteration 1 1))))

“Why are these dances with a tambourine?”, The reader may say. In imperative programming languages, the same thing is written simply - through a cycle, it works quickly and without extra memory. In Scheme there is a subset for imperative programming, there is in it a tool for organizing cycles - a special form do . The procedure for calculating factorial, written with its help, may look like this:

 (define (factorial-do n) (define product 1) (do ((i 1 (+ i 1))) ((> in) product) (set! product (* product i))))

The do construct is fairly versatile, and that is why it is not very readable. Wouldn't it be better to organize your own cycle in an imperative style? This will help macros:

 (define-syntax for (syntax-rules () ((_ (variable init test step) . body) (let loop ((variable init)) (if test (begin (begin . body) (loop step)))))))

By optimizing tail recursion by the interpreter, we get a cycle, to which we are used to imperative programming languages. By optimizing tail recursion, the stack will not grow.

Defining factorial using for :

 (define (factorial-for n) (define product 1) (for (i 1 (<= in) (+ i 1)) (set! product (* product i))) product)

How it works

In this example, the expression (for (i 1 (<= in) (+ i 1)) (set! product (* product i))) will be matched with the pattern (_ (variable init test step) . body) syntax rule. The following substitutions will be made:

 for → _ i → variable 1 → init (<= in) → test (+ i 1) → step (set! product (* product i)) → body

From here, the following code will be generated by the syntax rule template:

 (define (factorial-for n) (define product 1) (let loop ((i 1)) ; этот фрагмент (if (<= in) ; кода (begin (begin (set! product (* product i))) ; сформирован (loop (+ i 1))))) ; макросом for product)

There is another option that looks like an imperative for loop - with the for-each procedure:

 (define (factorial-for-each n) (define product 1) (for-each (lambda (i) (set! product (* product i))) (iota n)) product)

Great and powerful language Scheme! And what about performance?

To measure performance, we’ll use the GNU Guile — in this environment, you can measure the time it takes to calculate an expression.

Guile works as follows: it compiles the source code of the program into bytecode, which is then executed by the virtual machine. This is only one of the implementations and one of several possible ways of executing a program on Scheme, there are others: Racket (uses JIT compilation), Chicken Scheme (uses “fair” interpretation or compilation to subset C), etc. Obviously, both the limitations and the performance of programs in these environments may differ slightly.

We will take measurements at a certain value. $n$ . What should it be $n$ ? The one with which $n$ will be able to "cope" the proposed options. With the Guile 2.0 default settings, on a PC with an Intel Core i5 and 4 GB of RAM, I got the following:

Procedure	Problem
`factorial-classic`	stack overflow on $n> 10 \, 000$
`factorial-classic-tco`	not ( $n = 100 \, 000$ )
`factorial-fold`	stack overflow on $n> 10 \, 000$
`factorial-eval`	stack overflow on $n> 8 \, 000$
`factorial-lazy`	at $n = 100 \, 000$ uses swap partition and freezes
`factorial-memoized`	stack overflow on $n> 10000$ only the first time
`factorial-memoized-tco`	at $n> 1 \, 000$ uses swap partition and freezes
`factorial-do`	not ( $n = 100 \, 000$ )
`factorial-for`	not ( $n = 100 \, 000$ )
`factorial-for-each`	not ( $n = 100 \, 000$ )

Hence, performance tests were performed at $n = 8 \, 000$ . The results are presented in the table below, where $t_ {run}$ - lead time, $t_ {GC}$ - the time of the garbage collector in seconds.
For all procedures, except for the lazy and memoizirovannyh, the smallest values of the execution time and the corresponding time of the garbage collector, obtained on the basis of three launches at $n = 8 \, 000$ .
For memoizovannyh and lazy procedures indicated the time of the first call, then - the smaller of the three calls.

Procedure	$t_ {run}$ , with	$t_ {GC}$ , with	Notes
`factorial-classic`	0.051	0.034
`factorial-classic-tco`	0.055	0.041
`factorial-fold`	0.065	0.059
`factorial-eval`	0.070	0.040
`factorial-lazy`	0.076	0.036	first call
`factorial-lazy`	0,009	-	subsequent calls
`factorial-memoized`	0.077	0.041	first call
`factorial-memoized`	0,002	-	subsequent calls
`factorial-memoized-tco`	0.077	0.041	first call
`factorial-memoized-tco`	0,002	-	subsequent calls
`factorial-do`	0.052	0.025
`factorial-for`	0.059	0.044
`factorial-for-each`	0.066	0.042

We have 4 options that can work with large $n$ . With $n = 100 \, 000$ they have the following times for computing and garbage collection:

Procedure	$t_ {run}$ , with	$t_ {GC}$ , with
`factorial-classic-tco`	8,468	6,628
`factorial-do`	8,470	6,632
`factorial-for`	8,440	6,601
`factorial-for-each`	9,998	7,985

You can see that when not too large $n$ the first is the fastest and, at the same time, the shortest. The same variant most closely matches the mathematical definition of factorial. The variant with tail recursion optimization is not much inferior to it in performance. Both of these options are idiomatic, recommended by the authors of the language. The conclusion is largely obvious: unless otherwise specified, an approach that is “typical” for the language is preferable, at least for the first implementation of an algorithm or method.

At the same time, the Scheme language allowed us to write many options for implementing factorial computing, using a very limited set of primitives (the very “minimum means - maximum impressions”). Therefore, despite the venerable age and not very widespread, this language can still be recommended for research programming: it seems that it can be used in anything and in any way (and in what way) method.

Source: https://habr.com/ru/post/436698/

How many ways can factorial be written on Scheme?

More articles: