HELP MEMORY Robert Duncan, Oct 1989 Memory organisation in PML. CONTENTS - (Use g to access required sections) -- Introduction -- The Heap Area -- The Callstack Area -- The Garbage Collector -- The Memory Module -- Summary of Memory Variables and Functions -- Introduction ------------------------------------------------------- The memory used by PML is divided into two distinct sections: a HEAP area and a CALLSTACK area. The heap is used for the allocation of all data structures created by ML programs and by the ML compiler; the callstack is used to maintain control information and local variables for active functions and exception handlers. The two memory areas are discussed in more detail below. It should be noted however, that memory organisation is ultimately determined by the underlying POPLOG system, and as such is somewhat more complex than as described here. In order to satisfy the requirements of other POPLOG languages, each of the two memory areas so far mentioned is further subdivided: the heap into a "true" heap area plus a USERSTACK area, and the callstack into a "true" callstack plus a special PROLOG area. These secondary divisions are not relevant to ML programmers, and are only mentioned to forestall possible confusion arising from descriptions of memory organisation given in other POPLOG documentation. For the full story, see REF * SYSTEM. -- The Heap Area ------------------------------------------------------ The heap stores all the data structures created by ML programs. Every data construction operation claims a certain amount of space from the heap. As a program runs, repeated data construction gradually uses up the available space; once it is completely exhausted a garbage collector is invoked automatically to sweep the heap, reclaiming space which is no longer required and compacting the remaining useful data. Data construction can then proceed again. A large program may call the garbage collector many times during its execution, but this is normally invisible to the user. It can happen, however, that the amount of unused space reclaimed by the garbage collector is insufficient to allow any further data construction, at which point the program will halt with an error. To demonstrate this, try marking and loading the following program which builds a unit list of unbounded length: let fun rom l = rom (()::l) in rom [] end; Running this should provoke the error: MISHAP - rom: MEMORY LIMIT (popmemlim) EXCEEDED (using heap space) DOING : conspair rom .... The MISHAP keyword indicates that this is an error generated by the POPLOG system rather than by PML: such errors cannot be trapped within ML. The message ``MEMORY LIMIT EXCEEDED'' describes the cause of the error: the program requires more memory than is available. The DOING line shows which functions were active at the time the error occurred: here -rom- is obviously active, as is the function -conspair- which is trying to claim space for the list construction operation "::". The size of the heap is not usually fixed: each time the garbage collector is called it may increase or decrease the size of the heap depending on how long the garbage collection took. The size is bounded however by the two variables -Memory.lolim- and -Memory.hilim-, measured in machine words. Both of these are user-assignable: increasing the high limit allows the heap to grow larger, thus increasing the time between garbage collections. Since the time taken by garbage collection is proportional to the amount of storage in use rather than the amount available (see below), this can decrease the overall garbage collection time. However, the larger the heap, the greater the use of operating system resources (in paging, for example) which can often offset any gains made. Changing the low limit places a lower bound on the size of the heap; in particular, setting the low limit equal to the high limit fixes the heap at a constant size. -- The Callstack Area ------------------------------------------------- The PML callstack area behaves much the same as the stack area of a conventional language: every function call or exception handler allocates space on the stack for control information and local variables (a ``stack frame''). This space remains in use for the duration of the call, but can be reclaimed automatically once the call is complete. In general then, the size of the used portion of the callstack area is proportional to the depth of recursion in a program. However, an important optimisation in the compiler allows the stack frame to be reclaimed immediately prior to the last call in the function body. This means that programs which are ``tail recursive'' can run with no overall growth of stack. The size of the callstack area is bounded above by the variable -Memory.stacklim-, measured in machine words. This places a limit on the depth of recursion allowed in a program. Any program which violates this limit provokes an error. The following program is an example: let fun rle l = () :: rle l in rle [] end; This is not tail recursive, because the last call in the body is to the construction operator "::", not to -rle- itself. Running the program produces an error something like: MISHAP - rle: RECURSION LIMIT (pop_callstack_lim) EXCEEDED DOING : rle(*27282) .... Again, this is an error which cannot be handled within ML. Note how the DOING line shows the number of calls to -rle- which occurred before the recursion limit was reached; the exact number will vary between systems. In practice, it's rarely necessary to change the callstack limit as stack overflow is most often caused by a programming error. There is also an external limit on the callstack size imposed by the operating system, and this cannot be exceeded irrespective of how high the -stacklim- variable has been set. -- The Garbage Collector ---------------------------------------------- As mentioned above, the garbage collector is invoked automatically whenever the heap area is exhausted and more space is required. In the normal case it uses a fast``stop-and-copy'' algorithm: a new area of memory of the required size is grabbed from the operating system and all accessible data is copied from the existing heap space into the new space. The new space then becomes the new heap. In the course of copying, the copied data is compacted at the lower end of the new heap space. The garbage collector needs only trace and copy data which is currently in use; thus the time taken is proportional to the amount of useful data and is independent of the amount of garbage. Occasionally, if the heap area is particularly large or if the host operating system is otherwise overloaded, the extra memory space required for the copying is unavailable. In this case the garbage collector will automatically switch to a slower non-copying algorithm. To see what the garbage collector is doing, you can assign -true- to the variable -Memory.gctrace-. After this, the garbage collector will print statistics each time it is called. The output is as follows: GC-(X) MEM: u + f + s = where is the reason for the garbage collection, typically "auto" X is the algorithm used, "C" for copying, "N" for non-copying is the time taken in seconds + is the amount of space in use (heap + userstack) is the amount free is the current heap size You can force a garbage collection at any time with the function -Memory.gc-. In this case, the word printed in the trace output is "user". The variable -Memory.usage- gives the number of words of heap in use after the last garbage collection. There is no way of finding out the amount of space currently in use. Two other functions are provided which can help minimise the amount of time taken by garbage collections. The function -Memory.lock- places a ``barrier'' at the current endpoint of the heap: the garbage collector assumes that anything below the barrier is valuable and must be kept. This means that it need not be copied, thus reducing garbage collection time. This function is typically used to lock in code and data which will be used throughout the run of a program. You should always call the garbage collector immediately before locking the heap to avoid having garbage locked in too. The function -Memory.unlock- removes any heap barrier, opening up the whole heap to the garbage collector. Note that only one barrier can be active at any time: a second call to -Memory.lock- without an intervening unlock will simply move the barrier further up the heap. There is no stack of barriers maintained. -- The Memory Module -------------------------------------------------- signature Memory structure Memory : Memory The memory control variables and functions discussed above are defined in the structure -Memory-, described by the following signature: signature Memory = sig (* The Heap Area *) val hilim : int ref val lolim : int ref (* The Callstack Area *) val stacklim : int ref (* The Garbage Collector *) val gc : unit -> unit val gctrace : bool ref val usage : int val lock : unit -> unit val unlock : unit -> unit end; (* signature Memory *) -- Summary of Memory Variables and Functions -------------------------- val hilim : int ref An upper limit (in machine words) on the size to which the PML heap area is allowed to expand. The actual size of the heap fluctuates according to the proportion of time spent garbage collecting, but will not rise above this value. val lolim : int ref A lower limit (in machine words) on the size to which the PML heap area is allowed to contract. The actual size of the heap fluctuates according to the proportion of time spent garbage collecting, but will not fall below this value. val stacklim : int ref An upper limit (in machine words) on the size of the PML callstack area. This limits the depth of recursion possible in an evaluation. val gc () : unit Forces a garbage collection. val gctrace : bool ref Flag controlling the display of garbage collection statistics. Default value: false. val usage : int The number of words of heap space in use after the last garbage collection. val lock () : unit Places a ``barrier'' at the current heap endpoint: all data below this point is assumed to be non-garbage in subsequent garbage collections. val unlock () : unit Removes the barrier established by a previous -lock- call, opening up the whole heap to the garbage collector. --- C.all/pml/help/memory --- Copyright University of Sussex 1991. All rights reserved. ----------