Andrew Tolmach - Summary of Research Program

My research is about modern, higher-order, securely-typed programming languages: their design, how they can be executed efficiently, how to build programming support tools for them, and how to use them to solve interesting problems easily. There are good reasons to believe that such languages, currently epitomized by Standard ML and Haskell, can make programming significantly easier, faster, less repetitive, and more accurate. Yet there has been little enthusiasm for higher-order languages outside a small circle of academics. Beyond the usual problem of inertia, this is due to concerns (real and perceived) about time and space performance, difficulty in interoperating with other languages, and a lack of programming support tools. Moreover, there have been few realistic attempts to apply functional languages to real industrial or commercial problems. My research therefore focuses on attacking these difficulties, with an emphasis on practical implementation issues. I conduct research both at Portland State University and at the Oregon Graduate Institute, where I am a Joint Assistant Professor and a member of the Pacific Software Research Center.

Efficient Implementation of Polymorphism

Building reusable software in a strongly-typed language requires some form of support for polymorphism, which appears in many languages under many names--as ``generics'' in Ada, ``templates'' in C++, and as polymorphic functions and functors in ML. To implement polymorphic functions, a compiler must cope with the fact that different data types are represented differently at the machine level and handled differently by the runtime system. To solve this problem, most existing implementations of parametric polymorphism take one of two approaches: either data are stored in a uniform, tagged format that hides their differences in size and layout, or each source function is compiled into multiple versions, each specialized to operate on values of a particular type. The first approach degrades runtime speed and complicates both the integration of polymorphic code with code generated from other languages and the provision of effective debugging and tracing support. The second approach can greatly increase code size.
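To make the contrast concrete, the following sketch (in TypeScript, with illustrative names) shows the two conventional strategies applied to a polymorphic swap function:

```typescript
// Strategy 1: uniform, tagged representation. Every value is boxed into
// one universal type, so a single compiled body serves all instantiations,
// at the cost of tagging, untagging, and indirection at run time.
type Uni =
  | { tag: "int"; val: number }
  | { tag: "bool"; val: boolean }
  | { tag: "pair"; fst: Uni; snd: Uni };

function swapUni(v: Uni): Uni {
  if (v.tag === "pair") return { tag: "pair", fst: v.snd, snd: v.fst };
  return v; // non-pairs pass through unchanged
}

// Strategy 2: code specialization. The compiler emits one copy of swap per
// instantiation, each working directly on native, untagged representations.
function swapIntBool(p: [number, boolean]): [boolean, number] {
  return [p[1], p[0]];
}
function swapBoolInt(p: [boolean, number]): [number, boolean] {
  return [p[1], p[0]];
}
```

One body that works for every type, versus many bodies each tied to one type: the space of implementation trade-offs discussed above falls out directly from this choice.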

The aim of my research in this area is to develop a new approach to implementing polymorphism, in which types are explicitly passed as runtime arguments to polymorphic functions. Such implementations should make polymorphic code more powerful and address the problems of the conventional approaches. I used this approach to build the first ``tag-free'' garbage collector for ML (Tolmach, ``Tag-free Garbage Collection Using Explicit Type Parameters,'' 1994). I am currently improving the prototype and extending it to support other uses of runtime type information, such as accurate printing of polymorphic variables.
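The essence of the type-passing idea can be sketched as follows (this is an illustration, not the collector's actual implementation): a type is reified as a runtime descriptor, and a single generic routine interprets the descriptor to walk a value's layout, much as a tag-free collector uses type information to locate pointers without per-value tags.

```typescript
// Runtime type descriptors, passed explicitly to polymorphic operations.
type Rep =
  | { kind: "int" }
  | { kind: "bool" }
  | { kind: "pair"; fst: Rep; snd: Rep }
  | { kind: "list"; elem: Rep };

// A generic printer driven entirely by the explicit type argument: one
// compiled body handles every type, yet values need no embedded tags.
function printVal(rep: Rep, v: any): string {
  switch (rep.kind) {
    case "int":
    case "bool":
      return String(v);
    case "pair":
      return `(${printVal(rep.fst, v[0])}, ${printVal(rep.snd, v[1])})`;
    case "list":
      return `[${(v as any[]).map((x) => printVal(rep.elem, x)).join(", ")}]`;
  }
}
```

Accurate printing of polymorphic variables, mentioned above, is exactly this pattern: the caller supplies the descriptor that the printer interprets.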

This research is conducted at Portland State University under a three-year National Science Foundation grant, with the assistance of two MS students.

Interoperability and Simple Type-Based Compilation

However useful functional languages may be, many real-world applications need to take advantage of an existing base of ``legacy'' code written in older, imperative ``third-generation'' languages. Using functional languages in these applications requires a mechanism for integrating functional and imperative code cleanly and efficiently. In collaboration with Dino Oliva at the Oregon Graduate Institute, I have developed a new approach to interoperability that works by translating ML-like code into well-typed, portable, ``vanilla'' Ada83 or ANSI C code, and permits functional and imperative code to be integrated even within a single procedure. This work is reported in (Tolmach and Oliva, ``From ML to Ada: Strongly-typed Language Interoperability via Source Translation,'' 1997).

Optimizing compilers for functional languages are complex and hard to validate, so techniques for making them simpler and more trustworthy are worth developing. Keeping programs fully typed at every stage of the compilation process provides an important tool for increasing confidence in compiler correctness, and can also be used to improve the quality of generated code, but most compilers that try to do this end up with quite complicated type systems. In addition to supporting interoperability, the compiler described above is fully type-preserving but also keeps types simple; it achieves this via a novel synthesis of techniques, including removal of polymorphism by code specialization, removal of higher-order functions using closure datatypes and interpretation, and aggressive optimization of the resulting first-order code. These aspects are also reported in (Tolmach and Oliva).
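The ``closure datatypes and interpretation'' step mentioned above (often called defunctionalization) can be sketched as follows, with illustrative names: each first-class function in the source becomes a datatype constructor carrying its free variables, and a first-order apply routine interprets it, so no function values remain in the translated program.

```typescript
// Source (higher-order):  map(x => x + n, xs)  and  map(x => -x, xs)
// After the transformation, closures are ordinary first-order data:
type IntFun =
  | { tag: "addN"; n: number } // closure for x => x + n, capturing n
  | { tag: "negate" };         // closure for x => -x

// The interpreter: dispatch on the closure's constructor.
function apply(f: IntFun, x: number): number {
  switch (f.tag) {
    case "addN":
      return x + f.n;
    case "negate":
      return -x;
  }
}

// First-order map: it receives a closure *value*, not a function,
// so the result is plain first-order code suitable for C or Ada.
function mapFO(f: IntFun, xs: number[]): number[] {
  const out: number[] = [];
  for (const x of xs) out.push(apply(f, x));
  return out;
}
```

Because the output is first-order and simply typed, it stays within what vanilla target languages can express, which is what makes the simple-types approach compatible with the interoperability goal.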

The functional language world has long been divided between partisans of ``eager'' and ``lazy'' evaluation semantics; the differences between them are not huge, but have limited the sharing of compilation technology. With colleagues and visitors at the Oregon Graduate Institute, I have begun work to establish a common, neutral compiler framework that supports both sorts of evaluation (Peyton Jones, Launchbury, Shields, and Tolmach, ``Bridging the Gulf: a Common Intermediate Language for ML and Haskell,'' 1998).
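One way such a neutral framework can accommodate both semantics (a sketch, not the paper's actual intermediate language) is to make suspension explicit: a lazy value is represented as a memoized thunk, while eager code simply never constructs one.

```typescript
// A thunk: a suspended computation that is run at most once.
type Thunk<T> = { forced: boolean; value?: T; compute?: () => T };

function delay<T>(compute: () => T): Thunk<T> {
  return { forced: false, compute };
}

function force<T>(t: Thunk<T>): T {
  if (!t.forced) {
    t.value = t.compute!(); // evaluate at most once...
    t.forced = true;
    t.compute = undefined;  // ...then drop the suspended code
  }
  return t.value!;
}
```

With delay and force as explicit operators, an ML front end emits code containing neither, a Haskell front end wraps bindings in them, and the rest of the compiler pipeline is shared.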

These aspects of my research are conducted primarily at the Oregon Graduate Institute, under funding from the US Air Force, in collaboration with faculty, post-doc, student, and staff colleagues.

Support Tools and Infrastructure

Programming projects of significant size require support tools like browsers, profilers, and debuggers. Building these tools is not always ``merely engineering.'' For example, conventional, machine-based debuggers are difficult to integrate with heavily optimizing compilers, so I built a machine-independent debugger for Standard ML based on source-to-source transformation. The debugger also made novel use of first-class continuations to provide a replay or ``reverse execution'' facility for debugging (Tolmach and Appel, ``A Debugger for Standard ML,'' 1995). This work was conducted as part of my doctoral dissertation.
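The replay idea can be sketched in miniature (illustrative names; the real debugger captures stopping points directly with first-class continuations rather than rerunning blindly): for a deterministic program, ``reverse execution'' to event n can be simulated by re-running from the start and stopping after n events.

```typescript
// A tiny instrumented computation: factorial that also records one
// debugger event per call, as a source-to-source transformer might insert.
function factTraced(n: number, events: string[]): number {
  events.push(`fact ${n}`);
  return n === 0 ? 1 : n * factTraced(n - 1, events);
}

// "Travel back" to an earlier point: rerun and truncate the event stream.
function replayTo(steps: number, n: number): string[] {
  const events: string[] = [];
  factTraced(n, events);
  return events.slice(0, steps);
}
```

Instrumentation by transformation is what makes the scheme machine-independent: no cooperation from the code generator or runtime is required.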

Higher-order languages are particularly useful for programming multi-threaded applications. In collaboration with Greg Morrisett I built a clean, portable platform for implementing concurrent applications on multi-processors (Morrisett and Tolmach, ``Procs and Locks: A Portable Multiprocessing Platform for Standard ML of New Jersey,'' 1993).

Application to Domain-specific Languages

Getting functional languages used in real-world projects is essential to justify, motivate, and guide continued research in language and compilation technology. One particularly promising area of application is domain-specific language development. With colleagues at the Oregon Graduate Institute, I work to apply functional language technology to the design and implementation of program analyzers and generators for domain-specific languages, and to move this technology into actual use in industrial settings. The domain-specific languages are specified, analyzed, and translated using functional languages, and in some cases the generators also produce functional-style intermediate code.
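A hypothetical miniature of this style (the real languages and generators are far richer): the domain-specific language is captured as a datatype, and ordinary functional programs over that datatype serve as its analyzer, translator, and reference interpreter.

```typescript
// Abstract syntax of a toy arithmetic DSL.
type Expr =
  | { tag: "lit"; n: number }
  | { tag: "add"; l: Expr; r: Expr }
  | { tag: "mul"; l: Expr; r: Expr };

// Analyzer: count nodes (a stand-in for richer static analyses).
function size(e: Expr): number {
  return e.tag === "lit" ? 1 : 1 + size(e.l) + size(e.r);
}

// Generator: translate to C-like concrete syntax.
function toC(e: Expr): string {
  switch (e.tag) {
    case "lit": return String(e.n);
    case "add": return `(${toC(e.l)} + ${toC(e.r)})`;
    case "mul": return `(${toC(e.l)} * ${toC(e.r)})`;
  }
}

// Reference interpreter fixing the DSL's semantics.
function evalExpr(e: Expr): number {
  switch (e.tag) {
    case "lit": return e.n;
    case "add": return evalExpr(e.l) + evalExpr(e.r);
    case "mul": return evalExpr(e.l) * evalExpr(e.r);
  }
}
```

Datatypes plus pattern-style dispatch keep the specification, the analyses, and the generator in one notation, which is precisely the leverage functional languages offer in this domain.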

In particular, I have led an effort to integrate a generator for a domain-specific Message Translation Language into PRISM, a working Air Force command and control demonstration system. This effort is funded by the Air Force. I have also recently begun a joint project with Tektronix, Inc. to design and prototype a domain-specific language for describing resource scheduling problems in video networks. This project is partially funded by Tektronix.


Andrew P. Tolmach
Tue Sep 23 11:02:49 PDT 1997