![[Commission Home Page]](cpd.gif)
|
|
The IUCr-CPD Homepage is at http://www.iucr.org/iucr-top/comm/cpd/
Computer programming and mathematical modeling using evolutionary computationRoland OlssonØstfold College, Os Alle 11, N-1757 Halden, Norway. E-mail: Roland.Olsson@hiof.no WWW: http://www-ia.hiof.no/~rolando/ (mirrored at CCP14) The production of computer software and mathematical models is one of the cornerstones of modern society, both as an independent industry and as an integrated part of general industrial and research activity. Products as diverse as washing machines, mobile phones, spectrographs, combustion engines and web-browsers are controlled by software with rapidly increasing complexity, correctness problems and cost of development. The reason for this trend, which sometimes is referred to as the software crisis, is that programming is a highly creative activity that often is a form of art rather than science or well-planned engineering. The development of accurate mathematical models in the natural sciences usually builds on a larger scientific foundation than programming, but is still often an empirical science where intuition is more important than pure deduction. There are very many applications where adequate mathematical models are lacking even though substantial measurement data are available. Some of these applications, for example determining the three-dimensional structure of a protein from data obtained using X-ray diffraction, are both important and still not well developed. For instance, the three-dimensional structures of the estrogen receptors alpha and beta remained undetermined until last year even though these structures are essential for the effective design of drugs against common diseases such as osteoporosis and breast cancer. The field of evolutionary computation (EC) contains techniques both for automatic programming and automatic modeling where a computer independently produces programs or models using some form of simulated evolution. The most well known techniques in evolutionary computation are evolutionsstrategie (ES), evolutionary programming (EP), genetic algorithms (GA) and genetic programming (GP). The first three deal almost exclusively with mathematical modeling but the fourth has to a small extent also been applied to automatic programming even though the focus is on modeling also for GP. There are numerous other EC techniques, many of which should not be viewed as an offspring from one of the four above and also many hybrids between EC techniques and standard combinatorial optimization methods such as the simplex method, simulated annealing and tabu search. I have just returned from a stimulating week at the 1998 IEEE World Congress on Computational Intelligence, which included the International Conference on Evolutionary Computation, and found that the field is more diverse than ever and that navigation in this jungle of alternative methods is difficult even for someone who has been active in EC for more than a decade. It is important to have in mind that no matter what inductive mathematical modeling problem you are interested in , there are at least a half dozen rather different methods to choose from and that not one of them is likely to be the "best" method for your problem but only provide you with a starting point for your own method development. Let me now turn to automatic programming using simulated evolution which is more difficult than automatic modeling but also more general and potentially more rewarding. During the last 6 years, my main professional occupation has been to develop ADATE (Automatic Design of Algorithms through Evolution), that currently can produce general programs of moderate complexity from first principles which means that as much code as possible is produced through artificial evolution and that the system is fed a minimum of knowledge before the evolution begins. One can say that the system starts out as a "baby machine" or a "blank sheet" and typically does not even know elementary algorithms such as how to add natural numbers. However, it can learn for example addition, multiplication, sorting, string processing and other moderately complex algorithms in a few days of evolution simulation. The editor of this newsletter asked me to speculate a little about the future of the ADATE system and other similar techniques that are yet to be invented and so I certainly will. However, please keep in mind that my crystal ball may not be particularly reliable even though it is better polished and maintained than any other I have seen so far. The key issue in the success of the ADATE system is scalability which means that neither specification difficulty nor simulation time increase unreasonably much as the synthesized programs become more and more complex. The problem of writing specifications is similar to the problem of writing pedagogical textbooks for schools, be it a university or an elementary school, but a "textbook" for "individuals" produced by the ADATE system must start at a lower level than even textbooks for elementary school. It is relatively easy to write such ADATE "textbooks" for mathematical and logical problems, but more difficult when considering human skills that are more or less directly programmed in our genes, for example pattern recognition, motor control and language. Note that the problems of writing effective algorithms for the latter problems are largely unsolved using traditional methods and that EC may be the best way of creating such algorithms. There is often a fundamental difference between directly writing a program for a given class of problems and writing a textbook teaching how the problems may be solved. For example, we cannot write programs that effectively deal with the complexities and intricacies of human language but we can indeed develop ADATE textbooks for all the important facets of human language. Note that full mastery of such textbooks requires generalizing ability and levels of cognition that are so far unseen in any existing algorithms. Actually, there is no clear theoretical obstacle that would prevent the ADATE system from producing programs with such cognitive capabilities. The main obstacles are practical, in particular the time required to simulate the evolution of complex programs. In natural evolution, nature has employed extremely massive parallelism and a computational effort that we have just begun to comprehend. ADATE is governed by laws of evolution that are likely to be more optimized and well- designed than the seemingly primitive laws of Darwinian evolution. Therefore, an evolution simulation leading to human levels of cognition can conceivably be carried out using ADATE with a computational effort many orders of magnitude smaller than that of natural evolution. My research priority over the next years is to determine and enhance the scalability of the ADATE system, for example obtaining relationships between the complexity of synthesized programs and the corresponding simulation time. Hopefully, these relationships will be clear enough to allow extrapolation to orders of magnitude higher complexities, which may tell us how powerful computers we need to produce algorithms highly skilled in human language, logic, mathematics and general science. Maybe the world’s leading powder crystallographer in 30 years is an algorithm generated through simulated evolution on a parallel computer with ten million Merced-9 CPUs? The 1997 prototype for the ADATE system is available at http://www-ia.hiof.no/~rolando/adate_intro.html It is certainly not yet ready for powder crystallography or anything like it.
Please feel free to email any queries to:
r.j.cernik@dl.ac.uk
|