![[Commission Home Page]](cpd.gif)
|
|
The IUCr-CPD Homepage is at http://www.iucr.org/iucr-top/comm/cpd/
Evolving Techniques in Powder Structure Solution -- an Introduction to the Genetic AlgorithmKenneth D. M. Harris, Roy L. Johnston and Benson M. KariukiSchool of Chemistry, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom. E-mail: harrikdm@isdux1.bham.ac.uk; roy@tcindy4.bham.ac.uk; WWW: http://chemwww2.bham.ac.uk/research_labs/Research_Profiles/harris.html
1 Challenges in structure solution from powder diffraction dataThe techniques currently available for structure solution from powder diffraction data can be subdivided broadly into two categories -- ``traditional'' approaches and ``direct- space'' approaches. The traditional approach [1, 2, 3] for solving crystal structures directly (ab initio) from powder diffraction data involves extracting the intensities I(hkl) of individual reflections directly from the powder diffraction pattern, with the structure then solved using the types of structure solution calculation used for single crystal diffraction data (e.g. direct methods or Patterson methods). However, as there is usually extensive overlap of peaks in the powder diffraction pattern, it is often problematic to extract unambiguous values of the intensities I(hkl) of the individual diffraction maxima. To overcome this problem requires either improved techniques for extracting peak intensities (there have been several important recent developments in this area), or the use of new structure solution strategies (see below) that allow the powder diffraction profile to be used directly in its ``raw'' digitized form. In the direct-space strategy [2, 4, 5], trial crystal structures are generated in directspace, independently of the experimental powder diffraction data, with the suitability of each trial structure assessed by direct comparison between the powder diffraction pattern calculated for the trial structure and the experimental powder diffraction pattern. This comparison is quantified using an appropriate R-factor, with the majority of direct-space methods to date having used the weighted profile R-factor (Rwp ), as used in Rietveld refinement. We emphasize that Rwp considers the whole digitized intensity profile rather than the integrated intensities of individual diffraction maxima; it thus implicitly takes care of peak overlap and uses the digitized powder diffraction data ``as measured''. Alternatively, other definitions of R-factor based on extracted peak intensities may be used. Clearly, the aim of the direct-space strategy is to find the trial crystal structure corresponding to the lowest value of the R-factor, and the direct-space structure solution techniques are therefore equivalent to exploring a hypersurface R(X) to find the best structure solution (i.e. the structure corresponding to the global minimum in R-factor). Here fXg represents the set (string) of variables (parameters) that define the structure. In principle, any technique for global minimization may be used in this context and much success has been achieved in the use of Monte Carlo [4, 6, 7] and Simulated Annealing [5, 8, 9] search algorithms to explore the R(X) hypersurface. Recently, Genetic Algorithms have also been applied in this field. This paper focuses on fundamental aspects of applying a Genetic Algorithm to accomplish global minimization with respect to the R(X) hypersurface. The Genetic Algorithm (GA) is an optimization technique, based on the principles of evolution, and involves familiar evolutionary operations such as mating (crossover), mutation and natural selection [10]. Through natural selection, the fittest members of a population survive and procreate, passing their genetic information into subsequent generations. A crucial feature of the GA approach is that it operates essentially in a parallel manner, with many different regions of parameter space investigated simultaneously. Furthermore, information concerning different regions of parameter space is passed actively between the individual strings by the mating procedure, disseminating genetic information throughout the population. The possibility of using GA techniques in structure solution from powder diffraction data has been realized independently by two research groups. Our approach [11, 12, 13, 14, 15] described here and the approach of Shankland et al. [16, 17] differ in the definition and handling of the fitness function as well as other aspects concerning the way in which the Genetic Algorithm is implemented. Details may be found in the papers cited.
2 How does a Genetic Algorithm work?
Figure 1. Flow chart representing the evolution of the population from one generation (population Pj) to the next generation (population P j+1 ) in the program GAPSS. Our GA approach for structure solution from powder diffraction data has been implemented in the program GAPSS [18]. A schematic flow chart describing the operation of this program is shown in Figure 1. Our method and program have been discussed in detail elsewhere [11, 12, 13, 14, 15] (see in particular ref. [13]). In our GA, each member of the population is a trial crystal structure, defined by the position, orientation and internal geometry of a ``structural fragment'' (representing an appropriate set of atoms within the asymmetric unit). The ``fitness'' of each structure in the population depends on Rwp , and is quantified using the function:
where
and Rmin and Rmax are the lowest and highest values of Rwp in the current population. The fitness function defined above has been designed from our knowledge of the typical nature of Rwp hypersurfaces. The initial population comprises N P randomly generated structures, and the set X of variables that defines each structure can be regarded as its ``genetic code''. Subsequent generations of the population are produced through well- defined evolutionary procedures (Figure 1). Mating (``crossover'') involves selecting pairs of structures (with probability of selection proportional to fitness), and generating offspring by combining genetic information from the two parents. Any offspring that are identical to an exisiting structure in the population are deleted immediately, preventing premature convergence of the entire population towards a single structure. Diversity of the population is ensured by introducing a few mutant structures within each generation; these are generated by randomly selecting structures from the population and introducing random changes to parts of their genetic codes. Natural selection ensures that only the best structures survive and the overall fitness of the population improves from one generation to the next. 3 Case study: ortho-thymotic acid
Figure 2. (a) Molecular structure of ortho-thymotic acid. (b) Structural fragment used in the GA structure solution calculation for ortho-thymotic acid. Several structures of varying degrees of complexity have been solved using our GA method. As an illustrative example, we highlight here the structure solution of ortho- thymotic acid (Figure 2a), which represented the first application of our GA method to solve a previously unknown structure [11]. Measurement of the powder X-ray diffractogram for ortho-thymotic acid, unit cell determination (a = 11.08 Å, b = 8.15 Å, c = 11.78 Å, b = 100.2° ) and space group assignment (P21/n) have been discussed elsewhere [11]. There is one molecule in the asymmetric unit. In our GA structure solution calculation for ortho-thymotic acid, the structural fragment (Figure 2b) comprised all non- hydrogen atoms in the molecule. Standard geometries (bond lengths and bond angles) were used, with the lengths of the two C-O bonds in the carboxylic acid group taken to be equal. The structural fragment was flexible, with the intramolecular geometry defined by the two torsion angles t1 (describing rotation about the C-C bond between the carboxylic acid group and the benzene ring) and t2 (describing rotation about the C-C bond between the iso- propyl group and the benzene ring), as indicated in Figure 2b. The position of the structural fragment was defined by the {x, y, z} coordinates of the centre of mass of the molecule and its orientation by the angles {q; f, j}. Thus, each structure (i) was defined by eight parameters which were divided up into four groups:{xi , yi , zi | qi; fi, ji | t1i | t2i}. In mating, one offspring was generated by selecting two groups at random from one parent and combining them with the two complemantary groups taken from the other parent. The remaining groups were then combined to generate the second offspring. Mutations involved making a random change to two of the four groups. For the groups {xi , yi , zi} and {qi; fi, ji}, mutation involved changing only one of the three parameters.
Figure 3. Evolutionary Progress Plot showing the evolution of Rmin (darker circles) and R ave (lighter circles), as a function of generation number, in the GA structure solution calculation for ortho-thymotic acid.
Figure 4. Final refined crystal structure of ortho-thymotic acid (hydrogen atoms not shown) viewed along the b axis. The progress of the GA structure solution calculation can be monitored by constructing an Evolutionary Progress Plot (EPP), which shows the best (Rmin) and average (Rave) values of Rwp for the population as a function of the generation number. In the EPP for ortho-thymotic acid (Figure 3), Rmin and Rave both decrease rapidly in the early generations. R min flattens out after the 39th generation, with a further drop at the 80th generation. The quality of the structure solution may be judged by comparing the calculated and experimental powder diffraction profiles following full Rietveld refinement, as well as by assessing the chemical and structural plausibility of the final refined structure (Figure 4). Rietveld refinement from the best structure solution obtained in the GA calculation gave Rwp = 3.2 %. The final refined crystal structure shown in Figure 4 is completely reasonable on structural and chemical grounds [11]. For example, the structure is found to exhibit the familiar carboxylic acid dimer motif, without this (or any other type of) intermolecular contact being imposed during the GA structure solution calculation. The best structure solution in the plateau region (see Figure 3) extending from the 39th to the 79th generation (with Rmin » 19 %) also refines to the same structure, indicating that the correct structure solution has been found relatively early in the GA calculation. 4 Future prospectsThe successful application of Genetic Algorithms within the framework of the directspace approach for crystal structure solution from powder diffraction data has been clearly demonstrated by several examples, including the example highlighted above. In general, the correct structure solution is found after a relatively small number of generations in the GA calculation, suggesting that the GA approach is inherently an efficient means of searching the R(X) hypersurface. Although there has been a good deal of early success in the application of GAs in powder structure solution, there is nevertheless considerable scope for future development and optimization of the strategies for implementing GAs in this field. In this regard, we are currently pursuing the following directions of research [13]: (i) exploring fundamental aspects of the GA technique, leading towards new and optimized procedures for searching the R(X) hypersurface; and (ii) developing new ways of defining the hypersurface itself, so that global optimization may be achieved more efficiently.In summary, structure solution approaches employing Genetic Algorithms represent a valuable addition to the range of techniques (both of the traditional type and the direct-space type) that are currently available for structure solution from powder diffraction data. The future application of these techniques promises to reveal new and important insights into structural properties of solids across the full range of disciplines within solid state and materials sciences.
References
Please feel free to email any queries to:
r.j.cernik@dl.ac.uk
|