Taxon Sampling Rationale

General Sampling Issues

We selected species for the FGP by balancing the following major criteria:
a) phylogenetic position
b) diversity of floral-organ structure (but absence of highly specialized floral features)
c) direct relevance to crop or economic plants
d) diploid with a small genome size
e) availability of inbred lines, when possible
f) possession of other desirable properties, such as large numbers of flowers per plant, transformability, and having been the focus of prior flower developmental study
g) non-duplication of ongoing studies of the floral transcriptome: rather than duplicate that effort, we will include in our analysis public data from ongoing studies of model plants (Arabidopsis, tomato, maize, rice, alfalfa, soybean, cotton, etc.)

Phylogenetic relationships -- We will sample critical “missing link” taxa that occupy pivotal phylogenetic positions, allowing us to link model plants and provide sampling of key basal lineages in the flowering plants, monocots, eudicots, and outgroups. Furthermore, it is important that the phylogenetic relationships of the exemplars are already well known, so that hypotheses of gene orthology and paralogy can be more easily evaluated. The recent elucidation of phylogenetic relationships among major clades of angiosperms (e.g., P. Soltis et al., 1999; D. Soltis et al., 2000; Savolainen et al., 2000a, b) ensures that this is the case for all of the taxa included.

Diversity of floral organ structure – The evolutionary gaps between gymnosperm, monocot, and eudicot model systems are enormous, particularly from the perspective of floral evolution. Although the eudicots represent about 75% of angiosperm species (Drinnan et al., 1994), most of the diversity in arrangement and number of floral parts actually occurs among basal angiosperm lineages, such as Nymphaeales (water lilies), Amborella (the sister to all other extant angiosperms, either alone or with Nymphaeales; P. Soltis et al., 2000; Barkman et al., 2000), Magnoliaceae (magnolias and yellow poplar), Laurales (including avocado, sassafras, laurels, and others), Winteraceae, Piperaceae (black pepper family), and Chloranthaceae (with its reduced flowers that sometimes consist of a single reproductive structure). The perianth of basal angiosperms often lacks clear sepal-petal distinction, and “sepals” in these plants may simply result from tepal exposure in bud [M. Frohlich, unpub.]. Furthermore, floral organization and development are considered “open” and highly labile in basal angiosperms (e.g., Endress, 1987, 1994). In contrast, in most eudicots, numbers of floral parts are low (i.e., four or five) and fixed, and floral organs are distinct and arranged in whorls, suggesting that the basic floral Bauplan became canalized during the early diversification of the eudicots (e.g., Endress, 1987, 1994; Albert et al., 1998; Zanis et al., in press). Thus, critical components of the floral genetic program may have evolved among the most basal lineages of angiosperms. Our sampling is designed to include the diversity of fundamental floral forms, while avoiding species with highly specialized floral structures.

The genome sizes of a few of the species we have proposed to study (Amborella, Nuphar, Ribes) have not yet been determined. We plan to obtain these values prior to library construction. In the case of Nuphar and Ribes, numerous alternative species could be selected in case the genome of our target species is unduly large. Amborella is monotypic and not replaceable with any other species from its lineage. In this case, if the genome is unexpectedly large, it will not be possible to pick an alternative, but that knowledge will help direct our further experiments.

Inbred, homozygous lines – Whenever possible we will focus our sampling on inbred, homozygous lines, which are common among cultivated crop plants. However, inbred lines are not available for all of the study plants. For example, inbred lines of Liriodendron (yellow poplar, an important lumber tree) are not available due to the severe inbreeding depression that occurs when trees are selfed. However, several clonal populations and large full-sib F1 progeny families are available for Liriodendron through our colleague S. Schlarbaum (Tennessee Valley Authority and U. of Tennessee tree breeding and seed orchard program; pers. comm.), which will permit us to use segregation analysis to determine which EST variants are allelic and which derive from different loci within gene families. Some plants may be largely homozygous, e.g., Saruma, as it self-pollinates regularly, has small populations in the wild, and American nursery plants likely derive from the introduction at the U.S. National Arboretum. Other species will be heterozygous at many loci, but recent gene duplications, which are not easily distinguished from distinct alleles at one locus, are not the focus of our work. We will focus on ancient gene families shared by multiple species. Hence, availability of inbred lines, although desirable, is not a crucial issue for every plant in this study.

Other properties -- We selected species with other desirable experimental properties. Plants for which material is readily available and that can be easily cultured in the greenhouse are desirable, as are those species that produce numerous flowers for prolonged periods. An ability to obtain numerous flowers at different stages of ontogeny is important for our gene expression studies. Flower size is also important. Plants with moderate-sized to large flowers are preferred because they facilitate floral dissection, the isolation of floral organs, and in situ hybridization studies. It is also useful if a plant has been the focus of prior floral developmental research. The more that is already reported, or known by an expert, the better.

Strategy for more distant relatives – The FGP has selected two gymnosperms (Zamia and Welwitschia) for sampling homologs of floral genes to test the principal hypotheses of the project, i.e., the “Mostly Male Theory” of M. Frohlich and a series of hypotheses about the origin of petals, sepals, and other floral structures in the angiosperms. We will link this knowledge with planned sampling of expressed gene sequences in the gymnosperm Ginkgo and in more distantly related plants: a fern (Ceratopteris) and mosses (Physcomitrella, Tortula). These will be sampled as part of the Integrated Research Challenge project recently proposed to NSF by Brent Mishler (with C. dePamphilis, H. Ma, P. Soltis, and D. Soltis, co-PIs).

Open to discussion – Although we believe that we have given careful thought to taxon choice and sampling strategy, we recognize that the panel, or NSF, may suggest a different balance of effort in sampling economic plants vs. phylogenetically representative basal lineages. If prices for sequencing can be brought down, it may be possible to increase the depth of coverage for each of the crop species. More generally, we are open to suggestions from the panel and from NSF on choice of taxa.

Taxa Chosen for “Deep” EST Coverage

The basal angiosperms we have chosen are: Amborella trichopoda (Amborellaceae), Nuphar advena (water lily, Nymphaeaceae), Liriodendron tulipifera (yellow or tulip poplar, Magnoliaceae), Persea americana (avocado, Lauraceae), Saruma henryi (Aristolochiaceae), and Acorus americanus (Acoraceae). We have also chosen the early-diverging eudicot Eschscholzia californica (California poppy, Papaveraceae). We will also include two distantly related gymnosperms: Zamia fischeri (Zamiaceae) and Welwitschia mirabilis (Gnetales). Features common to both gymnosperms may represent the relatively unspecialized gymnosperm condition, possibly ancestral to that of the angiosperms.

Several alternative study taxa are also described below. These afford our research flexibility should we encounter unforeseen difficulties. For each species, we provide key features, such as crop status, ploidy, genome size, information regarding availability, phylogenetic placement, general floral characteristics, and additional footnotes to personal research experience (Table 1 in FGP grant proposal). The phylogenetic placement of these exemplars is illustrated on the Taxa page.

Taxa Chosen for “Shallow” EST Coverage

In addition to the taxa chosen for “deep” EST study, we have selected further species for “shallow” EST coverage. These taxa provide coverage particularly for additional lineages of core eudicots and hence will enhance genome linkages among angiosperms (Fig. 2, FGP grant proposal). Although no individual library will be sampled deeply, each taxon will contribute many sequences to gene family analyses, identification of genes of wide distribution, and a thorough representation of moderately and highly expressed genes.

Alternative Choices

Although Acorus is sister to all other monocots and thus occupies an important phylogenetic position, it may be difficult to work with in other respects. The flowers are very small and flowering is sporadic. A member of the large Asparagales clade, which is sister to the large commelinoid clade that contains the grasses, may be a good alternative. Several major crops (Allium, Asparagus) would be strong possibilities for an alternative “deep” EST monocot.

Prunus persica (peach; Rosaceae) provides another economically important rosid species. Prunus is the second most economically important genus of Rosaceae following Malus (apples, which are polyploid) and includes almonds, cherries, nectarines, peaches, and plums. Prunus persica is diploid (2n = 16) with a small genome size (4C = 1.87).

As a representative of the water lilies, Nuphar is slightly preferred over Nymphaea because of the distinct sepals and petals of Nuphar. Furthermore, C-class genes have been obtained from Nuphar advena (M. Zanis et al., unpubl.). The study could be performed with Nymphaea, but it may be somewhat less helpful for diagnosing the homology of important floral structures.

Lactuca sativa (lettuce; Asteraceae) could be used to represent the large euasterid II clade. It is diploid with 2n = 18 and a mean genome size of 4C = 9.0. Although Helianthus annuus (sunflower) is also an important crop, chromosome numbers for the genus are high (2n = 34) and may reflect ancient polyploid origin. However, the genome size for Helianthus is comparable to that of Lactuca (9.7). Another candidate from Asteraceae is Gerbera, which has been the target of recent investigations of floral developmental genetics (V. Albert research papers; see PCR in situ image elsewhere on this website).

Medicago truncatula (Fabaceae). Both the rosid and asterid clades are huge, together comprising over one-half of all angiosperm species; both consist of at least two major subclades. Arabidopsis is a member of the eurosid II clade (Fig. 2 in FGP grant proposal); we therefore could select Medicago to represent eurosid I. In addition, Fabaceae (legumes) are of enormous economic importance. Medicago truncatula is diploid (2n = 16), has a small genome (mean 4C = 4.9; lowest value = 1.9), is closely related to alfalfa, and can be transformed. However, several very large, ongoing studies of expressed genes in Medicago and Glycine, and the possibility of a full-scale genome study of Medicago, should provide adequate information from legumes without inclusion of Medicago in this study.
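
The genome sizes quoted for the alternative taxa are given as 4C values, which can be hard to compare at a glance with genome sizes reported in base pairs. The following is a minimal sketch, not part of the FGP proposal, that converts the quoted values to approximate 1C sizes. It assumes the 4C values are in picograms (the text does not state units), assumes the Helianthus figure of 9.7 is also a 4C value, and uses the commonly cited approximation of about 978 Mbp per picogram of DNA. The function and variable names are illustrative only.

```python
# Assumption: the 4C values quoted in the text are picograms (pg) of DNA;
# the source does not state units.
# Assumption: 1 pg of DNA corresponds to roughly 978 Mbp.
MBP_PER_PG = 978.0

# 4C values as quoted in the text. The Helianthus value is assumed to be 4C.
FOUR_C_VALUES = {
    "Prunus persica": 1.87,
    "Medicago truncatula": 4.9,   # mean value quoted in the text
    "Lactuca sativa": 9.0,
    "Helianthus annuus": 9.7,
}


def one_c_mbp(four_c_pg):
    """Convert a 4C DNA amount (pg) to an approximate 1C genome size in Mbp."""
    return four_c_pg / 4.0 * MBP_PER_PG


def rank_by_genome_size(values):
    """Return (taxon, 1C size in Mbp) pairs, smallest genome first."""
    pairs = [(name, one_c_mbp(v)) for name, v in values.items()]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, mbp in rank_by_genome_size(FOUR_C_VALUES):
        print(f"{name}: ~{mbp:.0f} Mbp (1C)")
```

Under these assumptions, the ranking places Prunus persica as the smallest genome among the candidates and the two Asteraceae as the largest, which matches the qualitative statements in the text.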

Rejected Alternatives

A large number of potentially useful taxa were considered for inclusion in the FGP, but rejected for one reason or another. Below is a partial list of rejected taxa and the reasons for their rejection.