
Taxon Sampling Rationale
General Sampling Issues
We selected species for the FGP by balancing the following major criteria:
a) phylogenetic position
b) diversity of floral-organ structure (but absence of highly specialized floral features)
c) direct relevance to crop or economic plants
d) diploid with a small genome size
e) availability of inbred lines, when possible
f) possession of other desirable properties, such as large numbers of flowers per plant, transformability, and having
been the focus of prior flower developmental study
g) non-duplication of ongoing studies of the floral transcriptome; we will include in our analysis public data
(but not duplication of effort) from ongoing studies of model plants (Arabidopsis, tomato, maize, rice, alfalfa,
soybean, cotton, etc.)
Phylogenetic relationships -- We will sample critical "missing link" taxa that occupy pivotal phylogenetic positions, allowing us to link model plants and to sample key basal lineages in the flowering plants, monocots, eudicots, and outgroups. Furthermore, it is important that the phylogenetic relationships of the exemplars are already well known, so that hypotheses of gene orthology and paralogy can be more easily evaluated. The recent elucidation of phylogenetic relationships among major clades of angiosperms (e.g., P. Soltis et al., 1999; D. Soltis et al., 2000; Savolainen et al., 2000a, b) ensures that this is the case for all of the taxa included.
Diversity of floral organ structure -- The evolutionary gaps between gymnosperm, monocot, and eudicot model systems are enormous, particularly from the perspective of floral evolution. Although the eudicots represent about 75% of angiosperm species (Drinnan et al., 1994), most of the diversity in arrangement and number of floral parts actually occurs among basal angiosperm lineages, such as Nymphaeales (water lilies), Amborella (the sister to all other extant angiosperms either alone or with Nymphaeales; P. Soltis et al., 2000; Barkman et al., 2000), Magnoliaceae (magnolias and yellow poplar), Laurales (including avocado, sassafras, laurels, and others), Winteraceae, Piperaceae (black pepper family), and Chloranthaceae (with its reduced flowers that sometimes consist of a single reproductive structure). The perianth of basal angiosperms often lacks clear sepal-petal distinction, and sepals in these plants may simply result from tepal exposure in bud [M. Frohlich, unpub.]. Furthermore, floral organization and development are considered open and highly labile in basal angiosperms (e.g., Endress, 1987, 1994). In contrast, in most eudicots, numbers of floral parts are low (i.e., four or five) and fixed, and floral organs are distinct and arranged in whorls, suggesting that the basic floral Bauplan became canalized during the early diversification of the eudicots (e.g., Endress, 1987, 1994; Albert et al., 1998; Zanis et al., in press). Thus, critical components of the floral genetic program may have evolved among the most basal lineages of angiosperms. Our sampling is designed to include the diversity of fundamental floral forms, while avoiding species with highly specialized floral structures.
- Crop plants -- We selected crop plants whenever possible, but we stress that most basal angiosperms are not economically important. We have not included crop plants that are already the focus of intensive investigation of floral genetic architecture. These include maize and other grasses, Arabidopsis, Ranunculaceae, tomato (Solanum), and cotton (Gossypium). Our goal is to link these model plant systems with a more phylogenetically representative suite of taxa that exhibit diverse floral ground plans.
- Diploids and small genomes -- We focused on plants suggested to be diploid based on published chromosome counts (e.g., Fedorov, 1969) and plants known to have small genome sizes. Even in a small, diploid genome such as that of Arabidopsis, genome sequencing efforts have revealed numerous duplicated genes. Known polyploids, with the added complexity of recent genome duplication, are not optimal for our investigation but will be included where necessary to sample the floral diversity of interest (e.g., Liriodendron). However, our efforts will provide data and critical links relevant to the study of genome doubling (e.g., Osborn et al., Functional Genomics of Plant Polyploids; Biocomplexity Incubation grant to P. Soltis and D. Soltis). Small genomes are not crucial to our central EST-based research goals; however, the use of small diploid genomes will increase our ability to isolate and analyze genomic clones, and increase the likelihood that the plants we study will eventually be used for more detailed genome research.
The genome sizes of a few of the species we have proposed to study (Amborella, Nuphar, Ribes) have not yet been determined. We plan to obtain these values prior to library construction. In the case of Nuphar and Ribes, numerous alternative species could be selected in case the genome of our target species is unduly large. Amborella is monotypic and not replaceable with any other species from its lineage. In this case, if the genome is unexpectedly large, it will not be possible to pick an alternative, but that knowledge will help direct our further experiments.
Inbred, homozygous lines -- Whenever possible we will focus our sampling on inbred, homozygous lines, which are common among cultivated crop plants. However, inbred lines are not available for all of the study plants. For example, inbred lines of Liriodendron (yellow poplar, an important lumber tree) are not available due to the severe inbreeding depression that occurs when trees are selfed. However, several clonal populations and large full-sib F1 progeny families are available for Liriodendron through our colleague S. Schlarbaum (Tennessee Valley Authority and U. of Tennessee tree breeding and seed orchard program; pers. comm.), which will permit us to use segregation analysis to determine which EST variants are allelic and which are from different loci among gene families. Some plants may be largely homozygous, e.g., Saruma, as it self-pollinates regularly, has small populations in the wild, and American nursery plants likely derive from the introduction at the U.S. National Arboretum. Other species will be heterozygous at many loci, but recent gene duplications, which are not easily distinguished from distinct alleles at one locus, are NOT the focus of our work. We will focus on ancient gene families shared by multiple species. Hence, availability of inbred lines, although desirable, is not a crucial issue for every plant in this study.
Other properties -- We selected species with other desirable experimental properties. Plants for which material is readily available and that can be easily cultured in the greenhouse are desirable, as are those species that produce numerous flowers for prolonged periods. An ability to obtain numerous flowers at different stages of ontogeny is important for our gene expression studies. Flower size is also important. Plants with moderate-sized to large-sized flowers are preferred for floral dissection, facilitating the isolation of floral organs, and in situ hybridization studies. It is also useful if a plant has been the focus of prior floral developmental research. The more that is already reported, or known by an expert, the better.
Strategy for more distant relatives -- The FGP has selected two gymnosperms (Zamia and Welwitschia) for sampling homologs of floral genes to test the principal hypotheses of the project, i.e., the Mostly Male Theory of M. Frohlich and a series of hypotheses about the origin of petals, sepals, and other floral structures in the angiosperms. We will link this knowledge with planned sampling of expressed gene sequences in the gymnosperm Ginkgo and in more distantly related plants: a fern (Ceratopteris) and mosses (Physcomitrella, Tortula). These will be sampled as part of the Integrated Research Challenge project recently proposed to NSF by Brent Mishler (with C. dePamphilis, H. Ma, P. Soltis, and D. Soltis, co-PIs).
Open to discussion -- Although we believe that we have given careful consideration to taxon choice and sampling, we recognize that the panel, or NSF, may suggest a different balance of effort in sampling economic plants vs. phylogenetically representative basal lineages. If prices for sequencing can be brought down, it may be possible to increase the depth of coverage for each of the crop species. Alternatively, we are open to suggestions from the panel and from NSF on choice of taxa.
Taxa Chosen for Deep EST Coverage
The basal angiosperms we have chosen are: Amborella trichopoda (Amborellaceae), Nuphar advena (waterlily, Nymphaeaceae), Liriodendron tulipifera (yellow or tulip poplar, Magnoliaceae), Persea americana (avocado, Lauraceae), Saruma henryi (Aristolochiaceae), and Acorus americanus (Acoraceae). We have also chosen the early-diverging eudicot Eschscholzia californica (California poppy, Papaveraceae). We will also include two distantly related gymnosperms: Zamia fischeri (Cycadaceae) and Welwitschia mirabilis (Gnetales). Features common to both may represent the relatively unspecialized gymnosperm condition, possibly ancestral to that of the angiosperms.
Several alternative study taxa are also described below. These afford our research flexibility should we encounter unforeseen difficulties. For each species, we provide key features, such as crop status, ploidy, genome size, information regarding availability, phylogenetic placement, general floral characteristics, and additional footnotes to personal research experience (Table 1 in FGP grant proposal). The phylogenetic placement of these exemplars is illustrated on the Taxa page.
- Amborella - Amborella trichopoda is the only living member of Amborellaceae. It is a critical exemplar; recent studies have indicated that it is the sister to all other flowering plants (Qiu et al., 1999, 2000; P. Soltis et al., 1999, 2000; Parkinson et al., 1999; Mathews and Donoghue, 1999; Zanis et al., in press) (Fig. 2 in FGP grant proposal). Although restricted to New Caledonia and only recently brought into cultivation (once cultivated, it is easily maintained), we will be given plants by the University of California, Santa Cruz Arboretum and National Tropical Botanical Gardens (supporting letters on file). Amborella has unisexual flowers with an indeterminate number of spirally arranged parts; the perianth consists of tepals. The species is diploid with 2n = 26; genome size has not yet been estimated. The plants are dioecious, so two libraries will be sampled from male and female floral tissues, and EST sequencing will be performed separately on each component library.
- Nuphar - Some recent studies have suggested that Nymphaeaceae + Amborella are sister to all other flowering plants (Barkman et al., 2000). However, in most analyses (P. Soltis et al., 1999; D. Soltis et al., 2000; Savolainen et al., 2000; Qiu et al., 1999; Parkinson et al., 1999; Mathews and Donoghue, 1999), including the most recent and most comprehensive analyses of basal angiosperms to date (Zanis et al., submitted), Amborella continues to receive strong bootstrap support as the sister to all other extant angiosperms. However, we cannot consistently reject the alternative hypothesis of Nymphaeaceae + Amborella as sister to all other flowering plants. Hence, we have also included Nuphar (Nymphaeaceae). Whereas some water lilies have numerous, spirally arranged parts and undifferentiated perianths, reconstructions of floral evolution suggest that these characteristics are derived (Zanis et al., in press). Basal lineages (Cabomba, Brasenia, Nuphar, and Barclaya) are trimerous with differentiated sepals and petals. We have chosen Nuphar advena because of its phylogenetic position near the base of Nymphaeaceae (Les et al., 1999). We have preliminary data for Nuphar, including the isolation of C-class MADS-box genes. Chromosome numbers are low: 2n = 34 for Nuphar and 2n = 28 for Nymphaea. Genome size for Nymphaea is small; no values have yet been reported for Nuphar.
- Persea, Liriodendron, and Saruma - Three genera were chosen to represent the eumagnoliid clade (Fig. 2 in FGP grant proposal) because it contains most of the species and much of the floral diversity of basal angiosperms: Persea, Liriodendron, and Saruma (Aristolochiaceae). The taxa chosen exhibit some of this enormous floral diversity. Persea americana (avocado) has parts in whorls, with two whorls of three tepals, stamens with well-differentiated anthers and filaments in three or four whorls, and a single carpel. Persea is of economic importance as a fruit crop and is diploid (2n = 24) with a small genome size (907 Mbp). Liriodendron also has tepals, but numerous spirally arranged laminar (leaf-like) stamens and carpels. Liriodendron tulipifera (yellow or tulip poplar) is a valuable timber tree and is also transformable. It exhibits the lowest chromosome number in Magnoliaceae (2n = 38) and has a small genome size (784 Mbp). Saruma is unusual in having what appear to be well-differentiated sepals (3) and petals (3); all other Aristolochiaceae have one perianth whorl of three parts, which are considered to be sepals by convention (Cronquist, 1987). The petals of Saruma are hypothesized, based on studies of floral development, to be derived from stamens, providing a testable hypothesis for the origin of these petals. In contrast, the petals of many angiosperms are considered to be developmentally homologous with sepals. Thus, the inclusion of species with petals apparently homologous to either sepals or stamens allows testing of critical hypotheses on the origin of petals. Saruma henryi has a moderate-sized genome (3136 Mbp).
- Acorus - One basal monocot was chosen, Acorus americanus (Acoraceae). It is well supported as sister to all other monocots (Fig. 2 in FGP grant proposal). The genome size is small (392 Mbp). Flowers are small and bisexual, with 4-6 small perianth segments.
- Eschscholzia - Inclusion of early-diverging eudicots is critical in that the basic floral Bauplan seems to remain open or flexible through these lineages; it is only in the core eudicots that extensive canalization is evident. Also, in some mapping investigations, one of the origins of well-differentiated sepals and petals occurs at the base of the eudicots. Eschscholzia (Papaveraceae), representing the early-diverging eudicots, has two sepals, two whorls of petals, whorls of numerous stamens, and several fused carpels. Our work on Papaveraceae will complement the ongoing work of Vivian Irish (Yale University), which focuses on PCR-based isolation of MADS-box genes from poppies and many other angiosperm species.
- Welwitschia - Welwitschia mirabilis represents the gymnosperm clade Gnetales. We rejected conifers for a major focus in this study because their female units (cone scales) are complex fusions of many parts; already-funded genomic research with pine, a conifer, will provide comparative evidence for this group. In Welwitschia the parts are separate, with clearly defined morphology. Both male and female cones have numerous reproductive units, and show gradate development, with all stages available for ca. 6 weeks in the spring (M. Frohlich, pers. obs.). Both Huntington Gardens and California State University, Fullerton agree to supply material. Both chromosome number (2n = 42) and genome size (12,740 Mbp) are fairly large. As in Amborella, the plants are dioecious, so separate libraries will be constructed from male and female cone tissues, and EST sequencing will be performed separately on each library.
- Zamia - Zamia represents what may be the sister lineage to all other extant gymnosperms (Chaw et al., 1997, 2000; Bowe et al., 2000; P. Soltis et al., 1999). It is well removed from Gnetales and provides an important source of gymnosperm diversity. Zamia fischeri is easily obtained commercially, with both male and female plants readily available. The large genome size of most gymnosperms will not be any particular barrier to our proposed work on expressed gene sequences, but would slow the pace of future research based on genomic sequences. Smaller genome-sized alternatives are not available for several key gymnosperm groups. Again, because the plants are dioecious, male and female libraries will be constructed and analyzed.
Taxa Chosen for Shallow EST Coverage
In addition to the taxa chosen for deep EST study, we have also selected additional species for shallow EST coverage. These taxa provide coverage particularly for additional lineages of core eudicots and hence will enhance genome linkages among angiosperms (Fig. 2, FGP grant proposal). Although no individual library will be sampled deeply, each taxon will contribute many sequences to gene family analyses, identification of genes of wide distribution, and a thorough representation of moderately and highly expressed genes.
- Illicium (star anise; Illiciaceae) - Our current deep EST sampling covers most of the key basal angiosperms, but one omission involves the strongly supported clade of Illiciales, Austrobaileyaceae, and Trimeniaceae, a clade that follows Amborella and Nymphaeales as sister to all other angiosperms. Illicium parviflorum is a logical addition: it is diploid (2n = 28), although with a moderate-sized genome (4C = 13.4), is easily cultivated in the greenhouse, flowers for prolonged periods, and has moderate-sized flowers with numerous spirally arranged parts and an undifferentiated perianth.
- Asparagus (asparagus; Asparagaceae) - Asparagus officinalis represents an economically important non-grain monocot that can link the most basal lineages (Acorus, Alismatales) with the highly derived floral structures found in the grasses and grass relatives. Asparagus produces many whorled, trimerous flowers in a large inflorescence. This species is diploid (2n = 20), transformable, and has a small genome size (1323 Mbp). The plants are dioecious, so two libraries will be sampled from male and female floral tissues, and EST sequencing will be performed separately on each library.
- Vaccinium (blueberries and cranberries; Ericaceae) - Vaccinium occupies a crucial phylogenetic position as a member of Ericales, an early-branching clade of the large asterid clade; several species are also important crops. Diploid species have 2n = 24 and a small genome size (V. pallidum, 4.4; V. boreale, 2.4; V. elliottii, ca. 1078 Mbp).
- Cucumis (cucumber; Cucurbitaceae) - Cucumber (C. sativus) was selected from among many economic species in the large eurosid I clade. A member of the Cucurbitaceae (squash, pumpkin, cucumbers), it is an important New World fruit crop species. Cucumis is not presently the focus of intensive publicly funded genome research. This species is transformable, has a small genome (882 Mbp), and is diploid, with 2n = 14. Cucumis is dioecious, so two libraries will be sampled from male and female floral tissues, and EST sequencing will be performed separately on each library.
- Beta (beet, Swiss chard; Chenopodiaceae) - Beta vulgaris was chosen to represent Caryophyllales; it is diploid with a small genome (2n = 18; 1225 Mbp), transformable, and readily accessible.
- Ribes (currants and gooseberries; Grossulariaceae) - Ribes americanum is easily grown and flowers profusely. The genus is diploid with 2n = 12. No genome size values have been reported, but the chromosomes are small and the genome size of its sister family, Saxifragaceae, is small. Ribes is transformable. Ribes represents a clade, Saxifragales, that occupies a pivotal phylogenetic position as either the sister to the large rosid clade (Fig. 2 in FGP grant proposal) or one of the first branches of the core eudicots, following Gunnerales (Soltis et al., 2003).
Alternative Choices
Although sister to all other monocots and thus in an important phylogenetic position, Acorus may be difficult from other perspectives. The flowers are very small and flowering is sporadic. A member of the large Asparagales clade, which is sister to the large commelinoid clade that contains the grasses, may be a good alternative. Several major crops (Allium, Asparagus) would be strong possibilities for an alternative deep EST monocot.
Prunus persica (peach; Rosaceae) provides another economically important rosid species. Prunus is the second most economically important genus of Rosaceae following Malus (apples, which are polyploid) and includes almonds, cherries, nectarines, peaches, and plums. Prunus persica is diploid with 2n = 16 with a small genome size (4C = 1.87).
As a representative of the water lilies, Nuphar is slightly preferred over Nymphaea, because of the distinct sepals and petals of Nuphar. Furthermore, C-class genes have been obtained from Nuphar advena (M. Zanis et al., unpubl.). The study could be performed with Nymphaea, but may be somewhat less helpful for diagnosing the homology of important floral structures.
Lactuca sativa (lettuce; Asteraceae) could be used to represent the large euasterid II clade. It is diploid with 2n = 18 and a mean genome size of 4C = 9.0. Although Helianthus annuus (sunflower) is also an important crop, chromosome numbers for the genus are high (2n = 34) and may be of ancient polyploid origin. However, the genome size for Helianthus is comparable to that of Lactuca (9.7). Another candidate from Asteraceae is Gerbera, which has been the target of recent investigations of floral developmental genetics (V. Albert research papers; see PCR in situ image elsewhere on this website).
Medicago truncatula (Fabaceae). Both the rosid and asterid clades are huge, together comprising over one-half of all angiosperm species; both consist of at least two major subclades. Arabidopsis is a member of the eurosid II clade (Fig. 2 in FGP grant proposal); we therefore could select Medicago to represent eurosid I. In addition, Fabaceae (legumes) are of enormous economic importance. Medicago truncatula is diploid (2n = 16), has a small genome (mean 4C = 4.9; lowest value = 1.9), is closely related to alfalfa, and can be transformed. However, several very large ongoing studies of expressed genes in Medicago and Glycine, and the possibility of a full-scale genome study of Medicago, should provide adequate information from legumes without inclusion of Medicago in this study.
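The genome sizes above are quoted in two forms: Mbp for most taxa and 4C values for the alternative choices. The following is a minimal sketch of how the two could be put on a common scale. It assumes that the 4C values are DNA amounts in picograms (the text does not state the unit) and uses the commonly cited factor of about 978 Mbp per picogram; the function name and the dictionary are hypothetical and serve only as illustration.

```python
# Assumption: 4C values are DNA amounts in picograms (pg); the source text gives no unit.
# Assumption: 1 pg of DNA is taken as about 978 Mbp, a commonly cited conversion factor.
MBP_PER_PG = 978.0


def four_c_pg_to_one_c_mbp(four_c_pg: float) -> float:
    """Convert a 4C DNA amount (pg) to an approximate 1C genome size (Mbp)."""
    one_c_pg = four_c_pg / 4.0  # 4C is four times the 1C amount
    return one_c_pg * MBP_PER_PG


# 4C values as quoted in the text for the alternative taxa.
four_c_values = {
    "Prunus persica": 1.87,
    "Lactuca sativa (mean)": 9.0,
    "Helianthus": 9.7,
    "Medicago truncatula (mean)": 4.9,
}

for taxon, four_c in four_c_values.items():
    one_c_mbp = four_c_pg_to_one_c_mbp(four_c)
    print(f"{taxon}: 4C = {four_c} pg, 1C is roughly {one_c_mbp:.0f} Mbp")
```

Values converted this way are only as reliable as the unit assumption; they are meant for rough comparison with the Mbp figures given for the deep and shallow EST taxa, not as new measurements.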
Rejected Alternatives
A large number of potentially useful taxa were considered for inclusion in the FGP, but rejected for one reason or another. Below is a partial list of rejected taxa and the reasons for their rejection.
- Piper (black pepper, Piperaceae) - Piper is an economically important basal angiosperm. However, the tiny and highly modified flowers of the Piperaceae were considered less likely to inform a general understanding of floral development than the less modified and easily used flowers of Saruma (B3, above). Furthermore, because developmental data suggest that the petals of Saruma are derived from stamens, the inclusion of Saruma presents the opportunity to test hypotheses on the origins of petals.
- Lilium (lily, Liliaceae) - Lily is an attractive plant for flower development research, and a representative mid-level monocot; however, the very large genome size (4C = 140) makes this species impractical for genetic and genomic research at this time.
- Pineapple, coconut palm, banana - These economically important plants would all be important to include in a detailed study of expressed floral genes in a denser sampling of monocot lineages. We excluded them only because of budget constraints. Intense ongoing research with many different grasses will ensure that those data and those from the Floral Genome Project can be effectively linked.

