Taxon Sampling Rationale
General Sampling Issues
We selected species for the FGP by balancing the following major criteria:
a) phylogenetic position
b) diversity of floral-organ structure (but absence of highly specialized floral features)
c) direct relevance to crop or economic plants
d) diploid with a small genome size
e) availability of inbred lines, when possible
f) possession of other desirable properties, such as large numbers of flowers per plant, transformability, and having
been the focus of prior flower developmental study
g) non-duplication of ongoing studies of the floral transcriptome; we will include in our analysis public data
(but will not duplicate effort) from ongoing studies of model plants (Arabidopsis, tomato, maize, rice, alfalfa,
soybean, cotton, etc.)
Phylogenetic relationships -- We will sample critical missing link taxa that occupy pivotal phylogenetic
positions, allowing us to link model plants and provide sampling of key basal lineages in the flowering plants, monocots,
eudicots, and outgroups. Furthermore, it is important that the phylogenetic relationships of the exemplars are already well
known, so that hypotheses of gene orthology and paralogy can be more easily determined. The recent elucidation of phylogenetic
relationships among major clades of angiosperms (e.g., P. Soltis et al., 1999; D. Soltis et al., 2000; Savolainen et al., 2000a,
b) ensures that this is the case for all of the taxa included.
Diversity of floral organ structure -- The evolutionary gaps among gymnosperm, monocot, and eudicot model systems
are enormous, particularly from the perspective of floral evolution. Although the eudicots represent about 75% of angiosperm
species (Drinnan et al., 1994), most of the diversity in arrangement and number of floral parts actually occurs among basal
angiosperm lineages, such as Nymphaeales (water lilies), Amborella (the sister to all other extant angiosperms, either
alone or with Nymphaeales; P. Soltis et al., 2000; Barkman et al., 2000), Magnoliaceae (magnolias and yellow poplar), Laurales
(including avocado, sassafras, laurels, and others), Winteraceae, Piperaceae (black pepper family), and Chloranthaceae (with
its reduced flowers that sometimes consist of a single reproductive structure). The perianth of basal angiosperms often lacks
clear sepal-petal distinction, and sepals in these plants may simply result from tepal exposure in bud [M. Frohlich,
unpub.]. Furthermore, floral organization and development are considered open and highly labile in basal angiosperms
(e.g., Endress, 1987, 1994). In contrast, in most eudicots, numbers of floral parts are low (i.e., four or five) and fixed, and
floral organs are distinct and arranged in whorls, suggesting that the basic floral Bauplan became canalized during the early
diversification of the eudicots (e.g., Endress, 1987, 1994; Albert et al., 1998; Zanis et al., in press). Thus, critical
components of the floral genetic program may have evolved among the most basal lineages of angiosperms. Our sampling is designed
to include the diversity of fundamental floral forms, while avoiding species with highly specialized floral structures.
Crop plants -- We selected crop plants whenever possible, but we stress that most basal angiosperms are not economically
important. We have not included crop plants that are already the focus of intensive investigation of floral genetic
architecture. These include maize and other grasses, Arabidopsis, Ranunculaceae, tomato (Solanum), and
cotton (Gossypium). Our goal is to link these model plant systems with a more phylogenetically representative suite
of taxa that exhibit diverse floral ground plans.
Diploids and small genomes -- We focused on plants suggested to be diploid based on published chromosome counts (e.g.,
Fedorov, 1969) and plants known to have small genome sizes. Even in a small, diploid genome such as that of Arabidopsis,
genome sequencing efforts have revealed numerous duplicated genes. Known polyploids with the added complexity of recent
genome duplication are not optimal for our investigation but will be included where necessary to sample the floral diversity
of interest (e.g., Liriodendron). However, our efforts will provide data and critical links relevant to the study of
genome doubling (e.g., Osborn et al., Functional Genomics of Plant Polyploids; Biocomplexity Incubation grant to P. Soltis
and D. Soltis). Small genomes are not crucial to our central EST-based research goals; however, the use of small diploid
genomes will increase our ability to isolate and analyze genomic clones, and increase the likelihood that the plants we study
will eventually be used for more detailed genome research.
The genome sizes of a few of the species we have proposed to study (Amborella, Nuphar, Ribes) have not
yet been determined. We plan to obtain these values prior to library construction. In the case of Nuphar and Ribes,
numerous alternative species could be selected in case the genome of our target species is unduly large. Amborella is
monotypic and not replaceable with any other species from its lineage. In this case, if the genome is unexpectedly large, it will
not be possible to pick an alternative, but that knowledge will help direct our further experiments.
Inbred, homozygous lines -- Whenever possible we will focus our sampling on inbred, homozygous lines, which are common
among cultivated crop plants. However, inbred lines are not available for all of the study plants. For example, inbred lines of
Liriodendron (yellow poplar, an important lumber tree) are not available due to the severe inbreeding depression that
occurs when trees are selfed. However, several clonal populations and large full-sib F1 progeny families are available for
Liriodendron through our colleague S. Schlarbaum (Tennessee Valley Authority and U. of Tennessee tree breeding and seed
orchard program; pers. comm.), which will permit us to use segregation analysis to determine which EST variants are allelic and
which derive from different loci within gene families. Some plants may be largely homozygous, e.g., Saruma, as it
self-pollinates regularly, has small populations in the wild, and American nursery plants likely derive from the introduction
at the U.S. National Arboretum. Other species will be heterozygous at many loci, but recent gene duplications, which are not easily
distinguished from distinct alleles at one locus, are not the focus of our work. We will focus on ancient gene families shared
by multiple species. Hence, availability of inbred lines, although desirable, is not a crucial issue for every plant in this study.
Other properties -- We selected species with other desirable experimental properties. Plants for which material is readily
available and that can be easily cultured in the greenhouse are desirable, as are those species that produce numerous flowers
for prolonged periods. An ability to obtain numerous flowers at different stages of ontogeny is important for our gene expression
studies. Flower size is also important. Plants with moderate-sized to large flowers are preferred because they facilitate floral dissection,
the isolation of floral organs, and in situ hybridization studies. It is also useful if a plant has been the focus
of prior floral developmental research. The more that is already reported, or known by an expert, the better.
Strategy for more distant relatives -- The FGP has selected two gymnosperms (Zamia and Welwitschia) for
sampling homologs of floral genes to test the principal hypotheses of the project, i.e., the Mostly Male Theory
of M. Frohlich and a series of hypotheses about the origin of petals, sepals, and other floral structures in the angiosperms.
We will link this knowledge with planned sampling of expressed gene sequences in the gymnosperm Ginkgo, and the more distantly
related plants, a fern (Ceratopteris) and mosses (Physcomitrella, Tortula). These will be sampled as part of
the Integrated Research Challenge project recently proposed to NSF by Brent Mishler (with C. dePamphilis, H. Ma, P. Soltis, and
D. Soltis, co-PIs).
Open to discussion -- Although we believe that we have given careful consideration to taxon choice and sampling, we recognize that the panel, or NSF, may suggest a different balance of effort in sampling economic plants vs. phylogenetically representative basal lineages. If prices for sequencing can be brought down, it may be possible to increase the depth of coverage for each of the crop species. We are also open to suggestions from the panel and from NSF on choice of taxa.
Taxa Chosen for Deep EST Coverage
The basal angiosperms we have chosen are: Amborella trichopoda (Amborellaceae), Nuphar advena (waterlily,
Nymphaeaceae), Liriodendron tulipifera (yellow or tulip poplar, Magnoliaceae), Persea americana (avocado,
Lauraceae), Saruma henryi (Aristolochiaceae), and Acorus americanus (Acoraceae). We have also chosen the
early-diverging eudicot, Eschscholzia californica (California poppy, Papaveraceae). We will also include two distantly
related gymnosperms: Zamia fischeri (Zamiaceae) and Welwitschia mirabilis (Gnetales). Features common to both
may represent the relatively unspecialized gymnosperm condition, possibly ancestral to that of the angiosperms.
Several alternative study taxa are also described below. These afford our research flexibility should we encounter unforeseen
difficulties. For each species, we provide key features, such as crop status, ploidy, genome size, information regarding
availability, phylogenetic placement, general floral characteristics, and additional footnotes to personal research experience
(Table 1 in FGP grant proposal). The phylogenetic placement of these exemplars is illustrated on the Taxa page.
- Amborella - Amborella trichopoda is the only living member of Amborellaceae. It is a critical exemplar;
recent studies have indicated that it is the sister to all other flowering plants (Qiu et al., 1999, 2000; P. Soltis et
al., 1999, 2000; Parkinson et al., 1999; Mathews and Donoghue, 1999; Zanis et al., in press) (Fig. 2 in FGP grant proposal).
Although the species is restricted to New Caledonia and was only recently brought into cultivation (once cultivated, it is easily maintained),
plants will be provided by the University of California, Santa Cruz Arboretum and the National Tropical Botanical Garden
(supporting letters on file). Amborella has unisexual flowers with an indeterminate number of spirally arranged
parts; the perianth consists of tepals. The species is diploid with 2n = 26; genome size has not yet been estimated. The
plants are dioecious, so two libraries will be sampled from male and female floral tissues, and EST
sequencing will be performed separately on each component library.
- Nuphar - Some recent studies have suggested that Nymphaeaceae + Amborella are sister to all other flowering
plants (Barkman et al., 2000). However, in most analyses (P. Soltis et al., 1999; D. Soltis et al., 2000; Savolainen et al.,
2000; Qiu et al., 1999; Parkinson et al., 1999; Mathews and Donoghue, 1999), including the most recent and most comprehensive
analyses of basal angiosperms to date (Zanis et al., submitted), Amborella continues to receive strong bootstrap
support as the sister to all other extant angiosperms. Nevertheless, we cannot consistently reject the alternative hypothesis
of Nymphaeaceae + Amborella as sister to all other flowering plants. Hence, we have also included Nuphar
(Nymphaeaceae). Whereas some water lilies have numerous, spirally arranged parts and undifferentiated perianths,
reconstructions of floral evolution suggest that these characteristics are derived (Zanis et al., in press). Basal lineages
(Cabomba, Brasenia, Nuphar, and Barclaya) are trimerous with differentiated sepals and petals. We
have chosen Nuphar advena because of its phylogenetic position near the base of Nymphaeaceae (Les et al., 1999). We
have preliminary data for Nuphar, including the isolation of C-class MADS-box genes. Low chromosome numbers are
2n = 34 (Nuphar) and 2n = 28 (Nymphaea). Genome size for Nymphaea is small; no values have yet been
reported for Nuphar.
- Persea, Liriodendron, and Saruma - Three genera were chosen to represent the eumagnoliid clade
(Fig. 2 in FGP grant proposal) because it contains most of the species and much of the floral diversity of basal
angiosperms: Persea, Liriodendron, and Saruma (Aristolochiaceae). The taxa chosen exhibit some of this
enormous floral diversity. Persea americana (avocado) has parts in whorls, with two whorls of three tepals, stamens
with well-differentiated anthers and filaments in three or four whorls, and a single carpel. Persea is of economic
importance as a fruit crop and is diploid (2n = 24) with a small genome size (907 Mbp). Liriodendron also has tepals,
but has numerous spirally arranged laminar (leaf-like) stamens and carpels. Liriodendron tulipifera (yellow or tulip
poplar) is a valuable timber tree and is also transformable. It exhibits the lowest chromosome number in Magnoliaceae
(2n = 38) and has a small genome size (784 Mbp). Saruma is unusual in having what appear to be well-differentiated
sepals (3) and petals (3); all other Aristolochiaceae have one perianth whorl of three parts, which are considered to be
sepals by convention (Cronquist, 1987). The petals of Saruma are hypothesized, based on studies of floral development,
to be derived from stamens, providing a testable hypothesis for the origin of these petals. In contrast, the petals of many
angiosperms are considered to be developmentally homologous with sepals. Thus, the inclusion of species with petals
apparently homologous to either sepals or stamens allows testing of critical hypotheses on the origin of petals. Saruma
henryi has a moderate-sized genome (3136 Mbp).
- Acorus - One basal monocot was chosen, Acorus americanus (Acoraceae). It is well supported as sister to all
other monocots (Fig. 2 in FGP grant proposal). The genome size is small (392 Mbp). Flowers are small and bisexual, with 4-6
small perianth segments.
- Eschscholzia - Inclusion of early-diverging eudicots is critical in that the basic floral Bauplan seems to
remain open or flexible through these lineages; it is only in the core eudicots that extensive canalization is evident.
Also, in some mapping investigations, one of the origins of well-differentiated sepals and petals occurs at the base of the
eudicots. Eschscholzia (Papaveraceae), representing the early-diverging eudicots, has two sepals, two whorls of
petals, whorls of numerous stamens, and several fused carpels. Our work on Papaveraceae will complement the ongoing
work of Vivian Irish (Yale University), which focuses on PCR-based isolation of MADS-box genes from poppies and many other
- Welwitschia - Welwitschia mirabilis represents the gymnosperm clade Gnetales. We rejected conifers for a
major focus in this study because their female units (cone scales) are complex fusions of many parts; already funded genomic
research with pine, a conifer, will provide comparative evidence for this group. In Welwitschia the parts are
separate, with clearly defined morphology. Both male and female cones have numerous reproductive units, and show gradate
development, with all stages available for ca. 6 weeks in the spring (M. Frohlich, pers. obs.). Both Huntington Gardens and
California State University, Fullerton have agreed to supply material. Both chromosome number (2n = 42) and genome size (12,740
Mbp) are fairly large. As in Amborella, the plants are dioecious, so separate libraries will be constructed from male
and female cone tissues, and EST sequencing will be performed separately on each library.
- Zamia - Zamia represents what may be the sister lineage to all other extant gymnosperms (Chaw et al.,
1997, 2000; Bowe et al., 2000; P. Soltis et al., 1999). It is well removed from Gnetales and provides an important source of
gymnosperm diversity. Zamia fischeri is easily obtained commercially, with both male and female plants readily
available. The large genome size of most gymnosperms will not be any particular barrier to our proposed work on expressed
gene sequences, but would slow the pace of future research based on genomic sequences. Smaller genome-sized alternatives are
not available for several key gymnosperm groups. Again, because the plants are dioecious, male and female libraries will be
constructed and analyzed.
Taxa Chosen for Shallow EST Coverage
In addition to the taxa chosen for deep EST study, we have selected further species for shallow
EST coverage. These taxa provide coverage particularly for additional lineages of core eudicots and hence will enhance genome
linkages among angiosperms (Fig. 2, FGP grant proposal). Although no individual library will be sampled deeply, each taxon will
contribute many sequences to gene family analyses, identification of genes of wide distribution, and a thorough representation of
moderately and highly expressed genes.
- Illicium (star anise; Illiciaceae) - Our current deep EST sampling covers most of the key basal angiosperms, but
one omission involves the strongly supported clade of Illiciales, Austrobaileyaceae, and Trimeniaceae, a clade that
follows Amborella and Nymphaeales as sister to all other angiosperms. Illicium parviflorum is a
logical addition in that it is diploid (2n = 28), but with a moderate sized genome (2n = 13.4), easily cultivated in the
greenhouse, with prolonged flowering, and moderate-sized flowers having numerous spirally arranged parts and an
- Asparagus (asparagus; Asparagaceae) - Asparagus officinalis represents an economically important
non-grain monocot that can link the most basal lineages (Acorus, Alismatales) with the highly derived floral
structures found in the grasses and grass relatives. Asparagus produces many whorled, trimerous flowers in a large
inflorescence. This species is diploid (2n = 20), transformable, and has a small genome size (1323 Mbp). The plants are
dioecious, so two libraries will be sampled from male and female floral tissues, and EST sequencing will be performed
separately on each library.
- Vaccinium (blueberries and cranberries; Ericaceae) - Vaccinium occupies a crucial phylogenetic position as
a member of Ericales, an early-branching clade of the large asterid clade; several species are also important crops. Diploid
species have 2n = 24 and a small genome size (V. pallidum, 4.4; V. boreale, 2.4; V. elliottii, ca. 1078
- Cucumis (cucumber; Cucurbitaceae) - Cucumber (C. sativus) was selected from among many economic
species in the large eurosid I clade. A member of the Cucurbitaceae (squash, pumpkin, cucumbers), it is an important New World
fruit crop species. Cucumis is not presently the focus of intensive publicly funded genome research. This species is
transformable, has a small genome (882 Mbp), and is diploid, with 2n = 14. Cucumis is dioecious, so two libraries will
be sampled from male and female floral tissues, and EST sequencing will be performed separately on each library.
- Beta (beet, Swiss chard; Chenopodiaceae) - Beta vulgaris was chosen to represent Caryophyllales; it is
diploid with a small genome (2n = 18; 1225 Mbp), transformable, and readily accessible.
- Ribes (currants and gooseberries; Grossulariaceae) - Ribes americanum is easily grown and flowers
profusely. The genus is diploid (2n = 12). No genome size values have been reported, but chromosomes are small and the
genome size of its sister family, Saxifragaceae, is small; Ribes is transformable. Ribes represents a clade,
Saxifragales, that occupies a pivotal phylogenetic position as either the sister to the large rosid clade (Fig. 2 in FGP
grant proposal) or one of the first branches of the core eudicots, following Gunnerales (Soltis et al., 2003).
Although sister to all other monocots and thus in an important phylogenetic position, Acorus may be difficult from
other perspectives. The flowers are very small and flowering is sporadic. A member of the large Asparagales clade, which is sister
to the large commelinoid clade that contains the grasses, may be a good alternative. Several major crops (Allium,
Asparagus) would be strong possibilities for an alternative deep EST monocot.
Prunus persica (peach; Rosaceae) provides another economically important rosid species. Prunus is the second
most economically important genus of Rosaceae following Malus (apples, which are polyploid) and includes almonds,
cherries, nectarines, peaches, and plums. Prunus persica is diploid (2n = 16), with a small genome size (4C = 1.87).
As a representative of the water lilies, Nuphar is slightly preferred over Nymphaea, because of the distinct sepals
and petals of Nuphar. Furthermore, C-class genes have been obtained from Nuphar advena (M. Zanis et al., unpubl.).
The study could be performed with Nymphaea, but Nymphaea may be somewhat less helpful for diagnosing the homology of important floral
Lactuca sativa (lettuce; Asteraceae) could be used to represent the large euasterid II clade. It is diploid with
2n = 18 and a mean genome size of 4C = 9.0. Although Helianthus annuus (sunflower) is also an important crop, chromosome
numbers for the genus are high (2n = 34) and may reflect ancient polyploid origin. However, the genome size for
Helianthus is comparable to that of Lactuca (9.7). Another candidate from Asteraceae is Gerbera, which has been the
target of recent investigations of floral developmental genetics (V. Albert research papers; see PCR in situ image elsewhere on
Medicago truncatula (Fabaceae). Both the rosid and asterid clades are huge, together comprising over one-half of all
angiosperm species; both consist of at least two major subclades. Arabidopsis is a member of the eurosid II clade (Fig. 2 in FGP
grant proposal); we therefore could select Medicago to represent eurosid I. In addition, Fabaceae (legumes) are of
enormous economic importance. Medicago truncatula is diploid (2n = 16), with a small genome (mean 4C = 4.9; lowest
value = 1.9), is closely related to alfalfa, and can be transformed. Several very large and ongoing studies of expressed genes
in Medicago and Glycine, and the possibility of a full-scale genome study of Medicago, should provide
adequate information from legumes without inclusion of Medicago in this study.
A large number of potentially useful taxa were considered for inclusion in the FGP, but rejected for one reason or another.
Below is a partial list of rejected taxa and the reasons for their rejection.
- Piper (black pepper, Piperaceae) - Piper is an economically important basal angiosperm. However,
the tiny and highly modified flowers of the Piperaceae were considered less likely to inform a general understanding of floral
development than the less modified and easily used flowers of Saruma (B3, above). Furthermore, because developmental
data suggest that the petals of Saruma are derived from stamens, the inclusion of Saruma presents the
opportunity to test hypotheses on the origins of petals.
- Lilium (lily, Liliaceae) - Lily is an attractive plant for flower development research, and a
representative mid-level monocot; however, the very large genome size (4C = 140) makes this species impractical for genetic
and genomic research at this time.
- Pineapple, coconut palm, banana - These economically important plants would all be important to include in a detailed
study of expressed floral genes in a denser sampling of monocot lineages. We excluded them only because of budget constraints.
Intense ongoing research with many different grasses will ensure that those data and those from the Floral Genome Project can
be effectively linked.