With rapidly growing numbers of whole-genome and expressed sequence tag (EST) sequences in public databases, sequence-based protein classification systems provide a foundation for gene annotation, functional genomics, and comparative studies of gene and genome evolution.

PlantTribes is a classification system based on cluster analyses of the inferred proteomes of Arabidopsis thaliana v. Columbia and Oryza sativa v. japonica (Rice). We use the similarity-based clustering procedure TribeMCL (Enright et al., 2002, 2003) to classify protein-coding genes into putative gene families, or "tribes". Classifications were constructed at three clustering stringencies, and most tribes are stable across a wide range of stringencies.
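TribeMCL applies the Markov Cluster (MCL) algorithm to a graph whose nodes are proteins and whose edge weights come from all-against-all BLAST similarity. The minimal pure-Python sketch below illustrates the core MCL loop: add self-loops, column-normalize into a stochastic matrix, then alternate expansion (matrix squaring) with inflation (elementwise powering followed by renormalization) until the matrix stops changing. It is not TribeMCL itself. The function names, the default parameters, the pruning threshold, and the simplified "strongest attractor" rule for reading off clusters are all illustrative assumptions, and a tiny hand-made similarity matrix stands in for real BLAST-derived weights.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (a column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product a @ b."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-8, prune=1e-9):
    """Minimal Markov Cluster sketch; returns clusters as lists of node indices."""
    n = len(similarity)
    # Add self-loops so the random walk does not oscillate.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: a two-step random walk spreads flow along paths.
        expanded = matmul(m, m)
        # Inflation: powering strengthens strong flows and starves weak ones.
        inflated = normalize_columns([[v ** inflation for v in row]
                                      for row in expanded])
        # Drop negligible entries, then renormalize the columns.
        pruned = normalize_columns([[v if v > prune else 0.0 for v in row]
                                    for row in inflated])
        delta = max(abs(pruned[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = pruned
        if delta < tol:
            break
    # Simplified read-out (assumption): each node (column) joins the cluster
    # of the row that holds most of its flow, i.e. its strongest attractor.
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, []).append(j)
    return list(clusters.values())


# Hypothetical example: two triangles of mutually similar proteins
# (nodes 0-2 and 3-5) joined by a single weak edge between nodes 2 and 3.
similarity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(similarity)
print(clusters)
```

The weak bridge is starved by repeated inflation, so the two triangles should come out as separate clusters. In MCL, raising the inflation value generally yields smaller, tighter clusters, so rerunning with several inflation values is one plausible way to produce classifications at different stringencies; the source does not say which parameter its stringencies correspond to.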

Phylogenetic analyses of exemplar gene families show a strong, but not perfect, correspondence between tribe membership and cladistic relationships. These analyses also provide insights into the Arabidopsis and Rice genomes, gene family evolution, and the evolutionary dynamics of functional domains among gene families. In addition, the resulting classifications provide scaffolds for sorting protein sequences from other plant species.
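Using a two-genome classification as a scaffold for sequences from other species can be sketched with a simple best-hit rule: search each new protein against the Arabidopsis and Rice proteins and place it in the tribe of its best-scoring hit. The sketch below assumes 12-column BLAST tabular output (BLAST+ `-outfmt 6`, where column 11 is the E-value and column 12 the bit score) and a hypothetical `tribe_of` dictionary mapping gene IDs to tribe numbers. The E-value cutoff, the example IDs, and the tribe numbers are illustrative only, and this best-hit rule is not necessarily how PlantTribes itself sorts sequences.

```python
from typing import Dict, Iterable


def assign_to_tribes(blast_lines: Iterable[str], tribe_of: Dict[str, int],
                     max_evalue: float = 1e-10) -> Dict[str, int]:
    """Place each query in the tribe of its best-scoring (highest bit score) hit."""
    best: Dict[str, tuple] = {}  # query id -> (bit score, tribe)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Ignore weak hits and subjects that are not in the classification.
        if evalue > max_evalue or subject not in tribe_of:
            continue
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, tribe_of[subject])
    return {query: tribe for query, (_, tribe) in best.items()}


# Hypothetical tribe assignments and BLAST hits, for illustration only.
tribe_of = {"AT1G01010.1": 1, "LOC_Os01g01010.1": 2}
hits = [
    "query1\tAT1G01010.1\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-80\t310.0",
    "query1\tLOC_Os01g01010.1\t70.0\t290\t87\t2\t5\t295\t3\t292\t1e-60\t240.0",
    "query2\tLOC_Os01g01010.1\t60.0\t150\t60\t1\t1\t150\t1\t150\t1e-5\t55.0",
]
assignments = assign_to_tribes(hits, tribe_of)
print(assignments)  # {'query1': 1}
```

Ranking candidates by bit score rather than E-value keeps the comparison independent of database size; queries with no hit passing the cutoff (here `query2`) are simply left unassigned.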

We hope the PlantTribes database will be a useful tool for sorting sequences into objectively defined clusters of Arabidopsis and Rice genes that can then be aligned and analyzed in formal phylogenetic analyses.

