18.12 Fungal genomes and their comparison
Saccharomyces cerevisiae is the best-studied fungus, with a genome of about 12 Mb containing 5,885 protein-coding genes, and the fission yeast Schizosaccharomyces pombe is another important model organism for which a complete genome is available (13.8 Mb with 4,824 protein-coding genes). However, neither of these yeasts is an adequate model for filamentous fungi, which have more genes (approximately 8,400) and bigger genomes (30 to 40 Mb); both features are presumably related to the wider morphogenetic, metabolic, and ecological capabilities of filamentous fungi.
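To make the comparison concrete, the figures quoted above can be turned into a rough gene density (protein-coding genes per Mb) and an average genome length per gene. The following is only a minimal sketch: the 35 Mb value for a filamentous fungus is an assumed midpoint of the 30 to 40 Mb range, and the function names are illustrative rather than taken from any genome database.

```python
# Genome size (Mb) and protein-coding gene counts as quoted in the text.
# ASSUMPTION: 35.0 Mb is simply the midpoint of the quoted 30-40 Mb range.
GENOMES = {
    "Saccharomyces cerevisiae": (12.0, 5885),
    "Schizosaccharomyces pombe": (13.8, 4824),
    "typical filamentous fungus": (35.0, 8400),
}


def gene_density(size_mb, genes):
    """Return the number of protein-coding genes per megabase."""
    return genes / size_mb


def mean_spacing_kb(size_mb, genes):
    """Return the average length of genome per gene, in kilobases."""
    return size_mb * 1000 / genes


if __name__ == "__main__":
    for name, (size_mb, genes) in GENOMES.items():
        density = gene_density(size_mb, genes)
        spacing = mean_spacing_kb(size_mb, genes)
        print(f"{name}: {density:.0f} genes/Mb, one gene per {spacing:.1f} kb")
```

On these approximate numbers the yeasts pack their genes more densely than filamentous fungi do, so the larger filamentous genomes reflect more than the extra genes alone.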
Certainly, it is already clear that several genes present in filamentous fungi are not present in yeasts, so comparative genomics is a growing business. Several genome projects are underway or planned, and they include many filamentous fungi. The principal one is the Earth BioGenome Project (EBP), which has been described as ‘a moonshot for biology’. This proposed 10-year project aims to sequence, catalogue, and characterise the genomes of all of Earth’s eukaryotic biodiversity (Lewin et al., 2018). Comparative genomics is a science in its own right (Gibson & Muse, 2009) and we can do no more than simply introduce it to you here. We have already used several examples from the wealth of data available to illustrate a range of topics in this textbook. These examples, together with what we will show you in this Section of the range of detailed information that is online, may well inspire you to see how studies of this sort could contribute to your own interests.
Interestingly, with more than 2,000 already fully sequenced or in progress, the range of fungal genomes available is the widest sampling of genomes from any eukaryotic kingdom. This is true now and was true even when we wrote the first edition of this textbook (Galagan et al., 2005). As you can see on the websites detailed below, many of the fungal genomes fall into groups of related species that are ideal for comparative studies (see, for example, Jones, 2007).
Nevertheless, the fungi chosen for sequencing initially were mostly pathogens or model organisms, and dealing with this bias was one aim of the 1,000 Fungal Genomes Project [http://1000.fungalgenomes.org/home/], the motto for which is ‘Sequencing unsampled fungal diversity’. Another approach is to design genome sequencing programmes with some specific objective in mind, such as development of alternative bioenergy sources, bioremediation, and fungus-environment interactions (Baker et al., 2008).
The first global initiative to sequence and annotate fungal genomes was managed and co-ordinated by the Broad Institute of MIT and Harvard under what was called the Fungal Genome Initiative (FGI), which is still described at this URL: https://www.broadinstitute.org/fungal-genome-initiative. The FGI was supported by the National Human Genome Research Institute, the National Science Foundation, the National Institute of Allergy and Infectious Diseases and the U.S. Department of Agriculture. The FGI prioritised sequence data from fungi that are important to medicine, agriculture and industry, and established a sequence database for that purpose. Over 100 fungi have been sequenced in this programme, including human and plant pathogens as well as fungi that serve as basic models for molecular and cellular biology.
Fungal genome websites at the Broad Institute have been changed as the sequencing projects have been completed. Formerly interactive websites have been replaced with static pages providing information on fungal projects, along with links to sites where datasets can still be downloaded, and the primary repositories for all fungal genomic data now are MycoCosm, FungiDB and Ensembl Fungi.
- MycoCosm [https://genome.jgi.doe.gov/programs/fungi/index.jsf] is hosted by the Joint Genome Institute (JGI), a Department of Energy Office of Science User Facility managed by Lawrence Berkeley National Laboratory at the University of California. MycoCosm is: ‘…JGI’s web-based fungal genomics resource, which integrates fungal genomics data and analytical tools for fungal biologists. It provides navigation through sequenced genomes, genome analysis in context of comparative genomics and genome-centric view…’ and offers the largest available collection of fungal genomes, for comparative genomics across phylo- and eco-groups, along with interactive web-based tools for genome downloading, searching and browsing, and a form for nominating new species for sequencing to fill gaps in the Fungal Tree of Life. This portal also hosts the 1,000 Fungal Genomes Project, an international collaboration set up to sequence 1,000 fungal genomes (though this number has now been greatly exceeded) [https://jgi.doe.gov/our-science/science-programs/fungal-genomics/1000-fungal-genomes/]; and the Genomic Encyclopedia of Fungi, which focuses on genomes of fungi that contribute to plant health (including symbiosis, pathogenicity and biocontrol), biorefinery mechanisms (conversion of biopolymers to sugars for fuel production), and fungal diversity [https://jgi.doe.gov/our-science/science-programs/fungal-genomics/genomic-encyclopedia-of-fungi/#feedstock] (Grigoriev et al., 2014).
- FungiDB (http://fungidb.org/fungidb/; Stajich et al., 2012) is now one of the EuPathDB family of databases (this being the eukaryotic pathogen genomics database resource) that supports a wide range of microbial eukaryotes; FungiDB (Aurrecoechea et al., 2017) includes many fungal (and oomycete) species, including non-pathogens. This resource provides automated analysis of multiple genomes, curated information, with comments and supporting evidence from the user community. In addition, FungiDB offers sophisticated tools for integrating and mining diverse Omics datasets that fungal biologists will find useful. The FungiDB web site also gives access to a YouTube tutorials channel, web tutorials (videos and PDF-downloads), and teaching exercises.
- Ensembl Fungi is a browser for fungal genomes (http://fungi.ensembl.org/index.html). The genomes are taken from the databases of the International Nucleotide Sequence Database Collaboration (the European Nucleotide Archive at the European Bioinformatics Institute [https://www.ebi.ac.uk/], GenBank at the US National Center for Biotechnology Information [https://www.ncbi.nlm.nih.gov/], and the DNA Data Base of Japan [https://www.ddbj.nig.ac.jp/index-e.html]). The portal offers an extensive range of tools, downloads and documentation.
As of 2019, well over 1,000 fungal genomes have already been sequenced and annotated or are in the process of being sequenced and annotated (and that total does not include the 1,011 Saccharomyces cerevisiae genomes published by Peter et al., 2018). We strongly recommend that you visit the websites listed above, partly because the genomic data are updated regularly as improvements and amendments are made to the sequences, but also because the index pages provide hyperlinks that allow you to access, and even download, the genome sequences and information about many aspects that we cannot deal with here. These include: basic statistics about genome size, gene density, etc.; search facilities allowing you to find similarities to other sequences; feature searches to explore and view annotated features on the sequence; gene indexes to find specific genes by a variety of methods; the ability to browse the DNA sequence, find clones, and graphically view sequence regions; and the opportunity to download sequence, genes, markers, and other genome data.
You could start at these addresses:
- List of fungal genomes in the MycoCosm system at: https://genome.jgi.doe.gov/fungi/fungi.info.html
- Progress at the 1,000 Fungal Genomes Project is regularly reported at: http://1000.fungalgenomes.org/home/
- List of all fungi on the website of Ensembl Fungi at: http://fungi.ensembl.org/species.html.
Updated August, 2019