18.8 Fungal genomes and their comparison

Saccharomyces cerevisiae is the best-studied fungus, and the fission yeast Schizosaccharomyces pombe is also an important model organism for which a complete genome is available. However, neither of these yeasts is an adequate model for filamentous fungi, which have more genes (approximately 10,000) and bigger genomes (30 to 40 Mb); both features are presumably related to the wider morphogenetic, metabolic, and ecological capabilities of filamentous fungi. Certainly, it is already clear that several genes present in Aspergillus nidulans are not present in S. cerevisiae, so comparative genomics is a growing business. Quite a number of genome projects are underway or planned, including many on filamentous fungi.

A global initiative to sequence and annotate fungal genomes is managed and co-ordinated by the Broad Institute of MIT and Harvard under what is called the Fungal Genome Initiative (FGI); you should visit this URL: http://www.broad.mit.edu/node/304. The FGI is supported by the National Human Genome Research Institute, the National Science Foundation, the National Institute of Allergy and Infectious Diseases and the U.S. Department of Agriculture. FGI prioritises sequence data from fungi that are important to medicine, agriculture and industry, and maintains a sequence database that is increasingly important as a tool for comparative studies in fungal diversity and evolution.

At the time of writing, 64 fungal genomes have been sequenced and annotated or are in the process of being sequenced and annotated (Table 9), and another 85 are on the agenda for analysis in the near future (Table 10). You can find the most up-to-date information at this URL: http://www.broad.mit.edu/annotation/fungi/fgi/index.html. We strongly recommend that you visit this website, mainly because the genomic data is updated regularly as improvements and amendments are made to the sequences, but also because the index page provides hyperlinks that allow you to access, and even download, the genome sequences and information about many aspects that we cannot deal with here, including:

  • basic statistics about genome size, gene density, etc.,
  • search facilities allowing you to find similarities to other sequences,
  • a feature search to explore and view annotated features on the sequence,
  • a gene index to find genes by a variety of methods,
  • ability to browse the DNA sequence, find clones, and graphically view sequence regions,
  • opportunity to download sequence, genes, markers, and other genome data.
Table 9. Status of Fungal Genome Initiative (FGI) sequencing projects as they stood in May 2009
| Organism name and strain designation | Assembly size (Mb) | Status in May 2009 |
|---|---|---|
| Allomyces macrogynus ATCC 38327 |  | Sequencing in progress |
| Aspergillus nidulans FGSC A4 | 30 | Assembly and annotation released |
| Aspergillus terreus NIH2624 | 29 | Assembly and annotation released |
| Batrachochytrium dendrobatidis JEL423 | 24 | Assembly and annotation released |
| Blastomyces dermatitidis strain SLH#14081 |  | In progress |
| Blastomyces dermatitidis strain ER-3 |  | In progress |
| Botrytis cinerea B05.10 | 43 | Assembly and annotation released |
| Candida guilliermondii ATCC 6260 | 11 | Assembly and annotation released |
| Candida tropicalis MYA-3404 | 15 | Assembly and annotation released |
| Candida albicans WO-1 | 14 | Assembly and annotation released |
| Candida lusitaniae ATCC 42720 | 12 | Assembly and annotation released |
| Chaetomium globosum CBS 148.51 | 35 | Assembly and annotation released |
| Coccidioides immitis RS | 29 | Assembly and annotation released |
| Coccidioides immitis H538.4 | 28 | Assembly and annotation released |
| Coccidioides immitis RMSCC2394 | 29 | Assembly and annotation released |
| Coccidioides immitis RMSCC3703 | 28 | Assembly and annotation released |
| Coccidioides posadasii Silveira | 27 | Assembly and annotation released |
| Coccidioides posadasii RMSCC 3488 | 28 | Assembly and annotation released |
| Coccidioides posadasii RMSCC 2133 | 28 | Assembly released; annotation in progress |
| Coccidioides posadasii RMSCC 3700 | 25 | Assembly released; annotation in progress |
| Coccidioides posadasii CPA 0001 | 29 | Assembly released; annotation in progress |
| Coccidioides posadasii RMSCC 1037 | 27 | Assembly released; annotation in progress |
| Coccidioides posadasii RMSCC 1038 | 26 | Assembly released; annotation in progress |
| Coccidioides posadasii RMSCC 1040 | 26 | Assembly released; annotation in progress |
| Coccidioides posadasii CPA 0020 | 27 | Assembly released; annotation in progress |
| Coccidioides posadasii CPA 0066 | 28 | Assembly released; annotation in progress |
| Colletotrichum graminicola M1.001 |  | Assembly in progress |
| Coprinopsis cinerea (as Coprinus cinereus) okayama7#130 | 36 | Assembly and annotation released |
| Cryptococcus neoformans H99 | 19 | Assembly and annotation released |
| Cryptococcus neoformans R265 | 18 | Assembly and annotation released |
| Fusarium graminearum PH-1 (NRRL 31084) | 36 | Assembly and annotation released |
| Fusarium verticillioides 7600 | 42 | Assembly and annotation released |
| Fusarium oxysporum f. sp. lycopersici 4286 | 61 | Assembly and annotation released |
| Gaeumannomyces graminis var. tritici R3-111a-1 | 40 | In progress |
| Histoplasma capsulatum Nam1 | 33 | Assembly and annotation released |
| Histoplasma capsulatum G186AR | 30 | Assembly released; annotation in progress |
| Histoplasma capsulatum G143 |  | Sequencing in progress |
| Lacazia loboi EDM7 |  | Some information released; on hold |
| Lodderomyces elongisporus NRRL YB-4239 | 16 | Assembly and annotation released |
| Magnaporthe grisea 70-15 | 42 | Assembly and annotation released |
| Magnaporthe poae ATCC 64411 |  | In progress |
| Microsporum canis CBS113480 |  | Some information released; assembly in progress |
| Microsporum gypseum CBS118893 |  | Assembly released; annotation in progress |
| Mortierella verticillata NRRL 6337 |  | Sequencing in progress |
| Neurospora crassa OR74A | 39 | Assembly and annotation released |
| Paracoccidioides brasiliensis Pb03 | 29 | Assembly and annotation released |
| Paracoccidioides brasiliensis Pb01 | 33 | Assembly released; annotation in progress |
| Paracoccidioides brasiliensis Pb18 | 30 | Assembly released; annotation in progress |
| Puccinia graminis f. sp. tritici CRL 75-36-700-3 | 89 | Assembly and annotation released |
| Pyrenophora tritici-repentis Pt-1C-BFP | 38 | Assembly and annotation released |
| Rhizopus oryzae RA 99-880 | 46 | Assembly and annotation released |
| Saccharomyces cerevisiae RM11-1a | 12 | Assembly and annotation released |
| Schizosaccharomyces japonicus yFS275 | 11 | Assembly and annotation released |
| Schizosaccharomyces octosporus yFS286 | 10 | Assembly released; annotation in progress |
| Sclerotinia sclerotiorum 1980 | 38 | Assembly and annotation released |
| Spizellomyces punctatus BR117 |  | In progress |
| Stagonospora nodorum SN15 | 37 | Assembly and annotation released |
| Trichophyton equinum CBS127.97 |  | Some information released; assembly in progress |
| Trichophyton rubrum CBS118892 |  | Sequencing in progress |
| Trichophyton tonsurans CBS112818 |  | Sequencing in progress |
| Uncinocarpus reesii UAMH 1704 | 22 | Assembly and annotation released |
| Ustilago maydis 521 | 20 | Assembly and annotation released |
| Verticillium dahliae VdLs.17 | 34 | Assembly released; annotation in progress |
| Verticillium albo-atrum VaMs.102 | 30 | In progress |

Data from the ‘Status’ section of the FGI website (downloaded May 2009) at the URL: http://www.broad.mit.edu/annotation/fungi/fgi/status.html.

 

Table 10. FGI nominated candidates: fungi on the planning list for genome sequencing in the near future
ASCOMYCOTA

Mycosphaerella graminicola
Cenococcum geophilum
Aspergillus niger
Aspergillus versicolor
Aspergillus flavus
Aspergillus terreus
Neosartorya fischeri
Aspergillus clavatus
Penicillium chrysogenum
Penicillium roqueforti
Penicillium marneffei
Penicillium minioluteum
Xanthoria parietina
Ramalina menziesii
Fusarium oxysporum
Fusarium verticillioides
Fusarium solani
Fusarium proliferatum
Stachybotrys chartarum
Sporothrix schenckii
Ophiostoma ulmi
Cryphonectria parasitica
Blastomyces dermatitidis
Paracoccidioides brasiliensis
Uncinocarpus reesii
Trichophyton rubrum
Blumeria graminis
Sclerotinia sclerotiorum
Neurospora discreta

Neurospora tetrasperma
Podospora anserina
Septoria lycopersici
Tuber borchii
Tuber melanosporum
Paxillus involutus
Exophiala (Wangiella) dermatitidis
Candida albicans (WO-1)
Lodderomyces elongisporus
Candida krusei
Holleya sinecauda
Eremothecium gossypii
Schizosaccharomyces japonicus
Schizosaccharomyces octosporus
Schizosaccharomyces kambucha
Pneumocystis carinii (human and mouse isolates)
Tolypocladium inflatum
Cordyceps militaris
Epichloe typhina
Ceratocystis fimbriata
Colletotrichum
Corollospora maritima
Xylaria hypoxylon
Leotia lubrica
Botrytis cinerea
Pyrenophora tritici-repentis
Morchella esculenta
Taphrina deformans

BASIDIOMYCOTA

Schizophyllum commune
Agaricus bisporus
Microbotryum violaceum
Trichodoma
Tremella fuciformis
Tremella mesenterica
Cryptococcus neoformans (Serotype C)
Tsuchiyaea wingfieldii
Filobasidiella depauperata
Filobasidiella flava
Filobasidiella xianghuijun

Puccinia triticina
Puccinia striiformis
Amanita phalloides
Flammulina velutipes
Armillaria
Cantharellus cibarius
Phallus impudicus
Phellinus pini
Leptosphaeria maculans
Stagonospora nodorum

ZYGOMYCOTA

Phycomyces blakesleeanus

Mucor racemosus

CHYTRIDIOMYCOTA

Coelomomyces stegomyiae
Coelomomyces utahensis

Allomyces macrogynus
Blastocladiella emersonii

Comparative genomics is a science in its own right (Gibson & Muse, 2009) and we can do no more than introduce it to you here. We will select a few examples from the wealth of data already available to show you something of the detailed information that is online and the range of studies to which it can contribute. Interestingly, with nearly 200 fungal genomes already fully sequenced or in progress, the fungi provide the widest sampling of genomes from any eukaryotic kingdom (Galagan et al., 2005), and, as we will see below, many of the fungal genomes fall into groups of related species that are ideal for comparative studies (see, for example, Jones, 2007). Nevertheless, the fungi chosen for sequencing so far are mostly pathogens or model organisms, and it has been argued that a more rational approach would be to design genome sequencing programmes with specific objectives in mind, such as the development of alternative bioenergy sources, bioremediation, and the study of fungus-environment interactions (Baker et al., 2008).

To support systematic comparative analyses of fungal genomes the e-Fungi database is being developed (Hedeler et al., 2007) as a joint research project between the School of Computer Science and Faculty of Life Sciences at The University of Manchester, and the Departments of Computer Science and Biological Sciences at the University of Exeter. The e-Fungi project has developed a data warehouse that integrates data from multiple fungal genomes (more than 30 fungal genomes already) in a way that aids their systematic comparative study. The e-Fungi database also provides mycologists with a library of techniques for comparative study of genome data and is accessible at this URL: http://www.e-fungi.org.uk.

So let us dip into the FGI database for a few examples. The fully annotated sequences (Table 9) include the following.

The Neurospora crassa (Ascomycota) genome includes over 2000 different genes that are expressed at particular vegetative or sexual stages, or during different intervals of the circadian cycle. More than half of these have no known homologues in the yeast genome or elsewhere. The haploid genome of N. crassa contains 39.23 Mb of chromosomal DNA, with individual chromosomes ranging from 4 to 10.3 Mb, and the seven linkage groups have been identified cytologically with individual chromosomes. There is little repetitive DNA in the N. crassa genome; most of what there is consists of the genes specifying ribosomal RNA. N. crassa telomeres have the same DNA sequence as those of humans.

The 30.07 Mb genome of A. nidulans (Ascomycota) has been completed, and, as in N. crassa, preliminary data indicate that about half of the genes discovered are ‘new’ in the sense that their sequences have not been encountered previously nor associated with a function in any other organism. Several species of Aspergillus have been sequenced (Table 11) as part of a species comparison project (see http://www.broad.mit.edu/annotation/genome/aspergillus_group/MultiHome.html). It is particularly important that the 29.38 Mb genome of Aspergillus fumigatus is included in this comparative study because A. fumigatus is an extremely important human pathogen, causing allergic diseases in asthmatic and cystic fibrosis patients and invasive aspergillosis in immunocompromised patients and those suffering from tuberculosis or other cystic lung diseases. The Aspergillus website (http://www.aspergillus.org.uk/index.html) provides information on pathogenic Aspergillus species for clinicians and scientists (DNA sequence data, bibliography, laboratory protocols, and treatment information), and also has a separate ‘Aspergillus for patients’ website (http://www.aspergillus.org.uk/newpatients/).

Table 11. Comparison of the basic genome statistics of Aspergillus species

| Aspergillus species | Size of complete genome sequence (Mb) | Number of chromosomes | %GC content | Number of predicted protein-coding genes |
|---|---|---|---|---|
| A. fumigatus | 29.38 | 8 | 48.82 | 9,887 |
| A. flavus | 36.79 | 8 | 48.26 | 12,604 |
| A. nidulans | 30.07 | 8 | 50.32 | 10,701 |
| A. niger | 37.2 | 8 | 47.06 | 11,200 |
| A. terreus | 29.33 | 8 | 52.90 | 10,406 |
| A. oryzae | 37.12 | 8 | 48.24 | 12,336 |
| Neosartorya fischeri* | 32.55 | 8 | 49.42 | 10,406 |
| A. clavatus | 27.86 | 8 | 49.21 | 9,121 |

* Neosartorya fischeri is the teleomorph of A. fischerianus, a very close relative of A. fumigatus.
Data from: http://www.broad.mit.edu/annotation/genome/aspergillus_group/GenomeStats.html
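Gene density, one of the basic statistics offered on the FGI site, can be worked out directly from Table 11 by dividing the number of predicted protein-coding genes by the genome size. The short Python sketch below does exactly that for the eight Aspergillus genomes; it is our own illustration using the figures copied from the table, not an FGI tool, and the names in it are hypothetical.

```python
# Figures copied from Table 11; each value is
# (genome size in Mb, number of predicted protein-coding genes).
ASPERGILLUS = {
    "A. fumigatus": (29.38, 9887),
    "A. flavus": (36.79, 12604),
    "A. nidulans": (30.07, 10701),
    "A. niger": (37.2, 11200),
    "A. terreus": (29.33, 10406),
    "A. oryzae": (37.12, 12336),
    "Neosartorya fischeri": (32.55, 10406),
    "A. clavatus": (27.86, 9121),
}


def gene_density(size_mb: float, genes: int) -> float:
    """Predicted protein-coding genes per megabase of assembled sequence."""
    return genes / size_mb


def density_table(genomes: dict[str, tuple[float, int]]) -> list[tuple[str, float]]:
    """(species, genes per Mb) pairs sorted from highest to lowest density."""
    rows = [(name, gene_density(size, genes)) for name, (size, genes) in genomes.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, density in density_table(ASPERGILLUS):
        # 1 Mb = 1000 kb, so 1000 / density is the average sequence per gene in kb.
        print(f"{name:<22} {density:6.1f} genes/Mb  (~{1000 / density:.1f} kb per gene)")
```

With these figures A. nidulans has the highest density and A. niger the lowest, but all eight fall between about 300 and 360 genes per Mb, which is roughly one gene for every 3 kb of sequence.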

Several other species comparison projects of Ascomycota are underway, including those on the filamentous Fusarium (Table 12) and the yeast Candida (Table 13). Some species of both of these genera are important human pathogens (see the section entitled Clinical groupings for human fungal infections in Chapter 16).

Table 12. Comparison of the basic genome statistics of Fusarium species

| Fusarium species | Size of complete genome sequence (Mb) | %GC content | Number of predicted protein-coding genes |
|---|---|---|---|
| F. verticillioides | 41.78 | 48.70 | 14,179 |
| F. graminearum | 36.45 | 48.33 | 13,332 |
| F. oxysporum | 61.36 | 48.40 | 17,735 |

 

Table 13. Comparison of the basic genome statistics of Candida species

| Candida species | Size of complete genome sequence (Mb) | %GC content | Number of predicted protein-coding genes |
|---|---|---|---|
| C. albicans WO1 | 14.42 | 33.47 | 6,160 |
| C. albicans sc5314 v21 | 14.32 | 33.46 | 6,094 |
| C. guilliermondii | 10.61 | 43.76 | 5,920 |
| C. lusitaniae | 12.11 | 44.50 | 5,941 |
| Debaryomyces hansenii* | 12.22 | 36.28 | 6,312 |
| C. parapsilosis | 13.09 | 38.69 | 5,733 |
| Lodderomyces elongisporus** | 15.51 | 36.96 | 5,802 |
| C. tropicalis | 14.58 | 33.14 | 6,258 |

*phylogenetically related to C. lusitaniae and C. guilliermondii
**phylogenetically related to C. parapsilosis
Data from: http://www.broad.mit.edu/annotation/genome/candida_group/GenomeStats.html
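The %GC content reported in Tables 11 to 15 is the percentage of guanine plus cytosine among the bases of the assembled genome; among the Candida species in Table 13 it ranges from about 33% in C. albicans and C. tropicalis to about 44% in C. guilliermondii and C. lusitaniae. The short Python sketch below shows the arithmetic applied to FASTA-formatted text, the usual format for downloaded sequence. It is our own illustration, not the Broad Institute's pipeline; in particular, how the published figures treat ambiguous bases (N) is not stated here, so skipping them is an assumption.

```python
def gc_percent(sequence: str) -> float:
    """Percentage of G + C among the unambiguous bases (A, C, G, T) of a DNA sequence.

    Assumption: ambiguity codes such as N are skipped rather than counted, so
    runs of N in a draft assembly do not dilute the value.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / (gc + at)


def gc_from_fasta(lines) -> float:
    """%GC pooled over every sequence line of FASTA text, ignoring '>' header lines."""
    return gc_percent("".join(line.strip() for line in lines if not line.startswith(">")))


if __name__ == "__main__":
    # Toy input for illustration only; it is not real Candida sequence.
    toy_fasta = [">contig_1", "ATGCGCATTA", "NNAT", ">contig_2", "GGCCAT"]
    print(f"{gc_from_fasta(toy_fasta):.2f}% GC")
```

Pooling all contigs before dividing gives a genome-wide figure weighted by contig length, which is what a single %GC value per genome implies.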

Some of the Fusarium species listed in Table 12 are important pathogens of plants, but Magnaporthe grisea has become the main ‘model organism’ for studying the molecular aspects of fungal plant disease. M. grisea causes rice blast disease. Crop losses have been magnified in recent times as rice production has intensified; enough rice is lost to this disease alone each year to feed 60 million people. Strains of the fungus also attack other cereals, including wheat and barley, and it causes a serious disease of turf grasses. Note that Magnaporthe grisea is a species complex that attacks several hosts, and some authors assign species status to strains of the fungus on the basis of host specificity; so the name Magnaporthe oryzae may be used for the rice blast fungus. However, M. grisea is the name used for the rice blast fungus in the genome database, so we will stick to that. Genetic resistance in the host plant has been, and continues to be, the major means of controlling blast, but M. grisea is able to evolve rapidly and overcome major gene resistance. The goal of genome analysis of the fungus is to understand fungus-host interactions well enough to develop durable and environmentally sound strategies for managing rice blast disease.

The draft sequence of the M. grisea genome was published in 2005 (Dean et al., 2005; summary statistics in Table 14). A subsequent functional genomics study of pathogenicity revealed many new gene functions required for rice blast disease (Jeon et al., 2007; Talbot, 2007), adding to what was already known about the adaptations a fungus needs in order to cause disease (Lorenz, 2002).

Table 14. Summary of the basic genome statistics of Magnaporthe grisea strain 70-15

| Size of complete genome sequence (Mb) | %GC content | Number of predicted protein-coding genes | Number of predicted tRNA genes | Number of predicted rRNA genes |
|---|---|---|---|---|
| 41.7 | 51.57 | 11,074 | 341 | 46 |

Data from: http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/GenomeStats.html

Pneumocystis carinii is an important lung pathogen that causes pneumonia in immunocompromised patients; it is consequently a major infection risk for patients with HIV infection, individuals undergoing organ transplantation or chemotherapy, and those with congenital immune deficiencies (see the section entitled Clinical groupings for human fungal infections in Chapter 16). P. carinii is not sensitive to the usual antifungal agents, and it is hoped that genome analysis will lead to the identification of more effective targets for new treatments. The Pneumocystis Genome Project is underway (see: http://pgp.cchmc.org) and some comparisons have been made with the genomes of other microbes (Cushion, 2004), but an annotated full sequence has not yet been released.

Cryptococcus neoformans causes cryptococcosis, one of the most serious fungal diseases of humans, particularly in immunocompromised patients. As HIV infection has spread, cases of cryptococcosis have increased correspondingly in all areas of the world (see the section entitled Clinical groupings for human fungal infections in Chapter 16). Cryptococcus is a basidiomycetous yeast, and C. neoformans was the first basidiomycete genome to be fully sequenced (Table 15).

Apart from an increasing number of Cryptococcus genomes (representing different clinical isolates), the only other basidiomycete genome sequences released at the time of writing are those of the filamentous saprotrophic mushroom Coprinopsis cinerea, the rust fungus Puccinia graminis, and the biotrophic plant pathogen Ustilago maydis (Table 15), which make up a fairly heterogeneous group of organisms. Several other basidiomycete genome projects are described as ‘in progress’, including Agaricus bisporus, Phanerochaete chrysosporium (described as ‘complete with gene predictions and other features’), Schizophyllum commune (said to be ‘complete, annotation under way, not yet released’), Lentinula edodes, Pleurotus ostreatus (‘genome sequencing in progress, not released’), and Armillaria species. For these projects we suggest you refer to the website at http://www.basidiomycetes.org, which is intended to provide a central location for organising links to research and public data on Basidiomycota.

Table 15. Summary of the basic genome statistics of members of the Basidiomycota available in May 2009

| Species and strain designation | Size of complete genome sequence (Mb) | Number of chromosomes | %GC content | Number of predicted protein-coding genes | Number of predicted tRNA genes | Number of predicted rRNA genes |
|---|---|---|---|---|---|---|
| Cryptococcus neoformans var. grubii (serotype A) H99 | 18.87 | 14 | 48.23 | 6,967 | 148 | 6 |
| Coprinopsis cinerea okayama7#130 | 36.29 | 13 | 51.67 | 13,392 | 267 | 3 |
| Puccinia graminis f. sp. tritici | 88.64 | N/A | 43.35 | 20,567 | N/A | N/A |
| Ustilago maydis | 19.68 | N/A | 54.03 | 6,522 | N/A | N/A |

N/A = not available

Data from:

Consideration of the many uses of a genome sequence began with the human genome (Sharman, 2001) and identified these activities:

  • studying the proteins and RNA of the proteome and transcriptome (and perhaps deciding how to change them to serve our own purposes);
  • establishing the genetic basis of interactions between organisms, especially pathogenesis and the mechanisms of disease, but including more benign relationships such as mutualisms and mycorrhizas;
  • comparing genome sequences from related organisms to examine genome evolution and relationships between organisms at the genomic level (for example: how/if genes are conserved in different species; how relationships between genomes compare with conventional taxonomic classifications; studying mechanisms of speciation).

Some comparative genomics can be done even with the small amount of data we have included above. By comparing Tables 11 to 15 you can make the clear generalisation that fungi that exist naturally as yeasts have much smaller genomes than filamentous fungi; so perhaps the yeast lifestyle is a highly reduced adaptive specialisation. You could collect more data from the FGI website to test this suggestion. You might even use some of the genome analysis tools on that website to try to identify sequences that exist in filamentous forms but not in yeast forms; in this discussion, however, we now want to proceed from analysing genomes to manipulating them.
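The yeast versus filamentous comparison can be made explicit with a few lines of Python using only the genome sizes in Tables 11 to 15. This is a minimal sketch: the grouping of each species as ‘yeast’ or ‘filamentous’ is our own simplification (one strain per species, and the dimorphic Ustilago maydis left out), and summary statistics on a handful of genomes are only a starting point, not a test of the adaptive-specialisation idea.

```python
from statistics import median

# Genome sizes (Mb) copied from Tables 11 to 15, one strain per species.
# The yeast/filamentous labels are our own simplification; the dimorphic
# Ustilago maydis is left out because it grows in both forms.
YEASTS = {
    "Candida albicans WO1": 14.42,
    "Candida guilliermondii": 10.61,
    "Candida lusitaniae": 12.11,
    "Debaryomyces hansenii": 12.22,
    "Candida parapsilosis": 13.09,
    "Lodderomyces elongisporus": 15.51,
    "Candida tropicalis": 14.58,
    "Cryptococcus neoformans H99": 18.87,
}
FILAMENTOUS = {
    "Aspergillus fumigatus": 29.38,
    "Aspergillus flavus": 36.79,
    "Aspergillus nidulans": 30.07,
    "Aspergillus niger": 37.2,
    "Aspergillus terreus": 29.33,
    "Aspergillus oryzae": 37.12,
    "Neosartorya fischeri": 32.55,
    "Aspergillus clavatus": 27.86,
    "Fusarium verticillioides": 41.78,
    "Fusarium graminearum": 36.45,
    "Fusarium oxysporum": 61.36,
    "Magnaporthe grisea 70-15": 41.7,
    "Coprinopsis cinerea": 36.29,
    "Puccinia graminis f. sp. tritici": 88.64,
}


def summarise(sizes: dict[str, float]) -> dict[str, float]:
    """Count, smallest, median and largest genome size in a group."""
    values = list(sizes.values())
    return {"n": len(values), "min": min(values), "median": median(values), "max": max(values)}


def compare(yeasts: dict[str, float], filamentous: dict[str, float]) -> tuple[float, bool]:
    """Return (filamentous median / yeast median, whether the two size ranges overlap)."""
    y, f = summarise(yeasts), summarise(filamentous)
    overlap = f["min"] <= y["max"] and y["min"] <= f["max"]
    return f["median"] / y["median"], overlap


if __name__ == "__main__":
    for label, group in (("yeasts", YEASTS), ("filamentous", FILAMENTOUS)):
        s = summarise(group)
        print(f"{label:<12} n={s['n']:<3} min={s['min']:6.2f}  median={s['median']:6.2f}  max={s['max']:6.2f} Mb")
    ratio, overlap = compare(YEASTS, FILAMENTOUS)
    print(f"median ratio {ratio:.2f}; ranges overlap: {overlap}")
```

With these figures the two ranges do not overlap (the largest yeast genome listed, C. neoformans at 18.87 Mb, is smaller than the smallest filamentous one, A. clavatus at 27.86 Mb), and the filamentous median is about 2.7 times the yeast median. Adding more genomes from the FGI site to the two dictionaries is all that is needed to extend the comparison.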

Updated December 17, 2016