18.5 Understanding fungal genetic structure

We have a long heritage of using fungi and fungal products. Some of our current fungal biotechnology, such as baking, brewing and the numerous fermented food products described in Chapter 17, originated hundreds or even thousands of years ago, largely through chance association between natural fungi and one or more of the constituents of the food material. Although the original discovery of penicillin was also a matter of chance (see the section Antibiotics and other pharmaceuticals in Chapter 17), its industrial production in the middle of the 20th century was a much more directed process, as was the development of other products such as citric acid (see the section Citric acid biotechnology in Chapter 17). Yet all improvements were at the organismal level: techniques were found that enabled cultivation of particular organisms, and strains were selected that had advantageous biological characteristics.

In the second half of the 20th century the rapidly accumulating knowledge of fungal genetics was brought to bear, and from what we have discussed so far it must be clear that a thorough understanding of the molecular genetics of fungi is essential to future exploitation. The basic genetic architecture of fungi is typical of eukaryotes in general (see the section entitled Nuclear genetics in Chapter 5). All the major principles of eukaryote genetics apply in fungi: gene structure and organisation, Mendelian segregations, recombination, and the rest (Moore & Novak Frazer, 2002).

Chromosome maps constructed solely from recombination frequencies have a limited resolution, although in microorganisms large numbers of progeny can be scored, which reduces this problem to some extent. When the Saccharomyces cerevisiae genome sequencing project began in 1989, the conventional genetic map consisted of more than 1 400 markers, an average of about one every 9 kb in a genome of roughly 12 Mb, and this was detailed enough for the sequencing programme without the need for much more physical mapping. However, S. cerevisiae was one of the two really intensively mapped eukaryotes at the time (the fruit fly, Drosophila, being the other), so physical mapping is necessary to improve the marker density in other fungi as they are included in genome sequencing programmes.

Physical mapping procedures include restriction mapping, which establishes the positions of restriction endonuclease recognition sites in a DNA molecule; locating markers on chromosomes by hybridising marker probes to intact chromosomes; and mapping known sequences in genome fragments using PCR and hybridisation. The ideal is to establish the locations of unique sequences, which are not duplicated at any other site, as markers spaced about 100 kb apart (that is just less than 1% recombination in traditional genetic crosses) throughout the genome. The collection of such markers is known as a mapping panel.
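The idea behind restriction mapping can be illustrated with a minimal sketch in Python. It assumes the DNA sequence is already known as a text string, which is the reverse of the laboratory situation (where site positions are inferred from fragment sizes), but it shows what a restriction map records. The function names and the short test sequence are invented for illustration; the enzyme used is EcoRI, which recognises GAATTC and cuts after the first G.

```python
def restriction_sites(sequence, recognition_site):
    """Return the 0-based start position of every occurrence of the site."""
    sequence = sequence.upper()
    recognition_site = recognition_site.upper()
    positions = []
    start = sequence.find(recognition_site)
    while start != -1:
        positions.append(start)
        # Move on by one base so that overlapping occurrences are not missed.
        start = sequence.find(recognition_site, start + 1)
    return positions


def fragment_lengths(sequence, positions, cut_offset):
    """Lengths of the fragments produced when a linear molecule is cut.

    cut_offset is the number of bases into the recognition site at which
    the enzyme cuts (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [p + cut_offset for p in positions]
    boundaries = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


# Made-up 26-base sequence, for illustration only.
dna = "ATGCGAATTCTTAGGCCGAATTCAAT"
sites = restriction_sites(dna, "GAATTC")
print(sites)                                     # [4, 17]
print(fragment_lengths(dna, sites, cut_offset=1))  # [5, 13, 8]
```

The list of site positions is the restriction map; the fragment lengths are what would be seen experimentally after complete digestion of a linear molecule.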

Use of mapping panels is essentially a management technique for sharing the effort between participating laboratories, something which has been common for many years in several physical sciences, like particle physics and astronomy, though genome sequencing represented the entry of biology into the 'big-science' league. The first example of really big fungal science was the programme to sequence the yeast genome which was initiated in 1989 by the European Commission. The project involved 35 European laboratories at the outset, and the first sequence of a complete chromosome was published in 1992. Eventually, over 600 scientists were involved, at locations in Europe, North America and Japan, and progress involved distribution of DNA fragments to the contributing laboratories by the DNA coordinator. The complete sequence of the yeast genome was published in 1997.

The genome is made up of the entire DNA content of a cell. Eukaryotes and prokaryotes have quite different types of genome but it is generally assumed that something like the prokaryotic grade of organisation is the primitive form from which the eukaryote organisation evolved. Modern prokaryotes and eukaryotes have a great deal in common (see Chapter 5); the DNA of a gene is transcribed into RNA, which is called a messenger RNA (mRNA) if it is a transcript of a protein-coding gene, and the mRNA is translated into protein by the ribosomes and other translation machinery. The part of a protein-coding gene sequence that is translated into protein is called the open reading frame, usually abbreviated ORF.

As a genome sequence is assembled the functional genes in the sequence are recognised as open reading frames (ORFs); the process is called genome annotation and is discussed in more detail below. Not all of the ORFs that are identified can be associated with a gene of identified function; an ORF specifying a product that does not resemble a known protein is called an unidentified reading frame, or URF. But comparative genomics does more than identify the genes. It can show the evolutionary relationships between different organisms, and aids understanding of how the genotype relates to life-style and environment.

Characteristically, the ORF is read in the 5' to 3' direction along the mRNA, and it starts with an initiation codon and ends with a termination codon (Fig. 14). Nucleotide sequences that occur in the mRNA before the ORF make up the leader sequence, and sequences following the ORF make up the trailer segment. Many eukaryotic genes are split into exons (meaningful segments) and introns (sequence segments that do not contribute to the protein-coding sequence). The introns are removed from the primary RNA transcript by the splicing machinery to form the functional mRNA (Fig. 14; and see the section entitled The nucleus in Chapter 5).
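The relationship between primary transcript, introns, leader, ORF and trailer can be sketched in a few lines of Python. This is a simplified illustration, not a model of the splicing machinery: it assumes the intron coordinates are already known, it takes the first AUG as the initiation codon, and the function names and the short transcript are invented for the example.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def splice(primary_transcript, introns):
    """Remove introns, given as (start, end) pairs (0-based, end-exclusive)."""
    exons = []
    previous_end = 0
    for start, end in sorted(introns):
        exons.append(primary_transcript[previous_end:start])
        previous_end = end
    exons.append(primary_transcript[previous_end:])
    return "".join(exons)


def split_mrna(mrna):
    """Split an mRNA into (leader, ORF, trailer).

    Simplifying assumption: the ORF starts at the first AUG and ends at the
    first in-frame termination codon. Returns None if either is missing.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[:start], mrna[start:i + 3], mrna[i + 3:]
    return None


# Invented transcript: leader, exon 1, one intron, exon 2, trailer.
primary = "GGCC" + "AUGGCU" + "GUAAGUUUUCAG" + "GAAUAA" + "CCGG"
mature = splice(primary, [(10, 22)])
print(mature)              # GGCCAUGGCUGAAUAACCGG
print(split_mrna(mature))  # ('GGCC', 'AUGGCUGAAUAA', 'CCGG')
```

Note that the ORF is only continuous after splicing; in the primary transcript the two exons are separated by the intron.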

However, the smallest eukaryotic genomes (like that of yeast) are in the region of 10 Mbp, and the largest are over 100 000 Mbp (in vertebrates and plants), so we can observe even more surprising structural differences when we compare other eukaryotes (see Table 2 in Chapter 5).

Generally speaking, it appears that space is saved in the genomes of less complex organisms by having the genes more closely packed together and by having much less repetition (Fig. 14). The genome of Saccharomyces cerevisiae (see Table 9 below) contains more genes per unit length of DNA than occur in human or maize DNA.

Fig. 14. Top: the basic structure of a typical eukaryotic gene. The schematic diagram indicates the structure of a type II gene; that is, a protein-encoding gene transcribed by RNA polymerase II. The diagram is not drawn to scale and the relative sizes of the different sections differ between genes and between the eukaryotic Kingdoms. Bottom: comparison of 50 kbp segments of the genomes of the prokaryote Escherichia coli and three eukaryotes to show how the ‘density’ of genetic information varies. In each case the grey boxes correspond to gene sequences, and the white boxes correspond to stretches of repeated sequences. Adapted from Moore & Novak Frazer, 2002.

This small genome is one reason why yeast geneticists and molecular biologists pioneered eukaryote genome analysis. Although some of the more unusual aspects of genome structure observed in higher animals and plants might not be represented in fungi, the genomes of yeast and other fungi remain good models of eukaryotic genetic architecture and their smaller size means that the information they contain is technically more accessible. In terms of genetic information content the organisation of the fungal genome is much more economical than that of higher eukaryotes. Genes are more compact with fewer introns, spaces between genes are short and much less of the DNA is devoted to repetitive noncoding sequences.

Nevertheless, fungi are typical eukaryotes, featuring all the basic cell biology expected of this grade of organisation. Even though the yeast genome is only in the same size range as some of the more advanced prokaryotes, the genetic structure and functioning of fungal genes is representative of all eukaryotes and we can use their sequences to learn about genomics (Moore & Novak Frazer, 2002).

Analysis of the genetic sequences that make up the genome of an organism, and comparisons of the genomes of different organisms (exercises that have come to be known as the science of genomics) only became possible from the mid-1990s. Establishing the exact DNA sequence of a genome is a major undertaking, but is only the prelude to intensive analysis. The priority of genomics is to establish the number and function of genes in an organism.

The first step is probably to look for potential start and stop codons. This identifies a collection of potential open reading frames (ORFs) contained in the overall sequence, which are the potential genes. The sequences of these potential genes are then usually compared with known sequences in databases, because a strong match to a gene that is known in another species is the clearest way of establishing that a gene exists as a protein-coding entity. Comparison of DNA and protein sequences has been made possible only by the development of computer programs that enable the sequence data to be stored and analysed effectively. This is the province of bioinformatics, a new branch of biology that assembles, documents, maintains, and analyses very large data sets.
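The search for potential start and stop codons can be sketched as a simple scan of all six reading frames (three on each strand). The Python below is an illustrative sketch only, not the method used by any particular annotation program: it ignores introns, so it would miss or truncate split genes, and the function names, the minimum-length threshold and the test sequence are assumptions made for the example.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_strand(dna, min_codons):
    """Find ORFs in the three reading frames of one strand.

    Returns (start, end) pairs, 0-based and end-exclusive, with the stop
    codon included. An ORF with no in-frame stop codon is not reported.
    """
    found = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if orf_start is None and codon == START:
                orf_start = i
            elif orf_start is not None and codon in STOPS:
                if (i + 3 - orf_start) // 3 >= min_codons:
                    found.append((orf_start, i + 3))
                orf_start = None
    return found


def find_orfs(dna, min_codons=3):
    """Scan both strands; '-' coordinates refer to the reverse complement."""
    dna = dna.upper()
    results = [("+", s, e) for s, e in orfs_in_strand(dna, min_codons)]
    minus = reverse_complement(dna)
    results += [("-", s, e) for s, e in orfs_in_strand(minus, min_codons)]
    return results


# Invented 16-base sequence containing one short ORF (ATG GCT GAA TAA).
print(find_orfs("CCATGGCTGAATAAGG"))  # [('+', 2, 14)]
```

In practice a much larger minimum length would be used, because short ORFs occur frequently by chance; each candidate would then be compared with database sequences as described above.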

Updated December 17, 2016