Systems Approach to Metabolism - CHEMICAL BIOLOGY

Systems Approach to Metabolism

Kiyoko F. Aoki-Kinoshita, Department of Bioinformatics, Faculty of Engineering, Soka University, Hachioji, Tokyo, Japan

Minoru Kanehisa, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan

doi: 10.1002/9780470048672.wecb589

The network structure of pathways can be studied from two complementary viewpoints: as networks of enzymes or as networks of chemical compounds. This dual structure enables more in-depth analysis of metabolic pathways. From these networks, new features of pathways on both the local and the global levels can be detected. On the one hand, global features such as the scale-free property of pathways have attracted much attention from the bioinformatics community. On the other hand, local features of networks such as pathway modules can be used to retrieve and characterize subnetworks of related genes that are potentially involved in a particular function of the metabolic pathway. Systems analysis of metabolic pathways must focus not only on existing pathways, but also on reconstructing pathways for new genomes and filling in information regarding missing enzymes. By using the vast amounts of genomic data available, it is possible to reconstruct the metabolic maps of new genomes. Such genomic information has proved useful for refining prediction methods, and it can be complemented with chemical information that is inherent in the same network. Overall, a systems approach to metabolism covers both the genomic and the chemical worlds in an integrated manner. We will show that the concepts of local network features in terms of both these worlds produce modules that can be integrated such that the global view of metabolism can be grasped. The current findings will be described systematically, with attention to manual curation, so that biologically accurate systems can be produced for analysis.

In bioinformatics, the term “systems approach” is often contrasted with the reductionist approach, in which a large system is broken down into its parts and the parts are studied individually. In the systems approach, which is based on systems theory, a network is instead studied from the perspective of the organization (relationships) of its parts, from which patterns may emerge. Therefore, we look at the metabolic network in an integrated manner that covers both the genomic and the chemical worlds to identify features that emerge from the network. Recent advances in the systems analysis of metabolic pathways will be introduced.

Systems Analysis of Metabolic Pathways

Metabolic pathways were illustrated using simple diagrams long before the human genome project and related bioinformatics projects began. With the involvement of computer science techniques, however, systematic approaches to modeling metabolic pathways have progressed quickly, with aims that range from metabolite analyses to pathway prediction and reconstruction (1-3). Systems analysis has come to incorporate graph-theoretic techniques on the one hand, and physics on the other hand, in the attempt to elucidate the complex functioning of the cellular system. In terms of graph theory, in particular, the network structure of pathways has been studied from complementary views, as networks of enzymes or as networks of chemical compounds, to capture more of the important information. From these networks, new features of pathways on both the local and the global levels have been detected. On the one hand, global features such as the scale-free property of pathways have attracted much attention from the bioinformatics community. These properties have shown that metabolic networks are not so different from other well-known networks such as social networks and the Internet. They have also helped to characterize networks in a systematic manner such that particular enzymes that are either undefined (missing) or have important roles in the network could be identified and studied in more depth. On the other hand, local features of networks such as pathway modules can be used to retrieve and characterize subnetworks of related genes that are potentially involved in a particular function of the metabolic pathway. This latter approach of characterizing modules has been supplemented with gene expression information and analyses of chemical reaction patterns, not only to infer the function of the genes involved in a particular module, but also to infer the evolution of pathways.

Systems analysis of metabolic pathways needs to focus on existing pathways, but it can also be used to reconstruct pathways for new genomes or to fill in information regarding missing enzymes. By using the vast amounts of genomic data available, it is possible to reconstruct the metabolic maps of new genomes. Information on orthologous groups of genes combined with pathway data enables such predictions. Furthermore, data from a variety of resources, such as microarray expression data and localization data, can be integrated into new advanced models to predict missing enzymes and to fill in the gaps in pathways (3-5). Such genomic information is useful for refining prediction methods, and it can be complemented with chemical information that is inherent in the same network (6). Methods for pathway prediction can use a systematic approach to classify chemical reactions based on the specific structures of the chemical compounds involved. Because computer science techniques from graph theory can be, and have been, applied directly to these analyses, some of these methods will be described later.

Network Structure

The analysis of network structure from the viewpoint of computer science requires some background information, which is provided here. We will introduce the data involved in modeling metabolic pathways, and the KEGG pathway database in particular. Furthermore, a basic introduction to graphs as used in computer science will be provided.

Background: data models

Several databases for metabolic pathways are currently available on the Internet, and some major representatives are listed in Table 1. KEGG (Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan), BRENDA (Institute of Biochemistry, University of Cologne, Germany), and BioCyc (SRI International, Menlo Park, CA) may be considered the best known for systems analysis of pathways. In this manuscript, we will refer to the data from KEGG (Kyoto Encyclopedia of Genes and Genomes) (7) at http://www.genome.jp/ and note that the analyses presented may be applied to other databases as well.

Table 1. Some representative metabolic pathway databases

| Name | Provider | Description |
| --- | --- | --- |
| Biocatalysis/Biodegradation Database | University of Minnesota | Microbial biocatalytic reactions and biodegradation pathways for xenobiotic and chemical compounds |
| Biochemical Pathways | ExPASy | Biochemical pathways |
| BioCyc Knowledge Library | SRI International | Consists of EcoCyc and MetaCyc; collection of metabolic pathways for individual organisms and a reference source on metabolic pathways from many organisms, respectively |
| Biomolecular Interaction Network (BIND) | | Interaction, molecular complex, and pathway records |
| BRENDA | Institute of Biochemistry, University of Cologne, Germany | Collection of enzyme functional data classified according to the Enzyme Commission (EC) list of enzymes |
| Cell Signaling Networks Database | National Institute of Health Sciences, Japan | Signaling pathways of human cells, compiling information on biologic molecules, sequences, structures, functions, and biologic reactions which transmit cellular signals |
| Enzymology Database | Argonne National Laboratories | Detailed information on a large number of enzymes from the literature |
| Kyoto Encyclopedia of Genes and Genomes (KEGG) | GenomeNet | Computerize knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes, providing links from the gene catalogs produced by genome sequencing projects |

KEGG provides a global “reference map” of pathways, which are categorized into various groups as listed in Table 2. The same metabolic pathway is distinguished between different organisms by coloring the appropriate genes in the reference map. This reference map can display all possible genes and networks from all organisms in a single drawing to provide a “bird’s eye view” of the metabolic network. These maps also contain the chemical compounds whose reactions are catalyzed by the respective enzymes, which provide another source of information to be integrated into metabolic systems analysis.

Table 2. Categories of KEGG pathway maps

1. Metabolism
   a. Carbohydrate
   b. Energy
   c. Lipid
   d. Nucleotide
   e. Amino acid
   f. Other amino acid
   g. Glycan
   h. PK/NRP
   i. Cofactor/vitamin
   j. Secondary metabolite
   k. Xenobiotic
2. Genetic Information Processing
3. Environmental Information Processing
4. Cellular Processes
5. Human Diseases
6. Drug Development

All genes in all organisms with completely sequenced genomes are cataloged in the KEGG GENES database. Furthermore, KEGG provides a categorization of biologic data with its BRITE resource. One major component of BRITE is the KEGG Orthology (KO) database, which contains orthologous groups of genes based on pathway information. That is, those enzymes that appear in the same location in the same map can be compared across genomes because of the manner in which the KEGG pathways are organized. These genes are compared based on sequence similarity and bi-directional best hit information in pairwise genome comparisons. This results in orthologous groups of genes that are based not only on sequence information, but also on pathway information that has been manually curated from the literature. Such biologic information incorporated into the data ensures that the resulting catalog of gene groups is truly meaningful.

Another resource that is useful in metabolic pathway analysis is the KEGG COMPOUND database of chemical compounds. This database is supplemented by the databases of reaction information, which consist of REACTION, RPAIR, and ENZYME. The reaction formulas (chemical equations) in the ENZYME nomenclature, as well as those taken from the KEGG pathways, are stored in the REACTION database, which contains, among other information, the stoichiometry of substrates and products in an enzymatic reaction. To trace the atomic changes between substrates and products, the RPAIR database is constructed by decomposing each chemical equation into a set of substrate-product pairs.

The RPAIR database contains chemical structure alignments of substrate-product pairs (reactant pairs) and chemical structure transformation patterns, which were generated computationally and curated manually from all known enzyme-catalyzed reactions. These patterns are called RDM patterns, which describe biochemical structure transformations and represent KEGG atom type changes in a reaction. KEGG atom type changes are defined at the reaction center atom (R atom), its neighboring atoms in the different (mismatched) region (D atom), and the matched region (M atom), based on a graph-based alignment of the compounds involved in the reaction. (The definition of a graph-based alignment between two compounds is described in the next section.) Figure 1 illustrates these RDM atoms. Because these transformation patterns generalize complex enzymatic reactions, given a new set of chemical compound structures, the reactions that could possibly take place between them can be predicted.

Figure 1. RDM atoms in a chemical reaction.

Background: algorithms

The bioinformatics field has enabled the use of algorithmic techniques from computer science to analyze vast amounts of data efficiently and accurately. For the study of networks, graph models are most appropriate, and numerous algorithms exist for studying graph objects. A graph is a set of nodes connected by edges, in which a node represents a specific object, such as a particular chemical compound or a particular enzymatic protein, and an edge represents a relationship between two different nodes, such as the enzyme-catalyzed conversion of one compound into another or a protein-protein interaction. Formally, a graph can be defined as a set of nodes V = {v0, v1, ... ,vn} and a set of edges E = {e0,e1, ... ,em}, in which every edge in E connects exactly two nodes in V and no two edges connect the same pair of nodes. A directed graph is a graph in which each edge has a source and a target, and so a direction; for example, an irreversible reaction can be represented by an edge directed from substrate to product. In contrast, the edges of an undirected graph have no direction. The degree of a node is the number of nodes with which it shares an edge. A subgraph of a graph is a graph that contains a subset V' of the nodes in V and all those edges in E that connect the nodes in V'. Consequently, a subnetwork is a subgraph of a network modeled as a graph. We also define NP-completeness here. Informally, a problem is NP-complete if a proposed solution can be verified quickly (in polynomial time), but no efficient (polynomial-time) algorithm for finding a solution is known, and an efficient algorithm for it would yield efficient algorithms for all other problems whose solutions can be verified quickly. For example, the Hamiltonian path problem is a well-known NP-complete problem: given an undirected graph, find a path in the graph that passes through every node exactly once. No efficient algorithm for finding such a path is known, but given a candidate path, it is easy to verify whether it solves the problem.
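The definitions above can be made concrete with a minimal sketch. It assumes an undirected graph stored as an adjacency map; the node labels and the helper names (`make_graph`, `degree`, `induced_subgraph`, `is_hamiltonian_path`) are illustrative choices and do not come from any database or library. The last function shows why verifying a Hamiltonian path is easy even though finding one is hard.

```python
def make_graph(edges):
    """Build an undirected graph as a map: node -> set of neighbouring nodes."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph, node):
    """Number of nodes with which `node` shares an edge."""
    return len(graph[node])


def induced_subgraph(graph, nodes):
    """Subgraph on the node subset V', keeping every edge that joins two nodes of V'."""
    keep = set(nodes) & set(graph)
    return {n: graph[n] & keep for n in keep}


def is_hamiltonian_path(graph, path):
    """Verify (in linear time) that `path` visits every node exactly once along edges."""
    if len(path) != len(graph) or set(path) != set(graph):
        return False
    return all(b in graph[a] for a, b in zip(path, path[1:]))


# Toy network with hypothetical compound labels A-D.
toy = make_graph([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")])
```

Checking a candidate path only requires one pass over its consecutive node pairs, whereas a naive search would have to try orderings of all nodes.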

Methods that are efficient in practice exist to test whether two graphs are structurally identical (isomorphic). However, the problem of deciding whether a subgraph of one graph is isomorphic to another graph is known to be NP-complete (8). Nevertheless, many heuristic algorithms that find a solution as accurately and efficiently as possible, as well as algorithms that efficiently find a solution in a more restricted search space, have been and continue to be developed (9). In particular, the search for frequent patterns or motifs in graphs is a popular problem to which these heuristics can be applied (10).

Note that chemical compounds themselves can be modeled as graphs, with atoms being represented by nodes and bonds represented by edges. Thus, chemical compound similarity can be measured using algorithms for graph comparison. These similarity scores can be obtained based on the alignment of two compounds and the degree of agreement in the alignment. This fundamental concept is used to determine RDM patterns for chemical reaction classification, as described previously.

The metabolic network is a dual network. It may be viewed as a graph that consists of enzymes as nodes and their connections in the pathway as edges (which we define as the metabolic enzyme network), or as a graph that consists of chemical compounds (substrates and products) as nodes and reactions (catalyzed by enzymes) as edges (which we define as the metabolic compound network). One network can be obtained from the other by performing a line graph transformation (11), which exchanges the roles of nodes and edges: each edge of the original graph becomes a node of the new graph. Formally stated, given an undirected graph G, its set of nodes is denoted V(G) and its set of edges is denoted E(G). Another graph called the line graph of G, represented as L(G), can be associated with G by setting V(L(G)) = E(G), in which two vertices of L(G) are adjacent if and only if the corresponding edges have a common endpoint in G. That is, E(L(G)) = {{(u,v),(v,w)} | (u,v) ∈ E(G), (v,w) ∈ E(G), u ≠ w}.
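As a minimal sketch of the line graph transformation, assuming an undirected graph without self-loops given as a list of edges, the following function builds L(G) directly from the definition. The function name `line_graph` and the toy edge lists are illustrative only.

```python
def line_graph(edges):
    """Return (nodes, adjacency) of L(G) for an undirected graph G given as an edge list.

    Each edge of G becomes a node of L(G); two such nodes are adjacent
    when the corresponding edges of G share an endpoint.
    """
    # Represent every edge of G as a frozenset so that (u, v) and (v, u) coincide,
    # and drop duplicates while preserving the input order.
    nodes = list(dict.fromkeys(frozenset(edge) for edge in edges))
    adjacency = set()
    for i, first in enumerate(nodes):
        for second in nodes[i + 1:]:
            if first & second:  # common endpoint in G
                adjacency.add(frozenset({first, second}))
    return nodes, adjacency


# A linear chain A-B-C-D: its three edges form a chain of two links in L(G).
path_nodes, path_adjacency = line_graph([("A", "B"), ("B", "C"), ("C", "D")])

# A star around hub H: all three edges share H, so L(G) is a triangle.
star_nodes, star_adjacency = line_graph([("H", "X"), ("H", "Y"), ("H", "Z")])
```

The star example illustrates how a highly connected compound in the compound network produces a densely connected group of reactions in the transformed network.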

Global network features

One of the earliest features found to characterize metabolic compound networks is the scale-free property. It was derived from the finding that the degree distribution P(k) of a metabolic compound network, which is the probability that a node interacts with k other nodes, decays as a power law $P(k) \sim k^{-\gamma}$ with $\gamma \approx 2.2$ in all organisms (12-14). This scale-free property illustrated that biologic networks were not as different from nonbiologic networks as previously thought, and that the metabolic compound networks of almost all organisms exhibit robust and error-tolerant properties as a result. Moreover, an analysis of the scale-free properties of the line graphs of metabolic compound networks (that is, the properties of the metabolic enzyme networks) was performed (15). The properties of the metabolic enzyme networks do not correspond exactly one-to-one to those of the metabolic compound networks because several reactions may have common products, which effectively reduces the number of edges in the transformed network. Nevertheless, it was found that the scale-free power-law distribution was still preserved in the metabolic enzyme network, with only a small (less than one) difference between the exponents.
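To illustrate how a degree distribution and its power-law exponent might be examined, the sketch below computes P(k) from an adjacency map and estimates γ by a least-squares line through the points (log k, log P(k)). This simple regression is an assumption made for illustration; it is not necessarily the estimation procedure used in the cited studies (12-15), and it is known to be a crude estimator on noisy data. The function names are hypothetical.

```python
import math


def degree_distribution(graph):
    """P(k): fraction of nodes with degree k, for a graph given as node -> neighbour set."""
    counts = {}
    for neighbours in graph.values():
        k = len(neighbours)
        counts[k] = counts.get(k, 0) + 1
    total = len(graph)
    return {k: count / total for k, count in counts.items()}


def fit_power_law_exponent(distribution):
    """Estimate gamma in P(k) ~ k^(-gamma) by linear regression on a log-log scale."""
    points = [(math.log(k), math.log(p)) for k, p in distribution.items() if k > 0 and p > 0]
    if len({x for x, _ in points}) < 2:
        raise ValueError("need at least two distinct positive degrees")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    return -covariance / variance  # the slope of the line is -gamma


# Synthetic, noise-free distribution that follows k^(-2.2) exactly.
synthetic = {k: k ** -2.2 for k in range(1, 51)}
estimated_gamma = fit_power_law_exponent(synthetic)
```

On real metabolic networks the tail of the distribution is sparse, so binning or maximum-likelihood fitting would usually be preferred over this plain regression.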

However, it was also found that metabolic compound networks contain “hubs”, such as pyruvate and coenzyme A, which are highly connected nodes that interact with many other nodes. It was proposed that these metabolic networks are actually arranged in a hierarchical manner (16), in which highly connected modules are connected to one another in a scale-free manner. These modules in turn form clusters that are connected to other clusters at a higher level, and so on. Such properties can be exploited to infer the function of the genes involved in each corresponding module at various levels of the hierarchy. This view is consistent with the fact that genes do not necessarily work alone, but function in concert with other proteins and complexes at higher levels. It also correlates surprisingly well with recently published results based on graph-theoretical analysis of gene-regulatory networks in Bacillus subtilis (2). Because only a subset of genes is active at any one time, that work considered the dynamic topology of gene regulatory networks, as opposed to the full static network. As a result, a hierarchical scale-free network emerged.

Local network features

Commonly occurring patterns in metabolic networks, or network motifs, which can be found using heuristics for finding frequent subgraphs, have shown promise for functional inference (17). Recently, however, it has been argued that such functional inferences must also take evolution into consideration (18). Accordingly, work on extracting phylogenetic modules from metabolic enzyme networks demonstrated that such functional units are indeed conserved across evolution (19). In this work, phylogenetic profiles were constructed for all the enzymes in the metabolic reference map of KEGG. Using the Jaccard coefficient as a similarity measure, all enzymes were clustered hierarchically based on their phylogenetic profiles. Then, edges between the enzymes were added based on the edges in the metabolic network. Finally, smaller clusters were created within each cluster based on these new edges between enzymes. These small clusters were defined as phylogenetic network modules, in which enzymes that have similar phylogenetic profiles are close to one another in the metabolic network. In preliminary studies, the enzyme clusters constructed using only the similarity between phylogenetic profiles differed from the final network modules that also took metabolic network connectivity into account, which indicates that phylogeny should indeed be incorporated in metabolic module analysis. These phylogenetic modules also demonstrated that the final network possessed hierarchical network features, such that hubs of important genes exist, but these hubs are connected by more sparsely linked genes that work as linkers between the hubs to connect the network as a whole.
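The similarity measure used in this kind of analysis can be sketched as follows. The example assumes that a phylogenetic profile is stored as the set of organisms in which an enzyme is present; the enzyme and organism labels, the threshold, and the helper `conserved_edges` are hypothetical, and the code is a simplification that filters network edges by profile similarity rather than a reimplementation of the hierarchical clustering described in (19).

```python
def jaccard(profile_a, profile_b):
    """Jaccard coefficient: shared organisms divided by organisms in either profile."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


def conserved_edges(profiles, network_edges, threshold):
    """Keep the network edges whose two enzymes have similar phylogenetic profiles."""
    return [
        (a, b)
        for a, b in network_edges
        if jaccard(profiles[a], profiles[b]) >= threshold
    ]


# Hypothetical profiles: the organisms in which each enzyme is found.
profiles = {
    "enzyme1": {"org1", "org2", "org3"},
    "enzyme2": {"org1", "org2", "org3", "org4"},
    "enzyme3": {"org5"},
}
# Hypothetical adjacency of the enzymes in the metabolic enzyme network.
network_edges = [("enzyme1", "enzyme2"), ("enzyme2", "enzyme3")]

module_edges = conserved_edges(profiles, network_edges, threshold=0.5)
```

Here enzyme1 and enzyme2 are both adjacent in the network and similar in profile, so they would fall into the same candidate module, whereas enzyme3 is adjacent but phylogenetically dissimilar.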

The concept of modules as the basic functional building blocks that make up the traditional pathways is gaining more attention (20). Gene expression patterns in pathways and their formation of modules have been an intense topic of study (21, 22). Pathways combined with flux balance analysis have also provided interesting results about the metabolic pathways of yeast (23) and Escherichia coli (24). Flux balance analysis involves steady-state analysis using reaction stoichiometry information, such as that stored in the KEGG REACTION database, and it is gaining renewed interest for systematic analysis of metabolic networks (6).
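The steady-state condition that underlies flux balance analysis can be illustrated with a small check, assuming a stoichiometric matrix S (rows are metabolites, columns are reactions) and a flux vector v: at steady state, S·v = 0 for every internal metabolite. The sketch below only verifies this constraint for a given flux vector; it does not perform the optimization step of flux balance analysis, and the toy pathway is hypothetical.

```python
def net_production(stoichiometry, fluxes):
    """Compute S * v: the net production rate of each metabolite."""
    return [
        sum(coefficient * flux for coefficient, flux in zip(row, fluxes))
        for row in stoichiometry
    ]


def is_steady_state(stoichiometry, fluxes, tolerance=1e-9):
    """True when no metabolite accumulates or is depleted."""
    return all(abs(rate) <= tolerance for rate in net_production(stoichiometry, fluxes))


# Toy linear pathway:  r1: (uptake) -> A,   r2: A -> B,   r3: B -> (secretion)
# Rows are the internal metabolites A and B; columns are r1, r2, r3.
stoichiometry = [
    [1, -1, 0],   # A is produced by r1 and consumed by r2
    [0, 1, -1],   # B is produced by r2 and consumed by r3
]

balanced_fluxes = [2.0, 2.0, 2.0]
unbalanced_fluxes = [2.0, 1.0, 1.0]  # A would accumulate at rate 1
```

In a full analysis, a linear programming solver would search among all flux vectors that satisfy this constraint for one that maximizes a chosen objective.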

Functional Network Inference

In addition to the topological features of networks, other sources of information can and should be incorporated to take a step further into inferring function from the hierarchically organized modules of metabolic networks.

Metabolic reconstruction: genome to pathway mapping

The term metabolic reconstruction refers to the process of linking the genomic repertoire of enzyme genes to the chemical repertoire of metabolic pathways. That is, a metabolic enzyme pathway can be inferred given a set of enzymes (25). This task can be done by first referring to the existing pathway maps in which the involved genes are known. By using the genomic information of multiple (related) species and comparing them against these pathways, ortholog groups involved at specific nodes in the pathways can be identified. This method is the basis of the KO system. Correspondingly, the entire metabolic pathway of an organism can be inferred given its genome. That is, the KO system can be used to reconstruct a metabolic enzyme network by first referring to the genes known to be in a particular organism. Once the KO groups in which these genes are involved are identified, the nodes in the metabolic pathway in which these genes participate can be reconstructed. Thus, new sets of genes can then be compared against the KO groups to reconstruct the metabolic pathways in which the input genes may be involved.
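The mapping from genome to pathway can be sketched as a pair of lookups, assuming that each gene has already been assigned to an ortholog group and that each step of a reference pathway is labeled with the ortholog group expected there. All identifiers below (`KO_A`, `orgX:gene1`, `step1`, and so on) are hypothetical placeholders rather than real KEGG entries, and the function names are illustrative.

```python
def reconstruct_pathway(genome_genes, gene_to_ko, reference_pathway):
    """Assign the genes of one genome to the steps of a reference pathway via ortholog groups."""
    ko_to_genes = {}
    for gene in genome_genes:
        ko = gene_to_ko.get(gene)
        if ko is not None:  # genes without an ortholog assignment are skipped
            ko_to_genes.setdefault(ko, []).append(gene)
    return {step: ko_to_genes.get(ko, []) for step, ko in reference_pathway.items()}


def missing_steps(reconstruction):
    """Pathway steps for which the genome has no assigned gene (candidate missing enzymes)."""
    return [step for step, genes in reconstruction.items() if not genes]


# Hypothetical ortholog assignments for the genes of a newly sequenced organism.
gene_to_ko = {"orgX:gene1": "KO_A", "orgX:gene2": "KO_B", "orgX:gene3": "KO_D"}
# Hypothetical reference pathway: each step is labeled with the ortholog group expected there.
reference_pathway = {"step1": "KO_A", "step2": "KO_B", "step3": "KO_C"}

reconstruction = reconstruct_pathway(list(gene_to_ko), gene_to_ko, reference_pathway)
gaps = missing_steps(reconstruction)
```

Steps that remain empty after the mapping are the natural starting points for the missing-enzyme prediction methods discussed in the following section.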

Integration of heterogeneous datasets

Because the metabolic pathway is in fact a complex process that involves interactions of various degrees between biomolecules, the integration of the main components and their fundamental interactions is important for extracting the functional modules and identifying their roles in the network. For example, information on cellular components and their interactions can be incorporated to reconstruct metabolic networks more accurately than with genome annotation and/or sequence information alone (4). This involves the incorporation of data from multiple data sources, such as KEGG (for pathway and genomic data) and PSORTdb (for subcellular localization data). In addition, work has been done to integrate stoichiometric and bibliomic data for reconstructing the human metabolic network (3).

To incorporate an even wider variety of biologic data for predicting missing enzymes in metabolic enzyme networks, kernel methods are used. A kernel is a mathematical function that can take as input a variety of data for a specific set of entities and transform it such that the input entities can be classified as distinctly as possible. The method consists of two steps: a training phase and a test phase. The training phase uses data for which the properties are known in advance. The test phase is then used to assess how well the learned properties apply to new input data sets.

As an example of metabolic network inference that uses multiple sources of data, expression, genomic context, chemical, and phylogenetic information for a given set of genes can be used to train a kernel function to infer a metabolic network. The kernel function is developed such that a score is obtained for every pair of genes. If this score exceeds a particular threshold, then the corresponding genes are considered to be related, and an edge can be drawn between them to form the inferred network. The incorporation of chemical information in this work was attempted in two ways, preintegration and postintegration, which enforce chemical constraints in an indirect and a direct manner, respectively. In the indirect approach, all input sources are compared and contrasted with the chemical constraints, whereas in the direct approach, the chemical constraints are applied after an initial network is obtained. This latter approach ensures that chemical compatibility is maintained. As a result, several enzymes were identified to fill in the missing nodes of the metabolic enzyme network for yeast (26).
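The final thresholding step can be sketched as follows. The example assumes that each data source has already been turned into a table of pairwise gene scores, and it combines the sources by simple summation. The summation is an illustrative stand-in, not the supervised kernel method of (26), and the gene names, scores, and threshold are invented for the example.

```python
def combine_scores(score_tables):
    """Sum the pairwise scores from several data sources into one score per gene pair."""
    combined = {}
    for table in score_tables:
        for pair, score in table.items():
            key = frozenset(pair)  # gene pairs are unordered
            combined[key] = combined.get(key, 0.0) + score
    return combined


def infer_edges(combined, threshold):
    """Draw an edge between every pair of genes whose combined score exceeds the threshold."""
    return {pair for pair, score in combined.items() if score > threshold}


# Hypothetical pairwise similarity scores from two data sources.
expression_scores = {("g1", "g2"): 0.9, ("g1", "g3"): 0.1, ("g2", "g3"): 0.4}
phylogenetic_scores = {("g1", "g2"): 0.8, ("g1", "g3"): 0.2, ("g2", "g3"): 0.7}

combined = combine_scores([expression_scores, phylogenetic_scores])
inferred_network = infer_edges(combined, threshold=1.0)
```

A postintegration step in the sense described above would then remove any inferred edge whose two enzymes are chemically incompatible.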

Compound scope

Metabolic pathways may also be analyzed from a chemical standpoint, and it has been surmised that the array of concentrations of relatively simple chemicals in pathways may provide and transfer information for biologic processing (27, 28). Thus, the study of the metabolic reactions that take place in the metabolic compound network comes naturally.

The idea of the “scope” of a chemical compound was defined recently to characterize metabolic compound networks systematically. This idea developed from the fact that the occurrence of a metabolic reaction generally requires the existence of other reactions that provide its substrates, which gives rise to a series of metabolic reactions. In each step of the corresponding expansion process, those reactions whose substrates are made available by previous generations are incorporated (29). Thus, starting with one or more seed compounds, an expansion results in a final network whose compounds define the scope of the seed. Using all the metabolic reactions in the reference pathways of the KEGG PATHWAY database, the scopes of all metabolic compounds were calculated, and it was found that large parts of cellular metabolism could be considered as the combined scope of simple building blocks. Analyses of various expansion processes revealed that the incorporation of key metabolites such as adenosine triphosphate and coenzyme A would increase the network complexity. It was also shown that the outcome of network expansion is in general very robust against the elimination of a single reaction or a few reactions, although the elimination of a key reaction would result in a dramatic reduction of scope sizes. As a result, it was hypothesized that the expansion process displays characteristics of the evolution of metabolism, in that the emergence of metabolic pathways over time could be estimated from the systematic analysis of metabolic compound networks (30).
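The expansion process can be sketched as a fixed-point iteration, assuming that each reaction is given as a pair (set of substrates, set of products) and is treated as irreversible. The toy reactions and compound labels are hypothetical. The loop below does not track individual generations, but it reaches the same final set of compounds as a generation-by-generation expansion.

```python
def scope(seeds, reactions):
    """Return all compounds reachable from the seed compounds by network expansion."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction can occur once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


# Toy reactions: A -> B,  B + X -> C,  C -> D
reactions = [
    ({"A"}, {"B"}),
    ({"B", "X"}, {"C"}),
    ({"C"}, {"D"}),
]

scope_of_a = scope({"A"}, reactions)
scope_of_a_and_x = scope({"A", "X"}, reactions)
# Eliminating the key reaction B + X -> C sharply reduces the scope.
scope_without_key_reaction = scope({"A", "X"}, [reactions[0], reactions[2]])
```

Adding the cofactor-like compound X to the seed enlarges the scope from two to five compounds, which mirrors the observation that key metabolites increase network complexity.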

From this work on compound scope, an interesting analysis of the effect of oxygen on metabolic networks and the evolution of life was made possible (31). Recent evidence suggested that, as molecular oxygen became increasingly important to metabolic pathways, oxygen-dependent reactions eventually replaced the enzymatic reactions central to anoxic metabolism in aerobic organisms (32). Thus, by comparing metabolic compound networks under oxic and anoxic conditions, the effect of the presence or the absence of oxygen on the complexity of the networks generated from specific seed compounds could be determined. Based on the reference pathways for metabolism in KEGG, O2 was found to be among the most used compounds, surpassing even adenosine triphosphate. The analyses revealed four subnetworks of increasing complexity, which form a hierarchy such that certain reactions allow transitions between the subnetworks at different levels. Among these four subnetworks, molecular oxygen was required for the transition into the largest network. Furthermore, in another analysis of the enzyme distribution across different organisms, it was found that the distributions of enzymes that catalyze reactions in oxic networks were not necessarily consistent with the tree of life, which indicates that the adaptation to O2 occurred throughout the tree of life. These results were supported by data from geologic and molecular evolutionary analyses indicating that all three domains of life had appeared by the time oxygen became widely available (33).

Pathway prediction using RDM patterns

The study of the chemical reactions involved in metabolic compound networks and their scopes can help to predict new pathways. In this case, the RDM patterns defined in KEGG can be used. In fact, an analysis of the RDM patterns in KEGG in the context of their frequency of appearance in the KEGG PATHWAY categories was performed. In particular, the more than 2000 RDM patterns that appear in the metabolic pathways of KEGG were analyzed. Because RDM patterns themselves do not indicate the direction of the reaction, when a reaction in the pathway was defined as reversible, two reactions were generated for the corresponding RDM pattern. The number of unique patterns was counted for each pathway category, and it was found that the reactions in the xenobiotics biodegradation pathways in particular were most distinct compared with the other categories of pathways. In fact, roughly 80% of the RDM patterns were unique to this category. Thus, an attempt was made to use RDM patterns to predict a biodegradation pathway of a new xenobiotic compound.

This task is done by first comparing the new compound against the KEGG COMPOUND database to retrieve a list of candidate compounds that are most similar to the query. The matched compounds are then queried against the RDM pattern library to retrieve a list of putative RDM patterns. In the third step, the query compound is transformed into new possible compounds based on the retrieved transformation patterns. These newly generated compounds are then used iteratively as new queries to repeat the prediction cycle until no new transformations can be found. This approach successfully retrieved the degradation pathway for 1,2,3,4-tetrachlorobenzene (34).
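The control flow of this prediction cycle can be sketched with a deliberately simplified toy model. It assumes that a compound is represented as a set of abstract group labels, that similarity is the Jaccard coefficient over those labels, and that a transformation pattern is a pair (group removed, group added). These representations are hypothetical stand-ins for real chemical structures and RDM patterns and carry no chemical meaning; only the iterative loop corresponds to the procedure described above.

```python
def similarity(compound_a, compound_b):
    """Jaccard coefficient over group labels (toy stand-in for structure comparison)."""
    union = compound_a | compound_b
    return len(compound_a & compound_b) / len(union) if union else 0.0


def predict_pathway(query, library, min_similarity=0.5):
    """Iteratively transform the query until no new compounds can be generated."""
    seen = {query}
    frontier = [query]
    steps = []
    while frontier:
        compound = frontier.pop(0)
        # Steps 1 and 2: collect the patterns of library compounds similar to the query.
        patterns = set()
        for library_compound, known_patterns in library:
            if similarity(compound, library_compound) >= min_similarity:
                patterns.update(known_patterns)
        # Step 3: apply each applicable pattern to generate new compounds.
        for removed, added in sorted(patterns):
            if removed in compound:
                product = (compound - {removed}) | {added}
                if product not in seen:
                    seen.add(product)
                    frontier.append(product)  # Step 4: reuse the product as a new query
                    steps.append((compound, product))
    return steps


# Hypothetical library: (compound, transformation patterns known for it).
library = [
    (frozenset({"ring", "Cl", "CH3"}), [("Cl", "OH")]),
    (frozenset({"ring", "OH"}), [("ring", "chain")]),
]
predicted_steps = predict_pathway(frozenset({"ring", "Cl"}), library)
```

Keeping a set of already generated compounds guarantees that the loop terminates, because each compound is queried at most once.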

Similar research has attempted to gain insight into protein function prediction based on information hidden in the molecular structure of metabolites (35). Such work may eventually identify the relationship between metabolite structure and protein function, thus possibly improving techniques in the prediction of enzyme function and novel metabolic pathways (36).

Discussion

It may now be generally accepted that both types of metabolic network can be characterized as hierarchically organized networks of modules that have scale-free properties. Several methods for the analysis of metabolic pathways are actively being developed to understand the functional modules found within them. The results, of course, depend greatly on the data used to find them. It has been shown that the incorporation of genomic and phylogenetic information aids the identification of functionally important modules. In turn, these data can aid phylogenetic analysis and functional annotation of the biologic entities involved.

Work in chemical reaction characterization and analysis enables the prediction of missing enzymes and pathways. The concept of modules of compounds, or compound scopes, defines the extent to which a particular chemical compound plays a role in the metabolic compound network. This research aids the evolutionary analysis of modules because the importance of a specific compound can be analyzed directly based on its scope and the effect it has on the scope sizes of other compounds in the network.

We have illustrated that metabolic enzyme networks and metabolic compound networks are in fact two sides of the same coin. The global analysis of metabolic networks using the line graph transformation illustrated this point nicely. Thus, it is natural to pursue the relationships between those modules found in the original metabolic enzyme network and in the line-graph-transformed metabolic compound network to ascertain their functions. Such integration of knowledge from various aspects is crucial to gain a true understanding of the biologic processes of life.

We note that in systems biology, metabolic systems are often studied using dynamic analyses such as flux balance analysis and differential equations. However, a discussion of systems dynamics is beyond the scope of this article, and we refer the interested reader to the relevant literature (37, 38). This omission, however, does not exclude such analyses from integrated systems analysis aimed at understanding metabolic pathways.

Systems analysis approaches such as those presented in this article illustrate the importance of systematic and integrated methods for analyzing metabolism. Multidimensional data can be expected to continue to play an important role in such approaches. Because the consistency of such data will determine the accuracy of the predictions, a balance must be maintained between the speed of computational techniques and the accuracy of manual curation. As long as neither approach is relied on excessively, systems approaches to the study of metabolism should prove fruitful.

References

1. Hatzimanikatis V, Li C, Ionita JA, Henry CS, Jankowski MD, Broadbelt LJ. Exploring the diversity of complex metabolic networks. Bioinformatics 2005; 21:1603-1609.

2. Christensen C, Gupta A, Maranas CD, Albert R. Large-scale inference and graph-theoretical analysis of gene-regulatory networks in B. subtilis. Physica A. 2007; 373:796-810.

3. Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, Srivas R, Palsson BO. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc. Natl. Acad. Sci. U.S.A. 2007; 104:1777-1782.

4. Reed JL, Famili I, Thiele I, Palsson BO. Towards multidimensional genome annotation. Nature Rev. Genet. 2006; 7:130-141.

5. Reed JL, Patel TR, Chen KH, Joyce AR, Applebee MK, Herring CD, Bui OT, Knight EM, Fong SS, Palsson BO. Systems approach to refining genome annotation. Proc. Natl. Acad. Sci. U.S.A. 2006; 103:17480-17484.

6. Nikolaev EV, Burgard AP, Maranas CD. Elucidation and structural analysis of conserved pools for genome-scale metabolic reconstructions. Biophys. J. 2005; 88:37-49.

7. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006; 34:D354-357.

8. Ullmann JR. An algorithm for subgraph isomorphism. J. ACM 1976; 23:31-42.

9. Cortadella J, Valiente G. A relational view of subgraph isomorphism. In Proc. Fifth Int. Seminar on Relational Methods in Computer Science, Quebec, Canada, 2000. pp. 45-54.

10. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Network motifs: simple building blocks of complex networks. Science 2002; 298:824-827.

11. Hemminger RL, Beineke LW. Selected Topics in Graph Theory, Volume I. 1978. Academic Press, London.

12. Barabasi A-L, Albert R. Emergence of scaling in random networks. Science 1999; 286:509-512.

13. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi A-L. The large-scale organization of metabolic networks. Nature 2000; 407:651-654.

14. Wagner A, Fell D. The small world inside large metabolic networks. Proc. Biol. Sci. 2001; 268:1803-1810.

15. Nacher JC, Ueda N, Kanehisa M, Akutsu T. Flexible construction of hierarchical scale-free networks with general exponent. Phys. Rev. E 2005; 71:036132.

16. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi A-L. Hierarchical organization of modularity in metabolic networks. Science 2002; 297:1551-1555.

17. Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U. Superfamilies of evolved and designed networks. Science 2004; 303:1538-1542.

18. Snel B, Huynen MA. Quantifying modularity in the evolution of biomolecular systems. Genome Res. 2004; 14:391-397.

19. Yamada T, Kanehisa M, Goto S. Extraction of phylogenetic network modules from the metabolic network. BMC Bioinformat. 2006; 7:130.

20. Segre D. The regulatory software of cellular metabolism. TRENDS Biotechnol. 2004; 22:261-265.

21. Ihmels J, Levy R, Barkai N. Principles of transcriptional control in the metabolic network of Saccharomyces cerevisiae. Nature Biotechnol. 2003; 22:86-92.

22. Segal E, Shapira M, Regev A, Pe’er D, Botstein D, Koller D, Friedman N. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 2003; 34:166-176.

23. Bilu Y, Shlomi T, Barkai N, Ruppin E. Conservation of expression and sequence of metabolic genes is reflected by activity across metabolic states. PLoS Computat. Biol. 2006; 2:e106.

24. Segre D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl. Acad. Sci. U.S.A. 2002; 99:15112-15117.

25. Kanehisa M. A database for post-genome analysis. Trends Genet. 1997; 13:375-376.

26. Yamanishi Y, Vert JP, Kanehisa M. Supervised enzyme network inference from the integration of genomic data and chemical information. Bioinformatics 2005; 21:i468-i477.

27. Dyson F. Origins of Life. 1985. Cambridge University Press, Cambridge, U.K.

28. Morowitz HJ. A theory of biochemical organization, metabolic pathways and evolution. 1996. Santa Fe Institute, 96-04-014.

29. Goto S, Bono H, Ogata H, Fujibuchi W, Nishioka T, Sato K, Kanehisa M. Organizing and computing metabolic pathway data in terms of binary relations. Pac. Symp. Biocomput. 1997; 175-186.

30. Handorf T, Ebenhoh O, Heinrich R. Expanding metabolic networks: scopes of compounds, robustness, and evolution. J. Mol. Evol. 2005; 61:498-512.

31. Raymond J, Segre D. The effect of oxygen on biochemical networks and the evolution of complex life. Science 2006; 311:1764-1767.

32. Raymond J, Blankenship RE. Biosynthetic pathways, gene replacement and the antiquity of life. Geobiology 2004; 2:199-203.

33. Xiong J, Fischer WM, Inoue K, Nakahara M, Bauer CE. Molecular evidence for the early evolution of photosynthesis. Science 2000; 289:1724-1730.

34. Oh M, Yamada T, Hattori M, Goto S, Kanehisa M. Systematic analysis of enzyme-catalyzed reaction patterns and prediction of microbial biodegradation pathways. J. Chem. Inf. Model 2007; 47:1702-1712.

35. Nobeli I, Ponstingl H, Krissinel EB, Thornton JM. A structure-based anatomy of the E. coli metabolome. J. Mol. Biol. 2003; 334:697-719.

36. Hatzimanikatis V, Li C, Ionita JA, Broadbelt LJ. Metabolic networks: enzyme function and metabolite structure. Curr. Opin. Struct. Biol. 2004; 14:300-306.

37. Segre D, DeLuna A, Church GM, Kishony R. Modular epistasis in yeast metabolism. Nature Genet. 2005; 37:77-83.

38. Schwartz J-M, Gaugain C, Nacher JC, de Daruvar A, Kanehisa M. Observing metabolic functions at the genome scale. Genome Biol. 2007; 8:R123.

Further Reading

Caspi R, Foerster H, Fulcher CA, Hopkinson R, Ingraham J, Kaipa P, Krummenacker M, Paley S, Pick J, Rhee SY, Tissier C, Zhang P, Karp PD. MetaCyc: a multiorganism database of metabolic pathways and enzymes. Nucleic Acids Res. 2006; 34:D511-D516.

Kotera M, Okuno Y, Hattori M, Goto S, Kanehisa M. Computational Assignment of the EC numbers for genomic-scale analysis of enzymatic reactions. J. Am. Chem. Soc. 2004; 126:16487-16498.