"Classification of Biological Networks Via Walks and Words"
Over the last half-decade, researchers in the network community have proposed a diversity of models aiming to describe naturally occurring networks based on fundamentally different mechanisms (e.g., preferential attachment, fitness-selection strategies, duplication with mutation). Generically, these models are validated by measuring a few basic graph properties (e.g., clustering coefficient, mean geodesic length or degree distribution). Since different models can often be tuned to yield exactly the same values for such properties, these features are unfortunately non-discriminative. To obviate this ambiguity, we develop a high-dimensional feature space for graphs, and exploit classification techniques standard in the machine learning community. We then classify biological data sets such as protein-protein interaction networks, neuronal networks, and genetic regulatory networks. The resulting classifications deepen our understanding of the design and evolution of the networks studied. Our conclusions have also guided the development of new network models. We are thus able for the first time to quantify objectively the success of a model in describing a given network without prior choice of graph properties.