"ANALYSIS OF COMPLEX NETWORKS USING LIMITED INFORMATION"
Complex networks have attracted considerable attention in the scientific community, and in popular culture, in recent years. This interest is motivated in part because complex networks provide a natural framework for studying a wide range of systems in nature and society, and in part because the behavior of these networks is often quite surprising. For instance, it is by now well-known that many complex networks exhibit a “robust, yet fragile” character, so that small perturbations to otherwise reliable systems can lead to catastrophic failure; this phenomenon manifests itself as diversely as cascading failures in electric power grids and other infrastructure networks, crashes in financial markets, mass extinctions, and fads and social movements. Understanding this property of complex networks is clearly of great scientific and practical interest, and various analysis frameworks have been proposed to explain it. Complex networks have an additional property, much less studied but extremely important: despite their complexity, it is often possible to extract deep, quantitative information about them using only limited observations of their behavior. As an example of the basic idea, consider the problem of obtaining an understanding of the behavior of a large organization through the study of communication patterns of its members. Remarkably, studying only (the time series of) who contacts whom, without regard to the content of the communications, can provide deep insight into the objectives, strategies, activities, and efficacy of the organization, its subgroups, and even individual agents. While this example is only a sample of the broad class of analysis problems of interest to us, it possesses the main characteristics of these problems: the network is large, dynamic, and complex, the information sought is semantically rich and quantitative, it is desired to extract this information efficiently and accurately, and the available data provide only a limited view of the system. It is, of course, fair to ask whether it is really possible to learn much about the behavior of complex networks from only limited observations, i.e., whether the claimed “deep information from limited data” property is indeed possessed by a broad and important class of (complex) networks. The main contributions of this paper are to rigorously establish the existence of this property, to show that it is, in fact, the dual of the robust, yet fragile property, to develop a systematic methodology for exploiting it, and to demonstrate the power of the proposed information extraction process through implementation with “real world” problems. A key step in the development is the introduction of a new class of systems, termed complex additive systems (CAS), which result from incremental evolution of system configuration driven by (myopic) response to failures and adoption of innovations. It is shown that the CAS framework provides a natural environment within which to model a wide range of networks in nature and society. Moreover, we establish that CAS evolution leads to networks with considerable structure and important properties, including the robust, yet fragile and deep information from limited data properties, and propose a systematic, efficient process for exploiting the properties to extract information from CAS. An interesting aspect of the development is the demonstration that broad classes of CAS admit simple vertex dynamics models, which enable rigorous, yet tractable network analysis. We illustrate the power of this approach by applying it to several “real world” networks in a series of case studies. We begin by showing that it is possible to accurately and robustly identify important agents and collaborating agents in social networks by studying only communication “metadata” (e.g., who sends whom email); the particular application of these results to terrorist and criminal networks is also briefly explored. Next we analyze biological networks, and present algorithms which use only network topology data and basic evolutionary principles to accurately identify “lethal” genes in the gene regulatory network of the yeast S. cerevisiae and functional modules in the metabolic network of the bacterium E. coli. Finally, we consider the problem of understanding the susceptibility of complex networks to cascading failures and predicting their occurrence, and develop a systematic approach to conducting such analysis; the feasibility of the proposed methodology is indicated through a study of electric power grid failures and financial market crashes.