One of the hardest things to explain is why complex systems are actually different from simple systems. The problem is rooted in a set of ideas that work together and reinforce each other so that they appear seamless: Given a set of properties that a system has, we can study those properties with experiments and model what those properties do over time. Everything that is needed should be found in the data and the model we write down. The flaw in this seemingly obvious statement is that one may be starting from the wrong properties. One might have missed a key property that needs to be included, or the set of properties needed to describe the system might change over time. Then why don't we add more properties until we include enough? The problem is that we will be overwhelmed by too many of them; the process never ends. The key, it turns out, is figuring out how to identify which properties are important, which itself is a dynamic property of the system.

To explain this idea we can start from a review of the way this problem came up in physics and how it was solved for that case. The ideas are rooted in an approximation called "separation of scales."

Consider a block sliding down an inclined plane. In a traditional approach, micro and macro scales are treated separately. To address dynamics at the micro scale (the molecules), we average over them and, using thermodynamics, describe their temperature and pressure. To address dynamics at the macro scale (the motion of the block on the inclined plane), we use Newtonian physics to describe its large scale motion (see Fig. 1). In this case, the pieces can be considered to be acting either independently, like the random relative motion on the micro scale, or coherently, like the average motion on the macro scale. Since the scales are sufficiently distinct, separated by orders of magnitude, we do not encounter a problem in describing them separately. Finally, often unstated, the structures of the block and the plane are considered fixed.

Thus, traditionally, there were three aspects of a system: fine scale, dynamic, and fixed. A glass of water on a table with an ice cube in it might be treated by considering the movement and melting of the ice cube, the average over molecular vibrations, and the fixed structure of the glass. At longer time scales, the water will evaporate, the glass will flow, the table may rot, but this is not important at a particular scale (or a range of scales) of observation.

Consider the earth viewed from space. The earth is highly complex. Still, we can describe it as a planet orbiting the sun in a predictable fashion. Most of the details of what happens on Earth play no role at the scale of its orbit. For the earth, at the orbital scale, all the internal structure can be averaged to a point. The bodies of the solar system are assumed unchanging and the material of each of them is separated from other solar objects. The dynamic behavior can then be modeled and predicted.

When separation of scales works, we can describe not only the system as it exists in isolation, but also how it responds to external forces. Forces that act on the earth at the scale of orbital motion couple to the dynamic behavior that occurs at that scale. Thus if we were to consider a new celestial body entering the solar system, unless it disrupted the structure of the system (e.g. by shattering a planet), and as long as we continue to be interested in the scale of orbital motion, we can describe the behavior of the system using these same degrees of freedom.

For complex systems, it is still true that the questions we most want to answer have to do with the larger scale information. Significantly, the scale of description and the scale of interactions are similar. When we have a description of the larger scale behavior, we are also considering the larger scale impacts of the environment on the system and, reciprocally, of the system on the environment.

But many systems, especially those we are interested in understanding and influencing, are not well described by separate micro and macro scales. Consider a flock of birds. If all of the birds flew independently in different directions, we would need to describe each one separately. If they instead all went in the same direction, we could simply describe their average motion. However, if we are interested in their movement as a flock, describing each bird's motion would be too much information and describing the average would be too little information. Similarly, for traffic jams, market behavior and weather, the average behavior is not enough and all the details are too much to be useful. Complex behavior that is neither independent nor coherent is best described across scales. This requires knowing which information can be observed at a scale of interest.

The importance of multiscale ideas is apparent in an approach developed in statistical physics beginning in the 1970s anchored in the method of renormalization group. Modeling in this framework allows distinguishing what can be observed at the largest scale. To explain the concepts of this formalism, we describe its development in the study of materials.

Central to the study of matter is that movements of individual atoms are not visible to us. Instead, we use pressure, temperature and volume to describe both what we see and how we can manipulate matter using forces. For example, a piston compressing a gas reduces the volume and increases the pressure, and heat transfer to a material causes its temperature to rise. The key concept underlying our ability to make such descriptions is scale: The fine scale (microscopic) behaviors of atoms are not important to an observer or to their manipulation of a system; and the large scale (macroscopic) properties we observe and manipulate reflect average or aggregate properties of atomic motion.

This approach was formalized in the 1800s through statistical physics. It appeared to solve the problem of determining properties of a material in equilibrium by minimizing the free energy relative to the macroscopic variables. This almost always works. However, in the study of phase transitions, e.g. between water and steam or between ferromagnet and paramagnet, properties were found not to be correctly given by this method for special conditions called second order phase transition points. This phenomenon proves to be a relatively simple illustration of a complex system, where the elements act neither fully independently nor fully coherently, and the separation of scales breaks down.

Consider the transition between water and steam. At a particular pressure we can cause a transition between water and steam by raising the temperature. At the transition temperature the density changes abruptly, that is, discontinuously. As we raise the pressure, we compress the steam and the change in density at the transition temperature decreases (see Fig. 2). There is a point where the transition stops, and there is no longer a distinction between water and vapor. This end point is called a second order transition point, at the end of the first order transition line. Near this point, the discontinuity of the density between liquid and gas phases becomes zero (hence the term second order transition). The way it does so has the form of a power law $\rho \propto x^{\beta}$, where

This surprising discrepancy between observations and theory compelled a dramatic change in our understanding. Our usual methods of calculus and statistics fail at this point because their assumptions no longer hold true. Calculus assumes that matter is smooth and statistics assumes that averages over large numbers of objects are well defined. Away from the critical point these assumptions are justified, since the microscopic behavior of atoms is well separated from the macroscopic behavior of the material as a whole. Different parts of the material appear essentially the same, making it smooth, and any (local) average over atomic properties has a single well defined number. However, at the critical point, the density fluctuates—between water-like and vapor-like conditions—so that the material is not smooth and the average taken of the material as a whole is not representative of the density at any particular location or time. Near the critical point, the matter is composed of patches of lower and higher density, and this patchiness occurs on all scales, even at the macroscopic scale.

In order to mathematically solve this problem, the renormalization group was developed. In the renormalization group method, we consider the system at multiple scales (levels of resolution). The spatially varying macroscopic density or magnetization at one level of resolution is related to that at a larger scale by performing local averages rather than a global average. This averaging relates the free energy at one scale of observation to the free energy at a larger scale. The properties of the system can be found from how the behavior varies with scale. The mathematics is not easy, but it yields exponents that agree with the phenomenology. Since its development, renormalization methods have enabled many advances in addressing questions about the structure and dynamics of materials.
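The local averaging step described above can be illustrated with a minimal sketch. It is not a renormalization group calculation (no free energy appears), only the block-averaging operation that relates one level of resolution to the next. The function name `coarse_grain` and the random test field are assumptions made for illustration.

```python
import random
import statistics

def coarse_grain(field, block=2):
    """Replace each block x block patch of a square 2D field by its local average."""
    n = len(field)
    m = n // block  # side length after coarse-graining
    return [
        [
            sum(field[block * i + di][block * j + dj]
                for di in range(block) for dj in range(block)) / block ** 2
            for j in range(m)
        ]
        for i in range(m)
    ]

random.seed(0)
# Hypothetical "density" field: independent +1/-1 values on a 16 x 16 grid.
field = [[random.choice([-1.0, 1.0]) for _ in range(16)] for _ in range(16)]

level = field
while len(level) > 1:
    flat = [v for row in level for v in row]
    # Print the grid size and the spread of values at this level of resolution.
    print(len(level), round(statistics.pstdev(flat), 3))
    level = coarse_grain(level)
```

For independent values like these, the spread of the local averages tends to shrink at each step, which is the situation in which a single global average suffices. Near a critical point one would instead expect fluctuations to persist from level to level.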

The reason that different results were obtained is that the free energy in this case is not just a function of the average density. Still, it is not necessary to consider interactions among individual atoms. For a liquid undergoing transition to a vapor, the free energy depends on the spatial variation of the density, i.e. how the local densities at different locations interact with each other. There are many possible interactions between local densities that could contribute to the free energy. However, only some of them are important. The renormalization group is a method for determining which parameters describing the interaction are important and which are not. "Relevant" parameters are those parameters of the free energy that increase with scale; "irrelevant" parameters are those that decrease with scale. Because there are so many atoms in matter, the irrelevant parameters cannot affect our observation. We can consider only the relevant parameters. We might measure irrelevant parameters microscopically, but they won't affect macroscopic changes in the material or our interactions with it near the critical point.

A representation is a map of a system onto mathematical variables. More correctly, a representation should be understood as a map of the set of possible states of a system onto the possible states of mathematical variables. A faithful representation must have the same number of states as the system it is representing. This enables the states of the representation to be mapped one to one to the states of the system. If a model has fewer states than the system, then it can't represent everything that is happening in the system. If a model has more states, then it is representing things that can't happen in the system. Conventional models often do not take this into account and this results in a mismatch of the system and the model; they are unfaithful representations and do not properly identify the behavior of the system, and thus ultimately its response to environmental forces or interventions we might consider. Because we are interested in influencing the system, we only want to know the distinctions that matter. We have to focus attention on those states that are *distinguishable* at a particular scale of observation.

To formalize these ideas for complex systems, it is useful to understand information as related to scale. We define the complexity profile as the amount of information necessary to represent a system as a function of scale. Information theory defines the amount of information in a message as the logarithm (base 2) of the number of possibilities of the message, that is, the number of bits needed to represent the set of possible messages. Thus, the complexity profile is given by the logarithm of the number of possible states of the system at a particular scale. Typically, the finer the scale of inquiry about a system, the more information is needed to describe it (Fig. 3). The complexity at the finest scales is finite because of quantum uncertainty and is equal to a universal constant, $1/(k_B \ln 2)$, times the entropy for a system in equilibrium, where $k_B$ is Boltzmann's constant.

A single real number has infinite possibilities in all of the infinite digits of its representation. Therefore it has the ability to represent an infinite amount of information. This would seem to indicate that we could use a single real number to represent a system. For example, a number that represents the density of a liquid has infinite information, but we know from studies of phase transitions that this single number isn't enough. Why doesn't this work? The problem is that the way the information is organized in scale in the real number does not correspond to the way it does in the system. A real number can represent the position of a point along one dimension. Let's say we start by knowing where the object is to a resolution of 1 unit of length. Increasing the resolution by a factor of two means we can distinguish which of the two possible segments that are 1/2 units of length it is in. Communicating this information requires a binary variable. For each 2 fold increase in resolution we have 2 additional possibilities to specify. The number of bits is the logarithm (base 2) of the scale of resolution. However, for a liquid at its critical point the number of bits increases differently with increasing resolution. As resolution increases we have to describe the fluctuations of density. The growth in the number of bits is more rapid than one bit per factor of two in resolution (see Fig. 4).
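The counting argument for a single real number can be made concrete with a short sketch. It assumes a point in the interval [0, 1), and the helper names `bits_to_locate` and `binary_digits` are hypothetical. Each binary digit says which half of the current segment holds the point, so the number of bits is the base 2 logarithm of the number of segments.

```python
import math

def bits_to_locate(n_segments):
    """Bits needed to say which of n_segments equal segments holds the point."""
    return math.log2(n_segments)

def binary_digits(x, n_bits):
    """First n_bits binary digits of x in [0, 1); each digit halves the segment."""
    digits = []
    for _ in range(n_bits):
        x *= 2
        bit = int(x)   # 0: lower half, 1: upper half of the current segment
        digits.append(bit)
        x -= bit
    return digits

position = 0.6875  # hypothetical position, exactly 0.1011 in binary
for n in range(1, 5):
    # Number of segments, bits required, and the digits that select the segment.
    print(2 ** n, bits_to_locate(2 ** n), binary_digits(position, n))
```

The sketch covers only the real-number case, with one bit per factor of two in resolution. It does not model a liquid at its critical point, where the number of bits grows faster.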

A sufficient representation, therefore, is one that has a set of possible states corresponding to the set of distinguishable states of the system at each level of resolution, down to the level we need to describe the properties we care about—the relevant parameters—and no further. When considering interventions that affect the large scale properties of the system, rather than accumulating details about the system, we should start with the largest scale pattern of behavior and add additional information only as needed. According to the complexity profile, each piece of information about a system has a size—the largest scale at which we can begin to detect that piece of information.

When we observe the largest scale behaviors of a system, we simplify the mathematical description of the system because there are fewer distinguishable states, and only a limited set of possible behaviors. This also means that systems that look different on a microscopic scale may not look different at the macroscopic scale, and their mathematical descriptions become the same.

An important example of this arose in the study of phase transitions using the new mathematics of renormalization group. The transition when boiling a liquid to a gas has the same properties as the one that occurs when heating a magnet up to the point where it becomes non-magnetic (ferromagnetic to paramagnetic transition). Magnets have local magnetizations that fluctuate and interact at a critical point just like local changes of density at the water to vapor critical point. The result is that these two seemingly different types of systems map mathematically onto each other.

As renormalization group was more widely applied, other instances were found of systems that have the same behavior even though they differ in detail, a concept that came to be called *universality*. Still, while many systems have the same behavior, there are multiple distinct behaviors. Together this means that systems fall into classes of behaviors, leading to the term '*universality class*.' Since renormalization group focuses on how behaviors transform across scales leading to power laws, the value of the power law exponent came to be used as a signature of the universality class.
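As a rough illustration of how a power law exponent can be read off from data, the sketch below fits a straight line to log(y) against log(x). The data are synthetic, the exponent 0.5 is an arbitrary illustrative value rather than a measured one, and the function name `fit_exponent` is hypothetical.

```python
import math

def fit_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x): beta in y proportional to x**beta."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Synthetic data with an arbitrary illustrative exponent, not a measured value.
beta_true = 0.5
xs = [10 ** (-k) for k in range(1, 6)]
ys = [3.0 * x ** beta_true for x in xs]
print(round(fit_exponent(xs, ys), 6))
```

The prefactor (3.0 here) shifts the line on a log-log plot without changing its slope, which is one way to see why the exponent, rather than the prefactor, serves as the signature.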

In a sense, the idea that many systems can be described by the same large scale behavior is used in traditional theory. Scientists use the normal distribution for many different biological and social systems. Any system having sufficiently independent components satisfies the conditions of the central limit theorem, and therefore can be described by the normal distribution. When there are dependencies, the normal distribution no longer applies, but there are other behaviors that are characteristic of other kinds of dependencies. To study those behaviors, we have to determine the way different kinds of dependencies give rise to kinds of large scale behavior.
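The contrast between independent and dependent components can be sketched numerically. The example below compares two extreme cases chosen for illustration: components that are fully independent, and components that are fully coherent (all taking the same value). The function name `sample_means` and the choice of +1/-1 components are assumptions.

```python
import random
import statistics

def sample_means(n_components, n_samples, coherent, rng):
    """Means of n_components +/-1 variables, either independent or fully coherent."""
    means = []
    for _ in range(n_samples):
        if coherent:
            shared = rng.choice([-1, 1])   # every component takes the same value
            values = [shared] * n_components
        else:
            values = [rng.choice([-1, 1]) for _ in range(n_components)]
        means.append(sum(values) / n_components)
    return means

rng = random.Random(1)
for n in (10, 100, 1000):
    ind = statistics.pstdev(sample_means(n, 500, False, rng))
    coh = statistics.pstdev(sample_means(n, 500, True, rng))
    # Spread of the mean for independent versus coherent components.
    print(n, round(ind, 3), round(coh, 3))
```

In the independent case the spread of the mean narrows as the number of components grows, as the central limit theorem leads one to expect. In the coherent case it does not narrow at all. Intermediate dependencies, which are the ones of interest for complex systems, are not modeled here.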

There are even more basic ways a common mathematical description of systems is used, e.g., point particle motion describes the motion of many distinct objects, and wave equations describe everything from music strings to water waves to light. Even though the specific systems are very different, the dependencies that give rise to their behaviors, and the behaviors themselves, are related mathematically.

How does universality work for complex systems? Unlike traditional renormalization group, we do not consider the limit of infinite size and power law exponents. Instead, the states of our representation must correspond to the states of the system at the scale of observation. Moreover, instead of describing the equilibrium energy, we describe dynamics and system response. The mathematical representation of one system at a particular scale may correspond to the behavior of other systems despite different underlying components. This is a general concept of universality (Fig. 5).

What are the cases where the thermodynamic limit does not serve to expose universality? An important example is pattern formation that results in spots and stripes, like those on predator and prey animals. This type of pattern formation was described by Alan Turing, and the resulting patterns are called Turing patterns (Fig. 5). They arise in many ways, for example from the reaction of diffusing chemical species. If we consider a very large pattern, we see that at large enough scales it looks uniformly gray. Still, we can map these descriptions from system to system. The patterns represent universal classes of behavior. Microscopic changes only change the pattern to the extent that they change the relevant parameters of those patterns.
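The remark that a pattern looks gray at large enough scales can be illustrated with a toy example. This is not a reaction-diffusion simulation; it simply averages a hand-made one-dimensional stripe pattern over blocks of different sizes. The helper names `stripes` and `block_average` and the period of 8 are assumptions.

```python
def stripes(length, period):
    """1D black/white stripe pattern: 1 for the first half of each period, else 0."""
    return [1 if (i % period) < period // 2 else 0 for i in range(length)]

def block_average(values, block):
    """Average over consecutive non-overlapping blocks of the given size."""
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values) - block + 1, block)]

pattern = stripes(64, 8)            # hypothetical pattern with period 8
print(block_average(pattern, 2))    # below the stripe width: still black and white
print(block_average(pattern, 8))    # at the period: every block averages to 0.5
```

The pattern has a characteristic scale (its period), so the information that distinguishes it is visible only at or below that scale. This is why taking the limit of infinite size does not expose it.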

The adoption of Turing's ideas in biology for patterns on animal skins has been controversial precisely because the pattern dynamics does not capture microscopic mechanisms. This controversy misses the key point about universality. Universality should be intuitive: we don't need to describe the molecular processes to characterize the variation between patterns on species, or individual members of a species, or the dynamics of a pattern as it forms, and those processes do not affect the roles of these patterns in social and ecological interactions. This is similar to the ability to describe planetary motion without describing details of individual planet structure.

The study of universality enables us to identify classes of systems whose behaviors can be described the same way by a common mathematical model. This is the principle of universality that is formalized by renormalization group and generalized by multiscale information theory to the scientific study of complex systems.

**Figure 1:** Schematic diagram of a block (with a velocity at a particular moment, *v*) sliding down an inclined plane. The macroscopic motion subject to gravity and friction may be treated using Newton's laws of motion, while the microscopic behavior of the atoms may be treated using thermodynamics by considering the local oscillations of groups of atoms as random and independent (the probability that one group is in a particular state is independent of the state of another group); the statistical treatment of that movement leads to the determination of pressure and temperature of the block and the inclined plane.

**Figure 2:** The phase diagram of water. The line of transitions between liquid water and water vapor stops at the critical point (red dot). At that point the fluctuations between liquid-like and vapor-like densities extend across the system so that the system is not smooth (violating the assumptions of calculus) and averages are not well behaved (violating the assumptions of statistics). A new method that considers behaviors across scales, renormalization group, was developed to address this and similar questions.

**Figure 3:** The complexity profile is the amount of information that is required to describe a system as a function of the scale of description. Typically, larger scales require fewer details and therefore smaller amounts of information. The most important information about a system for informing action on that system is the behavior at the largest scale.

**Figure 4:** A single real number (x, top) has infinitely many digits, which increase the amount of information available at a rate that is two possibilities for every change of scale by a factor of two. Real numbers are not good representations of systems for which the amount of information grows differently with scale (y, bottom). The number of digits as a function of scale is characterized by the complexity profile (Fig. 3).

**Figure 5:** When we focus on the largest scale, system behaviors map onto simplified models, each of which applies to a large set of possible systems with widely different microscopic details. Examples shown in this figure: the Gaussian distribution, wave motion, order to disorder transitions, Turing patterns, fluid flow described by Navier-Stokes equations, attractor dynamics. That only a few models capture the behavior of a wide range of systems underlies the idea of universality—systems are members of universality classes of behavior.