Los Alamos Science

[merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][graphic][merged small][merged small][subsumed][merged small][subsumed][subsumed][subsumed][subsumed][subsumed][subsumed][merged small]

mong a variety of fundamental themes running through Stan Ulam's mathematical research,

one that particularly intrigued him was that of similarity. He was constantly fascinated by the problem of quantifying exactly how alike (or different) two mathematical objects or structures were, and during his career he discovered many ingenious ways of doing so. A good example is the well-known Ulam distance between finite sequences, which has recently been applied so effectively in analysis of DNA sequences and recognition of speech (Sankoff and Kruskal 1983). (Also see "Sequence Analysis: Contributions by Ulam to Molecular Biology" in this issue.)

Here I will describe another measure of similarity suggested by Stan, one applicable to a wide assortment of combina

A Similarity Measure for Graphs

reflections on a theme
of Ulam

by Ronald L. Graham

[merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small]

torial structures. Like many seeds planted by his fertile imagination, this similarity measure has taken root and flowered in the modern mathematical jungle.

The story begins one morning in late July of 1977, during one of my aperiodic visits to Stan and Françoise's marvelous house on the outskirts of Santa Fe. Stan and I had just finished playing tennis, which not only generated a plentiful supply of perspiration (and consequent thirst) but also inevitably led to a lively discussion of the differences in the game at an altitude of over 7000 feet, where the balls are effectively more highly pressurized, the air resistance is diminished, less oxygen is available for demanding lungs, and so on.

Perhaps stimulated by trying to get a better grasp on understanding just how various aspects of the game (such as the

[ocr errors][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][ocr errors][ocr errors][merged small][merged small][merged small][ocr errors][merged small][ocr errors][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small][merged small][merged small][merged small]

serve, the stroke, and the strategy) might change under varying conditions, Stan suddenly suggested, "Why not measure the difference between objects by trying to break them up into as few as possible pairwise equal pieces?" At first I didn't see quite what Stan was driving at (which happened fairly often), but after we talked it over, it became clear that here was an entirely new way of defining a measure of similarity between two (or more) combinatorial structures. In fact, it is very much akin to comparing two complex molecules by breaking them up into a number of pairwise identical fragmentsthe smaller the number of pieces needed,

the more similar are the molecules.

Our first application of the approach was to a class of mathematical objects known as graphs. Simply speaking, a graph G consists of a set V of elements called the vertices of G and a set E of certain pairs of elements of V called the edges of G. Graphs are often pictured by representing the vertices in V as points and the edges as lines between the pairs of points in E (Fig. 1).

Before proceeding to the main topic of the article, we need two more basic definitions-those for isomorphism of graphs and for a partition of the edge set of a graph.

Two graphs G1, and G2 are said to be isomorphic (G1 G2) if, as shown in Fig. 2, a one-to-one transformation of V1 onto V2 effects a one-to-one transformation of E1 onto E2.

By a partition of the edge set E of a graph G is meant a set of pairwise disjoint subsets E of E such that U, E; = E (Fig. 3). (The number of ways to partition an edge set depends, in a complicated way, on the number of edges of the graph, e(G).)

We now come to the key definitions. Let G and G' be two graphs having the same number of edges. An Ulam decomposition of G and G' is a pair of par

[merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][merged small][ocr errors][merged small][merged small][ocr errors][merged small][ocr errors][ocr errors][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small]

titions {E,...,E,} and {E,..., E'} of the respective edge sets of G and G' such that, as graphs, E; E for 1 ≤ i ≤r. Such a decomposition always exists since we can always choose each E; (and E) to be a single edge (and by hypothesis e(G) = e(G')). Further, define U (G, G') to be the least value of r for which an Ulam decomposition of G and G' into r parts exists. (Figure 4 shows such minimum Ulam decompositions for two pairs of graphs.) Thus U (G, G') is a similarity measure for pairs of graphs: the smaller U (G, G') is, the more alike G and G' are. In particular, U (G, G') = 1 exactly when G and G' are isomorphic (that is, when they differ only in the way their vertices are labeled).

We now extend our view from a single pair of graphs with the same number of edges to sets of graphs Пne with e edges and at most n vertices (Fig. 5) and define the function U (n):

So in U(n) we have a measure of the maximum dissimilarity among all pairs of graphs in In,e

A fundamental question about U(n) that occurs at the outset is this: How large can U (n) ever be? To whet your appetite for the answer to this question, let us determine U (G, G') for two examples. (U (n) is of course equal to or greater than U(G, G').) In the first example G is a k-rayed "star" (that is, it consists of one vertex joined to each of k others), and G' consists of k disjoint edges (Fig. 6a). Here e = k and n = 2k. For such a pair of graphs, the only Ulam decomposition comes from taking each E; and E to be a single edge. Therefore

and Yao 1979).

Theorem 1. For a suitable fixed constant c, U(n) n+c for all n.

Our proof of Theorem 1 uses several ideas that are now standard items in the toolbox of every combinatorialist. One is the idea of a greedy algorithm. seems only natural to try to remove the largest subgraph common to each of the two graphs for which one is seeking a minimum Ulam decomposition (although in many situations that myopic approach is far from optimal). Indeed, such a technique is quite effective for the problem at hand. However, it leads to the next question: Just how large can we expect (or guarantee) such a common subgraph to be? Here the second technique we want to mention comes in, namely, the so-called probabilistic method, which was pioneered so effectively by Paul Erdös. Suppose G and G' each have n vertices and e edges. What we will show is that they must share a common subgraph H having at least 2e2/n(n − 1) edges. However, we won't be able to specify what H is or how to get it—just that it exists! How do we do this? Every mathematical paper should have at least one proof, so here comes ours.

Label the vertices of G and G' by, say, V = {x1,...,x} and V' = {x},...,x;}. Let A denote the set of one-to-one mappings of V onto V'. Thus, A has n! elements. If y = {x,,x;} and y' = {x',x}} are given elements in V and V', respectively, there are exactly 2(n −2)! elements AEA that map y onto y'. (The factor of 2 counts the two possibilities \(x) = x', λ(x;) = x; and X(x;) = x{, X(x;) = xk.) Define the indicator function ix(y,y'):

ix(x,y) = {o otherwise. 1 if maps y onto y'; Now sum ixy, y') over all A € A and all y ЄE, y' Є E':

In the first step we have interchanged the order of summation, and in the second we have used the previously noted fact about the number of A E A that map any given y EE onto any given y' E'.

Now we note that since S is a sum of n! terms of the form yixy, y') (one for each Є A), at least one of those terms must equal or exceed their average, which of course is just 2e2(n-2)!/n!, or 2e2/n(n − 1). In other words, for some AE A, say λo, we have

2e2 n(n − 1)

Having proved that the two graphs G and G' have a common subgraph H with at least 2e2/n(n − 1) edges, suppose now that we remove H from G and G', producing the graphs G1 and G, which have at most e1 = e− 2e2/n(n − 1) edges. Our theorem says that those graphs also have in common a subgraph H1 with at least 2e2/n(n 1) edges. Remove H1 to produce G2 and G2, and so on. It is not hard to show that after repeating the process k times, we have graphs G and G with at most n(n - 1)/2k edges. That in turn can be used to show that U (n) < √2n. To squeeze the last bit of juice out of the argument and show that U (n) ≤ } n + c requires more complicated considerations that we will not go into here.

Here, however, something completely unexpected happened. We had been guessing what the coefficient of n was going to be in the bound for U4(n) (why not ?) and more generally for Uk (n) (could it be?). We were quite unprepared for the following result, which was finally proved with the full arsenal of techniques we were rapidly accumulating (Chung, Erdös, and Graham 1981):

Theorem 3. For each k > 3, there is a fixed constant ck such that Uk(n) ≤ n+ck for all n.

In other words, the constant factor of

that appears in the bound for Uз(n) does not increase for values of k greater than 3. It is as though the space of nvertex graphs is in some sense "threedimensional," and once you have three graphs that are maximally separated, then adding further graphs can cause no real additional trouble. In fact, the most striking result we were finally able to establish dealt with trying to decompose all graphs

« Previous Continue »

Books