mong a variety of fundamental themes running through Stan Ulam's mathematical research, A one that particularly intrigued him was that of similarity. He was constantly fascinated by the problem of quantifying exactly how alike (or different) two mathematical objects or structures were, and during his career he discovered many ingenious ways of doing so. A good example is the well-known Ulam distance between finite sequences, which has recently been applied so effectively in analysis of DNA sequences and recognition of speech (Sankoff and Kruskal 1983). (Also see "Sequence Analysis: Contributions by Ulam to Molecular Biology" in this issue.) Here I will describe another measure of similarity suggested by Stan, one applicable to a wide assortment of combina A Similarity Measure for Graphs reflections on a theme of Ulam by Ronald L. Graham 12 8 10 2 5 8 6 12 8 12 torial structures. Like many seeds planted by his fertile imagination, this similarity measure has taken root and flowered in the modern mathematical jungle. The story begins one morning in late July of 1977, during one of my aperiodic visits to Stan and Françoise's marvelous house on the outskirts of Santa Fe. Stan and I had just finished playing tennis, which not only generated a plentiful supply of perspiration (and consequent thirst) but also inevitably led to a lively discussion of the differences in the game at an altitude of over 7000 feet, where the balls are effectively more highly pressurized, the air resistance is diminished, less oxygen is available for demanding lungs, and so on. Perhaps stimulated by trying to get a better grasp on understanding just how various aspects of the game (such as the 11 1 1 EXAMPLES OF GRAPHS Fig. 1. Shown here are the pictorial and mathematical representations of two simple graphs. G1 has the maximum number of edges for a given number n of vertices (namely, (2), or n(n − 1)). G2 is an example of a graph consisting of disjoint subgraphs. (For simplicity in this and all other figures, the edges of graphs are depicted as straight lines.) ISOMORPHISM OF GRAPHS Fig. 2. Let T be the following one-to-one trans d2, C1 12. formation of V1 onto V2: a1 → az, b1 G2, then G2 G and G, serve, the stroke, and the strategy) might change under varying conditions, Stan suddenly suggested, "Why not measure the difference between objects by trying to break them up into as few as possible pairwise equal pieces?" At first I didn't see quite what Stan was driving at (which happened fairly often), but after we talked it over, it became clear that here was an entirely new way of defining a measure of similarity between two (or more) combinatorial structures. In fact, it is very much akin to comparing two complex molecules by breaking them up into a number of pairwise identical fragmentsthe smaller the number of pieces needed, the more similar are the molecules. Our first application of the approach was to a class of mathematical objects known as graphs. Simply speaking, a graph G consists of a set V of elements called the vertices of G and a set E of certain pairs of elements of V called the edges of G. Graphs are often pictured by representing the vertices in V as points and the edges as lines between the pairs of points in E (Fig. 1). Before proceeding to the main topic of the article, we need two more basic definitions-those for isomorphism of graphs and for a partition of the edge set of a graph. titions {E,...,E,} and {E,..., E'} of the respective edge sets of G and G' such that, as graphs, E; E for 1 ≤ i ≤r. Such a decomposition always exists since we can always choose each E; (and E) to be a single edge (and by hypothesis e(G) = e(G')). Further, define U (G, G') to be the least value of r for which an Ulam decomposition of G and G' into r parts exists. (Figure 4 shows such minimum Ulam decompositions for two pairs of graphs.) Thus U (G, G') is a similarity measure for pairs of graphs: the smaller U (G, G') is, the more alike G and G' are. In particular, U (G, G') = 1 exactly when G and G' are isomorphic (that is, when they differ only in the way their vertices are labeled). So in U(n) we have a measure of the maximum dissimilarity among all pairs of graphs in In,e. A fundamental question about U(n) that occurs at the outset is this: How large can U (n) ever be? To whet your appetite for the answer to this question, let us determine U (G, G') for two examples. (U (n) is of course equal to or greater than U(G, G').) In the first example G is a k-rayed "star" (that is, it consists of one vertex joined to each of k others), and G' consists of k disjoint edges (Fig. 6a). Here e = k and n = 2k. For such a pair of graphs, the only Ulam decomposition comes from taking each E, and E¦ to be a single edge. Therefore U (n) > U (G, G') = k = = n. In the second, slightly more sophisticated example G consists of a 3k-rayed star, and G' consists of k disjoint trian 3k + 1 2 e=k, n = 2k 2 1 Minimum Ulam Decomposition 1 3 2k 1 1 2k - 1 H H 2 4 2k 2 4 2k 3k - 2 G' e= 3k, n = 3k+1 U (n) > U(G, G') = 2k = 2 3k Fig. 6. By considering pairs of graphs of the general form shown in (a), U(n) is found to be equal to or greater than n, whereas by considering pairs of graphs of the general form shown in (b), U(n) is found to be greater than (n - 1), which is greater than 1⁄2n (for n > 4). gles (Fig. 6b). Here e = 3k and n = 3k + 1. What is U (G, G') for such pairs of graphs? It is not difficult to see that the best we can do is to decompose each graph into k disjoint 2-rayed stars and k disjoint edges. Thus This situation might lead one to conclude that U(n) has no upper bound other than (2), the greatest possible number of edges possessed by any graph with n vertices. However, as demonstrated in the text, U(n) has an upper bound that is linear in n. At this point one may well wonder whether further search will produce even more complicated examples from which even larger lower bounds on U (n) can be deduced. That this cannot happen is the content of Theorem 1, which was the main result in our first paper on the (n − 1) ≈ 1) ≈n. subject (Chung, Erdös, Graham, Ulam, 1235 (115 and Yao 1979). Theorem 1. For a suitable fixed constant c, U(n) ≤ n +c for all n. Our proof of Theorem 1 uses several ideas that are now standard items in the toolbox of every combinatorialist. One is the idea of a greedy algorithm. It seems only natural to try to remove the largest subgraph common to each of the two graphs for which one is seeking a minimum Ulam decomposition (although in many situations that myopic approach is far from optimal). Indeed, such a technique is quite effective for the problem at hand. However, it leads to the next question: Just how large can we expect (or guarantee) such a common subgraph to be? Here the second technique we want to mention comes in, namely, the so-called probabilistic method, which was pioneered so effectively by Paul Erdös. Suppose G and G' each have n vertices and e edges. What we will show is that they must share a common subgraph H having at least 2e2/n(n − 1) edges. However, we won't be able to specify what H is or how to get it-just that it exists! How do we do this? Every mathematical paper should have at least one proof, so here comes ours. = Label the vertices of G and G' by, say, V = {x1,...,xn} and V' = {x,...,x}. Let A denote the set of one-to-one mappings of V onto V'. Thus, A has n! elements. If y {x¡,x;} and y' = {x},x}} are given elements in V and V', respectively, there are exactly 2(n-2)! elements AEA that map y onto y'. (The factor of 2 counts the two possibilities λ(x) = x', X(x;) = x; and X(x;) = x{, X(x;) = x;.) Define the indicator function ix(y, y'): ix(x,y) = {o otherwise. if maps y onto y'; Now sum ix(y, y') over all A E A and all ye E, y' E': s = ΣΣix(y,y') XEA y,y' y,y' 2e2(n − 2)!. In the first step we have interchanged the order of summation, and in the second we have used the previously noted fact about the number of λ € ▲ that map any given y EE onto any given y' € E'. Now we note that since S is a sum of n! terms of the form yixy, y') (one for each Є A), at least one of those terms must equal or exceed their average, which of course is just 2e2(n-2)!/n!, or 2e2/n(n − 1). In other words, for some AE A, say λo, we have Having proved that the two graphs G and G' have a common subgraph H with at least 2e2/n(n − 1) edges, suppose now that we remove H from G and G', producing the graphs G1 and G, which have at most e1 = e− 2e2/n(n − 1) edges. Our theorem says that those graphs also have in common a subgraph H1 with at least 2e2/n(n 1) edges. Remove H1 to produce G2 and G2, and so on. It is not hard to show that after repeating the process k times, we have graphs G and G with at most n(n 1)/2k edges. That in turn can be used to show that U (n) < √2n. To squeeze the last bit of juice out of the argument and show that U (n) ≤ n + c requires more complicated considerations that we will not go into here. Iwond was only natural that we began to wonder next about what happens if instead of starting with two graphs, we start with three (or more). Indeed, defining U(G1, G2, G3) as the minimum value. of r for which an Ulam decomposition of G1, G2, and G3 exists and U3(n) as maxG,G.GET, U (G1, G2, G3), we soon |