Last fall I read “A diagrammatic view of differential equations in physics” by Evan Patterson, Andrew Baas, Timothy Hosgood and James Fairbanks. They show that you can use diagrams to write down the sorts of equations on manifolds that physicists care about. In this post I’ll try to convey the main ideas of the paper to you so that: (1) I can get you excited about it and (2) I can refer back to these notes in the future.
So how does one think of an equation as a diagram?
The idea is pretty neat. First you’ve got to ask yourself: “what’s an equation?” Since an actually good answer to this question can get rather philosophical and since I want to get to the point fast, I’ll wave my hands a bit: an equation is a way of relating “quantities” that are defined on some “space” of interest.
Then you’ve got to ask yourself: “what kind of mathematical structure can I use to impose relations on some quantities?”
If you’re thinking of graphs – the relational structure par excellence – then you’re not far off. They don’t really do the trick, but you’ll see in a second that diagrams (a.k.a. semantically-rich graphs) are exactly what we need.
(By the way, as an Italian who grew up in a culture where hand gestures can be very precise, I never really understood why “hand-waving” is considered another way of saying “sweeping details under the rug”.)
So having established the intuition behind where the diagrams come from, all that’s left is to figure out what we’ll mean by “space” and “physical quantities” defined on that space. This is summarized below.
“Space”: a manifold \(M\).
Physical quantities defined in “space”: sheaves on \(M\).
Equations: diagrams relating physical quantities on the space (i.e. diagrams into the category of sheaves on \(M\)).
I like to get the big picture first, but that’s just my general preference, so don’t worry if you’re lost at this point: I’ll unpack things below. Let’s start with an example (it’s the same example that’s given in the Patterson, Baas, Hosgood & Fairbanks paper).
Example: the Discrete Heat Equation as a Diagrammatic Equation
I’ll be talking about the heat equation as an example, so, just as a reminder, let’s recall it: we say that a function \(u \colon T \times U \to \mathbb{R}\) (where \(T \subseteq \mathbb{R}\) is the time domain and \(U \subseteq \mathbb{R}^n\) is the spatial domain) is a solution to the heat equation if \[\frac{\partial u}{\partial t} = \Delta u\] where \(\Delta u\) denotes the Laplacian \[\Delta u = \sum_{i=1}^n \frac{\partial^2 u}{\partial x_i^2}\] in terms of the Cartesian coordinates \(x_1, \dots, x_n\).
Now, since the goal is to explain how to think of equations as diagrams and since I’m trying to follow the Diagrammatic Equations paper closely here (so that you can seamlessly transition into reading it, if you’re interested), I’ll focus on the discrete heat equation.
In the discrete case, rather than thinking of space as \(\mathbb{R}^n\), we’ll think of it as a graph \(G\) with vertices \(V\) (for instance we could take \(G = \mathbb{Z}^n\)). Just as in the case above, we’ll want a solution (a “quantity”) to the heat equation on the spatial domain \(G\) to be a function of the form \(u \colon \mathbb{N} \times V \to \mathbb{R}\) (note that we’re thinking of everything, including time, as being discrete).
The discrete derivative with respect to time of such a function \(u\) is then given as the operator \(\partial \colon \mathbb{R}^{\mathbb{N} \times V} \to \mathbb{R}^{\mathbb{N} \times V}\) which takes any function \(u\) to the function \[\partial \: u \colon (n, x) \mapsto u(n+1, x) - u(n,x).\] One can similarly define the discrete Laplacian as an operator \(\Delta \colon \mathbb{R}^{\mathbb{N} \times V} \to \mathbb{R}^{\mathbb{N} \times V} \). I won’t give the whole definition here (you can just check out page 6 of the paper) since the point I’m trying to get across is that the heat equation can be written as the following diagram in \(\mathsf{Vect}_{\mathbb{R}}\) (the category of \(\mathbb{R}\)eal vector spaces and linear maps)
To understand why this is an equation, notice that you can pick out a single vector \(u \in \mathbb{R}^{\mathbb{N} \times V}\) as a generalized element i.e. a linear map \(\mathbb{R} \to \mathbb{R}^{\mathbb{N} \times V}\) which takes the basis vector of \(\mathbb{R}\) to \(u\) (the vector we were trying to pick out). Then the commutativity of the following diagram is precisely the statement of the heat equation!
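To make the operator point of view concrete, here’s a small Python sketch (my own, not from the paper): the spatial domain is a 4-cycle, the discrete Laplacian is taken to be the usual graph Laplacian (one common convention), and a solution is built by forward stepping so that the two operators agree on it; that agreement is exactly the commutativity being asserted.

```python
# A sketch (mine, not the paper's) of the discrete heat equation as the
# assertion that two operators agree on a chosen u.  Convention assumed:
# (laplacian u)(n, x) = sum over neighbours y of (u(n, y) - u(n, x)).

V = [0, 1, 2, 3]                                   # spatial domain: a 4-cycle
nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
N_STEPS = 10                                       # store finitely many times

def d_time(u):
    """Discrete time derivative: (du)(n, x) = u(n+1, x) - u(n, x)."""
    return {(n, x): u[(n + 1, x)] - u[(n, x)]
            for n in range(N_STEPS - 1) for x in V}

def laplacian(u):
    """Discrete (graph) Laplacian in the spatial variable."""
    return {(n, x): sum(u[(n, y)] - u[(n, x)] for y in nbrs[x])
            for n in range(N_STEPS - 1) for x in V}

# Build a solution by forward stepping: u(n+1, x) = u(n, x) + (Laplacian u)(n, x).
# Integer initial data keeps all the arithmetic exact.
u = {(0, x): int(x == 0) for x in V}               # all heat starts at vertex 0
for n in range(N_STEPS - 1):
    for x in V:
        u[(n + 1, x)] = u[(n, x)] + sum(u[(n, y)] - u[(n, x)] for y in nbrs[x])

# The heat equation for u is precisely the statement that these agree,
# i.e. that the corresponding diagram commutes.
assert d_time(u) == laplacian(u)
```

Commutativity holds on the nose here because \(u\) was built by stepping forward with the Laplacian; for a generic \(u\) the two sides would disagree, and that failure is exactly what “not being a solution” means.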
Summarising what we’ve learnt so far, we have the following table.
Traditional view: an equation \(e\). Diagrammatic view: a diagram \(d_e\) whose morphisms are the operators of \(e\).
Traditional view: a solution \(u\) to \(e\). Diagrammatic view: a cone over the diagram \(d_e\).
Diagrammatic Equations
If you get the general idea by now (please ask questions in the comments, if not!) then it’s time to do two things: (1) think of the continuous case and (2) figure out what kinds of properties we want the “operators” to have (and what category we ought to live in). I’ll give a very concise summary below of how to think of all of this. For that, though, you’ll need to know about sheaves.
Sheaves. I already explained sheaves in some previous posts (see §1, §2 and §3 if you want to learn a bunch about them) but, for what I’ll be speaking about in this post, we can stick with the more down-to-Earth notion of a sheaf on a topological space.
Def: Let \(X\) be a topological space and \(\mathsf{S}\) be a category. An \(\mathsf{S}\)-sheaf on \(X\) is a contravariant functor \[\mathcal{F} \colon (\mathcal{O}X)^{op} \to \mathsf{S} \] from the poset of open sets of \(X\) to \(\mathsf{S}\) satisfying the sheaf condition; i.e. the requirement that, if you give me a collection \((U_i)_{i \in I}\) of opens in \(X\) whose union is an open \(U\), then \(\mathcal{F}U\) is the limit of the \(\mathcal{F}U_i\) over the diagram induced by \(I\) (in other words \(\mathcal{F}\) takes colimits to limits).
I’m telling you about sheaves because we’ll use them to represent the physical quantities involved in our diagrammatic equations: these quantities will be described as sheaves on a fixed space-time domain represented by a manifold \(M\) with corners.
For this post, I’ll only care about \(\mathsf{Vect}_{\mathbb{R}}\)-valued sheaves such as the sheaf \(\mathfrak{X}_M\) of smooth vector fields on a manifold \(M\). Such sheaves (like all sheaves, for that matter) assemble into a category \(\mathsf{Sh}(M, \mathsf{Vect}_{\mathbb{R}})\) of \(\mathsf{Vect}_{\mathbb{R}}\)-valued sheaves on \(M\) (the morphisms are natural transformations). Following the notation in the Diagrammatic Equations paper, I’ll abbreviate \(\mathsf{Sh}(M, \mathsf{Vect}_{\mathbb{R}})\) as \(\mathsf{Sh}_\mathbb{R} M\).
Recall that in our earlier Heat Equation example the solution to the heat equation on \(\mathbb{R}^n\) was a function \[u \colon \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}\] assigning quantities (heat, for instance) to space-time. The idea in Patterson, Baas, Hosgood & Fairbanks’ paper is to replace the function space \(\mathbb{R}^{\mathbb{R} \times \mathbb{R}^n}\) by a sheaf on a fixed manifold \(M\) representing space-time. Indeed here’s how they put it:
“[p]hysical quantities are taken to be sections of sheaves on the spatial domain M or the space-time domain \(M \times [0, \infty)\). Loosely speaking, a sheaf on a topological space is a coherent family of quantities defined on open subsets that can be glued together from smaller subsets whenever they agree on the overlaps.”
We’ve already seen such a sheaf \(\mathfrak{X}_M\) (first on the list below) and the following screenshot has a few more physically-relevant examples they give in the paper.
Finally, with this view in mind, we can speak of a diagram \[e \colon J \to \mathsf{Sh}_{\mathbb{R}} M \] as a diagrammatic equation \(e\) on a manifold \(M\) whose solutions (as in the example above) are cones over \(e\).
Even though I’m only thinking of \(\mathsf{Vect}_{\mathbb{R}}\)-valued sheaves (note that things are presented in greater generality in the original paper), I still find this “definition” of a diagrammatic equation remarkably satisfying.
In a previous post, we discussed sieves, the sieve-y definition of Grothendieck topologies and a few exaples thereof. Today we’ll return to sheaves, but this time we do so armed with a better understanding of sites (the “places” in which to define sheaves).
As you might recall, the first definition of a sheaf which we saw was that of a sheaf on the subgraph topology; namely a functor $$\mathcal{F}: (\mathbf{Sub}G)^{op} \to \mathbf{Set}$$ satisfying the condition that, whenever we are given an \(I\)-indexed matching family of sections \((s_i \in \mathcal{F}G_i)_{i \in I}\) (i.e. satisfying the requirement that \(s_i|_{G_i \cap G_j} = s_j|_{G_i \cap G_j}\) for all \(i\) and \(j\)), there is a unique section \(s \in \mathcal{F}\bigl(\bigcup_{i \in I }G_i\bigr)\) which restricts to each section in the family (i.e. such that \(s|_{G_i} = s_i\)).
Notice that, when one says that a functor is a sheaf, one must also specify with respect to which topology it is a sheaf. For instance, in the definition above, we have that \(\mathcal{F}\) is a sheaf with respect to the subobject topology on \(\mathbf{Sub}G\) (but it might fail to be a sheaf with respect to some other topology on \(\mathbf{Sub}G\)). The key point is that, if we change the topology, then we are also changing what it means when we state that a family of sections is a matching family. So the first step forward is to define the matching condition more generally.
Matching families
Recall that a sieve \(S\) on an object \(c \in C\) can be defined as a subfunctor of the representable presheaf $$y_c \colon C^{op} \to \mathbf{Set}$$ $$y_c \colon b \mapsto C(b, c).$$ You should think of \(S\) as the functor taking each \(b \in C\) to the subset of all morphisms in \(C(b,c)\) which are in the sieve \(S\).
Def: Let \((C, K)\) be a site, \(S\) be a \(K\)-cover of an object \(c \in C\) and \(P: C^{op} \to \mathbf{Set}\) be a presheaf. Then a matching family of sections of \(P\) with respect to the cover \(S\) is a morphism of presheaves \(\chi \colon S \Rightarrow P\).
Let’s unpack this. Let \(a\) be an object of \(C\). The component of \(\chi\) at \(a\) is a function \(\chi_a \colon Sa \to Pa\) which “picks out” a section in \(Pa\) for each element of \(Sa\) (think: an “open” in \(S\)). The naturality of \(\chi\) ensures that every morphism \(f \colon a \to b\) gives rise to a commutative square which intuitively states, for any “open” \(\beta \in Sb\) in \(b\), that “if you restrict \(\beta\) to \(a\) (via \(Sf\)) and then consider its associated section \(\chi_a Sf(\beta)\), you’ll get the same answer as if you had first grabbed ahold of the section \(\chi_b(\beta)\) associated to \(\beta\) and then restricted that section to \(a\) (via \(Pf\))”. If you compare this to the definition of a matching family on the subgraph topology I gave earlier, you’ll see that the intuition carries over.
Now we’re ready to give a proper definition of a sheaf on a site.
Def: Let \(P \colon C^{op} \to \mathbf{Set}\) be a presheaf on a site \((C, K)\). Then we call \(P\) a sheaf with respect to \(K\) (or a \(K\)-sheaf) if $$\forall c \in C \text{ and } \\ \forall \text{ covering sieves } \bigl(\iota \colon S \Rightarrow y_c\bigr) \in Kc \text{ of } c \text{ and } \\ \forall \text{ matching families } \chi: S \Rightarrow P$$ there is a unique extension $$\mathcal{E} \colon y_c \Rightarrow P$$ making the following triangle commute.
Let’s process that definition one step at a time, starting with the quantifiers. We’re just saying the usual thing: “\(P\) is a sheaf if there is a unique extension for every matching family \(\chi\) w.r.t. any cover \(S\) of any object \(c\)”. That’s the trivial part.
The harder part to unpack is understanding how the morphism of presheaves \(\mathcal{E}\) encodes the amalgamation property of sections of sheaves (it’s the usual big equalizer diagram; see the first post in the series if you don’t remember what I’m talking about). This is not apparent “on the nose”. But it becomes clear once you invoke the Yoneda Lemma: since the natural transformations \(\mathsf{Nat}( y_c , P) \) from \(y_c\) to \(P\) are in one-to-one correspondence with the elements of \(Pc\), a morphism of presheaves \(\mathcal{E}: y_c \Rightarrow P\) is just a way of picking an element of \(Pc\), i.e. a global section on \(c\). The fact that the triangle commutes is simply saying that the global section \(\mathcal{E}\) which we picked must restrict to the local sections picked by the matching family \(\chi\).
Anyway, today is a short post because I’m taking my Christmas spirit elsewhere. Ciao!
This post is in two parts. The first part consists of notes based on reading Daniel Rosiak’s book Sheaf Theory Through Examples; while the second part of this post (titled “Decompositions as topologies”) consists of new work straight from my notebook. As always, the lines are blurred between learning something that is new to me and discovering something that is new to me as well as to others.
Notes on Sieves & Grothendieck Topologies
Last time I started explaining (\(\mathbf{Set}\)-valued) sheaves, coverages, sites and Grothendieck (pre-) topologies. Today I’ll talk about sieves and I’ll go deeper on sheaves, presheaves and Grothendieck topologies.
So here’s a reminder of what happened in the previous post in this series. Even though topology is all about open sets and continuity, Grothendieck realised that, from the point of view of defining sheaves, the only thing that is important is the notion of a covering. This is a notion that has nothing to do with the idea of an “open set” and is hence much easier to generalize.
The very rough idea is the following. You start with a category \(C\). Thinking of a cover of any object \(c \in C\) as a set of morphisms with codomain \(c\), you associate to each such object \(c \in C\) a set \(K(c)\) of “permissible coverings” of \(c\). Then you call \((C, K)\) a site (a “place in which you can define sheaves”) if \(K\) satisfies certain “niceness” conditions.
One concern I was having last time while I was reading Daniel’s book was that the assignment \(K\) of permissible coverings was defined as a function rather than as a functor. Thankfully nature is healing: today \(K\) will indeed be a functor. But for this we need our notion of a cover to be a little nicer. We need sieves.
Def: A sieve on an object \(c \in C\) is a family \(S_c\) of morphisms with codomain \(c\) which is closed under pre-composition.
Think: it’s a sieve because, after picking “entry-points into \(c\)” of the form \(f: d \to c\), anything that can enter \(d\) (i.e. any morphism \(g: e \to d\)) can also enter \(c\) (i.e. as the composite \(fg: e \to d \to c\)).
The idea now is to work only with “nice” covers (i.e. sieves) and define a “permissible covers” functor \(K\) which assigns sets \(K(c)\) of sieves to any object \(c\). Before doing this, though, let’s give two other equivalent definitions of a sieve.
Def [sieve – Yoneda definition]: A sieve \(S\) on an object \(c \in C\) is a subfunctor of the Yoneda embedding at \(c\) (i.e. the functor \(y_c = C(-, c)\)).
Here’s how you see that this agrees with the definition above. Given a family of morphisms \(\mathcal{S} := \{f_i: c_i \to c \mid i \in I\}\) which is closed under pre-composition, we can define a functor \(S: C^{op} \to \mathbf{Set}\) taking each object \(x\) to the set \(\{f:x \to c \mid f \in \mathcal{S}\}\) and each morphism \(g: x \to y\) to the function \(S_g: S(y) \to S(x)\) given by \(S_g: (f \in S(y)) \mapsto (fg \in S(x))\). Conversely, given any subfunctor \(S\) of \(y_c\), we get a sieve on \(c\) (in the set-theoretic sense of the first definition above) as the set \(\bigcup_{x \in C} S(x)\) (by construction this will be closed under pre-composition, as desired).
Def [sieve – Slice definition]: A sieve \(S\) on an object \(c \in C\) is a full subcategory of the slice \(C/c\) whose objects are closed under pre-composition.
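To see how concrete this gets, here’s a tiny Python sketch (my own illustration): in a poset, all three definitions collapse, and a sieve on \(c\) is nothing but a downward-closed subset of \(\downarrow c\).

```python
from itertools import chain, combinations

# Sieves in a finite poset, where "closed under pre-composition" just
# means "downward closed".  The example poset (divisibility on
# {1, 2, 3, 6}) is made up for illustration.

elements = [1, 2, 3, 6]
le = lambda a, b: b % a == 0          # a <= b iff a divides b

def is_sieve(S, c):
    """S is a sieve on c iff S is a downward-closed subset of the
    principal downset of c."""
    return (all(le(a, c) for a in S)
            and all(b in S for a in S for b in elements if le(b, a)))

def sieves_on(c):
    down = sorted(a for a in elements if le(a, c))
    subsets = chain.from_iterable(combinations(down, k)
                                  for k in range(len(down) + 1))
    return [set(S) for S in subsets if is_sieve(set(S), c)]

print(sieves_on(6))
# The six downward-closed subsets of {1, 2, 3, 6}: set(), {1}, {1, 2},
# {1, 3}, {1, 2, 3} and the maximal sieve {1, 2, 3, 6}.
```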
Given these definitions, we can redefine a Grothendieck topology as a category \(C\) equipped with an assignment \(K\) of “permissible” coverings to each object satisfying three axioms (which I’ll define in a second): (1) maximality, (2) stability and (3) transitivity.
The second axiom – i.e. stability – is just a funny way of saying that \(K\) is itself in fact a presheaf $$K: C^{op} \to \mathbf{Set}$$ which takes each \(c \in C\) to a set of sieves on \(c\) and which contravariantly maps each morphism \(f : x \to y\) in \(C\) to the function $$K_f : (S \in Ky) \mapsto (\{g: x’ \to x \mid (fg: x’ \to x \to y) \in S\} \in Kx)$$ (i.e. \(K_f\) pulls sieves back along \(f\)). We can rewrite this \(K\) as a subfunctor of the functor $$\mathbf{Sieve} : C^{op} \to \mathbf{Set}$$ which takes each object \(c \in C\) to the collection of all sieves on \(c\) (and which acts on arrows as we defined above).
I find this functorial perspective much easier to remember than the statement of the stability axiom, so I’ll adopt it when formulating the definition of a Grothendieck topology below.
Def: Let \(C\) be a category. We call a subfunctor \(K\) of \(\mathbf{Sieve} : C^{op} \to \mathbf{Set}\) a Grothendieck topology on \(C\) if it satisfies the following two conditions:
[Identity/maximality] the maximal sieve is always permissible; i.e. \(C/c \in K(c)\) for all \(c \in C\)
[Transitivity] for any sieve \(R\) on some object \(c \in C\), if there is a permissible sieve \(S \in Kc\) on \(c\) such that for all \((\sigma: b \to c) \in S\) the sieve $$\sigma^*(R) := \{g: a \to b \mid (\sigma g: a \to b \to c) \in R\}$$ is permissible on \(b\) (i.e. \(\sigma^*(R) \in Kb\)), then \(R\) is itself permissible on \(c\) (i.e. \(R \in Kc\)).
Note: there is an equivalent “arrow” version of this definition which I won’t tell you about here (you can find it in Daniel’s book or in Mac Lane & Moerdijk).
Thinking of a cover as a means of obtaining information about an object, the first axiom makes sense: perfect information should always be allowed. The second axiom has to do with the idea that “a cover of a cover had better itself be a cover”. To see this, notice that it says that any sieve \(R\) on some object \(c\) must be permissible (i.e. a “cover” for \(c\) – i.e. \(R \in Kc\)) if it “refines” some permissible cover \(S \in Kc\). Anyway, now we can define a site in the usual way.
Def: a site is pair \((C, K)\) consisting of a category \(C\) and a Grothendieck topology \(K\) on \(C\).
Examples
For my own sake, I’ll review four Grothendieck topologies which are covered in Daniel’s book.
The minimal topology on a category \(C\) is the Grothendieck topology \(K\) which assigns to each \(c \in C\) precisely one covering sieve; namely the maximal sieve \(\{f \in \mathbf{Mor}(C) \mid \mathsf{codom}(f) = c\}\).
If \(C\) is a poset, the minimal topology simply amounts to assigning to each \(c \in C\) its downset \(\downarrow c\).
The maximal topology on a category \(C\) is the functor \(\mathbf{Sieve} : C^{op} \to \mathbf{Set}\) which we defined earlier.
The maximal topology looks wild to me. Taking \(C\) to be the category of graphs, the maximal topology would allow, for instance, the singleton set consisting of the unique morphism \(! : K_0 \to G\) (where \(K_0\) is the empty graph) to be a permissible cover of any graph \(G\).
The dense topology on a category \(C\) is the functor \(\mathsf{dense}: C^{op} \to \mathbf{Set}\) which takes each \(c \in C\) to the set of all dense sieves on \(c\) where a sieve \(S\) is dense if it satisfies the following requirement: $$\forall f: b \to c \; \exists g: a \to b \text{ s.t. the composite } fg: a \to b \to c \in S.$$
For a finite poset \(P\) the notion of a dense set on some \(p \in P\) is easy to understand: it is any subset of \(\downarrow p\) which is a superset of the set of all those minimal elements of \(P\) which are in \(\downarrow p\). To see this, take some dense set \(S\) and notice that, if \(m\) is a minimal element which is at most \(p\), then there exists a \(q\) in \(S\) which is at most \(m\); but then \(q = m\) by the minimality of \(m\). Conversely, writing \(M_p\) to denote the set of minimal elements which are at most \(p\), consider any set \(S\) with \(M_p \subseteq S \subseteq \downarrow p\). This set is dense since, for all \(p’ \leq p\), at least one of the elements of \(M_p\) will be at most \(p’\) and in \(S\). When a dense set is also a sieve, then it can be specified by any choice of elements in \(\downarrow p\) whose union contains all of \(M_p\). (Clearly the notion of a dense set is more interesting in infinite posets.)
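Here’s a tiny brute-force check of this characterization (my own sketch; the divisibility poset on \(\{2, \dots, 12\}\) is chosen because, lacking a bottom element, it has several minimal elements):

```python
from itertools import chain, combinations

# Dense sieves on a finite poset: divisibility on {2, ..., 12}, whose
# minimal elements are the primes.  Illustration only.

elements = list(range(2, 13))
le = lambda a, b: b % a == 0

def minimal_below(p):
    down = [a for a in elements if le(a, p)]
    return {m for m in down if not any(le(a, m) and a != m for a in down)}

def is_dense(S, p):
    """S is dense on p iff every b <= p admits some a <= b with a in S."""
    return all(any(le(a, b) and a in S for a in elements)
               for b in elements if le(b, p))

# Claim from the text: for a sieve S on p, density is equivalent to
# containing every minimal element below p.
p = 12
down_p = [a for a in elements if le(a, p)]          # {2, 3, 4, 6, 12}
for bits in chain.from_iterable(combinations(down_p, k)
                                for k in range(len(down_p) + 1)):
    S = set(bits)
    if all(b in S for a in S for b in elements if le(b, a)):  # S is a sieve
        assert is_dense(S, p) == minimal_below(p).issubset(S)
print("characterization verified for all sieves on", p)
```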
For fun, let’s instantiate this example on the subobject poset \(\mathbf{Sub}G\) of some finite graph \(G\). Then, by the finiteness of \(G\) and the fact that \(\mathbf{Sub}G\) has a least element (namely the empty graph) we find that, for \(G’ \subseteq G\), any non-empty downset (or union thereof) in \(\mathbf{Sub}G’\) is a permissible cover in \(\mathsf{dense}(G’)\).
The atomic topology on a category \(C\) specifies that a sieve on any \(c \in C\) is permissible whenever it is non-empty. As Daniel points out in his book, the maximality and transitivity axioms are trivially met; however, it is the functoriality of the Grothendieck topology that might fail: if \(C\) fails to have pullbacks, then, given a morphism \(b \to c\) it might not be obvious how to pull sieves on \(c\) back to sieves on \(b\). This can be resolved by requiring every cospan in \(C\) to have at least one cone (in particular posets which are not down-semi-lattices do not satisfy this property).
Some examples on graphs
An interesting example is to consider the free category \(FD\) on a directed graph \(D\). What’s a sieve on \(FD\)? Well, pick a vertex \(x\) in \(D\) and recall that morphisms in \(FD\) are directed walks in \(D\). If we have a sieve \(S\) on \(x\), this corresponds to a union of maximal directed walks towards \(x\).
If \(D\) is strongly connected, then every non-empty sieve will necessarily be the maximal one. To see this proceed by induction: it’s obvious on the 1-vertex graph. Now pick a vertex \(x\) in \(D\) and let \(S\) be a non-empty sieve on \(x\). By the inductive hypothesis we have that \(S\) contains all of \(D – y\) for some vertex \(y\) distinct from \(x\). Now consider any morphism \(f: y \to w\) for some \(w\) in \(D\). By the inductive hypothesis and since \(D\) is strongly connected, we have that there is a morphism \(g: w \to x\) which is in the sieve and hence \(gf: y \to w \to x\) is in the sieve as well. Similarly, take any morphism \(f’: w’ \to y\) and notice that, since \(gf: y \to x\) (defined as above) is in the sieve, then \(gff’: w’ \to y \to x\) is also in the sieve, as desired.
What’s nice about this is that it suggests that sieves on \(FD\) are related to the condensation of directed graphs. However, since the relationship doesn’t seem as clean as I’d like, I won’t sketch this further.
Finally let’s have a look at the subobject poset \(\mathbf{Sub}(G)\) of some fixed graph \(G\). We can assign a Grothendieck topology \(K\) to the category \(\mathbf{Sub}(G)\) as follows. For every subgraph \(G’\) of \(G\), define \(K(G’)\) to be the set of all sieves on \(G’\) in \(\mathbf{Sub}(G)\) (i.e. downward-closed collections of subgraphs of \(G’\)) whose colimit is \(G’\). This forms a Grothendieck topology because the colimit of the maximal sieve \(\downarrow G’\) is \(G’\) for each subgraph \(G’\) of \(G\) and because, if \(R\) is a downward-closed collection of subobjects of \(G’\) such that there is a cover \(S\) for \(G’\) satisfying the requirement that, for any morphism \(G” \subseteq G’\) in \(S\), the set \(R \cap \downarrow G”\) yields \(G”\) as a colimit, then clearly \(R\) yields \(G’\) as a colimit as well.
Decompositions as Topologies
Note: this section includes new work and departs from the note-taking theme of the rest of this post.
The topology on \(\mathbf{Sub}G\) I mentioned above, together with a similar topology on the poset \(\mathcal{O}X\) of opens of a topological space \(X\), are – from what I gather (and please correct me if I’m wrong) – the prototypical examples of Grothendieck topologies. But I find these constructions somewhat in conflict with my perspective on what the point of category theory is: I don’t want to define a topology for each graph; instead I’d like to equip the category of all graphs with a topology which I can then induce on a graph itself. I’m sure there are many ways of doing this, but I’ll focus on using structured decompositions (I have ulterior motives, you see).
Throughout the rest of this section, fix some small category \(C\) which is adhesive (we need both assumptions). On any such category I’ll define a functor $$\mathsf{decomp}: C^{op} \to \mathbf{Set}$$ which takes each object \(c \in C\) to the set of all \(C\)-valued structured decompositions yielding \(c\); spelling this out we have $$\mathsf{decomp}: (c \in C) \mapsto \{d \in \mathcal{D}C \mid \mathrm{colim}\:d = c\}$$ where \(\mathcal{D}C\) denotes the category of \(C\)-valued structured decompositions. The arrow-component of \(\mathsf{decomp}\) is defined by “pointwise pullback”. Although this construction is a direct consequence of Lemma 5.10 and Theorem 5.8 in my paper with Jade Master and Zoltan Kocsis, I’ll still spend some time giving you the gist of it here. For each morphism \(f: b \to c\) in \(C\) we define a map $$\mathsf{decomp}\:f : \mathsf{decomp}\:c \to \mathsf{decomp}\:b$$ which takes each decomposition \(d_c\) which yields \(c\) to a decomposition \(d_b\) which yields \(b\). The way you go about producing such a \(d_b\) from \(d_c\) is by pointwise pullback with \(f: b \to c\); rather than explain this further I’ll refer you to the following screenshot of my paper (which you should check out if you want further details).
To make the variables in the figure make sense with what I said above, replace \(\delta\) by \(f: b \to c\) and \(D\) by \(d\).
The point of all this is that we’ve defined a functor $$\mathsf{decomp}: C^{op} \to \mathbf{Set}$$ which we’ll now show is a Grothendieck topology on \(C\).
First of all notice that each structured decomposition gives us a sieve by just taking the colimit arrows and then completing these downwards (by precomposition). In turn, any sieve \(S\) gives rise to structured decompositions by sticking spans between the bases of the resulting multicospan defined by \(S\); the choice of spans can be done by taking products in the slice \(C / c\) (i.e. pullbacks in \(C\) … tangentially Matteo Capucci just tweeted about this yesterday, so say “ciao Matteo” and check that out). With this in mind, let’s check the two remaining axioms in turn.
Maximality. The maximal sieve \(M\) on any object \(c \in C\) clearly defines a decomposition \(m: F_{dag}K_{|C|} \to \mathbf{Sp}C\) which yields \(c\) as a colimit (note that we need \(C\) to be small to be able to define the – possibly infinite – complete graph \(K_{|C|}\) as the shape of the decomposition).
(Parenthetical remark: to be able to apply this whole construction to sets or graphs, say, we need to work with their skeletons since otherwise \(\mathbf{FinSet}\) and \(\mathbf{FinGr}\) are too big… you might even say that they are large.)
Transitivity. Take any sieve \(R\) on some object \(c \in C\), and consider the situation where there is a structured decomposition \(d_c \in \mathsf{decomp}\:c\) yielding \(c\) whose sieve \(S\) defined by the colimit cocone arrows from \(d_c\) into \(c\) satisfies the following requirement:
\(\mathbf{(\star)}\) for all \((\sigma: b \to c) \in S\) the sieve $$\sigma^*(R) := \{g: a \to b \mid (\sigma g: a \to b \to c) \in R\}$$ is permissible on \(b\) (i.e. there is a structured decomposition \(d_\sigma \in \mathsf{decomp}\:b\) whose colimit arrows form a basis for \(\sigma^*(R)\)).
Now our task is to show that \(R\) is itself permissible on \(c\); in other words we need to show that there is a structured decomposition \(\rho\) whose colimit is \(c\) and whose colimit arrows define \(R\). This, however, follows since diagrams of diagrams are themselves diagrams. To see this note that for each \((\sigma: b \to c) \in S\) we get a structured decomposition \(d_\sigma\) for \(b\) via \(\sigma^*(R)\) (as we noted above). This corresponds to replacing each bag \(dx\) of the structured decomposition \(d\) corresponding to the sieve \(S\) by a structured decomposition \(d_{\sigma_x}\) defined as I mentioned earlier from some \((\sigma_x : dx \to c) \in S\). But since
\(\mathsf{colim}\:d_{\sigma_x} = dx\) and
\(\mathsf{colim}\:d = c\) and
\(\mathbf{id}_c\) is terminal in \(C /c\),
we have that \(\mathsf{colim}\:\rho = c\), as desired. Thus we have shown the following theorem.
Theorem: For any small, adhesive category \(C\), the functor \(\mathsf{decomp}: C^{op} \to \mathbf{Set}\) is a Grothendieck topology on \(C\).
In a previous post I mentioned a very simple algorithm for solving decision problems encoded as sheaves on inputs that display some recursive structure. In this post I’ll talk about what goes wrong with that approach when the inputs are no longer presented in a recursive, tree-like way, and I’ll also propose a route for circumnavigation. But first let’s review what I mean by “recursive structure”.
In this blog I’ve spoken a lot about structured decompositions. These are ways of building objects (or morphisms) in a category out of smaller and simpler parts. They can be thought of as generalized graphs: a structured decomposition consists of a collection of bags (the simple building blocks) and a collection of adhesions (spans with bags for feet) which act as generalized relations between the bags. Formally, given a graph \(G\) and a category \(C\) with pullbacks, a \(C\)-valued structured decomposition of shape \(G\) is a diagram \(d\) of the following form $$d: F_{dag}G \to \mathbf{Sp}C$$ where \(F_{dag}G\) is the free dagger category on \(G\), \(d\) is a dagger functor and \(\mathbf{Sp}C\) is the span category on \(C\).
Reïterating this for clarity, note that for any vertex \(v\) of \(G\), the image \(dv\) is called a bag while, for any edge \(e=xy\) in \(G\), the image \(de\) (which is a span with feet \(dx\) and \(dy\)) is called an adhesion.
There is a lot to be said about \(C\)-valued structured decompositions; for example they form a category – denoted \(\mathcal{D}C\) – and they can be used to recover many useful combinatorial invariants including tree-width, graph decomposition width, layered tree-width, co-tree-width and a refinement of \(\mathcal{H}\)-tree-width. However, for this post all you need to know is how they work for graphs.
A tame graph-valued structured decomposition (which, to avoid being verbose, I’ll simply refer to as a “structured decomposition” in the rest of this post) is a structured decomposition having graphs for bags and monic spans of graphs for adhesions. Here’s a picture of a tree-shaped structured decomposition \(d\) (right) of a graph \(G\) (left).
The graph \(G\) on the left is obtained as a colimit of the structured decomposition on the right. Note that some spans are colored red and some are colored blue simply to make the spans easier to see. Each bag and adhesion set above represents the induced subgraph on those vertices while each leg of the span is the obvious inclusion of graphs.
Sheaves on tree-shaped structured decompositions
Tree-shaped structured decompositions are among the simplest to compute upon. The tree-shape clearly displays the recursive structure of any graph that is obtained as a colimit of such a decomposition and this structure can be exploited algorithmically. Let’s see how to compute sheaves on such decompositions. (By the way, if you’re new to sheaves, I’ve collected some notes on sheaves in a previous post; you can find it here.)
Suppose we have a sheaf $$ \mathcal{F}: (\mathbf{Sub}G)^{op} \to \mathbf{FinSet} $$ which for concreteness you could think of as the \(n\)-coloring sheaf which takes each subgraph \(G’\) of \(G\) to the set \(\mathbf{Gr}(G’, K_n)\) of all homomorphisms from \(G’\) to \(K_n\) (note that I’m working with irreflexive graphs here).
We can decide whether this sheaf admits a global section (which in the coloring example above translates to deciding whether \(G\) is \(n\)-colorable) by using any tree-shaped structured decomposition $$d: F_{dag}T \to \mathbf{Sp}\mathbf{Gr}$$ as follows.
Algorithm 1. Fix any ordering \(e_1, \dots, e_m\) of the edges of \(T\). For each edge \(e=xy\) proceed as follows:
consider the associated adhesion
under the sheaf \(\mathcal{F}\) this gives us a cospan
take the pullback of this cospan and then replace \(\mathcal{F}x\) and \(\mathcal{F}y\) by the images of the pullback under the projections \(\rho_1\) and \(\rho_2\).
To see why this works for coloring, let’s first unpack what this algorithm does. It first computes the sets of all possible \(n\)-colorings of each bag; then, for each edge \(e=xy\) of the tree \(T\), it filters the section sets at the feet of the adhesion corresponding to \(e\) in such a way that only the “good pairs” of colorings of \(dx\) and \(dy\) are kept (namely the pairs of colorings which agree on the vertices of \(de\); i.e. those that \(dx\) shares with \(dy\)).
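Here’s a Python sketch of this filtering for the \(n\)-coloring sheaf (the encoding and helper names are mine, not the paper’s; I process the tree edges leaves-to-root, one ordering for which checking non-emptiness at the root is sound, and I assume the bags form a genuine tree decomposition of the input graph):

```python
from itertools import product

# A sketch of Algorithm 1 for the n-coloring sheaf on a tree-shaped
# decomposition.  Bags are vertex sets (standing for induced subgraphs);
# the adhesion of two adjacent bags is their shared vertex set.

def colorings(bag, edges, n):
    """All proper n-colorings of the induced subgraph on `bag`."""
    bag = sorted(bag)
    return [dict(zip(bag, cs))
            for cs in product(range(n), repeat=len(bag))
            if all(cs[bag.index(u)] != cs[bag.index(v)]
                   for (u, v) in edges if u in bag and v in bag)]

def solve_tree(bags, tree_edges, graph_edges, n, root=0):
    """Decide n-colorability by filtering section sets leaves-to-root."""
    F = {t: colorings(bag, graph_edges, n) for t, bag in bags.items()}
    # Orient the tree away from the root (each edge becomes (child, parent)).
    order, seen, stack = [], {root}, [root]
    while stack:
        t = stack.pop()
        for (a, b) in tree_edges:
            for (child, parent) in ((a, b), (b, a)):
                if parent == t and child not in seen:
                    seen.add(child); stack.append(child); order.append((child, parent))
    for (child, parent) in reversed(order):       # deepest edges first
        shared = set(bags[child]) & set(bags[parent])
        # The pullback step: keep a parent section only if some child
        # section agrees with it on the adhesion.
        F[parent] = [sp for sp in F[parent]
                     if any(all(sp[v] == sc[v] for v in shared) for sc in F[child])]
    return bool(F[root])

# The path 0-1-2-3 decomposed into bags {0,1,2} and {2,3}:
bags = {0: [0, 1, 2], 1: [2, 3]}
print(solve_tree(bags, [(0, 1)], [(0, 1), (1, 2), (2, 3)], n=2))  # True
```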
This same argument works for any sheaf \(\mathcal{F}\) on a tree-shaped decomposition. Let’s prove it.
Proposition: Let \(d: F_{dag}T \to \mathbf{SpGr}\) be a tree-shaped decomposition of width \(k = \max_{t \in T}|dt|\) of a graph \(X\) and let \(\mathcal{F}: (\mathbf{Sub}X)^{op} \to \mathbf{FinSet}\) be a sheaf. Then the algorithm above decides whether \(\mathcal{F}\) has a global section (i.e. whether \(\mathcal{F}X\) is non-empty) in time $$\sum_{t \in T}|\mathcal{F}(dt)| \leq |T|\max_{t \in T}|\mathcal{F}(dt)| \leq |X|\max_{t \in T}|\mathcal{F}(dt)|.$$
Proof: the running time bound of the algorithm above is immediate while correctness can be easily proved by induction on \(|VT|\) as follows. If \(T\) has at most two nodes, then the algorithm is clearly correct since it amounts to checking whether at most two sections of the sheaf agree. Otherwise consider some edge \(e = t_1t_2\) of \(T\) and let \(T_1\) (resp. \(T_2\)) be the subtree of the forest \(T – e\) which contains \(t_1\) (resp. \(t_2\)). Now, for \(i \in \{1,2\}\), take any section \(\sigma_i\) on the subgraph of \(X\) which is the colimit of the decomposition $$ d_i: F_{dag}T_i \to F_{dag}T \to \mathbf{SpGr} $$ (where \( F_{dag}T_i \to F_{dag}T\) is the functor given by the obvious inclusion \(T_i \to T\)). By definition \(X\) has a global section if and only if there are two such sections \(\sigma_1\) and \(\sigma_2\) such that \(\sigma_1|_{de} = \sigma_2|_{de}\). But then this completes the proof by noticing that $$\sigma_i|_{de} = \bigl(\sigma_i|_{dt_i}\bigr)|_{de}.$$
Connectivity is playing a crucial role: the algorithm is correct because any edge of a tree \(T\) is a cut in \(T\). Indeed don’t be fooled into thinking that the proposition above can be naïvely extended to decompositions that are not tree-shaped.
So how does this go wrong for decompositions that aren’t tree-shaped?
The sheaf condition is one that needs to be checked in parallel. It says that, given a family of sections \((s_i \in \mathcal{F}G_i)_{i \in I}\) (think: “a collection of \(n\)-colorings, one for each subgraph \(G_i\) indexed by some family \(I\)”), if they form a matching family (i.e. they agree on intersections), then there is a unique section \(s \in \mathcal{F}\bigl(\bigcup_{i \in I}G_i\bigr)\) which restricts to \(s_i\) on \(G_i\) for all \(i \in I\).
On a structured decomposition \(d: F_{dag}H \to \mathbf{Sp}\mathbf{Gr}\) (not necessarily tree-shaped) this means that, if you give me a family of sections \((s_h)_{h \in VH}\) – one for each bag of \(d\) – then I can determine if there is a global section on \(H\) by simply checking whether these sections all pairwise agree on each adhesion of \(d\). Notice that, if we know nothing of \(H\) a priori, this requires us to check these pairwise intersections simultaneously. This is different from the algorithm I wrote above: rather than checking whether each section \(s_h\) can be extended locally with respect to at least one section in each bag indexed by some node adjacent to \(h\), we have to check whether each tuple \((s_{h_1}, \dots, s_{h_{|VH|}})\) agrees, simultaneously, on all intersections of bags.
To demonstrate this, consider the following (very simple) cycle-shaped structured decomposition of a 5-cycle.
Again pairs of edges which make an adhesion span are colored with the same color to make things easier to see.
Algorithm 1 (which I mentioned above) would yield the wrong answer because no edge of the decomposition shape (which, confusingly, is also a cycle in this example) is a separator. It’s easy to see why: suppose you’re interested in 2-coloring. You enumerate all 2-colorings of each bag (of which there are two for each bag), then you check for each span of the decomposition whether each coloring can be extended to the neighboring bags, and the algorithm answers “yes” if and only if there is no empty bag after this filtering phase. However this will incorrectly conclude that the 5-cycle is 2-colorable.
An algorithm for cyclic decompositions
One way of fixing the problem above is by wrapping our previous approach in the following routine:
Algorithm 2. Given a structured decomposition \(d: F_{dag}H \to \mathbf{SpGr}\) whose colimit is a graph \(X\) and given a sheaf \(\mathcal{F}: (\mathbf{Sub}X)^{op} \to \mathbf{FinSet}\), proceed as follows.
let \(S\) be a feedback-vertex-set for \(H\) (i.e. a set of vertices whose removal renders \(H\) a tree)
for each combination of sections \((\sigma_1, \dots, \sigma_{|S|}) \in \prod_{s \in S}\mathcal{F}ds\) run Algorithm 1 described above, but with \(\mathcal{F}ds_i\) replaced by \(\{\sigma_i\}\) for each \(s_i \in S\). If Algorithm 1 returns “yes”, then likewise return “yes”.
Return “no”.
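Here’s a matching Python sketch of Algorithm 2 (encoding mine; the feedback vertex set of the shape is assumed to be given rather than computed, and the Algorithm 1 filtering is repeated to a fixed point, which per the proposition below is sound once the feedback nodes are pinned):

```python
from itertools import product

# A sketch of Algorithm 2.  Sections are dicts vertex -> value; F maps
# each node of the shape H to its list of sections; shared[(x, y)] is
# the vertex set of the adhesion on the H-edge (x, y); `fvs` is a
# feedback vertex set of H, assumed given.

def filter_to_fixpoint(F, H_edges, shared):
    """Algorithm 1's edge-wise filtering, repeated until stable; answers
    whether every bag still has at least one section afterwards."""
    changed = True
    while changed:
        changed = False
        for (x, y) in H_edges:
            for (a, b) in ((x, y), (y, x)):
                kept = [sa for sa in F[a]
                        if any(all(sa[v] == sb[v] for v in shared[(x, y)])
                               for sb in F[b])]
                if len(kept) < len(F[a]):
                    F[a], changed = kept, True
    return all(F[t] for t in F)

def solve_cyclic(F, H_edges, shared, fvs):
    for choice in product(*(F[s] for s in fvs)):
        pinned = {t: list(secs) for t, secs in F.items()}
        for s, sigma in zip(fvs, choice):
            pinned[s] = [sigma]        # pin each feedback node to one section
        if filter_to_fixpoint(pinned, H_edges, shared):
            return True
    return False
```

On the 5-cycle example above this gives the right answer: pinning either of the two 2-colorings of any one bag forces colors all the way around the cycle and empties the last bag, so the algorithm returns “no”.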
Proposition: Algorithm 2 correctly decides whether \(\mathcal{F}\) has a global section on \(X\) in time \(\omega^{\mathrm{FVS}(H)}|X|\) where \(\omega = \max_{h \in H}|\mathcal{F}(dh)|\) and \(\mathrm{FVS}(H)\) denotes the feedback vertex number of \(H\).
The proof is by induction on the feedback vertex number of the decomposition shape \(H\). If \(\mathrm{FVS}(H) = 0\), then \(H\) is a tree so our earlier proposition establishes the base case. So assume the claim holds for shapes with feedback-vertex-number at most \(k-1\) and suppose \(h_1, \dots, h_k\) is a feedback vertex set for \(H\). Now notice that Algorithm 2 amounts to: (1) fixing a section \(\sigma_k\) in \(\mathcal{F}dh_k\), (2) removing from \(\mathcal{F}dh\) any section which does not agree with \( \sigma_k \) for all neighbors \(h\) of \( h_{k} \) in \(H\) and (3) running Algorithm 1 on the resulting amended set-valued structured decomposition. Given this recursive presentation, correctness is easily seen to follow by induction.
Notice that the proposition above states that deciding a property expressed as a finite sheaf (i.e. one into \(\mathbf{FinSet}\)) is in FPT under the joint parameterization of \(\mathcal{H}\)-width and the feedback vertex number of \(\mathcal{H}\) (where \(\mathcal{H}\) is a class of graphs).
Sheaves came up in my last post, so, since I’m quite new to sheaves, I figured it would be a good idea to learn some more about them. Naturally, I decided to share my notes with you.
The books I’m using to learn about sheaves are: (1) Sheaf Theory through Examples [Rosiak] and (2) Sheaves in Geometry and Logic [MacLane, Moerdijk]. I’m really enjoying both, so I recommend you give them a read, if you’re interested (maybe start with the first one).
Sheaves, I say, waving my hands profusely, have to do with passing from local information to global information. Daniel Rosiak has a bunch of great examples (which is unsurprising, since the word “examples” is even in the title of his book). The example I’ll use here is one we’ve already seen in my previous post: graph homomorphisms. Let’s unpack that.
Suppose \(f: G \to H\) is a graph homomorphism where \(G\) and \(H\) are finite simple graphs (“to keep things simple“). Now suppose we have a covering \((G_i)_{i \in I}\) of \(G\) (i.e. some \(I\)-indexed family of subgraphs of \(G\) whose union is \(G\)). Clearly we can restrict \(f\) to each subgraph \(G_i\) in the family; this yields a family of graph homomorphisms \( (f_i := f|_{G_i})_{i \in I} \) which match-up on boundaries: $$\forall i,j \in I, \quad f_i|_{G_i \cap G_j} = f_j|_{G_i \cap G_j}.$$
Yeah, so what? Well, what if, conversely, I were to give you such a matching family of functions \( \mathcal{M} := (f_i: G_i \to H)_{i \in I} \) (i.e. one where \(f_i|_{G_i \cap G_j} = f_j|_{G_i \cap G_j}\)); what would you be able to deduce? You’d be able to deduce that this family \(\mathcal{M}\) uniquely determines a graph homomorphism $$f_{\mathcal{M}}: G \to H \text{ where } f_{\mathcal{M}}: (x \in G_i) \mapsto f_i(x).$$
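Here’s a tiny sketch of that gluing step (my own encoding: subgraphs as vertex sets and homomorphisms as dicts on vertices):

```python
# Gluing a matching family of homomorphisms: since the f_i agree on
# overlaps, their union (as sets of pairs) is a well-defined function on
# the union of the subgraphs, and it is the unique common extension.

def glue(family):
    """family: list of (vertex_set, dict f_i); assumes a matching family."""
    f = {}
    for verts, fi in family:
        for v in verts:
            assert v not in f or f[v] == fi[v], "not a matching family!"
            f[v] = fi[v]
    return f

# Two subgraphs covering the path 0-1-2, both mapped to K2 = {0, 1}:
f1 = ({0, 1}, {0: 0, 1: 1})
f2 = ({1, 2}, {1: 1, 2: 0})
print(glue([f1, f2]))  # {0: 0, 1: 1, 2: 0}
```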
This is saying that graph homomorphisms are global constructions which are uniquely reconstructable from matching families of their sections (i.e. restrictions).
In a sense, lovingly mocking topology, this is the primordial example of a sheaf. This post is my attempt to organize sheaves in my mind so that I have a clear picture of: (1) how one generalizes the notion of a sheaf, (2) what knobs we can tune in the definition and (3) how one generates examples of sheaves, at least heuristically.
From my ever-less naïve perspective (thanks go to Daniel Rosiak for helping me clear up some confusion), the bird’s-eye, handwavy view is the following. In the general case, to define a sheaf one needs four kinds of data:
a presheaf \(\mathcal{F}: C^{op} \to D\) (i.e. a functor whose attitude seems to display a particular proclivity towards sheaf-y-ness)
a notion of covering,
what it means for “stuff in \(\mathcal{F}\)” (sections) to match-up given this notion of covering and
what the sheaf condition looks like given a matching family.
For the graph homomorphism example, the presheaf in question is the graph-theoretic analogue of the presheaf of continuous functions (a classic): $$\mathcal{F} : (\mathbf{Sub}G)^{op} \to \mathbf{Set} \text{ where } \mathcal{F} : (G’ {\hookrightarrow} G) \mapsto \{h \colon G’ \to H \mid h \text{ graph homo.}\}.$$
The interesting parts are points 2, 3 and 4; namely “coverings”, “matching families” and the “sheaf condition”.
Let’s start by abstracting the sheaf condition.
To that end let’s think of set-valued presheaves \(\mathcal{F}: (\mathcal{O}X)^{op} \to \mathbf{Set} \) on the poset of opens in a topological space \(X\) (notice that the graph example above is a special case). Let’s also take this as an opportunity to formally state the terminology:
we call \( \mathcal{F}U \) the set of sections of \(\mathcal{F}\) over \(U\),
naturally we call any \(s \in \mathcal{F}U\) a section of \(\mathcal{F}U\),
we say that a family \((s_i \in \mathcal{F}U_i)_{i \in I}\) is a matching family if it consists of pairwise matching/agreeing sections; i.e. such that \(s_i|_{U_i \cap U_j} = s_j|_{U_i \cap U_j}\).
If you’ve read my previous post, you’ll have noticed that one can indeed define presheaves or sheaves that are not \(\mathbf{Set}\)-valued. To try to keep things simpler, I’ll stick with \(\mathbf{Set}\)-valued sheaves in this post.
By the way, John Baez gave me some sound typographical advice this week: one should use bold for definitions and italic for emphasis. I’m going to try to shift to this convention from now on, but, since old habits die hard, please help me out by pointing out typographical inconsistencies, if you find any. 🙂
Alright, so hopefully you agree that the definition of a sheaf that I gave before reeks with bad vibes. So what’s “off” in that definition? Well, for starters we’re taking intersections of opens \(U_i \cap U_j\), yuck! The polite thing to do is to talk about spans like: \(U_i \leftarrow U_i \cap U_j \rightarrow U_j\).
Def: Fix a topological space \(X\) and an open \(U\) in \(X\). Consider any \(I\)-indexed family \(\mathcal{U} := (U_i)_{i \in I}\) of opens in \(X\) (for any such \(I\)) together with the inclusions \(U_i \hookleftarrow U_i \cap U_j \hookrightarrow U_j\). We say that \(\mathcal{U}\) is a covering of \(U\) if \(U\) is the colimit of \(\mathcal{U}\).
Def: a presheaf \(\mathcal{F}: (\mathcal{O}X)^{op} \to \mathbf{Set}\) is a sheaf of Sets if, for every open \(U\) of \(X\) and all coverings \(\mathcal{U}\) of \(U\), the following is an equalizer diagram.
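Concretely, the diagram in question is the usual fork $$\mathcal{F}U \xrightarrow{\;e\;} \prod_{i \in I}\mathcal{F}U_i \rightrightarrows \prod_{i,j \in I}\mathcal{F}(U_i \cap U_j)$$ where \(e\) is induced by the restriction maps \(\mathcal{F}U \to \mathcal{F}U_i\) and the parallel pair sends a family \((s_i)_{i \in I}\) to \((s_i|_{U_i \cap U_j})_{i,j}\) and \((s_j|_{U_i \cap U_j})_{i,j}\) respectively.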
The fact that \(e\) is an equalizer amounts to saying that you can reconstruct sections on \(U\) from sections on each one of the opens \(U_i\) in the covering \(\mathcal{U}\) of \(U\).
This is a general definition of the “collation” property which should work in any category. Unfortunately we’re not done, since the notion of a covering doesn’t generalize immediately (what should the analogue of \(\mathcal{O}X\) be?).
The following are some notes on Chapter 10 of Daniel’s book; if it’s your first time being “sheafy” and you’re confused, don’t be discouraged: everything is explained very well in the book. I’m deliberately trying to condense these ideas all in one place for my own benefit.
So what makes a covering a covering? Well, you somehow recovered a bunch of data of some topological space \(X\) by pasting together opens of \(X\). The opens of \(X\) are the elements of the poset \(\mathcal{O}X\) (these are subobjects of \(X\)). This is similar to the graph case where we let the opens of a graph \(G\) be objects of \(\mathbf{Sub}G\). In this setting, a covering of a graph \(G\) is just a diagram in \(\mathbf{Gr}\) factoring through \(\mathbf{dom}: \mathbf{Sub}G \to \mathbf{Gr}\) whose colimit is \(G\) (… equivalently you could just take the diagram to be in \(\mathbf{Sub}G\)).
Observation: structured decompositions of a topological space and coverings of that space are the same thing. This makes me guess that the way to generalize the notion of a topology via structured decompositions is to first define an “opens” functor \(\mathbb{O}: C \to \mathbf{Cat}\) which is a left adjoint and then define a covering of any \(c \in C\) to be an \(\mathbb{O}(c)\)-valued structured decomposition which behaves “nicely” when passed through the adjunction. Anyway, I’m writing this post as I read, so, brimming with eager anticipation, let’s find out together if this is how the story goes…
Grothendieck’s idea was to let an “open” of an object \(c \in C\) be any old mapping \(f: c’ \to c\). Then these would be collected into sets of the form $$S := \{f_i: c_i \to c \mid i \in I\}$$ called covering families and then these would in turn be collected into families \(K\) of “permissible coverings” (I’m making this name up, I think); these are functions of the form $$K: (c \in C) \mapsto \{S, S’, S”, \dots\}.$$
Obviously, if you want this to be a useful notion, you can’t just call any such function \(K\) a covering; you want coverings to be nicely behaved. One way to make this precise is through the idea of a basis of a Grothendieck topology (also known as a pretopology). Let’s define it.
Def: in a category \(C\) with pullbacks, we call such a “permissible coverings” function \(K\) a basis of a Grothendieck topology if the following three conditions hold for all \(c \in C\):
every isomorphism onto \(c\) yields a singleton covering; i.e. if \(f : c’ \to c\) is an iso, then \(\{f\} \in K(c)\)
if \(S \in K(c)\) is a covering family for \(c\) and \(g: b \to c\) a morphism in \(C\), then the pullback of \(S\) along \(g\) is a covering family for \(b\); i.e. $$\{g \times_{c} f \mid f \in S\} \in K(b);$$
if \(\{f_i : c_i \to c \mid i \in I\} \in K(c)\), then, whenever we are given \(\{g_{i,j}: b_{i,j} \to c_i | j \in J_i\} \in K(c_i)\) for all \(i \in I\), the composite covering family $$\{f_ig_{i,j}: b_{i,j} \to c_i \to c \mid i \in I, j \in J_i\} \in K(c).$$
The first axiom is very natural, it says, up to iso, that everything ought to cover itself. The third axiom is also very natural; it feels very monadic since it says that: “a cover of a cover is again a cover“. The second axiom is more interesting. I’m reading it as a generalization of the fact that \(\mathbf{Sub}(-): C \to \mathbf{Pos}\) is “functorial by pullback”.
By the way, this second condition looks like the definition of a pullback-absorbing functor which Zoltan, Jade and I give in our paper on structured decompositions… this is not important, but I guess that it gives me a moment to remark on the discomfort I feel about the fact that \(K\) is a mere function instead of being a functor. If there are any sheaf-theorists out there who can tell me why it’s a bad idea to promote \(K\) to a functor, then leave a comment, I’d be very happy to hear about it!
Anyway, according to Grothendieck’s taste, requiring \(C\) to have pullbacks (needed for condition 2 above) was apparently still too restrictive. Since pullbacks only appear in the second condition and since the second condition is the odd one among the three, it makes sense to single it out. This yields the notion of a coverage (following Johnstone’s notation).
Def: a coverage on a category \(C\) is a function \(T\) taking each object \(c\) of \(C\) to a set of covering families (collections of morphisms with codomain \(c\)) such that, for every \(S \in T(c)\) and every arrow \(g: b \to c\), there is a covering family \(S’ \in T(b)\) such that \(g \circ S’\) factors through \(S\); spelling this out, we want the following to hold.
Apparently, you can throw the first and third conditions of a Grothendieck pretopology out of the window and stick with coverages to define a site (i.e. a “place where you can define sheaves”) as a category equipped with a coverage.
Def: a site is a pair \((C, T)\) where \(C\) is a category and \(T\) a coverage.
Again I am confused: why not just define \(T\) to be a subfunctor of the Yoneda embedding \(y: (c \in C) \mapsto (C(-, c) : C^{op} \to \mathbf{Set})\)?
Anyway, sidestepping my unease around \(T\) being a function, the point of all this is that we can now define \(T\)-sheaves on a site \((C, T)\).
Def: let \(\mathcal{F}: C^{op} \to \mathbf{Set}\) be a presheaf on a site \((C, T)\). We say that \(\mathcal{F}\) is a \(T\)-sheaf if the following holds for all objects \(c \in C\):
for all covers \(S := \{f_i: c_i \to c \mid i \in I\} \in T(c) \) in the coverage \(T\) and
for any family of sections \((s_i \in \mathcal{F}c_i)_{i \in I}\) (one for each element of the given cover) which is compatible w.r.t. \(S\); i.e. such that given any commutative square we have that \(\mathcal{F}_g s_i = \mathcal{F}_h s_j\) (recall that \(\mathcal{F}\) is contravariant)
there is a unique section \(\sigma \in \mathcal{F}c\) which restricts to each given section; i.e. \(\mathcal{F}_{f_i}\sigma = s_i\) for all \(i \in I\).
There is obviously more to be said here, but I’ll stop because I’m tired. Ciao!
P.S. looking ahead in Daniel’s book I think my issues with \(K\) being a function rather than a functor are fixed by sieves… more on that next time…
Last time we spoke a bit about dynamic programming; specifically about solving Weighted Independent Set on trees. You can read about it here. Today things will get a bit more serious. I’ll start by briefly explaining how the algorithm we sketched for trees extends to graphs of bounded tree-width and then I’ll speak about the thing that makes dynamic programming fast: namely the quotienting of solution spaces. This last part is work in progress; feel free to share any feedback in the comments below.
Weighted independent set: from trees to graphs of bounded tree-width.
We already saw how to solve this problem on trees in the previous post in this series. The notation I chose for that post involved structured decompositions which, in case you need a reminder, look like this (see my paper with Zoltan Kocsis and Jade Master for the real definition).
I will continue to handwave the definition of structured decompositions for now since we’ll only use them to talk about tree decompositions. Don’t worry, I’ll try to make things as general as possible in the next post.
The algorithm I sketched last time for Maximum Weight Independent Set generalizes from structured decompositions of trees to structured decompositions of arbitrary graphs. Crucially, the efficiency of the algorithm carries over too: the running time of the algorithm scales exponentially in the size of the largest bag of the decomposition, but it remains linear in the overall input size.
Notation: The maximum bag-size of a structured decomposition of graphs is called the width of the decomposition. In graph theory tree-shaped decompositions are called tree decompositions and the smallest width attainable for constructing a given graph \( G\) with a tree-shaped decomposition is called its tree-width. It is denoted \(\mathbf{tw}(G)\).
Our algorithm for Weighted Independent Set follows the dynamic programming leitmotif:
solve your problem on the bags (usually by brute force)
working bottom-up from the leaves of the tree-shaped decomposition, combine local solutions to get ever-more-global solutions until we find the overall global solution.
For such algorithms to be fast (polynomial time) on some class, we need two things to be true:
the class has to have bounded width
we need to quotient the combined solution spaces as we go.
To really understand this last point, let’s look at an example. As before suppose we’re solving Weighted Independent Set. And, for simplicity, suppose we’re solving it on an input graph that admits a path-shaped decomposition \( (P, d)\) of width \( k \) (for some \( k \in \mathbb{N}\)) as in the following example.
Estimating things very roughly, let’s say that any n-vertex graph has at most \( 2^n \) different independent sets. Then, if we were to naïvely solve the independent set problem using the decomposition above, how long would this take?
Well the last step of the compositional algorithm would take time $$2^{|d x_1 +_{d e_{1,2}} d x_2 +_{d e_{2,3}} d x_3|} \times 2^{|d x_4|} $$ (I’m dropping parentheses to make things readable… and because I prefer it this way anyway). This is obviously very bad since it’s pretty much exponential in the size of the entire input graph.
The key insight we noticed last time is that we don’t need to carry around the entire set \( I_{1,2,3} \) of all independent sets of $$ d(x_1) +_{d(e_{1,2})} d(x_2) +_{d(e_{2,3})} d(x_3). $$ We only need to remember what \( I_{1,2,3} \) looks like locally within \( d(x_3) \). This insight allows us to speed up the algorithm into one running in time \( \sum_{ 1 \leq i \leq 4} f(|d(x_i)|) \) for some exponential function \( f \) which I’m not bothering to estimate right now (from experience, I’d guess it’s probably something like \( f: |d(x_i)| \mapsto 2^{2|d(x_i)|} \), but I haven’t checked this…).
Notice that, if each bag has size bounded above by some constant \( k\), then the running time of the “quotiented” version is linear in the input’s overall size!
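Here’s a sketch of the quotiented dynamic program on a path-shaped decomposition (encoding mine; it assumes a valid path decomposition, i.e. every edge of the input graph lives inside some bag and consecutive bags overlap exactly on the adhesions):

```python
from itertools import chain, combinations

# Quotiented DP for Weighted Independent Set along a path of bags: after
# each bag we remember, for every possible "footprint" of a partial
# solution on the next adhesion, only the best weight seen so far.

def independent_subsets(bag, edges):
    subs = chain.from_iterable(combinations(bag, k) for k in range(len(bag) + 1))
    return [frozenset(s) for s in subs
            if not any(u in s and v in s for (u, v) in edges)]

def max_wis(bags, edges, w):
    table = {frozenset(): 0}     # footprint on the last adhesion -> best weight
    prev_bag = set()
    for i, bag in enumerate(bags):
        shared = prev_bag & set(bag)
        nxt = set(bags[i + 1]) if i + 1 < len(bags) else set()
        new = {}
        for s in independent_subsets(bag, edges):
            for foot, weight in table.items():
                if foot & shared != s & shared:
                    continue     # partial solutions must agree on the adhesion
                total = weight + sum(w[v] for v in s if v not in shared)
                key = frozenset(s & nxt)   # the quotient: forget everything else
                new[key] = max(new.get(key, float("-inf")), total)
        table, prev_bag = new, set(bag)
    return max(table.values())

# The path 0-1-2-3 with bags {0,1,2} and {2,3}, all weights 1:
bags = [[0, 1, 2], [2, 3]]
edges = [(0, 1), (1, 2), (2, 3)]
print(max_wis(bags, edges, {v: 1 for v in range(4)}))  # 2, e.g. {1, 3}
```

The point is that `table` holds at most one entry per footprint on an adhesion, no matter how many partial solutions realize that footprint: that is exactly the quotienting at work.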
Another example — this time “by picture”
It’s good to have at least another example on hand. It helps build intuition. So let’s now consider the Longest Path problem.
Longest Path
Input: a graph \(G\) and an integer \(k\).
Question: does \(G\) contain a simple path of length at least \(k\)?
How does one solve this problem compositionally?
Well, it’s a standard exercise in parameterized complexity textbooks; indeed it’s so common that it features as one of the first dynamic programming exercises in Flum and Grohe’s textbook.
I’ll sketch the intuitive idea pictorially. Consider the following diagram where we have a path-shaped structured decomposition consisting of three bags \( d(x_1)\), \( d(x_2)\) and \(d(x_3)\) (I’m leaving out the spans so things are easier to see).
Now how can we use the decomposition to solve the Longest Path problem? As John Baez puts it, there is a general trick for gaining intuition:
“as always in mathematics, if you’re trying to figure out how to get something, it’s always good to pretend you already have it and then see what it’s like [so that] you can figure out how to get it.”
Let’s do exactly that. Suppose we are given an optimum solution (a longest path) as drawn in green below.
What does this path look like locally? Well, it’s a disjoint collection of paths (as I’ve highlighted in the second bag of the decomposition below).
This suggests an algorithm:
enumerate all collections of disjoint paths in each bag (let’s call these local solutions);
recursively try to join local solutions into larger and larger solutions.
Unsurprisingly this algorithm follows the leitmotif we already mentioned.
Now, if you’re getting the hang of how these algorithms go, then your first question should be: “how do we quotient the solution space here?” Well, consider the following diagram.
Figure 1.
We have three partial solutions: one in \( d(x_2)\) (drawn in black) and two in \( d(x_1)\) (drawn in green and violet respectively). Here’s the crucial observation: despite the fact that the green and violet solutions are different, they link-up to the black solution in the same way. This is important because it means that any extension of the black+green solution (by means of some solution local to \( d(x_3) \)) is also an extension of the black+violet solution! In other words, from the perspective of answering the Longest Path problem the two solutions are almost the same: we can forget the shorter path of the pair (black+green, black+violet).
Try to keep this example in mind. It will be useful for the next section.
Quotienting solution spaces in general
Things will get slightly category-theoretic now, so apologies if you don’t yet speak category theory (I suggest Emily Riehl’s book if you want to learn).
Let’s model an algorithmic problem \(\mathcal{F}\) on an input graph \( X\) as a poset-valued presheaf; i.e. as a functor of the form \( \mathcal{F}: (\mathbf{Sub} X)^{op} \to \mathbf{Pos}\).
For example we could model Independent Set on the input \( X\) as the functor \( \mathcal{I}: (\mathbf{Sub} X) ^{op}\to \mathbf{Pos}\) taking each open \( X'\) (i.e. a subgraph of \( X\)) to the poset (ordered by inclusion) of all independent sets in \( X'\). On arrows, \( \mathcal{I}\) takes \( X' \subseteq X''\) to the restriction map \( \mathcal{I}_{X' \subseteq X''} : \mathcal{I} X'' \to \mathcal{I} X'\) taking each independent set \( \iota \in \mathcal{I}X''\) to its restriction \( \iota|_{X'} \in \mathcal{I}X'\) to \( X'\) (this works because the property of being an independent set is closed under taking subgraphs).
It will be convenient to call an independent set \( \iota'\) an extension of \( \iota\) if \( \iota \subseteq \iota'\) is a morphism in some solution space poset \( \mathcal{I}(X')\) which contains both of these independent sets (where \( X'\) is some subgraph of \( X\)).
We’ll need some notation. Let \( Y \subseteq Y’\) be any two subgraphs of \( X\); we wish to define a functor \( \mathrm{Ext}_Y^{Y’}\) which we’ll refer to as the \( (Y, Y’)\)-extension functor. This will be a contravariant functor \( \mathrm{Ext}_Y^{Y’}: \mathcal{I}Y^{op} \to \mathbf{Sub}\mathcal{I}Y’\) taking
each object \( \iota \in \mathcal{I}Y\) to the poset (ordered by inclusion) of all those \( \iota'\) in \( \mathcal{I}Y'\) which restrict to \( \iota\); i.e. \( \mathrm{Ext}_Y^{Y'}: \iota \mapsto (\{\iota' \in \mathcal{I}Y' \mid \mathcal{I}_{Y \subseteq Y'}(\iota') = \iota\}, \subseteq)\)
and each morphism \( \iota \subseteq \iota'\) in \( \mathcal{I}Y\) to the restriction $$ \mathrm{Ext}_Y^{Y'}: (\iota \subseteq \iota') \mapsto \bigl( \mathrm{Ext}_Y^{Y'}( \iota' ) \to \mathrm{Ext}_Y^{Y'}(\iota) \bigr).$$ To see that this is well-defined, notice that, since \( \iota \subseteq \iota'\), for all \( \iota''\), if \( \iota'' \supseteq \iota'\), then \( \iota'' \supseteq \iota\). (In particular this implies that \( \mathrm{Ext}_Y^{Y'}(\iota')\) is a subposet of \( \mathrm{Ext}_Y^{Y'}(\iota)\).)
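To ground the definition above, here is how I might encode \( \mathcal{I}\), its restriction maps and \( \mathrm{Ext}\) for finite graphs in Python. This is a minimal sketch under my own encoding: posets are represented by their underlying sets, forgetting the order.

```python
from itertools import combinations

def solution_space(vertices, edges):
    """I(X'): the set of all independent sets of X' = (vertices, edges)."""
    return {frozenset(s)
            for r in range(len(vertices) + 1)
            for s in combinations(list(vertices), r)
            if not any(u in s and v in s for (u, v) in edges)}

def restrict(iota, sub_vertices):
    """The restriction map I(X'') -> I(X') for a subgraph X' of X''."""
    return iota & frozenset(sub_vertices)

def ext(iota, Y_vertices, solutions_Yp):
    """Ext_Y^{Y'}(iota): all solutions over Y' restricting to iota on Y."""
    return {ip for ip in solutions_Yp if restrict(ip, Y_vertices) == iota}

# Y = the edge a-b inside Y' = the path a-b-c:
I_Yp = solution_space({"a", "b", "c"}, {("a", "b"), ("b", "c")})
print(ext(frozenset({"a"}), {"a", "b"}, I_Yp))
# {frozenset({'a'}), frozenset({'a', 'c'})} (up to printing order)
```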
With this notation, we’re ready to write down how I think “solution space quotienting” should go.
Solution space quotienting
The first two steps are obvious: we compute the solution spaces \( \mathcal{I}(dx)\), \( \mathcal{I}(dy)\), \( \mathcal{I}(de)\) and we use these to compute the joint solution space \( \mathcal{I}(dx +_{de} dy)\) as in the following diagram. (I’m glossing over how this computation is done, but, if you’re interested, then you can check out Section 6 of our paper where we show how to do this step for a fairly large class of problems.)
Now the quotienting starts. First of all we compute the product \( \mathcal{I}(dx) \times \mathcal{I}(dx)\).
It’s the poset whose objects are pairs \( (\iota_1, \iota_2)\) of elements of the solution space \( \mathcal{I}(dx)\) and whose projections do the obvious thing; namely \( \pi_i: (\iota_1, \iota_2) \mapsto \iota_i\).
Then we take the equalizer of \( \mathrm{Ext}_{dx}^{dx+_{de}dy} \circ \pi_1\) and \( \mathrm{Ext}_{dx}^{dx+_{de}dy} \circ \pi_2\) as follows.
The poset \( E\) has underlying set $$ \{(\iota_1, \iota_2) \in \mathcal{I}(dx) \times \mathcal{I}(dx) \mid \mathrm{Ext}_{dx}^{dx+_{de}dy}(\iota_1) = \mathrm{Ext}_{dx}^{dx+_{de}dy}(\iota_2)\}$$ consisting of all those pairs of local solutions \( \iota_1 \) and \( \iota_2 \) in \( \mathcal{I}(dx) \) which can be extended in exactly the same way in \( \mathcal{I}(dy)\).
Compare this to Figure 1 (the one with the green, violet and black paths).
Now, to actually do the quotienting, we compute the coequalizer of \( \pi_1 \circ \varepsilon\) and \( \pi_2 \circ \varepsilon\) (where \( \varepsilon\) is the equalizer map) as follows.
What does \( P_{x,y}\) look like? It’s what you get when you quotient \( \mathcal{I}(dx)\) by the equivalence relation defined by \( E\); namely the equivalence relation that deems two elements \( \iota_1\) and \( \iota_2\) equivalent if they can be extended by exactly the same solutions in \( \mathcal{I}(dy)\) (i.e. if \( (\iota_1, \iota_2) \in E\)).
The final step is to use this quotient \( P_{xy}\) of \( \mathcal{I}(dx)\) to remove the “spurious” information from \( \mathcal{I}(dx +_{de} dy)\): namely we compute a subposet \( \mathcal{I}_{quot}(dx +_{de} dy)\) of \( \mathcal{I}(dx +_{de} dy)\) consisting of all those \( \iota \in \mathcal{I}(dx +_{de} dy)\) which yield elements in \( P_{xy}\) when restricted to \( dx\).
The idea is that we would then proceed with our computations (as in the naïve algorithm) but using \( \mathcal{I}_{quot}(dx +_{de} dy)\) instead of \( \mathcal{I}(dx +_{de} dy)\) in all future iterations of the algorithm.
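Putting the whole pipeline together for Independent Set on a single gluing \( dx +_{de} dy\), here is a minimal Python sketch under my reading of the construction: two local solutions are identified exactly when they are extended by the same solutions in \( \mathcal{I}(dy)\), so the equalizer and coequalizer can be computed in one go by grouping \( \mathcal{I}(dx)\) accordingly. Posets are treated as bare sets and all names are mine.

```python
from itertools import combinations

def indep_sets(vertices, edges):
    """All independent sets of a small graph, by brute force."""
    return {frozenset(s)
            for r in range(len(vertices) + 1)
            for s in combinations(list(vertices), r)
            if not any(u in s and v in s for (u, v) in edges)}

def quotient_pipeline(dx, dy):
    """dx, dy: (vertices, edges) of the two bags; the span de is encoded
    implicitly by shared vertex names. Returns I_quot(dx +_de dy)."""
    (vx, ex), (vy, ey) = dx, dy
    joint = indep_sets(vx | vy, set(ex) | set(ey))  # I(dx +_de dy)
    # Group I(dx) into the classes computed by the (co)equalizer: iota_1
    # and iota_2 are identified iff the dy-parts of their extensions agree.
    classes = {}
    for iota in indep_sets(vx, ex):
        dy_parts = frozenset(j & vy for j in joint if j & vx == iota)
        classes.setdefault(dy_parts, set()).add(iota)
    # The coequalizer P_xy, concretely: one representative per class.
    reps = {min(cls, key=sorted) for cls in classes.values()}
    # I_quot(dx +_de dy): joint solutions restricting to a representative.
    return {j for j in joint if j & vx in reps}

# The path a-b-c as two bags glued on b. The pruned space has 3 elements,
# one per solution over dy, matching the sheaf discussion below.
dx, dy = ({"a", "b"}, {("a", "b")}), ({"b", "c"}, {("b", "c")})
print(quotient_pipeline(dx, dy))
```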
So does it work?
Well, I haven’t stated the computational task yet, so the question isn’t a fair one.
Let’s try to see if we get the right kind of objects for a couple of problems.
Sheaves
Recall that I wrote earlier that we could think of our computational problems on graphs as presheaves $$ \mathcal{I}: (\mathbf{Sub}G)^{op} \to \mathbf{Pos}$$. A natural question to ask is whether the quotienting procedure I described above does the right thing when \( \mathcal{I}\) is a sheaf.
Recall that a presheaf \( \mathcal{I}: (\mathbf{Sub}G)^{op} \to \mathbf{Pos}\) is a sheaf if it satisfies the following sheaf condition.
Sheaf condition: suppose we are given a collection of subgraphs \( (G_j)_{j \in J}\) and a collection of sections (i.e. solutions) \( \bigl (s_j \in \mathcal{I}(G_j) \bigr )_{j \in J}\). If they pairwise agree (i.e. \( s_j|_{G_j \cap G_k} = s_k|_{G_j \cap G_k}\) for all \( j,k\) with \( j \neq k\)), then there is a unique section \( s \in \mathcal{I}( \mathrm{colim}_{j \in J}(G_j) )\) restricting to each \( s_j\) (i.e. \( s|_{G_j} = s_j\) for all \( j \in J\)).
So what happens if we apply our construction to \( \mathcal{I}\) when \( \mathcal{I}\) is a sheaf? Well we need to figure out what the equalizer looks like:
As before, its underlying set consists of all pairs \( (\iota_1, \iota_2) \in \mathcal{I}(dx) \times \mathcal{I}(dx)\) such that $$ \mathrm{Ext}_{dx}^{dx+_{de}dy}(\iota_1) = \mathrm{Ext}_{dx}^{dx+_{de}dy}(\iota_2).$$ Now, since \( \mathcal{I}\) is a sheaf, we immediately have that \( (\iota_1, \iota_2) \in E\) if and only if \( \iota_1|_{de} = \iota_2|_{de}\) (i.e. they agree on the boundary and hence are extendable by the same set of solutions in \( dy\)).
This means that the coequalizer \( (P_{xy}, \rho)\)
will quotient \( \mathcal{I}(dx)\) under the equivalence relation which stipulates that two elements \( \iota_1, \iota_2 \in \mathcal{I}(dx)\) are equivalent whenever they look the same in \( dx \cap dy = de\). In particular this implies that \( \mathcal{I}_{quot}(dx +_{de} dy)\) will be isomorphic to \( \mathcal{I}(dy)\). Thus we’ve just found that the quotienting was able to “figure out” that solving a “sheaf problem” amounts to filtering partial solutions!
Note that, by what we’ve just observed, we have obtained linear FPT-time algorithms for any sheaf-problem parameterized by the maximum size of a bag in a tree-shaped structured decomposition on graphs.
In and of itself, since we stated it only for graphs, the result above is nothing new: after all, it is already known that such problems are in FPT when parameterized by tree-width.
However, it is easy to see that our result should generalize to any category and sheaf-problem (provided appropriate obvious computability conditions are satisfied by the category and sheaf in question).
So what can we encode as sheaves? Well, for starters, we can encode our favourite problem so far, the Max Independent Set problem! Furthermore, we can also encode the H-coloring problem, which is defined as follows (a brute-force sketch follows the definition).
H-coloring
Input: we’re given graphs G and H
Question: is there a homomorphism from G to H?
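A homomorphism is just an edge-preserving vertex map, so a brute-force check fits in a few lines of Python. This is only to pin the definition down (it is exponential, of course); the encoding is mine.

```python
from itertools import product

def has_homomorphism(G, H):
    """G, H: (vertices, edges); G's edges as (u, v) tuples, H's edges as
    2-element frozensets. Checks for a map V(G) -> V(H) sending edges
    to edges (a map collapsing an edge to a single vertex fails)."""
    (vg, eg), (vh, eh) = G, H
    vg, vh = list(vg), list(vh)
    for image in product(vh, repeat=len(vg)):
        f = dict(zip(vg, image))
        if all(frozenset({f[u], f[v]}) in eh for (u, v) in eg):
            return True
    return False

# H = K_2: a homomorphism G -> K_2 is exactly a proper 2-colouring of G.
K2 = ({0, 1}, {frozenset({0, 1})})
path = ({1, 2, 3}, {(1, 2), (2, 3)})
print(has_homomorphism(path, K2))  # True: paths are bipartite
```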
When your presheaf ain’t a sheaf
We’ve seen that sheaf problems are particularly nice. However, it seems unlikely to me (despite being a “sheaf-novice”) that all computational problems should yield sheaves. Examples of presheaves which fail to be sheaves under the subobject topology include the Max Planar Subgraph and Longest Path presheaves.
Since we already mentioned Longest Path in this post, let’s stick with it and forget about Max Planar Subgraph for now.
A natural question to ask is: does the quotienting idea I presented above yield something useful for the Longest Path presheaf?
For starters, let’s define this presheaf; I’ll call it \( \mathcal{L}: (\mathbf{Sub} G)^{op} \to \mathbf{Pos}\). It takes each subgraph of \( G\) to the poset of all collections of disjoint paths in that subgraph (see the diagrams earlier in this post if you want something pictorial to stare at). On arrows, \( \mathcal{L}\) takes subgraphs \( G'' \subseteq G' \subseteq G\) of \( G\) to the obvious restriction taking path-forests in \( G'\) to path-forests in \( G''\).
Now, if you work out what the quotienting does, you’ll find that \( \mathcal{L}_{quot}(dx_1 +_{de_{1,2}} dx_2)\) sees the black+green and black+violet solutions below as equivalent.
Figure 1.
The only issue is that the chosen representative for the equivalence class containing black+green and black+violet need not be the longest one.
This seems like a relatively easy issue to fix, so I’ll leave it as future work for myself. What is more pressing is determining whether the resulting quotiented solution space is small enough for fast dynamic programming.
To that end, I’ll cheat a little and use combinatorics. How big can \( \mathcal{L}_{quot}(dx_1 +_{1,2} dx_2 +_{2,3} \dots +_{k-1,k} dx_k)\) be? Well, let’s ask how many non-isomorphic solutions there can be in the coequalizer \( P_{1,2,\dots,k}\). After staring at it a bit, you’ll find that \( |P_{1,2,\dots,k-1}| \leq 2^{\binom{|d_{k-1,k}|}{2}} \) (where \( d_{k-1,k}\) is the base of the span from the \((k-1)\)-th bag to the \(k\)-th bag). To see this, notice that two solutions of \( \mathcal{L}(dx_1 +_{1,2} dx_2 +_{2,3} \dots +_{k-2,k-1} dx_{k-1})\) are identified in \( P_{1,2,\dots,k-1}\) when they contract to the same thing in \( d_{k-1,k}\). The number of these contractions is at most \( 2^{\binom{|d_{k-1,k}|}{2}}\) (think of it as choosing to add an edge between a pair of vertices in \( d_{k-1,k}\) if contracting some path to the left — i.e. a path in \( P_{1,2,\dots,k-1}\) — identifies these vertices). This is great news since it means that our quotienting once again yields an FPT-time algorithm 🙂
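To get a feel for the numbers: if the adhesion \( d_{k-1,k}\) has 4 vertices then, no matter how many bags precede it, $$|P_{1,2,\dots,k-1}| \leq 2^{\binom{4}{2}} = 2^{6} = 64,$$ a bound depending only on the adhesion size and not on the size of the graph built so far, which is exactly the shape of bound that FPT-time dynamic programming needs.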
Until Next time
A few notes on future work (there are undoubtedly lots of directions I’ll forget to mention).
In the next post I’ll try to generalize the quotienting approach I’ve sketched here and lift it from graphs to other categories.
There is also some work to be done on how to ensure that we choose a “best” representative of the quotiented solution space.
The efficiency of the algorithm for sheaves was easy to show; however, for the Longest Path example, I resorted to some simple combinatorial estimation. Although this does the job in this instance, the approach does not generalize, so more thought is needed on which properties can be imposed on the presheaves that define our problems so that we can deduce upper-bound estimates (in terms of monomorphisms of solution spaces) on the size of the quotiented solution space.
There are a lot of \((-)^{op}\)-s floating around. This is entirely because \( \mathrm{Ext}\) is contravariant. It would be nice to avoid this somehow…
If you are perplexed or have any ideas/thoughts/reflections, please let me know in the comments (or otherwise).
Many real-world problems can be formulated and modeled in the language of graph theory. However, real-world networks are often not static: they change over time. These structures are called dynamic or temporal graphs. Here’s an example of a temporal star with four leaves; each one of its edges is active twice except for one, which is active three times.
Temporal graphs have recently become a very active area of research. However, despite having generated many interesting graph- and complexity-theoretic results, nobody seems to have contemplated the question of what a morphism of temporal graphs should be! Let’s think about it together.
Getting to know the characters
I encountered temporal graphs for the first time when Kitty Meeks (she was my PhD supervisor back in Glasgow) showed me an open problem in a paper by Akrida, Mertzios and Spirakis.
The problem had to do with the following algorithmic task called StarExp(k).
We’re given a star graph with \(n\) leaves and an integer \(k\). The edges of the star appear and disappear over time, and each edge is labeled with a list of length at most \(k\) indicating all the times at which it is active. The algorithmic task is to determine whether there is a temporal walk which starts and ends at the center of the star and which visits every leaf.
It sounds easy, right? After all, in the static case it’s a non-question: the answer is always yes since stars are connected.
So how hard is it?
Akrida, Mertzios and Spirakis showed that you can solve it in polynomial time for \(k\) at most three. They also showed that the problem is NP-hard for \(k\) at least six. The cases \(k = 4\) and \(k = 5\) were left open, but Kitty and I filled in the gap and showed that they too are NP-hard (by the way, you can read about how we obtained this complexity dichotomy here).
When Kitty first showed me this paper, I was surprised by just how much harder problems get when passing from the static setting to the temporal one. I didn’t realize this at the time, but this is a well-known phenomenon.
The challenge
Coming from parameterized complexity theory, the first question I ask when confronted with an NP-hard problem is: “are there any classes of inputs for which the problem becomes tractable? And if so, what do they look like structurally?”
Since the input graphs to the StarExp(k) problem are stars, restricting the graph-theoretic structure of the problem isn’t going to help; stars are pretty much as simple as it gets. So the hardness is coming entirely from the scheduling problem encoded by the temporal data.
In general this seems to be part of a theme for temporal graphs: restricting only the temporal structure or only the graph-theoretic structure is often not enough to achieve tractability.
This observation got me interested in ways of measuring how “structurally complex” a temporal graph is. The problem is that, to be able to speak about structure in some context, you need to know what isomorphism and homomorphism mean in that context; unfortunately this seems to be a question that people haven’t studied yet… so let’s do it!
Morphisms of temporal graphs
Let’s stick to a simple model and assume that our temporal graphs are temporally discrete in the sense that vertices and edges may appear or disappear only in discrete time-steps.
The naïve model:
A first thought would be to model a temporal graph as a functor \( (\mathbb{N}, \leq) \to \mathsf{Grph}\) from the poset of natural numbers to the category of graphs and their homomorphisms. Let’s call this the naïve model of temporal graphs.
In the naïve model, the natural notion of morphism is a natural transformation as in the drawing below.
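In code, a sketch complementing the drawing (the encoding is mine): a temporal graph is a list of snapshots together with the connecting homomorphisms, and a morphism is a family of components making every naturality square commute.

```python
def is_natural_transformation(eta, G, H):
    """G, H: temporal graphs in the naive model, each given as a pair
    (snapshots, steps) where snapshots is a list of graphs (vertices,
    edges) and steps[i] is a dict encoding the homomorphism from
    snapshot i to snapshot i+1.
    eta: list of components, eta[i]: V(G_i) -> V(H_i), one per time-step.
    Checks naturality: eta_{i+1} after step^G_i == step^H_i after eta_i.
    (Each component must additionally be a graph homomorphism; that
    check is omitted here for brevity.)"""
    (g_snaps, g_steps), (h_snaps, h_steps) = G, H
    for i, step in enumerate(g_steps):
        vertices_i = g_snaps[i][0]
        if any(eta[i + 1][step[v]] != h_steps[i][eta[i][v]] for v in vertices_i):
            return False
    return True
```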
Notice that vertices and edges can’t really disappear in this model. Since any temporal graph is a functor with the poset of naturals as its domain, the \(n\)-th snapshot must have a homomorphism to every snapshot that follows it. This means that, although vertices might merge (because of some non-injective homomorphism at some point in the sequence), you can’t really model a “flashing” vertex which, for example, appears at every even time and disappears at every odd time.
These observations mean that the objects of the naïve model don’t match up with the temporal graphs I’m trying to study, so I’m ready to move on. However, just to hammer home what’s weird about this model, let’s ask: “how would I write down a temporal path in the naïve model?”
Well, we’d like a temporal path to be a temporal graph whose underlying static graph is a path. In the naïve setting this could be modeled as follows.
But then the above fails to be isomorphic to the following temporal graph, which should also be a perfectly good candidate for a temporal path (albeit one where nothing changes between times 1 and 2).
Modifications of the naïve model
The sequence model
A trivial model which I’m mentioning only for completeness is the sequence model, which views temporal graphs as discrete diagrams, i.e. simply as sequences of graphs. To me this feels more related to graph limits than to temporal graphs. So let’s not dwell on it.
The growth and decay models
Two other obvious modifications to the naïve model are the subcategories whose objects consist of those functors whose morphism maps are either only injective (growth) or only surjective (decay). They’re interesting, so I thought they deserved a mention, but they’re not what we’re after, so let’s move on.
The filtered model
What if, rather than defining temporal graphs as functors out of \(\mathbb{N}\), we defined them to be functors of the form \(P \to \mathsf{Grph}\) for some subcategory \(P\) of \(\mathbb{N}\)?
This perspective would allow vertices and edges to actually appear and disappear. But the filtered model is too broad since this definition allows \(P\) to be pretty much any countable poset.
The persistence model
Another possible perspective on temporal graphs – one that Zoltan Kocsis and I chatted about a long while ago — is to see them as \(\mathbb{N}\)-shaped diagrams in the category \(\mathsf{Span}(\mathsf{Grph})\) having graphs as objects and (isomorphism classes of) spans as morphisms (which are composed by pullback).
Here’s what a temporal graph would look like in this perspective.
If we’re reading a span as telling us what happens to the parts of each snapshot which persist from one snapshot to the next, then what does composition in \(\mathsf{Span}(\mathsf{Grph})\) encode?
Well, suppose we are told that a vertex appears both at some time \(t_1\) and also at some later time \(t_2\). This means that there is some span which witnesses this fact. But then, since composition is done by pullback and \(\mathbb{N}\) is thin, we find that the vertex must be contained in all snapshots at times \(t\) with \(t_1 \leq t \leq t_2\).
This observation means that this model – which I’ll call the persistence model – suffers from the same “no flashing” issue as the naïve model: births and deaths of vertices or edges can each occur at most once.
However, before we move on, let’s consider a few notions of morphism.
Morphisms of the persistence model
If we view \(\mathsf{Span}(\mathsf{Grph})\) as a category (rather than a double category), then it’s “natural” to think of morphisms of temporal graphs as natural transformations (no pun intended). But, if we did this, we’d encounter the same problem as before (in the naïve model) of two temporal paths not being isomorphic even when we’d like them to be.
One idea to overcome this would be to view a morphism of temporal graphs as a pair \((F, \eta)\) consisting of a functor \(F\) and a natural transformation \(\eta\) as in the following diagram.
This is slightly better; for example, the following diagram would be a valid morphism.
Above we chose:
\(F\) to be the mapping which collapses the two time-steps between which nothing changes (as in the drawing) and
\(\eta\) to have all identity components.
Unfortunately, even though we would like the morphism of temporal graphs that I drew above to be an isomorphism, it is not (since \(F\) isn’t).
I hear the distant calls of double categories… but that will have to wait until next time.
I’m fascinated by compositionality, the study of how the whole can be determined by its parts.
Category theorists will tell you that this topic sits entirely within their domain… and they’re not wrong, but fundamentally I think that most of mathematics has to do with some sort of compositionality.
One particular flavor of compositionality that interests me is that found in discrete mathematics, particularly in graph theory and algorithms. This post will be the first in a series on compositional structure in graph theory and complexity theory.
Algorithms on recursive structure
Compositionality in complexity theory usually consists of exploiting some compositional structure of the input data. Trees are the prototypical example of recursive structure and, unsurprisingly, algorithms that compute on trees are usually compositional.
The rest of this post will use the pretext of solving the Weighted Independent Set problem on trees to set the scene for what is to come in the future.
The task is to find a vertex-subset \(S\) of some input vertex-weighted tree \(T\) which has maximum weight and such that no two vertices of \(S\) are adjacent in \(T\).
The compositional structure of trees comes in handy: trees can be built from smaller trees by glueing along shared copies of \(K_1\) (the 1-vertex complete graph). In category theory this is called a pushout; I’ll adopt this notation here: pick one vertex in each of two trees \(T_1\) and \(T_2\) — by choosing homomorphisms \(K_1 \to T_1\) and \(K_1 \to T_2\) — then the pushout \(T_1 +_{K_1} T_2\) of this diagram is the tree obtained by “gluing” the image of \(K_1\) in \(T_1\) to its image in \(T_2\).
Anyway, back to Weighted Independent Set. How does the compositional structure of trees help us solve our problem?
Well, suppose the input is given as the gluing \(T_1 +_{K_1} T_2\) along some “root” vertex \(r\) and suppose further that we have already computed all of the independent sets in \(T_1\) and \(T_2\) — let’s call these collections \(\mathcal{I}_1\) and \(\mathcal{I}_2\). Then we observe that we can compute all of the independent sets in \(T_1 +_{K_1} T_2\) by matching up the independent sets of \(\mathcal{I}_1\) with those of \(\mathcal{I}_2\). Explicitly, given any two independent sets \(\iota_1\) and \(\iota_2\) in \(T_1\) and \(T_2\) respectively, they can be combined into an independent set for \(T_1 +_{K_1} T_2\) iff \(\iota_1 \cup \iota_2\) is an independent set in the closed neighborhood of \(r\) (the vertex we glued the two trees along). A code sketch of this combination test follows below.
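Here is a minimal Python sketch of that combination test; the encoding (solutions as vertex sets, the closed neighborhood of \(r\) as an edge list) is mine.

```python
def combinable(iota1, iota2, neighborhood_edges):
    """iota1, iota2: independent sets of T1 and T2, given as vertex sets.
    neighborhood_edges: the edges inside the closed neighborhood N[r].
    The union is an independent set of the glued tree iff it is
    independent locally at the gluing vertex r."""
    union = iota1 | iota2
    return not any(u in union and v in union for (u, v) in neighborhood_edges)
```

The reason the local check suffices: every edge of the glued tree lies entirely in \(T_1\) or in \(T_2\), so independence of the union can only fail on edges touching the shared vertex \(r\).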
Now, even if you perhaps find this neat, you’re probably asking yourself two questions: (1) “why are we enumerating all independent sets? doesn’t this make the algorithm very slow?” and (2) “why can’t we just keep track of the biggest independent set on each side?”… well, I’m glad you asked!
The second question has an easy answer which you can figure out for yourself by staring at this picture:
If we only kept track of the best solution on the RHS and LHS, then we might never find the global optimum (1002 in the example above). This also provides a partial answer to the first question: we need to keep track of all solutions in the LHS and RHS simply because we prefer correct algorithms over incorrect ones.
Now the question about speed is a good one. The algorithm above is very slow. Let’s improve it!
Notice that, when we composed the partial solutions \(\iota_1\) and \(\iota_2\), we only needed to check that their union gave us an independent set locally (i.e. on the closed neighborhood of \(r\)) in order to deduce that we could compose them. This suggests that we’re carrying around way too much information: we don’t need to compute all of \(\mathcal{I}_1\) and \(\mathcal{I}_2\); we simply need to know what their members look like locally (i.e. on the closed neighborhood of \(r\)).
We could go ahead and turn this into a fast algorithm now, but let’s wait and instead take some time to set up some better notation.
Interlude on Structured decompositions
The tool we’ll need is called a structured decomposition, a general way of decomposing stuff into smaller constituent parts that Zoltan A. Kocsis, Jade Edenstar Master and I came up with. Roughly you should think of them as generalized graphs: rather than consisting of a set of vertices and an edge relation on this set, a structured decomposition is a collection of bags and a generalized relation on this collection. Here I’ll keep it simple, so I’ll avoid the general setting and I’ll only give the definition for graphs.
Fix \(\mathsf{Grph}\), the category of graphs and graph homomorphisms. A structured decomposition of graphs (which I’ll simply refer to as a “decomposition” in this post) is any diagram in \(\mathsf{Grph}\) that looks like this:
The idea is that you have some shape (a graph; in the example above it’s the tree with vertices \(w, x, y, z\)) to which you associate objects of the category \(\mathsf{Grph}\); i.e. graphs. Each vertex of the shape is labeled with a graph: for example, think of the graph \(d(x)\) as the label of the vertex \(x\). Each edge of the shape is labeled by a span (a generalized relation) of graphs: for example, the edge \(xy\) is labeled by the span consisting of two graph homomorphisms \(d(e_{xy}) \to d(x)\) and \(d(e_{xy}) \to d(y)\).
So where are we heading with these decompositions?
The point of the very informal definition I just gave, which was achieved by sweeping lots of details under the rug, is that you should think of a structured decomposition as a “recipe” for building graphs. If you come along and give me a structured decomposition, then I can build a graph out of this recipe: for each edge \(xy\) of the shape, I glue the bag \(d(x)\) to the bag \(d(y)\) along the image of \(d(e_{xy})\). In other words, I take a bunch of pushouts. Here’s an example of a graph (left hand side) built as a gluing (colimit) of a tree-shaped structured decomposition (right hand side):
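Here is a minimal Python sketch of the gluing step, under my own encoding: bag vertices are tagged with their bag name so that distinct bags never clash, and each span is flattened to a list of vertex identifications (real structured decompositions allow arbitrary homomorphisms, but the union-find idea is the same).

```python
def glue(bags, spans):
    """bags: dict name -> (vertices, edges). spans: list of (x, y, pairs)
    saying that vertex u of bag x is identified with vertex v of bag y
    for each (u, v) in pairs. Returns the glued (pushout) graph."""
    parent = {}
    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a
    for x, y, pairs in spans:
        for u, v in pairs:
            parent[find((x, u))] = find((y, v))  # identify the two copies
    vertices = {find((b, v)) for b, (vs, _) in bags.items() for v in vs}
    edges = {frozenset({find((b, u)), find((b, v))})
             for b, (_, es) in bags.items() for (u, v) in es}
    return vertices, edges

# Two copies of K_2 glued along one shared vertex yield a 3-vertex path:
bags = {"x": ({1, 2}, {(1, 2)}), "y": ({1, 2}, {(1, 2)})}
print(glue(bags, [("x", "y", [(2, 1)])]))
```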
A faster algorithm for Weighted Independent Set
Let’s use the syntax of structured decompositions to write down a faster algorithm for Weighted Independent Set.
First of all, note that any tree can be obtained as the gluing of a tree-shaped structured decomposition whose bags are two-vertex complete graphs (copies of \(K_2\)); here’s an example.
So here’s the algorithm (sketch).
Suppose we are given a tree-shaped decomposition \(d\) of the input tree \(T\). Pick any edge \(xy\) of the shape of the decomposition. This edge partitions the shape into two trees \(A_x\) and \(A_y\) (containing \(x\) and \(y\) respectively) and in so doing it also partitions \(T\) into two trees: \(T_x\) and \(T_y\).
Let \(W_x\) be the set of all pairs \((\iota, w)\) where \(\iota\) is an independent set in the bag \(d(x)\) and \(w\) is the maximum weight of an independent set in \(T_x\) whose intersection with \(d(x)\) is \(\iota\). Let \(W_y\) be defined similarly, but for \(T_y\) and \(d(y)\).
Now we can compose \(W_x\) and \(W_y\) similarly to how we did before:
for each \((\iota_1, w_1) \in W_x\) and \((\iota_2, w_2) \in W_y\), conclude that there is an independent set of weight \(w_1 + w_2\) (minus the weight of any vertices counted on both sides) in \(T\) whenever \(\iota_1 \cup \iota_2\) is an independent set in the graph \(d(x) +_{d(e_{xy})} d(y)\).
It’s not too hard to prove correctness, so I won’t do it here, but if you get stuck, have a look at any parameterized complexity book (for example Downey and Fellows, page 191).
The cool part is that this algorithm is much faster than the one I mentioned before. To see this, note that, since each bag of the decomposition always has at most two vertices, the sets \(W_x\) and \(W_y\) have constantly many elements (there are at most three distinct independent sets on a connected graph with at most two vertices). Thus the entire algorithm runs in time linear in the size of the decomposition.
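For comparison, here is the classical include/exclude formulation of this dynamic program on trees. This is the textbook version rather than the structured-decomposition phrasing, and the encoding is mine.

```python
def max_weight_independent_set(adj, weight, root=0):
    """Classic tree DP. adj: adjacency lists of a tree; weight: per vertex.
    inc[v] / exc[v] = best weight in v's subtree with v chosen / not chosen."""
    n = len(adj)
    parent = [None] * n
    order = [root]
    for v in order:  # builds a top-down (BFS) ordering of the tree
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                order.append(u)
    inc, exc = [0] * n, [0] * n
    for v in reversed(order):  # children are processed before their parents
        children = [u for u in adj[v] if parent[u] == v]
        inc[v] = weight[v] + sum(exc[u] for u in children)
        exc[v] = sum(max(inc[u], exc[u]) for u in children)
    return max(inc[root], exc[root])

# The path 0-1-2 with weights 5, 1, 5: the optimum picks both endpoints.
print(max_weight_independent_set([[1], [0, 2], [1]], [5, 1, 5]))  # 10
```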
That’s cool, but where is it heading?
It might seem a bit baroque of me to whip out structured decompositions to solve such a simple algorithmic problem on trees. But the point is that the algorithm above generalizes to other graph classes as well; in particular it generalizes to any graph class which has bounded tree-width. I’d love to tell you more about this, but it’ll have to wait for another post.
Until next time, a puzzle…
Earlier I noted that all trees can be built out of tree-shaped structured decompositions whose bags are copies of \(K_1\) or \(K_2\). So here’s a quiz:
what graphs can be built out of tree-shaped structured decompositions whose bags are complete graphs?