Graphical Models continued

In my previous post about graphical models, the basic idea behind graphical models were discussed. The question now,  how does probability theory and graph theory interact in graphical models?

In Graph 1, above,  we have a very simple graphical model.  X1, X2 and X3 are random variables, representing events. Each node in the graph will be associated with such a random variable, and nodes are connected with a directed or undirected edge. The edges between the nodes indicate that there is a connection between the events. From the definition of a random variable, each Xi therefore has a number of realisations (observations of a random variable), each with a different probability. The joint distribution of the 3 variables is given, in this example, by

The probability distribution of X1 may be given by

Using this probability distribution, we find that

Note that there are some properties and rules of probability that all probability distributions have to obey by, and thus also the functions, which may otherwise be arbitrarily chosen. In statistics there are two types of random variables: discrete (set of realisations is countable) and continuous (set of realisations is uncountable), but the theory that will be covered will remain true for both types, and thus we will only deal with the discrete case. 

For graphical models to be useful, there must be some independencies between the variables. For instance, imagine that X1 is my grandfather’s genes, X2 my father’s genes and X3 my genes. Considering these, we know that my genes will depend on my father’s genes, which in turn is dependent on my grandfather’s genes. If we have no certainty as to my father’s genes, then to have any indication of my genes, we’ll need to know what my grandfather’s genes are. However, if we know my father’s genes, knowing my grandfather’s genes will be of no value, simplistically. So, my genes are conditionally independent of my grandfather’s genes, with that condition being that we know my father’s genes. When looking at the graphical model for this scenario, it is as in Graph 1. The edges between the nodes indicate the dependencies, and the arrow-heads indicate the direction of the dependence. This type of graphical model is known as a Bayesian Network (BN), since the edges are directed. BNs must be acyclic and are also known as directed acyclic graphs. There is another type of graphical model known as Markov Random Fields (MRF), in which the edges are not directed [the arrowheads fall away], and although the dependence is still indicated, there is not as much information as in a BN. However, MRFs have their own advantages, and there are probability distributions that can only be represented by MRFs and some that can only be represented by BN.  The two types can be converted to each other, with BN to MRF being more popular because it is easier.  The moral graph of a BN is the MRF that it will convert to, where after the probability distributions still need to be allocated.

The detail surrounding the interaction between probability theory and graph theory  are  lengthy discussions, if all factors are taken into consideration.  Thus, a quick summary is given.

For BN, the factor

will result in a graph having the connection Thus, in BN, the heads of the arrows point to the variable that is conditioned on (the variable who’s probability we want to find, given the other variables).

For MRFs, the factor will result in the resultant graph having the connection

Thus, in MRF, edges are drawn between variables that are found together in a factor.

There are three basic canonical graphs which graphical models are built up of. The associated joint probability distributions are

These distributions illustrate the three scenarios of conditional independencies – the one thing that is necessary in graphical models, seeing as if there are no independencies / conditional independencies, then using graphical models has no advantage over any other modelling technique. To see exactly why the variables are conditionally independent, one would have to go into detail discussing d-separation for BNs and graph separation for MRFs. The definitions of these techniques are:

d-separation:
If A, B and C represent groups  of nodes, then the nodes in group  A and group B are said to be conditionally independent of each other if, for every path between A and B,

  • there is a node in C that is on the path between A and B where the arrows meet either head-to-tail or tail-to-tail, or
  • there is a node that is not in C and none of this node’s descendants are in C that is on the path between A and B where the arrows meet head-to-head.

Note that the definitions of the different connections will be given in the next section.

graph separation:

If a group of nodes C separates the group of nodes in A and group in B, then the nodes in A and B are conditionally independent, given C.

The graph for the BN associated with

is

with X2 having a head-to-tail connection, and for the MRF

Thus, from the definitions, X1 and X3 are conditionally independent, given X2. This can be seen from the joint distribution as well.

The graph for the BN associated with

is

with X2 having a tail-to-tail connection, and for the MRF

Therefore, X1 and X3 are conditionally independent, given X2. Again, this can be seen from the associated joint distribution.

The last case to consider is when X2 has a head-to-head connection, in a BN.
The graph of the BN associated with

is

and for the MRF is


From these graphs, it is clear that in the BN, X1 and X3 are conditionally independent, given X2. However, this is not the case in the MRF, seeing as all the nodes are connected. It is this difference that causes information to be lost when converting from a BN to a MRF.

Although the probability theory was not covered in depth, a feel of how the probability theory and graph theory works together in a graphical model was introduced.

Related Posts

  1. An Introduction to Graphical Models

Categories: Development

blog comments powered by Disqus