Create a Module Network: Identify Regulatory Networks from Expression Data
A module network is a probabilistic model, based on probabilistic graphical models and Bayesian networks, for identifying regulatory modules from gene expression data. The procedure identifies modules of co-regulated genes, their regulators and the conditions under which regulation occurs, generating testable hypotheses in the form 'regulator X regulates module Y under conditions W'. We applied this method to construct a regulatory network underlying the response of yeast to stress.
Step 1: Load expression data
The first step is to load the expression data from which you want to construct a module network. Details on how to load expression data, and on the Genomica file formats for expression data, are given here. This tutorial assumes that you load the module network sample expression data. Other expression data are available here.
Step 2: Create the module network
You are now ready to create the module network. Choose Algorithms → Learn module network. The dialog box should look similar to the following:
Move to the Regulation panel, where you can control various parameters of the learned modules' regulation programs. The default parameters are suitable for many applications. For now, however, since the sample file we use is small, we will use greater lookahead and allow smaller experiment partitions. Change Lookahead depth to 1 and Min experiments per context to 3. The dialog box should look similar to the following:
To create the module network, press the Run button. The module network is now ready.
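As a rough illustration of what a setting like Min experiments per context constrains, the sketch below enumerates the thresholds on a single regulator's expression values that would split the experiments into two contexts of at least the minimum size. This is only an assumed reading of the parameter, not Genomica's implementation; the function name `valid_splits` and the midpoint thresholds are hypothetical.

```python
def valid_splits(regulator_values, min_per_context=3):
    """List thresholds that split experiments into two large-enough contexts.

    Hypothetical helper: `regulator_values` holds one regulator's expression
    level per experiment. A threshold is kept only if at least
    `min_per_context` experiments fall on each side of it.
    """
    ordered = sorted(regulator_values)
    thresholds = []
    # i is the number of experiments that would land in the lower context.
    for i in range(min_per_context, len(ordered) - min_per_context + 1):
        low, high = ordered[i - 1], ordered[i]
        if low < high:  # tied values cannot be separated by a threshold
            thresholds.append((low + high) / 2)
    return thresholds


# Nine experiments, as in the sample data, with made-up expression levels.
levels = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(valid_splits(levels, min_per_context=3))
print(valid_splits(levels, min_per_context=5))
```

With only nine experiments, a minimum of 5 per context leaves no admissible split at all in this sketch, which is consistent with lowering the minimum for a small sample file.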
Step 3: Displaying the module network results
Once a module network has been learned, there are several ways to view the results. First, you can obtain a global bird's-eye view of the results by selecting the bird's-eye view from the Tab panels. After enlarging the image using the 'Pixel Size' controls in the left control panel, the bird's-eye view should look similar to:
This bird's-eye view shows the modules as horizontal strips (in this sample case, there are two such strips and thus two modules). For each module, the arrays are sorted by the regulation program, and each split in the tree appears as a separate block bounded by yellow lines (the color and thickness of these block boundary lines can be controlled by the 'Border' properties in the left control panel). You can also view the entire structure of the learned module network tree by selecting the 'Tree' tab from the main Tab panel. After expanding all levels of the tree, it should look similar to:
Each node in this tree represents a split. By selecting a node in the tree, you can view a particular module. However, for large files with many modules, this tree may become quite large and difficult to navigate. Thus, the preferred way to examine a specific module along with its regulation program is to go back to the bird's-eye view and select a module that seems interesting by clicking on its expression profile. For example, going back to the bird's-eye view above, you can select the leftmost yellow-bordered block and then go to the Cluster view. The resulting cluster view should look similar to:
Testing Module Network Robustness
After learning a module network, it is natural to wonder whether any other regulators might have fit the data with similar likelihood. To test the robustness of the regulators of any modules in a learned network, choose Algorithms → Test regulation program robustness. The dialog box should look similar to the following:
To run the test, press the Run button. The result is a grid like this:
In this example, the results verify that Genes 1 and 2 are significant regulators for the module represented by Cluster 5. For each module, the robustness test randomly resamples K arrays from the original K arrays. (There are nine arrays, or experiments, in this example.) The resampling is done with replacement, so the resampled data set will duplicate the data for some arrays. Genomica learns a regulation program for this data set and then repeats the sampling and learning. (There are 100 trials in this example.) Tallies are kept of how often each regulator appears in a learned regulation program. The more often a regulator is used, the more essential the regulator can be considered.
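The resampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not Genomica's code: `bootstrap_regulator_tally` and `toy_learner` are hypothetical names, and the toy learner merely stands in for learning a real regulation program on the resampled arrays.

```python
import random
from collections import Counter


def bootstrap_regulator_tally(num_arrays, learn_regulators, trials=100, seed=0):
    """Count how often each regulator is chosen across bootstrap trials.

    `learn_regulators` is a hypothetical callback: given a list of array
    indices (possibly with repeats), it returns the regulators used by the
    regulation program learned from those arrays.
    """
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        # Draw K indices from the K original arrays, with replacement,
        # so some arrays are duplicated and others are left out.
        sample = [rng.randrange(num_arrays) for _ in range(num_arrays)]
        # Count each regulator at most once per trial.
        tally.update(set(learn_regulators(sample)))
    return tally


def toy_learner(sample):
    """Stand-in learner (assumption): not a real regulation-program learner."""
    regulators = {"Gene 1"}       # always selected in this toy setup
    if 0 in sample:               # selected only when array 0 was drawn
        regulators.add("Gene 2")
    return regulators


tally = bootstrap_regulator_tally(num_arrays=9, learn_regulators=toy_learner)
for regulator, count in tally.most_common():
    print(f"{regulator}: used in {count} of 100 trials")
```

A regulator that appears in nearly every trial does not depend on any particular array, whereas one that drops out whenever certain arrays are missing from the resample is less robust.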