Signal flow analysis¶

This brief tutorial will guide you to start utilizing SFA. We wrote this tutorial assuming users already have the overall knowledge about the original journal paper.

Creating algorithm object¶

sfa.AlgorithmSet deals with creating and managing the algorithm objects in SFA. Thus, we first need to create an sfa.AlgorithmSet object.

>>> import sfa
>>> algs = sfa.AlgorithmSet()

Now, we can create algorithm objects with sfa.AlgorithmSet. Create the Signal Propagation (SP) algorithm, which is designated by its abbreviation, SP.

>>> alg = algs.create('SP')
>>> alg.abbr
'SP'
>>> alg
SignalPropagation object

As sfa.AlgorithmSet has the functionality of a dictionary, we can also access the created algorithms using the abbreviations as keys.

>>> algs['SP']
SignalPropagation object

Setting hyperparameter values¶

Algorithms in SFA have hyperparameters that adjust and constrain the behavior of the algorithms. ParameterSet, a nested object defined in sfa.Algorithm, has member variables that contain the information about the various hyperparameters. The below shows examples of the parameters.

>>> alg.params
<sfa.algorithms.sp.SignalPropagation.ParameterSet at 0x25b5a7e5550>
>>> alg.params.alpha
0.5

We can see the default value of alpha is 0.5. alpha is a hyperparameter that controls the proportion of signal flow in determining the next system state, \(x(t+1)\), in the following formula.

\[ x(t+1) = \alpha W x(t) + (1-\alpha) b \]

Thus, 0.5 means the algorithm reflects the effects of signal flow on half of the estimation of \(x(t+1)\).

We can easily change the value of alpha by assigning a real value between 0 and 1.

>>> alg.params.alpha = 0.5
>>> alg.params.alpha = 0.9
>>> alg.params.alpha
0.9

Another hyperparameter is apply_weight_norm, which designates whether to use link weight normalization. The default value is False, but it is recommended to set it to True.

>>> alg.params.apply_weight_norm
False
>>> alg.params.apply_weight_norm = True

Refer to the documentation for more details about the other hyperparameters.

Creating data object¶

Creating and handling data objects in SFA are similar to those of algorithms. A data object is also designated by its abbreviation, as in the algorithm. For example, the datasets for Borisov et al. can be created using BORISOV_2009 as follows.

>>> ds = sfa.DataSet()
>>> mdata = ds.create('BORISOV_2009')
>>> mdata  # Multiple datasets.
{'120m_AUC_EGF=0.001+I=0.1': BorisovData object,
 '120m_AUC_EGF=0.001+I=1': BorisovData object,
 '120m_AUC_EGF=0.001+I=10': BorisovData object,
...

The above mdata or ds['BORISOV_2009'] is a dict that contains multiple dataset objects with different conditions. For example, 120m_AUC_EGF=0.001+I=0.1 denotes the dataset was created by performing a simulation under the stimulation of 0.001 M EGF and 0.1 M insulin using the original ODE model, where the activity of a biomolecule was calculated by estimating the area under the curve (AUC) of the time profile.

We can select a dataset object by using the abbreviation.

>>> data = mdata['120m_AUC_EGF=0.001+I=0.1']
>>> data.abbr
'120m_AUC_EGF=0.001+I=0.1'

We can also consider a utility function in SFA, sfa.get_avalue, which arbitrarily selects a dataset object from the dictionary.

>>> data = sfa.get_avalue(mdata)
>>> data.abbr
'120m_AUC_EGF=0.001+I=0.1'

Actually, sfa.get_avalue returns the first item by applying the next() built-in function to a given dict object.

Accessing the members of a data object¶

The data object (instantiated with a subclass of sfa.Data) has various data structures that are required for using sfa.Algorithm. For example, an sfa.Data object has the information about network topology in A (adjacency matrix in NumPy's ndarray), dg (NetworkX's DiGraph), and n2i (dict for mapping names to the indices of A).

>>> data.n2i  # Name to index mapper.
{'AKT': 0,
 'EGF': 1,
 'EGFR': 2,
 'ERK': 3,
 'GAB1': 4,
 'GAB1_SHP2': 5,
 'GAB1_pSHP2': 6,
 'GS': 7,
 'I': 8,
 'IR': 9,
 'IRS': 10,
 'IRS_SHP2': 11,
 'MEK': 12,
 'PDK1': 13,
 'PI3K': 14,
 'PIP3': 15,
 'RAF': 16,
 'RAS': 17,
 'RasGAP': 18,
 'SFK': 19,
 'SHC': 20,
 'mTOR': 21}
>>> data.A[data.n2i['ERK'], data.n2i['MEK']]  # MEK -> ERK
1
>>> data.A[data.n2i['GAB1'], data.n2i['ERK']]  # ERK -| GAB1
-1
>>> data.A[data.n2i['ERK'], data.n2i['EGFR']]  # No link between EGFR and ERK.
0
>>> for src, trg, attr in data.dg.edges(data=True):
...     if attr['SIGN'] > 0:
...         print('%s -> %s' % (src, trg))
...     elif attr['SIGN'] < 0:
...         print('%s -| %s' % (src, trg))
...
AKT -> mTOR
AKT -| RAF
EGF -> EGFR
...

Analyzing data with an algorithm¶

To make sfa.Algorithm work with sfa.Data, we should first assign the data object to the algorithm object.

>>> alg.params.alpha = 0.5
>>> alg.params.apply_weight_norm = True
>>> alg.data = data  # Assign the data object to the algorithm.
>>> alg.initialize()  # Initialize the algorithm object.

In the initialization of the algorithm (calling sfa.Algorithm.initialize), the algorithm prepares estimating signal flow by performing some necessary tasks such as link weight normalization.

>>> data.A[data.n2i['GAB1'], data.n2i['EGFR']]
1
>>> alg.W[data.n2i['GAB1'], data.n2i['EGFR']]
0.1889822365046136

Note that the element of the weight matrix is different from that of the adjacency matrix.

One of the important tasks is to determine the values of the basal activity before analyzing signal flow. The effects of input stimulation or perturbation are basically reflected on the basal activity vector, \(b\). For example, EGF stimulation can be reflected on \(b\) as follows.

>>> import numpy as np
>>> N = data.dg.number_of_nodes()  # The number of nodes; data.A.shape[0]
>>> b = np.zeros((N,), dtype=np.float64)
>>> b[data.n2i['EGF']] = 1

Now, we can perform the estimation of signal flow, and examine how the two outputs, ERK and AKT, have changed.

>>> xs1 = alg.compute(b)  # xs: x at steady-state
>>> xs1
array([0.00155625, 0.5       , 0.25      , 0.00165546, 0.02951243,
       0.00659918, 0.03226491, 0.0367612 , 0.        , 0.        ,
       0.00608503, 0.0017566 , 0.00331091, 0.00401268, 0.02780067,
       0.01390033, 0.00662182, 0.00733528, 0.01601391, 0.03340766,
       0.04724556, 0.00055022])
>>> xs1[data.n2i['ERK']]
0.0016554557287082902
>>> xs1[data.n2i['AKT']]
0.0015562514037656679

We can see the signs of the two outputs are positive, which means ERK and AKT are upregulated by EGF stimulation.

Next, let's apply an inhibitory perturbation to the network. For example, we can perturb MEK by setting its basal activity as follows.

>>> b[data.n2i['MEK']] = -1
>>> b[data.n2i['EGF']], b[data.n2i['MEK']]
(1.0, -1.0)
>>> b
array([ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

>>> xs2 = alg.compute(b)
>>> xs2[data.n2i['MEK']]
-0.4947084519007513
>>> xs2[data.n2i['ERK']]
-0.24735422595037565
>>> xs2[data.n2i['AKT']]
0.001836795161913794

At this time, the sign of ERK is negative, which means it is downregulated by MEK inhibition. On the other hand, AKT is not downregulated by the inhibition under EGF stimulation.

If we want to examine how the inhibition of MEK affects each node, we take the difference between the vectors of the two results.

>>> dxs = xs2 - xs1  # Difference between the two results.
>>> ind_up = np.where(dxs > 0)[0]  # Indices of upregulated nodes
>>> ind_dn = np.where(dxs < 0)[0]  # Indices of downregulated nodes
>>> for idx in ind_up:
...     print(data.i2n[idx])  # data.i2n: Index to name mapper.
AKT
GAB1
GAB1_SHP2
GAB1_pSHP2
GS
IRS
IRS_SHP2
PDK1
PI3K
PIP3
RAF
RAS
RasGAP
mTOR
>>> for idx in ind_dn:
...     print(data.i2n[idx])
ERK
MEK

This result shows that only MEK and ERK are downregulated by the inhibition of MEK under EGF stimulation.

Applying perturbation to a link¶

In some cases, a perturbation should be reflected on a link weight, not the basal activity. For example, we may want to examine what happens when PI3K cannot send signal to its downstreams (i.e., the out-links of PI3K are removed).

>>> b = np.zeros((N,), dtype=np.float64)
>>> b[data.n2i['EGF']] = 1
>>> alg.W[:, data.n2i['PI3K']] *= 0  # Remove all the out-link weights.
>>> xs3 = alg.compute(b)
>>> xs3[data.n2i['ERK']]
0.00172210494367554
>>> xs3[data.n2i['AKT']]
0.0
>>> dxs = xs3 - xs1  # xs1 is the same as the previously computed one.
>>> dxs[data.n2i['ERK']]
6.664921496724974e-05
>>> dxs[data.n2i['AKT']]
-0.0015562514037656679

We can see that AKT is downregulated if all out-links of PI3K are lost.

Estimating signal flows¶

The estimation of signal flow is defined as the multiplication of link weight and the activity of the source node. The activity is usually the steady-state activity.

\[ F(t)_{ij} = W_{ij} \cdot x(t)_{j} \]

Following the definition, we can compute the signal flow as follows.

>>> alg.initialize()  # Obtain the intact weight matrix.
>>> W1 = alg.W.copy()  # Get a copy of the weight matrix.
>>> F1 = W1 * xs1  # Element-wise multiplication of each row and xs1

Note that the above code snippet is not matrix-vector multiplication; it is element-wise multiplication of vectors (ndarray in NumPy). The following shows some of the estimated signal flows.

>>> F1[data.n2i['PIP3'], data.n2i['PI3K']]
0.02780066830505488
>>> F1[data.n2i['ERK'], data.n2i['MEK']]
0.0033109114574165805
>>> F1[data.n2i['GAB1'], data.n2i['ERK']]
-0.0005852919858618747

If we want to compare the two conditions, we can compute the net signal flow as follows.

\[ F_{net} = F_{c2} - F_{c1} \]

Let's use the PI3K example of Applying perturbation to a link again.

>>> W3 = alg.W.copy()  # alg.W is the intact one.
>>> W3[:, data.n2i['PI3K']] *= 0  # Apply the PI3K perturbation.
>>> F3 = W3 * xs3
>>> Fnet = F3 - F1  # Net signal flow.
>>> ir, ic = data.A.nonzero()
>>> for i in range(ir.size):
...     idx_trg, idx_src = ir[i], ic[i]
...     src = data.i2n[idx_src]
...     trg = data.i2n[idx_trg]
...     sf = Fnet[idx_trg, idx_src]  # Signal flow
...     print("Net signal flow from %s to %s: %f" % (src, trg, sf))
Net signal flow from PDK1 to AKT: -0.002837
Net signal flow from mTOR to AKT: -0.000275
Net signal flow from EGF to EGFR: 0.000000
Net signal flow from MEK to ERK: 0.000133
...
Net signal flow from PI3K to PIP3: -0.027801
...

We can see that some links have no change in their signal flows between the two conditions. Obviously, the signal flow from PI3K to PIP3 has decreased due to the perturbation. However, the depletion of all out-links of PI3K has upregulated the signal flow from MEK to ERK (i.e., positive value).

Creating a dataset with network structure¶

Describe how to define your own datasets only with network topology.
Explanation for the members of the Data class.