fMRI data: blood-oxygen-level dependent (BOLD) signals from each
cube/voxel (~millimeters),
$10^5$ ~ $10^6$ voxels in total.
Multilevel fMRI Studies
Sub 1, Sess 1
Time 1
2
…
~200
⋮
Sub i, Sess j
…
⋮
Sub ~100, Sess ~4
…
Large, multilevel (subject, session, voxel) data
e.g. $1000 \times 4 \times 300 \times 10^6 \approx 1 $ trillion data points
Raw Data: Motor Region
Time (seconds)
Black: fMRI BOLD activity
Blue: stop onset times;
Red: go onset times
$Z_t$: Stimulus onsets convoluted with Canonical HRF
$M_t$, $R_t$: fMRI time series from two brain regions
Review: Granger Causality/VAR
Given two (or more) time series $x_t$ and $y_t$
$$\begin{align*} x_t &= \sum_{j=1}^p \psi_{1j} x_{t-j} + \sum_{j=1}^p \phi_{1j} y_{t-j} + \epsilon_{1t} \\
y_t &= \sum_{j=1}^p \psi_{2j} y_{t-j} + \sum_{j=1}^p \phi_{2j} x_{t-j} + \epsilon_{2t} \end{align*}$$
Also called vector autoregressive models
$y$ Granger causes $x$ if $\phi_{1j} \ne 0$ Granger, 69
Models pair-wise connections not pathways
Granger Causality/VAR
Granger Causality (VAR) popular for fMRI
Over 10,000 google scholar results on "granger causality neuroimaging", as of May 29, 2019
Models multiple stationary time series
AR($p$) (small $p$) fits fMRI well Lingdquist, 08
Not for non-stationary/task fMRI
Cannot model stimulus effects
Conceptual Brain Model with Stimulus
Goal: quantify effects
stimuli →
preSMA → PMC regions Duann, Ide, Luo, Li (2009). J of Neurosci
Model: Mediation Analysis and SEM
$$\begin{align*}M &= Z a + \overbrace{U + \epsilon_1}^{E_1}\\ R &= Z c + M b + \underbrace{U g + \epsilon_2}_{E_2}, \quad \epsilon_1 \bot \epsilon_2\end{align*}$$
We established causality using potential outcomes.
Ours also model time series and multilevel dependence
Causal Conditions
The treatment randomization regime is the same across time and participants.
Models are correctly specified, and no treatment-mediator interaction.
At each time point $t$, the observed outcome is one realization of the potential outcome with observed treatment assignment $\mathbf{\bar S}_{t}$, where $\mathbf{ \bar S}_{t}=( \mathbf{S}_{1},\dots,\mathbf{S}_{t})$.
The treatment assignment is random across time.
The causal effects are time-invariant.
The time-invariant covariance matrix is not affected by the treatment assignments.
Estimation: Conditional Likelihood
The full likelihood for our model is too complex
Given the initial $p$ time points, the conditional likelihood is
$$ \begin{align*} & \tiny \ell\left(\boldsymbol{\Theta},\delta~|~\mathbf{Z},\mathcal{I}_{p}\right) = \sum_{t=p+1}^{T}\log f\left((M_{t},R_{t})~|~\mathbf{X}_{t}\right) \\
& \tiny = -\frac{T-p}{2}\log\sigma_{1}^{2}\sigma_{2}^{2}(1-\delta^{2})-\frac{1}{2\sigma_{1}^{2}}\|\mathbf{M}-\mathbf{X}\boldsymbol{\theta}_{1}\|_{2}^{2} \\
& \tiny -\frac{1}{2\sigma_{2}^{2}(1-\delta^{2})}\|(\mathbf{R}-\mathbf{M}B-\mathbf{X}\boldsymbol{\theta}_{2})-\kappa(\mathbf{M}-\mathbf{X}\boldsymbol{\theta}_{1})\|_{2}^{2} \end{align*} $$
Full Model for Multilevel Data
Multilevel Data: Two-level Likelihood
Second level model, for each subject $i$
$$(A_i,B_i,C_i) = (A,B,C) + (\eta^A_i, \eta^B_i, \eta^C_i)$$
where errors $\eta$ are normally distributed
The two level likelihood is conditional convex
Two-stage fitting: plug-in estimates from the first level
Block coordinate fitting: jointly optimize first level likelihood + second level likelihood
Theorem: Assume assumptions (A1)-(A6) are satisfied.
Assume $\mathbb{E}(Z_{i_{t}}^{2})=q\lt \infty$, for $i=1,\dots,N$. Let $T=\min_{i}T_{i}$. 1. If $\boldsymbol{\Lambda}$ is known, then the two-stage estimator $\hat{\delta}$ maximizes the profile likelihood of model asymptotically, and $\hat{\delta}$ is $\sqrt{NT}$-consistent. 2. If $\boldsymbol{\Lambda}$ is unknown, then the profile likelihood of model has a unique maximizer $\hat{\delta}$ asymptotically, and $\hat{\delta}$ is $\sqrt{NT}$-consistent, provided that $1/\varpi=\bar{\kappa}^{2}/\varrho^{2}=\mathcal{O}_{p}(1/\sqrt{NT})$, $\kappa_{i}=\sigma_{i_{2}}/\sigma_{i_{1}}$, $\bar{\kappa}=(1/N)\sum\kappa_{i}$, and $\varrho^{2}=(1/N)\sum(\kappa_{i}-\bar{\kappa})^{2}$.
Using the two-stage estimator $\hat{\delta}$, the CMLE of our model is consistent, as well as the estimator for $\mathbf{b}=(A,B,C)$.
Theory: Summary
Under regularity conditions, $N$ subs, $T$ time points
Our $\hat \delta$ is $\sqrt{NT}$-consistent
This relaxes the unmeasured confounding assumption in mediation analysis
Our $(\hat{A},\hat{B}, \hat{C})$ is also consistent
Simulations & Real Data
Comparison
Our methods: GMA-h and GMA-ts
Previous methods: BK Baron & Kenny, MACC Zhao and Luo, KKB Kenny et al
Other methods do not model the temporal correlations or time series like ours
Simulations
Low bias for $AB$
Low bias for temporal cor
Gray dash lines are the truth
GMA performs the best, and recovers the temporal correlations
Real Data Experiment
Public data: OpenFMRI ds30
Stop-go experiment: withhold (STOP) from pressing buttons
Expect "STOP" stimuli to deactivate brain region M1
Goal: quantify the role of region preSMA
Result
Result
STOP deactivates M1 directly ($C$) and indirectly ($AB$)
preSMA mediates a good portion of the total effect
Help resolve the debates among neuroscientists
Other methods under-estiamte the effects
Novel feedback findings: M1 → preSMA after lag 1 and 2 (not shown)