Question: can "big and complex" fMRI data be helpful?
Brain Networks
Functional/Effective Connectivity
nodes/connections removed to enhance visualization
Network Model with Stimulus
Goal: quantify effects
stimuli →
preSMA → PMC regions Duann, Ide, Luo, Li (2009). J of Neurosci
Model
Mediation Analysis and SEM
$$\begin{align*}M &= Z a + \overbrace{U + \epsilon_1}^{E_1}\\ R &= Z c + M b + \underbrace{U g + \epsilon_2}_{E_2}, \quad \epsilon_1 \bot \epsilon_2\end{align*}$$
Estimate $(a,b,c,\boldsymbol{\Sigma})$ via ML (a lot of handwaving)
We solve an optimization with constraints
Different than running two regressions
Causal Interpretation
Prove causal using potential outcomes Neyman, 23; Rubin, 74
Causal inference intuition $$Z \rightarrow \begin{pmatrix} M \\ R \end{pmatrix}$$
Other approaches assume $$r_{i}\left(z_{i}^{\prime},m_{i}\right)\bot m_{i}\left(z_{i}\right)|Z_{i}=z_{i}$$
We do not need this assumption
Theory
Theorem:
Given $\delta$, unique maximizer of likelihood, expressed in closed form
Theorem:
Given $\delta$, our estimator is root-n consistent and efficient
Bias (and variance) depends on $\delta$
Maximum Likelihood: “Tragedy may lurk around a corner”
[Stigler 2007]
"Tragedy" of ML
Theorem: The maximum likelihood value is the same for every $\delta \in (-1,+1)$.
Likelihood provides
zero info about $\delta$
Cannot simply use prior on $\delta$
Two different models generate same single-trial BOLD activations if only observing $Z$, $M$, and $R$
without measuring $U$
Our Approach: Step 2
Cannot identify $\delta$ from single sub and single sess (see our theorem)
Intuition: leverage complex data structure to infer $\delta$
Step 2: Some Details
Step 1 model for each sub and each sess
$$\begin{pmatrix}{M}_{ik} & {R}_{ik}\end{pmatrix}=\begin{pmatrix}{Z}_{ik} & {M}_{ik}\end{pmatrix}\begin{pmatrix}{a}_{ik} & {b}_{ik}\\ 0 & {c}_{ik} \end{pmatrix}+\begin{pmatrix}{E}_{1_{ik}} & {E}_{2_{ik}}\end{pmatrix}$$
Limited variability in $\delta$ across sub/sess
Random effect model cf AFNI, FSL, SPM, and etc $$\begin{pmatrix}{A}_{ik}\\ {B}_{ik}\\ {C}_{ik} \end{pmatrix}=\begin{pmatrix}{A}\\ {B}\\ {C} \end{pmatrix}+\begin{pmatrix}\alpha_{i}\\ \beta_{i}\\ \gamma_{i} \end{pmatrix}+\begin{pmatrix}\epsilon_{ik}^{{A}}\\ \epsilon_{ik}^{{B}}\\ \epsilon_{ik}^{{C}} \end{pmatrix}=b+u_{i}+\eta_{ik}$$
Option 1: Two-stage Fitting
Stage 1: fit $(\hat{A}_{ik}(\delta), \hat{B}_{ik}(\delta), \hat{C}_{ik}(\delta))$ for each $i$ and $k$ for varying $\delta$ using our step 1 single-level model
Stage 2: Find $\hat{\delta}$ that $(\hat{A}_{ik}(\hat{\delta}), \hat{B}_{ik}(\hat{\delta}), \hat{C}_{ik}(\hat{\delta}))$ yields maximum likelihood for random effects model
Small-scale computing
However, estimation error in stage 1 not accounted
Complicates our goal on determining the variance
Option 2: Integrated Modeling
Optimize all parameters in joint likelihood $$\begin{align*} &\sum_{i=1}^{N}\sum_{k=1}^{K}\log\Pr\left(R_{ik},M_{ik}|Z_{ik},\delta,b_{ik},\sigma_{1_{ik}},\sigma_{2_{ik}}\right)\quad \mbox{Data}\\ & + \sum_{i=1}^{N}\sum_{k=1}^{K}\log\Pr\left(b_{ik}|u_{i},b,\boldsymbol{\Lambda}\right)\quad \mbox{Subject variation}\\ & +\sum_{i=1}^{N}\log\Pr\left(u_{i}|\boldsymbol{{\Psi}}\right) \quad \mbox{Prior}\end{align*}$$
Large computation: $5NK + 3N + 11 > 2000$ paras
Algorithm
Theorem: The joint likelihood is conditional convex for groups of parameters.
Algorithm: block coordinate descent with projections.
Leverage conditional convexity to reduce computation