Estimating Brain Pathways Using Large-scale Multilevel Models
Xi (Rossi) LUO
Brown University Department of Biostatistics
Center for Statistical Sciences
Computation in Brain and Mind
Brown Institute for Brain Science
Brown Data Science Initiative
ABCD Research Group
Question: can "big and complex" fMRI reveal causality?
Brain Networks
Functional/Effective Connectivity
nodes/connections removed to enhance visualization
Network Model with Stimulus
Goal: quantify effects
stimuli → preSMA → PMC regions (Duann, Ide, Luo, Li, 2009, J of Neurosci)
Model
Mediation Analysis and SEM
$$\begin{align*}M &= Z a + \overbrace{U + \epsilon_1}^{E_1}\\ R &= Z c + M b + \underbrace{U g + \epsilon_2}_{E_2}, \quad \epsilon_1 \bot \epsilon_2\end{align*}$$
This talk: causal estimation under $U\ne 0$ (its effect size $\delta \ne 0$) when modeling two brain regions
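A minimal simulation sketch of why $U \ne 0$ matters, assuming standard normal $U$, $\epsilon_1$, $\epsilon_2$ and hypothetical coefficient values (none of these numbers come from the talk). Regressing $R$ on $(Z, M)$ while ignoring $U$ shifts the estimate of $b$ by roughly $\mathrm{Cov}(E_1, E_2)/\mathrm{Var}(E_1)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
a, b, c, g = 0.5, 0.3, 0.2, 1.0          # hypothetical true coefficients

Z = rng.integers(0, 2, n).astype(float)  # binary stimulus indicator
U = rng.normal(size=n)                   # unmeasured confounder
M = Z * a + U + rng.normal(size=n)       # M = Z a + U + eps_1
R = Z * c + M * b + U * g + rng.normal(size=n)  # R = Z c + M b + U g + eps_2

# Naive least squares of R on (Z, M), ignoring U.
X = np.column_stack([Z, M])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)
c_ols, b_ols = coef

# Large-sample bias of the naive b: Cov(E1, E2) / Var(E1)
# = g Var(U) / (Var(U) + Var(eps_1)) = 0.5 under the settings above.
expected_bias = g * 1.0 / (1.0 + 1.0)
print("true b:", b, "naive b:", b_ols, "b + bias:", b + expected_bias)
```

The naive estimate lands near $b$ plus the bias term rather than near $b$, which is the problem the rest of the talk addresses.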
Pathway=Activation+Connectivity
Activation: stimulus $\rightarrow$ brain region activity
Connectivity: one brain region $\rightarrow$ another region
Whether or not two or more brain regions "correlate"
Pathway: stimulus $\rightarrow$ brain region A $\rightarrow$ region B
Strong path: strong activation and strong conn
Zero path: zero activation or zero conn, including
Zero activation + strong conn = zero
Strong activation + zero conn = zero
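A worked illustration, assuming the usual linear-mediation convention that the pathway effect is the product of the activation coefficient $a$ and the connectivity coefficient $b$ from the model above (the numbers are hypothetical):

$$\text{pathway effect} = a \times b: \qquad (a, b) = (0, 2) \Rightarrow 0, \qquad (a, b) = (2, 0) \Rightarrow 0, \qquad (a, b) = (2, 2) \Rightarrow 4$$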
Existing Approaches for $\delta \ne 0$
Assuming $\delta=0$
Assumption "too strong" for most cases (Imai et al., 10)
Sensitivity plot: "guessing" $\delta$
Simplify models: e.g. $c=0$ via instrumental variable
Adjust (e.g. motion) if possible (Sobel, Lindquist, 14)
Use Bayesian prior or regularization
Positive or negative effects depending on subjective choice of $\delta$
Method
Recall 3-level data: subjects, sessions, trials
Correlated mediation model for trials, mixed model for mediation effects among higher levels
Special case: 2-level data, mixed model becomes ANOVA
We will optimize the multilevel likelihood $$\underbrace{\sum_{\mbox{Sub }i} \sum_{\mbox{Sess }k}\ell(\mbox{brain activities in trials} | A_{ik}, B_{ik}, C_{ik}, \Theta_{ik})}_{\mbox{first level likelihood}} \\ + \underbrace{\sum_i \ell(A_{i1},\dotsc,A_{iK}, B_{i1},\dotsc | A, B, C, \Theta)}_{\mbox{second and third level likelihood}}$$
Challenges
Unmeasured confounding and causal inference
Usually impossible in many other statistical models
Prove our model is identifiable or our multilevel likelihood has a unique maximum
Usually the likelihood is multimodal in many other cases
Prove our MLE is unbiased and consistent under minimal assumptions
We do not need this assumption when we can estimate $\delta$
Theory
Theorem:
Given $\delta$, unique maximizer of likelihood, expressed in closed form
Theorem:
Given $\delta$, our estimator is root-n consistent and efficient
Bias (and variance) depends on $\delta$
"Tragedy" of ML
Theorem: The maximum profile likelihood value is the same for every $\delta \in (-1,+1)$.
Likelihood provides zero info about $\delta$
Cannot simply apply priors on $\delta$
Two different models generate the same single-trial BOLD activations if only observing $Z$, $M$, and $R$ without measuring $U$
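The flat profile likelihood can be checked numerically. The sketch below assumes bivariate normal errors $(E_1, E_2)$ with correlation $\delta$ and no intercepts. Its closed-form fit is derived here from that assumption by conditioning $E_2$ on $E_1$; it is not copied from the talk, and the function names `fit_given_delta` and `loglik` are hypothetical.

```python
import numpy as np

def fit_given_delta(Z, M, R, delta):
    """Closed-form fit of (a, b, c, sigma1, sigma2) for a fixed delta."""
    a = (Z @ M) / (Z @ Z)                        # regress M on Z
    s1 = np.sqrt(np.mean((M - Z * a) ** 2))      # ML estimate of sd(E1)
    X = np.column_stack([Z, M])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)  # regress R on (Z, M)
    tau = np.sqrt(np.mean((R - X @ coef) ** 2))  # sd of E2 given E1
    s2 = tau / np.sqrt(1 - delta ** 2)           # implied sd(E2)
    kappa = delta * s2 / s1                      # slope of E2 on E1
    return a, coef[1] - kappa, coef[0] + kappa * a, s1, s2

def loglik(Z, M, R, a, b, c, s1, s2, delta):
    """Bivariate normal log-likelihood of the residual pairs (E1, E2)."""
    e1 = M - Z * a
    e2 = R - Z * c - M * b
    q = (e1**2 / s1**2 - 2 * delta * e1 * e2 / (s1 * s2)
         + e2**2 / s2**2) / (1 - delta**2)
    n = len(Z)
    return (-n * np.log(2 * np.pi * s1 * s2)
            - 0.5 * n * np.log(1 - delta**2) - 0.5 * q.sum())

# Hypothetical single-subject, single-session data.
rng = np.random.default_rng(0)
n = 500
Z = rng.integers(0, 2, n).astype(float)
E = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n)
M = 0.5 * Z + E[:, 0]
R = 0.2 * Z + 0.3 * M + E[:, 1]

deltas = [-0.8, -0.4, 0.0, 0.4, 0.8]
fits = [fit_given_delta(Z, M, R, d) for d in deltas]
logliks = [loglik(Z, M, R, *f, d) for f, d in zip(fits, deltas)]
b_hats = [f[1] for f in fits]
print(logliks)  # the same value for every delta, up to rounding
print(b_hats)   # the fitted b changes with delta
```

The fitted $b$ moves with $\delta$ while the maximized likelihood does not, so a single subject and session cannot distinguish between values of $\delta$.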
Our Higher Level Models
Cannot identify $\delta$ from single sub and single sess (see our theorem)
Intuition: leverage complex data structure to infer $\delta$
Some Details
Level-1 model for each sub and each sess
$$\begin{pmatrix}{M}_{ik} & {R}_{ik}\end{pmatrix}=\begin{pmatrix}{Z}_{ik} & {M}_{ik}\end{pmatrix}\begin{pmatrix}{a}_{ik} & {c}_{ik}\\ 0 & {b}_{ik} \end{pmatrix}+\begin{pmatrix}{E}_{1_{ik}} & {E}_{2_{ik}}\end{pmatrix}$$
Limited variability in $\delta$ across sub/sess
Random effects model (cf. AFNI, FSL, SPM, etc.) $$\begin{pmatrix}{A}_{ik}\\ {B}_{ik}\\ {C}_{ik} \end{pmatrix}=\begin{pmatrix}{A}\\ {B}\\ {C} \end{pmatrix}+\begin{pmatrix}\alpha_{i}\\ \beta_{i}\\ \gamma_{i} \end{pmatrix}+\begin{pmatrix}\epsilon_{ik}^{{A}}\\ \epsilon_{ik}^{{B}}\\ \epsilon_{ik}^{{C}} \end{pmatrix}=b+u_{i}+\eta_{ik}$$
Algorithm 1: Two-stage Fitting
Stage 1: fit $(\hat{A}_{ik}(\delta), \hat{B}_{ik}(\delta), \hat{C}_{ik}(\delta))$ for each $i$ and $k$ for varying $\delta$ using our step 1 single-level model
Stage 2: Find $\hat{\delta}$ such that $(\hat{A}_{ik}(\hat{\delta}), \hat{B}_{ik}(\hat{\delta}), \hat{C}_{ik}(\hat{\delta}))$ yields the maximum likelihood for the random effects model
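A simplified sketch of the two-stage idea, not the authors' implementation. It assumes one session per unit (the 2-level special case), independent Gaussian variation of $(A, B, C)$ across units with plug-in variances, bivariate normal trial errors, and hypothetical simulation settings. Stage 1 reuses a closed form derived by conditioning $E_2$ on $E_1$; stage 2 is a grid search over $\delta$. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_trials, delta_true = 200, 200, 0.5   # hypothetical sizes

def ols_pieces(Z, M, R):
    """Delta-free quantities for one unit: a, naive (c, b), residual sds."""
    a = (Z @ M) / (Z @ Z)
    s1 = np.sqrt(np.mean((M - Z * a) ** 2))
    X = np.column_stack([Z, M])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    tau = np.sqrt(np.mean((R - X @ coef) ** 2))
    return a, coef[0], coef[1], s1, tau

def coefs_given_delta(pieces, delta):
    """Stage 1: per-unit (A, B, C) estimates implied by a fixed delta."""
    a, c_ols, b_ols, s1, tau = pieces
    kappa = delta * tau / (s1 * np.sqrt(1 - delta ** 2))
    return a, b_ols - kappa, c_ols + kappa * a

def stage2_loglik(pieces, delta):
    """Stage 2: Gaussian log-likelihood across units, up to a constant."""
    coefs = np.column_stack(coefs_given_delta(pieces, delta))
    return -0.5 * coefs.shape[0] * np.sum(np.log(coefs.var(axis=0)))

# Simulate units with varying coefficients and error scales.
units = []
for _ in range(n_units):
    a, b, c = rng.normal([0.5, 0.3, 0.2], 0.1)
    s1, s2 = rng.uniform(0.5, 2.0, size=2)
    cov = [[s1**2, delta_true * s1 * s2], [delta_true * s1 * s2, s2**2]]
    E = rng.multivariate_normal([0, 0], cov, size=n_trials)
    Z = rng.integers(0, 2, n_trials).astype(float)
    M = Z * a + E[:, 0]
    R = Z * c + M * b + E[:, 1]
    units.append(ols_pieces(Z, M, R))
pieces = tuple(np.array(col) for col in zip(*units))

grid = np.linspace(-0.95, 0.95, 191)
delta_hat = grid[np.argmax([stage2_loglik(pieces, d) for d in grid])]
print("delta_hat:", delta_hat, "delta_true:", delta_true)
```

In this toy setup the information about $\delta$ comes from the error scales differing across units: a wrong $\delta$ inflates the between-unit spread of the adjusted coefficients.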
Small-scale computing
Theory for SEM and Confounding
Theorem: Under certain regularity conditions, asymptotically, the joint multilevel likelihood has a unique maximum and the maximizer is consistent for $\delta$.
Contributions: data-driven estimation of confounding and consistency proof for SEM
Alternative Likelihood under Our Framework
Optimize all parameters in joint likelihood $$\begin{align*} &\sum_{i=1}^{N}\sum_{k=1}^{K}\log\Pr\left(R_{ik},M_{ik}|Z_{ik},\delta,b_{ik},\sigma_{1_{ik}},\sigma_{2_{ik}}\right)\quad \mbox{Data}\\ & + \sum_{i=1}^{N}\sum_{k=1}^{K}\log\Pr\left(b_{ik}|u_{i},b,\boldsymbol{\Lambda}\right)\quad \mbox{Subject variation}\\ & +\sum_{i=1}^{N}\log\Pr\left(u_{i}|\boldsymbol{{\Psi}}\right) \quad \mbox{Prior}\end{align*}$$
Large computation: $5NK + 3N + 11 > 2000$ parameters
Algorithm 2
Theorem: The joint likelihood is conditionally convex for groups of parameters.
Algorithm: block coordinate descent with projections.
Leverage conditional convexity to reduce computation
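A generic illustration of block coordinate descent with projection, on a toy objective rather than the joint likelihood above. The function $f(x, y) = (xy - 1)^2 + \lambda (x^2 + y^2)$ is not jointly convex but is convex in each block with the other held fixed, and $y$ is restricted to $[-0.99, 0.99]$ to mimic a bounded parameter such as $\delta$. The objective, bounds, and names are all assumptions made for illustration.

```python
import numpy as np

def objective(x, y, lam=0.1):
    """Toy objective: convex in x for fixed y, and in y for fixed x."""
    return (x * y - 1.0) ** 2 + lam * (x ** 2 + y ** 2)

def block_coordinate_descent(x=2.0, y=0.1, lam=0.1, bound=0.99, n_iter=50):
    """Alternate exact block updates; project y back onto [-bound, bound]."""
    history = [objective(x, y, lam)]
    for _ in range(n_iter):
        # Exact minimizer in x for fixed y (set the derivative to zero).
        x = y / (y ** 2 + lam)
        # Exact minimizer in y for fixed x, then projection onto the interval.
        # For a 1-D convex function this clipped value is the constrained
        # minimizer.
        y = float(np.clip(x / (x ** 2 + lam), -bound, bound))
        history.append(objective(x, y, lam))
    return x, y, history

x_hat, y_hat, history = block_coordinate_descent()
print(x_hat, y_hat, history[-1])
```

Each block update is a cheap closed-form step and never increases the objective, which is what conditional convexity makes possible in a problem with many parameters.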