Across-Site Predictability of Metabolite Profiles

Data

load_packages_names.R
Raw metabolite data
Covariates data: S-I_A covariates, S-I_B covariates, S-II covariates, S-III covariates.

Codes

Data preprocessing:
- Goal: Imputation and normalization to prepare data for statistical analysis.
- meta_preprocess.R
- covar_preprocess.R
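
As an illustration of the imputation and normalization goal above, here is a minimal sketch in base R. The README does not say which methods meta_preprocess.R uses, so half-minimum imputation, log2 transformation, and z-score scaling are assumptions here, and `preprocess_metabolites` is a hypothetical helper name, not a function from the repository.

```r
# Hypothetical helper (not from the repository): impute, transform, scale.
# Assumed methods: half-minimum imputation, log2 transform, z-score scaling.
preprocess_metabolites <- function(x) {
  x <- as.matrix(x)
  for (j in seq_len(ncol(x))) {
    col <- x[, j]
    if (anyNA(col)) {
      # Replace missing values with half of the smallest observed value
      col[is.na(col)] <- min(col, na.rm = TRUE) / 2
    }
    x[, j] <- col
  }
  x <- log2(x)  # reduce right skew typical of abundance data
  scale(x)      # center each metabolite to mean 0, scale to sd 1
}

# Toy input: 4 samples x 2 metabolites with one missing value each
raw <- matrix(c(4, 8, NA, 16,
                2, NA, 8, 32),
              ncol = 2,
              dimnames = list(NULL, c("met1", "met2")))
processed <- preprocess_metabolites(raw)
```

The toy matrix stands in for the raw metabolite data; the real scripts may handle missingness and scaling differently.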

R2 calculation:
- Goal: Calculate marginal R2 and conditional R2 for three sets of variables: demographics (D), metabolites (M), and cohorts (C).
- Results contain: Total R2, R2(D), R2(M), R2(C), R2(D|M,C), R2(M|D,C), R2(C|D,M) for all metabolites
- Codes: rsq_subset.R
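
The quantities listed above can be sketched with ordinary linear models. This is an illustration on simulated data, not the code in rsq_subset.R. It assumes that the marginal R2 of a set is the R2 of a model containing only that set, and that the conditional R2 is the drop in R2 when the set is removed from the full model. The variable names (`age`, `other_met`, `cohort`) and the helper `r2` are hypothetical.

```r
set.seed(1)
n <- 200
# Simulated stand-ins for the three variable sets
dat <- data.frame(
  age       = rnorm(n),                                       # demographics (D)
  other_met = rnorm(n),                                       # metabolites (M)
  cohort    = factor(sample(c("A", "B"), n, replace = TRUE))  # cohorts (C)
)
dat$y <- 0.5 * dat$age + 0.8 * dat$other_met +
  0.3 * (dat$cohort == "B") + rnorm(n)

# Hypothetical helper: R2 of a linear model of y on the given predictors
r2 <- function(rhs, data) {
  summary(lm(reformulate(rhs, response = "y"), data = data))$r.squared
}

sets <- list(D = "age", M = "other_met", C = "cohort")

# Total R2: all three sets together
total <- r2(unlist(sets), dat)

# Marginal R2: each set on its own, e.g. R2(D)
marginal <- sapply(sets, r2, data = dat)

# Conditional R2 (assumed definition): total minus the R2 of the other
# two sets, e.g. R2(D|M,C) = Total R2 - R2(M,C)
conditional <- sapply(names(sets), function(s) {
  total - r2(unlist(sets[names(sets) != s]), dat)
})
```

In the real analysis this would be repeated with each metabolite as the response in turn.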

Two-cohort study
- Goal: Construct a predictive model using a training set (from S-I_B and S-II) and test the model using the test set from S-I_B and S-II plus all samples from S-I_A and S-III.
- Step 1. Obtain training and test set from S-I_B and S-II: 2cohort_split_train_test.R
- Step 2. Train the model using the training set: 2cohort_train_model.R
- Step 3. Test the model from Step 2:
  - Test using test set from S-I_B and S-II: 2cohort_test_testset.R
  - Test using samples in S-III: 2cohort_test_S-III.R
  - Test using samples in S-I_A: 2cohort_test_S-IA.R
- Step 4. Measure variation using Average Squared Deviation (ASD)
  - Calculate deviation distance: 2cohort_calc_D.R
  - Calculate ASD: 2cohort_calc_ASD.R
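
The four steps above can be sketched end to end on simulated data. This is a minimal illustration, not the repository's code: the model type (`lm`), the 70/30 split, and the definition of ASD as the mean of squared differences between observed and predicted values are all assumptions.

```r
set.seed(42)
n <- 100
# Simulated stand-in for pooled samples from two cohorts
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.5)

# Step 1: split into training and test sets (assumed 70/30 split)
n_train <- 70
train_idx <- sample(n, size = n_train)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Step 2: train the model on the training set (lm is an assumption)
fit <- lm(y ~ x, data = train)

# Step 3: predict on the held-out test set
pred <- predict(fit, newdata = test)

# Step 4: deviation per sample, then ASD (assumed: mean squared deviation)
d <- test$y - pred
asd <- mean(d^2)
```

Testing on an external cohort such as S-III or S-I_A follows the same pattern, with that cohort's data passed as `newdata`.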

Three-cohort study
- Goal: Construct a predictive model using a training set (from S-I_B, S-II and S-III) and test the model using the test set from S-I_B, S-II and S-III plus all samples from S-I_A.
- Step 1. Obtain training and test set from S-I_B, S-II and S-III: 3cohort_split_train_test.R

(Note: the training set for the three-cohort study consists of the two-cohort training set plus new samples from S-III.)

- Step 2. Train the model using the training set: 3cohort_train_model.R
- Step 3. Test the model from Step 2:
  - Test using test set from S-I_B, S-II and S-III: 3cohort_test_testset.R
  - Test using samples in S-I_A: 3cohort_test_S-IA.R
- Step 4. Measure variation using Average Squared Deviation (ASD)
  - Calculate deviation distance: 3cohort_calc_D.R
  - Calculate ASD: 3cohort_calc_ASD.R
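
The note on the three-cohort training set can be illustrated as follows: the two-cohort training samples are reused unchanged, and only S-III is newly split. The sample IDs, the split size, and all variable names here are hypothetical.

```r
# Hypothetical IDs already chosen as the two-cohort training set
two_cohort_train_ids <- c("S-I_B_001", "S-I_B_002", "S-II_001")

# Hypothetical S-III sample IDs to be split into train and test
s3_ids <- paste0("S-III_", sprintf("%03d", 1:10))

set.seed(7)
s3_train_ids <- sample(s3_ids, size = 7)       # assumed split size
s3_test_ids  <- setdiff(s3_ids, s3_train_ids)  # remaining S-III samples

# Three-cohort training set: previous training samples plus new S-III ones
three_cohort_train_ids <- c(two_cohort_train_ids, s3_train_ids)
```

Reusing the earlier training samples keeps the two-cohort and three-cohort models comparable, since any difference in performance then comes from the added S-III samples rather than from a different random split.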

One-cohort study
- Goal: Construct a predictive model using a training set (from S-I_B) and test the model using the test set from S-I_B plus all samples from S-I_A, S-II and S-III.

- Step 1. Obtain training and test set from S-I_B: 1cohort_split_train_test.R

(Note: the training set for the one-cohort study consists of the S-I_B samples from the two-cohort training set.)

- Step 2. Train the model using the training set: 1cohort_train_model.R
- Step 3. Test the model from Step 2:
  - Test using test set from S-I_B: 1cohort_test_testset.R
  - Test using samples in S-I_A, S-II and S-III: 1cohort_test_S-IA.R, 1cohort_test_S-II.R, 1cohort_test_S-III.R
- Step 4. Measure variation using Average Squared Deviation (ASD)
  - Calculate deviation distance: 1cohort_calc_D.R
  - Calculate ASD: 1cohort_calc_ASD.R

Testing with an additional cohort S-IV (253 newly collected samples)
- Data: raw metabolite data meta_raw_bioivt_penn.txt
- Codes:
  - Preprocess data from S-IV: preprocess_meta_S4.R
  - Test using 2-cohort model: 2cohort_test_S-IV.R
  - Test using 3-cohort model: 3cohort_test_S-IV.R
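
Applying an already trained model to a new cohort can be sketched as below. How the repository stores its fitted models is not stated, so saving with `saveRDS` and the use of `lm` are assumptions. The data are simulated; only the sample count of 253 comes from the text above.

```r
set.seed(3)
# Simulated training data standing in for the earlier cohorts
train <- data.frame(x = rnorm(50))
train$y <- 1 + 2 * train$x + rnorm(50, sd = 0.3)
fit <- lm(y ~ x, data = train)

# Persist the fitted model (assumed storage format: RDS file)
model_path <- tempfile(fileext = ".rds")
saveRDS(fit, model_path)

# Simulated new cohort with 253 samples, preprocessed the same way
new_cohort <- data.frame(x = rnorm(253))
new_cohort$y <- 1 + 2 * new_cohort$x + rnorm(253, sd = 0.3)

# Reload the model and evaluate it on the new cohort without refitting
loaded <- readRDS(model_path)
pred <- predict(loaded, newdata = new_cohort)
asd_new <- mean((new_cohort$y - pred)^2)  # assumed ASD definition
```

The new cohort should go through the same preprocessing as the training data before prediction, which is the role of preprocess_meta_S4.R in the list above.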

Figures
- Histograms of observed, predicted and residuals from two-cohort model: Download HERE.
