Across-Site Predictability of Metabolite Profiles
Data
- load_packages_names.R
- Raw metabolite data
- Covariates data: S-I_A covariates, S-I_B covariates, S-II covariates, S-III covariates.
Codes
- Data preprocessing:
- Goal: Imputation and normalization to prepare data for statistical analysis.
- meta_preprocess.R
- covar_preprocess.R
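The README does not state which imputation or normalization methods meta_preprocess.R applies. As a minimal sketch of what such a step can look like, the example below assumes half-minimum imputation per metabolite followed by a log2 transform and z-scoring. The matrix `meta`, the helper `impute_half_min`, and the metabolite names are hypothetical and not taken from the repository.

```r
# Toy matrix: rows are samples, columns are metabolites (hypothetical values).
meta <- matrix(c(10, NA, 40,
                  8, 16, NA,
                  2,  4,  8),
               nrow = 3,
               dimnames = list(paste0("s", 1:3), c("met_a", "met_b", "met_c")))

# Assumed imputation rule: replace missing values with half of the
# smallest observed value of the same metabolite.
impute_half_min <- function(x) {
  x[is.na(x)] <- min(x, na.rm = TRUE) / 2
  x
}

# Apply the rule column by column (one column per metabolite).
imputed <- apply(meta, 2, impute_half_min)

# Assumed normalization: log2 transform, then center and scale each metabolite.
normalized <- scale(log2(imputed))
```

Other choices (for example k-nearest-neighbour imputation or quantile normalization) would slot into the same two places.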
- R2 calculation:
- Goal: Calculate marginal R2 and conditional R2 for three sets of variables: demographics (D), metabolites (M), and cohorts (C).
- Results contain: Total R2, R2(D), R2(M), R2(C), R2(D|M,C), R2(M|D,C), and R2(C|D,M) for all metabolites
- Codes: rsq_subset.R
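The quantities listed above can be illustrated with nested linear models. This is a sketch under stated assumptions, not the code in rsq_subset.R: it assumes the marginal R2 of a variable set is the R2 of a model containing only that set, and the conditional R2 (for example R2(D|M,C)) is the gain in R2 when that set is added to a model already containing the other two. The simulated data and all variable names are hypothetical.

```r
# Simulated data for one response metabolite y; all names are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(
  age    = rnorm(n),                                    # demographics (D)
  sex    = factor(sample(c("F", "M"), n, replace = TRUE)),
  m1     = rnorm(n),                                    # other metabolites (M)
  m2     = rnorm(n),
  cohort = factor(sample(c("S-I_B", "S-II"), n, replace = TRUE))  # cohort (C)
)
dat$y <- 0.5 * dat$age + 0.8 * dat$m1 + 0.3 * (dat$cohort == "S-II") + rnorm(n)

# The three variable sets.
sets <- list(D = c("age", "sex"), M = c("m1", "m2"), C = "cohort")

# R2 of a linear model of y on the given predictors.
r2 <- function(vars) {
  f <- reformulate(unname(vars), response = "y")
  summary(lm(f, data = dat))$r.squared
}

# Total R2: all three sets together.
total <- r2(unlist(sets))

# Marginal R2: each set on its own, giving R2(D), R2(M), R2(C).
marginal <- sapply(sets, r2)

# Conditional R2 (assumed definition): total R2 minus the R2 of the model
# without that set, giving R2(D|M,C), R2(M|D,C), R2(C|D,M).
conditional <- sapply(names(sets), function(s) {
  total - r2(unlist(sets[names(sets) != s]))
})
```

Repeating this for every metabolite as the response would produce one row of results per metabolite.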
- Two-cohort study
- Goal: Construct a predictive model using the training set (from S-I_B and S-II), then test it on the held-out test set from S-I_B and S-II plus all samples from S-I_A and S-III.
- Step 1. Split S-I_B and S-II into training and test sets: 2cohort_split_train_test.R
- Step 2. Train the model on the training set: 2cohort_train_model.R
- Step 3. Test the model from Step 2:
- Test using test set from S-I_B and S-II: 2cohort_test_testset.R
- Test using samples in S-III: 2cohort_test_S-III.R
- Test using samples in S-I_A: 2cohort_test_S-IA.R
- Step 4. Measure variation using Averaged Squared Error (ASD)
- Calculate deviation distance: 2cohort_calc_D.R
- Calculate ASD: 2cohort_calc_ASD.R
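The four steps above can be sketched end to end on simulated data. Several things here are assumptions, since the README does not specify them: a 70/30 random split within each cohort, an ordinary linear model as the predictive model, the deviation distance D taken as observed minus predicted, and ASD taken as the mean of the squared deviations. All variable names are hypothetical.

```r
# Simulated stand-in for the pooled S-I_B and S-II samples.
set.seed(42)
n <- 100
dat <- data.frame(
  cohort = rep(c("S-I_B", "S-II"), each = n / 2),
  x      = rnorm(n)
)
dat$y <- 1 + 2 * dat$x + rnorm(n, sd = 0.5)

# Step 1: random split within each cohort (the 70% ratio is an assumption).
train_idx <- unlist(
  lapply(split(seq_len(n), dat$cohort),
         function(i) sample(i, floor(0.7 * length(i)))),
  use.names = FALSE
)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Step 2: train the model on the training set (a linear model is assumed).
fit <- lm(y ~ x, data = train)

# Step 3: predict for the held-out test set.
pred <- predict(fit, newdata = test)

# Step 4: deviation (assumed: observed minus predicted) and ASD
# (assumed: mean of the squared deviations).
d   <- test$y - pred
asd <- mean(d^2)
```

Testing on an external cohort such as S-III follows the same pattern: pass that cohort's data frame as `newdata` in Step 3 and recompute `d` and `asd`.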
- Three-cohort study
- Goal: Construct a predictive model using the training set (from S-I_B, S-II and S-III), then test it on the held-out test set from S-I_B, S-II and S-III plus all samples from S-I_A.
- Step 1. Split S-I_B, S-II and S-III into training and test sets: 3cohort_split_train_test.R
(Note that the training set for the three-cohort study consists of the training set from the two-cohort study plus the new samples from S-III.)
- Step 2. Train the model on the training set: 3cohort_train_model.R
- Step 3. Test the model from Step 2:
- Test using test set from S-I_B, S-II and S-III: 3cohort_test_testset.R
- Test using samples in S-I_A: 3cohort_test_S-IA.R
- Step 4. Measure variation using Averaged Squared Error (ASD)
- Calculate deviation distance: 3cohort_calc_D.R
- Calculate ASD: 3cohort_calc_ASD.R
- One-cohort study
- Goal: Construct a predictive model using the training set (from S-I_B), then test it on the held-out test set from S-I_B plus all samples from S-I_A, S-II and S-III.
- Step 1. Split S-I_B into training and test sets: 1cohort_split_train_test.R
(Note that the training set for the one-cohort study consists of the S-I_B samples from the training set of the two-cohort study.)
- Step 2. Train the model on the training set: 1cohort_train_model.R
- Step 3. Test the model from Step 2:
- Test using test set from S-I_B: 1cohort_test_testset.R
- Test using samples in S-I_A, S-II and S-III: 1cohort_test_S-IA.R, 1cohort_test_S-II.R, 1cohort_test_S-III.R
- Step 4. Measure variation using Averaged Squared Error (ASD)
- Calculate deviation distance: 1cohort_calc_D.R
- Calculate ASD: 1cohort_calc_ASD.R
- Testing with an additional cohort S-IV (253 newly collected samples)
- Data: raw metabolite data meta_raw_bioivt_penn.txt
- Codes:
- Preprocess data from S-IV: preprocess_meta_S4.R
- Test using 2-cohort model: 2cohort_test_S-IV.R
- Test using 3-cohort model: 3cohort_test_S-IV.R
- Figures
- Histograms of observed, predicted and residuals from two-cohort model: Download HERE.
Publications