library(sesame)
library(CytoMethIC)
library(dplyr)
library(knitr)  # provides kable(), used below to print the confusion table
model_home = "https://github.com/zhou-lab/CytoMethIC_models/raw/main/models"
cmi_checkVersion()
## CytoMethIC requires matched versions of R, sesame, sesameData and ExperimentHub.
## Here is the current versions installed:
## R: 4.4.0
## Bioconductor: 3.19
## CytoMethIC: 0.99.21
## sesame: 1.21.14
## sesameData: 1.21.10
## ExperimentHub: 2.11.1

The mature CytoMethIC models are hosted in ExperimentHub. This package also supports models from https://github.com/zhou-lab/CytoMethIC_models, our lab's public classifier repository, which is updated more frequently than ExperimentHub.
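Because the GitHub-hosted models are read over the network each time, it can help to keep a local copy. The following is a minimal sketch in base R, not part of CytoMethIC: `fetch_model` and its `cache_dir` argument are hypothetical names introduced here for illustration, and the base URL is the `model_home` value from the setup code.

```r
# Base URL copied from the setup code at the top of this document.
model_home <- "https://github.com/zhou-lab/CytoMethIC_models/raw/main/models"

# Hypothetical helper (not a CytoMethIC function): download a model file once,
# then reuse the local copy on later calls.
fetch_model <- function(file_name, home = model_home, cache_dir = tempdir()) {
  local_path <- file.path(cache_dir, file_name)
  if (!file.exists(local_path)) {
    # mode = "wb" keeps the binary .rds file intact on Windows.
    download.file(paste0(home, "/", file_name), local_path, mode = "wb")
  }
  readRDS(local_path)
}

# Intended use (requires network access on the first call):
# cmi_model <- fetch_model("Sex2_HM450.rds")
```

The default `tempdir()` only lasts for the current R session; pass a persistent directory as `cache_dir` to keep models between sessions.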

CytoMethIC supported models
EHID    PredictionLabelDescription  Title
EH8395  TCGA cancer types (N=33)    Random Forest Model for Pan-Cancer Tumor Classification
EH8396  TCGA cancer types (N=33)    Support Vector Machine Model for Pan-Cancer Tumor Classification
EH8397  TCGA cancer types (N=33)    XGBoost Model for Pan-Cancer Tumor Classification
EH8398  TCGA cancer types (N=33)    Multilayer Perceptron Model for Pan-Cancer Tumor Classification
EH8399  CNS Tumor Class (N=66)      Random Forest Model for CNS Tumor Classification
EH8400  CNS Tumor Class (N=66)      Support Vector Machine Model for CNS Tumor Classification
EH8401  CNS Tumor Class (N=66)      XGBoost Model for CNS Tumor Classification
EH8402  CNS Tumor Class (N=66)      Multilayer Perceptron Model for CNS Tumor Classification
EH8421  NA                          Random Forest Model for Race Prediction
EH8422  NA                          Random Forest Model for Pan-cancer Subtype Prediction
EH8423  NA                          Random Forest Model for Cell of Origin Prediction

Gender

The gender/sex model is based on X-chromosome inactivation sites, including both hyper- and hypomethylated sites on the inactive X chromosome.

betas_hm450 = sesameDataGet("HM450.1.TCGA.PAAD")$betas
cmi_model = readRDS(url(paste0(model_home, "/Sex2_HM450.rds")))
cmi_predict(betas_hm450, cmi_model)
kable(table(cmi_model$stats$additional_info, cmi_model$stats$truth))
         FEMALE  MALE
FEMALE      355     2
MALE          2   371
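The counts in this table can be summarized as an overall accuracy and a per-class recall. The sketch below re-enters the four counts by hand and assumes that rows are predicted labels and columns are true labels, which the `table()` call suggests (with `truth` as the second argument) but does not confirm.

```r
# Counts copied from the table above.
# Assumption: rows are predicted labels, columns are true labels.
confusion <- matrix(c(355, 2,
                      2, 371),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(predicted = c("FEMALE", "MALE"),
                                    truth = c("FEMALE", "MALE")))

# Overall accuracy: correct calls (the diagonal) over all samples.
accuracy <- sum(diag(confusion)) / sum(confusion)

# Per-class recall: correct calls over the number of true samples in each class.
recall <- diag(confusion) / colSums(confusion)

round(accuracy, 4)  # 726 / 730, about 0.9945
```

Because the two off-diagonal counts are equal here, the accuracy is the same whichever axis holds the truth; only the interpretation of `recall` depends on the stated assumption.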

Ethnicity

The snippet below uses cmi_predict to predict the patient's ethnicity, first from HM450 data with the ExperimentHub model (EH8421), then from EPICv2 data with a model from the GitHub repository and lift_over=TRUE.

betas_hm450 = sesameDataGet("HM450.1.TCGA.PAAD")$betas
cmi_model = ExperimentHub()[["EH8421"]]
cmi_predict(betas_hm450, cmi_model)
betas_epicv2 = openSesame(sesameDataGet("EPICv2.8.SigDF")[[1]])
cmi_model = readRDS(url(paste0(model_home, "/Race3_rfcTCGA_InfHum3.rds")))
cmi_predict(betas_epicv2, cmi_model, lift_over=TRUE)
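The EPICv2 call above passes lift_over=TRUE, presumably so that data from a newer array can be scored by a model built on a different platform. How cmi_predict handles this internally is not shown in this document. As a loose illustration only, one step such a mapping commonly involves is reducing EPICv2 probe IDs, which carry a design suffix such as `_TC21`, to their `cg` prefix and averaging replicate probes. The probe IDs and beta values below are made up, and this base R sketch is not the package's implementation.

```r
# Toy EPICv2-style beta values; IDs and numbers are invented for illustration.
betas <- c(cg00000029_TC21 = 0.80,
           cg00000109_TC21 = 0.10,
           cg00000109_TC22 = 0.30,
           cg00000155_BC21 = NA)

# Strip the design suffix (everything from the first underscore onward).
prefix <- sub("_.*$", "", names(betas))

# Average replicate probes that share a prefix, ignoring missing values;
# a prefix whose probes are all missing stays NA.
collapsed <- tapply(betas, prefix, function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
})

collapsed
```

After this step the names of `collapsed` are plain `cg` identifiers, one entry per prefix, which is the naming used by the older arrays.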

Cancer Classification Models

Pan-Cancer Type

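betas_hm450 = sesameDataGet("HM450.1.TCGA.PAAD")$betas
## EH8396 is the SVM pan-cancer model (TCGA cancer types, N=33) in the table above;
## the CNS tumor class models are EH8399 to EH8402.
cmi_model = ExperimentHub()[["EH8396"]]
## Alternative model from the GitHub repository:
## cmi_model = readRDS(url(paste0(model_home, "/CancerType33_rfcTCGA_InfHum3.rds")))
cmi_predict(betas_hm450, cmi_model, lift_over=TRUE)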
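The model table lists four classifiers for the same 33 TCGA cancer types, so one natural check is to run all four on the same sample and compare the results. The sketch below is one way to organize that, under stated assumptions: `predict_with_all` is a hypothetical helper, not a CytoMethIC function, and the loader and predictor are passed in as arguments so the helper itself makes no claim about what cmi_predict returns. Some model types may need additional packages installed.

```r
# EHIDs of the four pan-cancer models, taken from the model table.
pan_cancer_ids <- c(rf = "EH8395", svm = "EH8396",
                    xgboost = "EH8397", mlp = "EH8398")

# Hypothetical helper: load each model by ID and apply the predictor to the
# same beta values. Returns a list named like `ids`.
predict_with_all <- function(betas, ids, load_model, predict_fun) {
  lapply(ids, function(id) predict_fun(betas, load_model(id)))
}

# Intended use (needs network access and the packages loaded at the top):
# eh <- ExperimentHub()
# results <- predict_with_all(betas_hm450, pan_cancer_ids,
#                             load_model = function(id) eh[[id]],
#                             predict_fun = cmi_predict)
```

Keeping the results in a named list makes it easy to see which classifier produced which call when the models disagree.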

Cancer Phenotyping Models

Cancer Cell of origin

The snippet below uses cmi_predict to predict the cancer's cell of origin.

betas_hm450 = sesameDataGet("HM450.1.TCGA.PAAD")$betas
cmi_model = ExperimentHub()[["EH8423"]]
cmi_predict(betas_hm450, cmi_model)
sessionInfo()
## R Under development (unstable) (2023-12-15 r85690)
## Platform: aarch64-apple-darwin22.6.0
## Running under: macOS Ventura 13.6.4
## 
## Matrix products: default
## BLAS:   /Users/zhouw3/.Renv/versions/4.4.0.devel/lib/R/lib/libRblas.dylib 
## LAPACK: /Users/zhouw3/.Renv/versions/4.4.0.devel/lib/R/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] knitr_1.45           dplyr_1.1.4          CytoMethIC_0.99.21  
##  [4] sesame_1.21.14       sesameData_1.21.10   ExperimentHub_2.11.1
##  [7] AnnotationHub_3.11.2 BiocFileCache_2.11.1 dbplyr_2.5.0        
## [10] BiocGenerics_0.49.1  rmarkdown_2.25       R6_2.5.1            
## 
## loaded via a namespace (and not attached):
##  [1] DBI_1.2.2                   bitops_1.0-7               
##  [3] rlang_1.1.3                 magrittr_2.0.3             
##  [5] matrixStats_1.2.0           e1071_1.7-14               
##  [7] compiler_4.4.0              RSQLite_2.3.5              
##  [9] png_0.1-8                   vctrs_0.6.5                
## [11] reshape2_1.4.4              stringr_1.5.1              
## [13] pkgconfig_2.0.3             crayon_1.5.2               
## [15] fastmap_1.1.1               XVector_0.43.1             
## [17] utf8_1.2.4                  tzdb_0.4.0                 
## [19] preprocessCore_1.65.0       purrr_1.0.2                
## [21] bit_4.0.5                   xfun_0.41                  
## [23] randomForest_4.7-1.1        zlibbioc_1.49.3            
## [25] cachem_1.0.8                GenomeInfoDb_1.39.9        
## [27] jsonlite_1.8.8              blob_1.2.4                 
## [29] DelayedArray_0.29.9         BiocParallel_1.37.1        
## [31] parallel_4.4.0              bslib_0.6.1                
## [33] stringi_1.8.3               RColorBrewer_1.1-3         
## [35] GenomicRanges_1.55.4        jquerylib_0.1.4            
## [37] Rcpp_1.0.12                 SummarizedExperiment_1.33.3
## [39] wheatmap_0.2.0              readr_2.1.5                
## [41] IRanges_2.37.1              Matrix_1.6-4               
## [43] tidyselect_1.2.1            abind_1.4-5                
## [45] yaml_2.3.8                  codetools_0.2-19           
## [47] curl_5.2.1                  lattice_0.22-5             
## [49] tibble_3.2.1                plyr_1.8.9                 
## [51] Biobase_2.63.0              withr_3.0.0                
## [53] KEGGREST_1.43.0             evaluate_0.23              
## [55] proxy_0.4-27                Biostrings_2.71.4          
## [57] pillar_1.9.0                BiocManager_1.30.22        
## [59] filelock_1.0.3              MatrixGenerics_1.15.0      
## [61] stats4_4.4.0                generics_0.1.3             
## [63] RCurl_1.98-1.14             BiocVersion_3.19.1         
## [65] S4Vectors_0.41.4            hms_1.1.3                  
## [67] ggplot2_3.5.0               munsell_0.5.0              
## [69] scales_1.3.0                BiocStyle_2.31.0           
## [71] class_7.3-22                glue_1.7.0                 
## [73] tools_4.4.0                 grid_4.4.0                 
## [75] AnnotationDbi_1.65.2        colorspace_2.1-0           
## [77] GenomeInfoDbData_1.2.11     cli_3.6.2                  
## [79] rappdirs_0.3.3              fansi_1.0.6                
## [81] S4Arrays_1.3.6              gtable_0.3.4               
## [83] sass_0.4.8                  digest_0.6.33              
## [85] SparseArray_1.3.4           memoise_2.0.1              
## [87] htmltools_0.5.7             lifecycle_1.0.4            
## [89] httr_1.4.7                  mime_0.12                  
## [91] bit64_4.0.5                 MASS_7.3-60.1