library(sesame)
library(CytoMethIC)
library(dplyr)
library(knitr)   # provides kable(), used for the confusion table
model_home = "https://github.com/zhou-lab/CytoMethIC_models/raw/main/models"
cmi_checkVersion()
## CytoMethIC requires matched versions of R, sesame, sesameData and ExperimentHub.
## Here is the current versions installed:
## R: 4.4.0
## Bioconductor: 3.19
## CytoMethIC: 0.99.21
## sesame: 1.21.14
## sesameData: 1.21.10
## ExperimentHub: 2.11.1
The mature CytoMethIC models are hosted in ExperimentHub. This package can also use models from https://github.com/zhou-lab/CytoMethIC_models, which hosts the most frequently updated public repository of our lab's classifiers.
EH ID | Prediction Label Description | Title |
---|---|---|
EH8395 | TCGA cancer types (N=33) | Random Forest Model for Pan-Cancer Tumor Classification |
EH8396 | TCGA cancer types (N=33) | Support Vector Machine Model for Pan-Cancer Tumor Classification |
EH8397 | TCGA cancer types (N=33) | XGBoost Model for Pan-Cancer Tumor Classification |
EH8398 | TCGA cancer types (N=33) | Multilayer Perceptron Model for Pan-Cancer Tumor Classification |
EH8399 | CNS Tumor Class (N=66) | Random Forest Model for CNS Tumor Classification |
EH8400 | CNS Tumor Class (N=66) | Support Vector Machine Model for CNS Tumor Classification |
EH8401 | CNS Tumor Class (N=66) | XGBoost Model for CNS Tumor Classification |
EH8402 | CNS Tumor Class (N=66) | Multilayer Perceptron Model for CNS Tumor Classification |
EH8421 | NA | Random Forest Model for Race Prediction |
EH8422 | NA | Random Forest Model for Pan-cancer Subtype Prediction |
EH8423 | NA | Random Forest Model for Cell of Origin Prediction |
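Because models come from two sources (ExperimentHub IDs such as `EH8396`, or `.rds` files under `model_home`), a small lookup layer can keep scripts tidy. The sketch below is a set of hypothetical helpers, not part of the CytoMethIC API: `cmi_models`, `find_model_id`, `cmi_model_source` and `load_cmi_model` are names invented for illustration. It assumes that any ID of the form `EH` plus digits is an ExperimentHub record and that every other ID names an `.rds` file in the GitHub models directory. Only the eight classifier rows of the table are transcribed.

```r
# Hypothetical helpers for illustration only; not part of the CytoMethIC API.
model_home = "https://github.com/zhou-lab/CytoMethIC_models/raw/main/models"

# The eight tumor-classifier rows of the model table, as a data frame.
cmi_models = data.frame(
  eh_id = c("EH8395", "EH8396", "EH8397", "EH8398",
            "EH8399", "EH8400", "EH8401", "EH8402"),
  task = rep(c("Pan-Cancer Tumor", "CNS Tumor"), each = 4),
  algorithm = rep(c("Random Forest", "Support Vector Machine",
                    "XGBoost", "Multilayer Perceptron"), times = 2),
  stringsAsFactors = FALSE
)

# Return the single ExperimentHub ID matching a task/algorithm pair.
find_model_id = function(task, algorithm, models = cmi_models) {
  hit = models$eh_id[models$task == task & models$algorithm == algorithm]
  if (length(hit) != 1) stop("no unique model for ", task, " / ", algorithm)
  hit
}

# Assumption: "EH" followed by digits is an ExperimentHub record; any other
# ID is treated as an .rds file name under `home`.
cmi_model_source = function(id, home = model_home) {
  if (grepl("^EH[0-9]+$", id)) {
    list(type = "ExperimentHub", location = id)
  } else {
    list(type = "url", location = paste0(home, "/", id, ".rds"))
  }
}

# Fetch the model object (needs network access).
load_cmi_model = function(id, home = model_home) {
  src = cmi_model_source(id, home)
  if (src$type == "ExperimentHub") {
    ExperimentHub::ExperimentHub()[[src$location]]
  } else {
    readRDS(url(src$location))
  }
}
```

For example, `cmi_model = load_cmi_model(find_model_id("CNS Tumor", "XGBoost"))` would fetch EH8401, and `load_cmi_model("Sex2_HM450")` would read the same file as the `readRDS(url(...))` call used for the sex model.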
The sex model is based on X-chromosome inactivation sites, including sites that are hypermethylated and sites that are hypomethylated on the inactive X.
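To see why X-inactivation sites carry a sex signal, consider probes that are methylated only on the inactive X. Female cells have one inactive (methylated) and one active (unmethylated) X, so the bulk beta value sits near 0.5. Male cells have only an active X, so it sits near 0. Probes hypomethylated on the inactive X shift the other way: near 0.5 in females and near 1 in males. The toy sketch below uses invented beta values and a hypothetical threshold rule (`toy_sex_call`) for the hypermethylated case only. It is not the CytoMethIC sex model.

```r
# Toy illustration with invented values; NOT the CytoMethIC sex model.
# Rows: hypothetical probes hypermethylated on the inactive X.
toy_betas = matrix(
  c(0.48, 0.05,
    0.52, 0.03,
    0.45, 0.08),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("xci_probe", 1:3), c("sampleA", "sampleB"))
)

# Hypothetical rule: FEMALE if the mean beta exceeds a threshold halfway
# between the expected male (~0) and female (~0.5) levels.
toy_sex_call = function(betas, threshold = 0.25) {
  ifelse(colMeans(betas, na.rm = TRUE) > threshold, "FEMALE", "MALE")
}

toy_sex_call(toy_betas)
```

The real model learns from many such sites at once rather than applying a single hand-set threshold.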
betas_hm450 = sesameDataGet("HM450.1.TCGA.PAAD")$betas
cmi_model = readRDS(url(paste0(model_home, "/Sex2_HM450.rds")))
cmi_predict(betas_hm450, cmi_model)
kable(table(cmi_model$stats$additional_info, cmi_model$stats$truth))
additional_info / truth | FEMALE | MALE |
---|---|---|
FEMALE | 355 | 2 |
MALE | 2 | 371 |
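The counts in the confusion table can be summarized in base R. The sketch below re-enters the table by hand, with rows as `additional_info` and columns as `truth` (the order used in the `table()` call). It then computes the overall fraction of samples on the diagonal and the per-class agreement relative to the `truth` totals. The variable names `cm`, `accuracy` and `per_class` are illustrative.

```r
# Confusion table re-entered by hand: rows = additional_info, columns = truth.
cm = matrix(c(355,   2,
                2, 371),
            nrow = 2, byrow = TRUE,
            dimnames = list(additional_info = c("FEMALE", "MALE"),
                            truth = c("FEMALE", "MALE")))

# Overall agreement: fraction of all samples on the diagonal (726 of 730).
accuracy = sum(diag(cm)) / sum(cm)

# Per-class agreement: diagonal divided by each truth column's total.
per_class = diag(cm) / colSums(cm)

round(accuracy, 4)
round(per_class, 4)
```

Four of the 730 samples fall off the diagonal, two in each direction.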
The snippet below uses the cmi_predict function to predict the patient's race.
betas_hm450 = sesameDataGet("HM450.1.TCGA.PAAD")$betas
cmi_model = ExperimentHub()[["EH8421"]]
cmi_predict(betas_hm450, cmi_model)
A race model from the GitHub repository can also be applied to EPICv2 data, with lift_over=TRUE:
betas_epicv2 = openSesame(sesameDataGet("EPICv2.8.SigDF")[[1]])
cmi_model = readRDS(url(paste0(model_home, "/Race3_rfcTCGA_InfHum3.rds")))
cmi_predict(betas_epicv2, cmi_model, lift_over=TRUE)
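EPICv2 probe IDs carry a suffix after the cg prefix (e.g. `_TC21`), and some prefixes appear on more than one probe. A model trained on HM450 or earlier EPIC data therefore cannot look them up directly. The toy sketch below shows one simple way such IDs could be reconciled: strip the suffix and average replicate probes. This is an illustration under that assumption only. The probe names and values are invented, `collapse_to_prefix` is a hypothetical helper, and it is not how `cmi_predict(..., lift_over=TRUE)` is implemented.

```r
# Invented EPICv2-style betas; names and values are for illustration only.
toy_epicv2 = c(cg00000029_TC21 = 0.60,
               cg00000029_TC22 = 0.64,
               cg00000109_TC21 = 0.90,
               cg00000155_BC21 = 0.10)

# Hypothetical helper: drop the "_XXnn" suffix and average probes that share
# a prefix. split() groups by sorted prefix; vapply() returns a named numeric.
collapse_to_prefix = function(betas) {
  prefix = sub("_.*$", "", names(betas))
  vapply(split(betas, prefix), mean, numeric(1), na.rm = TRUE)
}

collapsed = collapse_to_prefix(toy_epicv2)
collapsed
```

Averaging is only one choice for replicate probes; a real lift-over may select or weight probes differently.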
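The snippet below predicts the TCGA cancer type (33 classes) with the support vector machine model, EH8396.
betas_hm450 = sesameDataGet("HM450.1.TCGA.PAAD")$betas
cmi_model = ExperimentHub()[["EH8396"]]
# Alternatively, load a cancer-type model from the GitHub repository:
# cmi_model = readRDS(url(paste0(model_home, "/CancerType33_rfcTCGA_InfHum3.rds")))
cmi_predict(betas_hm450, cmi_model, lift_over=TRUE)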
The snippet below uses the cmi_predict function to predict the cancer's cell of origin.
betas_hm450 = sesameDataGet("HM450.1.TCGA.PAAD")$betas
cmi_model = ExperimentHub()[["EH8423"]]
cmi_predict(betas_hm450, cmi_model)
sessionInfo()
## R Under development (unstable) (2023-12-15 r85690)
## Platform: aarch64-apple-darwin22.6.0
## Running under: macOS Ventura 13.6.4
##
## Matrix products: default
## BLAS: /Users/zhouw3/.Renv/versions/4.4.0.devel/lib/R/lib/libRblas.dylib
## LAPACK: /Users/zhouw3/.Renv/versions/4.4.0.devel/lib/R/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.45 dplyr_1.1.4 CytoMethIC_0.99.21
## [4] sesame_1.21.14 sesameData_1.21.10 ExperimentHub_2.11.1
## [7] AnnotationHub_3.11.2 BiocFileCache_2.11.1 dbplyr_2.5.0
## [10] BiocGenerics_0.49.1 rmarkdown_2.25 R6_2.5.1
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.2 bitops_1.0-7
## [3] rlang_1.1.3 magrittr_2.0.3
## [5] matrixStats_1.2.0 e1071_1.7-14
## [7] compiler_4.4.0 RSQLite_2.3.5
## [9] png_0.1-8 vctrs_0.6.5
## [11] reshape2_1.4.4 stringr_1.5.1
## [13] pkgconfig_2.0.3 crayon_1.5.2
## [15] fastmap_1.1.1 XVector_0.43.1
## [17] utf8_1.2.4 tzdb_0.4.0
## [19] preprocessCore_1.65.0 purrr_1.0.2
## [21] bit_4.0.5 xfun_0.41
## [23] randomForest_4.7-1.1 zlibbioc_1.49.3
## [25] cachem_1.0.8 GenomeInfoDb_1.39.9
## [27] jsonlite_1.8.8 blob_1.2.4
## [29] DelayedArray_0.29.9 BiocParallel_1.37.1
## [31] parallel_4.4.0 bslib_0.6.1
## [33] stringi_1.8.3 RColorBrewer_1.1-3
## [35] GenomicRanges_1.55.4 jquerylib_0.1.4
## [37] Rcpp_1.0.12 SummarizedExperiment_1.33.3
## [39] wheatmap_0.2.0 readr_2.1.5
## [41] IRanges_2.37.1 Matrix_1.6-4
## [43] tidyselect_1.2.1 abind_1.4-5
## [45] yaml_2.3.8 codetools_0.2-19
## [47] curl_5.2.1 lattice_0.22-5
## [49] tibble_3.2.1 plyr_1.8.9
## [51] Biobase_2.63.0 withr_3.0.0
## [53] KEGGREST_1.43.0 evaluate_0.23
## [55] proxy_0.4-27 Biostrings_2.71.4
## [57] pillar_1.9.0 BiocManager_1.30.22
## [59] filelock_1.0.3 MatrixGenerics_1.15.0
## [61] stats4_4.4.0 generics_0.1.3
## [63] RCurl_1.98-1.14 BiocVersion_3.19.1
## [65] S4Vectors_0.41.4 hms_1.1.3
## [67] ggplot2_3.5.0 munsell_0.5.0
## [69] scales_1.3.0 BiocStyle_2.31.0
## [71] class_7.3-22 glue_1.7.0
## [73] tools_4.4.0 grid_4.4.0
## [75] AnnotationDbi_1.65.2 colorspace_2.1-0
## [77] GenomeInfoDbData_1.2.11 cli_3.6.2
## [79] rappdirs_0.3.3 fansi_1.0.6
## [81] S4Arrays_1.3.6 gtable_0.3.4
## [83] sass_0.4.8 digest_0.6.33
## [85] SparseArray_1.3.4 memoise_2.0.1
## [87] htmltools_0.5.7 lifecycle_1.0.4
## [89] httr_1.4.7 mime_0.12
## [91] bit64_4.0.5 MASS_7.3-60.1