Introduction

Welcome to this guide to handling missing values in omics data through imputation. This vignette shows you how to address missing data in your omics datasets with the imputomics package. With suitable imputation techniques, you can make fuller use of your data and obtain more robust analysis results.

Getting Started with the Package

Before diving into omics data imputation, let’s install the imputomics package. You can install it from GitHub using the following command:

devtools::install_github("BioGenies/imputomics")

Once the package is installed, load it into your R environment:

library(imputomics)

Now you can access the imputation methods through the package’s graphical interface or call the package functions directly.

For the purpose of this guide, we’ll use an example dataset that simulates the missingness you may encounter in real-world data. Let’s load this example dataset:

data(sim_miss)

sim_miss
#>             X1         X2        X3           X4          X5         X6
#> 1  0.155050833 0.88115677        NA 0.1194841210 0.600405471         NA
#> 2  0.968378809 0.60748407 0.6337790 0.6181194666 0.008132938 0.75207300
#> 3  0.468263086 0.57452956 0.2476089           NA 0.370125689         NA
#> 4  0.776819652 0.80313329 0.5513393 0.1459775318 0.724248714 0.84777606
#> 5  0.407885741 0.79911692 0.2347611 0.1897135850 0.418270750 0.70873497
#> 6  0.538797149 0.80186265 0.2586147 0.5754945197 0.238248518         NA
#> 7  0.830082966 0.75207300 0.9528881 0.0001881735 0.887881226 0.48833525
#> 8  0.187103555 0.57546133 0.8568746 0.3283282877 0.577775384 0.34722847
#> 9  0.779969688 0.94421829        NA 0.0782780354 0.739950257 0.20304801
#> 10 0.193943927 0.21898736 0.5545242 0.6041721753 0.160679622 0.73995026
#> 11 0.434231178 0.47799791 0.8769438 0.3267990556 0.529571799 0.89369752
#> 12 0.002274518 0.04466879 0.8018626 0.7904525734 0.574529562 0.57452956
#> 13 0.834692139 0.16848571 0.5467373 0.8300829662 0.960086967 0.08444870
#> 14          NA 0.36828087 0.2327532 0.8100510929 0.398616886 0.14825833
#> 15 0.956967818 0.56331137 0.6022678 0.2027530000 0.408468068 0.37028128
#> 16 0.948497073         NA 0.7125391 0.5060450307 0.613321482 0.86117762
#> 17 0.600729976 0.73432757 0.6489036 0.5206679751 0.706590850 0.83668180
#> 18 0.261807405 0.70365677 0.1875624 0.0842680801 0.799116918 0.74054705
#> 19 0.643034251 0.73682406 0.2383929 0.2687653562 0.161331222 0.98576580
#> 20 0.526233507         NA        NA 0.8147157354 0.885932416 0.03208182
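
Before imputing, it can help to summarise how much data is actually missing. The sketch below uses only base R and a small made-up data frame (`toy`, a hypothetical stand-in for `sim_miss`), so it runs without the package. The same calls should work on `sim_miss`, assuming it is an ordinary data frame as the printout above suggests.

```r
# Hypothetical toy data standing in for sim_miss (values are made up)
toy <- data.frame(
  X1 = c(0.16, 0.97, NA, 0.78, 0.41),
  X2 = c(0.88, NA, 0.57, 0.80, 0.80),
  X3 = c(NA, 0.63, 0.25, 0.55, 0.23)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Overall fraction of missing cells
na_fraction <- mean(is.na(toy))

# Number of rows without any missing value
complete_rows <- sum(complete.cases(toy))

print(na_per_column)
print(na_fraction)
print(complete_rows)
```

A summary like this gives a rough idea of whether missingness is concentrated in a few variables or spread across the whole dataset.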

Congratulations, you’re now ready to embark on a journey to conquer missing data and unleash the true potential of your omics analyses! In the upcoming sections, we’ll explore the various imputation methods at your disposal, delve into missingness patterns, and provide hands-on examples to seamlessly integrate imputation into your workflow. Let’s begin!

Imputation

In omics data analysis, a variety of imputation techniques are available to address missing values. The imputomics package offers 43 implementations of missing value imputation methods. The name of each imputation function starts with impute_. You can list all the imputation functions with:

list_imputations()
#>  [1] "impute_amelia"              "impute_areg"               
#>  [3] "impute_bayesmetab"          "impute_bcv_svd"            
#>  [5] "impute_bpca"                "impute_cm"                 
#>  [7] "impute_corknn"              "impute_eucknn"             
#>  [9] "impute_gsimp"               "impute_halfmin"            
#> [11] "impute_imputation_knn"      "impute_knn"                
#> [13] "impute_mai"                 "impute_mean"               
#> [15] "impute_median"              "impute_metabimpute_bpca"   
#> [17] "impute_metabimpute_gsimp"   "impute_metabimpute_halfmin"
#> [19] "impute_metabimpute_mean"    "impute_metabimpute_median" 
#> [21] "impute_metabimpute_min"     "impute_metabimpute_qrilc"  
#> [23] "impute_metabimpute_rf"      "impute_metabimpute_zero"   
#> [25] "impute_mice_cart"           "impute_mice_mixed"         
#> [27] "impute_mice_pmm"            "impute_mice_rf"            
#> [29] "impute_min"                 "impute_missforest"         
#> [31] "impute_missmda_em"          "impute_mnmf"               
#> [33] "impute_nipals"              "impute_pemm"               
#> [35] "impute_ppca"                "impute_qrilc"              
#> [37] "impute_random"              "impute_regimpute"          
#> [39] "impute_softimpute"          "impute_svd"                
#> [41] "impute_tknn"                "impute_vim_knn"            
#> [43] "impute_zero"
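
Because the listing above is printed as a character vector of function names, ordinary string tools can be used to narrow it down. The following is a minimal sketch in base R. The vector `method_names` is a hand-copied subset of the names shown above, used here as a stand-in so that the example runs without the package.

```r
# A few names copied from the listing above (a stand-in for the full
# result of list_imputations(), so the sketch is self-contained)
method_names <- c("impute_knn", "impute_mean", "impute_median",
                  "impute_vim_knn", "impute_zero")

# Keep only the methods whose name mentions kNN
knn_methods <- grep("knn", method_names, value = TRUE)

# Check that every name follows the impute_ prefix convention
all_prefixed <- all(startsWith(method_names, "impute_"))

print(knn_methods)
print(all_prefixed)
```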

Now that you have seen the available methods, all that is left is to impute your data! As an example, we’ll impute the data using the Bayesian Principal Component Analysis (BPCA) method:

impute_bpca(sim_miss)
#>             X1         X2        X3           X4          X5         X6
#> 1  0.155050833 0.88115677 0.5375502 0.1194841210 0.600405471 0.56556562
#> 2  0.968378809 0.60748407 0.6337790 0.6181194666 0.008132938 0.75207300
#> 3  0.468263086 0.57452956 0.2476089 0.4112819348 0.370125689 0.56556562
#> 4  0.776819652 0.80313329 0.5513393 0.1459775318 0.724248714 0.84777606
#> 5  0.407885741 0.79911692 0.2347611 0.1897135850 0.418270750 0.70873497
#> 6  0.538797149 0.80186265 0.2586147 0.5754945197 0.238248518 0.56556562
#> 7  0.830082966 0.75207300 0.9528881 0.0001881735 0.887881226 0.48833525
#> 8  0.187103555 0.57546133 0.8568746 0.3283282877 0.577775384 0.34722847
#> 9  0.779969688 0.94421829 0.5375502 0.0782780354 0.739950257 0.20304801
#> 10 0.193943927 0.21898736 0.5545242 0.6041721753 0.160679622 0.73995026
#> 11 0.434231178 0.47799791 0.8769438 0.3267990556 0.529571799 0.89369752
#> 12 0.002274518 0.04466879 0.8018626 0.7904525734 0.574529562 0.57452956
#> 13 0.834692139 0.16848571 0.5467373 0.8300829662 0.960086967 0.08444870
#> 14 0.553408593 0.36828087 0.2327532 0.8100510929 0.398616886 0.14825833
#> 15 0.956967818 0.56331137 0.6022678 0.2027530000 0.408468068 0.37028128
#> 16 0.948497073 0.59753202 0.7125391 0.5060450307 0.613321482 0.86117762
#> 17 0.600729976 0.73432757 0.6489036 0.5206679751 0.706590850 0.83668180
#> 18 0.261807405 0.70365677 0.1875624 0.0842680801 0.799116918 0.74054705
#> 19 0.643034251 0.73682406 0.2383929 0.2687653562 0.161331222 0.98576580
#> 20 0.526233507 0.59753202 0.5375502 0.8147157354 0.885932416 0.03208182
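
After imputation, a quick sanity check is to confirm that no missing values remain and that the originally observed values were left untouched, as in the output above. The sketch below illustrates such a check in base R. `impute_column_mean()` is a hypothetical helper written for this example only (simple column-mean imputation); it is not part of imputomics and is not how BPCA works. In practice you would replace it with the result of an `impute_*` call.

```r
# Hypothetical toy data with missing values
toy <- data.frame(X1 = c(1, NA, 3), X2 = c(NA, 4, 6))

# Hypothetical stand-in for an impute_* function: replaces each NA
# with the mean of the observed values in the same column
impute_column_mean <- function(df) {
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  df
}

imputed <- impute_column_mean(toy)

# Positions that were missing before imputation
was_missing <- is.na(toy)

# Check 1: no missing values remain
no_na_left <- !any(is.na(imputed))

# Check 2: observed values are identical before and after imputation
observed_unchanged <- all(
  as.matrix(toy)[!was_missing] == as.matrix(imputed)[!was_missing]
)

print(imputed)
print(no_na_left)
print(observed_unchanged)
```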