

imputomics: taming missing values in metabolomics data 🧬🧩

publications
metabolomics
imputomics is a web server and R package providing a unified interface for 40+ missing value imputation algorithms, enabling robust and reproducible metabolomics data analysis.
Author

BioGenies Lab

Published

February 20, 2024

Keywords

imputomics, missing data, metabolomics, imputation, bioinformatics tools, machine learning, data preprocessing


📌 Project highlights

  • 🧬 Integrates 41+ missing value imputation algorithms (MVIAs)
  • ⚙️ Provides R package + web server
  • 📊 Enables benchmark-based method selection
  • 🔍 Supports MCAR, MAR, MNAR missingness simulation
  • 🚀 Improves reproducibility and usability of imputation workflows

🎉 New paper out! This one tackles a very real problem:

👉 missing values quietly breaking your omics analysis 😄

👉 imputomics: web server and R package for missing values imputation in metabolomics data


🔗 Try it yourself

  • 🌐 Web server
  • 💻 GitHub

👉 Everything in one place, no dependency hell 🙃


🎧 Audio summary

Missing values, 50+ algorithms, and no clear best choice?
Yeah… welcome to metabolomics 😄

👉 Here’s a short audio overview 🎧 explaining what imputomics actually solves:


👉 Perfect if you want the big picture before diving into 40+ methods


🔬 What is this about?

Missing values are everywhere in metabolomics data:

  • instrument sensitivity limits
  • sample variability
  • technical noise

👉 and ignoring them leads to:

  • biased results
  • reduced statistical power
  • broken ML models

⚙️ The core problem

There are many imputation methods…

👉 too many 😄

  • 52 MVIAs
  • 70 implementations
  • scattered across packages

👉 and they come with:

  • dependency issues
  • missing documentation
  • unstable implementations

🧠 What we built

👉 imputomics = unified wrapper + benchmark + interface

🔧 Core idea:

  • one interface
  • many methods
  • consistent behavior

⚙️ Key features

🧬 Massive method coverage

  • integrates 41 out of 52 MVIAs
  • includes baseline (random imputation)

👉 largest practical collection in one tool


⚙️ Unified interface (this is HUGE)

Each method is wrapped to:

  • accept consistent inputs
  • return standardized outputs
  • avoid breaking your pipeline

👉 no more fighting R packages 😄
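
To make the "one interface, many methods" idea concrete, here is a minimal sketch in Python. It is not the imputomics API (which is an R package): the names `IMPUTERS`, `register`, and `impute` are hypothetical, and the two toy methods only stand in for real algorithms. The point is the contract check in `impute`, which rejects any wrapped method that changes the shape, leaves a gap, or alters an observed value.

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]  # rows = samples, columns = metabolites, NaN = missing

# Hypothetical registry mapping a method name to its wrapped implementation.
IMPUTERS: Dict[str, Callable[[Matrix], Matrix]] = {}


def register(name: str):
    """Decorator that adds an imputation function to the registry."""
    def wrap(fn: Callable[[Matrix], Matrix]):
        IMPUTERS[name] = fn
        return fn
    return wrap


@register("zero")
def fill_zero(data: Matrix) -> Matrix:
    """Toy method: replace every missing value with 0."""
    return [[0.0 if math.isnan(v) else v for v in row] for row in data]


@register("mean")
def fill_mean(data: Matrix) -> Matrix:
    """Toy method: replace missing values with the mean of the observed column values."""
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if not math.isnan(row[j])]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(row)] for row in data]


def impute(data: Matrix, method: str) -> Matrix:
    """Single entry point: same input type and same output contract for every method."""
    if method not in IMPUTERS:
        raise ValueError(f"unknown method {method!r}; available: {sorted(IMPUTERS)}")
    result = IMPUTERS[method](data)
    # Enforce the contract so a misbehaving method fails loudly instead of silently.
    if [len(r) for r in result] != [len(r) for r in data]:
        raise ValueError(f"{method} changed the shape of the data")
    for new_row, old_row in zip(result, data):
        for new, old in zip(new_row, old_row):
            if math.isnan(new) or (not math.isnan(old) and new != old):
                raise ValueError(f"{method} left a gap or altered an observed value")
    return result
```

With this shape, swapping methods is a one-word change: `impute(data, "zero")` versus `impute(data, "mean")`.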


📊 Built-in benchmarking

The tool helps answer:

👉 which method should I use?

  • evaluates performance using normalized root mean square error (NRMSE)
  • compares speed
  • tracks stability

👉 because there is no universal best method
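
For readers who have not met NRMSE before, here is a minimal Python sketch of the metric. It assumes one common convention: mean squared error on the masked entries, divided by the variance of the true values, then square-rooted. This post does not spell out the exact normalization used in the paper, so treat the formula and the function name `nrmse` as assumptions.

```python
import math
from typing import Sequence


def nrmse(true_vals: Sequence[float], imputed_vals: Sequence[float]) -> float:
    """NRMSE over the entries that were masked and then imputed.

    Assumed convention: sqrt(mean squared error / variance of the true values).
    """
    n = len(true_vals)
    if n != len(imputed_vals) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mse = sum((t - i) ** 2 for t, i in zip(true_vals, imputed_vals)) / n
    mean_true = sum(true_vals) / n
    var_true = sum((t - mean_true) ** 2 for t in true_vals) / n
    if var_true == 0:
        raise ValueError("true values are constant, NRMSE is undefined")
    return math.sqrt(mse / var_true)
```

Under this convention a perfect imputation scores 0 and always filling in the mean of the true values scores exactly 1, so lower is better and values near or above 1 are a warning sign.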


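🧪 Missingness simulation

Supports:

  • MCAR (missing completely at random)
  • MAR (missing at random: missingness depends on observed values)
  • MNAR (missing not at random: censored values, e.g. below the limit of detection, LOD)

👉 mirrors real-world metabolomics scenarios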
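
The three mechanisms are easier to tell apart in code. The Python sketch below is an illustration only, not how imputomics simulates missingness: the function names are hypothetical, and the MAR rule (rows whose driver column is above its median lose cells at twice the requested ratio) is just one simple way to make missingness depend on an observed variable.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = metabolites
NAN = float("nan")


def mask_mcar(data: Matrix, ratio: float, rng: random.Random) -> Matrix:
    """MCAR: every cell goes missing with the same probability."""
    return [[NAN if rng.random() < ratio else v for v in row] for row in data]


def mask_mar(data: Matrix, ratio: float, rng: random.Random, driver_col: int = 0) -> Matrix:
    """MAR: missingness in the other columns depends on an observed driver column."""
    ranked = sorted(row[driver_col] for row in data)
    median = ranked[len(ranked) // 2]
    p_high = min(1.0, 2.0 * ratio)  # only rows above the median lose cells
    out = []
    for row in data:
        high = row[driver_col] > median
        out.append([
            NAN if (j != driver_col and high and rng.random() < p_high) else v
            for j, v in enumerate(row)
        ])
    return out


def mask_mnar(data: Matrix, ratio: float) -> Matrix:
    """MNAR: per column, the lowest values are censored, mimicking a limit of detection."""
    n_rows = len(data)
    cutoffs = []
    for j in range(len(data[0])):
        ranked = sorted(row[j] for row in data)
        cutoffs.append(ranked[min(int(ratio * n_rows), n_rows - 1)])
    return [[NAN if v < cutoffs[j] else v for j, v in enumerate(row)] for row in data]
```

Masking a complete dataset this way keeps the true values available, which is what makes scoring an imputation method possible afterwards.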


🌐 Web server for humans

Not everyone wants to code in R:

👉 imputomics provides a Shiny app:

  • visualize missingness
  • run imputation
  • compare methods

👉 accessible to non-programmers


📊 Key insights from the paper

⚠️ Many methods are unreliable

  • ~40% of methods fail in some scenarios
  • some don’t outperform random baseline

👉 yes… random can win 😄
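
What does a "random baseline" look like in practice? One plausible reading, sketched below in Python, is to fill each gap with a value drawn at random from the observed values of the same column. This is an assumption for illustration: the post does not define the baseline, and `random_baseline` is a hypothetical name, not a function from the package.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = metabolites, NaN = missing


def random_baseline(data: Matrix, rng: random.Random) -> Matrix:
    """Fill each missing cell with a random observed value from the same column."""
    n_cols = len(data[0])
    observed = [[row[j] for row in data if not math.isnan(row[j])] for j in range(n_cols)]
    out = []
    for row in data:
        new_row = []
        for j, v in enumerate(row):
            if not math.isnan(v):
                new_row.append(v)  # observed values are never touched
            elif observed[j]:
                new_row.append(rng.choice(observed[j]))
            else:
                raise ValueError(f"column {j} has no observed values to draw from")
        out.append(new_row)
    return out
```

A method worth using should score clearly better than this on the same masked data; if it does not, its extra complexity buys nothing.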


⚡ Speed ≠ quality

  • no strong correlation between runtime and performance

👉 fast ≠ good


🧬 Best method depends on data

Different winners depending on missingness:

  • MCAR → Random Forest
  • MAR → EM-based methods
  • MNAR → minimum-based approaches

👉 context matters
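
To show why minimum-based approaches suit censored data, here is a minimal Python sketch of half-minimum imputation, one widely used member of that family. It is an illustration under assumptions: the post does not say which minimum-based variants performed best, `impute_half_min` is a hypothetical name, and the rule only makes sense for non-negative intensities.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = metabolites, NaN = missing


def impute_half_min(data: Matrix) -> Matrix:
    """Replace each missing value with half of the smallest observed value in its column."""
    fills = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if not math.isnan(row[j])]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        fills.append(min(observed) / 2.0)
    return [[fills[j] if math.isnan(v) else v for j, v in enumerate(row)] for row in data]
```

The logic matches the MNAR story: if a value is missing because it fell below the detection limit, a small value below everything observed is a more defensible guess than the column mean.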


🚀 Why this matters

Missing value handling is:

👉 one of the most underestimated steps in omics

And yet it affects:

  • statistical tests
  • ML models
  • biological interpretation

👉 imputomics turns this into a controlled, testable step


💚 BioGenies perspective

This is exactly our philosophy:

  • don’t hide preprocessing
  • don’t trust defaults
  • benchmark everything

👉 because bad preprocessing = bad science

 
