MIG short course: Semisynthetic Simulation for Biological Data Analysis

Event information

When: Tuesday, 11 June 2024, 10am to 12pm AEST

Where: Room 162, Peter Hall, Building 160. 813 Swanston Street, Parkville VIC 3052

Registration: https://www.eventbrite.com.au/e/886052325357

From differential testing to data integration, simulation can inform effective biological data analysis. Because simulations let us control the ground truth, they can be used to test complex analysis workflows. Lately, several strategies have emerged for semisynthetic data generation, where simulated data are guided by existing experimental data rather than created from scratch. This has set the stage for more realistic, controllable simulation of omics data. This short course will offer a hands-on introduction to the effective design of omics simulation studies, divided into three sessions:

  • Fundamental Concepts: We will review the statistical concepts behind semisynthetic data generation. We will then introduce new software that can be used to estimate and apply these simulators.
  • Power Analysis and Benchmarking: We will explore how simulation can help with experimental design and methods benchmarking. Examples will be drawn from realistic differential testing and network analysis applications.
  • Integration and Refinement: We will review advanced simulation use cases, including supporting data integration across multiple cohorts and assays. We will then discuss systematic approaches for evaluating and improving simulator quality.

Familiarity with R programming and the Bioconductor package ecosystem will be helpful for following along. Necessary statistical concepts will be introduced from scratch. Bring a laptop: we will write code together, and all notebooks will be shared on GitHub.

Kris Sankaran is an assistant professor in the Department of Statistics at the University of Wisconsin–Madison and a Discovery Fellow at the Wisconsin Institute for Discovery. His research revolves around interactive workflows for biological data, especially in microbiome studies. His lab’s aim is to facilitate fluid, formal, and imaginative data analysis in problems critical to human and planetary health. He completed a postdoc in AI with Yoshua Bengio at Mila and a PhD in Statistics at Stanford University under the guidance of Susan Holmes.
