The BaseJumper™ research platform for single cell multiomic analysis and interactive visualization

V. Weigman, V. Amin, I. Salas Gonzalez, T. Tate, C. Culler; BioSkryb Genomics, Durham, NC

Interpreting molecular alterations, especially those within a single cell, can be a powerful way to understand the underlying mechanisms of disease and somatic mosaicism. Beyond managing the vast data volume produced by sequencing technologies, the complexity of these changes requires a multi-dimensional approach that first computes and identifies the variance found within the cells of a study and then displays these effects concomitantly. The BaseJumper platform was designed for researchers across computational skill levels, placing more emphasis on the ability to ask direct questions across the multiple forms of data within their dataset.

We present a secure, cloud-based bioinformatics and visualization platform that can be run from any laboratory computer in a standard web browser. Samples can be pulled from local and cloud repositories to create analysis projects that are either wholly contained within an institution or shared with collaborators for distributed analysis. Pipelines provided within the platform enable simple (SNV/insertion/deletion) and complex (structural/copy number/highly polymorphic) variant detection, in addition to cell identification, cell state, and pathway status from gene expression. The pipelines have been benchmarked on single cells from NIST reference standards such as NA12878, achieving ~94% accuracy and ~99% precision for variant identification.

Context for variation across cells is provided through annotation against several established repositories for genetic variation (gnomAD, ExAC, 1000 Genomes) and clinical relevance (COSMIC, ClinVar). These annotations can be queried directly for prevalence or combined with other filtering strategies, and the resulting filter schemas can be saved and toggled within and across research groups. Filtering across a whole-genome dataset (hundreds of thousands to tens of millions of variants) takes just a few seconds. Using cellular or sample phenotypes, different methods can be applied directly to datasets to run association studies and present the important genotypes, without the need to provision additional compute. Native to the platform is the ability to toggle these associative markers across the transcript and genome levels of paired datasets. Output from one analysis can be used to guide visualization, or to narrow the focus, of another data type. Visualization applications can combine genomic and transcriptomic results, and the views can be manipulated according to the researcher's expert interpretation to maximize biological leverage. Compiling all of this within a browser maximizes the accessibility of the data, so you can perform analyses anywhere inspiration takes you.
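As an illustration only, the idea of a saved, shareable filter schema built on population-frequency and clinical annotations could be sketched as follows. This is not the platform's implementation or API: the `FilterSchema` class, its field names, the default thresholds, and the toy variant records are all hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class FilterSchema:
    """Hypothetical saved filter; not the platform's actual schema."""
    name: str
    max_gnomad_af: float = 0.01  # assumed rarity cutoff
    clinvar_allowed: tuple = ("Pathogenic", "Likely pathogenic")
    require_cosmic: bool = False

    def keep(self, variant: dict) -> bool:
        """Return True if an annotated variant passes every criterion."""
        af = variant.get("gnomad_af")  # None = not observed in gnomAD
        if af is not None and af > self.max_gnomad_af:
            return False
        if self.clinvar_allowed and variant.get("clinvar") not in self.clinvar_allowed:
            return False
        if self.require_cosmic and not variant.get("cosmic_id"):
            return False
        return True

    def to_json(self) -> str:
        """Serialize the schema so it can be stored or shared."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "FilterSchema":
        """Rebuild a schema from its JSON form."""
        data = json.loads(text)
        data["clinvar_allowed"] = tuple(data["clinvar_allowed"])
        return cls(**data)


def apply_filter(variants, schema):
    """Keep only the variants that pass the schema."""
    return [v for v in variants if schema.keep(v)]


# Toy annotated records (made up for illustration, not real data).
variants = [
    {"id": "var_a", "gnomad_af": 0.0002, "clinvar": "Pathogenic", "cosmic_id": "toy_1"},
    {"id": "var_b", "gnomad_af": 0.15, "clinvar": "Pathogenic", "cosmic_id": None},
    {"id": "var_c", "gnomad_af": None, "clinvar": "Benign", "cosmic_id": None},
    {"id": "var_d", "gnomad_af": None, "clinvar": "Likely pathogenic", "cosmic_id": None},
]

rare_pathogenic = FilterSchema(name="rare_pathogenic")
kept = apply_filter(variants, rare_pathogenic)
print([v["id"] for v in kept])  # ['var_a', 'var_d']
```

Serializing the schema with `to_json` and restoring it with `from_json` is one simple way a filter could be saved and reused by another group; how the platform actually stores and shares schemas is not described in the abstract.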