Vision

From three validated contributions to a clinical platform

The MediVerse programme has developed through three peer-reviewed papers, with a fourth, on the IVEM evaluation metric suite, in preparation. The IDHDB 2024 paper validated the hybrid edge-cloud framework, the JSAN 2025 paper introduced Trie-based shared-dictionary compression, and the PRICAI 2025 paper instantiated the voice-driven immersive analytics system. Each paper closes by stating its limitations and explicit directions for future work.

The roadmap below follows those threads. Each direction names the paper it extends, the open research question it addresses, and a candidate methodology, so that it can be taken on as a master's, PhD, or postdoctoral project.

Research directions

Seven thematic areas

Open to collaborators across computer science, biomedical informatics, and clinical research.

01

Generalise across omics modalities & clinical domains

Extend the hybrid edge-cloud architecture and compression methodology, validated on cytometry data in the IDHDB 2024 and JSAN 2025 papers, to whole-genome sequencing, RNA-seq, proteomics, and metabolomics. Each modality brings its own structural challenges (mass-spec peaks, splice variants, pathway annotations). In parallel, evaluate MediVerse beyond paediatric oncology, in domains such as cardiology, neurology, and epidemiology.

Builds on the IDHDB & JSAN papers · Modality-specific dictionaries · Multi-omics integration
02

Optimise edge processing and adaptive compression

Refine the edge-side data-cleaning pipeline and Trie traversal introduced in the JSAN 2025 paper to reduce computational overhead. Candidate techniques include parallel processing, incremental dictionary updates as cohorts grow, and adaptive dictionary construction that dynamically balances compression ratio against computational cost. A further question is how dictionaries can be shared between sites while preserving privacy in federated research scenarios.

Extends the JSAN paper · Incremental Trie updates · Federated dictionary sharing
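The incremental-update idea in this direction can be illustrated with a minimal sketch. It assumes a character trie over string entries (cytometry marker labels, chosen purely for illustration) and greedy longest-match encoding; the JSAN 2025 paper's actual Trie layout and code assignment may differ, and `SharedDictionary`, `add`, `encode`, and `decode` are hypothetical names. What it shows is that adding an entry touches only one root-to-leaf path and never renumbers existing codes, so data encoded against an older dictionary stays decodable as the cohort grows.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class TrieNode:
    # Child nodes keyed by the next character of an entry.
    children: dict[str, "TrieNode"] = field(default_factory=dict)
    # Dictionary code if a complete entry ends at this node, else None.
    code: Optional[int] = None


class SharedDictionary:
    """Hypothetical shared dictionary backed by a character trie."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.entries: list[str] = []  # code -> entry, append-only

    def add(self, entry: str) -> int:
        """Insert one entry incrementally; existing codes never change."""
        node = self.root
        for ch in entry:
            node = node.children.setdefault(ch, TrieNode())
        if node.code is None:
            node.code = len(self.entries)
            self.entries.append(entry)
        return node.code

    def encode(self, text: str) -> list[Union[int, str]]:
        """Greedy longest match: emit a code per dictionary hit, else the raw character."""
        out: list[Union[int, str]] = []
        i = 0
        while i < len(text):
            node, j = self.root, i
            best_code, best_end = None, i
            # Walk the trie as far as the text allows, remembering the longest full entry.
            while j < len(text) and text[j] in node.children:
                node = node.children[text[j]]
                j += 1
                if node.code is not None:
                    best_code, best_end = node.code, j
            if best_code is None:
                out.append(text[i])
                i += 1
            else:
                out.append(best_code)
                i = best_end
        return out

    def decode(self, tokens: list[Union[int, str]]) -> str:
        return "".join(self.entries[t] if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    d = SharedDictionary()
    for marker in ["CD45", "CD34", "CD3", "CD19"]:  # illustrative entries only
        d.add(marker)
    print(d.encode("CD34+CD19-"))  # -> [1, '+', 3, '-']
```

Because codes are append-only, sites could in principle exchange only the new tail of `entries` rather than the whole dictionary; whether that is acceptable under a federated privacy model is part of the open question above.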
03

Data reduction for large-scale (TB-PB) datasets

The IDHDB 2024 and JSAN 2025 papers evaluated the hybrid edge-cloud framework on cohorts in the gigabyte range. Production genomic projects routinely produce terabyte- and petabyte-scale data (whole-genome sequencing repositories, longitudinal cytometry, real-time imaging). Future research should extend the framework with streaming Trie construction over chunked input, distributed dictionary building across compute nodes, tiered lossless / near-lossless compression for archive versus working data, and format-aware compression for BAM, VCF, and FASTQ. The success criterion is a near-constant compression ratio as dataset size grows by three orders of magnitude, with sublinear memory cost on the edge node.

Extends the IDHDB & JSAN papers · TB-PB scale evaluation · Tiered lossy / lossless modes
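For the streaming requirement, one standard way to choose dictionary candidates from an unbounded stream with fixed memory is a heavy-hitters summary. The sketch below swaps in the Misra-Gries algorithm, which the papers do not describe; it assumes comma-separated tokens arriving in arbitrary chunks, and `stream_tokens` and `misra_gries` are hypothetical helpers. Memory stays bounded by `k` counters however large the input grows, which is the kind of edge-node behaviour the success criterion asks for.

```python
from collections.abc import Iterable, Iterator


def stream_tokens(chunks: Iterable[str], sep: str = ",") -> Iterator[str]:
    """Yield tokens from chunked input, carrying any token split across a
    chunk boundary over to the next chunk (hypothetical comma-separated format)."""
    carry = ""
    for chunk in chunks:
        parts = (carry + chunk).split(sep)
        carry = parts.pop()  # the last piece may continue in the next chunk
        yield from parts
    if carry:
        yield carry


def misra_gries(tokens: Iterable[str], k: int) -> dict[str, int]:
    """Misra-Gries summary with at most k - 1 counters (k >= 2).

    Any token occurring more than n / k times in a stream of n tokens is
    guaranteed to remain; stored counts are lower bounds on true counts.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: dict[str, int] = {}
    for tok in tokens:
        if tok in counters:
            counters[tok] += 1
        elif len(counters) < k - 1:
            counters[tok] = 1
        else:
            # No free counter: decrement all and evict those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    chunks = ["CD45,CD3", "4,CD45,", "CD19,CD45"]  # illustrative chunked input
    print(misra_gries(stream_tokens(chunks), k=3))  # -> {'CD45': 2}
```

Misra-Gries counts are underestimates, so a second pass (or an exact count restricted to the surviving candidates) would be needed before assigning dictionary codes by frequency.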
04

User evaluation of voice-driven VR analytics

The PRICAI 2025 paper reports strong technical performance (95% retrieval accuracy, 1,100-1,740 ms latency, 95% visualisation correctness), but comprehensive user studies with clinicians and biomedical researchers are still needed. Future work should run controlled trials comparing voice-driven VR analytics against desktop and SQL-based tools, measuring task completion time, error rate, learning curve, cognitive load, and satisfaction.

Extends the PRICAI paper · Controlled within-subjects trials · Longitudinal adoption studies
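The controlled trials in this direction compare three conditions on the same participants, so presentation order needs counterbalancing to stop learning and fatigue effects from favouring whichever tool comes last. A minimal sketch, assuming three hypothetical condition labels and participant IDs, is a cyclic Latin square in which each condition appears exactly once in every serial position:

```python
def latin_square_orders(conditions: list[str]) -> list[list[str]]:
    """Cyclic Latin square: row i is the condition list rotated by i, so each
    condition occupies every serial position exactly once across the rows."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]


def assign_orders(participants: list[str], conditions: list[str]) -> dict[str, list[str]]:
    """Give participant idx the row idx mod n; positions balance fully when the
    number of participants is a multiple of the number of conditions."""
    rows = latin_square_orders(conditions)
    return {p: rows[idx % len(rows)] for idx, p in enumerate(participants)}


if __name__ == "__main__":
    conditions = ["voice_vr", "desktop", "sql"]  # hypothetical labels
    participants = [f"P{i:02d}" for i in range(1, 7)]  # hypothetical IDs
    for pid, order in assign_orders(participants, conditions).items():
        print(pid, " -> ".join(order))
```

A cyclic square balances position but not carry-over: here `voice_vr` is always immediately followed by `desktop`. If carry-over effects are a concern, a Williams design (six orders for three conditions) would be the usual refinement.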
05

Immersive VR vs. 2D desktop in voice-driven environments

A specific sub-question raised by the PRICAI 2025 paper: when the input modality is held constant as voice, does immersive 3D rendering in VR meaningfully outperform a 2D desktop rendering of the same query results? Methodology: a within-subjects controlled study in which the same MediVerse voice pipeline drives two render targets (the existing Quest VR client and an equivalent web/desktop 2D view) on identical paediatric leukaemia and rhabdomyosarcoma tasks. Measure recall accuracy on multi-dimensional spatial patterns, time-to-insight, motion sickness, and user preference. The outcome separates the contribution of immersion from that of voice, which current evaluations conflate.

Extends the PRICAI paper · Same voice pipeline, VR vs 2D · Isolates immersion from voice
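Because the same participants use both render targets, the natural analysis works on per-participant differences rather than on two independent groups. A minimal sketch of the paired t statistic for a measure such as time-to-insight, using only the Python standard library; the function name and the sample values are hypothetical, and a p-value would still need the t distribution (for example via `scipy.stats.ttest_rel`), or a Wilcoxon signed-rank test if the differences are clearly non-normal:

```python
import math
import statistics


def paired_t(vr: list[float], desktop: list[float]) -> tuple[float, float, int]:
    """Return (mean difference, t statistic, degrees of freedom) for matched
    measurements, one pair per participant, with difference = vr - desktop."""
    if len(vr) != len(desktop) or len(vr) < 2:
        raise ValueError("need matched pairs from at least two participants")
    diffs = [a - b for a, b in zip(vr, desktop)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return mean_d, t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical time-to-insight values in seconds, not study data.
    vr_times = [41.0, 35.5, 50.2, 38.9, 44.1]
    desktop_times = [47.3, 39.0, 52.8, 45.6, 46.0]
    mean_d, t, df = paired_t(vr_times, desktop_times)
    print(f"mean diff = {mean_d:.2f} s, t({df}) = {t:.2f}")
```

A negative mean difference here would indicate faster insight in VR; the same function applies unchanged to recall accuracy or preference scores.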
06

IVEM threshold validation across multiple systems

The IVEM metric suite (paper in preparation) defines six metrics, with threshold values derived from first principles and related literature (e.g. Scene Stability Index ≥ 0.90, Update Latency < 2.0 s). Future research must empirically validate these thresholds across diverse immersive analytics platforms (scientific visualisation, business intelligence, educational tools) and user populations, to establish their generalisability and to identify interaction effects among the rendering, AI-decision, and explanation dimensions.

Extends the IVEM paper (in prep) · Cross-system metric application · Threshold range calibration
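Cross-system validation amounts to applying the same thresholds to measurements from many platforms and asking how often each one holds. A minimal sketch, encoding only the two thresholds quoted in this direction; the other four IVEM metrics and their limits are not given here, so they are left out rather than guessed, and the metric keys and system names are hypothetical:

```python
import operator
from typing import Callable

# Only the two thresholds quoted in the text; key names are hypothetical.
THRESHOLDS: dict[str, tuple[Callable[[float, float], bool], float]] = {
    "scene_stability_index": (operator.ge, 0.90),  # SSI >= 0.90
    "update_latency_s": (operator.lt, 2.0),        # latency < 2.0 s
}


def check_system(measurements: dict[str, float]) -> dict[str, bool]:
    """Return pass/fail per metric for one system's measured values."""
    return {
        name: cmp(measurements[name], limit)
        for name, (cmp, limit) in THRESHOLDS.items()
    }


def pass_rates(systems: dict[str, dict[str, float]]) -> dict[str, float]:
    """Fraction of systems meeting each threshold across all measured systems."""
    results = [check_system(m) for m in systems.values()]
    return {
        name: sum(r[name] for r in results) / len(results)
        for name in THRESHOLDS
    }


if __name__ == "__main__":
    # Hypothetical measurements, not results from any real platform.
    systems = {
        "sci_vis": {"scene_stability_index": 0.93, "update_latency_s": 1.4},
        "bi_dashboard": {"scene_stability_index": 0.85, "update_latency_s": 2.3},
    }
    print(pass_rates(systems))
```

A threshold that almost every well-regarded system fails, or that every system passes trivially, would be a natural candidate for the threshold-range calibration this direction calls for.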
07

Clinical deployment & collaboration infrastructure

Test the integrated framework in varied healthcare settings, from well-resourced academic centres to resource-limited community hospitals. In parallel, develop a dynamic marketplace for machine-learning models that enables researchers to submit, evaluate, and deploy models across genomics projects - extending the framework into a collaborative ecosystem rather than a single-institution tool. This direction synthesises the contributions across all four MediVerse papers into a deployable platform.

Synthesises all four papers · Cross-tier hospital deployment · ML model marketplace
Closing thought

The journey has only begun

As biomedical data continues to grow in volume and complexity, the need for intelligent infrastructure that can manage, compress, and present information effectively will only intensify. Meeting these challenges requires not isolated technical solutions but integrated systems that address the full pipeline from data acquisition to insight generation.

The future of biomedical research lies not in requiring more researchers to become programmers, but in creating systems that understand human intent and translate it into appropriate computational actions. The goal remains unchanged: to accelerate scientific discovery and improve patient outcomes by ensuring that the insights hidden in biomedical data are accessible to those who can best use them, when and where they need them.

Get involved

Collaborate on the next phase

If you're a clinician, computer scientist, or PhD candidate interested in any of the directions above, please get in touch. The MediVerse codebase, evaluation datasets (de-identified), and IVEM metric suite are available for academic collaborations.

Email Rani · Read the underlying papers · Read the abstract