Open to collaborators across computer science, biomedical informatics, and clinical research.
01
Generalise across omics modalities & clinical domains
Extend the hybrid edge-cloud architecture and compression methodology, validated on
cytometry data in the IDHDB 2024 and JSAN 2025 papers, to whole-genome sequencing,
RNA-seq, proteomics, and metabolomics. Each modality brings its own structural
challenges (mass-spec peaks, splice variants, pathway annotations). In parallel,
evaluate MediVerse beyond paediatric oncology, in cardiology, neurology, and
epidemiology.
Builds on the IDHDB & JSAN papers · Modality-specific dictionaries · Multi-omics integration
02
Optimise edge processing and adaptive compression
Refine the edge-side data-cleaning pipeline and Trie traversal introduced in the
JSAN 2025 paper to reduce computational overhead. Investigate parallel processing,
incremental dictionary updates as cohorts grow, and adaptive dictionary construction
that dynamically balances compression ratio against computational cost. Also explore
privacy-preserving dictionary sharing for federated research scenarios.
Extends the JSAN paper · Incremental Trie updates · Federated dictionary sharing
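As a rough illustration of what incremental dictionary updates could look like, the sketch below grows a Trie-backed dictionary as new entries arrive while keeping previously issued codes stable, so data encoded earlier stays decodable. It is not the Trie design from the JSAN 2025 paper: the class names, the character-level keys, the greedy longest-match encoding, and the sample marker strings are all assumptions made for illustration.

```python
class TrieNode:
    __slots__ = ("children", "code")

    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.code = None     # integer code if a dictionary entry ends here


class TrieDictionary:
    """Hypothetical dictionary: entries can be added at any time, codes never change."""

    def __init__(self):
        self.root = TrieNode()
        self.entries = []    # code -> entry string

    def add(self, entry):
        """Insert an entry and return its code; re-adding returns the existing code."""
        node = self.root
        for ch in entry:
            node = node.children.setdefault(ch, TrieNode())
        if node.code is None:
            node.code = len(self.entries)
            self.entries.append(entry)
        return node.code

    def encode(self, text):
        """Greedy longest-match: emit an int code for a match, else a literal character."""
        out, i = [], 0
        while i < len(text):
            node, best_code, best_len, j = self.root, None, 0, i
            while j < len(text) and text[j] in node.children:
                node = node.children[text[j]]
                j += 1
                if node.code is not None:
                    best_code, best_len = node.code, j - i
            if best_code is None:
                out.append(text[i])
                i += 1
            else:
                out.append(best_code)
                i += best_len
        return out

    def decode(self, tokens):
        return "".join(self.entries[t] if isinstance(t, int) else t for t in tokens)


# Illustrative strings only; not taken from any of the papers.
d = TrieDictionary()
d.add("CD45+")
d.add("CD45+CD19+")
first_batch = d.encode("CD45+CD19+;CD45+")

d.add("CD3+")  # later cohort: new entry, existing codes untouched
print(first_batch, d.decode(first_batch))
```

Because codes are append-only, an edge node could extend its dictionary without re-encoding older batches; whether that trade-off suits the actual pipeline is an open question for this direction.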
03
Data reduction for large-scale (TB-PB) datasets
The IDHDB 2024 and JSAN 2025 papers evaluated the hybrid edge-cloud framework on
cohorts in the gigabyte range.
Production genomic projects routinely produce terabyte- and petabyte-scale data
(whole-genome sequencing repositories, longitudinal cytometry, real-time imaging).
Future research should extend the framework with streaming Trie construction over
chunked input, distributed dictionary building across compute nodes, tiered
lossless / near-lossless compression for archive versus working data, and
format-aware compression for BAM, VCF, and FASTQ. The success criterion is a
near-constant compression ratio as dataset size grows by three orders of magnitude,
with sublinear memory cost on the edge node.
Extends the IDHDB & JSAN papers · TB-PB scale evaluation · Tiered lossy / lossless modes
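One possible way to make the success criterion above testable is sketched here. It treats "near-constant compression ratio" as a bounded relative spread across dataset sizes and "sublinear memory cost" as a log-log slope below 1 for peak memory against dataset size. The 10% tolerance, the function names, and the sample measurements are assumptions for illustration, not values from the papers.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x); a slope < 1 means sublinear growth."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def meets_criterion(sizes_gb, ratios, peak_mem_gb, ratio_tol=0.10):
    """True if the ratio spread is within ratio_tol of the mean and memory grows sublinearly."""
    mean_ratio = sum(ratios) / len(ratios)
    spread = (max(ratios) - min(ratios)) / mean_ratio
    return spread <= ratio_tol and loglog_slope(sizes_gb, peak_mem_gb) < 1.0


# Made-up measurements spanning three orders of magnitude (1 GB to 1 TB).
sizes_gb = [1, 10, 100, 1000]
ratios = [4.0, 4.1, 3.9, 4.0]
peak_mem_gb = [0.5, 1.4, 4.2, 13.0]

print(round(loglog_slope(sizes_gb, peak_mem_gb), 2), meets_criterion(sizes_gb, ratios, peak_mem_gb))
```

A real evaluation would need repeated runs per size and a justified tolerance; the point of the sketch is only that both halves of the criterion can be reduced to simple, reportable numbers.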
04
User evaluation of voice-driven VR analytics
The PRICAI 2025 paper reports strong technical performance (95% retrieval accuracy,
1100-1740 ms latency, 95% visualisation correctness), but comprehensive user studies
with clinicians and biomedical researchers are still needed.
Future work should run controlled trials comparing voice-driven VR analytics against
desktop and SQL-based tools across task completion time, error rate, learning curve,
cognitive load, and satisfaction.
Extends the PRICAI paper · Controlled within-subjects trials · Longitudinal adoption studies
05
Immersive VR vs. 2D desktop in voice-driven environments
A specific sub-question raised by the PRICAI 2025 paper: when the input modality is
held constant as voice, does immersive 3D rendering in VR meaningfully outperform a 2D
desktop rendering of the same query results? The proposed methodology is a
within-subjects controlled study in which the same MediVerse voice pipeline drives two
render targets (the existing Quest VR client and an equivalent web/desktop 2D view) on
identical paediatric leukaemia and rhabdomyosarcoma tasks. Measures include recall
accuracy on multi-dimensional spatial patterns, time-to-insight, motion sickness, and
user preference. The outcome would separate the contribution of immersion from that of
voice, which are currently conflated in evaluation.
Extends the PRICAI paper · Same voice pipeline, VR vs 2D · Isolates immersion from voice
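In a within-subjects design like the one described above, every participant uses both render targets, so the order of exposure is usually counterbalanced to keep learning and fatigue effects from favouring one condition. The sketch below shows a simple AB/BA assignment. The condition labels, the function name, and the participant IDs are hypothetical, and the study may well need a richer scheme (for example, randomised rather than alternating assignment).

```python
from itertools import cycle

# The two render targets from the text; the short labels are this sketch's own.
CONDITIONS = ("VR", "2D")


def counterbalance(participant_ids):
    """Alternate VR-first and 2D-first orders across participants (AB/BA design)."""
    orders = cycle([CONDITIONS, CONDITIONS[::-1]])
    return {pid: order for pid, order in zip(participant_ids, orders)}


plan = counterbalance(["P01", "P02", "P03", "P04"])  # hypothetical IDs
for pid, order in plan.items():
    print(pid, "->", " then ".join(order))
```

With an even number of participants, half start in VR and half on the desktop view, so order effects are balanced across the two conditions.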
06
IVEM threshold validation across multiple systems
The IVEM metric suite (paper in preparation) ships with six metrics and threshold
values derived from first principles and related literature (e.g. Scene Stability Index
≥ 0.90, Update Latency < 2.0 s). Future research must empirically validate these
thresholds across diverse immersive analytics platforms (scientific visualisation,
business intelligence, educational tools) and user populations to establish their
generalisability and to identify interaction effects between the rendering, AI
decision, and explanation dimensions.
Extends the IVEM paper (in prep) · Cross-system metric application · Threshold range calibration
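To make cross-system validation concrete, threshold checks of this kind can be expressed as a small table of comparisons applied to each platform's measurements. The sketch below encodes only the two thresholds quoted above; the remaining four IVEM metrics are not listed here and are deliberately left out. The key names, the function name, and the sample measurements are assumptions for illustration.

```python
import operator

# Only the two thresholds stated in the text; key names are hypothetical.
THRESHOLDS = {
    "scene_stability_index": (operator.ge, 0.90),  # Scene Stability Index >= 0.90
    "update_latency_s": (operator.lt, 2.0),        # Update Latency < 2.0 s
}


def check_thresholds(measurements):
    """Return pass/fail per metric for every threshold that has a measurement."""
    return {
        name: compare(measurements[name], limit)
        for name, (compare, limit) in THRESHOLDS.items()
        if name in measurements
    }


# Made-up measurements for one platform under test.
sample = {"scene_stability_index": 0.93, "update_latency_s": 2.0}
print(check_thresholds(sample))
```

Note that a latency of exactly 2.0 s fails the strict `<` comparison, which is the kind of boundary behaviour that threshold calibration across systems would need to settle.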
07
Clinical deployment & collaboration infrastructure
Test the integrated framework in varied healthcare settings, from well-resourced
academic centres to resource-limited community hospitals. In parallel, develop a
dynamic marketplace for machine-learning models that enables researchers to submit,
evaluate, and deploy models across genomics projects - extending the framework into a
collaborative ecosystem rather than a single-institution tool. This direction synthesises
the contributions across all four MediVerse papers into a deployable platform.
Synthesises all four papers · Cross-tier hospital deployment · ML model marketplace