Voice queries reference
The 20 seed queries used to evaluate intent interpretation, latency, and visualisation correctness across LLM backends in the PRICAI 2025 study.
| # | Query | Query type | Expected output |
|---|---|---|---|
| 1 | Show the distribution of initial WBC counts for high-risk patients. | Descriptive | Interactive 3D scatter plot of initial WBC counts for high-risk patients |
| 2 | What is the average age at diagnosis for female patients? | Descriptive | Single numeric value (mean age) |
| 3 | List all patients with Pax3 translocation and survival time less than 3 years. | Filtering | A tabular list of patient and clinical attributes |
| 4 | Compare relapse rates between treatment protocol BFM 95 and Study 8. | Comparative | A tabular summary of relapse rates for BFM 95 and Study 8 |
| 5 | Why are patients in Cluster 3 exhibiting lower event-free survival? | Diagnostic | Textual AI explanation |
| 6 | What is the average survival time for patients with embryonal rhabdomyosarcoma (ERMS)? | Descriptive | A single numeric value representing the mean survival time |
| 7 | What are the top five phenotypes by frequency in the patient dataset? | Descriptive | Ranked tabular list of the five most common values in the Phenotype field |
| 8 | Display the frequency of cytogenetic abnormalities in males vs. females. | Comparative | Tabular summary of cytogenetic abnormalities by gender |
| 9 | How many patients achieved overall survival beyond five years? | Descriptive | Single numeric count value |
| 10 | Why did treatment protocol Study 8 yield better outcomes than protocol BFM 95? | Diagnostic | Textual AI explanation |
| 11 | Filter for patients diagnosed after January 1, 2000. | Filtering | Tabular list of filtered patient data |
| 12 | How many patients fall into each IRS TNM Stage category? | Descriptive | Tabular count of patients per IRS TNM Stage category |
| 13 | Compare average survival times between ARMS and ERMS patients. | Comparative | Table showing mean survival time for each group |
| 14 | List all patients who relapsed within one year of starting treatment. | Filtering | Tabular list of patient IDs and relapse dates |
| 15 | Why do higher initial blast counts fail to predict treatment response consistently? | Diagnostic | Textual AI explanation |
| 16 | What is the distribution of tumor sizes among patients with ARMS? | Descriptive | Tabular summary of tumor size frequencies for ARMS patients |
| 17 | What is the median time to relapse for medium-risk patients? | Descriptive | Single numeric value (median) |
| 18 | How does gender distribution vary across different risk stratification categories? | Comparative | Table showing gender distribution by risk category |
| 19 | What is the average age at diagnosis for male and female patients? | Descriptive | Table showing mean age at diagnosis for each gender |
| 20 | Why did patient ALL101's WBC count spike immediately after therapy initiation? | Diagnostic | Textual AI explanation |
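
When scripting an evaluation run, it can help to have the query types from the table above in machine-readable form, for example to report results per category. The following is a minimal sketch, not the study's own tooling: the names `QUERY_TYPES`, `count_by_type`, and `queries_of_type` are hypothetical, and only the query numbers and their types are transcribed from the table.

```python
from collections import Counter

# Query type for each of the 20 seed queries, transcribed from the table above.
# The dictionary name and layout are illustrative assumptions, not the study's code.
QUERY_TYPES = {
    1: "Descriptive", 2: "Descriptive", 3: "Filtering", 4: "Comparative",
    5: "Diagnostic", 6: "Descriptive", 7: "Descriptive", 8: "Comparative",
    9: "Descriptive", 10: "Diagnostic", 11: "Filtering", 12: "Descriptive",
    13: "Comparative", 14: "Filtering", 15: "Diagnostic", 16: "Descriptive",
    17: "Descriptive", 18: "Comparative", 19: "Descriptive", 20: "Diagnostic",
}


def count_by_type(query_types):
    """Return how many queries fall under each query type."""
    return Counter(query_types.values())


def queries_of_type(query_types, wanted):
    """Return the sorted query numbers whose type equals `wanted`."""
    return sorted(n for n, t in query_types.items() if t == wanted)


if __name__ == "__main__":
    # Print the categories from most to least frequent.
    for query_type, count in count_by_type(QUERY_TYPES).most_common():
        print(f"{query_type}: {count}")
```

Tallied this way, the set contains 9 descriptive, 4 comparative, 4 diagnostic, and 3 filtering queries, and `queries_of_type(QUERY_TYPES, "Filtering")` returns the query numbers 3, 11, and 14.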