Abstract
Background: Intraoperative testing is a critical component of cochlear implant (CI) surgery. As AI tools such as ChatGPT become part of medical communication, it is important to assess how well they handle highly specialized clinical content.

Objective: To evaluate ChatGPT-4’s capacity to generate accurate, expert-level responses in a specialized surgical domain by comparing its answers on intraoperative testing in CI surgery with statements from an international expert consensus.

Methods: Key questions and statements from the International Consensus on Intraoperative CI Testing were presented to GPT-4 twice. Two independent reviewers rated the similarity of each response to the consensus statement as high, medium, or low. Discrepancies were resolved by a third reviewer. GPT-4’s self-assessments of similarity were also collected.

Results: Of the responses to 24 questions, 54.2% were rated as highly similar, 33.3% as moderately similar, and 12.5% as having low similarity. GPT-4 self-rated 33.3% of its responses as highly similar and 66.7% as moderately similar. Response reproducibility was 79.2%. Inter-reviewer agreement was almost perfect (κ = 0.86), while agreement between the reviewers and GPT-4 was moderate (κ = 0.44).

Conclusion: GPT-4 demonstrates moderate alignment with expert consensus on cochlear implant testing but lacks the clinical depth required for autonomous decision-making. While it shows promise for supporting clinical communication, further refinement is needed before it is used in high-stakes surgical contexts.
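The agreement statistics reported in the Results can be illustrated with a minimal sketch of unweighted Cohen's kappa and the percentage arithmetic. The abstract does not say which kappa variant or software the authors used, so the unweighted form here is an assumption. The two rating lists are hypothetical and are not the study's data. The counts 13, 8, 3, and 19 are inferred from the reported percentages of 24 items rather than stated in the abstract.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (p_o - p_e) / (1 - p_e)


def percent(count, total):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * count / total, 1)


# Counts inferred from the reported percentages of 24 questions (assumption).
TOTAL = 24
shares = {name: percent(count, TOTAL)
          for name, count in {"high": 13, "medium": 8, "low": 3}.items()}
reproducibility = percent(19, TOTAL)

# HYPOTHETICAL ratings for illustration only; the reviewers' item-level
# ratings are not given in the abstract.
reviewer_1 = ["high"] * 13 + ["medium"] * 8 + ["low"] * 3
reviewer_2 = ["high"] * 12 + ["medium"] * 9 + ["low"] * 3
kappa_example = cohen_kappa(reviewer_1, reviewer_2)

print(shares, reproducibility, round(kappa_example, 2))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it can be much lower than the simple proportion of matching ratings when one category dominates.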
| Field | Value |
|---|---|
| Original language | English |
| Article number | 195 |
| Journal | Egyptian Journal of Otolaryngology |
| Volume | 41 |
| Issue number | 1 |
| Publication status | Published - Dec 2025 |
Keywords
- ChatGPT
- Cochlear implant
- Consensus
- Intraoperative measurement
- LLM