Comprehensively understanding where the bot is safe or unsafe to use is challenging for a single team or institution. Instead, we need broad and coordinated efforts to systematically evaluate the risks and performance of chatbots. In the short term, health-care systems might consider using the bot in less risky scenarios, such as drafting clinical documentation or summarizing discharge notes, where the goal is to assist with repetitive tasks. But even here, trained and knowledgeable professionals still need to read and approve what the bot produces.
I’m not a fan of developing a proliferation of separate large language models, for example models built specifically for individual specialties or hospitals. We need to prioritize understanding and building on top of state-of-the-art models, instead of starting anew.
We can fine-tune these models for specific scenarios or ground them in biomedical knowledge. In 2020, for instance, my colleagues and I developed a prototype AI dialogue agent that screens participants for mild cognitive impairment; more recently, we have built the integrative biomedical knowledge hub (iBKH) to consolidate existing biomedical knowledge and enable more reliable AI inference.
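To make the grounding idea concrete, here is a minimal, hypothetical sketch of how curated biomedical facts might be retrieved and placed into a prompt so that a model is asked to answer only from them. The toy triples, the keyword-based retrieval, and the prompt format are all illustrative assumptions for exposition; they do not reflect the iBKH schema or any particular model's API.

```python
from typing import List, Tuple

# Toy knowledge store: (subject, relation, object) triples.
# In practice these would come from a curated resource such as a knowledge hub.
KNOWLEDGE: List[Tuple[str, str, str]] = [
    ("metformin", "treats", "type 2 diabetes"),
    ("metformin", "contraindicated_in", "severe renal impairment"),
]

def retrieve_facts(entity: str) -> List[str]:
    """Return human-readable facts mentioning the entity (simple keyword match)."""
    return [
        f"{s} {r.replace('_', ' ')} {o}"
        for s, r, o in KNOWLEDGE
        if entity.lower() in (s.lower(), o.lower())
    ]

def build_grounded_prompt(question: str, entity: str) -> str:
    """Prepend retrieved facts and instruct the model to answer only from them."""
    facts = retrieve_facts(entity)
    context = "\n".join(f"- {f}" for f in facts) or "- (no facts found)"
    return (
        "Answer using ONLY the facts below; otherwise say 'unknown'.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}\n"
    )

if __name__ == "__main__":
    prompt = build_grounded_prompt(
        "Is metformin appropriate for a patient with severe renal impairment?",
        entity="metformin",
    )
    print(prompt)  # This grounded prompt would then be passed to the language model.
```

The point of the sketch is the workflow, not the specifics: retrieved, vetted knowledge constrains what the model can claim, which makes its output easier for a clinician to check.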
The potential of chatbots and AI in medicine is endless. We are on the cusp of game-changing innovations.