AI in Rx

Second Opinion

How can chatbots be used in medicine?

Illustration: Zach Meyer; Portraits: Nigel Buchanan
Dr. Fei Wang

Associate Professor of Population Health Sciences  
Director, Institute of Artificial Intelligence for Digital Health

Chatbots have been used in medicine for years, but the newest of them, ChatGPT, achieves a level of fluency never seen before. Put simply, it makes conversations sound real and, in many instances, passes the Turing test, meaning it appears to think and communicate like a human.

This conversational ability opens the door to possibilities in medicine, from appointment scheduling to screening and therapy. Medicine is a massive field, and chatbots can potentially help many of its subdisciplines. So far, however, there has been little systematic evaluation of the risks that come with using bots. Chatbots can sometimes “hallucinate,” making claims that sound convincing despite having no ground truth to validate them.

“We need broad and coordinated efforts to systematically evaluate the risks and performance of chatbots.”

Dr. Fei Wang

No single team or institution can comprehensively map where the bot is safe or unsafe to use. Instead, we need broad and coordinated efforts to systematically evaluate the risks and performance of chatbots. In the short term, health-care systems might consider using the bot in less risky scenarios — for instance, in clinical documentation or summaries of discharge notes, where the goal is to help with repetitive tasks. But even here, trained and knowledgeable professionals still need to read and approve what the bot produces.

I’m not a fan of developing many separate large language models — for example, models built specifically for individual specialties or hospitals. We need to prioritize understanding and building on top of state-of-the-art models instead of starting anew.

We can fine-tune these models for specific scenarios or ground them in biomedical knowledge. In 2020, for instance, my colleagues and I developed a prototype AI dialogue agent that screens participants for mild cognitive impairment; more recently, we built the integrative Biomedical Knowledge Hub (iBKH) to consolidate existing biomedical knowledge and enable more reliable AI inference.
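For readers curious what grounding can look like in practice, here is a minimal Python sketch of retrieval-grounded prompting; the toy facts, the search function and the prompt template are hypothetical placeholders, not the actual iBKH interface.

```python
# Minimal sketch of retrieval-grounded prompting over a curated
# biomedical knowledge base. Everything here (the toy facts, the
# search function, the prompt template) is a hypothetical placeholder,
# not the real iBKH interface.

KNOWLEDGE = [
    "ALT and AST are liver enzymes measured in hepatic panels.",
    "Troponin is a protein marker of cardiac muscle injury.",
    "Metformin is a first-line therapy for type 2 diabetes.",
]

def search_knowledge(question: str, k: int = 2) -> list[str]:
    """Rank stored facts by crude word overlap with the question.
    A real system would use vector search over millions of curated facts."""
    words = set(question.lower().split())
    ranked = sorted(KNOWLEDGE,
                    key=lambda fact: -len(words & set(fact.lower().split())))
    return ranked[:k]

def grounded_prompt(question: str) -> str:
    """Build a prompt that restricts the model to retrieved facts,
    shrinking the room for hallucination."""
    facts = "\n".join(f"- {f}" for f in search_knowledge(question))
    return ("Answer using ONLY the facts below. If they are not "
            "sufficient, say you do not know.\n"
            f"Facts:\n{facts}\n\nQuestion: {question}")

print(grounded_prompt("Which protein marks cardiac injury?"))
```

The design choice worth noting is that the model is told to refuse when the retrieved facts are insufficient, which is what makes the grounding meaningful.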

The potential of chatbots and AI in medicine is endless. We are on the cusp of game-changing innovations.

Dr. Edward Schenck, M.S. ’18

Assistant Professor of Medicine  
James P. Smith, M.D. Clinical Scholar

The role of clinicians has evolved significantly over the past 2,000 years. Yet central to that role is the process of taking the symptoms a human being is experiencing, in the context of their life history and who they are as an individual, and translating those symptoms into a set of biological processes that can be labeled as a disease. From there, clinicians convey the prognosis that comes with that disease label and guide the patient through the therapeutic journey.

Right now, clinicians can turn to amazing resources and knowledge bases for help making a diagnosis. These resources are at our fingertips, and clinicians use them regularly to aid (but not replace) the diagnostic process. New and improved chatbots, however, can take symptoms and lab results and come up with a probable diagnosis with a degree of finality that surpasses existing resources and requires less synthesis from the clinician. Large language models are good at that, which means they could potentially serve as a first point of contact for patients who don’t yet have a diagnosis. The downside is that chatbots do not reveal their reasoning process or sources. They don’t link to clinical trials or allow users to dive into why a particular answer was given.

Chatbots have the potential to democratize medicine and improve outcomes in places lacking expertise. A first point of contact, such as a primary care doctor or nurse practitioner (NP), could use a bot to make diagnostic labeling more accurate. But again, if the doctor or NP lacks expertise, then how can they recognize the bot’s faulty information?

When it comes to medical training and education, the risks involved in using chatbots are even more far-reaching. Central to a clinician’s education is the mental process of connecting the dots to make an educated diagnosis. Training a generation of clinicians to rely on chatbots could stunt the development of that essential skill, which raises the question: How can chatbots help clinicians — and improve patient outcomes — without displacing the diagnostic reasoning that remains central to the physician’s role?

Dr. He Sarina Yang

Associate Professor of Clinical Pathology and Laboratory Medicine

Since its release in November 2022, OpenAI’s ChatGPT has generated huge excitement over its ability to provide human-like answers to text input in a conversational context. In March 2023, a newer version, GPT-4, was rolled out. Despite widespread public interest in and use of the new bot, ChatGPT was not extensively trained on biomedical data, and its responses have not been thoroughly evaluated by medical experts.

Recently, my team carried out a pilot study to evaluate ChatGPT’s ability to answer questions frequently encountered in laboratory medicine, including questions related to basic medical and technical knowledge, the interpretation of laboratory results in a clinical context, and regulations by the Food and Drug Administration (FDA). We asked the bot a total of 65 questions and had three faculty members evaluate its responses, scoring them as “fully correct,” “partially correct,” “totally incorrect” or “irrelevant.” What did we learn? 

“GPT-4 performed better on questions related to basic medical knowledge but fell short on questions that involved interpreting complex lab results or explaining FDA regulations.”

Dr. He Sarina Yang

Of the 65 questions, GPT-4 answered 51% (33 questions) fully correctly; it performed better on questions related to basic medical knowledge but fell short on questions that involved interpreting complex lab results or explaining FDA regulations. In addition, some of its answers were not accurate or not tailored to the specific clinical consultation question or hospital setting. For example, the bot failed to distinguish alcoholic hepatitis from other types of hepatitis when given a panel of liver enzyme results. Likewise, when we asked the bot whether FDA-approved, high-sensitivity troponin point-of-care testing devices (used to detect blood levels of the protein troponin) exist on the U.S. market, the bot said “yes” and even provided examples that came across, to an unknowing eye, as accurate and convincing. In reality, no such devices exist on the U.S. market; the bot provided false information and examples — what the AI industry refers to as “hallucinations.”
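For a sense of the arithmetic behind such rubric scores, here is a small Python sketch of how ratings like ours can be tallied; only the 33 “fully correct” answers reflect our actual results, and the split across the other categories is invented for illustration.

```python
from collections import Counter

# Tally rubric scores across 65 questions. The 33 "fully correct"
# answers match the study; the remaining split is illustrative only.
scores = (["fully correct"] * 33 + ["partially correct"] * 19 +
          ["totally incorrect"] * 8 + ["irrelevant"] * 5)

counts = Counter(scores)
for category in ["fully correct", "partially correct",
                 "totally incorrect", "irrelevant"]:
    n = counts[category]
    print(f"{category}: {n}/{len(scores)} ({n / len(scores):.0%})")
# fully correct: 33/65 (51%) -- the figure reported above.
```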

AI-based bots are powerful tools with the potential to provide faster responses to routine clinical laboratory questions — and to help in other ways, too. But as the technology stands, clinicians cannot rely on the information chatbots provide and must stay vigilant about their risks.

Dr. Curtis Cole, M.D. ’94

Vice President and Chief Global Information Officer, Cornell University  
Frances and John L. Loeb Associate Professor of Libraries and Information Technology  
Associate Professor of Clinical Medicine  
Associate Professor of Clinical Population Health Sciences

Across medicine, health systems already use chatbots for administrative tasks like scheduling appointments and finding physicians. Yet more intricate tasks, like refilling a prescription or answering a medical question, involve complexities that have made chatbots unsuitable — until now.

New and improved chatbots can help clinical care in a variety of ways. Bots are ideal, for instance, in situations where patients feel timid or embarrassed. Patients often want their doctors to know something they would rather tell a computer than speak out loud. In addition, they’re ideal for breaking down language barriers and can patiently and clearly communicate at the right level of detail. Chatbots can also convey empathy — that’s why some teenagers consider their bot their best friend.

Yet when it comes to matters of health, sickness, life and death, chatbots warrant concern and caution. “AI is engineering without physics,” someone once told me. Just as a child can use blocks to build a tower that stands until it topples, a clinician can use an AI-based bot that provides accurate information until — poof, out of nowhere — it fabricates a fact or figure. Chatbots give a false sense of assurance, and it’s difficult to predict when they’re going to fail. 

“Bots are ideal in situations where patients feel timid or embarrassed. In addition, they’re ideal for breaking down language barriers.”

Dr. Curtis Cole, M.D. ’94

There’s also the doctor-patient relationship to consider. Increasingly, doctors must find ways to work efficiently, which can cut into time with patients. Will this be another technology that boosts efficiency but harms the doctor-patient relationship? Most efficiency gains result in doctors seeing more patients rather than spending more time with each one.

Chatbots also raise concerns about cognitive load. Patient intake, an area that bots could someday handle, is a prime example. Intake demands less cognitive effort than processes like diagnostic reasoning and critical thinking, but it can offer providers a welcome break from more taxing work while strengthening the clinician-patient relationship. Do we really want chatbots to perform these services? These are the kinds of questions we must ask ourselves.
