Emerging Applications of Artificial Intelligence in Cancer Care

FEB 01, 2022

As recently as 10 years ago, the words “machine learning” evoked thoughts of sentient robots and fears of the singularity. Now, we trust the complex processes underlying artificial intelligence (AI) with everything from navigation to movie recommendations to targeted advertising.

Can we also trust machine learning with our health care?

The integration of AI and cancer care was a popular topic in 2021, as evidenced by prominent sessions at two of last year’s AACR conferences: the 14th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved, held virtually October 6-8, 2021, and the San Antonio Breast Cancer Symposium (SABCS), held in a hybrid format December 7-10, 2021. During these sessions, experts gave an overview of how machine learning works, shared data on new applications of AI technologies, and emphasized important considerations for making algorithms equitable.


Recognizing that a diverse audience of breast cancer clinicians and researchers may have questions about the fundamentals of AI, the SABCS session “Artificial Intelligence: Beyond the Soundbites” opened with a talk titled, “Everything You Always Wanted to Know About AI But Were Afraid to Ask,” presented by Regina Barzilay, PhD, the AI faculty lead at the Jameel Clinic of the Massachusetts Institute of Technology.

Barzilay summarized supervised learning—the most common method of machine learning—with the example of a program designed to predict whether a viewer may like a certain movie. In order to learn the viewer’s preferences, the program must be trained using examples of movies the viewer does and does not like.

The AI then identifies characteristics of these movies that might influence the viewer’s opinion and simplifies these characteristics into feature vectors—a set of more specific yes-or-no questions. For instance, is the movie a comedy? Does it feature a certain actor? These characteristics can be defined by an AI trainer, or, more commonly, left for the AI to determine itself.
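To make the idea concrete, a feature vector can be sketched in a few lines of Python. The movie attributes and feature names below are invented for illustration, not drawn from Barzilay's talk:

```python
# Encode each movie as a binary feature vector answering yes-or-no
# questions: is it a comedy? does it star a certain actor? and so on.
FEATURES = ["is_comedy", "stars_actor_x", "released_after_2010"]

def to_feature_vector(movie):
    """Map a movie's attributes to a list of 0/1 answers, one per feature."""
    return [1 if movie[f] else 0 for f in FEATURES]

movie = {"is_comedy": True, "stars_actor_x": False, "released_after_2010": True}
print(to_feature_vector(movie))  # → [1, 0, 1]
```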

The combined results of these vectors can be processed in a few different ways. Barzilay described one, a linear method known as logistic regression, in which each movie's feature vector is plotted as a point in a multidimensional space, similar to a more familiar two- or three-dimensional graph but with up to thousands of axes. The AI then attempts to draw a boundary line that best separates the two groups, such that the viewer liked most movies on one side of the line and disliked most movies on the other.

The training process is often followed by analysis of a validation set, in which the user inputs another set of movies with known outcomes to determine how well the AI performs. Using these data, the AI can continue to learn and adjust its line accordingly. Once the AI is validated, it can run the characteristics of any movie through its algorithm to determine which side of the boundary line the movie falls on, in order to predict the likelihood the viewer will like the movie.
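The train-then-validate loop described above can be sketched with a toy logistic regression written from scratch. The movie features, labels, and learning parameters here are invented for illustration:

```python
import math

def predict(w, b, x):
    """Logistic model: estimated probability the viewer likes movie x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, lr=0.5):
    """Fit weights by gradient descent on labeled (features, liked) pairs."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, b, x) - y  # prediction error for this example
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Toy data: features are [is_comedy, stars_actor_x]; label 1 = viewer liked it.
train_set = [([1, 0], 1), ([1, 1], 1), ([0, 1], 0), ([0, 0], 0)]
valid_set = [([1, 1], 1), ([0, 0], 0)]  # held-out movies with known outcomes

w, b = train(train_set)
accuracy = sum((predict(w, b, x) > 0.5) == bool(y) for x, y in valid_set) / len(valid_set)
print(accuracy)  # → 1.0
```

In practice the validation score is used exactly as the article describes: to check how well the learned boundary generalizes before the model is trusted on movies with unknown outcomes.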

The independent characteristics can alternatively be mapped as a series of nodes in a network designed to mimic the human brain, in a form of machine learning called deep learning. The nodes are arranged in a series of layers, and the nodes in one layer can trigger the activation of nodes in the subsequent layer if certain conditions are met.

“Neural networks essentially consist of an input layer, an output layer, and hidden layers in between,” said Gopal Vijayaraghavan, MD, director of Breast Imaging Services at the University of Massachusetts Medical School. “The more multi-tiered the hidden layer, the more complex tasks it can perform. It can learn and self-correct along the way, very much like the human brain.”
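A single forward pass through the kind of network Vijayaraghavan describes can be sketched as follows. The layer sizes, weights, and inputs are invented for illustration:

```python
import math

def layer(inputs, weights, biases):
    """One fully connected layer: each node takes a weighted sum of its
    inputs, adds a bias, and applies a sigmoid activation. The outputs
    become the inputs to the next layer."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(ws, inputs)) + b)))
        for ws, b in zip(weights, biases)
    ]

x = [0.5, -1.2]                                           # input layer: two features
hidden = layer(x, [[1.0, -0.5], [0.3, 0.8]], [0.0, 0.1])  # hidden layer: two nodes
output = layer(hidden, [[2.0, -1.0]], [-0.5])             # output layer: one node
print(0.0 < output[0] < 1.0)  # → True (a probability-like score)
```

Training adjusts the weights and biases so that the output score matches known labels, and stacking more hidden layers lets the network represent more complex decision rules.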


Vijayaraghavan, whose work centers primarily around mammography, explained several ways in which AI could improve accuracy and decrease radiologists’ workload. One such example was the use of AI as an initial screen to prioritize suspicious images for radiologist review. Such initial screenings could also effectively serve as one reviewer in situations where two reviewers are warranted. When dense breast tissue precludes the clear identification of malignancies, AI also has the potential to stratify which patients should receive further imaging or a biopsy.

“Our hope is that AI will be able to reduce human errors and radiologist burnout,” Vijayaraghavan said.

But are current AI programs accurate enough to make those determinations? Vijayaraghavan and colleagues tested this using a cohort of 131 mammograms from patients with a confirmed breast cancer diagnosis, as well as 154 negative mammograms. The researchers had the mammograms read by their AI, as well as by five experienced, fellowship-trained breast radiologists, who scored each suspicious lesion with a 0-100 percent probability of malignancy.

The AI model significantly outperformed all five radiologists in its ability to identify malignant lesions. When the human and AI scores were combined, the performance was marginally (but not significantly) superior to the AI alone. However, Vijayaraghavan found the trend promising.

“Despite the model’s higher stand-alone performance, the highest performance was achieved when using a weighted combination of the human and model scores, pointing to the potential of AI plus human performance exceeding either alone,” he said.

The study also included a set of 120 negative mammograms from patients who developed a positive mammogram around a year later. In several cases, the AI identified a high probability of malignancy in lesions the trained radiologists did not catch.

Barzilay also discussed the expansion of AI into areas where humans are not proficient, such as the use of mammography for cancer risk prediction. She and her colleagues have developed an AI known as MIRAI, designed to calculate the risk of cancer development within five years of a mammogram.

In a recent study of over 128,000 mammograms from seven globally diverse institutions, MIRAI performed significantly better than the current standard of cancer risk prediction, the Tyrer-Cuzick model; the area under the curve (AUC), a measure of accuracy in which values closer to 1 indicate better performance, was 0.77 for MIRAI compared to 0.63 for Tyrer-Cuzick.
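The AUC values reported here can be understood as the probability that a randomly chosen patient who develops cancer is scored higher than a randomly chosen patient who does not. A minimal sketch of that rank-based computation, with invented scores and labels:

```python
def auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive case is scored higher than a randomly chosen negative case
    (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A model scoring every positive case above every negative case has AUC 1.0;
# one that gets some pairs wrong falls toward the random-guess value of 0.5.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # → 1.0
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # → 0.75
```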

The use of AI for breast cancer diagnostics and risk prediction is not limited to mammograms, however. Pathologists are also seeking better ways to analyze the thousands of breast cancer histopathology slides they see annually.

“Pathology today is mostly done in an analog fashion, where you end up with a huge number of slides,” said Thomas Fuchs, DSc, dean of Artificial Intelligence and Human Health at the Mount Sinai Icahn School of Medicine, and founder of the AI startup Paige. “It’s a difficult task, and also very subjective.”

Systems capable of scanning and digitizing such slides, with sufficient resolution to identify very small lesions, were developed relatively recently. As institutions build their slide libraries, Fuchs and colleagues have used the digital images to develop an AI capable of identifying prostate cancer. During the training phase, the AI was told which slides contained cancer, but not where the cancer was located or what it looked like. This forced the AI to determine for itself which characteristics define cancer, some of which may be imperceptible to the human eye.

In a clinical study, 16 pathologists read digitized slides from over 200 institutions on their own or with the help of the Paige AI. When pathologists were guided by the AI, the false negative rate fell by 70 percent, and the false positive rate fell by 24 percent.

These results led to the first FDA approval of an AI system used to read histology slides. “It shows that the system is safe and effective for patients, it helps pathologists to arrive at the correct diagnosis, and it creates a powerful clearance for future products,” Fuchs said.


In addition to helping improve pathologists’ accuracy, pathology-trained AI systems can bear some of the workload in parts of the world with insufficient numbers of trained professionals. The session “Artificial Intelligence for the New Frontier of Cancer Health Disparities Research” at the Cancer Disparities conference shed some light on this topic.

“Many low-income countries have fewer than one pathologist per million people,” said Johan Lundin, MD, PhD, research director for the Institute for Molecular Medicine at the University of Helsinki in Finland and a professor of Medical Technology at Karolinska Institutet in Sweden. “The lack of experts has critical implications related to cancer screening.”

Rates of cervical cancer, Lundin explained, are extremely high in areas lacking pathologists, like most of sub-Saharan Africa, where a large number of women with HIV are especially at risk. Lundin and colleagues collaborated with the Kinondo Hospital in rural Kenya to train an AI capable of detecting atypical cells from a Pap smear.

For the training cohort, the researchers digitized 350 cervical smear slides from women with HIV, which were annotated by trained professionals. In the 361-slide validation cohort, the AI identified regions of low-grade and high-grade atypia with a sensitivity of 96 to 100 percent. Its specificity was also promising—93 to 99 percent for high-grade lesions and 82 to 86 percent for low-grade lesions.
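Sensitivity and specificity, the two metrics reported for this screening AI, are computed from the counts of true and false calls. A small sketch, using invented predictions rather than the study's data:

```python
def sensitivity_specificity(predictions, truths):
    """Sensitivity = true positives / all actual positives (cases caught);
    specificity = true negatives / all actual negatives (normals cleared)."""
    tp = sum(p == 1 and t == 1 for p, t in zip(predictions, truths))
    tn = sum(p == 0 and t == 0 for p, t in zip(predictions, truths))
    fn = sum(p == 0 and t == 1 for p, t in zip(predictions, truths))
    fp = sum(p == 1 and t == 0 for p, t in zip(predictions, truths))
    return tp / (tp + fn), tn / (tn + fp)

# Invented example: 10 slides, 1 = atypia present, 0 = normal.
preds  = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
truths = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(preds, truths)
print(sens, spec)  # → 0.8 0.8
```

High sensitivity matters most in a screening setting like this one, since a missed high-grade lesion is costlier than a false alarm that triggers follow-up review.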


Technical challenges associated with AI can make widespread application difficult and can introduce a host of other problems. Vijayaraghavan described some of these problems, such as the fact that we often don’t know what connections and correlations an AI makes to arrive at its conclusions.

He also stressed that differences between machines, software, and patient characteristics—especially between different geographic regions—can preclude the generalization of data. Even the rotation or resizing of an image can cause the AI to misidentify its subject. Further, the more patient information is used digitally, the more crucial cybersecurity becomes.

“Questions of data ownership, patient confidentiality, and vulnerability to cybercrime are unresolved and make oversight imperative,” Vijayaraghavan said.

During the Disparities session, Amit Sethi, PhD, a professor at the Indian Institute of Technology, Bombay, echoed many of these challenges and described how he and his colleagues are working toward solutions.

When a patient is diagnosed with breast cancer, for example, their tumor is evaluated for overexpression of the growth factor receptor HER2, often using a type of staining called immunohistochemistry (IHC). Because differences in staining and scanning techniques can introduce variability, Sethi and colleagues have designed an AI that can make HER2 IHC slides from one lab look as though they were produced by a different lab, without compromising accuracy. This image standardization is incredibly helpful for amassing large sets of slides to train an AI—such as one that can identify HER2 mutations from a histology slide, without the need for IHC.

Sethi is also working to develop “cautious AI” technology that better understands when it cannot confidently classify something and flags it as an outlier. One system he and colleagues have created, designed to identify normal breast tissue versus invasive breast cancer, does not try to label an image of in situ cancer, which is somewhere in between.

Overall, Sethi believes it is important to continue working to overcome these challenges because of the transformative potential of AI in cancer care. “There are costs associated with not using AI, in terms of lives lost who could not be diagnosed at the right time, with the right skill,” he said.

If AI is used incorrectly, other presenters stressed, those costs could disproportionately affect individuals who are already at a disadvantage. Barzilay shared the example of an AI designed to determine which hospitalized patients would require long-term care. Given the high cost of such care, lower-income patients, including many racial and ethnic minorities, were less likely to utilize those options, even if they were clinically indicated. Therefore, the AI was less likely to recommend long-term care to minority patients, regardless of their clinical characteristics. “Whatever biases the clinician who trained the AI had are exactly the biases the model will pick up,” Barzilay said.

“We are all accountable for these biases,” said Irene Dankwa-Mullan, MD, MPH, chief health equity officer and deputy chief health officer at IBM Watson Health, during the Disparities conference. “There’s never been a more important and urgent moment to center health equity, social justice, and ethical values like transparency, trust, and fairness in our AI and machine learning.”

Dankwa-Mullan described the “five E’s of bias” that can impact AI design and function:

  • Evidence Bias: Bias in the experimental design or the way data was collected
  • Experience or Expertise Bias: Bias in the way the data is analyzed or used at the point of care
  • Exclusion Bias: Bias created when data from marginalized groups is excluded; can be direct (e.g., clinical trial design) or indirect (e.g., exclusion of study participants with incomplete medical records)
  • Environment Bias: A lack of data on how environmental factors impact the study (e.g., social determinants of health)
  • Empathy Bias: A lack of understanding of how a patient’s complete lived experience contributes to the data (e.g., systemic racism)

Dankwa-Mullan explained that each type includes both explicit and implicit forms of bias, and that it is the responsibility of every user—from the data scientist designing the algorithm to the physician applying it—to mitigate bias where possible.

“AI and machine learning have a great potential to address and promote equity, but they have the potential to worsen existing disparities if we don’t put adequate methods, approaches, and frameworks in place and make them as robust as they need to be,” Dankwa-Mullan said.