Based on years of training and a well-tuned eye, she creates a
report indicating the type and severity of your breast cancer
along with how you might respond to certain treatments.
Finally, your oncologist reviews the report. She chooses from
hundreds of treatment options the one she thinks is right for
your tumor, a decision that could be a matter of life or death.
This story plays out on repeat across the world. The cancer
diagnostic chain, while largely reliable, remains a fragile process
susceptible to human error.
The last two decades have produced a data
explosion in healthcare. New tests and
therapies are emerging, and the patient
population is growing, underpinned by an
increased emphasis on digitization. Data
is being created faster than humans can
make sense of it, overwhelming a declining
population of diagnosticians.
While advances in machine learning have
incrementally translated artificial intelligence
(AI) from a matter of science fiction to reality,
breakthroughs in hardware and algorithmic
techniques at the beginning of this decade
sparked an explosion of possibility. The
surplus of data and the deficit of human
diagnostic capacity make cancer diagnostics a
prime opportunity to leverage computational
approaches that drive objectivity and scale.
This confluence of trends is relatively new
but quickly evolving, especially given the
constraints unique to healthcare. We’re
beginning to see new applications prove
themselves in controlled academic settings,
become validated in the wild, and percolate
into medical practice across radiology,
pathology, molecular diagnostics, and more.
In 2016 and 2017, researchers held a two-part
worldwide competition to pit AI-based cancer
diagnostic systems against one another.
CAMELYON16 tested whether the systems could
detect the presence of cancer cells in pathology images, called
whole-slide images, and CAMELYON17 tested whether the
systems could stage the cancer and locate it on the slide image.
Participants were provided with 899 whole-slide images, some
of which were confirmed as cancerous and others as cancer-free,
for developing their algorithms. The systems were evaluated
against a test set of 500 whole-slide images from 100 patients.
Overall, the participants in the CAMELYON challenge
performed well: the quadratic weighted Cohen’s kappa
metric ranged from -0.13 to 0.89, with the top performers
frequently in concordance with the human experts’ determination of cancer stage and location. The systems had the most
difficulty identifying smaller metastases, with a detection
rate below 40 percent, though that is a difficult task even for
human experts. Post-event analysis also showed that combining multiple AI algorithms resulted in higher kappa
metrics than any single algorithm achieved alone.
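The quadratic weighted kappa used to score the challenge measures agreement between a system’s predicted stages and the reference standard, penalizing disagreements by the square of their ordinal distance, so confusing adjacent stages costs far less than confusing distant ones. A minimal sketch of the metric in pure Python (a generic illustration with an assumed function name, not the challenge’s actual evaluation code) might look like:

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_classes=None):
    """Quadratic weighted Cohen's kappa between two lists of ordinal labels.

    Labels are integers 0..C-1 (e.g. ordinal cancer stages).
    Returns 1.0 for perfect agreement, 0.0 for chance-level agreement,
    and negative values for worse-than-chance agreement.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    c = num_classes or max(max(rater_a), max(rater_b)) + 1

    # Observed confusion matrix between the two raters.
    observed = [[0.0] * c for _ in range(c)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0

    # Expected matrix from the marginal label distributions (chance agreement).
    row = [sum(observed[i]) for i in range(c)]
    col = [sum(observed[i][j] for i in range(c)) for j in range(c)]
    expected = [[row[i] * col[j] / n for j in range(c)] for i in range(c)]

    # Quadratic penalty: disagreement cost grows with squared ordinal distance.
    w = [[(i - j) ** 2 / (c - 1) ** 2 for j in range(c)] for i in range(c)]

    num = sum(w[i][j] * observed[i][j] for i in range(c) for j in range(c))
    den = sum(w[i][j] * expected[i][j] for i in range(c) for j in range(c))
    if den == 0:  # degenerate case: no label variation at all
        return 1.0
    return 1.0 - num / den
```

For example, identical stage assignments yield a kappa of 1.0, while completely reversed assignments on an ordinal scale yield -1.0, which is why the challenge’s -0.13 to 0.89 spread indicates a wide range of system quality.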
For pathology, perhaps the most important link in the cancer diagnostics chain, the CAMELYON challenges were a
highly publicized event that proved models could be trained
to exceed human-only performance in diagnosis. In less than
two years, they have sparked a flurry of efforts to demonstrate
how this class of algorithms can be scaled to address real-world heterogeneous data. These models are already being
applied in pilot settings for disease states like melanoma or
prostate cancer, not yet as a surrogate for, but as a supplement