Author: Zongwei Zhou | 周纵苇
Weibo: @MrGiovanni
Email: zongweiz@asu.edu
Talk at the BMI 570 Symposium journal club
Good afternoon. My name is Zongwei Zhou. I'm a third-year Ph.D. student in Dr. Jianming Liang's lab. My research interest is artificial intelligence for computer vision, especially medical image analysis. So, no surprise, I'd like to introduce to you an impressive paper in our field, even though it was published more than two years ago.
Dermatologist-level classification of skin cancer with deep neural networks
Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017 Feb;542(7639):115.
AI systems have recently been shown to exceed human performance in many visual tasks, such as playing Atari games (Human-level control through deep reinforcement learning), Go (Mastering the game of Go with deep neural networks and tree search), and object recognition (ImageNet large scale visual recognition challenge), thanks to advances in computational power and enormous amounts of labeled data. For example, in the image recognition task, one is asked to identify the object in an image. The red line in the figure denotes average human performance; AI systems surpassed it in 2016, as measured by prediction error rates on unseen images.
AI systems are challenging doctors in medical image analysis as well. When we build an AI system to detect diseases from medical images such as CT, MRI, ultrasound, and microscopy, we usually try to compare its performance with that of medical experts. Most of the time, AI systems tend to make a bunch of silly mistakes, but in some specific tasks they can approach, or even exceed, human expert accuracy. IEEE Spectrum, for example, has run a survey comparing the performance of AI systems and doctors since May 2015. It paints a big picture of recent progress in AI research for medical image analysis. Today, the paper I will present covers one of the diseases reported in this survey: skin cancer.
This paper states that a deep convolutional neural network (CNN) achieves performance on par with all tested experts in both the common skin cancer and the deadliest skin cancer identification tasks, demonstrating an artificial intelligence capable of classifying skin cancer with a level of competence comparable to human experts. Claims that AI outperforms humans often appear as compelling headlines in today's technology news because it's always an eye-catching topic; yet how to convincingly verify the study design and the comparison results remains a challenging problem. In this respect, this study is perhaps among the first to provide a substantial example of comprehensive, controlled experiments comparing a deep network with human experts from different perspectives.
A persistent question: for which types of disease identification can computer-aided diagnosis (CAD) contribute the most? (1) Time: diseases whose detection is not difficult but is tedious and time-consuming, so that there are not enough doctors to cover all clinical needs on time. (2) Accuracy: diseases whose detection is very challenging even for human experts; an AI system can be trained to identify patterns that are not obvious to humans. In either case, for now, these should at least be common diseases.
There are 5.4 million new cases of skin cancer in the United States every year. One in five Americans will be diagnosed with a cutaneous malignancy in their lifetime. Early detection is critical, as the estimated 5-year survival rate for melanoma drops from over 99% if detected in its earliest stages to about 14% if detected in its latest stages. Therefore, skin cancer is one of the most suitable applications where an AI system can contribute.
A controlled experimental design is used in this paper to compare the AI system with human experts. Twenty-one board-certified dermatologists were selected to represent the average performance of human experts. From a technical perspective, nine-fold cross-validation was performed to give robust, statistically meaningful results, and both the AI system and the human experts were evaluated on two tasks: common skin cancer and deadliest skin cancer identification.
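To make the nine-fold protocol concrete, here is a minimal sketch, assuming a generic scikit-learn setup rather than the authors' actual code: the labeled images are split into nine stratified parts, the network is trained on eight and validated on the held-out part, and the split rotates so every image is validated exactly once.

```python
from sklearn.model_selection import StratifiedKFold

# Minimal sketch of nine-fold cross-validation (assumed setup, not the authors' code).
image_ids = list(range(90))            # hypothetical image identifiers
labels = [i % 3 for i in range(90)]    # hypothetical disease labels (3 classes)

skf = StratifiedKFold(n_splits=9, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(image_ids, labels)):
    # Here one would train the CNN on train_idx and evaluate it on val_idx.
    print(f"fold {fold}: {len(train_idx)} training images, {len(val_idx)} validation images")
```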
They have collected a large clinical dataset for verifying the results, consisting of roughly a hundred thousand clinical images spanning about two thousand different diseases. AI systems are data hungry and require huge labeled datasets. Existing work typically uses small datasets of fewer than a thousand images of skin lesions, which, as a result, do not generalize well to new images. So this study has the potential to lead to statistically meaningful conclusions and has a better chance of getting FDA approval.
The algorithm behind the AI system in this paper is a deep convolutional neural network, currently the key breakthrough in the field of computer vision. One question I found in the discussion section is why a neural network was used rather than other conventional machine learning algorithms. The answer is that deep neural networks are more powerful when a large amount of labeled data is available. Here, the x-axis is the amount of data, and the y-axis is the potential performance that these algorithms can achieve. Decades ago, when we did not have so many gold-standard labels associated with images or such powerful computational resources, traditional machine learning methods such as SVMs, random forests, and AdaBoost were very useful, but their performance is upper bounded. A deep network can automatically learn the representation directly from the input image. Here, "deeper" refers to the length of the sequence of layers.
The deep network looks like a tiramisu: the input image goes through a sequence of layers, and those layers act as image feature generators. Once higher-level representations are extracted from the original image, it becomes easy to predict the final class of the input image.
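As an illustration of this layered "feature generator" view, below is a minimal sketch of fine-tuning an ImageNet-pretrained Inception v3 for lesion classification, the general strategy described in the paper. The code uses PyTorch/torchvision rather than the authors' implementation, and the class count and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

# Minimal sketch (not the authors' pipeline): reuse an ImageNet-pretrained Inception v3
# as the stack of feature-generating layers and swap in a new final layer that predicts
# skin-lesion classes. num_classes is a placeholder for the fine-grained disease taxonomy.
num_classes = 757  # illustrative value

model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_classes)                       # main head
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_classes)   # auxiliary head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # fine-tune all layers end to end

# One hypothetical training step on a batch of 299x299 RGB lesion photos.
model.train()
images = torch.randn(8, 3, 299, 299)
labels = torch.randint(0, num_classes, (8,))

logits, aux_logits = model(images)  # Inception v3 returns main and auxiliary logits in train mode
loss = criterion(logits, labels) + 0.4 * criterion(aux_logits, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Only the new classification heads start from random weights; every other layer starts from representations learned on natural images and is refined on the lesion data.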
Here we display the overall performance of the AI system and the 21 human experts. The x and y axes denote sensitivity and specificity; the larger, the better. Each red point represents one human expert's sensitivity and specificity, and the green point represents their average performance with standard deviation. The prediction of the network is the probability that the input image belongs to a specific disease.
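To connect the network's probability output to the sensitivity and specificity points in that figure, here is a small illustrative sketch with made-up numbers, not the paper's data: sweeping a decision threshold over the predicted malignancy probabilities traces out the curve that the dermatologists' points are compared against.

```python
import numpy as np

# Hypothetical malignancy probabilities from the network and the ground-truth labels
# (1 = malignant, 0 = benign); purely illustrative values.
probs = np.array([0.92, 0.10, 0.75, 0.33, 0.88, 0.05, 0.61, 0.27])
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])

for threshold in np.linspace(0.0, 1.0, 11):
    preds = (probs >= threshold).astype(int)
    tp = np.sum((preds == 1) & (labels == 1))
    fn = np.sum((preds == 0) & (labels == 1))
    tn = np.sum((preds == 0) & (labels == 0))
    fp = np.sum((preds == 1) & (labels == 0))
    sensitivity = tp / (tp + fn)   # fraction of malignant lesions correctly flagged
    specificity = tn / (tn + fp)   # fraction of benign lesions correctly cleared
    print(f"threshold={threshold:.1f}  sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
```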
The deep network is loosely modeled on the human brain and is designed to recognize patterns. The huge successes of deep networks usually come in a black-box manner because of their nested non-linear structure: no visual information is provided about what exactly makes them arrive at their predictions. To make disease identification more transparent, the authors visualize and interpret class activation maps, which reflect which regions contribute the most when the deep network makes its predictions.
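As a rough sketch of how such a map could be computed for the fine-tuned Inception v3 above, one can weight the last convolutional feature maps by the classification weights of the predicted class and upsample the result to image size. This is an assumed implementation of the class-activation-map idea, not the authors' code.

```python
import torch
import torch.nn.functional as F

def class_activation_map(model, image):
    """Rough class-activation-map sketch for a fine-tuned torchvision Inception v3."""
    model.eval()
    features = {}

    # Hook the last block before global average pooling (Mixed_7c in torchvision's Inception3).
    handle = model.Mixed_7c.register_forward_hook(
        lambda module, inp, out: features.update(maps=out)
    )
    with torch.no_grad():
        logits = model(image)                 # image: (1, 3, 299, 299)
    handle.remove()

    predicted_class = logits.argmax(dim=1).item()
    fmap = features["maps"][0]                          # (2048, 8, 8) feature maps
    weights = model.fc.weight[predicted_class].detach() # (2048,) weights for that class

    cam = torch.einsum("c,chw->hw", weights, fmap)      # weighted sum over channels
    cam = F.relu(cam)
    cam = cam / (cam.max() + 1e-8)                      # normalize to [0, 1]
    cam = F.interpolate(cam[None, None], size=image.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return predicted_class, cam                         # cam highlights regions driving the prediction

# Usage (with the fine-tuned model from the earlier sketch):
# predicted_class, cam = class_activation_map(model, torch.randn(1, 3, 299, 299))
```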
Strengths
- Develops a very powerful algorithm that learns directly from images
- Visualizes saliency maps that may be helpful for medical education
Limitation
- Annotating a taxonomy of skin-disease sub-classes demands costly, specialty-oriented knowledge and skills.
For skin cancer, thanks to the research group at Stanford, a large labeled database has been organized. Unfortunately, one has to redo this process from scratch when applying this technique to other diseases.
This paper demonstrates the effectiveness of deep learning in dermatology. A deep convolutional neural network, trained on a general skin-lesion classification dataset, matches the performance of 21 board-certified dermatologists. This fast, scalable method is deployable on mobile devices and holds the potential for substantial clinical impact.
Thank you!