Yes, and probably all of them.
Table 1: Original images and their classification results from algorithms [1-3].

| Image | Top-scored labels (algorithms [1-3]) | P (algorithms [1-3]) |
|---|---|---|
| 1 | (0946, 0946, 0946) | (0.99, 0.97, 0.98) |
| 2 | (0990, 0990, 0990) | (0.99, 0.87, 0.99) |
| 3 | (0873, 0873, 0873) | (0.99, 1.00, 1.00) |
| 4 | (0834, 0914, 0537) | (0.39, 0.60, 0.34) |
| 5 | (0988, 0988, 0999) | (0.40, 0.70, 0.60) |
Table 1 shows the original images used in this experiment, in which we tried to hack algorithms [1-3] simultaneously.
The 3-tuples of top-scored labels and their probabilities for each image are likewise given in the order of algorithms [1-3].
Generally speaking, all three algorithms classified the given images accurately, especially considering that their "vocabularies" are limited to the 1,000 ILSVRC classes.
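For reference, reading off such top-scored labels and probabilities takes only a few lines; the sketch below does it with three pretrained torchvision classifiers, which are merely stand-ins for algorithms [1-3]. The model choices, preprocessing, and file name are illustrative assumptions, not the demo's actual setup.

```python
# Minimal sketch (not the demo's code): query several pretrained ImageNet
# classifiers for their top-1 label and probability. The three torchvision
# models below are stand-ins for algorithms [1-3].
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

nets = [models.alexnet(pretrained=True),
        models.vgg16(pretrained=True),
        models.googlenet(pretrained=True)]
for net in nets:
    net.eval()

def top1(image_path):
    """Return [(top-1 label, probability), ...], one pair per network."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    results = []
    with torch.no_grad():
        for net in nets:
            probs = torch.softmax(net(x), dim=1)
            p, label = probs.max(dim=1)
            results.append((label.item(), p.item()))
    return results

print(top1("example.jpg"))  # e.g. [(946, 0.99), (946, 0.97), (946, 0.98)]
```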
Table 2: Hacked images and their classification results from algorithms [1-3].

| Image | Top-scored labels (algorithms [1-3]) | P (algorithms [1-3]) |
|---|---|---|
| 1 | (0001, 0001, 0001) | (0.96, 0.97, 1.00) |
| 2 | (0201, 0201, 0201) | (0.93, 0.96, 0.92) |
| 3 | (0401, 0401, 0401) | (0.92, 0.90, 0.97) |
| 4 | (0601, 0601, 0601) | (1.00, 1.00, 1.00) |
| 5 | (0801, 0801, 0801) | (0.99, 1.00, 1.00) |
Using the proposed algorithm, we can effectively and efficiently discover distortions that are hardly perceptible yet cause the given images to be misclassified.
As shown in Table 2, all original images are successfully hacked into the specified ILSVRC classes, with surprisingly high confidence.
Though not directly supported in the web demo, other amusing ways of hacking deep learning algorithms, e.g. making one image be misclassified differently by different algorithms, can be achieved with our algorithm as well.
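The multi-network objective behind this variant is not spelled out in the text, but a minimal gradient-based sketch could look like the following: each attacked network is simply given its own target class (using the same target for every network recovers the joint hacking of Table 2). The function name, the PyTorch optimizer, and all hyperparameters are illustrative assumptions, not the demo's implementation.

```python
# Illustrative sketch: distort one image so that each attacked network assigns
# it a different target class, while a quadratic penalty keeps the distortion
# small. Each `net` is assumed to map a [0, 1] image tensor to ILSVRC logits.
import torch
import torch.nn.functional as F

def hack_divergent(nets, targets, x_orig, steps=300, lr=0.01, c=1.0):
    """Make nets[i] classify the distorted image as targets[i], for every i."""
    delta = torch.zeros_like(x_orig, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x = (x_orig + delta).clamp(0.0, 1.0)
        # One cross-entropy term per network, each with its own target label.
        cls_loss = sum(F.cross_entropy(net(x), torch.tensor([t]))
                       for net, t in zip(nets, targets))
        dist_loss = delta.pow(2).mean()  # keep the change hardly perceptible
        loss = cls_loss + c * dist_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x_orig + delta).clamp(0.0, 1.0).detach()
```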
But does this mean they are foolish?
Although good progress has already been made in analyzing its root causes [7,8,9] and deriving solutions [8,9,18], this intriguing and almost worrisome phenomenon undoubtedly still holds many unanswered puzzles.
For example, are the hacked images completely "hallucinatory" to the deep learning algorithms (i.e. there is fundamentally no way to tell hacked and natural images apart), or do they merely exploit certain algorithmic shortcomings?
Table 3: Feature similarities between natural and hacked images.

| Image | p-value (algorithms [1-3]) |
|---|---|
| 1 | (1.9e-23, 1.1e-25, 8.4e-44) |
| 2 | (5.1e-10, 2.2e-19, 2.6e-35) |
| 3 | (2.9e-09, 1.6e-20, 5.7e-08) |
| 4 | (2.5e-29, 1.4e-43, 1.9e-43) |
| 5 | (3.3e-26, 2.9e-40, 8.4e-44) |
Table 3 shows the results of a simple experiment comparing the pairwise Euclidean distances between the penultimate-layer features of natural images (ILSVRC validation set) and hacked images.
The pairwise distances between hacked and natural images (shown as red dots), compared with the pairwise distances among natural images themselves (shown as gray dots), are distributed significantly differently under two-sample Kolmogorov-Smirnov tests (the resulting p-values are listed in Table 3), confirming the possibility of spotting hacked images [8].
More fundamentally, this implies that the way the final softmax layer is constructed can be adjusted, e.g. to reveal more dark knowledge [13] or to incorporate advanced class-membership modeling [14], to counter this problem in a principled way.
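That comparison is simple to reproduce. The following is a minimal sketch assuming the penultimate-layer features have already been extracted into arrays `nat_feats` and `hack_feats` (both names are placeholders), with SciPy providing the two-sample Kolmogorov-Smirnov test.

```python
# Minimal sketch of the Table 3 style check, assuming `nat_feats` and
# `hack_feats` are NumPy arrays of penultimate-layer features (one row per
# natural or hacked image) extracted beforehand from one of the networks.
from scipy.spatial.distance import cdist, pdist
from scipy.stats import ks_2samp

def hacked_vs_natural_pvalue(nat_feats, hack_feats):
    # Pairwise Euclidean distances among natural images ("gray dots") ...
    nat_vs_nat = pdist(nat_feats, metric="euclidean")
    # ... versus distances between hacked and natural images ("red dots").
    hack_vs_nat = cdist(hack_feats, nat_feats, metric="euclidean").ravel()
    # Two-sample Kolmogorov-Smirnov test on the two distance samples.
    return ks_2samp(hack_vs_nat, nat_vs_nat).pvalue
```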
Table 4: Half-hacked images and their classification results from algorithms [1-3].

| Image | Top-scored labels (algorithms [1-3]) | P (algorithms [1-3]) |
|---|---|---|
| 1 | (0001, 0001, 0001) | (1.00, 1.00, 1.00) |
| 2 | (0201, 0201, 0201) | (0.98, 1.00, 1.00) |
| 3 | (0401, 0401, 0401) | (0.97, 0.84, 1.00) |
| 4 | (0601, 0601, 0601) | (1.00, 1.00, 1.00) |
| 5 | (0801, 0801, 0801) | (1.00, 1.00, 1.00) |
The algorithm proposed in this project consists of two optimization phases: an "unconstrained" class-probability maximization phase [10] followed by an iterative distortion minimization phase.
Table 4 further shows the half-hacked results from the first optimization phase, in which the visual cues of the corresponding object classes are strong and apparent.
Compared with previous works [7-10], where only a single network was targeted at a time, jointly "hacking" multiple architecturally diverse networks produces relatively vivid renderings of the target objects, which suggests that knowledge about visual objects is correctly embedded.
Similar performance advantages of deep-network ensembles were reported in [19,20] as well.
Additionally, the correct "placement" of visual features (e.g. the "graduation cap" on top of the cameraman's head) also suggests that detailed spatial information is still preserved in the highly compressed features [15-17].
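Put together, the two phases can be sketched as follows. This is a hedged structural illustration, assuming a hypothetical `joint_class_loss` helper (e.g. summing the per-network cross-entropy toward the shared target class, as in the earlier sketch); it should not be read as the project's actual implementation.

```python
# Structural sketch of the two optimization phases described above (not the
# project's code). The optimizer, step counts, and weight `lam` are
# illustrative assumptions.
import torch

def hack_two_phase(x_orig, target, joint_class_loss,
                   steps1=300, steps2=300, lr=0.01, lam=10.0):
    x = x_orig.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)

    # Phase 1: "unconstrained" class probability maximization. Drive the image
    # toward the target class with no distortion penalty; the half-hacked
    # images of Table 4, with their strong visual cues, come out of this phase.
    for _ in range(steps1):
        loss = joint_class_loss(x.clamp(0.0, 1.0), target)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Phase 2: iterative distortion minimization. Pull the image back toward
    # the original while keeping the target-class loss low, so the result stays
    # in the target class but looks like the original.
    for _ in range(steps2):
        loss = (joint_class_loss(x.clamp(0.0, 1.0), target)
                + lam * (x - x_orig).pow(2).mean())
        opt.zero_grad()
        loss.backward()
        opt.step()

    return x.clamp(0.0, 1.0).detach()
```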
Epilogue
As one of the main reasons why deep learning algorithms are successful, non-saturating activation functions strengthen backpropagation gradients and accelerate the convergence of extremely large neural networks that can, in theory, embed extremely rich knowledge about the visual world.
However, as also argued by Goodfellow et al. [9], this enforced linearity may be the exact reason why deep learning algorithms are so easily fooled.
While it is fully justifiable to develop methods to "patch" this problem [8,9,18], we also envision a full revision of deep learning algorithms that aims to make them (even) more consistent with human vision.
By providing a platform for collecting adversarial examples (i.e. hacked images) and human responses, we hope this project can fundamentally help attain this goal.