Going Deeper with Convolutions



This paper targets the ImageNet 2014 (ILSVRC 2014) competition, where the method took first place in both task 1 (classification) and task 2 (detection). The paper focuses on efficient deep neural network architectures for computer vision: by improving the network structure, the depth of the network is increased without increasing the computational budget, thereby improving accuracy.

1. Main Contribution

Improve the utilization of computing resources inside the network, achieved by a carefully crafted design that allows increasing the depth and width of the network while keeping the computational budget constant.

Architecture decisions are based on the Hebbian principle and the intuition of multi-scale processing.

A 22-layer deep network is assessed in the competition.

2. Related Work

The leading approach for the detection task is Regions with Convolutional Neural Networks (R-CNN) (reference [6] in the paper). The method works in two steps: first, low-level cues such as color and superpixel consistency are used to generate potential object proposals in a category-agnostic fashion; then, CNN classifiers identify the object categories at those locations.

3. Motivation and High-Level Considerations

3.1. Drawbacks of increasing CNN size directly: the network becomes more prone to overfitting, and the use of computational resources increases dramatically (for example, if most weights end up close to zero, much of the computation is wasted).

3.2. How to solve it? Move from fully connected to sparsely connected architectures, approximating the optimal sparse structure with readily available dense components; this is the Inception idea described in the next section.

4. Architectural Details

The main idea of the Inception architecture is based on finding out how an optimal local sparse structure in a convolutional vision network can be approximated and covered by readily available dense components.

How do we find this optimal structure? Consider that units in a lower layer correspond to some region of the image: a 1×1 convolution covers exactly that region, while a 3×3 (or 5×5) convolution covers a larger surrounding region. This leads to the design in Figure 1.

Figure 1. Inception module, naïve version

To reduce dimensionality, 1×1 convolutions are used, leading to the design in Figure 2. Dimension reduction works largely thanks to the success of embeddings: even a low-dimensional representation can still contain a lot of information about a relatively large image patch.

Figure 2. Inception module with dimension reductions

The filter concatenation layer joins the outputs of the 1×1, 3×3, and 5×5 convolutions (and the pooling path) along the channel dimension.
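As a small sketch of this concatenation step (the spatial size and per-branch channel counts below are illustrative, not prescribed by the text), the parallel branches all share the same spatial resolution and are stacked along the channel axis:

```python
import numpy as np

# Illustrative feature maps from the four parallel Inception branches;
# all share the same spatial size (28x28) but have different channel counts.
h, w = 28, 28
branch_1x1 = np.zeros((h, w, 64))    # output of the 1x1 convolution path
branch_3x3 = np.zeros((h, w, 128))   # output of the 3x3 convolution path
branch_5x5 = np.zeros((h, w, 32))    # output of the 5x5 convolution path
branch_pool = np.zeros((h, w, 32))   # output of the pooling path

# Filter concatenation: stack the branch outputs along the channel dimension.
output = np.concatenate([branch_1x1, branch_3x3, branch_5x5, branch_pool], axis=-1)
print(output.shape)  # (28, 28, 256)
```

Because concatenation happens on the channel axis, each branch is free to use a different number of filters, but all branches must produce the same spatial resolution.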

The benefit of this design is that it prevents the explosive growth in computational requirements as more layers are added, so both the width and the depth of the network can be increased. Networks using Inception layers are reported to be 2-3× faster than similarly performing networks without them.
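To see why the 1×1 bottleneck keeps the computational budget in check, the weight count of a single 5×5 branch can be compared with and without dimension reduction. A minimal sketch, with channel counts chosen for illustration:

```python
# Weight counts (ignoring biases) for a 5x5 convolution branch over a
# feature map with c_in input channels, with and without a 1x1 bottleneck.
# The channel counts below are illustrative, not exact GoogLeNet values.
c_in, c_mid, c_out = 192, 16, 32

# Naive: the 5x5 convolution is applied directly to all input channels.
naive = 5 * 5 * c_in * c_out

# Reduced: a 1x1 convolution first projects c_in -> c_mid channels,
# then the 5x5 convolution operates on the smaller representation.
reduced = 1 * 1 * c_in * c_mid + 5 * 5 * c_mid * c_out

print(naive)            # 153600 weights
print(reduced)          # 15872 weights
print(naive / reduced)  # roughly a 9.7x reduction
```

The same per-pixel ratio applies to multiply-accumulate operations, since each weight is applied once at every spatial position.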

5. GoogLeNet

See Figure 3. The more detailed structure diagram is too large to reproduce here; please refer to the original paper.

Figure 3. GoogLeNet incarnation of the Inception architecture

6. Training Methodology

7. Experimental Setup and Results

Seven versions of the same GoogLeNet model were trained, and their predictions were combined in an ensemble. These models were trained with the same initialization but differ in the sampling methodologies and the random order in which they see the input images.

Testing: the image is resized to 4 scales whose shorter dimension is 256, 288, 320, or 352. The left, center, and right squares of each resized image are taken; for each square, the 4 corner and the center 224×224 crops are taken, together with the square itself resized to 224×224, plus the mirrored versions of all of these. This yields 4×3×6×2 = 144 crops per image.

The softmax probabilities are averaged over the multiple crops and over all individual classifiers to obtain the final prediction; simple averaging performs best.

The results are as follows:

Figure 4. Performance in the competition

Figure 5. Performance of model fusions
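The crop arithmetic above (4×3×6×2 = 144) and the simple-average fusion over crops and models can be sketched as follows; the random softmax outputs stand in for real model predictions, and the class count of 1000 is the ILSVRC setting:

```python
import numpy as np

# 4 scales x 3 squares per scale x 6 crops per square x 2 (with mirror) = 144
n_scales, n_squares, n_crops, n_mirrors = 4, 3, 6, 2
total_crops = n_scales * n_squares * n_crops * n_mirrors
print(total_crops)  # 144

# Simple-average fusion: mock softmax outputs of shape
# (n_models, total_crops, n_classes), averaged over models and crops.
rng = np.random.default_rng(0)
n_models, n_classes = 7, 1000
scores = rng.random((n_models, total_crops, n_classes))
probs = scores / scores.sum(axis=-1, keepdims=True)  # normalize to mock softmax

final = probs.mean(axis=(0, 1))  # average over the model and crop axes
pred = int(final.argmax())       # predicted class index
```

Averaging the probabilities (rather than, say, taking a max over crops) is the "simple averaging" the notes above report as working best.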

8. References

[1] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
