Commit 03671cc2 authored by Anh Nguyen

Added scripts for downloading proper patches of Caffe

parent a9c92b63
## General
# Compiled Object files
*.slo
*.lo
*.o
*.cuo
*.png
*.jpg
*.jpeg
# Compiled Dynamic libraries
*.so
*.dylib
# Compiled Static libraries
*.lai
*.la
*.a
# Compiled protocol buffers
*.pb.h
*.pb.cc
*_pb2.py
# Compiled python
*.pyc
# Compiled MATLAB
*.mex*
# build, distribute, and bins
build
.build_debug/*
.build_release/*
distribute/*
*.testbin
*.bin
python/caffe/proto/
# Editor temporaries
*.swp
*~
# IPython notebook checkpoints
.ipynb_checkpoints
## Caffe
# User's build configuration
#Makefile.config
# Data and examples are either
# 1. reference, and not casually committed
# 2. custom, and live on their own unless they're deliberately contributed
data/*
examples/*
# Generated documentation
docs/_site
_site
# Sublime Text settings
*.sublime-workspace
*.sublime-project
# Contributors
Caffe is developed by a core set of BVLC members and the open-source community.
We thank all of our [contributors](https://github.com/BVLC/caffe/graphs/contributors)!
**For the detailed history of contributions** of a given file, try
git blame file
to see line-by-line credits and
git log --follow file
to see the change log even across renames and rewrites.
Please refer to the [acknowledgements](http://caffe.berkeleyvision.org/#acknowledgements) on the Caffe site for further details.
# Installation
See http://caffe.berkeleyvision.org/installation.html for the latest
installation instructions.
Check the issue tracker in case you need help:
https://github.com/BVLC/caffe/issues
Copyright (c) 2014, The Regents of the University of California (Regents)
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!
# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# CUDA architecture setting: going with all of them.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \
-gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35
# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBLAS
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
# BLAS_INCLUDE := /path/to/your/blas
BLAS_INCLUDE := /usr/include/atlas
# BLAS_LIB := /path/to/your/blas
BLAS_LIB := /usr/lib/atlas-base
# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app
# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/local/include/python2.7 \
/usr/include/python2.7 \
/usr/local/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# PYTHON_INCLUDE := $(HOME)/anaconda/include \
# $(HOME)/anaconda/include/python2.7 \
# $(HOME)/anaconda/lib/python2.7/site-packages/numpy/core/include
# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/local/lib
# PYTHON_LIB := $(HOME)/anaconda/lib
# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
BUILD_DIR := build
DISTRIBUTE_DIR := distribute
# Uncomment for debugging.
# DEBUG := 1
# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0
## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!
# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr
# CUDA architecture setting: going with all of them.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \
-gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35
# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBLAS
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
# BLAS_INCLUDE := /path/to/your/blas
# BLAS_LIB := /path/to/your/blas
# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app
# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/local/include/python2.7 \
/usr/local/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# PYTHON_INCLUDE := $(HOME)/anaconda/include \
# $(HOME)/anaconda/include/python2.7 \
# $(HOME)/anaconda/lib/python2.7/site-packages/numpy/core/include
# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/local/lib
# PYTHON_LIB := $(HOME)/anaconda/lib
# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
BUILD_DIR := build
DISTRIBUTE_DIR := distribute
# Uncomment for debugging.
# DEBUG := 1
# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0
[Caffe: Convolutional Architecture for Fast Feature Embedding](http://caffe.berkeleyvision.org)
Created by [Yangqing Jia](http://daggerfs.com), UC Berkeley EECS department.
In active development by the Berkeley Vision and Learning Center ([BVLC](http://bvlc.eecs.berkeley.edu/)).
## Introduction
Caffe aims to provide computer vision scientists with a **clean, modifiable
implementation** of state-of-the-art deep learning algorithms. Network structure
is easily specified in separate config files, with no mess of hard-coded
parameters in the code. Python and Matlab wrappers are provided.
At the same time, Caffe fits industry needs, with blazing fast C++/CUDA code for
GPU computation. Caffe is currently the fastest GPU CNN implementation publicly
available, and is able to process more than **40 million images per day** on a
single NVIDIA K40 GPU (or 20 million per day on a K20)\*.
Caffe also provides **seamless switching between CPU and GPU**, which allows one
to train models with fast GPUs and then deploy them on non-GPU clusters with one
line of code: `Caffe::set_mode(Caffe::CPU)`.
Even in CPU mode, computing predictions on an image takes only 20 ms when images
are processed in batch mode.
* [Caffe introductory presentation](https://www.dropbox.com/s/10fx16yp5etb8dv/caffe-presentation.pdf)
* [Installation instructions](http://caffe.berkeleyvision.org/installation.html)
\* When measured with the [SuperVision](http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf) model that won the ImageNet Large Scale Visual Recognition Challenge 2012.
## License
Caffe is BSD 2-Clause licensed (refer to the
[LICENSE](http://caffe.berkeleyvision.org/license.html) for details).
The pretrained models published by the BVLC, such as the
[Caffe reference ImageNet model](https://www.dropbox.com/s/n3jups0gr7uj0dv/caffe_reference_imagenet_model)
are licensed for academic research / non-commercial use only. However, Caffe is
a full toolkit for model training, so start brewing your own Caffe model today!
## Citing Caffe
Please kindly cite Caffe in your publications if it helps your research:
@misc{Jia13caffe,
Author = {Yangqing Jia},
Title = { {Caffe}: An Open Source Convolutional Architecture for Fast Feature Embedding},
Year = {2013},
Howpublished = {\url{http://caffe.berkeleyvision.org/}}
}
## Documentation
Tutorials and general documentation are written in Markdown format in the `docs/` folder.
While the format is quite easy to read directly, you may prefer to view the whole thing as a website.
To do so, simply run `jekyll serve -s docs` and view the documentation website at `http://0.0.0.0:4000` (to get [jekyll](http://jekyllrb.com/), you must have ruby and do `gem install jekyll`).
We strive to provide lots of usage examples, and to document all code in docstrings.
We'd appreciate your contribution to this effort!
## Development
Caffe is developed with active participation of the community by the [Berkeley Vision and Learning Center](http://bvlc.eecs.berkeley.edu/).
We welcome all contributions!
### The release cycle
- The `dev` branch is for new development, including community contributions. We aim to keep it in a functional state, but large changes may occur and things may get broken every now and then. Use this if you want the "bleeding edge".
- The `master` branch is handled by BVLC, which will integrate changes from `dev` on a roughly monthly schedule, giving it a release tag. Use this if you want more stability.
### Setting priorities
- Make GitHub Issues for bugs, features you'd like to see, questions, etc.
- Development work is guided by [milestones](https://github.com/BVLC/caffe/issues?milestone=1), which are sets of issues selected for concurrent release (integration from `dev` to `master`).
- Please note that since the core developers are largely researchers, we may work on a feature in isolation from the open-source community for some time before releasing it, so as to claim honest academic contribution. We do release it as soon as a reasonable technical report may be written about the work, and we still aim to inform the community of ongoing development through Issues.
### Contributing
- Do new development in [feature branches](https://www.atlassian.com/git/workflows#!workflow-feature-branch) with descriptive names.
- Bring your work up-to-date by [rebasing](http://git-scm.com/book/en/Git-Branching-Rebasing) onto the latest `dev`. (Polish your changes by [interactive rebase](https://help.github.com/articles/interactive-rebase), if you'd like.)
- [Pull request](https://help.github.com/articles/using-pull-requests) your contribution to BVLC/caffe's `dev` branch for discussion and review.
* PRs should live fast, die young, and leave a beautiful merge. Pull request sooner than later so that discussion can guide development.
* Code must be accompanied by documentation and tests at all times.
* Only fast-forward merges will be accepted.
See our [development guidelines](http://caffe.berkeleyvision.org/development.html) for further details; the more closely these are followed, the sooner your work will be merged.
#### [Shelhamer's](https://github.com/shelhamer) “life of a branch in four acts”
Make the `feature` branch off of the latest `bvlc/dev`
```
git checkout dev
git pull upstream dev
git checkout -b feature
# do your work, make commits
```
Prepare to merge by rebasing your branch on the latest `bvlc/dev`
```
# make sure dev is fresh
git checkout dev
git pull upstream dev
# rebase your branch on the tip of dev
git checkout feature
git rebase dev
```
Push your branch to pull request it into `dev`
```
git push origin feature
# ...make pull request to dev...
```
Now make a pull request! You can do this from the command line (`git pull-request -b dev`) if you install [hub](https://github.com/github/hub).
The pull request of `feature` into `dev` will be a clean merge. Applause.
### General
This directory contains the necessary code to reproduce the gradient
ascent images in the paper: Figures 13, S3, S4, S5, S6, and S7. This
is research code, so it may contain paths and settings particular to
our setup that will need to be changed for yours.
**Important note: this code requires the slightly modified version of caffe in this repository's [ascent](https://github.com/Evolving-AI-Lab/fooling/tree/ascent) branch. If you try running on master, you'll get an error about `backward_from_layer`.** See the below steps for using the correct branch.
If you find any bugs, please submit a PR!
If you have any trouble getting the code to work, please get in touch, and we'll help where we can.
### Notes on running the gradient ascent code
* The gist of the gradient ascent code (along with a lot of
experimental bookkeeping) is in the
[find_image function in find_fooling_image.py](https://github.com/Evolving-AI-Lab/fooling/blob/master/caffe/ascent/find_fooling_image.py#L68-L274). A minimal sketch of the core loop appears after these notes.
* If you happen to be working in a
cluster environment that uses `qsub`, you may find the shell scripts
useful; otherwise they probably won't help you much.
* If you don't have a trained net around, you can download the trained model we used here: http://yosinski.cs.cornell.edu/yos_140311__caffenet_iter_450000
* A file containing class labels is also used by the script and can be downloaded here: http://s.yosinski.com/synset_words.txt
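
In outline, the ascent loop is simple: forward the current image, set a one-hot diff on the unit you want to boost, backprop to the input, and take a step. Below is a minimal, hypothetical sketch of that loop using plain pycaffe; the real `find_image` adds regularization, learning-rate policies, and lots of bookkeeping, and uses the `backward_from_layer` call from the `ascent` branch. The function name and step size here are illustrative, not the repository's exact code.

    import numpy as np

    def simple_ascent(net, push_idx, n_steps=3, step_size=1e4):
        # Assumes `net` was loaded from a deploy prototxt with
        # force_backward: true (as in the prototxt in this directory),
        # so the gradient with respect to the input image is computed.
        x = np.random.normal(128, 10, net.blobs['data'].data.shape)
        for ii in range(n_steps):
            net.blobs['data'].data[...] = x
            net.forward()
            diff = np.zeros_like(net.blobs['prob'].diff)
            diff[0, push_idx] = 1.0        # push up only the chosen unit
            net.blobs['prob'].diff[...] = diff
            net.backward()                 # fills net.blobs['data'].diff
            x += step_size * net.blobs['data'].diff
        return x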
### Simple steps to generate one fooling image
We'll walk through the exact steps to generate a fooling image of a lion (class 291) using gradient ascent on that class's output unit.
First, clone the repo and checkout the ascent branch:
[~] $ git clone git@github.com:Evolving-AI-Lab/fooling.git
[~] $ cd fooling
[~/fooling] $ git checkout ascent
[~/fooling] $ cd caffe
Configure and compile caffe. See [installation instructions](http://caffe.berkeleyvision.org/installation.html). Make sure to compile the python bindings too:
[~/fooling/caffe] $ make -j && make -j pycaffe
Once Caffe is built, continue by fetching some auxiliary data (synsets.txt and a pre-trained model):
[~/fooling/caffe] $ cd data/ilsvrc12
[~/fooling/caffe/data/ilsvrc12] $ ./get_ilsvrc_aux.sh
[~/fooling/caffe/data/ilsvrc12] $ cd ../../ascent
[~/fooling/caffe/ascent] $ wget 'http://yosinski.cs.cornell.edu/yos_140311__caffenet_iter_450000'
Now we're ready to run the optimization. To find a quick fooling image for the Lion class (idx 291) using only 3 gradient steps, run the following:
[~/fooling/caffe/ascent] $ ./find_fooling_image.py --push_idx 291 --N 3
...
0 Push idx: 291, val: 0.00209935 (n02129165 lion, king of beasts, Panthera leo)
Max idx: 815, val: 0.0114864 (n04275548 spider web, spider's web)
...
1 Push idx: 291, val: 0.00962483 (n02129165 lion, king of beasts, Panthera leo)
Max idx: 330, val: 0.0224016 (n02325366 wood rabbit, cottontail, cottontail rabbit)
...
2 Push idx: 291, val: 0.0518007 (n02129165 lion, king of beasts, Panthera leo)
Max idx: 291, val: 0.0518007 (n02129165 lion, king of beasts, Panthera leo)
...
Result: majority success
name: "CaffeNet"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 227
input_dim: 227
force_backward: true
layers {
name: "conv1"
type: CONVOLUTION
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
}
}
layers {
name: "relu1"
type: RELU
bottom: "conv1"
top: "conv1"
}
layers {
name: "pool1"
type: POOLING
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layers {
name: "norm1"
type: LRN
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layers {
name: "conv2"
type: CONVOLUTION
bottom: "norm1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
}
}
layers {
name: "relu2"
type: RELU
bottom: "conv2"
top: "conv2"
}
layers {
name: "pool2"
type: POOLING
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layers {
name: "norm2"
type: LRN
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layers {
name: "conv3"
type: CONVOLUTION
bottom: "norm2"
top: "conv3"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
}
}
layers {
name: "relu3"
type: RELU
bottom: "conv3"
top: "conv3"
}
layers {
name: "conv4"
type: CONVOLUTION
bottom: "conv3"
top: "conv4"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
}
}
layers {
name: "relu4"
type: RELU
bottom: "conv4"
top: "conv4"
}
layers {
name: "conv5"
type: CONVOLUTION
bottom: "conv4"
top: "conv5"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
}
}
layers {
name: "relu5"
type: RELU
bottom: "conv5"
top: "conv5"
}
layers {
name: "pool5"
type: POOLING
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layers {
name: "fc6"
type: INNER_PRODUCT
bottom: "pool5"
top: "fc6"
inner_product_param {
num_output: 4096
}
}
layers {
name: "relu6"
type: RELU
bottom: "fc6"
top: "fc6"
}
layers {
name: "drop6"
type: DROPOUT
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layers {
name: "fc7"
type: INNER_PRODUCT
bottom: "fc6"
top: "fc7"
inner_product_param {
num_output: 4096
}
}
layers {
name: "relu7"
type: RELU
bottom: "fc7"
top: "fc7"
}
layers {
name: "drop7"
type: DROPOUT
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layers {
name: "fc8"
type: INNER_PRODUCT
bottom: "fc7"
top: "fc8"
inner_product_param {
num_output: 1000
}
}
layers {
name: "prob"
type: SOFTMAX
bottom: "fc8"
top: "prob"
}
#! /usr/bin/env python
from pylab import *
import os
import argparse
import ipdb as pdb
from find_fooling_image import load_net_mean, find_image
def rchoose(choices, prob=None):
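    '''Draw one element of `choices`; `prob` gives optional (unnormalized) weights.'''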
if prob is None:
prob = ones(len(choices))
prob = array(prob, dtype='float')
return np.random.choice(choices, p=prob/prob.sum())
def main():
parser = argparse.ArgumentParser(description='Hyperparam search')
parser.add_argument('--result_prefix', type = str, default = './junk')
parser.add_argument('--hp_seed', type = int, default = 0)
parser.add_argument('--start_seed', type = int, default = 0)
parser.add_argument('--push_idx', type = int, default = 278)
parser.add_argument('--layer', type = str, default = 'prob', choices = ('fc8', 'prob'))
parser.add_argument('--startat', type = int, default = 0, choices = (0, 1))
args = parser.parse_args()
push_idx = args.push_idx
small_val_percentile = 0
start_at = 'mean_plus' if args.startat == 0 else 'randu'
if args.hp_seed == -1:
# Special hp_seed of -1 to do gradient descent without any regularization
decay = 0
N = 500
early_prog = .02
late_prog_mult = .1
blur_radius = 0
blur_every = 1
small_norm_percentile = 0
px_benefit_percentile = 0
px_abs_benefit_percentile = 0
else:
np.random.seed(args.hp_seed)
# Choose hyperparameter values given this seed
decay = rchoose((0, .0001, .001, .01, .1, .2, .3),
(4, 1, 1, 2, 1, 1, 1))
N = rchoose((250, 500, 750, 1000, 1500))
early_prog = rchoose(
(.02, .03, .04),
(1, 2, 1))
late_prog_mult = rchoose((.02, .05, .1, .2))
blur_radius = rchoose(
(0, .3, .4, .5, 1.0),
(10, 2, 1, 1, 1))
blur_every = rchoose((1, 2, 3, 4))
small_norm_percentile = rchoose(
(0, 10, 20, 30, 50, 80, 90),
(10, 10, 5, 2, 2, 2, 2))
px_benefit_percentile = rchoose(
(0, 10, 20, 30, 50, 80, 90),
(20, 10, 5, 2, 2, 2, 2))
px_abs_benefit_percentile = rchoose(
(0, 10, 20, 30, 50, 80, 90),
(10, 10, 5, 2, 2, 2, 2))
prefix = args.result_prefix
print 'prefix is', prefix
net, mnirgb, mn4d, labels = load_net_mean()
find_image(net, mnirgb, mn4d, labels,
decay = decay,
N = N,
rseed = args.start_seed,
push_idx = push_idx,
start_at = start_at,
prefix = prefix,
lr_policy = 'progress',
lr_params = {'max_lr': 1e7,
'early_prog': early_prog,
'late_prog_mult': late_prog_mult},
blur_radius = blur_radius,
blur_every = blur_every,
small_val_percentile = small_val_percentile,
small_norm_percentile = small_norm_percentile,
px_benefit_percentile = px_benefit_percentile,
px_abs_benefit_percentile = px_abs_benefit_percentile,
)
if __name__ == '__main__':
main()
#! /usr/bin/env python
from pylab import *
def figsize(width,height):
rcParams['figure.figsize'] = (width,height)
def norm01(arr):
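    '''Linearly rescale arr to the range [0, 1].'''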
arr = arr.copy()
arr -= arr.min()
arr /= arr.max()
return arr
def norm01c(arr, center):
'''Maps the center value to .5'''
arr = arr.copy()
arr -= center
arr /= max(2 * arr.max(), -2 * arr.min())
arr += .5
assert arr.min() >= 0
assert arr.max() <= 1
return arr
def showimage(im, c01=False, bgr=False):
if c01:
# switch order from c,0,1 -> 0,1,c
im = im.transpose((1,2,0))
if im.ndim == 3 and bgr:
# Change from BGR -> RGB
im = im[:, :, ::-1]
plt.imshow(im)
#axis('tight')
def showimagesc(im, c01=False, bgr=False):
showimage(norm01(im), c01=c01, bgr=bgr)
def saveimage(filename, im):
matplotlib.image.imsave(filename, im)
def saveimagesc(filename, im):
saveimage(filename, norm01(im))
def saveimagescc(filename, im, center):
saveimage(filename, norm01c(im, center))
def tile_images(data, padsize=1, padval=0, c01=False, width=None):
'''take an array of shape (n, height, width) or (n, height, width, channels)
and visualize each (height, width) thing in a grid. If width = None, produce
a square image of size approx. sqrt(n) by sqrt(n), else calculate height.'''
data = data.copy()
if c01:
# Convert c01 -> 01c
data = data.transpose(0, 2, 3, 1)
data -= data.min()
data /= data.max()
# force the number of filters to be square
    if width is None:
width = int(np.ceil(np.sqrt(data.shape[0])))
height = width
else:
assert isinstance(width, int)
height = int(np.ceil(float(data.shape[0]) / width))
padding = ((0, width*height - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))
# tile the filters into an image
data = data.reshape((height, width) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
data = data.reshape((height * data.shape[1], width * data.shape[3]) + data.shape[4:])
data = data[0:-padsize, 0:-padsize] # remove excess padding
return data
def vis_square(data, padsize=1, padval=0, c01=False):
data = tile_images(data, padsize, padval, c01)
showimage(data, c01=False)
def shownet(net):
'''Print some stats about a net and its activations'''
print '%-41s%-31s%s' % ('', 'acts', 'act diffs')
print '%-45s%-31s%s' % ('', 'params', 'param diffs')
for k, v in net.blobs.items():
if k in net.params:
params = net.params[k]
for pp, blob in enumerate(params):
if pp == 0:
print ' ', 'P: %-5s'%k,
else:
print ' ' * 11,
print '%-32s' % repr(blob.data.shape),
print '%-30s' % ('(%g, %g)' % (blob.data.min(), blob.data.max())),
print '(%g, %g)' % (blob.diff.min(), blob.diff.max())
print '%-5s'%k, '%-34s' % repr(v.data.shape),
print '%-30s' % ('(%g, %g)' % (v.data.min(), v.data.max())),
print '(%g, %g)' % (v.diff.min(), v.diff.max())
#! /bin/bash
echo "just for reference"
exit 0
for idx in 0 1 2 3 4; do ./find_fooling_image.py --push_idx $idx --N 1500 --decay .03 --lr .001 --prefix 'result_idx3/idx_%(push_idx)03d_decay_%(decay).03f_lr_%(lr).03f_'; done
for idx in 0 1 2 3 4; do ./find_fooling_image.py --push_idx $idx --N 1500 --decay .00 --lr .001 --prefix 'result_idx3/idx_%(push_idx)03d_decay_%(decay).03f_lr_%(lr).03f_'; done
#! /bin/bash -x
thisscript=$(readlink -f $0)
scriptdir=`dirname $thisscript`
for hp_seed in -1 169 188 360; do
#for push_idx in 278 543 251 99 906 805; do
for push_idx in 200 207 215 279 366 367 390 414 445 500 509 580 643 657 704 713 782 805 826 906; do
for start_seed in `seq 0 4`; do
startat=0
seed_dir=`printf "seed_%04d" $hp_seed`
result_dir="$scriptdir/results/supplementary_imgs/$seed_dir"
mkdir -p $result_dir
run_str=`printf 's%04d_idx%03d_sa%d_ss%02d' $hp_seed $push_idx $startat $start_seed`
jobname="job_${run_str}"
script="$result_dir/run_${run_str}.sh"
result_prefix="$result_dir/$run_str"
echo "#! /bin/bash" > $script
echo "cd $scriptdir" >> $script
echo "./hyperparam_search.py --result_prefix $result_prefix --hp_seed $hp_seed --push_idx $push_idx --start_seed $start_seed --startat $startat 2>&1" >> $script
chmod +x $script
qsub -N "$jobname" -A ACCOUNT_NAME -l nodes=1:ppn=2 -l walltime="1:00:00" -d "$result_dir" $script
done
done
done
#! /bin/bash -x
thisscript=$(readlink -f $0)
scriptdir=`dirname $thisscript`
for hp_seed in `seq 101 399`; do
#for hp_seed in 0; do
for push_idx in 278 543 251 99 906 805; do
#for push_idx in 278; do
startat=0
start_seed=0
seed_dir=`printf "seed_%04d" $hp_seed`
result_dir="$scriptdir/results/$seed_dir"
mkdir -p $result_dir
run_str=`printf 's%04d_idx%03d_sa%d_ss%02d' $hp_seed $push_idx $startat $start_seed`
jobname="job_${run_str}"
script="$result_dir/run_${run_str}.sh"
result_prefix="$result_dir/$run_str"
echo "#! /bin/bash" > $script
echo "cd $scriptdir" >> $script
echo "./hyperparam_search.py --result_prefix $result_prefix --hp_seed $hp_seed --push_idx $push_idx --start_seed $start_seed --startat $startat 2>&1" >> $script
chmod +x $script
qsub -N "$jobname" -A ACCOUNT_NAME -l nodes=1:ppn=2 -l walltime="1:00:00" -d "$result_dir" $script
#sleep 1
done
done
Bourne Shell
filter remove_matches ^\s*#
filter remove_inline #.*$
extension sh
script_exe sh
C
filter remove_matches ^\s*//
filter call_regexp_common C
filter remove_inline //.*$
extension c
extension ec
extension pgc
C++
filter remove_matches ^\s*//
filter remove_inline //.*$
filter call_regexp_common C
extension C
extension cc
extension cpp
extension cxx
extension pcc
C/C++ Header
filter remove_matches ^\s*//
filter call_regexp_common C
filter remove_inline //.*$
extension H
extension h
extension hh
extension hpp
Cuda
filter remove_matches ^\s*//
filter remove_inline //.*$
filter call_regexp_common C
extension cu
Python
filter remove_matches ^\s*#
filter docstring_to_C
filter call_regexp_common C
filter remove_inline #.*$
extension py
make
filter remove_matches ^\s*#
filter remove_inline #.*$
extension Gnumakefile
extension Makefile
extension am
extension gnumakefile
extension makefile
filename Gnumakefile
filename Makefile
filename gnumakefile
filename makefile
script_exe make
To generate Markdown that you can paste into an .md page from an IPython notebook, run
ipython nbconvert --to markdown <notebook_file>
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<title>Caffe</title>
<link rel="stylesheet" href="stylesheets/reset.css">
<link rel="stylesheet" href="stylesheets/styles.css">
<link rel="stylesheet" href="stylesheets/pygment_trac.css">
<script src="javascripts/scale.fix.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-46255508-1', 'daggerfs.com');
ga('send', 'pageview');
</script>
<div class="wrapper">
<header>
<h1 class="header"><a href="index.html">Caffe</a></h1>
<p class="header">Convolutional Architecture for Fast Feature Embedding</p>
<ul>
<!--<li class="download"><a class="buttons" href="https://github.com/BVLC/caffe/zipball/master">Download ZIP</a></li>
<li class="download"><a class="buttons" href="https://github.com/BVLC/caffe/tarball/master">Download TAR</a></li>-->
<li><a class="buttons github" href="https://github.com/BVLC/caffe">View On GitHub</a></li>
</ul>
<p class="header">Maintained by<br><a class="header name" href="http://bvlc.eecs.berkeley.edu/">BVLC</a></p>
<p class="header">Created by<br><a class="header name" href="http://daggerfs.com/">Yangqing Jia</a></p>
</header>
<section>
{{ content }}
</section>
<footer>
<p><small>Hosted on <a href="http://pages.github.com">GitHub Pages</a>.</small></p>
</footer>
</div>
<!--[if !IE]><script>fixScale(document);</script><![endif]-->
</body>
</html>
---
layout: default
title: Caffe
---
Alex's CIFAR-10 tutorial, Caffe style
=====================================
Alex Krizhevsky's [cuda-convnet](https://code.google.com/p/cuda-convnet/) details the model definitions, parameters, and training procedure for good performance on CIFAR-10. This example reproduces his results in Caffe.
We will assume that you have Caffe successfully compiled. If not, please refer to the [Installation page](installation.html). In this tutorial, we will assume that your caffe installation is located at `CAFFE_ROOT`.
We thank @chyojn for the pull request that defined the model schemas and solver configurations.
*This example is a work-in-progress. It would be nice to further explain details of the network and training choices and benchmark the full training.*
Prepare the Dataset
-------------------
You will first need to download and convert the data format from the [CIFAR-10 website](http://www.cs.toronto.edu/~kriz/cifar.html). To do this, simply run the following commands:
cd $CAFFE_ROOT/data/cifar10
./get_cifar10.sh
cd $CAFFE_ROOT/examples/cifar10
./create_cifar10.sh
If it complains that `wget` or `gunzip` is not installed, install them first. After running the script there should be the dataset, `./cifar10-leveldb`, and the dataset image mean, `./mean.binaryproto`.
The Model
---------
The CIFAR-10 model is a CNN that composes layers of convolution, pooling, rectified linear unit (ReLU) nonlinearities, and local contrast normalization with a linear classifier on top of it all. We have defined the model in the `CAFFE_ROOT/examples/cifar10` directory's `cifar10_quick_train.prototxt`.
Training and Testing the "Quick" Model
--------------------------------------
Training the model is simple after you have written the network definition protobuf and solver protobuf files. Simply run `train_quick.sh`, or the following command directly:
cd $CAFFE_ROOT/examples/cifar10
./train_quick.sh
`train_quick.sh` is a simple script, so have a look inside. `GLOG_logtostderr=1` is the google logging flag that prints all the logging messages directly to stderr. The main tool for training is `train_net.bin`, with the solver protobuf text file as its argument.
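
In other words, the script boils down to an invocation along these lines (a sketch rather than a verbatim copy; check `train_quick.sh` itself for the exact paths):

    GLOG_logtostderr=1 ../../build/tools/train_net.bin cifar10_quick_solver.prototxt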
When you run the code, you will see a lot of messages flying by like this:
I0317 21:52:48.945710 2008298256 net.cpp:74] Creating Layer conv1
I0317 21:52:48.945716 2008298256 net.cpp:84] conv1 <- data
I0317 21:52:48.945725 2008298256 net.cpp:110] conv1 -> conv1
I0317 21:52:49.298691 2008298256 net.cpp:125] Top shape: 100 32 32 32 (3276800)
I0317 21:52:49.298719 2008298256 net.cpp:151] conv1 needs backward computation.
These messages tell you the details about each layer, its connections and its output shape, which may be helpful in debugging. After the initialization, the training will start:
I0317 21:52:49.309370 2008298256 net.cpp:166] Network initialization done.
I0317 21:52:49.309376 2008298256 net.cpp:167] Memory required for Data 23790808
I0317 21:52:49.309422 2008298256 solver.cpp:36] Solver scaffolding done.
I0317 21:52:49.309447 2008298256 solver.cpp:47] Solving CIFAR10_quick_train
Based on the solver setting, we will print the training loss function every 100 iterations, and test the network every 500 iterations. You will see messages like this:
I0317 21:53:12.179772 2008298256 solver.cpp:208] Iteration 100, lr = 0.001
I0317 21:53:12.185698 2008298256 solver.cpp:65] Iteration 100, loss = 1.73643
...
I0317 21:54:41.150030 2008298256 solver.cpp:87] Iteration 500, Testing net
I0317 21:54:47.129461 2008298256 solver.cpp:114] Test score #0: 0.5504
I0317 21:54:47.129500 2008298256 solver.cpp:114] Test score #1: 1.27805
For each training iteration, `lr` is the learning rate of that iteration, and `loss` is the training loss. For the output of the testing phase, **score 0 is the accuracy**, and **score 1 is the testing loss**.
And after making yourself a cup of coffee, you are done!
I0317 22:12:19.666914 2008298256 solver.cpp:87] Iteration 5000, Testing net
I0317 22:12:25.580330 2008298256 solver.cpp:114] Test score #0: 0.7533
I0317 22:12:25.580379 2008298256 solver.cpp:114] Test score #1: 0.739837
I0317 22:12:25.587262 2008298256 solver.cpp:130] Snapshotting to cifar10_quick_iter_5000
I0317 22:12:25.590215 2008298256 solver.cpp:137] Snapshotting solver state to cifar10_quick_iter_5000.solverstate
I0317 22:12:25.592813 2008298256 solver.cpp:81] Optimization Done.
Our model achieved ~75% test accuracy. The model parameters are stored in binary protobuf format in
cifar10_quick_iter_5000
which is ready-to-deploy in CPU or GPU mode! Refer to the `CAFFE_ROOT/examples/cifar10/cifar10_quick.prototxt` for the deployment model definition that can be called on new data.
Why train on a GPU?
-------------------
CIFAR-10, while still small, has enough data to make GPU training attractive.
To compare CPU vs. GPU training speed, simply change one line in all the `cifar*solver.prototxt`:
# solver mode: CPU or GPU
solver_mode: CPU
and you will be using CPU for training.
---
layout: default
title: Caffe
---
Developing & Contributing
=========================
Caffe is developed with active participation of the community by the [Berkeley Vision and Learning Center](http://bvlc.eecs.berkeley.edu/).
We welcome all contributions!
The [contributing workflow](https://github.com/BVLC/caffe#development) is explained in the README. These guidelines cover development practices in Caffe. This is a work-in-progress.
**Development Flow**
- `master` is golden.
- `dev` is for new development: it is the branching point for features and the base of pull requests.
* The history of `dev` is not rewritten.
* Contributions are shepherded from `dev` to `master` by BVLC by merge.
- To err is human. Accidents are fixed by reverts.
- Releases are marked with tags on merge from `dev` to `master`.
**Issues & Pull Request Protocol**
0. Make issues for [bugs](https://github.com/BVLC/caffe/issues?labels=bug&page=1&state=open), tentative proposals, and [questions](https://github.com/BVLC/caffe/issues?labels=question&page=1&state=open).
1. Make PRs to signal development:
a. Make PRs *as soon as development begins*. Create a feature branch, make your initial commit, push, and PR to let everyone know you are working on it, and let discussion guide development instead of reviewing it after the fact.
b. When a proposal from the first step earns enough interest to warrant development, make a PR, and reference and close the old issue to direct the conversation to the PR.
2. When a PR is ready, comment to request a maintainer be assigned to review and merge to `dev`.
A PR is only ready for review when the code is committed, documented, linted, and tested!
**Documentation**: the documentation is bundled with Caffe in `docs/`. This includes the site you are reading now. Contributions should be documented both inline in code and through usage examples. New documentation is published by BVLC with each release and between releases as-needed.
We'd appreciate your contribution to the documentation effort!
**Testing**: run `make runtest` to check the project tests. New code requires new tests. Pull requests that fail tests will not be accepted.
The `googletest` framework we use provides many additional options, which you can access by running the test binaries directly. One of the more useful options is `--gtest_filter`, which allows you to filter tests by name:
# run all tests with CPU in the name
build/test/test_all.testbin --gtest_filter='*CPU*'
# run all tests without GPU in the name (note the leading minus sign)
build/test/test_all.testbin --gtest_filter=-'*GPU*'
To get a list of all options `googletest` provides, simply pass the `--help` flag:
build/test/test_all.testbin --help
**Style**
- Follow [Google C++ style](http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml) and [Google python style](http://google-styleguide.googlecode.com/svn/trunk/pyguide.html) + [PEP 8](http://legacy.python.org/dev/peps/pep-0008/).
- Wrap lines at 80 chars.
- Remember that “a foolish consistency is the hobgoblin of little minds,” so use your best judgement to write the clearest code for your particular case.
**Lint**: run `make lint` to check C++ code.
**Copyright**: assign copyright jointly to BVLC and contributors like so:
// Copyright 2014 BVLC and contributors.
The exact details of contributions are recorded by versioning and cited in our [acknowledgements](http://caffe.berkeleyvision.org/#acknowledgements). This method is impartial and always up-to-date.
---
layout: default
title: Caffe
---
Extracting Features
===================
In this tutorial, we will extract features using a pre-trained model.
Follow instructions for [setting up caffe](installation.html) and for [getting](getting_pretrained_models.html) the pre-trained ImageNet model.
If you need detailed information about the tools below, please consult their source code, in which additional documentation is usually provided.
Select data to run on
---------------------
We'll make a temporary folder to store things into.
mkdir examples/_temp
Generate a list of the files to process.
We're going to use the images that ship with caffe.
find `pwd`/examples/images -type f -exec echo {} \; > examples/_temp/temp.txt
The `ImageDataLayer` we'll use expects a label after each filename, so let's add a 0 to the end of each line.
sed "s/$/ 0/" examples/_temp/temp.txt > examples/_temp/file_list.txt
Define the Feature Extraction Network Architecture
--------------------------------------------------
In practice, subtracting the mean image from a dataset significantly improves classification accuracies.
Download the mean image of the ILSVRC dataset.
data/ilsvrc12/get_ilsvrc_aux.sh
We will use `data/ilsvrc12/imagenet_mean.binaryproto` in the network definition prototxt.
Let's copy and modify the network definition.
We'll be using the `ImageDataLayer`, which will load and resize images for us.
cp examples/feature_extraction/imagenet_val.prototxt examples/_temp
Edit `examples/_temp/imagenet_val.prototxt` to use the correct paths for your setup (replace `$CAFFE_DIR`).
Extract Features
----------------
Now everything necessary is in place.
build/tools/extract_features.bin examples/imagenet/caffe_reference_imagenet_model examples/_temp/imagenet_val.prototxt fc7 examples/_temp/features 10
The name of the feature blob that you extract is `fc7`, which represents the highest-level feature of the reference model.
We can use any other layer, as well, such as `conv5` or `pool3`.
The last parameter above is the number of data mini-batches.
The features are stored to LevelDB `examples/_temp/features`, ready for access by some other code.
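
The stored values are serialized `Datum` protos, so other code can read them back directly. As a hypothetical sketch (assuming the `py-leveldb` package and that Caffe's compiled Python protos are on your `PYTHONPATH`):

    import leveldb
    from caffe.proto import caffe_pb2

    db = leveldb.LevelDB('examples/_temp/features')
    datum = caffe_pb2.Datum()
    for key, value in db.RangeIter():
        datum.ParseFromString(value)
        feat = list(datum.float_data)  # the fc7 activations for one image
        print key, len(feat)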
If you see the error "Check failed: status.ok() Failed to open leveldb examples/_temp/features", it is because the directory examples/_temp/features was created the last time you ran the command. Remove it and run the command again.
rm -rf examples/_temp/features/
If you'd like to use the Python wrapper for extracting features, check out the [layer visualization notebook](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/filter_visualization.ipynb).
Clean Up
--------
Let's remove the temporary directory now.
rm -r examples/_temp
---
layout: default
---
# Pre-trained models
[BVLC](http://bvlc.eecs.berkeley.edu) aims to provide a variety of high quality pre-trained models.
Note that unlike Caffe itself, these models are licensed for **academic research / non-commercial use only**.
If you have any questions, please get in touch with us.
This page will be updated as more models become available.
### ImageNet
**Caffe Reference ImageNet Model**: Our reference implementation of an ImageNet model trained on ILSVRC-2012 can be downloaded (232.6MB) by running `examples/imagenet/get_caffe_reference_imagenet_model.sh` from the Caffe root directory.
- The bundled model is the iteration 310,000 snapshot.
- The best validation performance during training was iteration 313,000 with
validation accuracy 57.412% and loss 1.82328.
**AlexNet**: Our training of the Krizhevsky architecture, which differs from the paper's methodology by (1) not training with the relighting data-augmentation and (2) initializing non-zero biases to 0.1 instead of 1. (2) was found necessary for training, as initialization to 1 gave flat loss. Download the model (243.9MB) by running `examples/imagenet/get_caffe_alexnet_model.sh` from the Caffe root directory.
- The bundled model is the iteration 360,000 snapshot.
- The best validation performance during training was iteration 358,000 with
validation accuracy 57.258% and loss 1.83948.
**R-CNN (ILSVRC13)**: The pure Caffe instantiation of the [R-CNN](https://github.com/rbgirshick/rcnn) model for ILSVRC13 detection. Download the model (230.8MB) by running `examples/imagenet/get_caffe_rcnn_imagenet_model.sh` from the Caffe root directory. This model was made by transplanting the R-CNN SVM classifiers into a `fc-rcnn` classification layer, provided here as an off-the-shelf Caffe detector. Try the [detection example](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/detection.ipynb) to see it in action. For the full details, refer to the R-CNN site. *N.B. For research purposes, make use of the official R-CNN package and not this example.*
Additionally, you will probably eventually need some auxiliary data (mean image, synset list, etc.): run `data/ilsvrc12/get_ilsvrc_aux.sh` from the root directory to obtain it.
---
layout: default
title: Caffe
---
Yangqing's Recipe on Brewing ImageNet
=====================================
"All your braincells are belong to us."
- Caffeine
We are going to describe a reference implementation for the approach first proposed by Krizhevsky, Sutskever, and Hinton in their [NIPS 2012 paper](http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf). Since training the whole model takes some time and energy, we provide a model, trained in the same way as we describe here, to help fight global warming. If you would like to simply use the pretrained model, check out the [Pretrained ImageNet](getting_pretrained_models.html) page. *Note that the pretrained model is for academic research / non-commercial use only*.
To clarify, by ImageNet we actually mean the ILSVRC12 challenge, but you can easily train on the whole of ImageNet as well, just with more disk space, and a little longer training time.
(If you don't get the quote, visit [Yann LeCun's fun page](http://yann.lecun.com/ex/fun/).)
Data Preparation
----------------
We assume that you already have downloaded the ImageNet training data and validation data, and they are stored on your disk like:
/path/to/imagenet/train/n01440764/n01440764_10026.JPEG
/path/to/imagenet/val/ILSVRC2012_val_00000001.JPEG
You will first need to prepare some auxiliary data for training. This data can be downloaded by:
cd $CAFFE_ROOT/data/ilsvrc12/
./get_ilsvrc_aux.sh
The training and validation input are described in `train.txt` and `val.txt` as text listing all the files and their labels. Note that we use a different indexing for labels than the ILSVRC devkit: we sort the synset names in their ASCII order, and then label them from 0 to 999. See `synset_words.txt` for the synset/name mapping.
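
The mapping is easy to reconstruct; as a sketch in Python (assuming the `synsets.txt` fetched above, one synset id per line):

    synsets = sorted(line.strip() for line in open('data/ilsvrc12/synsets.txt'))
    label_of_synset = dict((s, i) for i, s in enumerate(synsets))  # e.g. n01440764 -> 0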
You may want to resize the images to 256x256 in advance. By default, we do not explicitly do this because in a cluster environment, one may benefit from resizing images in a parallel fashion, using mapreduce. For example, Yangqing used his lightweight [mincepie](https://github.com/Yangqing/mincepie) package to do mapreduce on the Berkeley cluster. If you would like things to be simple and straightforward, you can also use shell commands, something like:
for name in /path/to/imagenet/val/*.JPEG; do
convert -resize 256x256\! $name $name
done
Go to `$CAFFE_ROOT/examples/imagenet/` for the rest of this guide.
Take a look at `create_imagenet.sh`. Set the paths to the train and val dirs as needed, and set "RESIZE=true" to resize all images to 256x256 if you haven't resized the images in advance. Now simply create the leveldbs with `./create_imagenet.sh`. Note that `imagenet_train_leveldb` and `imagenet_val_leveldb` should not exist before this execution; they will be created by the script. `GLOG_logtostderr=1` simply dumps more information for you to inspect, and you can safely ignore it.
Compute Image Mean
------------------
The model requires us to subtract the image mean from each image, so we have to compute the mean. `tools/compute_image_mean.cpp` implements this; it is also a good example for familiarizing yourself with how to manipulate the multiple components, such as protocol buffers, leveldbs, and logging, if you are not familiar with them. Anyway, the mean computation can be carried out as:
./make_imagenet_mean.sh
which will make `data/ilsvrc12/imagenet_mean.binaryproto`.
Network Definition
------------------
The network definition follows strictly the one in Krizhevsky et al. You can find the detailed definition at `examples/imagenet/imagenet_train.prototxt`. Note the paths in the data layer; if you have not followed the exact paths in this guide you will need to change the following lines:
source: "ilsvrc12_train_leveldb"
mean_file: "../../data/ilsvrc12/imagenet_mean.binaryproto"
to point to your own leveldb and image mean. Likewise, do the same for `examples/imagenet/imagenet_val.prototxt`.
If you look carefully at `imagenet_train.prototxt` and `imagenet_val.prototxt`, you will notice that they are largely the same, with the only difference being the data layer sources, and the last layer: in training, we will be using a `softmax_loss` layer to compute the loss function and to initialize the backpropagation, while in validation we will be using an `accuracy` layer to inspect how well we do in terms of accuracy.
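
Schematically, in the same old-style layer syntax used above (a sketch rather than a verbatim quote of the two files), the endings differ like this:

    # imagenet_train.prototxt ends with a loss layer:
    layers {
      name: "loss"
      type: SOFTMAX_LOSS
      bottom: "fc8"
      bottom: "label"
    }
    # imagenet_val.prototxt ends with an accuracy layer:
    layers {
      name: "accuracy"
      type: ACCURACY
      bottom: "fc8"
      bottom: "label"
      top: "accuracy"
    }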
We will also lay out a protocol buffer for running the solver (sketched after this list). Let's make a few plans:
* We will run in batches of 256, and run a total of 450,000 iterations (about 90 epochs: 256 x 450,000 is roughly 115M images, or 90 passes over the 1.28M training images).
* For every 1,000 iterations, we test the learned net on the validation data.
* We set the initial learning rate to 0.01, and decrease it every 100,000 iterations (about 20 epochs).
* Information will be displayed every 20 iterations.
* The network will be trained with momentum 0.9 and a weight decay of 0.0005.
* For every 10,000 iterations, we will take a snapshot of the current status.
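
In solver prototxt terms, these plans translate roughly into the following (a hypothetical sketch; the bundled `imagenet_solver.prototxt` is the authoritative version, and the batch size of 256 lives in the data layer of `imagenet_train.prototxt`):

    train_net: "imagenet_train.prototxt"
    test_net: "imagenet_val.prototxt"
    test_iter: 1000
    test_interval: 1000
    base_lr: 0.01
    lr_policy: "step"
    gamma: 0.1
    stepsize: 100000
    display: 20
    max_iter: 450000
    momentum: 0.9
    weight_decay: 0.0005
    snapshot: 10000
    snapshot_prefix: "caffe_imagenet_train"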
Sound good? This is implemented in `examples/imagenet/imagenet_solver.prototxt`. Again, you will need to change the first two lines:
train_net: "imagenet_train.prototxt"
test_net: "imagenet_val.prototxt"
to point to the actual path if you have changed them.
Training ImageNet
-----------------
Ready? Let's train.
./train_imagenet.sh
Sit back and enjoy! On my K20 machine, every 20 iterations take about 36 seconds to run, so effectively about 7 ms per image for the full forward-backward pass. About 2.5 ms of this is on forward, and the rest is backward. If you are interested in dissecting the computation time, you can look at `examples/net_speed_benchmark.cpp`, but it was written purely for debugging purposes, so you may need to figure a few things out yourself.
Resume Training?
----------------
We all experience times when the power goes out, or we feel like rewarding ourselves a little by playing Battlefield (does anyone still remember Quake?). Since we are snapshotting intermediate results during training, we will be able to resume from snapshots. This can be done as easily as:
./resume_training.sh
where in the script `caffe_imagenet_train_1000.solverstate` is the solver state snapshot that stores all necessary information to recover the exact solver state (including the parameters, momentum history, etc).
Parting Words
-------------
Hope you liked this recipe! Many researchers have gone further since the ILSVRC 2012 challenge, changing the network architecture and/or fine-tuning the various parameters in the network. The recent ILSVRC 2013 challenge suggests that there is still quite some room for improvement. **Caffe allows one to explore different network choices more easily, by simply writing different prototxt files**. Isn't that exciting?
And since now you have a trained network, check out how to use it: [Running Pretrained ImageNet](getting_pretrained_models.html). This time we will use Python, but if you have wrappers for other languages, please kindly send a pull request!
---
layout: default
---
# Welcome to Caffe
Caffe is a framework for convolutional neural network algorithms, developed with speed in mind.
It was created by [Yangqing Jia](http://daggerfs.com), and is in active development by the [Berkeley Vision and Learning Center](http://bvlc.eecs.berkeley.edu).
Caffe is released under [the BSD 2-Clause license](https://github.com/BVLC/caffe/blob/master/LICENSE).
Check out the [classification demo](http://demo.caffe.berkeleyvision.org/)!
## Why Caffe?
Caffe aims to provide computer vision scientists and practitioners with a **clean and modifiable implementation** of state-of-the-art deep learning algorithms.
For example, network structure is easily specified in separate config files, with no mess of hard-coded parameters in the code.
At the same time, Caffe fits industry needs, with blazing fast C++/CUDA code for GPU computation.
Caffe is currently the fastest GPU CNN implementation publicly available, and is able to process more than **40 million images per day** with a single NVIDIA K40 or Titan GPU (or 20 million images per day on a K20 GPU)\*. That's 192 images per second during training and 500 images per second during test.
Caffe also provides **seamless switching between CPU and GPU**, which allows one to train models with fast GPUs and then deploy them on non-GPU clusters with one line of code: `Caffe::set_mode(Caffe::CPU)`.
Even in CPU mode, computing predictions on an image takes only 20 ms when images are processed in batch mode; in GPU mode, it takes only 2 ms.
## Documentation
* [Introductory slides](https://www.dropbox.com/s/10fx16yp5etb8dv/caffe-presentation.pdf): slides about the Caffe architecture, *updated 03/14*.
* [Installation](/installation.html): Instructions on installing Caffe (works on Ubuntu, Red Hat, OS X).
* [Pre-trained models](/getting_pretrained_models.html): BVLC provides some pre-trained models for academic / non-commercial use.
* [Development](/development.html): Guidelines for development and contributing to Caffe.
### Examples
* [Image Classification \[notebook\]][imagenet_classification]: classify images with the pretrained ImageNet model by the Python interface.
* [Detection \[notebook\]][detection]: run a pretrained model as a detector in Python.
* [Visualizing Features and Filters \[notebook\]][visualizing_filters]: extracting features and visualizing trained filters with an example image, viewed layer-by-layer.
* [Editing Model Parameters \[notebook\]][net_surgery]: how to do net surgery and manually change model parameters.
* [LeNet / MNIST Demo](/mnist.html): end-to-end training and testing of LeNet on MNIST.
* [CIFAR-10 Demo](/cifar10.html): training and testing on the CIFAR-10 data.
* [Training ImageNet](/imagenet_training.html): recipe for end-to-end training of an ImageNet classifier.
* [Feature extraction with C++](/feature_extraction.html): feature extraction using pre-trained model.
[imagenet_classification]: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/imagenet_classification.ipynb
[detection]: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/detection.ipynb
[visualizing_filters]: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/filter_visualization.ipynb
[net_surgery]: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb
## Citing Caffe
Please kindly cite Caffe in your publications if it helps your research:
@misc{Jia13caffe,
Author = {Yangqing Jia},
Title = { {Caffe}: An Open Source Convolutional Architecture for Fast Feature Embedding},
Year = {2013},
Howpublished = {\url{http://caffe.berkeleyvision.org/}}
}
### Acknowledgements
Yangqing would like to thank the NVIDIA Academic program for providing K20 GPUs, and [Oriol Vinyals](http://www1.icsi.berkeley.edu/~vinyals/) for various discussions along the journey.
A core set of BVLC members have contributed lots of new functionality and fixes since the original release (alphabetical by first name):
- [Eric Tzeng](https://github.com/erictzeng)
- [Evan Shelhamer](http://imaginarynumber.net/)
- [Jeff Donahue](http://jeffdonahue.com/)
- [Jon Long](https://github.com/longjon)
- [Dr. Ross Girshick](http://www.cs.berkeley.edu/~rbg/)
- [Sergey Karayev](http://sergeykarayev.com/)
- [Dr. Sergio Guadarrama](http://www.eecs.berkeley.edu/~sguada/)
Additionally, the open-source community plays a large and growing role in Caffe's development.
Check out the Github [project pulse](https://github.com/BVLC/caffe/pulse) for recent activity, and the [contributors](https://github.com/BVLC/caffe/graphs/contributors) for an ordered list (by commit activity).
We sincerely appreciate your interest and contributions!
If you'd like to contribute, read [this](development.html).
---
\*: When measured with the [SuperVision](http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf) model that won the ImageNet Large Scale Visual Recognition Challenge 2012. See [performance and hardware configuration details](/performance_hardware.html).
---
layout: default
title: Caffe
---
# Installation
Prior to installing, it is best to read through this guide and take note of the details for your platform.
We have successfully compiled and run Caffe on Ubuntu 12.04, OS X 10.8, and OS X 10.9.
- [Prerequisites](#prerequisites)
- [Compilation](#compilation)
- [Hardware questions](#hardware_questions)
## Prerequisites
Caffe depends on several software packages.
* [CUDA](https://developer.nvidia.com/cuda-zone) (5.0, 5.5, or 6.0).
* [BLAS](http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) (provided via ATLAS, MKL, or OpenBLAS).
* [OpenCV](http://opencv.org/).
* [Boost](http://www.boost.org/) (we have only tested 1.55)
* `glog`, `gflags`, `protobuf`, `leveldb`, `snappy`, `hdf5`
* For the Python wrapper
* `Python`, `numpy (>= 1.7)`, boost-provided `boost.python`
* For the MATLAB wrapper
* MATLAB with the `mex` compiler.
### CUDA and BLAS
Caffe requires the CUDA `nvcc` compiler to compile its GPU code.
To install CUDA, go to the [NVIDIA CUDA website](https://developer.nvidia.com/cuda-downloads) and follow installation instructions there. **Note:** you can install the CUDA libraries without a CUDA card or driver, in order to build and run Caffe on a CPU-only machine.
Caffe requires BLAS as the backend of its matrix and vector computations.
There are several implementations of this library.
The choice is yours:
* [ATLAS](http://math-atlas.sourceforge.net/): free, open source, and so the default for Caffe.
+ Ubuntu: `sudo apt-get install libatlas-base-dev`
+ CentOS/RHEL: `sudo yum install libatlas-devel`
+ OS X: already installed as the [Accelerate / vecLib Framework](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man7/Accelerate.7.html).
* [Intel MKL](http://software.intel.com/en-us/intel-mkl): commercial and optimized for Intel CPUs, with a free trial and [student](http://software.intel.com/en-us/intel-education-offerings) licenses.
1. Install MKL.
2. Set `BLAS := mkl` in `Makefile.config`
* [OpenBLAS](http://www.openblas.net/): free and open source; this optimized and parallel BLAS could require more effort to install, although it might offer a speedup.
1. Install OpenBLAS
2. Set `BLAS := open` in `Makefile.config`
### Python and/or Matlab wrappers (optional)
Python: The main requirements are `numpy` and `boost.python` (provided by boost). `pandas` is useful too and needed for some examples.
For **OS X**, we highly recommend using the [Anaconda](https://store.continuum.io/cshop/anaconda/) Python distribution, which provides most of the necessary packages, as well as the `hdf5` library dependency.
If you prefer not to use Anaconda, you can install the packages with Homebrew instead -- but beware of potential linking errors!
Note that if you use the **Ubuntu** default python, you will need to `apt-get install` the `python-dev` package to have the python headers. You can install any remaining dependencies with
pip install -r /path/to/caffe/python/requirements.txt
MATLAB: install MATLAB, and make sure that its `mex` is in your `$PATH`.
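For example (a sketch only -- the MATLAB install path varies by version and platform):
export PATH=/usr/local/MATLAB/R2014a/bin:$PATH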
### The rest of the dependencies
#### Linux
On **Ubuntu**, the remaining dependencies can be installed with
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev
And on **CentOS or RHEL**, you can install via yum using:
sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel
The only exception is the Google logging library (glog), which does not exist in the Ubuntu 12.04 or CentOS/RHEL repositories. To install it from source, do:
wget https://google-glog.googlecode.com/files/glog-0.3.3.tar.gz
tar zxvf glog-0.3.3.tar.gz
cd glog-0.3.3
./configure
make && make install
#### OS X
On **OS X**, we highly recommend using the [homebrew](http://brew.sh/) package manager, and ideally starting from a clean install of the OS (or from a wiped `/usr/local`) to avoid conflicts.
In the following, we assume that you're using Anaconda Python and Homebrew.
To install the OpenCV dependency, we'll need to provide an additional source for Homebrew:
brew tap homebrew/science
If using Anaconda Python, a modification is required to the OpenCV formula.
Do `brew edit opencv` and change the two lines that set `PYTHON_LIBRARY` and `PYTHON_INCLUDE_DIR` so that they read exactly as the two lines below.
-DPYTHON_LIBRARY=#{py_prefix}/lib/libpython2.7.dylib
-DPYTHON_INCLUDE_DIR=#{py_prefix}/include/python2.7
**NOTE**: We find that everything compiles successfully if `$LD_LIBRARY_PATH` is not set at all, and `$DYLD_FALLBACK_LIBRARY_PATH` is set to provide CUDA, Python, and other relevant libraries (e.g. `/usr/local/cuda/lib:$HOME/anaconda/lib:/usr/local/lib:/usr/lib`).
In other `ENV` settings, things may not work as expected.
#### 10.8-specific Instructions
Simply run the following:
brew install --build-from-source --with-python boost
for x in snappy leveldb protobuf gflags glog szip homebrew/science/opencv; do brew install $x; done
Building boost from source is needed to link against your local Python (exceptions might be raised during some OS X installs, but **ignore** these and continue). If you do not need the Python wrapper, simply doing `brew install boost` is fine.
**Note** that the HDF5 dependency is provided by Anaconda Python in this case.
If you're not using Anaconda, include `hdf5` in the list above.
#### 10.9-specific Instructions
In OS X 10.9, clang++ is the default C++ compiler and uses `libc++` as the standard library.
However, NVIDIA CUDA (even version 6.0) currently links only with `libstdc++`.
This makes it necessary to change the compilation settings for each of the dependencies.
We do this by modifying the homebrew formulae before installing any packages.
Make sure that homebrew doesn't install any software dependencies in the background; all packages must be linked to `libstdc++`.
The prerequisite homebrew formulae are
boost snappy leveldb protobuf gflags glog szip homebrew/science/opencv
For each of these formulae, run `brew edit FORMULA` and add the ENV definitions as shown:
def install
# ADD THE FOLLOWING:
ENV.append "CXXFLAGS", "-stdlib=libstdc++"
ENV.append "CFLAGS", "-stdlib=libstdc++"
ENV.append "LDFLAGS", "-stdlib=libstdc++ -lstdc++"
# The following is necessary because libtool likes to strip LDFLAGS:
ENV["CXX"] = "/usr/bin/clang++ -stdlib=libstdc++"
...
To edit the formulae in turn, run
for x in snappy leveldb protobuf gflags glog szip boost homebrew/science/opencv; do brew edit $x; done
After this, run
for x in snappy leveldb protobuf gflags glog szip homebrew/science/opencv; do brew uninstall $x; brew install --build-from-source --fresh -vd $x; done
brew install --build-from-source --with-python --fresh -vd boost
**Note** that `brew install --build-from-source --fresh -vd boost` is fine if you do not need the Caffe Python wrapper.
**Note** that the HDF5 dependency is provided by Anaconda Python in this case.
If you're not using Anaconda, include `hdf5` in the list above.
#### Windows
There is an unofficial Windows port of Caffe at [niuzhiheng/caffe:windows](https://github.com/niuzhiheng/caffe). Thanks [@niuzhiheng](https://github.com/niuzhiheng)!
## Compilation
Now that you have the prerequisites, edit your `Makefile.config` to change the paths for your setup.
The defaults should work, but uncomment the relevant lines if using Anaconda Python.
cp Makefile.config.example Makefile.config
# Adjust Makefile.config (for example, if using Anaconda Python)
make all
make test
make runtest
Note that if there is no GPU in your machine, building and running CPU-only works, but GPU tests will naturally fail.
To compile the Python and MATLAB wrappers do `make pycaffe` and `make matcaffe` respectively.
Be sure to set your MATLAB and Python paths in `Makefile.config` first!
For Python support, you must add the compiled module to your `$PYTHONPATH` (as `/path/to/caffe/python` or the like).
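For example (a sketch, assuming a bash-like shell):
export PYTHONPATH=/path/to/caffe/python:$PYTHONPATH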
*Distribution*: run `make distribute` to create a `distribute` directory with all the Caffe headers, compiled libraries, binaries, etc. needed for distribution to other machines.
*Speed*: for a faster build, compile in parallel by doing `make all -j8` where 8 is the number of parallel threads for compilation (a good choice for the number of threads is the number of cores in your machine).
Now that you have installed Caffe, check out the [MNIST demo](mnist.html) and the pretrained [ImageNet example](imagenet.html).
## Hardware Questions
**Laboratory Tested Hardware**: Berkeley Vision runs Caffe with K40s, K20s, and Titans including models at ImageNet/ILSVRC scale. We also run on GTX series cards and GPU-equipped MacBook Pros. We have not encountered any trouble in-house with devices with CUDA capability >= 3.0. All reported hardware issues thus far have been due to GPU configuration, overheating, and the like.
**CUDA compute capability**: devices with compute capability <= 2.0 may have to reduce CUDA thread numbers and batch sizes due to hardware constraints. Your mileage may vary.
Refer to the project's issue tracker for [hardware/compatibility](https://github.com/BVLC/caffe/issues?labels=hardware%2Fcompatibility&page=1&state=open).
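// Viewport scale fix (a snippet common to GitHub Pages themes): pins the
// viewport scale to 1 until the user starts a pinch gesture, then restores
// zooming (scales .25 to 1.6) and removes the listener.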
fixScale = function(doc) {
var addEvent = 'addEventListener',
type = 'gesturestart',
qsa = 'querySelectorAll',
scales = [1, 1],
meta = qsa in doc ? doc[qsa]('meta[name=viewport]') : [];
function fix() {
meta.content = 'width=device-width,minimum-scale=' + scales[0] + ',maximum-scale=' + scales[1];
doc.removeEventListener(type, fix, true);
}
if ((meta = meta[meta.length - 1]) && addEvent in doc) {
fix();
scales = [.25, 1.6];
doc[addEvent](type, fix, true);
}
};
---
layout: default
title: Caffe
---
Training MNIST with Caffe
================
We will assume that you have successfully compiled Caffe. If not, please refer to the [Installation page](installation.html). In this tutorial, we will assume that your caffe installation is located at `CAFFE_ROOT`.
Prepare Datasets
----------------
You will first need to download the data from the MNIST website and convert it into the format Caffe expects. To do this, simply run the following commands:
cd $CAFFE_ROOT/data/mnist
./get_mnist.sh
cd $CAFFE_ROOT/examples/mnist
./create_mnist.sh
If the scripts complain that `wget` or `gunzip` is not installed, install the missing tool first. After running the scripts, there should be two datasets: `mnist-train-leveldb` and `mnist-test-leveldb`.
LeNet: the MNIST Classification Model
-------------------------------------
Before we actually run the training program, let's explain what will happen. We will use the [LeNet](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf) network, which is known to work well on digit classification tasks. We will use a slightly different version from the original LeNet implementation, replacing the sigmoid activations with Rectified Linear Unit (ReLU) activations for the neurons.
The design of LeNet contains the essence of CNNs that are still used in larger models such as the ones in ImageNet. In general, it consists of a convolutional layer followed by a pooling layer, another convolution layer followed by a pooling layer, and then two fully connected layers similar to the conventional multilayer perceptrons. We have defined the layers in `CAFFE_ROOT/data/lenet.prototxt`.
If you would like step-by-step instructions on how the protobuf definitions are written, see [MNIST: Define the Network](mnist_prototxt.html) and [MNIST: Define the Solver](mnist_solver_prototxt.html).
Training and Testing the Model
------------------------------
Training the model is simple after you have written the network definition protobuf and solver protobuf files. Simply run `train_lenet.sh`, or the following commands directly:
cd $CAFFE_ROOT/examples/mnist
./train_lenet.sh
`train_lenet.sh` is a simple script, but here are a few explanations: `GLOG_logtostderr=1` is the Google logging flag that prints all the logging messages directly to stderr. The main tool for training is `train_net.bin`, with the solver protobuf text file as its argument.
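For example, the invocation inside the script looks roughly like this (a sketch; the tools path assumes the default build location):
cd $CAFFE_ROOT/examples/mnist
GLOG_logtostderr=1 ../../build/tools/train_net.bin lenet_solver.prototxt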
When you run the code, you will see a lot of messages flying by like this:
I1203 net.cpp:66] Creating Layer conv1
I1203 net.cpp:76] conv1 <- data
I1203 net.cpp:101] conv1 -> conv1
I1203 net.cpp:116] Top shape: 20 24 24
I1203 net.cpp:127] conv1 needs backward computation.
These messages tell you the details about each layer, its connections and its output shape, which may be helpful in debugging. After the initialization, the training will start:
I1203 net.cpp:142] Network initialization done.
I1203 solver.cpp:36] Solver scaffolding done.
I1203 solver.cpp:44] Solving LeNet
Based on the solver setting, we will print the training loss function every 100 iterations, and test the network every 500 iterations. You will see messages like this:
I1203 solver.cpp:204] Iteration 100, lr = 0.00992565
I1203 solver.cpp:66] Iteration 100, loss = 0.26044
...
I1203 solver.cpp:84] Testing net
I1203 solver.cpp:111] Test score #0: 0.9785
I1203 solver.cpp:111] Test score #1: 0.0606671
For each training iteration, `lr` is the learning rate of that iteration, and `loss` is the training loss. For the output of the testing phase, score 0 is the accuracy, and score 1 is the test loss.
And after a few minutes, you are done!
I1203 solver.cpp:84] Testing net
I1203 solver.cpp:111] Test score #0: 0.9897
I1203 solver.cpp:111] Test score #1: 0.0324599
I1203 solver.cpp:126] Snapshotting to lenet_iter_10000
I1203 solver.cpp:133] Snapshotting solver state to lenet_iter_10000.solverstate
I1203 solver.cpp:78] Optimization Done.
The final model, a binary protobuf file, is written to
lenet_iter_10000
which you can deploy as a trained model in your application if you are training on a real-world dataset.
Um... How about GPU training?
-----------------------------
You just did! All the training was carried out on the GPU. In fact, if you would like to do training on CPU, you can simply change one line in `lenet_solver.prototxt`:
# solver mode: CPU or GPU
solver_mode: CPU
and you will be using CPU for training. Isn't that easy?
MNIST is a small dataset, so training with a GPU does not bring much benefit due to communication overheads. On larger datasets with more complex models, such as ImageNet, the computation speed difference will be more significant.
---
layout: default
title: Caffe
---
Define the MNIST Network
=========================
This page explains the prototxt file `lenet_train.prototxt` used in the MNIST demo. We assume that you are familiar with [Google Protobuf](https://developers.google.com/protocol-buffers/docs/overview), and assume that you have read the protobuf definitions used by Caffe, which can be found at [src/caffe/proto/caffe.proto](https://github.com/Yangqing/caffe/blob/master/src/caffe/proto/caffe.proto).
Specifically, we will write a `caffe::NetParameter` (or in python, `caffe.proto.caffe_pb2.NetParameter`) protobuf. We will start by giving the network a name:
name: "LeNet"
Writing the Data Layer
----------------------
We will read the MNIST data from the leveldb we created earlier in the demo. This is defined by a data layer:
layers {
name: "mnist"
type: DATA
data_param {
source: "mnist-train-leveldb"
batch_size: 64
scale: 0.00390625
}
top: "data"
top: "label"
}
Specifically, this layer has name `mnist`, type `data`, and it reads the data from the given leveldb source. We will use a batch size of 64, and scale the incoming pixels so that they are in the range \[0,1\). Why 0.00390625? It is 1 divided by 256. Finally, this layer produces two blobs: the `data` blob and the `label` blob.
Writing the Convolution Layer
--------------------------------------------
Let's define the first convolution layer:
layers {
name: "conv1"
type: CONVOLUTION
blobs_lr: 1.
blobs_lr: 2.
convolution_param {
num_output: 20
kernelsize: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
bottom: "data"
top: "conv1"
}
This layer takes the `data` blob (provided by the data layer) and produces the `conv1` blob. It outputs 20 channels, computed with a convolution kernel of size 5 applied with stride 1.
The fillers allow us to randomly initialize the value of the weights and bias. For the weight filler, we will use the `xavier` algorithm that automatically determines the scale of initialization based on the number of input and output neurons. For the bias filler, we will simply initialize it as constant, with the default filling value 0.
`blobs_lr` are the learning rate adjustments for the layer's learnable parameters. In this case, we will set the weight learning rate to be the same as the learning rate given by the solver during runtime, and the bias learning rate to be twice as large as that - this usually leads to better convergence rates.
Writing the Pooling Layer
-------------------------
Phew. Pooling layers are actually much easier to define:
layers {
name: "pool1"
type: POOLING
pooling_param {
kernel_size: 2
stride: 2
pool: MAX
}
bottom: "conv1"
top: "pool1"
}
This says we will perform max pooling with a pool kernel size 2 and a stride of 2 (so no overlapping between neighboring pooling regions).
Similarly, you can write up the second convolution and pooling layers. Check `data/lenet.prototxt` for details.
Writing the Fully Connected Layer
----------------------------------
Writing a fully connected layer is also simple:
layers {
name: "ip1"
type: INNER_PRODUCT
blobs_lr: 1.
blobs_lr: 2.
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
bottom: "pool2"
top: "ip1"
}
This defines a fully connected layer (for some legacy reason, Caffe calls it an `innerproduct` layer) with 500 outputs. All other lines look familiar, right?
Writing the ReLU Layer
----------------------
A ReLU Layer is also simple:
layers {
name: "relu1"
type: RELU
bottom: "ip1"
top: "ip1"
}
Since ReLU is an element-wise operation, we can do *in-place* operations to save some memory. This is achieved by simply giving the same name to the bottom and top blobs. Of course, do NOT use duplicated blob names for other layer types!
After the ReLU layer, we will write another innerproduct layer:
layers {
name: "ip2"
type: INNER_PRODUCT
blobs_lr: 1.
blobs_lr: 2.
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
bottom: "ip1"
top: "ip2"
}
Writing the Loss Layer
-------------------------
Finally, we will write the loss!
layers {
name: "loss"
type: SOFTMAX_LOSS
bottom: "ip2"
bottom: "label"
}
The `softmax_loss` layer implements both the softmax and the multinomial logistic loss (which saves time and improves numerical stability). It takes two blobs, the first one being the prediction and the second one being the `label` provided by the data layer (remember it?). It does not produce any outputs - all it does is compute the loss function value, report it when backpropagation starts, and initiate the gradient with respect to `ip2`. This is where all the magic starts.
Now that we have demonstrated how to write the MNIST layer definition prototxt, maybe check out [how we write a solver prototxt](mnist_solver_prototxt.html)?
---
layout: default
title: Caffe
---
Define the MNIST Solver
=======================
The page is under construction. For now, check out the comments in the solver prototxt file, which explain each line:
# The training protocol buffer definition
train_net: "lenet_train.prototxt"
# The testing protocol buffer definition
test_net: "lenet_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "lenet"
# solver mode: 0 for CPU and 1 for GPU
solver_mode: 1
---
layout: default
title: Caffe
---
# Performance and Hardware Configuration
To measure performance on different NVIDIA GPUs we use the Caffe reference ImageNet model.
For training, each time point is 20 iterations/minibatches of 256 images for 5,120 images total. For testing, a 50,000 image validation set is classified.
**Acknowledgements**: BVLC members are very grateful to NVIDIA for providing several GPUs to conduct this research.
## NVIDIA K40
Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees ~1 GB of GPU memory.
Best settings with ECC off and maximum clock speed:
* Training is 26.5 secs / 20 iterations (5,120 images)
* Testing is 100 secs / validation set (50,000 images)
Other settings:
* ECC on, max speed: training 26.7 secs / 20 iterations, test 101 secs / validation set
* ECC on, default speed: training 31 secs / 20 iterations, test 117 secs / validation set
* ECC off, default speed: training 31 secs / 20 iterations, test 118 secs / validation set
### K40 configuration tips
For maximum K40 performance, turn off ECC and boost the clock speed (at your own risk).
To turn off ECC, do
sudo nvidia-smi -i 0 --ecc-config=0 # repeat with -i x for each GPU ID
then reboot.
Set the "persistence" mode of the GPU settings by
sudo nvidia-smi -pm 1
and then set the clock speed with
sudo nvidia-smi -i 0 -ac 3004,875 # repeat with -i x for each GPU ID
but note that this configuration resets across driver reloading / rebooting. Include these commands in a boot script to initialize these settings. For a simple fix, add these commands to `/etc/rc.local` (on Ubuntu).
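For example, `/etc/rc.local` might end with something like this (a sketch for GPU 0, reusing the commands above; `rc.local` runs as root, so no `sudo` is needed):
nvidia-smi -pm 1
nvidia-smi -i 0 -ac 3004,875
exit 0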
## NVIDIA Titan
Training: 26.26 secs / 20 iterations (5,120 images).
Testing: 100 secs / validation set (50,000 images).
## NVIDIA K20
Training: 36.0 secs / 20 iterations (5,120 images).
Testing: 133 secs / validation set (50,000 images)
.highlight { background: #ffffff; }
.highlight .c { color: #999988; font-style: italic } /* Comment */
.highlight .err { color: #a61717; background-color: #e3d2d2 } /* Error */
.highlight .k { font-weight: bold } /* Keyword */
.highlight .o { font-weight: bold } /* Operator */
.highlight .cm { color: #999988; font-style: italic } /* Comment.Multiline */
.highlight .cp { color: #999999; font-weight: bold } /* Comment.Preproc */
.highlight .c1 { color: #999988; font-style: italic } /* Comment.Single */
.highlight .cs { color: #999999; font-weight: bold; font-style: italic } /* Comment.Special */
.highlight .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */
.highlight .gd .x { color: #000000; background-color: #ffaaaa } /* Generic.Deleted.Specific */
.highlight .ge { font-style: italic } /* Generic.Emph */
.highlight .gr { color: #aa0000 } /* Generic.Error */
.highlight .gh { color: #999999 } /* Generic.Heading */
.highlight .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */
.highlight .gi .x { color: #000000; background-color: #aaffaa } /* Generic.Inserted.Specific */
.highlight .go { color: #888888 } /* Generic.Output */
.highlight .gp { color: #555555 } /* Generic.Prompt */
.highlight .gs { font-weight: bold } /* Generic.Strong */
.highlight .gu { color: #800080; font-weight: bold; } /* Generic.Subheading */
.highlight .gt { color: #aa0000 } /* Generic.Traceback */
.highlight .kc { font-weight: bold } /* Keyword.Constant */
.highlight .kd { font-weight: bold } /* Keyword.Declaration */
.highlight .kn { font-weight: bold } /* Keyword.Namespace */
.highlight .kp { font-weight: bold } /* Keyword.Pseudo */
.highlight .kr { font-weight: bold } /* Keyword.Reserved */
.highlight .kt { color: #445588; font-weight: bold } /* Keyword.Type */
.highlight .m { color: #009999 } /* Literal.Number */
.highlight .s { color: #d14 } /* Literal.String */
.highlight .na { color: #008080 } /* Name.Attribute */
.highlight .nb { color: #0086B3 } /* Name.Builtin */
.highlight .nc { color: #445588; font-weight: bold } /* Name.Class */
.highlight .no { color: #008080 } /* Name.Constant */
.highlight .ni { color: #800080 } /* Name.Entity */
.highlight .ne { color: #990000; font-weight: bold } /* Name.Exception */
.highlight .nf { color: #990000; font-weight: bold } /* Name.Function */
.highlight .nn { color: #555555 } /* Name.Namespace */
.highlight .nt { color: #000080 } /* Name.Tag */
.highlight .nv { color: #008080 } /* Name.Variable */
.highlight .ow { font-weight: bold } /* Operator.Word */
.highlight .w { color: #bbbbbb } /* Text.Whitespace */
.highlight .mf { color: #009999 } /* Literal.Number.Float */
.highlight .mh { color: #009999 } /* Literal.Number.Hex */
.highlight .mi { color: #009999 } /* Literal.Number.Integer */
.highlight .mo { color: #009999 } /* Literal.Number.Oct */
.highlight .sb { color: #d14 } /* Literal.String.Backtick */
.highlight .sc { color: #d14 } /* Literal.String.Char */
.highlight .sd { color: #d14 } /* Literal.String.Doc */
.highlight .s2 { color: #d14 } /* Literal.String.Double */
.highlight .se { color: #d14 } /* Literal.String.Escape */
.highlight .sh { color: #d14 } /* Literal.String.Heredoc */
.highlight .si { color: #d14 } /* Literal.String.Interpol */
.highlight .sx { color: #d14 } /* Literal.String.Other */
.highlight .sr { color: #009926 } /* Literal.String.Regex */
.highlight .s1 { color: #d14 } /* Literal.String.Single */
.highlight .ss { color: #990073 } /* Literal.String.Symbol */
.highlight .bp { color: #999999 } /* Name.Builtin.Pseudo */
.highlight .vc { color: #008080 } /* Name.Variable.Class */
.highlight .vg { color: #008080 } /* Name.Variable.Global */
.highlight .vi { color: #008080 } /* Name.Variable.Instance */
.highlight .il { color: #009999 } /* Literal.Number.Integer.Long */
.type-csharp .highlight .k { color: #0000FF }
.type-csharp .highlight .kt { color: #0000FF }
.type-csharp .highlight .nf { color: #000000; font-weight: normal }
.type-csharp .highlight .nc { color: #2B91AF }
.type-csharp .highlight .nn { color: #000000 }
.type-csharp .highlight .s { color: #A31515 }
.type-csharp .highlight .sc { color: #A31515 }
/* MeyerWeb Reset */
html, body, div, span, applet, object, iframe,
h1, h2, h3, h4, h5, h6, p, blockquote, pre,
a, abbr, acronym, address, big, cite, code,
del, dfn, em, img, ins, kbd, q, s, samp,
small, strike, strong, sub, sup, tt, var,
b, u, i, center,
dl, dt, dd, ol, ul, li,
fieldset, form, label, legend,
table, caption, tbody, tfoot, thead, tr, th, td,
article, aside, canvas, details, embed,
figure, figcaption, footer, header, hgroup,
menu, nav, output, ruby, section, summary,
time, mark, audio, video {
margin: 0;
padding: 0;
border: 0;
font: inherit;
vertical-align: baseline;
}
body {
padding:10px 50px 0 0;
font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
font-weight: 300;
font-size: 14px;
color: #232323;
background-color: #FBFAF7;
margin: 0;
line-height: 1.8em;
-webkit-font-smoothing: antialiased;
}
h1, h2, h3, h4, h5, h6 {
color:#232323;
margin:36px 0 10px;
}
p, ul, ol, table, dl {
margin:0 0 22px;
}
h1, h2, h3 {
font-family: Times, serif;
line-height: 1.3;
font-weight: normal;
display: block;
border-bottom: 1px solid #ccc;
padding-bottom: 5px;
}
h1 {
font-size: 30px;
}
h2 {
font-size: 24px;
}
h3 {
font-size: 18px;
}
h4, h5, h6 {
font-family: Times, serif;
font-weight: 700;
}
a {
color:#C30000;
text-decoration:none;
}
a:hover {
text-decoration: underline;
}
a small {
font-size: 12px;
}
em {
font-style: italic;
}
strong {
font-weight:700;
}
ul {
list-style: inside;
padding-left: 25px;
}
ol {
list-style: decimal inside;
padding-left: 20px;
}
blockquote {
margin: 0;
padding: 0 0 0 20px;
font-style: italic;
}
dl, dt, dd, dl p {
color: #444;
}
dl dt {
font-weight: bold;
}
dl dd {
padding-left: 20px;
font-style: italic;
}
dl p {
padding-left: 20px;
font-style: italic;
}
hr {
border:0;
background:#ccc;
height:1px;
margin:0 0 24px;
}
/* Images */
img {
position: relative;
margin: 0 auto;
max-width: 650px;
padding: 5px;
margin: 10px 0 32px 0;
border: 1px solid #ccc;
}
p img {
display: inline;
margin: 0;
padding: 0;
vertical-align: middle;
text-align: center;
border: none;
}
/* Code blocks */
code, pre {
font-family: monospace;
color:#000;
font-size:12px;
line-height: 14px;
}
pre {
padding: 6px 12px;
background: #FDFEFB;
border-radius:4px;
border:1px solid #D7D8C8;
overflow: auto;
white-space: pre-wrap;
margin-bottom: 16px;
}
/* Tables */
table {
width:100%;
}
table {
border: 1px solid #ccc;
margin-bottom: 32px;
text-align: left;
}
th {
font-family: 'Arvo', Helvetica, Arial, sans-serif;
font-size: 18px;
font-weight: normal;
padding: 10px;
background: #232323;
color: #FDFEFB;
}
td {
padding: 10px;
background: #ccc;
}
/* Wrapper */
.wrapper {
width:960px;
}
/* Header */
header {
background-color: #171717;
color: #FDFDFB;
width:170px;
float:left;
position:fixed;
border: 1px solid #000;
-webkit-border-top-right-radius: 4px;
-webkit-border-bottom-right-radius: 4px;
-moz-border-radius-topright: 4px;
-moz-border-radius-bottomright: 4px;
border-top-right-radius: 4px;
border-bottom-right-radius: 4px;
padding: 12px 25px 22px 50px;
margin: 24px 25px 0 0;
-webkit-font-smoothing: antialiased;
}
p.header {
font-size: 16px;
}
h1.header {
/*font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;*/
font-size: 30px;
font-weight: 300;
line-height: 1.3em;
border-bottom: none;
margin-top: 0;
}
h1.header, a.header, a.name, header a{
color: #fff;
}
a.header {
text-decoration: underline;
}
a.name {
white-space: nowrap;
}
header ul {
list-style:none;
padding:0;
}
header li {
list-style-type: none;
width:132px;
height:15px;
margin-bottom: 12px;
line-height: 1em;
padding: 6px 6px 6px 7px;
background: #AF0011;
background: -moz-linear-gradient(top, #AF0011 0%, #820011 100%);
background: -webkit-gradient(linear, left top, left bottom, color-stop(0%,#f8f8f8), color-stop(100%,#dddddd));
background: -webkit-linear-gradient(top, #AF0011 0%,#820011 100%);
background: -o-linear-gradient(top, #AF0011 0%,#820011 100%);
background: -ms-linear-gradient(top, #AF0011 0%,#820011 100%);
background: linear-gradient(to bottom, #AF0011 0%,#820011 100%);
border-radius:4px;
border:1px solid #0D0D0D;
-webkit-box-shadow: inset 0px 1px 1px 0 rgba(233,2,38, 1);
box-shadow: inset 0px 1px 1px 0 rgba(233,2,38, 1);
}
header li:hover {
background: #C3001D;
background: -moz-linear-gradient(top, #C3001D 0%, #950119 100%);
background: -webkit-gradient(linear, left top, left bottom, color-stop(0%,#f8f8f8), color-stop(100%,#dddddd));
background: -webkit-linear-gradient(top, #C3001D 0%,#950119 100%);
background: -o-linear-gradient(top, #C3001D 0%,#950119 100%);
background: -ms-linear-gradient(top, #C3001D 0%,#950119 100%);
background: linear-gradient(to bottom, #C3001D 0%,#950119 100%);
}
a.buttons {
-webkit-font-smoothing: antialiased;
background: url(../images/arrow-down.png) no-repeat;
font-weight: normal;
text-shadow: rgba(0, 0, 0, 0.4) 0 -1px 0;
padding: 2px 2px 2px 22px;
height: 30px;
}
a.github {
background: url(../images/octocat-small.png) no-repeat 1px;
}
a.buttons:hover {
color: #fff;
text-decoration: none;
}
/* Section - for main page content */
section {
width:650px;
float:right;
padding-bottom:50px;
}
/* Footer */
footer {
width:170px;
float:left;
position:fixed;
bottom:10px;
padding-left: 50px;
}
@media print, screen and (max-width: 960px) {
div.wrapper {
width:auto;
margin:0;
}
header, section, footer {
float:none;
position:static;
width:auto;
}
footer {
border-top: 1px solid #ccc;
margin:0 84px 0 50px;
padding:0;
}
header {
padding-right:320px;
}
section {
padding:20px 84px 20px 50px;
margin:0 0 20px;
}
header a small {
display:inline;
}
header ul {
position:absolute;
right:130px;
top:84px;
}
}
@media print, screen and (max-width: 720px) {
body {
word-wrap:break-word;
}
header {
padding:10px 20px 0;
margin-right: 0;
}
section {
padding:10px 0 10px 20px;
margin:0 0 30px;
}
footer {
margin: 0 0 0 30px;
}
header ul, header p.view {
position:static;
}
}
@media print, screen and (max-width: 480px) {
header ul li.download {
display:none;
}
footer {
margin: 0 0 0 20px;
}
footer a{
display:block;
}
}
@media print {
body {
padding:0.4in;
font-size:12pt;
color:#444;
}
}
// Copyright 2014 BVLC and contributors.
#ifndef CAFFE_BLOB_HPP_
#define CAFFE_BLOB_HPP_
#include "caffe/common.hpp"
#include "caffe/syncedmem.hpp"
#include "caffe/proto/caffe.pb.h"
namespace caffe {
template <typename Dtype>
class Blob {
public:
Blob()
: num_(0), channels_(0), height_(0), width_(0), count_(0), data_(),
diff_() {}
explicit Blob(const int num, const int channels, const int height,
const int width);
void Reshape(const int num, const int channels, const int height,
const int width);
void ReshapeLike(const Blob& other);
inline int num() const { return num_; }
inline int channels() const { return channels_; }
inline int height() const { return height_; }
inline int width() const { return width_; }
inline int count() const { return count_; }
inline int offset(const int n, const int c = 0, const int h = 0,
const int w = 0) const {
CHECK_GE(n, 0);
CHECK_LE(n, num_);
CHECK_GE(c, 0);
CHECK_LE(c, channels_);
CHECK_GE(h, 0);
CHECK_LE(h, height_);
CHECK_GE(w, 0);
CHECK_LE(w, width_);
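// The data is stored row-major as a 4-D (num, channels, height, width)
// array; e.g., in a 2 x 3 x 4 x 5 blob, offset(1, 2, 3, 4) is
// ((1 * 3 + 2) * 4 + 3) * 5 + 4 = 119, the last of the 120 elements.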
return ((n * channels_ + c) * height_ + h) * width_ + w;
}
// Copy from source. If copy_diff is false, we copy the data; if copy_diff
// is true, we copy the diff.
void CopyFrom(const Blob<Dtype>& source, bool copy_diff = false,
bool reshape = false);
inline Dtype data_at(const int n, const int c, const int h,
const int w) const {
return *(cpu_data() + offset(n, c, h, w));
}
inline Dtype diff_at(const int n, const int c, const int h,
const int w) const {
return *(cpu_diff() + offset(n, c, h, w));
}
inline const shared_ptr<SyncedMemory>& data() const {
CHECK(data_);
return data_;
}
inline const shared_ptr<SyncedMemory>& diff() const {
CHECK(diff_);
return diff_;
}
const Dtype* cpu_data() const;
void set_cpu_data(Dtype* data);
const Dtype* gpu_data() const;
const Dtype* cpu_diff() const;
const Dtype* gpu_diff() const;
Dtype* mutable_cpu_data();
Dtype* mutable_gpu_data();
Dtype* mutable_cpu_diff();
Dtype* mutable_gpu_diff();
void Update();
void FromProto(const BlobProto& proto);
void ToProto(BlobProto* proto, bool write_diff = false) const;
// Set the data_/diff_ shared_ptr to point to the SyncedMemory holding the
// data_/diff_ of Blob other -- useful in layers which simply perform a copy
// in their forward or backward pass.
// This deallocates the SyncedMemory holding this blob's data/diff, as
// shared_ptr calls its destructor when reset with the = operator.
void ShareData(const Blob& other);
void ShareDiff(const Blob& other);
protected:
shared_ptr<SyncedMemory> data_;
shared_ptr<SyncedMemory> diff_;
int num_;
int channels_;
int height_;
int width_;
int count_;
DISABLE_COPY_AND_ASSIGN(Blob);
}; // class Blob
} // namespace caffe
#endif // CAFFE_BLOB_HPP_
// Copyright 2014 BVLC and contributors.
// caffe.hpp is the header file that you need to include in your code. It wraps
// all the internal caffe header files into one for simpler inclusion.
#ifndef CAFFE_CAFFE_HPP_
#define CAFFE_CAFFE_HPP_
#include "caffe/common.hpp"
#include "caffe/blob.hpp"
#include "caffe/filler.hpp"
#include "caffe/layer.hpp"
#include "caffe/net.hpp"
#include "caffe/solver.hpp"
#include "caffe/util/io.hpp"
#include "caffe/vision_layers.hpp"
#include "caffe/proto/caffe.pb.h"
#endif // CAFFE_CAFFE_HPP_
// Copyright 2014 BVLC and contributors.
#ifndef CAFFE_COMMON_HPP_
#define CAFFE_COMMON_HPP_
#include <boost/shared_ptr.hpp>
#include <cublas_v2.h>
#include <cuda.h>
#include <curand.h>
#include <driver_types.h> // cuda driver types
#include <glog/logging.h>
// Disable the copy and assignment operator for a class.
#define DISABLE_COPY_AND_ASSIGN(classname) \
private:\
classname(const classname&);\
classname& operator=(const classname&)
// Instantiate a class with float and double specifications.
#define INSTANTIATE_CLASS(classname) \
template class classname<float>; \
template class classname<double>
// A simple macro to mark code that is not implemented, so that when the code
// is executed we will see a fatal log.
#define NOT_IMPLEMENTED LOG(FATAL) << "Not Implemented Yet"
// CUDA: various checks for different function calls.
#define CUDA_CHECK(condition) \
/* Code block avoids redefinition of cudaError_t error */ \
do { \
cudaError_t error = condition; \
CHECK_EQ(error, cudaSuccess) << " " << cudaGetErrorString(error); \
} while (0)
#define CUBLAS_CHECK(condition) \
do { \
cublasStatus_t status = condition; \
CHECK_EQ(status, CUBLAS_STATUS_SUCCESS) << " " \
<< caffe::cublasGetErrorString(status); \
} while (0)
#define CURAND_CHECK(condition) \
do { \
curandStatus_t status = condition; \
CHECK_EQ(status, CURAND_STATUS_SUCCESS) << " " \
<< caffe::curandGetErrorString(status); \
} while (0)
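// Example (a sketch): wrap any CUDA runtime / cuBLAS / cuRAND call so that a
// failure aborts with a readable message, e.g.
//   CUDA_CHECK(cudaMemcpy(dst, src, nbytes, cudaMemcpyHostToDevice));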
// CUDA: grid stride looping
#define CUDA_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; \
i < (n); \
i += blockDim.x * gridDim.x)
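// Example (a hypothetical kernel): with the grid-stride loop above, any
// launch configuration covers all n elements:
//   template <typename Dtype>
//   __global__ void set_kernel(const int n, const Dtype alpha, Dtype* y) {
//     CUDA_KERNEL_LOOP(index, n) {
//       y[index] = alpha;
//     }
//   }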
// CUDA: check for error after kernel execution and exit loudly if there is one.
#define CUDA_POST_KERNEL_CHECK CUDA_CHECK(cudaPeekAtLastError())
// Define not supported status for pre-6.0 compatibility.
#if CUDA_VERSION < 6000
#define CUBLAS_STATUS_NOT_SUPPORTED 831486
#endif
namespace caffe {
// We will use the boost shared_ptr instead of the new C++11 one mainly
// because CUDA does not (at least for now) work well with C++11 features.
using boost::shared_ptr;
// A singleton class to hold common caffe stuff, such as the handler that
// caffe is going to use for cublas, curand, etc.
class Caffe {
public:
~Caffe();
inline static Caffe& Get() {
if (!singleton_.get()) {
singleton_.reset(new Caffe());
}
return *singleton_;
}
enum Brew { CPU, GPU };
enum Phase { TRAIN, TEST };
// This random number generator facade hides boost and CUDA rng
// implementation from one another (for cross-platform compatibility).
class RNG {
public:
RNG();
explicit RNG(unsigned int seed);
explicit RNG(const RNG&);
RNG& operator=(const RNG&);
void* generator();
private:
class Generator;
shared_ptr<Generator> generator_;
};
// Getters for boost rng, curand, and cublas handles
inline static RNG& rng_stream() {
if (!Get().random_generator_) {
Get().random_generator_.reset(new RNG());
}
return *(Get().random_generator_);
}
inline static cublasHandle_t cublas_handle() { return Get().cublas_handle_; }
inline static curandGenerator_t curand_generator() {
return Get().curand_generator_;
}
// Returns the mode: running on CPU or GPU.
inline static Brew mode() { return Get().mode_; }
// Returns the phase: TRAIN or TEST.
inline static Phase phase() { return Get().phase_; }
// The setters for the variables
// Sets the mode. It is recommended that you don't change the mode halfway
// into the program, since that may cause memory allocated as pinned to be
// freed in a non-pinned way, which may cause problems - not personally
// verified, but better to note it here in the header file.
inline static void set_mode(Brew mode) { Get().mode_ = mode; }
// Sets the phase.
inline static void set_phase(Phase phase) { Get().phase_ = phase; }
// Sets the random seed of both boost and curand
static void set_random_seed(const unsigned int seed);
// Sets the device. Since we have cublas and curand stuff, set device also
// requires us to reset those values.
static void SetDevice(const int device_id);
// Prints the current GPU status.
static void DeviceQuery();
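// Typical use from client code (a sketch, using only the setters above):
//   Caffe::set_mode(Caffe::GPU);
//   Caffe::SetDevice(0);
//   Caffe::set_phase(Caffe::TRAIN);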
protected:
cublasHandle_t cublas_handle_;
curandGenerator_t curand_generator_;
shared_ptr<RNG> random_generator_;
Brew mode_;
Phase phase_;
static shared_ptr<Caffe> singleton_;
private:
// The private constructor to avoid duplicate instantiation.
Caffe();
DISABLE_COPY_AND_ASSIGN(Caffe);
};
// NVIDIA_CUDA-5.5_Samples/common/inc/helper_cuda.h
const char* cublasGetErrorString(cublasStatus_t error);
const char* curandGetErrorString(curandStatus_t error);
// CUDA: thread number configuration.
// Use 1024 threads per block, which requires CUDA sm_2x or above,
// or fall back to attempt compatibility (best of luck to you).
#if __CUDA_ARCH__ >= 200
const int CAFFE_CUDA_NUM_THREADS = 1024;
#else
const int CAFFE_CUDA_NUM_THREADS = 512;
#endif
// CUDA: number of blocks for threads.
inline int CAFFE_GET_BLOCKS(const int N) {
return (N + CAFFE_CUDA_NUM_THREADS - 1) / CAFFE_CUDA_NUM_THREADS;
}
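// Example (a sketch): pairs with CUDA_KERNEL_LOOP above when launching a
// kernel, e.g. the hypothetical set_kernel sketched in the comment there:
//   set_kernel<Dtype><<<CAFFE_GET_BLOCKS(n), CAFFE_CUDA_NUM_THREADS>>>(
//       n, alpha, y);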
} // namespace caffe
#endif // CAFFE_COMMON_HPP_
// Copyright 2014 BVLC and contributors.
// Fillers are random number generators that fill a blob using the specified
// algorithm. The expectation is that they are only going to be used during
// initialization time and will not involve any GPUs.
#ifndef CAFFE_FILLER_HPP
#define CAFFE_FILLER_HPP
#include <string>
#include "caffe/common.hpp"
#include "caffe/blob.hpp"
#include "caffe/syncedmem.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/proto/caffe.pb.h"
namespace caffe {
template <typename Dtype>
class Filler {
public:
explicit Filler(const FillerParameter& param) : filler_param_(param) {}
virtual ~Filler() {}
virtual void Fill(Blob<Dtype>* blob) = 0;
protected:
FillerParameter filler_param_;
}; // class Filler
template <typename Dtype>
class ConstantFiller : public Filler<Dtype> {
public:
explicit ConstantFiller(const FillerParameter& param)
: Filler<Dtype>(param) {}
virtual void Fill(Blob<Dtype>* blob) {
Dtype* data = blob->mutable_cpu_data();
const int count = blob->count();
const Dtype value = this->filler_param_.value();
CHECK(count);
for (int i = 0; i < count; ++i) {
data[i] = value;
}
CHECK_EQ(this->filler_param_.sparse(), -1)
<< "Sparsity not supported by this Filler.";
}
};
template <typename Dtype>
class UniformFiller : public Filler<Dtype> {
public:
explicit UniformFiller(const FillerParameter& param)
: Filler<Dtype>(param) {}
virtual void Fill(Blob<Dtype>* blob) {
CHECK(blob->count());
caffe_rng_uniform<Dtype>(blob->count(), Dtype(this->filler_param_.min()),
Dtype(this->filler_param_.max()), blob->mutable_cpu_data());
CHECK_EQ(this->filler_param_.sparse(), -1)
<< "Sparsity not supported by this Filler.";
}
};
template <typename Dtype>
class GaussianFiller : public Filler<Dtype> {
public:
explicit GaussianFiller(const FillerParameter& param)
: Filler<Dtype>(param) {}
virtual void Fill(Blob<Dtype>* blob) {
Dtype* data = blob->mutable_cpu_data();
CHECK(blob->count());
caffe_rng_gaussian<Dtype>(blob->count(), Dtype(this->filler_param_.mean()),
Dtype(this->filler_param_.std()), blob->mutable_cpu_data());
int sparse = this->filler_param_.sparse();
CHECK_GE(sparse, -1);
if (sparse >= 0) {
// Sparse initialization is implemented for "weight" blobs; i.e. matrices.
// These have num == channels == 1; height is number of inputs; width is
// number of outputs. The 'sparse' variable specifies the mean number
// of non-zero input weights for a given output.
CHECK_EQ(blob->num(), 1);
CHECK_EQ(blob->channels(), 1);
int num_inputs = blob->height();
Dtype non_zero_probability = Dtype(sparse) / Dtype(num_inputs);
rand_vec_.reset(new SyncedMemory(blob->count() * sizeof(int)));
int* mask = reinterpret_cast<int*>(rand_vec_->mutable_cpu_data());
caffe_rng_bernoulli(blob->count(), non_zero_probability, mask);
for (int i = 0; i < blob->count(); ++i) {
data[i] *= mask[i];
}
}
}
protected:
shared_ptr<SyncedMemory> rand_vec_;
};
template <typename Dtype>
class PositiveUnitballFiller : public Filler<Dtype> {
public:
explicit PositiveUnitballFiller(const FillerParameter& param)
: Filler<Dtype>(param) {}
virtual void Fill(Blob<Dtype>* blob) {
Dtype* data = blob->mutable_cpu_data();
DCHECK(blob->count());
caffe_rng_uniform<Dtype>(blob->count(), 0, 1, blob->mutable_cpu_data());
// We expect the filler to not be called very frequently, so we will
// just use a simple implementation
int dim = blob->count() / blob->num();
CHECK(dim);
for (int i = 0; i < blob->num(); ++i) {
Dtype sum = 0;
for (int j = 0; j < dim; ++j) {
sum += data[i * dim + j];
}
for (int j = 0; j < dim; ++j) {
data[i * dim + j] /= sum;
}
}
CHECK_EQ(this->filler_param_.sparse(), -1)
<< "Sparsity not supported by this Filler.";
}
};
// A filler based on the paper [Bengio and Glorot 2010]: Understanding
// the difficulty of training deep feedforward neural networks, but does not
// use the fan_out value.
//
// It fills the incoming matrix by randomly sampling uniform data from
// [-scale, scale] where scale = sqrt(3 / fan_in) where fan_in is the number
// of input nodes. You should make sure the input blob has shape (num, a, b, c)
// where a * b * c = fan_in.
template <typename Dtype>
class XavierFiller : public Filler<Dtype> {
public:
explicit XavierFiller(const FillerParameter& param)
: Filler<Dtype>(param) {}
virtual void Fill(Blob<Dtype>* blob) {
CHECK(blob->count());
int fan_in = blob->count() / blob->num();
Dtype scale = sqrt(Dtype(3) / fan_in);
caffe_rng_uniform<Dtype>(blob->count(), -scale, scale,
blob->mutable_cpu_data());
CHECK_EQ(this->filler_param_.sparse(), -1)
<< "Sparsity not supported by this Filler.";
}
};
// A function to get a specific filler from the specification given in
// FillerParameter. Ideally this would be replaced by a factory pattern,
// but we will leave it this way for now.
template <typename Dtype>
Filler<Dtype>* GetFiller(const FillerParameter& param) {
const std::string& type = param.type();
if (type == "constant") {
return new ConstantFiller<Dtype>(param);
} else if (type == "gaussian") {
return new GaussianFiller<Dtype>(param);
} else if (type == "positive_unitball") {
return new PositiveUnitballFiller<Dtype>(param);
} else if (type == "uniform") {
return new UniformFiller<Dtype>(param);
} else if (type == "xavier") {
return new XavierFiller<Dtype>(param);
} else {
CHECK(false) << "Unknown filler name: " << param.type();
}
return (Filler<Dtype>*)(NULL);
}
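// Example (a sketch): construct a filler from a FillerParameter and use it;
// set_type is the generated protobuf setter for the type field, and
// some_blob stands in for an already-shaped Blob<float>.
//   FillerParameter filler_param;
//   filler_param.set_type("xavier");
//   shared_ptr<Filler<float> > filler(GetFiller<float>(filler_param));
//   filler->Fill(&some_blob);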
} // namespace caffe
#endif // CAFFE_FILLER_HPP_
// Copyright 2014 BVLC and contributors.
#ifndef CAFFE_LAYER_H_
#define CAFFE_LAYER_H_
#include <string>
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/proto/caffe.pb.h"
using std::string;
using std::vector;
namespace caffe {
template <typename Dtype>
class Layer {
public:
// You should not implement your own constructor. Any set up code should go
// to SetUp(), where the dimensions of the bottom blobs are provided to the
// layer.
explicit Layer(const LayerParameter& param)
: layer_param_(param) {
// The only thing we do is to copy blobs if there are any.
if (layer_param_.blobs_size() > 0) {
blobs_.resize(layer_param_.blobs_size());
for (int i = 0; i < layer_param_.blobs_size(); ++i) {
blobs_[i].reset(new Blob<Dtype>());
blobs_[i]->FromProto(layer_param_.blobs(i));
}
}
}
virtual ~Layer() {}
// SetUp: your function should implement this, and call Layer::SetUp for
// common SetUp functionality.
virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top) {
CheckBlobCounts(bottom, *top);
}
// Forward and backward wrappers. You should implement the cpu and
// gpu specific implementations instead, and should not change these
// functions.
inline Dtype Forward(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
inline void Backward(const vector<Blob<Dtype>*>& top,
const bool propagate_down,
vector<Blob<Dtype>*>* bottom);
// Returns the vector of blobs.
vector<shared_ptr<Blob<Dtype> > >& blobs() {
return blobs_;
}
// Returns the layer parameter
const LayerParameter& layer_param() { return layer_param_; }
// Writes the layer parameter to a protocol buffer
virtual void ToProto(LayerParameter* param, bool write_diff = false);
// Returns the layer type as an enum value.
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_NONE;
}
// Returns the layer type name.
virtual inline const string& type_name() const {
return LayerParameter_LayerType_Name(type());
}
// These methods can be overridden to declare that this layer type expects
// a certain number of blobs as input and output.
//
// ExactNum{Bottom,Top}Blobs return a non-negative number to require an exact
// number of bottom/top blobs; the Min/Max versions return a non-negative
// number to require a minimum and/or maximum number of blobs.
// If Exact is specified, neither Min nor Max should be specified, and vice
// versa. These methods may not rely on SetUp having been called.
virtual inline int ExactNumBottomBlobs() const { return -1; }
virtual inline int MinBottomBlobs() const { return -1; }
virtual inline int MaxBottomBlobs() const { return -1; }
virtual inline int ExactNumTopBlobs() const { return -1; }
virtual inline int MinTopBlobs() const { return -1; }
virtual inline int MaxTopBlobs() const { return -1; }
protected:
// The protobuf that stores the layer parameters
LayerParameter layer_param_;
// The vector that stores the parameters as a set of blobs.
vector<shared_ptr<Blob<Dtype> > > blobs_;
// Forward functions: compute the layer output
// (and loss layers return the loss; other layers return the dummy value 0.)
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top) = 0;
// If no gpu code is provided, we will simply use cpu code.
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top) {
// LOG(WARNING) << "Using CPU code as backup.";
return Forward_cpu(bottom, top);
}
// Backward functions: compute the gradients for any parameters and
// for the bottom blobs if propagate_down is true.
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down,
vector<Blob<Dtype>*>* bottom) = 0;
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down,
vector<Blob<Dtype>*>* bottom) {
// LOG(WARNING) << "Using CPU code as backup.";
Backward_cpu(top, propagate_down, bottom);
}
// CheckBlobCounts: called by the parent Layer's SetUp to check that the
// number of bottom and top Blobs provided as input match the expected
// numbers specified by the {ExactNum,Min,Max}{Bottom,Top}Blobs() functions.
virtual void CheckBlobCounts(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
if (ExactNumBottomBlobs() >= 0) {
CHECK_EQ(ExactNumBottomBlobs(), bottom.size())
<< type_name() << " Layer takes " << ExactNumBottomBlobs()
<< " bottom blob(s) as input.";
}
if (MinBottomBlobs() >= 0) {
CHECK_LE(MinBottomBlobs(), bottom.size())
<< type_name() << " Layer takes at least " << MinBottomBlobs()
<< " bottom blob(s) as input.";
}
if (MaxBottomBlobs() >= 0) {
CHECK_GE(MaxBottomBlobs(), bottom.size())
<< type_name() << " Layer takes at most " << MaxBottomBlobs()
<< " bottom blob(s) as input.";
}
if (ExactNumTopBlobs() >= 0) {
CHECK_EQ(ExactNumTopBlobs(), top.size())
<< type_name() << " Layer produces " << ExactNumTopBlobs()
<< " top blob(s) as output.";
}
if (MinTopBlobs() >= 0) {
CHECK_LE(MinTopBlobs(), top.size())
<< type_name() << " Layer produces at least " << MinTopBlobs()
<< " top blob(s) as output.";
}
if (MaxTopBlobs() >= 0) {
CHECK_GE(MaxTopBlobs(), top.size())
<< type_name() << " Layer produces at most " << MaxTopBlobs()
<< " top blob(s) as output.";
}
}
DISABLE_COPY_AND_ASSIGN(Layer);
}; // class Layer
// Forward and backward wrappers. You should implement the cpu and
// gpu specific implementations instead, and should not change these
// functions.
template <typename Dtype>
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top) {
switch (Caffe::mode()) {
case Caffe::CPU:
return Forward_cpu(bottom, top);
case Caffe::GPU:
return Forward_gpu(bottom, top);
default:
LOG(FATAL) << "Unknown caffe mode.";
return Dtype(0);
}
}
template <typename Dtype>
inline void Layer<Dtype>::Backward(const vector<Blob<Dtype>*>& top,
const bool propagate_down,
vector<Blob<Dtype>*>* bottom) {
switch (Caffe::mode()) {
case Caffe::CPU:
Backward_cpu(top, propagate_down, bottom);
break;
case Caffe::GPU:
Backward_gpu(top, propagate_down, bottom);
break;
default:
LOG(FATAL) << "Unknown caffe mode.";
}
}
// Serialize LayerParameter to protocol buffer
template <typename Dtype>
void Layer<Dtype>::ToProto(LayerParameter* param, bool write_diff) {
param->Clear();
param->CopyFrom(layer_param_);
param->clear_blobs();
for (int i = 0; i < blobs_.size(); ++i) {
blobs_[i]->ToProto(param->add_blobs(), write_diff);
}
}
// The layer factory function
template <typename Dtype>
Layer<Dtype>* GetLayer(const LayerParameter& param);
} // namespace caffe
#endif // CAFFE_LAYER_H_
// Copyright 2014 BVLC and contributors.
#ifndef CAFFE_LOSS_LAYERS_HPP_
#define CAFFE_LOSS_LAYERS_HPP_
#include <string>
#include <utility>
#include <vector>
#include "leveldb/db.h"
#include "pthread.h"
#include "boost/scoped_ptr.hpp"
#include "hdf5.h"
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/layer.hpp"
#include "caffe/neuron_layers.hpp"
#include "caffe/proto/caffe.pb.h"
namespace caffe {
const float kLOG_THRESHOLD = 1e-20;
/* LossLayer
Takes two inputs of the same num (a and b), and has no output.
The gradient is propagated to a.
*/
template <typename Dtype>
class LossLayer : public Layer<Dtype> {
public:
explicit LossLayer(const LayerParameter& param)
: Layer<Dtype>(param) {}
virtual void SetUp(
const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top);
virtual void FurtherSetUp(
const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {}
virtual inline int ExactNumBottomBlobs() const { return 2; }
virtual inline int ExactNumTopBlobs() const { return 0; }
};
/* SigmoidCrossEntropyLossLayer
*/
template <typename Dtype>
class SigmoidCrossEntropyLossLayer : public LossLayer<Dtype> {
public:
explicit SigmoidCrossEntropyLossLayer(const LayerParameter& param)
: LossLayer<Dtype>(param),
sigmoid_layer_(new SigmoidLayer<Dtype>(param)),
sigmoid_output_(new Blob<Dtype>()) {}
virtual void FurtherSetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_SIGMOID_CROSS_ENTROPY_LOSS;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
shared_ptr<SigmoidLayer<Dtype> > sigmoid_layer_;
// sigmoid_output stores the output of the sigmoid layer.
shared_ptr<Blob<Dtype> > sigmoid_output_;
// Vector holders to call the underlying sigmoid layer forward and backward.
vector<Blob<Dtype>*> sigmoid_bottom_vec_;
vector<Blob<Dtype>*> sigmoid_top_vec_;
};
/* EuclideanLossLayer
Compute the L_2 distance between the two inputs.
loss = (1/2) \sum_i (a_i - b_i)^2
a' = 1/I (a - b), where I is the number of instances
*/
template <typename Dtype>
class EuclideanLossLayer : public LossLayer<Dtype> {
public:
explicit EuclideanLossLayer(const LayerParameter& param)
: LossLayer<Dtype>(param), diff_() {}
virtual void FurtherSetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_EUCLIDEAN_LOSS;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
Blob<Dtype> diff_;
};
/* InfogainLossLayer
*/
template <typename Dtype>
class InfogainLossLayer : public LossLayer<Dtype> {
public:
explicit InfogainLossLayer(const LayerParameter& param)
: LossLayer<Dtype>(param), infogain_() {}
virtual void FurtherSetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_INFOGAIN_LOSS;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
Blob<Dtype> infogain_;
};
/* HingeLossLayer
*/
template <typename Dtype>
class HingeLossLayer : public LossLayer<Dtype> {
public:
explicit HingeLossLayer(const LayerParameter& param)
: LossLayer<Dtype>(param) {}
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_HINGE_LOSS;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
};
/* MultinomialLogisticLossLayer
*/
template <typename Dtype>
class MultinomialLogisticLossLayer : public LossLayer<Dtype> {
public:
explicit MultinomialLogisticLossLayer(const LayerParameter& param)
: LossLayer<Dtype>(param) {}
virtual void FurtherSetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_MULTINOMIAL_LOGISTIC_LOSS;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
};
/* AccuracyLayer
Note: not an actual loss layer! Does not implement backwards step.
Computes the accuracy and logprob of a with respect to b.
*/
template <typename Dtype>
class AccuracyLayer : public Layer<Dtype> {
public:
explicit AccuracyLayer(const LayerParameter& param)
: Layer<Dtype>(param) {}
virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_ACCURACY;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom) {
NOT_IMPLEMENTED;
}
};
/* Also see
- SoftmaxWithLossLayer in vision_layers.hpp
*/
} // namespace caffe
#endif // CAFFE_LOSS_LAYERS_HPP_
// Copyright 2014 BVLC and contributors.
#ifndef CAFFE_NET_HPP_
#define CAFFE_NET_HPP_
#include <map>
#include <set>
#include <string>
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
using std::map;
using std::vector;
using std::set;
using std::string;
namespace caffe {
template <typename Dtype>
class Net {
public:
explicit Net(const NetParameter& param);
explicit Net(const string& param_file);
virtual ~Net() {}
// Initialize a network with the network parameter.
void Init(const NetParameter& param);
// Run forward with the input blobs already fed separately. You can get the
// input blobs using input_blobs().
const vector<Blob<Dtype>*>& ForwardPrefilled(Dtype* loss = NULL);
// Run forward using a set of bottom blobs, and return the result.
const vector<Blob<Dtype>*>& Forward(const vector<Blob<Dtype>* > & bottom,
Dtype* loss = NULL);
// Run forward using a serialized BlobProtoVector and return the result
// as a serialized BlobProtoVector
string Forward(const string& input_blob_protos, Dtype* loss = NULL);
// The network backward takes no input and produces no output, since it solely
// computes the gradient w.r.t the parameters, and the data has already
// been provided during the forward pass.
void Backward();
Dtype ForwardBackward(const vector<Blob<Dtype>* > & bottom) {
Dtype loss;
Forward(bottom, &loss);
Backward();
return loss;
}
// Updates the network weights based on the diff values computed.
void Update();
// For an already initialized net, ShareTrainedLayersWith() implicitly copies
// (i.e., using no additional memory) the already trained layers from another
// Net.
void ShareTrainedLayersWith(Net* other);
// For an already initialized net, CopyTrainedLayersFrom() copies the already
// trained layers from another net parameter instance.
void CopyTrainedLayersFrom(const NetParameter& param);
void CopyTrainedLayersFrom(const string trained_filename);
// Writes the net to a proto.
void ToProto(NetParameter* param, bool write_diff = false);
// returns the network name.
inline const string& name() { return name_; }
// returns the layer names
inline const vector<string>& layer_names() { return layer_names_; }
// returns the blob names
inline const vector<string>& blob_names() { return blob_names_; }
// returns the blobs
inline const vector<shared_ptr<Blob<Dtype> > >& blobs() { return blobs_; }
// returns the layers
inline const vector<shared_ptr<Layer<Dtype> > >& layers() { return layers_; }
// returns the bottom and top vecs for each layer - usually you won't need
// this unless you are doing per-layer checks such as gradient checking.
inline vector<vector<Blob<Dtype>*> >& bottom_vecs() { return bottom_vecs_; }
inline vector<vector<Blob<Dtype>*> >& top_vecs() { return top_vecs_; }
// returns the parameters
inline vector<shared_ptr<Blob<Dtype> > >& params() { return params_; }
// returns the parameter learning rate multipliers
inline vector<float>& params_lr() {return params_lr_; }
inline vector<float>& params_weight_decay() { return params_weight_decay_; }
// Input and output blob numbers
inline int num_inputs() { return net_input_blobs_.size(); }
inline int num_outputs() { return net_output_blobs_.size(); }
inline vector<Blob<Dtype>*>& input_blobs() { return net_input_blobs_; }
inline vector<Blob<Dtype>*>& output_blobs() { return net_output_blobs_; }
inline vector<int>& input_blob_indices() { return net_input_blob_indices_; }
inline vector<int>& output_blob_indices() { return net_output_blob_indices_; }
// has_blob and blob_by_name are inspired by
// https://github.com/kencoken/caffe/commit/f36e71569455c9fbb4bf8a63c2d53224e32a4e7b
// Access intermediate computation blobs and layers by name.
bool has_blob(const string& blob_name);
const shared_ptr<Blob<Dtype> > blob_by_name(const string& blob_name);
bool has_layer(const string& layer_name);
const shared_ptr<Layer<Dtype> > layer_by_name(const string& layer_name);
protected:
// Helpers for Init.
// Append a new input or top blob to the net.
void AppendTop(const NetParameter& param, const int layer_id,
const int top_id, set<string>* available_blobs,
map<string, int>* blob_name_to_idx);
// Append a new bottom blob to the net.
int AppendBottom(const NetParameter& param, const int layer_id,
const int bottom_id, set<string>* available_blobs,
map<string, int>* blob_name_to_idx);
// Function to get misc parameters, e.g. the learning rate multiplier and
// weight decay.
void GetLearningRateAndWeightDecay();
// Individual layers in the net
vector<shared_ptr<Layer<Dtype> > > layers_;
vector<string> layer_names_;
map<string, int> layer_names_index_;
vector<bool> layer_need_backward_;
// blobs_ stores the blobs that hold the intermediate results between
// layers.
vector<shared_ptr<Blob<Dtype> > > blobs_;
vector<string> blob_names_;
map<string, int> blob_names_index_;
vector<bool> blob_need_backward_;
// bottom_vecs stores the vectors containing the input for each layer.
// They don't actually host the blobs (blobs_ does), so we simply store
// pointers.
vector<vector<Blob<Dtype>*> > bottom_vecs_;
vector<vector<int> > bottom_id_vecs_;
// top_vecs stores the vectors containing the output for each layer
vector<vector<Blob<Dtype>*> > top_vecs_;
vector<vector<int> > top_id_vecs_;
// blob indices for the input and the output of the net
vector<int> net_input_blob_indices_;
vector<int> net_output_blob_indices_;
vector<Blob<Dtype>*> net_input_blobs_;
vector<Blob<Dtype>*> net_output_blobs_;
string name_;
// The parameters in the network.
vector<shared_ptr<Blob<Dtype> > > params_;
// the learning rate multipliers
vector<float> params_lr_;
// the weight decay multipliers
vector<float> params_weight_decay_;
// The bytes of memory used by this net
size_t memory_used_;
DISABLE_COPY_AND_ASSIGN(Net);
};
} // namespace caffe
#endif // CAFFE_NET_HPP_
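As a usage illustration only (the file and blob names below are placeholders, not part of this commit), a typical deployment of the Net interface declared above could look like this:

#include "caffe/net.hpp"

int main() {
  // Build the net from a model definition and load trained weights;
  // both file names here are hypothetical.
  caffe::Net<float> net("deploy.prototxt");
  net.CopyTrainedLayersFrom("weights.caffemodel");
  // Run forward on inputs already copied into net.input_blobs().
  float loss = 0;
  const vector<caffe::Blob<float>*>& out = net.ForwardPrefilled(&loss);
  // Look up an intermediate blob by name ("prob" is just an example).
  if (net.has_blob("prob")) {
    const float* prob = net.blob_by_name("prob")->cpu_data();
    (void) prob;
  }
  (void) out;
  return 0;
}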
// Copyright 2014 BVLC and contributors.
#ifndef CAFFE_NEURON_LAYERS_HPP_
#define CAFFE_NEURON_LAYERS_HPP_
#include <string>
#include <utility>
#include <vector>
#include "leveldb/db.h"
#include "pthread.h"
#include "boost/scoped_ptr.hpp"
#include "hdf5.h"
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
#define HDF5_DATA_DATASET_NAME "data"
#define HDF5_DATA_LABEL_NAME "label"
namespace caffe {
/* NeuronLayer
An interface for layers that take one blob as input (x),
and produce one blob as output (y).
*/
template <typename Dtype>
class NeuronLayer : public Layer<Dtype> {
public:
explicit NeuronLayer(const LayerParameter& param)
: Layer<Dtype>(param) {}
virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_NONE;
}
virtual inline int ExactNumBottomBlobs() const { return 1; }
virtual inline int ExactNumTopBlobs() const { return 1; }
};
/* BNLLLayer
y = x + log(1 + exp(-x)) if x > 0
y = log(1 + exp(x)) if x <= 0
y' = exp(x) / (exp(x) + 1)
*/
template <typename Dtype>
class BNLLLayer : public NeuronLayer<Dtype> {
public:
explicit BNLLLayer(const LayerParameter& param)
: NeuronLayer<Dtype>(param) {}
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_BNLL;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
};
/* DropoutLayer
During training only, sets some portion of x to 0, adjusting the
vector magnitude accordingly.
mask = bernoulli(1 - threshold)
scale = 1 / (1 - threshold)
y = x * mask * scale
y' = mask * scale
*/
template <typename Dtype>
class DropoutLayer : public NeuronLayer<Dtype> {
public:
explicit DropoutLayer(const LayerParameter& param)
: NeuronLayer<Dtype>(param) {}
virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_DROPOUT;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
shared_ptr<Blob<unsigned int> > rand_vec_;
Dtype threshold_;
Dtype scale_;
unsigned int uint_thres_;
};
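// A minimal sketch of the training-time dropout forward rule described
// above (illustration only, not the layer's actual implementation);
// `mask` is assumed to hold Bernoulli(1 - threshold) samples in {0, 1}:
template <typename Dtype>
inline void SketchDropoutForward(const Dtype* x, const unsigned int* mask,
    Dtype threshold, int count, Dtype* y) {
  const Dtype scale = Dtype(1) / (Dtype(1) - threshold);
  for (int i = 0; i < count; ++i) {
    // Zero out dropped units and rescale the survivors so the
    // expected activation matches test time.
    y[i] = x[i] * mask[i] * scale;
  }
}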
/* PowerLayer
y = (shift + scale * x) ^ power
y' = scale * power * (shift + scale * x) ^ (power - 1)
= scale * power * y / (shift + scale * x)
*/
template <typename Dtype>
class PowerLayer : public NeuronLayer<Dtype> {
public:
explicit PowerLayer(const LayerParameter& param)
: NeuronLayer<Dtype>(param) {}
virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_POWER;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
Dtype power_;
Dtype scale_;
Dtype shift_;
Dtype diff_scale_;
};
/* ReLULayer
Rectified Linear Unit non-linearity.
The simple max is fast to compute, and the function does not saturate.
y = max(0, x).
y' = 0 if x < 0
y' = 1 if x > 0
*/
template <typename Dtype>
class ReLULayer : public NeuronLayer<Dtype> {
public:
explicit ReLULayer(const LayerParameter& param)
: NeuronLayer<Dtype>(param) {}
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_RELU;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
};
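// A minimal sketch of the ReLU forward and backward rules above
// (illustration only; the Sketch* names are not part of this codebase):
template <typename Dtype>
inline void SketchReLUForward(const Dtype* x, int count, Dtype* y) {
  for (int i = 0; i < count; ++i) {
    y[i] = x[i] > 0 ? x[i] : Dtype(0);  // y = max(0, x)
  }
}
template <typename Dtype>
inline void SketchReLUBackward(const Dtype* top_diff, const Dtype* x,
    int count, Dtype* bottom_diff) {
  for (int i = 0; i < count; ++i) {
    // Gradient passes through only where the input was positive.
    bottom_diff[i] = top_diff[i] * (x[i] > 0);
  }
}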
/* SigmoidLayer
Sigmoid function non-linearity, a classic choice in neural networks.
Note that the gradient vanishes as the values move away from 0.
The ReLULayer is often a better choice for this reason.
y = 1. / (1 + exp(-x))
y' = exp(x) / (1 + exp(x))^2
or
y' = y * (1 - y)
*/
template <typename Dtype>
class SigmoidLayer : public NeuronLayer<Dtype> {
public:
explicit SigmoidLayer(const LayerParameter& param)
: NeuronLayer<Dtype>(param) {}
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_SIGMOID;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
};
/* TanHLayer
Hyperbolic tangent non-linearity, popular in auto-encoders.
y = (exp(2x) - 1) / (exp(2x) + 1)
y' = 1 - ( (exp(2x) - 1) / (exp(2x) + 1) ) ^ 2
*/
template <typename Dtype>
class TanHLayer : public NeuronLayer<Dtype> {
public:
explicit TanHLayer(const LayerParameter& param)
: NeuronLayer<Dtype>(param) {}
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_TANH;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom);
};
/* ThresholdLayer
  Outputs 1 if the input value is above the threshold, 0 otherwise.
  The default threshold is 0, so positive values become 1 and values
  that are negative or 0 become 0.
  y = 1 if x > threshold
  y = 0 if x <= threshold
  y' is undefined (the step function is not differentiable).
*/
template <typename Dtype>
class ThresholdLayer : public NeuronLayer<Dtype> {
public:
explicit ThresholdLayer(const LayerParameter& param)
: NeuronLayer<Dtype>(param) {}
virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual inline LayerParameter_LayerType type() const {
return LayerParameter_LayerType_THRESHOLD;
}
protected:
virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down, vector<Blob<Dtype>*>* bottom) {
NOT_IMPLEMENTED;
}
Dtype threshold_;
};
} // namespace caffe
#endif // CAFFE_NEURON_LAYERS_HPP_
// Copyright 2014 BVLC and contributors.
#ifndef CAFFE_OPTIMIZATION_SOLVER_HPP_
#define CAFFE_OPTIMIZATION_SOLVER_HPP_
#include <string>
#include <vector>
namespace caffe {
template <typename Dtype>
class Solver {
public:
explicit Solver(const SolverParameter& param);
explicit Solver(const string& param_file);
void Init(const SolverParameter& param);
// The main entry of the solver. By default training starts from iteration
// zero; pass a resume file to continue training from a previously
// snapshotted state.
virtual void Solve(const char* resume_file = NULL);
inline void Solve(const string resume_file) { Solve(resume_file.c_str()); }
virtual ~Solver() {}
inline shared_ptr<Net<Dtype> > net() { return net_; }
protected:
// PreSolve is run before any solving iteration starts, allowing one to
// put up some scaffold.
virtual void PreSolve() {}
// Get the update value for the current iteration.
virtual void ComputeUpdateValue() = 0;
// The Solver::Snapshot function implements the basic snapshotting utility
// that stores the learned net. You should implement the SnapshotSolverState()
// function that produces a SolverState protocol buffer that needs to be
// written to disk together with the learned net.
void Snapshot();
// The test routine
void TestAll();
void Test(const int test_net_id = 0);
virtual void SnapshotSolverState(SolverState* state) = 0;
// The Restore function implements how one should restore the solver to a
// previously snapshotted state. You should implement the RestoreSolverState()
// function that restores the state from a SolverState protocol buffer.
void Restore(const char* resume_file);
virtual void RestoreSolverState(const SolverState& state) = 0;
SolverParameter param_;
int iter_;
shared_ptr<Net<Dtype> > net_;
vector<shared_ptr<Net<Dtype> > > test_nets_;
DISABLE_COPY_AND_ASSIGN(Solver);
};
template <typename Dtype>
class SGDSolver : public Solver<Dtype> {
public:
explicit SGDSolver(const SolverParameter& param)
: Solver<Dtype>(param) {}
explicit SGDSolver(const string& param_file)
: Solver<Dtype>(param_file) {}
protected:
virtual void PreSolve();
Dtype GetLearningRate();
virtual void ComputeUpdateValue();
virtual void SnapshotSolverState(SolverState * state);
virtual void RestoreSolverState(const SolverState& state);
// history maintains the historical momentum data.
vector<shared_ptr<Blob<Dtype> > > history_;
DISABLE_COPY_AND_ASSIGN(SGDSolver);
};
} // namespace caffe
#endif // CAFFE_OPTIMIZATION_SOLVER_HPP_
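As a sketch of what an SGDSolver::ComputeUpdateValue() step typically amounts to per parameter blob (an illustration of the momentum update under assumed names, not the actual implementation in this commit):

// For each parameter i:
//   history = momentum * history + local_lr * diff
//   diff    = history
// after which Net::Update() subtracts diff from the parameter data.
template <typename Dtype>
inline void SketchSGDUpdate(Dtype* history, Dtype* diff, int count,
    Dtype local_lr, Dtype momentum) {
  for (int i = 0; i < count; ++i) {
    history[i] = momentum * history[i] + local_lr * diff[i];
    diff[i] = history[i];
  }
}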
// Copyright 2014 BVLC and contributors.
#ifndef CAFFE_SYNCEDMEM_HPP_
#define CAFFE_SYNCEDMEM_HPP_
#include <cstdlib>
#include "caffe/common.hpp"
namespace caffe {
// Theoretically, CaffeMallocHost and CaffeFreeHost should simply call
// cudaMallocHost and cudaFreeHost in order to create pinned memory.
// However, those functions require a CUDA-capable GPU to be present
// (allocating host memory should not need to touch the GPU, yet they
// error out as of CUDA 5.0) and would fail on a machine without a GPU.
// Thus, we define these two wrappers so the allocation strategy can be
// changed in one place if a future CUDA version removes the problem.
//
// In practice, although we allocate unpinned memory here, as long as it
// is accessed frequently the pages almost always stay resident in
// physical memory (assuming enough RAM is installed), so this does not
// appear to create a memory bottleneck.
inline void CaffeMallocHost(void** ptr, size_t size) {
*ptr = malloc(size);
}
inline void CaffeFreeHost(void* ptr) {
free(ptr);
}
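// A sketch of the pinned-memory variant discussed above, calling the CUDA
// runtime directly; the Sketch* names are illustrative and not part of
// this codebase (CUDA_CHECK comes from caffe/common.hpp):
inline void SketchPinnedMallocHost(void** ptr, size_t size) {
  CUDA_CHECK(cudaMallocHost(ptr, size));  // page-locked host allocation
}
inline void SketchPinnedFreeHost(void* ptr) {
  CUDA_CHECK(cudaFreeHost(ptr));
}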
class SyncedMemory {
public:
SyncedMemory()
: cpu_ptr_(NULL), gpu_ptr_(NULL), size_(0), head_(UNINITIALIZED),
own_cpu_data_(false) {}
explicit SyncedMemory(size_t size)
: cpu_ptr_(NULL), gpu_ptr_(NULL), size_(size), head_(UNINITIALIZED),
own_cpu_data_(false) {}
~SyncedMemory();
const void* cpu_data();
void set_cpu_data(void* data);
const void* gpu_data();
void* mutable_cpu_data();
void* mutable_gpu_data();
enum SyncedHead { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };
SyncedHead head() { return head_; }
size_t size() { return size_; }
private:
void to_cpu();
void to_gpu();
void* cpu_ptr_;
void* gpu_ptr_;
size_t size_;
SyncedHead head_;
bool own_cpu_data_;
DISABLE_COPY_AND_ASSIGN(SyncedMemory);
}; // class SyncedMemory
} // namespace caffe
#endif // CAFFE_SYNCEDMEM_HPP_
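As a usage illustration of the lazy CPU/GPU synchronization above (the function name is only an example):

#include "caffe/syncedmem.hpp"

void SyncedMemoryDemo() {
  caffe::SyncedMemory mem(16 * sizeof(float));
  // First CPU access lazily allocates host memory (head: HEAD_AT_CPU).
  float* cpu = static_cast<float*>(mem.mutable_cpu_data());
  for (int i = 0; i < 16; ++i) { cpu[i] = i; }
  // First GPU read triggers a host-to-device copy (head: SYNCED).
  const void* gpu = mem.gpu_data();
  (void) gpu;
}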