Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
@aiskoaskosd

ImageNet Classification with Deep Convolutional Neural Networks
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
This paper proposes the Parametric Rectified Linear Unit (PReLU) and a generalized initialization for models that use ReLU. Xavier initialization is the well-known initialization method; it tries to keep the variance of the gradients and of the neuron responses constant across layers. It works well, but it implicitly assumes linear functions, so a model with nonlinear functions (such as ReLU, tanh, softsign...) needs a different initialization, even though Xavier initialization still works to some extent with nonlinearities. The basic idea of ReLU initialization is the same as that of Xavier initialization: keep the variance while propagating.
PReLU
PReLU is defined as:
$ \begin{equation} f(y_i)= \begin{cases}\tag{1} y_i, & \text{if}\ y_i > 0 \newline a_i y_i, & \text{if}\ y_i \le 0 \end{cases} \end{equation} $
Initialization
$ \begin{equation} Var[w_l] = \begin{cases} \frac{2}{(1+a^2)n^l}, & \text{feedforward case} \newline \frac{2}{(1+a^2)n^{l+1}}, & \text{backward case} \newline \frac{4}{(1+a^2)(n^l + n^{l+1})}, & \text{averaged case} \newline \end{cases} \end{equation} $ $ \begin{align} y_i &\text{: the input of the nonlinear activation f on the }i\text{th channel} \newline a_i &\text{: the learnable parameter} \newline w_l &\text{: weights at layer l} \newline n_l &\text{: the number of connections at layer l, details are at (5)} \end{align} $
PReLU with a fixed a_i, typically 0.01, is called leaky ReLU; if a_i is zero, it is plain ReLU. The parameter a_i is introduced to avoid zero gradients. The number of additional parameters for PReLU equals the total number of channels. If a_i is shared within a layer, the number of additional parameters equals the number of layers.
To optimize a_i, backpropagation is used:
$ \begin{equation}\tag{2} \frac{\delta L}{\delta a_i} = \sum\limits_{y_i} \frac{\delta L}{\delta f(y_i)} \frac{\delta f(y_i)}{\delta a_i} \end{equation} $ $L \text{: a cost function }$
Gradients are defined as:
$ \begin{equation} \frac{\delta f(y_i)}{\delta a_i} = \begin{cases}\tag{3} 0, & \text{if}\ y_i > 0 \newline y_i, & \text{if}\ y_i \le 0 \end{cases} \end{equation} $
The update rule can have the momentum: $ \tag{4} \Delta a_i := \mu \Delta a_i + \epsilon \frac{\delta L}{\delta a_i} $
$ \begin{align} \mu &\text{: the momentum} \newline \epsilon &\text{: the learning rate} \end{align} $
Note that weight decay tends to push a_i towards zero, so PReLU might degenerate into ReLU. That is why weight decay is not applied to a_i in the update rule. This paper uses a_i = 0.25 as the initial value.
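As a minimal sketch of equations (1) to (4), the snippet below computes the PReLU output, its derivative with respect to a_i, and one momentum update. The function names, the sample values, and the sign with which the step is applied (a - delta_a) are assumptions made for illustration; the text only defines the update term itself.

```python
def prelu(y, a):
    """Equation (1): identity for y > 0, slope a for y <= 0."""
    return y if y > 0 else a * y


def prelu_grad_a(y):
    """Equation (3): derivative of f(y) with respect to a."""
    return 0.0 if y > 0 else y


def update_a(a, delta_a, ys, upstream, mu=0.9, eps=0.01):
    """Equations (2) and (4): sum dL/da over positions, then apply momentum.

    `upstream` holds dL/df(y) for each y. Applying the step as a - delta_a
    is an assumption; the text only defines delta_a.
    """
    grad = sum(g * prelu_grad_a(y) for y, g in zip(ys, upstream))  # eq. (2)
    delta_a = mu * delta_a + eps * grad                            # eq. (4)
    return a - delta_a, delta_a


a, delta_a = 0.25, 0.0        # a_i = 0.25 is the initial value used in the paper
ys = [1.5, -2.0, -0.5]        # illustrative pre-activations
upstream = [0.1, 0.2, -0.4]   # illustrative dL/df(y) values
a, delta_a = update_a(a, delta_a, ys, upstream)
```

Only the non-positive inputs contribute to the gradient of a_i, which matches equation (3).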
Architecture: small model
The deeper the layer, the smaller the learned coefficients. This suggests that the model becomes more nonlinear at higher layers and tries to keep information at lower layers.
Architecture: big model
The difference between model B and model C is the width of the layers.
Imagenet result
The model in which every ReLU is replaced by PReLU is 1.2% more accurate on ImageNet 2012.
Initialization
All models have weights and biases that must be initialized before learning. The initialization matters: it largely determines the best accuracy a model can reach. Gaussian, uniform, and Xavier initialization are commonly used. Xavier initialization is especially well known, but it assumes that the model only uses linear functions, so a model with ReLU needs a different initialization. This paper proposes an initialization for models with ReLU. Let me derive it. k is the spatial filter size of the layer and c is the number of input channels. At layer l, the number of connections n is: $ n_l = k_l^2 c_l \tag{5}$
$s^l = z^l w^l + b^l\tag{6}$ $z^{l+1} = f(s^l)\tag{7}$ s are the responses at a pixel of the output map, W are weights and b are biases. z is activation values and f is an activation function. Backpropagation is described as: $s^{l+1} = z^{l+1} W^{l+1} + b^{l+1} = f(s^l) W^{l+1} + b^{l+1}\tag{8}$ $\frac{\delta s^{l+1}}{\delta s^l} = \frac{\delta f}{\delta s^l} W^{l+1}\tag{9}$ $\frac{\delta s^{l}}{\delta W^l} = z^{l}\tag{10}$ $\frac{\delta L}{\delta s^l} = \frac{\delta L}{\delta s^{l+1}}\frac{\delta s^{l+1}}{\delta s^l} = \frac{\delta f}{\delta s^l} W^{l+1} \frac{\delta L}{\delta s^{l+1}}\tag{11}$ $\frac{\delta L}{\delta W^l} = \frac{\delta L}{\delta s^{l}}\frac{\delta s^{l}}{\delta W^l} = z^l \frac{\delta f}{\delta s^l} W^{l+1} \frac{\delta L}{\delta s^{l+1}}\tag{12}$ L is a cost function.
If X and Y are independent:
$ \begin{align} Var[XY] &= [E[X]]^2 Var[Y] + [E[Y]]^2 Var[X] + Var[X]Var[Y] \newline &= E[X^2]E[Y^2] - [E[X]]^2 [E[Y]]^2 \tag{13} \end{align} $
$Var[aX \pm bY] = a^2 Var[X] + b^2 Var[Y] \pm 2ab Cov[X,Y]\tag{14}$ $ \begin{align} Cov[X,Y] &= E[[X - E[X]][Y-E[Y]]] \newline &= E[XY] - E[X]E[Y]\tag{15} \end{align} $
First, consider the variance in forward propagation. Assume the biases b are zero, w and s have zero-mean symmetric distributions, and the activation function f is ReLU. $E[z^{l}] = E[f(s^{l-1})] \neq 0 \tag{16}$ $E[{z^l}^2] = \frac{1}{2} Var[s^{l-1}] = \frac{1}{2} E[{s^{l-1}}^2] \tag{17}$
From (5), (6), (13), (14), (15), (17):
$ \begin{align}\tag{18} Var[s^l] &= n^l Var[z^lw^l + b^l] \newline &= n^l Var[z^lw^l] \newline &= n^l E[{z^l}^2] E[{w^l}^2] \newline &= n^l E[{z^l}^2] Var[w^l]\newline &= \frac{1}{2} n^l Var[s^{l-1}] Var[w^l] \end{align} $
Thus, $Var[s^L] = Var[s^1](\prod\limits_{l=2}^{L} \frac{1}{2} n^l Var[w^l]) \tag{19}$ To keep information flowing, $\frac{1}{2}n^l Var[w^l] = 1, \, \forall l. \tag{20}$
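Equation (17) relies on the fact that a ReLU keeps half of the second moment of a zero-mean symmetric input. A quick Monte Carlo check can make this concrete. It is only a sketch: the Gaussian input, its standard deviation, and the sample size are assumptions chosen for illustration.

```python
import random

random.seed(0)
samples = [random.gauss(0.0, 2.0) for _ in range(200_000)]   # s ~ N(0, 4)

# E[s^2] equals Var[s] because s has zero mean
var_s = sum(s * s for s in samples) / len(samples)

# E[z^2] with z = ReLU(s) = max(s, 0)
mean_z_sq = sum(max(s, 0.0) ** 2 for s in samples) / len(samples)

ratio = mean_z_sq / var_s   # should be close to 1/2
```

The factor 1/2 is what turns the Xavier condition n Var[w] = 1 into (1/2) n Var[w] = 1 in equation (20).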
Second, consider the variance in backward propagation. From (6), (11), $ \begin{align}\tag{21} \frac{\delta L}{\delta z^{l+1}} &= \frac{\delta L}{\delta s^{l+1}} \frac{\delta s^{l+1}}{\delta z^{l+1}} \newline &= \frac{1}{w^{l+1}\frac{\delta f}{\delta s^l}} \frac{\delta L}{\delta s^l} \times w^{l+1} \newline &= \left(\frac{\delta f}{\delta s^l}\right)^{-1} \frac{\delta L}{\delta s^l} \end{align} $
From (6), (21),
$ \begin{align}\tag{22} \frac{\delta L}{\delta z^l} &= \frac{\delta L}{\delta s^l} \frac{\delta s^l}{\delta z^l} \newline &= w^l\frac{\delta L}{\delta s^l} \newline &= w^l \frac{\delta f}{\delta s^l}\frac{\delta L}{\delta z^{l+1}} \end{align} $
From (5), (22), $Var[\frac{\delta L}{\delta z^l}] = n^{l+1}Var[w^l \frac{\delta f}{\delta s^l}\frac{\delta L}{\delta z^{l+1}}]\tag{23}$ Assume, $\forall l, \; E[\frac{\delta f}{\delta s^l}] = \frac{1}{2}, Var[\frac{\delta f}{\delta s^l}] = \frac{1}{4}, E[w^l]=0, E[\frac{\delta L}{\delta z^l}]=0\tag{24}$ From (13), (23), (24):
$ \begin{align}\tag{25} Var[\frac{\delta L}{\delta z^l}] &= \frac{1}{4} n^{l+1}Var[w^l]Var[\frac{\delta L}{\delta z^{l+1}}] + \frac{1}{4} n^{l+1}Var[w^l]Var[\frac{\delta L}{\delta z^{l+1}}]\newline &=\frac{1}{2} n^{l+1}Var[w^l]Var[\frac{\delta L}{\delta z^{l+1}}] \end{align} $
Thus, $Var[\frac{\delta L}{\delta z^2}] = Var[\frac{\delta L}{\delta z^{L+1}}](\prod\limits_{l=2}^{L} \frac{1}{2} n^{l+1} Var[w_l]) \tag{26}$ To keep information flowing, $\frac{1}{2} n^{l+1} Var[w_l] = 1, \forall l. \tag{28}$ From (20), (28) for ReLU, $ \begin{equation} Var[w_l] = \begin{cases}\tag{29} \frac{2}{n^l}, & \text{feedforward case} \newline \frac{2}{n^{l+1}}, & \text{backward case} \newline \frac{4}{n^l + n^{l+1}}, & \text{averaged case} \newline \end{cases} \end{equation} $
For PReLU, $ \begin{equation} Var[w_l] = \begin{cases}\tag{30} \frac{2}{(1+a^2)n^l}, & \text{feedforward case} \newline \frac{2}{(1+a^2)n^{l+1}}, & \text{backward case} \newline \frac{4}{(1+a^2)(n^l + n^{l+1})}, & \text{averaged case} \newline \end{cases} \end{equation} $
As (30) shows, a=0 gives the ReLU case and a=1 gives the linear case. ReLU initialization works better than Xavier initialization for deep rectifier networks: Xavier initialization cannot handle a 30-layer model.
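The three cases of (30) can be written as a small helper. This is a sketch rather than the paper's code: the function name, the `mode` strings, and the example layer sizes are assumptions introduced here.

```python
import math


def prelu_init_variance(n_l, n_next, a=0.0, mode="feedforward"):
    """Var[w_l] from equation (30); a=0 gives the ReLU case of equation (29)."""
    scale = 1.0 + a * a
    if mode == "feedforward":
        return 2.0 / (scale * n_l)
    if mode == "backward":
        return 2.0 / (scale * n_next)
    if mode == "averaged":
        return 4.0 / (scale * (n_l + n_next))
    raise ValueError("unknown mode: " + mode)


# Hypothetical layer with 3x3 kernels and 64 input channels: n = k^2 * c, eq. (5)
n_l = 3 * 3 * 64
n_next = 3 * 3 * 128   # illustrative value for the following layer
std = math.sqrt(prelu_init_variance(n_l, n_next))   # std for a zero-mean Gaussian
```

With a=1 the feedforward case reduces to 1/n, which is the linear condition n Var[w] = 1.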
Preprocess
subtract mean-pixel
Horizontal flipping
Data-augmentation
RGB-shift
Scale jittering: s is randomly sampled from the range [256, 512]
The shortest side of the picture is resized to s
A 224*224 patch is cropped at random
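The scale jittering and random crop above can be sketched as a function that only computes the geometry: the resized size and the top-left corner of the patch. The function name, the rounding, and the use of `randint` are assumptions; an image library would perform the actual resize and crop.

```python
import random


def jittered_crop_box(width, height, crop=224, s_range=(256, 512), rng=random):
    """Return (new_w, new_h, left, top) for one scale-jittered random crop."""
    s = rng.randint(*s_range)                  # target length of the shortest side
    ratio = s / min(width, height)
    new_w = max(crop, round(width * ratio))    # keep the aspect ratio
    new_h = max(crop, round(height * ratio))
    left = rng.randint(0, new_w - crop)        # random top-left corner of the patch
    top = rng.randint(0, new_h - crop)
    return new_w, new_h, left, top


new_w, new_h, left, top = jittered_crop_box(640, 480)   # illustrative image size
```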
Optimization
SGD: when the error rate plateaus, divide the learning rate by 10
Learning rate: 0.01
Minibatch: 128
Momentum: 0.9
Weight decay: 0.0005
Test evaluation
Dense evaluation and multi-view testing on feature maps are combined.
The resized image is fed into the model, and a 14*14 window on the feature map of the last convolutional layer is pooled by the SPP layer. The pooled features are then fed into the following layers, and the scores of all dense sliding windows are averaged. The same process is applied to the horizontally flipped image and to images at multiple scales, and all scores are simply averaged.
Dropout
Applied to the first two fully-connected layers with probability 0.5
Table 5 reports the ImageNet single-model results. It uses 10-view evaluation at test time. It shows that the width of a layer also matters for accuracy.
ImageNet single-model result: multi-scale, multi-view, and dense evaluation are used.
ImageNet multi-model result: multi-scale, multi-view, and dense evaluation are used.
Understanding the difficulty of training deep feedforward neural networks
Initialization often dominates the best accuracy a network reaches and its speed of convergence. A good initialization leads to a good result, so we have to be careful about how we initialize. This paper uses five-layer multilayer perceptrons for its experiments and analyzes the results. Interestingly, different activation functions and different initializations change the behavior inside the network (the gradients and the distribution of activation values). Finally, the paper proposes and analyzes normalized initialization, which is often called Xavier initialization after the first author's name. It is designed simply to keep the variance of the gradients roughly constant during propagation.
Date: 2010
Initializations
Default initialization
$W_{ij} \sim U[-\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}]$
U: uniform distribution
n: the size of the previous layer (the number of columns of W)
Normalized initialization
$W_{ij} \sim U[\frac{-\sqrt{6}}{\sqrt{n_i + n_{i+1}}},\frac{\sqrt{6}}{\sqrt{n_i + n_{i+1}}}]$
U: uniform distribution
n: the size of layer
i: layer i
I think normalized initialization can also be $Gaussian(0, \frac{2}{n_i + n_{i+1}})$, where the second argument is the variance.
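Both initializations above can be sampled with the standard library. The following is a sketch with assumed function names; the Gaussian variant follows the note's own suggestion and is not taken from the paper.

```python
import math
import random


def default_init(n_prev, rng=random):
    """Sample one weight from U[-1/sqrt(n), 1/sqrt(n)]."""
    bound = 1.0 / math.sqrt(n_prev)
    return rng.uniform(-bound, bound)


def normalized_init(n_i, n_next, rng=random):
    """Sample one weight from U[-sqrt(6)/sqrt(n_i + n_next), sqrt(6)/sqrt(n_i + n_next)]."""
    bound = math.sqrt(6.0) / math.sqrt(n_i + n_next)
    return rng.uniform(-bound, bound)


def normalized_init_gaussian(n_i, n_next, rng=random):
    """Gaussian variant with the same variance 2 / (n_i + n_next)."""
    return rng.gauss(0.0, math.sqrt(2.0 / (n_i + n_next)))


# One row of weights for a hypothetical 1000 -> 1000 layer
row = [normalized_init(1000, 1000) for _ in range(1000)]
```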
Architecture
The neural networks with one to five hidden layers, with one thousand hidden units per layer, and with a softmax logistic regression for the output layer.
The cost function is the negative log-likelihood −log P(y|x)
Activation functions and their derivatives
Activation functions
$sigmoid(x) = \frac{1}{1 + \mathrm{e}^{-x}}$
$tanh(x) = \frac{\mathrm{e}^{x} - \mathrm{e}^{-x}}{\mathrm{e}^{x} + \mathrm{e}^{-x}}$
$softsign(x) = \frac{x}{1 + |x|}$
The derivatives
$\frac{\delta sigmoid}{\delta x} = \frac{\mathrm{e}^{-x}}{(1 + \mathrm{e}^{-x})^2}$
$\frac{\delta tanh}{\delta x} = 1 - tanh(x)^2$
$\frac{\delta softsign}{\delta x} = \frac{1}{(1 + |x|)^2}$
The tail of the softsign derivative decays like a quadratic polynomial rather than exponentially, so gradients keep flowing.
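The difference between the tails is easy to see numerically. The sketch below assumes the standard sigmoid 1 / (1 + e^{-x}) and evaluates the three derivatives at a few illustrative points far from zero.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # equals e^{-x} / (1 + e^{-x})^2


def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2


def softsign(x):
    return x / (1.0 + abs(x))


def d_softsign(x):
    return 1.0 / (1.0 + abs(x)) ** 2


# Far from zero, the softsign derivative decays polynomially, the others exponentially.
tails = {x: (d_sigmoid(x), d_tanh(x), d_softsign(x)) for x in (2.0, 5.0, 10.0)}
```

At x = 10 the softsign derivative is still 1/121, while the sigmoid and tanh derivatives are orders of magnitude smaller.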
Section 3
The sigmoid nonlinearity has already been shown to slow down learning because of its non-zero mean, which induces important singular values in the Hessian. Layers start to saturate from bottom to top. According to this paper, this phenomenon is caused by a combination of random initialization and the fact that a hidden unit output of 0 corresponds to a saturated sigmoid, and it correlates with slow convergence. Note that this phenomenon is not observed when the weights are initialized by unsupervised pre-training. As you can see, the last layer quickly saturates at 0, so gradients vanish on the way back and learning might not start at the lower layers. Unfortunately, because of vanishing gradients at the lower layers, a five-layer network is too deep for learning to start with this setup. The saturation at 0 is caused by the tendency of the softmax to rely more on its biases than on the activations from the previous layer. In softmax(b + Wh), b is learned quickly, but h varies a lot, so Wh is pushed towards 0 for stability by pushing h towards 0. A sigmoid output of 0 lies in the saturated regime, where the derivative is small, so gradients might not propagate to the lower layers.
This saturation phenomenon needs to be explained in the future.
Different activation functions result in different distributions of activation values.
Section 4
Conditional log-likelihood cost function works much better than the quadratic cost. There are clearly more severe plateaus with the quadratic cost.
It has been found that back-propagated gradients after initialization were smaller as one moves from the output layer towards the input layer and the variance of the back-propagated gradients decreases as we go backwards in the network. If normalized initialization is applied, this phenomenon will not be observed.
The normalized initialization is derived as follows. Backpropagation is:
$s^i = z^i W^i + b^i$
$z^{i+1} = f(s^i)$
$s^{i+1} = z^{i+1} W^{i+1} + b^{i+1} = f(s^i) W^{i+1} + b^{i+1}$
$\frac{\delta s^{i+1}}{\delta s^i} = \frac{\delta f}{\delta s^i} W^{i+1}$
$\frac{\delta s^{i}}{\delta W^i} = z^{i}$
$\frac{\delta L}{\delta s^i} = \frac{\delta L}{\delta s^{i+1}}\frac{\delta s^{i+1}}{\delta s^i} = \frac{\delta f}{\delta s^i} W^{i+1} \frac{\delta L}{\delta s^{i+1}}$
$\frac{\delta L}{\delta W^i} = \frac{\delta L}{\delta s^{i}}\frac{\delta s^{i}}{\delta W^i} = z^i \frac{\delta f}{\delta s^i} W^{i+1} \frac{\delta L}{\delta s^{i+1}}$
Consider the linear regime, thus:
$\frac{\delta f}{\delta s^i} \approx 1$
Let Var[ ] denote the variance. If $\mu(x_i)=\mu(W_i) =0$, then $Var[W_i x_i] = Var[W_i]Var[x_i]$, so:
$Var[z^i] = Var[x] \, \prod\limits_{i'=0}^{i-1} n_{i'} Var[W^{i'}]$
$Var[\frac{\delta L}{\delta s^i}] =Var[\frac{\delta L}{\delta s^d}] \, \prod\limits_{i'=i}^{d} n_{i'+1} Var[W^{i'}]$
$Var[\frac{\delta L}{\delta W^i}] =\prod\limits_{i'=0}^{i-1} n_{i'} Var[W^{i'}] \, \prod\limits_{i'=i}^{d} n_{i'+1} Var[W^{i'}] \times Var[x] Var[\frac{\delta L}{\delta s^d}] $
To keep information flowing forward:
$\forall(i,i'),\, Var[z^i]=Var[z^{i'}]$
To keep information flowing backward:
$\forall (i,i'), \, Var[\frac{\delta L}{\delta s^i}] = Var[\frac{\delta L}{\delta s^{i'}}]$
These two requirements lead to two conditions:
$\forall i,\, n_i Var[W^i]=1$
$\forall i,\, n_{i+1} Var[W^i]=1$
As a compromise between these two constraints:
$\forall i,\, Var[W^i]=\frac{2}{n_i + n_{i+1}}$
The variance of a uniform distribution is:
$W_i\sim\, U[a,b], \, Var[U[a,b]]=\frac{(b-a)^2}{12}$
Thus, the variance of the default initialization is:
$W_{i} \sim\, U[-\frac{1}{\sqrt{n_{i-1}}}, \frac{1}{\sqrt{n_{i-1}}}], \, Var[U[-\frac{1}{\sqrt{n_{i-1}}}, \frac{1}{\sqrt{n_{i-1}}}]]=\frac{1}{3n_{i-1}}$
$n_{i-1}Var[W_i] = \frac{1}{3}$
If all n are the same size, the variance shrinks by a factor of 3 at every layer, so information is lost on the way. To prevent this:
$W\sim\, U[\frac{-\sqrt{6}}{\sqrt{n_i + n_{i+1}}},\frac{\sqrt{6}}{\sqrt{n_i + n_{i+1}}}]$
This initialization is called normalized initialization or Xavier initialization. Confirm the variance:
$Var[U[\frac{-\sqrt{6}}{\sqrt{n_i + n_{i+1}}},\frac{\sqrt{6}}{\sqrt{n_i + n_{i+1}}}]] = \frac{(\frac{\sqrt{6}}{\sqrt{n_i + n_{i+1}}} + \frac{\sqrt{6}}{\sqrt{n_i + n_{i+1}}})^2}{12} = \frac{2}{n_i + n_{i+1}}$
Thus, from the variance point of view, no information is lost while propagating.
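The arithmetic above can be checked with a few lines. This sketch assumes equal layer sizes n for fan-in and fan-out and an illustrative depth of 5; the function and scheme names are introduced here only for the example.

```python
def uniform_variance(low, high):
    """Variance of U[low, high] is (high - low)^2 / 12."""
    return (high - low) ** 2 / 12.0


def per_layer_gain(n, scheme):
    """n * Var[W] for a layer whose fan-in and fan-out are both n."""
    if scheme == "default":
        bound = n ** -0.5                  # U[-1/sqrt(n), 1/sqrt(n)]
    elif scheme == "normalized":
        bound = (6.0 / (n + n)) ** 0.5     # U[-sqrt(6/(2n)), sqrt(6/(2n))]
    else:
        raise ValueError("unknown scheme: " + scheme)
    return n * uniform_variance(-bound, bound)


depth = 5                                                        # illustrative depth
default_total = per_layer_gain(1000, "default") ** depth         # about (1/3)^5
normalized_total = per_layer_gain(1000, "normalized") ** depth   # about 1
```

With the default initialization the variance is multiplied by 1/3 per layer, so after five layers it has dropped to roughly 0.4% of its original value.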
With normalized initialization, the distributions of activation values are totally different.
The variance of the back-propagated gradients gets smaller if the initialization is not normalized. Normalized initialization addresses exactly this problem, and gradients flow further backward.
Normalized initialization works: the variances are preserved during training when it is applied. Compared with Figure 3, the variances change less.
Pre-training works well. I suppose that unsupervised pre-training tries to keep as much information as possible up to the last layer, so what it does is intrinsically the same as normalized initialization.
The initialization dominates the accuracy.
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
The paper proposes an integrated approach to object detection, recognition, and localization with a single convolutional network. Its test-time evaluation method is new.
Date: 2013
Preprocess
The shortest side of the picture is resized to 256
Horizontal flipping
Data-augmentation
Five random 224*224 patches from the original 256*256 patch plus their horizontal flips: 10 patches in total
Optimization
SGD: decrease the learning rate by a factor of 0.5 after (30, 50, 60, 70, 80) epochs
Learning rate: 0.05
Momentum: 0.6
Batch size: 128
Weight decay: 0.00001
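The step schedule in the list above can be written as a small function. This is a sketch under one assumption: epochs are counted from 0, so the rate is halved once `epoch` reaches each milestone. The function name is made up for this example.

```python
def overfeat_lr(epoch, base_lr=0.05, milestones=(30, 50, 60, 70, 80), factor=0.5):
    """Learning rate in effect during `epoch` (0-based), halved after each milestone."""
    passed = sum(1 for m in milestones if epoch >= m)   # milestones already reached
    return base_lr * factor ** passed


schedule = [overfeat_lr(e) for e in (0, 30, 50, 60, 70, 80)]
```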
Weight initialization
Weights: zero-mean gaussian distribution with standard deviation 0.01
Biases: zero-mean gaussian distribution with standard deviation 0.01
Dropout
Applied to the 6th and 7th (fully-connected) layers with probability 0.5
Test Evaluation
This test evaluation is often called the dense sliding-window method or dense evaluation.
The shifted outputs of the 5th layer are fed into the following layers and the results are averaged.
(1) For a single image, at a given scale, we start with the unpooled layer 5 feature maps.
(2) Each of unpooled maps undergoes a 3*3 max pooling operation (non-overlapping regions), repeated 3*3 times for (∆x, ∆y) pixel offsets of {0, 1, 2}.
(3) This produces a set of pooled feature maps, replicated (3*3) times for different (∆x, ∆y) combinations.
(4) The classifier (layers 6,7,8) has a fixed input size of 5*5 and produces a C-dimensional output vector for each location within the pooled maps. The classifier is applied in sliding-window fashion to the pooled maps, yielding C-dimensional output maps (for a given (∆x, ∆y) combination).
(5) The output maps for different (∆x, ∆y) combinations are reshaped into a single 3D output map (two spatial dimensions x C classes).
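Steps (2) and (3) can be sketched on a single-channel map with plain Python lists. The 6*6 toy map and the function name are assumptions for illustration; real layer 5 maps are larger and have many channels, and the classifier of step (4) is not included.

```python
def max_pool_offset(fmap, dx, dy, k=3):
    """Non-overlapping k x k max pooling of a 2D list, starting at offset (dx, dy)."""
    rows, cols = len(fmap), len(fmap[0])
    pooled = []
    for top in range(dy, rows - k + 1, k):
        row = []
        for left in range(dx, cols - k + 1, k):
            window = [fmap[top + i][left + j] for i in range(k) for j in range(k)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Illustrative 6x6 stand-in for an unpooled layer 5 feature map
fmap = [[r * 6 + c for c in range(6)] for r in range(6)]

# One pooled map per (dx, dy) offset combination: 3 * 3 = 9 maps in total
pooled = {(dx, dy): max_pool_offset(fmap, dx, dy)
          for dx in (0, 1, 2) for dy in (0, 1, 2)}
```

Each offset shifts the pooling grid by one pixel, which is how the dense evaluation recovers the resolution that a single pooling pass would lose.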
Architecture
Result
Imagenet recognition

Add or subtract between dictionaries in python
like this:
from collections import Counter
a = {'a': 1, 'b': 2}
b = {'a': 2, 'b': 3, 'c': 1}
# Counter addition sums the values key by key
c = dict(Counter(a) + Counter(b))  # {'a': 3, 'b': 5, 'c': 1}
You have to be careful when a resulting value is negative or 0:
from collections import Counter
a = {'a': 1, 'b': 3.1}
b = {'a': 1, 'b': 3, 'c': 1}
# Counter subtraction keeps only positive results
c = dict(Counter(a) - Counter(b))  # {'b': 0.10000000000000009}
As you can see, the values that are less than or equal to 0 vanished from the dictionary.
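If the zero and negative results should be kept, two alternatives are worth considering. This is a sketch: `Counter.subtract` works in place and does not drop non-positive counts, and a plain dict comprehension over the union of keys gives the same values.

```python
from collections import Counter

a = {'a': 1, 'b': 3.1}
b = {'a': 1, 'b': 3, 'c': 1}

# Option 1: Counter.subtract works in place and keeps zero and negative counts.
diff = Counter(a)
diff.subtract(b)
kept = dict(diff)   # 'a' stays with 0 and 'c' stays with -1

# Option 2: plain dict comprehension over the union of keys.
manual = {k: a.get(k, 0) - b.get(k, 0) for k in a.keys() | b.keys()}
```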
How to uninstall modules that were installed by setup.py
First, install again while recording the installed files in module.txt, then remove every recorded file:
python setup.py install --record module.txt
cat module.txt | xargs rm -rvf
reference: http://stackoverflow.com/questions/1550226/python-setup-py-uninstall/25209129#25209129
How to install Gensim in EC2
Here is the perfect reference: http://hivecolor.com/id/54 It worked well as of 1/8/15.
The following is quoted almost entirely from http://hivecolor.com/id/54:
on terminal,
sudo yum groupinstall -y 'Development Tools' install openssl-devel* zlib*.x86_64
sudo yum install -y gcc gcc-c++ zlib-devel openssl-devel bzip2-devel
sudo yum install -y atlas atlas-devel atlas-sse3 atlas-sse3-devel gcc-gfortran
cd
wget http://python.org/ftp/python/2.7/Python-2.7.tgz
tar xfz Python-2.7.tgz
cd Python-2.7
./configure --prefix=/opt/python2.7 --with-threads --enable-shared
make
sudo make install
cd
vi ~/.bashrc
on vi,
alias python='/opt/python2.7/bin/python'
alias python2.7='/opt/python2.7/bin/python'
PATH=$PATH:/opt/python2.7/bin
on terminal,
source ~/.bashrc
sudo vi /etc/ld.so.conf.d/opt-python2.7.conf
on vi,
/opt/python2.7/lib
on terminal,
sudo ldconfig
sudo chown -R ec2-user /opt/python2.7
sudo chgrp -R ec2-user /opt/python2.7
cd
wget http://downloads.sourceforge.net/project/numpy/NumPy/1.7.0/numpy-1.7.0.tar.gz
tar zxvf numpy-1.7.0.tar.gz
cd numpy-1.7.0
python2.7 setup.py build
python2.7 setup.py install
cd
wget http://downloads.sourceforge.net/project/scipy/scipy/0.11.0/scipy-0.11.0.tar.gz
tar zxvf scipy-0.11.0.tar.gz
cd scipy-0.11.0
python2.7 setup.py build
python2.7 setup.py install
cd
wget https://pypi.python.org/packages/source/g/gensim/gensim-0.12.1.tar.gz#md5=66c279a2b00de4b3da9e64d4d2e846a7
tar zxvf gensim-0.12.1.tar.gz
cd gensim-0.12.1
python2.7 setup.py build
python2.7 setup.py install
cd
wget https://bootstrap.pypa.io/ez_setup.py
python2.7 ez_setup.py
cd /opt/python2.7/bin/
./easy_install pip
./pip install ipython
Check whether the installation succeeded:
cd
ipython
import numpy
import scipy
import gensim
Fail to update brew
Error:
04:44 $ brew update
error: Your local changes to the following files would be overwritten by merge:
Library/Formula/boost-python.rb
Library/Formula/boost.rb
Library/Formula/gflags.rb
Library/Formula/glog.rb
Library/Formula/leveldb.rb
Library/Formula/lmdb.rb
Library/Formula/protobuf.rb
Library/Formula/snappy.rb
Library/Formula/szip.rb
Please, commit your changes or stash them before you can merge.
error: The following untracked working tree files would be overwritten by merge:
Library/Formula/gupnp-tools.rb
Library/Formula/pdf2svg.rb
Please move or remove them before you can merge.
Aborting
Error: Failure while executing: git pull -q origin refs/heads/master:refs/remotes/origin/master
You need to overwrite the local Homebrew repository with the new upstream version, so on the terminal:
cd $(brew --prefix)
git fetch origin
git reset --hard origin/master
brew update && brew upgrade
reference: http://qiita.com/harapeko_wktk/items/f4f44ddb5d3912e15ea2
Bind arrow keys in js
Cited from: http://stackoverflow.com/questions/1402698/binding-arrow-keys-in-js-jquery
$(document).keydown(function(e) {
    switch(e.which) {
        case 37: // left
        break;
        case 38: // up
        break;
        case 39: // right
        break;
        case 40: // down
        break;
        default: return; // exit this handler for other keys
    }
    e.preventDefault(); // prevent the default action (scroll / move caret)
});

Resources that I consumed for Reinforcement learning
video
Reinforcement learning lecture (牧野貴樹, Takaki Makino), difficulty ★★
url: https://www.youtube.com/watch?v=XsFhA8kyfkI
slide: https://www.sat.t.u-tokyo.ac.jp/~mak/20140319-makino.pdf
reference: http://www.ai-gakkai.or.jp/my-bookmark_vol26-no3/
朱鷺の杜Wiki
http://ibisforest.org/index.php?強化学習
The difference between accuracy and precision in machine learning
Accuracy = (TP + TN) / (TP + FP + TN + FN)
Precision(positive) = TP / (TP + FP)
Precision(negative) = TN / (TN + FN)
Recall(positive) = TP / (TP + FN)
Recall(negative) = TN / (TN + FP)
reference: http://pika-shi.hatenablog.com/entry/20120526/1338051235
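A short worked example may help show why accuracy and precision can diverge. The function name and the confusion counts are assumptions chosen to illustrate a rare positive class.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy plus precision and recall for both classes."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision_pos": tp / (tp + fp),
        "precision_neg": tn / (tn + fn),
        "recall_pos": tp / (tp + fn),
        "recall_neg": tn / (tn + fp),
    }


# Illustrative counts: the positive class is rare, so accuracy looks high
# even though only a third of the predicted positives are correct.
m = classification_metrics(tp=5, fp=10, tn=980, fn=5)
```

Here the accuracy is 0.985 while the positive precision is 1/3 and the positive recall is 0.5.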
Resources that I consumed for Deep Learning
SlideShare
How to develop and experiment with deep learning using Caffe and maf:
http://www.slideshare.net/KentaOono/how-to-develop
It gives a well-organized overview of the architecture of Caffe and maf.
The basis and implementation for deep learning:
http://www.slideshare.net/beam2d/deep-learningimplementation?related=1
Pages 34 to 37 explain how to implement deep learning in practice, which is very valuable information. Let me summarize.
p.34---Preprocessing: whitening is often used for data preprocessing. If you'd like to take advantage of the structure of the data, ZCA whitening is a good choice.
p.35---Architecture design: It's better to start from a shallow network, and if the precision doesn't improve by tuning parameters, increase the number of units. Make the network deeper when you can't improve the precision by tuning parameters or when you see signs of overfitting (plot the accuracy on the test dataset and the training dataset and look at the difference). Change the activation function and compare the precision.
p.36---How to initialize weights: Let U be the number of units. With sigmoid, initialize the weights to values below 1/sqrt(U). They are often drawn from a uniform or Gaussian distribution. The bias is 0. With ReLU (rectified linear unit), initialize the weights to values below 1/U, drawn at random. The bias has to be greater than 0. A positive bias makes learning faster but unstable, so set a suitable value.
p.37---How to adjust parameters: Start from a large value; you will then observe the weights exploding to Inf or NaN, so reduce the value and repeat the observation and reduction. At some point the weights no longer go to Inf or NaN; then reduce the value a little further, and that's it. If the accuracy plateaus, reduce the learning rate (pylearn2 can do this automatically).
Caffe
GoogLeNet train/test definition:
https://github.com/amiralush/caffe/commit/8a1ef8b1e64ff445b4cad955c7dcf545e259bff9
https://github.com/amiralush/caffe/commit/5edaf8b61aac69a14c074f09bdc2e53a3fff2672
https://github.com/amiralush/caffe/commit/3a7d7735479339a714da16ac7c8de8da68ea0322
Chainer
Finished the tutorial: http://docs.chainer.org/en/latest/tutorial/index.html
Introduction to Chainer (ustream): It is almost the same as the Chainer tutorial.
video --- http://www.ustream.tv/recorded/64082997
ppt --- http://www.slideshare.net/beam2d/introduction-to-chainer-a-flexible-framework-for-deep-learning?from_action=save
Nature
Machine intelligence section: http://www.nature.com/nature/supplements/insights/machine-intelligence/index.html
Deeplearning.net
http://deeplearning.net
Article
Inceptionism: Going Deeper into Neural Networks: This is fascinating and relates to human creativity. http://googleresearch.blogspot.jp/2015/06/inceptionism-going-deeper-into-neural.html
How to change the volume of an EC2 instance
This worked for me perfectly: http://asobicocoro.com/tips/article/aws-change-ebs-volume
How to compress a folder
like this:
tar -czvf filename.tar.gz foldername
references: http://inetnote.exblog.jp/4156060
http://shrine-bell.seesaa.net/article/107507264.html
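The same archive can also be created from Python with the standard `tarfile` module. This is a sketch of an equivalent of `tar -czf`, not a replacement for the command above; the function name is an assumption.

```python
import os
import tarfile


def compress_folder(folder, archive_path):
    """Write `folder` into a gzip-compressed tar archive, like `tar -czf`."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the folder name instead of its full path
        tar.add(folder, arcname=os.path.basename(os.path.normpath(folder)))
    return archive_path
```

For example, `compress_folder("foldername", "filename.tar.gz")` mirrors the shell command, assuming `foldername` exists in the current directory.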

Extract a URL from HTML with a regular expression in Python
At first, I extracted a URL in two steps like this:
# ele is the element converted into a string
import re
# extract the url together with the leading href="
url = re.findall(r'href="[\w,\/,:,\.,\-,%,=,?,+,&,;]*', ele)
# remove the leading href="
url = url[0][6:]
But the second step is not necessary:
# ele is the element converted into a string
import re
# extract only the url with a capture group
url = re.findall(r'href="([\w,\/,:,\.,\-,%,=,?,+,&,;]*)"', ele)[0]
With regular expressions, the part inside () is captured, so you can extract just the URL from the matched string.
references: http://docs.python.jp/2/library/re.html http://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q12135557007
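A self-contained version of the capture-group approach is sketched below. The sample element string is made up, and the character class is simplified to "anything up to the closing quote", which is a swapped-in pattern rather than the one used above.

```python
import re

# Illustrative element string; in practice `ele` comes from the parsed page.
ele = '<a class="link" href="http://example.com/path?id=1&amp;x=2">text</a>'

# The parentheses form a capture group, so findall returns only the URL part.
matches = re.findall(r'href="([^"]*)"', ele)
url = matches[0] if matches else None   # avoid IndexError when no href exists
```

Checking for an empty result before indexing avoids an exception on elements that have no href attribute.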
How to install Caffe on an Amazon EC2 instance
These instructions worked perfectly, just follow them: https://github.com/BVLC/caffe/wiki/Install-Caffe-on-EC2-from-scratch-%28Ubuntu,-CUDA-7,-cuDNN%29