The caffe2op-channelbackprop crate defines a mathematical operator used in deep learning called Channel Backpropagation.
Note: This crate is currently being translated from C++ to Rust, and some function bodies may still be in the process of translation.
Channel Backpropagation is a technique used to calculate the gradient of the loss function with respect to the input of a neural network layer. Specifically, it is used to calculate this gradient for the input of a normalization operation (such as batch normalization).
The key idea behind Channel Backpropagation is to propagate gradients through the normalization operation by calculating the gradient of the normalization output with respect to its input. This gradient can then be used to compute the gradient of the loss function with respect to the input of the normalization operation.
The caffe2op-channelbackprop crate implements the ChannelBackpropStatsOp operator, which computes the statistics needed for Channel Backpropagation: the mean, standard deviation, and number of elements for a given input tensor.
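As an illustration of what such a statistics pass looks like, here is a minimal Rust sketch that computes the per-channel mean, standard deviation, and element count for a dense NCHW tensor stored as a flat slice. The layout and all names here are assumptions for illustration, not the crate's actual API:

```rust
/// Per-channel statistics for a dense NCHW tensor stored as a flat slice.
/// Illustrative sketch only, not the crate's actual types.
struct ChannelStats {
    mean: Vec<f32>,
    std_dev: Vec<f32>,
    sample_size: usize, // elements contributing to each channel (N * H * W)
}

fn channel_stats(x: &[f32], n: usize, c: usize, h: usize, w: usize) -> ChannelStats {
    let spatial = h * w;
    let sample_size = n * spatial;
    let mut mean = vec![0.0f32; c];
    let mut var = vec![0.0f32; c];

    // Accumulate per-channel sums and sums of squares.
    for img in 0..n {
        for ch in 0..c {
            let base = (img * c + ch) * spatial;
            for &v in &x[base..base + spatial] {
                mean[ch] += v;
                var[ch] += v * v;
            }
        }
    }
    for ch in 0..c {
        mean[ch] /= sample_size as f32;
        // var = E[x^2] - E[x]^2, clamped against tiny negative rounding error.
        var[ch] = (var[ch] / sample_size as f32 - mean[ch] * mean[ch]).max(0.0);
    }
    ChannelStats {
        std_dev: var.iter().map(|v| v.sqrt()).collect(),
        mean,
        sample_size,
    }
}
```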
In terms of mathematics, Channel Backpropagation involves calculating the gradient of the normalization output with respect to its input and applying the chain rule, as follows:
Let x be the input tensor to the normalization operation, y be the output tensor, mu be the mean tensor, sigma be the standard deviation tensor, gamma be the scaling tensor, beta be the bias tensor, and epsilon be a small constant added to the variance for numerical stability.
First, compute the mean and standard deviation of x:

mu = mean(x)
sigma = sqrt(var(x) + epsilon)

Next, normalize x:

x_hat = (x - mu) / sigma

Then scale and shift the normalized tensor using gamma and beta:

y = gamma * x_hat + beta
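To make the forward pass concrete, here is a minimal single-channel sketch in Rust. The function name and the slice-based interface are illustrative assumptions, not the crate's API:

```rust
// Forward pass of channel normalization for a single channel,
// following the formulas above. Illustrative sketch only.
fn normalize_channel(x: &[f32], gamma: f32, beta: f32, epsilon: f32) -> (Vec<f32>, f32, f32) {
    let n = x.len() as f32;
    // mu = mean(x)
    let mu = x.iter().sum::<f32>() / n;
    // sigma = sqrt(var(x) + epsilon)
    let var = x.iter().map(|v| (v - mu) * (v - mu)).sum::<f32>() / n;
    let sigma = (var + epsilon).sqrt();
    // y = gamma * x_hat + beta, with x_hat = (x - mu) / sigma
    let y = x.iter().map(|v| gamma * (v - mu) / sigma + beta).collect();
    (y, mu, sigma)
}
```

The mean and standard deviation are returned alongside y because the backward pass below reuses them.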
Backpropagation then proceeds in reverse. First, compute the gradient of the loss with respect to x_hat:

dL_dx_hat = dL_dy * gamma

where dL_dy is the gradient of the loss function with respect to y.
Next, compute the gradient of the loss with respect to the variance of x:

dL_dvar = -0.5 * sum(dL_dx_hat * (x - mu), axis=channel) / (sigma**3)

where channel is the channel dimension of the input tensor.
Then compute the gradient of the loss with respect to the mean of x:

dL_dmu = -sum(dL_dx_hat / sigma, axis=channel) - 2 * dL_dvar * mean(x - mu)
Finally, combine these terms to obtain the gradient of the loss with respect to x:

dL_dx = dL_dx_hat / sigma + dL_dvar * 2 * (x - mu) / sampleSize + dL_dmu / sampleSize

where sampleSize is the number of elements that contribute to each channel's statistics.
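Putting the backward formulas together, here is a minimal single-channel sketch in Rust; the function name and slice-based interface are again illustrative assumptions rather than the crate's operator signature:

```rust
// Backward pass for a single channel, mirroring the formulas above.
// `dl_dy` is the gradient of the loss with respect to y.
fn channel_backprop(x: &[f32], dl_dy: &[f32], mu: f32, sigma: f32, gamma: f32) -> Vec<f32> {
    let n = x.len() as f32;

    // dL_dx_hat = dL_dy * gamma
    let dl_dx_hat: Vec<f32> = dl_dy.iter().map(|g| g * gamma).collect();

    // dL_dvar = -0.5 * sum(dL_dx_hat * (x - mu)) / sigma**3
    let dl_dvar = -0.5
        * x.iter().zip(&dl_dx_hat).map(|(v, g)| g * (v - mu)).sum::<f32>()
        / sigma.powi(3);

    // dL_dmu = -sum(dL_dx_hat / sigma) - 2 * dL_dvar * mean(x - mu)
    let mean_centered = x.iter().map(|v| v - mu).sum::<f32>() / n;
    let dl_dmu = -dl_dx_hat.iter().sum::<f32>() / sigma - 2.0 * dl_dvar * mean_centered;

    // dL_dx = dL_dx_hat / sigma + dL_dvar * 2(x - mu)/n + dL_dmu / n
    x.iter()
        .zip(&dl_dx_hat)
        .map(|(v, g)| g / sigma + dl_dvar * 2.0 * (v - mu) / n + dl_dmu / n)
        .collect()
}
```

Note that mean(x - mu) is zero in exact arithmetic, so the corresponding term often vanishes; it is kept here to mirror the formula above.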
Overall, Channel Backpropagation is a powerful technique that allows the efficient calculation of gradients through normalization operations, which are commonly used in deep learning.
The caffe2op-channelbackprop crate provides an implementation of this technique in Rust, allowing for efficient computation on modern hardware.