echoAI.Activation (PyTorch)
This page contains details of all activation functions supported by the PyTorch backend of Echo.
Mish
echoAI.Activation.t_ops.Mish()
Applies the element-wise function:
\textbf{Mish}(x) = x\tanh(\text{softplus}(x))
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Mish: A Self Regularized Non-Monotonic Activation Function
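A minimal usage sketch, assuming Mish behaves like a standard callable torch.nn.Module applied element-wise to a tensor:

import torch
from echoAI.Activation.t_ops import Mish

mish = Mish()              # no parameters to configure
x = torch.randn(4, 16)     # (N, *) input
y = mish(x)                # element-wise, same shape as input
print(y.shape)             # torch.Size([4, 16])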
Swish
echoAI.Activation.t_ops.Swish(eswish = False, swish = True, beta = 1.735, flatten = False, pfts = False)
Allows the following element-wise functions:
\textbf{Swish}(x) = x\,\text{sigmoid}(\beta_{1} x)
\textbf{ESwish}(x) = \beta x\,\text{sigmoid}(x)
\textbf{SILU}(x) = x\,\text{sigmoid}(x)
\textbf{Flatten T-Swish}(x) = \begin{cases} x\,\text{sigmoid}(x) + c & \text{if } x \geq 0 \\ c & \text{otherwise} \end{cases}
Parameters:
eswish - Uses the E-Swish activation function. Default: False
swish - Uses the Swish activation function. Default: False
flatten - Uses the Flatten T-Swish activation function, where c is a constant with value -0.2. Default: False
pfts - Uses the Parametric Flatten T-Swish function. It has the same formulation as Flatten T-Swish, except that c is a trainable parameter initialized to -0.2 rather than a constant. Default: False
beta - \beta parameter used in the E-Swish formulation. Default: 1.375
Note: By default, SILU is initialized.
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
References:
Searching for Activation Functions
E-swish: Adjusting Activations to Different Network Depths
Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning
Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning
Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function For Deep Learning
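A short usage sketch; it assumes the boolean flags can each be set independently to select one formulation and that the module is applied like a standard element-wise torch.nn.Module:

import torch
from echoAI.Activation.t_ops import Swish

x = torch.randn(8, 32)

silu   = Swish()                         # default configuration (SILU per the note above)
eswish = Swish(eswish=True, beta=1.375)  # E-Swish: beta * x * sigmoid(x)
fts    = Swish(flatten=True)             # Flatten T-Swish with constant c = -0.2
pfts   = Swish(pfts=True)                # Parametric Flatten T-Swish: c is trainable

for act in (silu, eswish, fts, pfts):
    assert act(x).shape == x.shape       # element-wise: shape is preserved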
Aria2
echoAI.Activation.t_ops.Aria2(beta = 0.5, alpha = 1.0)
Applies the element-wise function:
\textbf{Aria2}(x) = (1 + e^{-\beta x})^{-\alpha}
Parameters:
beta - \beta is the exponential growth rate. Default: 0.5
alpha - \alpha is a hyper-parameter with a two-fold effect: it reduces the curvature in the third quadrant and increases the curvature in the first quadrant while lowering the value of the activation. Default: 1.0
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
ARiA: Utilizing Richard's Curve for Controlling the Non-monotonicity of the Activation Function in Deep Neural Nets
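A brief sketch with the documented defaults, assuming standard element-wise module behaviour:

import torch
from echoAI.Activation.t_ops import Aria2

aria2 = Aria2(beta=0.5, alpha=1.0)   # documented defaults
x = torch.randn(2, 10)
y = aria2(x)                          # (1 + exp(-beta * x)) ** (-alpha), element-wise
assert y.shape == x.shape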
BReLU
echoAI.Activation.t_ops.BReLU()
Applies the element-wise function:
\textbf{BReLU}(x_{i}) = \begin{cases} f(x_{i}) & \text{if } i \bmod 2 = 0 \\ -f(-x_{i}) & \text{if } i \bmod 2 \neq 0 \end{cases}
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Shifting Mean Activation Towards Zero with Bipolar Activation Functions
APL
echoAI.Activation.t_ops.APL(s)
Applies the element-wise function:
\textbf{APL}(x) = \max(0, x) + \sum_{s=1}^{S} a_{i}^{s} \max(0, -x + b_{i}^{s})
Parameter:
s - hyperparameter, the number of hinges to be set in advance. Default: 1
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Learning Activation Functions to Improve Deep Neural Networks
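A sketch with two hinges; it assumes the hinge slopes a and offsets b are registered as trainable parameters of the module, as in the reference paper:

import torch
from echoAI.Activation.t_ops import APL

apl = APL(2)                       # s = 2 hinges
x = torch.randn(4, 8)
y = apl(x)                         # shape preserved
print(sum(p.numel() for p in apl.parameters()))   # number of trainable hinge parameters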
ELiSH
echoAI.Activation.t_ops.Elish(hard = False)
Allows the following element-wise functions:
\textbf{ELiSH}(x) = \begin{cases} x\,\text{sigmoid}(x) & \text{if } x \geq 0 \\ (e^{x} - 1)\,\text{sigmoid}(x) & \text{otherwise} \end{cases}
\textbf{Hard ELiSH}(x) = \begin{cases} x \max(0, \min(1, (x+1)/2)) & \text{if } x \geq 0 \\ (e^{x} - 1) \max(0, \min(1, (x+1)/2)) & \text{otherwise} \end{cases}
Parameter:
hard - Uses the Hard ELiSH activation function. Default: False
Note: By default, ELiSH is initialized.
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
The Quest for the Golden Activation Function
ISRU
echoAI.Activation.t_ops.ISRU(alpha = 1.0, isrlu = False)
Allows the following element-wise functions:
\textbf{ISRU}(x) = \frac{x}{\sqrt{1 + \alpha x^{2}}}
\textbf{ISRLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \frac{x}{\sqrt{1 + \alpha x^{2}}} & \text{otherwise} \end{cases}
Parameters:
alpha - hyperparameter \alpha controls the value to which an ISRLU saturates for negative inputs. Default: 1.0
isrlu - Uses the ISRLU activation function. Default: False
Note: By default, ISRU is initialized.
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs)
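A sketch contrasting the two formulations, assuming the isrlu flag simply switches between them:

import torch
from echoAI.Activation.t_ops import ISRU

isru  = ISRU(alpha=1.0)               # default: ISRU
isrlu = ISRU(alpha=1.0, isrlu=True)   # identity for x >= 0, ISRU branch otherwise

x = torch.randn(3, 5)
assert isru(x).shape == isrlu(x).shape == x.shape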
Maxout
echoAI.Activation.t_ops.Maxout()
Applies the function:
\textbf{Maxout}(\vec{x}) = \max_{i}(x_{i})
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Maxout Networks
NLReLU
echoAI.Activation.t_ops.NLReLU(beta = 1.0, inplace = False)
Applies the element-wise function:
\textbf{NLReLU}(x) = \ln(\beta \max(0, x) + 1.0)
Parameters:
beta - \beta parameter used in the NLReLU formulation. Default: 1.0
inplace - can optionally do the operation in-place. Default: False
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Natural-Logarithm-Rectified Activation Function in Convolutional Neural Networks
Soft Clipping
echoAI.Activation.t_ops.SoftClipping(alpha = 0.5)
Applies the element-wise function:
\textbf{Soft Clipping}(x) = \frac{1}{\alpha}\log\Big(\frac{1 + e^{\alpha x}}{1 + e^{\alpha(x-1)}}\Big)
Parameter:
alpha - \alpha hyper-parameter, which determines how close to linear the central region is and how sharply the linear region turns to the asymptotic values. Default: 0.5
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Neural Network-Based Approach to Phase Space Integration
Soft Exponential
echoAI.Activation.t_ops.SoftExponential(alpha = None)
Applies the element-wise function:
\textbf{Soft Exponential}(x) = \begin{cases} \frac{-\log(1 + \alpha(x + \alpha))}{\alpha} & \text{if } \alpha < 0 \\ x & \text{if } \alpha = 0 \\ \frac{e^{\alpha x} - 1}{\alpha} & \text{if } \alpha > 0 \end{cases}
Parameter:
alpha - \alpha trainable parameter, initialized to zero by default. Default: None
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks
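Because alpha is trainable, the module can be optimized together with the rest of a network. A minimal sketch, assuming alpha is registered as a torch.nn.Parameter as the description above suggests:

import torch
from echoAI.Activation.t_ops import SoftExponential

act = SoftExponential()                        # alpha starts at zero (identity regime)
x = torch.randn(16, 4, requires_grad=True)
loss = act(x).sum()
loss.backward()                                # gradients flow through alpha as well
print([p.shape for p in act.parameters()])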
SQNL
echoAI.Activation.t_ops.SQNL()
Applies the element-wise function:
\textbf{SQNL}(x) = \begin{cases} 1 & \text{if } x > 2 \\ x - \frac{x^{2}}{4} & \text{if } 0 \leq x \leq 2 \\ x + \frac{x^{2}}{4} & \text{if } -2 \leq x < 0 \\ -1 & \text{if } x < -2 \end{cases}
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
SQNL: A New Computationally Efficient Activation Function
SReLU
echoAI.Activation.t_ops.SReLU(in_features, parameters = None)
Applies the element-wise function:
\textbf{SReLU}(x_{i}) = \begin{cases} t_{i}^{r} + a_{i}^{r}(x_{i} - t_{i}^{r}) & \text{if } x_{i} \geq t_{i}^{r} \\ x_{i} & \text{if } t_{i}^{r} > x_{i} > t_{i}^{l} \\ t_{i}^{l} + a_{i}^{l}(x_{i} - t_{i}^{l}) & \text{if } x_{i} \leq t_{i}^{l} \end{cases}
Parameters:
in_features - Shape of the input. Datatype: Tuple
parameters - (t^{r}, t^{l}, a^{r}, a^{l}) parameters for manual initialization. Default: None. If None is passed, the parameters are initialized randomly.
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Deep Learning with S-shaped Rectified Linear Activation Units
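A sketch for SReLU, which needs the input shape at construction time; here in_features is assumed to describe the per-sample feature shape, passed as a tuple per the description above:

import torch
from echoAI.Activation.t_ops import SReLU

srelu = SReLU((64,))          # in_features as a tuple (assumed: trailing feature shape)
x = torch.randn(32, 64)
y = srelu(x)                  # per-feature thresholds t and slopes a are learned
assert y.shape == x.shape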
Funnel
echoAI.Activation.t_ops.Funnel(in_channels)
Applies the element-wise function:
\textbf{Funnel}(x) = \max(x, \mathbb{T}(x))
Parameter:
in_channels - Number of channels in the input tensor. Datatype: Integer
Shape:
Input: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}) where \mathbf{C} indicates the number of channels
Output: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}), same shape as input
Reference:
Funnel Activation for Visual Recognition
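Funnel expects a 4-D (N, C, H, W) tensor because \mathbb{T}(x) is a learned spatial condition in the reference paper. A minimal sketch:

import torch
from echoAI.Activation.t_ops import Funnel

funnel = Funnel(in_channels=16)
x = torch.randn(2, 16, 32, 32)   # (N, C, H, W)
y = funnel(x)                    # max(x, T(x)), shape preserved
assert y.shape == x.shape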
SLAF
echoAI.Activation.t_ops.SLAF(k = 2)
Applies the element-wise function:
\textbf{SLAF}(x) = a_{0} + a_{1}x + a_{2}x^{2} + \dots + a_{N-1}x^{N-1}
Parameter:
k - Number of Taylor coefficients. Default: 2
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Learning Activation Functions: A new paradigm for understanding Neural Networks
AReLU
echoAI.Activation.t_ops.AReLU(alpha = 0.90, beta = 2.0)
Applies the element-wise function:
\textbf{AReLU}(x_{i}) = \begin{cases} C(\alpha)x_{i} & \text{if } x_{i} < 0 \\ (1 + \sigma(\beta))x_{i} & \text{otherwise} \end{cases}
Parameters:
alpha - \alpha trainable parameter. Default: 0.90
beta - \beta trainable parameter. Default: 2.0
Shape:
Input: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}) where \mathbf{C} indicates the number of channels
Output: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}), same shape as input
Reference:
AReLU: Attention-based Rectified Linear Unit
FReLU
echoAI.Activation.t_ops.FReLU()
Applies the element-wise function:
\textbf{FReLU}(x) = \textbf{ReLU}(x) + b_{l}
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks
DICE
echoAI.Activation.t_ops.DICE(emb_size, dim = 2, epsilon = 1e-8)
Applies the function:
\textbf{DICE}(x) = p(x)\,x + (1 - p(x))\,\alpha x
p(x) = \frac{1}{1 + e^{-\frac{x - E(x)}{\sqrt{Var(x) + \epsilon}}}}
Reference:
Deep Interest Network for Click-Through Rate Prediction
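DICE gates its input using batch statistics E(x) and Var(x). A sketch under the assumption that dim = 2 corresponds to a (batch, emb_size) input, with emb_size matching the last dimension:

import torch
from echoAI.Activation.t_ops import DICE

dice = DICE(emb_size=8, dim=2)   # assumed: dim = 2 means (batch, emb_size) inputs
x = torch.randn(32, 8)
y = dice(x)
assert y.shape == x.shape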
Seagull
echoAI.Activation.t_ops.Seagull()
Applies the function:
\textbf{Seagull}(x) = \log(1 + x^{2})
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
A Use of Even Activation Functions in Neural Networks
Snake
echoAI.Activation.t_ops.Snake(in_features, alpha = None, alpha_trainable = True)
Applies the function:
\textbf{Snake}(x) = x + \frac{1}{\alpha}\sin^{2}(\alpha x)
Parameters:
in_features - shape of the input
alpha - \alpha trainable parameter. Default: 1.0 when specified as None
alpha_trainable - switches \alpha to be a trainable parameter. Default: True
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Neural Networks Fail to Learn Periodic Functions and How to Fix It
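A sketch of Snake on a 1-D feature input; in_features is assumed here to be the trailing feature size:

import torch
from echoAI.Activation.t_ops import Snake

snake = Snake(in_features=16)   # alpha defaults to 1.0 and is trainable
x = torch.randn(4, 16)
y = snake(x)                    # x + (1/alpha) * sin^2(alpha * x)
assert y.shape == x.shape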
SIREN
echoAI.Activation.t_ops.SIREN(dim_in, dim_out, w0 = 30., c = 6., is_first = False, use_bias = True, activation = None)
Applies the function:
\textbf{SIREN}(x) = \sin(\omega_{0} \cdot \text{linear}(x))
Parameters:
dim_in - input dimension
dim_out - output dimension
w0 - \omega_{0} hyper-parameter. Default: 30.0
c - hyper-parameter used in weight initialisation for the linear layer. Default: 6.
is_first - used for weight initialisation for the linear layer. Default: False
use_bias - initialises a bias parameter for the linear layer. Default: True
activation - used to initialise an activation function; the sine activation is used when None. Default: None
Shape:
Input: (\mathbf{x}, dim\_in)
Output: (\mathbf{x}, dim\_out)
Reference:
Implicit Neural Representations with Periodic Activation Functions
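SIREN acts as a full sine-activated linear layer rather than a purely element-wise function. A sketch mapping 2-D coordinates to a scalar, assuming the layer is used like a torch.nn.Linear replacement:

import torch
from echoAI.Activation.t_ops import SIREN

layer = SIREN(dim_in=2, dim_out=1, w0=30., is_first=True)   # first layer of an implicit MLP
coords = torch.rand(1024, 2)      # (num_points, dim_in)
out = layer(coords)               # sin(w0 * linear(coords)) -> (num_points, dim_out)
print(out.shape)                  # torch.Size([1024, 1])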