PyTorch

This page documents all activation functions supported in Echo for the PyTorch backend.

Mish

echoAI.Activation.t_ops.Mish()

Applies the element-wise function:

\textbf{Mish}(x)=x\tanh(\text{softplus}(x))

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input
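
The module can be dropped into a model like any built-in PyTorch activation. A minimal usage sketch, assuming echoAI is installed and that Mish (like the other classes on this page) behaves as a standard torch.nn.Module:

```python
import torch
from echoAI.Activation.t_ops import Mish

# Element-wise activation: accepts any shape (N, *)
m = Mish()
x = torch.randn(8, 32)
y = m(x)   # same shape as the input, here (8, 32)

# Or use it inside a model like any other activation layer
model = torch.nn.Sequential(torch.nn.Linear(32, 64), Mish())
```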

Reference:

Mish: A Self Regularized Non-Monotonic Activation Function

Swish

echoAI.Activation.t_ops.Swish(eswish = False, swish = True, beta = 1.735, flatten = False, pfts = False)

Allows the following element-wise functions:

\textbf{Swish}(x)=x\text{sigmoid}(\beta_{1} x)
\textbf{ESwish}(x)=\beta x\text{sigmoid}(x)
\textbf{SILU}(x)=x\text{sigmoid}(x)
\textbf{Flatten T-Swish}(x)= \begin{cases} x\text{sigmoid}(x) + c & \text{if } x\geq 0\\ c & \text{otherwise} \end{cases}

Parameters:

  • eswish - Uses E-Swish activation function. Default: False.

  • swish - Uses Swish activation function. Default: False.

  • flatten - Uses Flatten T-Swish activation function. c is a constant of value -0.2. Default: False.

  • beta - \beta parameter used for the E-Swish formulation. Default: 1.375

  • pfts - Uses the Parametric Flatten T-Swish function. It has the same formulation as Flatten T-Swish, except that c is a trainable parameter initialized to -0.2 rather than a fixed constant. Default: False.

Note: By default, SILU is initialized.

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input
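
A minimal sketch of selecting the variants through the documented flags, assuming echoAI is installed and that each flag switches the formulation as described above (the exact interplay of the flags, e.g. whether pfts also requires flatten, should be verified against the source):

```python
import torch
from echoAI.Activation.t_ops import Swish

x = torch.randn(4, 16)

silu   = Swish(swish=False)               # SILU(x) = x * sigmoid(x), per the note above
swish  = Swish(swish=True)                # Swish(x) = x * sigmoid(beta * x)
eswish = Swish(eswish=True, beta=1.735)   # E-Swish(x) = beta * x * sigmoid(x)
fts    = Swish(flatten=True)              # Flatten T-Swish with constant c = -0.2
pfts   = Swish(pfts=True)                 # Parametric Flatten T-Swish, c trainable

y = eswish(x)                             # same shape as x
```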

References:

Searching for Activation Functions

E-swish: Adjusting Activations to Different Network Depths

Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning

Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning

Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function For Deep Learning

Aria2

echoAI.Activation.t_ops.Aria2(beta = 0.5, alpha = 1.0)

Applies the element-wise function:

\textbf{Aria2}(x)= (1+e^{-\beta x})^{-\alpha}

Parameters:

  • beta - \beta is the exponential growth rate. Default: 0.5

  • alpha - \alpha is a hyper-parameter with a two-fold effect: it reduces the curvature in the third quadrant and increases the curvature in the first quadrant while lowering the activation value. Default: 1.0

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input

Reference:

ARiA: Utilizing Richard's Curve for Controlling the Non-monotonicity of the Activation Function in Deep Neural Nets

BReLU

echoAI.Activation.t_ops.BReLU()

Applies the element-wise function:

\textbf{BReLU}(x_{i})= \begin{cases} f(x_{i}) & \text{if } i \bmod 2 = 0\\ -f(-x_{i}) & \text{if } i \bmod 2 \neq 0 \end{cases}

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input

Reference:

Shifting Mean Activation Towards Zero with Bipolar Activation Functions

APL

echoAI.Activation.t_ops.APL(s)

Applies the element-wise function:

\textbf{APL}(x)= \max(0,x) + \sum^{S}_{s=1}{a_{i}^{s}\max(0, -x+b_{i}^{s})}

Parameter:

  • s - hyper-parameter; the number of hinges, which must be set in advance. Default: 1

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input
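
A usage sketch, assuming echoAI is installed and that the a_i^s and b_i^s coefficients are created internally as trainable parameters when the module is constructed:

```python
import torch
from echoAI.Activation.t_ops import APL

apl = APL(s=2)        # two hinges per unit; a and b are learned during training
x = torch.randn(16, 128)
y = apl(x)            # same shape as the input

# The hinge coefficients show up as ordinary trainable parameters
print([p.shape for p in apl.parameters()])
```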

Reference:

Learning Activation Functions to Improve Deep Neural Networks

ELiSH

echoAI.Activation.t_ops.Elish(hard = False)

Allows the following element-wise functions:

\textbf{ELiSH}(x)= \begin{cases} x\text{sigmoid}(x) & \text{if } x \geq 0\\ (e^{x}-1)\text{sigmoid}(x) & \text{otherwise} \end{cases}
\textbf{Hard ELiSH}(x)= \begin{cases} x\max(0, \min(1,(x+1)/2)) & \text{if } x \geq 0\\ (e^{x}-1)\max(0, \min(1,(x+1)/2)) & \text{otherwise} \end{cases}

Parameter:

  • hard - Uses Hard ELiSH activation function. Default: False

Note: By default, ELiSH is initialized.

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input

Reference:

The Quest for the Golden Activation Function

ISRU

echoAI.Activation.t_ops.ISRU(alpha = 1.0, isrlu = False)

Allows the following element-wise functions:

\textbf{ISRU}(x)= \frac{x}{\sqrt{1+\alpha x^{2}}}
\textbf{ISRLU}(x)= \begin{cases} x & \text{if } x \geq 0\\ \frac{x}{\sqrt{1+\alpha x^{2}}} & \text{otherwise} \end{cases}

Parameters:

  • alpha - hyper-parameter \alpha controls the value to which ISRLU saturates for negative inputs. Default: 1.0

  • isrlu - Uses ISRLU activation function. Default: False

Note: By default, ISRU is initialized.

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input

Reference:

Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs)

Maxout

echoAI.Activation.t_ops.Maxout()

Applies the function:

\textbf{Maxout}(\vec{x})= \max_{i}(x_{i})

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input

Reference:

Maxout Networks

NLReLU

echoAI.Activation.t_ops.NLReLU(beta = 1.0, inplace = False)

Applies the element-wise function:

\textbf{NLReLU}(x)= \ln(\beta\max(0,x)+1.0)

Parameters:

  • beta - \beta parameter used in the NLReLU formulation. Default: 1.0

  • inplace - can optionally do the operation in-place. Default: False

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input

Reference:

Natural-Logarithm-Rectified Activation Function in Convolutional Neural Networks

Soft Clipping

echoAI.Activation.t_ops.SoftClipping(alpha = 0.5)

Applies the element-wise function:

\textbf{Soft Clipping}(x)= \frac{1}{\alpha}\log{\big(\frac{1+e^{\alpha x}}{1+e^{\alpha (x-1)}}\big)}

Parameter:

  • alpha - hyper-parameter \alpha determines how close to linear the central region is and how sharply the linear region turns to the asymptotic values. Default: 0.5

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input

Reference:

Neural Network-Based Approach to Phase Space Integration

Soft Exponential

echoAI.Activation.t_ops.SoftExponential(alpha = None)

Applies the element-wise function:

\textbf{Soft Exponential}(x)= \begin{cases} -\frac{\log{(1-\alpha(x + \alpha))}}{\alpha} & \text{if } \alpha < 0\\ x & \text{if } \alpha = 0\\ \frac{e^{\alpha x}-1}{\alpha} + \alpha & \text{if } \alpha > 0 \end{cases}

Parameter:

  • alpha - trainable parameter \alpha, initialized to zero when None is passed. Default: None

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input
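
A sketch showing that alpha is learned alongside the rest of the network, assuming echoAI is installed and that alpha is registered as an nn.Parameter when left as None:

```python
import torch
from echoAI.Activation.t_ops import SoftExponential

act = SoftExponential()   # alpha = None -> initialized to zero, i.e. the identity branch
x = torch.randn(32, 10)
y = act(x)                # same shape as the input

# alpha is optimized together with the model's weights
opt = torch.optim.SGD(act.parameters(), lr=1e-2)
```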

Reference:

A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks

SQNL

echoAI.Activation.t_ops.SQNL()

Applies the element-wise function:

\textbf{SQNL}(x)= \begin{cases} 1 & \text{if } x > 2\\ x - \frac{x^2}{4} & \text{if } 0 \leq x \leq 2\\ x + \frac{x^2}{4} & \text{if } -2 \leq x < 0\\ -1 & \text{if } x < -2 \end{cases}

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input

Reference:

SQNL: A New Computationally Efficient Activation Function

SReLU

echoAI.Activation.t_ops.SReLU(in_features, parameters = None)

Applies the element-wise function:

\textbf{SReLU}(x_{i})= \begin{cases} t_i^r + a_i^r(x_i - t_i^r) & \text{if } x_i \geq t_i^r\\ x_i & \text{if } t_i^r > x_i > t_i^l\\ t_i^l + a_i^l(x_i - t_i^l) & \text{if } x_i \leq t_i^l \end{cases}

Parameters:

  • in_features - Shape of the input. Datatype: Tuple

  • parameters - (t^r, t^l, a^r, a^l) parameters for manual initialization. Default: None. If None is passed, the parameters are initialized randomly.

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input
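
A usage sketch, assuming echoAI is installed; whether in_features includes the batch dimension is an assumption to verify against the source:

```python
import torch
from echoAI.Activation.t_ops import SReLU

# One (t^r, t^l, a^r, a^l) set is learned per feature
srelu = SReLU(in_features=(128,))
x = torch.randn(16, 128)
y = srelu(x)   # same shape as the input

# With parameters=None (the default), (t^r, t^l, a^r, a^l) are initialized randomly
```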

Reference:

Deep Learning with S-shaped Rectified Linear Activation Units

Funnel

echoAI.Activation.t_ops.Funnel(in_channels)

Applies the element-wise function:

\textbf{Funnel}(x)= \max(x,\mathbb{T}(x))

Parameter:

  • in_channels - Number of channels in the input tensor. Datatype: Integer

Shape:

  • Input: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}) where \mathbf{C} indicates the number of channels.

  • Output: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}), same shape as input
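
Funnel operates on 4D feature maps: in the reference formulation, the spatial condition \mathbb{T}(x) is computed from a local window around each position. A usage sketch, assuming echoAI is installed:

```python
import torch
from echoAI.Activation.t_ops import Funnel

funnel = Funnel(in_channels=64)   # must match C of the incoming feature map
x = torch.randn(8, 64, 32, 32)    # (N, C, H, W)
y = funnel(x)                     # (8, 64, 32, 32), same shape as the input
```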

Reference:

Funnel Activation for Visual Recognition

SLAF

echoAI.Activation.t_ops.SLAF(k = 2)

Applies the element-wise function:

\textbf{SLAF}(x)= a_0 + a_1 x + a_2 x^2 + \dots + a_{k-1}x^{k-1}

Parameter:

  • k - Number of Taylor coefficients. Default: 2

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input
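
A usage sketch, assuming echoAI is installed and that the k coefficients a_0 … a_{k-1} are trainable parameters of the module:

```python
import torch
from echoAI.Activation.t_ops import SLAF

slaf = SLAF(k=4)      # four coefficients: a_0 + a_1 x + a_2 x^2 + a_3 x^3
x = torch.randn(16, 32)
y = slaf(x)           # same shape as the input
```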

Reference:

Learning Activation Functions: A new paradigm for understanding Neural Networks

AReLU

echoAI.Activation.t_ops.AReLU(alpha = 0.90, beta = 2.0)

Applies the element-wise function:

\textbf{AReLU}(x_i)= \begin{cases} C(\alpha)x_i & \text{if } x_i < 0\\ (1+\sigma(\beta))x_i & \text{otherwise} \end{cases}

Parameters:

  • alpha - trainable parameter \alpha. Default: 0.90

  • beta - trainable parameter \beta. Default: 2.0

Shape:

  • Input: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}) where \mathbf{C} indicates the number of channels.

  • Output: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}), same shape as input
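
A usage sketch on an NCHW feature map, assuming echoAI is installed; in the reference formulation C(\alpha) clamps \alpha to a fixed range and \sigma is the sigmoid, with both \alpha and \beta learned during training:

```python
import torch
from echoAI.Activation.t_ops import AReLU

arelu = AReLU(alpha=0.90, beta=2.0)   # both values are trained further during optimization
x = torch.randn(8, 64, 16, 16)        # (N, C, H, W)
y = arelu(x)                          # (8, 64, 16, 16), same shape as the input
```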

Reference:

AReLU: Attention-based Rectified Linear Unit

FReLU

echoAI.Activation.t_ops.FReLU()

Applies the element-wise function:

\textbf{FReLU}(x)= \textbf{ReLU}(x) + b_{l}

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input

Reference:

FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks

DICE

echoAI.Activation.t_ops.DICE(emb_size, dim = 2, epsilon = 1e-8)

Applies the function:

\textbf{DICE}(x)= p(x)\,x + (1-p(x))\,\alpha x
p(x) = \frac{1}{1 + e^{-\frac{x - E(x)}{\sqrt{Var(x) + \epsilon}}}}
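
DICE rescales its input with a data-dependent gate p(x) computed from batch statistics, which is why epsilon appears inside the square root. A usage sketch, assuming echoAI is installed, that emb_size is the size of the feature/embedding dimension, and that dim selects between 2D and 3D inputs:

```python
import torch
from echoAI.Activation.t_ops import DICE

dice = DICE(emb_size=32, dim=2)   # dim=2: inputs shaped (batch, emb_size)
x = torch.randn(128, 32)
y = dice(x)                       # same shape as the input
```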

Reference:

Deep Interest Network for Click-Through Rate Prediction

Seagull

echoAI.Activation.t_ops.Seagull()

Applies the function:

\textbf{Seagull}(x)= \log(1 + x^{2})

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input

Reference:

A Use of Even Activation Functions in Neural Networks

Snake

echoAI.Activation.t_ops.Snake(in_features, alpha = None, alpha_trainable = True)

Applies the function:

\textbf{Snake}(x)= x + \frac{1}{\alpha}\sin^{2}(\alpha x)

Parameters:

  • in_features - shape of the input

  • alpha - trainable parameter \alpha. Default: None (initialized to 1.0 when None is passed)

  • alpha_trainable - makes \alpha a trainable parameter. Default: True

Shape:

  • Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions

  • Output: (\mathbf{N}, \ast), same shape as input
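
A usage sketch, assuming echoAI is installed and that in_features matches the trailing feature dimension of the input (whether it expects an int or a tuple should be checked against the source):

```python
import torch
from echoAI.Activation.t_ops import Snake

snake = Snake(in_features=64)                                    # alpha starts at 1.0 and is trained
fixed = Snake(in_features=64, alpha=5.0, alpha_trainable=False)  # frequency kept fixed

x = torch.randn(16, 64)
y = snake(x)   # same shape as the input
```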

Reference:

Neural Networks Fail to Learn Periodic Functions and How to Fix It

SIREN

echoAI.Activation.t_ops.SIREN(dim_in, dim_out, w0 = 30., c = 6., is_first = False, use_bias = True, activation = None)

Applies the function:

\textbf{SIREN}(x)= \sin(\omega_{0}\,\text{linear}(x))

Parameters:

  • dim_in - input dimension

  • dim_out - output dimension

  • w0 - \omega_{0} hyper-parameter. Default: 30.0

  • c - hyper-parameter used in weight initialisation for the linear layer. Default: 6.0

  • is_first - used for weight initialisation for the linear layer. Default: False

  • use_bias - initialises bias parameter for the linear layer. Default: True

  • activation - activation function applied after the linear layer. Default: None. The sine activation is used when None is passed

Shape:

  • Input: (\mathbf{x}, dim\_in)

  • Output: (\mathbf{x}, dim\_out)
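
SIREN behaves like a sine-activated linear layer, so it is typically stacked to map coordinates to signal values. A minimal sketch, assuming echoAI is installed and that the module maps (\mathbf{x}, dim\_in) tensors to (\mathbf{x}, dim\_out):

```python
import torch
from echoAI.Activation.t_ops import SIREN

# First layer of a SIREN network: is_first changes the weight initialisation
layer1 = SIREN(dim_in=2, dim_out=256, w0=30., is_first=True)
layer2 = SIREN(dim_in=256, dim_out=3)

coords = torch.rand(1024, 2)    # e.g. (x, y) pixel coordinates in [0, 1]
rgb = layer2(layer1(coords))    # (1024, 3)
```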

Reference:

Implicit Neural Representations with Periodic Activation Functions
