MegEngine

This page documents all activation functions supported in Echo for the MegEngine backend.

Mish

echoAI.Activation.m_ops.Mish()

Applies the element-wise function:

\textbf{Mish}(x) = x\tanh(\text{softplus}(x))

Shape:

  • Input: (N, *) where * means any number of additional dimensions

  • Output: (N, *), same shape as input
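
A minimal usage sketch is given below. It assumes the module is constructed and called like any other MegEngine M.Module; the tensor shape and values are illustrative only.

import numpy as np
import megengine as mge
from echoAI.Activation.m_ops import Mish

mish = Mish()
x = mge.tensor(np.random.randn(4, 10).astype("float32"))  # any shape works
y = mish(x)  # element-wise Mish, same shape as x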

Reference:

Mish: A Self Regularized Non-Monotonic Activation Function

Swish

echoAI.Activation.m_ops.Swish(eswish = False, swish = False, beta = 1.735, flatten = False)

Allows the following element-wise functions:

\textbf{Swish}(x) = x\text{sigmoid}(\beta_{1} x)
\textbf{E-Swish}(x) = \beta x\text{sigmoid}(x)
\textbf{SILU}(x) = x\text{sigmoid}(x)
\textbf{Flatten T-Swish}(x) = \begin{cases} x\text{sigmoid}(x) & \text{if } x \geq 0\\ 0 & \text{otherwise} \end{cases}

Parameters:

  • eswish - Uses E-Swish activation function. Default: False.

  • swish - Uses Swish activation function. Default: False.

  • flatten - Uses Flatten T-Swish activation function. Default: False.

  • beta - β parameter used for the E-Swish formulation. Default: 1.735

Note: When eswish, swish, and flatten are all False, the SILU activation function is used by default.

Shape:

  • Input: (N, *) where * means any number of additional dimensions

  • Output: (N, *), same shape as input
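
The flags select which of the four variants is applied. A minimal sketch using the documented constructor arguments is shown below; the input tensor is illustrative only.

import numpy as np
import megengine as mge
from echoAI.Activation.m_ops import Swish

silu = Swish(eswish=False, swish=False, flatten=False)  # all flags False, so SILU (see note above)
eswish = Swish(eswish=True, beta=1.735)                 # E-Swish with the default beta
fts = Swish(flatten=True)                               # Flatten T-Swish

x = mge.tensor(np.random.randn(4, 10).astype("float32"))
y = eswish(x)  # element-wise, same shape as x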

References:

Searching for Activation Functions

E-swish: Adjusting Activations to Different Network Depths

Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning

Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning

Aria2

echoAI.Activation.m_ops.Aria2(beta = 0.5, alpha = 1.0)

Applies the element-wise function:

\textbf{Aria2}(x) = (1+e^{-\beta x})^{-\alpha}

Parameters:

  • beta - β is the exponential growth rate. Default: 0.5

  • alpha - α is a hyper-parameter with a two-fold effect: it reduces the curvature in the third quadrant and increases the curvature in the first quadrant while lowering the value of the activation. Default: 1.0

Shape:

  • Input: (N, *) where * means any number of additional dimensions

  • Output: (N, *), same shape as input
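
As a short usage sketch (beta and alpha are the documented constructor arguments; the values and tensor shape below are illustrative):

import numpy as np
import megengine as mge
from echoAI.Activation.m_ops import Aria2

aria2 = Aria2(beta=0.5, alpha=1.0)  # documented defaults, passed explicitly
x = mge.tensor(np.random.randn(4, 10).astype("float32"))
y = aria2(x)  # element-wise, same shape as x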

Reference:

ARiA: Utilizing Richard's Curve for Controlling the Non-monotonicity of the Activation Function in Deep Neural Nets

ELiSH

echoAI.Activation.m_ops.Elish(hard = False)

Allows the following element-wise functions:

\textbf{ELiSH}(x) = \begin{cases} x\text{sigmoid}(x) & \text{if } x \geq 0\\ (e^{x}-1)\text{sigmoid}(x) & \text{otherwise} \end{cases}
\textbf{Hard ELiSH}(x) = \begin{cases} x\max(0, \min(1,(x+1)/2)) & \text{if } x \geq 0\\ (e^{x}-1)\max(0, \min(1,(x+1)/2)) & \text{otherwise} \end{cases}

Parameter:

  • hard - Uses Hard ELiSH activation function. Default: False

Shape:

  • Input: (N, *) where * means any number of additional dimensions

  • Output: (N, *), same shape as input

Reference:

The Quest for the Golden Activation Function

ISRU

echoAI.Activation.m_ops.ISRU(alpha = 1.0, isrlu = False)

Allows the following element-wise functions:

\textbf{ISRU}(x) = \frac{x}{\sqrt{1+\alpha x^{2}}}
\textbf{ISRLU}(x) = \begin{cases} x & \text{if } x \geq 0\\ \frac{x}{\sqrt{1+\alpha x^{2}}} & \text{otherwise} \end{cases}

Parameters:

  • alpha - hyperparameter α controls the value to which an ISRLU saturates for negative inputs. Default: 1.0

  • isrlu - Uses ISRLU activation function. Default: False

Shape:

  • Input: (N, *) where * means any number of additional dimensions

  • Output: (N, *), same shape as input

Reference:

Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs)

NLReLU

echoAI.Activation.m_ops.NLReLU(beta = 1.0)

Applies the element-wise function:

\textbf{NLReLU}(x) = \ln(\beta\max(0,x)+1.0)

Parameters:

  • beta - β parameter used for the NLReLU formulation. Default: 1.0

Shape:

  • Input: (N, *) where * means any number of additional dimensions

  • Output: (N, *), same shape as input

Reference:

Natural-Logarithm-Rectified Activation Function in Convolutional Neural Networks

Soft Clipping

echoAI.Activation.m_ops.SoftClipping(alpha = 0.5)

Applies the element-wise function:

\textbf{Soft Clipping}(x) = \frac{1}{\alpha}\log\Big(\frac{1+e^{\alpha x}}{1+e^{\alpha (x-1)}}\Big)

Parameters:

  • alpha - α is a hyper-parameter that determines how close to linear the central region is and how sharply the linear region turns to the asymptotic values. Default: 0.5

Shape:

  • Input: (N, *) where * means any number of additional dimensions

  • Output: (N, *), same shape as input

Reference:

Neural Network-Based Approach to Phase Space Integration

Soft Exponential

echoAI.Activation.m_ops.SoftExponential(alpha = None)

Applies the element-wise function:

\textbf{Soft Exponential}(x) = \begin{cases} -\frac{\log(1-\alpha(x + \alpha))}{\alpha} & \text{if } \alpha < 0\\ x & \text{if } \alpha = 0\\ \frac{e^{\alpha x}-1}{\alpha} + \alpha & \text{if } \alpha > 0 \end{cases}

Parameters:

  • alpha - trainable parameter α, initialized to zero when None is passed. Default: None

Shape:

  • Input: (N, *) where * means any number of additional dimensions

  • Output: (N, *), same shape as input
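
Because alpha is trainable, it is expected to appear among the module's parameters. The sketch below assumes the module behaves like a standard MegEngine M.Module whose parameters() can be handed to an optimizer; the tensor shape and learning rate are illustrative.

import numpy as np
import megengine as mge
import megengine.optimizer as optim
from echoAI.Activation.m_ops import SoftExponential

act = SoftExponential()  # alpha starts at zero, i.e. the identity branch
x = mge.tensor(np.random.randn(8, 16).astype("float32"))
y = act(x)

# alpha can then be optimized together with the rest of a network
opt = optim.SGD(act.parameters(), lr=1e-2)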

Reference:

A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks

SQNL

echoAI.Activation.m_ops.SQNL()

Applies the element-wise function:

\textbf{SQNL}(x) = \begin{cases} 1 & \text{if } x > 2\\ x - \frac{x^2}{4} & \text{if } 0 \leq x \leq 2\\ x + \frac{x^2}{4} & \text{if } -2 \leq x < 0\\ -1 & \text{if } x < -2 \end{cases}

Shape:

  • Input: (N, *) where * means any number of additional dimensions

  • Output: (N, *), same shape as input

Reference:

SQNL: A New Computationally Efficient Activation Function

SReLU

echoAI.Activation.m_ops.SReLU(in_features, parameters = None)

Applies the element-wise function:

\textbf{SReLU}(x_{i}) = \begin{cases} t_i^r + a_i^r(x_i - t_i^r) & \text{if } x_i \geq t_i^r\\ x_i & \text{if } t_i^r > x_i > t_i^l\\ t_i^l + a_i^l(x_i - t_i^l) & \text{if } x_i \leq t_i^l \end{cases}

Parameters:

  • in_features - Shape of the input. Datatype: Tuple

  • parameters - (t^r, t^l, a^r, a^l) parameters for manual initialization. Default: None. If None is passed, the parameters are initialized randomly.

Shape:

  • Input: (N, *) where * means any number of additional dimensions

  • Output: (N, *), same shape as input
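
A construction sketch is shown below; in_features is assumed here to describe the per-feature shape of the incoming activations, and no manual parameters are passed, so t^r, t^l, a^r, a^l are initialized randomly. Shapes are illustrative.

import numpy as np
import megengine as mge
from echoAI.Activation.m_ops import SReLU

srelu = SReLU(in_features=(64,))  # thresholds/slopes created to match in_features
x = mge.tensor(np.random.randn(32, 64).astype("float32"))
y = srelu(x)  # same shape as x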

Reference:

Deep Learning with S-shaped Rectified Linear Activation Units

FReLU

echoAI.Activation.m_ops.FReLU(in_channels)

Applies the element-wise function:

\textbf{FReLU}(x) = \max(x, \mathbb{T}(x))

where \mathbb{T}(\cdot) is the funnel condition: a spatial context computed by a depthwise convolution over a local window around each position.

Parameter:

  • in_channels - Number of channels in the input tensor. Datatype: Integer

Shape:

  • Input: (N, C, H, W) where C is the number of channels

  • Output: (N, C, H, W), same shape as input
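
FReLU expects 4D NCHW feature maps, so in_channels must match the channel dimension of the input. A minimal sketch with illustrative sizes:

import numpy as np
import megengine as mge
from echoAI.Activation.m_ops import FReLU

frelu = FReLU(in_channels=16)  # must equal C of the input below
x = mge.tensor(np.random.randn(8, 16, 32, 32).astype("float32"))  # (N, C, H, W)
y = frelu(x)  # same shape: (8, 16, 32, 32)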

Reference:

Funnel Activation for Visual Recognition
