echoai.activation

PyTorch

This page details all activation functions supported in Echo for the PyTorch backend.


Mish

echoAI.Activation.t_ops.Mish()

Applies the element-wise function:

\textbf{Mish}(x) = x\tanh(\text{softplus}(x))

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

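A minimal usage sketch (assuming the class path above imports as shown); the same drop-in pattern applies to the other argument-free activations on this page such as BReLU, SQNL, Maxout, Seagull, and FReLU:

import torch
import torch.nn as nn
from echoAI.Activation.t_ops import Mish

# Drop-in replacement for a standard activation inside any model definition.
model = nn.Sequential(nn.Linear(128, 64), Mish(), nn.Linear(64, 10))
x = torch.randn(32, 128)   # (N, *) input
y = model(x)               # Mish itself preserves the input shape
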
Reference:

  • Mish: A Self Regularized Non-Monotonic Activation Function

Swish

echoAI.Activation.t_ops.Swish(eswish = False, swish = False, beta = 1.735, flatten = False, pfts = False)

Allows the following element-wise functions:

\textbf{Swish}(x) = x\,\text{sigmoid}(\beta_{1} x)
\textbf{ESwish}(x) = \beta x\,\text{sigmoid}(x)
\textbf{SILU}(x) = x\,\text{sigmoid}(x)
\textbf{Flatten T-Swish}(x) = \begin{cases} x\,\text{sigmoid}(x) + c & \text{if } x \geq 0 \\ c & \text{otherwise} \end{cases}

Parameters:

  • eswish - Uses E-Swish activation function. Default: False.

  • swish - Uses Swish activation function. Default: False.

  • flatten - Uses Flatten T-Swish activation function. c is a constant of value -0.2. Default: False.

  • beta - β parameter used in the E-Swish formulation. Default: 1.735

  • pfts - Uses Parametric Flatten T-Swish activation function. Same formulation as Flatten T-Swish, except that c is a trainable parameter initialized to -0.2 instead of a constant. Default: False.

Note: By default, SILU is initialized.

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

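A short sketch of selecting the variants via the constructor flags documented above (the flag behaviour shown in the comments is taken from this parameter list, not verified against the source):

import torch
from echoAI.Activation.t_ops import Swish

silu   = Swish()                          # defaults select SiLU, per the note above
swish  = Swish(swish=True)                # Swish: x * sigmoid(beta_1 * x)
eswish = Swish(eswish=True, beta=1.735)   # E-Swish with an explicit beta
fts    = Swish(flatten=True)              # Flatten T-Swish, c = -0.2 (constant)
pfts   = Swish(pfts=True)                 # Parametric Flatten T-Swish, c trainable

x = torch.randn(16, 32)
y = eswish(x)                             # element-wise, keeps the (N, *) shape
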
References:

  • Searching for Activation Functions
  • E-swish: Adjusting Activations to Different Network Depths
  • Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning
  • Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning
  • Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function For Deep Learning

Aria2

echoAI.Activation.t_ops.Aria2(beta = 0.5, alpha = 1.0)

Applies the element-wise function:

\textbf{Aria2}(x) = (1 + e^{-\beta x})^{-\alpha}

Parameters:

  • beta - β is the exponential growth rate. Default: 0.5

  • alpha - α is a hyper-parameter with a two-fold effect: it reduces the curvature in the third quadrant and increases the curvature in the first quadrant while lowering the value of the activation. Default: 1.0

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

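A minimal usage sketch with the documented defaults written out explicitly (assuming the class path above):

import torch
from echoAI.Activation.t_ops import Aria2

act = Aria2(beta=0.5, alpha=1.0)   # the documented defaults, spelled out
x = torch.randn(8, 64)             # (N, *) input
y = act(x)                         # element-wise, same shape as the input
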
Reference:

  • ARiA: Utilizing Richard's Curve for Controlling the Non-monotonicity of the Activation Function in Deep Neural Nets

BReLU

echoAI.Activation.t_ops.BReLU()

Applies the element-wise function:

\textbf{BReLU}(x_{i}) = \begin{cases} f(x_{i}) & \text{if } i \bmod 2 = 0 \\ -f(-x_{i}) & \text{if } i \bmod 2 \neq 0 \end{cases}

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

Reference:

  • Shifting Mean Activation Towards Zero with Bipolar Activation Functions

APL

echoAI.Activation.t_ops.APL(s)

Applies the element-wise function:

\textbf{APL}(x) = \max(0, x) + \sum_{s=1}^{S} a_{i}^{s} \max(0, -x + b_{i}^{s})

Parameter:

  • s - hyperparameter, number of hinges to be set in advance. Default: 1

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

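A minimal sketch assuming the constructor signature above; the per-hinge parameters are managed internally by the module:

import torch
from echoAI.Activation.t_ops import APL

act = APL(s=2)             # two hinges; slopes a and offsets b are learned
x = torch.randn(4, 128)    # (N, *) input
y = act(x)                 # same shape as the input
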
Reference:

  • Learning Activation Functions to Improve Deep Neural Networks

ELiSH

echoAI.Activation.t_ops.Elish(hard = False)

Allows the following element-wise functions:

\textbf{ELiSH}(x) = \begin{cases} x\,\text{sigmoid}(x) & \text{if } x \geq 0 \\ (e^{x} - 1)\,\text{sigmoid}(x) & \text{otherwise} \end{cases}
\textbf{Hard ELiSH}(x) = \begin{cases} x \max(0, \min(1, (x+1)/2)) & \text{if } x \geq 0 \\ (e^{x} - 1) \max(0, \min(1, (x+1)/2)) & \text{otherwise} \end{cases}

Parameter:

  • hard - Uses Hard ELiSH activation function. Default: False

Note: By default, ELiSH is initialized.

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

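A minimal sketch showing both variants (assuming the class path above; note the class name is Elish):

import torch
from echoAI.Activation.t_ops import Elish

elish      = Elish()             # ELiSH (default)
hard_elish = Elish(hard=True)    # Hard ELiSH variant
x = torch.randn(8, 32)
y = hard_elish(x)                # element-wise, same (N, *) shape
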
Reference:

  • The Quest for the Golden Activation Function

ISRU

echoAI.Activation.t_ops.ISRU(alpha = 1.0, isrlu = False)

Allows the following element-wise functions:

\textbf{ISRU}(x) = \frac{x}{\sqrt{1 + \alpha x^{2}}}
\textbf{ISRLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \frac{x}{\sqrt{1 + \alpha x^{2}}} & \text{otherwise} \end{cases}

Parameters:

  • alpha - hyper-parameter α controls the value to which ISRLU saturates for negative inputs. Default: 1.0

  • isrlu - Uses ISRLU activation function. Default: False

Note: By default, ISRU is initialized.

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

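A minimal sketch showing the ISRU default and the ISRLU variant (assuming the class path above):

import torch
from echoAI.Activation.t_ops import ISRU

isru  = ISRU(alpha=1.0)               # ISRU (default)
isrlu = ISRU(alpha=1.0, isrlu=True)   # ISRLU variant
x = torch.randn(8, 16)
y = isrlu(x)                          # element-wise, same (N, *) shape
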
Reference:

  • Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs)

Maxout

echoAI.Activation.t_ops.Maxout()

Applies the function:

\textbf{Maxout}(\vec{x}) = \max_{i}(x_{i})

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

Reference:

  • Maxout Networks

NLReLU

echoAI.Activation.t_ops.NLReLU(beta = 1.0, inplace = False)

Applies the element-wise function:

\textbf{NLReLU}(x) = \ln(\beta \max(0, x) + 1.0)

Parameters:

  • beta - β parameter used in the NLReLU formulation. Default: 1.0

  • inplace - can optionally do the operation in-place. Default: False

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

Reference:

  • Natural-Logarithm-Rectified Activation Function in Convolutional Neural Networks

Soft Clipping

echoAI.Activation.t_ops.SoftClipping(alpha = 0.5)

Applies the element-wise function:

\textbf{Soft Clipping}(x) = \frac{1}{\alpha} \log\left(\frac{1 + e^{\alpha x}}{1 + e^{\alpha (x - 1)}}\right)

Parameter:

  • alpha - hyper-parameter α determines how close to linear the central region is and how sharply the linear region turns to the asymptotic values. Default: 0.5

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

Reference:

  • Neural Network-Based Approach to Phase Space Integration

Soft Exponential

echoAI.Activation.t_ops.SoftExponential(alpha = None)

Applies the element-wise function:

\textbf{Soft Exponential}(x) = \begin{cases} -\frac{\log(1 - \alpha(x + \alpha))}{\alpha} & \text{if } \alpha < 0 \\ x & \text{if } \alpha = 0 \\ \frac{e^{\alpha x} - 1}{\alpha} + \alpha & \text{if } \alpha > 0 \end{cases}

Parameter:

  • alpha - trainable parameter α, initialized to zero when None is passed. Default: None

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

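A minimal sketch; with the default alpha = None the module starts as (approximately) the identity and learns α during training (assuming α is registered as a module parameter, as the description above implies):

import torch
from echoAI.Activation.t_ops import SoftExponential

act = SoftExponential()    # alpha=None -> trainable alpha, initialized to zero
x = torch.randn(8, 32)
y = act(x)                 # with alpha = 0 this is the identity; alpha is then learned
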
Reference:

  • A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks

SQNL

echoAI.Activation.t_ops.SQNL()

Applies the element-wise function:

\textbf{SQNL}(x) = \begin{cases} 1 & \text{if } x > 2 \\ x - \frac{x^{2}}{4} & \text{if } 0 \leq x \leq 2 \\ x + \frac{x^{2}}{4} & \text{if } -2 \leq x < 0 \\ -1 & \text{if } x < -2 \end{cases}

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

Reference:

  • SQNL: A New Computationally Efficient Activation Function

SReLU

echoAI.Activation.t_ops.SReLU(in_features, parameters = None)

Applies the element-wise function:

\textbf{SReLU}(x_{i}) = \begin{cases} t_{i}^{r} + a_{i}^{r}(x_{i} - t_{i}^{r}) & \text{if } x_{i} \geq t_{i}^{r} \\ x_{i} & \text{if } t_{i}^{r} > x_{i} > t_{i}^{l} \\ t_{i}^{l} + a_{i}^{l}(x_{i} - t_{i}^{l}) & \text{if } x_{i} \leq t_{i}^{l} \end{cases}

Parameters:

  • in_features - Shape of the input. Datatype: Tuple

  • parameters - (t^r, t^l, a^r, a^l) parameters for manual initialization. Default: None. If None is passed, the parameters are initialized randomly.

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

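A minimal sketch; passing a tuple holding the trailing feature size for in_features is an assumption here, since the page only states that it is the shape of the input:

import torch
from echoAI.Activation.t_ops import SReLU

act = SReLU(in_features=(64,))   # assumed: tuple with the trailing feature size
x = torch.randn(8, 64)
y = act(x)                       # t^r, t^l, a^r, a^l are learned during training
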
Reference:

  • Deep Learning with S-shaped Rectified Linear Activation Units

Funnel

echoAI.Activation.t_ops.Funnel(in_channels)

Applies the function:

\textbf{Funnel}(x) = \max(x, \mathbb{T}(x))

Here T(x) is the spatial condition, computed by a learnable depth-wise convolution over a local window of x.

Parameter:

  • in_channels - Number of channels in the input tensor. Datatype: Integer

Shape:

  • Input: (N, C, H, W), where C is the number of channels

  • Output: (N, C, H, W), same shape as the input

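A minimal sketch on a 4-D image-shaped tensor, matching the (N, C, H, W) shape requirement above:

import torch
from echoAI.Activation.t_ops import Funnel

act = Funnel(in_channels=16)      # one spatial condition T(x) per channel
x = torch.randn(2, 16, 32, 32)    # (N, C, H, W) input
y = act(x)                        # same (N, C, H, W) shape
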
Reference:

  • Funnel Activation for Visual Recognition

SLAF

echoAI.Activation.t_ops.SLAF(k = 2)

Applies the element-wise function:

\textbf{SLAF}(x) = a_{0} + a_{1} x + a_{2} x^{2} + \dots + a_{N-1} x^{N-1}

Parameter:

  • k - Number of Taylor coefficients. Default: 2

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

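A minimal sketch with k learnable polynomial coefficients (assuming the class path above):

import torch
from echoAI.Activation.t_ops import SLAF

act = SLAF(k=4)            # learn 4 polynomial coefficients a_0 .. a_3
x = torch.randn(8, 32)     # (N, *) input
y = act(x)                 # same shape as the input
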
Reference:

  • Learning Activation Functions: A new paradigm for understanding Neural Networks

AReLU

echoAI.Activation.t_ops.AReLU(alpha = 0.90, beta = 2.0)

Applies the element-wise function:

\textbf{AReLU}(x_{i}) = \begin{cases} C(\alpha)\, x_{i} & \text{if } x_{i} < 0 \\ (1 + \sigma(\beta))\, x_{i} & \text{otherwise} \end{cases}

Here σ is the sigmoid function and C(α) clamps α to the range (0, 1).

Parameters:

  • alpha - trainable parameter α. Default: 0.90

  • beta - trainable parameter β. Default: 2.0

Shape:

  • Input: (N, C, H, W), where C is the number of channels

  • Output: (N, C, H, W), same shape as the input

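A minimal sketch on a 4-D tensor, matching the (N, C, H, W) shape requirement above:

import torch
from echoAI.Activation.t_ops import AReLU

act = AReLU(alpha=0.90, beta=2.0)   # documented initial values; both are learned
x = torch.randn(2, 8, 16, 16)       # (N, C, H, W) input
y = act(x)                          # same shape as the input
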
Reference:

  • AReLU: Attention-based Rectified Linear Unit

FReLU

echoAI.Activation.t_ops.FReLU()

Applies the element-wise function:

\textbf{FReLU}(x) = \textbf{ReLU}(x) + b_{l}

Here b_l is a learnable bias term.

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

Reference:

  • FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks

DICE

echoAI.Activation.t_ops.DICE(emb_size, dim = 2, epsilon = 1e-8)

Applies the function:

\textbf{DICE}(x) = p(x)\, x + (1 - p(x))\, \alpha x
p(x) = \frac{1}{1 + e^{-\frac{x - E(x)}{\sqrt{\mathrm{Var}(x) + \epsilon}}}}

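A minimal sketch; the input shape is an assumption (a 2-D batch of embeddings of size emb_size when dim = 2), since this page does not document the expected shape:

import torch
from echoAI.Activation.t_ops import DICE

act = DICE(emb_size=36, dim=2)   # emb_size must match the feature dimension
x = torch.randn(128, 36)         # assumed (N, emb_size) input when dim=2
y = act(x)                       # p(x) is computed from batch statistics of x
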
Reference:

  • Deep Interest Network for Click-Through Rate Prediction

Seagull

echoAI.Activation.t_ops.Seagull()

Applies the function:

\textbf{Seagull}(x) = \log(1 + x^{2})

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

Reference:

  • A Use of Even Activation Functions in Neural Networks

Snake

echoAI.Activation.t_ops.Snake(in_features, alpha = None, alpha_trainable = True)

Applies the function:

\textbf{Snake}(x) = x + \frac{1}{\alpha} \sin^{2}(\alpha x)

Parameters:

  • in_features - shape of the input

  • alpha - trainable parameter α. Default: None, which initializes α to 1.0

  • alpha_trainable - whether α is trainable. Default: True

Shape:

  • Input: (N, ∗), where ∗ means any number of additional dimensions

  • Output: (N, ∗), same shape as the input

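A minimal sketch; passing the trailing feature size for in_features is an assumption, since the page only says "shape of the input":

import torch
from echoAI.Activation.t_ops import Snake

act = Snake(in_features=64)   # alpha=None -> alpha initialized to 1.0 and trained
x = torch.randn(8, 64)        # (N, *) input
y = act(x)                    # same shape as the input
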
Reference:

  • Neural Networks Fail to Learn Periodic Functions and How to Fix It

SIREN

echoAI.Activation.t_ops.SIREN(dim_in, dim_out, w0 = 30., c = 6., is_first = False, use_bias = True, activation = None)

Applies the function:

\textbf{SIREN}(x) = \sin(\omega_{0} \cdot \text{linear}(x))

Parameters:

  • dim_in - input dimension

  • dim_out - output dimension

  • w0 - hyper-parameter ω0. Default: 30.0

  • c - hyper-parameter used in weight initialisation for the linear layer. Default: 6.0

  • is_first - used for weight initialisation for the linear layer. Default: False

  • use_bias - whether the linear layer uses a bias term. Default: True

  • activation - activation applied to the output of the linear layer. Default: None, which uses the sine activation

Shape:

  • Input: (x, dim_in)

  • Output: (x, dim_out)

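A minimal sketch of stacking SIREN layers into a small coordinate network; passing an nn.Module via activation for the final linear head is an assumption based on the parameter description above:

import torch
import torch.nn as nn
from echoAI.Activation.t_ops import SIREN

# A tiny coordinate network: 2-D coordinates -> 1 value.
model = nn.Sequential(
    SIREN(dim_in=2, dim_out=256, w0=30.0, is_first=True),
    SIREN(dim_in=256, dim_out=256),
    SIREN(dim_in=256, dim_out=1, activation=nn.Identity()),  # assumed: linear head
)
coords = torch.rand(1024, 2)   # (x, dim_in) batch of coordinates
out = model(coords)            # (x, dim_out) = (1024, 1)
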
Reference:

  • Implicit Neural Representations with Periodic Activation Functions
