echoAI.Activation (PyTorch)
This page contains details of all activation functions supported by the PyTorch backend of Echo.
Mish
echoAI.Activation.t_ops.Mish()
Applies the element-wise function:
\textbf{Mish}(x) = x\tanh(\text{softplus}(x))
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Mish: A Self Regularized Non-Monotonic Activation Function
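A minimal usage sketch, assuming Mish behaves like a standard callable torch.nn.Module applied element-wise to a tensor:

import torch
from echoAI.Activation.t_ops import Mish

mish = Mish()              # no parameters to configure
x = torch.randn(4, 16)     # (N, *) input
y = mish(x)                # element-wise, same shape as input
print(y.shape)             # torch.Size([4, 16])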
Swish
echoAI.Activation.t_ops.Swish(eswish = False, swish = True, beta = 1.735, flatten = False, pfts = False)
Allows the following element-wise functions:
\textbf{Swish}(x) = x\,\text{sigmoid}(\beta_{1} x)
\textbf{ESwish}(x) = \beta x\,\text{sigmoid}(x)
\textbf{SILU}(x) = x\,\text{sigmoid}(x)
\textbf{Flatten T-Swish}(x) = \begin{cases} x\,\text{sigmoid}(x) + c & \text{if } x \geq 0 \\ c & \text{otherwise} \end{cases}
Parameters:
eswish - Uses the E-Swish activation function. Default: False
swish - Uses the Swish activation function. Default: False
flatten - Uses the Flatten T-Swish activation function, where c is a constant with value -0.2. Default: False
pfts - Uses the Parametric Flatten T-Swish function. It has the same formulation as Flatten T-Swish, except that c is a trainable parameter initialized to -0.2 rather than a constant. Default: False
beta - \beta parameter used in the E-Swish formulation. Default: 1.375
Note: By default, SILU is initialized.
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
References:
Searching for Activation Functions
E-swish: Adjusting Activations to Different Network Depths
Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning
Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning
Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function For Deep Learning
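A short usage sketch; it assumes the boolean flags can each be set independently to select one formulation and that the module is applied like a standard element-wise torch.nn.Module:

import torch
from echoAI.Activation.t_ops import Swish

x = torch.randn(8, 32)

silu   = Swish()                         # default configuration (SILU per the note above)
eswish = Swish(eswish=True, beta=1.375)  # E-Swish: beta * x * sigmoid(x)
fts    = Swish(flatten=True)             # Flatten T-Swish with constant c = -0.2
pfts   = Swish(pfts=True)                # Parametric Flatten T-Swish: c is trainable

for act in (silu, eswish, fts, pfts):
    assert act(x).shape == x.shape       # element-wise: shape is preserved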
Aria2
echoAI.Activation.t_ops.Aria2(beta = 0.5, alpha = 1.0)
Applies the element-wise function:
\textbf{Aria2}(x) = (1 + e^{-\beta x})^{-\alpha}
Parameters:
beta - \beta is the exponential growth rate. Default: 0.5
alpha - \alpha is a hyper-parameter with a two-fold effect: it reduces the curvature in the third quadrant and increases the curvature in the first quadrant while lowering the value of the activation. Default: 1.0
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
ARiA: Utilizing Richard's Curve for Controlling the Non-monotonicity of the Activation Function in Deep Neural Nets
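A brief sketch with the documented defaults, assuming standard element-wise module behaviour:

import torch
from echoAI.Activation.t_ops import Aria2

aria2 = Aria2(beta=0.5, alpha=1.0)   # documented defaults
x = torch.randn(2, 10)
y = aria2(x)                          # (1 + exp(-beta * x)) ** (-alpha), element-wise
assert y.shape == x.shape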
BReLU
echoAI.Activation.t_ops.BReLU()
Applies the element-wise function:
\textbf{BReLU}(x_{i}) = \begin{cases} f(x_{i}) & \text{if } i \bmod 2 = 0 \\ -f(-x_{i}) & \text{if } i \bmod 2 \neq 0 \end{cases}
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Shifting Mean Activation Towards Zero with Bipolar Activation Functions
APL
echoAI.Activation.t_ops.APL(s)
Applies the element-wise function:
\textbf{APL}(x) = \max(0, x) + \sum_{s=1}^{S} a_{i}^{s} \max(0, -x + b_{i}^{s})
Parameter:
s - hyperparameter, the number of hinges to be set in advance. Default: 1
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Learning Activation Functions to Improve Deep Neural Networks
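A sketch with two hinges; it assumes the hinge slopes a and offsets b are registered as trainable parameters of the module, as in the reference paper:

import torch
from echoAI.Activation.t_ops import APL

apl = APL(2)                       # s = 2 hinges
x = torch.randn(4, 8)
y = apl(x)                         # shape preserved
print(sum(p.numel() for p in apl.parameters()))   # number of trainable hinge parameters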
ELiSH
echoAI.Activation.t_ops.Elish(hard = False)
Allows the following element-wise functions:
\textbf{ELiSH}(x) = \begin{cases} x\,\text{sigmoid}(x) & \text{if } x \geq 0 \\ (e^{x} - 1)\,\text{sigmoid}(x) & \text{otherwise} \end{cases}
\textbf{Hard ELiSH}(x) = \begin{cases} x \max(0, \min(1, (x+1)/2)) & \text{if } x \geq 0 \\ (e^{x} - 1) \max(0, \min(1, (x+1)/2)) & \text{otherwise} \end{cases}
Parameter:
hard - Uses the Hard ELiSH activation function. Default: False
Note: By default, ELiSH is initialized.
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
The Quest for the Golden Activation Function
ISRU
echoAI.Activation.t_ops.ISRU(alpha = 1.0, isrlu = False)
Allows the following element-wise functions:
\textbf{ISRU}(x) = \frac{x}{\sqrt{1 + \alpha x^{2}}}
\textbf{ISRLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \frac{x}{\sqrt{1 + \alpha x^{2}}} & \text{otherwise} \end{cases}
Parameters:
alpha - hyperparameter \alpha controls the value to which an ISRLU saturates for negative inputs. Default: 1.0
isrlu - Uses the ISRLU activation function. Default: False
Note: By default, ISRU is initialized.
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs)
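A sketch contrasting the two formulations, assuming the isrlu flag simply switches between them:

import torch
from echoAI.Activation.t_ops import ISRU

isru  = ISRU(alpha=1.0)               # default: ISRU
isrlu = ISRU(alpha=1.0, isrlu=True)   # identity for x >= 0, ISRU branch otherwise

x = torch.randn(3, 5)
assert isru(x).shape == isrlu(x).shape == x.shape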
Maxout
echoAI.Activation.t_ops.Maxout()
Applies the function:
\textbf{Maxout}(\vec{x}) = \max_{i}(x_{i})
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Maxout Networks
NLReLU
echoAI.Activation.t_ops.NLReLU(beta = 1.0, inplace = False)
Applies the element-wise function:
\textbf{NLReLU}(x) = \ln(\beta \max(0, x) + 1.0)
Parameters:
beta - \beta parameter used in the NLReLU formulation. Default: 1.0
inplace - can optionally do the operation in-place. Default: False
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Natural-Logarithm-Rectified Activation Function in Convolutional Neural Networks
Soft Clipping
echoAI.Activation.t_ops.SoftClipping(alpha = 0.5)
Applies the element-wise function:
\textbf{Soft Clipping}(x) = \frac{1}{\alpha}\log\Big(\frac{1 + e^{\alpha x}}{1 + e^{\alpha(x-1)}}\Big)
Parameter:
alpha - \alpha hyper-parameter, which determines how close to linear the central region is and how sharply the linear region turns to the asymptotic values. Default: 0.5
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Neural Network-Based Approach to Phase Space Integration
Soft Exponential
echoAI.Activation.t_ops.SoftExponential(alpha = None)
Applies the element-wise function:
\textbf{Soft Exponential}(x) = \begin{cases} \frac{-\log(1 + \alpha(x + \alpha))}{\alpha} & \text{if } \alpha < 0 \\ x & \text{if } \alpha = 0 \\ \frac{e^{\alpha x} - 1}{\alpha} & \text{if } \alpha > 0 \end{cases}
Parameter:
alpha - \alpha trainable parameter, initialized to zero by default. Default: None
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks
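Because alpha is trainable, the module can be optimized together with the rest of a network. A minimal sketch, assuming alpha is registered as a torch.nn.Parameter as the description above suggests:

import torch
from echoAI.Activation.t_ops import SoftExponential

act = SoftExponential()                        # alpha starts at zero (identity regime)
x = torch.randn(16, 4, requires_grad=True)
loss = act(x).sum()
loss.backward()                                # gradients flow through alpha as well
print([p.shape for p in act.parameters()])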
SQNL
echoAI.Activation.t_ops.SQNL()
Applies the element-wise function:
\textbf{SQNL}(x) = \begin{cases} 1 & \text{if } x > 2 \\ x - \frac{x^{2}}{4} & \text{if } 0 \leq x \leq 2 \\ x + \frac{x^{2}}{4} & \text{if } -2 \leq x < 0 \\ -1 & \text{if } x < -2 \end{cases}
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
SQNL: A New Computationally Efficient Activation Function
SReLU
echoAI.Activation.t_ops.SReLU(in_features, parameters = None)
Applies the element-wise function:
\textbf{SReLU}(x_{i}) = \begin{cases} t_{i}^{r} + a_{i}^{r}(x_{i} - t_{i}^{r}) & \text{if } x_{i} \geq t_{i}^{r} \\ x_{i} & \text{if } t_{i}^{r} > x_{i} > t_{i}^{l} \\ t_{i}^{l} + a_{i}^{l}(x_{i} - t_{i}^{l}) & \text{if } x_{i} \leq t_{i}^{l} \end{cases}
Parameters:
in_features - Shape of the input. Datatype: Tuple
parameters - (t^{r}, t^{l}, a^{r}, a^{l}) parameters for manual initialization. Default: None. If None is passed, the parameters are initialized randomly.
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Deep Learning with S-shaped Rectified Linear Activation Units
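A sketch for SReLU, which needs the input shape at construction time; here in_features is assumed to describe the per-sample feature shape, passed as a tuple per the description above:

import torch
from echoAI.Activation.t_ops import SReLU

srelu = SReLU((64,))          # in_features as a tuple (assumed: trailing feature shape)
x = torch.randn(32, 64)
y = srelu(x)                  # per-feature thresholds t and slopes a are learned
assert y.shape == x.shape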
Funnel
echoAI.Activation.t_ops.Funnel(in_channels)
Applies the element-wise function:
\textbf{Funnel}(x) = \max(x, \mathbb{T}(x))
Parameter:
in_channels - Number of channels in the input tensor. Datatype: Integer
Shape:
Input: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}) where \mathbf{C} indicates the number of channels
Output: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}), same shape as input
Reference:
Funnel Activation for Visual Recognition
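Funnel expects a 4-D (N, C, H, W) tensor because \mathbb{T}(x) is a learned spatial condition in the reference paper. A minimal sketch:

import torch
from echoAI.Activation.t_ops import Funnel

funnel = Funnel(in_channels=16)
x = torch.randn(2, 16, 32, 32)   # (N, C, H, W)
y = funnel(x)                    # max(x, T(x)), shape preserved
assert y.shape == x.shape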
SLAF
echoAI.Activation.t_ops.SLAF(k = 2)
Applies the element-wise function:
\textbf{SLAF}(x) = a_{0} + a_{1}x + a_{2}x^{2} + \dots + a_{N-1}x^{N-1}
Parameter:
k - Number of Taylor coefficients. Default: 2
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Learning Activation Functions: A new paradigm for understanding Neural Networks
AReLU
echoAI.Activation.t_ops.AReLU(alpha = 0.90, beta = 2.0)
Applies the element-wise function:
\textbf{AReLU}(x_{i}) = \begin{cases} C(\alpha)x_{i} & \text{if } x_{i} < 0 \\ (1 + \sigma(\beta))x_{i} & \text{otherwise} \end{cases}
Parameters:
alpha - \alpha trainable parameter. Default: 0.90
beta - \beta trainable parameter. Default: 2.0
Shape:
Input: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}) where \mathbf{C} indicates the number of channels
Output: (\mathbf{N}, \mathbf{C}, \mathbf{H}, \mathbf{W}), same shape as input
Reference:
AReLU: Attention-based Rectified Linear Unit
FReLU
echoAI.Activation.t_ops.FReLU()
Applies the element-wise function:
\textbf{FReLU}(x) = \textbf{ReLU}(x) + b_{l}
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks
DICE
echoAI.Activation.t_ops.DICE(emb_size, dim = 2, epsilon = 1e-8)
Applies the function:
\textbf{DICE}(x) = p(x)\,x + (1 - p(x))\,\alpha x
p(x) = \frac{1}{1 + e^{-\frac{x - E(x)}{\sqrt{Var(x) + \epsilon}}}}
Reference:
Deep Interest Network for Click-Through Rate Prediction
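DICE gates its input using batch statistics E(x) and Var(x). A sketch under the assumption that dim = 2 corresponds to a (batch, emb_size) input, with emb_size matching the last dimension:

import torch
from echoAI.Activation.t_ops import DICE

dice = DICE(emb_size=8, dim=2)   # assumed: dim = 2 means (batch, emb_size) inputs
x = torch.randn(32, 8)
y = dice(x)
assert y.shape == x.shape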
Seagull
echoAI.Activation.t_ops.Seagull()
Applies the function:
\textbf{Seagull}(x) = \log(1 + x^{2})
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
A Use of Even Activation Functions in Neural Networks
Snake
echoAI.Activation.t_ops.Snake(in_features, alpha = None, alpha_trainable = True)
Applies the function:
\textbf{Snake}(x) = x + \frac{1}{\alpha}\sin^{2}(\alpha x)
Parameters:
in_features - shape of the input
alpha - \alpha trainable parameter. Default: 1.0 when specified as None
alpha_trainable - switches \alpha to be a trainable parameter. Default: True
Shape:
Input: (\mathbf{N}, \ast) where \ast means any number of additional dimensions
Output: (\mathbf{N}, \ast), same shape as input
Reference:
Neural Networks Fail to Learn Periodic Functions and How to Fix It
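A sketch of Snake on a 1-D feature input; in_features is assumed here to be the trailing feature size:

import torch
from echoAI.Activation.t_ops import Snake

snake = Snake(in_features=16)   # alpha defaults to 1.0 and is trainable
x = torch.randn(4, 16)
y = snake(x)                    # x + (1/alpha) * sin^2(alpha * x)
assert y.shape == x.shape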
SIREN
echoAI.Activation.t_ops.SIREN(dim_in, dim_out, w0 = 30., c = 6., is_first = False, use_bias = True, activation = None)
Applies the function:
\textbf{SIREN}(x) = \sin(\omega_{0} \cdot \text{linear}(x))
Parameters:
dim_in - input dimension
dim_out - output dimension
w0 - \omega_{0} hyper-parameter. Default: 30.0
c - hyper-parameter used in weight initialisation for the linear layer. Default: 6.
is_first - used for weight initialisation for the linear layer. Default: False
use_bias - initialises a bias parameter for the linear layer. Default: True
activation - used to initialise an activation function; the sine activation is used when None. Default: None
Shape:
Input: (\mathbf{x}, dim\_in)
Output: (\mathbf{x}, dim\_out)
Reference:
Implicit Neural Representations with Periodic Activation Functions
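SIREN acts as a full sine-activated linear layer rather than a purely element-wise function. A sketch mapping 2-D coordinates to a scalar, assuming the layer is used like a torch.nn.Linear replacement:

import torch
from echoAI.Activation.t_ops import SIREN

layer = SIREN(dim_in=2, dim_out=1, w0=30., is_first=True)   # first layer of an implicit MLP
coords = torch.rand(1024, 2)      # (num_points, dim_in)
out = layer(coords)               # sin(w0 * linear(coords)) -> (num_points, dim_out)
print(out.shape)                  # torch.Size([1024, 1])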