Mathematical Formulas for Machine Learning

May 3, 2024
Junhong Liu

1. Gradient Descent

Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for finding a local minimum of a differentiable multivariate function.

$$ \theta_{j+1} = \theta_{j} - \alpha \nabla J(\theta_{j}) $$
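A minimal sketch of this update rule in plain Python; the objective $J(\theta) = \theta^2$ (so $\nabla J(\theta) = 2\theta$) is a toy example chosen here for illustration, not part of the original text.

```python
def gradient_descent(grad, theta0, alpha=0.1, steps=100):
    """Iterate theta_{j+1} = theta_j - alpha * grad(theta_j)."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * grad(theta)
    return theta

# Toy objective J(theta) = theta^2 has gradient 2*theta and minimum at 0.
theta_min = gradient_descent(lambda t: 2 * t, theta0=5.0)
```

With a small enough learning rate $\alpha$, each step shrinks the distance to the minimum by a constant factor here, so the iterate converges toward 0.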

2. Normal Distribution

In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable.

$$ f(x \mid \mu , \sigma^2) = \frac{1}{\sigma \sqrt{2 \pi}} \cdot \mathsf{exp}\left( - \frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right) $$
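A direct translation of the density into plain Python (a sketch using only the standard library):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x | mu, sigma^2) = 1/(sigma*sqrt(2*pi)) * exp(-0.5 * ((x - mu)/sigma)^2)
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
```

At the mean, the exponential term is 1, so the density peaks at $1 / (\sigma \sqrt{2\pi})$.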

3. Z-score

In statistics, the standard score (Z-score) is the number of standard deviations by which the value of a raw score (i.e., an observed value or data point) is above or below the mean value of what is being observed or measured. Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.

$$ Z = \frac{x - \mu}{\sigma} $$
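The formula is a one-liner in code; the example values below are made up for illustration:

```python
def z_score(x, mu, sigma):
    # Number of standard deviations by which x lies above (+) or below (-) the mean.
    return (x - mu) / sigma

# A raw score of 110 with mean 100 and std 10 is one standard deviation above the mean.
z = z_score(110, mu=100, sigma=10)
```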

4. Sigmoid

A sigmoid function is any mathematical function whose graph has a characteristic S-shaped (sigmoid) curve. A common example is the logistic function shown below.

$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$
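A plain-Python sketch of the logistic function:

```python
import math

def sigmoid(x):
    # Logistic function: maps any real x into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))
```

Two useful properties follow directly from the formula: $\sigma(0) = 0.5$, and the symmetry $\sigma(-x) = 1 - \sigma(x)$.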

5. Population Pearson Correlation Coefficient

In statistics, the Pearson correlation coefficient (PCC) is a correlation coefficient that measures linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1.

$$ \mathsf{Correlation} = \frac{\mathsf{Cov}(X, Y)}{\mathsf{Std}(X) \cdot \mathsf{Std}(Y)} $$
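A sketch of the population version (dividing by $n$, not $n-1$) in plain Python:

```python
import math

def pearson(xs, ys):
    """Population Pearson correlation: Cov(X, Y) / (Std(X) * Std(Y))."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)
```

A perfectly linear increasing relationship gives +1, a perfectly linear decreasing one gives −1.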

6. Cosine Similarity

$$ \mathsf{Similarity} = \frac{\vec{A} \cdot \vec{B}}{||\vec{A}|| \cdot ||\vec{B}||} $$
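The same ratio, dot product over the product of Euclidean norms, as a plain-Python sketch:

```python
import math

def cosine_similarity(a, b):
    # (A . B) / (||A|| * ||B||): cosine of the angle between the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Parallel vectors give 1, orthogonal vectors give 0, regardless of magnitude.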

7. Naive Bayes

$$ P(y|x_1, ...,x_n) = \frac{P(y) \prod_{i=1}^n P(x_i|y)}{P(x_1, ...,x_n)} $$
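A sketch of computing this posterior, assuming the priors $P(y)$ and per-feature conditionals $P(x_i \mid y)$ are already known (the spam/ham numbers below are invented for illustration). The denominator is obtained by summing the numerator over all classes:

```python
def naive_bayes_posterior(priors, cond_probs):
    """priors: {y: P(y)}; cond_probs: {y: [P(x_1|y), ..., P(x_n|y)]}."""
    scores = {}
    for y in priors:
        score = priors[y]                 # P(y)
        for p in cond_probs[y]:
            score *= p                    # * prod_i P(x_i | y)
        scores[y] = score
    total = sum(scores.values())          # P(x_1, ..., x_n) via total probability
    return {y: s / total for y, s in scores.items()}

post = naive_bayes_posterior(
    priors={"spam": 0.5, "ham": 0.5},
    cond_probs={"spam": [0.8, 0.7], "ham": [0.1, 0.3]},
)
```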

8. Maximum Likelihood Estimation (MLE)

$$ \underset{\theta}{\operatorname{argmax}} \prod_{i=1}^n P(x_i|\theta) $$
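A sketch of the argmax for a Bernoulli model, where $P(x_i \mid \theta)$ is $\theta$ for $x_i = 1$ and $1 - \theta$ otherwise. In practice the log-likelihood is maximized instead of the raw product to avoid numerical underflow; the grid search below is a simple illustration, not an efficient estimator:

```python
import math

def mle_bernoulli(samples, grid_size=1000):
    """argmax over theta of sum_i log P(x_i | theta) on a grid in (0, 1)."""
    def log_lik(theta):
        return sum(math.log(theta if x == 1 else 1 - theta) for x in samples)
    candidates = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(candidates, key=log_lik)

# Analytic MLE for Bernoulli data is the sample mean (here 3/4).
p_hat = mle_bernoulli([1, 1, 1, 0])
```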

9. Ordinary Least Squares (OLS)

$$ \hat{\beta} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}}y $$
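For simple linear regression with a column of ones in $X$ (so $\hat{\beta}$ holds the intercept and slope), the normal equations reduce to a 2×2 solve, sketched here without any linear-algebra library; the data points are made up for illustration:

```python
def ols_fit(xs, ys):
    """beta_hat = (X^T X)^{-1} X^T y with X = [[1, x_i]]; returns (intercept, slope)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx               # determinant of X^T X
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope

# Points lying exactly on y = 1 + 2x are recovered exactly.
b0, b1 = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
```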

10. F1 Score

The F1 score is the harmonic mean of precision (P) and recall (R), combining the two into a single metric.

$$ F_1 = \frac{2 \cdot P \cdot R}{P + R} $$
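A direct sketch of the formula, taking precision and recall as inputs:

```python
def f1_score(precision, recall):
    # Harmonic mean of precision (P) and recall (R).
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, F1 is pulled toward the smaller of the two values: P = 1.0 with R = 0.5 gives F1 ≈ 0.667, not 0.75.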