diff --git a/reports/bolsonaro.tex b/reports/bolsonaro.tex
index 9c83998387002b4b0eb4f112c02b0fbf3f596db0..24e7fb12263adff9dbf4e56add20976f90966592 100644
--- a/reports/bolsonaro.tex
+++ b/reports/bolsonaro.tex
@@ -5,6 +5,13 @@
 \usepackage{algpseudocode}
 \usepackage{algorithm}
 \usepackage{float}
+\usepackage{dsfont}
+\usepackage{amsmath}
+
+
+\DeclareMathOperator*{\argmax}{arg\,max}
+\DeclareMathOperator*{\argmin}{arg\,min}
+
 
 \algnewcommand\algorithmicforeach{\textbf{for each}}
 \algdef{S}[FOR]{ForEach}[1]{\algorithmicforeach\ #1\ \algorithmicdo}
@@ -23,7 +30,28 @@
 \section{Introduction}
 
 \subsection{Notation}
-$S = \{(x_i, y_i)\}^n_{i=1}$ the dataset, with $x_i \in X$ and $y_i \in Y$. $T = \{t_1, t_2, \dots, t_d\}$ the random forest of $d$ trees, such that $t_j : X \rightarrow Y$.
+Let $X \in \mathbb{R}^{n \times d}$ be the data matrix and $Y \in \mathbb{R}^{n}$ be the associated label vector, where, for each $i$, the row $\textbf{x}_i \in {\cal X} \subseteq \mathbb{R}^d$ and the label $y_i \in {\cal Y} \subseteq \mathbb{R}$. \\
+
+A random forest $F_{t_1, \dots, t_l}$ is a predictor made of a collection of $l$ trees $\{t_1, \dots, t_l\}$. This forest can be seen as a function, written as:
+%
+$$\begin{array}{ccccc}
+F_{t_1, \dots, t_l} & : & {\cal X} & \to & {\cal Y} \\
+ & & \textbf{x} & \mapsto & F_{t_1, \dots, t_l}(\textbf{x}) = f(\{t_1, \dots, t_l\} , \textbf{x}) \\
+\end{array}$$
+%
+where $f$ is a function which depends on the task. In a regression setup, where ${\cal Y} = \mathbb{R}$, this function can be defined as:
+%
+$$f(\{t_1, \dots, t_l \} , \textbf{x}) = \sum_{i = 1}^{l} \alpha_i t_i(\textbf{x}) \ \text{ where } \alpha_i \in \mathbb{R},$$
+%
+while in a classification setup, in which ${\cal Y} = \{ c_1, \dots, c_m \}$, $f$ will be a majority vote function:
+%
+$$f(\{t_1, \dots, t_l \} , \textbf{x}) = \argmax_{c \in {\cal Y}} \sum_{i = 1}^{l} \mathds{1}(t_i(\textbf{x}) = c).$$
+%
+We will also need the prediction vector of a forest over the whole data matrix: $F_{t_1, \dots, t_l}(X) = \begin{pmatrix}
+    F_{t_1, \dots, t_l}(\textbf{x}_1) \\
+    \vdots \\
+    F_{t_1, \dots, t_l}(\textbf{x}_n)
+\end{pmatrix}.$
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Orthogonal Matching Pursuit (OMP)}
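
The two aggregation rules introduced in the Notation subsection of this patch (a weighted sum for regression, a majority vote for classification) can be sketched in Python. This is only an illustration, not code from the repository: the names `forest_regress`, `forest_classify`, and `forest_predict` are hypothetical, trees are assumed to be plain callables, and ties in the vote are broken by first-seen order, a choice the report does not specify.

```python
from collections import Counter


def forest_regress(trees, alphas, x):
    """Regression rule: f({t_1..t_l}, x) = sum_i alpha_i * t_i(x)."""
    if len(trees) != len(alphas):
        raise ValueError("need exactly one weight alpha_i per tree t_i")
    return sum(a * t(x) for a, t in zip(alphas, trees))


def forest_classify(trees, x):
    """Classification rule: argmax over classes c of sum_i 1(t_i(x) = c)."""
    votes = Counter(t(x) for t in trees)
    # most_common(1) gives the class with the most votes; among tied classes
    # the first one encountered wins (an assumption, see the lead-in).
    return votes.most_common(1)[0][0]


def forest_predict(predict_one, X):
    """F(X): apply a per-sample forest prediction to every row x_1..x_n of X."""
    return [predict_one(x) for x in X]


# Toy "trees": hypothetical callables standing in for fitted decision trees.
reg_trees = [
    lambda x: 1.0 if x[0] > 0 else -1.0,
    lambda x: 2.0 * x[1],
    lambda x: 0.5,
]
clf_trees = [
    lambda x: "a" if x[0] > 0 else "b",
    lambda x: "a",
    lambda x: "b",
]

y_reg = forest_regress(reg_trees, [0.5, 0.25, 1.0], [1.0, 2.0])
y_clf = forest_predict(lambda x: forest_classify(clf_trees, x), [[1.0], [-1.0]])
print(y_reg, y_clf)
```

With uniform weights $\alpha_i = 1/l$ the regression rule reduces to the plain average of the tree outputs; leaving the weights free keeps the sketch aligned with the formula in the patch.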