Farah notation and related work
- farah.cherfaoui authored
+ 31
− 28
@@ -34,52 +34,55 @@
Let $X \in \mathbb{R}^{n \times d}$ \todo{is non-bold uppercase common for matrices?} be the data matrix and $Y \in \mathbb{R}^{n}$ be the vector of labels associated with $X$, where, for each $i$, the sample $\textbf{x}_i \in \mathcal{X} \subseteq \mathbb{R}^{d}$ and its label $y_i \in {\mathcal Y} \subseteq \mathbb{R}$. \\
A random forest $F_{t_1, \dots, t_l}$ \todo[inline]{possible notation confusion: non-bold uppercase: function or matrix? both, to be clarified} is a predictor made of a collection of $l$ trees $\{t_1, \dots, t_l\}$. This forest can be seen as a function, written as:
$$F_{t_1, \dots, t_l}(\textbf{x}) = f(\{t_1, \dots, t_l \} , \textbf{x}),$$
where $f$ is a function that depends on the task\todo{f unclear: why introduce it?}. In a regression setting, where ${\cal Y} = \mathbb{R}$\todo{I don't think it is useful}, this function can be defined as:
$$f(\{t_1, \dots, t_l \} , \textbf{x}) = \sum_{i = 1}^{l} \alpha_i t_i(\textbf{x}) \ \text{ where } \alpha_i \in \mathbb{R},$$
while in a classification setting, where ${\cal Y} = \{ c_1, \dots, c_m \}$, $f$ is a majority-vote function:
$$f(\{t_1, \dots, t_l \} , \textbf{x}) = \argmax_{c \in {\cal Y}} \sum_{i = 1}^{l} \mathds{1}(t_i(\textbf{x}) = c).$$
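% Hypothetical worked example: the number of trees, the tree outputs and the uniform weights are assumptions chosen for the arithmetic.
As a small illustration of the two aggregation rules, consider a hypothetical forest with $l = 3$ trees; the tree outputs and the uniform weights $\alpha_i = 1/3$ used here are assumptions made only for this example (the definition only requires $\alpha_i \in \mathbb{R}$). In regression, if $t_1(\textbf{x}) = 1$, $t_2(\textbf{x}) = 2$ and $t_3(\textbf{x}) = 3$, then
$$f(\{t_1, t_2, t_3 \} , \textbf{x}) = \tfrac{1}{3} \cdot 1 + \tfrac{1}{3} \cdot 2 + \tfrac{1}{3} \cdot 3 = 2.$$
In classification with ${\cal Y} = \{ c_1, c_2 \}$, if $t_1(\textbf{x}) = c_1$, $t_2(\textbf{x}) = c_2$ and $t_3(\textbf{x}) = c_1$, the vote counts are
$$\sum_{i = 1}^{3} \mathds{1}(t_i(\textbf{x}) = c_1) = 2, \qquad \sum_{i = 1}^{3} \mathds{1}(t_i(\textbf{x}) = c_2) = 1,$$
so that $f(\{t_1, t_2, t_3 \} , \textbf{x}) = c_1$.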
\todo{$\mathds{1}$ not defined}We \todo{no we} will need to define the prediction vector of a forest over the whole data matrix: $F_{t_1, \dots, t_l}(X) = \begin{pmatrix}
$F_{t_1, \dots, t_l}(\textbf{x}) \in {\cal Y}$ & the label predicted for \textbf{x} by the forest $F_{t_1, \dots, t_l}$ \\
$F_{t_1, \dots, t_l}(X) \in {\cal Y}^n$ & the labels predicted for all the samples of $X$ by the forest $F_{t_1, \dots, t_l}$\\
\end{table}\todo[inline]{add the notation conventions: bold lowercase: vector; non-bold uppercase: matrix, etc.}