diff --git a/reports/bolsonaro.tex b/reports/bolsonaro.tex
index 44a7bb2ed5c54c14a0fa0ab7bfff0b46aa24840c..b60cbc710c27c4f9e5c84295287f37cffe2b4b21 100644
--- a/reports/bolsonaro.tex
+++ b/reports/bolsonaro.tex
@@ -34,52 +34,55 @@ introduce the problem and the motivations ...
 \subsection{Notation}
-Let $ X \in \mathbb{R}^{n \times d}$ \todo{is non-bold uppercase common for matrices?} be the matrix data and $Y \in \mathbb{R}^{n}$ be the labels vector associated to the matrix $X$, where for each $i$, $\textbf{x}_i \in \mathcal{X} \subseteq \mathbb{R}^{d}$ and $y_i \in {\mathcal Y} \subseteq \mathbb{R}$. \\
+Let $ \textbf{X} \in \mathbb{R}^{n \times d}$ \todo{is non-bold uppercase common for matrices?} be the data matrix and $\textbf{y} \in \mathbb{R}^{n}$ be the label vector associated with the matrix $\textbf{X}$, where for each $i$, $y_i \in {\mathcal Y}$. \\
-A random forest $F_{t_1, \dots, t_l}$ \todo[inline]{possible notation confusion: non-bold uppercase: function or matrix? both, to be clarified} is a classifier made of a collection of $l$ trees ${t_1, \dots, t_l}$. This forest can be seen as a function, and noted as:
+A random forest $F_{T_1, \dots, T_l}$ is a classifier made of a collection of $l$ trees ${T_1, \dots, T_l}$. A single tree and a forest can both be seen as functions. To define these tools, let us introduce ${\cal H}$ as the set of all possible trees:
+\todo[inline]{possible notation confusion: non-bold uppercase: function or matrix? both, to be clarified}
+%
+$$ {\cal H} = \{T \ | \ T :\ \mathbb{R}^{n \times d} \ \to \ \cal{Y}\}.$$
+%
+In this case, a forest can be written as:
 %
 $$\begin{array}{ccccc}
-F_{t_1, \dots, t_l} & : & \cal{X} & \to & \cal{Y} \\
- & & \textbf{x} & \mapsto & F_{t_1, \dots, t_l}(\textbf{x}) = f(\{t_1, \dots, t_l\} , \textbf{x}) \\
+F_{T_1, \dots, T_l} & : & \cal{X} & \to & \cal{Y} \\
+ & & \textbf{x} & \mapsto & F_{T_1, \dots, T_l}(\textbf{x}) = H(\{T_1, \dots, T_l\} , \textbf{x}) \\
 \end{array}$$
 %
-where $f$ is a function which depend on the task\todo{f unclear: why to introduce it?}. In a regression setup, where ${\cal Y} = \mathbb{R}$\todo{I don't think it is usefull}, this function can be defined as:
+where $H$ is a function which depends on the task\todo{f unclear: why to introduce it?}. In a regression setup, where ${\cal Y} = \mathbb{R}$\todo{I don't think it is useful}, this function can be defined as:
 %
-$$f(\{t_1, \dots, t_l \} , \textbf{x}) = \sum_{i = 1}^{l} \alpha_i t_i(x) \ \text{ where } \alpha_i \in \mathbb{R},$$
+$$H(\{T_1, \dots, T_l \} , \textbf{x}) = \sum_{i = 1}^{l} \alpha_i T_i(\textbf{x}) \ \text{ where } \alpha_i \in \mathbb{R},$$
 %
-while in a classification setup, in which ${\cal Y} = \{ c_1, \dots, c_m \}$, $f$ will be a majority vote function:
+while in a classification setup, in which ${\cal Y} = \{ c_1, \dots, c_m \}$, $H$ will be a majority vote function:
 %
-$$f(\{t_1, \dots, t_l \} , \textbf{x}) = \argmax_{c \in {\cal Y}} \sum_{i = 1}^{l} \mathds{1}(t_i(\textbf{x}) = c).$$
+$$H(\{T_1, \dots, T_l \} , \textbf{x}) = \argmax_{c \in {\cal Y}} \sum_{i = 1}^{l} \mathds{1}(T_i(\textbf{x}) = c),$$
+where $\mathds{1}$ is the indicator function, which returns $1$ if its argument is true and $0$ otherwise.
 %
-\todo{$\mathds{1}$ not defined}We \todo{no we}will need to define the vector prediction of a forest for all the data matrix: $F_{t_1, \dots, t_l}(X) = \begin{pmatrix}
- F_{t_1, \dots, t_l}(x_1) \\
+\todo{$\mathds{1}$ not defined}We \todo{no we}will need to define the prediction vector of a forest over the whole data matrix: $F_{T_1, \dots, T_l}(\textbf{X}) = \begin{pmatrix}
+ F_{T_1, \dots, T_l}(\textbf{x}_1) \\
 \dots \\
- F_{t_1, \dots, t_l}(x_n)
+ F_{T_1, \dots, T_l}(\textbf{x}_n)
 \end{pmatrix}.$\\
 %
 %
 %
-All these notations can be summarized in the following table:\\
+All these notations are summarized in Table \ref{table: notation}:\\
 \begin{table}
-\begin{tabular}{l c}
-
- %\hline
- \textbf{x} & the vector x \\
- $k$ & the desired (pruned) forest size \\
- $X$ & the matrix $X$ \\
- ${\cal X}$ & the data representation space \\
- ${\cal Y}$ & the label representation space \\
- $n$ & the number of data\\
+\begin{tabular}{ l c }
+ lowercase & integer \\
+ bold lowercase& vector \\
+ bold capital & matrix \\
+ calligraphic letters & vector space \\
+ $F_{T_1, \dots, T_l}$ & a forest of $l$ trees \\
+ $F_{T_1, \dots, T_l}(\textbf{x}) \in {\cal Y}$ & the label predicted for \textbf{x} by the forest $F_{T_1, \dots, T_l}$ \\
+ $F_{T_1, \dots, T_l}(\textbf{X}) \in {\cal Y}^n$ & the labels predicted for all the data of $\textbf{X}$ by the forest $F_{T_1, \dots, T_l}$\\
+ $n$ & the number of data points \\
 $d$ & the data dimension \\
- $l$ & the forest size \\
- $F_{t_1, \dots, t_l}$ & a forest of $l$ trees \\
- $F_{t_1, \dots, t_l}(\textbf{x}) \in {\cal Y}$ & the predicted label of \textbf{x} by the forest $F_{t_1, \dots, t_l}$ \\
- $F_{t_1, \dots, t_l}(X) \in {\cal Y}^n$ & the predicted label of all the data of $X$ by the forest $F_{t_1, \dots, t_l}$\\
- %\hline
-
+ $l$ & the initial forest size \\
+ $k$ & the desired (pruned) forest size \\
 \end{tabular}
-\caption{Notations}
+\caption{Notations used in this paper}
+\label{table: notation}
 \end{table}\todo[inline]{add the notation conventions: bold lowercase: vector; non-bold uppercase: matrix, etc.}