Charly Lamothe
--- a/reports/bolsonaro.tex

+ 31

− 28
+++ b/reports/bolsonaro.tex

 @@ -34,52 +34,55 @@
 introduce the problem and the motivations ...
 \subsection{Notation}
-Let $ X \in \mathbb{R}^{n \times d}$ \todo{is non-bold uppercase common for matrices?} be the matrix data and $Y \in \mathbb{R}^{n}$ be the labels vector associated to the matrix $X$, where for each $i$, $\textbf{x}_i \in \mathcal{X} \subseteq \mathbb{R}^{d}$ and $y_i \in {\mathcal Y} \subseteq \mathbb{R}$. \\
+Let $ \textbf{X} \in \mathbb{R}^{n \times d}$ \todo{is non-bold uppercase common for matrices?} be the data matrix and $\textbf{y} \in \mathbb{R}^{n}$ be the label vector associated with $\textbf{X}$, where for each $i$, $y_i \in {\mathcal Y}$. \\
-A random forest $F_{t_1, \dots, t_l}$ \todo[inline]{possible notation confusion: non-bold uppercase: function or matrix? both, to be clarified}  is a classifier  made of a collection of $l$ trees ${t_1, \dots, t_l}$. This forest can be seen as a function, and noted as:
+A random forest $F_{T_1, \dots, T_l}$ is a classifier made of a collection of $l$ trees $\{T_1, \dots, T_l\}$. A single tree and a forest can both be seen as functions. To define these objects, let us introduce ${\cal H}$ as the set of all possible trees:
+\todo[inline]{possible notation confusion: non-bold uppercase: function or matrix? both, to be clarified}
+%
+$$ {\cal H} = \{T \ | \ T :\ \mathbb{R}^{d} \ \to \ {\cal Y}\}.$$
+%
+With this notation, a forest can be written as:
 %
 $$\begin{array}{ccccc}
-F_{t_1, \dots, t_l} & : & \cal{X} & \to & \cal{Y} \\
+F_{T_1, \dots, T_l} & : & \cal{X} & \to & \cal{Y} \\
- & & \textbf{x} & \mapsto & F_{t_1, \dots, t_l}(\textbf{x}) = f(\{t_1, \dots, t_l\} , \textbf{x}) \\
+ & & \textbf{x} & \mapsto & F_{T_1, \dots, T_l}(\textbf{x}) = H(\{T_1, \dots, T_l\} , \textbf{x}) \\
 \end{array}$$
 %
-where $f$ is a function which depend on the task\todo{f unclear: why to introduce it?}. In a regression setup, where ${\cal Y} = \mathbb{R}$\todo{I don't think it is usefull}, this function can be defined as: 
+where $H$ is a function which depends on the task\todo{f unclear: why to introduce it?}. In a regression setup, where ${\cal Y} = \mathbb{R}$\todo{I don't think it is useful}, this function can be defined as:
 %
-$$f(\{t_1, \dots, t_l \} , \textbf{x}) = \sum_{i = 1}^{l} \alpha_i t_i(x) \ \text{ where } \alpha_i \in \mathbb{R},$$
+$$H(\{T_1, \dots, T_l \} , \textbf{x}) = \sum_{i = 1}^{l} \alpha_i T_i(\textbf{x}) \ \text{ where } \alpha_i \in \mathbb{R},$$
 %
-while in a classification setup, in which ${\cal Y} = \{ c_1, \dots, c_m \}$, $f$ will be a majority vote function:
+while in a classification setup, in which ${\cal Y} = \{ c_1, \dots, c_m \}$, $H$ will be a majority vote function:
 %
-$$f(\{t_1, \dots, t_l \} , \textbf{x}) = \argmax_{c \in {\cal Y}} \sum_{i = 1}^{l}  \mathds{1}(t_i(\textbf{x}) = c).$$
+$$H(\{T_1, \dots, T_l \} , \textbf{x}) = \argmax_{c \in {\cal Y}} \sum_{i = 1}^{l}  \mathds{1}(T_i(\textbf{x}) = c),$$
+where $\mathds{1}$ is the indicator function, which returns $1$ if its argument is true and $0$ otherwise.
 %
-\todo{$\mathds{1}$ not defined}We \todo{no we}will need to define the vector prediction of a forest for all the data matrix: $F_{t_1, \dots, t_l}(X)  = \begin{pmatrix}
+\todo{$\mathds{1}$ not defined}We \todo{no we}will need to define the prediction vector of a forest over the whole data matrix: $F_{T_1, \dots, T_l}(\textbf{X})  = \begin{pmatrix}
-   F_{t_1, \dots, t_l}(x_1) \\
+   F_{T_1, \dots, T_l}(\textbf{x}_1) \\
   \dots \\
-   F_{t_1, \dots, t_l}(x_n) 
+   F_{T_1, \dots, T_l}(\textbf{x}_n) 
 \end{pmatrix}.$\\
 %
 %
 %
-All these notations can be summarized in the following table:\\
+These notations are summarized in Table~\ref{table: notation}.\\
 \begin{table}
-\begin{tabular}{l c} 
+\begin{tabular}{ l c }     
+  lowercase &  integer \\
-  %\hline
+  bold lowercase&   vector \\
-  \textbf{x} & the vector x \\
+  bold capital &  matrix \\
-  $k$ & the desired (pruned) forest size \\
+  calligraphic letters & vector space \\
-  $X$ & the matrix $X$ \\
+  $F_{T_1, \dots, T_l}$ & a forest of $l$ trees \\
-  ${\cal X}$ & the data representation space \\
+  $F_{T_1, \dots, T_l}(\textbf{x}) \in {\cal Y}$ & the label predicted for \textbf{x} by the forest $F_{T_1, \dots, T_l}$ \\
-  ${\cal Y}$ & the label representation space \\
+  $F_{T_1, \dots, T_l}(\textbf{X}) \in {\cal Y}^n$ & the labels predicted for all rows of $\textbf{X}$ by the forest $F_{T_1, \dots, T_l}$\\
-  $n$ & the number of data\\
+  $n$ & the number of data points \\
  $d$ & the data dimension \\
-  $l$ & the forest size \\
+  $l$ & the initial forest size \\
-  $F_{t_1, \dots, t_l}$ & a forest of $l$ trees \\
+  $k$ & the desired (pruned) forest size \\
-  $F_{t_1, \dots, t_l}(\textbf{x}) \in {\cal Y}$ & the predicted label of \textbf{x} by the forest $F_{t_1, \dots, t_l}$ \\
-  $F_{t_1, \dots, t_l}(X) \in {\cal Y}^n$ & the predicted label of all the data of $X$ by the forest $F_{t_1, \dots, t_l}$\\
-  %\hline
 \end{tabular} 
-\caption{Notations} 
+\caption{Notation used in this paper}
+\label{table: notation}
 \end{table}\todo[inline]{add the notation conventions: bold lowercase: vector; non-bold uppercase: matrix, etc.}
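
The two aggregation rules for $H$ in this hunk (weighted sum for regression, majority vote for classification) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not code from the report: the names `forest_regression`, `forest_classification` and `forest_predict_all` are hypothetical, and each tree is assumed to be a plain callable mapping one sample to a prediction.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

# Assumption: a tree T_i is any callable taking one sample x (a sequence of
# d floats) and returning a prediction. These names are hypothetical.
Sample = Sequence[float]


def forest_regression(trees: List[Callable[[Sample], float]],
                      alphas: Sequence[float],
                      x: Sample) -> float:
    """Regression rule: H({T_1..T_l}, x) = sum_i alpha_i * T_i(x)."""
    if len(trees) != len(alphas):
        raise ValueError("one weight alpha_i is needed per tree")
    return sum(alpha * tree(x) for alpha, tree in zip(alphas, trees))


def forest_classification(trees: List[Callable[[Sample], Hashable]],
                          x: Sample) -> Hashable:
    """Classification rule: argmax_c sum_i 1(T_i(x) = c), a majority vote.

    Ties are broken in favour of the class voted for first (an assumption;
    the report does not specify a tie-breaking rule).
    """
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


def forest_predict_all(predict_one: Callable[[Sample], object],
                       X: Sequence[Sample]) -> list:
    """Prediction vector F(X) = (F(x_1), ..., F(x_n)), one entry per row."""
    return [predict_one(x) for x in X]
```

For example, `forest_predict_all(lambda x: forest_classification(trees, x), X)` would build the vector $F_{T_1, \dots, T_l}(\textbf{X})$ row by row, given a list `trees` of callables and a list of rows `X`.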