Commit 7666a1f2 authored by valentin.emiya

minor updates in mad_array class and tutorial

parent 8345a2e2
%% Cell type:code id: tags:
``` python
%pylab inline
```
%% Cell type:markdown id: tags:
# Tutorial on how to use `MadArray` objects
A `MadArray` is a numpy array with missing elements. It is built from three kinds of parameters:
* **data** as an array of entries, either *int*, *float* or *complex*;
* a **mask** indicating the missing entries;
* **options** to define the behaviour of the object.
A basic initialisation requires only a data matrix. Without a mask, all elements are considered non-missing.
%% Cell type:code id: tags:
``` python
from madarrays import MadArray
# initialisation without mask
data = np.random.rand(4, 6)
A = MadArray(data)
print(A)
```
%% Cell type:markdown id: tags:
## Masking
The masking of data differs according to the type of entries:
* if the data entries are not *complex* (e.g., *int* or *float*), the argument `mask` must be a boolean array with the same shape as the data array, each entry indicating whether the corresponding entry in the data array is missing;
* if the data entries are *complex*, the masking can be done as above, or by setting the argument `complex_masking` to `True` and giving two boolean arrays `mask_magnitude` and `mask_phase` with the same shape as the data array, each entry indicating whether the magnitude or the phase, respectively, of the corresponding entry is missing.
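As a rough illustration of what separate magnitude and phase masks mean, the NumPy-only sketch below splits complex entries into magnitude and phase and blanks the masked parts. It does not use `madarrays`: the names `mag_known`, `pha_known` and `fully_known`, and the use of NaN as a placeholder, are assumptions made for this example, and `True` in a mask is assumed to mean "missing".

```python
import numpy as np

# Small complex array; values chosen by hand so the result is easy to check.
data = np.array([1 + 1j, 2 + 0j, 0 + 3j, -1 + 0j])

# Assumption: True marks a missing magnitude / phase.
mask_magnitude = np.array([False, True, False, True])
mask_phase = np.array([False, False, True, True])

magnitude = np.abs(data)
phase = np.angle(data)

# Keep only the parts that are known; NaN is just a placeholder here.
mag_known = np.where(mask_magnitude, np.nan, magnitude)
pha_known = np.where(mask_phase, np.nan, phase)

# An entry is fully known only if both its magnitude and its phase are known.
fully_known = ~mask_magnitude & ~mask_phase

print(mag_known)
print(pha_known)
print(fully_known)
```

With two independent masks, an entry can have a known magnitude and an unknown phase (or the reverse), which a single boolean mask cannot express.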
%% Cell type:code id: tags:
``` python
# initialization with a mask
mask = np.random.random(data.shape) < 0.5
Am = MadArray(data, mask)
print(mask)
print(Am)
```
%% Cell type:markdown id: tags:
A *MadArray* can also be defined from another *MadArray*, for example to copy the object:
%% Cell type:code id: tags:
``` python
Am2 = MadArray(Am)
print('{} - {}'.format(str(Am), repr(Am)))
print('{} - {}'.format(str(Am2), repr(Am2)))
```
%% Cell type:markdown id: tags:
A different mask can also be used:
%% Cell type:code id: tags:
``` python
mask2 = np.random.random(data.shape) < 0.9
Am3 = MadArray(Am, mask2)
print('{} - {}'.format(str(Am), repr(Am)))
print('{} - {}'.format(str(Am3), repr(Am3)))
```
%% Cell type:markdown id: tags:
For complex data:
%% Cell type:code id: tags:
``` python
import madarrays
complex_data = np.random.rand(4, 6) + 1j * np.random.rand(4, 6)
mask_mag = np.random.random(data.shape) < 0.5
mask_pha = np.random.random(data.shape) < 0.5
A_cpx1 = MadArray(complex_data, mask)
A_cpx2 = MadArray(complex_data, mask_magnitude=mask_mag, mask_phase=mask_pha, complex_masking=True)
print('{} - {}'.format(str(A_cpx1), repr(A_cpx1)))
print('{} - {}'.format(str(A_cpx2), repr(A_cpx2)))
print('Magnitude mask', mask_mag)
print('Phase mask', mask_pha)
```
%% Cell type:markdown id: tags:
## Properties
A *MadArray* has attributes that give information about the masking.
%% Cell type:code id: tags:
``` python
# mask of non-missing elements
print(Am.known_mask)
```
%% Cell type:code id: tags:
``` python
# mask of missing elements
print(Am.unknown_mask)
```
%% Cell type:code id: tags:
``` python
print('Is masked: {}'.format(Am.is_masked))
print('Ratio missing data: {}'.format(Am.ratio_missing_data))
```
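%% Cell type:markdown id: tags:
To make the meaning of these attributes concrete, here is a small NumPy-only sketch of how such quantities relate to a plain boolean mask. It is not the `madarrays` implementation: the variables merely reuse the attribute names for readability, and `True` is assumed to mark a missing entry.

```python
import numpy as np

# Assumption: True in the mask marks a missing entry.
mask = np.array([[True, False, False],
                 [False, True, False]])

unknown_mask = mask            # missing entries
known_mask = ~mask             # non-missing entries
is_masked = bool(mask.any())   # at least one entry is missing
ratio_missing = mask.mean()    # fraction of missing entries

print(known_mask)
print(is_masked, ratio_missing)
```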
%% Cell type:markdown id: tags:
## Indexing
There are two different and incompatible ways to index a *MadArray*. By default (`masked_indexing=False`), indexing behaves as for an *ndarray*: both the data matrix and the mask are indexed, and a *MadArray* with the shape defined by the indices is returned:
%% Cell type:code id: tags:
``` python
print(A[0:3, 1:3])
print(Am[0:3, 1:3])
```
%% Cell type:markdown id: tags:
In the other mode (`masked_indexing=True`), a *MadArray* with unchanged shape is returned, in which non-indexed entries are considered masked.
%% Cell type:code id: tags:
``` python
Am4 = MadArray(data, mask, masked_indexing=True)
print(Am4[0:3, 1:3])
```
%% Cell type:markdown id: tags:
This latter approach is designed for use with *scikit-learn* procedures.
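The difference between the two indexing modes can be sketched with plain NumPy arrays. This is only an illustration of the idea, not the code used by `MadArray`; the names `sub_data`, `sub_mask` and `kept_mask` are hypothetical, and `True` is assumed to mark a missing entry.

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(3, 4)
mask = np.zeros(data.shape, dtype=bool)  # assumption: True marks a missing entry
mask[0, 1] = True

index = (slice(0, 2), slice(1, 3))

# Default-style indexing: data and mask are both sliced, so the shape shrinks.
sub_data, sub_mask = data[index], mask[index]

# Shape-preserving indexing: start from "everything missing" and restore the
# original mask only on the indexed entries.
kept_mask = np.ones(data.shape, dtype=bool)
kept_mask[index] = mask[index]

print(sub_data.shape, sub_mask.sum())
print(kept_mask)
```

Keeping the shape fixed means that every subset of the data has the same dimensions as the full array, which is convenient when a procedure expects inputs of constant shape.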
%% Cell type:markdown id: tags:
## Numerical operations
Numpy functions can be applied to a *MadArray*, but they do **not** take the mask into account:
%% Cell type:code id: tags:
``` python
print(np.mean(A))
print(np.mean(Am))
```
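%% Cell type:markdown id: tags:
The practical consequence can be seen with a plain array and a separate boolean mask. The sketch below uses NumPy only and assumes that `True` marks a missing entry; restricting the computation to known entries, as done for `mean_known`, is one possible workaround and not a feature of `MadArray` shown in this tutorial.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 100.0])
mask = np.array([False, False, False, True])  # assumption: True marks a missing entry

mean_all = np.mean(data)            # what a mask-unaware function computes
mean_known = np.mean(data[~mask])   # mean restricted to the known entries

print(mean_all, mean_known)
```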
%% Cell type:code id: tags:
``` python
```
......
@@ -50,8 +50,9 @@ import numpy as np
 def _merge_masks(ma1, ma2):
-    """Merge the masks of two MadArray objects and return the arguments used
-    for initialisation of the resulting MadArray.
+    """Merge the masks of two :class:`MadArray` objects and return the
+    arguments used for initialisation of the resulting :class:`MadArray`
+    object.
     Parameters
     ----------
@@ -63,7 +64,8 @@ def _merge_masks(ma1, ma2):
     Returns
     -------
     dict
-        Arguments to be used for the initialisation of a MadArray.
+        Arguments to be used for the initialisation of a :class:`MadArray`
+        object.
     """
     if ma1._complex_masking or ma2._complex_masking:
@@ -134,24 +136,28 @@ class MadArray(np.ndarray):
     * *3* if the magnitude and the phase are masked (unknown magnitude and
       phase).
-    This mode is selected by setting :paramref:`complex_masking` to True.
-    Entries are converted to a complex type.
+    This mode is selected by setting :paramref:`complex_masking` to `True`.
+    Entries are converted to a complex type. If entries are complex values
+    and :paramref:`complex_masking` is `False`, both the magnitude and
+    phase are masked.
     .. _indexing_madarray:
-    **Indexing**: two different mode to index a MadArray are implemented:
+    **Indexing**: two different modes to index a :class:`MadArray` object are
+    implemented:
-    * a MadArray array of shape corresponding to the indices is returned, with
-      both the data matrix and the mask properly indexed. This is the default
-      mode;
-    * a MadArray with the shape unchanged is returned, where non-indexed
-      entries are set as masked. This mode is selected by setting the parameter
-      :paramref:`masked_indexing` to True.
+    * a :class:`MadArray` object with shape corresponding to the indices is
+      returned, with both the data matrix and the mask properly indexed. This
+      is the default mode;
+    * a :class:`MadArray` object with unchanged shape is returned,
+      where non-indexed entries are set as masked. This mode is selected by
+      setting the parameter :paramref:`masked_indexing` to True.
     .. _numpy_behaviour_madarray:
     **Numpy behaviour**: it is possible to use standard operations (+, -, /,
-    //, \*, T) between two :class:`MadArray`, likewise operations between numpy
+    //, \*, T) between two :class:`MadArray` objects, likewise operations
+    between numpy
     arrays. The resulting object has a mask consisting of the union of the
     operands. It is also possible to use pickle operations to jointly store the
     data and the mask.
@@ -190,10 +196,10 @@ class MadArray(np.ndarray):
     Notes
     -----
     This class implements an alternative masked array different from
-    np.ma.MadArray. The reason of this choice is that it is only used as a
-    container of a ndarray and a mask. No masked operations are needed.
+    :class:`np.ma.MaskedArray`. The reason of this choice is that it is only
+    used as a container of a ndarray and a mask. No masked operations are
+    needed.
     """
     def __new__(cls, data, mask=None, mask_magnitude=None, mask_phase=None,
                 masked_indexing=None, complex_masking=None, **kwargs):
@@ -703,11 +709,20 @@ class MadArray(np.ndarray):
         if np.issubdtype(self.dtype, np.integer):
             arr_str = arr_str.replace('.', '')
-        string = 'MadArray, dtype={}, {} missing entries ({:.1f}%)\n{}'
-        return string.format(self.dtype,
-                             self.n_missing_data,
-                             100 * self.ratio_missing_data, arr_str)
+        if self._complex_masking:
+            string = 'MadArray, dtype={0}, ' \
+                     '{1[0]} missing magnitudes ({2[0]:.1%}) ' \
+                     'and {1[1]} missing phases ({2[1]:.1%})\n{3}'
+            return string.format(self.dtype,
+                                 self.n_missing_data,
+                                 self.ratio_missing_data,
+                                 arr_str)
+        else:
+            string = 'MadArray, dtype={}, {} missing entries ({:.1%})\n{}'
+            return string.format(self.dtype,
+                                 self.n_missing_data,
+                                 self.ratio_missing_data, arr_str)
     def __repr__(self):
         string = '<MadArray at {}>'
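The change from `{:.1f}%` to `{:.1%}` in the hunk above relies on standard Python string formatting, which can be checked in isolation. The sketch below uses only built-in `str.format`; the tuples `counts` and `ratios` are made-up values for illustration, not output produced by `MadArray`.

```python
ratio = 0.25

# Old style: scale by hand and append a literal percent sign.
old = '{:.1f}%'.format(100 * ratio)
# New style: the '%' presentation type multiplies by 100 and adds the sign.
new = '{:.1%}'.format(ratio)

# Positional fields can also be indexed, as in '{1[0]}' above.
# Hypothetical (magnitude, phase) pairs, chosen by hand.
counts = (3, 5)
ratios = (0.125, 0.5)
line = ('{0[0]} missing magnitudes ({1[0]:.1%}) '
        'and {0[1]} missing phases ({1[1]:.1%})').format(counts, ratios)

print(old, new)
print(line)
```

Both styles give the same text for a scalar ratio, so dropping the explicit `100 *` factor keeps the single-mask output unchanged while letting the same format specifier be applied to each element of a pair.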
......