# ML QUANTUM SEPARABILITY
The library is a toolbox written in Python dedicated to the efficient generation of large-scale labeled datasets for the quantum separability problem in high-dimensional settings.
The repo contains the code and datasets accompanying the paper *Large-Scale Quantum Separability Through a Reproducible Machine Learning Lens*.
## Dependencies
- Python (>= 3.6)
- NumPy (>= 1.23.5)
- SciPy (>= 1.10.0)
## Organisation
- src: source base directory containing all the Python source code.
- data.zip: simulated labeled dataset with thousands of bipartite separable and entangled density matrices of sizes 9 × 9 and 49 × 49 (two-qudit mixed states with d=3 or d=7).
- main.py: example code that learns from data the decision function between separable and entangled states (the classifier in this example is an SVM from scikit-learn).
- models.zip: contains SVM models trained on the quantum-separability dataset.
## Usage
### Pipeline
The library is organised around the `Pipeline` class, which lets you define a sampling strategy as a series of transformative steps applied to density matrices sampled from an initial probability distribution.
#### Example: PPT entangled density matrices
We give a typical use case in the following code snippet. The goal is to generate density matrices that are likely to be PPT and entangled.
```python
from types import save_dmstack, load_dmstack

states, infos = Pipeline([
    ('sample', RandomInduced(k_params=[25]).states),            # induced measure of parameter 25
    ('ppt only', select(PPT.is_respected, True)),               # keep states respecting the PPT criterion
    ('fw', add(FrankWolfe(1000).approximation, key='approx')),  # compute the separable approximation
    ('sel ent', select(DistToSep(0.01, sep_key='fw_approx').predict, Label.ENT))
]).sample(1000, [3, 3])
save_dmstack('states_3x3', states, infos)
```
In this example, the following procedure is repeated until we obtain 1000 density matrices in dimensions [3,3]:
- We sample a density matrix. The random density matrices are generated according to an induced measure.
- We keep only the density matrices satisfying the PPT criterion.
- We compute the nearest separable approximation of each density matrix with the Frank-Wolfe (FW) algorithm and store it in the infos dictionary under the key 'fw_approx'.
- We keep only the density matrices whose distance to their FW separable approximation is greater than 0.01.
The Pipeline class works with 3 types of functions: sample, transform and model. These are detailed below.
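The repeat-until-enough sampling loop described above can be sketched as follows. This is a toy stand-in, not the library's actual `Pipeline` code: the class name, the trivial sampler, and the selection step are all illustrative.

```python
# Toy sketch of a Pipeline-like class: chain (name, function) steps and keep
# sampling until enough density matrices survive every step. Illustrative only.
import numpy as np

class MiniPipeline:
    def __init__(self, steps):
        self.steps = steps  # ordered (name, function) pairs; first one samples

    def sample(self, n_states, dims):
        d = int(np.prod(dims))
        sampler = self.steps[0][1]
        kept = []
        while len(kept) < n_states:
            batch = sampler(n_states, d)
            for _, step in self.steps[1:]:
                batch = step(batch)  # each step filters or transforms the batch
            kept.extend(batch)
        return kept[:n_states]

rng = np.random.default_rng(0)
# Trivial sampler: random diagonal density matrices (always valid states).
sampler = lambda n, d: [np.diag(rng.dirichlet(np.ones(d))) for _ in range(n)]
# Example selection step: keep only fairly mixed states.
select_mixed = lambda batch: [rho for rho in batch if rho.diagonal().max() < 0.9]

states = MiniPipeline([('sample', sampler), ('mixed only', select_mixed)]).sample(10, [3, 3])
```

The loop keeps drawing fresh batches because selection steps can discard most candidates, as happens when filtering for rare PPT entangled states.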
#### DMStack
A DMStack is a class that represents a stack of density matrices. It is a numpy.ndarray of shape (n_matrices, ...) with an additional attribute dims, a list indicating the dimensions of the quantum subsystems.
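A minimal sketch of such an ndarray subclass carrying a `dims` attribute (the library's actual DMStack implementation may differ in details):

```python
# Sketch of a DMStack-like class: a numpy.ndarray subclass whose instances
# carry a `dims` list describing the quantum subsystem dimensions.
import numpy as np

class MiniDMStack(np.ndarray):
    def __new__(cls, matrices, dims):
        obj = np.asarray(matrices, dtype=complex).view(cls)
        obj.dims = list(dims)  # dimensions of the quantum subsystems
        return obj

    def __array_finalize__(self, obj):
        # Propagate `dims` through slicing and views.
        self.dims = getattr(obj, 'dims', None)

stack = MiniDMStack(np.zeros((10, 9, 9)), dims=[3, 3])
```

Because `__array_finalize__` copies the attribute, slices such as `stack[:2]` keep their `dims`.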
In the example above, the states and all the associated information are then saved to the file 'states_3x3' in .mat format using the function save_dmstack; they can be retrieved later via load_dmstack.
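Since the stacks are stored in .mat format, a round trip can be approximated with SciPy directly. This is a stand-in for save_dmstack/load_dmstack, which may store additional metadata:

```python
# Stand-in for the .mat save/load round trip using SciPy's io module
# (the library's save_dmstack/load_dmstack may record more information).
import os
import tempfile
import numpy as np
from scipy.io import savemat, loadmat

# Four maximally mixed 9x9 states as a placeholder stack.
states = np.stack([np.eye(9, dtype=complex) / 9 for _ in range(4)])

path = os.path.join(tempfile.gettempdir(), 'states_3x3.mat')
savemat(path, {'states': states, 'dims': [3, 3]})
loaded = loadmat(path)
```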
#### Sample
A sample function produces a set of density matrices with relevant information and has the signature:
```python
def sample(n_states : int, dims : list[int]) -> tuple[DMStack, dict]
```
The following sample functions can be found in the library:
- samplers.utils.FromSet
- samplers.pure.RandomHaar
- samplers.separable.RandomSeparable
- samplers.entangled.AugmentedPPTEnt
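As an illustration of the signature, here is a hedged sketch of a sampler (not the library's RandomInduced): random density matrices from an induced measure, obtained from a complex Ginibre matrix G of shape (d, k) via rho = G G† / Tr(G G†).

```python
# Hypothetical sampler matching the signature above: density matrices drawn
# from the induced measure of parameter k, via the Ginibre construction.
import numpy as np

def sample_induced(n_states, dims, k=25, seed=0):
    d = int(np.prod(dims))
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_states):
        g = rng.normal(size=(d, k)) + 1j * rng.normal(size=(d, k))
        rho = g @ g.conj().T                 # positive semidefinite by construction
        out.append(rho / np.trace(rho).real)  # normalise to unit trace
    return np.array(out), {'dims': list(dims)}

states, infos = sample_induced(3, [3, 3])
```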
#### Transform
A transform function applies a transformation to each density matrix in a set. It may use additional information about the states and produce new information:
```python
def transform(states : DMStack, infos : dict) -> tuple[DMStack, dict]
```
The following transform functions can be found in the library:
- transformers.sep_approximations.FrankWolfe
- transformers.representations.GellMann
- transformers.representations.Measures
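A hedged stand-in for a transform with this signature: map each complex density matrix to a real vector (real parts of the upper triangle plus imaginary parts of the strict upper triangle). This is not the library's GellMann transform, but it performs the same kind of complex-to-real re-representation with the same number (d²) of real degrees of freedom.

```python
# Hypothetical transform matching the signature above: vectorise each d x d
# Hermitian density matrix into d**2 real coefficients.
import numpy as np

def to_real_vectors(states, infos):
    d = states.shape[-1]
    iu, ju = np.triu_indices(d)        # upper triangle, diagonal included
    isu, jsu = np.triu_indices(d, 1)   # strict upper triangle
    vecs = np.array([
        np.concatenate([rho[iu, ju].real, rho[isu, jsu].imag])
        for rho in states
    ])
    infos = dict(infos, representation='real_vector')  # record new information
    return vecs, infos
```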
#### Model
A model function assigns a label to each density matrix in a set. It may use additional information about the states and produce new information:
```python
def model(states : DMStack, infos : dict) -> tuple[list[int], dict]
```
The following labeling models can be found in the library:
- models.criteria.PPT
- models.criteria.SepBall
- models.approx_based.DistToSep
- models.approx_based.WitQuality
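A hedged sketch of a model with this signature, based on the PPT criterion (not the library's models.criteria.PPT): take the partial transpose on the second subsystem; a negative eigenvalue certifies entanglement. The SEP/ENT label values here are illustrative.

```python
# Hypothetical model matching the signature above: label states via the PPT
# criterion. A negative partial-transpose eigenvalue proves entanglement
# (the converse holds only in dimensions 2x2 and 2x3).
import numpy as np

SEP, ENT = 0, 1  # illustrative label values

def ppt_model(states, infos, dims=(3, 3), tol=1e-10):
    dA, dB = dims
    labels = []
    for rho in states:
        r = rho.reshape(dA, dB, dA, dB)
        # Partial transpose on subsystem B: swap the two B indices.
        rho_pt = r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)
        min_eig = np.linalg.eigvalsh(rho_pt).min()
        labels.append(ENT if min_eig < -tol else SEP)
    return labels, infos
```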
## Data.zip
The simulated quantum separability dataset. The data are grouped by:
- dimensions (3x3 or 7x7),
- usage (TRAIN or TEST),
- category (SEP, PPT, NPPT, FW).
The content of each file can be accessed with the function types.load_dmstack, which returns a DMStack containing all the states and a dictionary containing information about each state.
For states of the PPT category, the dictionary contains an approximation of the optimal entanglement witness under the key 'fw_witness'.
In all datasets, the states are stored as complex density matrices.
Use the GellMann or Measures transformations to obtain a real-valued vector representation.
## Models.zip
The SVM models trained on the quantum-separability dataset. They are grouped by:
- dimensions of the input (3x3 or 7x7),
- creation method for the PPT-ENT examples (AUG for data augmentation, NOAUG for no data augmentation).
The type of the model and the proportion of PPT-ENT states used during training are indicated in the file name. For example, the files
`SVM_1000_[0.50]_(x)`
(with x an index in [0,4]) contain an SVM trained on a dataset of 1000 examples per class where 50% of the entangled examples were PPT-ENT.
All the models can be loaded with the function joblib.load and come as GridSearchCV models (from scikit-learn).
All the models in the library use the Gell-Mann representation of states as input.
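The loading step can be demonstrated end to end without models.zip by dumping and reloading a small stand-in GridSearchCV model; the tiny dataset, file path, and parameter grid below are all placeholders, not the repository's actual training setup.

```python
# Round-trip sketch: a GridSearchCV-wrapped SVM saved with joblib, then
# reloaded the same way the shipped models would be. Data and path are toys.
import os
import tempfile
import joblib
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data: the class is determined by the first coordinate.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]] * 3)
y = np.array([0, 1, 0, 1] * 3)
clf = GridSearchCV(SVC(), {'C': [1, 10]}, cv=2).fit(X, y)

path = os.path.join(tempfile.gettempdir(), 'svm_demo.joblib')
joblib.dump(clf, path)
model = joblib.load(path)  # same call used for the models in models.zip
```

After loading, the usual GridSearchCV attributes (`best_params_`, `predict`, ...) are available.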