add README

Commit c1e0fccf authored Nov 30, 2016 by Benoit Favre
--- a/README
+++ b/README
+SUMMARIZATION MODULE
+====================
+This module generates synopses from the conversations you give it. It learns
+features from annotations on existing synopses and applies them to new
+conversations. This file lists the requirements and describes what to change
+if you want to improve the method or adapt it to your data.
+
+HOW TO
+======
+See example.py
+
+REQUIREMENTS
+============
+- Python 2.7
+- Icsiboost
+
+FILES & ANNOTATIONS
+===================
+source/syn.annot
+----------------
+Each synopsis annotation has the following form:
+topic conversation_ID Annotator text <a class="instance" variable="$SLOT_NAME" 
+style="color:cyan" title="$SLOT_NAME" href="#"> slot_value </a> text.
+
+/!\ If you create new slot names, make sure to update icsiboost.names /!\
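+
+A minimal sketch of reading the slots out of one annotation, assuming the tag
+always follows the template above (class first, then variable). The function
+names and the sample sentence are hypothetical; they are not part of this
+module.

```python
import re

# One annotated slot. Assumes the attribute order shown in the README template.
SLOT_RE = re.compile(
    r'<a class="instance" variable="([^"]+)"[^>]*>(.*?)</a>', re.DOTALL
)


def extract_slots(annotation):
    """Return the (slot_name, slot_value) pairs found in one annotation."""
    return [(name, value.strip()) for name, value in SLOT_RE.findall(annotation)]


def strip_markup(annotation):
    """Return the synopsis text with each slot tag replaced by its value."""
    text = SLOT_RE.sub(lambda m: m.group(2).strip(), annotation)
    return " ".join(text.split())  # normalize whitespace and line breaks


# Hypothetical annotation text, for illustration only.
sample = ('the caller asks about <a class="instance" variable="$ITEM" \n'
          'style="color:cyan" title="$ITEM" href="#"> a lost bag </a> on the bus.')
slots = extract_slots(sample)
plain = strip_markup(sample)
```

+Collecting the slot names this way gives a list to compare against the
+names declared in icsiboost.names.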
+
+conversation files
+------------------
+The TSV format was defined for storing Decoda annotations. Each word has the
+following fields:
+<filename> <global-wordnum> <wordnum-in-sentence> <word> NULL <postag> NULL NULL 
+<dependency-label> <governor-wordnum> <text-id> <lemma> <morphology> <speaker> 
+0.0 0.0 0.0 _ <mention> <features> <coreference-label>
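+
+A minimal sketch of reading one word line into named fields, assuming the
+21 columns are tab-separated and appear in the order listed above. The
+function name, the field keys and the sample values are hypothetical.

```python
# Column index of each named field; the constant columns (NULL, 0.0, _) are skipped.
FIELDS = {
    "filename": 0,
    "global_wordnum": 1,
    "wordnum_in_sentence": 2,
    "word": 3,
    "postag": 5,
    "dependency_label": 8,
    "governor_wordnum": 9,
    "text_id": 10,
    "lemma": 11,
    "morphology": 12,
    "speaker": 13,
    "mention": 18,
    "features": 19,
    "coreference_label": 20,
}
NUM_COLUMNS = 21


def parse_word_line(line):
    """Split one TSV line and return a dict of the named fields (as strings)."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != NUM_COLUMNS:
        raise ValueError("expected %d columns, got %d" % (NUM_COLUMNS, len(columns)))
    return {name: columns[index] for name, index in FIELDS.items()}


# Hypothetical line, for illustration only.
sample_line = "\t".join([
    "conv1", "12", "3", "bonjour", "NULL", "INT", "NULL", "NULL",
    "root", "0", "t1", "bonjour", "_", "spk1",
    "0.0", "0.0", "0.0", "_", "O", "_", "_",
]) + "\n"
word = parse_word_line(sample_line)
```

+All values are kept as strings; convert the word numbers with int() where
+needed.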
+
+predsyn.py
+----------
+Generates the data needed to learn the features.
+For each phrase, it gives you:
+- name of the conversation
+- word (text)
+- postag
+- lemma
+- named entity
+- parent
+- parent pos
+- dependency label
+- topic
+- length
+- sentence number
+- speaker
+
+summarizer.py
+-------------
+Generates the final synopsis from all the previous data.
+    Example:
+        import summarizer
+        #conv = summarizer.Word()
+        summarizer.summarize(conversation, threshold, convID)
+
+    threshold is a fixed value from which results start to be considered. In
+    our experiments, 0.02 was optimal for the icsiboost system.
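+
+One possible reading of the threshold, sketched under the assumption that
+each candidate carries a classifier score and that candidates scoring below
+the threshold are discarded. select_candidates and its input format are
+hypothetical; summarizer.py may apply the threshold differently.

```python
def select_candidates(scored, threshold=0.02):
    """Keep (candidate, score) pairs scoring at least threshold, best first."""
    kept = [(candidate, score) for candidate, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores, for illustration only.
selected = select_candidates([("a", 0.5), ("b", 0.01), ("c", 0.02)])
```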