From c1e0fccf246fa73f8e756b4cda7afa0c3b01366a Mon Sep 17 00:00:00 2001
From: Benoit Favre <benoit.favre@lif.univ-mrs.fr>
Date: Wed, 30 Nov 2016 09:39:54 +0100
Subject: [PATCH] add README

---
 README | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 README

diff --git a/README b/README
new file mode 100644
index 0000000..3917d3b
--- /dev/null
+++ b/README
@@ -0,0 +1,61 @@
+SUMMARIZATION MODULE
+====================
+This module generates synopses from the conversation you give it. It learns
+features from annotated synopses of previous conversations and applies them to
+a new conversation. This file lists the requirements and what you need to
+change if you want to improve the method or adapt it to fit your data.
+
+HOW TO
+======
+See example.py
+
+REQUIREMENTS
+============
+- Python 2.7
+- Icsiboost
+
+FILES & ANNOTATIONS
+===================
+source/syn.annot
+----------------
+The synopsis annotations look like this:
+topic conversation_ID Annotator text <a class="instance" variable="$SLOT_NAME" 
+style="color:cyan" title="$SLOT_NAME" href="#"> slot_value </a> text.
+
+/!\ If you create new slot names, make sure to update icsiboost.names /!\
+
+conversation files
+------------------
+The TSV format was defined for storing Decoda annotations. Each word has the
+following fields:
+<filename> <global-wordnum> <wordnum-in-sentence> <word> NULL <postag> NULL NULL 
+<dependency-label> <governor-wordnum> <text-id> <lemma> <morphology> <speaker> 
+0.0 0.0 0.0 _ <mention> <features> <coreference-label>
+
+predsyn.py
+----------
+Generates the data you need to learn the features.
+For each phrase, it gives you:
+- name of the conversation
+- word (text)
+- postag
+- lemma
+- named entity
+- parent
+- parent pos
+- dependency label
+- topic
+- length
+- sentence number
+- speaker
+
+summarizer.py
+-------------
+Generates the final synopsis based on all the previous data.
+    example:
+        import summarizer
+        #conv = summarizer.Word()
+        summarizer.summarize(conversation, threshold, convID)
+
+    threshold is the fixed value from which results are considered. In our
+    experiment, the optimal value was 0.02 for the icsiboost system.
-- 
GitLab