Commit c1e0fccf authored by Benoit Favre

add README

parent fffe9643
SUMMARIZATION MODULE
====================
This module generates synopses of the conversations you give it. It relies on
features learned from annotated synopses of previous conversations, which are
then applied to a new conversation. This file lists the requirements and
explains what to change if you want to improve the method or adapt it to your
data.
HOW TO
======
See example.py
NEEDS
=====
- Python 2.7
- Icsiboost
FILES & ANNOTATIONS
===================
source/syn.annot
----------------
Each synopsis annotation has the following form:
topic conversation_ID Annotator text <a class="instance" variable="$SLOT_NAME"
style="color:cyan" title="$SLOT_NAME" href="#"> slot_value </a> text.
/!\ If you create new slot names, make sure to update icsiboost.names /!\
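
As an illustration of the annotation layout described above, here is a minimal
sketch of a reader for one annotation line. It assumes that topic,
conversation ID and annotator are single whitespace-separated tokens and that
the leading "$" is not part of the slot name; parse_annotation and the sample
line are made up for this example and are not part of the module.

```python
import re

# One annotated slot: <a class="instance" variable="$NAME" ...> value </a>
# Assumption: the optional "$" prefix is dropped from the captured slot name.
SLOT_RE = re.compile(
    r'<a\s+class="instance"\s+variable="\$?([^"]+)"[^>]*>(.*?)</a>',
    re.DOTALL,
)


def parse_annotation(line):
    """Return (topic, conversation_id, annotator, slots, plain_text)."""
    # Assumption: the first three fields contain no whitespace.
    topic, conv_id, annotator, text = line.split(None, 3)
    slots = [(name, value.strip()) for name, value in SLOT_RE.findall(text)]
    # Replace each tag by its bare value to recover the plain synopsis.
    plain = SLOT_RE.sub(lambda m: m.group(2).strip(), text)
    plain = " ".join(plain.split())
    return topic, conv_id, annotator, slots, plain


# Made-up sample line, for illustration only.
example = (
    'itinerary conv_001 annotator1 Asks about the '
    '<a class="instance" variable="$TRANSPORT" style="color:cyan" '
    'title="$TRANSPORT" href="#"> bus </a> schedule.'
)
print(parse_annotation(example))
```

Listing the slot names found this way is a quick check that every slot used in
source/syn.annot is also declared in icsiboost.names.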
conversation files
------------------
The TSV format was defined for storing Decoda annotations. Each word has the
following fields:
<filename> <global-wordnum> <wordnum-in-sentence> <word> NULL <postag> NULL NULL
<dependency-label> <governor-wordnum> <text-id> <lemma> <morphology> <speaker>
0.0 0.0 0.0 _ <mention> <features> <coreference-label>
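
The field list above can be read into named columns. This is a minimal sketch
that assumes the fields are tab-separated and that every line has exactly 21
columns; the names unused1 to unused7 are made up here for the placeholder
columns (NULL, 0.0, _), and the sample row is invented.

```python
from collections import namedtuple

# Column names follow the field list; placeholder columns get made-up names.
FIELDS = [
    "filename", "global_wordnum", "wordnum_in_sentence", "word", "unused1",
    "postag", "unused2", "unused3", "dependency_label", "governor_wordnum",
    "text_id", "lemma", "morphology", "speaker", "unused4", "unused5",
    "unused6", "unused7", "mention", "features", "coreference_label",
]
Token = namedtuple("Token", FIELDS)


def parse_token(line):
    """Parse one word line into a Token; all values are kept as strings."""
    cols = line.rstrip("\n").split("\t")  # assumption: tab-separated
    if len(cols) != len(FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELDS), len(cols))
        )
    return Token(*cols)


# Made-up sample row, for illustration only.
sample = "\t".join([
    "conv_001", "12", "3", "bus", "NULL", "NC", "NULL", "NULL", "OBJ", "2",
    "1", "bus", "n", "spk1", "0.0", "0.0", "0.0", "_", "_", "_", "_",
])
token = parse_token(sample)
print("%s %s %s" % (token.word, token.postag, token.speaker))
```

Keeping the values as strings avoids guessing the type of fields whose content
is not described here.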
predsyn.py
----------
Generates the data needed to learn the features.
For each phrase, it outputs:
- name of the conversation
- word (text)
- postag
- lemma
- named entity
- parent
- parent pos
- dependency label
- topic
- length
- sentence number
- speaker
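
The twelve values listed above can be pictured as one record per phrase. The
sketch below only illustrates that structure: the PhraseFeatures type, the
to_row helper, the tab separator and the sample values are all assumptions,
since the actual output layout of predsyn.py is not described here.

```python
from collections import namedtuple

# One record per phrase, in the order of the list above (names are made up).
PhraseFeatures = namedtuple("PhraseFeatures", [
    "conversation", "text", "postag", "lemma", "named_entity", "parent",
    "parent_pos", "dependency_label", "topic", "length", "sentence_number",
    "speaker",
])


def to_row(features, sep="\t"):
    """Serialize one record as a single line (hypothetical layout)."""
    return sep.join(str(value) for value in features)


# Made-up values, for illustration only.
phrase = PhraseFeatures(
    conversation="conv_001", text="the bus", postag="NC", lemma="bus",
    named_entity="_", parent="take", parent_pos="V", dependency_label="OBJ",
    topic="itinerary", length=2, sentence_number=4, speaker="spk1",
)
print(to_row(phrase))
```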
summarizer.py
-------------
Generates the final synopsis from all the previous data.
Example:
import summarizer
#conv = summarizer.Word()
summarizer.summarize(conversation, threshold, convID)
threshold is the fixed value from which results are taken into account. In our
experiments, 0.02 was the optimal value for the icsiboost system.
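
To make the role of the threshold concrete, here is a minimal sketch of one
possible reading: candidates scored below the threshold are ignored, and the
best remaining candidate is kept for each slot. How summarizer.summarize
applies the threshold internally is not described here, so select_candidates,
its input format and the sample scores are assumptions.

```python
def select_candidates(scored, threshold=0.02):
    """Keep, per slot, the best candidate whose score reaches the threshold.

    scored is an iterable of (slot_name, value, score) triples (assumed format).
    """
    best = {}
    for slot, value, score in scored:
        if score < threshold:
            continue  # below the cutoff: not considered at all
        if slot not in best or score > best[slot][1]:
            best[slot] = (value, score)
    return {slot: value for slot, (value, score) in best.items()}


# Made-up scores, for illustration only.
candidates = [
    ("TRANSPORT", "bus", 0.31),
    ("TRANSPORT", "metro", 0.05),
    ("TIME", "noon", 0.01),
]
print(select_candidates(candidates))
```

With these sample scores, TIME is left unfilled because its only candidate
falls below 0.02.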