    SENSEI processing environment
    =============================
    
    This pipeline is a set of annotation services which can be run against the
    conversation repository or run as standalone REST services.
    
    
    Install requirements
    --------------------
    
    The REST services require bottle.py. Install the Python requirements with:
    
      pip install --user -r requirements.txt -U
    
    
    REST services
    -------------
    
    Core modules can be exposed as REST services which provide annotations
    through a simple HTTP protocol. The backend uses the bottle.py framework with a
    server of your choice (the default WSGIRefServer is quite slow; you may want to
    install "paste" instead).
    
    For instance, the rest/test.py script runs a service which reverses character
    strings:
    
      rest/test.py
    
    It answers three kinds of requests. The root URL / returns a short textual help
    on using the service.
    
      curl http://localhost:1234
    
    It also recognizes GET requests of the form /test/<text>, where <text> is the
    string you want to reverse.
    
      curl http://localhost:1234/test/hello
    
    Finally, for large inputs, it accepts POST requests in which the input string
    is passed in a "text" form parameter.
    
      curl --form "text=hello" http://localhost:1234/test
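    The real rest/test.py is built on bottle.py. As a dependency-free illustration
    of the same three routes, here is a minimal sketch written against the standard
    library's wsgiref instead; the names app, serve and HELP and the help text are
    hypothetical. Unlike bottle, this sketch only parses URL-encoded POST bodies
    (curl --data "text=hello"), not the multipart form that curl --form sends.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical help text; the real service's wording may differ.
HELP = b"GET /test/<text> or POST /test with a 'text' field to reverse a string.\n"


def app(environ, start_response):
    """WSGI application mimicking the three routes of the test service."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")
    if method == "GET" and path == "/":
        body = HELP
    elif method == "GET" and path.startswith("/test/"):
        # WSGI servers hand over PATH_INFO already percent-decoded, as a
        # latin-1 string (PEP 3333); re-decode the raw bytes as UTF-8.
        text = path[len("/test/"):].encode("latin-1").decode("utf-8")
        body = text[::-1].encode("utf-8")
    elif method == "POST" and path == "/test":
        # Read a URL-encoded form body and reverse its "text" field.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
        body = form.get("text", [""])[0][::-1].encode("utf-8")
    else:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]


def serve(port=1234):
    """Serve the app on localhost (same port as the curl examples) until interrupted."""
    make_server("localhost", port, app).serve_forever()
```

    Calling serve() and then running curl http://localhost:1234/test/hello should
    print olleh, as with the bottle-based service.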
    
    A generic REST service is also available. It runs a custom command, feeds it
    the input through stdin and returns what the command writes to stdout. The
    following example simply echoes its input. The --name parameter sets the name
    of the service in the URL.
    
      rest/generic.py --port 1234 --name "cat" --command "cat"
      curl http://localhost:1234/cat/hello
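    In the default mode, serving one request amounts to starting the command,
    writing the input to its stdin and reading its stdout until it exits. A minimal
    sketch of that step with Python's subprocess module, assuming the command is
    given as a shell string as in the example above (the function name run_once is
    hypothetical, not taken from rest/generic.py):

```python
import subprocess


def run_once(command: str, text: str) -> str:
    """Start `command`, write `text` to its stdin and return its stdout.

    A new process is spawned for every input, as in the non-persistent mode.
    """
    result = subprocess.run(
        command,
        shell=True,           # the command is a shell string, as with --command
        input=text,
        capture_output=True,  # collect stdout (and stderr) instead of printing them
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

    For example, run_once("cat", "hello") returns "hello". Spawning one process per
    request is simple and isolates requests from each other, at the cost of the
    process start-up time on every call.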
    
    The command can also be made persistent: it is started once, runs in the
    background, receives its inputs line by line on stdin and answers line by line
    on stdout. For this to work, the command must flush stdout after each input,
    as the awk command below does with fflush():

      rest/generic.py --port 1234 --name "count" --command "awk '{x+=1;print x,\$0;fflush()}'" --persistent True
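    In persistent mode, a single child process is started once, and every request
    becomes one line written to its stdin followed by one line read back from its
    stdout. A minimal sketch of that exchange, assuming exactly one output line per
    input line; the class PersistentCommand is hypothetical and not part of
    rest/generic.py:

```python
import subprocess


class PersistentCommand:
    """Keep one child process running and exchange one line per query.

    Assumes the child answers each input line with exactly one output line
    and flushes stdout after it; otherwise readline() below would block.
    """

    def __init__(self, command):
        self.proc = subprocess.Popen(
            command,
            shell=isinstance(command, str),  # shell string or argument list
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def query(self, line: str) -> str:
        # Embedded newlines would desynchronize the line protocol, so this
        # sketch replaces them with spaces (an assumption, not documented).
        self.proc.stdin.write(line.replace("\n", " ") + "\n")
        self.proc.stdin.flush()  # make sure the child actually receives the line
        return self.proc.stdout.readline().rstrip("\n")

    def close(self) -> None:
        self.proc.stdin.close()  # EOF lets the child exit
        self.proc.wait()
        self.proc.stdout.close()
```

    Without the fflush() call in the awk example, awk would buffer its output when
    writing to a pipe, and readline() could block indefinitely waiting for an
    answer.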
    
    WARNING: the generic REST service does not enforce any kind of security.
    
    See the rest/test.py implementation for an example of writing your own REST
    services with the provided framework.
    
    
    Repository integration
    ----------------------
    
    Repository-integrated services poll the repository for new documents, process
    them and push back annotation sets. To access the repository, you can create an
    ssh tunnel (assuming a proper ssh key is set up):
    
      util/repository.py --tunnel
    
    The util/repository.py script offers many more commands. It can also be used
    as a Python module providing a fully functional repository client.
    
    Once the tunnel is set up, you can try the test annotator, which polls the
    repository and adds phony annotations. The script lets you choose the host and
    port of the repository.
    
      repo/test.py
    
    This script fetches all documents which lack the AMU_hasTest feature and stores
    the generated annotations in the AMU_Test annotation set. It then sets the
    AMU_hasTest feature to True on each processed document.
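    The select, annotate and mark cycle described above can be sketched against a
    hypothetical in-memory stand-in for the repository: a dict of documents, each
    with "features" and "annotation_sets" entries. This is not the actual
    repo/test.py code or the repository client API; the phony annotation reuses
    the fields of the checksum example further down in this README.

```python
def annotate_pending(documents, mark_feature="AMU_hasTest", set_name="AMU_Test"):
    """Annotate every document lacking `mark_feature`, then mark it.

    `documents` is a hypothetical in-memory stand-in for the repository:
    {doc_id: {"features": {...}, "annotation_sets": {...}}}.
    Returns the ids of the documents processed in this pass.
    """
    processed = []
    for doc_id, doc in documents.items():
        if mark_feature in doc["features"]:
            continue  # already processed in an earlier pass
        # Phony annotation standing in for real output.
        doc["annotation_sets"][set_name] = [
            {"type": "test", "features": {}, "start": 0, "end": 0}
        ]
        # Setting the feature keeps the document out of the next selection.
        doc["features"][mark_feature] = True
        processed.append(doc_id)
    return processed
```

    Because a document is marked only after its annotation set is stored, a second
    pass over the same documents finds nothing left to do.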
    
    After a few documents are processed, you can kill the script and run the
    cleaner, which deletes all annotation sets created by the previous script.
    
      repo/clean.py --presence_feature AMU_hasTest --annotation_set AMU_Test
    
    The first way to integrate a new processing module with the repository is to
    use the generic annotator. It fetches all documents which lack a given feature
    and passes each one as JSON, through stdin, to a custom command. It then reads
    the generated annotations as JSON from the command's stdout, creates a new
    annotation set with the result and marks the document as processed.
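    For the one-run-per-document mode, the exchange with the custom command can be
    sketched as follows. It assumes the document is written as a single JSON line
    and that the command prints a JSON list of annotations, as the checksum example
    in this README does; annotate_with_command is a hypothetical name, not part of
    repo/generic.py.

```python
import json
import subprocess


def annotate_with_command(command: str, document: dict) -> list:
    """Send `document` as one JSON line on stdin; parse a JSON list from stdout."""
    result = subprocess.run(
        command,
        shell=True,
        input=json.dumps(document) + "\n",  # json.dumps emits a single line
        capture_output=True,
        text=True,
        check=True,  # a failing command raises instead of producing bad output
    )
    annotations = json.loads(result.stdout)
    if not isinstance(annotations, list):
        raise ValueError("expected a JSON list of annotations")
    return annotations
```

    The returned list is what would then be stored as the new annotation set before
    the document is marked as processed.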
    
    Note that for the repository to accept new annotations, they must conform to
    its expectations and contain the right fields. Otherwise, an error is returned.
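    Catching malformed output locally can save a round trip to the repository. The
    sketch below checks annotations against the four fields used in the checksum
    example of this README (type, features, start, end). The repository's real
    schema may require more or different fields, so REQUIRED_FIELDS and the
    start <= end rule are assumptions, not the repository's documented rules.

```python
# Assumed schema, taken from the fields of the checksum example; the
# repository may expect additional or different fields.
REQUIRED_FIELDS = {"type": str, "features": dict, "start": int, "end": int}


def check_annotations(annotations):
    """Return a list of problems; an empty list means the check passed."""
    if not isinstance(annotations, list):
        return ["expected a JSON list of annotations"]
    problems = []
    for i, ann in enumerate(annotations):
        if not isinstance(ann, dict):
            problems.append(f"annotation {i}: not a JSON object")
            continue
        for field, expected in REQUIRED_FIELDS.items():
            if field not in ann:
                problems.append(f"annotation {i}: missing '{field}'")
            elif not isinstance(ann[field], expected):
                problems.append(
                    f"annotation {i}: '{field}' should be {expected.__name__}")
        # Assumed rule: a span must not end before it starts.
        start, end = ann.get("start"), ann.get("end")
        if isinstance(start, int) and isinstance(end, int) and start > end:
            problems.append(f"annotation {i}: start is after end")
    return problems
```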
    
    For example, the following script.sh computes the length of the JSON
    representation of each document and returns it as a new "checksum" annotation
    (make it executable with chmod +x script.sh):

      cat script.sh
      awk '{print "[{\"type\": \"checksum\", \"features\": {\"value\":"length()"}, \"start\": 0, \"end\": 0}]"}'

    It can then be plugged into the generic annotator:

      repo/generic.py --command ./script.sh --mark_feature "AMU_hasChecksum" --annotation "AMU_Checksum"
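    The awk one-liner can also be written in Python, which makes it easier to grow
    into a real annotator. A minimal sketch of an equivalent script (the function
    names are hypothetical): it counts characters, whereas awk's length() may count
    bytes depending on the awk implementation and locale, so the two can differ on
    non-ASCII input. Writing and flushing one line per input line also makes it
    usable when the command is kept running in the background.

```python
import json
import sys


def checksum_annotation(line: str) -> str:
    """Build the annotation list for one JSON document line.

    Mirrors the awk example: "value" is the length of the line without its
    trailing newline, and the annotation span is empty (start = end = 0).
    """
    value = len(line.rstrip("\n"))
    return json.dumps([
        {"type": "checksum", "features": {"value": value}, "start": 0, "end": 0}
    ])


def main(stdin=None, stdout=None):
    """Answer each input line with one annotation line, flushed immediately."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    for line in stdin:
        stdout.write(checksum_annotation(line) + "\n")
        stdout.flush()  # required for the persistent (background) mode
```

    To use it in place of script.sh, save it to a file, add a call to main() at the
    end, and pass something like --command "python3 checksum.py" (checksum.py being
    a hypothetical file name).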
    
    The command can either be run once per document or started once and kept
    running in the background. In the latter mode, the command is fed line by line
    and its output is read line by line. For this mode to work, the command MUST
    flush stdout after processing each input. See the generic REST service example
    for more details.
    
    The second way to integrate a new processing module with the repository is to
    subclass the repository.AnnotationGenerator class in Python. See the
    repo/test.py script for an example.
    
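      import repository  # assumes util/repository.py is on the Python path

      class Annotator(repository.AnnotationGenerator):
          def __init__(self):
              # presumably: up to 20 documents lacking the MarkingFeature feature
              query = '_MISSING_=MarkingFeature&_MAX_=20'
              super(Annotator, self).__init__(query)
              ... # initialize object

          def process_document(self, client, document):
              doc_id = document['content']['id']
              print(doc_id)
              ... # generate annotations
              client.put_annotation_set(doc_id, 'AnnotationName', ...)
              # mark the document so that the query no longer selects it
              client.put_features(doc_id, {'MarkingFeature': True})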