SENSEI processing environment
This pipeline is a set of annotation services which can be run against the conversation repository or as standalone REST services.
Install requirements
The REST services require bottle.py to be installed.
pip install --user -r requirements.txt -U
REST services
Core modules can be integrated as REST services which provide annotations through a simple HTTP protocol. The backend uses the bottle.py framework with a WSGI server of your choice (the default WSGIRefServer is quite slow; you may want to install "paste" instead).
For instance, you can run the rest/test.py script which reverses character strings.
rest/test.py
It replies to three URLs. The root URL / returns a short textual help message for using the service.
curl http://localhost:1234
It also recognizes GET requests of the form /test/<string>, where <string> is the string you want to reverse.
curl http://localhost:1234/test/hello
Finally, for large inputs, it recognizes POST queries where the input string is specified in a "text" form parameter.
curl --form "text=hello" http://localhost:1234/test
A generic REST service is also available. It runs a custom command, feeds it inputs through stdin and collects output from stdout. The following example echoes the inputs as output. The name parameter sets the name of the service in the URL.
rest/generic.py --port 1234 --name "cat" --command "cat"
curl http://localhost:1234/cat/hello
The command can also be made persistent (run in the background) and fed line by line through stdin, and its output read line by line from stdout. Note that for this to work, the command needs to flush stdout after each input.
rest/generic.py --port 1234 --name "count" --command "awk '{x+=1;print x,\$0;fflush()}'" --persistent True
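The line-by-line interaction with a persistent command can be sketched with the standard library alone. This mirrors what the persistent mode is described to do; the helper name is made up for illustration.

```python
# Drive a persistent command line by line, as the generic service does
# in persistent mode. The command must flush stdout after each input
# (hence the fflush() call in the awk program).
import subprocess

proc = subprocess.Popen(
    ["awk", "{x += 1; print x, $0; fflush()}"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def process(line):
    # One input line in, one output line out.
    proc.stdin.write(line + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")

print(process("hello"))   # 1 hello
print(process("hello"))   # 2 hello
```

Without the fflush(), awk would buffer its output and the readline() call would block forever, which is why the warning about flushing applies to any persistent command.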
WARNING: the generic REST service does not enforce any kind of security.
You can check the rest/test.py implementation for making your own REST services using the provided framework.
Repository integration
Repository-integrated services poll the repository for new documents, process them and push back annotation sets. In order to get access to the repository, you can create a tunnel (provided a proper ssh key is set up) with:
util/repository.py --tunnel
The util/repository.py script has many more commands available. It can also be used as a Python module, providing a full-featured repository client.
Once the tunnel is set up, you can try the test annotator, which polls the repository and adds phony annotations. The script lets you choose the host and port of the repository.
repo/test.py
This script fetches all documents which don't have the AMU_hasTest feature and puts generated annotations in the AMU_Test annotation set. It then sets the AMU_hasTest feature to True.
After a few documents are processed, you can kill the script and run the cleaner which deletes all annotation sets created by the previous script.
repo/clean.py --presence_feature AMU_hasTest --annotation_set AMU_Test
The first way to integrate a new processing module with the repository is to use the generic annotator. It grabs all documents which lack a given feature, passes each one as JSON to a custom command through stdin, reads the generated annotations as JSON from the command's stdout, creates a new annotation set with the result, and marks the document as processed.
Note that for the repository to accept the new annotations, they must conform to its expectations and contain the required fields. Otherwise, an error is returned.
For example, you can write a script which computes the length of the JSON representation of a document and creates a new "checksum" annotation with it.
cat script.sh
awk '{print "[{\"type\": \"checksum\", \"features\": {\"value\":"length()"}, \"start\": 0, \"end\": 0}]"}'
repo/generic.py --command ./script.sh --mark_feature "AMU_hasChecksum" --annotation "AMU_Checksum"
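The awk script above can be restated in Python for readability. This is a sketch under the same contract: one JSON-serialized document per input line, one JSON annotation list per output line.

```python
# Pure-Python equivalent of script.sh: read one json-serialized document
# per line on stdin, emit one annotation list per line on stdout.
import json
import sys

def checksum_annotations(line):
    # length() in the awk version is the length of the whole input line,
    # i.e. the length of the document's json representation.
    return [{"type": "checksum",
             "features": {"value": len(line)},
             "start": 0, "end": 0}]

if __name__ == "__main__":
    for line in sys.stdin:
        print(json.dumps(checksum_annotations(line.rstrip("\n"))))
```

The emitted objects carry the "type", "features", "start" and "end" fields that the repository expects, as in the awk version.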
The command can be run once for each document, or just once in the background. In the latter mode, the command is fed line by line and its output is read line by line. For this mode to work, the command MUST flush stdout after processing each input. See the generic REST service example for more details.
The second way to integrate a new processing module with the repository is to subclass the repository.AnnotationGenerator class in Python. See the repo/test.py script for an example.
class Annotator(repository.AnnotationGenerator):
    def __init__(self):
        query = '_MISSING_=MarkingFeature&_MAX_=20'
        super(Annotator, self).__init__(query)
        ... # initialize object

    def process_document(self, client, document):
        doc_id = document['content']['id']
        print(doc_id)
        ... # generate annotations
        client.put_annotation_set(doc_id, 'AnnotationName', ...)
        client.put_features(doc_id, {'MarkingFeature': True})