Friday 10 January 2014

Task Queues in Python: Getting work done on Google AppEngine Cloud

The previous post mentioned task queues in AppEngine's application model. This post is an example of push queues for Google AppEngine in Python 2.7. Task queues come close to the concept of getting work done in parallel in an application. In addition to that, it is closely associated with doing work in small chunks rather than in total. Task queues are useful when you have to do deferred tasks. If the task does not need to be done immediately or if the result need not be shown to the user immediately, then queue it for deferred execution.

AppEngine tasks are put in queues. In a push queue, AppEngine takes tasks off a queue and processes them. When you write the code to pull tasks too then it is a pull queue. In a pull queue Apps need to take a finished task off the queue too. By default queues are push queues. Once tasks are on a queue, we need to execute them. Task execution is realised by writing url handlers for each queue. For every queue there is a url and a handler. The url for a queue is of the form /_ah/queue/<queue name>. When AppEngine is taking tasks off a push queue for execution, it POSTs to this url for the queue. The application needs to handle this post request. The post code is where the application does what is needed to be done in a task. Task queues also help in times when an AppEngine instance was terminated in the middle of a task. Each task has a task name and AppEngine remembers the name for some time. So duplicate tasks cannot be queued. There are mechanism to adjust the frequency/speed of queue execution. Each queue executes a task on it if, it has a token. Tokens are available in buckets and tokens are replenished. Apps can specify the rate of replenishing and the bucket size too. If five tasks are on the queue and there are 4 tokens, 4 of those tasks are executed. The remaining wait until replenishment. Tasks also provide a built-in mechanism to recover from failure. Failed tasks are retried automatically. The mechanism for retries can be controlled. Applications can specify task age, retry limit and back off mechanism. Again tasks queues have a target which can be pointed to a backend instance, in which case the back end executes the queue's tasks.

An example of a GAE task push queues:
Python sample code for task queues is available here. In this example, a master queue is used to hold a master task. A master task is an aggregate task with subtasks. A sub task in this example simply counts from A to B and return a list. Master task specifies subtasks for a number of intervals A & B. If subtasks are done. Then master task is done too. The master task is pushed on to its queue. From there, it is taken off the queue and does its job of creating subtasks. The master task pushes subtasks on to a separate queue for execution. i.e We tie the task queues together here. Sub tasks get executed as and when their time comes. A unique master task (name, refer code) cannot be duplicated. The same is true for subtasks. In this example, The subtasks simply write the counts to the backends log. The log is also shown below. The back end also has a shutdown hook. (refer sample code). Parameters to tasks can be sent using key-value pairs and as payload.

a) Backend definition in yaml file. This backend is the target for queues in this code.

b) Queue definition in yaml.

c) App yaml file. Watch the url handlers for queues

d) Code to enqueue an item to a task with check for tombstoned and duplicate tasks. Task params are sent using key value pairs. Here the value is a pickled task object.
e) Master Queue status after run
f) Sub tasks in seperate queue
g) Log showing the execution

Code download for this app is available here.

No comments: