Tasks

A Task, or Job, is a process that can be run on the cluster. Pydra provides a set of base classes that encapsulate the signalling and other machinery that allows the cluster to manage and track a Task. Tasks can be combined to perform complex actions sequentially, in parallel, or in any combination of the two. Tasks are also reusable: for instance, a task that downloads files from an FTP server could be reused by other tasks in combination with a task for processing the files.

Task Bases

Task

Task is the root of all Tasks. It handles the most basic operations of starting, stopping, and managing task state. It accepts arguments into its work() function and returns results. A single-threaded task should extend Task.
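
The idea of a base class that tracks state around a work() function can be sketched as below. This is a minimal, self-contained illustration and not Pydra's actual Task class: the names BaseTask, start(), status, and SumTask are assumptions made for the example, and only work() comes from the description above.

```python
class BaseTask:
    """Hypothetical stand-in for a task base class (not Pydra's real Task)."""

    def __init__(self, name):
        self.name = name
        self.status = "stopped"

    def start(self, args=None):
        # Track state around the call to work() so a manager could query it.
        self.status = "running"
        try:
            result = self.work(**(args or {}))
        except Exception:
            self.status = "failed"
            raise
        self.status = "complete"
        return result

    def work(self, **kwargs):
        # Subclasses put their single-threaded logic here.
        raise NotImplementedError


class SumTask(BaseTask):
    """Example subclass: adds up the numbers it is given."""

    def work(self, values=()):
        return sum(values)


task = SumTask("sum")
result = task.start({"values": [1, 2, 3]})
```

The subclass only implements work(); starting and state tracking stay in the base class.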

ContainerTask


ContainerTask combines multiple tasks together. The tasks may be executed in sequence, or at the same time. While contained tasks can be run in parallel, ContainerTask includes no mechanism for dividing work between tasks. It is intended to be used for a set of tasks that must all run before the overall task is complete rather than the same task running multiple times with different work units.

When a ContainerTask runs, it passes the results of each subtask on to the next.
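
Passing results from one subtask to the next can be pictured with the sketch below. It assumes sequential execution and uses plain functions as subtasks; SequentialContainer, parse, and total are hypothetical names for illustration and are not part of Pydra.

```python
class SequentialContainer:
    """Hypothetical container that runs subtasks in order (not Pydra's ContainerTask)."""

    def __init__(self, *subtasks):
        self.subtasks = list(subtasks)

    def work(self, value):
        # The output of each subtask becomes the input of the next one.
        for subtask in self.subtasks:
            value = subtask(value)
        return value


def parse(text):
    """First subtask: turn a comma-separated string into integers."""
    return [int(part) for part in text.split(",")]


def total(numbers):
    """Second subtask: add up the parsed integers."""
    return sum(numbers)


pipeline = SequentialContainer(parse, total)
result = pipeline.work("1,2,3")
```

The container is only complete once every subtask has run, which matches the intent described above.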

ParallelTask


ParallelTask is the base used for creating tasks that run in parallel. ParallelTask contains a single subtask that it submits work requests for. It uses a function to divide the incoming data into work units that are distributed to the cluster. Workers are assigned to run the work units and return the results for post-processing.

ParallelTasks send work requests automatically, using Datasources to retrieve and partition your input data. This allows Pydra to handle requests and their assignment to workers efficiently.
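
The partition, distribute, and post-process flow can be sketched as follows. This is an illustration only: a local thread pool stands in for cluster workers, and partition, subtask, and run_parallel are hypothetical names rather than Pydra or Datasource APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, size):
    """Split the input into work units of at most `size` items."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def subtask(unit):
    """The single subtask applied to every work unit."""
    return sum(x * x for x in unit)


def run_parallel(data, size=3, workers=2):
    units = list(partition(data, size))
    # A thread pool stands in for the cluster's workers in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(subtask, units))
    # Post-processing: combine the per-unit results.
    return sum(partials)


total = run_parallel(list(range(10)))
```

Unlike a container of different tasks, the same subtask runs many times here, once per work unit.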

MapReduceTask

MapReduceTask is a special container implementing the MapReduce pattern. MapReduceTask contains two subtasks, a Mapper and a Reducer. Each subtask can be an implementation of any Task base class.
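
For readers unfamiliar with the pattern, here is a minimal word-count sketch of MapReduce in plain Python. It shows the general technique only; mapper, reducer, and map_reduce are hypothetical functions and do not reflect how Pydra's MapReduceTask is implemented.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def reducer(word, counts):
    """Reduce step: collapse all values emitted for one key."""
    return word, sum(counts)


def map_reduce(lines):
    # Shuffle step: group the mapper output by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


counts = map_reduce(["a b a", "b c"])
```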


Task Packages

Task packages are used for packaging tasks together with the files they depend on. When tasks are synchronized across the cluster, these dependencies must also be sent.

Libraries

Tasks may depend on other libraries to function. For instance, tasks for proteins might require biopython or mmLib. Libraries packaged in the lib directory will be added to the Python path when a task is run.
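
Adding a package's lib directory to the Python path can be done with sys.path, as in the sketch below. The function add_package_libs, the temporary package directory, and the helper_lib module are assumptions made for the demonstration; only the lib directory name comes from the text above.

```python
import importlib
import os
import sys
import tempfile


def add_package_libs(package_dir):
    """Put <package_dir>/lib on sys.path so bundled libraries can be imported."""
    lib_dir = os.path.join(package_dir, "lib")
    if os.path.isdir(lib_dir) and lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir


# Build a throwaway package with one bundled library for the demonstration.
package_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(package_dir, "lib"))
with open(os.path.join(package_dir, "lib", "helper_lib.py"), "w") as handle:
    handle.write("ANSWER = 42\n")

lib_dir = add_package_libs(package_dir)
importlib.invalidate_caches()
helper_lib = importlib.import_module("helper_lib")
```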

Dependencies

Packages can depend on other packages. This allows code to be kept separate while still ensuring that all required code is synchronized and loaded.
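
One way to make sure required packages are loaded first is to walk the dependency graph depth-first. The sketch below is illustrative only: how packages actually declare their dependencies is not described here, so the dependencies mapping, the package names, and load_order are all hypothetical.

```python
def load_order(package, dependencies):
    """Return packages in an order where each one follows its dependencies."""
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return
        if name in visiting:
            raise ValueError(f"circular dependency involving {name!r}")
        visiting.add(name)
        for dep in dependencies.get(name, ()):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(package)
    return order


# Hypothetical packages: protein_tasks needs ftp_tasks and common.
dependencies = {
    "protein_tasks": ["ftp_tasks", "common"],
    "ftp_tasks": ["common"],
}
order = load_order("protein_tasks", dependencies)
```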

Parameter Forms

Tasks are intended to be generic and reusable, so parameter forms are used to accept parameters for each scheduled job. A parameter form is a standard Django form object. The form is rendered when a job is scheduled.

Task forms can be located anywhere within the task package.
