Scheduler
The scheduler is the component which decides which tasks to run when. It is a priority queue scheduler. Tasks and workunits are submitted and queued.
Definitions
Workunit
A unit of work that a parallelized task handles.
Generally only the unique identifier and required args for a workunit should be stored in the queue. It is possible for a workunit to include all of the data in the worker request but it is generally advised that data be sent through a distributed datastore to limit data pickling. For the sake of simplicity in this document we will only talk about queuing workunits or tasks even though this means only a subset of its information is queued.
A workunit in the queue may be a group of workunits that the task can process. The scheduler treats all requests the same regardless of how the workunits are sliced.
Work Request / Worker Request
A request for a worker for a workunit or task. This is a workunit/task that is in the queue and the process for putting it in the queue.
Priorities
The scheduler uses a simple ranking system which by default is the order in which they were queued. Users may increase or decrease the priority. Currently tasks in the long-term-queue cannot be set to a priority higher than any tasks in the short-term-queue.
This may be extended to include proportional scheduling or Highest-Response-Ratio-Next Scheduling?.
WorkUnits that generate more workunits
In some cases a worker may be returned to the idle pool and reassigned to a lower priority task too soon. The scheduler will hold workers until a task confirms it no longer needs the worker. ParallelTask and MapReduceTask both handle this automatically. New workunits should be added to the datasource within the user defined workunit_complete().
usecase: A cluster has two tasks running. Both tasks schedule requests. The higher priority task is a webcrawler. Each workunit may produce more workunits. The scheduler must not release workers to the idle pool until the higher priority task is complete.
Batching Workunits
This is a feature on our roadmap as ticket #119 that has not been implemented yet.
Batching workunits increases the efficiency of Pydra by reducing overhead. Assigning a workunit, or receiving a response has overhead cost for transmitting messages and processing the responses. Sending workunits in batches reduces overhead because there are less messages sent.
The ideal batch size takes at least several minutes, but not so long that a failed worker causes a large loss of time. The ideal time can be configured by the OPTIMAL_BATCH_RUNTIME setting.
Initial workunits will not be batched, subsequent batch sizes will be derived from the average runtime of previous batches and the OPTIMAL_BATCH_RUNTIME setting. If there are other runs of this task recorded in the task history, they will be included in the calculation. After the initial batch size is calculated, it will be reevaluated in greater intervals.
Memory Usage
All work units are scheduled at once by ParallelTask and MapReduceTask. This can result in too much memory being used
Usecase: User starts a document import with 500,000 documents
Potential Solutions:
- Limit the number of requests any given task can queue.
- Push the slicing to the scheduler. WorkUnits can be read as needed.
