Controller API
The Controller API is an interface exposed by the Master that allows a Pydra cluster to be controlled. It is used to relay commands and information.
The Django-based user interface shipped with Pydra uses the Controller API. It should be considered the reference implementation of how to use the controller.
The Controller API is a generic interface. The module system is used to expose functions, and an Interface Module is created that allows connections over a specific protocol. The default implementation uses HTTP.
Communication Protocol
The Controller API uses HTTP because a stateless protocol is required for communication. If the web app were stateful and maintained connections, these connections could be left behind if Django is not shut down cleanly.
Ideally Twisted's Perspective Broker would be used, because it is used for RPC in all other places within Pydra. Unfortunately the Twisted reactor does not support being started and stopped more than once per application. This is a limitation due to how it uses threading within its reactor. There is also concern about startup time for the reactor, since commands and information may be sent over the ControllerInterface frequently.
Security
See the security section for information about controller interface security.
Using The Controller
Configuration
The Controller requires an encryption key to obtain access to the Master.
- Obtain a copy of the Master's Key, by default located at /var/lib/pydra/master.key
- Place the key in the directory the controller will be run from. It is not possible to specify the location of the key at this time.
The Controller must be configured to point at the Master. The host and port can either be hardcoded or, if the controller runs on a server where Pydra is installed, loaded from pydra_settings:
    from pydra.config import load_settings
    pydra_settings = load_settings()
    host = pydra_settings.HOST
    port = pydra_settings.CONTROLLER_PORT
Once the host and port are set you can create an instance of the Controller:
    from pydra.cluster.controller.web.controller import WebController
    controller = WebController(host, port)
Running Commands
Commands can be run as if they were methods of the controller. Authentication occurs transparently behind the scenes.
The controller does not check in advance that a called function exists. Keeping a cached list of functions would mean that updating the Master with new modules would also require a restart of the User Interface. Instead it is assumed that the developer using the controller knows which functions are available.
    tasks = controller.list_tasks()
Arguments and keyword arguments are passed through the interface to the backend:
task_id = controller.run_tasks('MyTask', foo='bar')
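The forwarding behaviour described above can be sketched with Python's `__getattr__` hook. This is only an illustration of the idea, not Pydra's actual implementation: `ForwardingController`, `send`, and `fake_send` are hypothetical names, and the HTTP transport is replaced with a local function so the example runs without a Master.

```python
class ForwardingController:
    """Hypothetical stand-in for a controller that forwards unknown calls."""

    def __init__(self, send):
        # `send` is an assumed callable: (name, args, kwargs) -> result
        self._send = send

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails, so any method
        # name is accepted without keeping a cached list of functions.
        def remote_call(*args, **kwargs):
            return self._send(name, args, kwargs)
        return remote_call


def fake_send(name, args, kwargs):
    # Stand-in transport: returns what would be sent to the Master.
    return (name, args, kwargs)


controller = ForwardingController(fake_send)
result = controller.run_tasks('MyTask', foo='bar')
print(result)
```

Because nothing is checked locally, a misspelled function name only fails once the remote side rejects it.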
Exceptions
ControllerExceptions are thrown by the controller to indicate the following:
- Authentication Failure: -1
- Disconnection: -2
- Error in remote function: -3
- No RSA Key: -4
- Unknown Error: -5
- Remote Function Not Found: -6
These codes are available as variables within pydra.cluster.controller.
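A caller might translate these codes into readable messages roughly as follows. This is a sketch under assumptions: the `ControllerException` class below is a local stand-in with an assumed `code` attribute, and `ERROR_MESSAGES`, `describe_failure`, and `missing_function` are hypothetical names. The real class and constant names in pydra.cluster.controller may differ.

```python
# Codes copied from the list above; the mapping name is an assumption.
ERROR_MESSAGES = {
    -1: 'Authentication Failure',
    -2: 'Disconnection',
    -3: 'Error in remote function',
    -4: 'No RSA Key',
    -5: 'Unknown Error',
    -6: 'Remote Function Not Found',
}


class ControllerException(Exception):
    """Hypothetical stand-in carrying a numeric error code."""

    def __init__(self, code):
        super().__init__(ERROR_MESSAGES.get(code, 'Unrecognised code'))
        self.code = code


def describe_failure(call):
    """Run `call`; return None on success or a readable error string."""
    try:
        call()
    except ControllerException as exc:
        return '%s (%d)' % (exc, exc.code)
    return None


def missing_function():
    # Simulates calling a function the Master does not expose.
    raise ControllerException(-6)


message = describe_failure(missing_function)
print(message)
```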
API
The API commands are likely to be modified. New commands will be added as needed while developing the user interface.
cancel_task(task_key)
Cancels a task. If the task is in the queue it will be dequeued. If the task is running it will be stopped.
When stopping a running task a stop command will be issued to all workers running any part of the task. Tasks will only stop if and when they honor the STOP_FLAG.
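What honoring a stop flag means for task code can be sketched as a work loop that checks the flag between units of work. This is an illustration only: `threading.Event` is used here as a stand-in for Pydra's STOP_FLAG, whose actual form is not described in this section, and `run_task` and `work` are hypothetical names.

```python
import threading


def run_task(items, stop_flag, work):
    """Process items one at a time, checking the flag between items."""
    done = []
    for item in items:
        if stop_flag.is_set():
            # Honor the stop request at a safe point and exit early.
            break
        done.append(work(item))
    return done


stop_flag = threading.Event()


def work(item):
    if item == 3:
        # Simulate a stop command arriving while item 3 is processed.
        stop_flag.set()
    return item * 2


processed = run_task([1, 2, 3, 4, 5], stop_flag, work)
print(processed)
```

A task that never checks the flag would run to completion regardless of the cancel request.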
connect()
Instructs Master to connect to all Nodes it is not yet connected to.
list_known_nodes()
list_queue()
Lists TaskInstances that are in the queue.
list_running()
Lists TaskInstances that are running.
list_tasks(toplevel=True, keys=[])
Returns a list of task keys that the cluster is capable of running.
- toplevel - Only return top-level tasks; subtasks are ignored. Defaults to True
- keys - Only return keys from this list
node_detail(id)
Returns details about a Node.
- id - Node identifier
node_edit(values)
Edits the values of a Node stored by the Master. This only affects the values the Master stores; changing the port or host will not change the port or host that the Node runs on. That must be changed in the Node's configuration.
- values - dictionary of values to set. If this contains id it will update an existing Node, otherwise a new record will be created
node_list(page=1)
Returns a list of Nodes that the Master is configured to connect to.
- page - results are paginated, page number to return
node_status()
Returns status of all Nodes and their Workers. Status object is a list of dictionaries.
queue_task(task_key, args={})
Schedules a task to run. If there are free workers the task will immediately be run, otherwise it will remain in the queue.
- task_key - identifier of task to run
- args - dictionary of arguments to send to task
task_history(key, page)
Returns a list of TaskInstance details, as dictionaries, for a given task.
- key - task key
- page - page number. Pages are indexed from 1
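Collecting the full history for a task means walking the pages in order. The sketch below assumes that a page past the end comes back as an empty list, which this section does not confirm. `StubController` and `all_history` are hypothetical names, and the stub stands in for a connected controller so the example runs on its own.

```python
class StubController:
    """Hypothetical stand-in for a connected controller."""

    def __init__(self, records, page_size=2):
        self._records = records
        self._page_size = page_size

    def task_history(self, key, page):
        # Pages are indexed from 1, matching the description above.
        start = (page - 1) * self._page_size
        return self._records[start:start + self._page_size]


def all_history(controller, key):
    """Collect every page, assuming an empty page marks the end."""
    results = []
    page = 1
    while True:
        batch = controller.task_history(key, page)
        if not batch:
            break
        results.extend(batch)
        page += 1
    return results


records = [{'id': n} for n in range(1, 6)]
history = all_history(StubController(records), 'MyTask')
print(len(history))
```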
task_history_detail(task_instance_id)
Returns a dictionary of details for a given TaskInstance.
- task_instance_id - ID of TaskInstance
task_statuses()
Returns the status of all running tasks. This function causes the Master to update its cached statuses for all tasks. Because that update is asynchronous, the first call may return before the update completes, so its result may not be up to date for every task.
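One way a caller could cope with the stale first result is to call the function once to trigger the update, wait briefly, and call it again. The sketch below is an assumption-laden illustration: `StubController` and `fresh_statuses` are hypothetical names, the stub imitates the stale-then-fresh behaviour, and the dictionary it returns is an invented shape, since this section does not specify the return format.

```python
import time


class StubController:
    """Hypothetical stand-in that returns stale data on the first call."""

    def __init__(self):
        self._calls = 0

    def task_statuses(self):
        self._calls += 1
        if self._calls == 1:
            # Cache has not been refreshed yet.
            return {}
        # Later calls see the refreshed cache (invented return shape).
        return {'task-1': 'running'}


def fresh_statuses(controller, delay=0.01):
    """Trigger the asynchronous update, wait, then read the result."""
    controller.task_statuses()   # first call starts the update
    time.sleep(delay)            # give the update time to finish
    return controller.task_statuses()


statuses = fresh_statuses(StubController())
print(statuses)
```

A user interface that polls on a timer gets the same effect without the explicit second call.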
