About Pydra
Pydra is a new project. We are working towards our first public release.
What is Pydra?
Pydra is a distributed and parallel computing framework for Python. Pydra aims to give developers an easy-to-use framework for writing and running parallelized programs, and to give administrators a cluster that is easy to manage.
Some of the main features that are implemented or will be implemented in 1.0:
- Easily defined tasks - Clustering software is only as useful as the jobs that can be written for it. To this end, Pydra hides parallelization code and makes writing tasks simple.
- Map-Reduce support - Map-Reduce is a concept introduced by Google. Pydra currently supports most of the Map-Reduce concept. The only missing piece is parallelizing the map function.
- Network auto-discovery - To aid in setting up a cluster, nodes will be discoverable via the
- Job queueing - Jobs will be started immediately if they can be; otherwise Pydra will queue them automatically.
- Web-based interface - Through the management interface the cluster can be managed, and tasks can be run or
- Central cluster configuration - Nodes can be configured and updated via the management interface
- History - Detailed history of jobs.
- Fault tolerance - Things go wrong. Pydra is designed to recover gracefully when they do.
- Security - Pydra remotely executes code, so the controls for doing so are tightly wrapped to prevent tampering with your cluster.
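To make the Map-Reduce item above concrete, here is a minimal sketch of the pattern in plain Python. It does not use Pydra's API; the function names (map_words, reduce_counts, word_count) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from functools import reduce


def map_words(line):
    """Map step: turn one line of text into per-word counts."""
    return Counter(line.lower().split())


def reduce_counts(total, partial):
    """Reduce step: merge one partial count into the running total."""
    total.update(partial)
    return total


def word_count(lines):
    # The map step runs sequentially in this sketch. Each call is
    # independent of the others, which is what makes the map step a
    # natural candidate for parallelization.
    partials = map(map_words, lines)
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    counts = word_count(["the quick fox", "the lazy dog", "The end"])
    print(counts["the"])  # prints 3
```

Because each call to map_words depends only on its own input line, the calls could in principle be spread across several workers, with the reduce step merging the partial results afterwards.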
Pydra is open source
An open source license means that Pydra is freely distributed and supported by its community. Community contributions are the air in Pydra's lungs. So if you find Pydra useful, please get involved! There are plenty of ways to contribute, and there are always tasks for all experience levels.
Pydra is built with the open source frameworks Twisted Python and Django.
History
Pydra was born out of necessity. Developers at the Open Source Lab were designing an application with extensive data import and processing tasks. These tasks had an estimated runtime of hours at best, days at worst. Managing and parallelizing these tasks was paramount. After discovering that there were no comprehensive solutions for clustered computing in Python, we decided to build Pydra as a generic tool for others to use.
