
Ticket #8 (closed enhancement: fixed)

Opened 12 months ago

Last modified 6 months ago

Cluster - autodiscovery of nodes

Reported by: peter
Owned by: redduck666
Priority: wish_list
Milestone:
Component: cluster
Version: undetermined
Keywords:
Cc: almir@…

Description

implement a feature to auto-discover nodes on the network.

  • This should be a user fired event.
  • Any new nodes on the network should be detected and listed
  • new nodes can then be selected and added

Attachments

autodiscovery (5.2 KB) - added by redduck666 8 months ago.
autodiscovery2 (6.6 KB) - added by redduck666 7 months ago.
autodiscovery3 (12.9 KB) - added by redduck666 7 months ago.
autodiscovery4 (15.3 KB) - added by redduck666 7 months ago.
autodiscovery5 (2.2 KB) - added by redduck666 7 months ago.

Change History

Changed 12 months ago by peter

  • priority changed from major to wish_list

Changed 12 months ago by peter

  • type changed from defect to enhancement

Changed 12 months ago by peter

  • status changed from new to accepted

Changed 12 months ago by peter

  • version set to undetermined

Changed 8 months ago by redduck666

  • cc almir@… added

i'm thinking of doing this, a proper way would use python-avahi i presume? or did you plan to do it some other way?

Changed 8 months ago by peter

Originally I planned on port scanning, i.e. iterating over a subnet or range of IPs and attempting to connect to the twisted service run by the Node. My reasoning is that I'd like to be able to add Nodes from other subnets. However, there's no guarantee of the ports used. This is a major flaw in this approach, and there is no good way around it besides enforcing ports.

Avahi is great, more elegant than port scanning for sure, but it only works within your own subnet. If multiple subnets are not a common occurrence then avahi would be the best solution, because nodes can still be added manually.

The big question is whether multiple subnets are an edge case. We (the osl) will likely deploy it on multiple subnets, but I have no other data points. No one that I know of is using pydra, outside of testing it, so that question is unlikely to be answered right now.

For now avahi works for me. If it needs to be more robust it can be dealt with later on.
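
For reference, the port-scanning alternative discussed above could look roughly like the sketch below. It is only an illustration, not code from pydra: the function names are made up here, and it assumes every Node listens on one known port, which is exactly the guarantee the comment says is missing. The probe is passed in as a parameter so the scan logic can be exercised without touching the network.

```python
import socket
from typing import Callable, Iterable, List


def tcp_probe(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as "no Node here".
        return False


def scan_for_nodes(hosts: Iterable[str], port: int,
                   probe: Callable[[str, int], bool] = tcp_probe) -> List[str]:
    """Return the hosts that accept a connection on the given port.

    Hypothetical helper: an open port does not prove that a pydra Node is
    listening, so a real implementation would still need a handshake.
    """
    return [host for host in hosts if probe(host, port)]
```

The hosts could be generated from a subnet with the standard ipaddress module, for example `(str(ip) for ip in ipaddress.ip_network("192.168.1.0/24"))`. Scanning sequentially with a timeout per host is slow on large ranges, which is one more reason the avahi route is attractive.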

Changed 8 months ago by redduck666

  • owner changed from peter to redduck666
  • status changed from accepted to assigned

let's start with a comment, i don't think there is a sane and automated way to do auto discovery on non local networks.

i played with avahi/dbus, the problem is dbus wants glib event loop :-/ (which would as a consequence introduce a dependency). on the upside, google thinks adding just two lines will make twisted use glib's event loop (transparently to the rest of the code)

# must run before anything imports twisted.internet.reactor
from twisted.internet import glib2reactor
glib2reactor.install()

is that acceptable? do you have better ideas?

Changed 8 months ago by peter

I don't have a problem with it unless it causes errors elsewhere.

No I don't have better ideas. The alternatives involve some form of scanning which runs into issues with ports.

Changed 8 months ago by redduck666

i attached the patch, and here are my notes.

this code needs more testing.

the big picture overview is that Master queries avahi (in the glib event loop) for new _pydra._tcp services (this seems to be the naming convention for avahi), and the nodes publish themselves as _pydra._tcp when they start. this approach (as opposed to the server publishing and nodes doing discovery) was chosen because it eliminates the step where a node has to say to master "hey i'm here, use me" and master has to handle it.

the autodiscovery() is very ugly :-), but since the dbus stuff requires callbacks, nested functions (to clearly separate the avahi stuff from everything else) seem the cleanest way out.

also another thing that i need to have a look at is how to generate those numbers (cpu speed, memory..)

Changed 7 months ago by peter

The publish and discovery works great. It immediately finds the service. There are problems with resolving the hosts, which lead to problems adding the Nodes.

It resolved two services for me:

  • My outward facing IP, assigned to eth0
  • A bridge device used for virtual machines, in the 192.168.1.x range.

I had put in the host name as localhost, which neither of these IPs matched, so both were added. Upon restarting, the Master attempted to connect to all three Nodes. The two newly added Nodes failed to connect, throwing the error you described last night. The NodeServer had already been paired as 'localhost' and so it attempted to send a challenge. The new Node records had no keys because they hadn't been paired.

I think the easiest way to solve this is to have the NodeServer generate a random string, that persists through restarts, to include in the service advertisement as an identifier. The NodeServer's public key comes to mind, since it fits as a long random string that persists, but the pairing logic would have to change slightly.
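
A minimal sketch of the "random string that persists through restarts" idea, assuming the identifier is kept in a small file next to the Node's other state. The function name and the file location are hypothetical and not part of pydra.

```python
import os
import uuid


def load_or_create_node_id(path: str) -> str:
    """Return the identifier stored at path, creating it on first use.

    Hypothetical helper: the first call writes a random hex string, and
    every later call (including after a restart) reads the same value back.
    """
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    node_id = uuid.uuid4().hex  # 32 hex characters of randomness
    with open(path, "w") as f:
        f.write(node_id)
    return node_id
```

With an identifier like this in the advertisement, the Master could recognise the same Node even when it is resolved on several interfaces. Reusing the public key, as suggested above, avoids the extra file at the cost of touching the pairing logic.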

Changed 7 months ago by redduck666

i attached the second version of the patch, it has two improvements over the last one:
- it picks up nodes as soon as they are discovered (the last one required master restart)
- it takes care of the issue you mentioned

the way it deals with the second problem is that, together with the advertisement, it sends the md5 hash of the public key. this is by no means a security measure, it's just a way to NOT accept a multicast advertisement from a node whose public key we already have. i chose md5 because according to %timeit it is faster than sha1 (and way faster than generating uuids).
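
The fingerprint check described here can be sketched as follows. This is an illustration rather than the code in the attached patch: the function names are invented, and it assumes the Master has the raw public key bytes of every Node it has already paired with.

```python
import hashlib
from typing import Iterable


def key_fingerprint(public_key: bytes) -> str:
    """Return the md5 hex digest of a public key.

    Used only as a cheap identifier for duplicate detection, not for security.
    """
    return hashlib.md5(public_key).hexdigest()


def is_new_node(advertised_fingerprint: str,
                known_public_keys: Iterable[bytes]) -> bool:
    """Return True if no known key hashes to the advertised fingerprint."""
    known = {key_fingerprint(key) for key in known_public_keys}
    return advertised_fingerprint not in known
```

A Master would call `is_new_node()` for each resolved advertisement and skip the ones that return False, so a Node seen on several interfaces is only added once.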

Changed 7 months ago by peter

looks good.

My intention was for this to be a user fired event via the gui, that returns a set of new nodes to choose from. The issue is multiple instances of pydra within the same subnet. This would cause Masters to discover Nodes that might not be intended for them. A worst case scenario is discovering and pairing with a Node, which would prevent the rightful owner from Pairing.

This is a very good start to the functionality.

Changed 7 months ago by redduck666

here is new patch :-)

it defines a new pydraSetting, multicast_all. if it is set to true it has the same behavior as the old patch (it uses whatever nodes it finds). if it is set to false, on the other hand, it only saves the info about them, giving you the ability to activate them via the web interface.
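
The two behaviors of multicast_all can be sketched like this. Only the setting name comes from the patch notes; the classes and method names below are hypothetical stand-ins for whatever the Master actually stores, and the duplicate handling is an addition for the sake of a self-contained example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DiscoveredNode:
    host: str
    port: int
    active: bool = False  # inactive nodes are listed but not used


@dataclass
class NodeRegistry:
    multicast_all: bool
    nodes: Dict[str, DiscoveredNode] = field(default_factory=dict)

    def on_discovered(self, host: str, port: int) -> DiscoveredNode:
        """Record a discovered node; use it right away only if multicast_all."""
        key = f"{host}:{port}"
        node = self.nodes.get(key)
        if node is None:
            node = DiscoveredNode(host, port, active=self.multicast_all)
            self.nodes[key] = node
        return node

    def activate(self, host: str, port: int) -> None:
        """Stand-in for the user enabling a saved node via the web interface."""
        self.nodes[f"{host}:{port}"].active = True
```

With `multicast_all=False` this matches the user-fired workflow from the ticket description: discovered nodes are listed first and only used once someone selects them.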

this patch also removes any safety when adding nodes (with multicast_all set to true master starts producing tracebacks if you start the node more than once :-)), this is gonna be handled when the pairing/key exchange is moved to avatars.

Changed 7 months ago by redduck666

ok, new patch is up, it should implement the duplicate node detection in a generic way :-)

Changed 7 months ago by redduck666

this is a diff against modulizing branch which for the most part makes autodiscovery functional

with multicast_all disabled, if you try to use any nodes it throws a nasty error:
[error] ErrorFault: ErrorFault level=error code=u'Service.MethodNotFound' description=u'Unknown method connect'
Traceback:
u"UnknownServiceMethodError: Unknown method connect?"

i haven't managed to track down the error yet, i have found out that if you take out the authenticated() from master/node_connection_manager.py:95 it works just fine.

Changed 6 months ago by peter

  • status changed from assigned to closed
  • resolution set to fixed

closing this ticket. any remaining issues with this feature can be entered as specific tickets.
