What is the best way to stop the machines from fighting while keeping them identical, so that things stay simple and single points of failure are reduced?
Check out this article, which suggests using Hudson (now Jenkins) to schedule the jobs.
Excerpt:
For remote jobs, Hudson can sign onto systems with SSH, copy over its own runtime, and run whatever you’d like on the remote system. This means that, no matter how many servers in a cluster need scheduled jobs, Hudson can schedule, run, and log them from one server. Hudson can distribute the jobs dynamically based on which machines are already busy, or it can bind jobs to specific boxes.
Cheers
The best way is to maintain some kind of global lock directly in the script. The lock must be reachable from every server, so there are two obvious choices: a database (which you already have) or a distributed file system (which you probably don't have). Of course, you will also have to modify the script so that it acquires the lock before doing its work and releases it afterwards.
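A minimal sketch of the database option, assuming a table named `job_locks` (a hypothetical name) whose primary key guarantees that only one server can insert a row for a given job. It uses the standard-library `sqlite3` module purely so the example is self-contained; in a real cluster the connection would point at the shared database all servers already use (e.g. MySQL or PostgreSQL), where the same `INSERT` on a primary key behaves the same way. The `ttl_seconds` expiry for locks left behind by a crashed run is also an assumption, not part of the answer above.

```python
import os
import socket
import sqlite3
import time
from typing import Callable


def init_lock_table(conn: sqlite3.Connection) -> None:
    # Hypothetical lock table: one row per job that is currently running.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_locks ("
        " name TEXT PRIMARY KEY,"
        " holder TEXT NOT NULL,"
        " acquired_at REAL NOT NULL)"
    )
    conn.commit()


def try_acquire(conn: sqlite3.Connection, name: str, holder: str,
                ttl_seconds: float = 3600.0) -> bool:
    """Return True if this holder now owns the lock for `name`."""
    now = time.time()
    with conn:  # commits on normal exit, rolls back on an exception
        # Assumed policy: treat locks older than ttl_seconds as stale.
        conn.execute(
            "DELETE FROM job_locks WHERE name = ? AND acquired_at < ?",
            (name, now - ttl_seconds),
        )
        try:
            # The PRIMARY KEY makes this insert succeed for one holder only.
            conn.execute(
                "INSERT INTO job_locks (name, holder, acquired_at) VALUES (?, ?, ?)",
                (name, holder, now),
            )
        except sqlite3.IntegrityError:
            return False  # another server already holds the lock
    return True


def release(conn: sqlite3.Connection, name: str, holder: str) -> None:
    # Only the current holder may remove its own lock row.
    with conn:
        conn.execute(
            "DELETE FROM job_locks WHERE name = ? AND holder = ?",
            (name, holder),
        )


def run_exclusively(conn: sqlite3.Connection, name: str,
                    job: Callable[[], None]) -> bool:
    """Run `job` only if no other server is running it; return whether it ran."""
    holder = f"{socket.gethostname()}:{os.getpid()}"
    if not try_acquire(conn, name, holder):
        return False
    try:
        job()
    finally:
        release(conn, name, holder)
    return True
```

Every server runs the same cron entry and calls `run_exclusively`; whichever one inserts the row first does the work and the others skip it, so the machines stay identical. If your database offers advisory locks (MySQL `GET_LOCK`, PostgreSQL `pg_try_advisory_lock`), those are an alternative to a lock table.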