Design Notes - process architecture
This is a description of the new process architecture developed after our experiences working with the a1 release.
Overall Plan:
Mother MPI_Spawn()'s Nannies
Mother MPI_Spawn()'s mdrun (we have removed the child, and just have
mdrun)
Mother:
1. Spawn Nannies, 1 per node in mother.conf
2. Capability Discovery with Nannies
3. Make mdrun<->Nanny assignments
a) Get PID and Hostname information from mdrun (Init_FATC())
b) Make nanny/mdrun assignments and distribute to nannies
c) Reap any unused nannies
d) Tell each nanny the # of children assigned to them
4. Run grompp
5. Spawn mdrun
6. Collect periodic checkpoint files from nanny0
7. When mdrun completes
a) Completion of mdrun is indicated by mdrun0 sending a message to the mother. This message will pass the exit code (success or flavor of failure).
b) Nanny0 will send all the necessary files to the mother
8. Reap all nannies
9. Report result to F@C server, get a new molecule, and restart with the new molecule
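As a rough illustration of step 3 above, here is a minimal sketch of how the mother might turn the PID/hostname reports into per-nanny child counts. It assumes matching is done by hostname (one nanny per node); the struct mdrun_info and the function assign_children() are hypothetical names, not part of the existing source.

```c
#include <string.h>

#define MAX_HOST 64

/* Hypothetical record: what each mdrun process reports from Init_FATC(). */
struct mdrun_info {
    long pid;
    char host[MAX_HOST];
};

/*
 * For each nanny hostname, count the mdrun processes reported on that host.
 * counts[i] receives the number of children assigned to nanny i.
 * Returns the number of nannies with no children (candidates for reaping).
 * Assumption: nannies and mdrun processes are matched purely by hostname.
 */
int assign_children(const char *const nanny_hosts[], int n_nannies,
                    const struct mdrun_info mdruns[], int n_mdruns,
                    int counts[])
{
    int unused = 0;
    for (int i = 0; i < n_nannies; i++) {
        counts[i] = 0;
        for (int j = 0; j < n_mdruns; j++) {
            if (strcmp(nanny_hosts[i], mdruns[j].host) == 0)
                counts[i]++;
        }
        if (counts[i] == 0)
            unused++;
    }
    return unused;
}
```

The mother would then send counts[i] (plus the matching PIDs) to nanny i, and reap any nanny whose count is zero.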
Nanny:
1. Get the # of children to look for, along with their PID information, from the mother
2. When the checkpoint file is updated nanny0 will send it to the mother
3. When a nanny checks in with its mdrun process it will
compare the cpu time from the last checkpoint with the cpu time from this
checkpoint and
a) if it has not changed then it will report the stale state to the
mother
b) if the process goes away then report that to the mother.
*Still not sure how to do this in an elegant way.
5. When mdrun finishes [mother tells all the nannies when this happens] nanny0 will send all of the files to the mother
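One possible shape for the check in step 3, offered as a sketch rather than the chosen solution: POSIX kill() with signal 0 tests whether a PID exists without sending anything, and the stale test is a plain comparison of two CPU-time samples. How the nanny obtains the CPU time is left open here (it is passed in as a parameter), and the names child_state, process_exists() and check_child() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

enum child_state { CHILD_OK, CHILD_STALE, CHILD_GONE };

/* Returns 1 if a process with this PID exists, 0 otherwise. */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)      /* signal 0: error checking only */
        return 1;
    return errno == EPERM;      /* exists, but owned by another user */
}

/*
 * Classify the mdrun process a nanny is watching.
 * prev_cpu and curr_cpu are CPU-time samples (seconds) taken at the last
 * checkpoint and at this one; obtaining them is outside this sketch.
 */
enum child_state check_child(pid_t pid, double prev_cpu, double curr_cpu)
{
    if (!process_exists(pid))
        return CHILD_GONE;      /* report "process went away" to the mother */
    if (curr_cpu <= prev_cpu)
        return CHILD_STALE;     /* no CPU time used since last checkpoint */
    return CHILD_OK;
}
```

One caveat: an exited process that has not yet been waited on by its parent still counts as existing for kill(), so this check alone may report a dead mdrun late.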
mdrun:
0. The mdrun binary will be renamed to fatc_child as part of "$ make release".
1. No source code changes except:
- stderr -> stdout
- error codes instead of exit()
- Init_FATC code for PID/hostname communication, and freopen. This is called just after MPI_Init by all mdrun processes.
- Finalize_FATC code for "finished" message to the mother. This is called just before MPI_Finalize by all mdrun processes
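To make the Init_FATC piece more concrete, here is a minimal sketch of the two local steps it needs: collecting the hostname and PID, and redirecting output with freopen. The helper names fatc_identity() and fatc_redirect_output() are hypothetical, and folding stderr into the log with dup2() is an assumption about how "stderr -> stdout" could be done without touching mdrun's print calls. The MPI message to the mother is deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes "hostname pid" into buf; returns 0 on success. */
int fatc_identity(char *buf, size_t len)
{
    char host[256];
    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0';   /* may be unterminated if truncated */
    int n = snprintf(buf, len, "%s %ld", host, (long)getpid());
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Hypothetical helper: send stdout to a log file and fold stderr into it. */
int fatc_redirect_output(const char *log_path)
{
    if (freopen(log_path, "a", stdout) == NULL)
        return -1;
    if (dup2(fileno(stdout), fileno(stderr)) < 0)
        return -1;
    return 0;
}
```

In this sketch Init_FATC would call fatc_identity() right after MPI_Init, send the resulting string to the mother, and then call fatc_redirect_output() with the log file name.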
Notes:
1. The child.[c,h] files will be moved to folding-at-clusters/source/old
2. We want to limit the changes we make in mdrun, but script-based changes that are easy to apply are ok.
3. There are some kludges in the way that the nanny 'finds' the mdrun
process it is matched with. There are better ways to do this, but for the
moment the kludges allow for a proof of concept and quick solution.
4. We are using MPI_Spawn() (MPI_Comm_spawn in the MPI-2 API) instead of
system(mpirun ...) because the former allows a bit more control over the
MPI_COMM group for the mdrun process(es), whereas the latter completely
separates the processes and adds some more challenges that are harder to
overcome.
Questions:
1. Set Nice/ProcessPriority level
Answer: mdrun already has a command line option for setting the nice level.
2. Redirect stdout to a log file [via freopen].
Answer: Init_FATC function in mdrun. Nannies will transmit the logs back to the mother on completion.
3. Get Work Unit from mother [tpr, gro(?) files]
Answer: Mother pre-populates files on the nanny0 node as JoshH suggested.
This works if nanny0 and the mother are on different nodes and on NFS and
non-NFS systems.
4. Notification to the mother that we are finished.
Answer: Finalize_FATC sends message to mother from mdrun0
Posted by charliep at 06:01 PM
New Image Plan
A couple of days ago I replied to JoshM's message about testing the new cairo image with questions about where to find what was installed, etc. I think I was headed down the wrong road and that we should instead focus on making it easy to add/change things in the image, re-image machines, and re-boot machines. This is closer to the release-early-and-often mantra.
1) Get a basic image built. Done for cairo, to be done for bazaar.
2) Check/update the list of 0th node items. Create the file /cluster/project/sna/0th-node.html, check documents in that directory and MT for items. Include how to rebuild with a new image and then apply changes.
3) Check/update /cluster/project/sna/cluster-imaging.html, particularly the parts about modifying an existing image, forcing an update on one or a couple of nodes, and then forcing an update on the whole cluster.
4) Use F@C on a couple of nodes as a first-order test before calling an image ready to deploy on all the nodes in a cluster.
5) Install the new images on all the nodes (including the 0th nodes) and we'll start using them and fix what's broken. We should make sure to keep all the documents referenced above updated as we go.
Posted by charliep at 05:35 PM