
JetsMachine handbook

Project - Task - Job
Commands
Jobs

Project - Task - Job

Throughout this document, $PROJECT denotes the computation project home directory. All the work must be done with the project directory as the current working directory (pwd).

The main principle the whole machinery is based on is the following: there is a single file server which shares the project directory. All the workers, i.e. the workstations which do the computations, simply move files representing pieces of the work (jobs) from one project subdirectory to another (usually from Ready over InProgress to Done). If necessary, new job files or report files are created.
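
For illustration, the project directory then typically contains subdirectories like the following (these are the directory names used throughout this handbook; the exact set may differ between installations):

$PROJECT/bin         the command scripts (usually a softlink, see Commands below)
$PROJECT/mc          task definitions (TaskID.init.mc, CommonProc.mc)
$PROJECT/Ready       JobID.runme files of jobs waiting for processing
$PROJECT/InProgress  jobs currently being processed, one subdirectory per machine
$PROJECT/Done        .runme files of finished jobs
$PROJECT/Errors      failed jobs, one subdirectory per machine
$PROJECT/Abandoned   .runme files of abandoned jobs
$PROJECT/Logs        JobID.log files
$PROJECT/States      JobID.state files
$PROJECT/Result      JobID.success files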

Several tasks may be placed in the project directory. Each task has a TaskID consisting of a few letters which must be unique within the project. The corresponding mc/TaskID.init.mc file specifies the task commands.

Every task begins with its root job; this root job may branch out into a tree of subsequent jobs. By task we mean the whole tree including all subsequent jobs. At every moment, every job is in a well-defined state (to be run, failed, done successfully, done with children jobs, etc.) determined by the place where its unique JobID.runme file resides. The JobID is a concatenation of the TaskID and the position of the job in the task tree. The root job of the task has the TaskID itself as its JobID.

Tasks and jobs are internally handled using * in the unix command parameters, so take care in case your tasks share a common beginning of their TaskIDs.

Commands

All the commands are stored in the $PROJECT/bin directory, i.e. in bin since $PROJECT is the current directory. The project home directory usually contains a softlink bin which points to the shared bin directory common to all the projects. In general, the lower-case commands are advanced and should not be used by the common user. Caution: there are no confirmation dialogs, regardless of how dangerous a command is!
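
For example, assuming the shared command scripts live in /shared/jets/bin (an illustrative path), the softlink can be created from the project home directory by

ln -s /shared/jets/bin bin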

In the following description of the most common commands, [x] means optional x, and {x|y} means x or y (exclusive). By x* we mean that x and all applicable objects beginning with x are taken (this is handled by the well-known unix * operator; in case x is a task or job id, abc* matches abcd, abcde, abc1 etc., but does not match abz).
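
For example, from the project directory the matching jobs can be previewed with the shell itself (abc is a hypothetical TaskID; the Ready directory is described below):

ls Ready/abc*.runme

which lists abc.runme, abcd.runme, abc1.runme etc., but not abz.runme.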

Management commands

bin/Prepare TaskID
Prepares the task for processing. Already existing files (or results) are deleted.
bin/DeleteAll {TaskID* | JobID*}
Deletes all TaskID* tasks and/or JobID* jobs.
Without a parameter, deletes the whole project. If TaskID is specified, all tasks whose task ID begins with TaskID are deleted. If JobID is specified, deletes all the files of the job specified by JobID and all its consequent jobs (so the whole branch of the task is cut off).
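
For example, with a hypothetical TaskID abc and a hypothetical JobID abc1:

bin/Prepare abc        # prepare the root job of the task abc, deleting previous results
bin/DeleteAll abc1     # delete the job abc1 and its whole branch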

Informational commands

bin/Statistics
Counts the number of jobs in each of the possible states (to be run, done, errors, successes, etc.).
bin/ListErrors {TaskID* | JobID*}
bin/CheckErrors {TaskID* | JobID*}
Gives a short list of errors, and performs a deep check (looking for the word error in all the logfiles), respectively.
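
For example (abc being a hypothetical TaskID):

bin/Statistics         # overall job counts per state
bin/ListErrors abc     # short list of errors for all jobs of the task abc
bin/CheckErrors abc    # deep check of the logfiles of the task abc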

Commands to be run on the workers machines

bin/Run {TaskID* | JobID*}
Runs a worker grabbing all jobs matching TaskID* or JobID*.
Without a parameter, runs a worker grabbing any job of the project.
Note that this command can be run in an ssh session on a remote machine; the & operator or the nohup command may be used for background processing, possibly in conjunction with the nice command. For example, something like

ssh user@machine
cd projectdir
nohup nice bin/Run TaskID &

or, written as a single command,

ssh user@machine "cd projectdir; nice bin/Run TaskID"

Jobs

Here are some technical details about the jobs. By job we mean a node of a given task (a piece of the work in the whole task tree), no matter whether it has already been processed (and what the result was) or is still waiting for processing.

The log, state and results

As a result of job processing, several files may be created. First of all, the whole Maple session of the job processing is stored in the Logs/JobID.log file. This log is written during the processing, so you can look into it for the progress of the job, the reason of a failure, or details about the result.

When the job is done (no matter what the result is), a States/JobID.state file is created, where the result of the Jets store() command is stored.
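
For example, the progress of a (hypothetical) job abc1 can be watched from the project directory with the standard tail command:

tail -f Logs/abc1.log

and, once the job is done, its stored state can be found in States/abc1.state.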

As a side-effect, several other files may be created:
Depending on the final message of the job given by the Jets run() command, the following happens (what happened, the destination directory to which JobID.runme has moved, the file(s) created, and notes):

Success! (job is done)
JobID.runme moves to Done; Result/JobID.success is created, containing a report about the success details.
You can set up this report in the ReportSuccessState() function placed in the file mc/CommonProc.mc; the information logged to the screen, and thereby to the .log file, is set up in PrintSuccessReport().
Rather than editing these functions, it is better to place a user version of them in the mc/TaskID.init.mc file, which will then be used instead of the standard version.

Linear failure (job forked)
JobID.runme moves to Done; several Ready/newJobID.runme files are created. A set of (resolved) new jobs is created, to be continued automatically.

Nonlinear failure (cannot fork the job)
JobID.runme moves to Done; Ready/JobID.nonlinfail is created. Resolving must be done manually.

1=0 (contradictory equations)
JobID.runme moves to Done; Abandoned/JobID.ce is created. The job is abandoned.

Nonimportant state (nonimportant job)
JobID.runme moves to Done; Abandoned/JobID.nonimportant is created. The job is abandoned. Set up the TestOfNonImportantness() function in your mc/TaskID.init.mc file to define the nonimportantness.

Error (Maple session failed)
JobID.runme moves to Errors; see Logs/JobID.log. An exception was raised; the whole job directory is moved to Errors/Machine/JobID.

Job states and the .runme file

The current state of a job is determined by the place where its unique JobID.runme job file resides (place of JobID.runme, state name, and description of the job state):

Ready (state: Ready)
Ready to be run, waiting for processing.

Done (state: Done)
Done, no matter what the result was, but without raising an exception.

InProgress/Machine/Job (state: In progress)
Being processed right now. Do not touch this directory during the processing!

Errors/Machine/Job (state: Error)
An exception was raised during the processing.

Be careful when moving a .runme file manually, and never move by hand job files which are currently being processed!
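
To find out where the .runme file of a (hypothetical) job abc1 currently resides, and hence its state, you can use e.g.

find . -name abc1.runme

from the project directory.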

Stopping jobs

A running job cannot be stopped (except by killing its process, see below), but you can prevent new computations from being started.

To stop grabbing of any new jobs, create the file InProgress/stop (its content does not matter). No new computation will be started until you remove (or delete) this file.

To stop grabbing new jobs on some particular machine, create the file InProgress/Machine/stop, which switches off grabbing of new jobs on the machine Machine (the machine where the bin/Run command is running).
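
For example (mymachine being a hypothetical machine name):

touch InProgress/stop              # stop grabbing new jobs everywhere
touch InProgress/mymachine/stop    # stop grabbing new jobs on mymachine only
rm InProgress/stop                 # resume grabbing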

When you kill a job process on the working machine, you have to clean up its working directory InProgress/Machine/JobID, i.e.

  1. move the file InProgress/Machine/JobID/JobID.runme to the Ready directory (or somewhere else, see below for Ready/Sleeping), and
  2. delete the whole InProgress/Machine/JobID directory.
Take care not to delete the .runme file itself.
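
For example, for a hypothetical job abc1 killed on the machine mymachine, the cleanup amounts to

mv InProgress/mymachine/abc1/abc1.runme Ready/
rm -r InProgress/mymachine/abc1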

Putting jobs away from processing

When you want a particular job which is ready for processing to be put away (because you know it is too exhausting or because you are interested in other forks), you can move its .runme job file away from the Ready directory. For example, you can use the Ready/Sleeping directory for such a purpose.
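
For example, for a hypothetical job abc1:

mkdir -p Ready/Sleeping
mv Ready/abc1.runme Ready/Sleeping/

Moving the file back to Ready makes the job available for processing again.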

