Service Desk Knowledgebase: Condor

From Computer Laboratory System Administration
This is the Condor content page of the CL Wiki Service Desk Knowledgebase. Its purpose is to provide information to the Service Desk team on how to handle problems and requests about this CL service. If you are involved with the provision of this CL service, please feel free to add to the knowledge about it.

If CL staff need to tell the Service Desk team about problems with this service please email
sys-admin-aside@cl.cam.ac.uk.

Return to the Service Desk Knowledgebase SERVICE PORTFOLIO

Key Service Description & URLs

CL Customer Documentation

Further CL Sys-Admin Resources

NOTE: Machines for Condor jobs are NOT always powered up and are started up on request.

Underpinning Services

Customer-base for this Service

  • All staff and students of the collegiate University

Costs

  • Free to all current staff and students of the collegiate University

SLA

  • N/A

Service Desk Call Handling Procedure

  • RT tickets can be escalated by changing the Queue to sys-admin, with the Owner set to Nobody and the Status set to New. Tell the requestor:
    I am passing this request over to our Condor team who, I'm sure, will be in contact shortly.

Condor

Graham Titmus (31/01/2015)


1 The user needs to set up a Kerberos ticket on the machine.

2 The Xen VMs need to be started. They are named pb0xx. For example, to start machine 30 with 2 CPUs and 7GB of memory, use

cl-condor-start pb030 2 7000000000

3 The PATH variable needs to include the Condor bin directory:

PATH=/opt/condor-6.8.3/bin:$PATH;export PATH

4 The user can then use the commands to submit and monitor jobs.
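
Steps 1 to 3 can be sketched as a short script. This is an illustration only, under several assumptions: that `klist` on the machine supports `-s` (as MIT Kerberos does), that the third argument to `cl-condor-start` is memory in bytes with 1GB written as 1000000000 (as the 7GB example suggests), and that the Condor directory is the one given in step 3. `start_cmd` is a hypothetical helper that only prints the start command so it can be checked before it is run.

```shell
#!/bin/sh
# Sketch of steps 1 to 3. Nothing here starts a VM: the command is only printed.

# Step 1: check for a Kerberos ticket (assumes klist supports -s).
if command -v klist >/dev/null 2>&1 && klist -s; then
    echo "Kerberos ticket present"
else
    echo "No valid Kerberos ticket found; run kinit first"
fi

# Step 2: hypothetical helper that prints the cl-condor-start command.
# Arguments: machine name, number of CPUs, memory in whole GB.
# Assumes the memory argument is in bytes and 1GB = 1000000000 bytes.
start_cmd() {
    echo "cl-condor-start $1 $2 $(($3 * 1000000000))"
}
start_cmd pb030 2 7

# Step 3: prepend the Condor bin directory unless it is already on PATH.
CONDOR_BIN=/opt/condor-6.8.3/bin
case ":$PATH:" in
    *":$CONDOR_BIN:"*) ;;
    *) PATH=$CONDOR_BIN:$PATH; export PATH ;;
esac
```

The PATH change only affects the current shell if the script is sourced rather than executed.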

A common problem is that jobs are held:

condor_q -analyze 

just says "Request is held". A common cause is underestimating the amount of RAM needed. The job repeatedly runs out of memory and fails, so it is held.

condor_q
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD 
1374.31 mr472 2/4 00:53 0+00:31:52 R 0 32128.9 runconfig.sh /home

shows that the jobs require 32GB each, while

condor_Qr
1374.31 2S 5U 2C mr472 vm1@pb035 (Memory >= 4000)

shows that the user seems not to have specified enough RAM when starting the machine with 'cl-condor-start'. The solution is to start up new machines with enough memory.
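
As a rough aid, the memory argument for a replacement machine can be estimated from the condor_q output. The sketch below assumes that SIZE is the 8th whitespace-separated field of the job line and is reported in MB, and that `cl-condor-start` takes bytes with 1GB written as 1000000000. `mem_bytes` is a hypothetical helper and pb036 is a placeholder machine name. The result is a lower bound with no headroom, and one CPU is used so that the question of how memory is shared between VMs on the same machine does not arise.

```shell
#!/bin/sh
# Sample job line copied from the condor_q output above.
line='1374.31 mr472 2/4 00:53 0+00:31:52 R 0 32128.9 runconfig.sh /home'

# Hypothetical helper: read SIZE (assumed to be field 8, in MB),
# round up to whole GB, and print the value in bytes.
mem_bytes() {
    echo "$1" | awk '{
        gb = int($8 / 1000)
        if (gb * 1000 < $8) gb++
        printf "%d000000000\n", gb
    }'
}

# Print (do not run) a start command for the placeholder machine.
echo "cl-condor-start pb036 1 $(mem_bytes "$line")"
```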


Dealing with the XenPool Machines

See Accessing the Xen Console

Contacts

Primary

Availability

  • 24x7

Hints, Tips & Known Issues

Categorising Keywords

  • Condor pool