Service Desk Knowledgebase: Condor
Revision as of 15:23, 27 February 2015
This is the Condor content page of the CL Wiki Service Desk Knowledgebase. Its purpose is to give the Service Desk team information on how to handle problems and requests about this CL service. If you are involved in providing this CL service, please feel free to add to the knowledge about it.
If CL staff need to tell the Service Desk team about problems with this service, please email sys-admin-aside@cl.cam.ac.uk.
Key Service Description & URLs
- Condor batch system
- Condor – local guide
- Computer Laboratory News (Twitter use @UC_CL_SysAdm)
CL Customer Documentation
Further CL Sys-Admin Resources
NOTE: Machines for Condor jobs are NOT always powered up and are started up on request.
Underpinning Services
- XenE - Xen
Customer-base for this Service
- All staff and students of the collegiate University
Costs
- Free to all current staff and students of the collegiate University
SLA
- N/A
Service Desk Call Handling Procedure
- RT tickets can be escalated by changing the Queue to sys-admin, with the Owner set to Nobody and the Status set to New. Tell the requestor:
I am passing this request over to our Condor team who, I'm sure, will be in contact shortly.
Condor
Graham Titmus (31/01/2015)
1 The user needs to set up a Kerberos ticket on the machine.
2 The Xen VMs need to be started. These are named pb0xx, e.g. to start machine 30 with 2 CPUs and 7GB of memory use
cl-condor-start pb030 2 7000000000
3 The PATH variable needs setting to
PATH=/opt/condor-6.8.3/bin:$PATH;export PATH
4 The user can then use the commands to submit and monitor jobs.
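Before submitting, a user can check that steps 1 and 3 have taken effect. The following is only a sketch: it assumes a standard Kerberos client whose klist supports the -s (silent test) option, and it takes the Condor directory from step 3. Neither check is part of the documented CL procedure.

```shell
# Directory taken from step 3 above.
CONDOR_BIN=/opt/condor-6.8.3/bin
PATH=$CONDOR_BIN:$PATH; export PATH

# Step 1 check (assumption: klist is installed and supports -s, which
# produces no output and returns non-zero when there is no valid ticket).
if command -v klist >/dev/null 2>&1; then
    if klist -s; then
        echo "Kerberos ticket: OK"
    else
        echo "Kerberos ticket: missing or expired"
    fi
else
    echo "klist not found; cannot check the Kerberos ticket"
fi

# Step 3 check: the Condor commands should now be found via PATH.
if command -v condor_q >/dev/null 2>&1; then
    echo "condor_q found at $(command -v condor_q)"
else
    echo "condor_q not found; check that $CONDOR_BIN exists on this machine"
fi
```

If either check reports a problem, repeat the corresponding step before running any of the Condor commands.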
A common problem is that jobs are held, and
condor_q -analyze
just says "Request is held". A frequent cause is underestimating the amount of RAM needed: the job repeatedly runs out of memory and fails, and so is held.
condor_q
ID       OWNER  SUBMITTED   RUN_TIME    ST PRI SIZE    CMD
1374.31  mr472  2/4 00:53   0+00:31:52  R  0   32128.9 runconfig.sh /home
shows that the jobs require 32GB each, while
condor_Qr 1374.31 2S 5U 2C mr472 vm1@pb035 (Memory >= 4000)
shows that the user seems not to have specified enough RAM when starting the machine with 'cl-condor-start'. The solution is to start up new machines with enough memory.
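The memory argument for a replacement machine can be worked out from the SIZE column. The following is only a sketch under two assumptions: that SIZE is reported in megabytes (the usual condor_q convention), and that the third argument of cl-condor-start is a byte count, inferred from the 7GB example (7000000000) above. The helper names mb_to_bytes and start_cmd are hypothetical, and the command is only printed, not run.

```shell
# Hypothetical helper: convert megabytes (as in the condor_q SIZE column)
# to the byte count that cl-condor-start appears to expect.
mb_to_bytes() {
    awk -v mb="$1" 'BEGIN { printf "%.0f\n", mb * 1000000 }'
}

# Hypothetical helper: print (do not run) a cl-condor-start command.
# $1 = machine number without leading zeros, $2 = CPUs, $3 = memory in MB
start_cmd() {
    printf 'cl-condor-start pb%03d %d %s\n' "$1" "$2" "$(mb_to_bytes "$3")"
}

# Reproduces the documented example for machine 30 with 2 CPUs and 7GB.
start_cmd 30 2 7000

# Illustrative only: a machine sized slightly above the 32128.9 MB job.
start_cmd 30 2 33000
```

The first call prints the command shown earlier on this page, cl-condor-start pb030 2 7000000000. Whether a pool host can actually provide the larger figure is not covered here and needs checking with the Condor team.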
Dealing with the XenPool Machines
Contacts
Primary
- sys-admin-comment@cl.cam.ac.uk (Goes to CL back office team)
Availability
- 24x7
Hints, Tips & Known Issues
Title
Firstname Surname (Date)
Info
Categorising Keywords
- Condor pool