Service Desk Knowledgebase: Linux: Difference between revisions

From Computer Laboratory System Administration
 
(122 intermediate revisions by 6 users not shown)
Line 131: Line 131:
If they can login there but have no home then it is probably a problem with the filesystem.  If they cannot login at all then it is an authentication problem.  They should then try from another machine that is known to work to check their login works.


===Adding privileged users===
===Adding privileged or 'assigned' users===


'''Linux PCs Assigned to Users:'''<br />
Machines are setup with a single 'assigned user' having both '''cl-asuser''' access (due to owning the file /etc/user-config/bundles) and '''sudo''' access (due to being in a suitable group which has sudo rights).  If the assigned user has not been setup (because a machine has been moved to a new user or was not done when the machine was installed) login and first run '''cl-asuser cl-hostid-fix --user <font color="red">$CRSid</font>''' which will show you what it thinks needs to be done and (if it looks okay) then run '''cl-asuser cl-hostid-fix --user <font color="red">$CRSid</font> -a''' to actually do it.  Then '''[Edit]''' & '''[Update]''' the machine names' entry(s) in the [https://dbwebserver.ad.cl.cam.ac.uk/SCG/Equipment/Inventory.aspx inventory] and set the '''User: <font color="red">$CRSid</font>'''  and a '''Comment''' like '''RT#12345 User=<font color="red">$CRSid</font>'''.
Machines are setup with a single 'assigned user' having both '''cl-asuser''' access (due to owning the file /etc/user-config/bundles) and '''sudo''' access (due to being in a suitable group which has sudo rights).  If the assigned user has not been setup (because a machine has been moved to a new user or was not done when the machine was installed):
# Login to the user's machine from a Lab machine such as slogin-serv using '''ssh -K <font color="red">$MachineName</font>'''
# First run '''cl-asuser cl-hostid-fix --user <font color="red">$CRSid</font>''' which will show you what it thinks needs to be done and (if it looks okay)  
# Then run '''cl-asuser cl-hostid-fix --user <font color="red">$CRSid</font> -a''' to actually do it.   
# ''The user should be told that if they are currently logged in they '''must logout''' & log back in again for the changes to take effect.'' 
# Then '''[Edit]''' & '''[Update]''' the machine names' entry(s) in the [https://dbwebserver.ad.cl.cam.ac.uk/SCG/Equipment/Inventory.aspx inventory] and set the '''User: <font color="red">$CRSid</font>'''  and a '''Comment''' like '''RT#<font color="red">12345</font> User=<font color="red">$CRSid</font>'''.
 
NOTE: If the user's account and home directory have not already been created (as in [https://wiki.cam.ac.uk/cl-sys-admin/Service_Desk_Knowledgebase:_User_Accounts_and_Groups#Part_2:_Create_home_directories_for_new_users Create home directories for new users]) you will get errors like:
  chown jmt78 /etc/user-config/bundles
  chown: invalid user: ‘jmt78’
  chown jmt78 /etc/user-config/patches
  chown: invalid user: ‘jmt78’
  chown jmt78 /etc/user-config/hostid
  chown: invalid user: ‘jmt78’
and you will need to re-run the command after the account has been created.


'''Group Servers & PCs with multiple admins:'''<br />
For group servers which may want multiple admins, membership of the sudo group can be used to grant privileges to other users. Liaise with the machine owner to check what is wanted.  To actually do it:


First '''ssh -K <font color="red">$hostname</font>''' (if it's not turned on try '''cl-boot-mc''' on any of the [http://www.cl.cam.ac.uk/local/sys/access/remote-access.html slogin]  machines, or '''[http://www-dyns.cl.cam.ac.uk/cgi/raven/boot-mc.cgi Wake-on-Lan (WoL)]''' - wait 3-4 minutes for it to appear online) and then...
First '''ssh -K <font color="red">$hostname</font>''' (if it's not turned on try '''cl-boot-mc <font color="red">$MachineName</font>''' on any of the [http://www.cl.cam.ac.uk/local/sys/access/remote-access.html slogin]  machines, or '''[http://www-dyns.cl.cam.ac.uk/cgi/raven/boot-mc.cgi Wake-on-Lan (WoL)]''' - wait 3-4 minutes for it to appear online) and then...


'''cl-asuser access''': (if ACLs are enabled) is setup using '''sudo setfacl -m u:<font color="red">$CRSid</font>:rw /etc/user-config/bundles''' where <font color="red">$CRSid</font> should be replaced by the CRSid of the person who is to be granted privilege.  cl-asuser privileges should then be available immediately.
Line 152: Line 166:


If there are sudo problems use '''groups <font color="red">$CRSid</font>''' to check which groups the user is in, and '''sudo -l -U <font color="red">$CRSid</font>''' to check the status. Check '''/etc/sudoers''' (using '''sudo view /etc/sudoers''') and '''/etc/sudoers.d/*''' to check which groups give ALL access.
===Removing privileged or 'assigned' users when they leave===
When the assigned user leaves, the machine is usually removed from the network and moved into GC20.
As it will be re-installed when re-allocated there is no need to un-assign the user.
If un-assigning is needed for some reason, the machine can always be assigned to "'''localadmin'''".
If the person leaving had been granted "assigned user" type privileges on someone else's machine (which was remaining un-reclaimed on the network) then the leaving person could have their access undone with:
# delete the user from the relevant group line in '''/etc/group'''
# remove from ACL with: '''sudo setfacl -x u:<font color="red">$CRSid</font> /etc/user-config/bundles'''


===(4.7) BMC ACL - when up if present===
Based on http://www.wiki.cl.cam.ac.uk/clwiki/SysInfo/MachineSetup#bmcacl [http://www.lookup.cam.ac.uk/person/pb22 Piete Brooks] (23 Feb 2015)


# Make sure Pageant.EXE is running and has your private key by double clicking on '''CL.ppk''' or similar.
# Use an omnipotent machine (laira  toton  or radyr) e.g. '''ssh -K laira''' (or '''toton''') & press '''[Enter]''' to get the '''laira:~$''' prompt
# With it running in the system tray launch '''PuTTY''' and go to the CL's '''slogin-serv.cl.cam.ac.uk'''
# Replacing '''<font color="red">$CRSid</font>''' with the ''assigned user's CRSid'' in the following instructions use '''cd /home/<font color="red">$CRSid</font>/''' and '''[Enter]'''
# Type '''kinit''' & press '''[Enter]'''
# Check if the files '''.amtpw''', '''.amtuser''', '''.ipmi-pw''' & '''.ipmi-user''' already exist by running the command to create them - if they already exist, it will list the files:<br/> '''/usr/groups/netmaint/setamt <font color="red">$CRSid</font>'''
# Enter your CL '''Password for CRSid@AD.CL.CAM.AC.UK''' & press '''[Enter]'''
# Display the 'random' password for later use, e.g. for iAMT<br/> '''sudo cat /home/<font color="red">$CRSid</font>/.amtpw'''
# Use '''ssh -K laira''' (or '''toton''') & press '''[Enter]''' to get the '''laira:~$''' prompt
# Use a Windows Remote Desktop Connection to a machine on the Computer Lab network such as the Terminal Server '''ts01.ad.cl.cam.ac.uk'''
# '''cd /home/<font color="red">$CRSid</font>/''' and '''[Enter]'''
# On the server, open a web-browser to the appropriate BMC interface URL i.e.:
# '''echo <font color="red">$CRSid</font> | (umask 377; sudo -u <font color="red">$CRSid</font> tee -a .amtuser)''' and '''[Enter]'''
#* For a '''workstation''' called '''<font color="red">$host</font>''' - [http://www.wiki.cl.cam.ac.uk/clwiki/SysInfo/IAMT IAMT] BMC:   '''http://<font color="red">$host</font>-bmc.cl.cam.ac.uk:16992
# '''sudo -u <font color="red">$CRSid</font> cp -p .amtuser .ipmi-user''' and '''[Enter]'''
#* For [http://www.wiki.cl.cam.ac.uk/clwiki/SysInfo/WPCM450 WPCM450] on a '''server''' called '''<font color="red">$host</font>''' - [http://www.wiki.cl.cam.ac.uk/clwiki/SysInfo/IPMI IPMI] BMC:   '''http://<font color="red">$host</font>.bmc.cl.cam.ac.uk'''
# Create an 8 character password using at least one of each of:
# '''[Login]''' as '''admin''' with the ''special admin password''
## lower case letter
## UPPER CASE LETTER
## digit
## special characters: !@#$%^&*()
# '''(umask 377; sudo -u <font color="red">$CRSid</font> vi .amtpw)''' and '''[Enter]'''<br />'''a''' for APPEND mode and type/paste in the ''password'' then '''[ESC]''' and  ''':wq''' and press '''[Enter]'''
# '''sudo -u <font color="red">$CRSid</font> cp -p .amtpw .ipmi-pw''' and '''[Enter]'''
# '''exit''' and '''[Enter]'''
# From a Computer Lab machine, open a web-browser to the BMC interface e.g.
#* for an [http://www.wiki.cl.cam.ac.uk/clwiki/SysInfo/IAMT IAMT] BMC (e.g. a workstation): '''http://<font color="red">$host</font>-bmc.cl.cam.ac.uk:16992
#* for an [http://www.wiki.cl.cam.ac.uk/clwiki/SysInfo/IPMI IPMI] BMC (e.g. an [http://www.wiki.cl.cam.ac.uk/clwiki/SysInfo/WPCM450 WPCM450] on a server): '''http://<font color="red">$host</font>.bmc.cl.cam.ac.uk'''
# '''[Login]''' as '''admin''' with the special password
# Go to '''User Accounts'''
# First select the previous CRSid and '''[Remove]''' & '''[Remove]'''   
# First select any previous assigned user's CRSid and then '''[Remove]''' & '''[Remove]'''   
# Click '''[New]''' and enter the '''User name''' then the password above ''twice'' and select '''Administrator: Grant access to all pages''' then '''[Submit]'''
# Click '''[New]''' and enter the '''User name:''' '''<font color="red">$CRSid</font>''' then the '''random 8 character password''' displayed above in step 4 ''twice'' and select '''Administrator: Grant access to all pages''' then '''[Submit]'''. ''Be careful when setting the password to ensure you do it for the USER rather than Admin.'' If you need the password again, go back to laira and do:
# You can't logout so just close the web-browser
 
  '''sudo cat /home/'''<font color="red">$CRSid</font>'''/.amtpw
'''
to Display the 'random' password (take care not to include the password if reporting what happened using cut&paste)
 
# You can't logout of the BMC interface so just '''[X]''' close the web-browser and '''Log Off''' the server.
# On laira, for an iAMT BMC, test it was done correctly. The command below should not generate any 'failed' warnings:


AMTUSER=<font color="red">$CRSid</font> PAGE=index,acl /usr/groups/netmaint/iamt-web <font color="red">$host</font>


NOTE: We do not normally expect the user to have to explicitly use the credentials - they are normally used by commands such as:
* '''cl-boot-mc''' - which boots a machine using the BMC
* '''ipmitool''' - which does raw commands to an IPMI BMC
* '''cl-amttool''' - which does raw commands to an IAMT BMC
* '''amtterm''' - which connects to the serial console of an IAMT WS
===4.9 WoL - at leisure===
* See http://www.wiki.cl.cam.ac.uk/clwiki/SysInfo/MachineSetup#head-9d3248bbd29dbd4b4a5fa47d29346972f76fff4f
===Local Disk Full===
First check if sufficient space can be obtained by clearing out old files
cl-admin linux-clean
if that does not give adequate space then
normally just Enlarge the FS using
cl-admin resize2fs / +2G
If there isn't enough free space to easily enlarge the filesystem (physical disk too small; RAID using all of the component partitions; VG full for LVM), the partition itself needs to be enlarged, so pass the ticket on to the Backoffice queue.
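Before and after the clean-up it can help to see how full the filesystem actually is. The following is only a sketch: it uses the standard '''df''' command rather than any cl-admin tooling, and the function name root_use_percent is made up for this example.

```shell
# Hypothetical helper: print the "use%" figure (digits only) for the
# filesystem holding the given path, defaulting to /.
root_use_percent() {
    # df -P gives one POSIX-format line per filesystem; field 5 is e.g. "87%"
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example: warn if / is more than 90% full
if [ "$(root_use_percent /)" -gt 90 ]; then
    echo "/ is nearly full"
fi
```

Running it once before and once after the clean-up shows whether enough space was recovered or whether the filesystem still needs enlarging.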
===Linux Operating System Upgrades===
* See [[Service_Desk_Knowledgebase:_Operator_Tasks#Linux_Operating_System_Upgrades | Linux Operating System Upgrades]]
===Requests to Install Linux Packages===
If an extra package is wanted on a Linux machine, the manager (i.e. "Assigned User") should be asked to add it. To determine the assigned user, see the '''User''' field of the machine in the [https://dbwebserver.ad.cl.cam.ac.uk/SCG/Equipment/Inventory.aspx Inventory] Database, or see who has write access to /etc/user-config/bundles e.g. using '''getfacl /etc/user-config/bundles''', the file which lists the packages which should automatically be added, which is copied over when a machine is reinstalled.  Normally it is one user and the sysadmin group, but ACLs may be used to add other users or groups (e.g. the '''srg-tsars''' unix group in the '''Group Server''' example below).
After checking with the appropriate person if permission is agreed then to grant a user the ability to add packages do
sudo setfacl -m u:''$user'':rw /etc/user-config/bundles
and then inform the user that they can now install packages by adding them to /etc/user-config/bundles and running
cl-asuser cl-add-rpms -a
or can manually check each package before adding it using
cl-asuser apt-get install <<package name>>
or tell them that the request has been denied if the authorised person says no.
====Classes of Machine Management====
1) '''User Workstations''':
* These are 'tower' systems;
* They are in a user's office;
* The "'''Assigned User'''" in the [https://dbwebserver.ad.cl.cam.ac.uk/SCG/Equipment/Inventory.aspx Inventory] is the owner, or SSH to the workstation and use '''getfacl /etc/user-config/bundles''' to reveal the owner's CRSid;
* There is normally only one user so refer the request to them.
* For example:
  xie:~$ getfacl /etc/user-config/bundles
  getfacl: Removing leading '/' from absolute path names
  # file: etc/user-config/bundles
  # owner: '''jz377''' 
  # group: sysadmin
  user::rw-
  group::rw-
  other::r--
2) '''Group Servers''':
* These are rack mount or virtual machines;
* They are in a machine room (GN09, SE18 & FN11);
* They are owned by a UTO;
* The "Assigned User" in the [https://dbwebserver.ad.cl.cam.ac.uk/SCG/Equipment/Inventory.aspx Inventory] is the owner, or SSH to the server and use '''getfacl /etc/user-config/bundles''' to reveal the owner's CRSid;
* The '''owner''' is normally the 'assigned manager' so that one person has an overview so refer the request to them.
* For example:
  nile:~: '''getfacl /etc/user-config/bundles'''
  getfacl: Removing leading '/' from absolute path names
  # file: etc/user-config/bundles
  # owner: '''awm22''' 
  # group: sysadmin
  user::rw-
  user:tm444:rw-
  group::rw-
  group:'''srg-tsars''':rw-
  mask::rw-
  other::r--
3) '''Departmental MPhil Pool''':
* These are 'tower' systems;
* They are in a teaching Lab (SW02 & SW11);
* They have systematic names, e.g. acs-34;
* They are owned by a Computer Lab CO (pb22, gt19, maj1, ckh11), "Lab" or some such; 
* The "Assigned User" in the [https://dbwebserver.ad.cl.cam.ac.uk/SCG/Equipment/Inventory.aspx Inventory] is the owner (workstations not accessible via ssh);
* Whilst individual users can load private copies of any 'special' things they need in their $HOME (and maybe setting some environment variables to find it); anything of use to the bulk of the people doing the course should be requested by the '''course giver'''.
* Escalate so the '''Computer Lab COs''' can decide if the case is strong enough to install it on all pool machines. 
 
4) '''Departmental Servers''':
* These are rack mount or Virtual machines;
* They are in a machine room (GN09, SE18 & FN11);
* They are owned by a Computer Lab CO (pb22, gt19, maj1, ckh11), "Lab" or some such; 
* The "Assigned User" in the [https://dbwebserver.ad.cl.cam.ac.uk/SCG/Equipment/Inventory.aspx Inventory] is the owner, or SSH to the server and use '''getfacl /etc/user-config/bundles''' to reveal the owner as 'localadmin', or some such;
* Escalate so the '''Computer Lab COs''' can decide if the case is strong enough and where to install it.
* For example:
  sandy:~$ '''getfacl /etc/user-config/bundles''' 
  getfacl: Removing leading '/' from absolute path names
  # file: etc/user-config/bundles
  # owner: '''localadmin''' 
  # group: sysadmin
  user::rw-
  group::rw-
  other::r--
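When checking several machines, the owner and any extra users or groups with write access can be picked out of the '''getfacl''' output mechanically. This is a minimal sketch, not an existing Lab tool: the function name bundles_writers is hypothetical, and it ignores the ACL mask, so the effective rights should still be checked by eye.

```shell
# Hypothetical helper: read getfacl output on stdin and print the owner
# plus every *named* user or group entry that has write permission.
bundles_writers() {
    awk -F: '
        /^# owner: /                          { sub(/^# owner: /, ""); print "owner " $0; next }
        $1 == "user"  && $2 != "" && $3 ~ /w/ { print "user " $2 }
        $1 == "group" && $2 != "" && $3 ~ /w/ { print "group " $2 }
    '
}

# Usage on a real machine (not run here):
#   getfacl /etc/user-config/bundles 2>/dev/null | bundles_writers
```

For the Group Server example above this would list the owner awm22, the user tm444 and the group srg-tsars.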


== Contacts ==
Line 207: Line 325:


==Hints, Tips & Known Issues==
===The "inappropriate ioctl for device" error===
===If the SysAdmin Team Can't SSH into a Linux machine===
[http://www.lookup.cam.ac.uk/person/crsid/pb22 Piete Brooks] (03/06/15)
 
If you have problems logging in to a Linux machine called <font color="red">$hostname</font> e.g.:
 
  laira:~$ '''ssh -K www-bluespec'''
  The authenticity of host 'www-bluespec (128.232.98.146)' can't be established.
  ECDSA key fingerprint is ab:0b:03:22:11:71:37:c3:30:00:b5:03:1c:0a:02:17.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'www-bluespec,128.232.98.146' (ECDSA) to the list of known hosts.
  Permission denied (publickey).
 
connect to an omnipotent machine (e.g. '''laira''') and use:<br />
'''sudo ssh <font color="red">$hostname</font>'''
 
This works for:
# stand-alone machines with a very limited number of users
# machines on which Kerberos is failing, so can't auth the user
# machines on which LDAP is failing, so can't set groups etc
# machines on which DNS is failing, so can't check caller's DNS name etc.
 
===Finding out a machine's operating system===
[http://www.lookup.cam.ac.uk/person/crsid/gt19 Graham Titmus] (26/05/15)
 
Whilst it's only the best guess you can try logging into laira and running the command:<br />
'''cl-hosts -p <font color="red">MachineName</font>'''<br />
to find out what operating system a machine is believed to have.
 
===The "stty: standard input: inappropriate ioctl for device" error===


[http://www.lookup.cam.ac.uk/person/crsid/pb22 Piete Brooks] (20/03/15)


The "'''inappropriate ioctl for device'''" error is probably when the '''.profile''' uses the '''stty''' command to set your erase, kill, and interrupt characters e.g.
The "'''inappropriate ioctl for device'''" error is probably when the '''.profile''' uses the '''stty''' command to set your erase, kill, and interrupt characters e.g.:
 
   # The way certain characters are handled by the system are different between
   # Unixes.
Line 219: Line 366:
   *)      stty erase \^? kill \^x intr \^c echoe susp \^z ;;
   esac
To fix the error you could comment out the use of '''stty''' in your '''.profile''' file or even choose to remove the '''.profile''' file.
'''To fix this error:''' you could comment out the use of '''stty''' in your '''.profile''' file by putting "'''#''' " at the start of each line above, or even choose to rename the '''.profile''' file to '''old_profile''' using the command: '''mv .profile old_profile'''


===Waking Up a Linux Box===
===Waking Up a Lab Computer which has BMC===
[http://www.lookup.cam.ac.uk/person/crsid/pb22 Piete Brooks] (20/03/15)


'''WoL''' is a highly unreliable protocol.  It sends a packet into the ether and hopes it arrives - there is no ACK.  In order to work, the client needs to have set everything up perfectly.  A much better method is to login to an '''slogin-serv''' machine and first run<br />'''ping <font color="red">MachineName-bmc</font>'''<br />and within (say) 10 seconds it should respond and then run<br />'''cl-boot-mc <font color="red">MachineName</font>'''<br />which will use a much more helpful mechanism. If the machine is already awake you will see something like:
First give '''[http://www-dyns.cl.cam.ac.uk/cgi/raven/boot-mc.cgi Wake-on-Lan (WoL)]''' a try (and wait 3-4 minutes for the machine to appear online), but it's a highly unreliable protocol.  It sends a packet into the ether and hopes it arrives - there is no ACK.  In order to work, the client needs to have set everything up perfectly.  A much better method for the  ''''assigned user'''' is to login to an '''slogin-serv''' machine and first run:<br />'''ping <font color="red">MachineName</font>-bmc'''<br />and within (say) 10 seconds the BMC should be responsive. If that fails try running:<br />'''ping <font color="red">MachineName</font>.bmc'''<br />
''(Servers tend to have a dedicated connection for the BMC, which is on the 'BMC VLAN', which has its own subnet and domain '''.bmc''')''  Then run:<br />'''cl-boot-mc <font color="red">MachineName</font>'''<br />which will use a much more helpful mechanism. If the machine is already awake you will see something like:


   sandy:~: cl-boot-mc woc-base-00

Latest revision as of 07:08, 29 September 2016


This is the Linux content page of the CL Wiki Service Desk Knowledgebase. Its purpose is to provide information to the Service Desk team on how to handle problems and requests about this CL service. If you are involved with the provision of this CL service please feel free to add to the knowledge about it.

If CL staff need to tell the Service Desk team about problems with this service please email
sys-admin-aside@cl.cam.ac.uk.

Return to the Service Desk Knowledgebase SERVICE PORTFOLIO

Key Service Description & URLs

CL Customer Documentation

Further CL Sys-Admin Resources

Underpinning Services

  • ??? - Any supporting or underpinning services

Customer-base for this Service

  • Linux boxes are available to all staff and post-graduates, as well as some of the Part III Under-graduates.

Costs

  • Hardware is charged for if you are a Research Assistant or a University Teaching Officer, but free to Post-graduates.
  • Support is free.

SLA

  • ??? - Timeframes or service level agreement for fulfilling the service

Service Desk Call Handling Procedure

  • RT tickets can be escalated to the unix-admin by changing the Queue to unix-admin with the Owner set to Nobody & Status set to new. Tell the requestor:
    I am passing this request over to our Unix Admin team who, I'm sure, will be in contact shortly.

"It gave an error" or "It failed to work"

Piete Brooks (20 Feb 2015)

If someone says something like "It gave an error" or "It failed to work" on a Linux system please email them the following:


Dear ???,
Could you please send us a copy & paste of the command that you ran and the output that it generated? Also, would you please run the commands:

groups
sudo -l

and copy & paste the output of those into the same reply to this email.

Many thanks,
???


Removing a broken install

Vince Woodley (17 Feb 2015)

ssh to the machine in question then...

1) Find the process responsible for the lock with sudo lsof /var/lib/dpkg/lock

2) Check for running dpkg processes with something like ps -ef | grep dpkg

3) Ask the requestor if they'd like you to kill it:
I suspect I could do that by killing the rogue process -- shall I have a go?

4) Kill any dpkg processes shown above with sudo kill 1234 etc...
If they refuse to die try sudo kill -HUP 1234 (or, as a last resort, sudo kill -KILL 1234)

5) Check each is dead with the same command again sudo kill 1234 (hopefully there will be no such process)

6) Find the exact name of the dropbox package with dpkg -l \*dropb\* (or similar)

7) Remove it with cl-asuser apt-get remove nautilus-dropbox
(or similar) If that fails with:-
"dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct" then cd /var/lib/dpkg/updates and delete all the files there with rm *

8) Do an update of the system with cl-update-system

9) Check the output to make sure that everything's okay repeating cl-update-system if necessary
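Steps 1 and 2 can be combined by pulling the process IDs straight out of the lsof listing. This is only a sketch under stated assumptions: the function name lock_holder_pids is hypothetical, and it assumes lsof's default column layout, where the PID is the second field.

```shell
# Hypothetical helper: read default `lsof` output on stdin and print the
# unique PIDs it lists (second column, header line skipped).
lock_holder_pids() {
    awk 'NR > 1 { print $2 }' | sort -un
}

# Usage on the affected machine (not run here):
#   sudo lsof /var/lib/dpkg/lock | lock_holder_pids
```

Each PID printed can then be checked with ps before asking the requestor whether it should be killed.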

Clock slew problem

Graham Titmus (3 Feb 2015)

First check if this is a physical machine or a VM. If a physical machine login to it. If a Xen VM then login to it and check if it is tied to the dom0 clock.

cat /proc/sys/xen/independent_wallclock

If that returns an error then proceed as for a standalone machine. If it returns 0 then you need to find the dom0 which hosts the VM, to find that do

cl-onserver --xe cl-vm-status all hid | grep <<machine_name>>

When logged in to the appropriate machine (using ssh or, in the case of Xen, you could also connect via the XenCenter GUI) first check if NTP is working correctly

/usr/sbin/ntpdc -p

which should not look like this

 remote local st poll reach delay offset disp
 =======================================================================
 *LOCAL(0) 127.0.0.1 10 64 377 0.00000 0.000000 0.03046

but have multiple lines each to a remote ntp server.

If it does look like above then restart the ntpd service

 cl-asuser service ntpd restart

and check the output again which should now look like

remote local st poll reach delay offset disp
=======================================================================
=morgul.deadset. 128.232.26.100 16 64 0 0.00000 0.000000 4.00000
=time-b.as43289. 128.232.26.100 16 64 0 0.00000 0.000000 4.00000
=LOCAL(0) 127.0.0.1 10 64 1 0.00000 0.000000 2.81735
=ntp.katho.be 128.232.26.100 16 64 0 0.00000 0.000000 4.00000
=server.netkolik 128.232.26.100 16 64 0 0.00000 0.000000 4.00000
=ntp1d.cl.cam.ac 128.232.26.100 2 64 1 0.00070 0.002260 2.81735
=ntp1c.cl.cam.ac 128.232.26.100 2 64 1 0.00165 0.002201 2.81743
=ntp1b.cl.cam.ac 128.232.26.100 2 64 1 0.00121 0.003449 2.81743
=ntp1a.cl.cam.ac 128.232.26.100 2 64 1 0.00058 0.002520 2.81735
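The "only LOCAL(0) is listed" test can be sketched as a small filter over the ntpdc output. The function name has_remote_peers is made up for this illustration, and it assumes the peer listing format shown above (one status character in front of each peer name).

```shell
# Hypothetical helper: read `ntpdc -p` output on stdin.
# Exit status 0 if any peer other than LOCAL(0) is listed, 1 otherwise.
has_remote_peers() {
    awk '
        NF == 0 || $1 == "remote" || $1 ~ /^=+$/ { next }   # blank, header, ruler
        { name = $1; sub(/^[*=+^~.-]/, "", name) }          # drop the status flag
        name != "LOCAL(0)" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on the affected machine (not run here):
#   /usr/sbin/ntpdc -p | has_remote_peers || echo "ntpd needs a restart"
```

This only shows that remote servers are configured; the reach and offset columns still need a look to confirm the clock is actually syncing.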

Linux user can't login using graphical interface

Graham Titmus (13 Jan 2015)

Symptoms: Linux user can't login using graphical interface, they enter username and password and get a blank screen then back to login

A common cause of this is a failure to access the home directory stored on the File Server (AKA Elmer or Filer): X (the window system) needs to write a file there when it starts the user session. To diagnose if this is the problem do the following:-

  1. Remote login to the machine using ssh -K hostname.cl.cam.ac.uk from a CL machine - check if your home directory is present (ls -al ~). If it is, look to see if the user's home directory is present (ls -al ~crsid).
  2. If the home directory is missing then try to restart the auto mounter (cl-asuser service autofs restart).
  3. Look at the mounted filesystems (grep ldap /proc/mounts - will show which filesystems have been auto mounted using data from LDAP).


An alternative is to ask the user to check if it is the machine failing to log them in or a problem with X by getting them to try on the text console (Select with Ctrl-Alt-F2). If they can login there but have no home then it is probably a problem with the filesystem. If they cannot login at all then it is an authentication problem. They should then try from another machine that is known to work to check their login works.

Adding privileged or 'assigned' users

Linux PCs Assigned to Users:
Machines are setup with a single 'assigned user' having both cl-asuser access (due to owning the file /etc/user-config/bundles) and sudo access (due to being in a suitable group which has sudo rights). If the assigned user has not been setup (because a machine has been moved to a new user or was not done when the machine was installed):

  1. Login to the user's machine from a Lab machine such as slogin-serv using ssh -K $MachineName
  2. First run cl-asuser cl-hostid-fix --user $CRSid which will show you what it thinks needs to be done and (if it looks okay)
  3. Then run cl-asuser cl-hostid-fix --user $CRSid -a to actually do it.
  4. The user should be told that if they are currently logged in they must logout & log back in again for the changes to take effect.
  5. Then [Edit] & [Update] the machine names' entry(s) in the inventory and set the User: $CRSid and a Comment like RT#12345 User=$CRSid.

NOTE: If the user's account and home directory have not already been created (as in Create home directories for new users) you will get errors like:

 chown jmt78 /etc/user-config/bundles
 chown: invalid user: ‘jmt78’
 chown jmt78 /etc/user-config/patches
 chown: invalid user: ‘jmt78’
 chown jmt78 /etc/user-config/hostid
 chown: invalid user: ‘jmt78’

and you will need to re-run the command after the account has been created.

Group Servers & PCs with multiple admins:
For group servers which may want multiple admins, membership of the sudo group can be used to grant privileges to other users. Liaise with the machine owner to check what is wanted. To actually do it:

First ssh -K $hostname (if it's not turned on try cl-boot-mc $MachineName on any of the slogin machines, or Wake-on-Lan (WoL) - wait 3-4 minutes for it to appear online) and then...

cl-asuser access: (if ACLs are enabled) is setup using sudo setfacl -m u:$CRSid:rw /etc/user-config/bundles where $CRSid should be replaced by the CRSid of the person who is to be granted privilege. cl-asuser privileges should then be available immediately.

sudo access: is setup by using an editor to add them to the relevant group (e.g. sudo or root) in the file /etc/group. To do this ssh -K $hostname and then:

  1. sudo vi /etc/group
  2. [sudo] password for abc123: enter your CL password
  3. Add the user's CRSid to the line like sudo:x:27:localadmin,sg692 by scrolling down to it with the arrow-keys and using [Shift]+A to enter --- INSERT --- mode and typing in ,$CRSid
  4. [ESC] out of insert mode
  5. Write and quit with :wq and [Enter]

(Note that sudo privileges will only take effect in new sessions.)
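The edit made in step 3 can be illustrated as a pure text transformation on one line of /etc/group. This is a sketch only: add_group_member is a hypothetical name, it does not touch the real file, and the CRSid abc123 is a placeholder.

```shell
# Hypothetical helper: given one /etc/group line and a CRSid, print the line
# with the CRSid appended to the member list (unchanged if already present).
add_group_member() {
    line=$1
    user=$2
    members=${line##*:}                                # text after the last colon
    case ",$members," in
        *",$user,"*) printf '%s\n' "$line" ;;          # already a member
        ",,")        printf '%s\n' "$line$user" ;;     # empty member list
        *)           printf '%s\n' "$line,$user" ;;
    esac
}

add_group_member 'sudo:x:27:localadmin,sg692' abc123
```

The last line prints the sudo entry with ,abc123 appended, which is what the line should look like after the vi edit. Standard tools such as gpasswd -a can make the same change without an editor; whether they are preferred on Lab machines is not stated here.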

If there are sudo problems use groups $CRSid to check which groups the user is in, and sudo -l -U $CRSid to check the status. Check /etc/sudoers (using sudo view /etc/sudoers) and /etc/sudoers.d/* to check which groups give ALL access.

Removing privileged or 'assigned' users when they leave

When the assigned user leaves, the machine is usually removed from the network and moved into GC20. As it will be re-installed when re-allocated there is no need to un-assign the user. If un-assigning is needed for some reason, the machine can always be assigned to "localadmin".

If the person leaving had been granted "assigned user" type privileges on someone else's machine (which was remaining un-reclaimed on the network) then the leaving person could have their access undone with:

  1. delete the user from the relevant group line in /etc/group
  2. remove from ACL with: sudo setfacl -x u:$CRSid /etc/user-config/bundles

(4.7) BMC ACL - when up if present

Based on http://www.wiki.cl.cam.ac.uk/clwiki/SysInfo/MachineSetup#bmcacl Piete Brooks (23 Feb 2015)

  1. Use an omnipotent machine (laira toton or radyr) e.g. ssh -K laira (or toton) & press [Enter] to get the laira:~$ prompt
  2. Replacing $CRSid with the assigned user's CRSid in the following instructions use cd /home/$CRSid/ and [Enter]
  3. Check if the files .amtpw, .amtuser, .ipmi-pw & .ipmi-user already exist by running the command to create them - if they already exist, it will list the files:
    /usr/groups/netmaint/setamt $CRSid
  4. Display the 'random' password for later use, e.g. for iAMT
    sudo cat /home/$CRSid/.amtpw
  5. Use a Windows Remote Desktop Connection to a machine on the Computer Lab network such as the Terminal Server ts01.ad.cl.cam.ac.uk
  6. On the server, open a web-browser to the appropriate BMC interface URL i.e.:
    • For a workstation called $host - IAMT BMC: http://$host-bmc.cl.cam.ac.uk:16992
    • For WPCM450 on a server called $host - IPMI BMC: http://$host.bmc.cl.cam.ac.uk
  7. [Login] as admin with the special admin password
  8. Go to User Accounts
  9. First select any previous assigned user's CRSid and then [Remove] & [Remove]
  10. Click [New] and enter the User name: $CRSid then enter the random 8 character password displayed in step 4 twice, select Administrator: Grant access to all pages and then [Submit]. Be careful to set the password for the USER rather than for admin. If you need to see the password again, go back to laira and do:
 sudo cat /home/$CRSid/.amtpw 

to display the 'random' password (take care not to include the password if reporting what happened using cut&paste)

  11. You can't logout of the BMC interface so just [X] close the web-browser and Log Off the server.
  12. On laira, for an iAMT BMC, test it was done correctly. The command below should not generate any 'failed' warnings:
AMTUSER=$CRSid PAGE=index,acl /usr/groups/netmaint/iamt-web $host
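The two URL patterns from step 6 can be written down as a small helper, which avoids mistyping the hyphen versus dot forms. The function name bmc_url and its iamt/ipmi keywords are made up for this sketch; only the URL patterns themselves come from the steps above.

```shell
# Hypothetical helper: print the BMC web interface URL for a host.
# Usage: bmc_url HOST iamt|ipmi
bmc_url() {
    case $2 in
        iamt) printf 'http://%s-bmc.cl.cam.ac.uk:16992\n' "$1" ;;   # workstation, IAMT BMC
        ipmi) printf 'http://%s.bmc.cl.cam.ac.uk\n' "$1" ;;         # server, IPMI BMC
        *)    echo "usage: bmc_url HOST iamt|ipmi" >&2; return 2 ;;
    esac
}

# Example:
#   bmc_url woc-base-00 iamt
```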

NOTE: We do not normally expect the user to have to explicitly use the credentials - they are normally used by commands such as:

  • cl-boot-mc - which boots a machine using the BMC
  • ipmitool - which does raw commands to an IPMI BMC
  • cl-amttool - which does raw commands to an IAMT BMC
  • amtterm - which connects to the serial console of an IAMT WS

4.9 WoL - at leisure

Local Disk Full

First check if sufficient space can be obtained by clearing out old files

cl-admin linux-clean

If that does not free enough space then normally just enlarge the filesystem using

cl-admin resize2fs / +2G

If there isn't enough free space to easily enlarge the partition (physical disk too small; RAID using all of the component partitions; VG full for LVM), then the partition itself needs to be enlarged: pass the ticket on to the Backoffice queue.
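To judge whether the clean-up or the resize freed enough space, it can help to compare the usage figure before and after. The sketch below pulls the percentage out of standard df -P output; the function name fs_use_percent is hypothetical, and it assumes the mount point contains no spaces.

```shell
# Hypothetical helper: read `df -P` output on standard input and print the
# capacity figure (without the % sign) for the given mount point.
# Usage: df -P / | fs_use_percent /
fs_use_percent() {
    awk -v mnt="$1" 'NR > 1 && $NF == mnt { sub(/%/, "", $5); print $5 }'
}

# Example:
#   before=$(df -P / | fs_use_percent /)
#   cl-admin linux-clean
#   after=$(df -P / | fs_use_percent /)
#   echo "root filesystem: ${before}% used before, ${after}% used after"
```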


Linux Operating System Upgrades

Requests to Install Linux Packages

If an extra package is wanted on a Linux machine, the manager (i.e. the "Assigned User") should be asked to add it. To determine the assigned user, see the User field of the machine in the Inventory Database, or see who has write access to /etc/user-config/bundles, e.g. using getfacl /etc/user-config/bundles. This file lists the packages which should automatically be added, and it is copied over when a machine is reinstalled. Normally write access is held by one user and the sysadmin group, but ACLs may be used to add other users or groups (e.g. the srg-tsars unix group in the Group Server example below).

After checking with the appropriate person, if permission is agreed, grant the user the ability to add packages with

sudo setfacl -m u:$user:rw /etc/user-config/bundles

and then inform the user that they can now install packages by adding them to /etc/user-config/bundles and running

cl-asuser cl-add-rpms -a

or can manually check each package before adding it using

cl-asuser apt-get install <<package name>>

or tell them that the request has been denied if the authorised person says no.


Classes of Machine Management

1) User Workstations:

  • These are 'tower' systems;
  • They are in a user's office;
  • The "Assigned User" in the Inventory is the owner, or SSH to the workstation and use getfacl /etc/user-config/bundles to reveal the owner's CRSid;
  • There is normally only one user so refer the request to them.
  • For example:
 xie:~$ getfacl /etc/user-config/bundles
 getfacl: Removing leading '/' from absolute path names
 # file: etc/user-config/bundles
 # owner: jz377  
 # group: sysadmin
 user::rw-
 group::rw-
 other::r--

2) Group Servers:

  • These are rack mount or virtual machines;
  • They are in a machine room (GN09, SE18 & FN11);
  • They are owned by a UTO;
  • The "Assigned User" in the Inventory is the owner, or SSH to the server and use getfacl /etc/user-config/bundles to reveal the owner's CRSid;
  • The owner is normally the 'assigned manager' so that one person has an overview so refer the request to them.
  • For example:
 nile:~: getfacl /etc/user-config/bundles
 getfacl: Removing leading '/' from absolute path names
 # file: etc/user-config/bundles
 # owner: awm22  
 # group: sysadmin
 user::rw-
 user:tm444:rw-
 group::rw-
 group:srg-tsars:rw-
 mask::rw-
 other::r--

3) Departmental MPhil Pool:

  • These are 'tower' systems;
  • They are in a teaching Lab (SW02 & SW11);
  • They have systematic names, e.g. acs-34;
  • They are owned by a Computer Lab CO (pb22, gt19, maj1, ckh11), "Lab" or some such;
  • The "Assigned User" in the Inventory is the owner (workstations not accessible via ssh);
  • Whilst individual users can load private copies of any 'special' things they need in their $HOME (and maybe setting some environment variables to find it); anything of use to the bulk of the people doing the course should be requested by the course giver.
  • Escalate so the Computer Lab COs can decide if the case is strong enough to install it on all pool machines.

4) Departmental Servers:

  • These are rack mount or Virtual machines;
  • They are in a machine room (GN09, SE18 & FN11);
  • They are owned by a Computer Lab CO (pb22, gt19, maj1, ckh11), "Lab" or some such;
  • The "Assigned User" in the Inventory is the owner, or SSH to the server and use getfacl /etc/user-config/bundles to reveal the owner as 'localadmin', or some such;
  • Escalate so the Computer Lab COs can decide if the case is strong enough and where to install it.
  • For example:
 sandy:~$ getfacl /etc/user-config/bundles  
 getfacl: Removing leading '/' from absolute path names
 # file: etc/user-config/bundles
 # owner: localadmin  
 # group: sysadmin
 user::rw-
 group::rw-
 other::r--
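The getfacl listings above can be condensed to just the owner and the extra users or groups who were granted write access through ACLs. A minimal sketch, assuming getfacl's usual output format; the function name bundle_writers is hypothetical, and it ignores the owning group and the effect of the mask entry.

```shell
# Hypothetical helper: read getfacl output on standard input and print the
# file owner, then each named user or group ACL entry with write permission.
bundle_writers() {
    awk -F: '
        /^# owner: / {
            owner = $0
            sub(/^# owner: */, "", owner)
            sub(/[ \t]+$/, "", owner)        # drop trailing spaces
            print "owner " owner
            next
        }
        # Named entries look like user:NAME:rw- or group:NAME:rw-
        ($1 == "user" || $1 == "group") && $2 != "" && substr($3, 2, 1) == "w" {
            print $1 " " $2
        }
    '
}

# Example:
#   getfacl /etc/user-config/bundles 2>/dev/null | bundle_writers
```

For the nile listing above this would report the owner awm22, the user tm444 and the group srg-tsars.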

Contacts

Primary

  • unix-admin RT queue

Other

Availability

  • Monday: 09:00-17:00
  • Tuesday: 09:00-17:00
  • Wednesday: 09:00-17:00
  • Thursday: 09:00-17:00
  • Friday: 09:00-17:00
  • Saturday: Closed
  • Sunday: Closed

Hints, Tips & Known Issues

If the SysAdmin Team Can't SSH into a Linux machine

Piete Brooks (03/06/15)

If you have problems logging in to a Linux machine called $hostname e.g.:

 laira:~$ ssh -K www-bluespec
 The authenticity of host 'www-bluespec (128.232.98.146)' can't be established.
 ECDSA key fingerprint is ab:0b:03:22:11:71:37:c3:30:00:b5:03:1c:0a:02:17.
 Are you sure you want to continue connecting (yes/no)? yes
 Warning: Permanently added 'www-bluespec,128.232.98.146' (ECDSA) to the list of known hosts.
 Permission denied (publickey).

connect to an omnipotent machine (e.g. laira) and use:
sudo ssh $hostname

This works for:

  1. stand-alone machines with very limited number of users
  2. machines on which Kerberos is failing, so can't auth the user
  3. machines on which LDAP is failing, so can't set groups etc
  4. machines on which DNS is failing, so can't check caller's DNS name etc.

Finding out a machine's operating system

Graham Titmus (26/05/15)

Whilst it is only a best guess, you can try logging into laira and running the command:
cl-hosts -p MachineName
to find out what operating system a machine is believed to have.

The "stty: standard input: inappropriate ioctl for device" error

Piete Brooks (20/03/15)

The "inappropriate ioctl for device" error is probably caused by the .profile using the stty command to set your erase, kill, and interrupt characters e.g.:

 # The way certain characters are handled by the system are different between
 # Unixes.
 
 case $ARCH in
 sun*)   stty crt erase \^? kill \^x intr \^c ;;
 *)      stty erase \^? kill \^x intr \^c echoe susp \^z ;;
 esac

To fix this error you could comment out the use of stty in your .profile file by putting "# " at the start of each line above, or even choose to rename the .profile file to old_profile using the command: mv .profile old_profile
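An alternative to commenting the lines out is to run stty only when standard input really is a terminal, since stty reports this error when it is not (for example in a non-interactive ssh command). The sketch below wraps the same settings in a test; the function name set_tty_chars is made up for this example.

```shell
# Hypothetical wrapper for the stty lines in .profile: do nothing unless
# standard input (file descriptor 0) is a terminal.
set_tty_chars() {
    if [ -t 0 ]; then
        case ${ARCH:-} in
        sun*) stty crt erase '^?' kill '^x' intr '^c' ;;
        *)    stty erase '^?' kill '^x' intr '^c' echoe susp '^z' ;;
        esac
    fi
}

set_tty_chars
```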

Waking Up a Lab Computer which has BMC

Piete Brooks (20/03/15)

First try a reboot with Wake-on-LAN (WoL) and wait 3-4 minutes for the machine to appear online, but note that it is a highly unreliable protocol: it sends a packet into the ether and hopes it arrives, and there is no ACK. For it to work, the client needs to have set everything up perfectly. A much better method for the 'assigned user' is to login to an slogin-serv machine and first run:
ping MachineName-bmc
and within (say) 10 seconds the BMC should be responsive. If that fails try running:
ping MachineName.bmc
(Servers tend to have a dedicated connection for the BMC, which is on the 'BMC VLAN', which has its own subnet and domain .bmc) Then run:
cl-boot-mc MachineName
which will use a much more helpful mechanism. If the machine is already awake you will see something like:

 sandy:~: cl-boot-mc woc-base-00
 # /usr/bin/cl-boot-mc: already running: Powerstate: S0 (running) for woc-base-00-bmc
 cl-boot-mc: woc-base-00 did not need booting using amt
 sandy:~:

(You may need to hit Ctrl+C to end the process if the machine is actually asleep.)
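The ping-then-boot sequence above can be sketched as a small function that tries both BMC name forms in turn. The name find_bmc and the PING_CMD override are assumptions made for this example; the default probe uses the -c and -W options of the Linux iputils ping (one request, wait up to 10 seconds).

```shell
# Hypothetical helper: print the first BMC name for HOST that answers a ping,
# trying HOST-bmc and then HOST.bmc. Returns non-zero if neither responds.
# PING_CMD may be set to replace the default probe command.
find_bmc() {
    for bmc_name in "$1-bmc" "$1.bmc"; do
        if ${PING_CMD:-ping -c 1 -W 10} "$bmc_name" >/dev/null 2>&1; then
            printf '%s\n' "$bmc_name"
            return 0
        fi
    done
    return 1
}

# Example:
#   find_bmc MachineName && cl-boot-mc MachineName
```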

Categorising Keywords

  • Linux Ubuntu PC Person Computer