Filer review: Difference between revisions

From Computer Laboratory System Administration

Revision as of 15:35, 15 February 2012

This is an evolving early-draft report by the ad-hoc Filer Working Group, who started in early 2012 to review the use and configuration of the departmental filer. For more information about the filer, there are also the departmental filespace user documentation, some notes on the NetApp file server by and for sys-admin, and its own man pages.

The Computer Laboratory has operated a centrally provided NFS file store for Unix/Linux systems continuously since the mid 1980s. This service hosts the commonly used home directories and working directories of most users, and is widely used by research groups to collaborate, via group directories for shared project files and software. At present, this service is provided by a NetApp FAS3140-R5 storage server "elmer" (SN: 210422922), running under "Data ONTAP Release 7.3.3". This server also provides access to the same filespace to other operating systems via the CIFS and WebDAV protocols. It also hosts disk images for virtual machines, which are accessed over block-level protocols such as iSCSI. An additional FAS2040-R5 server "echo" (SN: 200000186549) handles off-site backup using SnapVault.

Review

On 28 October 2011, the Computer Laboratory's IT Advisory Panel initiated an ad-hoc working group, headed by Markus Kuhn, to review the provision of departmental file space. The initial focus of this review will be the configuration and use of the existing filer "elmer", with a particular view to identifying and eliminating obstacles that currently prevent remote NFS access by private Linux home computers (something that has long been available to Windows users). It is believed that this project also provides an opportunity to rid the configuration of the filer of historic baggage and to streamline and simplify its use by departmentally administered Linux machines. The working group may later extend its remit and welcomes suggestions.

The initial fact-finding phase of the review was conducted by Markus Kuhn and Martyn Johnson and focussed on namespace management, authentication, and access from outside the department.

Namespace management

The NetApp operating system requires filer administrators to structure the storage space at several levels. Familiarity with these will help to understand some of the historic design decisions made.

  • An aggregate is a collection of physical discs, made up of one or more RAID-sets. It is the smallest unit that can be physically unplugged and moved intact to a different filer. Elmer has two aggregates because one cannot mix different major disk technologies (Fibre Channel vs. SATA) in an aggregate. The backup filer eldo has just one. Discs can be added to an aggregate on the fly, but never removed.
  • A volume is a major unit of space allocation within an aggregate. Volumes typically have reserved space, though it is possible to over-commit if one really wants to. Many properties are bound to a volume, e.g. language(?). Significantly, a volume is the unit of snapshotting: each volume has its own snapshot schedule and retention policy.
  • A q-tree ("quota tree") is a magic directory within the root directory of a volume which has a quota attached to it and all its descendants. (This is merely for quota; there is no space reservation associated with a q-tree.)

When we first got the filer in 19??, the aggregate layer did not exist, and a volume was just a collection of discs. A single volume therefore could not grow too big, and it was not feasible to put, for example, all user home directories into a single q-tree, because a q-tree could not span multiple volumes and therefore could not span multiple sets of discs. This imposed an upper bound on the size of a q-tree. In addition, the backup system imposed constraints on the total number of q-trees. It was therefore also not possible to give every user their own q-tree. As a compromise, Martyn Johnson created eight q-trees called homes-1 to homes-8, which are now all located in volume 1, along with various q-trees for each research group with group filespace (and for various other functions):

 $ ls /a/elmer-vol1
 grp-cb1  grp-op1  grp-se1  grp-th1  homes-2  homes-5  homes-8  sys-rt
 grp-da1  grp-pr1  grp-sr1  grp-th2  homes-3  homes-6  sys-1
 grp-nl1  grp-rb1  grp-sr9  homes-1  homes-4  homes-7  sys-pk1
 $ ls /a/elmer-vol3
 grp-dt1  grp-nl4  grp-rb4   grp-sr3  grp-sr7  sys-lg1  sys-ww1
 grp-dt2  grp-nl9  grp-rb9   grp-sr4  grp-sr8  sys-li1
 grp-nl2  grp-rb2  grp-sr11  grp-sr5  grp-th9  sys-li9
 grp-nl3  grp-rb3  grp-sr2   grp-sr6  sys-acs  sys-pk2
 $ ls /a/elmer-vol4
 misc-clbib  misc-repl  sys-bmc  www-1  www-2
 $ ls /a/elmer-vol5
 WIN32Repository  grp-sr10  grp-te1     scr-1  scr-3  scr-5
 grp-rb5          grp-sr12  misc-arch1  scr-2  scr-4  www-3
 $ ls /a/elmer-vol6
 MSprovision  grp-nl8  grp-rb6  sys-ct  sys-rt2  www-4
 $ ls /a/elmer-vol8
 grp-ai1  grp-dt8  grp-dt9  grp-nl7
 $ ls /a/elmer-vol9
 ah433-nosnap  iscsi-nosnap1  misc-nosnap1

As a result of this compromise, the pathname of a (super)home directory on the filer, such as

 vol1/homes-1/maj1/
 vol1/homes-5/mgk25/

now includes a q-tree identifier (e.g., homes-1) that the user cannot infer from the user identifier, and which we therefore would ideally hide from users. Users should instead see simple pathnames such as /homes/maj1. Therefore, a two-stage mapping system between filer pathnames and user-visible pathnames was implemented for NFSv3:

  • Server-side mapping: Firstly, the filer's /etc/exports file (/a/elmer-vol0/etc/exports on lab-managed Linux machines) uses the -actual option, as in "/vol/userfiles/mgk25 -actual=/vol/vol1/homes-5/mgk25", to export each superhome directory of a user under an alias pathname that lacks the q-tree identifier.
  • Client-side mapping: Secondly, the autofs system is used to individually mount such user directories under a more customary location in the client-side namespace, using mount entries such as "elmer:/vol/userfiles/mgk25/unix_home on /auto/homes/mgk25" or "elmer:/vol/vol3/grp-rb2/ecad on /auto/groups/ecad". Finally, symbolic links such as "/homes -> /auto/homes", "/usr/groups -> /auto/groups", and "/anfs -> /auto/anfs" give access via customary short pathnames.
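The two stages can be traced with a small Python sketch. It is only an illustration of the lookup order, not the configuration actually deployed: the three tables hold nothing but the example entries quoted above, the function names are invented for this sketch, and it assumes that exports options follow the alias pathname as a comma-separated list.

```python
# Illustrative tables, containing only the example entries quoted in the text.
SYMLINKS = {"/homes": "/auto/homes", "/usr/groups": "/auto/groups", "/anfs": "/auto/anfs"}
MOUNTS = {
    "/auto/homes/mgk25": "elmer:/vol/userfiles/mgk25/unix_home",
    "/auto/groups/ecad": "elmer:/vol/vol3/grp-rb2/ecad",
}
EXPORTS = ["/vol/userfiles/mgk25 -actual=/vol/vol1/homes-5/mgk25"]


def parse_export(line):
    """Return (alias, actual) for one exports line; actual is None without -actual."""
    alias, *options = line.split()
    actual = None
    for option in options:
        for part in option.lstrip("-").split(","):
            if part.startswith("actual="):
                actual = part[len("actual="):]
    return alias, actual


def replace_prefix(path, table):
    """Replace the first matching directory prefix of path according to table."""
    for prefix, target in table.items():
        if path == prefix or path.startswith(prefix + "/"):
            return target + path[len(prefix):]
    return path


def resolve(path):
    """Map a user-visible client pathname to the pathname stored on the filer."""
    client = replace_prefix(path, SYMLINKS)      # symbolic link: /homes -> /auto/homes
    remote = replace_prefix(client, MOUNTS)      # client-side autofs mount
    if ":" not in remote:
        return None                              # not on the filer in this sketch
    _host, _, exported = remote.partition(":")
    aliases = dict(e for e in map(parse_export, EXPORTS) if e[1] is not None)
    return replace_prefix(exported, aliases)     # server-side -actual alias


print(resolve("/homes/mgk25/notes.txt"))
```

Only the last step reintroduces the q-tree identifier (homes-5), which is why users never see it in the pathnames they type but do see it in quota reports.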

This solution is historically grown and was motivated by two considerations:

  • When new users are created, their home directory should become instantly available on all client machines, which meant that the mapping needed to eliminate the q-tree identifier from the pathname had to be performed on the server, as there was no practical way to push out such changes in real-time to all client machines.
  • There was already an existing historic automount configuration infrastructure and customary namespace in place on lab-managed Unix clients when the filer was installed.

This arrangement is far from ideal, and causes a number of problems:

  • Quota restrictions (as reported on lab-managed Linux machines by "cl-rquota") are given in terms of q-tree identifiers (e.g., homes-5, scr-1, www-1) that have hardly any useful link with the pathnames that the user is accustomed to, making it difficult to guess which subdirectory needs cleaning up if one runs out of quota.
  • Some research groups historically had to fragment their group space across multiple q-trees, which complicates understanding quotas and namespace.
  • Some users have quota in others' home directories (namely those sharing the same one of the eight homes-* q-trees), but others do not, which can lead to confusion when users try to collaborate using shared directories in someone's home directory.
  • A very elaborate system of symbolic links and autofs configuration is needed to create the customary client-side namespace, which requires substantial setup and tweaking on lab-managed machines that was never documented or supported for implementation on private machines.
  • The scheme relies on the -actual option in the filer's /etc/exports, which works for NFSv3, but unfortunately apparently not for NFSv4. As a result, we cannot switch to NFSv4 with the current setup, and we are losing out on the substantial performance and ease-of-tunnelling advantages that NFSv4 would have, in particular for remote access (only a single TCP connection to the filer needed, "delegation" to maintain consistency of a local file cache, etc.).

There are two reasons why fixing this is non-trivial and hasn't been done long ago:

  1. The autofs configuration required for client-side mapping is disseminated via LDAP from an LDIF master file that is currently generated by a very complex set of historically grown scripts that are considered unmaintainable and not fully understood by any member of sys-admin. This system is overdue for reimplementation.
  2. The filer lacks any more efficient command than simple copying to move files from one q-tree into another. Therefore, reorganising the way in which home and research-group directories are distributed across q-trees would not only take several days to complete (which could be disruptive), but would also cause significant "churn" in the backup system, essentially doubling for a long time the amount of disc space required on the backup system.

An additional problem is that the namespace under Linux and under Windows differs substantially, for example:

 /homes/mgk25                        =  \\filer\userfiles\unix_home\mgk25
 /auto/userfiles/mgk25/windows_home  =  \\filer\mgk25
 /usr/groups/ecad                    =  \\filer\ecad
 /anfs/www                           =  \\filer\www
 

This is a significant nuisance and source of confusion that hinders communication between Linux and Windows users. The Linux namespace was designed to be backwards compatible with existing and historically grown pre-NetApp practice. The Windows filespace was constrained by the fact that Windows "shares" allow only for a flat namespace. This has led to different solutions. Arguably, the Windows namespace, because it is flat, newer, and less influenced by historic practice, is far more desirable. The main reason that the Windows namespace has not yet also been implemented under Linux (as /filer/...) is to keep the /filer prefix free for a comprehensive redesign of the Linux namespace that not only addresses the incompatibility with Windows share names, but also addresses many of the other issues raised above.

Outline of a possible solution:

  • Since the constraints that led to the historic set of q-trees no longer apply, new users (and selected existing power users) should have their home directories assigned to either a single common q-tree, or to individual per-user q-trees (to be decided). A single common q-tree has the advantage of allowing easy collaboration in one's home directory and the disadvantage that it remains difficult to find the files that count towards one's overfull quota. With a per-user q-tree, this advantage and disadvantage are swapped. (The current scheme has both aspects as a disadvantage!)
  • For most existing users, the existing q-tree scheme should remain in place until they have become a minority small enough that the backup churn caused by reassigning all of them is manageable.
  • The way the client-side namespace mapping is configured needs to be rewritten from scratch and carefully documented, in a way that can easily be applied to private Linux machines as well. This could be done by creating a new /filer namespace and running the old and the new system in parallel for a short time.

Authentication / Kerberos

For many years, NFS access to the departmental file servers was restricted to lab-managed machines, because of the inherently insecure AUTH_SYS authentication scheme used, whereby the file server simply trusts the numeric user ID communicated by the client as long as the NFS packet originated from a permitted IP address and from a source port number below 1024, which is reserved for the superuser.
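The weakness of this check is easiest to see when it is written out. The following Python sketch is a simplified model of the decision described above, not the filer's actual code; the function name is invented here and the network 192.0.2.0/24 is a documentation address range used as a stand-in for the permitted clients.

```python
import ipaddress


def auth_sys_accepts(source_ip, source_port, permitted_networks):
    """Model of the AUTH_SYS check: permitted source address and privileged port.

    Note what is missing: the numeric user ID claimed in the request is never
    verified, so whoever passes this check can claim any user ID.
    """
    address = ipaddress.ip_address(source_ip)
    from_permitted_host = any(
        address in ipaddress.ip_network(network) for network in permitted_networks
    )
    from_privileged_port = source_port < 1024   # only the superuser can bind these
    return from_permitted_host and from_privileged_port


permitted = ["192.0.2.0/24"]                    # hypothetical client network
print(auth_sys_accepts("192.0.2.17", 1001, permitted))
print(auth_sys_accepts("192.0.2.17", 40000, permitted))
```

Anyone with root access on a permitted machine can send from a port below 1024, which is why root access could not be handed out on such clients.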

This changed a few years ago with the implementation of "Kerberized NFS" (RPCSEC_GSS) on Linux. New lab-managed Linux clients now routinely authenticate their users to the NFSv3 server via Kerberos credentials, using the Lab's Active Directory server as the key distribution centre (KDC). As a result, it has become possible to give users of such clients root access without giving them the ability to impersonate others on the filer. The same Kerberos setup is also used by Windows/CIFS clients.

[... ease of setup ...] [... host key question ...]

LDAP

In order to translate numeric user and group IDs into meaningful names, make the ~login shell expansion work, etc., the "getent passwd" user database of the local machine needs to be linked to the departmental LDAP server that has information about all users. This can be easily done (configure /etc/nsswitch.conf and /etc/ldap.conf).

However, there is still a minor problem/risk involved with using the departmental LDAP server (which affects not only home PCs but also lab-managed machines!):

The departmental LDAP server currently still serves a lot of numeric user IDs below 1000 and numeric group IDs below 500, which are actually reserved for use by the local operating system installation (and there are particularly many collisions with Ubuntu Linux system users and groups).

The only solution is to reassign the remaining user and group IDs to higher numbers, such that the departmental LDAP server no longer serves any user IDs below 1000 and group IDs below 500. (Setting these limits ~10 higher might be a good idea, because a home Linux PC might already have uids 1000, 1001, 1002 assigned to local users, such as family members.)
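As a rough sketch of how the remaining work could be listed, the Python fragment below reports which served IDs fall below the proposed floors and which of them clash with a differently named local account. The account names and numbers are invented for illustration, and the margin of 10 is simply the suggestion made above; real input would come from the LDAP server and the local /etc/passwd and /etc/group files.

```python
UID_FLOOR = 1000 + 10   # proposed limit plus the suggested margin of ~10
GID_FLOOR = 500 + 10


def below_floor(served, floor):
    """Return {name: id} for served entries whose numeric ID is below floor."""
    return {name: number for name, number in served.items() if number < floor}


def collisions(served, local):
    """Return sorted (served_name, local_name, id) where one ID has two names."""
    local_by_id = {number: name for name, number in local.items()}
    return sorted(
        (name, local_by_id[number], number)
        for name, number in served.items()
        if number in local_by_id and local_by_id[number] != name
    )


# Hypothetical sample data, not taken from the departmental directory.
ldap_users = {"olduser": 104, "edgeuser": 1001, "newuser": 2300}
local_users = {"messagebus": 104, "family1": 1001}

print(below_floor(ldap_users, UID_FLOOR))
print(collisions(ldap_users, local_users))
```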

There is already an ongoing campaign to reassign group identifiers, which is mostly slowed down by the need to chown a large number of files on the filer, along with the inadequate scripting interface of our user database (Microsoft's Active Directory).

Summary: Why does Windows/CIFS access "just work" and Linux/NFS access not?

Users of home PCs running Windows have for a long time been able to access departmental filespace simply by activating a VPN connection to the Computer Laboratory's Cisco PPTP server, and typing "\\filer\..." into the "Run" or "Search" box of their desktop. There is no need to be integrated into the domain; Windows will simply show a pop-up box and ask for the departmental Kerberos password before the files become visible.

Why are things not equally simple under Linux? There are several reasons:

  • There is no VPN service supported for Linux. The commonly-used alternative, OpenSSH, does not support the forwarding of the NFSv3 protocol, where the port number used by the mount protocol is dynamically negotiated via rpcbind.
    • Solution 1: Enable NFSv4, which has no separate mount protocol and is easy to tunnel over ssh by forwarding tcp port 2049.
      • Prerequisite 1: AUTH_SYS must be disabled first for all generally accessible clients, because a NetApp-specific security vulnerability of AUTH_SYS is particularly trivial to exploit via NFSv4.
      • Prerequisite 2: All remaining AUTH_SYS clients (for cron, web server, etc.) must be on a dedicated server VLAN (already done?) and the filer be protected from source-IP spoofing of associated source IP addresses (already done?).
    • Solution 2: Offer and document a VPN gateway that is easy to set up under Linux. (Replacing the VPN service may also benefit Windows users, as there are security and configuration concerns with the existing very old setup.)
  • "Kerberized NFS" currently requires setting up a Kerberos host key for the machine, in addition to the Kerberos password required from the users.
    • Solution: Investigate what the host key is actually required for, and whether we can eliminate that need. (Windows seems to require no equivalent.)
  • Once NFS works, the user will be faced with only a half-mapped namespace very different from what is customary on lab-managed machines.
    • Solution: Design a simple and stable configuration for client-side mapping of the namespace. This may continue to involve an LDAP-disseminated autofs configuration, but could equally be just a list of additional entries in /etc/fstab. In the interest of efficiency, it may be desirable to have just one single NFS mount from the filer, along with local type=bind mounts to shape the desired namespace, in the absence of a comparable mechanism on the filer. Ideally, there should just be a single mount to /filer, with all remaining mapping being done on the filer.
  • To convert numeric user/group IDs to user-friendly names, an LDAP connection is needed. A minor obstacle/risk is that existing historic LDAP entries still collide with numbers used by Linux distributions (e.g., Ubuntu).
    • Solution: Reassign the remaining LDAP user and group IDs to higher numbers.
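To make the "single NFS mount plus bind mounts" idea from the namespace item above more concrete, the following Python sketch generates candidate /etc/fstab lines. Everything specific in it is an assumption made for illustration: that the filer would export a single tree as elmer:/vol, that it would be mounted on /filer with Kerberos security (sec=krb5), and the helper name fstab_lines. The directory names reuse the examples given earlier in this report.

```python
def fstab_lines(server_export, mount_point, bind_map, nfs_options="sec=krb5"):
    """Return fstab lines: one NFS mount, then one bind mount per visible path.

    bind_map maps a user-visible pathname to a path relative to mount_point.
    """
    lines = [f"{server_export} {mount_point} nfs {nfs_options} 0 0"]
    for visible, inside in sorted(bind_map.items()):
        lines.append(f"{mount_point}/{inside} {visible} none bind 0 0")
    return lines


# Hypothetical layout: one export mounted on /filer, shaped by bind mounts.
binds = {
    "/homes/mgk25": "vol1/homes-5/mgk25/unix_home",
    "/usr/groups/ecad": "vol3/grp-rb2/ecad",
}
for line in fstab_lines("elmer:/vol", "/filer", binds):
    print(line)
```

Such a list still exposes the q-tree layout to every client, which is the reason the text prefers doing the remaining mapping on the filer if a suitable mechanism exists.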