Pros & Cons About Astronomy Switching to Glue
(this section should be reconciled with the original points made below)
- Cost is ~$50/year/machine, or ~$4k/year to support all our Linux boxes. That's a lot of system administration for very little cost.
- User requests are made by sending e-mail to email@example.com -- each request is given a case number to ensure follow-up.
(the following points are from the GLUE website)
- Consistency: As far as possible, every Glued machine looks like every other. You can walk across campus, log in to a Glued machine in another department, and it'll look just like the machine on your desk.
- Centralized Administration: The Glue environment makes it possible for a comparatively small staff in OIT to manage the operating system, install patches, maintain common applications, and so forth.
- Distributed Control: While Glue is centrally administered, individual system administrators (Lab Managers) still maintain control over their systems, including root access.
- Security: Glue makes heavy use of Kerberos and SSL for security, and its centralized administration means that patches can quickly be installed everywhere.
- We lose direct control over system administration, though we retain superuser access.
- Every Astro user will need a Glue account (these are free and generally provided in less than 24 hrs).
- Glue home space per user is limited to 100 MB.
- Presently GNOME is not available, but that can be changed.
Q&A About Glue and the Astronomy Department
Compiled in this section is a series of questions & answers between Astro people and Glue people, organized roughly by topic.
Q: Does the $50/yr/machine cover the initial Glue install as well?
A: The $50 per year covers Red Hat licensing only. There is (currently) no charge for Glue installations.
Q: What happens if access to the central system goes down?
A: It is true that Glue is network dependent.
1) Authentication - this is handled centrally, and you will not be able to log in if you can't contact an authentication server. However, our authentication servers are redundant, and in three different buildings and on three different networks on campus, so this is mitigated greatly. Your primary problem will be with local network outages. Once you're logged in, however, this isn't an issue.
2) Filesystems - /usr/local, /local and our globally installed software packages live in AFS (again, three redundant copies spread around campus). If your machine loses its local network, you'll lose access to these. If your home directory and mail are on a network filesystem, you'll lose these too. In general these problems occur very infrequently, and we haven't had any of our other departments express problems with this scenario.
Q: 100 MB per home directory does not seem to be enough space. What about using disk local to each desktop?
A: It is true that we discourage the storing of user data on individual workstations. In our experience, the hassle of trying to maintain this data across upgrades, keep the data backed up, and ensure its availability (machine owners turn off their machines, etc.) is not worth the few dollars you might save by using the leftover disk space. Remember that a Glue home directory follows you no matter where you go; by contrast, if a user's home directory lives on a machine whose owner has turned it off for the weekend, and that user tries to log in to glue.umd.edu remotely, they won't have a home directory.
We recommend instead having a departmental fileserver (AFS or NFS, your choice), where you place your home and mail directories. This storage should be on a RAID filesystem and can be backed up easily, either using your own backup hardware or by OIT via contract.
OIT does provide 100MB of disk space on our servers for your home and mail; your users may continue to use this, or you can choose to use your own servers.
However, if you still want home/mail space on each individual machine, this can be done relatively easily. We do have a few users that still do this.
[DCR's comment: it's possible that we are somewhat unique in desiring large amounts of local storage for processing telescope or simulation data -- we could always continue to run our own backups of these local disks, or require users to back up their own local data. BTW, you can buy a 1 TB firewire external disk from LaCie for ~$350!! Wow...]
Free space on random disks cannot be used by AFS. That is true. As I mentioned above, we recommend not using the leftover disk in a workstation for critical data. Disk is cheap. Our methodology is that the internal disk should be able to be thrown away and replaced and you have a functioning machine back in around 15-20 minutes. Workstations should not have to be backed up. If you feel like you're wasting the extra disk, mount it as /tmp, or other scratch space. Or, if you really want user data there, let the user have a central home directory, and give them a link to their local space - then they can have the best of both worlds, a roaming home directory, and lots of extra local space.
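The "central home directory plus a link to local space" suggestion above can be sketched as follows. The paths here are stand-ins (mktemp directories) so the sketch can be tried safely; a real setup would point at the workstation's actual scratch disk, e.g. something like /export/data1.

```shell
# Sketch: roaming Glue home directory plus a link to leftover local disk.
# mktemp stands in for the real locations so this can be run anywhere.
LOCAL_DISK=$(mktemp -d)        # stand-in for the workstation's scratch disk
HOMEDIR=$(mktemp -d)           # stand-in for the user's central home directory
mkdir -p "$LOCAL_DISK/scratch"
# Point ~/scratch at the local disk; the home directory itself still roams.
ln -sfn "$LOCAL_DISK/scratch" "$HOMEDIR/scratch"
ls -ld "$HOMEDIR/scratch"
```

The user then sees one home directory everywhere, with bulk scratch space available only when sitting at (or mounting from) the machine that owns the disk.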
In addition to our other comments about running your own AFS server on our Wiki, we've recently made some changes to that policy. If you want your own AFS server, for security reasons, we will have to manage that server for you: it will be placed in our machine room, and you will NOT have login access on it. You WILL however be able to control quotas, and decide who gets placed on various disk partitions, etc. If you want to go this route, we should have a more thorough discussion.
Q: Can we keep the astro domain?
A: Yes, you may. You may also choose to have us host the astro domain on our redundant nameservers, and you'll still maintain full control over your zone files.
Q: We're currently running Mandrake on our Linux boxes. If these were to be Glued, what would happen to the existing user partitions?
A: You'd need to make sure your existing data and any software that is not supplied by Glue (e.g., home-grown applications) are mounted on non-system partitions (e.g., /export/software). Data on /, /usr, /usr/local, etc. will be overwritten.
Q: What version of RedHat Linux (Enterprise) are you currently supporting?
A: Both 3 and 4. The file /etc/redhat-release (even on Mandrake) will tell you which version you have.
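A guarded version of that check (the file may be absent or unreadable on non-Red Hat systems, so a fallback is printed instead of an error):

```shell
# Print the Red Hat release string, or a fallback if the file is missing.
RELFILE=/etc/redhat-release
if [ -r "$RELFILE" ]; then
    RELEASE=$(cat "$RELFILE")
else
    RELEASE="unknown (no $RELFILE)"
fi
echo "$RELEASE"
```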
Q: Do you support OpenSolaris on Dell machines (i.e., not just Suns)?
A: We don't support OpenSolaris at all, and we don't currently support Solaris x86. Nearly all x86-based interest in our world is centered around Linux.
Q: How much heterogeneity in machine architecture (particularly devices) can Glue administrators tolerate? Do your staff hunt down and install the necessary device drivers, or is that the task of our local manager(s)?
A: If Red Hat doesn't support the device, we typically don't either.
Q: Would you be able to Glue dual-boot (RHL/Windows) machines?
A: We don't currently support this model because Glue machines get their updates at night, and if the machines routinely run Windows (or are turned off), they may quickly fall behind. We've talked about moving to an update-on-boot model, which would take some time to develop, but would significantly reduce if not eliminate the problems associated with machines missing their patch schedule. If we were to make this change, would you be interested? If so, and other Glue lab managers are as well, I might be able to get clearance to devote some resources to the project.
Q: Earlier you said that dual Windows/Linux machines aren't supported because availability under Linux for nightly rsyncing is not guaranteed. But suppose a user accesses the Windows side of their personal workstation only occasionally, and generally returns to Linux? A number of us follow this model (myself included, though I confess I've not used the Windows side in over a year, since my Apple PowerBook running OS X provides all the Windows functionality I need). Have there been many requests for this from other users, or is this fairly unique to our dept? I could see a few users wanting to adopt this model (though probably not myself anymore).
A: We actually have a mechanism that has been little-used up to this point that will pull over updates when the machine is booted - it was originally designed for exactly this scenario, but since the demand has been next-to-nonexistent, we've never done any thorough usability testing of it. If this is something that you find that you need, we can certainly experiment with it.
Q: Do you provide 64bit support to the machines that can use it?
A: [DCR's interpretation: the impression I got is that 64 bit is the default (for example, they don't have the 32-bit version of the Intel compiler).]
Q: Would anyone with a Glue account be able to log into our machines, or would access be restricted to just astro personnel?
A: We (or you) can restrict access as you deem appropriate. The only non-Astro users with access would be OIT's Glue sysadmins.
Q: In a similar vein, is it easy to get guest Glue accounts for visitors to our department (who are not members of the university)?
A: Guest access to all OIT-provided services (including Glue) hinges on getting an affiliate account. Here's a link with additional info: [ http://www.oit.umd.edu/units/dataadmin/Affiliates/ ].
[DCR's NOTE: very short-term visitors are usually provided umd.edu wireless access by the department.]
Q: Is it possible to grant a user su access *only* to their own machine? (rather than, say, to all astro machines, which is a privacy risk).
A: Yup! You can grant access for your whole department, an arbitrary cluster of machines, or an individual machine.
Q: We currently have our own mail, web, and central astronomy software servers, with built-in redundancy in case a server fails. We also have our own backup system. Would Glue staff coordinate with our local manager(s) to seamlessly integrate the Glued machines into this structure, or would radical changes be needed?
A: The short answer is it depends. We will be available for help and advice during the initial migration, but it will be on the local manager to maintain these services over time.
You may also want to take a hard look at which services you want/need to run yourself and which you might want to turn over to central IT. As I'm sure you know, OIT provides all three of these services (email, webhosting, and backups).
OIT recommends that departments running their own mail server(s) look at the possibility of migrating to the University's Mirapoint system. I'm pretty sure this allows you to keep your department's name (astro.umd.edu), and, unless you have a particularly generous quota policy, the conversion process can be pretty straightforward. Let me know if you'd like more information, and I'll put you in touch with Mirapoint folks.
Ditto with our webhosting group. I don't know if you're dependent on any special-purpose applications, but it might be worth talking to our webhosting folks to see if migrating into their environment seems like a good fit. For more info, see [ http://www.webhosting.umd.edu/ ].
Nothing precludes you from continuing to do your own backups, but if your tape infrastructure is getting on in years and/or is time-consuming to manage, you may want to take a look at OIT's offering. See [ http://www.backups.umd.edu/ ] for additional info.
If you're able to transition any of these services to OIT, you'll not only likely save time and money, you'll also be able to redeploy IT personnel toward department-specific needs.
Q: Is it possible to have machine-specific customizations made by the lab manager that aren't overwritten by the rsync each night? For example, exporting local disks via NFS?
A: Yup, you can make any of your machines into an NFS server. Other local customizations are also possible: you can check our docs, put in a request, or experiment and see what gets overwritten the next night :) In many cases, while direct edits to a particular file may get overwritten, we probably have an alternate mechanism that gives you the same functionality.
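As an illustration, exporting a local disk via NFS from a Glued machine might look like the fragment below. The path and hostname pattern are hypothetical; check the Glue docs for the supported way to make such edits persist across the nightly sync.

```
# /etc/exports on the workstation -- export the local data disk
# read/write to other machines in the department (hypothetical names):
/export/data1   *.astro.umd.edu(rw,sync)
```

After editing the file, `exportfs -ra` (as root) reloads the export table.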
Q: Our major astronomy application, IRAF, consists of iraf itself, plus numerous other smaller executables. If these were on AFS at OIT, would they obey the same caching rules as, say, the firefox application we tried? I.e., first access is slowish (as it's transferred for the first time from OIT), subsequent access very fast (since the executable is on the local disk, if I understand correctly). How often is the cache refreshed? Only when the OIT version changes (or the local machine reboots), or more frequently?
A: Yes, everything in AFS observes this behavior. The cache is only flushed when it's determined that the version of the file on the server is newer than what's in cache.
Q: We currently run condor on our dept Linux boxes. This requires local su access (which we would have) and running the condor daemons. Disk sharing is handy in this mode too (or at least having one big central disk visible to all the nodes). Do you envision any problems with us implementing this?
A: We have condor installed in the Glue image; you can use our installation by setting a few config variables, or you can roll your own. (We can function as a collector or as a compute node.) Disk sharing via NFS as described above is no problem; if there's something else you have in mind, please elaborate.
Q: Several previous and current grad students have put a lot of time and effort into setting up a condor system from our astronomy desktops. Can you envision a way of preserving/adapting this system so that it's still available? The biggest issue I see is disk cross-mounting: data access is much easier when disks are cross-mounted across nodes (although I believe condor provides the necessary functionality to get around this if cross-mounting is not available). And of course it will be necessary to designate a master node.
A: As part of our install, we include the Condor client and server binaries and configurations. It's easy to make any or all of your machines clients, and to designate/create a new collector. It doesn't appear that we have any documentation on our wiki for this; I'll see if I can get some up there. Regarding cross-mounting: you can do this if desired.
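For illustration, the "few config variables" mentioned above usually come down to something like the fragment below in Condor's local config file. The variable names follow standard Condor conventions, but the hostname is made up and the exact file location depends on the Glue installation.

```
# condor_config.local (sketch)
CONDOR_HOST = condor-master.astro.umd.edu   # the designated collector node
DAEMON_LIST = MASTER, STARTD, SCHEDD        # run as a submit + execute node
# On the master node itself, COLLECTOR and NEGOTIATOR would be added
# to DAEMON_LIST as well.
```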
Q: Is sshfilt available?
A: We currently monitor access logs on all of our Glue hosts; if we see repeated failures in the logs, ssh access to all Glue hosts is automatically blocked. This feature will apply to astro hosts as well.
Q: How do we install local software?
A: See [ https://www.glue.umd.edu/wiki/bin/view/Glue/InstallingSW ] (I just created this document last week). [DCR note: you'll need a UMd directory account to view this page -- all teaching faculty have this.]
You can install your local software wherever you like, as long as it doesn't conflict with our installation. (/export/software is a best practice, but not a requirement).
Q: Is CVS available?
A: You can run your own CVS server if you like. I don't believe we have any currently running in Glue, but there's nothing stopping you from setting up your own. We do provide the CVS utilities in /usr/local/cvs.
Q: We use postfix for e-mail. What system does Glue use?
A: Sendmail is what we currently support. However, OIT is strongly recommending that people migrate to the Mirapoint service [DCR: this is UMD web mail], which provides strong spam controls, greylisting, etc. This would completely eliminate the need for your department to manage email. We also provide list management services which can handle your mailing lists.
If, however, you want to keep your email on a local server, we'll need to have a more detailed discussion into how mail is integrated into the Glue environment---something a bit out of scope for this reply.
- Sendmail can block access to various lists so that email can only come from on-campus senders.
- Glue provides spamassassin and anti-virus support. We do not currently provide greylisting, though we could potentially consider this.
- We have a huge access list that blocks all sorts of known bad guys.
- postfix vs. sendmail security - that's a religious war, and opinions vary based on who you talk to.
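For reference, the "huge access list" mentioned above is sendmail's access database, which uses entries of this general form (the addresses below are made up):

```
# /etc/mail/access (sketch) -- rebuilt with: makemap hash access < access
spammer.example.com        REJECT
10.99.88                   REJECT
friend.example.org         OK
```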
If you really want to run postfix, it may be possible to set up your own installation on your mail server and still have things function the way a Glue user would expect, but this will require a more detailed investigation.
Q: Is procmail available?
A: Yes, you can run procmail and vacation. See: [ http://www.helpdesk.umd.edu/topics/email/utilities/procmail/1182/ ] and [ http://www.helpdesk.umd.edu/topics/email/utilities/vacation/3951/ ].
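A minimal ~/.procmailrc along the lines documented at those links might look like this (the folder name is an example; the recipe files SpamAssassin-tagged mail into a separate folder):

```
# ~/.procmailrc (sketch)
MAILDIR=$HOME/mail
LOGFILE=$HOME/.procmail.log

# Deliver anything SpamAssassin has tagged into the "spam" folder:
:0:
* ^X-Spam-Status: Yes
spam
```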
Obvious and not so obvious User changes
- your new home directory will be /homes/$USER; it is limited to 100 MB and lives on backed-up servers at OIT. Note that your old $USER and new Glue $USER names don't need to be the same, but you will not be able to share data between the two: they are completely different users on a completely different set of computers. This means all data disks will need to have their user permissions changed.
- your web space is not in ~/public_html but in ~/../pub. The URL would be http://www.glue.umd.edu/~teuben .
- one level above your $HOME directory there are 4 directories: home, backup, mail and pub.
- ssh using Kerberos: authentication is done via Kerberos, so there is no need to pass around your id_rsa.pub or id_dsa.pub from your ~/.ssh directory. Simply type kinit and enter your LDAP password; the resulting ticket is good for 24 hours of authenticated ssh. Your client machine must have been set up for this. See the link.
- your mail is kept on glue.
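The Kerberos ssh flow described above looks roughly like the session below (the realm and hostname are illustrative; klist simply displays the ticket kinit obtained):

```
$ kinit                     # prompts for your LDAP password
Password for user@UMD.EDU:
$ klist                     # shows the ticket, valid ~24 hours
$ ssh somemachine.umd.edu   # no password prompt while the ticket is valid
```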
Some system issues to be resolved
- there is no direct root login; instead, each user who was given root permission can run "su"
- there is no rsync server running, which, combined with the previous item, makes our current RAID backup system a bit tedious to re-implement.
- where can things like /astromake go?
- cross-mounting all astro disks (the /n map, as well as the /backup map)
- mysql (for mediawiki and other things?) Note there are some version dependencies among php/mysql/mediawiki
- convert astro $USER to glue $USER permission. Kevin has a perl script.
- users who want to keep a special local home directory on a given machine?
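The astro-to-Glue $USER conversion above (Kevin's perl script does the real mapping) amounts to re-owning files on the data disks. A dry-run sketch with made-up UIDs, using a temporary directory as a stand-in for a real data disk:

```shell
# Dry run: list files owned by the old astro UID on a data disk.
# OLDUID/NEWUID are hypothetical; substitute the real per-user mapping.
OLDUID=1234
NEWUID=5678
DISK=$(mktemp -d)          # stand-in for a data disk such as /n/somedisk
touch "$DISK/example.fits"
# List (rather than change) files owned by the old UID; for the real
# conversion, replace -print with: -exec chown "$NEWUID" {} +
find "$DISK" -uid "$OLDUID" -print
echo "dry run complete for $DISK"
```

Running the listing pass first makes it easy to sanity-check the mapping before any ownership actually changes.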
Things we can keep
- mail server -- But see below.
- web server (but you cannot use $HOME/public_html, on the webserver we'll need some $USER space)
System things to remember
- partition tables on the boot disk need to be edited, but any data partitions on that boot disk will need to be preserved in whatever partition type (primary vs. logical) they were initially. GLUE will use hda1/sda1 and populate it with logical partitions; the 3 remaining primary partitions are free to be used.
Things we lose
- One feature installed recently is a program that inserts a firewall rule to block IP addresses that are trying to break into the system via ssh. Many hacked computers worldwide run scripts that attempt this, and a certain number are likely to eventually succeed. This script has reduced that problem (run "showblocked" to see the number of locations blocked within the last half day). We would lose this with Glue.
- sendmail is the only mail transfer agent available with Glue. We give up a *lot* by dropping the present postfix-based mail system and going back to sendmail:
 o Mail to certain user names (usually mailing lists and exploders, but this can also be sensitive system names) can presently be confined to on-campus senders. There is no way to do this in sendmail.
 o There is no greylisting available in glue.
 o Unknown users or aliases are not even allowed into the department mail network. We would lose that ability, increasing the amount of spam the department must handle.
postfix could be installed in our area. However, many programs call /usr/sbin/sendmail to send mail, and postfix replaces this with its own binary; on Glue this is impossible.