JB: Welcome to the CREN TechTalk series for spring of the new millennium, and to this session on "Building Directories -- The Fundamentals." You are here because it's time to discuss the core technologies for your future campus.
This is Judith Boettcher, your CREN host for today, and let me welcome the technology anchor for TechTalk, Howard Strauss of Princeton. Howard is a well-known, very energetic and insightful Web information technology expert.
HS: Thank you, Judith. Your intro gets better every time! We're going to keep doing this.
I'm Howard Strauss, the technology anchor for the TechTalk series of CREN Webcasts. As technology anchor, I'll engage our guest experts in a lively technical dialogue that will answer the questions you'd like answered, and ask those very important follow-up questions. You can ask our guest experts, Ken Klingenstein and Keith Hazelton, your own questions by sending e-mail to firstname.lastname@example.org any time during this Webcast. If we don't get to your questions during the Webcast, we'll provide answers in the Webcast archive.
You may have noticed that as personal computers became ubiquitous, they stopped being personal. While they can still be used offline to turn out a document, spreadsheet or PowerPoint presentation, their real power today is an ability to be connected to a variety of networks, including, of course, the Internet. In fact, when we think about the most important PC applications, using e-mail and the Web are far more critical than anything we can do with a stand-alone, unconnected PC.
Using any network application requires that the network knows something about who we are. Admiral James Stockdale (I'm sure you all remember Ross Perot's 1992 running mate) asked, "Who am I?" to an electorate that had never heard of him before, and then he tried to explain just who he was. A network needs to ask us who we are, usually by requesting an ID and password.
But how does it know who we really are? Shakespeare asked, "What's in a name?" and decided, "That which we call a rose, by any other name would smell as sweet." But Shakespeare wasn't writing about networks! That which we'll call "Pogo Possum," for example, by any other name would smell very suspicious on a network. A network has to be very sure of who we are. Sticks and snails and puppy dogs' tails, we are told, define what little boys are. But our network identity is much more complex. It consists of much more than just our name, and even more than our name, ID and password.
All of us have addresses, phone numbers, departments, job functions and so forth, and all of these are important parts of our identity. Just what does a network need to know about us? As we'll see today, a network needs to know quite a bit about us, and it needs to store it in a safe place that can be accessed quickly by all the applications that need to know about us.
One such place is called a directory, and the information in a directory is accessed using a protocol such as LDAP, the Lightweight Directory Access Protocol. The directory or directories let every application know what they need to know about us, and so they can be used for authentication (which is determining who we are), and sometimes for authorization (which is determining what we can access).
While there are many complex issues involved in using directories effectively, much of what we need to know focuses on names and establishing the identity of people, systems and applications and the other entities that are stored in directories. It has been said that sticks and stones can break your bones, but names will never hurt you -- but names can hurt you and your systems quite a bit, if you don't understand them when you build your directory infrastructure. In just a few minutes, Ken and Keith will explain to us how to manage directories and tame those names on today's edition of TechTalk.
JB: Well, thank you, Howard, very much. And I've been really looking forward to today's session on directories to try and get at the basics, as we say, the fundamentals of directories. As we go around, people are still very confused with just what directories are, what they can do, and just how they fit into the whole picture. So I am hoping that we'll get some of the basics of that out today.
So let me introduce our guest experts today, Ken Klingenstein and Keith Hazelton. Ken is well-known here at TechTalks for some of his previous sessions on middleware and is also the Director of Information Technology Services at the University of Colorado at Boulder. He is also a member of the CREN Board of Trustees and the man behind the middleware effort for Internet2 and for all of us.
Keith Hazelton hails from the University of Wisconsin, Madison, where he's the Information Technology Architect. He also has roots, I understand, in Minnesota, which is good, and also was at Princeton University in the 1980s.
More about Ken and Keith is available at the event page, and I must say that the other very important thing about Keith is that he's the Coordinator for the EduPerson working group. And we'll hear more about what EduPerson is and what that working group is doing during the session today.
Welcome, Ken and Keith! So glad that you can be here.
KK: Glad to be here, Judith -- though "here" for all of us is different physical locations, I gather.
JB: I think we're across every time zone this time!
KH: That's great. Well, it's a nice time here to be with you, Judith and Howard and Ken, again. Looking forward to this!
JB: Okay, great!
HS: Why don't we start by just sort of figuring out what the landscape is around here and talking about what services a general purpose, enterprise-wide directory would provide?
KH: Okay, I'll launch in first here and expect Ken to do the same soon.
I think the easiest entry point into this from the fundamentals perspective is to say that directories are infrastructure services, ideally, that give applications and people the information they need. We'll be probably talking mostly about information about people today, but let's keep in the back of our minds, at least, that directories -- especially directories of the not-too-distant future -- will have information about everything from routers to services to Web servers as well as information about people. But I think, just to keep ourselves sane and keep the conversation 45 minutes, let's focus on that. So in general, directories either give you information that you need or point you to the information and help you get to it, about people.
And the "you" there can be people looking up information in the White Pages or applications that need to authenticate somebody. But when you dig down a little deeper, there's what I would consider kind of the primary responsibility, Job One for an enterprise wide directory -- the general purpose directory -- and that to me is identity reconciliation.
And let me take, if I can, a minute or two just to flesh that out. What we'd really like, I'm claiming, is one place, one logical place we could go to ask about all people or any person of interest within our university community. And we'd like that place that we go to to be able to answer questions about any of the roles that a given person might bear at that institution, or roles they might have with that institution as, say, a vendor or a contractor or a visiting federal researcher.
In addition to that kind of information, we'd like that one logical place to know about the identities of that person and the identifiers by which other information systems around the campus know them, so that, for example, if I wanted to get something that's buried in the depths of the Human Resources files, I could go to the directory and at the very least, if not get the answer directly, find out how that Human Resources system knows the person I'm looking for information about.
HS: But before you go on, Keith, you seem to have said there should be one place for this. Does that mean there's one directory on a campus or on a bunch of campuses?
KH: One logical place, meaning from the point of view of the application or the person looking for information. They're going to one address or one service that they can get the information from. But no, the information in the back there might very well be federated across several directories and connected up invisibly to the person at the other end. But you don't want somebody or an application to have to know the URLs of 12 different directories to pull together all the pieces they need. So to be a really useful general service, it has to be one logical, unified view of the people of interest.
HS: Okay, and another thing you said. You talked about vendors and contractors and folks like that being in the directory. Those folks aren't really part of your organization. How do they appear in the directory?
KH: Well, you know, it's very possible that we'd want to -- and this isn't something we're doing on Day One, let's be clear about that -- but it sure would be nice to be able to accept digitally signed bids and/or have secure e-mail traffic, in the case of some federal researcher that's visiting and communicating with some project team. So in fact, those are people of interest to somebody on campus, so at least one person on campus cares about the identity and the contact information and authentication abilities for the contractor or the visitor. So it's legitimate game for the directory in our sense of it -- that is, this one logical place to look.
So again, it's easier to use the examples of faculty, staff and students, but I think, you know, it's useful to stretch our thinking about it just a little bit, to point out how generally useful it would be if we had all the people we cared about in one place.
JB: Keith, I know that this might be kind of a basic question, but before we go much further, I think there's a lot of folks out there right now that have "directories" about people on their campus in place already, and they call them by various names, i.e., directories and White Pages, etc. But now, the directories that we're talking about here and that you're talking about differ from those White Pages in a number of specific criteria, right?
KH: Yeah, and the big one is that, I think I have yet to hear of any institution of higher ed that has that one logical place. Those of us that have been building directories are getting closer -- directories in our sense -- so the second simple truth here, beyond the fact that we would like that, is that we don't have it yet, unless we've built one of these general purpose directories.
And the reason we don't have it, with many of the existing systems -- whether they're called White Pages or User Databases or Access Password Files and things like that -- is that they aren't integrated. We don't know from one to the next whether Susie Q is or is not the same person as Susan Quinlan, one being a student and the other being an employee.
So the critical function of the directory that Ken and I are talking about -- and a lot of other folks in higher ed and, for that matter, in the rest of corporate America -- is a directory that does something some of us call identity reconciliation. It's pulling all these things together and knowing that Susie Q is or isn't Susan Quinlan, and knowing that in the student system, she's known by an ID, PeopleSoft-Student/Administration-ID-such-and-such, and in the HR system she's known as Oracle-Financials-person-such-and-such. And so you can map across those systems of interest within the directory.
Now, the other term for what I refer to as this person registry is the metadirectory. A lot of vendors are going to want to sell you metadirectory functionality, whether it's in the box when you buy a directory server or not. But Stanford University and Bob Morgan in particular are responsible for this very useful notion of a person registry performing that identity reconciliation function and so we again try, at least in our conversations, to use that term. Metadirectory now, since the vendors have gotten hold of it, means as many things as there are vendors and more.
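As a rough illustration of the identity reconciliation described here, the sketch below keeps one entry per person and records the identifier each source system uses for that person, so an application holding one identifier can ask for another. The class names, system labels ("student", "hr") and identifier values are hypothetical and are not taken from any campus system discussed in this session.

```python
from dataclasses import dataclass, field


@dataclass
class PersonEntry:
    """One reconciled person and the identifiers source systems use for them."""
    display_name: str
    # Maps a source-system label to the identifier that system uses.
    identifiers: dict = field(default_factory=dict)


class PersonRegistry:
    """Toy person registry: maps identifiers across source systems."""

    def __init__(self):
        self._index = {}  # (system, identifier) -> PersonEntry

    def add_person(self, display_name):
        return PersonEntry(display_name)

    def link(self, entry, system, identifier):
        """Record that `system` knows this person as `identifier`."""
        key = (system, identifier)
        existing = self._index.get(key)
        if existing is not None and existing is not entry:
            raise ValueError(f"{key} already belongs to another person")
        entry.identifiers[system] = identifier
        self._index[key] = entry

    def translate(self, from_system, identifier, to_system):
        """Return the person's identifier in `to_system`, or None if unknown."""
        entry = self._index.get((from_system, identifier))
        if entry is None:
            return None
        return entry.identifiers.get(to_system)


registry = PersonRegistry()
susan = registry.add_person("Susan Quinlan")
registry.link(susan, "student", "S-1001")  # hypothetical student-system ID
registry.link(susan, "hr", "E-2002")       # hypothetical HR-system ID

print(registry.translate("student", "S-1001", "hr"))
```

The hard part in practice, deciding whether two records really are the same person, is deliberately left out; this only shows the mapping once that decision has been made.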
JB: Right. I think certainly that a term like metadirectory would clarify the function and purpose, Keith, so I think that's great.
KK: Let me dive in for a second. I'm three questions behind in a fast-moving crowd here, so back to Howard's first point about the uses of enterprise-wide directories. Back when we only had one application -- in 1904 -- it was okay to keep all relevant information in that one application. But as we get to literally scores and hundreds of applications on our campus, we don't want to re-enter our e-mail address and our preferences and our post office addresses and all kinds of other information time and time again into these applications and then worry about whether or not the information per application is fresh.
So we need to move a lot of this common information from the silos of applications into a general purpose broad infrastructure, and that's part of what directories do. And with that definition, then the scope increases beyond some of the basic information that you might find in the White Pages that's an electronic phone book.
Other things that you might find in directories include group information, something you would never find in a phone book. Also, printers, servers, all of these other entities in our computational world can have entries within the directory and carry their own preferences around, so those are some of the reasons why we need to do that.
Now, Howard's comment about the stand-alone machine raises this issue of mobility and the fact that a user -- in particular in higher education, a student -- may wind up working from 10 or 15 machines in the course of a week as they roam around campus. It's important for all that personal information and preferences to be available to them wherever they compute, and so we have to move it from the silo of your desktop into a central piece of infrastructure as well.
And the last comment, I think, will come up to speed with where everybody else was. Keith has led us into a very important discussion point, which is identifiers and reconciling identifiers. Identifiers used to be easy, again, in 1904 when we had one application. We had one identifier, typically a social security number. Today, if you do a count on most campuses, you find eight to ten identifiers out there that are at least enterprise-wide, and then numerous additional ones that might be assigned by departments. So the issue of, "I've got one identifier for somebody but it's not the right identifier, help me find some more information about this person" is very important.
And then to lead into the future, there is certainly a lot of buzz these days about PKI and X.509 certificates -- very important infrastructures coming down the road.
HS: You can't say those, Ken, without saying what they are. You can't say PKI or --
KK: Thank you, Howard! Public Key Infrastructure and the certificates that are going to be used to manipulate this. This is the security infrastructure that we anticipate becoming a global fabric, much as the way IP connectivity became a global fabric for hosts. PKI may become a global fabric for how people connect to networks. In any case, inside a certificate, there is a field called SUBJECT. Whose certificate is that? We want to supply an identifier to go along with that subject, but which identifier we supply may depend upon the use of the certificate. Again, we need the capabilities to correlate among identifiers.
HS: Okay, before we talk more about identifiers, one of the uses of a directory is to do, as Keith pointed out, authentication. But another use that people talk about is authorization. Could you talk about the difference between those two, and are they both services that are done by directories?
KK: They can. Both of those functions can be done by directories or could be done by other mechanisms as well. Authentication establishes who you are. Authorization establishes, once we know who you are, what you're allowed to do. And those are related but distinct operations and, after you authenticate, we then look at the set of capabilities that you're allowed to do electronically.
HS: And would you put that information in a directory?
KK: You could!
HS: But would you?
KH: I think on that point --
HS: I guess we're getting to the point of how does one decide what should be in a directory.
KK: This is why people like Keith get great names like "architects"! Because an architect decides -- you know, architects figure out which tools and which styles to use for which kinds of construction. So it is, in fact, the case that we have lots of different options for building these middleware structures, and architects are the ones who decide which tools for which purposes.
KH: Ken, I have to jump in here. I know there are some folks, at least, from the University of Wisconsin tuned in here and if they were to leave this conversation believing that I decided, I wouldn't want to take much of a bet on my chances of leaving the facilities alive! So let me say that architects recommend.
And I will use a quote from a buddy -- a former buddy of ours. I don't know, since he's gone to the corporate side -- but Larry Gauthier used to be at the University of Michigan back when they were busy inventing LDAP, Lightweight Directory Access Protocol. He now works as an analyst for the Burton Group, but he has a quote, a sound bite where he talks about directory and then he says, "Security is directory's evil twin." So I like to think of that being true because directories -- you know, you ask it information, it gives it to you. Security is the bad cop that says, "No, you can't do that!" and stuff.
But there's another sense to it, which is I think we ought to -- my recommendation as architect around here is to think of security as another and distinct infrastructure level service. We ought to have a generic security, meaning in my terms here, access control thing where we authenticate people with a generic service. So any app that needed to do that would call on the generic infrastructure service to do it. And then in a separate step, the application will have to make an authorization decision, whether they want to let this authenticated entity actually do the thing they've asked to do -- either see a library resource or access a facility, get into the building, whatever.
And so I would say that, in that regard, the way it works for me to think about it is the directory is the place that has the information that the security service needs to do its work. The authentication process may be done by an authentication service -- a general purpose authentication service -- but if it's using a Public Key Infrastructure process, it's probably going to look up the public key in the directory. And for authorization, this is a really interesting one because we like to make the decision about whether somebody can use this service or look at this resource on the service that's actually being requested.
In other words, I don't want somebody else to decide for me whether or not they can read my secret document. I want to make that decision. But the information that I use to make that call, whether or not authenticated person X can read it, is going to -- again, I think if we do directories right, the directories will play a role in carrying that information and making it available. So if I want to share this document with first-year medical students, then when somebody authenticates, I'm going to ask the directory, "Okay, so-and-so authenticated. Is she" -- well, I won't know if it's he or she.
HS: Maybe you will. Maybe that'll be in the directory.
KH: I doubt it! Just from registrars and their concerns about faux pas, this is what's called "sensitive data." It may be in there with appropriate access controls, if you have a reason to know. Then you can just get it, yes.
But at any rate, I would, in terms of letting that person see it, I'd go ask the directory, "Are they a first-year medical student?" And it may say, "I don't know, but the medical school records are over here and here's the identifier. If they'll let you in, you can go ask them."
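The separation Keith describes can be sketched as three steps: a generic service authenticates, the directory supplies attributes, and the application owning the resource makes the authorization call itself. This is a minimal sketch under stated assumptions: the net IDs, attribute names, password, salt and in-memory tables are all made up for illustration, and a real deployment would use an actual authentication service and directory rather than Python dictionaries.

```python
import hashlib
import hmac

SALT = b"example-salt"  # illustration only; real systems use per-user salts


def _hash(password):
    # Derive a password hash so the example never stores plaintext.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), SALT, 1000)


# Hypothetical data held by the generic authentication service.
CREDENTIALS = {"squinlan": _hash("correct horse")}

# Hypothetical attributes held by the directory.
DIRECTORY = {
    "squinlan": {"program": "medicine", "year": 1},
    "hstrauss": {"affiliation": "staff"},
}


def authenticate(net_id, password):
    """Generic infrastructure step: establish who the caller is."""
    stored = CREDENTIALS.get(net_id)
    if stored is None:
        return None
    return net_id if hmac.compare_digest(stored, _hash(password)) else None


def directory_lookup(net_id, attribute):
    """Directory step: supply information, not decisions."""
    return DIRECTORY.get(net_id, {}).get(attribute)


def may_read_document(net_id):
    """Application step: the resource owner decides, using directory data."""
    return (directory_lookup(net_id, "program") == "medicine"
            and directory_lookup(net_id, "year") == 1)


who = authenticate("squinlan", "correct horse")
print(who is not None and may_read_document(who))
```

The point of the split is that `may_read_document` lives with the document's owner; the directory never says "yes" or "no", it only answers questions about the authenticated person.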
HS: I'm just going to ask my question again, Keith, because I think it's a common thing. I talk to people who want to put everything under the sun in directories. Are there any guidelines or suggestions or anything that says, "Yep, you ought to put this in. You shouldn't put that in." How do you decide? I mean, why not put everything in the directory? There's obviously things you shouldn't put there.
KH: Boy! There's a philosophical debate about that, and I would say that the instincts of the people with the data, who created it and maintain it -- that is, what we call "data custodians" around here -- their instincts, from 1904 and on are to protect the data and let only people that they've concluded have a right to see it, see it.
So the notion of throwing it over the wall to a directory that -- omigod! -- is on the Internet! Excuse me! That takes a little careful, quiet talking to get that idea across.
HS: Is that because the information in the directory is not very secure?
KH: It can be very secure, but if you tell people, "We're going to put it in an LDAP directory, and oh by the way, you just point LDAP://". There's a lot of talking to do to convince them that the protections are (which, I think is true) better than we've got on most of the stuff that's in our system now. But it's not a given when you start those conversations with the source system people.
And then I think, I will say this is one of the perennial debates, and we just started this discussion in e-mail today with some of the work we're doing with Ken and the Internet2 middleware folks on directories. There's a thick vs. thin kind of approach to directories. "Thin" meaning there's very little in there except this identity reconciliation stuff and the identifiers. And "thick" is, yeah, the Prego model, as some people call it -- you know, "It's in there!" You know, you want to know whether they're a first-year medical student? Yeah, it's in there. And that debate, I think, will be carried on every campus and corporation that does a directory.
I think over time, like most people, it'll start thin and get thicker in the middle over time as more applications say, "Gee, wouldn't it be great if we could put it in the directory!" For example, from a source system point of view, if people are going to be hitting my operational Oracle database 50 times a minute to ask the same question over and over again, I'd rather have that in the directory with appropriate access controls and keep my operational system that's trying to do the transactions for getting people paid away from all that.
And so you can spell out some principles that if it's much more read than written and it's of interest to a broader list of people -- people outside your own unit or club -- then the directory's not a bad place to put it, once you're convinced that the access controls are pretty good. So yeah, I think more will end up being there over time. Not everything, because some of that stuff changes so fast that you want to go to the root source and check rather than replicate it into a directory.
KK: Keith makes a couple of important points in that. One is that performance issues determine whether something is in a directory, in a database, etc. Directories are highly optimized for reads over writes, so if you have a variable that may get rewritten frequently, it's not clear you want to put it in a directory. On the other hand, if you're just doing reads, directories will outperform most databases significantly. So that's a factor.
The point that Keith made about security is also important -- that a lot of institutional information is secure because it's either firewalled off the Internet directly or it's on the Internet but it's buried in some obscure system. And one thing we do with directories is, by putting the information in a central visible location, it becomes much more of a potential target. That said, a directory can be protected with very sophisticated controls.
Last comment, Keith, did you say that most people start thin and get thicker in the middle as they go on?
KH: Well, some of us more than others!
KK: I like that. I just wanted to note it.
HS: That's a different TechTalk.
JB: Well, nature follows nature, right?
Do we want to get back to talk about identifiers, and if we're going to be doing identity reconciliation in the directories, just how do I know who these people are or how does the network know who I am? I think the --
HS: I guess the question is, what's a good, unique, permanent identifier?
KK: Well, again, you're going to need more than one, and it's interesting that --
HS: There again, maybe you could give us a list of them, Ken.
KK: Right, and it's interesting that you said permanent, because some will, in fact, be permanent and some may be reassigned. And that permanence has two dimensions: one is an identifier once assigned to a person, never assigned to another person. And then the flip side of that is once a person receives a particular kind of identifier, would they ever be reassigned a different value for that identifier?
Clearly for social security numbers, they're both permanent in reassignment and permanent in that you don't get another one. They're also nothing that we can use viably anymore.
Identifiers that you might find on campuses include e-mail addresses, a UUID (as we tend to call it) -- some kind of electronic identifier. Your login account identifier, your net ID if you have one. You may have a LAN ID.
One of the things you have in higher education when we start to talk about academic freedom and library accesses is that you have the privilege of accessing library materials without revealing your identity. So you may need something that is an opaque identifier to present to database owners. Opaque in that the database provider will not be able to know exactly who it is who's looking at this material, but will know that you are authorized institutionally to do that.
HS: That'll be just protected because of access controls on the directory? Because, I mean, they can just look you up on the directory.
KK: Right, right. You certainly will put those kinds of access controls in place for the directory, so that's just some of the identifiers that are out there.
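One way an opaque identifier of the kind Ken mentions could be produced is with a keyed hash: the institution derives a stable per-service pseudonym that a database provider cannot reverse into a name. This is an assumption for illustration, not a mechanism described in this session; the secret, the net ID and the service names below are hypothetical.

```python
import hashlib
import hmac

# Hypothetical institutional secret; it must never leave the campus.
INSTITUTION_SECRET = b"replace-with-a-real-secret"


def opaque_id(person_id, service):
    """Derive a stable pseudonym for one person at one external service.

    The same person gets the same value at a given service on every visit,
    but different values at different services, so providers cannot
    correlate usage or recover the campus identifier.
    """
    message = f"{person_id}|{service}".encode("utf-8")
    return hmac.new(INSTITUTION_SECRET, message, hashlib.sha256).hexdigest()


journal_id = opaque_id("squinlan", "journal-database")
catalog_id = opaque_id("squinlan", "catalog-service")
print(journal_id != catalog_id)
```

Only the institution, which holds the secret and the directory, can map a pseudonym back to a person if it ever needs to.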
And what we need to understand -- what a campus needs to understand -- is which ones are permanent, which ones are reassigned, which identifiers beget other identifiers. Today, if you have an account on a machine, typically that allows you to get to library databases because they're accessed by IP address. But there's a lot of people who have accounts on university computers who are not directly associated with the university -- friends and small orphans, whoever may be out there can get an account on some machines. And what we're trying to do here is begin to unbundle these sets of permissions and say, "Yes, you have an account, but that doesn't authorize you to get to the library or that doesn't authorize you to have an e-mail address."
HS: We have a couple questions -- actually, we have enough questions to take us through the end of midnight here from Rich Larson at the University of Wisconsin in Madison. But let's take a couple of them because they have to do with UUIDs. And his first question is pretty basic. He says, "What is this UUID thing?" Could you say a few more things about UUIDs?
KH: Sure. UUID is right out of the book, meaning the book on distributed computing environments. That's the standard of an earlier age, if I can put it that way -- with apologies to a number of noble institutions that think it's still -- I mean, it is still great. It's great if you've got it. But at any rate, there's one part of that that is --
HS: Tell us what the acronym stands for, just so that people can --
KH: Universal Unique ID. And yeah, we've stolen a couple things from DCE. (We, the larger IT community.) One of them is the security approach, and Kerberos from MIT is in there now in the DCE stuff. But this UUID is something that directory types have picked up on as a wonderful tool for creating another permanent -- in terms of reassignment and in terms of once you get it, you don't get to change it -- identifier. So it's very handy to have something like that.
I mean, just think how handy it was to have SSNs when we did and could use them. The directory needs something like that. I mean, in the simplest sense, you could think of it as if it were a database table, it would be the primary key for that person's role in the directory of all people. So it's just a tag for the person.
Now, UUID in its definition, it's defined by an algorithm that's spelled out in documents, and I think, Howard, you had a URL --
HS: Yeah, I'm sure that's posted on the Website.
KH: That defines that in terms of bit-by-bit.
HS: Right. But we should point out how big it is. It's big!
KH: Yeah, when I say bit-by-bit, it's 128 bits altogether. There's a stringified representation of it that's 36 characters long and it looks like a big hex number with some hyphens thrown in.
But the point of it is that, when you generate these things according to the algorithm, if we all agree with the algorithm and generate them, we'll never generate the same one because the bottom piece of it is the six hexadecimal pairs off of your network interface card.
And the manufacturers got together and agreed on that. It's kind of a serial number for the network interface card. So that's stuck on the bottom of the 128 bits and then they put a big time stamp with the date and the year, all the way down to hundreds of nanoseconds in time, so even if you generate a bunch of them real fast, they'll all still be unique. And if you're generating them faster than 100 nanoseconds, it'll just count out by one for each one.
So you can't make it break algorithmically. And the beauty of it is, it's big and ugly, so only applications and databases are going to want to know it, remember it. People will not want to look at it or type it. No application developer in their right mind will ask anybody to ever use it, in terms of typing it in or remembering it. And that's all goodness. It's back there in the back room, like the primary key on a database, but it does this very important function of giving the IT systems a unique, unchanging, permanent identifier for a person.
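The layout Keith describes matches what is commonly called a version 1 (time-based) UUID. As a small sketch, Python's standard `uuid` module can generate one; the 48-bit node value below is a made-up stand-in for a network card address, passed explicitly so the example behaves the same on any machine.

```python
import uuid

# Made-up 48-bit "network card" address, used only for this illustration.
EXAMPLE_NODE = 0x001122334455

first = uuid.uuid1(node=EXAMPLE_NODE)
second = uuid.uuid1(node=EXAMPLE_NODE)

text = str(first)
print(text)               # hex digits with hyphens
print(len(text))          # length of the string form
print(first.version)      # UUID version number
print(text[-12:])         # the trailing six hex pairs are the node
print(first.time)         # timestamp counted in 100-nanosecond intervals
print(first != second)    # back-to-back values still differ
```

Without the `node` argument, `uuid.uuid1()` tries to use the machine's own hardware address and may fall back to a random value if it cannot find one.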
JB: Are you using that at the University of Wisconsin right now, Keith?
KH: Again, I am an architect. I am recommending its use. I'm hoping that it will be adopted. I'm fairly sure it will be adopted here, mainly because a number of us in higher ed are getting together and saying, "You know, it'd be great if we'd all do this." And I suspect it will get some momentum that way. My general rule is if three institutions do it, it's a trend. And I think, Ken, are we up to three yet?
KK: I think so, Keith.
HS: But when you're using this UUID, are you using this as the key in your directory? Is that how you use it, or was it just another identifier?
KH: Well, in our particular case, it will be "just another identifier." The key is an Oracle key for the tables, since we're doing our registry stuff in Oracle here. So no, it's not the key.
But it is an identifier and it has what Ken has pointed out are really critical attributes: it's permanent, in terms of it never gets reassigned, and it's permanent in that even if you don't like yours, we're probably not going to facilitate -- we're not going to allow you to change it, even if you don't like it.
HS: But I could have lots of UUIDs, right? Because I mean, if I go to one computer and I get one and I go to another and I get one --
KH: No, no, you don't get them -- no. And these are the misunderstood points.
You don't get one every time you go to a computer. You get one once when you enter the person registry. That is, if we heard about you first from the HR folks -- you got hired -- when the HR systems told us, told the registry about you, Howard Strauss, and we didn't already know you from some other source, at that point in time, a UUID would be generated for you on the machine that holds the registry, probably. And stuck in that row, that's Howard Strauss's row from now and forever -- and even if you changed both your first and last names, your UUID will still be there and the system will essentially remember your identity. And so it's generated under the control of the registry.
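The "once, on entry to the registry" rule can be sketched as follows: the registry generates a UUID the first time a source system reports someone, and later reports, including name changes, leave it untouched. The class, the source label "hr" and the employee number are hypothetical, and the step of checking whether the person is already known from some other source is omitted.

```python
import uuid


class Registry:
    """Toy registry that assigns each person a UUID exactly once."""

    def __init__(self):
        self._by_source_key = {}  # (source, source_id) -> UUID
        self._names = {}          # UUID -> current name

    def notify(self, source, source_id, name):
        """Handle a feed record from a source system such as HR."""
        key = (source, source_id)
        person_id = self._by_source_key.get(key)
        if person_id is None:
            # First time we hear about this person: generate the UUID here,
            # on the machine that holds the registry.
            person_id = uuid.uuid1()
            self._by_source_key[key] = person_id
        # Names may change; the UUID does not.
        self._names[person_id] = name
        return person_id

    def name_of(self, person_id):
        return self._names[person_id]


registry = Registry()
hired = registry.notify("hr", "E-0042", "Howard Strauss")
renamed = registry.notify("hr", "E-0042", "Howie Strauss")

print(hired == renamed)
print(registry.name_of(hired))
```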
HS: But if I went to another university -- which, of course, I never would do --
HS: But if I did, then I would get another UUID.
KH: Then you'd get another one, and it's --
HS: And they couldn't really tie me together because they'd really be different.
KH: Well, and then you're back to if those two universities wanted to know whether Howie Strauss was the same as Howard Strauss at Princeton. They'd have to do it the old-fashioned way, which is what else do we know about him and is it good enough for us to decide they're the same? We don't want to merge you with Howie Strauss in Akron, Ohio, if it's not really you.
HS: No, that's definitely not me. I know him!
KH: Yeah, okay! But there's no automated solution and no, that's not a problem, that you would have a different UUID. That's the UUID that you'd be known by in that registry, in Akron, Ohio. And you'd have a different one. And we don't need to share that across systems unless we wanted to.
Now, one place where it does make some sense is in, say, a state university system like Wisconsin's, where we have 26 different institutions, some two-year and a lot of four-year and a couple doctoral institutions. Each of the campuses could, in fact, set up their own registry. I don't think that's the way it'll play out, but let's say it did, and each could have their own UUID generator and create a campus level registry and directory. And if we wanted to create a single, unified but probably subsetted view of all the students, faculty, staff and others, UW system wide, we could in fact promote all those UUIDs up to the central directory and we wouldn't have to worry for a minute that any of them would clash.
JB: And that would all be accomplished by the registry, then?
HS: When you say "registry," you mean metadirectory? They're really the same word.
KH: Metadirectory is a vendor's term, but we're using registry because metadirectory --
HS: I didn't know which way we were going with this.
KH: Yeah, metadirectory is a rubber bag concept that, as I say, the vendors define it to suit their feature set.
But registry means that function, the identity reconciliation function, and you wouldn't have two reconciled UUIDs across the system because the ones from UW River Falls will have the address of the network interface card on the machine there that's running the registry and so it'll be different, even if they generated UUIDs at exactly the same instant that we did here. So it's a kind of free "uniquifier" in that sense.
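The "free uniquifier" matches how time-and-node (version 1) UUIDs behave, and Python's standard `uuid` module can show it. This is a sketch under assumptions: the two node values are made-up 48-bit addresses, and the second UUID is built by hand with the same timestamp fields to imitate two registries generating at exactly the same instant.

```python
import uuid

# Made-up 48-bit network card addresses for two campus registry machines.
MADISON_NODE = 0x020000000001
RIVER_FALLS_NODE = 0x020000000002

# A version 1 UUID minted on the "Madison" machine.
madison_id = uuid.uuid1(node=MADISON_NODE)

# Reuse the same timestamp and clock-sequence fields, swapping only the node,
# to simulate River Falls generating at the identical instant.
river_falls_id = uuid.UUID(fields=madison_id.fields[:5] + (RIVER_FALLS_NODE,))

print(madison_id)
print(river_falls_id)
```

The two values share every timestamp field and still differ, because the node portion differs. That is why campus-level UUIDs could be promoted into one central directory without clashing.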
And so yes, in a state system, it does make sense that we would bring them together. Now, if Howie Strauss at UW River Falls was really Howard Strauss -- and I know you would never change universities, Howard.
HS: I never would.
KH: Yeah. But --
HS: I actually know of somebody who did.
KH: Howie Strauss, Jr., say, came to UW River Falls, but he was also taking courses in Madison. There, we would kind of hope that when you registered at the second institution, it would go percolating up the registry and we'd notice that, hmm, you know, we think this is the same guy. We might provide an interface where it pops up a little window the next time you log in that says, "You know, we have reason to think that you may be the Howard Strauss that still owes $500 in library fines in Madison. If this is true, pick YES."
But you see what I mean -- that the registry function could, in fact, keep you from getting a second UUID within some unified directory space. So it gets us back to, I'm afraid, this horrible term called "name space issues."
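The "we think this is the same guy" check could be sketched roughly as follows. The field names, the threshold, and the sample records are all hypothetical; a real registry would weigh much more evidence and, as Keith says, might still ask the person to confirm.

```python
def match_score(a, b):
    """Count how many identifying fields agree between two person records."""
    fields = ("last_name", "birth_date", "home_zip")
    return sum(1 for f in fields if a.get(f) and a.get(f) == b.get(f))


def find_possible_matches(new_person, known_people, threshold=2):
    """Return UUIDs of existing rows that look like the same person."""
    return [
        person_uuid
        for person_uuid, person in known_people.items()
        if match_score(new_person, person) >= threshold
    ]


# Made-up rows already held by the system-wide registry.
known_people = {
    "uuid-madison-1": {"last_name": "Strauss", "birth_date": "1950-01-01", "home_zip": "53706"},
    "uuid-madison-2": {"last_name": "Strauss", "birth_date": "1975-05-05", "home_zip": "44301"},
}

# A new registration arriving from a second campus.
newcomer = {"last_name": "Strauss", "birth_date": "1950-01-01", "home_zip": "54022"}

candidates = find_possible_matches(newcomer, known_people)
```

If `candidates` is non-empty, the registry would hold off on minting a second UUID and prompt for confirmation instead.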
JB: Okay, you know, we're starting to get some other questions in. Howard, you --
HS: Yeah, we have a question from Michael Geddes who used to be at Princeton. He's somebody who actually left! Hard to believe! But he's at Georgetown now, and Michael says, "As you dream up identifiers for use on the campus, how do you enforce the proper usage of these identifiers by applications and systems programmers?" This is something, I'm sure, of great interest to Michael.
KK: I think we typically think about a triage process here. That when somebody says they want information from the directory, you meet with them and ask, "Gee, how do you want to use the information?" etc., and then we understand which identifiers they may want to access from the directory and which are the pieces of information associated with those identities.
So there seems to be no substitute for that triage process, and I think Keith has actually instituted that at Wisconsin.
HS: Okay. We have another question from Lawrence Nevick at Wisconsin. All you folks in Wisconsin, you could really just go over and talk to Keith!
JB: Well, is he in Madison?
KH: But it's so much more fun on the Internet, though!
JB: I guess he is in Madison. Okay.
HS: Lawrence says, "Describe for us the interface between the Human Relations department with the technical maintainers of the directory." He'd like to know if there's a successful example where this kind of thing happened. Do you understand --
JB: Isn't that part of the triage?
KK: Well, since it's coming from Wisconsin, I'm real curious to hear your answer, Keith.
KH: I was hoping to have some time to think about it! No, I'll give you the quick answer on that.
HS: Or just give Lawrence your room number!
KH: I've got his phone number here, I believe.
No, we had serious conversations with our HR folks and they are hoping, I believe, to get their systems migrated to a more twentieth century -- if not twenty-first century -- infrastructure. But the facts of the matter are that right now some of it's -- all of it's on the mainframe called "Blue" here, and some of it's in relational database form in DB2. That's the appointment information. And some of it, I'm afraid, is still back in hierarchical data files, IMS style files, on the payroll side. So what we have in fact done there is kind of old-fashioned, but we worked on it with them and what we get is essentially a daily feed of the whole schmear of information about people that have appointments here.
And so it's a person, their contact information, their identifier in the Human Resource system, in the IAD system, all the appointments they have. Is there one or many active appointments? They might be in the medical physics and a doctor at the hospital as well. We get that information, we have agreed to the rules that the custodians of that data have put out on who gets access and who doesn't and to which pieces.
And what we do to kind of keep our directory from having to swallow the whole HR file every day is, in the background, we've got a little process that kind of compares the information from yesterday to the information from today and just feeds us the differences -- kind of a change log approach. And when they get to a relational database, we'll probably do it more with triggers.
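The yesterday-versus-today comparison can be sketched like this. It assumes each daily feed has already been parsed into a dictionary keyed by an HR person identifier; the record layout and the `diff_feeds` name are illustrative, not the actual Wisconsin process.

```python
def diff_feeds(yesterday, today):
    """Return a change log of (action, person_id, record) tuples."""
    changes = []
    for person_id, record in today.items():
        if person_id not in yesterday:
            changes.append(("add", person_id, record))
        elif record != yesterday[person_id]:
            changes.append(("modify", person_id, record))
    for person_id in yesterday:
        if person_id not in today:
            changes.append(("delete", person_id, None))
    return changes


# Made-up feeds: e1 changed appointment, e2 left, e3 was hired.
yesterday = {"e1": {"dept": "Physics"}, "e2": {"dept": "History"}}
today = {"e1": {"dept": "Medical Physics"}, "e3": {"dept": "Library"}}

change_log = diff_feeds(yesterday, today)
```

Only the entries in `change_log` need to be applied to the directory, so it never has to reload the whole HR file.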
Or if we even get really sophisticated, I know the Stanford folks are talking about -- and again, this is Buck Rogers stuff right now, but maybe now twenty-fourth century -- which would be to go to a message oriented middleware approach where you publish events like, "We hired Joe Blow and here's his address and stuff," and then the directory would subscribe to that channel, which is New Hires Channel, and then update the directory that way. That, to me, would be, I think -- I haven't imagined myself, at least, a better approach, but we've got some issues to sort out in terms of infrastructure and who's going to do that and all that.
But that to me is kind of the ideal because, in fact, you know, changes in the databases are not what you're interested in. You're interested in kind of events at a human level. This student became eligible to enroll. This person was terminated for cause, so take away their privileges. Things like that are probably best handled in a, "Here's an event, throw it over the wall, and anybody that cares about these subscribes to that channel for events."
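A toy version of the publish-and-subscribe idea, assuming an in-process `EventBus` class and channel names invented here. Real message-oriented middleware would add queuing, delivery guarantees, and security that this sketch leaves out.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process stand-in for message-oriented middleware."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel name -> handlers

    def subscribe(self, channel, handler):
        self._subscribers[channel].append(handler)

    def publish(self, channel, event):
        for handler in self._subscribers[channel]:
            handler(event)


directory = {}  # person id -> directory entry


def on_new_hire(event):
    directory[event["id"]] = {"name": event["name"], "status": "active"}


def on_termination(event):
    directory[event["id"]]["status"] = "inactive"


bus = EventBus()
bus.subscribe("new-hires", on_new_hire)
bus.subscribe("terminations", on_termination)

# The HR side "throws events over the wall"; the directory reacts.
bus.publish("new-hires", {"id": "p1", "name": "Joe Blow"})
bus.publish("terminations", {"id": "p1"})
```

The publisher never needs to know who is listening, so other systems can subscribe to the same channels later without any change on the HR side.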
KK: I think Lawrence's question shows another great value to directories -- one that I heard about at North Dakota State when I was there in the fall, which is that when they fire someone today at North Dakota State, it's not, seven days later maybe we've cleaned up all of their privileges. But you take them out of the directory immediately. And within five minutes, they are deactivated from their privileges at the institution. That's very helpful, vs. the current duct-taped mechanisms that we have today of, "Well, we fired somebody today."
HS: You mean somebody leaves. Nobody gets fired from a university, Ken.
KK: Somebody leaves, we need to disconnect a set of services associated with them. What is that set of services? And do we have to go to 40 to 50 different applications and remove this person?
With a directory, they're removed. Their status changes in the directory and all the applications that are directory enabled immediately are aware of that.
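What "directory enabled" buys you can be shown in a few lines. The entry layout and the `is_allowed` helper are assumptions for illustration; the point is only that every application asks the same directory rather than keeping its own list of users.

```python
def is_allowed(directory, person_id, service):
    """An application asks the directory instead of its own user list."""
    entry = directory.get(person_id)
    return bool(entry) and entry["status"] == "active" and service in entry["services"]


# One made-up directory entry shared by every application.
directory = {"p1": {"status": "active", "services": {"email", "library", "payroll"}}}

before = [is_allowed(directory, "p1", s) for s in ("email", "library", "payroll")]

# A single status change in the directory...
directory["p1"]["status"] = "inactive"

# ...is seen by every application on its next check.
after = [is_allowed(directory, "p1", s) for s in ("email", "library", "payroll")]
```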
JB: You're bringing us back to some real practical applications of the value of this kind of a metadirectory, Ken. At this point, we do try and provide some very practical hints, guidance, thoughts perhaps, and conclude with that for the people who are listening. As we look at directories right now, are there any real practical directions or hints that you might suggest to folks on campus right now?
KK: As Keith said, we're trying to take some data points and make trends out of them, and one vehicle that we're working on to do that is the Internet2 Middleware Initiative.
And if you go to the Internet2 Website, you'll see a section on middleware. It includes a set of best practices that we've harvested on identifiers, authentication and directories. Those documents are up there today and they're living documents, ones that evolve over time.
I would also mention that we've created a list called email@example.com and we welcome people to subscribe to that list. And that list will be used to promulgate thoughts and discussions as we go forward in trying to develop rough consensus around middleware and directories.
HS: Could you also talk -- we mentioned earlier on something called EduPerson.
KH: Right. Yeah, I can say a little about that. (And also, I want to add that Internet2 part of that, Internet with the numeral two after it. So internet2.edu.)
JB: Actually, those links are all up on the Web page.
KH: Yes, they are. And in fact, we're talking about EduPerson, there's a link in my bio paragraph that says the Edu-Person Object Class Working Group, and if you click that, it'll send a message to those lists and we'll take it from there.
But the notion on EduPerson is that, again, primarily for inter-institutional things -- whether it's within the UW system or between all the Big Ten institutions or nationwide with the research ones -- however, there's a lot of value if we made kind of the same decisions about names of attributes and the syntax of the values and the matching rules that we would apply when we try to decide.
So that we could do -- and I'm going to blame Michael Geddes for coming up with the idea of a directory of directories for higher education. He's got some experimental versions of that. That's part of this Internet2 effort. You know, for that to work very well, so that I could go in and ask for all the Ken Klingensteins in higher ed, Michael Geddes is going to have to count on the fact that we say it's "name" or it's "common name," and so does North Dakota and so does Stanford and so does maybe even Princeton.
And that's work that hasn't really been tackled, and there is this working group now. It's blessed by both EduCause and Internet2 and has participation from both of those sides of the fence. And our objective is in the next -- believe it or not -- six weeks, to come up with a 0.9 version of a list of attributes and syntaxes and semantics to just get this idea out there, by the way, help Michael Geddes produce a more useful directory of directories, and again bring it back home and as architects (or whatever we're called), recommend it to our folks that are setting up the object classes and the attributes for our own directories.
This is one of those cases where even if we can only agree on five or six things, that'll put us a lot farther along in kind of out-of-the-box interoperability of querying the directory and finding what you were expecting to find if we agree on that. So again, there's not time to go into a lot of detail here, but this is an ongoing effort and at this point, it's more to kind of seed the idea and get people started on a Version 0.9. We're naming it that intentionally.
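Why shared attribute names matter for a directory of directories can be sketched as follows. The campus names, the local attribute names, and the mapping table are all invented for illustration and are not taken from the EduPerson draft; `cn` and `mail` are used as the shared names only because they are familiar LDAP attributes.

```python
# Hypothetical per-campus mappings from local attribute names to shared ones.
CAMPUS_TO_SHARED = {
    "campus_a": {"fullName": "cn", "mail_addr": "mail"},
    "campus_b": {"common_name": "cn", "email": "mail"},
}


def to_shared(campus, entry):
    """Translate one local directory entry into the agreed attribute names."""
    mapping = CAMPUS_TO_SHARED[campus]
    return {mapping[k]: v for k, v in entry.items() if k in mapping}


def search_all(entries_by_campus, attribute, value):
    """Run one query across every campus using the shared attribute name."""
    hits = []
    for campus, entries in entries_by_campus.items():
        for entry in entries:
            shared = to_shared(campus, entry)
            if shared.get(attribute) == value:
                hits.append((campus, shared))
    return hits


# Made-up entries held under different local names at each campus.
entries_by_campus = {
    "campus_a": [{"fullName": "Ken Klingenstein", "mail_addr": "kk@campus-a.example"}],
    "campus_b": [
        {"common_name": "Ken Klingenstein", "email": "kk@campus-b.example"},
        {"common_name": "Someone Else", "email": "se@campus-b.example"},
    ],
}

hits = search_all(entries_by_campus, "cn", "Ken Klingenstein")
```

If every campus simply published `cn` and `mail` in the first place, the mapping table would disappear and the query would work out of the box, which is the point of agreeing on even five or six attributes.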
JB: And if we can come up with an EduPerson and at least a set of characteristics or variables associated with that EduPerson, then we'll all be able to interoperate across our campuses if we want to.
KH: Right. And Don Morgan out at University of Washington has reminded us that the trouble with efforts like this is that you need to spend all your time arguing about which attributes and which syntax and what we really should start from.
And we are going to try to follow his advice (and he'll remind us if we're not). We're going to start from what applications do we really think we want to do inter-institutionally, what attributes would be needed to support those, whether it's the access control decisions or whatever, and drive it from that practical side again, so that we say, "Now, here's what we would intend to use this particular attribute for, and if you want to do something like this, why don't you buy into this view of how it's done and what it means?"
So in fact, yeah, it's easy and kind of fun to argue about syntax and naming and all that, but the harder work and the more valuable work is in agreeing about some real things that we'd like to do and using that as our touchstone for making the hard decisions.
HS: We only have a couple minutes left here, but there was a project called Shibboleth. Do I have that right?
HS: That was, I think, another important area where some work was being done. Could you say a few words about it?
KK: Yes. Shibboleth is a joint effort by Internet2 and IBM. And our goal is to facilitate inter-institutional resource sharing.
And what we'd like to do is be able to control access to Web pages in inter-institutional frameworks, so that if you wanted to go to a restricted Web page that you were legitimately authorized to go to at another university, instead of getting a name and password from that university to use on that Web page, you could present your local credentials to that distant Web page, have the distant Web page go, "Aha! There's an @ sign in the middle of this name. That means I need to contact the originating institution and determine that the credentials are valid."
If we're successful in doing this, and the intent is to do the analysis and then hopefully produce an Apache module that would be widely available, then institutions that use directories and use these kinds of tools will be able to share resources much more freely and to reduce the clutter of names and passwords that we all have.
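The "there's an @ sign in the middle of this name" step could look roughly like this. It is a sketch of the idea only, not the Shibboleth design: the institution name, the verifier table, and the sample credential are hypothetical, and a real system would contact the home institution over a secured channel.

```python
def split_credential(name):
    """Split 'user@institution' into its two parts."""
    user, sep, home = name.rpartition("@")
    if not sep or not user or not home:
        raise ValueError("expected a name of the form user@institution")
    return user, home


# Hypothetical stand-ins for asking each originating institution to validate.
ORIGIN_VERIFIERS = {
    "campus-a.example": lambda user, secret: (user, secret) == ("hstrauss", "s3cret"),
}


def authorize(name, secret):
    """Route the check to the institution named after the @ sign."""
    user, home = split_credential(name)
    verifier = ORIGIN_VERIFIERS.get(home)
    return verifier is not None and verifier(user, secret)


granted = authorize("hstrauss@campus-a.example", "s3cret")
denied = authorize("hstrauss@unknown.example", "s3cret")
```

The distant Web page never issues its own name and password; it only needs to know how to reach the institution that did.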
HS: That sounds really interesting.
JB: It sounds really great!
HS: Are there any other things that are going on out there that, Keith or Ken, you want to mention?
KH: Well, yeah, just very briefly.
There is, I think, a remarkable phenomenon going on where the people in the federal government, all the branches that have interaction with higher ed (and some that, hopefully, don't, like the Department of Defense), are talking about PKI. And again, through Net@Edu, one of the EduCause -- I don't know what we'd call them, subsidiaries or divisions -- is getting those folks together with the likes of Ken and other Internet2 and higher education folks to say, "What do we need to do to have really interoperable security coming out of, again, the Public Key Infrastructure stuff?"
And those are really interesting discussions. We have kind of agreed, I think -- Ken, has that letter been signed and sent yet? We're about to recommend from the higher ed side that the government talk to us about the higher ed notion of an X.509 v3 certificate, which is the certificate format that carries the Public Key information, for one thing. Asking the federal government folks to honor that and to work with us on defining the details and go forward from there.
So that's very exciting stuff, too, and again, that can be tracked through some of our links on the PKI thing. Or if you go to -- again, I think, in my bio, there's a reference to the federal PKI work and it'll take you to the Web pages for that federal PKI -- or Net@Edu PKI -- working group that, again, has this kind of cross membership from Ken's friends and mine and others.
KK: It was very nice to hear the federal government come to us and say, "We're from the federal government and we're here to listen."
HS: Rather than "We're here to arrest you!"
JB: Well, that gives us an opportunity. It sounds like we're not really ahead of a curve, but at least there's a lot of initiatives working to at least stay current with the curve.
KH: Yeah, and Ken's references to 1904, of course, it's Internet years, you know.
JB: Is that what it was?
KH: That's really 1980.
HS: Really two years ago, right?
JB: I thought it was -- somehow, I have always wondered why my computer, whenever it did mess up and forget the date, always chose 1904. I suspect there's a real reason for that somewhere.
Before we do our wrap-up, is there any final question or comment, Howard?
HS: Actually, we have about 30-some questions we never got to here --
JB: I know it!
HS: -- on the list, so I want to thank both Keith and Ken for leaving us enough questions so we can do two more TechTalks!
JB: Real soon, I think.
HS: But I think that anything we get into is going to drive us past 5:00 Eastern time here, so no, I think it's turned out to be even a richer area than anything we thought here, and I expect we will go back and visit this thing again.
JB: Well, one question I had hoped to think about getting to is the idea of, gee, it'd be nice to have kind of a step-by-step recipe for campuses for setting up directories, but that's obviously going to have to wait for another time.
KK: But that is in the works!
JB: That is in the works? Okay.
HS: Where are we going to find that when that's done, Ken?
KK: One of the processes that I2 is doing is something called Early Adopters, where 11 universities -- an interesting cross section of big and small -- are beginning to deploy this stuff on campuses, and we're documenting their processes and their internal paperwork, etc. So we hope to have 11 road maps out there over the next several months, and pick a university whose role and mission and environment is close to you and look at the road map, see if it works for you.
HS: And we'll find that on the Internet2 Website? Where will we find this kind of thing?
KK: Yeah, it'll be on the Internet2 Website and we're going to work with all of our partners in higher ed to promote this widely.
HS: Good, that sounds great.
JB: That sounds wonderful. It really will give a place for people to go. Well, given all that, Howard, is it time to wrap up?
HS: I think so.
JB: All right, well, let's do that. And many thanks to all of the institutions who help support our TechTalks, and we invite you and your institution to help support them if you're not already a CREN member.
And also, be sure and print out the calendar from the Web page for this spring and set aside time on your calendar in two weeks from today for a conversation with Chris Peoples from Indiana University on a very important topic of Analyzing and Costing of IT Services, just how much all this costs. And that's going to be on March third.
Thanks to everyone who made this event possible today: to our guest experts, Ken Klingenstein and Keith Hazelton; to our technology anchor, Howard Strauss, still at Princeton; to Terry Calhoun, Event page producer; to David Smith and Patty Gaul of CREN; to Julia O'Brien, Jason Russell, Carol Wadsworth and the whole support team at Merit; to Susie Berneis, audio file transcriber; to Laurel Erickson, transcript editor and indexer; and finally, a thanks to all of you for being here. You were here because it's time.
Bye, Ken, bye, Keith, bye Howard.
HS: Bye, Judith.
JB: See you all on the Web!