Minutes Globus integration task force 2011-05-18 15:35-16:15
Minutes belonging to Agenda under https://www.egi.eu/indico/conferenceDisplay.py?confId=485

Presence:
Michaela Barth MB
Gert Svensson GS
Cristina del Cano Novales CN
Helmut Heller (IGE, LRZ) HH
John Gordon JG
Torsten Antoni TA
Ron Trompert RT
Emir Imamagic EI
Tiziana Ferrari TF
John Robinson JR
Additionally on site:
David Wallom DW
Steven Crouch SC
Emmanouil Paisios (IGE, LRZ) EP

MB: I hope you can hear me. I'll record the meeting just for personal use; I hope nobody has any objection?
MB: Welcome to the meeting! We aim for a short meeting today. How is the Community Forum going so far, how many participants do you have, are you happy?
HH: Yes, I am actually very happy, although Alexander Papaspyrou fell ill this morning and couldn't come and give his talk. We have 40-45 participants, and the talks are running smoothly. We are just in a coffee break right now, but none of us took any cake or coffee, so we are suffering. You see where our priorities are: they are with DMSU and Globus and not the coffee.
MB: You poor souls! We'll try to make this short then. I've seen you had the Grid-SAFE session just before, so I think we should be ideally positioned to continue with accounting. I tried to put together an agenda; if you want to change it, it is just my suggestion. We start with a reference to the last meeting's minutes:

0) Last meeting's minutes: https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=469
1) Going through open action points related to accounting https://wiki.egi.eu/wiki/Globus_integration_task_force
a) AP: EP to report on installation of Grid-SAFE in the LRZ testbed to our mailing list when done.
MB: The first point there would be an update on the testbed installation at LRZ by Emmanouil. Have you already installed it now or are we still in the planning phase?
EP: Grid-SAFE is not yet installed on the testbed, I think.
SC: I believe it is, actually. John, would you like to make a comment?
[3:06] JR: We are currently in the integration stage. Grid-SAFE is deployed and is able to accept URs, but what we can't do at the moment is use the accounting portal to let you actually see the accounting data that we have gathered. There are a few tweaks that need to be made with the help of the Grid-SAFE author. We are at his mercy slightly, but we are hoping it will take days rather than weeks. But it is installed.
[3:37] JG: So what are you using to collect/harvest the usage records?
JR: That is another point. We are not at the stage where we've got a client which will actually harvest the URs: we need to implement something that goes through the Globus accounting logs in general. I realize there are tools that have been developed by the NGS (National Grid Service) to do that, and I am looking into those.
[04:23] DW: So John, this is Dave Wallom. Those tools for generating URs were supposedly part of the original spec with Grid-SAFE as well. We would be very happy to give you the RUS publisher client we developed, essentially the Globus log half, which we still have.
[04:45] JR: Is that RUS Pub UR? Yes, I am looking at that on my desktop.
DW: Right, we gave that to Steven Booth(?) quite some time ago, so I am surprised that that functionality isn't within Grid-SAFE.
[5:05] JR: I think the state of Grid-SAFE as a distribution has been slightly suboptimal, shall we say. It didn't quite survive its move from NeSCForge to SourceForge and ended up quite fragmented. So what we originally got was the accounting portal. We'd assumed that was everything, and then we discovered that in fact the RUPI service, the SOAP service that takes the URs, was missing from that. So we got that separately. And then we are really at the stage where we are just addressing how we are generating the URs, so to patch the UR thing certainly wasn't part of what I had, and I'm trying now to glue it all back together.
DW: Ok, well. If you need some assistance with that, we or I can certainly pass any queries on to the team within Manchester that originally worked with that, who passed that directly on to Steve Booth in the first place.
[6:20] Ok, that's great. Without getting too technical, I suspect what has happened is: the tools that you've been using have not been sending their data to the RUPI service; they would have been sending it to the old non-RUS-compliant servlet which just took records over a POST request. What we're trying to do is to stick to OGF standards and use the RUPI service. So I'm not sure if the tools that you have are in their present state able to submit to the RUPI service. I could obviously check with the guys in Manchester to see if that is possible.
[7:09] DW: The standard version that was passed to EPCC for this is as for the NGS, which is: you are a RUS-compliant service, so it has actually been edited to not be RUS compliant?
[7:29] JR: Right, ok. I had understood that the RUS standard was deprecated and RUPI had replaced it, and that was Steve Croud who actually proceeded on that.
[7:50] DW: That, as I understand it, certainly is the way you guys would want to go, but it should just be a case of changing some service definitions, whereas from the sounds of it the internals have been somehow messed around with in what was passed over. So it might actually be worthwhile going back to what originally was given from the NGS.
JR: Yes, definitely. I'm not looking to re-engineer anything, so if we could get some existing scripts so we could tweak them, that's my favorite approach. Great.
JG: Could you explain what RUPI is?
JR: As I understand it (Steve would be the guy to tell you this), it is the resource publishing side of the RUS specification. As I understand it, RUS was split in half, and there is the resource publishing interface and a resource usage querying interface, which is another SOAP API; it is called RUQI.
Now as far as I understand it, the query side of the standard has never actually been submitted to OGF, but I think with a bit of cajolery, Steven Booth would be prepared to do that.
[09:00] JG: Certainly, the de facto operation of any RUSes I've come across was that they allowed uploading and didn't really support the queries. Mainly because a lot of them were mapping things onto a relational database once they had received the records, and trying to do XPath queries over a relational database was impossible?
JR: Right.
JG: Ok, that was my hearing. I just hadn't heard the actual acronyms before.
[09:34] SC: Yes, so the idea, I think, is in theory, even if you had a good site supporting the RUPI standard, if you had an RUS client that was able to upload, basically publish URs, that should be able to connect to a Grid-SAFE deployment which supported RUPI, where RUPI is basically just the publishing half of RUS.
[09:54] JG: Yeah, that's fine so, good.
[09:58] SC: So one thing you do want to do is get hold of the NGS RUS client and then verify that you've got a good, safe installation. That then enables you to determine to what extent the publishing aspect is actually okay with an RUS client.
[10:30] MB: Now you were already going a little bit into the detail of everything. I must confess, I haven't managed to follow everything and will try to make it clear from the records to provide you with some nice minutes.
[10:50] JG: I think we've actually already answered several of the following questions as well.

b) AP: HH to pass on question to IGE about which transport mechanism is intended to be used in Grid-SAFE.
[10:54] MB: Yes, but I will bring them up separately anyhow. So the next action point actually was on Helmut to pass on the question about which transport mechanism is intended to be used within Grid-SAFE. I think you have already brought this up now, but what was the answer again, if you could repeat, just for clarification?
[11:19] JR: So, it's SOAP, SOAP over http.
MB: Ok.
JR: https, of course. Sorry.
[11:30] JG: You need client authentication?
JR: Yeah. But that's too much technical detail.

* AP: MB to send link with suggested extensions to OGF-UR by APEL to HH. HH to check within IGE if those extensions are sufficient for use in Grid-SAFE.
Update: MB checked again with John Gordon about the currently valid APEL extensions:
o a) extensions to the basic UR, e.g. it didn't include the sitename. Cristina del Cano Novales wrote a comparison of our UR with the standard as input to OGF, after the Catania meeting. It is probably somewhere in GridForge but a quick browse didn't find it.
o b) a variation on the UR that represented total CPU time etc. for a number of jobs. We made the mistake of calling this the aggregated accounting record. Some people had a semantic problem with this name and expected it to contain a concatenation of all the job records. We should have called it a summary accounting record. http://forge.gridforum.org/sf/docman/do/listDocuments/projects.ur-wg/docman.root.current_drafts.aggregate_ur_schema
o c) Cristina has proposed an updated APEL UR schema with the fields aligned more closely to the standard. This is a list of field names used in a relational database, not an XML DTD. This is being discussed in a special group in EMI chaired by Cristina: https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccounting
MB: Then the next action point was about the UR extensions to be used. I thought I could just refer Helmut to the list of extensions APEL is using, but John Gordon corrected me that there are actually several extensions currently in use and that there is a new working group within EMI, chaired by Cristina, which is trying to propose an updated APEL UR extension schema.
I can post the link again:
[15:47:19] Michaela Barth https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccounting
MB: So the question is what more would be needed if you compare to these UR extensions, and do you think we can come up with a common UR? What are your opinions there? I know this working group has just started, but maybe Cristina would like to say a few words first about this new working group?
[13:54] CN: Well, just to say that what is going on in the ComputeAccounting working group is basically a discussion. We've been comparing the different fields that UNICORE, APEL, DGAS and ARC use and comparing them to the OGF-UR to see what extra fields we want to include. So that is pretty much what has been going on.
[14:24] SC: Ok, something we could be involved in there with Grid-SAFE is comparing the UR fields that we actually support as well, then we can get involved in that activity. Sounds like a good idea!
CN: That would be good for us.
SC: Okay. [14:40] So what would you recommend is the best mechanism for doing that? Should we get in contact with you directly or through a mailing list?
CN: Everything is going through the mailing list.
SC: Okay.
JG: And that wiki page points to the mailing list.
[15:04] SC: So details are on the wiki page, you say. Did you post a link to the wiki page? Oh, you did, yeah, I've actually seen this page before.
JG: So can I just say a little bit about the EGI use case beyond this: If EGI has got Globus sites, then the EGI requirement is that they publish URs or summaries of them for international VOs, so the VOs that are running across more than one country in EGI. Then we want the records held in a central place for those international organizations. So the thing then is to get Grid-SAFE to republish either job records or summaries of job records.
So there is a draft OGF aggregated accounting record, intended to be called a summary accounting record, in the form of a UR, but it accounts for a number of jobs, not just one, and publishes them into a central repository. Then the use of the VO on the Globus ones can be combined with the ones from gLite, ARC and UNICORE.
[16:30] JG: So we believed in the UK that Grid-SAFE had the ability to do that, that it could republish records from the UK sites to the central UK RUS.
JR: There is a note on that from David, but in addition to that, yeah. So, within Grid-SAFE there is a very flexible way of specifying the URs that you want to use. So if that partly answers the question, so what we can do is we can actually tailor away the URs deployed within Grid-SAFE so we can publish them in the format as desired.
JG: But it has a standard RUS or RUPI publishing outlet, does it? You have a mechanism in there for publishing, it takes the RUPI publisher?
[17:30] JR: Exactly, there is a RUPI client that we actually have within Grid-SAFE. It is not acquitting to the harvesting aspects from, for example, batch managers or Globus, but it does enable you to publish records, on a per-record basis at the moment I believe, through the RUPI service, yes.
JG: Okay, that is good then. Our production APEL service isn't taking XML document usage records in yet, but we have a prototype at Manchester that takes them in and then forwards them on to the APEL servers, as a proof of concept. We are doing that with UK records. We could certainly do a test against RUS, but we maybe don't want to have it underlying before we have a mechanism. The other alternative is to take the publishing bit that ARC and UNICORE and people would use to publish straight to the APEL central repository. That may be a possibility as well. You just have to transform the records into the transform mechanism we've got and then run an ActiveMQ publisher.
[18:49] DW: We also do have the long-term aspiration, of course, that all of our different accounting systems will just use URs and RUS interfaces. We should really make sure this is our long-term envisaged plan.
HH: IGE really wants to go through standards, so that is definitely aligned with us.
JG: Yeah, no, I am not deviating from the standards vision, Dave, just if we needed some short-term thing, so to actually make sure we are collecting records for a VO and they are sitting all over different repositories.
DW: Yes, we could certainly do that as a short-term solution.
MB: Should we say, to have this more ordered here on how we do this, that John is maybe talking to Manchester to try to get such a short-term solution to work?
JG: We are doing that already, Michaela, we have that working for the UK records. The original model was to have Grid-SAFE on all UK sites, but we don't have that yet. What we have is NGS UR RUS clients all publishing to Manchester, and then Manchester takes those NGS URs and runs an ActiveMQ publisher to send them down to Rutherford.
MB: So the current action point we can put down is that IGE contacts Cristina over the ComputeAccounting mailing list.
JG: Agree with that. I think the next step is IGE to tell us that Grid-SAFE is working.
[21:17] DW: Absolutely.
SC: Yes, that would be a first step, and also it's being involved in the UR fields analysis as well across the other groups, so we can just kind of harmonize on the URs there that we eventually agree on using.
MB: I think that we can start as you say, but additionally to formalize it more, this is what we surely need.
SC: Could you rephrase that, please?
JG: It is okay, you talk about the same thing. The action that Michaela suggested was that you guys join the URs group, and that was what you were suggesting as well, so no problem.
SC: Anything else?
[22:30] DW: We just went through the actions, it really is: you guys will make sure to get you guys the original code for the harvester.
That would come actually with different answer modules for all of our supported systems, which is very broad. And then it is up to you guys to make Grid-SAFE work and integrate it.
JG: Could I just ask: What is the position: are you taking over support for Grid-SAFE?
[23:22] SC: That is a very interesting question: Within WP5 it was part of the deployment task on our own testbed for Grid-SAFE; we are bringing together all the code anyway in order to do this. As a side effect of that, all the code will effectively be located in a single location, that is where we eventually go. So you can get it from a single place on SourceForge. There is another option that we could use here, which is: since John and I are actually involved with the SSI project, which has to do with software sustainability, one of the options we are pursuing with that is that we can take some effort from the SSI to do this and have this as an SSI task as well. But we are currently looking into that.
JG: Okay, thanks.
[24:24] MB: I think we are through with the action points.

2) Accounting
a) https://rt.egi.eu/rt/Ticket/Display.html?id=1607 more use cases would be nice.
MB: Otherwise I had one more point in the agenda about an RT ticket which supports that we can have more than one middleware deployed at one site and the accounting would still work for all of them. If you have a use case for that, it would be nice if you could support the ticket.
[15:59:51] Michaela Barth https://rt.egi.eu/rt/Ticket/Display.html?id=1607
b) Update on Grid-SAFE Transport mechanism, testbed installation?, intended time scale?
c) Common usage record, compare https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccounting what more would be needed?
MB: Then we already talked about Grid-SAFE and the ComputeAccounting group.
d) RUS
MB: Then RUS: I think we had the agreement that we would all aim for standardized RUS interfaces in the future, that was a common agreement, right? Do you have anything more to discuss right now?
3) If time: Authorization
MB: If not, we thought we could talk a little bit more about Authorization, if you have any update there on Argus? Otherwise I think I can send you to the rest of your coffee break for the last 15 minutes.
JR: There was mention earlier around support for Argus in one of the presentations, what is the general scope of that? One of our requirements - and this is I think now going off-topic for this meeting - but is essentially for caveated authentication depending on servers within the Toolkit. That came through with our need to support charging and things like that. So, but I think we cancel the end of this meeting.
MB: I agree. Thank you all for joining, enjoy your coffee break!

== Updated Action Points after the meeting ==
* AP: EP to report on installation of Grid-SAFE in the LRZ testbed to our mailing list when done.
*: Update: JR and other people on the LRZ testbed to get the missing Globus log part of the RUS publisher client, the harvester, from the original NeSCForge (or Steven Booth) and adapt it according to their needs.
* AP: HH to pass on question to IGE about which transport mechanism is intended to be used in Grid-SAFE.
*: Answer: SOAP over https --> can be closed
* AP: MB to send link with suggested extensions to OGF-UR by APEL to HH. HH to check within IGE if those extensions are sufficient for use in Grid-SAFE.
Update: MB checked again with John Gordon about the currently valid APEL extensions:
o a) extensions to the basic UR, e.g. it didn't include the sitename. Cristina del Cano Novales wrote a comparison of our UR with the standard as input to OGF, after the Catania meeting. It is probably somewhere in GridForge but a quick browse didn't find it.
o b) a variation on the UR that represented total CPU time etc. for a number of jobs. We made the mistake of calling this the aggregated accounting record. Some people had a semantic problem with this name and expected it to contain a concatenation of all the job records.
We should have called it a summary accounting record. http://forge.gridforum.org/sf/docman/do/listDocuments/projects.ur-wg/docman.root.current_drafts.aggregate_ur_schema
o c) Cristina has proposed an updated APEL UR schema with the fields aligned more closely to the standard. This is a list of field names used in a relational database, not an XML DTD. This is being discussed in a special group in EMI chaired by Cristina: https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccounting
Update: IGE (SC) contacts the EMI ComputeAccounting group over their mailing list and actively participates in the UR fields analysis.
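
Illustration (not part of the meeting): the distinction in point b) between a summary record and a concatenation of job records can be sketched roughly as follows. This is only a minimal sketch under assumptions: the field names (site, vo, cpu_seconds, wall_seconds) and the grouping by site and VO are hypothetical and are not taken from the OGF draft or from the APEL schema.

```python
from collections import defaultdict


def summarise(job_records):
    """Collapse per-job usage records into one summary record per (site, vo).

    Field names are hypothetical; they only illustrate the idea of totals
    over a number of jobs rather than a concatenation of job records.
    """
    totals = defaultdict(lambda: {"jobs": 0, "cpu_seconds": 0, "wall_seconds": 0})
    for rec in job_records:
        key = (rec["site"], rec["vo"])
        totals[key]["jobs"] += 1
        totals[key]["cpu_seconds"] += rec["cpu_seconds"]
        totals[key]["wall_seconds"] += rec["wall_seconds"]
    # One output record per group, in a stable (sorted) order.
    return [
        {"site": site, "vo": vo, **sums}
        for (site, vo), sums in sorted(totals.items())
    ]


if __name__ == "__main__":
    # Made-up sample data, purely for demonstration.
    jobs = [
        {"site": "SITE-A", "vo": "vo1", "cpu_seconds": 100, "wall_seconds": 120},
        {"site": "SITE-A", "vo": "vo1", "cpu_seconds": 50, "wall_seconds": 70},
        {"site": "SITE-B", "vo": "vo1", "cpu_seconds": 10, "wall_seconds": 15},
    ]
    for summary in summarise(jobs):
        print(summary)
```

A central repository receiving such records would see one entry per site and VO with job counts and totals, which is what allows usage from different middlewares to be combined without shipping every job record.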