Minutes UNICORE integration task force 2011-05-06 10:00-11:30

Minutes belonging to Agenda under https://www.egi.eu/indico/conferenceDisplay.py?confId=465

Presence:
Michaela Barth MB
Rebecca Breu RB
Andrew Lukoshko AL
David Meredith DM
Gert Svensson GS
Krzysztof Benedyczak KB
Emir Imamagic EI
Foued Jrad FR

MB: Going through last meeting's minutes:

1) Last meeting's minutes
https://www.egi.eu/indico/materialDisplay.py?materialId=minutes&confId=446
No remarks.

2) Going through last meeting's action points
https://wiki.egi.eu/wiki/UNICORE_integration_task_force

MB: AP Use case for EMI registry: Nothing new, I'm currently reviewing the related document EMI Execution Service Specification
[10:07:55] Michaela Lechner https://twiki.cern.ch/twiki/pub/EMI/EmiExecutionService/EMI-ES-Specification_v1.0.odt

AP on our EndPointServiceURL requirement: During the EGI User Forum we had some personal discussions, and John Casson, who is working together with David, proposed a previously undiscussed solution which should be faster to implement and which, from what I understood, fulfils all our requirements. David, can you elaborate on this new solution?
DM: For each service in GOCDB we had a primary key constraint which didn't allow us to repeat service endpoints. Now we can remove those constraints, so that's fine. The slight follow-up question is whether multiple URLs for the same service are needed, because then this solution would not be sufficient.
KB: I don't think so.
RB: Agree, I don't think so either.
DM: That's great.
MB: David, how complicated is it to implement this solution? Have you already checked if it really is possible to get rid of the primary key constraints?
DM: We can definitely do it. But it is a refactoring of code; it sounds trivial, but things have to be changed in several places. We are aiming to do this in the next month.
MB: Thanks very much, that is indeed great news!

Next AP: Keeping the probe integration ticket updated.
KB: The ticket hasn't been updated.
There were some updates in the probes, but just regular maintenance, nothing big. We'll update the ticket to report on the next big changes.

MB: Next AP was to put in an official requirement towards APEL; we did that. So that specific action point can be closed.
KB: Maybe we should have a new AP instead: the possibility to send accounting data from middleware other than gLite to APEL is on the EMI roadmap now. We should follow the development of UNICORE and accounting in the EMI roadmap.
MB: Maybe I can give an update on Cristina's ongoing efforts in accounting and unified URs here. After the Globus integration task force meeting, I asked John and Cristina directly about the currently valid APEL extensions to the OGF-URv.1, and I was pointed to the new working group Cristina is leading within EMI.
[10:16:19] Krzysztof Benedyczak https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccounting
MB: Exactly.
KB: There are also SGAS extensions. For Storage Accounting, EMI has already produced a draft:
[10:18:09] Krzysztof Benedyczak https://twiki.cern.ch/twiki/bin/view/EMI/StorageAccounting
MB: Is there someone of us in this ComputeAccounting working group who can keep us updated?
KB: I'm also in this group and am coordinating this from the UNICORE side within EMI, as seen in the list of people on the webpage.
MB: So you can keep us updated in the future?
KB: Yes, as I am involved in this effort anyway.
--> new AP: Keep an eye on accounting: See that EMI follows its roadmap and we get the possibility to send accounting data from middleware other than gLite to APEL. Related: stay updated on the work going on in the new EMI working group led by Cristina.

MB: AP on Best Practices: Update: I sent the links I got from Mathilde to the operational-documentation-best-practices@mailman.egi.eu mailing list and got no reaction or feedback so far. Otherwise nobody had time, with the current focus on the EMI release, to produce any new best practices?
MB: AP on this meeting: Can be closed and opened as a new one for the next meeting, I suppose.

MB: The AP on the list of the next service types to be integrated into GOCDB. I sent a list to our mailing list which we can use for discussion now:
KB: I already provided such a list in a ticket somewhere...
EI: Here?
[10:27:41] Emir Imamagic https://rt.egi.eu/rt/Ticket/Display.html?id=306
DM: Maybe here?
[10:29:23] david meredith https://rt.egi.eu/rt/Ticket/Display.html?id=944
KB: No, I thought I had posted it in a ticket; I have now found it in a mail and can share it on the mailing list.
MB: Can we then go through the list I proposed instead for the time being? Afterwards we can put it in ticket 944.
--> new AP on KB to post the list and, after discussion, to put it in RT 944

* Workflow Engine: Needed to add workflow functionality to UNICORE. It is not needed if only single jobs are submitted within a grid infrastructure. Normally there's one Workflow Engine per grid infrastructure.
  suggested name: unicore6.WorkflowEngine
  Probes: check_workflowservice check_workflow
* Service Orchestrator: Handles dispatching of a workflow's subjobs, and brokering. Normally there's one Service Orchestrator per grid infrastructure.
  suggested name: unicore6.ServiceOrchestrator
  Probes: check_servorch
* CIS: Common Information Service. Standalone service which collects information from the UNICORE/X. One per grid infrastructure.
  suggested name: unicore6.CIS
KB: There are two services related to this.
RB: CIP is needed as well; the idea was that all services should be in GOCDB.
MB:
* XUUDB: User database. Maps the X.509 certificates to a user's login accounts and roles. Services like the Workflow Engine and the UNICORE/X query the XUUDB for authorization. There is great flexibility in how many instances are needed per grid infrastructure; each site running its own XUUDB seems to be the most common deployment option.
  suggested name: unicore6.XUUDB
MB: Is there a probe for this?
KB: Maybe some internal probe with a passive test? It is an internal service.
RB: But XUUDB can be shared!
MB:
* UVOS: UNICORE VO Service. Serves the same function as XUUDB but is much more advanced and flexible by supporting arbitrary attributes, groups, advanced authorization, and more. Usually one per grid infrastructure, but may be replicated.
  suggested name: unicore6.UVOS
  Probes: check_uvos
* Target System Interface (TSI): The actual interface to the local batch system; submits jobs and goes with the UNICORE/X.
  suggested name: unicore6.TSI
  Probes: check_application (Is this probe really connected to it?)
KB: I disagree, for sure not. The TSI is always hidden and never exposed; it translates UNICORE operations and is therefore comparable to BLAH in gLite. The actual test should be on UNICORE/X. check_application is part of the probes for UNICORE/X.
RB: I agree with KB.
MB:
* SIMON: Standalone service which monitors UNICORE sites, mainly by periodically sending test jobs.
  suggested name: unicore6.SIMON
KB: We don't need that one.
RB: We don't really have an opinion on this.
EI: Is it deployed on every UNICORE site?
KB: It is a script, it is not deployed.
MB: Ok, I guess the conclusion here for SIMON is a no.
* Storage Management Service: Is this really not already in the GOCDB, and is it needed?
  suggested name: unicore6.SMS
KB: Yes, it is UNICORE specific and it is definitely needed in GOCDB.
  Probes: check_freespace check_sms

[10:37:11] Andrew Lukoshko you can see here how it looks in BY-Grid https://noc.grid.by/nagios/cgi-bin//status.cgi?host=all (need cert) Although currently we only check host by ping and simple MPI job
[10:39:04] Emir Imamagic my cert doesn't work there
[10:40:21] Andrew Lukoshko pity, I've made a screenshot http://dl.dropbox.com/u/2112100/nagios.png
EI: Those are just the standard Nagios probes.
AL: Yes, you are right, we are interested in doing proper monitoring.
MB: I found other probes I couldn't connect to any service:
  Probes: check_versions
KB: This one is under development; it just checks versions of all components, you can't check each service.
MB: So it is a more general probe valid for several service types, right. Fine.
MB: What other services are missing?
KB: There are some:
[10:43:34] Krzysztof Benedyczak
  CISRegistryPortType
  CISInfoProvider
  TargetSystemFactory
  StorageFactory
  StorageManagement
  GridResourceInformationService
  ServiceOrchestrator
  GridBeanService
  BESFactoryPortType
  locationManager
  WorkflowFactory
  TracerService
MB: Are these all really services??
KB: I'll make a list and check in the RT ticket.

3) Discussion of Progress

a) GOCDB
MB: This has now already been discussed. I can just emphasize once again how happy we are, David, that this now has highest priority in GOCDB development!

b) Monitoring
EI: Krzysztof sent me all the details I need for testing instances. I haven't progressed that much after I heard that GOCDB is preparing a non-interim solution within one month. My proposal is to wait for that in order to integrate it all in one go; otherwise it would just be wasted effort fiddling around with XML.
MB: So you have done everything that can be done without the XML?
EI: We just need to add their definition; Krzysztof did all the work. At this point there isn't much to add.
KB: There is possibly one thing you could do in theory: When we were integrating our configuration with your Nagios, we experienced some problems with the different approaches to configuration in Nagios, e.g. some config file definitions were turned off. It would be good if you could contact the people from the gLite site in Poland who are maintaining our Nagios instances, Maciej Pawlik and Paweł Wolniewicz, so maybe they can provide you with some more detailed info about that.
EI: I know them; this is basically the story of what you did. But you are right, now I can do it the other way around while we wait for the GOCDB.
--> new AP? EI to contact Maciej Pawlik and Paweł Wolniewicz on sensible Nagios configuration default definitions.

KB: We had one problem with authorization after this integration. EGI Nagios allows people to submit tests manually when the person is in a contact list. Our configurator is currently not able to create such contacts.
EI: Once the site is in GOCDB and the site has all the appropriate DNs defined, this will be solved. Once we have integrated the probes from our site, the whole thing will work.
FJ: One question: we will face the same problem with Globus when it comes to authorization; UNICORE is also using static mapping, this must be registered in ..
EI: You have to play by the rules of the middleware. You will have to ask every site to add the DNs.
KB: We have this centrally.
EI: When setting up a new Nagios instance, we have to go and set this up in the middleware.
KB: It has to have priority in the UNICORE system, otherwise it can wait too long.
EI: As solved in gLite with the ops queue.
KB: We map the monitoring user to the same user on all our sites. This user has a priority or reservation in the site configuration.
MB: So something similar will have to be done for every UNICORE NGI.

EI: How is the whole discussion going in EMI concerning authorization? Globus will have the same authorization framework as gLite.
KB: It is very funny, rcmaps is not going to be deprecated, and Argus is going to be seen as the common authorization structure for all middleware stacks.
EI: Oscar (Editor's note: Oscar Koeroo, NIKHEF) said that IGE will take over the support of the Globus authorization plugin; it will be tied to Argus. Is this integration of UNICORE in EMI release 2?
KB: The version from EMI 1 can talk to Argus. But you are not able to define the default authorization policy in Argus; some simplified policy language is used. The same story is valid for ARC, afaik. They also still have attributes which are unsupported by Argus.
Argus is going to be updated by the end of this year for EMI release 2 to implement and include all these attributes, as has been agreed between UNICORE, ARC and gLite.
EI: Can you already have access to UNICORE sites with this simple policy?
KB: Now, in UNICORE it is more complicated: there are owners of resources, and this feature is currently not supported.
KB: So basically it is working, but in an insecure way, so it is not useful.
KB: The EMI 2 plan is to finish the most critical updates of Argus by the end of this year, but not much earlier.

FJ: What about the status of putting the UNICORE client and gLite on the same node?
EI: Afaik that is still not completely ready, if you check the documents which Krzysztof sent.
KB: I can confirm this is true: it is enough to install the EMI 1 repository and it will work; of course this is true for the common UNICORE client.
EI: Foued, I would advise against using EMI 1 in production, it has APEL dependencies.
KB: For UNICORE it is totally safe, we do not depend on everything. But for something like the metapackage UI this is probably true.
EI: It is not only that: if you don't have the protect flag and priorities set, you might unintentionally update ALL UI packages, because they all get installed.
KB: I suggest the easiest way for Foued would be to install it from there and then delete/disable the EMI repository. Then it should work fine.
MB: Cristina Aiftimiei reported in the last Grid Operations Meeting that the EMI 1 release is really being tested on systems with different middleware installed, with UNICORE being one of them.
EI: This is not a problem of UNICORE; as Krzysztof said, you shouldn't have the repo enabled in production. Otherwise you have all four middleware clients available.
KB: And in different locations than before!
MB: It was already planned years ago to have the middleware installed in Linux default locations. Now that is finally true for all except yaim.
EI: Yeah, that's the only one!
MB: Let us go back to accounting once more. I'm glad to welcome Andrew Lukoshko here with us. I hope many of you have heard his interesting presentation of the Belarus solution to accounting with a nice user portal, unfortunately currently only in Russian. I was very impressed and asked around a little bit, and got the impression that the main reason why it wasn't previously considered within the UNICORE community as a global solution was that it wasn't open source. But Andrew mentioned that maybe it could be made open source, or that it would at least be possible to give selected persons access to the source. Andrew, can you update us a little bit?
AL: Currently it is not open source, but we are discussing this right now, so maybe we can share. I'm afraid it is not really that easy: There are some people we should ask first, because currently the code is not our property. We have some support to do this, and the people who give us the money don't want us to share right now. This is usual practice here. BTW, right now I am translating the system into English, so maybe we can do some demonstration. That would be nice, so even if we can't share right now, maybe everybody can use the system right now and get acquainted with it.
MB: Have you had a chance to look at our currently favoured Polish solution?
AL: Maybe the system that was recently developed in Poland is now better, following all the development guidelines and written in Java and as an operating system standalone application. Our system is written in PHP, it is using Apache, and we have not had plans to integrate it with UNICORE, because we really do it our own way. The system has many features which may not be interesting for other NGIs.
MB: We have many NGIs within EGI with different backgrounds and wishes. I'm sure some of the features can be interesting to some of them. So it would really be nice to have it open source and to compare the efforts.
It would also be very good politically and prestige-wise if you could provide something to the community that can be used by many.
AL: So in general we will try to share the source, maybe in a month. I was asked from Germany to give them the system to try; they can then compare the solutions from Poland and Belarus and say if this is interesting for EGI. But this is an exception: it is not open source.
KB: Anyway, it would be very nice to merge efforts, and we will be able to have the same server side with some extensions. I'm sceptical about the presentation layer, because it is very specific to Belarus, but I think we can certainly cooperate on the server side, and the release I was talking about with Andrew will be published in one month. We are currently finishing the documentation. I'll send the info as soon as a release with all major features is available.
--> new AP: Start a closer collaboration with Belarus. AL to keep us updated on efforts to go open source or other possibilities to collaborate, KB to send info on new release with all major features.
MB: Anything more on accounting? And I think we covered the Argus authorization already in a digression, and the Best Practices were handled in the APs.

4) Next meeting
MB: I'll create a new doodle poll.
--> AP: create new doodle poll

5) AOB
MB: Okay, then thank you all for this meeting.

== Open Actionpoints after the meeting: ==

* AP: "EMI registry use case" Find out more about the EMI registry and the static info contained: What is the distinction between the static data in GOCDB and in the EMI registry, and how can we propagate downtime info for just one Service Endpoint (URLs are needed to distinguish between different instances)? (contact Laurence Field)
  Progress: MB will report on her email discussion with Laurence Field and plans to push the GLUE 2.0 use case in propagating downtime information
  Update: EMI registry in very early planning stage, XML solution in the meantime.
  Second solution for GOCDB (the GLUE 2.0 based approach) again in discussion. MB to bring forward and discuss the use case in the next OGF PGI wg meeting.
  Update: Too late to be part of V1 of http://www.ogf.org/documents/GFD.180.pdf, waiting together with 2 other use cases for V2.
  Update: currently reviewing related document EMI Execution Service Specification https://twiki.cern.ch/twiki/pub/EMI/EmiExecutionService/EMI-ES-Specification_v1.0.odt
* AP: "ServiceEndpointURLs in GOCDB" Reopen GOCDB requirement ticket for EndPointServiceURL https://rt.egi.eu/rt/Ticket/Display.html?id=975 and ask for second solution (Add URL field for ServiceEndpoint and repeat ServiceEndpoint to represent different URLs / Associate a ServiceEndpoint with a new "1-to-many" EndpointLocation entity), with a proposed timeline of 6 months.
  Progress: David provided us with some updated slides on the possible implementation with more detail: https://wiki.egi.eu/wiki/File:GocdbGlue2Unicore.pdf
  Update: Personal discussion at the EGI UF, Vilnius with John Casson: A new proposed solution, getting rid of the primary key constraint, with a new timeline of one month has been discussed. It fulfils all our requirements. This requirement now has top priority within GOCDB development.
* AP: "Regular probe integration ticket updates" for everybody (especially EI, KB and FR): to keep https://rt.egi.eu/rt/Ticket/Display.html?id=306 updated.
* AP: EI to contact Maciej Pawlik and Paweł Wolniewicz to discuss sensible Nagios configuration default definitions.
* AP: "Keep an eye on accounting": See that EMI follows its roadmap and we get the possibility to send accounting data from middleware other than gLite to APEL. Related: stay updated on the work going on in the new EMI working group led by Cristina https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccounting
* AP: Start a closer collaboration with Belarus.
  AL to keep us updated on efforts to go open source or other possibilities to collaborate, KB to send info on new release with all major features.
* AP: "Best practices" KB and FR to send documentation links to operational-documentation-best-practices@mailman.egi.eu as soon as considered sufficiently complete. Suggested procedure which could be of interest in this context: how to install more than one UNICORE service on one host.
  Update: PL-Grid is restructuring its whole UNICORE documentation system; otherwise the concentration is on the EMI 1 release. MB will now start by pointing them to the standard unicore.eu documents, which include good documentation and installation guides, and see if they would like to already include a link to them. What is missing: real best practices, the actual solutions to different real integration problems. Some material exists in Polish.
  Update from Mathilde: UNICORE installation documentation (not yet covering the EMI UNICORE rpms) can be found at http://www.unicore.eu/documentation/manuals/unicore6/files/manual_installation.pdf The basic scenario (section 2) should answer the question of how to install multiple services on one server. It is not perfect for EGI purposes but maybe it is of interest.
  Update: Michaela sent those links to the best practices mailing list, but hasn't got any feedback yet.
* AP: "Next Meeting": MB to make a doodle poll again for the next meeting, with the Global Timezones and "maybe" functions enabled.
* AP: "List of service types to be integrated." MB compiling a list of the next service types which should be integrated into GOCDB and sending it to our mailing list for discussion
  Update: Some discussion of this list during the meeting. AP on KB to send his list to the mailing list as well, to continue discussing there, and to put the output of this discussion in ticket https://rt.egi.eu/rt/Ticket/Display.html?id=944
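
Editor's sketch: the state of the service type list after the discussion in this meeting could be written down roughly as follows before it is put into RT 944. This is only an illustration, not an agreed format: the `ServiceType` class, its `note` field and the `without_probes` helper are hypothetical names introduced here and are not part of GOCDB or UNICORE. The service type names and probe names are the ones suggested in the meeting; SIMON is left out because the conclusion was that it is not needed, and check_application is not attached to the TSI because KB stated it belongs to the UNICORE/X probes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceType:
    """Hypothetical record for one proposed GOCDB service type."""
    name: str           # suggested name from the minutes
    probes: tuple = ()  # probe names mentioned for this type in the meeting
    note: str = ""      # short remark taken from the discussion


# Proposed list as discussed; still subject to KB's own list and RT 944.
PROPOSED = [
    ServiceType("unicore6.WorkflowEngine",
                ("check_workflowservice", "check_workflow")),
    ServiceType("unicore6.ServiceOrchestrator", ("check_servorch",)),
    ServiceType("unicore6.CIS", note="CIP is needed as well"),
    ServiceType("unicore6.XUUDB", note="internal service, possibly passive test"),
    ServiceType("unicore6.UVOS", ("check_uvos",)),
    ServiceType("unicore6.TSI", note="never exposed, tested via UNICORE/X"),
    ServiceType("unicore6.SMS", ("check_freespace", "check_sms")),
]


def without_probes(types):
    """Return the names of service types for which no probe was named."""
    return [t.name for t in types if not t.probes]


if __name__ == "__main__":
    for name in without_probes(PROPOSED):
        print(name)
```

Calling `without_probes(PROPOSED)` lists the types for which no dedicated probe was named in the meeting (CIS, XUUDB and TSI), which are the entries that still need clarification in the ticket.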