iMARINE - EGI integration meeting

ACTION (Enol): circulate a pointer and set up a wiki page to track the iMARINE integration activities; the page must be linked to WP4.

ACTION (Andrea/Pasquale): provide information about the IaaS interfaces needed, and develop the desired use case for the feasibility analysis ==> evolve the document starting from Andrea's.

REQUIREMENTS (FEDCLOUD): horizontal scalability

MORE INFORMATION

ATTENDING: E. Fernandez, T. Ferrari, A. Manieri (part), P. Pagano, P. Solagna

D4SCIENCE: UoA and CNR are the infrastructure providers; the infrastructure hosts many services for FAO (as part of a collaboration).
Gentlemen's agreement for operating the infrastructure; discussion about a legal entity is ongoing.
Currently a private cloud; new communities can be taken on board as virtual organizations (e.g. geothermal communities); the fishery community is the largest.
D4SCIENCE delivers SaaS (not IaaS), with services including data management and data mining.

D4SCIENCE also active in ENVRI+
iMARINE is an initiative that owns data (in some cases), models and algorithms; all services are outsourced to D4SCIENCE, which has a different governance.

Integration of commercial resources attempted at Engineering
300 VMs in total including support services

USERS
- 1,500 users registered (not all of them are active at the same time)

USE CASE

(1) End users will interact with gCUBE, which provides the abstraction to combine software and tools and generates as output a virtual appliance to be stored in AppDB.

(2) Need for horizontal scalability. Note: EGI has tools for automatic horizontal scalability, but they are either Amazon-specific or OpenStack-specific, or they support several infrastructures but not OCCI.
INDIGO has activities to provide scalability with standards support.

PH1.
===
Virtual appliance for the iMARINE WN, with gCUBE technology as mediator ==> scale on demand when needed. The Dispatcher is managed by D4SCIENCE: it queues the jobs and then monitors them. A process can be composed of multiple jobs on data, and the result is then manipulated by other models. Each job is a script with a UNIQUE ID. The virtual appliance is a worker node (WN): each WN fetches the ID of the job to execute, fetches the job's data through the data resolver, and saves the data locally.
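The pull model described above (a dispatcher queue of uniquely identified jobs, and worker nodes that fetch a job ID, resolve its data and keep a local copy) can be sketched as follows. This is a minimal illustration under stated assumptions, not gCUBE's actual interface: `Dispatcher`, `Job`, `resolve_data` and `run_worker` are hypothetical names, the data resolver is stubbed to return placeholder bytes, and executing the job script itself is left out.

```python
from __future__ import annotations

import queue
import tempfile
import uuid
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Job:
    """A job: a script plus a reference to its input data, with a unique ID."""
    script: str
    data_ref: str
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Dispatcher:
    """Hypothetical stand-in for the dispatcher: queues jobs and tracks their state."""

    def __init__(self) -> None:
        self._queue: queue.Queue[Job] = queue.Queue()
        self.status: dict[str, str] = {}

    def submit(self, job: Job) -> str:
        self.status[job.job_id] = "queued"
        self._queue.put(job)
        return job.job_id

    def next_job(self) -> Job | None:
        """Hand out the next queued job, or None if the queue is empty."""
        try:
            job = self._queue.get_nowait()
        except queue.Empty:
            return None
        self.status[job.job_id] = "running"
        return job

    def report(self, job_id: str, state: str) -> None:
        self.status[job_id] = state


def resolve_data(data_ref: str) -> bytes:
    """Hypothetical data resolver; a real one would fetch the data remotely."""
    return f"contents of {data_ref}".encode()


def run_worker(dispatcher: Dispatcher, workdir: Path) -> int:
    """Pull jobs until the queue is empty; fetch and save each job's data locally."""
    done = 0
    while (job := dispatcher.next_job()) is not None:
        local_copy = workdir / f"{job.job_id}.dat"
        local_copy.write_bytes(resolve_data(job.data_ref))
        # Running job.script against local_copy is out of scope for this sketch.
        dispatcher.report(job.job_id, "done")
        done += 1
    return done


d = Dispatcher()
for i in range(3):
    d.submit(Job(script="model.sh", data_ref=f"dataset-{i}"))
with tempfile.TemporaryDirectory() as tmp:
    processed = run_worker(d, Path(tmp))
print(processed, set(d.status.values()))  # 3 {'done'}
```

Pulling (rather than pushing jobs to WNs) keeps the WN logic simple: a WN only needs to reach the dispatcher and the data resolver, which suits appliances that may be started on any cloud site.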

A VRE is created on demand (data, models and request) => the VO manager approves/denies => if approved, a number of workers are assigned; the WNs can be exclusively allocated or shared.
No monitoring of the queue and no additional assignment of workers: this is done at the level of VRE approval.
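The current approval-time assignment can be sketched as below, assuming (hypothetically) that a VRE request carries its datasets, models and a requested number of WNs, and that shared WNs may serve several VREs while exclusive ones serve exactly one. `VRERequest`, `WorkerPool`, `Allocation` and `handle_request` are illustrative names, not part of gCUBE. Workers are assigned once, when the VO manager decides; nothing watches the queue afterwards.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Allocation(Enum):
    EXCLUSIVE = "exclusive"  # WNs dedicated to a single VRE
    SHARED = "shared"        # WNs that several VREs may use


@dataclass
class VRERequest:
    """What a VRE request carries: data, models and the number of WNs wanted."""
    name: str
    datasets: list[str]
    models: list[str]
    workers: int
    allocation: Allocation


@dataclass
class WorkerPool:
    free: list[str]
    shared: list[str] = field(default_factory=list)
    assignments: dict[str, list[str]] = field(default_factory=dict)

    def _take_free(self, n: int) -> list[str]:
        if len(self.free) < n:
            raise RuntimeError("not enough free WNs")
        taken, self.free = self.free[:n], self.free[n:]
        return taken

    def assign(self, req: VRERequest) -> list[str]:
        """Fixed, one-off assignment at approval time (no later top-up)."""
        if req.allocation is Allocation.EXCLUSIVE:
            wns = self._take_free(req.workers)
        else:
            # Grow the shared set from the free pool only if it is too small.
            missing = max(0, req.workers - len(self.shared))
            self.shared.extend(self._take_free(missing))
            wns = self.shared[: req.workers]
        self.assignments[req.name] = wns
        return wns


def handle_request(pool: WorkerPool, req: VRERequest, approved: bool) -> list[str]:
    """The VO manager's decision: a denied request gets no WNs."""
    return pool.assign(req) if approved else []


pool = WorkerPool(free=["wn1", "wn2", "wn3", "wn4"])
a = handle_request(pool, VRERequest("vre-a", ["d1"], ["m1"], 2, Allocation.EXCLUSIVE), approved=True)
b = handle_request(pool, VRERequest("vre-b", ["d2"], ["m2"], 1, Allocation.SHARED), approved=True)
c = handle_request(pool, VRERequest("vre-c", ["d3"], ["m3"], 1, Allocation.SHARED), approved=True)
print(a, b, c, pool.free)  # ['wn1', 'wn2'] ['wn3'] ['wn3'] ['wn4']
```

With four free WNs, an exclusive request for two and two shared requests for one each leave one WN free, and both shared VREs end up on the same WN.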

EGI-Engage: change this and manage quotas, e.g. 20 WNs, with VREs able to scale up to 60, and these are deployed only if needed. This will require some changes in gCUBE.
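One way to express the proposed quota, reading it as a baseline of 20 WNs and a ceiling of 60 with extra WNs deployed only when needed, is a simple backlog-driven rule. This is a hedged sketch: the `jobs_per_worker` sizing parameter and the queue-length trigger are assumptions, since the minutes do not say which metric would drive scaling.

```python
import math


def desired_workers(queued_jobs: int, jobs_per_worker: int,
                    base: int = 20, cap: int = 60) -> int:
    """WNs a VRE should run for the current backlog (hypothetical policy).

    Never drops below the `base` quota and never exceeds the `cap` quota.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(queued_jobs / jobs_per_worker)
    return max(base, min(cap, needed))


def scaling_action(current: int, queued_jobs: int, jobs_per_worker: int,
                   base: int = 20, cap: int = 60) -> int:
    """Positive: WNs to deploy; negative: WNs to release; zero: nothing to do."""
    return desired_workers(queued_jobs, jobs_per_worker, base, cap) - current


print(desired_workers(150, 5))     # 30
print(desired_workers(0, 5))       # 20
print(desired_workers(1000, 5))    # 60
print(scaling_action(20, 150, 5))  # 10
```

With 5 queued jobs per WN, a backlog of 150 jobs gives 30 WNs, an empty queue falls back to the 20-WN baseline, and a backlog of 1,000 is capped at 60. A rule like this needs the queue to be monitored, which the current approval-time assignment does not do.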

a. The VO manager connects to gCUBE
b. The VO manager chooses to run WNs (sees the list of WNs and can assign them one by one to the VRE). The broker dispatches the VRE requirements to the respective resources (on-demand discovery) ==> the VO manager may evolve to just specifying the WNs, with the gCUBE broker automating this step
c. VREs are made available on some fed cloud providers
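The automated broker step (the VO manager only states how many WNs a VRE needs, and the broker places them on FedCloud providers with free capacity) could look roughly like the greedy placement below. The `Provider` type, the `free_slots` capacity figure, the site names and the largest-first policy are all assumptions for illustration; the minutes do not specify how the gCUBE broker discovers or ranks providers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Provider:
    """A FedCloud site and how many more WN VMs it can host (hypothetical figure)."""
    name: str
    free_slots: int


def place_workers(n_workers: int, providers: list[Provider]) -> dict[str, int]:
    """Greedy placement: fill the providers with the most free capacity first."""
    placement: dict[str, int] = {}
    remaining = n_workers
    for prov in sorted(providers, key=lambda p: p.free_slots, reverse=True):
        if remaining == 0:
            break
        take = min(prov.free_slots, remaining)
        if take > 0:
            placement[prov.name] = take
            remaining -= take
    if remaining > 0:
        # Not enough capacity across all providers: refuse rather than return a partial plan.
        raise RuntimeError(f"only {n_workers - remaining} of {n_workers} WNs can be placed")
    return placement


sites = [Provider("site-a", 10), Provider("site-b", 4), Provider("site-c", 25)]
print(place_workers(30, sites))  # {'site-c': 25, 'site-a': 5}
```

Largest-first keeps a VRE on as few sites as possible; a real broker might instead weigh data locality or site reliability.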

*** WN is a web service running in a web service container (tomcat) ***

Nagios and Ganglia for monitoring.

All data functionalities are web services. Do they need to operate on the FedCloud? Probably not.
Jobs process GBs of data; these are not big data problems. Data output? Stored locally, then a data reduce phase is needed, after which it remains in the storage (MongoDB as storage backend).
Output is private (results are not published).
D4SCIENCE supports housing of data (relational DBs, geoservers for spatial data, etc.)
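The output handling described above (partial results stored locally on each WN, then a reduce phase, then persistence in the MongoDB-backed storage) might be sketched as follows. Assumptions: each job writes its partial result as a JSON file of key-to-count pairs, and the reduce phase sums those counts; `write_partial` and `reduce_partials` are hypothetical helpers, and writing the merged result to MongoDB is deliberately left out.

```python
from __future__ import annotations

import json
import tempfile
from collections import Counter
from pathlib import Path


def write_partial(workdir: Path, job_id: str, partial: dict[str, int]) -> Path:
    """Store one job's partial result locally on the WN (JSON, hypothetical format)."""
    path = workdir / f"{job_id}.json"
    path.write_text(json.dumps(partial))
    return path


def reduce_partials(paths: list[Path]) -> dict[str, int]:
    """Reduce phase: merge partial results by summing the counts per key."""
    total: Counter[str] = Counter()
    for path in paths:
        total.update(json.loads(path.read_text()))
    return dict(total)


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    parts = [
        write_partial(workdir, "job-1", {"a": 2, "b": 1}),
        write_partial(workdir, "job-2", {"a": 3, "c": 4}),
    ]
    merged = reduce_partials(parts)

# `merged` is what would then be persisted in the storage backend.
print(merged)  # {'a': 5, 'b': 1, 'c': 4}
```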

