This 1.5-day event brings together the AI platform providers and the use cases of the iMagine project. They work together in the iMagine Competence Centre.
The aim of the event is to boost the adoption of the AI platform and to eliminate technical roadblocks that would hinder platform adoption and implementation progress in the use cases.
Particular attention will be given to transitioning the use cases from the development phase to the delivery/validation phase (together with the available labelled datasets), and to opening these up as services for external users.
The event is organised for technical staff working in iMagine WP3, WP4 and WP5, as well as for the three externally onboarded new use cases. The main participants are the developers within the use cases, and the providers and user supporters of the iMagine AI platform.
Participation for the cloud providers from WP4 is optional.
The workshop will be facilitated by EGI staff.
iMagine 2nd Competence Centre Workshop - Meeting notes by Gergely:
DAY1:
Data publishing (Dick):
- A metadata structure was presented by Dick as a proposal to describe training datasets in Zenodo. These datasets must also be associated with the iMagine community in Zenodo
- Large, externally stored datasets should remain where they are, but a Zenodo entry would be created with a pointer to the external storage
- D3.3 (lead MARIS) is due at the end of April and should capture the move to data publishing
----------
USE CASE REPORTS:
UC8:
- A small subset of the data was published in an institutional repository; a bigger one is expected in 2025 --> Please send a pointer to this to Ilaria and Smitesh
EUX3: Age reading from fish otoliths
- Investigate annotation feature of iMagine Platform. Check out the other experiences page in Confluence
EUX1: Bathymetry
- Using Sentinel images to estimate sea depth. First results are promising; the method works down to 15 m depth
EUX2: Cold water coral reefs
- Delineate living and dead corals to be able to perform monitoring studies over time and identify degradation/recovery of habitats
- Tried 3 different models (Mask R-CNN, MaskFormer, YOLOv8)
UC1: Marine litter --> Mature
- Model is ready, OSCAR integration ongoing (?), Integration with GeoPortal is in early phase
- User video tutorials are planned
- Outreach on DFKI website --> Replicate/strengthen on the iMagine website
UC2: Zooscan --> Mature
- ~1.5 h/day could be saved per person. There are ~300 ZooScans around the world - Huge time-saving potential
- Mask2Former AI model
- The 'Intersection over Union' (IoU) metric is used for model quality evaluation. Impressive precision statistics (slide 19)
- Built for ZooScan, but with the ambition to transfer the whole technology to other instruments
- ~14,240 images in SeaNoe (CC-BY) --> To create a Zenodo entry linking to this
- They have 20+ years of samples, which is long enough to say something about climate change
- They used local resources to train the models. (The iMagine platform worked, but was slower than the local system, which has a large-memory GPU and is already pre-configured)
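The IoU metric mentioned under UC2 can be sketched as follows. This is a minimal, illustrative example for axis-aligned bounding boxes; the actual UC2 evaluation works on segmentation masks from Mask2Former, where the same ratio is computed over pixel sets rather than rectangles:

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # Union = sum of the two areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes -> 1.0
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~ 0.143
```

A score of 1.0 means perfect overlap with the ground-truth annotation; a threshold (commonly 0.5) decides whether a prediction counts as correct when computing precision.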
UC3-1: Azores
- YOLOv8 and Mask R-CNN are the models used in the pipeline. Mask R-CNN is older; there were incompatibility problems with the latest Keras and TensorFlow
- There was a problem with NextCloud which drove them to use local storage; they stayed local even after NextCloud was fixed
- Still in model training and testing with different parameters
- Possibly can already publish a dataset. --> Ilaria, Smitesh to follow up
UC3-2: Smartbay
- Two problems they solve: 1: Video quality degradation, 2: Species recognition
- Models: YOLOv8 and DOVER. YOLO needs labelled images, which are lacking, as is the manpower to create them. Several video platforms use unsupervised models; that is how DOVER was found
- Data publishing expected in Aug 2024
- Service publishing timeline undefined
UC3-3: OBSEA
- Slide 52: nice diagram on how the iMagine AI platform valorises the camera images as an extension of the OBSEA infrastructure
- First training dataset was published in Zenodo --> Ilaria to ensure it's in the iMagine community
- Grafana based statistics generation framework is in place
- Target user still open: their own site? Other EMSO sites?
- EMSO has service group, data group where the EMSO UC results can/should be presented. --> Ilaria to follow up with Enoc
UC4: Oil spill
- Data will be made available, but the timeline is unclear
- Want to setup a second instance on the iMagine platform besides their in-house deployment
UC5: Phytoplankton --> Mature
- US1: Model is in the MP, it's ready for usage. (User doc is to be written) --> EGI to engage to capture service description and offer
- Data published in Zenodo --> Need to update metadata and community association
UC6: Underwater noise
- Training dataset is available --> To publish on Zenodo
- Model: CNN
UC7: Beach monitoring in Mallorca
- Image dataset is already in Zenodo --> Improve metadata, Community association
- YOLOv8 and UNet models are used
--------------
DAY2:
- Angel's slides are useful for the iMagine DTO presentation (if accepted). Articulate on the iMagine website/reports how iMagine results can feed into the DTO --> Ilaria?
- iMagine platform:
* Provenance information is available about the models in the MP. --> Make this visible for the services that go into production (UC 5, 1, 2)
- General purpose modules:
* New models in the MP for beginner use cases: FasterRCNN, YOLOv8, YOLOv9 soon, Image Classification
- OSCAR:
* Engine/framework for model inference. The GUI is mainly for testing; the REST API is for external integration (from community portals)
* Predeployed instance already connected to the iMagine clouds
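An external portal would call the pre-deployed OSCAR instance roughly like this. This is a hedged sketch using only the Python standard library: the endpoint URL, service name and token are placeholders, and the `/run/<service>` synchronous-invocation path is an assumption to verify against the actual deployment's documentation:

```python
import base64
import urllib.request

def invocation_url(endpoint, service):
    """Build the assumed synchronous-invocation URL for an OSCAR service."""
    return f"{endpoint.rstrip('/')}/run/{service}"

def run_inference(endpoint, service, token, image_path):
    """POST a base64-encoded image to an OSCAR service and return the raw response.

    All arguments are placeholders to be filled in from the real deployment,
    e.g. endpoint="https://oscar.example.org", service="litter-detector".
    """
    with open(image_path, "rb") as f:
        payload = base64.b64encode(f.read())
    req = urllib.request.Request(
        invocation_url(endpoint, service),
        data=payload,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return resp.read().decode()
```

This is the integration pattern the community portals (e.g. the UC1 GeoPortal) would use: no GUI involved, just an authenticated HTTP call per image or batch.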
- iMagine AI platform feedback by MARIS
* EyeOnWater citizen science application - Classify images: Suitable/unsuitable for DB inclusion
* Presented main findings about the platform. A short report will be sent. Topics to further discuss in WP4:
** Resource requirement estimates for models
** Training job suspension/restart
Reporting back from day 1 break-out groups:
1. Data release in Zenodo (Dick)
- We should use an ISO or Dublin Core standard to have fields that are more broadly interoperable
- We are exploring with Zenodo whether we could have a dedicated template for our data entries
- If both are yes then we'll prepare an instruction for the UCs
- Data that is already in open-access repositories should stay there and only be linked from Zenodo
- Immediate action is to have a Zenodo entry for every dataset available from the UCs, and associate those entries to the iMagine-community in Zenodo
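A sketch of what such a pointer-only entry could look like via Zenodo's deposition REST API. The endpoint and field names follow the public Zenodo API, but the community identifier (`imagine`), the licence and the relation type are assumptions to be confirmed against Dick's metadata proposal:

```python
import json
import urllib.request

def imagine_dataset_metadata(title, description, external_url, creators):
    """Assemble Zenodo deposition metadata for a dataset that stays in its
    external repository: the entry only points to it and joins the iMagine
    community ('imagine' community identifier is an assumption)."""
    return {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": description,
            "creators": [{"name": c} for c in creators],
            "license": "cc-by-4.0",  # assumed default; set per dataset
            # Link instead of re-uploading: large external data stays where it is
            "related_identifiers": [
                {"relation": "isIdenticalTo", "identifier": external_url}
            ],
            "communities": [{"identifier": "imagine"}],
        }
    }

def create_deposition(token, metadata):
    """Create the deposition via Zenodo's REST API (requires an access token)."""
    req = urllib.request.Request(
        f"https://zenodo.org/api/deposit/depositions?access_token={token}",
        data=json.dumps(metadata).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

If Zenodo agrees to a dedicated template for the project, the extra fields from the template would be added to the same `metadata` block.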
2. Delivery of model applications for inference: AI platform vs External GUIs via OSCAR; Generic logins; Capacity allocation for use cases and for individual users. Remote service vs local setups (Smitesh)
- Sustainability of delivery will be explored with institutes in the project; with RIs that could use the solutions, with e-infras that could support delivery in the long term (e.g. French national clouds) --> Smitesh and the ASB to follow up
- We will check whether Check-in auth-authz solution could report user statistics and nationality of users --> Smitesh to follow up and report back to the ASB
- Model inference - UC5, UC1, UC2 to start first:
* UC5: Three value propositions: (1) Embed the model into pre-deployed OSCAR; users classify their own images with the pre-trained model. (2) Retrain the model with the user's data and generate a new OSCAR deployment. (3) Deploy the setup internally by pulling from GitHub.
* UC1: (1) Would need predeployed OSCAR for the basic inference, invoked from an external GUI portal. (2) Retraining the model could be offered later.
* UC2: Typical usage is ~dozens of images at a time. (1 and 2) Same as UC1, but needs a GPU for execution (so Walton does not work; the other 3 clouds do). (3) Want to support this: ~100 institutes operate such systems.
--> Set up a follow-up call focusing on this topic. Bring the use cases, UPV, CSIC and KIT into the meeting as well.
--> Structure D3.3 around these three delivery models (as a framework), position the use cases within this framework, define step-by-step action list for them under the delivery options relevant to them. Gergely to prepare the initial outline. (Lead by MARIS)
Alternatives to OSCAR - Which would result in the most sustainable setup for the UCs? - Some alternatives: https://openwhisk.apache.org/ OR https://nifi.apache.org/ OR Docker Compose (if the user numbers are low)
3. AI tooling experiences: Annotation/labelling; Certain models (Valentin)
- CVAT is a good annotation tool; integration into the iMagine AI platform is needed (and started?)
- Next round of best practices collection has recently started (Deliverable template is available for comments). Possibly can be a journal article afterwards
- Info was shared about upcoming generic models (YOLOv8, v9)
- Info was shared about new tool to generate synthetic data (for underrepresented classes)
---
SIDE DISCUSSION:
Updates to make to the website (Ilaria):
- Introduce new area: Image analysis services. Start building up the content from UC 5, 1, 2 based on a common service description and access template
- Introduce new area: Our data. Short page with pointer to the datasets that are stored in Zenodo (See several of these mentioned in UC reports above)
- Introduce new area: Our AI models. A table listing the models we use, with references to the use cases that apply them (see most of them listed above). Also highlight the general-purpose models that are available in the MP as starting points for new use cases (FasterRCNN, YOLOv8, Image Classification)