ENES is looking for resources with a considerable amount of storage (100 TB to start) that will support a Jupyter frontend and a cloud/HPC backend providing the data science environment with the various technologies involved.
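As a rough sketch of that frontend/backend split, a JupyterHub configuration along the lines below could be a starting point; the spawner choice, image name and port are illustrative assumptions, not decisions from the discussion.

    # jupyterhub_config.py -- sketch only; image name, port and spawner are placeholders
    # Assumes JupyterHub and dockerspawner are installed on the frontend VM.
    c = get_config()  # provided by JupyterHub when it loads this file

    # Frontend: hub and proxy listening on the VM's public interface
    c.JupyterHub.bind_url = "http://0.0.0.0:8000"

    # Backend: single-user servers started as containers shipping the
    # data science environment (hypothetical image name)
    c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
    c.DockerSpawner.image = "enes/climate-notebook:latest"
    c.DockerSpawner.remove = True  # clean up containers when servers stop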
Pilot in HPC: focus on containerisation with udocker; open to comparing with other technologies if available.
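For the HPC pilot, the same container image could be exercised with udocker, which needs no root privileges or Docker daemon on the node. A minimal sketch, assuming udocker is installed and using a hypothetical image name:

    # udocker_pilot.py -- sketch: run the containerised environment with udocker,
    # which works without root or a Docker daemon on the HPC node.
    import subprocess

    IMAGE = "enes/climate-notebook:latest"  # hypothetical image name
    NAME = "enes-pilot"

    def udocker(*args):
        """Run a udocker subcommand, raising if it fails."""
        subprocess.run(["udocker", *args], check=True)

    udocker("pull", IMAGE)                       # fetch image layers
    udocker("create", f"--name={NAME}", IMAGE)   # unpack into a local container
    udocker("run", NAME, "python", "--version")  # run a command inside it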
HPC deployment: would it be possible to have a Jupyter frontend? And Onedata? -> The cloud part will start faster; having a public IP shouldn't be a problem.
Look into cloud deployment first: one JupyterHub plus a few pre-allocated VMs to start the analysis and ensure some availability, with elasticity; Onedata will probably come later.
EC3 can provide such a deployment, which can be tuned to the use case, e.g. starting several VMs at the same time instead of a single one.
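For illustration, an EC3 launch of such a cluster could look like the sketch below; the cluster name, template names and auth file are placeholders and would have to match the templates actually prepared for this use case.

    # ec3_launch.py -- sketch: launch an elastic cluster with EC3.
    # Cluster name, template names and auth file are placeholders.
    import subprocess

    subprocess.run(
        [
            "ec3", "launch", "enes-cluster",  # cluster name
            "ubuntu", "jupyterhub", "nfs",    # RADL templates to combine
            "-a", "auth.dat",                 # cloud credentials file
        ],
        check=True,
    )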
Sequential tools in the beginning; focus on making the user experience easy.
Storage: needs to be accessible by all the nodes. TUBITAK has an NFS system that should work, with SSD caches.
NFS can be a bottleneck for parallel computations, but TUBITAK has good experience tuning NFS for high I/O workloads. This needs to be tested.
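A first sanity check of the NFS throughput could be a crude write/read test along the lines sketched below, run from several nodes at once; the mount point and file size are arbitrary assumptions.

    # nfs_io_check.py -- sketch: crude write/read throughput test on the shared mount.
    import os
    import time

    MOUNT = "/mnt/nfs"               # assumed NFS mount point
    SIZE_MB = 1024                   # write and read back 1 GiB
    path = os.path.join(MOUNT, f"io-test-{os.getpid()}.bin")
    block = os.urandom(1024 * 1024)  # 1 MiB of random data

    t0 = time.time()
    with open(path, "wb") as f:
        for _ in range(SIZE_MB):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())         # make sure the data reached the server
    print(f"write: {SIZE_MB / (time.time() - t0):.1f} MiB/s")

    t0 = time.time()
    with open(path, "rb") as f:
        while f.read(1024 * 1024):
            pass
    print(f"read:  {SIZE_MB / (time.time() - t0):.1f} MiB/s")
    os.remove(path)

The read figure is optimistic when measured right after the write on the same node, since the data may still be in the page cache; reading from a different node gives a more realistic number.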
Lustre is used on the HPC side, but NFS is also being considered.
Need to understand how to map users to the storage.
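If JupyterHub sits in front of the storage, one possible mapping is a pre-spawn hook that creates a per-user directory on the shared mount and starts the notebook server there; the path below is a placeholder.

    # In jupyterhub_config.py -- sketch: give every hub user a directory on the
    # shared NFS mount (path and permissions policy are placeholders).
    import os

    NFS_ROOT = "/mnt/nfs/users"

    def map_user_storage(spawner):
        """Create the user's workspace on the shared storage before spawning."""
        user_dir = os.path.join(NFS_ROOT, spawner.user.name)
        os.makedirs(user_dir, exist_ok=True)
        spawner.notebook_dir = user_dir  # single-user server starts here

    c = get_config()
    c.Spawner.pre_spawn_hook = map_user_storage

With a container-based spawner such as DockerSpawner, the per-user directory would additionally have to be bind-mounted into the container.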
Plan:
Next steps: