deprecated-oc-catalog/docs/components/components_specification.md
2024-03-18 11:25:59 +01:00

This document aims to describe the role of each component in the catalog. It describes their attributes in text so that anyone involved in the development can grasp their role and identify missing features or attributes.

This document should be accompanied by a diagram that summarizes it.

Components description

As a user of oc-catalog, I want to be able to create a workflow, which represents the flow of data between different components: computing, datacenter, data and storage.

Each component has a name, a logo, a short and a long description.

Computing

A computing component is used to execute the Docker image in its command attribute. A computing component must be linked with a datacenter component, where it will be executed.

It has two required fields, CPU and RAM, which describe the minimum amount of computing resources needed to execute it.

Optionally, it can have a value in the GPU field.
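
The required and optional fields above could be checked with something like the following. This is only a sketch: the class name, field names, types and units are assumptions made for illustration, not the actual oc-catalog schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Computing:
    # Hypothetical field names; units are not specified in this document.
    name: str
    command: str                # Docker image to execute
    dc_acronym: str             # datacenter the component is linked to
    cpu: int                    # minimum CPU needed (required)
    ram: int                    # minimum RAM needed (required)
    gpu: Optional[int] = None   # optional

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the component is valid."""
        problems = []
        if not self.command:
            problems.append("command (Docker image) is required")
        if not self.dc_acronym:
            problems.append("a computing component must be linked to a datacenter")
        if self.cpu <= 0:
            problems.append("cpu must be a positive value")
        if self.ram <= 0:
            problems.append("ram must be a positive value")
        if self.gpu is not None and self.gpu < 0:
            problems.append("gpu, when set, must not be negative")
        return problems
```

Returning a list of problems rather than raising on the first one makes it easier to show a user everything that is missing at once.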

For each instance of a computing component we can specify:

  • another entrypoint to the image: this must be specified after the name of the image in command.
  • arguments, which will be passed to the entrypoint
  • environment variables
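
As an illustration of how these three settings fit together, here is one way an instance could be turned into a `docker run` argument list. The class and function names are hypothetical, and this document does not state that oc-scheduler invokes Docker this way; only the `-e` and `--entrypoint` flags are standard Docker CLI options.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class ComputingInstance:
    # Hypothetical field names, chosen for this sketch.
    command: str                                   # image name, optionally followed by an entrypoint
    args: list[str] = field(default_factory=list)
    env: dict[str, str] = field(default_factory=dict)


def docker_argv(instance: ComputingInstance) -> list[str]:
    """Build a `docker run` argument list from an instance (illustrative only)."""
    parts = shlex.split(instance.command)
    if not parts:
        raise ValueError("command must at least name a Docker image")
    image, entrypoint = parts[0], parts[1:]

    argv = ["docker", "run"]
    for key, value in sorted(instance.env.items()):
        argv += ["-e", f"{key}={value}"]
    if entrypoint:
        # --entrypoint takes a single executable; any further tokens
        # are passed as leading arguments after the image name.
        argv += ["--entrypoint", entrypoint[0]]
    argv.append(image)
    argv += entrypoint[1:] + instance.args
    return argv
```

With `command="alpine:3.19 /bin/echo"` and `args=["hello"]`, the entrypoint is read from the token that follows the image name, as described above.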

The fields input and output list the different links coming in and out of the computing component.

[!] These fields may duplicate the Links object that we create when parsing the XML in oc-scheduler; if they prove redundant, it might be better to remove them.

Datacenter

A datacenter is identified by its DC acronym, which is a very short form of its name.

Note: as of now, this DC acronym field is used as a primary key in order to link other components to a datacenter. This might be a sign that using a NoSQL database in the future is not the best option.

Each datacenter must declare:

  • its Memory, composed of two fields: ecc (error-correcting code) and size (in MB)
  • its CPU, which is composed of:
    • its number of cores
    • a boolean to declare if the cores are shared or not
    • its architecture
    • its platform
    • the minimum memory needed

Finally, we can add GPUs to a datacenter. They are characterized by:

  • their number of CUDA cores
  • their number of tensor cores
  • their size (in MB)
  • their model
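
The datacenter attributes listed above could be modelled as nested records. This is a minimal sketch: the class and field names are assumptions, as are the types (for instance, ecc is treated as a boolean here, which this document does not confirm).

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    ecc: bool          # error-correcting code (type assumed)
    size_mb: int       # size in MB


@dataclass
class CPU:
    cores: int
    shared: bool       # whether the cores are shared
    architecture: str
    platform: str
    min_memory: int    # minimum memory needed (unit assumed)


@dataclass
class GPU:
    cuda_cores: int
    tensor_cores: int
    size_mb: int
    model: str


@dataclass
class Datacenter:
    dc_acronym: str    # currently used as the primary key
    memory: Memory
    cpu: CPU
    gpus: list[GPU] = field(default_factory=list)


def index_by_acronym(datacenters: list[Datacenter]) -> dict[str, Datacenter]:
    """Index datacenters by DC acronym, rejecting duplicates since it acts as a key."""
    index: dict[str, Datacenter] = {}
    for dc in datacenters:
        if dc.dc_acronym in index:
            raise ValueError(f"duplicate DC acronym: {dc.dc_acronym}")
        index[dc.dc_acronym] = dc
    return index
```

Because other components refer to a datacenter only by its acronym, a uniqueness check like the one in `index_by_acronym` is the kind of guarantee a primary key would otherwise provide.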

Data

This component represents a data source; we want to know what type of data it produces. It has a base64-encoded example of the final data structure.

The source URL must be specified, as well as the protocol.

! Maybe these two fields should be merged, keeping only a URL that indicates its protocol.
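
To illustrate the merge suggested in the note above, the sketch below stores only the URL and derives the protocol from its scheme using the standard library. The class and field names are hypothetical, not the catalog's real schema.

```python
import base64
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Data:
    # Hypothetical field names, chosen for this sketch.
    name: str
    data_type: str     # type of data produced by the source
    url: str           # single field; the protocol is read from the scheme
    example_b64: str   # base64-encoded example of the final data structure

    @property
    def protocol(self) -> str:
        """Derive the protocol from the URL instead of storing it separately."""
        scheme = urlsplit(self.url).scheme
        if not scheme:
            raise ValueError(f"URL has no protocol: {self.url!r}")
        return scheme

    def decoded_example(self) -> bytes:
        """Decode the stored example, rejecting characters outside the base64 alphabet."""
        return base64.b64decode(self.example_b64, validate=True)
```

With this shape, a URL such as `https://example.org/data.json` yields the protocol `https` without a second field that could drift out of sync.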

Storage

Storage components are linked to a datacenter, and used to store the result of a computing component.

A storage component is associated with a datacenter through its DC acronym. It also has a URL to reach it. A storage component has a storage size and some optional fields:

  • encrypted storage
  • the type of redundancy
  • its throughput

Finally, it has a price.
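
A storage component and its link to a datacenter could be sketched as follows. The names, types and units are assumptions made for illustration; the helper only shows how a dangling DC acronym might be detected when no database constraint enforces the link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Storage:
    # Hypothetical field names; units and currency are not specified in this document.
    name: str
    dc_acronym: str                     # datacenter this storage belongs to
    url: str
    size: int
    price: float
    encrypted: Optional[bool] = None    # optional
    redundancy: Optional[str] = None    # optional
    throughput: Optional[str] = None    # optional


def unresolved_storages(storages: list[Storage], known_acronyms: set[str]) -> list[str]:
    """Return the names of storage components that reference an unknown datacenter."""
    return [s.name for s in storages if s.dc_acronym not in known_acronyms]
```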

Diagram