This document describes the role of each component in the catalog. It describes their attributes in text so that anyone involved in the development can grasp their role and identify missing features or attributes.
This document should be accompanied by a diagram that summarizes it.
# Components description
As a user of oc-catalog, I want to be able to create a workflow, which represents the flow of data between different components: computing, datacenter, data and storage.
Each component has a name, a logo, a short and a long description.
## Computing
A computing component is used to execute the docker image in its **command** attribute. A computing component **must** be linked with a datacenter component, where it will be executed.
It has two required fields, **CPU** and **RAM**, which describe the minimum amount of computing resources needed to execute it.
Optionally, it can have a value in the **GPU** field.
For each instance of a computing component we can specify:
- another entrypoint to the image: this must be specified after the name of the image in **command**.
- **arguments**, which will be passed to the entrypoint
- **Environment variables**
The fields **input** and **output** list the different links coming in and out of the computing component.
> [!] These fields may be redundant with the Links object that we create when parsing the XML in oc-scheduler. It might be better to remove them if that proves to be the case.
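As an illustration only, the computing attributes described above could be modeled as follows. The class name, the field names, the units (cores, MB) and the `split_command` helper are assumptions made for this sketch, not the actual oc-catalog model.

```python
import shlex
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Computing:
    """Hypothetical model of a computing component."""
    name: str
    command: str                  # docker image, optionally followed by an entrypoint
    dc_acronym: str               # datacenter the component must be linked with
    cpu: int                      # required; the unit (cores) is an assumption
    ram: int                      # required; the unit (MB) is an assumption
    gpu: Optional[int] = None     # optional
    arguments: list[str] = field(default_factory=list)
    environment: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The text requires every computing component to be linked with a datacenter.
        if not self.dc_acronym:
            raise ValueError("a computing component must be linked with a datacenter")

    def split_command(self) -> tuple[str, list[str]]:
        """Return (image, entrypoint override); the override is empty if none is given."""
        parts = shlex.split(self.command)
        if not parts:
            raise ValueError("command must at least name a docker image")
        return parts[0], parts[1:]


resize = Computing(
    name="resize",
    command="alpine:3.19 /bin/sh",
    dc_acronym="DC1",
    cpu=2,
    ram=512,
    arguments=["-c", "echo $GREETING"],
    environment={"GREETING": "hello"},
)
image, entrypoint = resize.split_command()
```

Here `image` holds the image name and `entrypoint` holds whatever follows it in **command**, which matches the rule that an alternative entrypoint is written after the image name.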
## Datacenter
A datacenter is identified by its **DC acronym**, which is a very short form of its name.
**Note**: as of now, this DC acronym field is used as a primary key in order to link other components to a datacenter. This might be a sign that using a NoSQL database in the future is not the best option.
Each datacenter must declare :
- its **Memory**, composed of two fields: **ecc** (error-correcting code) and **size** (in MB)
- its **CPU**, which is composed of:
  - its number of **cores**
  - a boolean to declare whether the cores are **shared** or not
  - its **architecture**
  - its **platform**
  - the **minimum memory** needed
Finally, we can add **GPU**s to a datacenter. They are characterized by:
- their number of **CUDA cores**
- their number of **tensor cores**
- their **size** (in MB)
- their **model**
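The datacenter description above can be sketched as nested records. This is a minimal sketch under several assumptions: all class and field names are hypothetical, **ecc** is treated as a boolean, the cores, shared flag, architecture, platform and minimum memory are all taken to belong to the CPU, and memory sizes are taken to be in MB.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    ecc: bool              # error-correcting code (boolean is an assumption)
    size: int              # in MB


@dataclass
class Cpu:
    cores: int
    shared: bool           # whether the cores are shared
    architecture: str
    platform: str
    minimum_memory: int    # unit not stated in the text; MB assumed


@dataclass
class Gpu:
    cuda_cores: int
    tensor_cores: int
    size: int              # MB assumed
    model: str


@dataclass
class Datacenter:
    dc_acronym: str        # short form of the name, used as the lookup key
    memory: Memory
    cpu: Cpu
    gpus: list[Gpu] = field(default_factory=list)


dc1 = Datacenter(
    dc_acronym="DC1",
    memory=Memory(ecc=True, size=65536),
    cpu=Cpu(cores=8, shared=False, architecture="x86_64",
            platform="example-platform", minimum_memory=1024),
    gpus=[Gpu(cuda_cores=4096, tensor_cores=128, size=16384, model="example-gpu")],
)

# Other components refer to a datacenter through its acronym,
# so an index keyed by that acronym plays the role of the primary key.
datacenters = {dc.dc_acronym: dc for dc in [dc1]}
```

Looking up `datacenters["DC1"]` is then how a computing or storage component would resolve the datacenter it is linked to.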
## Data
This component represents a data source. We want to know what **type** of data it produces. It has a base64-encoded **example** of the final data structure.
The source **URL** must be specified, as well as the **protocol**.
> ! Hence, maybe these two fields should be merged, keeping only a URL that indicates its protocol.
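To illustrate the idea of keeping a single URL, the sketch below derives the protocol from the URL scheme instead of storing it separately. The class, its field names and the sample values are hypothetical. The example payload is JSON only for the sake of the demonstration.

```python
import base64
import json
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Data:
    """Hypothetical model of a data component."""
    name: str
    type: str       # kind of data the source produces
    example: str    # base64-encoded sample of the final data structure
    url: str

    @property
    def protocol(self) -> str:
        # The scheme of the URL already carries the protocol.
        return urlsplit(self.url).scheme

    def decoded_example(self) -> bytes:
        return base64.b64decode(self.example)


sample = json.dumps({"temperature": 21.5}).encode("utf-8")
sensor = Data(
    name="sensor-feed",
    type="json",
    example=base64.b64encode(sample).decode("ascii"),
    url="https://example.org/feed",
)
```

With this shape, `sensor.protocol` is computed from `sensor.url`, so the two values cannot disagree.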
## Storage
Storage components are linked to a datacenter and are used to store the result of a computing component.
A storage component is associated with its datacenter through the datacenter's **DC acronym**. It also has a **URL** through which it can be reached, a storage **size**, and some optional fields:
- **crypted** storage
- the type of **redundancy**
- its **throughput**
Finally, each storage component has a **price**.
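A storage component could be sketched along the same lines. The names, the types chosen for the optional fields, the units and the `storages_in` helper are all assumptions made for this example. The text does not specify the unit of **size** or the currency and billing period of **price**.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Storage:
    """Hypothetical model of a storage component."""
    name: str
    dc_acronym: str                    # datacenter the storage is associated with
    url: str                           # where the storage can be reached
    size: int                          # unit not stated in the text
    price: float                       # currency and period not stated in the text
    crypted: Optional[bool] = None     # optional
    redundancy: Optional[str] = None   # optional, type of redundancy
    throughput: Optional[str] = None   # optional


def storages_in(dc_acronym: str, storages: list[Storage]) -> list[Storage]:
    """Return the storage components associated with the given datacenter."""
    return [s for s in storages if s.dc_acronym == dc_acronym]


catalog = [
    Storage(name="fast-disk", dc_acronym="DC1", url="https://example.org/fast",
            size=500, price=12.0, crypted=True),
    Storage(name="archive", dc_acronym="DC2", url="https://example.org/archive",
            size=5000, price=3.5, redundancy="example-redundancy"),
]
in_dc1 = storages_in("DC1", catalog)
```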
# Diagram
![](models_oc-catalog.jpg)