Compare commits

...

5 Commits

16 changed files with 391 additions and 2 deletions


BIN docs/S3/img/workflow.png (new file, 124 KiB)


@@ -0,0 +1,44 @@
# Allowing deported Pods to use S3 storage
As a first way to transfer data from one processing node to another, we have implemented the mechanics that allow a pod to access a bucket on an S3-compatible server which is not on the same Kubernetes cluster.
For this we use an example workflow run with Argo and Admiralty on the node *Control*, with the **curl** and **mosquitto** processing steps executing on the control node and the other processing steps on the *Target01* node.
To transfer data we use the **S3** and **output/input** annotations handled by Argo, with two *Minio* servers on Control and Target01.
![](./img/workflow.png)
When the user launches a booking in the UI, a request is sent to **oc-scheduler**, which:
- Checks whether another booking is scheduled at the requested time
- Creates the booking and workflow executions in the DB
- Creates the namespace, service accounts and rights for Argo to execute
![](./img/ns-creation-after-booking.gif)
We added another action to the existing calls made to **oc-datacenter**.
**oc-scheduler** retrieves all the storage resources in the workflow and, for each one, retrieves the *computing* resources that host a processing resource using that storage resource. Here we have:
- Minio Control:
  - Control (via the first cURL)
  - Target01 (via imagemagic)
- Minio Target01:
  - Control (via alpine)
  - Target01 (via cURL, openalpr and mosquitto)

If the computing and storage resources are on the same node, **oc-scheduler** sends an empty POST request to the route, and **oc-datacenter** creates the credentials on the S3 server and stores them in a Kubernetes secret in the execution's namespace.
If the two resources are on different nodes, **oc-scheduler** sends a POST request stating that it needs to retrieve the credentials, reads the response and calls the appropriate **oc-datacenter** to create a Kubernetes secret. This means that, if we consider three nodes:
- A from which the workflow is scheduled
- B where the storage is
- C where the computing is
A can contact B to retrieve the credentials, post them to C for storage and then run an Argo Workflow, from which a pod will be deported to C and will be able to access the S3 server on B.
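
To check that the credentials were propagated, the secrets can be listed in the execution's namespace on the cluster hosting the computing resource; the namespace and secret names below are placeholders, not the actual names used by oc-datacenter:

```bash
# List the secrets created for the execution
kubectl get secrets -n <execution-namespace>

# Inspect the stored S3 credentials
kubectl get secret <s3-credentials-secret> -n <execution-namespace> -o yaml
```
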
![](./img/secrets-created-in-s3.gif)
# Final
We can see that the different processing steps are able to access the required data on the different storage resources, and that our ALPR analysis is sent to the mosquitto server and to the HTTP endpoint we set in the last cURL.
![](./img/argo-watch-executing.gif)
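
The recording above corresponds to following the workflow with the Argo CLI; a minimal example, with placeholder names:

```bash
# Follow the workflow execution live (namespace and workflow names are placeholders)
argo watch -n <execution-namespace> <workflow-name>
```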



@@ -3,6 +3,85 @@
We have written the following playbooks, available on a private [GitHub repo](https://github.com/pi-B/ansible-oc/tree/384a5acc0713a0fa013a82f71fbe2338bf6c80c1/Admiralty):
- `deploy_admiralty.yml` installs Helm and the necessary charts in order to run Admiralty on the cluster
- `setup_admiralty_target.yml` creates the environment necessary to use a cluster as a target in an Admiralty federation running Argo Workflows: it creates the necessary serviceAccount, Target resource and token to authenticate the source
- `add_admiralty_target.yml` creates the environment to use a cluster as a source, providing the data necessary to use a given cluster as a target.
# Ansible playbook
The deployment playbook is run with `ansible-playbook deploy_admiralty.yml -i <REMOTE_HOST_IP>, --extra-vars "user_prompt=<YOUR_USER>" --ask-pass`:
```yaml
- name: Install Helm
  hosts: all:!localhost
  user: "{{ user_prompt }}"
  become: true
  # become_method: su
  vars:
    arch_mapping:  # Map ansible architecture {{ ansible_architecture }} names to Docker's architecture names
      x86_64: amd64
      aarch64: arm64
  tasks:
    - name: Check if Helm does exist
      ansible.builtin.command:
        cmd: which helm
      register: result_which
      failed_when: result_which.rc not in [ 0, 1 ]
    - name: Install helm
      when: result_which.rc == 1
      block:
        - name: download helm from source
          ansible.builtin.get_url:
            url: https://get.helm.sh/helm-v3.15.0-linux-amd64.tar.gz
            dest: ./
        - name: unpack helm
          ansible.builtin.unarchive:
            remote_src: true
            src: helm-v3.15.0-linux-amd64.tar.gz
            dest: ./
        - name: copy helm to path
          ansible.builtin.command:
            cmd: mv linux-amd64/helm /usr/local/bin/helm

- name: Install admiralty
  hosts: all:!localhost
  user: "{{ user_prompt }}"
  tasks:
    - name: Install required python libraries
      become: true
      # become_method: su
      package:
        name:
          - python3
          - python3-yaml
        state: present
    - name: Add jetstack repo
      ansible.builtin.shell:
        cmd: |
          helm repo add jetstack https://charts.jetstack.io && \
          helm repo update
    - name: Install cert-manager
      kubernetes.core.helm:
        chart_ref: jetstack/cert-manager
        release_name: cert-manager
        context: default
        namespace: cert-manager
        create_namespace: true
        wait: true
        set_values:
          - value: installCRDs=true
    - name: Install admiralty
      kubernetes.core.helm:
        name: admiralty
        chart_ref: oci://public.ecr.aws/admiralty/admiralty
        namespace: admiralty
        create_namespace: true
        chart_version: 0.16.0
        wait: true
```
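
The target-setup playbooks listed above would presumably be invoked in the same way; the commands below are a sketch mirroring the deploy command, and the extra-vars they actually expect may differ:

```bash
ansible-playbook setup_admiralty_target.yml -i <TARGET_HOST_IP>, --extra-vars "user_prompt=<YOUR_USER>" --ask-pass
ansible-playbook add_admiralty_target.yml -i <SOURCE_HOST_IP>, --extra-vars "user_prompt=<YOUR_USER>" --ask-pass
```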



@@ -0,0 +1,151 @@
# Goals
This originated from a demand to know how much RAM is consumed by Open Cloud when running a large number of workflows at the same time on the same node.
We differentiated between different components:
- The "oc-stack", which is the minimum set of services needed to create and schedule a workflow execution: oc-auth, oc-datacenter, oc-scheduler, oc-front, oc-schedulerd, oc-workflow, oc-catalog, oc-peer, oc-workspace, loki, mongo, traefik and nats
- oc-monitord, the daemon instantiated by the scheduling daemon (oc-schedulerd) that creates the YAML for Argo and the necessary Kubernetes resources.

We monitor both parts to see how much RAM the oc-stack uses before / during / after the executions, the RAM consumed by the monitord containers, and the total for the stack and the monitors.
# Setup
In order to have optimal performance we used a Proxmox server with large resources (>370 GiB RAM and 128 cores) to host the two VMs composing our Kubernetes cluster, with one control-plane node where the oc-stack is running and a worker node running only k3s.
## VMs
We instantiated a 2-node Kubernetes (k3s) cluster on the superg PVE (https://superg-pve.irtse-pf.ext:8006/).
### VM Control
This VM runs the oc-stack and the monitord containers, so it carries the biggest part of the load. It must have k3s and Argo installed. We allocated **62 GiB of RAM** and **31 cores**.
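
For reference, a minimal way to prepare this VM could look like the sketch below; the k3s command comes from the upstream quick start, while the Argo Workflows version and manifest URL are examples rather than the exact ones used here:

```bash
# Install k3s in server (control plane) mode
curl -sfL https://get.k3s.io | sh -

# Install Argo Workflows into the argo namespace (version chosen as an example)
kubectl create namespace argo
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.5.5/install.yaml
```
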
### VM Worker
This VM holds the workload for all the pods created, acting as a worker node for the k3s cluster. We deploy k3s as a worker node as explained in the k3s quick start guide:
`curl -sfL https://get.k3s.io | K3S_URL=https://myserver:6443 K3S_TOKEN=mynodetoken sh -`
The value to use for K3S_TOKEN is stored at `/var/lib/rancher/k3s/server/node-token` on the server node.
Verify that the worker has been added as a node to the cluster by running `kubectl get nodes` on the control plane and looking for the worker VM's hostname in the list of nodes.
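
A quick way to check the join, assuming kubectl is configured on the control plane:

```bash
# On the control plane: read the join token referenced above
sudo cat /var/lib/rancher/k3s/server/node-token

# After running the join command on the worker, confirm it shows up as a node
kubectl get nodes
```
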
### Delegate pods to the worker node
In order for the pods to be executed on another node we need to modify how we construct the Argo YAML, adding a `nodeSelector` to the workflow spec. We have added the needed attributes to the `Spec` struct in `oc-monitord` on the `test-ram` branch.
```go
type Spec struct {
    ServiceAccountName string                `yaml:"serviceAccountName"`
    Entrypoint         string                `yaml:"entrypoint"`
    Arguments          []Parameter           `yaml:"arguments,omitempty"`
    Volumes            []VolumeClaimTemplate `yaml:"volumeClaimTemplates,omitempty"`
    Templates          []Template            `yaml:"templates"`
    Timeout            int                   `yaml:"activeDeadlineSeconds,omitempty"`
    // Serialized as the workflow-level nodeSelector, pinning generated pods
    // to nodes carrying the node-role label.
    NodeSelector struct {
        NodeRole string `yaml:"node-role"`
    } `yaml:"nodeSelector"`
}
```
and set the selector in the `CreateDAG()` method:
```go
b.Workflow.Spec.NodeSelector.NodeRole = "worker"
```
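
For this selector to match, the worker node must carry the corresponding label; a minimal sketch, assuming the label key is exactly `node-role`:

```bash
# Label the worker so that pods generated with nodeSelector node-role=worker are scheduled on it
kubectl label node <worker-hostname> node-role=worker

# Verify the label is present
kubectl get nodes --show-labels | grep node-role
```
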
## Container monitoring
Docker Compose file used to instantiate the monitoring stack:
- Prometheus: stores the metrics
- cAdvisor: monitors the containers
```yml
version: '3.2'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - 9090:9090
    command:
      - --config.file=/etc/prometheus/prometheus.yml
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    depends_on:
      - cadvisor
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    ports:
      - 9999:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
```
Prometheus scraping configuration:
```yml
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    static_configs:
      - targets:
          - cadvisor:8080
```
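
With the two files above in the same directory, the monitoring stack can be started and checked as follows (ports taken from the compose file; older Docker installs use `docker-compose` instead of `docker compose`):

```bash
# Start Prometheus and cAdvisor
docker compose up -d

# cAdvisor is mapped to host port 9999, Prometheus to 9090
curl -s http://localhost:9999/metrics | head
curl -s http://localhost:9090/-/ready
```
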
## Dashboards
In order to monitor the resource consumption during our tests we need to create a dashboard in Grafana.
We create 4 different queries using Prometheus as the data source. For each one we can use the `code` mode to write the PromQL query directly.
### OC stack consumption
```
sum(container_memory_usage_bytes{name=~"oc-auth|oc-datacenter|oc-scheduler|oc-front|oc-schedulerd|oc-workflow|oc-catalog|oc-peer|oc-workspace|loki|mongo|traefik|nats"})
```
### Monitord consumption
```
sum(container_memory_usage_bytes{image="oc-monitord"})
```
### Total RAM consumption
```
sum(
container_memory_usage_bytes{name=~"oc-auth|oc-datacenter|oc-scheduler|oc-front|oc-schedulerd|oc-workflow|oc-catalog|oc-peer|oc-workspace|loki|mongo|traefik|nats"}
or
container_memory_usage_bytes{image="oc-monitord"}
)
```
### Number of monitord containers
```
count(container_memory_usage_bytes{image="oc-monitord"} > 0)
```
# Launch executions
We use a script to insert into the DB the executions that will spawn the monitord containers.
We need two pieces of information to run the scripted insertion:
- The **workflow id** of the workflow we want to instantiate, which can be found in the DB
- A **token** to authenticate against the API: connect to oc-front and retrieve the token with your browser's network analyzer tool.

Add these to the `insert_exex.sh` script.
The script takes two arguments:
- **$1**: the number of executions; they are created in chunks of 10, using a CRON expression that creates 10 executions for each execution/namespace
- **$2**: the number of minutes between now and the scheduled execution time.
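
For example, to create 100 executions scheduled 5 minutes from now (using the script name referenced above):

```bash
./insert_exex.sh 100 5
```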


@@ -0,0 +1,72 @@
#!/bin/bash
TOKEN=""
WORKFLOW=""
NB_EXEC=$1
TIME=$2
if [ -z "$NB_EXEC" ]; then
    NB_EXEC=1
fi
# if (( NB_EXEC % 10 != 0 )); then
#     echo "Please use a round number"
#     exit 0
# fi
if [ -z "$TIME" ]; then
    TIME=1
fi
EXECS=$(((NB_EXEC + 9) / 10))
echo EXECS=$EXECS
DAY=$(date +%d -u)
MONTH=$(date +%m -u)
HOUR=$(date +%H -u)
MINUTE=$(date -d "$TIME min" +"%M" -u)
SECOND=$(date +%s -u)
start_loop=$(date +%s)
for ((i = 1; i <= $EXECS; i++)); do
    (
        start_req=$(date +%s)
        echo "Exec $i"
        CRON="0-10 $MINUTE $HOUR $DAY $MONTH *"
        echo "$CRON"
        START="2025-$MONTH-$DAY"T"$HOUR:$MINUTE:00.012Z"
        # Force base 10 so that months 08 and 09 are not read as invalid octal numbers
        END_MONTH=$(printf "%02d" $((10#$MONTH + 1)))
        END="2025-$END_MONTH-$DAY"T"$HOUR:$MINUTE:00.012Z"
        # PAYLOAD=$(printf '{"id":null,"name":null,"cron":"","mode":1,"start":"%s","end":"%s"}' "$START" "$END")
        PAYLOAD=$(printf '{"id":null,"name":null,"cron":"%s","mode":1,"start":"%s","end":"%s"}' "$CRON" "$START" "$END")
        # echo $PAYLOAD
        curl -X 'POST' "http://localhost:8000/scheduler/$WORKFLOW" \
            -H 'accept: application/json' \
            -H 'Content-Type: application/json' \
            -d "$PAYLOAD" \
            -H "Authorization: Bearer $TOKEN" -w '\n'
        end=$(date +%s)
        duration=$((end - start_req))
        echo "Start $start_req"
        echo "End $end"
        echo "Execution time $i: $duration seconds"
    ) &
done
wait
end_loop=$(date +%s)
total_time=$((end_loop - start_loop))
echo "Total execution time: $total_time seconds"


@@ -0,0 +1,43 @@
We used a very simple single-node workflow which executes a simple sleep command within an alpine container:
![](wf_test_ram_1node.png)
# 10 monitors
![alt text](10_monitors.png)
# 100 monitors
![alt text](100_monitors.png)
# 150 monitors
![alt text](150_monitors.png)
# Observations
We see an increase in the memory usage of the OC stack, which initially sits around 600-700 MiB:
```
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
7ce889dd97cc oc-auth 0.00% 21.82MiB / 11.41GiB 0.19% 125MB / 61.9MB 23.3MB / 5.18MB 9
93be30148a12 oc-catalog 0.14% 17.52MiB / 11.41GiB 0.15% 300MB / 110MB 35.1MB / 242kB 9
611de96ee37e oc-datacenter 0.32% 21.85MiB / 11.41GiB 0.19% 38.7MB / 18.8MB 14.8MB / 0B 9
dafb3027cfc6 oc-front 0.00% 5.887MiB / 11.41GiB 0.05% 162kB / 3.48MB 1.65MB / 12.3kB 7
d7601fd64205 oc-peer 0.23% 16.46MiB / 11.41GiB 0.14% 201MB / 74.2MB 27.6MB / 606kB 9
a78eb053f0c8 oc-scheduler 0.00% 17.24MiB / 11.41GiB 0.15% 125MB / 61.1MB 17.3MB / 1.13MB 10
bfbc3c7c2c14 oc-schedulerd 0.07% 15.05MiB / 11.41GiB 0.13% 303MB / 293MB 7.58MB / 176kB 9
304bb6a65897 oc-workflow 0.44% 107.6MiB / 11.41GiB 0.92% 2.54GB / 2.65GB 50.9MB / 11.2MB 10
62e243c1c28f oc-workspace 0.13% 17.1MiB / 11.41GiB 0.15% 193MB / 95.6MB 34.4MB / 2.14MB 10
3c9311c8b963 loki 1.57% 147.4MiB / 11.41GiB 1.26% 37.4MB / 16.4MB 148MB / 459MB 13
01284abc3c8e mongo 1.48% 86.78MiB / 11.41GiB 0.74% 564MB / 1.48GB 35.6MB / 5.35GB 94
14fc9ac33688 traefik 2.61% 49.53MiB / 11.41GiB 0.42% 72.1MB / 72.1MB 127MB / 2.2MB 13
4f1b7890c622 nats 0.70% 78.14MiB / 11.41GiB 0.67% 2.64GB / 2.36GB 17.3MB / 2.2MB 14
Total                    631.2 MiB
```
However, over time, with the repetition of a large number of scheduling runs, the stack uses a larger amount of RAM.
In particular, it seems that **loki**, **nats**, **mongo**, **oc-datacenter** and **oc-workflow** grow over 150 MiB. This can be explained by the cache growing in these containers, which seems to be reduced every time the containers are restarted.


BIN performance_test (new file, 16 KiB)