Compare commits
5 Commits
572ab5d0c4 ... master

| Author | SHA1 | Date |
|---|---|---|
|  | 8e74e2b399 |  |
|  | 6722c365fd |  |
|  | 3da3ada710 |  |
|  | a9b5f6dcad |  |
|  | a10021fb98 |  |
BIN  docs/S3/img/argo-watch-executing.gif  Normal file  (Size: 3.5 MiB)
BIN  docs/S3/img/ns-creation-after-booking.gif  Normal file  (Size: 2.5 MiB)
BIN  docs/S3/img/secrets-created-in-s3.gif  Normal file  (Size: 1.9 MiB)
BIN  docs/S3/img/workflow.png  Normal file  (Size: 124 KiB)
44  docs/S3/reparted-S3-readme.md  Normal file
@@ -0,0 +1,44 @@

# Allowing reparted Pods to use S3 storage

As a first way to transfer data from one processing node to another, we have implemented the mechanics that allow a pod to access a bucket on an S3-compatible server which is not on the same Kubernetes cluster.

For this we will use an example Workflow run with Argo and Admiralty on the *Control* node, with the **curl** and **mosquitto** processing executing on the control node and the other processing on the *Target01* node.

To transfer data we will use the **S3** and **output/input** annotations handled by Argo, using two *Minio* servers on Control and Target01.
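
As an illustration only (not taken from the example workflow), here is a minimal sketch of an Argo template declaring an S3 output artifact backed by a Minio server; the image, endpoint, bucket and secret names are placeholders:

```yaml
# Hypothetical excerpt of an Argo Workflow template: the step writes its result
# to /tmp/result.txt and Argo uploads it to a Minio bucket, using credentials
# stored in a Kubernetes secret in the execution's namespace.
templates:
  - name: curl-step
    container:
      image: curlimages/curl:8.7.1          # placeholder processing image
      command: [sh, -c]
      args: ["curl -s https://example.org/data > /tmp/result.txt"]
    outputs:
      artifacts:
        - name: result
          path: /tmp/result.txt
          s3:
            endpoint: "minio-control.example.org:9000"   # placeholder Minio endpoint
            bucket: oc-transfer                          # placeholder bucket
            key: curl/result.txt
            insecure: true
            accessKeySecret:
              name: s3-credentials                       # placeholder secret name
              key: access-key
            secretKeySecret:
              name: s3-credentials
              key: secret-key
```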



When the user launches a booking on the UI, a request is sent to **oc-scheduler**, which:

- Checks whether another booking is scheduled at the requested time
- Creates the booking and workflow executions in the DB
- Creates the namespace, service accounts and rights needed for Argo to execute (see the sketch below)
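
A minimal sketch of the kind of per-execution resources this step sets up, assuming hypothetical names (the actual names, roles and bindings are decided by oc-scheduler and oc-datacenter):

```yaml
# Hypothetical per-execution namespace with a service account allowed to
# manage Argo workflow pods inside it.
apiVersion: v1
kind: Namespace
metadata:
  name: execution-abc123            # placeholder execution namespace
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: argo-executor               # placeholder service account
  namespace: execution-abc123
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argo-executor-admin
  namespace: execution-abc123
subjects:
  - kind: ServiceAccount
    name: argo-executor
    namespace: execution-abc123
roleRef:
  kind: ClusterRole
  name: admin                       # placeholder role granting Argo the needed rights
  apiGroup: rbac.authorization.k8s.io
```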



We added another action to the existing calls that were made to **oc-datacenter**.

**oc-scheduler** retrieves all the storage resources in the workflow and, for each of them, retrieves the *computing* resources that host a processing resource using that storage resource. Here we have:

- Minio Control:
    - Control (via the first cURL)
    - Target01 (via imagemagic)
- Minio Target01:
    - Control (via alpine)
    - Target01 (via cURL, openalpr and mosquitto)

If the computing and storage resources are on the same node, **oc-scheduler** sends an empty POST request to the route, and **oc-datacenter** creates the credentials on the S3 server and stores them in a Kubernetes secret in the execution's namespace, along the lines of the sketch below.
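
A minimal sketch of what such a secret could look like, with hypothetical names and keys (the actual layout is defined by oc-datacenter):

```yaml
# Hypothetical secret holding the generated S3 credentials, created in the
# execution's namespace so the deported pod can reference it.
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials            # placeholder name
  namespace: execution-abc123     # placeholder execution namespace
type: Opaque
stringData:
  access-key: <generated-access-key>
  secret-key: <generated-secret-key>
```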



If the two resources are on different nodes, **oc-scheduler** sends a POST request stating that it needs to retrieve the credentials, reads the response and calls the appropriate **oc-datacenter** to create a Kubernetes secret. This means that with three nodes:

- A, from which the workflow is scheduled
- B, where the storage is
- C, where the computing is

A can contact B to retrieve the credentials, post them to C for storage and then run an Argo Workflow, from which a pod will be deported to C and will be able to access the S3 server on B.



# Final

We can see that the different processing steps are able to access the required data on different storage resources, and that our ALPR analysis is sent to the mosquitto server and to the HTTP endpoint we set in the last cURL.


BIN  docs/admiralty/Capture d’écran du 2025-05-20 16-03-39.png  Normal file  (Size: 31 KiB)
BIN  docs/admiralty/Capture d’écran du 2025-05-20 16-04-21.png  Normal file  (Size: 31 KiB)
@@ -3,6 +3,85 @@

We have written the following playbooks, available on a private [GitHub repo](https://github.com/pi-B/ansible-oc/tree/384a5acc0713a0fa013a82f71fbe2338bf6c80c1/Admiralty):

- `deploy_admiralty.yml` installs Helm and the necessary charts in order to run Admiralty on the cluster
- `setup_admiralty_target.yml` creates the environment necessary to use a cluster as a target in an Admiralty federation running Argo Workflows: it creates the necessary serviceAccount, target resource and token to authenticate the source (see the sketch after this list)
- `add_admiralty_target.yml` creates the environment needed to use a cluster as a source, providing the data necessary to use a given cluster as a target.
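
As an illustration only (not extracted from the playbooks), the Admiralty objects involved in such a federation typically look like the following; names, namespaces and secret names are placeholders:

```yaml
# On the source cluster: a Target pointing at the workload cluster, using a
# kubeconfig/token stored in a secret.
apiVersion: multicluster.admiralty.io/v1alpha1
kind: Target
metadata:
  name: target01                # placeholder
  namespace: argo               # placeholder namespace used for Argo workflows
spec:
  kubeconfigSecret:
    name: target01-kubeconfig   # placeholder secret name
---
# On the target cluster: a Source authorizing the source cluster through a
# dedicated service account.
apiVersion: multicluster.admiralty.io/v1alpha1
kind: Source
metadata:
  name: control                 # placeholder
  namespace: argo
spec:
  serviceAccountName: control-source   # placeholder serviceAccount
```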

# Ansible playbook

Run the deployment playbook with:

`ansible-playbook deploy_admiralty.yml -i <REMOTE_HOST_IP>, --extra-vars "user_prompt=<YOUR_USER>" --ask-pass`

```yaml
- name: Install Helm
  hosts: all:!localhost
  user: "{{ user_prompt }}"
  become: true
  # become_method: su
  vars:
    arch_mapping:  # Map ansible architecture {{ ansible_architecture }} names to Docker's architecture names
      x86_64: amd64
      aarch64: arm64

  tasks:
    - name: Check if Helm does exist
      ansible.builtin.command:
        cmd: which helm
      register: result_which
      failed_when: result_which.rc not in [ 0, 1 ]

    - name: Install helm
      when: result_which.rc == 1
      block:
        - name: download helm from source
          ansible.builtin.get_url:
            url: https://get.helm.sh/helm-v3.15.0-linux-amd64.tar.gz
            dest: ./

        - name: unpack helm
          ansible.builtin.unarchive:
            remote_src: true
            src: helm-v3.15.0-linux-amd64.tar.gz
            dest: ./

        - name: copy helm to path
          ansible.builtin.command:
            cmd: mv linux-amd64/helm /usr/local/bin/helm

- name: Install admiralty
  hosts: all:!localhost
  user: "{{ user_prompt }}"

  tasks:
    - name: Install required python libraries
      become: true
      # become_method: su
      package:
        name:
          - python3
          - python3-yaml
        state: present

    - name: Add jetstack repo
      ansible.builtin.shell:
        cmd: |
          helm repo add jetstack https://charts.jetstack.io && \
          helm repo update

    - name: Install cert-manager
      kubernetes.core.helm:
        chart_ref: jetstack/cert-manager
        release_name: cert-manager
        context: default
        namespace: cert-manager
        create_namespace: true
        wait: true
        set_values:
          - value: installCRDs=true

    - name: Install admiralty
      kubernetes.core.helm:
        name: admiralty
        chart_ref: oci://public.ecr.aws/admiralty/admiralty
        namespace: admiralty
        create_namespace: true
        chart_version: 0.16.0
        wait: true
```

BIN  docs/performance_test/100_monitors.png  Normal file  (Size: 34 KiB)
BIN  docs/performance_test/10_monitors.png  Normal file  (Size: 31 KiB)
BIN  docs/performance_test/150_monitors.png  Normal file  (Size: 30 KiB)
151  docs/performance_test/README.md  Normal file
@@ -0,0 +1,151 @@

# Goals

This originated from a demand to know how much RAM is consumed by Open Cloud when running a large number of workflows at the same time on the same node.

We differentiated between two components:

- The "oc-stack", which is the minimum set of services needed to create and schedule a workflow execution: oc-auth, oc-datacenter, oc-scheduler, oc-front, oc-schedulerd, oc-workflow, oc-catalog, oc-peer, oc-workspace, loki, mongo, traefik and nats

- oc-monitord, the daemon instantiated by the scheduling daemon (oc-schedulerd) that creates the YAML for Argo and creates the necessary Kubernetes resources.

We monitor both parts to view how much RAM the oc-stack uses before / during / after the execution, the RAM consumed by the monitord containers, and the total for the stack and monitors combined.

# Setup

In order to have optimal performance we used a Proxmox server with large resources (>370 GiB RAM and 128 cores) to host the two VMs composing our Kubernetes cluster, with one control plane node where the oc-stack is running and a worker node with only k3s running.

## VMs

We instantiated a 2-node Kubernetes (k3s) cluster on the superg PVE (https://superg-pve.irtse-pf.ext:8006/).

### VM Control

This VM runs the oc-stack and the monitord containers, so it carries the biggest part of the load. It must have k3s and Argo installed. We allocated **62 GiB of RAM** and **31 cores**.

### VM Worker

This VM holds the workload for all the pods created, acting as a worker node for the k3s cluster. We deploy k3s as an agent node as explained in the K3s quick start guide:

`curl -sfL https://get.k3s.io | K3S_URL=https://myserver:6443 K3S_TOKEN=mynodetoken sh -`

The value to use for K3S_TOKEN is stored at `/var/lib/rancher/k3s/server/node-token` on the server node.

Verify that the server has been added as a node to the cluster by running `kubectl get nodes` on the control plane and looking for the hostname of the worker VM in the list of nodes.

### Delegate pods to the worker node

In order for the pods to be executed on another node we need to modify how we construct the Argo YAML, to add a node selector in the spec. We have added the needed attributes to the `Spec` struct in `oc-monitord` on the `test-ram` branch.

```go
type Spec struct {
	ServiceAccountName string                `yaml:"serviceAccountName"`
	Entrypoint         string                `yaml:"entrypoint"`
	Arguments          []Parameter           `yaml:"arguments,omitempty"`
	Volumes            []VolumeClaimTemplate `yaml:"volumeClaimTemplates,omitempty"`
	Templates          []Template            `yaml:"templates"`
	Timeout            int                   `yaml:"activeDeadlineSeconds,omitempty"`
	NodeSelector       struct {
		// Serialised as nodeSelector.node-role in the generated Argo YAML
		NodeRole string `yaml:"node-role"`
	} `yaml:"nodeSelector"`
}
```

and added the tag in the `CreateDAG()` method:

```go
b.Workflow.Spec.NodeSelector.NodeRole = "worker"
```
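
For reference, the generated workflow spec should then contain a selector along these lines (a sketch with placeholder values around the selector, assuming the worker node carries a matching `node-role=worker` label):

```yaml
# Sketch of the relevant part of the generated Argo Workflow spec: pods are
# only scheduled on nodes labelled node-role=worker, i.e. the worker VM here.
spec:
  serviceAccountName: argo        # placeholder
  entrypoint: dag                 # placeholder
  nodeSelector:
    node-role: worker
```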

## Container monitoring

Docker compose to instantiate the monitoring stack:

- Prometheus: stores the data
- cAdvisor: monitors the containers

```yml
version: '3.2'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - 9090:9090
    command:
      - --config.file=/etc/prometheus/prometheus.yml
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    depends_on:
      - cadvisor
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    ports:
      - 9999:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
```

Prometheus scraping configuration:

```yml
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    static_configs:
      - targets:
          - cadvisor:8080
```

## Dashboards

In order to monitor the resource consumption during our tests we need to create dashboards in Grafana.

We create 4 different queries using Prometheus as the data source. For each query we can use the `code` mode to define it from a PromQL query.

### OC stack consumption

```
sum(container_memory_usage_bytes{name=~"oc-auth|oc-datacenter|oc-scheduler|oc-front|oc-schedulerd|oc-workflow|oc-catalog|oc-peer|oc-workspace|loki|mongo|traefik|nats"})
```

### Monitord consumption

```
sum(container_memory_usage_bytes{image="oc-monitord"})
```

### Total RAM consumption

```
sum(
  container_memory_usage_bytes{name=~"oc-auth|oc-datacenter|oc-scheduler|oc-front|oc-schedulerd|oc-workflow|oc-catalog|oc-peer|oc-workspace|loki|mongo|traefik|nats"}
  or
  container_memory_usage_bytes{image="oc-monitord"}
)
```

### Number of monitord containers

```
count(container_memory_usage_bytes{image="oc-monitord"} > 0)
```

# Launch executions

We will use a script to insert into the DB the executions that will create the monitord containers.

We need two pieces of information to run the scripted insertion:

- The **workflow id** of the workflow we want to instantiate, which can be found in the DB
- A **token** to authenticate against the API: connect to oc-front and retrieve the token with your browser's network analyzer tool.

Add these to the `insert_exec.sh` script.

The script takes two arguments:

- **$1**: the number of executions, which are created in chunks of 10 using a CRON expression that creates 10 executions per execution/namespace

- **$2**: the number of minutes between now and the execution time for the executions.
72  docs/performance_test/insert_exec.sh  Executable file
@@ -0,0 +1,72 @@

#!/bin/bash

TOKEN=""
WORKFLOW=""

NB_EXEC=$1
TIME=$2

if [ -z "$NB_EXEC" ]; then
    NB_EXEC=1
fi

# if (( NB_EXEC % 10 != 0 )); then
#     echo "Please use a round number"
#     exit 0
# fi

if [ -z "$TIME" ]; then
    TIME=1
fi

# Each request schedules a chunk of 10 executions via its CRON expression
EXECS=$(((NB_EXEC + 9) / 10))
echo EXECS=$EXECS

DAY=$(date +%d -u)
MONTH=$(date +%m -u)
HOUR=$(date +%H -u)
MINUTE=$(date -d "$TIME min" +"%M" -u)
SECOND=$(date +%s -u)

start_loop=$(date +%s)

for ((i = 1; i <= EXECS; i++)); do
    (
        start_req=$(date +%s)

        echo "Exec $i"
        CRON="0-10 $MINUTE $HOUR $DAY $MONTH *"
        echo "$CRON"

        START="2025-$MONTH-$DAY"T"$HOUR:$MINUTE:00.012Z"

        # force base 10 so months like "09" are not parsed as octal
        END_MONTH=$(printf "%02d" $((10#$MONTH + 1)))
        END="2025-$END_MONTH-$DAY"T"$HOUR:$MINUTE:00.012Z"

        # PAYLOAD=$(printf '{"id":null,"name":null,"cron":"","mode":1,"start":"%s","end":"%s"}' "$START" "$END")
        PAYLOAD=$(printf '{"id":null,"name":null,"cron":"%s","mode":1,"start":"%s","end":"%s"}' "$CRON" "$START" "$END")

        # echo $PAYLOAD

        curl -X 'POST' "http://localhost:8000/scheduler/$WORKFLOW" \
            -H 'accept: application/json' \
            -H 'Content-Type: application/json' \
            -d "$PAYLOAD" \
            -H "Authorization: Bearer $TOKEN" -w '\n'

        end=$(date +%s)
        duration=$((end - start_req))

        echo "Start $start_req"
        echo "End $end"
        echo "Execution time for request $i: $duration seconds"
    ) &

done

wait

end_loop=$(date +%s)
total_time=$((end_loop - start_loop))
echo "Total execution time: $total_time seconds"
43  docs/performance_test/performance_report.md  Normal file
@@ -0,0 +1,43 @@

We used a very simple mono-node workflow which executes a simple sleep command within an alpine container.



# 10 monitors



# 100 monitors



# 150 monitors



# Observations

We see an increase in the memory usage of the OC stack, which initially sits around 600-700 MiB:

```
CONTAINER ID   NAME            CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
7ce889dd97cc   oc-auth         0.00%     21.82MiB / 11.41GiB   0.19%     125MB / 61.9MB    23.3MB / 5.18MB   9
93be30148a12   oc-catalog      0.14%     17.52MiB / 11.41GiB   0.15%     300MB / 110MB     35.1MB / 242kB    9
611de96ee37e   oc-datacenter   0.32%     21.85MiB / 11.41GiB   0.19%     38.7MB / 18.8MB   14.8MB / 0B       9
dafb3027cfc6   oc-front        0.00%     5.887MiB / 11.41GiB   0.05%     162kB / 3.48MB    1.65MB / 12.3kB   7
d7601fd64205   oc-peer         0.23%     16.46MiB / 11.41GiB   0.14%     201MB / 74.2MB    27.6MB / 606kB    9
a78eb053f0c8   oc-scheduler    0.00%     17.24MiB / 11.41GiB   0.15%     125MB / 61.1MB    17.3MB / 1.13MB   10
bfbc3c7c2c14   oc-schedulerd   0.07%     15.05MiB / 11.41GiB   0.13%     303MB / 293MB     7.58MB / 176kB    9
304bb6a65897   oc-workflow     0.44%     107.6MiB / 11.41GiB   0.92%     2.54GB / 2.65GB   50.9MB / 11.2MB   10
62e243c1c28f   oc-workspace    0.13%     17.1MiB / 11.41GiB    0.15%     193MB / 95.6MB    34.4MB / 2.14MB   10
3c9311c8b963   loki            1.57%     147.4MiB / 11.41GiB   1.26%     37.4MB / 16.4MB   148MB / 459MB     13
01284abc3c8e   mongo           1.48%     86.78MiB / 11.41GiB   0.74%     564MB / 1.48GB    35.6MB / 5.35GB   94
14fc9ac33688   traefik         2.61%     49.53MiB / 11.41GiB   0.42%     72.1MB / 72.1MB   127MB / 2.2MB     13
4f1b7890c622   nats            0.70%     78.14MiB / 11.41GiB   0.67%     2.64GB / 2.36GB   17.3MB / 2.2MB    14

Total 631.2 Mb
```

However, over time, with the repetition of a large number of scheduling runs, the stack uses a larger amount of RAM.

In particular it seems that **loki**, **nats**, **mongo**, **oc-datacenter** and **oc-workflow** grow over 150 MiB. This can be explained by the cache growing in these containers, which seems to be reduced every time the containers are restarted.

BIN  docs/performance_test/wf_test_ram_1node.png  Normal file  (Size: 16 KiB)
BIN  performance_test  Normal file  (Size: 16 KiB)