Compare commits

...

5 Commits

16 changed files with 391 additions and 2 deletions


BIN docs/S3/img/workflow.png (new file, 124 KiB)


@@ -0,0 +1,44 @@
# Allowing deported Pods to use S3 storage
As a first way to transfer data from one processing node to another, we have implemented the mechanics that allow a pod to access a bucket on an S3-compatible server which is not on the same Kubernetes cluster.
For this we use an example workflow run with Argo and Admiralty on the node *Control*, with the **curl** and **mosquitto** processing steps executing on the control node and the other processing steps on the *Target01* node.
To transfer data we use the **S3** and **output/input** annotations handled by Argo, with two *Minio* servers on Control and Target01.
![](./img/workflow.png)
When the user launches a booking in the UI, a request is sent to **oc-scheduler**, which:
- Checks whether another booking is scheduled at the requested time
- Creates the booking and workflow executions in the DB
- Creates the namespace, service accounts and rights for Argo to execute
![](./img/ns-creation-after-booking.gif)
We added another action to the existing calls made to **oc-datacenter**.
**oc-scheduler** retrieves all the storage resources in the workflow and, for each one, retrieves the *computing* resources that host a processing resource using that storage resource. Here we have:
- Minio Control:
  - Control (via the first cURL)
  - Target01 (via imagemagic)
- Minio Target01:
  - Control (via alpine)
  - Target01 (via cURL, openalpr and mosquitto)

If the computing and storage resources are on the same node, **oc-scheduler** sends an empty POST request to the route, and **oc-datacenter** creates the credentials on the S3 server and stores them in a Kubernetes secret in the execution's namespace.
If the two resources are on different nodes, **oc-scheduler** sends a POST request stating that it needs to retrieve the credentials, reads the response and calls the appropriate **oc-datacenter** to create a Kubernetes secret. This means that, if we consider three nodes:
- A from which the workflow is scheduled
- B where the storage is
- C where the computing is
A can contact B to retrieve the credentials, post them to C for storage and then run an Argo Workflow, from which a pod will be deported to C and will be able to access the S3 server on B.
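
To check that the credentials were propagated, the secrets can be listed in the execution's namespace on the cluster hosting the computing resource; the namespace and secret names below are placeholders, not the actual names used by oc-datacenter:

```bash
# List the secrets created for the execution
kubectl get secrets -n <execution-namespace>

# Inspect the stored S3 credentials
kubectl get secret <s3-credentials-secret> -n <execution-namespace> -o yaml
```
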
![](./img/secrets-created-in-s3.gif)
# Final
We can see that the different processing steps are able to access the required data on the different storage resources, and that our ALPR analysis is sent to the mosquitto server and to the HTTP endpoint we set in the last cURL.
![](./img/argo-watch-executing.gif)
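
The recording above corresponds to following the workflow with the Argo CLI; a minimal example, with placeholder names:

```bash
# Follow the workflow execution live (namespace and workflow names are placeholders)
argo watch -n <execution-namespace> <workflow-name>
```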



@@ -3,6 +3,85 @@
We have written the following playbooks, available on a private [GitHub repo](https://github.com/pi-B/ansible-oc/tree/384a5acc0713a0fa013a82f71fbe2338bf6c80c1/Admiralty):
- `deploy_admiralty.yml` installs Helm and the necessary charts in order to run Admiralty on the cluster
- `setup_admiralty_target.yml` creates the environment necessary to use a cluster as a target in an Admiralty federation running Argo Workflows: it creates the necessary serviceAccount, Target resource and token to authenticate the source
- `add_admiralty_target.yml` creates the environment to use a cluster as a source, providing the data necessary to use a given cluster as a target.
# Ansible playbook
The deployment playbook is run with `ansible-playbook deploy_admiralty.yml -i <REMOTE_HOST_IP>, --extra-vars "user_prompt=<YOUR_USER>" --ask-pass`:
```yaml
- name: Install Helm
  hosts: all:!localhost
  user: "{{ user_prompt }}"
  become: true
  # become_method: su
  vars:
    arch_mapping:  # Map ansible architecture {{ ansible_architecture }} names to Docker's architecture names
      x86_64: amd64
      aarch64: arm64
  tasks:
    - name: Check if Helm does exist
      ansible.builtin.command:
        cmd: which helm
      register: result_which
      failed_when: result_which.rc not in [ 0, 1 ]
    - name: Install helm
      when: result_which.rc == 1
      block:
        - name: download helm from source
          ansible.builtin.get_url:
            url: https://get.helm.sh/helm-v3.15.0-linux-amd64.tar.gz
            dest: ./
        - name: unpack helm
          ansible.builtin.unarchive:
            remote_src: true
            src: helm-v3.15.0-linux-amd64.tar.gz
            dest: ./
        - name: copy helm to path
          ansible.builtin.command:
            cmd: mv linux-amd64/helm /usr/local/bin/helm

- name: Install admiralty
  hosts: all:!localhost
  user: "{{ user_prompt }}"
  tasks:
    - name: Install required python libraries
      become: true
      # become_method: su
      package:
        name:
          - python3
          - python3-yaml
        state: present
    - name: Add jetstack repo
      ansible.builtin.shell:
        cmd: |
          helm repo add jetstack https://charts.jetstack.io && \
          helm repo update
    - name: Install cert-manager
      kubernetes.core.helm:
        chart_ref: jetstack/cert-manager
        release_name: cert-manager
        context: default
        namespace: cert-manager
        create_namespace: true
        wait: true
        set_values:
          - value: installCRDs=true
    - name: Install admiralty
      kubernetes.core.helm:
        name: admiralty
        chart_ref: oci://public.ecr.aws/admiralty/admiralty
        namespace: admiralty
        create_namespace: true
        chart_version: 0.16.0
        wait: true
```
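
The target-setup playbooks listed above would presumably be invoked in the same way; the commands below are a sketch mirroring the deploy command, and the extra-vars they actually expect may differ:

```bash
ansible-playbook setup_admiralty_target.yml -i <TARGET_HOST_IP>, --extra-vars "user_prompt=<YOUR_USER>" --ask-pass
ansible-playbook add_admiralty_target.yml -i <SOURCE_HOST_IP>, --extra-vars "user_prompt=<YOUR_USER>" --ask-pass
```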



@@ -0,0 +1,151 @@
# Goals
This originated from a demand to know how much RAM is consumed by Open Cloud when running a large number of workflows at the same time on the same node.
We differentiated between different components:
- The "oc-stack", which is the minimum set of services needed to create and schedule a workflow execution: oc-auth, oc-datacenter, oc-scheduler, oc-front, oc-schedulerd, oc-workflow, oc-catalog, oc-peer, oc-workspace, loki, mongo, traefik and nats
- oc-monitord, the daemon instantiated by the scheduling daemon (oc-schedulerd) that creates the YAML for Argo and the necessary Kubernetes resources.

We monitor both parts to see how much RAM the oc-stack uses before / during / after the executions, the RAM consumed by the monitord containers, and the total for the stack and the monitors.
# Setup
In order to have optimal performance we used a Proxmox server with large resources (>370 GiB RAM and 128 cores) to host the two VMs composing our Kubernetes cluster, with one control-plane node where the oc-stack is running and a worker node running only k3s.
## VMs
We instantiated a 2-node Kubernetes (k3s) cluster on the superg PVE (https://superg-pve.irtse-pf.ext:8006/).
### VM Control
This VM runs the oc-stack and the monitord containers, so it carries the biggest part of the load. It must have k3s and Argo installed. We allocated **62 GiB of RAM** and **31 cores**.
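
For reference, a minimal way to prepare this VM could look like the sketch below; the k3s command comes from the upstream quick start, while the Argo Workflows version and manifest URL are examples rather than the exact ones used here:

```bash
# Install k3s in server (control plane) mode
curl -sfL https://get.k3s.io | sh -

# Install Argo Workflows into the argo namespace (version chosen as an example)
kubectl create namespace argo
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.5.5/install.yaml
```
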
### VM Worker
This VM holds the workload for all the pods created, acting as a worker node for the k3s cluster. We deploy k3s as a worker node as explained in the k3s quick start guide:
`curl -sfL https://get.k3s.io | K3S_URL=https://myserver:6443 K3S_TOKEN=mynodetoken sh -`
The value to use for K3S_TOKEN is stored at `/var/lib/rancher/k3s/server/node-token` on the server node.
Verify that the worker has been added as a node to the cluster by running `kubectl get nodes` on the control plane and looking for the worker VM's hostname in the list of nodes.
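
A quick way to check the join, assuming kubectl is configured on the control plane:

```bash
# On the control plane: read the join token referenced above
sudo cat /var/lib/rancher/k3s/server/node-token

# After running the join command on the worker, confirm it shows up as a node
kubectl get nodes
```
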
### Delegate pods to the worker node
In order for the pods to be executed on another node we need to modify how we construct the Argo YAML, adding a `nodeSelector` to the workflow spec. We have added the needed attributes to the `Spec` struct in `oc-monitord` on the `test-ram` branch.
```go
type Spec struct {
    ServiceAccountName string                `yaml:"serviceAccountName"`
    Entrypoint         string                `yaml:"entrypoint"`
    Arguments          []Parameter           `yaml:"arguments,omitempty"`
    Volumes            []VolumeClaimTemplate `yaml:"volumeClaimTemplates,omitempty"`
    Templates          []Template            `yaml:"templates"`
    Timeout            int                   `yaml:"activeDeadlineSeconds,omitempty"`
    // Serialized as the workflow-level nodeSelector, pinning generated pods
    // to nodes carrying the node-role label.
    NodeSelector struct {
        NodeRole string `yaml:"node-role"`
    } `yaml:"nodeSelector"`
}
```
and set the selector in the `CreateDAG()` method:
```go
b.Workflow.Spec.NodeSelector.NodeRole = "worker"
```
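
For this selector to match, the worker node must carry the corresponding label; a minimal sketch, assuming the label key is exactly `node-role`:

```bash
# Label the worker so that pods generated with nodeSelector node-role=worker are scheduled on it
kubectl label node <worker-hostname> node-role=worker

# Verify the label is present
kubectl get nodes --show-labels | grep node-role
```
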
## Container monitoring
Docker Compose file used to instantiate the monitoring stack:
- Prometheus: stores the metrics
- cAdvisor: monitors the containers
```yml
version: '3.2'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - 9090:9090
    command:
      - --config.file=/etc/prometheus/prometheus.yml
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    depends_on:
      - cadvisor
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    ports:
      - 9999:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
```
Prometheus scraping configuration:
```yml
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    static_configs:
      - targets:
          - cadvisor:8080
```
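
With the two files above in the same directory, the monitoring stack can be started and checked as follows (ports taken from the compose file; older Docker installs use `docker-compose` instead of `docker compose`):

```bash
# Start Prometheus and cAdvisor
docker compose up -d

# cAdvisor is mapped to host port 9999, Prometheus to 9090
curl -s http://localhost:9999/metrics | head
curl -s http://localhost:9090/-/ready
```
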
## Dashboards
In order to monitor the resource consumption during our tests we need to create a dashboard in Grafana.
We create 4 different queries using Prometheus as the data source. For each one we can use the `code` mode to write the PromQL query directly.
### OC stack consumption
```
sum(container_memory_usage_bytes{name=~"oc-auth|oc-datacenter|oc-scheduler|oc-front|oc-schedulerd|oc-workflow|oc-catalog|oc-peer|oc-workspace|loki|mongo|traefik|nats"})
```
### Monitord consumption
```
sum(container_memory_usage_bytes{image="oc-monitord"})
```
### Total RAM consumption
```
sum(
container_memory_usage_bytes{name=~"oc-auth|oc-datacenter|oc-scheduler|oc-front|oc-schedulerd|oc-workflow|oc-catalog|oc-peer|oc-workspace|loki|mongo|traefik|nats"}
or
container_memory_usage_bytes{image="oc-monitord"}
)
```
### Number of monitord containers
```
count(container_memory_usage_bytes{image="oc-monitord"} > 0)
```
# Launch executions
We use a script to insert into the DB the executions that will spawn the monitord containers.
We need two pieces of information to run the scripted insertion:
- The **workflow id** of the workflow we want to instantiate, which can be found in the DB
- A **token** to authenticate against the API: connect to oc-front and retrieve the token with your browser's network analyzer tool.

Add these to the `insert_exex.sh` script.
The script takes two arguments:
- **$1**: the number of executions; they are created in chunks of 10, using a CRON expression that creates 10 executions for each execution/namespace
- **$2**: the number of minutes between now and the scheduled execution time.
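
For example, to create 100 executions scheduled 5 minutes from now (using the script name referenced above):

```bash
./insert_exex.sh 100 5
```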


@@ -0,0 +1,72 @@
#!/bin/bash
TOKEN=""
WORKFLOW=""
NB_EXEC=$1
TIME=$2
if [ -z "$NB_EXEC" ]; then
    NB_EXEC=1
fi
# if (( NB_EXEC % 10 != 0 )); then
#     echo "Please use a round number"
#     exit 0
# fi
if [ -z "$TIME" ]; then
    TIME=1
fi
EXECS=$(((NB_EXEC + 9) / 10))
echo EXECS=$EXECS
DAY=$(date +%d -u)
MONTH=$(date +%m -u)
HOUR=$(date +%H -u)
MINUTE=$(date -d "$TIME min" +"%M" -u)
SECOND=$(date +%s -u)
start_loop=$(date +%s)
for ((i = 1; i <= $EXECS; i++)); do
    (
        start_req=$(date +%s)
        echo "Exec $i"
        CRON="0-10 $MINUTE $HOUR $DAY $MONTH *"
        echo "$CRON"
        START="2025-$MONTH-$DAY"T"$HOUR:$MINUTE:00.012Z"
        # Force base 10 so that months 08 and 09 are not read as invalid octal numbers
        END_MONTH=$(printf "%02d" $((10#$MONTH + 1)))
        END="2025-$END_MONTH-$DAY"T"$HOUR:$MINUTE:00.012Z"
        # PAYLOAD=$(printf '{"id":null,"name":null,"cron":"","mode":1,"start":"%s","end":"%s"}' "$START" "$END")
        PAYLOAD=$(printf '{"id":null,"name":null,"cron":"%s","mode":1,"start":"%s","end":"%s"}' "$CRON" "$START" "$END")
        # echo $PAYLOAD
        curl -X 'POST' "http://localhost:8000/scheduler/$WORKFLOW" \
            -H 'accept: application/json' \
            -H 'Content-Type: application/json' \
            -d "$PAYLOAD" \
            -H "Authorization: Bearer $TOKEN" -w '\n'
        end=$(date +%s)
        duration=$((end - start_req))
        echo "Start $start_req"
        echo "End $end"
        echo "Execution time $i: $duration seconds"
    ) &
done
wait
end_loop=$(date +%s)
total_time=$((end_loop - start_loop))
echo "Total execution time: $total_time seconds"


@@ -0,0 +1,43 @@
We used a very simple single-node workflow which executes a simple sleep command within an alpine container:
![](wf_test_ram_1node.png)
# 10 monitors
![alt text](10_monitors.png)
# 100 monitors
![alt text](100_monitors.png)
# 150 monitors
![alt text](150_monitors.png)
# Observations
We see an increase in the memory usage of the OC stack, which initially sits around 600-700 MiB:
```
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
7ce889dd97cc oc-auth 0.00% 21.82MiB / 11.41GiB 0.19% 125MB / 61.9MB 23.3MB / 5.18MB 9
93be30148a12 oc-catalog 0.14% 17.52MiB / 11.41GiB 0.15% 300MB / 110MB 35.1MB / 242kB 9
611de96ee37e oc-datacenter 0.32% 21.85MiB / 11.41GiB 0.19% 38.7MB / 18.8MB 14.8MB / 0B 9
dafb3027cfc6 oc-front 0.00% 5.887MiB / 11.41GiB 0.05% 162kB / 3.48MB 1.65MB / 12.3kB 7
d7601fd64205 oc-peer 0.23% 16.46MiB / 11.41GiB 0.14% 201MB / 74.2MB 27.6MB / 606kB 9
a78eb053f0c8 oc-scheduler 0.00% 17.24MiB / 11.41GiB 0.15% 125MB / 61.1MB 17.3MB / 1.13MB 10
bfbc3c7c2c14 oc-schedulerd 0.07% 15.05MiB / 11.41GiB 0.13% 303MB / 293MB 7.58MB / 176kB 9
304bb6a65897 oc-workflow 0.44% 107.6MiB / 11.41GiB 0.92% 2.54GB / 2.65GB 50.9MB / 11.2MB 10
62e243c1c28f oc-workspace 0.13% 17.1MiB / 11.41GiB 0.15% 193MB / 95.6MB 34.4MB / 2.14MB 10
3c9311c8b963 loki 1.57% 147.4MiB / 11.41GiB 1.26% 37.4MB / 16.4MB 148MB / 459MB 13
01284abc3c8e mongo 1.48% 86.78MiB / 11.41GiB 0.74% 564MB / 1.48GB 35.6MB / 5.35GB 94
14fc9ac33688 traefik 2.61% 49.53MiB / 11.41GiB 0.42% 72.1MB / 72.1MB 127MB / 2.2MB 13
4f1b7890c622 nats 0.70% 78.14MiB / 11.41GiB 0.67% 2.64GB / 2.36GB 17.3MB / 2.2MB 14
Total                    631.2 MiB
```
However, over time, with the repetition of a large number of scheduling runs, the stack uses a larger amount of RAM.
In particular, it seems that **loki**, **nats**, **mongo**, **oc-datacenter** and **oc-workflow** grow over 150 MiB. This can be explained by the cache growing in these containers, which seems to be reduced every time the containers are restarted.


BIN performance_test (new file, 16 KiB)