demo test + Peer

This commit is contained in:
mr
2026-03-03 16:38:24 +01:00
parent 79aa3cc2b3
commit ef3d998ead
41 changed files with 2360 additions and 578 deletions

ARCHITECTURE.md (new file, 495 lines)

@@ -0,0 +1,495 @@
# oc-discovery — Architecture and technical analysis
> **Reading convention**
> Items marked ✅ have been fixed in the code. Items marked ⚠️ remain open.
## Table of contents
1. [Overview](#1-overview)
2. [Role hierarchy](#2-role-hierarchy)
3. [Core mechanisms](#3-core-mechanisms)
- 3.1 Long-lived heartbeat (node → indexer)
- 3.2 Trust scoring
- 3.3 Registration with natives (indexer → native)
- 3.4 Indexer pool: fetch + consensus
- 3.5 Self-delegation and offload loop
- 3.6 Native mesh resilience
- 3.7 Shared DHT
- 3.8 PubSub gossip (indexer registry)
- 3.9 Application streams (node ↔ node)
4. [Summary table](#4-summary-table)
5. [Global risks and limitations](#5-global-risks-and-limitations)
6. [Improvement avenues](#6-improvement-avenues)
---
## 1. Overview
`oc-discovery` is a P2P discovery service for the OpenCloud network. It is built on
**libp2p** (TCP transport + private-network PSK) and a **Kademlia DHT** (prefix `oc`)
to index peers. The architecture is intentionally hierarchical: stable _natives_
act as authoritative hubs that _indexers_ register with, and ordinary _nodes_
discover indexers through those natives.
```
┌──────────────┐       heartbeat       ┌──────────────────┐
│     Node     │ ────────────────────► │     Indexer      │
│   (libp2p)   │ ◄──────────────────── │   (DHT server)   │
└──────────────┘  application stream   └────────┬─────────┘
                                                │ subscribe / heartbeat
                                       ┌────────┴────────────┐
                                       │   Native Indexer    │ ◄──► other natives
                                       │ (authoritative hub) │          (mesh)
                                       └─────────────────────┘
```
All participants share a **pre-shared key (PSK)** that isolates the network
from unauthorized external libp2p connections.
---
## 2. Role hierarchy
| Role | Binary | Responsibility |
|---|---|---|
| **Node** | `node_mode=node` | Gets indexed, publishes/reads DHT records |
| **Indexer** | `node_mode=indexer` | Receives heartbeats, writes to the DHT, registers with the natives |
| **Native Indexer** | `node_mode=native` | Hub: keeps the registry of live indexers, evaluates consensus, serves as fallback |
A single process can combine the node+indexer or indexer+native roles.
---
## 3. Core mechanisms
### 3.1 Long-lived heartbeat (node → indexer)
**How it works**
A **persistent** libp2p stream (`/opencloud/heartbeat/1.0`) is opened from the node
to each indexer in its pool (`StaticIndexers`). Every 20 seconds, the node sends a
JSON `Heartbeat` on this stream. The indexer responds by recording the peer in
`StreamRecords[ProtocolHeartbeat]` with a 2-minute expiry.
If `sendHeartbeat` fails (stream reset, EOF, timeout), the peer is removed from
`StaticIndexers` and `replenishIndexersFromNative` is triggered.
**Advantages**
- Fast disconnect detection (error on the next encode).
- A single stream per peer reduces pressure on TCP connections.
- The nudge channel (`indexerHeartbeatNudge`) allows an immediate reconnect without
waiting for the 20 s ticker.
**Limitations / risks**
- ⚠️ A single persistent stream: if the TCP layer stays open but "frozen" (middlebox,
silent NAT), the error may not surface for several minutes.
- ⚠️ `StaticIndexers` is a shared global map: if two goroutines call
`replenishIndexersFromNative` simultaneously (multiple-loss case), concurrent
writes can occur outside the protected critical sections.
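The ticker-plus-nudge pattern described above can be pictured with a minimal, self-contained Go sketch. The names `runHeartbeatLoop` and `sendAll` are placeholders for illustration; the real loop lives in `SendHeartbeat` and also handles stream errors and pool replenishment.
```go
package main

import (
	"fmt"
	"time"
)

// nudge is buffered(1) so a sender never blocks: if a nudge is already
// pending, a second one is simply dropped.
var nudge = make(chan struct{}, 1)

// Nudge asks the heartbeat loop to fire immediately instead of waiting for
// the next ticker tick (hypothetical stand-in for NudgeIndexerHeartbeat).
func Nudge() {
	select {
	case nudge <- struct{}{}:
	default: // a nudge is already queued
	}
}

// runHeartbeatLoop is a simplified stand-in for the long-lived heartbeat
// goroutine: one round every interval, plus immediate rounds on nudge.
func runHeartbeatLoop(interval time.Duration, sendAll func()) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			sendAll()
		case <-nudge:
			sendAll()
		}
	}
}

func main() {
	go runHeartbeatLoop(20*time.Second, func() { fmt.Println("heartbeat round") })
	Nudge() // e.g. right after new indexers were added to the pool
	time.Sleep(100 * time.Millisecond)
}
```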
---
### 3.2 Trust scoring
**How it works**
Before recording a heartbeat in `StreamRecords`, the indexer checks a **minimum
score** computed by `CheckHeartbeat`:
```
Score = (0.4 × uptime_ratio + 0.4 × bpms + 0.2 × diversity) × 100
```
- `uptime_ratio`: how long the peer has been present / time since the indexer started.
- `bpms`: throughput measured over a dedicated stream (`/opencloud/probe/1.0`), normalized by 50 Mbps.
- `diversity`: ratio of distinct /24 IP prefixes among the indexers the peer declares.
Two thresholds apply depending on the peer's state:
- **First heartbeat** (peer absent from `StreamRecords`, uptime = 0): threshold of **40**.
- **Subsequent heartbeats** (accumulated uptime): threshold of **75**.
**Advantages**
- Discourages ephemeral or slow peers from cluttering the registry.
- Network diversity reduces the risk of concentration on a single subnet.
- The dedicated probe stream avoids polluting the JSON heartbeat stream with binary data.
- The dual threshold lets new peers be admitted on their very first connection.
**Limitations / risks**
- ✅ **Startup logical deadlock fixed**: with uptime = 0 the maximum score was 60,
below the 75 threshold, so new peers were silently rejected forever.
→ Threshold lowered to **40** for the first heartbeat (`isFirstHeartbeat`), 75 afterwards.
- ⚠️ The thresholds (40 / 75) are still hard-coded, with no configuration option.
- ⚠️ The bandwidth probe sends between 512 and 2048 bytes per heartbeat: at a 20 s
interval with up to 500 nodes, that is ~50 KB/s of continuous probe traffic.
- ⚠️ `diversity` is computed over the addresses the node *declares* to have: the field is
self-reported and unverified, hence easy to forge.
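A minimal sketch of the scoring arithmetic and the dual admission threshold, using the weights quoted in the formula above. `computeScore` and `admissionThreshold` are illustrative helpers, not the project's functions; the real `CheckHeartbeat` also performs the bandwidth probe and the diversity measurement.
```go
package main

import "fmt"

// computeScore mirrors the formula quoted above
// (0.4·uptime_ratio + 0.4·bpms + 0.2·diversity, scaled to 0..100).
// All three inputs are assumed to be pre-normalized to [0, 1].
func computeScore(uptimeRatio, bpms, diversity float64) float64 {
	return (0.4*uptimeRatio + 0.4*bpms + 0.2*diversity) * 100
}

// admissionThreshold illustrates the dual threshold: 40 for a peer's first
// heartbeat (uptime is necessarily 0), 75 once uptime is being tracked.
func admissionThreshold(isFirstHeartbeat bool) float64 {
	if isFirstHeartbeat {
		return 40
	}
	return 75
}

func main() {
	// A brand-new peer: perfect bandwidth and diversity but zero uptime.
	score := computeScore(0, 1.0, 1.0) // = 60
	fmt.Println(score, score >= admissionThreshold(true))  // 60 true  (admitted)
	fmt.Println(score, score >= admissionThreshold(false)) // 60 false (the old single-threshold deadlock)
}
```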
---
### 3.3 Registration with natives (indexer → native)
**How it works**
Every (non-native) indexer periodically (every 60 s) sends an `IndexerRegistration`
JSON on a one-shot stream (`/opencloud/native/subscribe/1.0`) to each configured
native. The native:
1. Stores the entry in its local cache with a **90 s** TTL (`IndexerTTL`).
2. Gossips the `PeerID` on the `oc-indexer-registry` PubSub topic to the other natives.
3. Persists the entry to the DHT asynchronously (retried until it succeeds).
**Advantages**
- Disposable stream: no long-lived resource on the native side for registrations.
- The local cache is immediately available to `handleNativeGetIndexers` without
waiting for the DHT.
- PubSub dissemination lets other natives learn about the indexer without it
having to register with them directly.
**Limitations / risks**
- ✅ **Overly tight TTL fixed**: the 66 s TTL was only 10 % above the 60 s interval;
a slight network delay could expire a healthy indexer between two renewals.
`IndexerTTL` raised to **90 s** (+50 %).
- ⚠️ If the DHT `PutValue` fails permanently (partitioned network), the native holds
the entry but natives that did not receive the PubSub message never learn about it:
a silent inconsistency.
- ⚠️ `RegisterWithNative` skips `127.0.0.1` addresses but does not handle private
(RFC1918) addresses that would be unroutable from other hosts.
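The TTL mechanics can be pictured with a small, self-contained sketch; `indexerEntry`, `renew` and `alive` are hypothetical stand-ins for the native's cache, assuming the 60 s registration interval and 90 s TTL described above.
```go
package main

import (
	"fmt"
	"time"
)

// indexerEntry is a hypothetical stand-in for the native's local cache entry.
type indexerEntry struct {
	Addr      string
	ExpiresAt time.Time
}

// IndexerTTL must comfortably exceed the 60 s registration interval so that a
// slightly late renewal does not expire a healthy indexer (hence 90 s, not 66 s).
const IndexerTTL = 90 * time.Second

// renew records (or refreshes) a registration received on the subscribe stream.
func renew(cache map[string]indexerEntry, peerID, addr string) {
	cache[peerID] = indexerEntry{Addr: addr, ExpiresAt: time.Now().Add(IndexerTTL)}
}

// alive reports whether the cached registration is still within its TTL.
func alive(cache map[string]indexerEntry, peerID string) bool {
	e, ok := cache[peerID]
	return ok && time.Now().Before(e.ExpiresAt)
}

func main() {
	cache := map[string]indexerEntry{}
	renew(cache, "QmIndexer", "/ip4/10.0.0.5/tcp/4001/p2p/QmIndexer")
	fmt.Println(alive(cache, "QmIndexer")) // true until 90 s pass without a renewal
}
```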
---
### 3.4 Indexer pool: fetch + consensus
**How it works**
During `ConnectToNatives` (startup or replenish), the node/indexer:
1. **Fetch**: sends a `GetIndexersRequest` to the first responsive native
(`/opencloud/native/indexers/1.0`) and receives a candidate list.
2. **Consensus (round 1)**: queries **all** configured natives in parallel
(`/opencloud/native/consensus/1.0`, 3 s timeout, 4 s collection window).
An indexer is confirmed if **strictly more than 50 %** of the responding natives
consider it alive.
3. **Consensus (round 2)**: if the pool is still short, the natives' suggestions
(indexers they know about but that were not among the initial candidates)
are put through a second round.
**Advantages**
- The absolute-majority rule prevents a compromised or out-of-sync native from
injecting phantom indexers.
- The second round fills the pool with alternatives known to the natives without
giving up the verification step.
- If the fetch returns a **fallback** (a native acting as indexer), consensus is skipped,
which is consistent since there is only one source.
**Limitations / risks**
- ⚠️ With a **single** configured native (very common in dev/test), consensus is trivial
(100 % of a single vote); the majority rule protects nothing in that case.
- ⚠️ `fetchIndexersFromNative` stops at the **first responsive native** (sequentially):
if that native has a stale or partial cache, the node gets a sub-optimal pool without
consulting the others.
- ⚠️ The global collection timeout (4 s) is fixed: on a slow or geographically
distributed network, valid natives can be dropped for answering too late.
- ⚠️ `replaceStaticIndexers` only **adds** and never removes stale indexers:
the pool can accumulate dead entries that only the heartbeat later purges.
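The majority rule of round 1 boils down to a vote count. The sketch below, with the illustrative helper `confirmByMajority` (not the project's function), shows why exactly 50 % is not enough and why a single responding native makes the rule trivial.
```go
package main

import "fmt"

// confirmByMajority illustrates the rule applied after consensus round 1:
// a candidate is confirmed when strictly more than 50% of the natives that
// actually responded vouch for it (count*2 > responding).
func confirmByMajority(votes map[string]int, responding int) []string {
	confirmed := []string{}
	for addr, count := range votes {
		if count*2 > responding {
			confirmed = append(confirmed, addr)
		}
	}
	return confirmed
}

func main() {
	votes := map[string]int{
		"/ip4/10.0.0.7/tcp/4001/p2p/QmA": 3, // trusted by 3 of 4 natives: confirmed
		"/ip4/10.0.0.9/tcp/4001/p2p/QmB": 2, // exactly 50%: NOT confirmed
	}
	fmt.Println(confirmByMajority(votes, 4)) // only the 3-of-4 candidate survives
	// With a single responding native, any candidate it trusts passes 1*2 > 1.
}
```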
---
### 3.5 Self-delegation and offload loop
**How it works**
If a native has no live indexer available when handling `handleNativeGetIndexers`,
it designates itself as a temporary indexer (`selfDelegate`): it returns its own
multiaddr and adds the requester to `responsiblePeers`, up to a limit of
`maxFallbackPeers` (50). Beyond that, delegation is refused and an empty response is
returned so the node can try another native.
Every 30 s, `runOffloadLoop` checks whether real indexers are available again.
If so, for each responsible peer:
- **Stream present**: `Reset()` the heartbeat stream; the peer receives an error,
triggers `replenishIndexersFromNative` and migrates to real indexers.
- **Stream absent** (peer never admitted by scoring): `ClosePeer()` on the network
connection; the peer reconnects and asks the native for indexers again.
**Advantages**
- Service continuity: a node is never stuck during a temporary absence of indexers.
- The migration is automatic and transparent for the node.
- `Reset()` (vs `Close()`) aborts both directions of the stream, guaranteeing that
the peer actually receives an error.
- The limit of 50 keeps the native from being overloaded during prolonged shortages.
**Limitations / risks**
- ✅ **Offload without a stream fixed**: if the heartbeat had never been recorded in
`StreamRecords` (score below threshold, a case amplified by the scoring bug), the
offload failed silently and the peer stayed in `responsiblePeers` indefinitely.
→ `else` branch added: `ClosePeer()` + removal from `responsiblePeers`.
- ✅ **Unbounded `responsiblePeers` fixed**: the native accepted an arbitrary number
of peers in self-delegation, turning itself into an overloaded indexer.
`selfDelegate` now checks `len(responsiblePeers) >= maxFallbackPeers` and returns
`false` when saturated.
- ⚠️ Delegation is still uncoordinated between natives: an overloaded native refuses
(returns empty) but does not explicitly redirect toward a neighboring native that
might have spare capacity.
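A condensed sketch of the saturation check, assuming a simple set for `responsiblePeers`; `fallbackRegistry` is a hypothetical stand-in, and the real `selfDelegate` also returns the native's own multiaddr to the requester.
```go
package main

import (
	"fmt"
	"sync"
)

const maxFallbackPeers = 50

// fallbackRegistry is a hypothetical stand-in for the native's responsiblePeers set.
type fallbackRegistry struct {
	mu    sync.Mutex
	peers map[string]struct{}
}

// selfDelegate adds the requester as a fallback peer unless the cap is reached,
// mirroring the saturation check described above.
func (r *fallbackRegistry) selfDelegate(peerID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.peers) >= maxFallbackPeers {
		return false // refuse: the caller answers with an empty indexer list
	}
	r.peers[peerID] = struct{}{}
	return true
}

func main() {
	reg := &fallbackRegistry{peers: map[string]struct{}{}}
	fmt.Println(reg.selfDelegate("QmRequester")) // true while under the 50-peer cap
}
```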
---
### 3.6 Native mesh resilience
**How it works**
When the heartbeat toward a native fails, `replenishNativesFromPeers` tries to find
a replacement in this order:
1. `fetchNativeFromNatives`: asks each live native (`/opencloud/native/peers/1.0`)
for the address of a native it does not know yet.
2. `fetchNativeFromIndexers`: asks each known indexer
(`/opencloud/indexer/natives/1.0`) for its configured natives.
3. If no replacement is found and `remaining ≤ 1`: `retryLostNative` starts a 30 s
ticker that retries the direct connection to the lost native.
`EnsureNativePeers` maintains native-to-native heartbeats over `ProtocolHeartbeat`,
with a **single goroutine** covering the whole `StaticNatives` map.
**Advantages**
- Multi-hop gossip through indexers makes it possible to find a native even when no
direct peer knows it.
- `retryLostNative` handles the single-native case (minimal deployment).
- Automatic reconnection (`retryLostNative`) also triggers `replenishIndexersIfNeeded`
to restore the indexer pool.
**Limitations / risks**
- ✅ **Duplicate heartbeat goroutines fixed**: `EnsureNativePeers` started one
`SendHeartbeat` goroutine per native address (N natives → N goroutines → N² heartbeats
per tick). `nativeMeshHeartbeatOnce` now ensures a single goroutine iterates over
`StaticNatives`.
- ⚠️ `retryLostNative` runs forever with no stop condition tied to the process lifetime
(no `context.Context`). On a graceful shutdown of the binary this goroutine can
linger.
- ⚠️ Transitive discovery (native → indexer → native) is one-way: an indexer only knows
the natives from its own config, not natives that joined after it started.
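The single-goroutine guarantee relies on `sync.Once`. The sketch below shows the pattern under simplified assumptions (no locking around the natives map, placeholder `beat` function); it is not the actual `EnsureNativePeers` code.
```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// meshHeartbeatOnce plays the role of nativeMeshHeartbeatOnce: however many
// times ensureNativePeers is called (startup, recovery paths), only one
// heartbeat goroutine is ever started for the whole native map.
var meshHeartbeatOnce sync.Once

func ensureNativePeers(natives map[string]string, beat func(addr string)) {
	meshHeartbeatOnce.Do(func() {
		go func() {
			t := time.NewTicker(20 * time.Second)
			defer t.Stop()
			for range t.C {
				for addr := range natives { // one goroutine covers every native
					beat(addr)
				}
			}
		}()
	})
}

func main() {
	natives := map[string]string{"/ip4/10.0.0.2/tcp/4001": "native-a"}
	beat := func(addr string) { fmt.Println("heartbeat to", addr) }
	ensureNativePeers(natives, beat) // starts the goroutine
	ensureNativePeers(natives, beat) // no-op: the Once already fired
	time.Sleep(50 * time.Millisecond)
}
```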
---
### 3.7 Shared DHT
**How it works**
All indexers and natives participate in a Kademlia DHT (prefix `oc`, mode
`ModeServer`). Two namespaces are used:
- `/node/<DID>` → signed `PeerRecord` JSON (published by indexers on node heartbeats).
- `/indexer/<PeerID>` → `liveIndexerEntry` JSON with a TTL (published by the natives).
Every native runs `refreshIndexersFromDHT` (every 30 s), which re-hydrates its local
cache from the DHT for known PeerIDs (`knownPeerIDs`) whose local entry has expired.
**Advantages**
- Decentralized persistence: a record survives the loss of any single native or indexer.
- Entry validation: `PeerRecordValidator` and `IndexerRecordValidator` reject malformed
or expired records at `PutValue` time.
- The secondary index `/name/<name>` allows resolution by human-readable name.
**Limitations / risks**
- ⚠️ The Kademlia DHT over a private (PSK) network works, but bootstrap nodes are not
configured explicitly: discovery depends on connections that already exist, which can
slow convergence at startup.
- ⚠️ `PutValue` is retried in an infinite loop on `"failed to find any peer in table"`:
a prolonged network outage accumulates blocked goroutines.
- ⚠️ If the PSK is compromised, an attacker can write to the DHT; `liveIndexerEntry`
records for indexers are not signed, unlike `PeerRecord`s.
- ⚠️ `refreshIndexersFromDHT` prunes `knownPeerIDs` when the DHT has no fresh entry,
but does not prune `liveIndexers`: an expired entry stays in memory until the GC
or the next refresh.
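A small sketch of the two DHT namespaces and the TTL-carrying indexer entry. The exact fields of `liveIndexerEntry` are an assumption here; the real records go through `PutValue` with the validators mentioned above.
```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// liveIndexerEntry is a hypothetical shape for the /indexer/<PeerID> records
// described above; the project's actual struct may differ.
type liveIndexerEntry struct {
	Addr      string    `json:"addr"`
	ExpiresAt time.Time `json:"expires_at"`
}

// nodeKey and indexerKey show the two DHT namespaces used by the service.
func nodeKey(did string) string    { return "/node/" + did }
func indexerKey(pid string) string { return "/indexer/" + pid }

func main() {
	entry := liveIndexerEntry{
		Addr:      "/ip4/10.0.0.7/tcp/4001/p2p/QmIndexer",
		ExpiresAt: time.Now().Add(90 * time.Second),
	}
	val, _ := json.Marshal(entry)
	fmt.Println(indexerKey("QmIndexer"), string(val))
	fmt.Println(nodeKey("did:example:123"))
	// A record validator would reject this value once ExpiresAt is in the past.
}
```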
---
### 3.8 PubSub gossip (indexer registry)
**How it works**
When an indexer registers with a native, the native publishes the address on the
`oc-indexer-registry` GossipSub topic. The other subscribed natives update their
`knownPeerIDs` without waiting for the DHT.
The `TopicValidator` rejects any message whose content is not a valid, parseable
multiaddr before it reaches the processing loop.
**Advantages**
- Near-instant dissemination between connected natives.
- A useful complement to the DHT for recent registrations that have not yet been
persisted.
- The syntactic filter blocks malformed messages before they propagate through the mesh.
**Limitations / risks**
- ✅ **`TopicValidator` with no validation fixed**: the validator accepted every
message unconditionally (`return true`), letting a compromised native gossip
arbitrary data.
The validator now checks that the message is a parseable multiaddr
(`pp.AddrInfoFromString`).
- ⚠️ Validation remains purely syntactic: the origin of the message (is the sender
a legitimate native?) is not verified.
- ⚠️ If a native restarts, it loses its subscription and misses messages published
while it was down. Re-hydration from the DHT compensates, but with a delay of
up to 30 s.
- ⚠️ The gossip only carries the indexer's `Addr`, not its TTL or a signature.
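The syntactic check can be illustrated with `peer.AddrInfoFromString` from go-libp2p (the same parser the code imports as `pp`). `validRegistryMessage` is an illustrative wrapper, not the actual validator, and as noted above it says nothing about the publisher's legitimacy.
```go
package main

import (
	"fmt"

	"github.com/libp2p/go-libp2p/core/peer"
)

// validRegistryMessage performs the same purely syntactic check the fixed
// TopicValidator applies: the payload must parse as a full multiaddr carrying
// a peer ID. Origin (is the publisher a legitimate native?) is still unchecked.
func validRegistryMessage(payload []byte) bool {
	_, err := peer.AddrInfoFromString(string(payload))
	return err == nil
}

func main() {
	fmt.Println(validRegistryMessage([]byte("not a multiaddr"))) // false: dropped before the processing loop
	// A well-formed "/ip4/.../tcp/4001/p2p/<real-peer-id>" payload would return true.
}
```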
---
### 3.9 Application streams (node ↔ node)
**How it works**
`StreamService` manages streams between partner nodes (`PARTNER` relations stored
in the database) over dedicated protocols (`/opencloud/resource/*`). A partner
heartbeat (`ProtocolHeartbeatPartner`) keeps the connections alive. Events are routed
through `handleEvent` and, in parallel, through the NATS system.
**Advantages**
- A per-protocol TTL (`PersistantStream`, `WaitResponse`) adapts the behavior to the
type of exchange (long-lived for the planner, short for CRUD operations).
- The GC (`gc()` every 8 s, started exactly once in `InitStream`) quickly releases
expired streams.
**Limitations / risks**
- ✅ **GC goroutine leak fixed**: `HandlePartnerHeartbeat` called `go s.StartGC(30s)`
on every received heartbeat (~every 20 s), spawning a new infinite ticker
goroutine on each call.
The call was removed; the GC started by `InitStream` is sufficient.
- ✅ **Infinite loop on EOF fixed**: `readLoop` did `s.Stream.Close();
continue` after a decode error, retrying forever on a closed stream.
→ Replaced by `return`; the defers (`Close`, `delete`) clean up correctly.
- ⚠️ Recovering partners from `conf.PeerIDS` is marked `TO REMOVE`: provisional
code is still present in production.
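A minimal sketch of the fixed decode-loop shape: return on the first decode error so the deferred cleanup runs exactly once. The `event` type and the `readLoop` signature are placeholders, not the project's `StreamService` code.
```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

type event struct {
	Type string `json:"type"`
}

// readLoop sketches the corrected loop: on any decode error (EOF, reset,
// malformed data) it returns instead of continuing, so the deferred cleanup
// runs once and the goroutine does not spin on a dead stream.
func readLoop(r io.ReadCloser, handle func(event)) {
	defer r.Close()
	dec := json.NewDecoder(r)
	for {
		var ev event
		if err := dec.Decode(&ev); err != nil {
			fmt.Println("stream terminated:", err)
			return // never `continue` here: the stream is gone
		}
		handle(ev)
	}
}

func main() {
	payload := bytes.NewBufferString(`{"type":"resource.created"}`)
	readLoop(io.NopCloser(payload), func(ev event) { fmt.Println("event:", ev.Type) })
}
```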
---
## 4. Summary table
| Mechanism | Protocol | Main advantage | Risk status |
|---|---|---|---|
| Node→indexer heartbeat | `/opencloud/heartbeat/1.0` | Fast loss detection | ⚠️ Frozen TCP stream not detected |
| Trust scoring | (inline in heartbeat) | Filters unstable peers | ✅ Deadlock fixed (40/75 thresholds) |
| Native registration | `/opencloud/native/subscribe/1.0` | Generous TTL, immediate cache | ✅ TTL raised to 90 s |
| Indexer pool fetch | `/opencloud/native/indexers/1.0` | Takes the first responsive native | ⚠️ Stale native cache possible |
| Consensus | `/opencloud/native/consensus/1.0` | Absolute majority | ⚠️ Trivial with a single native |
| Self-delegation + offload | (in-memory) | Availability without indexers | ✅ 50-peer cap + ClosePeer |
| Native mesh | `/opencloud/native/peers/1.0` | Multi-hop gossip | ✅ Goroutines deduplicated |
| DHT | `/oc/kad/1.0.0` | Decentralized persistence | ⚠️ Infinite retry, no bootstrap |
| PubSub registry | `oc-indexer-registry` | Fast dissemination | ✅ Multiaddr validation |
| Application streams | `/opencloud/resource/*` | Per-protocol TTL | ✅ GC leak + EOF fixed |
---
## 5. Global risks and limitations
### Security
- ⚠️ **Unverified self-reported addresses**: the `IndexersBinded` field in the heartbeat
is self-declared by the node and feeds the diversity score. A malicious peer can
inflate its score by declaring fake addresses.
- ⚠️ **PSK as the only entry barrier**: if the PSK is compromised (it is static and
file-based), the entire network isolation collapses. There is no key rotation and no
additional per-peer authentication.
- ⚠️ **DHT without ACLs on indexer entries**: `PeerRecord` signatures are verified on
read, but `liveIndexerEntry` records are not signed. PubSub validation blocks invalid
multiaddrs but not spoofed addresses of legitimate indexers.
### Availability
- ⚠️ **Native single point of failure**: with a single native, losing it stops all
indexer assignment. `retryLostNative` mitigates this, but without indexers the nodes
cannot publish.
- ⚠️ **DHT bootstrap**: without explicit bootstrap nodes, the DHT converges slowly when
the initial connections are few.
### Consistency
- ⚠️ **`replaceStaticIndexers` never removes anything**: old dead indexers stay in
`StaticIndexers` until their heartbeat fails. A node can hold an inflated pool
containing unreachable entries.
- ⚠️ **Global `TimeWatcher`**: set once when `ConnectToIndexers` starts. If the indexer
has been running for a long time, new nodes get a durably low `uptime_ratio`.
The threshold lowered to 40 for the first heartbeat softens the initial impact, but
subsequent heartbeats still have to accumulate enough uptime.
---
## 6. Improvement avenues
Avenues already implemented are marked ✅. Open avenues remain to be addressed.
### ✅ Scoring: dual threshold for new peers
~~Replace the single binary threshold~~ → **Implemented**: threshold of 40 for the first
heartbeat (peer absent from `StreamRecords`), 75 for subsequent ones. A peer can now be
admitted on its first connection instead of being blocked by its zero uptime.
_File: `common/common_stream.go`, `CheckHeartbeat`_
### ✅ Indexer TTL aligned with the renewal interval
~~66 s TTL too close to 60 s~~ → **Implemented**: `IndexerTTL` raised to **90 s**.
_File: `indexer/native.go`_
### ✅ Cap on self-delegation
~~Unbounded `responsiblePeers`~~ → **Implemented**: `selfDelegate` returns `false` when
`len(responsiblePeers) >= maxFallbackPeers` (50). The call site returns an empty
response and logs a warning.
_File: `indexer/native.go`_
### ✅ PubSub validation of gossiped addresses
~~`TopicValidator` accepts everything~~ → **Implemented**: the validator checks that the
message is a parseable multiaddr via `pp.AddrInfoFromString`.
_File: `indexer/native.go`, `subscribeIndexerRegistry`_
### ✅ Deduplicated heartbeat goroutines in `EnsureNativePeers`
~~One goroutine per native address~~ → **Implemented**: `nativeMeshHeartbeatOnce`
guarantees that a single `SendHeartbeat` goroutine covers the whole `StaticNatives` map.
_File: `common/native_stream.go`_
### ✅ GC goroutine leak in `HandlePartnerHeartbeat`
~~`go s.StartGC(30s)` on every heartbeat~~ → **Implemented**: call removed; the GC
started by `InitStream` is sufficient.
_File: `stream/service.go`_
### ✅ Infinite loop on EOF in `readLoop`
~~`continue` after `Stream.Close()`~~ → **Implemented**: replaced by `return` so the
defers clean up properly.
_File: `stream/service.go`_
---
### ⚠️ Pool fetch: query all natives in parallel
`fetchIndexersFromNative` stops at the first responsive native. Querying all natives in
parallel and merging the lists (much like `clientSideConsensus`) would prevent a native
with a stale cache from supplying a sub-optimal pool.
### ⚠️ Consensus with a configurable quorum
The confirmation rule (`count*2 > total`) is hard-coded. Making it configurable
(e.g. `consensus_quorum: 0.67`) would allow tightening the rule on deployments with
3+ natives without touching the code.
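A sketch of what a configurable quorum check could look like; `meetsQuorum` and the `consensus_quorum` key are assumptions for illustration, not existing code.
```go
package main

import "fmt"

// meetsQuorum generalizes the hard-coded count*2 > total rule: a candidate is
// confirmed when the fraction of responding natives that trust it strictly
// exceeds quorum (read from a hypothetical consensus_quorum config key).
func meetsQuorum(count, total int, quorum float64) bool {
	if total == 0 {
		return false
	}
	return float64(count)/float64(total) > quorum
}

func main() {
	fmt.Println(meetsQuorum(2, 3, 0.5))  // true : current behaviour (strict majority)
	fmt.Println(meetsQuorum(2, 3, 0.67)) // false: tightened quorum for 3+ natives
}
```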
### ⚠️ Explicit de-registration
Add a `/opencloud/native/unsubscribe/1.0` protocol: when an indexer shuts down cleanly,
it notifies the natives so its TTL is invalidated immediately instead of waiting 90 s.
### ⚠️ Explicit DHT bootstrap
Configure the natives as DHT bootstrap nodes via `dht.BootstrapPeers` to speed up
Kademlia convergence at startup.
### ⚠️ Context propagation in long-lived goroutines
`retryLostNative`, `refreshIndexersFromDHT` and `runOffloadLoop` receive no
`context.Context`. Passing one from `InitNative` would allow a clean stop when the
process shuts down.
### ⚠️ Explicit redirection when self-delegation is refused
When a native refuses self-delegation (saturated pool), returning an empty response
forces the node to retry without telling it where to turn. A list of alternative natives
in the response (`AlternativeNatives []string`) would let the node find a less loaded
native directly.


@@ -3,18 +3,21 @@ package conf
import "sync"
type Config struct {
Name string
Hostname string
PSKPath string
PublicKeyPath string
PrivateKeyPath string
NodeEndpointPort int64
Name string
Hostname string
PSKPath string
PublicKeyPath string
PrivateKeyPath string
NodeEndpointPort int64
IndexerAddresses string
NativeIndexerAddresses string // multiaddrs of native indexers, comma-separated; bypasses IndexerAddresses when set
PeerIDS string // TO REMOVE
NodeMode string
MinIndexer int
MaxIndexer int
}
var instance *Config


@@ -1,7 +1,6 @@
package common
import (
"bytes"
"context"
cr "crypto/rand"
"encoding/json"
@@ -28,6 +27,12 @@ type LongLivedStreamRecordedService[T interface{}] struct {
StreamRecords map[protocol.ID]map[pp.ID]*StreamRecord[T]
StreamMU sync.RWMutex
maxNodesConn int
// AfterHeartbeat is an optional hook called after each successful heartbeat update.
// The indexer sets it to republish the embedded signed record to the DHT.
AfterHeartbeat func(pid pp.ID)
// AfterDelete is called after gc() evicts an expired peer, outside the lock.
// name and did may be empty if the HeartbeatStream had no metadata.
AfterDelete func(pid pp.ID, name string, did string)
}
func NewStreamRecordedService[T interface{}](h host.Host, maxNodesConn int) *LongLivedStreamRecordedService[T] {
@@ -54,16 +59,29 @@ func (ix *LongLivedStreamRecordedService[T]) StartGC(interval time.Duration) {
func (ix *LongLivedStreamRecordedService[T]) gc() {
ix.StreamMU.Lock()
defer ix.StreamMU.Unlock()
now := time.Now().UTC()
if ix.StreamRecords[ProtocolHeartbeat] == nil {
ix.StreamRecords[ProtocolHeartbeat] = map[pp.ID]*StreamRecord[T]{}
ix.StreamMU.Unlock()
return
}
streams := ix.StreamRecords[ProtocolHeartbeat]
fmt.Println(StaticNatives, StaticIndexers, streams)
type gcEntry struct {
pid pp.ID
name string
did string
}
var evicted []gcEntry
for pid, rec := range streams {
if now.After(rec.HeartbeatStream.Expiry) || now.Sub(rec.HeartbeatStream.UptimeTracker.LastSeen) > 2*rec.HeartbeatStream.Expiry.Sub(now) {
name, did := "", ""
if rec.HeartbeatStream != nil {
name = rec.HeartbeatStream.Name
did = rec.HeartbeatStream.DID
}
evicted = append(evicted, gcEntry{pid, name, did})
for _, sstreams := range ix.StreamRecords {
if sstreams[pid] != nil {
delete(sstreams, pid)
@@ -71,6 +89,13 @@ func (ix *LongLivedStreamRecordedService[T]) gc() {
}
}
}
ix.StreamMU.Unlock()
if ix.AfterDelete != nil {
for _, e := range evicted {
ix.AfterDelete(e.pid, e.name, e.did)
}
}
}
func (ix *LongLivedStreamRecordedService[T]) Snapshot(interval time.Duration) {
@@ -101,8 +126,10 @@ func (ix *LongLivedStreamRecordedService[T]) snapshot() []*StreamRecord[T] {
return out
}
func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Stream) {
func (ix *LongLivedStreamRecordedService[T]) HandleHeartbeat(s network.Stream) {
logger := oclib.GetLogger()
defer s.Close()
dec := json.NewDecoder(s)
for {
ix.StreamMU.Lock()
if ix.StreamRecords[ProtocolHeartbeat] == nil {
@@ -114,17 +141,37 @@ func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Strea
streamsAnonym[k] = v
}
ix.StreamMU.Unlock()
pid, hb, err := CheckHeartbeat(ix.Host, s, streamsAnonym, &ix.StreamMU, ix.maxNodesConn)
pid, hb, err := CheckHeartbeat(ix.Host, s, dec, streamsAnonym, &ix.StreamMU, ix.maxNodesConn)
if err != nil {
// Stream-level errors (EOF, reset, closed) mean the connection is gone
// — exit so the goroutine doesn't spin forever on a dead stream.
// Metric/policy errors (score too low, too many connections) are transient
// — those are also stream-terminal since the stream carries one session.
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) ||
strings.Contains(err.Error(), "reset") ||
strings.Contains(err.Error(), "closed") ||
strings.Contains(err.Error(), "too many connections") {
logger.Info().Err(err).Msg("heartbeat stream terminated, closing handler")
return
}
logger.Warn().Err(err).Msg("heartbeat check failed, retrying on same stream")
continue
}
ix.StreamMU.Lock()
// if record already seen update last seen
if rec, ok := streams[*pid]; ok {
rec.DID = hb.DID
if rec.HeartbeatStream == nil {
rec.HeartbeatStream = hb.Stream
}
rec.HeartbeatStream = hb.Stream
rec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
if rec.HeartbeatStream.UptimeTracker == nil {
rec.HeartbeatStream.UptimeTracker = &UptimeTracker{
FirstSeen: time.Now().UTC(),
LastSeen: time.Now().UTC(),
}
}
logger.Info().Msg("A new node is updated : " + pid.String())
} else {
hb.Stream.UptimeTracker = &UptimeTracker{
FirstSeen: time.Now().UTC(),
@@ -134,37 +181,51 @@ func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Strea
DID: hb.DID,
HeartbeatStream: hb.Stream,
}
logger.Info().Msg("A new node is subscribed : " + pid.String())
}
ix.StreamMU.Unlock()
// Let the indexer republish the embedded signed record to the DHT.
if ix.AfterHeartbeat != nil {
ix.AfterHeartbeat(*pid)
}
}
}
func CheckHeartbeat(h host.Host, s network.Stream, streams map[pp.ID]HeartBeatStreamed, lock *sync.RWMutex, maxNodes int) (*pp.ID, *Heartbeat, error) {
func CheckHeartbeat(h host.Host, s network.Stream, dec *json.Decoder, streams map[pp.ID]HeartBeatStreamed, lock *sync.RWMutex, maxNodes int) (*pp.ID, *Heartbeat, error) {
if len(h.Network().Peers()) >= maxNodes {
return nil, nil, fmt.Errorf("too many connections, try another indexer")
}
var hb Heartbeat
if err := json.NewDecoder(s).Decode(&hb); err != nil {
if err := dec.Decode(&hb); err != nil {
return nil, nil, err
}
if ok, bpms, err := getBandwidthChallengeRate(MinPayloadChallenge+int(rand.Float64()*(MaxPayloadChallenge-MinPayloadChallenge)), s); err != nil {
return nil, nil, err
} else if !ok {
return nil, nil, fmt.Errorf("Not a proper peer")
} else {
_, bpms, _ := getBandwidthChallengeRate(h, s.Conn().RemotePeer(), MinPayloadChallenge+int(rand.Float64()*(MaxPayloadChallenge-MinPayloadChallenge)))
{
pid, err := pp.Decode(hb.PeerID)
if err != nil {
return nil, nil, err
}
upTime := float64(0)
isFirstHeartbeat := true
lock.Lock()
if rec, ok := streams[pid]; ok && rec.GetUptimeTracker() != nil {
upTime = rec.GetUptimeTracker().Uptime().Hours() / float64(time.Since(TimeWatcher).Hours())
isFirstHeartbeat = false
}
lock.Unlock()
diversity := getDiversityRate(h, hb.IndexersBinded)
fmt.Println(upTime, bpms, diversity)
hb.ComputeIndexerScore(upTime, bpms, diversity)
if hb.Score < 75 {
// First heartbeat: uptime is always 0 so the score ceiling is 60, below the
// steady-state threshold of 75. Use a lower admission threshold so new peers
// can enter and start accumulating uptime. Subsequent heartbeats must meet
// the full threshold once uptime is tracked.
minScore := float64(50)
if isFirstHeartbeat {
minScore = 40
}
fmt.Println(hb.Score, minScore)
if hb.Score < minScore {
return nil, nil, errors.New("not enough trusting value")
}
hb.Stream = &Stream{
@@ -178,11 +239,13 @@ func CheckHeartbeat(h host.Host, s network.Stream, streams map[pp.ID]HeartBeatSt
}
func getDiversityRate(h host.Host, peers []string) float64 {
peers, _ = checkPeers(h, peers)
diverse := []string{}
for _, p := range peers {
ip, err := ExtractIP(p)
if err != nil {
fmt.Println("NO IP", p, err)
continue
}
div := ip.Mask(net.CIDRMask(24, 32)).String()
@@ -190,6 +253,9 @@ func getDiversityRate(h host.Host, peers []string) float64 {
diverse = append(diverse, div)
}
}
if len(diverse) == 0 || len(peers) == 0 {
return 1
}
return float64(len(diverse) / len(peers))
}
@@ -211,35 +277,42 @@ func checkPeers(h host.Host, peers []string) ([]string, []string) {
return concretePeer, ips
}
const MaxExpectedMbps = 50.0
const MaxExpectedMbps = 100.0
const MinPayloadChallenge = 512
const MaxPayloadChallenge = 2048
const BaseRoundTrip = 400 * time.Millisecond
func getBandwidthChallengeRate(payloadSize int, s network.Stream) (bool, float64, error) {
// Generate a random payload
// getBandwidthChallengeRate opens a dedicated ProtocolBandwidthProbe stream to
// remotePeer, sends a random payload, reads the echo, and computes throughput.
// Using a separate stream avoids mixing binary data on the JSON heartbeat stream
// and ensures the echo handler is actually running on the remote side.
func getBandwidthChallengeRate(h host.Host, remotePeer pp.ID, payloadSize int) (bool, float64, error) {
payload := make([]byte, payloadSize)
_, err := cr.Read(payload)
if _, err := cr.Read(payload); err != nil {
return false, 0, err
}
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
s, err := h.NewStream(ctx, remotePeer, ProtocolBandwidthProbe)
if err != nil {
return false, 0, err
}
defer s.Reset()
s.SetDeadline(time.Now().Add(10 * time.Second))
start := time.Now()
// send on heartbeat stream the challenge
if _, err = s.Write(payload); err != nil {
return false, 0, err
}
// read back
s.CloseWrite()
// Half-close the write side so the handler's io.Copy sees EOF and stops.
// Read the echo.
response := make([]byte, payloadSize)
_, err = io.ReadFull(s, response)
if err != nil {
if _, err = io.ReadFull(s, response); err != nil {
return false, 0, err
}
duration := time.Since(start)
// Verify content
if !bytes.Equal(payload, response) {
return false, 0, nil // pb or a sadge peer.
}
maxRoundTrip := BaseRoundTrip + (time.Duration(payloadSize) * (100 * time.Millisecond))
mbps := float64(payloadSize*8) / duration.Seconds() / 1e6
if duration > maxRoundTrip || mbps < 5.0 {
@@ -345,13 +418,36 @@ var StaticIndexers map[string]*pp.AddrInfo = map[string]*pp.AddrInfo{}
var StreamMuIndexes sync.RWMutex
var StreamIndexers ProtocolStream = ProtocolStream{}
func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID) error {
// indexerHeartbeatNudge allows replenishIndexersFromNative to trigger an immediate
// heartbeat tick after adding new entries to StaticIndexers, without waiting up
// to 20s for the regular ticker. Buffered(1) so the sender never blocks.
var indexerHeartbeatNudge = make(chan struct{}, 1)
// NudgeIndexerHeartbeat signals the indexer heartbeat goroutine to fire immediately.
func NudgeIndexerHeartbeat() {
select {
case indexerHeartbeatNudge <- struct{}{}:
default: // nudge already pending, skip
}
}
func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID, recordFn ...func() json.RawMessage) error {
TimeWatcher = time.Now().UTC()
logger := oclib.GetLogger()
// If native addresses are configured, bypass static indexer addresses
// If native addresses are configured, get the indexer pool from the native mesh,
// then start the long-lived heartbeat goroutine toward those indexers.
if conf.GetConfig().NativeIndexerAddresses != "" {
return ConnectToNatives(h, minIndexer, maxIndexer, myPID)
if err := ConnectToNatives(h, minIndexer, maxIndexer, myPID); err != nil {
return err
}
// Step 2: start the long-lived heartbeat goroutine toward the indexer pool.
// replaceStaticIndexers/replenishIndexersFromNative update the map in-place
// so this single goroutine follows all pool changes automatically.
logger.Info().Msg("[native] step 2 — starting long-lived heartbeat to indexer pool")
SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name,
h, StreamIndexers, StaticIndexers, &StreamMuIndexes, 20*time.Second, recordFn...)
return nil
}
addresses := strings.Split(conf.GetConfig().IndexerAddresses, ",")
@@ -360,8 +456,8 @@ func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
addresses = addresses[0:maxIndexer]
}
StreamMuIndexes.Lock()
for _, indexerAddr := range addresses {
fmt.Println("GENERATE ADDR", indexerAddr)
ad, err := pp.AddrInfoFromString(indexerAddr)
if err != nil {
logger.Err(err)
@@ -369,15 +465,18 @@ func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
}
StaticIndexers[indexerAddr] = ad
}
indexerCount := len(StaticIndexers)
StreamMuIndexes.Unlock()
SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name, h, StreamIndexers, StaticIndexers, 20*time.Second) // your indexer is just like a node for the next indexer.
if len(StaticIndexers) < minIndexer {
SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name, h, StreamIndexers, StaticIndexers, &StreamMuIndexes, 20*time.Second, recordFn...) // your indexer is just like a node for the next indexer.
if indexerCount < minIndexer {
return errors.New("you run a node without indexers... your gonna be isolated.")
}
return nil
}
func AddStreamProtocol(ctx *context.Context, protoS ProtocolStream, h host.Host, proto protocol.ID, id pp.ID, mypid pp.ID, force bool, onStreamCreated *func(network.Stream)) ProtocolStream {
logger := oclib.GetLogger()
if onStreamCreated == nil {
f := func(s network.Stream) {
protoS[proto][id] = &Stream{
@@ -400,7 +499,7 @@ func AddStreamProtocol(ctx *context.Context, protoS ProtocolStream, h host.Host,
if protoS[proto][id] != nil {
protoS[proto][id].Expiry = time.Now().Add(2 * time.Minute)
} else {
fmt.Println("NEW STREAM", proto, id)
logger.Info().Msg("NEW STREAM Generated" + fmt.Sprintf("%v", proto) + " " + id.String())
s, err := h.NewStream(*ctx, id, proto)
if err != nil {
panic(err.Error())
@@ -419,12 +518,16 @@ type Heartbeat struct {
Timestamp int64 `json:"timestamp"`
IndexersBinded []string `json:"indexers_binded"`
Score float64
// Record carries a fresh signed PeerRecord (JSON) so the receiving indexer
// can republish it to the DHT without an extra round-trip.
// Only set by nodes (not indexers heartbeating other indexers).
Record json.RawMessage `json:"record,omitempty"`
}
func (hb *Heartbeat) ComputeIndexerScore(uptimeHours float64, bpms float64, diversity float64) {
hb.Score = (0.4 * uptimeHours) +
(0.4 * bpms) +
(0.2 * diversity)
hb.Score = ((0.3 * uptimeHours) +
(0.3 * bpms) +
(0.4 * diversity)) * 100
}
type HeartbeatInfo []struct {
@@ -433,34 +536,213 @@ type HeartbeatInfo []struct {
const ProtocolHeartbeat = "/opencloud/heartbeat/1.0"
func SendHeartbeat(ctx context.Context, proto protocol.ID, name string, h host.Host, ps ProtocolStream, peers map[string]*pp.AddrInfo, interval time.Duration) {
peerID, err := oclib.GenerateNodeID()
if err == nil {
panic("can't heartbeat daemon failed to start")
// ProtocolBandwidthProbe is a dedicated short-lived stream used exclusively
// for bandwidth/latency measurement. The handler echoes any bytes it receives.
// All nodes and indexers register this handler so peers can measure them.
const ProtocolBandwidthProbe = "/opencloud/probe/1.0"
// HandleBandwidthProbe echoes back everything written on the stream, then closes.
// It is registered by all participants so the measuring side (the heartbeat receiver)
// can open a dedicated probe stream and read the round-trip latency + throughput.
func HandleBandwidthProbe(s network.Stream) {
defer s.Close()
s.SetDeadline(time.Now().Add(10 * time.Second))
io.Copy(s, s) // echo every byte back to the sender
}
// SendHeartbeat starts a goroutine that sends periodic heartbeats to peers.
// recordFn, when provided, is called on each tick and its output is embedded in
// the heartbeat as a fresh signed PeerRecord so the receiving indexer can
// republish it to the DHT without an extra round-trip.
// Pass no recordFn (or nil) for indexer→indexer / native heartbeats.
func SendHeartbeat(ctx context.Context, proto protocol.ID, name string, h host.Host, ps ProtocolStream, peers map[string]*pp.AddrInfo, mu *sync.RWMutex, interval time.Duration, recordFn ...func() json.RawMessage) {
logger := oclib.GetLogger()
// isIndexerHB is true when this goroutine drives the indexer heartbeat.
// isNativeHB is true when it drives the native heartbeat.
isIndexerHB := mu == &StreamMuIndexes
isNativeHB := mu == &StreamNativeMu
var recFn func() json.RawMessage
if len(recordFn) > 0 {
recFn = recordFn[0]
}
go func() {
logger.Info().Str("proto", string(proto)).Int("peers", len(peers)).Msg("heartbeat started")
t := time.NewTicker(interval)
defer t.Stop()
// doTick sends one round of heartbeats to the current peer snapshot.
doTick := func() {
// Build the heartbeat payload — snapshot current indexer addresses.
StreamMuIndexes.RLock()
addrs := make([]string, 0, len(StaticIndexers))
for addr := range StaticIndexers {
addrs = append(addrs, addr)
}
StreamMuIndexes.RUnlock()
hb := Heartbeat{
Name: name,
PeerID: h.ID().String(),
Timestamp: time.Now().UTC().Unix(),
IndexersBinded: addrs,
}
if recFn != nil {
hb.Record = recFn()
}
// Snapshot the peer list under a read lock so we don't hold the
// write lock during network I/O.
if mu != nil {
mu.RLock()
}
snapshot := make([]*pp.AddrInfo, 0, len(peers))
for _, ix := range peers {
snapshot = append(snapshot, ix)
}
if mu != nil {
mu.RUnlock()
}
for _, ix := range snapshot {
wasConnected := h.Network().Connectedness(ix.ID) == network.Connected
if err := sendHeartbeat(ctx, h, proto, ix, hb, ps, interval*time.Second); err != nil {
// Step 3: heartbeat failed — remove from pool and trigger replenish.
logger.Info().Str("peer", ix.ID.String()).Str("proto", string(proto)).Msg("[native] step 3 — heartbeat failed, removing peer from pool")
// Remove the dead peer and clean up its stream.
// mu already covers ps when isIndexerHB (same mutex), so one
// lock acquisition is sufficient — no re-entrant double-lock.
if mu != nil {
mu.Lock()
}
if ps[proto] != nil {
if s, ok := ps[proto][ix.ID]; ok {
if s.Stream != nil {
s.Stream.Close()
}
delete(ps[proto], ix.ID)
}
}
lostAddr := ""
for addr, ad := range peers {
if ad.ID == ix.ID {
lostAddr = addr
delete(peers, addr)
break
}
}
need := conf.GetConfig().MinIndexer - len(peers)
remaining := len(peers)
if mu != nil {
mu.Unlock()
}
logger.Info().Int("remaining", remaining).Int("min", conf.GetConfig().MinIndexer).Int("need", need).Msg("[native] step 3 — pool state after removal")
// Step 4: ask the native for the missing indexer count.
if isIndexerHB && conf.GetConfig().NativeIndexerAddresses != "" {
if need < 1 {
need = 1
}
logger.Info().Int("need", need).Msg("[native] step 3→4 — triggering replenish")
go replenishIndexersFromNative(h, need)
}
// Native heartbeat failed — find a replacement native.
// Case 1: if the dead native was also serving as an indexer, evict it
// from StaticIndexers immediately without waiting for the indexer HB tick.
if isNativeHB {
logger.Info().Str("addr", lostAddr).Msg("[native] step 3 — native heartbeat failed, triggering native replenish")
if lostAddr != "" && conf.GetConfig().NativeIndexerAddresses != "" {
StreamMuIndexes.Lock()
if _, wasIndexer := StaticIndexers[lostAddr]; wasIndexer {
delete(StaticIndexers, lostAddr)
if s := StreamIndexers[ProtocolHeartbeat]; s != nil {
if stream, ok := s[ix.ID]; ok {
if stream.Stream != nil {
stream.Stream.Close()
}
delete(s, ix.ID)
}
}
idxNeed := conf.GetConfig().MinIndexer - len(StaticIndexers)
StreamMuIndexes.Unlock()
if idxNeed < 1 {
idxNeed = 1
}
logger.Info().Str("addr", lostAddr).Msg("[native] dead native evicted from indexer pool, triggering replenish")
go replenishIndexersFromNative(h, idxNeed)
} else {
StreamMuIndexes.Unlock()
}
}
go replenishNativesFromPeers(h, lostAddr, proto)
}
} else {
// Case 2: native-as-indexer reconnected after a restart.
// If the peer was disconnected before this tick and the heartbeat just
// succeeded (transparent reconnect), the native may have restarted with
// blank state (responsiblePeers empty). Evict it from StaticIndexers and
// re-request an assignment so the native re-tracks us properly and
// runOffloadLoop can eventually migrate us to real indexers.
if !wasConnected && isIndexerHB && conf.GetConfig().NativeIndexerAddresses != "" {
StreamNativeMu.RLock()
isNativeIndexer := false
for _, ad := range StaticNatives {
if ad.ID == ix.ID {
isNativeIndexer = true
break
}
}
StreamNativeMu.RUnlock()
if isNativeIndexer {
if mu != nil {
mu.Lock()
}
if ps[proto] != nil {
if s, ok := ps[proto][ix.ID]; ok {
if s.Stream != nil {
s.Stream.Close()
}
delete(ps[proto], ix.ID)
}
}
reconnectedAddr := ""
for addr, ad := range peers {
if ad.ID == ix.ID {
reconnectedAddr = addr
delete(peers, addr)
break
}
}
idxNeed := conf.GetConfig().MinIndexer - len(peers)
if mu != nil {
mu.Unlock()
}
if idxNeed < 1 {
idxNeed = 1
}
logger.Info().Str("addr", reconnectedAddr).Str("peer", ix.ID.String()).Msg(
"[native] native-as-indexer reconnected after restart — evicting and re-requesting assignment")
go replenishIndexersFromNative(h, idxNeed)
}
}
logger.Debug().Str("peer", ix.ID.String()).Str("proto", string(proto)).Msg("[native] step 2 — heartbeat sent ok")
}
}
}
for {
select {
case <-t.C:
addrs := []string{}
for addr := range StaticIndexers {
addrs = append(addrs, addr)
doTick()
case <-indexerHeartbeatNudge:
if isIndexerHB {
logger.Info().Msg("[native] step 2 — nudge received, heartbeating new indexers immediately")
doTick()
}
hb := Heartbeat{
Name: name,
DID: peerID,
PeerID: h.ID().String(),
Timestamp: time.Now().UTC().Unix(),
IndexersBinded: addrs,
}
for _, ix := range peers {
if err = sendHeartbeat(ctx, h, proto, ix, hb, ps, interval*time.Second); err != nil {
StreamMuIndexes.Lock()
delete(StreamIndexers[proto], ix.ID)
StreamMuIndexes.Unlock()
}
case <-nativeHeartbeatNudge:
if isNativeHB {
logger.Info().Msg("[native] native nudge received, heartbeating replacement native immediately")
doTick()
}
case <-ctx.Done():
return
@@ -480,58 +762,62 @@ func TempStream(h host.Host, ad pp.AddrInfo, proto protocol.ID, did string, stre
if pts[proto] != nil {
expiry = pts[proto].TTL
}
if ctxTTL, err := context.WithTimeout(context.Background(), expiry); err == nil {
if h.Network().Connectedness(ad.ID) != network.Connected {
if err := h.Connect(ctxTTL, ad); err != nil {
return streams, err
}
}
if streams[proto] != nil && streams[proto][ad.ID] != nil {
return streams, nil
} else if s, err := h.NewStream(ctxTTL, ad.ID, proto); err == nil {
mu.Lock()
if streams[proto] == nil {
streams[proto] = map[pp.ID]*Stream{}
}
mu.Unlock()
time.AfterFunc(expiry, func() {
mu.Lock()
defer mu.Unlock()
delete(streams[proto], ad.ID)
})
streams[ProtocolPublish][ad.ID] = &Stream{
DID: did,
Stream: s,
Expiry: time.Now().UTC().Add(expiry),
}
mu.Unlock()
return streams, nil
} else {
ctxTTL, _ := context.WithTimeout(context.Background(), expiry)
if h.Network().Connectedness(ad.ID) != network.Connected {
if err := h.Connect(ctxTTL, ad); err != nil {
return streams, err
}
}
return streams, errors.New("can't create a context")
if streams[proto] != nil && streams[proto][ad.ID] != nil {
return streams, nil
} else if s, err := h.NewStream(ctxTTL, ad.ID, proto); err == nil {
mu.Lock()
if streams[proto] == nil {
streams[proto] = map[pp.ID]*Stream{}
}
mu.Unlock()
time.AfterFunc(expiry, func() {
mu.Lock()
delete(streams[proto], ad.ID)
mu.Unlock()
})
mu.Lock()
streams[proto][ad.ID] = &Stream{
DID: did,
Stream: s,
Expiry: time.Now().UTC().Add(expiry),
}
mu.Unlock()
return streams, nil
} else {
return streams, err
}
}
func sendHeartbeat(ctx context.Context, h host.Host, proto protocol.ID, p *pp.AddrInfo,
hb Heartbeat, ps ProtocolStream, interval time.Duration) error {
streams := ps.Get(proto)
if len(streams) == 0 {
return errors.New("no stream for protocol heartbeat founded")
logger := oclib.GetLogger()
if ps[proto] == nil {
ps[proto] = map[pp.ID]*Stream{}
}
streams := ps[proto]
pss, exists := streams[p.ID]
ctxTTL, _ := context.WithTimeout(ctx, 3*interval)
ctxTTL, cancel := context.WithTimeout(ctx, 3*interval)
defer cancel()
// Connect if necessary
if h.Network().Connectedness(p.ID) != network.Connected {
if err := h.Connect(ctxTTL, *p); err != nil {
logger.Err(err)
return err
}
exists = false // the stream will need to be recreated
}
// Create the stream if missing or closed
if !exists || pss.Stream == nil {
logger.Info().Msg("New Stream engaged as Heartbeat " + fmt.Sprintf("%v", proto) + " " + p.ID.String())
s, err := h.NewStream(ctx, p.ID, proto)
if err != nil {
logger.Err(err)
return err
}
pss = &Stream{


@@ -13,6 +13,7 @@ import (
oclib "cloud.o-forge.io/core/oc-lib"
"github.com/libp2p/go-libp2p/core/host"
pp "github.com/libp2p/go-libp2p/core/peer"
"github.com/libp2p/go-libp2p/core/protocol"
)
const (
@@ -56,7 +57,8 @@ type IndexerRegistration struct {
// GetIndexersRequest asks a native for a pool of live indexers.
type GetIndexersRequest struct {
Count int `json:"count"`
Count int `json:"count"`
From string `json:"from"`
}
// GetIndexersResponse is returned by the native with live indexer multiaddrs.
@@ -69,17 +71,26 @@ var StaticNatives = map[string]*pp.AddrInfo{}
var StreamNativeMu sync.RWMutex
var StreamNatives ProtocolStream = ProtocolStream{}
// ConnectToNatives is the client-side entry point for nodes/indexers that have
// NativeIndexerAddresses configured. It:
// 1. Connects (long-lived heartbeat) to all configured natives.
// 2. Fetches an initial indexer pool from the FIRST responsive native.
// 3. Challenges that pool to ALL natives (consensus round 1).
// 4. If the confirmed list is short, samples native suggestions and re-challenges (round 2).
// 5. Populates StaticIndexers with majority-confirmed indexers.
// nativeHeartbeatOnce ensures we start exactly one long-lived heartbeat goroutine
// toward the native mesh, even when ConnectToNatives is called from recovery paths.
var nativeHeartbeatOnce sync.Once
// nativeMeshHeartbeatOnce guards the native-to-native heartbeat goroutine started
// by EnsureNativePeers so only one goroutine covers the whole StaticNatives map.
var nativeMeshHeartbeatOnce sync.Once
// ConnectToNatives is the initial setup for nodes/indexers in native mode:
// 1. Parses native addresses → StaticNatives.
// 2. Starts a single long-lived heartbeat goroutine toward the native mesh.
// 3. Fetches an initial indexer pool from the first responsive native.
// 4. Runs consensus when real (non-fallback) indexers are returned.
// 5. Replaces StaticIndexers with the confirmed pool.
func ConnectToNatives(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID) error {
logger := oclib.GetLogger()
logger.Info().Msg("[native] step 1 — parsing native addresses")
// Parse in config order: the first entry is the primary pool source.
// Parse native addresses — safe to call multiple times.
StreamNativeMu.Lock()
orderedAddrs := []string{}
for _, addr := range strings.Split(conf.GetConfig().NativeIndexerAddresses, ",") {
addr = strings.TrimSpace(addr)
@@ -88,106 +99,208 @@ func ConnectToNatives(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
}
ad, err := pp.AddrInfoFromString(addr)
if err != nil {
logger.Err(err).Msg("ConnectToNatives: invalid addr")
logger.Err(err).Msg("[native] step 1 — invalid native addr")
continue
}
StaticNatives[addr] = ad
orderedAddrs = append(orderedAddrs, addr)
logger.Info().Str("addr", addr).Msg("[native] step 1 — native registered")
}
if len(StaticNatives) == 0 {
StreamNativeMu.Unlock()
return errors.New("no valid native addresses configured")
}
StreamNativeMu.Unlock()
logger.Info().Int("count", len(orderedAddrs)).Msg("[native] step 1 — natives parsed")
// Long-lived heartbeat connections to keep the native mesh active.
SendHeartbeat(context.Background(), ProtocolHeartbeat,
conf.GetConfig().Name, h, StreamNatives, StaticNatives, 20*time.Second)
// Step 1: get an initial pool from the FIRST responsive native (in config order).
var candidates []string
var isFallback bool
for _, addr := range orderedAddrs {
ad := StaticNatives[addr]
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
if err := h.Connect(ctx, *ad); err != nil {
cancel()
continue
}
s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetIndexers)
cancel()
if err != nil {
continue
}
req := GetIndexersRequest{Count: maxIndexer}
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
s.Close()
continue
}
var resp GetIndexersResponse
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
s.Close()
continue
}
s.Close()
candidates = resp.Indexers
isFallback = resp.IsSelfFallback
break // first responsive native only
}
// Step 1: one long-lived heartbeat to each native.
nativeHeartbeatOnce.Do(func() {
logger.Info().Msg("[native] step 1 — starting long-lived heartbeat to native mesh")
SendHeartbeat(context.Background(), ProtocolHeartbeat,
conf.GetConfig().Name, h, StreamNatives, StaticNatives, &StreamNativeMu, 20*time.Second)
})
// Fetch initial pool from the first responsive native.
logger.Info().Int("want", maxIndexer).Msg("[native] step 1 — fetching indexer pool from native")
candidates, isFallback := fetchIndexersFromNative(h, orderedAddrs, maxIndexer)
if len(candidates) == 0 {
logger.Warn().Msg("[native] step 1 — no candidates returned by any native")
if minIndexer > 0 {
return errors.New("ConnectToNatives: no indexers available from any native")
}
return nil
}
logger.Info().Int("candidates", len(candidates)).Bool("fallback", isFallback).Msg("[native] step 1 — pool received")
// If the native is already the fallback indexer, use it directly — no consensus needed.
// Step 2: populate StaticIndexers — consensus for real indexers, direct for fallback.
pool := resolvePool(h, candidates, isFallback, maxIndexer)
replaceStaticIndexers(pool)
StreamMuIndexes.RLock()
indexerCount := len(StaticIndexers)
StreamMuIndexes.RUnlock()
logger.Info().Int("pool_size", indexerCount).Msg("[native] step 2 — StaticIndexers replaced")
if minIndexer > 0 && indexerCount < minIndexer {
return errors.New("not enough majority-confirmed indexers available")
}
return nil
}
// replenishIndexersFromNative is called when an indexer heartbeat fails (step 3→4).
// It asks the native for exactly `need` replacement indexers, runs consensus when
// real indexers are returned, and adds the results to StaticIndexers without
// clearing the existing pool.
func replenishIndexersFromNative(h host.Host, need int) {
if need <= 0 {
return
}
logger := oclib.GetLogger()
logger.Info().Int("need", need).Msg("[native] step 4 — replenishing indexer pool from native")
StreamNativeMu.RLock()
addrs := make([]string, 0, len(StaticNatives))
for addr := range StaticNatives {
addrs = append(addrs, addr)
}
StreamNativeMu.RUnlock()
candidates, isFallback := fetchIndexersFromNative(h, addrs, need)
if len(candidates) == 0 {
logger.Warn().Msg("[native] step 4 — no candidates returned by any native")
return
}
logger.Info().Int("candidates", len(candidates)).Bool("fallback", isFallback).Msg("[native] step 4 — candidates received")
pool := resolvePool(h, candidates, isFallback, need)
if len(pool) == 0 {
logger.Warn().Msg("[native] step 4 — consensus yielded no confirmed indexers")
return
}
// Add new indexers to the pool — do NOT clear existing ones.
StreamMuIndexes.Lock()
for addr, ad := range pool {
StaticIndexers[addr] = ad
}
total := len(StaticIndexers)
StreamMuIndexes.Unlock()
logger.Info().Int("added", len(pool)).Int("total", total).Msg("[native] step 4 — pool replenished")
// Nudge the heartbeat goroutine to connect immediately instead of waiting
// for the next 20s tick.
NudgeIndexerHeartbeat()
logger.Info().Msg("[native] step 4 — heartbeat goroutine nudged")
}
// fetchIndexersFromNative opens a ProtocolNativeGetIndexers stream to the first
// responsive native and returns the candidate list and fallback flag.
func fetchIndexersFromNative(h host.Host, nativeAddrs []string, count int) (candidates []string, isFallback bool) {
logger := oclib.GetLogger()
for _, addr := range nativeAddrs {
ad, err := pp.AddrInfoFromString(addr)
if err != nil {
logger.Warn().Str("addr", addr).Msg("[native] fetch — skipping invalid addr")
continue
}
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
if err := h.Connect(ctx, *ad); err != nil {
cancel()
logger.Warn().Str("addr", addr).Err(err).Msg("[native] fetch — connect failed")
continue
}
s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetIndexers)
cancel()
if err != nil {
logger.Warn().Str("addr", addr).Err(err).Msg("[native] fetch — stream open failed")
continue
}
req := GetIndexersRequest{Count: count, From: h.ID().String()}
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
s.Close()
logger.Warn().Str("addr", addr).Err(encErr).Msg("[native] fetch — encode request failed")
continue
}
var resp GetIndexersResponse
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
s.Close()
logger.Warn().Str("addr", addr).Err(decErr).Msg("[native] fetch — decode response failed")
continue
}
s.Close()
logger.Info().Str("native", addr).Int("indexers", len(resp.Indexers)).Bool("fallback", resp.IsSelfFallback).Msg("[native] fetch — response received")
return resp.Indexers, resp.IsSelfFallback
}
logger.Warn().Msg("[native] fetch — no native responded")
return nil, false
}
// resolvePool converts a candidate list to a validated addr→AddrInfo map.
// When isFallback is true the native itself is the indexer — no consensus needed.
// When isFallback is false, consensus is run before accepting the candidates.
func resolvePool(h host.Host, candidates []string, isFallback bool, maxIndexer int) map[string]*pp.AddrInfo {
logger := oclib.GetLogger()
if isFallback {
logger.Info().Strs("addrs", candidates).Msg("[native] resolve — fallback mode, skipping consensus")
pool := make(map[string]*pp.AddrInfo, len(candidates))
for _, addr := range candidates {
ad, err := pp.AddrInfoFromString(addr)
if err != nil {
continue
}
StaticIndexers[addr] = ad
pool[addr] = ad
}
return nil
return pool
}
// Step 2: challenge the pool to ALL configured natives and score by majority vote.
// Round 1.
logger.Info().Int("candidates", len(candidates)).Msg("[native] resolve — consensus round 1")
confirmed, suggestions := clientSideConsensus(h, candidates)
logger.Info().Int("confirmed", len(confirmed)).Int("suggestions", len(suggestions)).Msg("[native] resolve — consensus round 1 done")
// Step 3: if we still have gaps, sample from suggestions and re-challenge.
// Round 2: fill gaps from suggestions if below target.
if len(confirmed) < maxIndexer && len(suggestions) > 0 {
rand.Shuffle(len(suggestions), func(i, j int) { suggestions[i], suggestions[j] = suggestions[j], suggestions[i] })
gap := maxIndexer - len(confirmed)
if gap > len(suggestions) {
gap = len(suggestions)
}
logger.Info().Int("gap", gap).Msg("[native] resolve — consensus round 2 (filling gaps)")
confirmed2, _ := clientSideConsensus(h, append(confirmed, suggestions[:gap]...))
if len(confirmed2) > 0 {
confirmed = confirmed2
}
logger.Info().Int("confirmed", len(confirmed)).Msg("[native] resolve — consensus round 2 done")
}
// Step 4: populate StaticIndexers with confirmed addresses.
pool := make(map[string]*pp.AddrInfo, len(confirmed))
for _, addr := range confirmed {
ad, err := pp.AddrInfoFromString(addr)
if err != nil {
continue
}
pool[addr] = ad
}
logger.Info().Int("pool_size", len(pool)).Msg("[native] resolve — pool ready")
return pool
}
// replaceStaticIndexers atomically replaces the active indexer pool.
// Peers no longer in next are dropped from the pool, so the SendHeartbeat
// goroutine stops sending to them on its next tick.
func replaceStaticIndexers(next map[string]*pp.AddrInfo) {
	StreamMuIndexes.Lock()
	defer StreamMuIndexes.Unlock()
	for addr := range StaticIndexers {
		if _, keep := next[addr]; !keep {
			delete(StaticIndexers, addr)
		}
	}
	for addr, ad := range next {
		StaticIndexers[addr] = ad
	}
}
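// replenishPipelineSketch is an illustrative composition of the helpers above
// (fetch, then resolve, then replace); the real replenishIndexersFromNative adds
// logging and step-by-step locking, so treat this as a minimal sketch rather than
// the actual flow.
func replenishPipelineSketch(h host.Host, nativeAddrs []string) {
	maxIdx := conf.GetConfig().MaxIndexer
	candidates, isFallback := fetchIndexersFromNative(h, nativeAddrs, maxIdx)
	if len(candidates) == 0 {
		return // no native answered; the next heartbeat failure will retry
	}
	if pool := resolvePool(h, candidates, isFallback, maxIdx); len(pool) > 0 {
		replaceStaticIndexers(pool)
		NudgeIndexerHeartbeat() // connect now instead of waiting for the next 20s tick
	}
}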
// clientSideConsensus challenges a candidate list to ALL configured native peers
// in parallel. Each native replies with the candidates it trusts plus extras it
// recommends. An indexer is confirmed when strictly more than 50% of responding
// natives trust it. The remaining addresses from native suggestions are returned
// as suggestions for a possible second round.
func clientSideConsensus(h host.Host, candidates []string) (confirmed []string, suggestions []string) {
if len(candidates) == 0 {
return nil, nil
@@ -201,7 +314,6 @@ func clientSideConsensus(h host.Host, candidates []string) (confirmed []string,
StreamNativeMu.RUnlock()
if len(peers) == 0 {
// No natives to challenge: trust candidates as-is.
return candidates, nil
}
@@ -239,13 +351,12 @@ func clientSideConsensus(h host.Host, candidates []string) (confirmed []string,
}(ad)
}
// Collect responses up to consensusCollectTimeout.
timer := time.NewTimer(consensusCollectTimeout)
defer timer.Stop()
trustedCounts := map[string]int{}
suggestionPool := map[string]struct{}{}
	total := 0 // counts only natives that actually responded
collected := 0
collect:
@@ -254,7 +365,7 @@ collect:
case r := <-ch:
collected++
if !r.responded {
			continue // timeout / error: skip, do not count as vote
}
total++
seen := map[string]struct{}{}
@@ -273,13 +384,12 @@ collect:
}
if total == 0 {
// No native responded: fall back to trusting the candidates as-is.
return candidates, nil
}
confirmedSet := map[string]struct{}{}
for addr, count := range trustedCounts {
		if count*2 > total { // strictly >50%
confirmed = append(confirmed, addr)
confirmedSet[addr] = struct{}{}
}
@@ -292,15 +402,17 @@ collect:
return
}
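// hasStrictMajority restates the confirmation rule above in isolation: an indexer is
// kept only when strictly more than half of the natives that actually responded trust
// it. With 4 responders, 2 votes are not enough (2*2 == 4) while 3 votes are (3*2 > 4).
func hasStrictMajority(votes, responders int) bool {
	return votes*2 > responders
}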
const ProtocolIndexerHeartbeat = "/opencloud/heartbeat/indexer/1.0"
// RegisterWithNative sends a one-shot registration to each configured native indexer.
// It should be re-sent every RecommendedHeartbeatInterval; StartNativeRegistration takes care of that.
func RegisterWithNative(h host.Host, nativeAddressesStr string) {
logger := oclib.GetLogger()
myAddr := ""
	if len(h.Addrs()) > 0 {
		// Announce the last listen address, but only when it is not a loopback one.
		last := h.Addrs()[len(h.Addrs())-1].String()
		if !strings.Contains(last, "127.0.0.1") {
			myAddr = last + "/p2p/" + h.ID().String()
		}
	}
if myAddr == "" {
logger.Warn().Msg("RegisterWithNative: no routable address yet, skipping")
return
}
reg := IndexerRegistration{
PeerID: h.ID().String(),
@@ -334,16 +446,16 @@ func RegisterWithNative(h host.Host, nativeAddressesStr string) {
}
}
// EnsureNativePeers populates StaticNatives from config and starts a single
// heartbeat goroutine toward the native mesh. Safe to call multiple times;
// the heartbeat goroutine is started at most once (nativeMeshHeartbeatOnce).
func EnsureNativePeers(h host.Host) {
logger := oclib.GetLogger()
nativeAddrs := conf.GetConfig().NativeIndexerAddresses
if nativeAddrs == "" {
return
}
StreamNativeMu.Lock()
for _, addr := range strings.Split(nativeAddrs, ",") {
addr = strings.TrimSpace(addr)
if addr == "" {
@@ -354,11 +466,312 @@ func EnsureNativePeers(h host.Host) {
continue
}
StaticNatives[addr] = ad
logger.Info().Str("addr", addr).Msg("native: registered peer in native mesh")
}
StreamNativeMu.Unlock()
// One heartbeat goroutine iterates over all of StaticNatives on each tick;
// starting one per address would multiply heartbeats by the native count.
nativeMeshHeartbeatOnce.Do(func() {
logger.Info().Msg("native: starting mesh heartbeat goroutine")
SendHeartbeat(context.Background(), ProtocolHeartbeat,
conf.GetConfig().Name, h, StreamNatives, StaticNatives, &StreamNativeMu, 20*time.Second)
})
}
func StartNativeRegistration(h host.Host, nativeAddressesStr string) {
go func() {
// Poll until a routable (non-loopback) address is available before the first
// registration attempt. libp2p may not have discovered external addresses yet
// at startup. Cap at 12 retries (~1 minute) so we don't spin indefinitely.
		for i := 0; i < 12; i++ {
			addrs := h.Addrs()
			if len(addrs) > 0 && !strings.Contains(addrs[len(addrs)-1].String(), "127.0.0.1") {
				break // routable address found
			}
			time.Sleep(5 * time.Second)
		}
RegisterWithNative(h, nativeAddressesStr)
t := time.NewTicker(RecommendedHeartbeatInterval)
defer t.Stop()
for range t.C {
RegisterWithNative(h, nativeAddressesStr)
}
}()
}
// ── Lost-native replacement ───────────────────────────────────────────────────
const (
// ProtocolNativeGetPeers lets a node/indexer ask a native for a random
// selection of that native's own native contacts (to replace a dead native).
ProtocolNativeGetPeers = "/opencloud/native/peers/1.0"
// ProtocolIndexerGetNatives lets nodes/indexers ask a connected indexer for
// its configured native addresses (fallback when no alive native responds).
ProtocolIndexerGetNatives = "/opencloud/indexer/natives/1.0"
// retryNativeInterval is how often retryLostNative polls a dead native.
retryNativeInterval = 30 * time.Second
)
// GetNativePeersRequest is sent to a native to ask for its known native contacts.
type GetNativePeersRequest struct {
Exclude []string `json:"exclude"`
Count int `json:"count"`
}
// GetNativePeersResponse carries native addresses returned by a native's peer list.
type GetNativePeersResponse struct {
Peers []string `json:"peers"`
}
// GetIndexerNativesRequest is sent to an indexer to ask for its configured native addresses.
type GetIndexerNativesRequest struct {
Exclude []string `json:"exclude"`
}
// GetIndexerNativesResponse carries native addresses returned by an indexer.
type GetIndexerNativesResponse struct {
Natives []string `json:"natives"`
}
// nativeHeartbeatNudge allows replenishNativesFromPeers to trigger an immediate
// native heartbeat tick after adding a replacement native to the pool.
var nativeHeartbeatNudge = make(chan struct{}, 1)
// NudgeNativeHeartbeat signals the native heartbeat goroutine to fire immediately.
func NudgeNativeHeartbeat() {
select {
case nativeHeartbeatNudge <- struct{}{}:
default: // nudge already pending, skip
}
}
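// nativeHeartbeatLoopSketch shows how a heartbeat goroutine is expected to drain the
// nudge channel alongside its regular ticker. The tick callback and the 20s interval
// are assumptions based on the surrounding code, not the actual SendHeartbeat loop.
func nativeHeartbeatLoopSketch(ctx context.Context, tick func()) {
	t := time.NewTicker(20 * time.Second)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			tick()
		case <-nativeHeartbeatNudge:
			tick() // fire immediately after a replacement native was added to the pool
		}
	}
}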
// replenishIndexersIfNeeded checks if the indexer pool is below the configured
// minimum (or empty) and, if so, asks the native mesh for replacements.
// Called whenever a native is recovered so the indexer pool is restored.
func replenishIndexersIfNeeded(h host.Host) {
logger := oclib.GetLogger()
minIdx := conf.GetConfig().MinIndexer
if minIdx < 1 {
minIdx = 1
}
StreamMuIndexes.RLock()
indexerCount := len(StaticIndexers)
StreamMuIndexes.RUnlock()
if indexerCount < minIdx {
need := minIdx - indexerCount
logger.Info().Int("need", need).Int("current", indexerCount).Msg("[native] native recovered — replenishing indexer pool")
go replenishIndexersFromNative(h, need)
}
}
// replenishNativesFromPeers is called when the heartbeat to a native fails.
// Flow:
// 1. Ask other alive natives for one of their native contacts (ProtocolNativeGetPeers).
// 2. If none respond or return a new address, ask connected indexers (ProtocolIndexerGetNatives).
// 3. If no replacement found:
// - remaining > 1 → ignore (enough natives remain).
// - remaining ≤ 1 → start periodic retry (retryLostNative).
func replenishNativesFromPeers(h host.Host, lostAddr string, proto protocol.ID) {
if lostAddr == "" {
return
}
logger := oclib.GetLogger()
logger.Info().Str("lost", lostAddr).Msg("[native] replenish natives — start")
// Build exclude list: the lost addr + all currently alive natives.
// lostAddr has already been removed from StaticNatives by doTick.
StreamNativeMu.RLock()
remaining := len(StaticNatives)
exclude := make([]string, 0, remaining+1)
exclude = append(exclude, lostAddr)
for addr := range StaticNatives {
exclude = append(exclude, addr)
}
StreamNativeMu.RUnlock()
logger.Info().Int("remaining", remaining).Msg("[native] replenish natives — step 1: ask alive natives for a peer")
// Step 1: ask other alive natives for a replacement.
newAddr := fetchNativeFromNatives(h, exclude)
// Step 2: fallback — ask connected indexers for their native addresses.
if newAddr == "" {
logger.Info().Msg("[native] replenish natives — step 2: ask indexers for their native addresses")
newAddr = fetchNativeFromIndexers(h, exclude)
}
if newAddr != "" {
ad, err := pp.AddrInfoFromString(newAddr)
if err == nil {
StreamNativeMu.Lock()
StaticNatives[newAddr] = ad
StreamNativeMu.Unlock()
logger.Info().Str("new", newAddr).Msg("[native] replenish natives — replacement added, nudging heartbeat")
NudgeNativeHeartbeat()
replenishIndexersIfNeeded(h)
return
}
}
// Step 3: no replacement found.
logger.Warn().Int("remaining", remaining).Msg("[native] replenish natives — no replacement found")
if remaining > 1 {
logger.Info().Msg("[native] replenish natives — enough natives remain, ignoring loss")
return
}
// Last (or only) native — retry periodically.
logger.Info().Str("addr", lostAddr).Msg("[native] replenish natives — last native lost, starting periodic retry")
go retryLostNative(h, lostAddr, proto)
}
// fetchNativeFromNatives asks each alive native for one of its own native contacts
// not in exclude. Returns the first new address found or "" if none.
func fetchNativeFromNatives(h host.Host, exclude []string) string {
logger := oclib.GetLogger()
excludeSet := make(map[string]struct{}, len(exclude))
for _, e := range exclude {
excludeSet[e] = struct{}{}
}
StreamNativeMu.RLock()
natives := make([]*pp.AddrInfo, 0, len(StaticNatives))
for _, ad := range StaticNatives {
natives = append(natives, ad)
}
StreamNativeMu.RUnlock()
rand.Shuffle(len(natives), func(i, j int) { natives[i], natives[j] = natives[j], natives[i] })
for _, ad := range natives {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
if err := h.Connect(ctx, *ad); err != nil {
cancel()
logger.Warn().Str("native", ad.ID.String()).Err(err).Msg("[native] fetch native peers — connect failed")
continue
}
s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetPeers)
cancel()
if err != nil {
logger.Warn().Str("native", ad.ID.String()).Err(err).Msg("[native] fetch native peers — stream failed")
continue
}
req := GetNativePeersRequest{Exclude: exclude, Count: 1}
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
s.Close()
continue
}
var resp GetNativePeersResponse
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
s.Close()
continue
}
s.Close()
for _, peer := range resp.Peers {
if _, excluded := excludeSet[peer]; !excluded && peer != "" {
logger.Info().Str("from", ad.ID.String()).Str("new", peer).Msg("[native] fetch native peers — got replacement")
return peer
}
}
logger.Debug().Str("native", ad.ID.String()).Msg("[native] fetch native peers — no new native from this peer")
}
return ""
}
// fetchNativeFromIndexers asks connected indexers for their configured native addresses,
// returning the first one not in exclude.
func fetchNativeFromIndexers(h host.Host, exclude []string) string {
logger := oclib.GetLogger()
excludeSet := make(map[string]struct{}, len(exclude))
for _, e := range exclude {
excludeSet[e] = struct{}{}
}
StreamMuIndexes.RLock()
indexers := make([]*pp.AddrInfo, 0, len(StaticIndexers))
for _, ad := range StaticIndexers {
indexers = append(indexers, ad)
}
StreamMuIndexes.RUnlock()
rand.Shuffle(len(indexers), func(i, j int) { indexers[i], indexers[j] = indexers[j], indexers[i] })
for _, ad := range indexers {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
if err := h.Connect(ctx, *ad); err != nil {
cancel()
continue
}
s, err := h.NewStream(ctx, ad.ID, ProtocolIndexerGetNatives)
cancel()
if err != nil {
logger.Warn().Str("indexer", ad.ID.String()).Err(err).Msg("[native] fetch indexer natives — stream failed")
continue
}
req := GetIndexerNativesRequest{Exclude: exclude}
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
s.Close()
continue
}
var resp GetIndexerNativesResponse
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
s.Close()
continue
}
s.Close()
for _, nativeAddr := range resp.Natives {
if _, excluded := excludeSet[nativeAddr]; !excluded && nativeAddr != "" {
logger.Info().Str("indexer", ad.ID.String()).Str("native", nativeAddr).Msg("[native] fetch indexer natives — got native")
return nativeAddr
}
}
}
logger.Warn().Msg("[native] fetch indexer natives — no native found from indexers")
return ""
}
// retryLostNative periodically retries connecting to a lost native address until
// it becomes reachable again or was already restored by another path.
func retryLostNative(h host.Host, addr string, nativeProto protocol.ID) {
logger := oclib.GetLogger()
logger.Info().Str("addr", addr).Msg("[native] retry — periodic retry for lost native started")
t := time.NewTicker(retryNativeInterval)
defer t.Stop()
for range t.C {
StreamNativeMu.RLock()
_, alreadyRestored := StaticNatives[addr]
StreamNativeMu.RUnlock()
if alreadyRestored {
logger.Info().Str("addr", addr).Msg("[native] retry — native already restored, stopping retry")
return
}
ad, err := pp.AddrInfoFromString(addr)
if err != nil {
logger.Warn().Str("addr", addr).Msg("[native] retry — invalid addr, stopping retry")
return
}
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
err = h.Connect(ctx, *ad)
cancel()
if err != nil {
logger.Warn().Str("addr", addr).Msg("[native] retry — still unreachable")
continue
}
// Reachable again — add back to pool.
StreamNativeMu.Lock()
StaticNatives[addr] = ad
StreamNativeMu.Unlock()
logger.Info().Str("addr", addr).Msg("[native] retry — native reconnected and added back to pool")
NudgeNativeHeartbeat()
replenishIndexersIfNeeded(h)
if nativeProto == ProtocolNativeGetIndexers {
StartNativeRegistration(h, addr) // register back
}
return
}
}

View File

@@ -24,17 +24,16 @@ func ExtractIP(addr string) (net.IP, error) {
if err != nil {
return nil, err
}
	ipStr, err := ma.ValueForProtocol(multiaddr.P_IP4)
	if err != nil {
		// Not an IPv4 multiaddr: fall back to IPv6 before giving up.
		ipStr, err = ma.ValueForProtocol(multiaddr.P_IP6)
		if err != nil {
			return nil, err
		}
	}
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return nil, fmt.Errorf("invalid IP: %s", ipStr)
}
return ip, nil
}
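// extractIPExample exercises the IPv4-then-IPv6 fallback above on two documentation
// addresses (RFC 5737 / RFC 3849 values, illustrative only).
func extractIPExample() {
	for _, a := range []string{
		"/ip4/192.0.2.10/tcp/4001",
		"/ip6/2001:db8::1/tcp/4001",
	} {
		if ip, err := ExtractIP(a); err == nil {
			fmt.Println(a, "->", ip)
		}
	}
}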

View File

@@ -5,8 +5,9 @@ import (
"encoding/base64"
"encoding/json"
"errors"
"fmt"
"oc-discovery/conf"
"oc-discovery/daemons/node/common"
"strings"
"time"
oclib "cloud.o-forge.io/core/oc-lib"
@@ -18,17 +19,21 @@ import (
"github.com/libp2p/go-libp2p/core/peer"
)
type PeerRecordPayload struct {
Name string `json:"name"`
DID string `json:"did"`
PubKey []byte `json:"pub_key"`
ExpiryDate time.Time `json:"expiry_date"`
}
type PeerRecord struct {
PeerRecordPayload
PeerID string `json:"peer_id"`
APIUrl string `json:"api_url"`
StreamAddress string `json:"stream_address"`
NATSAddress string `json:"nats_address"`
WalletAddress string `json:"wallet_address"`
Signature []byte `json:"signature"`
}
func (p *PeerRecord) Sign() error {
@@ -36,13 +41,7 @@ func (p *PeerRecord) Sign() error {
if err != nil {
return err
}
payload, _ := json.Marshal(p.PeerRecordPayload)
b, err := common.Sign(priv, payload)
p.Signature = b
return err
@@ -51,19 +50,11 @@ func (p *PeerRecord) Sign() error {
func (p *PeerRecord) Verify() (crypto.PubKey, error) {
	pubKey, err := crypto.UnmarshalPublicKey(p.PubKey) // retrieve the public key carried in the message
	if err != nil {
		return pubKey, err
	}
	payload, _ := json.Marshal(p.PeerRecordPayload)
	if ok, _ := pubKey.Verify(payload, p.Signature); !ok { // verify the minimal payload was signed by this key
return pubKey, errors.New("invalid signature")
}
return pubKey, nil
@@ -114,6 +105,8 @@ func (pr *PeerRecord) ExtractPeer(ourkey string, key string, pubKey crypto.PubKe
type GetValue struct {
Key string `json:"key"`
PeerID peer.ID `json:"peer_id"`
Name string `json:"name,omitempty"`
Search bool `json:"search,omitempty"`
}
type GetResponse struct {
@@ -125,122 +118,233 @@ func (ix *IndexerService) genKey(did string) string {
return "/node/" + did
}
func (ix *IndexerService) genNameKey(name string) string {
return "/name/" + name
}
func (ix *IndexerService) genPIDKey(peerID string) string {
return "/pid/" + peerID
}
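// keyLayoutExample illustrates the three DHT namespaces used by the handlers below
// (the DID, name and PeerID values are placeholders): the signed record lives under
// /node/<did>, while /name/<name> and /pid/<peerID> are secondary indexes mapping
// back to the DID.
func (ix *IndexerService) keyLayoutExample() []string {
	return []string{
		ix.genKey("did-1234"),           // "/node/did-1234": full signed PeerRecord
		ix.genNameKey("alice-node"),     // "/name/alice-node": value is "did-1234"
		ix.genPIDKey("12D3KooWExample"), // "/pid/12D3KooWExample": value is "did-1234"
	}
}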
func (ix *IndexerService) initNodeHandler() {
logger := oclib.GetLogger()
logger.Info().Msg("Init Node Handler")
// Each heartbeat from a node carries a freshly signed PeerRecord.
// Republish it to the DHT so the record never expires as long as the node
// is alive — no separate publish stream needed from the node side.
ix.AfterHeartbeat = func(pid peer.ID) {
ctx1, cancel1 := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel1()
res, err := ix.DHT.GetValue(ctx1, ix.genPIDKey(pid.String()))
if err != nil {
			logger.Warn().Err(err).Msg("indexer: /pid lookup for heartbeat refresh failed")
return
}
did := string(res)
ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel2()
res, err = ix.DHT.GetValue(ctx2, ix.genKey(did))
if err != nil {
			logger.Warn().Err(err).Msg("indexer: /node record lookup for heartbeat refresh failed")
return
}
var rec PeerRecord
if err := json.Unmarshal(res, &rec); err != nil {
logger.Warn().Err(err).Str("peer", pid.String()).Msg("indexer: heartbeat record unmarshal failed")
return
}
if _, err := rec.Verify(); err != nil {
logger.Warn().Err(err).Str("peer", pid.String()).Msg("indexer: heartbeat record signature invalid")
return
}
data, err := json.Marshal(rec)
if err != nil {
return
}
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
logger.Info().Msg("REFRESH PutValue " + ix.genKey(rec.DID))
if err := ix.DHT.PutValue(ctx, ix.genKey(rec.DID), data); err != nil {
logger.Warn().Err(err).Str("did", rec.DID).Msg("indexer: DHT refresh failed")
return
}
if rec.Name != "" {
ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
ix.DHT.PutValue(ctx2, ix.genNameKey(rec.Name), []byte(rec.DID))
cancel2()
}
if rec.PeerID != "" {
ctx3, cancel3 := context.WithTimeout(context.Background(), 10*time.Second)
ix.DHT.PutValue(ctx3, ix.genPIDKey(rec.PeerID), []byte(rec.DID))
cancel3()
}
}
ix.Host.SetStreamHandler(common.ProtocolHeartbeat, ix.HandleHeartbeat)
ix.Host.SetStreamHandler(common.ProtocolPublish, ix.handleNodePublish)
ix.Host.SetStreamHandler(common.ProtocolGet, ix.handleNodeGet)
ix.Host.SetStreamHandler(common.ProtocolIndexerGetNatives, ix.handleGetNatives)
}
func (ix *IndexerService) handleNodePublish(s network.Stream) {
defer s.Close()
logger := oclib.GetLogger()
	var rec PeerRecord
	if err := json.NewDecoder(s).Decode(&rec); err != nil {
		logger.Err(err).Msg("indexer publish: decode record")
		return
	}
	if _, err := rec.Verify(); err != nil {
		logger.Err(err).Msg("indexer publish: signature verification failed")
		return
	}
	if rec.PeerID == "" || rec.ExpiryDate.Before(time.Now().UTC()) { // already expired
		logger.Err(errors.New(rec.PeerID + " is expired.")).Msg("indexer publish: record rejected")
		return
	}
	pid, err := peer.Decode(rec.PeerID)
	if err != nil {
		return
	}
	ix.StreamMU.Lock()
	if ix.StreamRecords[common.ProtocolHeartbeat] == nil {
		ix.StreamRecords[common.ProtocolHeartbeat] = map[peer.ID]*common.StreamRecord[PeerRecord]{}
	}
	streams := ix.StreamRecords[common.ProtocolHeartbeat]
	if srec, ok := streams[pid]; ok {
		srec.DID = rec.DID
		srec.Record = rec
		srec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
	} else {
		// Publishes are only accepted from peers that already hold a live heartbeat stream.
		ix.StreamMU.Unlock()
		logger.Err(errors.New("no heartbeat")).Msg("indexer publish: unknown peer")
		return
	}
	ix.StreamMU.Unlock()
	key := ix.genKey(rec.DID)
	data, err := json.Marshal(rec)
	if err != nil {
		logger.Err(err).Msg("indexer publish: marshal record")
		return
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	if err := ix.DHT.PutValue(ctx, key, data); err != nil {
		logger.Err(err).Msg("indexer publish: DHT put failed")
		cancel()
		return
}
cancel()
// Secondary index: /name/<name> → DID, so peers can resolve by human-readable name.
if rec.Name != "" {
ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
if err := ix.DHT.PutValue(ctx2, ix.genNameKey(rec.Name), []byte(rec.DID)); err != nil {
logger.Err(err).Str("name", rec.Name).Msg("indexer: failed to write name index")
}
cancel2()
}
// Secondary index: /pid/<peerID> → DID, so peers can resolve by libp2p PeerID.
if rec.PeerID != "" {
ctx3, cancel3 := context.WithTimeout(context.Background(), 10*time.Second)
if err := ix.DHT.PutValue(ctx3, ix.genPIDKey(rec.PeerID), []byte(rec.DID)); err != nil {
logger.Err(err).Str("pid", rec.PeerID).Msg("indexer: failed to write pid index")
}
cancel3()
}
}
func (ix *IndexerService) handleNodeGet(s network.Stream) {
defer s.Close()
logger := oclib.GetLogger()
	var req GetValue
	if err := json.NewDecoder(s).Decode(&req); err != nil {
		logger.Err(err).Msg("indexer get: decode request")
return
}
resp := GetResponse{Found: false, Records: map[string]PeerRecord{}}
keys := []string{}
	// Name lookup: substring search over the in-memory name index when Search is set, otherwise an exact-name DHT lookup.
if req.Name != "" {
if req.Search {
for _, did := range ix.LookupNameIndex(strings.ToLower(req.Name)) {
keys = append(keys, did)
}
} else {
			// DHT exact-name lookup: covers nodes that published but aren't currently connected.
nameCtx, nameCancel := context.WithTimeout(context.Background(), 5*time.Second)
if ch, err := ix.DHT.SearchValue(nameCtx, ix.genNameKey(req.Name)); err == nil {
for did := range ch {
keys = append(keys, string(did))
break
}
}
nameCancel()
}
} else if req.PeerID != "" {
pidCtx, pidCancel := context.WithTimeout(context.Background(), 5*time.Second)
if did, err := ix.DHT.GetValue(pidCtx, ix.genPIDKey(req.PeerID.String())); err == nil {
keys = append(keys, string(did))
}
pidCancel()
} else {
keys = append(keys, req.Key)
}
// DHT record fetch by DID key (covers exact-name and PeerID paths).
if len(keys) > 0 {
for _, k := range keys {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
c, err := ix.DHT.GetValue(ctx, ix.genKey(k))
cancel()
if err == nil {
var rec PeerRecord
if json.Unmarshal(c, &rec) == nil {
// Filter by PeerID only when one was explicitly specified.
if req.PeerID == "" || rec.PeerID == req.PeerID.String() {
resp.Records[rec.PeerID] = rec
}
}
} else if req.Name == "" && req.PeerID == "" {
logger.Err(err).Msg("Failed to fetch PeerRecord from DHT " + req.Key)
}
}
}
resp.Found = len(resp.Records) > 0
_ = json.NewEncoder(s).Encode(resp)
}
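// getValueRequestShapes sketches the three request forms handleNodeGet distinguishes
// (the literal values are placeholders): a direct DID key, a PeerID resolved through
// the /pid index, and a substring search over the gossip-built name index.
func getValueRequestShapes(pid peer.ID) []GetValue {
	return []GetValue{
		{Key: "did-1234"},             // exact /node/<did> fetch
		{PeerID: pid},                 // resolved via /pid/<peerID> to a DID first
		{Name: "alice", Search: true}, // in-memory name-index substring search
	}
}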
// handleGetNatives returns this indexer's configured native addresses,
// excluding any in the request's Exclude list.
func (ix *IndexerService) handleGetNatives(s network.Stream) {
defer s.Close()
logger := oclib.GetLogger()
var req common.GetIndexerNativesRequest
if err := json.NewDecoder(s).Decode(&req); err != nil {
logger.Err(err).Msg("indexer get natives: decode")
return
}
excludeSet := make(map[string]struct{}, len(req.Exclude))
for _, e := range req.Exclude {
excludeSet[e] = struct{}{}
}
resp := common.GetIndexerNativesResponse{}
for _, addr := range strings.Split(conf.GetConfig().NativeIndexerAddresses, ",") {
addr = strings.TrimSpace(addr)
if addr == "" {
continue
}
if _, excluded := excludeSet[addr]; !excluded {
resp.Natives = append(resp.Natives, addr)
}
}
if err := json.NewEncoder(s).Encode(resp); err != nil {
logger.Err(err).Msg("indexer get natives: encode response")
}
}

View File

@@ -0,0 +1,168 @@
package indexer
import (
"context"
"encoding/json"
"strings"
"sync"
"time"
"oc-discovery/daemons/node/common"
oclib "cloud.o-forge.io/core/oc-lib"
pubsub "github.com/libp2p/go-libp2p-pubsub"
pp "github.com/libp2p/go-libp2p/core/peer"
)
// TopicNameIndex is the GossipSub topic shared by regular indexers to exchange
// add/delete events for the distributed name→peerID mapping.
const TopicNameIndex = "oc-name-index"
// nameIndexDedupWindow suppresses re-emission of the same (action, name, peerID)
// tuple within this window, reducing duplicate events when a node is registered
// with multiple indexers simultaneously.
const nameIndexDedupWindow = 30 * time.Second
// NameIndexAction indicates whether a name mapping is being added or removed.
type NameIndexAction string
const (
NameIndexAdd NameIndexAction = "add"
NameIndexDelete NameIndexAction = "delete"
)
// NameIndexEvent is published on TopicNameIndex by each indexer when a node
// registers (add) or is evicted by the GC (delete).
type NameIndexEvent struct {
Action NameIndexAction `json:"action"`
Name string `json:"name"`
PeerID string `json:"peer_id"`
DID string `json:"did"`
}
// nameIndexState holds the local in-memory name index and the sender-side
// deduplication tracker.
type nameIndexState struct {
// index: name → peerID → DID, built from events received from all indexers.
index map[string]map[string]string
indexMu sync.RWMutex
// emitted tracks the last emission time for each (action, name, peerID) key
// to suppress duplicates within nameIndexDedupWindow.
emitted map[string]time.Time
emittedMu sync.Mutex
}
// shouldEmit returns true if the (action, name, peerID) tuple has not been
// emitted within nameIndexDedupWindow, updating the tracker if so.
func (s *nameIndexState) shouldEmit(action NameIndexAction, name, peerID string) bool {
key := string(action) + ":" + name + ":" + peerID
s.emittedMu.Lock()
defer s.emittedMu.Unlock()
if t, ok := s.emitted[key]; ok && time.Since(t) < nameIndexDedupWindow {
return false
}
s.emitted[key] = time.Now()
return true
}
// onEvent applies a received NameIndexEvent to the local index.
// "add" inserts/updates the mapping; "delete" removes it.
// Operations are idempotent — duplicate events from multiple indexers are harmless.
func (s *nameIndexState) onEvent(evt NameIndexEvent) {
if evt.Name == "" || evt.PeerID == "" {
return
}
s.indexMu.Lock()
defer s.indexMu.Unlock()
switch evt.Action {
case NameIndexAdd:
if s.index[evt.Name] == nil {
s.index[evt.Name] = map[string]string{}
}
s.index[evt.Name][evt.PeerID] = evt.DID
case NameIndexDelete:
if s.index[evt.Name] != nil {
delete(s.index[evt.Name], evt.PeerID)
if len(s.index[evt.Name]) == 0 {
delete(s.index, evt.Name)
}
}
}
}
// initNameIndex joins TopicNameIndex and starts consuming events.
// Must be called after ix.PS is ready.
func (ix *IndexerService) initNameIndex(ps *pubsub.PubSub) {
logger := oclib.GetLogger()
ix.nameIndex = &nameIndexState{
index: map[string]map[string]string{},
emitted: map[string]time.Time{},
}
ps.RegisterTopicValidator(TopicNameIndex, func(_ context.Context, _ pp.ID, _ *pubsub.Message) bool {
return true
})
topic, err := ps.Join(TopicNameIndex)
if err != nil {
logger.Err(err).Msg("name index: failed to join topic")
return
}
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Lock()
ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicNameIndex] = topic
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Unlock()
common.SubscribeEvents(
ix.LongLivedStreamRecordedService.LongLivedPubSubService,
context.Background(),
TopicNameIndex,
-1,
func(_ context.Context, evt NameIndexEvent, _ string) {
ix.nameIndex.onEvent(evt)
},
)
}
// publishNameEvent emits a NameIndexEvent on TopicNameIndex, subject to the
// sender-side deduplication window.
func (ix *IndexerService) publishNameEvent(action NameIndexAction, name, peerID, did string) {
if ix.nameIndex == nil || name == "" || peerID == "" {
return
}
if !ix.nameIndex.shouldEmit(action, name, peerID) {
return
}
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RLock()
topic := ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicNameIndex]
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RUnlock()
if topic == nil {
return
}
evt := NameIndexEvent{Action: action, Name: name, PeerID: peerID, DID: did}
b, err := json.Marshal(evt)
if err != nil {
return
}
_ = topic.Publish(context.Background(), b)
}
// LookupNameIndex searches the distributed name index for peers whose name
// contains needle (case-insensitive). Returns peerID → DID for matched peers.
// Returns nil if the name index is not initialised (e.g. native indexers).
func (ix *IndexerService) LookupNameIndex(needle string) map[string]string {
if ix.nameIndex == nil {
return nil
}
result := map[string]string{}
needleLow := strings.ToLower(needle)
ix.nameIndex.indexMu.RLock()
defer ix.nameIndex.indexMu.RUnlock()
for name, peers := range ix.nameIndex.index {
if strings.Contains(strings.ToLower(name), needleLow) {
for peerID, did := range peers {
result[peerID] = did
}
}
}
return result
}
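// nameIndexFlowExample walks the event handling above with placeholder values: the
// same "add" event arriving twice is absorbed idempotently, and a later "delete"
// removes the mapping again.
func nameIndexFlowExample() {
	s := &nameIndexState{index: map[string]map[string]string{}, emitted: map[string]time.Time{}}
	add := NameIndexEvent{Action: NameIndexAdd, Name: "alice-node", PeerID: "12D3KooWExample", DID: "did-1234"}
	s.onEvent(add)                                       // index["alice-node"]["12D3KooWExample"] = "did-1234"
	s.onEvent(add)                                       // idempotent: same state as after the first event
	_ = s.shouldEmit(NameIndexAdd, add.Name, add.PeerID) // true once, then false for the 30s dedup window
	s.onEvent(NameIndexEvent{Action: NameIndexDelete, Name: add.Name, PeerID: add.PeerID})
	// "alice-node" has no peers left, so the whole entry is removed from the index.
}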

View File

@@ -4,7 +4,10 @@ import (
"context"
"encoding/json"
"errors"
"fmt"
"math/rand"
"slices"
"strings"
"sync"
"time"
@@ -12,19 +15,24 @@ import (
oclib "cloud.o-forge.io/core/oc-lib"
pubsub "github.com/libp2p/go-libp2p-pubsub"
"github.com/libp2p/go-libp2p/core/host"
"github.com/libp2p/go-libp2p/core/network"
pp "github.com/libp2p/go-libp2p/core/peer"
)
const (
// IndexerTTL is the lifetime of a live-indexer cache entry. Set to 50% above
// the recommended 60s heartbeat interval so a single delayed renewal does not
// evict a healthy indexer from the native's cache.
IndexerTTL = 90 * time.Second
// offloadInterval is how often the native checks if it can release responsible peers.
offloadInterval = 30 * time.Second
// dhtRefreshInterval is how often the background goroutine queries the DHT for
// known-but-expired indexer entries (written by neighbouring natives).
dhtRefreshInterval = 30 * time.Second
// maxFallbackPeers caps how many peers the native will accept in self-delegation
// mode. Beyond this limit the native refuses to act as a fallback indexer so it
// is not overwhelmed during prolonged indexer outages.
maxFallbackPeers = 50
)
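// indexerTTLSanity restates the sizing rationale above as a check (a sketch assuming
// the recommended heartbeat interval is the 60s cited in the comment): 60s * 1.5 = 90s,
// so a single renewal may arrive up to ~30s late before a healthy indexer is evicted.
func indexerTTLSanity() bool {
	const recommendedHeartbeat = 60 * time.Second
	return IndexerTTL == recommendedHeartbeat*3/2
}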
// liveIndexerEntry tracks a registered indexer in the native's in-memory cache and DHT.
@@ -43,7 +51,7 @@ type NativeState struct {
// knownPeerIDs accumulates all indexer PeerIDs ever seen (local stream or gossip).
// Used by refreshIndexersFromDHT to re-hydrate expired entries from the shared DHT,
// including entries written by other natives.
knownPeerIDs map[string]string
knownMu sync.RWMutex
}
@@ -51,7 +59,7 @@ func newNativeState() *NativeState {
return &NativeState{
liveIndexers: map[string]*liveIndexerEntry{},
responsiblePeers: map[pp.ID]struct{}{},
knownPeerIDs: map[string]string{},
}
}
@@ -92,10 +100,12 @@ func (v IndexerRecordValidator) Select(_ string, values [][]byte) (int, error) {
// Must be called after DHT is initialized.
func (ix *IndexerService) InitNative() {
ix.Native = newNativeState()
	ix.Host.SetStreamHandler(common.ProtocolHeartbeat, ix.HandleHeartbeat) // indexers heartbeat to the native over the shared heartbeat protocol
ix.Host.SetStreamHandler(common.ProtocolNativeSubscription, ix.handleNativeSubscription)
ix.Host.SetStreamHandler(common.ProtocolNativeGetIndexers, ix.handleNativeGetIndexers)
ix.Host.SetStreamHandler(common.ProtocolNativeConsensus, ix.handleNativeConsensus)
ix.Host.SetStreamHandler(common.ProtocolNativeGetPeers, ix.handleNativeGetPeers)
ix.Host.SetStreamHandler(common.ProtocolIndexerGetNatives, ix.handleGetNatives)
ix.subscribeIndexerRegistry()
// Ensure long connections to other configured natives (native-to-native mesh).
common.EnsureNativePeers(ix.Host)
@@ -107,8 +117,15 @@ func (ix *IndexerService) InitNative() {
// registered indexer PeerIDs to one another, enabling cross-native DHT discovery.
func (ix *IndexerService) subscribeIndexerRegistry() {
logger := oclib.GetLogger()
ix.PS.RegisterTopicValidator(common.TopicIndexerRegistry, func(_ context.Context, _ pp.ID, msg *pubsub.Message) bool {
// Reject empty or syntactically invalid multiaddrs before they reach the
// message loop. A compromised native could otherwise gossip arbitrary data.
addr := string(msg.Data)
if addr == "" {
return false
}
_, err := pp.AddrInfoFromString(addr)
return err == nil
})
topic, err := ix.PS.Join(common.TopicIndexerRegistry)
if err != nil {
@@ -130,29 +147,38 @@ func (ix *IndexerService) subscribeIndexerRegistry() {
if err != nil {
return
}
addr := string(msg.Data)
if addr == "" {
continue
}
			// A neighbouring native gossiped this indexer's address; remember it for DHT refresh.
			if peer, err := pp.AddrInfoFromString(addr); err == nil {
				ix.Native.knownMu.Lock()
				ix.Native.knownPeerIDs[peer.ID.String()] = addr
				ix.Native.knownMu.Unlock()
			}
}
}()
}
// handleNativeSubscription stores an indexer's alive registration in the DHT cache.
// handleNativeSubscription stores an indexer's alive registration in the local cache
// immediately, then persists it to the DHT asynchronously.
// The stream is temporary: indexer sends one IndexerRegistration and closes.
func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
defer s.Close()
logger := oclib.GetLogger()
logger.Info().Msg("Subscription")
var reg common.IndexerRegistration
if err := json.NewDecoder(s).Decode(&reg); err != nil {
logger.Err(err).Msg("native subscription: decode")
return
}
logger.Info().Msg("Subscription " + reg.Addr)
if reg.Addr == "" {
logger.Error().Msg("native subscription: missing addr")
return
@@ -166,30 +192,23 @@ func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
reg.PeerID = ad.ID.String()
}
	// Build the entry with a fresh TTL — it must happen before the cache write so the
	// TTL window is not consumed by DHT retries.
	entry := &liveIndexerEntry{
		PeerID:    reg.PeerID,
		Addr:      reg.Addr,
		ExpiresAt: time.Now().UTC().Add(IndexerTTL),
}
// Update local cache and known set immediately so concurrent GetIndexers calls
// can already see this indexer without waiting for the DHT write to complete.
ix.Native.liveIndexersMu.Lock()
_, isRenewal := ix.Native.liveIndexers[reg.PeerID]
ix.Native.liveIndexers[reg.PeerID] = entry
ix.Native.liveIndexersMu.Unlock()
ix.Native.knownMu.Lock()
ix.Native.knownPeerIDs[reg.PeerID] = reg.Addr
ix.Native.knownMu.Unlock()
// Gossip PeerID to neighbouring natives so they discover it via DHT.
@@ -197,16 +216,46 @@ func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
topic := ix.LongLivedPubSubs[common.TopicIndexerRegistry]
ix.PubsubMu.RUnlock()
if topic != nil {
if err := topic.Publish(context.Background(), []byte(reg.Addr)); err != nil {
logger.Err(err).Msg("native subscription: registry gossip publish")
}
}
logger.Info().Str("peer", reg.PeerID).Msg("native: indexer registered")
if isRenewal {
logger.Debug().Str("peer", reg.PeerID).Msg("native: indexer TTL renewed : " + fmt.Sprintf("%v", len(ix.Native.liveIndexers)))
} else {
logger.Info().Str("peer", reg.PeerID).Msg("native: indexer registered : " + fmt.Sprintf("%v", len(ix.Native.liveIndexers)))
}
// Persist in DHT asynchronously — retries must not block the handler or consume
// the local cache TTL.
key := ix.genIndexerKey(reg.PeerID)
data, err := json.Marshal(entry)
if err != nil {
logger.Err(err).Msg("native subscription: marshal entry")
return
}
go func() {
for {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
if err := ix.DHT.PutValue(ctx, key, data); err != nil {
cancel()
logger.Err(err).Msg("native subscription: DHT put " + key)
if strings.Contains(err.Error(), "failed to find any peer in table") {
time.Sleep(10 * time.Second)
continue
}
return
}
cancel()
return
}
}()
}
// handleNativeGetIndexers returns this native's own list of reachable indexers.
// If none are available, it self-delegates (becomes the fallback indexer for the caller).
// Self-delegation (native acting as temporary fallback indexer) is only permitted
// for nodes — never for peers that are themselves registered indexers in knownPeerIDs.
// The consensus across natives is the responsibility of the requesting node/indexer.
func (ix *IndexerService) handleNativeGetIndexers(s network.Stream) {
defer s.Close()
@@ -220,14 +269,20 @@ func (ix *IndexerService) handleNativeGetIndexers(s network.Stream) {
if req.Count <= 0 {
req.Count = 3
}
callerPeerID := s.Conn().RemotePeer().String()
reachable := ix.reachableLiveIndexers(req.Count, callerPeerID)
var resp common.GetIndexersResponse
if len(reachable) == 0 {
// No live indexers reachable — try to self-delegate.
if ix.selfDelegate(s.Conn().RemotePeer(), &resp) {
logger.Info().Str("peer", callerPeerID).Msg("native: no indexers, acting as fallback for node")
} else {
// Fallback pool saturated: return empty so the caller retries another
// native instead of piling more load onto this one.
logger.Warn().Str("peer", callerPeerID).Int("pool", maxFallbackPeers).Msg(
"native: fallback pool saturated, refusing self-delegation")
}
} else {
rand.Shuffle(len(reachable), func(i, j int) { reachable[i], reachable[j] = reachable[j], reachable[i] })
if req.Count > len(reachable) {
@@ -255,7 +310,7 @@ func (ix *IndexerService) handleNativeConsensus(s network.Stream) {
return
}
myList := ix.reachableLiveIndexers(-1, s.Conn().RemotePeer().String())
mySet := make(map[string]struct{}, len(myList))
for _, addr := range myList {
mySet[addr] = struct{}{}
@@ -285,31 +340,56 @@ func (ix *IndexerService) handleNativeConsensus(s network.Stream) {
}
// selfDelegate marks the caller as a responsible peer and exposes this native's own
// address as its temporary indexer. Returns false when the fallback pool is saturated
// (maxFallbackPeers reached) — the caller must return an empty response so the node
// retries later instead of pinning indefinitely to an overloaded native.
func (ix *IndexerService) selfDelegate(remotePeer pp.ID, resp *common.GetIndexersResponse) bool {
ix.Native.responsibleMu.Lock()
defer ix.Native.responsibleMu.Unlock()
if len(ix.Native.responsiblePeers) >= maxFallbackPeers {
return false
}
ix.Native.responsiblePeers[remotePeer] = struct{}{}
resp.IsSelfFallback = true
resp.Indexers = []string{ix.Host.Addrs()[len(ix.Host.Addrs())-1].String() + "/p2p/" + ix.Host.ID().String()}
return true
}
// reachableLiveIndexers returns the multiaddrs of non-expired, pingable indexers
// from the local cache (kept fresh by refreshIndexersFromDHT in background),
// excluding any PeerIDs listed in from (typically the caller itself). When fewer
// than count candidates are live (or count < 0), known-but-expired peers are
// re-checked as well before the reachability ping.
func (ix *IndexerService) reachableLiveIndexers(count int, from ...string) []string {
ix.Native.liveIndexersMu.RLock()
now := time.Now().UTC()
candidates := []*liveIndexerEntry{}
for _, e := range ix.Native.liveIndexers {
		if e.ExpiresAt.After(now) && !slices.Contains(from, e.PeerID) {
candidates = append(candidates, e)
}
}
ix.Native.liveIndexersMu.RUnlock()
fmt.Println("midway...", candidates, from, ix.Native.knownPeerIDs)
if (count > 0 && len(candidates) < count) || count < 0 {
ix.Native.knownMu.RLock()
for k, v := range ix.Native.knownPeerIDs {
// Include peers whose liveIndexers entry is absent OR expired.
// A non-nil but expired entry means the peer was once known but
// has since timed out — PeerIsAlive below will decide if it's back.
fmt.Println("knownPeerIDs", slices.Contains(from, k), from, k)
if !slices.Contains(from, k) {
candidates = append(candidates, &liveIndexerEntry{
PeerID: k,
Addr: v,
})
}
}
ix.Native.knownMu.RUnlock()
}
fmt.Println("midway...1", candidates)
reachable := []string{}
for _, e := range candidates {
ad, err := pp.AddrInfoFromString(e.Addr)
@@ -371,6 +451,12 @@ func (ix *IndexerService) refreshIndexersFromDHT() {
ix.Native.liveIndexers[best.PeerID] = best
ix.Native.liveIndexersMu.Unlock()
logger.Info().Str("peer", best.PeerID).Msg("native: refreshed indexer from DHT")
} else {
// DHT has no fresh entry — peer is gone, prune from known set.
ix.Native.knownMu.Lock()
delete(ix.Native.knownPeerIDs, pid)
ix.Native.knownMu.Unlock()
logger.Info().Str("peer", pid).Msg("native: pruned stale peer from knownPeerIDs")
}
}
}
@@ -387,30 +473,107 @@ func (ix *IndexerService) runOffloadLoop() {
defer t.Stop()
logger := oclib.GetLogger()
for range t.C {
fmt.Println("runOffloadLoop", ix.Native.responsiblePeers)
ix.Native.responsibleMu.RLock()
count := len(ix.Native.responsiblePeers)
ix.Native.responsibleMu.RUnlock()
if count == 0 {
continue
}
ix.Native.responsibleMu.RLock()
peerIDS := []string{}
for p := range ix.Native.responsiblePeers {
peerIDS = append(peerIDS, p.String())
}
fmt.Println("COUNT --> ", count, len(ix.reachableLiveIndexers(-1, peerIDS...)))
ix.Native.responsibleMu.RUnlock()
if len(ix.reachableLiveIndexers(-1, peerIDS...)) > 0 {
			ix.Native.responsibleMu.RLock()
			released := make(map[pp.ID]struct{}, len(ix.Native.responsiblePeers))
			for p := range ix.Native.responsiblePeers {
				released[p] = struct{}{}
			}
			ix.Native.responsibleMu.RUnlock()
// Reset (not Close) heartbeat streams of released peers.
// Close() only half-closes the native's write direction — the peer's write
// direction stays open and sendHeartbeat never sees an error.
// Reset() abruptly terminates both directions, making the peer's next
// json.Encode return an error which triggers replenishIndexersFromNative.
ix.StreamMU.Lock()
if streams := ix.StreamRecords[common.ProtocolHeartbeat]; streams != nil {
for pid := range released {
if rec, ok := streams[pid]; ok {
if rec.HeartbeatStream != nil && rec.HeartbeatStream.Stream != nil {
rec.HeartbeatStream.Stream.Reset()
}
ix.Native.responsibleMu.Lock()
delete(ix.Native.responsiblePeers, pid)
ix.Native.responsibleMu.Unlock()
delete(streams, pid)
logger.Info().Str("peer", pid.String()).Str("proto", string(common.ProtocolHeartbeat)).Msg(
"native: offload — stream reset, peer will reconnect to real indexer")
} else {
// No recorded heartbeat stream for this peer: either it never
// passed the score check (new peer, uptime=0 → score<75) or the
// stream was GC'd. We cannot send a Reset signal, so close the
// whole connection instead — this makes the peer's sendHeartbeat
// return an error, which triggers replenishIndexersFromNative and
// migrates it to a real indexer.
ix.Native.responsibleMu.Lock()
delete(ix.Native.responsiblePeers, pid)
ix.Native.responsibleMu.Unlock()
go ix.Host.Network().ClosePeer(pid)
logger.Info().Str("peer", pid.String()).Msg(
"native: offload — no heartbeat stream, closing connection so peer re-requests real indexers")
}
}
}
ix.StreamMU.Unlock()
logger.Info().Int("released", count).Msg("native: offloaded responsible peers to real indexers")
}
}
}
// handleNativeGetPeers returns a random selection of this native's known native
// contacts, excluding any in the request's Exclude list.
func (ix *IndexerService) handleNativeGetPeers(s network.Stream) {
defer s.Close()
logger := oclib.GetLogger()
var req common.GetNativePeersRequest
if err := json.NewDecoder(s).Decode(&req); err != nil {
logger.Err(err).Msg("native get peers: decode")
return
}
if req.Count <= 0 {
req.Count = 1
}
excludeSet := make(map[string]struct{}, len(req.Exclude))
for _, e := range req.Exclude {
excludeSet[e] = struct{}{}
}
common.StreamNativeMu.RLock()
candidates := make([]string, 0, len(common.StaticNatives))
for addr := range common.StaticNatives {
if _, excluded := excludeSet[addr]; !excluded {
candidates = append(candidates, addr)
}
}
common.StreamNativeMu.RUnlock()
rand.Shuffle(len(candidates), func(i, j int) { candidates[i], candidates[j] = candidates[j], candidates[i] })
if req.Count > len(candidates) {
req.Count = len(candidates)
}
resp := common.GetNativePeersResponse{Peers: candidates[:req.Count]}
if err := json.NewEncoder(s).Encode(resp); err != nil {
logger.Err(err).Msg("native get peers: encode response")
}
}
// StartNativeRegistration starts a goroutine that periodically registers this
// indexer with all configured native indexers (every RecommendedHeartbeatInterval).
func StartNativeRegistration(h host.Host, nativeAddressesStr string) {
go func() {
common.RegisterWithNative(h, nativeAddressesStr)
t := time.NewTicker(common.RecommendedHeartbeatInterval)
defer t.Stop()
for range t.C {
common.RegisterWithNative(h, nativeAddressesStr)
}
}()
}

View File

@@ -11,6 +11,7 @@ import (
pubsub "github.com/libp2p/go-libp2p-pubsub"
record "github.com/libp2p/go-libp2p-record"
"github.com/libp2p/go-libp2p/core/host"
pp "github.com/libp2p/go-libp2p/core/peer"
)
// IndexerService manages the indexer node's state: stream records, DHT, pubsub.
@@ -22,6 +23,7 @@ type IndexerService struct {
mu sync.RWMutex
IsNative bool
Native *NativeState // non-nil when IsNative == true
nameIndex *nameIndexState
}
// NewIndexerService creates an IndexerService.
@@ -43,22 +45,34 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
}
ix.PS = ps
	if ix.isStrictIndexer && !isNative {
		logger.Info().Msg("connect to indexers as strict indexer...")
		common.ConnectToIndexers(h, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer, ix.Host.ID())
		logger.Info().Msg("subscribe to decentralized search flow as strict indexer...")
go ix.SubscribeToSearch(ix.PS, nil)
}
if !isNative {
logger.Info().Msg("init distributed name index...")
ix.initNameIndex(ps)
ix.LongLivedStreamRecordedService.AfterDelete = func(pid pp.ID, name, did string) {
ix.publishNameEvent(NameIndexDelete, name, pid.String(), did)
}
}
if ix.DHT, err = dht.New(
context.Background(),
ix.Host,
dht.Mode(dht.ModeServer),
dht.ProtocolPrefix("oc"), // 🔥 réseau privé
dht.Validator(record.NamespacedValidator{
"node": PeerRecordValidator{},
"indexer": IndexerRecordValidator{}, // for native indexer registry
"name": DefaultValidator{},
"pid": DefaultValidator{},
}),
); err != nil {
logger.Info().Msg(err.Error())
return nil
}
@@ -67,11 +81,10 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
ix.InitNative()
} else {
ix.initNodeHandler()
// Register with configured natives so this indexer appears in their cache
if nativeAddrs := conf.GetConfig().NativeIndexerAddresses; nativeAddrs != "" {
common.StartNativeRegistration(ix.Host, nativeAddrs)
}
}
return ix
}
@@ -79,6 +92,9 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
func (ix *IndexerService) Close() {
ix.DHT.Close()
ix.PS.UnregisterTopicValidator(common.TopicPubSubSearch)
if ix.nameIndex != nil {
ix.PS.UnregisterTopicValidator(TopicNameIndex)
}
for _, s := range ix.StreamRecords {
for _, ss := range s {
ss.HeartbeatStream.Stream.Close()

View File

@@ -6,6 +6,16 @@ import (
"time"
)
type DefaultValidator struct{}
func (v DefaultValidator) Validate(key string, value []byte) error {
return nil
}
func (v DefaultValidator) Select(key string, values [][]byte) (int, error) {
return 0, nil
}
type PeerRecordValidator struct{}
func (v PeerRecordValidator) Validate(key string, value []byte) error {
@@ -26,14 +36,7 @@ func (v PeerRecordValidator) Validate(key string, value []byte) error {
}
// Signature verification
if _, err := rec.Verify(); err != nil {
return errors.New("invalid signature")
}

View File

@@ -96,6 +96,7 @@ func ListenNATS(n *Node) {
},
tools.PROPALGATION_EVENT: func(resp tools.NATSResponse) {
fmt.Println("PROPALGATION")
if resp.FromApp == config.GetAppName() {
return
}
@@ -106,10 +107,10 @@ func ListenNATS(n *Node) {
dtt := tools.DataType(propalgation.DataType)
dt = &dtt
}
fmt.Println("PROPALGATION ACT", propalgation.Action, propalgation.Action == tools.PB_CREATE, err)
if err == nil {
switch propalgation.Action {
case tools.PB_ADMIRALTY_CONFIG, tools.PB_MINIO_CONFIG:
var m configPayload
var proto protocol.ID = stream.ProtocolAdmiraltyConfigResource
if propalgation.Action == tools.PB_MINIO_CONFIG {
@@ -122,20 +123,17 @@ func ListenNATS(n *Node) {
p.PeerID, proto, resp.Payload)
}
}
case tools.PB_CREATE, tools.PB_UPDATE, tools.PB_DELETE:
fmt.Println(propalgation.Action, dt, resp.User, propalgation.Payload)
fmt.Println(n.StreamService.ToPartnerPublishEvent(
context.Background(),
propalgation.Action,
dt, resp.User,
propalgation.Payload,
))
case tools.PB_CONSIDERS:
switch resp.Datatype {
case tools.BOOKING, tools.PURCHASE_RESOURCE, tools.WORKFLOW_EXECUTION:
var m executionConsidersPayload
if err := json.Unmarshal(resp.Payload, &m); err == nil {
for _, p := range m.PeerIDs {

View File

@@ -2,10 +2,10 @@ package node
import (
"context"
"crypto/sha256"
"encoding/json"
"errors"
"fmt"
"maps"
"oc-discovery/conf"
"oc-discovery/daemons/node/common"
"oc-discovery/daemons/node/indexer"
@@ -15,6 +15,7 @@ import (
"time"
oclib "cloud.o-forge.io/core/oc-lib"
"cloud.o-forge.io/core/oc-lib/dbs"
"cloud.o-forge.io/core/oc-lib/models/peer"
"cloud.o-forge.io/core/oc-lib/tools"
"github.com/google/uuid"
@@ -33,6 +34,7 @@ type Node struct {
StreamService *stream.StreamService
PeerID pp.ID
isIndexer bool
peerRecord *indexer.PeerRecord
Mu sync.RWMutex
}
@@ -69,6 +71,9 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
isIndexer: isIndexer,
LongLivedStreamRecordedService: common.NewStreamRecordedService[interface{}](h, 1000),
}
// Register the bandwidth probe handler so any peer measuring this node's
// throughput can open a dedicated probe stream and read the echo.
h.SetStreamHandler(common.ProtocolBandwidthProbe, common.HandleBandwidthProbe)
var ps *pubsubs.PubSub
if isNode {
logger.Info().Msg("generate opencloud node...")
@@ -77,8 +82,30 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
panic(err) // can't run your node without a propalgation pubsub, of state of node.
}
node.PS = ps
// buildRecord returns a fresh signed PeerRecord as JSON, embedded in each
// heartbeat so the receiving indexer can republish it to the DHT directly.
// peerRecord is nil until claimInfo runs, so the first ~20s heartbeats carry
// no record — that's fine, claimInfo publishes once synchronously at startup.
buildRecord := func() json.RawMessage {
if node.peerRecord == nil {
return nil
}
priv, err := tools.LoadKeyFromFilePrivate()
if err != nil {
return nil
}
fresh := *node.peerRecord
fresh.PeerRecordPayload.ExpiryDate = time.Now().UTC().Add(2 * time.Minute)
payload, _ := json.Marshal(fresh.PeerRecordPayload)
fresh.Signature, err = priv.Sign(payload)
if err != nil {
return nil
}
b, _ := json.Marshal(fresh)
return json.RawMessage(b)
}
logger.Info().Msg("connect to indexers...")
common.ConnectToIndexers(node.Host, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer, node.PeerID, buildRecord)
logger.Info().Msg("claims my node...")
if _, err := node.claimInfo(conf.GetConfig().Name, conf.GetConfig().Hostname); err != nil {
panic(err)
@@ -100,14 +127,14 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
}
}
node.SubscribeToSearch(node.PS, &f)
logger.Info().Msg("connect to NATS")
go ListenNATS(node)
logger.Info().Msg("Node is actually running.")
}
if isIndexer {
logger.Info().Msg("generate opencloud indexer...")
node.IndexerService = indexer.NewIndexerService(node.Host, ps, 5, isNativeIndexer)
node.IndexerService = indexer.NewIndexerService(node.Host, ps, 500, isNativeIndexer)
}
logger.Info().Msg("connect to NATS")
ListenNATS(node)
logger.Info().Msg("Node is actually running.")
return node, nil
}
@@ -127,24 +154,29 @@ func (d *Node) publishPeerRecord(
if err != nil {
return err
}
common.StreamMuIndexes.RLock()
indexerSnapshot := make([]*pp.AddrInfo, 0, len(common.StaticIndexers))
for _, ad := range common.StaticIndexers {
indexerSnapshot = append(indexerSnapshot, ad)
}
common.StreamMuIndexes.RUnlock()
for _, ad := range indexerSnapshot {
var err error
if common.StreamIndexers, err = common.TempStream(d.Host, *ad, common.ProtocolPublish, "", common.StreamIndexers, map[protocol.ID]*common.ProtocolInfo{},
&common.StreamMuIndexes); err != nil {
continue
}
stream := common.StreamIndexers[common.ProtocolPublish][ad.ID]
base := indexer.PeerRecord{
base := indexer.PeerRecordPayload{
Name: rec.Name,
DID: rec.DID,
PubKey: rec.PubKey,
ExpiryDate: time.Now().UTC().Add(2 * time.Minute),
}
payload, _ := json.Marshal(base)
hash := sha256.Sum256(payload)
rec.ExpiryDate = base.ExpiryDate
rec.Signature, err = priv.Sign(hash[:])
rec.PeerRecordPayload = base
rec.Signature, err = priv.Sign(payload)
if err := json.NewEncoder(stream.Stream).Encode(&rec); err != nil { // then publish on stream
return err
}
@@ -156,38 +188,50 @@ func (d *Node) GetPeerRecord(
ctx context.Context,
pidOrdid string,
) ([]*peer.Peer, error) {
did := pidOrdid // if known pidOrdid is did
pid := pidOrdid // if not known pidOrdid is pid
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
if data := access.Search(nil, did, true); len(data.Data) > 0 {
did = data.Data[0].GetID()
pid = data.Data[0].(*peer.Peer).PeerID
}
var err error
var info map[string]indexer.PeerRecord
common.StreamMuIndexes.RLock()
indexerSnapshot2 := make([]*pp.AddrInfo, 0, len(common.StaticIndexers))
for _, ad := range common.StaticIndexers {
indexerSnapshot2 = append(indexerSnapshot2, ad)
}
common.StreamMuIndexes.RUnlock()
// Build the GetValue request: if pidOrdid is neither a UUID DID nor a libp2p
// PeerID, treat it as a human-readable name and let the indexer resolve it.
getReq := indexer.GetValue{Key: pidOrdid}
isNameSearch := false
if pidR, pidErr := pp.Decode(pidOrdid); pidErr == nil {
getReq.PeerID = pidR
} else if _, uuidErr := uuid.Parse(pidOrdid); uuidErr != nil {
// Not a UUID DID → treat pidOrdid as a name substring search.
getReq.Name = pidOrdid
getReq.Key = ""
isNameSearch = true
}
for _, ad := range indexerSnapshot2 {
if common.StreamIndexers, err = common.TempStream(d.Host, *ad, common.ProtocolGet, "",
common.StreamIndexers, map[protocol.ID]*common.ProtocolInfo{}, &common.StreamMuIndexes); err != nil {
continue
}
pidR, err := pp.Decode(pid)
if err != nil {
stream := common.StreamIndexers[common.ProtocolGet][ad.ID]
if err := json.NewEncoder(stream.Stream).Encode(getReq); err != nil {
continue
}
stream := common.StreamIndexers[common.ProtocolGet][ad.ID]
if err := json.NewEncoder(stream.Stream).Encode(indexer.GetValue{
Key: did,
PeerID: pidR,
}); err != nil {
return nil, err
var resp indexer.GetResponse
if err := json.NewDecoder(stream.Stream).Decode(&resp); err != nil {
continue
}
for {
var resp indexer.GetResponse
if err := json.NewDecoder(stream.Stream).Decode(&resp); err != nil {
return nil, err
}
if resp.Found {
if resp.Found {
if info == nil {
info = resp.Records
} else {
// Aggregate results from all indexers for name searches.
maps.Copy(info, resp.Records)
}
// For exact lookups (PeerID / DID) stop at the first hit.
if !isNameSearch {
break
}
}
@@ -196,7 +240,7 @@ func (d *Node) GetPeerRecord(
for _, pr := range info {
if pk, err := pr.Verify(); err != nil {
return nil, err
} else if ok, p, err := pr.ExtractPeer(d.PeerID.String(), did, pk); err != nil {
} else if ok, p, err := pr.ExtractPeer(d.PeerID.String(), pr.PeerID, pk); err != nil {
return nil, err
} else {
if ok {
@@ -218,7 +262,11 @@ func (d *Node) claimInfo(
}
did := uuid.New().String()
peers := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil).Search(nil, fmt.Sprintf("%v", peer.SELF), false)
peers := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil).Search(&dbs.Filters{
And: map[string][]dbs.Filter{ // search by name if no filters are provided
"peer_id": {{Operator: dbs.EQUAL.String(), Value: d.Host.ID().String()}},
},
}, "", false)
if len(peers.Data) > 0 {
did = peers.Data[0].GetID() // if already existing set up did as made
}
@@ -238,39 +286,38 @@ func (d *Node) claimInfo(
now := time.Now().UTC()
expiry := now.Add(150 * time.Second)
rec := &indexer.PeerRecord{
Name: name,
DID: did, // REAL PEER ID
PubKey: pubBytes,
pRec := indexer.PeerRecordPayload{
Name: name,
DID: did, // REAL PEER ID
PubKey: pubBytes,
ExpiryDate: expiry,
}
rec.PeerID = d.Host.ID().String()
d.PeerID = d.Host.ID()
payload, _ := json.Marshal(pRec)
payload, _ := json.Marshal(rec)
hash := sha256.Sum256(payload)
rec.Signature, err = priv.Sign(hash[:])
rec := &indexer.PeerRecord{
PeerRecordPayload: pRec,
}
rec.Signature, err = priv.Sign(payload)
if err != nil {
return nil, err
}
rec.PeerID = d.Host.ID().String()
rec.APIUrl = endPoint
rec.StreamAddress = "/ip4/" + conf.GetConfig().Hostname + "/tcp/" + fmt.Sprintf("%v", conf.GetConfig().NodeEndpointPort) + "/p2p/" + rec.PeerID
rec.NATSAddress = oclib.GetConfig().NATSUrl
rec.WalletAddress = "my-wallet"
rec.ExpiryDate = expiry
if err := d.publishPeerRecord(rec); err != nil {
return nil, err
}
/*if pk, err := rec.Verify(); err != nil {
fmt.Println("Verify")
d.peerRecord = rec
if _, err := rec.Verify(); err != nil {
return nil, err
} else {*/
_, p, err := rec.ExtractPeer(did, did, pub)
return p, err
//}
} else {
_, p, err := rec.ExtractPeer(did, did, pub)
return p, err
}
}
/*
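With this change the record's signature covers the marshaled `PeerRecordPayload` directly rather than a SHA-256 digest of it, and the payload now carries its own `ExpiryDate`, so an indexer can verify a record with nothing more than the payload bytes and the embedded `PubKey`. A minimal sketch of that sign/verify round-trip, assuming Ed25519 keys (the repo loads its keys through `tools.LoadKeyFromFilePrivate`; the struct below only mirrors the fields visible in the diff):

```go
package main

import (
	"crypto/ed25519"
	"encoding/json"
	"fmt"
	"time"
)

// Mirrors the payload fields used in publishPeerRecord/claimInfo; the real
// type lives in oc-discovery/daemons/node/indexer.
type PeerRecordPayload struct {
	Name       string
	DID        string
	PubKey     []byte
	ExpiryDate time.Time
}

func main() {
	pub, priv, _ := ed25519.GenerateKey(nil) // nil reader → crypto/rand

	payload := PeerRecordPayload{
		Name:       "node-demo",
		DID:        "example-did", // a UUID in the real code
		PubKey:     pub,
		ExpiryDate: time.Now().UTC().Add(2 * time.Minute),
	}

	// Node side: sign the marshaled payload itself, no intermediate digest.
	b, _ := json.Marshal(payload)
	sig := ed25519.Sign(priv, b)

	// Indexer side: check the signature over the same bytes with the embedded
	// public key, and reject records whose expiry has already passed.
	valid := ed25519.Verify(ed25519.PublicKey(payload.PubKey), b, sig) &&
		time.Now().UTC().Before(payload.ExpiryDate)
	fmt.Println("record valid:", valid)
}
```

One caveat of signing the JSON encoding directly: verification has to run over the exact bytes that were signed, so a verifier that re-marshals the payload instead of keeping the received bytes depends on stable field order and formatting.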


@@ -4,47 +4,56 @@ import (
"context"
"encoding/json"
"errors"
"oc-discovery/daemons/node/stream"
"oc-discovery/models"
oclib "cloud.o-forge.io/core/oc-lib"
"cloud.o-forge.io/core/oc-lib/dbs"
"cloud.o-forge.io/core/oc-lib/models/peer"
"cloud.o-forge.io/core/oc-lib/tools"
)
func (ps *PubSubService) SearchPublishEvent(
ctx context.Context, dt *tools.DataType, typ string, user string, search string) error {
b, err := json.Marshal(map[string]string{"search": search})
if err != nil {
return err
}
switch typ {
case "known": // define Search Strategy
return ps.StreamService.SearchKnownPublishEvent(dt, user, search) //if partners focus only them*/
return ps.StreamService.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"": {{Operator: dbs.NOT.String(), Value: dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.BLACKLIST}},
},
}}},
},
}, b, stream.ProtocolSearchResource) //if partners focus only them*/
case "partner": // define Search Strategy
return ps.StreamService.SearchPartnersPublishEvent(dt, user, search) //if partners focus only them*/
return ps.StreamService.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.PARTNER}},
},
}, b, stream.ProtocolSearchResource)
case "all": // Gossip PubSub
b, err := json.Marshal(map[string]string{"search": search})
if err != nil {
return err
}
return ps.searchPublishEvent(ctx, dt, user, b)
return ps.publishEvent(ctx, dt, tools.PB_SEARCH, user, b)
default:
return errors.New("no type of research found")
}
}
func (ps *PubSubService) searchPublishEvent(
ctx context.Context, dt *tools.DataType, user string, payload []byte) error {
return ps.publishEvent(ctx, dt, tools.PB_SEARCH, user, payload)
}
func (ps *PubSubService) publishEvent(
ctx context.Context, dt *tools.DataType, action tools.PubSubAction, user string, payload []byte,
) error {
from, err := oclib.GenerateNodeID()
if err != nil {
return err
}
priv, err := tools.LoadKeyFromFilePrivate()
if err != nil {
return err
}
msg, _ := json.Marshal(models.NewEvent(action.String(), from, dt, user, payload, priv))
msg, _ := json.Marshal(models.NewEvent(action.String(), ps.Host.ID().String(), dt, user, payload, priv))
topic, err := ps.PS.Join(action.String())
if err != nil {
return err
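After this refactor the `typ` argument alone selects the strategy: `"known"` fans the query out over every locally known peer that is not blacklisted, `"partner"` restricts it to peers whose relation is `peer.PARTNER` (both through `PublishesCommon` on `stream.ProtocolSearchResource`), and `"all"` signs the query and gossips it on the `PB_SEARCH` pubsub topic. A hedged usage example, with illustrative argument values: `ps.SearchPublishEvent(ctx, &dt, "partner", "user-id", "gpu-cluster")`.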


@@ -5,6 +5,7 @@ import (
"crypto/subtle"
"encoding/json"
"errors"
"fmt"
"oc-discovery/daemons/node/common"
oclib "cloud.o-forge.io/core/oc-lib"
@@ -19,6 +20,7 @@ type Verify struct {
}
func (ps *StreamService) handleEvent(protocol string, evt *common.Event) error {
fmt.Println("handleEvent")
ps.handleEventFromPartner(evt, protocol)
/*if protocol == ProtocolVerifyResource {
if evt.DataType == -1 {
@@ -148,14 +150,6 @@ func (abs *StreamService) pass(event *common.Event, action tools.PubSubAction) e
}
func (ps *StreamService) handleEventFromPartner(evt *common.Event, protocol string) error {
resource, err := resources.ToResource(int(evt.DataType), evt.Payload)
if err != nil {
return err
}
b, err := json.Marshal(resource)
if err != nil {
return err
}
switch protocol {
case ProtocolSearchResource:
if evt.DataType < 0 {
@@ -169,20 +163,20 @@ func (ps *StreamService) handleEventFromPartner(evt *common.Event, protocol stri
ps.SendResponse(p[0], evt)
}
}
case ProtocolCreateResource:
case ProtocolUpdateResource:
case ProtocolCreateResource, ProtocolUpdateResource:
fmt.Println("RECEIVED Protocol.Update")
go tools.NewNATSCaller().SetNATSPub(tools.CREATE_RESOURCE, tools.NATSResponse{
FromApp: "oc-discovery",
Datatype: tools.DataType(evt.DataType),
Method: int(tools.CREATE_RESOURCE),
Payload: b,
Payload: evt.Payload,
})
case ProtocolDeleteResource:
go tools.NewNATSCaller().SetNATSPub(tools.REMOVE_RESOURCE, tools.NATSResponse{
FromApp: "oc-discovery",
Datatype: tools.DataType(evt.DataType),
Method: int(tools.REMOVE_RESOURCE),
Payload: b,
Payload: evt.Payload,
})
default:
return errors.New("no action authorized available : " + protocol)
@@ -213,9 +207,9 @@ func (abs *StreamService) SendResponse(p *peer.Peer, event *common.Event) error
if j, err := json.Marshal(ss); err == nil {
if event.DataType != -1 {
ndt := tools.DataType(dt.EnumIndex())
abs.PublishResources(&ndt, event.User, peerID, j)
abs.PublishCommon(&ndt, event.User, peerID, ProtocolSearchResource, j)
} else {
abs.PublishResources(nil, event.User, peerID, j)
abs.PublishCommon(nil, event.User, peerID, ProtocolSearchResource, j)
}
}
}


@@ -15,81 +15,45 @@ import (
"github.com/libp2p/go-libp2p/core/protocol"
)
func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, toPeerID string, proto protocol.ID, resource []byte) (*common.Stream, error) {
func (ps *StreamService) PublishesCommon(dt *tools.DataType, user string, filter *dbs.Filters, resource []byte, protos ...protocol.ID) error {
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
p := access.LoadOne(toPeerID)
if p.Err != "" {
return nil, errors.New(p.Err)
} else {
ad, err := pp.AddrInfoFromString(p.Data.(*peer.Peer).StreamAddress)
p := access.Search(filter, "", false)
for _, pes := range p.Data {
for _, proto := range protos {
if _, err := ps.PublishCommon(dt, user, pes.(*peer.Peer).PeerID, proto, resource); err != nil {
return err
}
}
}
return nil
}
func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, toPeerID string, proto protocol.ID, resource []byte) (*common.Stream, error) {
fmt.Println("PublishCommon")
if toPeerID == ps.Key.String() {
return nil, errors.New("Can't send to ourself !")
}
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
p := access.Search(&dbs.Filters{
And: map[string][]dbs.Filter{ // search by name if no filters are provided
"peer_id": {{Operator: dbs.EQUAL.String(), Value: toPeerID}},
},
}, toPeerID, false)
var pe *peer.Peer
if len(p.Data) > 0 && p.Data[0].(*peer.Peer).Relation != peer.BLACKLIST {
pe = p.Data[0].(*peer.Peer)
} else if pps, err := ps.Node.GetPeerRecord(context.Background(), toPeerID); err == nil && len(pps) > 0 {
pe = pps[0]
}
if pe != nil {
ad, err := pp.AddrInfoFromString(pe.StreamAddress) // pe may come from the local store or from GetPeerRecord
if err != nil {
return nil, err
}
return ps.write(toPeerID, ad, dt, user, resource, proto)
}
}
func (ps *StreamService) PublishResources(dt *tools.DataType, user string, toPeerID string, resource []byte) error {
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
p := access.LoadOne(toPeerID)
if p.Err != "" {
return errors.New(p.Err)
} else {
ad, err := pp.AddrInfoFromString(p.Data.(*peer.Peer).StreamAddress)
if err != nil {
return err
}
ps.write(toPeerID, ad, dt, user, resource, ProtocolSearchResource)
}
return nil
}
func (ps *StreamService) SearchKnownPublishEvent(dt *tools.DataType, user string, search string) error {
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
peers := access.Search(&dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"": {{Operator: dbs.NOT.String(), Value: dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.BLACKLIST}},
},
}}},
},
}, search, false)
if peers.Err != "" {
return errors.New(peers.Err)
} else {
b, err := json.Marshal(map[string]string{"search": search})
if err != nil {
return err
}
for _, p := range peers.Data {
ad, err := pp.AddrInfoFromString(p.(*peer.Peer).StreamAddress)
if err != nil {
continue
}
ps.write(p.GetID(), ad, dt, user, b, ProtocolSearchResource)
}
}
return nil
}
func (ps *StreamService) SearchPartnersPublishEvent(dt *tools.DataType, user string, search string) error {
if peers, err := ps.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex())); err != nil {
return err
} else {
b, err := json.Marshal(map[string]string{"search": search})
if err != nil {
return err
}
for _, p := range peers {
ad, err := pp.AddrInfoFromString(p.StreamAddress)
if err != nil {
continue
}
ps.write(p.GetID(), ad, dt, user, b, ProtocolSearchResource)
}
}
return nil
return nil, errors.New("peer unvalid " + toPeerID)
}
func (ps *StreamService) ToPartnerPublishEvent(
@@ -103,35 +67,44 @@ func (ps *StreamService) ToPartnerPublishEvent(
if err != nil {
return err
}
ps.Mu.Lock()
defer ps.Mu.Unlock()
if p.Relation == peer.PARTNER {
if ps.Streams[ProtocolHeartbeatPartner] == nil {
ps.Streams[ProtocolHeartbeatPartner] = map[pp.ID]*common.Stream{}
}
ps.ConnectToPartner(p.StreamAddress)
} else if ps.Streams[ProtocolHeartbeatPartner] != nil && ps.Streams[ProtocolHeartbeatPartner][pid] != nil {
for _, pids := range ps.Streams {
if pids[pid] != nil {
delete(pids, pid)
if pe, err := oclib.GetMySelf(); err != nil {
return err
} else if pe.GetID() == p.GetID() {
return fmt.Errorf("can't send to ourself")
} else {
pe.Relation = p.Relation
pe.Verify = false
if b2, err := json.Marshal(pe); err == nil {
if _, err := ps.PublishCommon(dt, user, p.PeerID, ProtocolUpdateResource, b2); err != nil {
return err
}
if p.Relation == peer.PARTNER {
if ps.Streams[ProtocolHeartbeatPartner] == nil {
ps.Streams[ProtocolHeartbeatPartner] = map[pp.ID]*common.Stream{}
}
fmt.Println("SHOULD CONNECT")
ps.ConnectToPartner(p.StreamAddress)
} else if ps.Streams[ProtocolHeartbeatPartner] != nil && ps.Streams[ProtocolHeartbeatPartner][pid] != nil {
for _, pids := range ps.Streams {
if pids[pid] != nil {
delete(pids, pid)
}
}
}
}
}
return nil
}
if peers, err := ps.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex())); err != nil {
return err
} else {
for _, p := range peers {
for protocol := range protocolsPartners {
ad, err := pp.AddrInfoFromString(p.StreamAddress)
if err != nil {
continue
}
ps.write(p.GetID(), ad, dt, user, payload, protocol)
}
}
ks := []protocol.ID{}
for k := range protocolsPartners {
ks = append(ks, k)
}
ps.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.PARTNER}},
},
}, payload, ks...)
return nil
}
@@ -158,6 +131,7 @@ func (s *StreamService) write(
}
stream := s.Streams[proto][peerID.ID]
evt := common.NewEvent(string(proto), peerID.ID.String(), dt, user, payload)
fmt.Println("SEND EVENT ", evt.From, evt.DataType, evt.Timestamp)
if err := json.NewEncoder(stream.Stream).Encode(evt); err != nil {
stream.Stream.Close()
logger.Err(err)


@@ -116,7 +116,7 @@ func (s *StreamService) HandlePartnerHeartbeat(stream network.Stream) {
streamsAnonym[k] = v
}
s.Mu.Unlock()
pid, hb, err := common.CheckHeartbeat(s.Host, stream, streamsAnonym, &s.Mu, s.maxNodesConn)
pid, hb, err := common.CheckHeartbeat(s.Host, stream, json.NewDecoder(stream), streamsAnonym, &s.Mu, s.maxNodesConn)
if err != nil {
return
}
@@ -132,10 +132,12 @@ func (s *StreamService) HandlePartnerHeartbeat(stream network.Stream) {
s.ConnectToPartner(val)
}
}
go s.StartGC(30 * time.Second)
// GC is already running via InitStream — starting a new ticker goroutine on
// every heartbeat would leak an unbounded number of goroutines.
}
func (s *StreamService) connectToPartners() error {
logger := oclib.GetLogger()
for proto, info := range protocolsPartners {
f := func(ss network.Stream) {
if s.Streams[proto] == nil {
@@ -147,11 +149,12 @@ func (s *StreamService) connectToPartners() error {
}
go s.readLoop(s.Streams[proto][ss.Conn().RemotePeer()], ss.Conn().RemotePeer(), proto, info)
}
fmt.Println("SetStreamHandler", proto)
logger.Info().Msg("SetStreamHandler " + string(proto))
s.Host.SetStreamHandler(proto, f)
}
peers, err := s.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex()))
if err != nil {
logger.Err(err)
return err
}
for _, p := range peers {
@@ -161,19 +164,19 @@ func (s *StreamService) connectToPartners() error {
}
func (s *StreamService) ConnectToPartner(address string) {
logger := oclib.GetLogger()
if ad, err := pp.AddrInfoFromString(address); err == nil {
logger.Info().Msg("Connect to Partner " + ProtocolHeartbeatPartner + " " + address)
common.SendHeartbeat(context.Background(), ProtocolHeartbeatPartner, conf.GetConfig().Name,
s.Host, s.Streams, map[string]*pp.AddrInfo{address: ad}, 20*time.Second)
s.Host, s.Streams, map[string]*pp.AddrInfo{address: ad}, nil, 20*time.Second)
}
}
func (s *StreamService) searchPeer(search string) ([]*peer.Peer, error) {
/* TODO FOR TEST ONLY A VARS THAT DEFINE ADDRESS... deserialize */
ps := []*peer.Peer{}
if conf.GetConfig().PeerIDS != "" {
for _, peerID := range strings.Split(conf.GetConfig().PeerIDS, ",") {
ppID := strings.Split(peerID, "/")
fmt.Println(ppID, peerID)
ps = append(ps, &peer.Peer{
AbstractObject: utils.AbstractObject{
UUID: uuid.New().String(),
@@ -185,7 +188,6 @@ func (s *StreamService) searchPeer(search string) ([]*peer.Peer, error) {
})
}
}
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
peers := access.Search(nil, search, false)
for _, p := range peers.Data {
@@ -252,8 +254,9 @@ func (ps *StreamService) readLoop(s *common.Stream, id pp.ID, proto protocol.ID,
}
var evt common.Event
if err := json.NewDecoder(s.Stream).Decode(&evt); err != nil {
s.Stream.Close()
continue
// Any decode error (EOF, reset, malformed JSON) terminates the loop;
// continuing on a dead/closed stream creates an infinite spin.
return
}
ps.handleEvent(evt.Type, &evt)
if protocolInfo.WaitResponse && !protocolInfo.PersistantStream {


@@ -1,23 +1,33 @@
#!/bin/bash
IMAGE_BASE_NAME="oc-discovery"
DOCKERFILE_PATH="."
for i in {0..3}; do
docker network create \
--subnet=172.40.0.0/24 \
discovery
for i in $(seq ${1:-0} ${2:-3}); do
NUM=$((i + 1))
PORT=$((4000 + $NUM))
IMAGE_NAME="${IMAGE_BASE_NAME}:${NUM}"
echo "▶ Building image ${IMAGE_NAME} with CONF_NUM=${NUM}"
docker build \
--build-arg CONF_NUM=${NUM} \
-t ${IMAGE_NAME} \
-t "${IMAGE_BASE_NAME}_${NUM}" \
${DOCKERFILE_PATH}
docker kill "${IMAGE_BASE_NAME}_${NUM}" | true
docker rm "${IMAGE_BASE_NAME}_${NUM}" | true
echo "▶ Running container ${IMAGE_NAME} on port ${PORT}:${PORT}"
docker run -d \
--network="${3:-oc}" \
-p ${PORT}:${PORT} \
--name "${IMAGE_BASE_NAME}_${NUM}" \
${IMAGE_NAME}
"${IMAGE_BASE_NAME}_${NUM}"
docker network connect --ip "172.40.0.${NUM}" discovery "${IMAGE_BASE_NAME}_${NUM}"
done
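As rewritten, the script takes an optional start index, end index and Docker network name (defaulting to `0`, `3` and `oc`), builds one image per config number, and pins each container to a fixed address `172.40.0.$NUM` on the dedicated `discovery` network (subnet `172.40.0.0/24`). For example, an invocation such as `./build.sh 4 9 oc` would rebuild and restart containers 5 through 10; the invocation is illustrative, since the script's filename is not shown in this diff.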

docker_discovery10.json Normal file

@@ -0,0 +1,10 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "node",
"NODE_ENDPOINT_PORT": 4010,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
"MIN_INDEXER": 2,
"PEER_IDS": "/ip4/172.40.0.9/tcp/4009/p2p/12D3KooWGnQfKwX9E4umCPE8dUKZuig4vw5BndDowRLEbGmcZyta"
}


@@ -4,5 +4,5 @@
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "indexer",
"NODE_ENDPOINT_PORT": 4002,
"INDEXER_ADDRESSES": "/ip4/172.19.0.2/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
"INDEXER_ADDRESSES": "/ip4/172.40.0.1/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
}


@@ -4,5 +4,5 @@
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "node",
"NODE_ENDPOINT_PORT": 4003,
"INDEXER_ADDRESSES": "/ip4/172.19.0.3/tcp/4002/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
"INDEXER_ADDRESSES": "/ip4/172.40.0.2/tcp/4002/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
}


@@ -4,6 +4,6 @@
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "node",
"NODE_ENDPOINT_PORT": 4004,
"INDEXER_ADDRESSES": "/ip4/172.19.0.2/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
"PEER_IDS": "/ip4/172.19.0.4/tcp/4003/p2p/12D3KooWBh9kZrekBAE5G33q4jCLNRAzygem3gP1mMdK8mhoCTaw"
"INDEXER_ADDRESSES": "/ip4/172.40.0.1/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
"PEER_IDS": "/ip4/172.40.0.3/tcp/4003/p2p/12D3KooWBh9kZrekBAE5G33q4jCLNRAzygem3gP1mMdK8mhoCTaw"
}

docker_discovery5.json Normal file

@@ -0,0 +1,7 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "native-indexer",
"NODE_ENDPOINT_PORT": 4005
}

docker_discovery6.json Normal file

@@ -0,0 +1,8 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "native-indexer",
"NODE_ENDPOINT_PORT": 4006,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
}

docker_discovery7.json Normal file

@@ -0,0 +1,8 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "indexer",
"NODE_ENDPOINT_PORT": 4007,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.6/tcp/4006/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
}

docker_discovery8.json Normal file

@@ -0,0 +1,8 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "indexer",
"NODE_ENDPOINT_PORT": 4008,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
}

docker_discovery9.json Normal file

@@ -0,0 +1,8 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "node",
"NODE_ENDPOINT_PORT": 4009,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.6/tcp/4006/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u,/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
}
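Taken together with the script above, these configs describe the demo topology (container N listens on port 400N at 172.40.0.N):

- configs 2–4 (existing) only move their `INDEXER_ADDRESSES` / `PEER_IDS` from the old `172.19.0.x` subnet to the fixed `172.40.0.x` addresses;
- configs 5 and 6 are two native indexers (`:4005`, `:4006`), with 6 meshing to 5 through `NATIVE_INDEXER_ADDRESSES`;
- configs 7 and 8 are plain indexers (`:4007`, `:4008`) registering to natives 6 and 5 respectively;
- config 9 is a node (`:4009`) that knows both natives;
- config 10 is a node (`:4010`) that knows native 5, requires at least two indexers (`MIN_INDEXER: 2`) and declares node 9 as a static peer through `PEER_IDS`.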

go.mod

@@ -3,7 +3,7 @@ module oc-discovery
go 1.25.0
require (
cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7
cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5
github.com/libp2p/go-libp2p v0.47.0
github.com/libp2p/go-libp2p-record v0.3.1
github.com/multiformats/go-multiaddr v0.16.1

go.sum

@@ -1,5 +1,13 @@
cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7 h1:p9uJjMY+QkE4neA+xRmIRtAm9us94EKZqgajDdLOd0Y=
cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226084851-959fce48ef6c h1:FTUu9tdEfib6J+fuc7e5wYTe++EIlB70bVNpOeFjnyU=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226084851-959fce48ef6c/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226085754-f4e2d8057df0 h1:lvrRF4ToIMl/5k1q4AiPEy6ycjwRtOaDhWnQ/LrW1ZA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226085754-f4e2d8057df0/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226091217-cb3771c17a31 h1:hvkvJibS9NmImw73j79Ov5VpIYs4WbP4SYGlK/XO82Q=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226091217-cb3771c17a31/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5 h1:h+Fkyj6cfwAirc0QGCBEkZSSrgcyThXswg7ytOLm948=
cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/Masterminds/semver/v3 v3.4.0 h1:Zog+i5UMtVoCU8oKka5P7i9q9HgrJeGzI9SA1Xbatp0=
github.com/Masterminds/semver/v3 v3.4.0/go.mod h1:4V+yj/TJE1HU9XfppCwVMZq3I84lprf4nC11bSS5beM=


@@ -28,11 +28,15 @@ func main() {
conf.GetConfig().PSKPath = o.GetStringDefault("PSK_PATH", "./psk/psk.key")
conf.GetConfig().NodeEndpointPort = o.GetInt64Default("NODE_ENDPOINT_PORT", 4001)
conf.GetConfig().IndexerAddresses = o.GetStringDefault("INDEXER_ADDRESSES", "")
conf.GetConfig().NativeIndexerAddresses = o.GetStringDefault("NATIVE_INDEXER_ADDRESSES", "")
conf.GetConfig().PeerIDS = o.GetStringDefault("PEER_IDS", "")
conf.GetConfig().NodeMode = o.GetStringDefault("NODE_MODE", "node")
conf.GetConfig().MinIndexer = o.GetIntDefault("MIN_INDEXER", 1)
conf.GetConfig().MaxIndexer = o.GetIntDefault("MAX_INDEXER", 5)
ctx, stop := signal.NotifyContext(
context.Background(),
os.Interrupt,
@@ -47,7 +51,7 @@ func main() {
if n, err := node.InitNode(isNode, isIndexer, isNativeIndexer); err != nil {
panic(err)
} else {
<-ctx.Done() // 👈 the only blocking point
<-ctx.Done() // the only blocking point
log.Println("shutting down")
n.Close()
}

pem/private10.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIPc7D3Mgb1U2Ipyb/85hA4Ew7dC8zHDEuQYSjqzzRgLK
-----END PRIVATE KEY-----

pem/private5.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIK2oBaOtGNchE09MBRtPd5oEOUcVUQG2ndym5wKExj7R
-----END PRIVATE KEY-----

pem/private6.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIE58GDazCyF1jp796ivSmHiCepbkC8TpzliIaQ7eGEpu
-----END PRIVATE KEY-----

pem/private7.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIAeX4O7ldwehRSnPkbzuE6csyo63vjvqAcNNujENOKUC
-----END PRIVATE KEY-----

pem/private8.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIEkgqINXDLnxIJZs2LEK9O4vdsqk43dwbULGUE25AWuR
-----END PRIVATE KEY-----

pem/private9.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIBcflxGlZYyUVJoExC94rHZbIyKMwZ+Oh7EDkb0qUlxd
-----END PRIVATE KEY-----

pem/public10.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAEomuEQGmGsYVw35C6DB5tfY8LI8jm359ceAxRX8eQ0o=
-----END PUBLIC KEY-----

pem/public5.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAZ2nLJBL8a5opfa8nFeVj0SZToW8pl4+zgcSUkeZFRO4=
-----END PUBLIC KEY-----

pem/public6.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAIQVeSGwsjPjyepPTnzzYqVxIxviSEjZXU7C7zuNTui4=
-----END PUBLIC KEY-----

pem/public7.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAG95Ettl3jTi41HM8le1A9WDmOEq0ANEqpLF7zTZrfXA=
-----END PUBLIC KEY-----

pem/public8.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEA/ymOIb0sJ0qCWrf3mKz7ACCvsMXLog/EK533JfNXZTM=
-----END PUBLIC KEY-----

pem/public9.pem Normal file

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAZ4F3KqOp/5QrPdZGqqX6PYYEGd2snX4Q3AUt9XAG3v8=
-----END PUBLIC KEY-----