demo test + Peer

This commit is contained in:
mr
2026-03-03 16:38:24 +01:00
parent 79aa3cc2b3
commit ef3d998ead
41 changed files with 2360 additions and 578 deletions

ARCHITECTURE.md (new file)

@@ -0,0 +1,495 @@
# oc-discovery — Architecture and technical analysis
> **Reading convention**
> Items marked ✅ have been fixed in the code. Items marked ⚠️ remain open.
## Table of contents
1. [Overview](#1-overview)
2. [Role hierarchy](#2-role-hierarchy)
3. [Core mechanisms](#3-core-mechanisms)
- 3.1 Long-lived heartbeat (node → indexer)
- 3.2 Trust scoring
- 3.3 Registration with natives (indexer → native)
- 3.4 Indexer pool: fetch + consensus
- 3.5 Self-delegation and offload loop
- 3.6 Native mesh resilience
- 3.7 Shared DHT
- 3.8 PubSub gossip (indexer registry)
- 3.9 Application streams (node ↔ node)
4. [Summary table](#4-summary-table)
5. [Global risks and limitations](#5-global-risks-and-limitations)
6. [Improvement ideas](#6-improvement-ideas)
---
## 1. Overview
`oc-discovery` is a P2P discovery service for the OpenCloud network. It is built on
**libp2p** (TCP transport + private-network PSK) and a **Kademlia DHT** (prefix `oc`)
to index peers. The architecture is intentionally hierarchical: stable _natives_
act as authoritative hubs with which _indexers_ register, and ordinary _nodes_
discover indexers through those natives.
```
┌──────────────┐      heartbeat       ┌──────────────────┐
│     Node     │ ───────────────────► │     Indexer      │
│   (libp2p)   │ ◄─────────────────── │   (DHT server)   │
└──────────────┘  application stream  └────────┬─────────┘
                                               │ subscribe / heartbeat
                                      ┌──────────────────┐
                                      │  Native Indexer  │◄──► other natives
                                      │ (authoritative   │     (mesh)
                                      │       hub)       │
                                      └──────────────────┘
```
All participants share a **pre-shared key (PSK)** that isolates the network
from unauthorized external libp2p connections.
---
## 2. Role hierarchy
| Role | Binary | Responsibility |
|---|---|---|
| **Node** | `node_mode=node` | Gets indexed; publishes and reads DHT records |
| **Indexer** | `node_mode=indexer` | Receives heartbeats, writes to the DHT, registers with natives |
| **Native Indexer** | `node_mode=native` | Hub: keeps the registry of live indexers, runs the consensus, serves as fallback |
A single process can combine the node+indexer or indexer+native roles.
---
## 3. Core mechanisms
### 3.1 Long-lived heartbeat (node → indexer)
**How it works**
A **persistent** libp2p stream (`/opencloud/heartbeat/1.0`) is opened from the node
to each indexer in its pool (`StaticIndexers`). Every 20 seconds the node sends a
JSON `Heartbeat` on this stream. The indexer responds by recording the peer in
`StreamRecords[ProtocolHeartbeat]` with a 2-minute expiry.
If `sendHeartbeat` fails (stream reset, EOF, timeout), the peer is removed from
`StaticIndexers` and `replenishIndexersFromNative` is triggered.
**Advantages**
- Fast disconnect detection (the error surfaces on the next encode).
- A single stream per peer reduces pressure on TCP connections.
- The nudge channel (`indexerHeartbeatNudge`) allows an immediate reconnect without
waiting for the 20 s ticker.
**Limitations / risks**
- ⚠️ A single persistent stream: if the TCP layer stays open but "frozen" (middlebox,
silent NAT), the error may not surface for several minutes.
- ⚠️ `StaticIndexers` is a shared global map: if two goroutines call
`replenishIndexersFromNative` concurrently (multiple simultaneous losses), writes
can race outside the critical sections.
---
### 3.2 Trust scoring
**How it works**
Before recording a heartbeat in `StreamRecords`, the indexer enforces a **minimum
score** computed by `CheckHeartbeat`:
```
Score = (0.4 × uptime_ratio + 0.4 × bpms + 0.2 × diversity) × 100
```
- `uptime_ratio`: time the peer has been present / time since the indexer started.
- `bpms`: throughput measured over a dedicated stream (`/opencloud/probe/1.0`), normalized by 50 Mbps.
- `diversity`: ratio of distinct /24 networks among the indexers the peer declares.
Two thresholds apply depending on the peer's state:
- **First heartbeat** (peer absent from `StreamRecords`, uptime = 0): threshold **40**.
- **Subsequent heartbeats** (accumulated uptime): threshold **75**.
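The formula and the dual threshold can be sketched as below. The function names are illustrative (the real computation lives in `ComputeIndexerScore`); inputs are assumed already normalized to [0, 1].

```go
package main

import "fmt"

// trustScore mirrors the documented formula:
// (0.4 × uptime_ratio + 0.4 × bpms + 0.2 × diversity) × 100.
func trustScore(uptimeRatio, bpms, diversity float64) float64 {
	return (0.4*uptimeRatio + 0.4*bpms + 0.2*diversity) * 100
}

// admitted applies the dual threshold: 40 for a first heartbeat, 75 afterwards.
func admitted(score float64, firstHeartbeat bool) bool {
	threshold := 75.0
	if firstHeartbeat {
		threshold = 40.0
	}
	return score >= threshold
}

func main() {
	// A brand-new peer with perfect bandwidth and diversity but zero uptime
	// tops out at 60 — below 75, which motivates the first-heartbeat threshold.
	s := trustScore(0, 1, 1)
	fmt.Printf("%.0f first=%v steady=%v\n", s, admitted(s, true), admitted(s, false))
}
```

This makes the fixed startup deadlock concrete: the score ceiling with zero uptime (60) sits strictly between the two thresholds.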
**Advantages**
- Discourages ephemeral or slow peers from cluttering the registry.
- Network diversity reduces the risk of concentration on a single subnet.
- The dedicated probe stream keeps binary data off the JSON heartbeat stream.
- The dual threshold lets new peers be admitted on their very first connection.
**Limitations / risks**
- ✅ **Startup logic deadlock fixed**: with uptime = 0 the maximum score was 60,
below the threshold of 75. New peers were silently rejected forever.
→ Threshold lowered to **40** for the first heartbeat (`isFirstHeartbeat`), 75 afterwards.
- ⚠️ The thresholds (40 / 75) are still hard-coded, with no configuration option.
- ⚠️ The bandwidth probe sends between 512 and 2048 bytes per heartbeat: at a 20 s
interval with up to 500 nodes, that is roughly 50 KB/s of continuous probe traffic.
- ⚠️ `diversity` is computed from the addresses the node *claims* to have — the field
is self-reported, unverified, and easy to forge.
---
### 3.3 Registration with natives (indexer → native)
**How it works**
Each (non-native) indexer periodically (every 60 s) sends an
`IndexerRegistration` JSON on a one-shot stream (`/opencloud/native/subscribe/1.0`)
to every configured native. The native:
1. Stores the entry in a local cache with a **90 s** TTL (`IndexerTTL`).
2. Gossips the `PeerID` on the `oc-indexer-registry` PubSub topic to the other natives.
3. Persists the entry to the DHT asynchronously (retrying until success).
**Advantages**
- Disposable stream: no long-lived resource on the native side for registrations.
- The local cache is immediately available to `handleNativeGetIndexers` without
waiting for the DHT.
- PubSub dissemination lets other natives learn about the indexer without it
having to register with them directly.
**Limitations / risks**
- ✅ **Overly tight TTL fixed**: the 66 s TTL was only 10 % above the 60 s interval —
a slight network delay could expire a healthy indexer between two renewals.
`IndexerTTL` raised to **90 s** (+50 %).
- ⚠️ If the DHT `PutValue` fails permanently (partitioned network), the native holds
the entry but natives that missed the PubSub message never learn about it —
a silent inconsistency.
- ⚠️ `RegisterWithNative` skips `127.0.0.1` addresses but does not handle private
(RFC 1918) addresses that would be unroutable from other hosts.
---
### 3.4 Indexer pool: fetch + consensus
**How it works**
During `ConnectToNatives` (startup or replenish), the node/indexer:
1. **Fetch**: sends a `GetIndexersRequest` to the first responding native
(`/opencloud/native/indexers/1.0`) and receives a list of candidates.
2. **Consensus (round 1)**: queries **all** configured natives in parallel
(`/opencloud/native/consensus/1.0`, 3 s timeout, 4 s collection window).
An indexer is confirmed if **strictly more than 50 %** of the responding natives
consider it alive.
3. **Consensus (round 2)**: if the pool is still too small, the natives' suggestions
(indexers they know about that were not among the initial candidates)
go through a second round.
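The strict-majority rule from round 1 reduces to a one-line predicate; this sketch uses illustrative names (the actual check is the `count*2 > total` comparison in the fetch code):

```go
package main

import "fmt"

// confirmIndexer keeps an indexer only if strictly more than half of the
// responding natives report it alive (equivalent to count*2 > total).
func confirmIndexer(aliveVotes, respondingNatives int) bool {
	return aliveVotes*2 > respondingNatives
}

func main() {
	fmt.Println(confirmIndexer(2, 3)) // majority of 3 → confirmed
	fmt.Println(confirmIndexer(1, 2)) // exactly half → rejected
	fmt.Println(confirmIndexer(1, 1)) // single native: trivially confirmed
}
```

The last case shows why the rule protects nothing with a single configured native.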
**Advantages**
- The absolute-majority rule prevents a compromised or desynchronized native from
injecting phantom indexers.
- The second round fills out the pool with alternatives known to the natives
without skipping verification.
- If the fetch returns a **fallback** (a native acting as indexer), the consensus is
skipped — consistent, since there is only one source.
**Limitations / risks**
- ⚠️ With **a single native** configured (very common in dev/test), the consensus is
trivial (100 % of one vote) — the majority rule protects nothing in that case.
- ⚠️ `fetchIndexersFromNative` stops at the **first responding native** (sequentially):
if that native has a stale or partial cache, the node gets a suboptimal pool without
consulting the others.
- ⚠️ The global 4 s collection timeout is fixed: on a slow or geographically
distributed network, valid natives can be dropped for not answering in time.
- ⚠️ `replaceStaticIndexers` only **adds**, never removes expired indexers:
the pool can accumulate dead entries that only the heartbeat later purges.
---
### 3.5 Self-delegation and offload loop
**How it works**
If a native has no live indexer during a `handleNativeGetIndexers`, it designates
itself as a temporary indexer (`selfDelegate`): it returns its own multiaddr and
adds the requester to `responsiblePeers`, up to `maxFallbackPeers` (50). Beyond
that limit, the delegation is refused and an empty response is returned so the node
tries another native.
Every 30 s, `runOffloadLoop` checks whether real indexers are available again.
If so, for each responsible peer:
- **Stream present**: `Reset()` on the heartbeat stream — the peer gets an error,
triggers `replenishIndexersFromNative`, and migrates to real indexers.
- **Stream absent** (peer never admitted by scoring): `ClosePeer()` on the network
connection — the peer reconnects and asks a native for indexers again.
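The per-peer branch of the offload loop can be sketched as a small decision function. The return values are illustrative labels for this document, not API names:

```go
package main

import "fmt"

// offloadAction sketches the decision runOffloadLoop makes for each
// responsible peer once real indexers are available again.
func offloadAction(hasHeartbeatStream bool) string {
	if hasHeartbeatStream {
		// Reset() errors out both directions: the peer's next send fails and
		// it runs replenishIndexersFromNative to migrate to real indexers.
		return "reset-stream"
	}
	// The peer was never admitted by scoring, so there is no stream to reset:
	// close the raw connection so it reconnects and re-requests indexers.
	return "close-peer"
}

func main() {
	fmt.Println(offloadAction(true), offloadAction(false))
}
```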
**Advantages**
- Service continuity: a node is never blocked by a temporary lack of indexers.
- Migration is automatic and transparent to the node.
- `Reset()` (vs `Close()`) aborts both directions of the stream, guaranteeing the
peer actually receives an error.
- The limit of 50 keeps the native from being overloaded during prolonged shortages.
**Limitations / risks**
- ✅ **Offload without a stream fixed**: if the heartbeat had never been recorded in
`StreamRecords` (score below threshold, a case amplified by the scoring bug), the
offload failed silently and the peer stayed in `responsiblePeers` indefinitely.
→ `else` branch: `ClosePeer()` + removal from `responsiblePeers`.
- ✅ **Unbounded `responsiblePeers` fixed**: the native accepted an arbitrary number
of self-delegated peers, itself becoming an overloaded indexer.
→ `selfDelegate` checks `len(responsiblePeers) >= maxFallbackPeers` and returns
`false` when saturated.
- ⚠️ Delegation remains uncoordinated across natives: an overloaded native refuses
(returns empty) but does not explicitly redirect to a neighboring native that has
spare capacity.
---
### 3.6 Native mesh resilience
**How it works**
When the heartbeat to a native fails, `replenishNativesFromPeers` looks for a
replacement in this order:
1. `fetchNativeFromNatives`: asks each live native (`/opencloud/native/peers/1.0`)
for a native address it does not already know.
2. `fetchNativeFromIndexers`: asks each known indexer
(`/opencloud/indexer/natives/1.0`) for its configured natives.
3. If no replacement is found and `remaining ≤ 1`: `retryLostNative` starts a 30 s
ticker that keeps retrying a direct connection to the lost native.
`EnsureNativePeers` maintains native-to-native heartbeats via `ProtocolHeartbeat`,
with a **single goroutine** covering the whole `StaticNatives` map.
**Advantages**
- Multi-hop gossip through indexers can recover a native even when no direct peer
knows it.
- `retryLostNative` covers the single-native case (minimal deployment).
- The automatic reconnection (`retryLostNative`) also triggers
`replenishIndexersIfNeeded` to restore the indexer pool.
**Limitations / risks**
- ✅ **Multiple heartbeat goroutines fixed**: `EnsureNativePeers` started one
`SendHeartbeat` goroutine per native address (N natives → N goroutines → N²
heartbeats per tick). Now `nativeMeshHeartbeatOnce` ensures a single goroutine
iterates over `StaticNatives`.
- ⚠️ `retryLostNative` runs forever with no stop condition tied to process lifetime
(no `context.Context`). On a graceful shutdown, this goroutine can block.
- ⚠️ Transitive discovery (native → indexer → native) is one-way: an indexer only
knows the natives from its own config, not natives that joined after it started.
---
### 3.7 Shared DHT
**How it works**
All indexers and natives participate in a Kademlia DHT (prefix `oc`, mode
`ModeServer`). Two namespaces are used:
- `/node/<DID>` → signed `PeerRecord` JSON (published by indexers on node heartbeats).
- `/indexer/<PeerID>` → `liveIndexerEntry` JSON with a TTL (published by natives).
Every native runs `refreshIndexersFromDHT` (every 30 s), which re-hydrates its local
cache from the DHT for the known PeerIDs (`knownPeerIDs`) whose local entry has expired.
**Advantages**
- Decentralized persistence: a record survives the loss of a single native or indexer.
- Entry validation: `PeerRecordValidator` and `IndexerRecordValidator` reject
malformed or expired records at `PutValue` time.
- The secondary index `/name/<name>` enables resolution by human-readable name.
**Limitations / risks**
- ⚠️ The Kademlia DHT works on the private (PSK) network, but bootstrap nodes are not
configured explicitly: discovery depends on already-established connections,
which can slow convergence at startup.
- ⚠️ `PutValue` is retried in an infinite loop on `"failed to find any peer in table"` —
a prolonged network outage leaves blocked goroutines behind.
- ⚠️ If the PSK is compromised, an attacker can write to the DHT; `liveIndexerEntry`
records are unsigned, unlike `PeerRecord`s.
- ⚠️ `refreshIndexersFromDHT` prunes `knownPeerIDs` when the DHT has no fresh entry,
but does not prune `liveIndexers` — an expired entry stays in memory until the GC
or the next refresh.
---
### 3.8 PubSub gossip (indexer registry)
**How it works**
When an indexer registers with a native, the native publishes the address on the
GossipSub topic `oc-indexer-registry`. The other subscribed natives update their
`knownPeerIDs` without waiting for the DHT.
The `TopicValidator` rejects any message whose content is not a valid, parseable
multiaddr before it reaches the processing loop.
**Advantages**
- Near-instant dissemination between connected natives.
- A useful complement to the DHT for recent registrations that have not yet been
persisted.
- The syntactic filter blocks malformed messages before they propagate in the mesh.
**Limitations / risks**
- ✅ **No-op `TopicValidator` fixed**: the validator accepted every message
(`return true`), letting a compromised native gossip arbitrary data.
→ The validator now checks that the message is a parseable multiaddr
(`pp.AddrInfoFromString`).
- ⚠️ Validation remains purely syntactic: the origin of the message (is the sender a
legitimate native?) is not verified.
- ⚠️ If a native restarts, it loses its subscription and misses messages published
while it was away. Re-hydration from the DHT compensates, but with a delay of up
to 30 s.
- ⚠️ The gossip carries only the indexer's `Addr`, not its TTL or signature.
---
### 3.9 Application streams (node ↔ node)
**How it works**
`StreamService` manages streams between partner nodes (`PARTNER` relations stored
in the database) over dedicated protocols (`/opencloud/resource/*`). A partner
heartbeat (`ProtocolHeartbeatPartner`) keeps connections alive. Events are routed
through `handleEvent` and, in parallel, the NATS system.
**Advantages**
- Per-protocol TTLs (`PersistantStream`, `WaitResponse`) adapt the behavior to the
type of exchange (long-lived for the planner, short for CRUDs).
- The GC (`gc()` every 8 s, started exactly once in `InitStream`) quickly frees
expired streams.
**Limitations / risks**
- ✅ **GC goroutine leak fixed**: `HandlePartnerHeartbeat` called
`go s.StartGC(30s)` on every received heartbeat (~20 s), spawning a new infinite
ticker goroutine on each call.
→ Call removed; the GC started by `InitStream` is sufficient.
- ✅ **Infinite loop on EOF fixed**: `readLoop` did `s.Stream.Close(); continue`
after a decode error, retrying forever on a closed stream.
→ Replaced by `return`; the defers (`Close`, `delete`) clean up correctly.
- ⚠️ Partner retrieval from `conf.PeerIDS` is marked `TO REMOVE`:
provisional code is present in production.
---
## 4. Summary table
| Mechanism | Protocol | Main advantage | Risk status |
|---|---|---|---|
| Node→indexer heartbeat | `/opencloud/heartbeat/1.0` | Fast loss detection | ⚠️ Frozen TCP stream undetected |
| Trust scoring | (inline in heartbeat) | Filters unstable peers | ✅ Deadlock fixed (40/75 thresholds) |
| Native registration | `/opencloud/native/subscribe/1.0` | Ample TTL, immediate cache | ✅ TTL raised to 90 s |
| Indexer pool fetch | `/opencloud/native/indexers/1.0` | Takes the first responding native | ⚠️ Stale native cache possible |
| Consensus | `/opencloud/native/consensus/1.0` | Absolute majority | ⚠️ Trivial with a single native |
| Self-delegation + offload | (in-memory) | Availability without indexers | ✅ 50-peer limit + ClosePeer |
| Native mesh | `/opencloud/native/peers/1.0` | Multi-hop gossip | ✅ Goroutines deduplicated |
| DHT | `/oc/kad/1.0.0` | Decentralized persistence | ⚠️ Infinite retry, no bootstrap |
| PubSub registry | `oc-indexer-registry` | Fast dissemination | ✅ Multiaddr validation |
| Application streams | `/opencloud/resource/*` | Per-protocol TTLs | ✅ GC leak + EOF fixed |
---
## 5. Global risks and limitations
### Security
- ⚠️ **Unverified self-reported addresses**: the `IndexersBinded` field in the
heartbeat is self-declared by the node and feeds the diversity score. A malicious
peer can inflate its score by declaring fake addresses.
- ⚠️ **PSK as the only entry barrier**: if the PSK is compromised (it is static and
file-based), all network isolation is gone. There is no key rotation and no
additional per-peer authentication.
- ⚠️ **DHT without ACLs on indexer entries**: `PeerRecord` signatures are verified on
read, but `liveIndexerEntry` records are unsigned. PubSub validation blocks invalid
multiaddrs but not spoofed addresses of legitimate indexers.
### Availability
- ⚠️ **Native single point of failure**: with a single native, losing it stops all
indexer assignment. `retryLostNative` mitigates this, but without indexers the
nodes cannot publish.
- ⚠️ **DHT bootstrap**: without explicit bootstrap nodes, the DHT converges slowly
when initial connections are few.
### Consistency
- ⚠️ **`replaceStaticIndexers` never evicts**: dead indexers stay in `StaticIndexers`
until their heartbeat fails. A node can carry an inflated pool containing
unreachable entries.
- ⚠️ **Global `TimeWatcher`**: set once at the start of `ConnectToIndexers`. On a
long-running indexer, new nodes will have a durably low `uptime_ratio`. The
first-heartbeat threshold of 40 softens the initial impact, but subsequent
heartbeats must still accumulate enough uptime.
---
## 6. Improvement ideas
Ideas already implemented are marked ✅. Open ideas remain to be addressed.
### ✅ Scoring: dual threshold for new peers
~~Replace the binary threshold~~ — **Implemented**: threshold of 40 for the first
heartbeat (peer absent from `StreamRecords`), 75 afterwards. A peer can now be
admitted on its first connection without being blocked by zero uptime.
_File: `common/common_stream.go`, `CheckHeartbeat`_
### ✅ Indexer TTL aligned with the renewal interval
~~66 s TTL too close to 60 s~~ — **Implemented**: `IndexerTTL` raised to **90 s**.
_File: `indexer/native.go`_
### ✅ Self-delegation limit
~~Unbounded `responsiblePeers`~~ — **Implemented**: `selfDelegate` returns `false`
when `len(responsiblePeers) >= maxFallbackPeers` (50). The call site returns an
empty response and logs a warning.
_File: `indexer/native.go`_
### ✅ PubSub validation of gossiped addresses
~~`TopicValidator` accepts everything~~ — **Implemented**: the validator checks that
the message is a parseable multiaddr via `pp.AddrInfoFromString`.
_File: `indexer/native.go`, `subscribeIndexerRegistry`_
### ✅ Deduplicated heartbeat goroutines in `EnsureNativePeers`
~~One goroutine per native address~~ — **Implemented**: `nativeMeshHeartbeatOnce`
guarantees a single `SendHeartbeat` goroutine covers the whole `StaticNatives` map.
_File: `common/native_stream.go`_
### ✅ GC goroutine leak in `HandlePartnerHeartbeat`
~~`go s.StartGC(30s)` on every heartbeat~~ — **Implemented**: call removed; the GC
from `InitStream` is sufficient.
_File: `stream/service.go`_
### ✅ Infinite loop on EOF in `readLoop`
~~`continue` after `Stream.Close()`~~ — **Implemented**: replaced by `return` so the
defers can clean up properly.
_File: `stream/service.go`_
---
### ⚠️ Pool fetch: query all natives in parallel
`fetchIndexersFromNative` stops at the first responding native. Querying all natives
in parallel and merging the lists (similarly to `clientSideConsensus`) would prevent
a native with a stale cache from supplying a suboptimal pool.
### ⚠️ Consensus with a configurable quorum
The confirmation threshold (`count*2 > total`) is hard-coded. Making it configurable
(e.g. `consensus_quorum: 0.67`) would allow hardening the rule on deployments with
3+ natives without touching the code.
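A configurable quorum generalizes the current rule in a few lines. This is a sketch under the assumption of a ratio-style `consensus_quorum` knob (hypothetical config key, not yet in the codebase):

```go
package main

import "fmt"

// quorumConfirm generalizes the hard-coded count*2 > total rule: an indexer is
// confirmed when the alive-vote ratio strictly exceeds the configured quorum.
// quorum = 0.5 reproduces today's strict majority exactly.
func quorumConfirm(aliveVotes, respondingNatives int, quorum float64) bool {
	if respondingNatives == 0 {
		return false
	}
	return float64(aliveVotes) > quorum*float64(respondingNatives)
}

func main() {
	fmt.Println(quorumConfirm(2, 3, 0.5))  // today's rule: confirmed
	fmt.Println(quorumConfirm(2, 3, 0.67)) // hardened: 2 of 3 no longer enough
	fmt.Println(quorumConfirm(3, 3, 0.67)) // hardened: needs all 3
}
```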
### ⚠️ Explicit deregistration
Add an `/opencloud/native/unsubscribe/1.0` protocol: when an indexer shuts down
cleanly, it notifies the natives so its TTL is invalidated immediately instead of
waiting up to 90 s.
### ⚠️ Explicit DHT bootstrap
Configure the natives as DHT bootstrap nodes via `dht.BootstrapPeers` to speed up
Kademlia convergence at startup.
### ⚠️ Context propagated to long-lived goroutines
`retryLostNative`, `refreshIndexersFromDHT`, and `runOffloadLoop` receive no
`context.Context`. Passing one down from `InitNative` would allow a clean stop on
process shutdown.
### ⚠️ Explicit redirection when self-delegation is refused
When a native refuses self-delegation (pool saturated), returning an empty response
forces the node to retry without telling it where to go. A list of alternative
natives in the response (`AlternativeNatives []string`) would let the node find a
less-loaded native directly.

@@ -15,6 +15,9 @@ type Config struct {
 	PeerIDS  string // TO REMOVE
 	NodeMode string
+	MinIndexer int
+	MaxIndexer int
 }

 var instance *Config

@@ -1,7 +1,6 @@
 package common

 import (
-	"bytes"
 	"context"
 	cr "crypto/rand"
 	"encoding/json"
@@ -28,6 +27,12 @@ type LongLivedStreamRecordedService[T interface{}] struct {
 	StreamRecords map[protocol.ID]map[pp.ID]*StreamRecord[T]
 	StreamMU      sync.RWMutex
 	maxNodesConn  int
+	// AfterHeartbeat is an optional hook called after each successful heartbeat update.
+	// The indexer sets it to republish the embedded signed record to the DHT.
+	AfterHeartbeat func(pid pp.ID)
+	// AfterDelete is called after gc() evicts an expired peer, outside the lock.
+	// name and did may be empty if the HeartbeatStream had no metadata.
+	AfterDelete func(pid pp.ID, name string, did string)
 }

 func NewStreamRecordedService[T interface{}](h host.Host, maxNodesConn int) *LongLivedStreamRecordedService[T] {
@@ -54,16 +59,29 @@ func (ix *LongLivedStreamRecordedService[T]) StartGC(interval time.Duration) {
 func (ix *LongLivedStreamRecordedService[T]) gc() {
 	ix.StreamMU.Lock()
-	defer ix.StreamMU.Unlock()
 	now := time.Now().UTC()
 	if ix.StreamRecords[ProtocolHeartbeat] == nil {
 		ix.StreamRecords[ProtocolHeartbeat] = map[pp.ID]*StreamRecord[T]{}
+		ix.StreamMU.Unlock()
 		return
 	}
 	streams := ix.StreamRecords[ProtocolHeartbeat]
+	fmt.Println(StaticNatives, StaticIndexers, streams)
+	type gcEntry struct {
+		pid  pp.ID
+		name string
+		did  string
+	}
+	var evicted []gcEntry
 	for pid, rec := range streams {
 		if now.After(rec.HeartbeatStream.Expiry) || now.Sub(rec.HeartbeatStream.UptimeTracker.LastSeen) > 2*rec.HeartbeatStream.Expiry.Sub(now) {
+			name, did := "", ""
+			if rec.HeartbeatStream != nil {
+				name = rec.HeartbeatStream.Name
+				did = rec.HeartbeatStream.DID
+			}
+			evicted = append(evicted, gcEntry{pid, name, did})
 			for _, sstreams := range ix.StreamRecords {
 				if sstreams[pid] != nil {
 					delete(sstreams, pid)
@@ -71,6 +89,13 @@ func (ix *LongLivedStreamRecordedService[T]) gc() {
 				}
 			}
 		}
 	}
+	ix.StreamMU.Unlock()
+	if ix.AfterDelete != nil {
+		for _, e := range evicted {
+			ix.AfterDelete(e.pid, e.name, e.did)
+		}
+	}
 }

 func (ix *LongLivedStreamRecordedService[T]) Snapshot(interval time.Duration) {
@@ -101,8 +126,10 @@ func (ix *LongLivedStreamRecordedService[T]) snapshot() []*StreamRecord[T] {
 	return out
 }

-func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Stream) {
+func (ix *LongLivedStreamRecordedService[T]) HandleHeartbeat(s network.Stream) {
+	logger := oclib.GetLogger()
 	defer s.Close()
+	dec := json.NewDecoder(s)
 	for {
 		ix.StreamMU.Lock()
 		if ix.StreamRecords[ProtocolHeartbeat] == nil {
@@ -114,17 +141,37 @@ func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Strea
 			streamsAnonym[k] = v
 		}
 		ix.StreamMU.Unlock()
-		pid, hb, err := CheckHeartbeat(ix.Host, s, streamsAnonym, &ix.StreamMU, ix.maxNodesConn)
+		pid, hb, err := CheckHeartbeat(ix.Host, s, dec, streamsAnonym, &ix.StreamMU, ix.maxNodesConn)
 		if err != nil {
+			// Stream-level errors (EOF, reset, closed) mean the connection is gone
+			// — exit so the goroutine doesn't spin forever on a dead stream.
+			// Metric/policy errors (score too low, too many connections) are transient
+			// — those are also stream-terminal since the stream carries one session.
+			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) ||
+				strings.Contains(err.Error(), "reset") ||
+				strings.Contains(err.Error(), "closed") ||
+				strings.Contains(err.Error(), "too many connections") {
+				logger.Info().Err(err).Msg("heartbeat stream terminated, closing handler")
+				return
+			}
+			logger.Warn().Err(err).Msg("heartbeat check failed, retrying on same stream")
 			continue
 		}
 		ix.StreamMU.Lock()
 		// if record already seen update last seen
 		if rec, ok := streams[*pid]; ok {
 			rec.DID = hb.DID
-			rec.HeartbeatStream = hb.Stream
-			rec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
+			if rec.HeartbeatStream == nil {
+				rec.HeartbeatStream = hb.Stream
+			}
+			rec.HeartbeatStream = hb.Stream
+			if rec.HeartbeatStream.UptimeTracker == nil {
+				rec.HeartbeatStream.UptimeTracker = &UptimeTracker{
+					FirstSeen: time.Now().UTC(),
+					LastSeen:  time.Now().UTC(),
+				}
+			}
+			logger.Info().Msg("A new node is updated : " + pid.String())
 		} else {
 			hb.Stream.UptimeTracker = &UptimeTracker{
 				FirstSeen: time.Now().UTC(),
@@ -134,37 +181,51 @@ func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Strea
 				DID:             hb.DID,
 				HeartbeatStream: hb.Stream,
 			}
+			logger.Info().Msg("A new node is subscribed : " + pid.String())
 		}
 		ix.StreamMU.Unlock()
+		// Let the indexer republish the embedded signed record to the DHT.
+		if ix.AfterHeartbeat != nil {
+			ix.AfterHeartbeat(*pid)
+		}
 	}
 }
-func CheckHeartbeat(h host.Host, s network.Stream, streams map[pp.ID]HeartBeatStreamed, lock *sync.RWMutex, maxNodes int) (*pp.ID, *Heartbeat, error) {
+func CheckHeartbeat(h host.Host, s network.Stream, dec *json.Decoder, streams map[pp.ID]HeartBeatStreamed, lock *sync.RWMutex, maxNodes int) (*pp.ID, *Heartbeat, error) {
 	if len(h.Network().Peers()) >= maxNodes {
 		return nil, nil, fmt.Errorf("too many connections, try another indexer")
 	}
 	var hb Heartbeat
-	if err := json.NewDecoder(s).Decode(&hb); err != nil {
+	if err := dec.Decode(&hb); err != nil {
 		return nil, nil, err
 	}
-	if ok, bpms, err := getBandwidthChallengeRate(MinPayloadChallenge+int(rand.Float64()*(MaxPayloadChallenge-MinPayloadChallenge)), s); err != nil {
-		return nil, nil, err
-	} else if !ok {
-		return nil, nil, fmt.Errorf("Not a proper peer")
-	} else {
+	_, bpms, _ := getBandwidthChallengeRate(h, s.Conn().RemotePeer(), MinPayloadChallenge+int(rand.Float64()*(MaxPayloadChallenge-MinPayloadChallenge)))
+	{
 		pid, err := pp.Decode(hb.PeerID)
 		if err != nil {
 			return nil, nil, err
 		}
 		upTime := float64(0)
+		isFirstHeartbeat := true
 		lock.Lock()
 		if rec, ok := streams[pid]; ok && rec.GetUptimeTracker() != nil {
 			upTime = rec.GetUptimeTracker().Uptime().Hours() / float64(time.Since(TimeWatcher).Hours())
+			isFirstHeartbeat = false
 		}
 		lock.Unlock()
 		diversity := getDiversityRate(h, hb.IndexersBinded)
+		fmt.Println(upTime, bpms, diversity)
 		hb.ComputeIndexerScore(upTime, bpms, diversity)
-		if hb.Score < 75 {
+		// First heartbeat: uptime is always 0 so the score ceiling is 60, below the
+		// steady-state threshold of 75. Use a lower admission threshold so new peers
+		// can enter and start accumulating uptime. Subsequent heartbeats must meet
+		// the full threshold once uptime is tracked.
+		minScore := float64(50)
+		if isFirstHeartbeat {
+			minScore = 40
+		}
+		fmt.Println(hb.Score, minScore)
+		if hb.Score < minScore {
 			return nil, nil, errors.New("not enough trusting value")
 		}
 		hb.Stream = &Stream{
@@ -178,11 +239,13 @@ func CheckHeartbeat(h host.Host, s network.Stream, streams map[pp.ID]HeartBeatSt
} }
func getDiversityRate(h host.Host, peers []string) float64 { func getDiversityRate(h host.Host, peers []string) float64 {
peers, _ = checkPeers(h, peers) peers, _ = checkPeers(h, peers)
diverse := []string{} diverse := []string{}
for _, p := range peers { for _, p := range peers {
ip, err := ExtractIP(p) ip, err := ExtractIP(p)
if err != nil { if err != nil {
fmt.Println("NO IP", p, err)
continue continue
} }
div := ip.Mask(net.CIDRMask(24, 32)).String() div := ip.Mask(net.CIDRMask(24, 32)).String()
@@ -190,6 +253,9 @@ func getDiversityRate(h host.Host, peers []string) float64 {
 			diverse = append(diverse, div)
 		}
 	}
+	if len(diverse) == 0 || len(peers) == 0 {
+		return 1
+	}
 	return float64(len(diverse)) / float64(len(peers))
 }
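The /24 masking above can be sketched in isolation. A minimal sketch, assuming plain IPv4 strings rather than multiaddrs (`diversityRate` is a hypothetical helper, not the repo's API); note that both operands must be converted to `float64` before dividing, otherwise the integer division truncates the ratio to 0 for any non-trivial pool.

```go
package main

import (
	"fmt"
	"net"
)

// diversityRate returns the fraction of peers whose IPv4 addresses fall in
// distinct /24 subnets. Hypothetical helper mirroring getDiversityRate's
// masking logic; the real function extracts IPs from multiaddrs first.
func diversityRate(ips []string) float64 {
	if len(ips) == 0 {
		return 1
	}
	subnets := map[string]struct{}{}
	for _, raw := range ips {
		ip := net.ParseIP(raw).To4()
		if ip == nil {
			continue // not a parseable IPv4 address
		}
		// Mask to the /24 network, e.g. 10.1.2.3 -> 10.1.2.0.
		subnets[ip.Mask(net.CIDRMask(24, 32)).String()] = struct{}{}
	}
	// Convert BOTH operands: float64(a/b) would do integer division first.
	return float64(len(subnets)) / float64(len(ips))
}

func main() {
	// Two peers share 10.1.2.0/24, one sits in 10.9.0.0/24: 2 subnets / 3 peers.
	fmt.Println(diversityRate([]string{"10.1.2.3", "10.1.2.9", "10.9.0.1"}))
}
```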
@@ -211,35 +277,42 @@ func checkPeers(h host.Host, peers []string) ([]string, []string) {
 	return concretePeer, ips
 }
-const MaxExpectedMbps = 50.0
+const MaxExpectedMbps = 100.0
 const MinPayloadChallenge = 512
 const MaxPayloadChallenge = 2048
 const BaseRoundTrip = 400 * time.Millisecond
-func getBandwidthChallengeRate(payloadSize int, s network.Stream) (bool, float64, error) {
-	// Génération payload aléatoire
+// getBandwidthChallengeRate opens a dedicated ProtocolBandwidthProbe stream to
+// remotePeer, sends a random payload, reads the echo, and computes throughput.
+// Using a separate stream avoids mixing binary data on the JSON heartbeat stream
+// and ensures the echo handler is actually running on the remote side.
+func getBandwidthChallengeRate(h host.Host, remotePeer pp.ID, payloadSize int) (bool, float64, error) {
 	payload := make([]byte, payloadSize)
-	_, err := cr.Read(payload)
+	if _, err := cr.Read(payload); err != nil {
+		return false, 0, err
+	}
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	defer cancel()
+	s, err := h.NewStream(ctx, remotePeer, ProtocolBandwidthProbe)
 	if err != nil {
 		return false, 0, err
 	}
+	defer s.Reset()
+	s.SetDeadline(time.Now().Add(10 * time.Second))
 	start := time.Now()
-	// send the challenge on the heartbeat stream
 	if _, err = s.Write(payload); err != nil {
 		return false, 0, err
 	}
-	// read back
+	s.CloseWrite()
+	// Half-close the write side so the handler's io.Copy sees EOF and stops.
+	// Read the echo.
 	response := make([]byte, payloadSize)
-	_, err = io.ReadFull(s, response)
-	if err != nil {
+	if _, err = io.ReadFull(s, response); err != nil {
 		return false, 0, err
 	}
 	duration := time.Since(start)
+	// Verify content
+	if !bytes.Equal(payload, response) {
+		return false, 0, nil // bad echo, or a struggling peer
+	}
 	maxRoundTrip := BaseRoundTrip + (time.Duration(payloadSize) * (100 * time.Millisecond))
 	mbps := float64(payloadSize*8) / duration.Seconds() / 1e6
 	if duration > maxRoundTrip || mbps < 5.0 {
@@ -345,13 +418,36 @@ var StaticIndexers map[string]*pp.AddrInfo = map[string]*pp.AddrInfo{}
 var StreamMuIndexes sync.RWMutex
 var StreamIndexers ProtocolStream = ProtocolStream{}
+// indexerHeartbeatNudge allows replenishIndexersFromNative to trigger an immediate
+// heartbeat tick after adding new entries to StaticIndexers, without waiting up
+// to 20s for the regular ticker. Buffered(1) so the sender never blocks.
+var indexerHeartbeatNudge = make(chan struct{}, 1)
+
+// NudgeIndexerHeartbeat signals the indexer heartbeat goroutine to fire immediately.
+func NudgeIndexerHeartbeat() {
+	select {
+	case indexerHeartbeatNudge <- struct{}{}:
+	default: // nudge already pending, skip
+	}
+}
+
-func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID) error {
+func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID, recordFn ...func() json.RawMessage) error {
 	TimeWatcher = time.Now().UTC()
 	logger := oclib.GetLogger()
-	// If native addresses are configured, bypass static indexer addresses
+	// If native addresses are configured, get the indexer pool from the native mesh,
+	// then start the long-lived heartbeat goroutine toward those indexers.
 	if conf.GetConfig().NativeIndexerAddresses != "" {
-		return ConnectToNatives(h, minIndexer, maxIndexer, myPID)
+		if err := ConnectToNatives(h, minIndexer, maxIndexer, myPID); err != nil {
+			return err
+		}
+		// Step 2: start the long-lived heartbeat goroutine toward the indexer pool.
+		// replaceStaticIndexers/replenishIndexersFromNative update the map in-place
+		// so this single goroutine follows all pool changes automatically.
+		logger.Info().Msg("[native] step 2 — starting long-lived heartbeat to indexer pool")
+		SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name,
+			h, StreamIndexers, StaticIndexers, &StreamMuIndexes, 20*time.Second, recordFn...)
+		return nil
 	}
 	addresses := strings.Split(conf.GetConfig().IndexerAddresses, ",")
@@ -360,8 +456,8 @@ func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
 		addresses = addresses[0:maxIndexer]
 	}
+	StreamMuIndexes.Lock()
 	for _, indexerAddr := range addresses {
+		fmt.Println("GENERATE ADDR", indexerAddr)
 		ad, err := pp.AddrInfoFromString(indexerAddr)
 		if err != nil {
 			logger.Err(err)
@@ -369,15 +465,18 @@ func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
 		}
 		StaticIndexers[indexerAddr] = ad
 	}
+	indexerCount := len(StaticIndexers)
+	StreamMuIndexes.Unlock()
-	SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name, h, StreamIndexers, StaticIndexers, 20*time.Second) // your indexer is just like a node for the next indexer.
+	SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name, h, StreamIndexers, StaticIndexers, &StreamMuIndexes, 20*time.Second, recordFn...) // your indexer is just like a node for the next indexer.
-	if len(StaticIndexers) < minIndexer {
+	if indexerCount < minIndexer {
 		return errors.New("you run a node without indexers... you're gonna be isolated.")
 	}
 	return nil
 }
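The nudge channel introduced here is a standard Go pattern: a buffered(1) channel with a non-blocking send, so repeated nudges before the receiver wakes coalesce into a single pending signal and the sender never blocks. A standalone sketch (`sendNudge` is illustrative, not the repo's function):

```go
package main

import "fmt"

// sendNudge tries a non-blocking send on a buffered(1) signal channel.
// It reports true when the signal was queued and false when a signal was
// already pending (the two nudges coalesce), mirroring NudgeIndexerHeartbeat.
func sendNudge(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true // signal queued
	default:
		return false // a nudge is already pending; skip
	}
}

func main() {
	nudge := make(chan struct{}, 1)
	fmt.Println(sendNudge(nudge)) // queued
	fmt.Println(sendNudge(nudge)) // coalesced with the pending one
	<-nudge                       // receiver consumes the single signal
	fmt.Println(sendNudge(nudge)) // channel drained, queues again
}
```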
 func AddStreamProtocol(ctx *context.Context, protoS ProtocolStream, h host.Host, proto protocol.ID, id pp.ID, mypid pp.ID, force bool, onStreamCreated *func(network.Stream)) ProtocolStream {
+	logger := oclib.GetLogger()
 	if onStreamCreated == nil {
 		f := func(s network.Stream) {
 			protoS[proto][id] = &Stream{
@@ -400,7 +499,7 @@ func AddStreamProtocol(ctx *context.Context, protoS ProtocolStream, h host.Host,
 	if protoS[proto][id] != nil {
 		protoS[proto][id].Expiry = time.Now().Add(2 * time.Minute)
 	} else {
-		fmt.Println("NEW STREAM", proto, id)
+		logger.Info().Msg("NEW STREAM Generated " + fmt.Sprintf("%v", proto) + " " + id.String())
 		s, err := h.NewStream(*ctx, id, proto)
 		if err != nil {
 			panic(err.Error())
@@ -419,12 +518,16 @@ type Heartbeat struct {
 	Timestamp      int64    `json:"timestamp"`
 	IndexersBinded []string `json:"indexers_binded"`
 	Score          float64
+	// Record carries a fresh signed PeerRecord (JSON) so the receiving indexer
+	// can republish it to the DHT without an extra round-trip.
+	// Only set by nodes (not indexers heartbeating other indexers).
+	Record json.RawMessage `json:"record,omitempty"`
 }
 func (hb *Heartbeat) ComputeIndexerScore(uptimeHours float64, bpms float64, diversity float64) {
-	hb.Score = (0.4 * uptimeHours) +
-		(0.4 * bpms) +
-		(0.2 * diversity)
+	hb.Score = ((0.3 * uptimeHours) +
+		(0.3 * bpms) +
+		(0.4 * diversity)) * 100
}
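The reweighted formula can be checked by hand: with uptime 0.5, bandwidth 0.8 and diversity 1.0 the score is (0.15 + 0.24 + 0.4) * 100 = 79. A sketch mirroring the new weighting, assuming all three inputs arrive as 0..1 ratios (`indexerScore` is a hypothetical free function, not the method on `Heartbeat`):

```go
package main

import "fmt"

// indexerScore mirrors ComputeIndexerScore's new weighting: three 0..1 ratio
// inputs, weighted 0.3 / 0.3 / 0.4 and scaled to a 0..100 score.
func indexerScore(uptime, bandwidth, diversity float64) float64 {
	return ((0.3 * uptime) + (0.3 * bandwidth) + (0.4 * diversity)) * 100
}

func main() {
	// (0.3*0.5 + 0.3*0.8 + 0.4*1.0) * 100 = (0.15 + 0.24 + 0.4) * 100, about 79.
	fmt.Println(indexerScore(0.5, 0.8, 1.0))
}
```

Diversity now carries the largest weight, so a peer whose bound indexers all sit in one /24 subnet is penalized more than under the old 0.4/0.4/0.2 split.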
 type HeartbeatInfo []struct {
@@ -433,35 +536,214 @@
 const ProtocolHeartbeat = "/opencloud/heartbeat/1.0"
-func SendHeartbeat(ctx context.Context, proto protocol.ID, name string, h host.Host, ps ProtocolStream, peers map[string]*pp.AddrInfo, interval time.Duration) {
-	peerID, err := oclib.GenerateNodeID()
-	if err == nil {
-		panic("can't heartbeat daemon failed to start")
+// ProtocolBandwidthProbe is a dedicated short-lived stream used exclusively
+// for bandwidth/latency measurement. The handler echoes any bytes it receives.
+// All nodes and indexers register this handler so peers can measure them.
+const ProtocolBandwidthProbe = "/opencloud/probe/1.0"
+
+// HandleBandwidthProbe echoes back everything written on the stream, then closes.
+// It is registered by all participants so the measuring side (the heartbeat receiver)
+// can open a dedicated probe stream and read the round-trip latency + throughput.
+func HandleBandwidthProbe(s network.Stream) {
+	defer s.Close()
+	s.SetDeadline(time.Now().Add(10 * time.Second))
+	io.Copy(s, s) // echo every byte back to the sender
+}
+
+// SendHeartbeat starts a goroutine that sends periodic heartbeats to peers.
+// recordFn, when provided, is called on each tick and its output is embedded in
+// the heartbeat as a fresh signed PeerRecord so the receiving indexer can
+// republish it to the DHT without an extra round-trip.
+// Pass no recordFn (or nil) for indexer→indexer / native heartbeats.
+func SendHeartbeat(ctx context.Context, proto protocol.ID, name string, h host.Host, ps ProtocolStream, peers map[string]*pp.AddrInfo, mu *sync.RWMutex, interval time.Duration, recordFn ...func() json.RawMessage) {
+	logger := oclib.GetLogger()
+	// isIndexerHB is true when this goroutine drives the indexer heartbeat.
+	// isNativeHB is true when it drives the native heartbeat.
+	isIndexerHB := mu == &StreamMuIndexes
+	isNativeHB := mu == &StreamNativeMu
+	var recFn func() json.RawMessage
+	if len(recordFn) > 0 {
+		recFn = recordFn[0]
 	}
 	go func() {
+		logger.Info().Str("proto", string(proto)).Int("peers", len(peers)).Msg("heartbeat started")
 		t := time.NewTicker(interval)
 		defer t.Stop()
-		for {
-			select {
-			case <-t.C:
-				addrs := []string{}
+		// doTick sends one round of heartbeats to the current peer snapshot.
+		doTick := func() {
+			// Build the heartbeat payload — snapshot current indexer addresses.
+			StreamMuIndexes.RLock()
+			addrs := make([]string, 0, len(StaticIndexers))
 			for addr := range StaticIndexers {
 				addrs = append(addrs, addr)
 			}
+			StreamMuIndexes.RUnlock()
 			hb := Heartbeat{
 				Name:           name,
-				DID:            peerID,
 				PeerID:         h.ID().String(),
 				Timestamp:      time.Now().UTC().Unix(),
 				IndexersBinded: addrs,
 			}
+			if recFn != nil {
+				hb.Record = recFn()
+			}
+			// Snapshot the peer list under a read lock so we don't hold the
+			// write lock during network I/O.
+			if mu != nil {
+				mu.RLock()
+			}
+			snapshot := make([]*pp.AddrInfo, 0, len(peers))
 			for _, ix := range peers {
-				if err = sendHeartbeat(ctx, h, proto, ix, hb, ps, interval*time.Second); err != nil {
+				snapshot = append(snapshot, ix)
+			}
+			if mu != nil {
+				mu.RUnlock()
+			}
+			for _, ix := range snapshot {
+				wasConnected := h.Network().Connectedness(ix.ID) == network.Connected
+				if err := sendHeartbeat(ctx, h, proto, ix, hb, ps, interval*time.Second); err != nil {
+					// Step 3: heartbeat failed — remove from pool and trigger replenish.
+					logger.Info().Str("peer", ix.ID.String()).Str("proto", string(proto)).Msg("[native] step 3 — heartbeat failed, removing peer from pool")
+					// Remove the dead peer and clean up its stream.
+					// mu already covers ps when isIndexerHB (same mutex), so one
+					// lock acquisition is sufficient — no re-entrant double-lock.
+					if mu != nil {
+						mu.Lock()
+					}
+					if ps[proto] != nil {
+						if s, ok := ps[proto][ix.ID]; ok {
+							if s.Stream != nil {
+								s.Stream.Close()
+							}
+							delete(ps[proto], ix.ID)
+						}
+					}
+					lostAddr := ""
+					for addr, ad := range peers {
+						if ad.ID == ix.ID {
+							lostAddr = addr
+							delete(peers, addr)
+							break
+						}
+					}
+					need := conf.GetConfig().MinIndexer - len(peers)
+					remaining := len(peers)
+					if mu != nil {
+						mu.Unlock()
+					}
+					logger.Info().Int("remaining", remaining).Int("min", conf.GetConfig().MinIndexer).Int("need", need).Msg("[native] step 3 — pool state after removal")
+					// Step 4: ask the native for the missing indexer count.
+					if isIndexerHB && conf.GetConfig().NativeIndexerAddresses != "" {
+						if need < 1 {
+							need = 1
+						}
+						logger.Info().Int("need", need).Msg("[native] step 3→4 — triggering replenish")
+						go replenishIndexersFromNative(h, need)
+					}
+					// Native heartbeat failed — find a replacement native.
+					// Case 1: if the dead native was also serving as an indexer, evict it
+					// from StaticIndexers immediately without waiting for the indexer HB tick.
+					if isNativeHB {
+						logger.Info().Str("addr", lostAddr).Msg("[native] step 3 — native heartbeat failed, triggering native replenish")
+						if lostAddr != "" && conf.GetConfig().NativeIndexerAddresses != "" {
 							StreamMuIndexes.Lock()
-							delete(StreamIndexers[proto], ix.ID)
+							if _, wasIndexer := StaticIndexers[lostAddr]; wasIndexer {
+								delete(StaticIndexers, lostAddr)
+								if s := StreamIndexers[ProtocolHeartbeat]; s != nil {
+									if stream, ok := s[ix.ID]; ok {
+										if stream.Stream != nil {
+											stream.Stream.Close()
+										}
+										delete(s, ix.ID)
+									}
+								}
+								idxNeed := conf.GetConfig().MinIndexer - len(StaticIndexers)
+								StreamMuIndexes.Unlock()
+								if idxNeed < 1 {
+									idxNeed = 1
+								}
+								logger.Info().Str("addr", lostAddr).Msg("[native] dead native evicted from indexer pool, triggering replenish")
+								go replenishIndexersFromNative(h, idxNeed)
+							} else {
 								StreamMuIndexes.Unlock()
 							}
 						}
+						go replenishNativesFromPeers(h, lostAddr, proto)
+					}
+				} else {
+					// Case 2: native-as-indexer reconnected after a restart.
+					// If the peer was disconnected before this tick and the heartbeat just
+					// succeeded (transparent reconnect), the native may have restarted with
+					// blank state (responsiblePeers empty). Evict it from StaticIndexers and
+					// re-request an assignment so the native re-tracks us properly and
+					// runOffloadLoop can eventually migrate us to real indexers.
+					if !wasConnected && isIndexerHB && conf.GetConfig().NativeIndexerAddresses != "" {
+						StreamNativeMu.RLock()
+						isNativeIndexer := false
+						for _, ad := range StaticNatives {
+							if ad.ID == ix.ID {
+								isNativeIndexer = true
+								break
+							}
+						}
+						StreamNativeMu.RUnlock()
+						if isNativeIndexer {
+							if mu != nil {
+								mu.Lock()
+							}
+							if ps[proto] != nil {
+								if s, ok := ps[proto][ix.ID]; ok {
+									if s.Stream != nil {
+										s.Stream.Close()
+									}
+									delete(ps[proto], ix.ID)
+								}
+							}
+							reconnectedAddr := ""
+							for addr, ad := range peers {
+								if ad.ID == ix.ID {
+									reconnectedAddr = addr
+									delete(peers, addr)
+									break
+								}
+							}
+							idxNeed := conf.GetConfig().MinIndexer - len(peers)
+							if mu != nil {
+								mu.Unlock()
+							}
+							if idxNeed < 1 {
+								idxNeed = 1
+							}
+							logger.Info().Str("addr", reconnectedAddr).Str("peer", ix.ID.String()).Msg(
+								"[native] native-as-indexer reconnected after restart — evicting and re-requesting assignment")
+							go replenishIndexersFromNative(h, idxNeed)
+						}
+					}
+					logger.Debug().Str("peer", ix.ID.String()).Str("proto", string(proto)).Msg("[native] step 2 — heartbeat sent ok")
+				}
+			}
+		}
+		for {
+			select {
+			case <-t.C:
+				doTick()
+			case <-indexerHeartbeatNudge:
+				if isIndexerHB {
+					logger.Info().Msg("[native] step 2 — nudge received, heartbeating new indexers immediately")
+					doTick()
+				}
+			case <-nativeHeartbeatNudge:
+				if isNativeHB {
+					logger.Info().Msg("[native] native nudge received, heartbeating replacement native immediately")
+					doTick()
+				}
 			case <-ctx.Done():
 				return
 			}
@@ -480,7 +762,7 @@ func TempStream(h host.Host, ad pp.AddrInfo, proto protocol.ID, did string, stre
 	if pts[proto] != nil {
 		expiry = pts[proto].TTL
 	}
-	if ctxTTL, err := context.WithTimeout(context.Background(), expiry); err == nil {
+	ctxTTL, _ := context.WithTimeout(context.Background(), expiry)
 	if h.Network().Connectedness(ad.ID) != network.Connected {
 		if err := h.Connect(ctxTTL, ad); err != nil {
 			return streams, err
@@ -496,10 +778,11 @@ func TempStream(h host.Host, ad pp.AddrInfo, proto protocol.ID, did string, stre
 	mu.Unlock()
 	time.AfterFunc(expiry, func() {
 		mu.Lock()
+		defer mu.Unlock()
 		delete(streams[proto], ad.ID)
-		mu.Unlock()
 	})
-	streams[ProtocolPublish][ad.ID] = &Stream{
+	mu.Lock()
+	streams[proto][ad.ID] = &Stream{
 		DID:    did,
 		Stream: s,
 		Expiry: time.Now().UTC().Add(expiry),
@@ -509,29 +792,32 @@
 	} else {
 		return streams, err
 	}
-	}
-	return streams, errors.New("can't create a context")
 }
 func sendHeartbeat(ctx context.Context, h host.Host, proto protocol.ID, p *pp.AddrInfo,
 	hb Heartbeat, ps ProtocolStream, interval time.Duration) error {
-	streams := ps.Get(proto)
-	if len(streams) == 0 {
-		return errors.New("no stream for protocol heartbeat founded")
+	logger := oclib.GetLogger()
+	if ps[proto] == nil {
+		ps[proto] = map[pp.ID]*Stream{}
 	}
+	streams := ps[proto]
 	pss, exists := streams[p.ID]
-	ctxTTL, _ := context.WithTimeout(ctx, 3*interval)
+	ctxTTL, cancel := context.WithTimeout(ctx, 3*interval)
+	defer cancel()
 	// Connect if necessary
 	if h.Network().Connectedness(p.ID) != network.Connected {
 		if err := h.Connect(ctxTTL, *p); err != nil {
+			logger.Err(err)
 			return err
 		}
 		exists = false // we'll need to recreate the stream
 	}
 	// Create the stream if it doesn't exist or is closed
 	if !exists || pss.Stream == nil {
+		logger.Info().Msg("New Stream engaged as Heartbeat " + fmt.Sprintf("%v", proto) + " " + p.ID.String())
 		s, err := h.NewStream(ctx, p.ID, proto)
 		if err != nil {
+			logger.Err(err)
 			return err
 		}
 		pss = &Stream{


@@ -13,6 +13,7 @@ import (
 	oclib "cloud.o-forge.io/core/oc-lib"
 	"github.com/libp2p/go-libp2p/core/host"
 	pp "github.com/libp2p/go-libp2p/core/peer"
+	"github.com/libp2p/go-libp2p/core/protocol"
 )
 const (
@@ -57,6 +58,7 @@ type IndexerRegistration struct {
 // GetIndexersRequest asks a native for a pool of live indexers.
 type GetIndexersRequest struct {
 	Count int    `json:"count"`
+	From  string `json:"from"`
 }
 // GetIndexersResponse is returned by the native with live indexer multiaddrs.
@@ -69,17 +71,26 @@ var StaticNatives = map[string]*pp.AddrInfo{}
 var StreamNativeMu sync.RWMutex
 var StreamNatives ProtocolStream = ProtocolStream{}
-// ConnectToNatives is the client-side entry point for nodes/indexers that have
-// NativeIndexerAddresses configured. It:
-// 1. Connects (long-lived heartbeat) to all configured natives.
-// 2. Fetches an initial indexer pool from the FIRST responsive native.
-// 3. Challenges that pool to ALL natives (consensus round 1).
-// 4. If the confirmed list is short, samples native suggestions and re-challenges (round 2).
-// 5. Populates StaticIndexers with majority-confirmed indexers.
+// nativeHeartbeatOnce ensures we start exactly one long-lived heartbeat goroutine
+// toward the native mesh, even when ConnectToNatives is called from recovery paths.
+var nativeHeartbeatOnce sync.Once
+
+// nativeMeshHeartbeatOnce guards the native-to-native heartbeat goroutine started
+// by EnsureNativePeers so only one goroutine covers the whole StaticNatives map.
+var nativeMeshHeartbeatOnce sync.Once
+
+// ConnectToNatives is the initial setup for nodes/indexers in native mode:
+// 1. Parses native addresses → StaticNatives.
+// 2. Starts a single long-lived heartbeat goroutine toward the native mesh.
+// 3. Fetches an initial indexer pool from the first responsive native.
+// 4. Runs consensus when real (non-fallback) indexers are returned.
+// 5. Replaces StaticIndexers with the confirmed pool.
 func ConnectToNatives(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID) error {
 	logger := oclib.GetLogger()
+	logger.Info().Msg("[native] step 1 — parsing native addresses")
-	// Parse in config order: the first entry is the primary pool source.
+	// Parse native addresses — safe to call multiple times.
+	StreamNativeMu.Lock()
 	orderedAddrs := []string{}
 	for _, addr := range strings.Split(conf.GetConfig().NativeIndexerAddresses, ",") {
 		addr = strings.TrimSpace(addr)
@@ -88,106 +99,208 @@ func ConnectToNatives(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
 		}
 		ad, err := pp.AddrInfoFromString(addr)
 		if err != nil {
-			logger.Err(err).Msg("ConnectToNatives: invalid addr")
+			logger.Err(err).Msg("[native] step 1 — invalid native addr")
 			continue
 		}
 		StaticNatives[addr] = ad
 		orderedAddrs = append(orderedAddrs, addr)
+		logger.Info().Str("addr", addr).Msg("[native] step 1 — native registered")
 	}
 	if len(StaticNatives) == 0 {
+		StreamNativeMu.Unlock()
 		return errors.New("no valid native addresses configured")
 	}
+	StreamNativeMu.Unlock()
+	logger.Info().Int("count", len(orderedAddrs)).Msg("[native] step 1 — natives parsed")
-	// Long-lived heartbeat connections to keep the native mesh active.
+	// Step 1: one long-lived heartbeat to each native.
+	nativeHeartbeatOnce.Do(func() {
+		logger.Info().Msg("[native] step 1 — starting long-lived heartbeat to native mesh")
 		SendHeartbeat(context.Background(), ProtocolHeartbeat,
-			conf.GetConfig().Name, h, StreamNatives, StaticNatives, 20*time.Second)
+			conf.GetConfig().Name, h, StreamNatives, StaticNatives, &StreamNativeMu, 20*time.Second)
+	})
-	// Step 1: get an initial pool from the FIRST responsive native (in config order).
-	var candidates []string
-	var isFallback bool
-	for _, addr := range orderedAddrs {
-		ad := StaticNatives[addr]
-		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
-		if err := h.Connect(ctx, *ad); err != nil {
-			cancel()
-			continue
-		}
-		s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetIndexers)
-		cancel()
-		if err != nil {
-			continue
-		}
-		req := GetIndexersRequest{Count: maxIndexer}
-		if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
-			s.Close()
-			continue
-		}
-		var resp GetIndexersResponse
-		if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
-			s.Close()
-			continue
-		}
-		s.Close()
-		candidates = resp.Indexers
-		isFallback = resp.IsSelfFallback
-		break // first responsive native only
-	}
+	// Fetch initial pool from the first responsive native.
+	logger.Info().Int("want", maxIndexer).Msg("[native] step 1 — fetching indexer pool from native")
+	candidates, isFallback := fetchIndexersFromNative(h, orderedAddrs, maxIndexer)
 	if len(candidates) == 0 {
+		logger.Warn().Msg("[native] step 1 — no candidates returned by any native")
 		if minIndexer > 0 {
 			return errors.New("ConnectToNatives: no indexers available from any native")
 		}
 		return nil
 	}
+	logger.Info().Int("candidates", len(candidates)).Bool("fallback", isFallback).Msg("[native] step 1 — pool received")
-	// If the native is already the fallback indexer, use it directly — no consensus needed.
+	// Step 2: populate StaticIndexers — consensus for real indexers, direct for fallback.
+	pool := resolvePool(h, candidates, isFallback, maxIndexer)
+	replaceStaticIndexers(pool)
+	StreamMuIndexes.RLock()
+	indexerCount := len(StaticIndexers)
+	StreamMuIndexes.RUnlock()
+	logger.Info().Int("pool_size", indexerCount).Msg("[native] step 2 — StaticIndexers replaced")
+	if minIndexer > 0 && indexerCount < minIndexer {
+		return errors.New("not enough majority-confirmed indexers available")
+	}
+	return nil
+}
+// replenishIndexersFromNative is called when an indexer heartbeat fails (step 3→4).
+// It asks the native for exactly `need` replacement indexers, runs consensus when
+// real indexers are returned, and adds the results to StaticIndexers without
+// clearing the existing pool.
+func replenishIndexersFromNative(h host.Host, need int) {
+	if need <= 0 {
+		return
+	}
+	logger := oclib.GetLogger()
+	logger.Info().Int("need", need).Msg("[native] step 4 — replenishing indexer pool from native")
+	StreamNativeMu.RLock()
+	addrs := make([]string, 0, len(StaticNatives))
+	for addr := range StaticNatives {
+		addrs = append(addrs, addr)
+	}
+	StreamNativeMu.RUnlock()
+	candidates, isFallback := fetchIndexersFromNative(h, addrs, need)
+	if len(candidates) == 0 {
+		logger.Warn().Msg("[native] step 4 — no candidates returned by any native")
+		return
+	}
+	logger.Info().Int("candidates", len(candidates)).Bool("fallback", isFallback).Msg("[native] step 4 — candidates received")
+	pool := resolvePool(h, candidates, isFallback, need)
+	if len(pool) == 0 {
+		logger.Warn().Msg("[native] step 4 — consensus yielded no confirmed indexers")
+		return
+	}
+	// Add new indexers to the pool — do NOT clear existing ones.
+	StreamMuIndexes.Lock()
+	for addr, ad := range pool {
+		StaticIndexers[addr] = ad
+	}
+	total := len(StaticIndexers)
+	StreamMuIndexes.Unlock()
+	logger.Info().Int("added", len(pool)).Int("total", total).Msg("[native] step 4 — pool replenished")
+	// Nudge the heartbeat goroutine to connect immediately instead of waiting
+	// for the next 20s tick.
+	NudgeIndexerHeartbeat()
+	logger.Info().Msg("[native] step 4 — heartbeat goroutine nudged")
+}
+
+// fetchIndexersFromNative opens a ProtocolNativeGetIndexers stream to the first
+// responsive native and returns the candidate list and fallback flag.
+func fetchIndexersFromNative(h host.Host, nativeAddrs []string, count int) (candidates []string, isFallback bool) {
+	logger := oclib.GetLogger()
+	for _, addr := range nativeAddrs {
+		ad, err := pp.AddrInfoFromString(addr)
+		if err != nil {
+			logger.Warn().Str("addr", addr).Msg("[native] fetch — skipping invalid addr")
+			continue
+		}
+		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		if err := h.Connect(ctx, *ad); err != nil {
+			cancel()
+			logger.Warn().Str("addr", addr).Err(err).Msg("[native] fetch — connect failed")
+			continue
+		}
+		s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetIndexers)
+		cancel()
+		if err != nil {
+			logger.Warn().Str("addr", addr).Err(err).Msg("[native] fetch — stream open failed")
+			continue
+		}
+		req := GetIndexersRequest{Count: count, From: h.ID().String()}
+		if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
+			s.Close()
+			logger.Warn().Str("addr", addr).Err(encErr).Msg("[native] fetch — encode request failed")
+			continue
+		}
+		var resp GetIndexersResponse
+		if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
+			s.Close()
+			logger.Warn().Str("addr", addr).Err(decErr).Msg("[native] fetch — decode response failed")
+			continue
+		}
+		s.Close()
+		logger.Info().Str("native", addr).Int("indexers", len(resp.Indexers)).Bool("fallback", resp.IsSelfFallback).Msg("[native] fetch — response received")
+		return resp.Indexers, resp.IsSelfFallback
+	}
+	logger.Warn().Msg("[native] fetch — no native responded")
+	return nil, false
+}
+// resolvePool converts a candidate list to a validated addr→AddrInfo map.
+// When isFallback is true the native itself is the indexer — no consensus needed.
+// When isFallback is false, consensus is run before accepting the candidates.
+func resolvePool(h host.Host, candidates []string, isFallback bool, maxIndexer int) map[string]*pp.AddrInfo {
+	logger := oclib.GetLogger()
 	if isFallback {
+		logger.Info().Strs("addrs", candidates).Msg("[native] resolve — fallback mode, skipping consensus")
+		pool := make(map[string]*pp.AddrInfo, len(candidates))
 		for _, addr := range candidates {
 			ad, err := pp.AddrInfoFromString(addr)
 			if err != nil {
 				continue
 			}
-			StaticIndexers[addr] = ad
+			pool[addr] = ad
 		}
-		return nil
+		return pool
 	}
-	// Step 2: challenge the pool to ALL configured natives and score by majority vote.
+	// Round 1.
+	logger.Info().Int("candidates", len(candidates)).Msg("[native] resolve — consensus round 1")
 	confirmed, suggestions := clientSideConsensus(h, candidates)
+	logger.Info().Int("confirmed", len(confirmed)).Int("suggestions", len(suggestions)).Msg("[native] resolve — consensus round 1 done")
-	// Step 3: if we still have gaps, sample from suggestions and re-challenge.
+	// Round 2: fill gaps from suggestions if below target.
 	if len(confirmed) < maxIndexer && len(suggestions) > 0 {
 		rand.Shuffle(len(suggestions), func(i, j int) { suggestions[i], suggestions[j] = suggestions[j], suggestions[i] })
 		gap := maxIndexer - len(confirmed)
 		if gap > len(suggestions) {
 			gap = len(suggestions)
 		}
+		logger.Info().Int("gap", gap).Msg("[native] resolve — consensus round 2 (filling gaps)")
 		confirmed2, _ := clientSideConsensus(h, append(confirmed, suggestions[:gap]...))
 		if len(confirmed2) > 0 {
 			confirmed = confirmed2
 		}
+		logger.Info().Int("confirmed", len(confirmed)).Msg("[native] resolve — consensus round 2 done")
 	}
-	// Step 4: populate StaticIndexers with confirmed addresses.
+	pool := make(map[string]*pp.AddrInfo, len(confirmed))
 	for _, addr := range confirmed {
 		ad, err := pp.AddrInfoFromString(addr)
 		if err != nil {
 			continue
 		}
+		pool[addr] = ad
+	}
+	logger.Info().Int("pool_size", len(pool)).Msg("[native] resolve — pool ready")
+	return pool
+}
+
+// replaceStaticIndexers atomically replaces the active indexer pool.
+// Peers no longer in next have their heartbeat streams closed so the SendHeartbeat
+// goroutine stops sending to them on the next tick.
+func replaceStaticIndexers(next map[string]*pp.AddrInfo) {
+	StreamMuIndexes.Lock()
+	defer StreamMuIndexes.Unlock()
+	for addr, ad := range next {
 		StaticIndexers[addr] = ad
 	}
-	if minIndexer > 0 && len(StaticIndexers) < minIndexer {
-		return errors.New("not enough majority-confirmed indexers available")
-	}
-	return nil
 }
 // clientSideConsensus challenges a candidate list to ALL configured native peers
 // in parallel. Each native replies with the candidates it trusts plus extras it
 // recommends. An indexer is confirmed when strictly more than 50% of responding
-// natives trust it. The remaining addresses from native suggestions are returned
-// as suggestions for a possible second round.
+// natives trust it.
 func clientSideConsensus(h host.Host, candidates []string) (confirmed []string, suggestions []string) {
 	if len(candidates) == 0 {
 		return nil, nil
@@ -201,7 +314,6 @@ func clientSideConsensus(h host.Host, candidates []string) (confirmed []string,
StreamNativeMu.RUnlock() StreamNativeMu.RUnlock()
if len(peers) == 0 { if len(peers) == 0 {
// No natives to challenge: trust candidates as-is.
return candidates, nil return candidates, nil
} }
@@ -239,13 +351,12 @@ func clientSideConsensus(h host.Host, candidates []string) (confirmed []string,
}(ad) }(ad)
} }
// Collect responses up to consensusCollectTimeout.
timer := time.NewTimer(consensusCollectTimeout) timer := time.NewTimer(consensusCollectTimeout)
defer timer.Stop() defer timer.Stop()
trustedCounts := map[string]int{} trustedCounts := map[string]int{}
suggestionPool := map[string]struct{}{} suggestionPool := map[string]struct{}{}
total := 0 // counts only natives that actually responded total := 0
collected := 0 collected := 0
collect: collect:
@@ -254,7 +365,7 @@ collect:
case r := <-ch: case r := <-ch:
collected++ collected++
if !r.responded { if !r.responded {
continue // timeout / error: skip, do not count as vote continue
} }
total++ total++
seen := map[string]struct{}{} seen := map[string]struct{}{}
@@ -273,13 +384,12 @@ collect:
} }
if total == 0 { if total == 0 {
// No native responded: fall back to trusting the candidates as-is.
return candidates, nil return candidates, nil
} }
confirmedSet := map[string]struct{}{} confirmedSet := map[string]struct{}{}
for addr, count := range trustedCounts { for addr, count := range trustedCounts {
if count*2 > total { // strictly >50% if count*2 > total {
confirmed = append(confirmed, addr) confirmed = append(confirmed, addr)
confirmedSet[addr] = struct{}{} confirmedSet[addr] = struct{}{}
} }
@@ -292,15 +402,17 @@ collect:
return return
} }
const ProtocolIndexerHeartbeat = "/opencloud/heartbeat/indexer/1.0"

// RegisterWithNative sends a one-shot registration to each configured native indexer.
// Should be called periodically, every RecommendedHeartbeatInterval.
func RegisterWithNative(h host.Host, nativeAddressesStr string) {
	logger := oclib.GetLogger()
	myAddr := ""
	// Guard against an empty address list before indexing the last element.
	if len(h.Addrs()) > 0 && !strings.Contains(h.Addrs()[len(h.Addrs())-1].String(), "127.0.0.1") {
		myAddr = h.Addrs()[len(h.Addrs())-1].String() + "/p2p/" + h.ID().String()
	}
	if myAddr == "" {
		logger.Warn().Msg("RegisterWithNative: no routable address yet, skipping")
		return
	}
	reg := IndexerRegistration{
		PeerID: h.ID().String(),
@@ -334,16 +446,16 @@ func RegisterWithNative(h host.Host, nativeAddressesStr string) {
	}
}
// EnsureNativePeers populates StaticNatives from config and starts a single
// heartbeat goroutine toward the native mesh. Safe to call multiple times;
// the heartbeat goroutine is started at most once (nativeMeshHeartbeatOnce).
func EnsureNativePeers(h host.Host) {
	logger := oclib.GetLogger()
	nativeAddrs := conf.GetConfig().NativeIndexerAddresses
	if nativeAddrs == "" {
		return
	}
	StreamNativeMu.Lock()
	for _, addr := range strings.Split(nativeAddrs, ",") {
		addr = strings.TrimSpace(addr)
		if addr == "" {
@@ -354,11 +466,312 @@ func EnsureNativePeers(h host.Host) {
			continue
		}
		StaticNatives[addr] = ad
		logger.Info().Str("addr", addr).Msg("native: registered peer in native mesh")
	}
	StreamNativeMu.Unlock()
	// One heartbeat goroutine iterates over all of StaticNatives on each tick;
	// starting one per address would multiply heartbeats by the native count.
	nativeMeshHeartbeatOnce.Do(func() {
		logger.Info().Msg("native: starting mesh heartbeat goroutine")
		SendHeartbeat(context.Background(), ProtocolHeartbeat,
			conf.GetConfig().Name, h, StreamNatives, StaticNatives, &StreamNativeMu, 20*time.Second)
	})
}

// StartNativeRegistration registers with the configured natives once a routable
// address is known, then re-registers on every RecommendedHeartbeatInterval tick.
func StartNativeRegistration(h host.Host, nativeAddressesStr string) {
	go func() {
		// Poll until a routable (non-loopback) address is available before the first
		// registration attempt. libp2p may not have discovered external addresses yet
		// at startup. Cap at 12 retries (~1 minute) so we don't spin indefinitely.
		for i := 0; i < 12; i++ {
			if len(h.Addrs()) > 0 && !strings.Contains(h.Addrs()[len(h.Addrs())-1].String(), "127.0.0.1") {
				break
			}
			time.Sleep(5 * time.Second)
		}
		RegisterWithNative(h, nativeAddressesStr)
		t := time.NewTicker(RecommendedHeartbeatInterval)
		defer t.Stop()
		for range t.C {
			RegisterWithNative(h, nativeAddressesStr)
		}
	}()
}
// ── Lost-native replacement ───────────────────────────────────────────────────
const (
// ProtocolNativeGetPeers lets a node/indexer ask a native for a random
// selection of that native's own native contacts (to replace a dead native).
ProtocolNativeGetPeers = "/opencloud/native/peers/1.0"
// ProtocolIndexerGetNatives lets nodes/indexers ask a connected indexer for
// its configured native addresses (fallback when no alive native responds).
ProtocolIndexerGetNatives = "/opencloud/indexer/natives/1.0"
// retryNativeInterval is how often retryLostNative polls a dead native.
retryNativeInterval = 30 * time.Second
)
// GetNativePeersRequest is sent to a native to ask for its known native contacts.
type GetNativePeersRequest struct {
Exclude []string `json:"exclude"`
Count int `json:"count"`
}
// GetNativePeersResponse carries native addresses returned by a native's peer list.
type GetNativePeersResponse struct {
Peers []string `json:"peers"`
}
// GetIndexerNativesRequest is sent to an indexer to ask for its configured native addresses.
type GetIndexerNativesRequest struct {
Exclude []string `json:"exclude"`
}
// GetIndexerNativesResponse carries native addresses returned by an indexer.
type GetIndexerNativesResponse struct {
Natives []string `json:"natives"`
}
// nativeHeartbeatNudge allows replenishNativesFromPeers to trigger an immediate
// native heartbeat tick after adding a replacement native to the pool.
var nativeHeartbeatNudge = make(chan struct{}, 1)
// NudgeNativeHeartbeat signals the native heartbeat goroutine to fire immediately.
func NudgeNativeHeartbeat() {
select {
case nativeHeartbeatNudge <- struct{}{}:
default: // nudge already pending, skip
}
}
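The nudge mechanism above relies on a standard Go idiom: a buffered channel of capacity 1 plus a non-blocking send coalesces any burst of nudges into at most one pending wake-up. A minimal sketch, with hypothetical names:

```go
// Sketch of the non-blocking "nudge" pattern used by NudgeNativeHeartbeat:
// repeated nudges collapse into a single pending tick for the consumer loop.
package main

import "fmt"

var nudge = make(chan struct{}, 1) // capacity 1: at most one pending nudge

// tryNudge attempts to queue a nudge; it never blocks the caller.
func tryNudge() bool {
	select {
	case nudge <- struct{}{}:
		return true
	default: // a nudge is already pending, drop this one
		return false
	}
}

func main() {
	fmt.Println(tryNudge()) // true: first nudge queued
	fmt.Println(tryNudge()) // false: coalesced with the pending one
	<-nudge                 // heartbeat loop consumes the pending nudge
	fmt.Println(tryNudge()) // true: channel has room again
}
```

Because the sender never blocks, a slow or busy heartbeat goroutine can never back-pressure `replenishNativesFromPeers`.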
// replenishIndexersIfNeeded checks if the indexer pool is below the configured
// minimum (or empty) and, if so, asks the native mesh for replacements.
// Called whenever a native is recovered so the indexer pool is restored.
func replenishIndexersIfNeeded(h host.Host) {
logger := oclib.GetLogger()
minIdx := conf.GetConfig().MinIndexer
if minIdx < 1 {
minIdx = 1
}
StreamMuIndexes.RLock()
indexerCount := len(StaticIndexers)
StreamMuIndexes.RUnlock()
if indexerCount < minIdx {
need := minIdx - indexerCount
logger.Info().Int("need", need).Int("current", indexerCount).Msg("[native] native recovered — replenishing indexer pool")
go replenishIndexersFromNative(h, need)
}
}
// replenishNativesFromPeers is called when the heartbeat to a native fails.
// Flow:
// 1. Ask other alive natives for one of their native contacts (ProtocolNativeGetPeers).
// 2. If none respond or return a new address, ask connected indexers (ProtocolIndexerGetNatives).
// 3. If no replacement found:
// - remaining > 1 → ignore (enough natives remain).
// - remaining ≤ 1 → start periodic retry (retryLostNative).
func replenishNativesFromPeers(h host.Host, lostAddr string, proto protocol.ID) {
if lostAddr == "" {
return
}
logger := oclib.GetLogger()
logger.Info().Str("lost", lostAddr).Msg("[native] replenish natives — start")
// Build exclude list: the lost addr + all currently alive natives.
// lostAddr has already been removed from StaticNatives by doTick.
StreamNativeMu.RLock()
remaining := len(StaticNatives)
exclude := make([]string, 0, remaining+1)
exclude = append(exclude, lostAddr)
for addr := range StaticNatives {
exclude = append(exclude, addr)
}
StreamNativeMu.RUnlock()
logger.Info().Int("remaining", remaining).Msg("[native] replenish natives — step 1: ask alive natives for a peer")
// Step 1: ask other alive natives for a replacement.
newAddr := fetchNativeFromNatives(h, exclude)
// Step 2: fallback — ask connected indexers for their native addresses.
if newAddr == "" {
logger.Info().Msg("[native] replenish natives — step 2: ask indexers for their native addresses")
newAddr = fetchNativeFromIndexers(h, exclude)
}
if newAddr != "" {
ad, err := pp.AddrInfoFromString(newAddr)
if err == nil {
StreamNativeMu.Lock()
StaticNatives[newAddr] = ad
StreamNativeMu.Unlock()
logger.Info().Str("new", newAddr).Msg("[native] replenish natives — replacement added, nudging heartbeat")
NudgeNativeHeartbeat()
replenishIndexersIfNeeded(h)
return
}
}
// Step 3: no replacement found.
logger.Warn().Int("remaining", remaining).Msg("[native] replenish natives — no replacement found")
if remaining > 1 {
logger.Info().Msg("[native] replenish natives — enough natives remain, ignoring loss")
return
}
// Last (or only) native — retry periodically.
logger.Info().Str("addr", lostAddr).Msg("[native] replenish natives — last native lost, starting periodic retry")
go retryLostNative(h, lostAddr, proto)
}
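The three-step fallback documented above can be condensed into a schematic decision function. The fetchers are injected as closures here purely for illustration; `replaceLost` and its parameters are hypothetical names, not part of the real API.

```go
// Schematic of the replenishNativesFromPeers flow: try alive natives, then
// connected indexers, else either ignore the loss or fall back to periodic retry.
package main

import "fmt"

func replaceLost(fromNatives, fromIndexers func() string, remaining int) string {
	if addr := fromNatives(); addr != "" { // step 1: ask alive natives
		return addr
	}
	if addr := fromIndexers(); addr != "" { // step 2: ask connected indexers
		return addr
	}
	if remaining > 1 { // step 3a: enough natives remain, ignore the loss
		return ""
	}
	return "retry" // step 3b: last native lost, start periodic retry
}

func main() {
	none := func() string { return "" }
	fmt.Println(replaceLost(none, func() string { return "/ip4/x" }, 1)) // indexer fallback wins
	fmt.Println(replaceLost(none, none, 3))                              // loss ignored
	fmt.Println(replaceLost(none, none, 1))                              // periodic retry
}
```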
// fetchNativeFromNatives asks each alive native for one of its own native contacts
// not in exclude. Returns the first new address found or "" if none.
func fetchNativeFromNatives(h host.Host, exclude []string) string {
logger := oclib.GetLogger()
excludeSet := make(map[string]struct{}, len(exclude))
for _, e := range exclude {
excludeSet[e] = struct{}{}
}
StreamNativeMu.RLock()
natives := make([]*pp.AddrInfo, 0, len(StaticNatives))
for _, ad := range StaticNatives {
natives = append(natives, ad)
}
StreamNativeMu.RUnlock()
rand.Shuffle(len(natives), func(i, j int) { natives[i], natives[j] = natives[j], natives[i] })
for _, ad := range natives {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
if err := h.Connect(ctx, *ad); err != nil {
cancel()
logger.Warn().Str("native", ad.ID.String()).Err(err).Msg("[native] fetch native peers — connect failed")
continue
}
s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetPeers)
cancel()
if err != nil {
logger.Warn().Str("native", ad.ID.String()).Err(err).Msg("[native] fetch native peers — stream failed")
continue
}
req := GetNativePeersRequest{Exclude: exclude, Count: 1}
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
s.Close()
continue
}
var resp GetNativePeersResponse
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
s.Close()
continue
}
s.Close()
for _, peer := range resp.Peers {
if _, excluded := excludeSet[peer]; !excluded && peer != "" {
logger.Info().Str("from", ad.ID.String()).Str("new", peer).Msg("[native] fetch native peers — got replacement")
return peer
}
}
logger.Debug().Str("native", ad.ID.String()).Msg("[native] fetch native peers — no new native from this peer")
}
return ""
}
// fetchNativeFromIndexers asks connected indexers for their configured native addresses,
// returning the first one not in exclude.
func fetchNativeFromIndexers(h host.Host, exclude []string) string {
logger := oclib.GetLogger()
excludeSet := make(map[string]struct{}, len(exclude))
for _, e := range exclude {
excludeSet[e] = struct{}{}
}
StreamMuIndexes.RLock()
indexers := make([]*pp.AddrInfo, 0, len(StaticIndexers))
for _, ad := range StaticIndexers {
indexers = append(indexers, ad)
}
StreamMuIndexes.RUnlock()
rand.Shuffle(len(indexers), func(i, j int) { indexers[i], indexers[j] = indexers[j], indexers[i] })
for _, ad := range indexers {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
if err := h.Connect(ctx, *ad); err != nil {
cancel()
continue
}
s, err := h.NewStream(ctx, ad.ID, ProtocolIndexerGetNatives)
cancel()
if err != nil {
logger.Warn().Str("indexer", ad.ID.String()).Err(err).Msg("[native] fetch indexer natives — stream failed")
continue
}
req := GetIndexerNativesRequest{Exclude: exclude}
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
s.Close()
continue
}
var resp GetIndexerNativesResponse
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
s.Close()
continue
}
s.Close()
for _, nativeAddr := range resp.Natives {
if _, excluded := excludeSet[nativeAddr]; !excluded && nativeAddr != "" {
logger.Info().Str("indexer", ad.ID.String()).Str("native", nativeAddr).Msg("[native] fetch indexer natives — got native")
return nativeAddr
}
}
}
logger.Warn().Msg("[native] fetch indexer natives — no native found from indexers")
return ""
}
// retryLostNative periodically retries connecting to a lost native address until
// it becomes reachable again or was already restored by another path.
func retryLostNative(h host.Host, addr string, nativeProto protocol.ID) {
logger := oclib.GetLogger()
logger.Info().Str("addr", addr).Msg("[native] retry — periodic retry for lost native started")
t := time.NewTicker(retryNativeInterval)
defer t.Stop()
for range t.C {
StreamNativeMu.RLock()
_, alreadyRestored := StaticNatives[addr]
StreamNativeMu.RUnlock()
if alreadyRestored {
logger.Info().Str("addr", addr).Msg("[native] retry — native already restored, stopping retry")
return
}
ad, err := pp.AddrInfoFromString(addr)
if err != nil {
logger.Warn().Str("addr", addr).Msg("[native] retry — invalid addr, stopping retry")
return
}
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
err = h.Connect(ctx, *ad)
cancel()
if err != nil {
logger.Warn().Str("addr", addr).Msg("[native] retry — still unreachable")
continue
}
// Reachable again — add back to pool.
StreamNativeMu.Lock()
StaticNatives[addr] = ad
StreamNativeMu.Unlock()
logger.Info().Str("addr", addr).Msg("[native] retry — native reconnected and added back to pool")
NudgeNativeHeartbeat()
replenishIndexersIfNeeded(h)
if nativeProto == ProtocolNativeGetIndexers {
StartNativeRegistration(h, addr) // register back
}
return
	}
}


@@ -24,17 +24,16 @@ func ExtractIP(addr string) (net.IP, error) {
	if err != nil {
		return nil, err
	}
	ipStr, err := ma.ValueForProtocol(multiaddr.P_IP4)
	if err != nil {
		ipStr, err = ma.ValueForProtocol(multiaddr.P_IP6)
		if err != nil {
			return nil, err
		}
	}
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return nil, fmt.Errorf("invalid IP: %s", ipStr)
	}
	return ip, nil
}
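The IP4-then-IP6 fallback above can be mimicked with only the standard library. This is a simplified stand-in, not the real implementation: the actual code uses go-multiaddr's `ValueForProtocol`, while `extractIP` below just walks the `/proto/value/...` segments of a multiaddr-like string.

```go
// Simplified stand-in for ExtractIP: pull the IP component out of a
// multiaddr-like string such as "/ip4/10.0.0.5/tcp/4001" or "/ip6/::1/tcp/4001".
package main

import (
	"fmt"
	"net"
	"strings"
)

func extractIP(addr string) (net.IP, error) {
	parts := strings.Split(addr, "/")
	for i := 0; i+1 < len(parts); i++ {
		if parts[i] == "ip4" || parts[i] == "ip6" {
			if ip := net.ParseIP(parts[i+1]); ip != nil {
				return ip, nil
			}
			return nil, fmt.Errorf("invalid IP: %s", parts[i+1])
		}
	}
	return nil, fmt.Errorf("no ip component in %s", addr)
}

func main() {
	ip, _ := extractIP("/ip4/10.0.0.5/tcp/4001")
	fmt.Println(ip) // the IPv4 host component
}
```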


@@ -5,8 +5,9 @@ import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"oc-discovery/conf"
	"oc-discovery/daemons/node/common"
	"strings"
	"time"

	oclib "cloud.o-forge.io/core/oc-lib"
@@ -18,17 +19,21 @@ import (
	"github.com/libp2p/go-libp2p/core/peer"
)
// PeerRecordPayload is the minimal, signed portion of a PeerRecord.
type PeerRecordPayload struct {
	Name       string    `json:"name"`
	DID        string    `json:"did"` // real peer DID
	PubKey     []byte    `json:"pub_key"`
	ExpiryDate time.Time `json:"expiry_date"`
}

type PeerRecord struct {
	PeerRecordPayload
	PeerID        string `json:"peer_id"`
	APIUrl        string `json:"api_url"`
	StreamAddress string `json:"stream_address"`
	NATSAddress   string `json:"nats_address"`
	WalletAddress string `json:"wallet_address"`
	Signature     []byte `json:"signature"`
}

func (p *PeerRecord) Sign() error {
@@ -36,13 +41,7 @@ func (p *PeerRecord) Sign() error {
	if err != nil {
		return err
	}
	payload, _ := json.Marshal(p.PeerRecordPayload)
	b, err := common.Sign(priv, payload)
	p.Signature = b
	return err
@@ -51,19 +50,11 @@ func (p *PeerRecord) Sign() error {

func (p *PeerRecord) Verify() (crypto.PubKey, error) {
	pubKey, err := crypto.UnmarshalPublicKey(p.PubKey) // retrieve the public key carried in the record
	if err != nil {
		return pubKey, err
	}
	payload, _ := json.Marshal(p.PeerRecordPayload)
	if ok, _ := pubKey.Verify(payload, p.Signature); !ok { // verify the minimal payload was signed by pubKey
		return pubKey, errors.New("invalid signature")
	}
	return pubKey, nil
@@ -114,6 +105,8 @@ func (pr *PeerRecord) ExtractPeer(ourkey string, key string, pubKey crypto.PubKe

type GetValue struct {
	Key    string  `json:"key"`
	PeerID peer.ID `json:"peer_id"`
	Name   string  `json:"name,omitempty"`
	Search bool    `json:"search,omitempty"`
}

type GetResponse struct {
@@ -125,122 +118,233 @@ func (ix *IndexerService) genKey(did string) string {
	return "/node/" + did
}

func (ix *IndexerService) genNameKey(name string) string {
	return "/name/" + name
}

func (ix *IndexerService) genPIDKey(peerID string) string {
	return "/pid/" + peerID
}
func (ix *IndexerService) initNodeHandler() {
	logger := oclib.GetLogger()
	logger.Info().Msg("Init Node Handler")
	// Each heartbeat from a node carries a freshly signed PeerRecord.
	// Republish it to the DHT so the record never expires as long as the node
	// is alive — no separate publish stream needed from the node side.
	ix.AfterHeartbeat = func(pid peer.ID) {
		ctx1, cancel1 := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel1()
		res, err := ix.DHT.GetValue(ctx1, ix.genPIDKey(pid.String()))
		if err != nil {
			logger.Warn().Err(err).Str("peer", pid.String()).Msg("indexer: pid index lookup failed")
			return
		}
		did := string(res)
		ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel2()
		res, err = ix.DHT.GetValue(ctx2, ix.genKey(did))
		if err != nil {
			logger.Warn().Err(err).Str("did", did).Msg("indexer: record lookup failed")
			return
		}
		var rec PeerRecord
		if err := json.Unmarshal(res, &rec); err != nil {
			logger.Warn().Err(err).Str("peer", pid.String()).Msg("indexer: heartbeat record unmarshal failed")
			return
		}
		if _, err := rec.Verify(); err != nil {
			logger.Warn().Err(err).Str("peer", pid.String()).Msg("indexer: heartbeat record signature invalid")
			return
		}
		data, err := json.Marshal(rec)
		if err != nil {
			return
		}
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		logger.Info().Msg("REFRESH PutValue " + ix.genKey(rec.DID))
		if err := ix.DHT.PutValue(ctx, ix.genKey(rec.DID), data); err != nil {
			logger.Warn().Err(err).Str("did", rec.DID).Msg("indexer: DHT refresh failed")
			return
		}
		if rec.Name != "" {
			nameCtx, nameCancel := context.WithTimeout(context.Background(), 10*time.Second)
			ix.DHT.PutValue(nameCtx, ix.genNameKey(rec.Name), []byte(rec.DID))
			nameCancel()
		}
		if rec.PeerID != "" {
			pidCtx, pidCancel := context.WithTimeout(context.Background(), 10*time.Second)
			ix.DHT.PutValue(pidCtx, ix.genPIDKey(rec.PeerID), []byte(rec.DID))
			pidCancel()
		}
	}
	ix.Host.SetStreamHandler(common.ProtocolHeartbeat, ix.HandleHeartbeat)
	ix.Host.SetStreamHandler(common.ProtocolPublish, ix.handleNodePublish)
	ix.Host.SetStreamHandler(common.ProtocolGet, ix.handleNodeGet)
	ix.Host.SetStreamHandler(common.ProtocolIndexerGetNatives, ix.handleGetNatives)
}
func (ix *IndexerService) handleNodePublish(s network.Stream) {
	defer s.Close()
	logger := oclib.GetLogger()
	var rec PeerRecord
	if err := json.NewDecoder(s).Decode(&rec); err != nil {
		logger.Err(err).Msg("indexer publish: decode failed")
		return
	}
	if _, err := rec.Verify(); err != nil {
		logger.Err(err).Msg("indexer publish: signature invalid")
		return
	}
	if rec.PeerID == "" || rec.ExpiryDate.Before(time.Now().UTC()) { // already expired
		logger.Err(errors.New(rec.PeerID + " is expired.")).Msg("indexer publish")
		return
	}
	pid, err := peer.Decode(rec.PeerID)
	if err != nil {
		return
	}
	ix.StreamMU.Lock()
	if ix.StreamRecords[common.ProtocolHeartbeat] == nil {
		ix.StreamRecords[common.ProtocolHeartbeat] = map[peer.ID]*common.StreamRecord[PeerRecord]{}
	}
	streams := ix.StreamRecords[common.ProtocolHeartbeat]
	if srec, ok := streams[pid]; ok {
		srec.DID = rec.DID
		srec.Record = rec
		srec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
	}
	ix.StreamMU.Unlock()
	key := ix.genKey(rec.DID)
	data, err := json.Marshal(rec)
	if err != nil {
		logger.Err(err).Msg("indexer publish: marshal failed")
		return
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	if err := ix.DHT.PutValue(ctx, key, data); err != nil {
		logger.Err(err).Msg("indexer publish: DHT put failed")
		cancel()
		return
	}
	cancel()
	// Secondary index: /name/<name> → DID, so peers can resolve by human-readable name.
	if rec.Name != "" {
		ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
		if err := ix.DHT.PutValue(ctx2, ix.genNameKey(rec.Name), []byte(rec.DID)); err != nil {
			logger.Err(err).Str("name", rec.Name).Msg("indexer: failed to write name index")
		}
		cancel2()
	}
	// Secondary index: /pid/<peerID> → DID, so peers can resolve by libp2p PeerID.
	if rec.PeerID != "" {
		ctx3, cancel3 := context.WithTimeout(context.Background(), 10*time.Second)
		if err := ix.DHT.PutValue(ctx3, ix.genPIDKey(rec.PeerID), []byte(rec.DID)); err != nil {
			logger.Err(err).Str("pid", rec.PeerID).Msg("indexer: failed to write pid index")
		}
		cancel3()
	}
}
func (ix *IndexerService) handleNodeGet(s network.Stream) {
	defer s.Close()
	logger := oclib.GetLogger()
	var req GetValue
	if err := json.NewDecoder(s).Decode(&req); err != nil {
		logger.Err(err).Msg("indexer get: decode failed")
		return
	}

	resp := GetResponse{Found: false, Records: map[string]PeerRecord{}}

	keys := []string{}
	// Name lookup: a substring search scans the in-memory index of connected
	// nodes; an exact-name lookup goes through the DHT.
	if req.Name != "" {
		if req.Search {
			for _, did := range ix.LookupNameIndex(strings.ToLower(req.Name)) {
				keys = append(keys, did)
			}
		} else {
			// DHT exact-name lookup: covers nodes that published but aren't currently connected.
			nameCtx, nameCancel := context.WithTimeout(context.Background(), 5*time.Second)
			if ch, err := ix.DHT.SearchValue(nameCtx, ix.genNameKey(req.Name)); err == nil {
				for did := range ch {
					keys = append(keys, string(did))
					break
				}
			}
			nameCancel()
		}
	} else if req.PeerID != "" {
		pidCtx, pidCancel := context.WithTimeout(context.Background(), 5*time.Second)
		if did, err := ix.DHT.GetValue(pidCtx, ix.genPIDKey(req.PeerID.String())); err == nil {
			keys = append(keys, string(did))
		}
		pidCancel()
	} else {
		keys = append(keys, req.Key)
	}
	// DHT record fetch by DID key (covers exact-name and PeerID paths).
	for _, k := range keys {
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		c, err := ix.DHT.GetValue(ctx, ix.genKey(k))
		cancel()
		if err == nil {
			var rec PeerRecord
			if json.Unmarshal(c, &rec) == nil {
				// Filter by PeerID only when one was explicitly specified.
				if req.PeerID == "" || rec.PeerID == req.PeerID.String() {
					resp.Records[rec.PeerID] = rec
				}
			}
		} else if req.Name == "" && req.PeerID == "" {
			logger.Err(err).Msg("Failed to fetch PeerRecord from DHT " + req.Key)
		}
	}
	resp.Found = len(resp.Records) > 0
	_ = json.NewEncoder(s).Encode(resp)
}

// handleGetNatives returns this indexer's configured native addresses,
// excluding any in the request's Exclude list.
func (ix *IndexerService) handleGetNatives(s network.Stream) {
	defer s.Close()
	logger := oclib.GetLogger()
	var req common.GetIndexerNativesRequest
	if err := json.NewDecoder(s).Decode(&req); err != nil {
		logger.Err(err).Msg("indexer get natives: decode")
		return
	}
	excludeSet := make(map[string]struct{}, len(req.Exclude))
	for _, e := range req.Exclude {
		excludeSet[e] = struct{}{}
	}
	resp := common.GetIndexerNativesResponse{}
	for _, addr := range strings.Split(conf.GetConfig().NativeIndexerAddresses, ",") {
		addr = strings.TrimSpace(addr)
		if addr == "" {
			continue
		}
		if _, excluded := excludeSet[addr]; !excluded {
			resp.Natives = append(resp.Natives, addr)
		}
	}
	if err := json.NewEncoder(s).Encode(resp); err != nil {
		logger.Err(err).Msg("indexer get natives: encode response")
	}
}


@@ -0,0 +1,168 @@
package indexer
import (
"context"
"encoding/json"
"strings"
"sync"
"time"
"oc-discovery/daemons/node/common"
oclib "cloud.o-forge.io/core/oc-lib"
pubsub "github.com/libp2p/go-libp2p-pubsub"
pp "github.com/libp2p/go-libp2p/core/peer"
)
// TopicNameIndex is the GossipSub topic shared by regular indexers to exchange
// add/delete events for the distributed name→peerID mapping.
const TopicNameIndex = "oc-name-index"
// nameIndexDedupWindow suppresses re-emission of the same (action, name, peerID)
// tuple within this window, reducing duplicate events when a node is registered
// with multiple indexers simultaneously.
const nameIndexDedupWindow = 30 * time.Second
// NameIndexAction indicates whether a name mapping is being added or removed.
type NameIndexAction string
const (
NameIndexAdd NameIndexAction = "add"
NameIndexDelete NameIndexAction = "delete"
)
// NameIndexEvent is published on TopicNameIndex by each indexer when a node
// registers (add) or is evicted by the GC (delete).
type NameIndexEvent struct {
Action NameIndexAction `json:"action"`
Name string `json:"name"`
PeerID string `json:"peer_id"`
DID string `json:"did"`
}
// nameIndexState holds the local in-memory name index and the sender-side
// deduplication tracker.
type nameIndexState struct {
// index: name → peerID → DID, built from events received from all indexers.
index map[string]map[string]string
indexMu sync.RWMutex
// emitted tracks the last emission time for each (action, name, peerID) key
// to suppress duplicates within nameIndexDedupWindow.
emitted map[string]time.Time
emittedMu sync.Mutex
}
// shouldEmit returns true if the (action, name, peerID) tuple has not been
// emitted within nameIndexDedupWindow, updating the tracker if so.
func (s *nameIndexState) shouldEmit(action NameIndexAction, name, peerID string) bool {
key := string(action) + ":" + name + ":" + peerID
s.emittedMu.Lock()
defer s.emittedMu.Unlock()
if t, ok := s.emitted[key]; ok && time.Since(t) < nameIndexDedupWindow {
return false
}
s.emitted[key] = time.Now()
return true
}
// onEvent applies a received NameIndexEvent to the local index.
// "add" inserts/updates the mapping; "delete" removes it.
// Operations are idempotent — duplicate events from multiple indexers are harmless.
func (s *nameIndexState) onEvent(evt NameIndexEvent) {
if evt.Name == "" || evt.PeerID == "" {
return
}
s.indexMu.Lock()
defer s.indexMu.Unlock()
switch evt.Action {
case NameIndexAdd:
if s.index[evt.Name] == nil {
s.index[evt.Name] = map[string]string{}
}
s.index[evt.Name][evt.PeerID] = evt.DID
case NameIndexDelete:
if s.index[evt.Name] != nil {
delete(s.index[evt.Name], evt.PeerID)
if len(s.index[evt.Name]) == 0 {
delete(s.index, evt.Name)
}
}
}
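The add/delete semantics of `onEvent` boil down to an idempotent nested map. A toy version, with the hypothetical `index.apply` standing in for the event handler:

```go
// Toy version of the name-index event application: "add" inserts
// name → peerID → DID, "delete" removes it and prunes empty name buckets.
// Both operations are idempotent, so duplicate gossip events are harmless.
package main

import "fmt"

type index map[string]map[string]string

func (ix index) apply(action, name, peerID, did string) {
	switch action {
	case "add":
		if ix[name] == nil {
			ix[name] = map[string]string{}
		}
		ix[name][peerID] = did
	case "delete":
		if ix[name] != nil {
			delete(ix[name], peerID)
			if len(ix[name]) == 0 {
				delete(ix, name) // prune empty bucket
			}
		}
	}
}

func main() {
	ix := index{}
	ix.apply("add", "node-a", "p1", "did:1")
	ix.apply("add", "node-a", "p1", "did:1") // duplicate add: no-op
	fmt.Println(len(ix["node-a"]))           // one entry
	ix.apply("delete", "node-a", "p1", "")
	ix.apply("delete", "node-a", "p1", "") // duplicate delete: no-op
	fmt.Println(len(ix))                   // index pruned empty
}
```

Idempotence is what makes the gossip design safe: a node registered with several indexers produces several identical events, and applying them in any order or multiplicity converges to the same index.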
}
// initNameIndex joins TopicNameIndex and starts consuming events.
// Must be called after ix.PS is ready.
func (ix *IndexerService) initNameIndex(ps *pubsub.PubSub) {
logger := oclib.GetLogger()
ix.nameIndex = &nameIndexState{
index: map[string]map[string]string{},
emitted: map[string]time.Time{},
}
ps.RegisterTopicValidator(TopicNameIndex, func(_ context.Context, _ pp.ID, _ *pubsub.Message) bool {
return true
})
topic, err := ps.Join(TopicNameIndex)
if err != nil {
logger.Err(err).Msg("name index: failed to join topic")
return
}
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Lock()
ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicNameIndex] = topic
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Unlock()
common.SubscribeEvents(
ix.LongLivedStreamRecordedService.LongLivedPubSubService,
context.Background(),
TopicNameIndex,
-1,
func(_ context.Context, evt NameIndexEvent, _ string) {
ix.nameIndex.onEvent(evt)
},
)
}
// publishNameEvent emits a NameIndexEvent on TopicNameIndex, subject to the
// sender-side deduplication window.
func (ix *IndexerService) publishNameEvent(action NameIndexAction, name, peerID, did string) {
if ix.nameIndex == nil || name == "" || peerID == "" {
return
}
if !ix.nameIndex.shouldEmit(action, name, peerID) {
return
}
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RLock()
topic := ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicNameIndex]
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RUnlock()
if topic == nil {
return
}
evt := NameIndexEvent{Action: action, Name: name, PeerID: peerID, DID: did}
b, err := json.Marshal(evt)
if err != nil {
return
}
_ = topic.Publish(context.Background(), b)
}
// LookupNameIndex searches the distributed name index for peers whose name
// contains needle (case-insensitive). Returns peerID → DID for matched peers.
// Returns nil if the name index is not initialised (e.g. native indexers).
func (ix *IndexerService) LookupNameIndex(needle string) map[string]string {
if ix.nameIndex == nil {
return nil
}
result := map[string]string{}
needleLow := strings.ToLower(needle)
ix.nameIndex.indexMu.RLock()
defer ix.nameIndex.indexMu.RUnlock()
for name, peers := range ix.nameIndex.index {
if strings.Contains(strings.ToLower(name), needleLow) {
for peerID, did := range peers {
result[peerID] = did
}
}
}
return result
}


@@ -4,7 +4,10 @@ import (
 	"context"
 	"encoding/json"
 	"errors"
+	"fmt"
 	"math/rand"
+	"slices"
+	"strings"
 	"sync"
 	"time"
@@ -12,19 +15,24 @@ import (
 	oclib "cloud.o-forge.io/core/oc-lib"
 	pubsub "github.com/libp2p/go-libp2p-pubsub"
-	"github.com/libp2p/go-libp2p/core/host"
 	"github.com/libp2p/go-libp2p/core/network"
 	pp "github.com/libp2p/go-libp2p/core/peer"
 )

 const (
-	// IndexerTTL is 10% above the recommended 60s heartbeat interval.
-	IndexerTTL = 66 * time.Second
+	// IndexerTTL is the lifetime of a live-indexer cache entry. Set to 50% above
+	// the recommended 60s heartbeat interval so a single delayed renewal does not
+	// evict a healthy indexer from the native's cache.
+	IndexerTTL = 90 * time.Second
 	// offloadInterval is how often the native checks if it can release responsible peers.
 	offloadInterval = 30 * time.Second
 	// dhtRefreshInterval is how often the background goroutine queries the DHT for
 	// known-but-expired indexer entries (written by neighbouring natives).
 	dhtRefreshInterval = 30 * time.Second
+	// maxFallbackPeers caps how many peers the native will accept in self-delegation
+	// mode. Beyond this limit the native refuses to act as a fallback indexer so it
+	// is not overwhelmed during prolonged indexer outages.
+	maxFallbackPeers = 50
 )

 // liveIndexerEntry tracks a registered indexer in the native's in-memory cache and DHT.
@@ -43,7 +51,7 @@ type NativeState struct {
 	// knownPeerIDs accumulates all indexer PeerIDs ever seen (local stream or gossip).
 	// Used by refreshIndexersFromDHT to re-hydrate expired entries from the shared DHT,
 	// including entries written by other natives.
-	knownPeerIDs map[string]struct{}
+	knownPeerIDs map[string]string
 	knownMu      sync.RWMutex
 }
@@ -51,7 +59,7 @@ func newNativeState() *NativeState {
 	return &NativeState{
 		liveIndexers:     map[string]*liveIndexerEntry{},
 		responsiblePeers: map[pp.ID]struct{}{},
-		knownPeerIDs:     map[string]struct{}{},
+		knownPeerIDs:     map[string]string{},
 	}
 }
@@ -92,10 +100,12 @@ func (v IndexerRecordValidator) Select(_ string, values [][]byte) (int, error) {
 // Must be called after DHT is initialized.
 func (ix *IndexerService) InitNative() {
 	ix.Native = newNativeState()
-	ix.Host.SetStreamHandler(common.ProtocolIndexerHeartbeat, ix.HandleNodeHeartbeat) // specific heartbeat for Indexer.
+	ix.Host.SetStreamHandler(common.ProtocolHeartbeat, ix.HandleHeartbeat) // specific heartbeat for Indexer.
 	ix.Host.SetStreamHandler(common.ProtocolNativeSubscription, ix.handleNativeSubscription)
 	ix.Host.SetStreamHandler(common.ProtocolNativeGetIndexers, ix.handleNativeGetIndexers)
 	ix.Host.SetStreamHandler(common.ProtocolNativeConsensus, ix.handleNativeConsensus)
+	ix.Host.SetStreamHandler(common.ProtocolNativeGetPeers, ix.handleNativeGetPeers)
+	ix.Host.SetStreamHandler(common.ProtocolIndexerGetNatives, ix.handleGetNatives)
 	ix.subscribeIndexerRegistry()
 	// Ensure long connections to other configured natives (native-to-native mesh).
 	common.EnsureNativePeers(ix.Host)
@@ -107,8 +117,15 @@
 // registered indexer PeerIDs to one another, enabling cross-native DHT discovery.
 func (ix *IndexerService) subscribeIndexerRegistry() {
 	logger := oclib.GetLogger()
-	ix.PS.RegisterTopicValidator(common.TopicIndexerRegistry, func(_ context.Context, _ pp.ID, _ *pubsub.Message) bool {
-		return true
+	ix.PS.RegisterTopicValidator(common.TopicIndexerRegistry, func(_ context.Context, _ pp.ID, msg *pubsub.Message) bool {
+		// Reject empty or syntactically invalid multiaddrs before they reach the
+		// message loop. A compromised native could otherwise gossip arbitrary data.
+		addr := string(msg.Data)
+		if addr == "" {
+			return false
+		}
+		_, err := pp.AddrInfoFromString(addr)
+		return err == nil
 	})
 	topic, err := ix.PS.Join(common.TopicIndexerRegistry)
 	if err != nil {
@@ -130,29 +147,38 @@
 			if err != nil {
 				return
 			}
-			peerID := string(msg.Data)
-			if peerID == "" {
+			addr := string(msg.Data)
+			if addr == "" {
 				continue
 			}
-			// A neighbouring native registered this PeerID; add to known set for DHT refresh.
-			ix.Native.knownMu.Lock()
-			ix.Native.knownPeerIDs[peerID] = struct{}{}
-			ix.Native.knownMu.Unlock()
+			if peer, err := pp.AddrInfoFromString(addr); err == nil {
+				ix.Native.knownMu.Lock()
+				ix.Native.knownPeerIDs[peer.ID.String()] = addr
+				ix.Native.knownMu.Unlock()
+			}
+			// A neighbouring native registered this PeerID; add to known set for DHT refresh.
 		}
 	}()
 }
-// handleNativeSubscription stores an indexer's alive registration in the DHT cache.
+// handleNativeSubscription stores an indexer's alive registration in the local cache
+// immediately, then persists it to the DHT asynchronously.
 // The stream is temporary: indexer sends one IndexerRegistration and closes.
 func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
 	defer s.Close()
 	logger := oclib.GetLogger()
+	logger.Info().Msg("Subscription")
 	var reg common.IndexerRegistration
 	if err := json.NewDecoder(s).Decode(&reg); err != nil {
 		logger.Err(err).Msg("native subscription: decode")
 		return
 	}
+	logger.Info().Msg("Subscription " + reg.Addr)
 	if reg.Addr == "" {
 		logger.Error().Msg("native subscription: missing addr")
 		return
@@ -166,30 +192,23 @@ func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
 		reg.PeerID = ad.ID.String()
 	}
-	expiry := time.Now().UTC().Add(IndexerTTL)
+	// Build entry with a fresh TTL — must happen before the cache write so the 66s
+	// window is not consumed by DHT retries.
 	entry := &liveIndexerEntry{
 		PeerID:    reg.PeerID,
 		Addr:      reg.Addr,
-		ExpiresAt: expiry,
+		ExpiresAt: time.Now().UTC().Add(IndexerTTL),
 	}
-	// Persist in DHT with 66s TTL.
-	key := ix.genIndexerKey(reg.PeerID)
-	if data, err := json.Marshal(entry); err == nil {
-		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
-		if err := ix.DHT.PutValue(ctx, key, data); err != nil {
-			logger.Err(err).Msg("native subscription: DHT put")
-		}
-		cancel()
-	}
-	// Update local cache and known set.
+	// Update local cache and known set immediately so concurrent GetIndexers calls
+	// can already see this indexer without waiting for the DHT write to complete.
 	ix.Native.liveIndexersMu.Lock()
+	_, isRenewal := ix.Native.liveIndexers[reg.PeerID]
 	ix.Native.liveIndexers[reg.PeerID] = entry
 	ix.Native.liveIndexersMu.Unlock()
 	ix.Native.knownMu.Lock()
-	ix.Native.knownPeerIDs[reg.PeerID] = struct{}{}
+	ix.Native.knownPeerIDs[reg.PeerID] = reg.Addr
 	ix.Native.knownMu.Unlock()
 	// Gossip PeerID to neighbouring natives so they discover it via DHT.
@@ -197,16 +216,46 @@ func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
 	topic := ix.LongLivedPubSubs[common.TopicIndexerRegistry]
 	ix.PubsubMu.RUnlock()
 	if topic != nil {
-		if err := topic.Publish(context.Background(), []byte(reg.PeerID)); err != nil {
+		if err := topic.Publish(context.Background(), []byte(reg.Addr)); err != nil {
 			logger.Err(err).Msg("native subscription: registry gossip publish")
 		}
 	}
-	logger.Info().Str("peer", reg.PeerID).Msg("native: indexer registered")
+	if isRenewal {
+		logger.Debug().Str("peer", reg.PeerID).Msg("native: indexer TTL renewed : " + fmt.Sprintf("%v", len(ix.Native.liveIndexers)))
+	} else {
+		logger.Info().Str("peer", reg.PeerID).Msg("native: indexer registered : " + fmt.Sprintf("%v", len(ix.Native.liveIndexers)))
+	}
+	// Persist in DHT asynchronously — retries must not block the handler or consume
+	// the local cache TTL.
+	key := ix.genIndexerKey(reg.PeerID)
+	data, err := json.Marshal(entry)
+	if err != nil {
+		logger.Err(err).Msg("native subscription: marshal entry")
+		return
+	}
+	go func() {
+		for {
+			ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+			if err := ix.DHT.PutValue(ctx, key, data); err != nil {
+				cancel()
+				logger.Err(err).Msg("native subscription: DHT put " + key)
+				if strings.Contains(err.Error(), "failed to find any peer in table") {
+					time.Sleep(10 * time.Second)
+					continue
+				}
+				return
+			}
+			cancel()
+			return
+		}
+	}()
 }
 // handleNativeGetIndexers returns this native's own list of reachable indexers.
-// If none are available, it self-delegates (becomes the fallback indexer for the caller).
+// Self-delegation (native acting as temporary fallback indexer) is only permitted
+// for nodes — never for peers that are themselves registered indexers in knownPeerIDs.
 // The consensus across natives is the responsibility of the requesting node/indexer.
 func (ix *IndexerService) handleNativeGetIndexers(s network.Stream) {
 	defer s.Close()
@@ -220,14 +269,20 @@ func (ix *IndexerService) handleNativeGetIndexers(s network.Stream) {
 	if req.Count <= 0 {
 		req.Count = 3
 	}
-	reachable := ix.reachableLiveIndexers()
+	callerPeerID := s.Conn().RemotePeer().String()
+	reachable := ix.reachableLiveIndexers(req.Count, callerPeerID)
 	var resp common.GetIndexersResponse
 	if len(reachable) == 0 {
-		// No indexers known: become temporary fallback for this caller.
-		ix.selfDelegate(s.Conn().RemotePeer(), &resp)
-		logger.Info().Str("peer", s.Conn().RemotePeer().String()).Msg("native: no indexers, acting as fallback")
+		// No live indexers reachable — try to self-delegate.
+		if ix.selfDelegate(s.Conn().RemotePeer(), &resp) {
+			logger.Info().Str("peer", callerPeerID).Msg("native: no indexers, acting as fallback for node")
+		} else {
+			// Fallback pool saturated: return empty so the caller retries another
+			// native instead of piling more load onto this one.
+			logger.Warn().Str("peer", callerPeerID).Int("pool", maxFallbackPeers).Msg(
+				"native: fallback pool saturated, refusing self-delegation")
+		}
 	} else {
 		rand.Shuffle(len(reachable), func(i, j int) { reachable[i], reachable[j] = reachable[j], reachable[i] })
 		if req.Count > len(reachable) {
@@ -255,7 +310,7 @@ func (ix *IndexerService) handleNativeConsensus(s network.Stream) {
 		return
 	}
-	myList := ix.reachableLiveIndexers()
+	myList := ix.reachableLiveIndexers(-1, s.Conn().RemotePeer().String())
 	mySet := make(map[string]struct{}, len(myList))
 	for _, addr := range myList {
 		mySet[addr] = struct{}{}
@@ -285,31 +340,56 @@ func (ix *IndexerService) handleNativeConsensus(s network.Stream) {
 }

 // selfDelegate marks the caller as a responsible peer and exposes this native's own
-// address as its temporary indexer.
-func (ix *IndexerService) selfDelegate(remotePeer pp.ID, resp *common.GetIndexersResponse) {
+// address as its temporary indexer. Returns false when the fallback pool is saturated
+// (maxFallbackPeers reached) — the caller must return an empty response so the node
+// retries later instead of pinning indefinitely to an overloaded native.
+func (ix *IndexerService) selfDelegate(remotePeer pp.ID, resp *common.GetIndexersResponse) bool {
 	ix.Native.responsibleMu.Lock()
-	ix.Native.responsiblePeers[remotePeer] = struct{}{}
-	ix.Native.responsibleMu.Unlock()
-	resp.IsSelfFallback = true
-	for _, a := range ix.Host.Addrs() {
-		resp.Indexers = []string{a.String() + "/p2p/" + ix.Host.ID().String()}
-		break
+	defer ix.Native.responsibleMu.Unlock()
+	if len(ix.Native.responsiblePeers) >= maxFallbackPeers {
+		return false
 	}
+	ix.Native.responsiblePeers[remotePeer] = struct{}{}
+	resp.IsSelfFallback = true
+	resp.Indexers = []string{ix.Host.Addrs()[len(ix.Host.Addrs())-1].String() + "/p2p/" + ix.Host.ID().String()}
+	return true
 }

 // reachableLiveIndexers returns the multiaddrs of non-expired, pingable indexers
 // from the local cache (kept fresh by refreshIndexersFromDHT in background).
-func (ix *IndexerService) reachableLiveIndexers() []string {
+func (ix *IndexerService) reachableLiveIndexers(count int, from ...string) []string {
 	ix.Native.liveIndexersMu.RLock()
 	now := time.Now().UTC()
 	candidates := []*liveIndexerEntry{}
 	for _, e := range ix.Native.liveIndexers {
-		if e.ExpiresAt.After(now) {
+		fmt.Println("liveIndexers", slices.Contains(from, e.PeerID), from, e.PeerID)
+		if e.ExpiresAt.After(now) && !slices.Contains(from, e.PeerID) {
 			candidates = append(candidates, e)
 		}
 	}
 	ix.Native.liveIndexersMu.RUnlock()
+	fmt.Println("midway...", candidates, from, ix.Native.knownPeerIDs)
+	if (count > 0 && len(candidates) < count) || count < 0 {
+		ix.Native.knownMu.RLock()
+		for k, v := range ix.Native.knownPeerIDs {
+			// Include peers whose liveIndexers entry is absent OR expired.
+			// A non-nil but expired entry means the peer was once known but
+			// has since timed out — PeerIsAlive below will decide if it's back.
+			fmt.Println("knownPeerIDs", slices.Contains(from, k), from, k)
+			if !slices.Contains(from, k) {
+				candidates = append(candidates, &liveIndexerEntry{
+					PeerID: k,
+					Addr:   v,
+				})
+			}
+		}
+		ix.Native.knownMu.RUnlock()
+	}
+	fmt.Println("midway...1", candidates)
 	reachable := []string{}
 	for _, e := range candidates {
 		ad, err := pp.AddrInfoFromString(e.Addr)
@@ -371,6 +451,12 @@ func (ix *IndexerService) refreshIndexersFromDHT() {
 			ix.Native.liveIndexers[best.PeerID] = best
 			ix.Native.liveIndexersMu.Unlock()
 			logger.Info().Str("peer", best.PeerID).Msg("native: refreshed indexer from DHT")
+		} else {
+			// DHT has no fresh entry — peer is gone, prune from known set.
+			ix.Native.knownMu.Lock()
+			delete(ix.Native.knownPeerIDs, pid)
+			ix.Native.knownMu.Unlock()
+			logger.Info().Str("peer", pid).Msg("native: pruned stale peer from knownPeerIDs")
 		}
 	}
 }
@@ -387,30 +473,107 @@ func (ix *IndexerService) runOffloadLoop() {
 	defer t.Stop()
 	logger := oclib.GetLogger()
 	for range t.C {
+		fmt.Println("runOffloadLoop", ix.Native.responsiblePeers)
 		ix.Native.responsibleMu.RLock()
 		count := len(ix.Native.responsiblePeers)
 		ix.Native.responsibleMu.RUnlock()
 		if count == 0 {
 			continue
 		}
-		if len(ix.reachableLiveIndexers()) > 0 {
-			ix.Native.responsibleMu.Lock()
-			ix.Native.responsiblePeers = map[pp.ID]struct{}{}
-			ix.Native.responsibleMu.Unlock()
+		ix.Native.responsibleMu.RLock()
+		peerIDS := []string{}
+		for p := range ix.Native.responsiblePeers {
+			peerIDS = append(peerIDS, p.String())
+		}
+		fmt.Println("COUNT --> ", count, len(ix.reachableLiveIndexers(-1, peerIDS...)))
+		ix.Native.responsibleMu.RUnlock()
+		if len(ix.reachableLiveIndexers(-1, peerIDS...)) > 0 {
+			ix.Native.responsibleMu.RLock()
+			released := ix.Native.responsiblePeers
+			ix.Native.responsibleMu.RUnlock()
+			// Reset (not Close) heartbeat streams of released peers.
+			// Close() only half-closes the native's write direction — the peer's write
+			// direction stays open and sendHeartbeat never sees an error.
+			// Reset() abruptly terminates both directions, making the peer's next
+			// json.Encode return an error which triggers replenishIndexersFromNative.
+			ix.StreamMU.Lock()
+			if streams := ix.StreamRecords[common.ProtocolHeartbeat]; streams != nil {
+				for pid := range released {
+					if rec, ok := streams[pid]; ok {
+						if rec.HeartbeatStream != nil && rec.HeartbeatStream.Stream != nil {
+							rec.HeartbeatStream.Stream.Reset()
+						}
+						ix.Native.responsibleMu.Lock()
+						delete(ix.Native.responsiblePeers, pid)
+						ix.Native.responsibleMu.Unlock()
+						delete(streams, pid)
+						logger.Info().Str("peer", pid.String()).Str("proto", string(common.ProtocolHeartbeat)).Msg(
+							"native: offload — stream reset, peer will reconnect to real indexer")
+					} else {
+						// No recorded heartbeat stream for this peer: either it never
+						// passed the score check (new peer, uptime=0 → score<75) or the
+						// stream was GC'd. We cannot send a Reset signal, so close the
+						// whole connection instead — this makes the peer's sendHeartbeat
+						// return an error, which triggers replenishIndexersFromNative and
+						// migrates it to a real indexer.
+						ix.Native.responsibleMu.Lock()
+						delete(ix.Native.responsiblePeers, pid)
+						ix.Native.responsibleMu.Unlock()
+						go ix.Host.Network().ClosePeer(pid)
+						logger.Info().Str("peer", pid.String()).Msg(
+							"native: offload — no heartbeat stream, closing connection so peer re-requests real indexers")
+					}
+				}
+			}
+			ix.StreamMU.Unlock()
 			logger.Info().Int("released", count).Msg("native: offloaded responsible peers to real indexers")
 		}
 	}
 }
+// handleNativeGetPeers returns a random selection of this native's known native
+// contacts, excluding any in the request's Exclude list.
+func (ix *IndexerService) handleNativeGetPeers(s network.Stream) {
+	defer s.Close()
+	logger := oclib.GetLogger()
+	var req common.GetNativePeersRequest
+	if err := json.NewDecoder(s).Decode(&req); err != nil {
+		logger.Err(err).Msg("native get peers: decode")
+		return
+	}
+	if req.Count <= 0 {
+		req.Count = 1
+	}
+	excludeSet := make(map[string]struct{}, len(req.Exclude))
+	for _, e := range req.Exclude {
+		excludeSet[e] = struct{}{}
+	}
+	common.StreamNativeMu.RLock()
+	candidates := make([]string, 0, len(common.StaticNatives))
+	for addr := range common.StaticNatives {
+		if _, excluded := excludeSet[addr]; !excluded {
+			candidates = append(candidates, addr)
+		}
+	}
+	common.StreamNativeMu.RUnlock()
+	rand.Shuffle(len(candidates), func(i, j int) { candidates[i], candidates[j] = candidates[j], candidates[i] })
+	if req.Count > len(candidates) {
+		req.Count = len(candidates)
+	}
+	resp := common.GetNativePeersResponse{Peers: candidates[:req.Count]}
+	if err := json.NewEncoder(s).Encode(resp); err != nil {
+		logger.Err(err).Msg("native get peers: encode response")
+	}
+}
-// StartNativeRegistration starts a goroutine that periodically registers this
-// indexer with all configured native indexers (every RecommendedHeartbeatInterval).
-func StartNativeRegistration(h host.Host, nativeAddressesStr string) {
-	go func() {
-		common.RegisterWithNative(h, nativeAddressesStr)
-		t := time.NewTicker(common.RecommendedHeartbeatInterval)
-		defer t.Stop()
-		for range t.C {
-			common.RegisterWithNative(h, nativeAddressesStr)
-		}
-	}()
-}


@@ -11,6 +11,7 @@ import (
 	pubsub "github.com/libp2p/go-libp2p-pubsub"
 	record "github.com/libp2p/go-libp2p-record"
 	"github.com/libp2p/go-libp2p/core/host"
+	pp "github.com/libp2p/go-libp2p/core/peer"
 )

 // IndexerService manages the indexer node's state: stream records, DHT, pubsub.
@@ -22,6 +23,7 @@ type IndexerService struct {
 	mu        sync.RWMutex
 	IsNative  bool
 	Native    *NativeState // non-nil when IsNative == true
+	nameIndex *nameIndexState
 }

 // NewIndexerService creates an IndexerService.
@@ -43,22 +45,34 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
 	}
 	ix.PS = ps
-	if ix.isStrictIndexer {
+	if ix.isStrictIndexer && !isNative {
 		logger.Info().Msg("connect to indexers as strict indexer...")
-		common.ConnectToIndexers(h, 0, 5, ix.Host.ID())
+		common.ConnectToIndexers(h, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer, ix.Host.ID())
 		logger.Info().Msg("subscribe to decentralized search flow as strict indexer...")
-		ix.SubscribeToSearch(ix.PS, nil)
+		go ix.SubscribeToSearch(ix.PS, nil)
 	}
+	if !isNative {
+		logger.Info().Msg("init distributed name index...")
+		ix.initNameIndex(ps)
+		ix.LongLivedStreamRecordedService.AfterDelete = func(pid pp.ID, name, did string) {
+			ix.publishNameEvent(NameIndexDelete, name, pid.String(), did)
+		}
+	}
 	if ix.DHT, err = dht.New(
 		context.Background(),
 		ix.Host,
 		dht.Mode(dht.ModeServer),
+		dht.ProtocolPrefix("oc"), // 🔥 private network
 		dht.Validator(record.NamespacedValidator{
 			"node":    PeerRecordValidator{},
 			"indexer": IndexerRecordValidator{}, // for native indexer registry
+			"name":    DefaultValidator{},
+			"pid":     DefaultValidator{},
 		}),
 	); err != nil {
+		logger.Info().Msg(err.Error())
 		return nil
 	}
@@ -67,11 +81,10 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
 		ix.InitNative()
 	} else {
 		ix.initNodeHandler()
-	}
-	// Register with configured natives so this indexer appears in their cache
-	if nativeAddrs := conf.GetConfig().NativeIndexerAddresses; nativeAddrs != "" {
-		StartNativeRegistration(ix.Host, nativeAddrs)
+		// Register with configured natives so this indexer appears in their cache
+		if nativeAddrs := conf.GetConfig().NativeIndexerAddresses; nativeAddrs != "" {
+			common.StartNativeRegistration(ix.Host, nativeAddrs)
+		}
 	}
 	return ix
 }
@@ -79,6 +92,9 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
 func (ix *IndexerService) Close() {
 	ix.DHT.Close()
 	ix.PS.UnregisterTopicValidator(common.TopicPubSubSearch)
+	if ix.nameIndex != nil {
+		ix.PS.UnregisterTopicValidator(TopicNameIndex)
+	}
 	for _, s := range ix.StreamRecords {
 		for _, ss := range s {
 			ss.HeartbeatStream.Stream.Close()


@@ -6,6 +6,16 @@ import (
 	"time"
 )

+type DefaultValidator struct{}
+
+func (v DefaultValidator) Validate(key string, value []byte) error {
+	return nil
+}
+
+func (v DefaultValidator) Select(key string, values [][]byte) (int, error) {
+	return 0, nil
+}
+
 type PeerRecordValidator struct{}

 func (v PeerRecordValidator) Validate(key string, value []byte) error {
@@ -26,14 +36,7 @@ func (v PeerRecordValidator) Validate(key string, value []byte) error {
 	}

 	// Signature verification
-	rec2 := PeerRecord{
-		Name:   rec.Name,
-		DID:    rec.DID,
-		PubKey: rec.PubKey,
-		PeerID: rec.PeerID,
-	}
-	if _, err := rec2.Verify(); err != nil {
+	if _, err := rec.Verify(); err != nil {
 		return errors.New("invalid signature")
 	}


@@ -96,6 +96,7 @@ func ListenNATS(n *Node) {
 		},
 		tools.PROPALGATION_EVENT: func(resp tools.NATSResponse) {
+			fmt.Println("PROPALGATION")
 			if resp.FromApp == config.GetAppName() {
 				return
 			}
@@ -106,10 +107,10 @@ func ListenNATS(n *Node) {
 				dtt := tools.DataType(propalgation.DataType)
 				dt = &dtt
 			}
+			fmt.Println("PROPALGATION ACT", propalgation.Action, propalgation.Action == tools.PB_CREATE, err)
 			if err == nil {
 				switch propalgation.Action {
-				case tools.PB_ADMIRALTY_CONFIG:
-				case tools.PB_MINIO_CONFIG:
+				case tools.PB_ADMIRALTY_CONFIG, tools.PB_MINIO_CONFIG:
 					var m configPayload
 					var proto protocol.ID = stream.ProtocolAdmiraltyConfigResource
 					if propalgation.Action == tools.PB_MINIO_CONFIG {
@@ -122,20 +123,17 @@ func ListenNATS(n *Node) {
 							p.PeerID, proto, resp.Payload)
 						}
 					}
-				case tools.PB_CREATE:
-				case tools.PB_UPDATE:
-				case tools.PB_DELETE:
-					n.StreamService.ToPartnerPublishEvent(
+				case tools.PB_CREATE, tools.PB_UPDATE, tools.PB_DELETE:
+					fmt.Println(propalgation.Action, dt, resp.User, propalgation.Payload)
+					fmt.Println(n.StreamService.ToPartnerPublishEvent(
 						context.Background(),
 						propalgation.Action,
 						dt, resp.User,
 						propalgation.Payload,
-					)
+					))
 				case tools.PB_CONSIDERS:
 					switch resp.Datatype {
-					case tools.BOOKING:
-					case tools.PURCHASE_RESOURCE:
-					case tools.WORKFLOW_EXECUTION:
+					case tools.BOOKING, tools.PURCHASE_RESOURCE, tools.WORKFLOW_EXECUTION:
 						var m executionConsidersPayload
 						if err := json.Unmarshal(resp.Payload, &m); err == nil {
 							for _, p := range m.PeerIDs {


@@ -2,10 +2,10 @@ package node

 import (
 	"context"
-	"crypto/sha256"
 	"encoding/json"
 	"errors"
 	"fmt"
+	"maps"
 	"oc-discovery/conf"
 	"oc-discovery/daemons/node/common"
 	"oc-discovery/daemons/node/indexer"
@@ -15,6 +15,7 @@ import (
 	"time"

 	oclib "cloud.o-forge.io/core/oc-lib"
+	"cloud.o-forge.io/core/oc-lib/dbs"
 	"cloud.o-forge.io/core/oc-lib/models/peer"
 	"cloud.o-forge.io/core/oc-lib/tools"
 	"github.com/google/uuid"
@@ -33,6 +34,7 @@ type Node struct {
 	StreamService *stream.StreamService
 	PeerID        pp.ID
 	isIndexer     bool
+	peerRecord    *indexer.PeerRecord
 	Mu            sync.RWMutex
 }
@@ -69,6 +71,9 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
 		isIndexer:                      isIndexer,
 		LongLivedStreamRecordedService: common.NewStreamRecordedService[interface{}](h, 1000),
 	}
+	// Register the bandwidth probe handler so any peer measuring this node's
+	// throughput can open a dedicated probe stream and read the echo.
+	h.SetStreamHandler(common.ProtocolBandwidthProbe, common.HandleBandwidthProbe)
 	var ps *pubsubs.PubSub
 	if isNode {
 		logger.Info().Msg("generate opencloud node...")
@@ -77,8 +82,30 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
 			panic(err) // can't run your node without a propagation pubsub carrying node state.
 		}
 		node.PS = ps
+		// buildRecord returns a fresh signed PeerRecord as JSON, embedded in each
+		// heartbeat so the receiving indexer can republish it to the DHT directly.
+		// peerRecord is nil until claimInfo runs, so the first ~20s heartbeats carry
+		// no record — that's fine, claimInfo publishes once synchronously at startup.
+		buildRecord := func() json.RawMessage {
+			if node.peerRecord == nil {
+				return nil
+			}
+			priv, err := tools.LoadKeyFromFilePrivate()
+			if err != nil {
+				return nil
+			}
+			fresh := *node.peerRecord
+			fresh.PeerRecordPayload.ExpiryDate = time.Now().UTC().Add(2 * time.Minute)
+			payload, _ := json.Marshal(fresh.PeerRecordPayload)
+			fresh.Signature, err = priv.Sign(payload)
+			if err != nil {
+				return nil
+			}
+			b, _ := json.Marshal(fresh)
+			return json.RawMessage(b)
+		}
 		logger.Info().Msg("connect to indexers...")
-		common.ConnectToIndexers(node.Host, 0, 5, node.PeerID) // TODO : make var to change how many indexers are allowed.
+		common.ConnectToIndexers(node.Host, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer, node.PeerID, buildRecord)
 		logger.Info().Msg("claims my node...")
 		if _, err := node.claimInfo(conf.GetConfig().Name, conf.GetConfig().Hostname); err != nil {
 			panic(err)
@@ -100,14 +127,14 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
 			}
 		}
 		node.SubscribeToSearch(node.PS, &f)
-		logger.Info().Msg("connect to NATS")
-		go ListenNATS(node)
-		logger.Info().Msg("Node is actually running.")
 	}
 	if isIndexer {
 		logger.Info().Msg("generate opencloud indexer...")
-		node.IndexerService = indexer.NewIndexerService(node.Host, ps, 5, isNativeIndexer)
+		node.IndexerService = indexer.NewIndexerService(node.Host, ps, 500, isNativeIndexer)
 	}
+	logger.Info().Msg("connect to NATS")
+	ListenNATS(node)
+	logger.Info().Msg("Node is actually running.")
 	return node, nil
 }
@@ -127,24 +154,29 @@ func (d *Node) publishPeerRecord(
 	if err != nil {
 		return err
 	}
+	common.StreamMuIndexes.RLock()
+	indexerSnapshot := make([]*pp.AddrInfo, 0, len(common.StaticIndexers))
+	for _, ad := range common.StaticIndexers {
+		indexerSnapshot = append(indexerSnapshot, ad)
+	}
+	common.StreamMuIndexes.RUnlock()
-	for _, ad := range common.StaticIndexers {
+	for _, ad := range indexerSnapshot {
 		var err error
 		if common.StreamIndexers, err = common.TempStream(d.Host, *ad, common.ProtocolPublish, "", common.StreamIndexers, map[protocol.ID]*common.ProtocolInfo{},
 			&common.StreamMuIndexes); err != nil {
 			continue
 		}
 		stream := common.StreamIndexers[common.ProtocolPublish][ad.ID]
-		base := indexer.PeerRecord{
+		base := indexer.PeerRecordPayload{
 			Name:       rec.Name,
 			DID:        rec.DID,
 			PubKey:     rec.PubKey,
 			ExpiryDate: time.Now().UTC().Add(2 * time.Minute),
 		}
 		payload, _ := json.Marshal(base)
-		hash := sha256.Sum256(payload)
-		rec.ExpiryDate = base.ExpiryDate
-		rec.Signature, err = priv.Sign(hash[:])
+		rec.PeerRecordPayload = base
+		rec.Signature, err = priv.Sign(payload)
 		if err := json.NewEncoder(stream.Stream).Encode(&rec); err != nil { // then publish on stream
 			return err
 		}
@@ -156,38 +188,50 @@ func (d *Node) GetPeerRecord(
 	ctx context.Context,
 	pidOrdid string,
 ) ([]*peer.Peer, error) {
-	did := pidOrdid // if known pidOrdid is did
-	pid := pidOrdid // if not known pidOrdid is pid
-	access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
-	if data := access.Search(nil, did, true); len(data.Data) > 0 {
-		did = data.Data[0].GetID()
-		pid = data.Data[0].(*peer.Peer).PeerID
-	}
 	var err error
 	var info map[string]indexer.PeerRecord
+	common.StreamMuIndexes.RLock()
+	indexerSnapshot2 := make([]*pp.AddrInfo, 0, len(common.StaticIndexers))
+	for _, ad := range common.StaticIndexers {
+		indexerSnapshot2 = append(indexerSnapshot2, ad)
+	}
+	common.StreamMuIndexes.RUnlock()
+	// Build the GetValue request: if pidOrdid is neither a UUID DID nor a libp2p
+	// PeerID, treat it as a human-readable name and let the indexer resolve it.
+	getReq := indexer.GetValue{Key: pidOrdid}
+	isNameSearch := false
+	if pidR, pidErr := pp.Decode(pidOrdid); pidErr == nil {
+		getReq.PeerID = pidR
+	} else if _, uuidErr := uuid.Parse(pidOrdid); uuidErr != nil {
+		// Not a UUID DID → treat pidOrdid as a name substring search.
+		getReq.Name = pidOrdid
+		getReq.Key = ""
+		isNameSearch = true
+	}
-	for _, ad := range common.StaticIndexers {
+	for _, ad := range indexerSnapshot2 {
 		if common.StreamIndexers, err = common.TempStream(d.Host, *ad, common.ProtocolGet, "",
 			common.StreamIndexers, map[protocol.ID]*common.ProtocolInfo{}, &common.StreamMuIndexes); err != nil {
 			continue
 		}
-		pidR, err := pp.Decode(pid)
-		if err != nil {
-			continue
-		}
 		stream := common.StreamIndexers[common.ProtocolGet][ad.ID]
-		if err := json.NewEncoder(stream.Stream).Encode(indexer.GetValue{
-			Key:    did,
-			PeerID: pidR,
-		}); err != nil {
-			return nil, err
-		}
-		for {
+		if err := json.NewEncoder(stream.Stream).Encode(getReq); err != nil {
+			continue
+		}
 		var resp indexer.GetResponse
 		if err := json.NewDecoder(stream.Stream).Decode(&resp); err != nil {
-			return nil, err
+			continue
 		}
 		if resp.Found {
+			if info == nil {
 				info = resp.Records
+			} else {
+				// Aggregate results from all indexers for name searches.
+				maps.Copy(info, resp.Records)
+			}
+			// For exact lookups (PeerID / DID) stop at the first hit.
+			if !isNameSearch {
 				break
+			}
 		}
@@ -196,7 +240,7 @@ func (d *Node) GetPeerRecord(
 	for _, pr := range info {
 		if pk, err := pr.Verify(); err != nil {
 			return nil, err
-		} else if ok, p, err := pr.ExtractPeer(d.PeerID.String(), did, pk); err != nil {
+		} else if ok, p, err := pr.ExtractPeer(d.PeerID.String(), pr.PeerID, pk); err != nil {
 			return nil, err
 		} else {
 			if ok {
@@ -218,7 +262,11 @@ func (d *Node) claimInfo(
 	}
 	did := uuid.New().String()
-	peers := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil).Search(nil, fmt.Sprintf("%v", peer.SELF), false)
+	peers := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil).Search(&dbs.Filters{
+		And: map[string][]dbs.Filter{ // look up our own record by peer_id
+			"peer_id": {{Operator: dbs.EQUAL.String(), Value: d.Host.ID().String()}},
+		},
+	}, "", false)
 	if len(peers.Data) > 0 {
 		did = peers.Data[0].GetID() // if already existing set up did as made
 	}
@@ -238,39 +286,38 @@ func (d *Node) claimInfo(
 	now := time.Now().UTC()
 	expiry := now.Add(150 * time.Second)
-	rec := &indexer.PeerRecord{
+	pRec := indexer.PeerRecordPayload{
 		Name:   name,
 		DID:    did, // REAL PEER ID
 		PubKey: pubBytes,
+		ExpiryDate: expiry,
 	}
-	rec.PeerID = d.Host.ID().String()
 	d.PeerID = d.Host.ID()
-	payload, _ := json.Marshal(rec)
-	hash := sha256.Sum256(payload)
-	rec.Signature, err = priv.Sign(hash[:])
+	payload, _ := json.Marshal(pRec)
+	rec := &indexer.PeerRecord{
+		PeerRecordPayload: pRec,
+	}
+	rec.Signature, err = priv.Sign(payload)
 	if err != nil {
 		return nil, err
 	}
+	rec.PeerID = d.Host.ID().String()
 	rec.APIUrl = endPoint
 	rec.StreamAddress = "/ip4/" + conf.GetConfig().Hostname + "/tcp/" + fmt.Sprintf("%v", conf.GetConfig().NodeEndpointPort) + "/p2p/" + rec.PeerID
 	rec.NATSAddress = oclib.GetConfig().NATSUrl
 	rec.WalletAddress = "my-wallet"
-	rec.ExpiryDate = expiry
 	if err := d.publishPeerRecord(rec); err != nil {
 		return nil, err
 	}
+	d.peerRecord = rec
-	/*if pk, err := rec.Verify(); err != nil {
-		fmt.Println("Verify")
+	if _, err := rec.Verify(); err != nil {
 		return nil, err
-	} else {*/
+	} else {
 		_, p, err := rec.ExtractPeer(did, did, pub)
 		return p, err
-	//}
+	}
 }
 /*
@@ -4,47 +4,56 @@ import (
"context" "context"
"encoding/json" "encoding/json"
"errors" "errors"
"oc-discovery/daemons/node/stream"
"oc-discovery/models" "oc-discovery/models"
oclib "cloud.o-forge.io/core/oc-lib" "cloud.o-forge.io/core/oc-lib/dbs"
"cloud.o-forge.io/core/oc-lib/models/peer"
"cloud.o-forge.io/core/oc-lib/tools" "cloud.o-forge.io/core/oc-lib/tools"
) )
func (ps *PubSubService) SearchPublishEvent( func (ps *PubSubService) SearchPublishEvent(
ctx context.Context, dt *tools.DataType, typ string, user string, search string) error { ctx context.Context, dt *tools.DataType, typ string, user string, search string) error {
b, err := json.Marshal(map[string]string{"search": search})
if err != nil {
return err
}
switch typ { switch typ {
case "known": // define Search Strategy case "known": // define Search Strategy
return ps.StreamService.SearchKnownPublishEvent(dt, user, search) //if partners focus only them*/ return ps.StreamService.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"": {{Operator: dbs.NOT.String(), Value: dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.BLACKLIST}},
},
}}},
},
}, b, stream.ProtocolSearchResource) //if partners focus only them*/
case "partner": // define Search Strategy case "partner": // define Search Strategy
return ps.StreamService.SearchPartnersPublishEvent(dt, user, search) //if partners focus only them*/ return ps.StreamService.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.PARTNER}},
},
}, b, stream.ProtocolSearchResource)
case "all": // Gossip PubSub case "all": // Gossip PubSub
b, err := json.Marshal(map[string]string{"search": search}) b, err := json.Marshal(map[string]string{"search": search})
if err != nil { if err != nil {
return err return err
} }
return ps.searchPublishEvent(ctx, dt, user, b) return ps.publishEvent(ctx, dt, tools.PB_SEARCH, user, b)
default: default:
return errors.New("no type of research found") return errors.New("no type of research found")
} }
} }
func (ps *PubSubService) searchPublishEvent(
ctx context.Context, dt *tools.DataType, user string, payload []byte) error {
return ps.publishEvent(ctx, dt, tools.PB_SEARCH, user, payload)
}
func (ps *PubSubService) publishEvent( func (ps *PubSubService) publishEvent(
ctx context.Context, dt *tools.DataType, action tools.PubSubAction, user string, payload []byte, ctx context.Context, dt *tools.DataType, action tools.PubSubAction, user string, payload []byte,
) error { ) error {
from, err := oclib.GenerateNodeID()
if err != nil {
return err
}
priv, err := tools.LoadKeyFromFilePrivate() priv, err := tools.LoadKeyFromFilePrivate()
if err != nil { if err != nil {
return err return err
} }
msg, _ := json.Marshal(models.NewEvent(action.String(), from, dt, user, payload, priv)) msg, _ := json.Marshal(models.NewEvent(action.String(), ps.Host.ID().String(), dt, user, payload, priv))
topic, err := ps.PS.Join(action.String()) topic, err := ps.PS.Join(action.String())
if err != nil { if err != nil {
return err return err
@@ -5,6 +5,7 @@ import (
"crypto/subtle" "crypto/subtle"
"encoding/json" "encoding/json"
"errors" "errors"
"fmt"
"oc-discovery/daemons/node/common" "oc-discovery/daemons/node/common"
oclib "cloud.o-forge.io/core/oc-lib" oclib "cloud.o-forge.io/core/oc-lib"
@@ -19,6 +20,7 @@ type Verify struct {
 }

 func (ps *StreamService) handleEvent(protocol string, evt *common.Event) error {
+	fmt.Println("handleEvent")
 	ps.handleEventFromPartner(evt, protocol)
 	/*if protocol == ProtocolVerifyResource {
 		if evt.DataType == -1 {
@@ -148,14 +150,6 @@ func (abs *StreamService) pass(event *common.Event, action tools.PubSubAction) e
 }

 func (ps *StreamService) handleEventFromPartner(evt *common.Event, protocol string) error {
-	resource, err := resources.ToResource(int(evt.DataType), evt.Payload)
-	if err != nil {
-		return err
-	}
-	b, err := json.Marshal(resource)
-	if err != nil {
-		return err
-	}
 	switch protocol {
 	case ProtocolSearchResource:
 		if evt.DataType < 0 {
@@ -169,20 +163,20 @@ func (ps *StreamService) handleEventFromPartner(evt *common.Event, protocol stri
 				ps.SendResponse(p[0], evt)
 			}
 		}
-	case ProtocolCreateResource:
-	case ProtocolUpdateResource:
+	case ProtocolCreateResource, ProtocolUpdateResource:
+		fmt.Println("RECEIVED Protocol.Update")
 		go tools.NewNATSCaller().SetNATSPub(tools.CREATE_RESOURCE, tools.NATSResponse{
 			FromApp:  "oc-discovery",
 			Datatype: tools.DataType(evt.DataType),
 			Method:   int(tools.CREATE_RESOURCE),
-			Payload:  b,
+			Payload:  evt.Payload,
 		})
 	case ProtocolDeleteResource:
 		go tools.NewNATSCaller().SetNATSPub(tools.REMOVE_RESOURCE, tools.NATSResponse{
 			FromApp:  "oc-discovery",
 			Datatype: tools.DataType(evt.DataType),
 			Method:   int(tools.REMOVE_RESOURCE),
-			Payload:  b,
+			Payload:  evt.Payload,
 		})
 	default:
 		return errors.New("no action authorized available : " + protocol)
@@ -213,9 +207,9 @@ func (abs *StreamService) SendResponse(p *peer.Peer, event *common.Event) error
 	if j, err := json.Marshal(ss); err == nil {
 		if event.DataType != -1 {
 			ndt := tools.DataType(dt.EnumIndex())
-			abs.PublishResources(&ndt, event.User, peerID, j)
+			abs.PublishCommon(&ndt, event.User, peerID, ProtocolSearchResource, j)
 		} else {
-			abs.PublishResources(nil, event.User, peerID, j)
+			abs.PublishCommon(nil, event.User, peerID, ProtocolSearchResource, j)
 		}
 	}
 }
@@ -15,81 +15,45 @@ import (
"github.com/libp2p/go-libp2p/core/protocol" "github.com/libp2p/go-libp2p/core/protocol"
) )
func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, toPeerID string, proto protocol.ID, resource []byte) (*common.Stream, error) { func (ps *StreamService) PublishesCommon(dt *tools.DataType, user string, filter *dbs.Filters, resource []byte, protos ...protocol.ID) error {
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil) access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
p := access.LoadOne(toPeerID) p := access.Search(filter, "", false)
if p.Err != "" { for _, pes := range p.Data {
return nil, errors.New(p.Err) for _, proto := range protos {
} else { if _, err := ps.PublishCommon(dt, user, pes.(*peer.Peer).PeerID, proto, resource); err != nil {
ad, err := pp.AddrInfoFromString(p.Data.(*peer.Peer).StreamAddress) return err
}
}
}
return nil
}
func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, toPeerID string, proto protocol.ID, resource []byte) (*common.Stream, error) {
fmt.Println("PublishCommon")
if toPeerID == ps.Key.String() {
return nil, errors.New("Can't send to ourself !")
}
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
p := access.Search(&dbs.Filters{
And: map[string][]dbs.Filter{ // search by name if no filters are provided
"peer_id": {{Operator: dbs.EQUAL.String(), Value: toPeerID}},
},
}, toPeerID, false)
var pe *peer.Peer
if len(p.Data) > 0 && p.Data[0].(*peer.Peer).Relation != peer.BLACKLIST {
pe = p.Data[0].(*peer.Peer)
} else if pps, err := ps.Node.GetPeerRecord(context.Background(), toPeerID); err == nil && len(pps) > 0 {
pe = pps[0]
}
if pe != nil {
ad, err := pp.AddrInfoFromString(p.Data[0].(*peer.Peer).StreamAddress)
if err != nil { if err != nil {
return nil, err return nil, err
} }
return ps.write(toPeerID, ad, dt, user, resource, proto) return ps.write(toPeerID, ad, dt, user, resource, proto)
} }
} return nil, errors.New("peer unvalid " + toPeerID)
func (ps *StreamService) PublishResources(dt *tools.DataType, user string, toPeerID string, resource []byte) error {
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
p := access.LoadOne(toPeerID)
if p.Err != "" {
return errors.New(p.Err)
} else {
ad, err := pp.AddrInfoFromString(p.Data.(*peer.Peer).StreamAddress)
if err != nil {
return err
}
ps.write(toPeerID, ad, dt, user, resource, ProtocolSearchResource)
}
return nil
}
func (ps *StreamService) SearchKnownPublishEvent(dt *tools.DataType, user string, search string) error {
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
peers := access.Search(&dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"": {{Operator: dbs.NOT.String(), Value: dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.BLACKLIST}},
},
}}},
},
}, search, false)
if peers.Err != "" {
return errors.New(peers.Err)
} else {
b, err := json.Marshal(map[string]string{"search": search})
if err != nil {
return err
}
for _, p := range peers.Data {
ad, err := pp.AddrInfoFromString(p.(*peer.Peer).StreamAddress)
if err != nil {
continue
}
ps.write(p.GetID(), ad, dt, user, b, ProtocolSearchResource)
}
}
return nil
}
func (ps *StreamService) SearchPartnersPublishEvent(dt *tools.DataType, user string, search string) error {
if peers, err := ps.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex())); err != nil {
return err
} else {
b, err := json.Marshal(map[string]string{"search": search})
if err != nil {
return err
}
for _, p := range peers {
ad, err := pp.AddrInfoFromString(p.StreamAddress)
if err != nil {
continue
}
ps.write(p.GetID(), ad, dt, user, b, ProtocolSearchResource)
}
}
return nil
} }
 func (ps *StreamService) ToPartnerPublishEvent(
@@ -103,12 +67,23 @@ func (ps *StreamService) ToPartnerPublishEvent(
 	if err != nil {
 		return err
 	}
-	ps.Mu.Lock()
-	defer ps.Mu.Unlock()
+	if pe, err := oclib.GetMySelf(); err != nil {
+		return err
+	} else if pe.GetID() == p.GetID() {
+		return fmt.Errorf("can't send to ourself")
+	} else {
+		pe.Relation = p.Relation
+		pe.Verify = false
+		if b2, err := json.Marshal(pe); err == nil {
+			if _, err := ps.PublishCommon(dt, user, p.PeerID, ProtocolUpdateResource, b2); err != nil {
+				return err
+			}
 			if p.Relation == peer.PARTNER {
 				if ps.Streams[ProtocolHeartbeatPartner] == nil {
 					ps.Streams[ProtocolHeartbeatPartner] = map[pp.ID]*common.Stream{}
 				}
+				fmt.Println("SHOULD CONNECT")
 				ps.ConnectToPartner(p.StreamAddress)
 			} else if ps.Streams[ProtocolHeartbeatPartner] != nil && ps.Streams[ProtocolHeartbeatPartner][pid] != nil {
 				for _, pids := range ps.Streams {
@@ -117,21 +92,19 @@ func (ps *StreamService) ToPartnerPublishEvent(
 				}
 			}
 		}
+		}
+	}
 	return nil
 }

-	if peers, err := ps.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex())); err != nil {
-		return err
-	} else {
-		for _, p := range peers {
-			for protocol := range protocolsPartners {
-				ad, err := pp.AddrInfoFromString(p.StreamAddress)
-				if err != nil {
-					continue
-				}
-				ps.write(p.GetID(), ad, dt, user, payload, protocol)
-			}
-		}
-	}
+	ks := []protocol.ID{}
+	for k := range protocolsPartners {
+		ks = append(ks, k)
+	}
+	ps.PublishesCommon(dt, user, &dbs.Filters{ // fan out to every PARTNER peer
+		And: map[string][]dbs.Filter{
+			"relation": {{Operator: dbs.EQUAL.String(), Value: peer.PARTNER}},
+		},
+	}, payload, ks...)
 	return nil
 }
@@ -158,6 +131,7 @@ func (s *StreamService) write(
 	}
 	stream := s.Streams[proto][peerID.ID]
 	evt := common.NewEvent(string(proto), peerID.ID.String(), dt, user, payload)
+	fmt.Println("SEND EVENT ", evt.From, evt.DataType, evt.Timestamp)
 	if err := json.NewEncoder(stream.Stream).Encode(evt); err != nil {
 		stream.Stream.Close()
 		logger.Err(err)
@@ -116,7 +116,7 @@ func (s *StreamService) HandlePartnerHeartbeat(stream network.Stream) {
 		streamsAnonym[k] = v
 	}
 	s.Mu.Unlock()
-	pid, hb, err := common.CheckHeartbeat(s.Host, stream, streamsAnonym, &s.Mu, s.maxNodesConn)
+	pid, hb, err := common.CheckHeartbeat(s.Host, stream, json.NewDecoder(stream), streamsAnonym, &s.Mu, s.maxNodesConn)
 	if err != nil {
 		return
 	}
@@ -132,10 +132,12 @@ func (s *StreamService) HandlePartnerHeartbeat(stream network.Stream) {
 			s.ConnectToPartner(val)
 		}
 	}
-	go s.StartGC(30 * time.Second)
+	// GC is already running via InitStream — starting a new ticker goroutine on
+	// every heartbeat would leak an unbounded number of goroutines.
 }
 func (s *StreamService) connectToPartners() error {
+	logger := oclib.GetLogger()
 	for proto, info := range protocolsPartners {
 		f := func(ss network.Stream) {
 			if s.Streams[proto] == nil {
@@ -147,11 +149,12 @@ func (s *StreamService) connectToPartners() error {
 			}
 			go s.readLoop(s.Streams[proto][ss.Conn().RemotePeer()], ss.Conn().RemotePeer(), proto, info)
 		}
-		fmt.Println("SetStreamHandler", proto)
+		logger.Info().Msg("SetStreamHandler " + string(proto))
 		s.Host.SetStreamHandler(proto, f)
 	}
 	peers, err := s.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex()))
 	if err != nil {
+		logger.Err(err)
 		return err
 	}
 	for _, p := range peers {
@@ -161,19 +164,19 @@ func (s *StreamService) connectToPartners() error {
 }

 func (s *StreamService) ConnectToPartner(address string) {
+	logger := oclib.GetLogger()
 	if ad, err := pp.AddrInfoFromString(address); err == nil {
+		logger.Info().Msg("Connect to Partner " + ProtocolHeartbeatPartner + " " + address)
 		common.SendHeartbeat(context.Background(), ProtocolHeartbeatPartner, conf.GetConfig().Name,
-			s.Host, s.Streams, map[string]*pp.AddrInfo{address: ad}, 20*time.Second)
+			s.Host, s.Streams, map[string]*pp.AddrInfo{address: ad}, nil, 20*time.Second)
 	}
 }

 func (s *StreamService) searchPeer(search string) ([]*peer.Peer, error) {
-	/* TODO FOR TEST ONLY A VARS THAT DEFINE ADDRESS... deserialize */
 	ps := []*peer.Peer{}
 	if conf.GetConfig().PeerIDS != "" {
 		for _, peerID := range strings.Split(conf.GetConfig().PeerIDS, ",") {
 			ppID := strings.Split(peerID, "/")
-			fmt.Println(ppID, peerID)
 			ps = append(ps, &peer.Peer{
 				AbstractObject: utils.AbstractObject{
 					UUID: uuid.New().String(),
@@ -185,7 +188,6 @@ func (s *StreamService) searchPeer(search string) ([]*peer.Peer, error) {
 			})
 		}
 	}
-
 	access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
 	peers := access.Search(nil, search, false)
 	for _, p := range peers.Data {
@@ -252,8 +254,9 @@ func (ps *StreamService) readLoop(s *common.Stream, id pp.ID, proto protocol.ID,
 	}
 	var evt common.Event
 	if err := json.NewDecoder(s.Stream).Decode(&evt); err != nil {
-		s.Stream.Close()
-		continue
+		// Any decode error (EOF, reset, malformed JSON) terminates the loop;
+		// continuing on a dead/closed stream creates an infinite spin.
+		return
 	}
 	ps.handleEvent(evt.Type, &evt)
 	if protocolInfo.WaitResponse && !protocolInfo.PersistantStream {
@@ -1,23 +1,33 @@
 #!/bin/bash

 IMAGE_BASE_NAME="oc-discovery"
 DOCKERFILE_PATH="."

+docker network create \
+  --subnet=172.40.0.0/24 \
+  discovery
+
-for i in {0..3}; do
+for i in $(seq ${1:-0} ${2:-3}); do
   NUM=$((i + 1))
   PORT=$((4000 + $NUM))
   IMAGE_NAME="${IMAGE_BASE_NAME}:${NUM}"
   echo "▶ Building image ${IMAGE_NAME} with CONF_NUM=${NUM}"
   docker build \
     --build-arg CONF_NUM=${NUM} \
-    -t ${IMAGE_NAME} \
+    -t "${IMAGE_BASE_NAME}_${NUM}" \
     ${DOCKERFILE_PATH}
+  docker kill "${IMAGE_BASE_NAME}_${NUM}" | true
+  docker rm "${IMAGE_BASE_NAME}_${NUM}" | true
   echo "▶ Running container ${IMAGE_NAME} on port ${PORT}:${PORT}"
   docker run -d \
+    --network="${3:-oc}" \
     -p ${PORT}:${PORT} \
     --name "${IMAGE_BASE_NAME}_${NUM}" \
-    ${IMAGE_NAME}
+    "${IMAGE_BASE_NAME}_${NUM}"
+  docker network connect --ip "172.40.0.${NUM}" discovery "${IMAGE_BASE_NAME}_${NUM}"
 done
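The rewritten loop header relies on positional parameters with defaults, so running the script with no arguments still builds instances 0..3 while `./script.sh 4 7` builds four more. A quick sketch of the `${1:-0}`/`${2:-3}` expansion (the `instance_range` function name is illustrative only):

```shell
#!/bin/sh
# instance_range prints the instance indexes the build loop iterates over,
# defaulting to 0..3 when no arguments are given.
instance_range() {
  seq "${1:-0}" "${2:-3}"
}

instance_range | tr '\n' ' '; echo      # 0 1 2 3
instance_range 4 7 | tr '\n' ' '; echo  # 4 5 6 7
```

Note that inside the script NUM is `i + 1`, so arguments select a contiguous, non-overlapping block of container names and ports.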
docker_discovery10.json Normal file
@@ -0,0 +1,10 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "node",
"NODE_ENDPOINT_PORT": 4010,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
"MIN_INDEXER": 2,
"PEER_IDS": "/ip4/172.40.0.9/tcp/4009/p2p/12D3KooWGnQfKwX9E4umCPE8dUKZuig4vw5BndDowRLEbGmcZyta"
}
@@ -4,5 +4,5 @@
"NATS_URL": "nats://nats:4222", "NATS_URL": "nats://nats:4222",
"NODE_MODE": "indexer", "NODE_MODE": "indexer",
"NODE_ENDPOINT_PORT": 4002, "NODE_ENDPOINT_PORT": 4002,
"INDEXER_ADDRESSES": "/ip4/172.19.0.2/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu" "INDEXER_ADDRESSES": "/ip4/172.40.0.1/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
} }
@@ -4,5 +4,5 @@
"NATS_URL": "nats://nats:4222", "NATS_URL": "nats://nats:4222",
"NODE_MODE": "node", "NODE_MODE": "node",
"NODE_ENDPOINT_PORT": 4003, "NODE_ENDPOINT_PORT": 4003,
"INDEXER_ADDRESSES": "/ip4/172.19.0.3/tcp/4002/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u" "INDEXER_ADDRESSES": "/ip4/172.40.0.2/tcp/4002/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
} }
@@ -4,6 +4,6 @@
"NATS_URL": "nats://nats:4222", "NATS_URL": "nats://nats:4222",
"NODE_MODE": "node", "NODE_MODE": "node",
"NODE_ENDPOINT_PORT": 4004, "NODE_ENDPOINT_PORT": 4004,
"INDEXER_ADDRESSES": "/ip4/172.19.0.2/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu", "INDEXER_ADDRESSES": "/ip4/172.40.0.1/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
"PEER_IDS": "/ip4/172.19.0.4/tcp/4003/p2p/12D3KooWBh9kZrekBAE5G33q4jCLNRAzygem3gP1mMdK8mhoCTaw" "PEER_IDS": "/ip4/172.40.0.3/tcp/4003/p2p/12D3KooWBh9kZrekBAE5G33q4jCLNRAzygem3gP1mMdK8mhoCTaw"
} }
docker_discovery5.json Normal file
@@ -0,0 +1,7 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "native-indexer",
"NODE_ENDPOINT_PORT": 4005
}
docker_discovery6.json Normal file
@@ -0,0 +1,8 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "native-indexer",
"NODE_ENDPOINT_PORT": 4006,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
}
docker_discovery7.json Normal file
@@ -0,0 +1,8 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "indexer",
"NODE_ENDPOINT_PORT": 4007,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.6/tcp/4006/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
}
docker_discovery8.json Normal file
@@ -0,0 +1,8 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "indexer",
"NODE_ENDPOINT_PORT": 4008,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
}
docker_discovery9.json Normal file
@@ -0,0 +1,8 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "node",
"NODE_ENDPOINT_PORT": 4009,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.6/tcp/4006/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u,/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
}
go.mod
@@ -3,7 +3,7 @@ module oc-discovery
 go 1.25.0

 require (
-	cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7
+	cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5
 	github.com/libp2p/go-libp2p v0.47.0
 	github.com/libp2p/go-libp2p-record v0.3.1
 	github.com/multiformats/go-multiaddr v0.16.1

8
go.sum
View File

@@ -1,5 +1,13 @@
 cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7 h1:p9uJjMY+QkE4neA+xRmIRtAm9us94EKZqgajDdLOd0Y=
 cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226084851-959fce48ef6c h1:FTUu9tdEfib6J+fuc7e5wYTe++EIlB70bVNpOeFjnyU=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226084851-959fce48ef6c/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226085754-f4e2d8057df0 h1:lvrRF4ToIMl/5k1q4AiPEy6ycjwRtOaDhWnQ/LrW1ZA=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226085754-f4e2d8057df0/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226091217-cb3771c17a31 h1:hvkvJibS9NmImw73j79Ov5VpIYs4WbP4SYGlK/XO82Q=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226091217-cb3771c17a31/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5 h1:h+Fkyj6cfwAirc0QGCBEkZSSrgcyThXswg7ytOLm948=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
 github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
 github.com/Masterminds/semver/v3 v3.4.0 h1:Zog+i5UMtVoCU8oKka5P7i9q9HgrJeGzI9SA1Xbatp0=
 github.com/Masterminds/semver/v3 v3.4.0/go.mod h1:4V+yj/TJE1HU9XfppCwVMZq3I84lprf4nC11bSS5beM=

View File

@@ -28,11 +28,15 @@ func main() {
 	conf.GetConfig().PSKPath = o.GetStringDefault("PSK_PATH", "./psk/psk.key")
 	conf.GetConfig().NodeEndpointPort = o.GetInt64Default("NODE_ENDPOINT_PORT", 4001)
 	conf.GetConfig().IndexerAddresses = o.GetStringDefault("INDEXER_ADDRESSES", "")
+	conf.GetConfig().NativeIndexerAddresses = o.GetStringDefault("NATIVE_INDEXER_ADDRESSES", "")
 	conf.GetConfig().PeerIDS = o.GetStringDefault("PEER_IDS", "")
 	conf.GetConfig().NodeMode = o.GetStringDefault("NODE_MODE", "node")
+	conf.GetConfig().MinIndexer = o.GetIntDefault("MIN_INDEXER", 1)
+	conf.GetConfig().MaxIndexer = o.GetIntDefault("MAX_INDEXER", 5)
 	ctx, stop := signal.NotifyContext(
 		context.Background(),
 		os.Interrupt,
@@ -47,7 +51,7 @@ func main() {
 	if n, err := node.InitNode(isNode, isIndexer, isNativeIndexer); err != nil {
 		panic(err)
 	} else {
-		<-ctx.Done() // 👈 the only blocking point
+		<-ctx.Done() // the only blocking point
 		log.Println("shutting down")
 		n.Close()
 	}

3
pem/private10.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIPc7D3Mgb1U2Ipyb/85hA4Ew7dC8zHDEuQYSjqzzRgLK
-----END PRIVATE KEY-----

3
pem/private5.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIK2oBaOtGNchE09MBRtPd5oEOUcVUQG2ndym5wKExj7R
-----END PRIVATE KEY-----

3
pem/private6.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIE58GDazCyF1jp796ivSmHiCepbkC8TpzliIaQ7eGEpu
-----END PRIVATE KEY-----

3
pem/private7.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIAeX4O7ldwehRSnPkbzuE6csyo63vjvqAcNNujENOKUC
-----END PRIVATE KEY-----

3
pem/private8.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIEkgqINXDLnxIJZs2LEK9O4vdsqk43dwbULGUE25AWuR
-----END PRIVATE KEY-----

3
pem/private9.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIBcflxGlZYyUVJoExC94rHZbIyKMwZ+Oh7EDkb0qUlxd
-----END PRIVATE KEY-----

3
pem/public10.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAEomuEQGmGsYVw35C6DB5tfY8LI8jm359ceAxRX8eQ0o=
-----END PUBLIC KEY-----

3
pem/public5.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAZ2nLJBL8a5opfa8nFeVj0SZToW8pl4+zgcSUkeZFRO4=
-----END PUBLIC KEY-----

3
pem/public6.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAIQVeSGwsjPjyepPTnzzYqVxIxviSEjZXU7C7zuNTui4=
-----END PUBLIC KEY-----

3
pem/public7.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAG95Ettl3jTi41HM8le1A9WDmOEq0ANEqpLF7zTZrfXA=
-----END PUBLIC KEY-----

3
pem/public8.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEA/ymOIb0sJ0qCWrf3mKz7ACCvsMXLog/EK533JfNXZTM=
-----END PUBLIC KEY-----

3
pem/public9.pem Normal file
View File

@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAZ4F3KqOp/5QrPdZGqqX6PYYEGd2snX4Q3AUt9XAG3v8=
-----END PUBLIC KEY-----