demo test + Peer

This commit adds `ARCHITECTURE.md` (new file, 495 lines).
# oc-discovery: Architecture and Technical Analysis

> **Reading convention**
> Items marked ✅ have been fixed in the code. Items marked ⚠️ remain open.
## Table of Contents

1. [Overview](#1-overview)
2. [Role hierarchy](#2-role-hierarchy)
3. [Core mechanisms](#3-core-mechanisms)
   - 3.1 Long-lived heartbeat (node → indexer)
   - 3.2 Trust scoring
   - 3.3 Registration with natives (indexer → native)
   - 3.4 Indexer pool: fetch + consensus
   - 3.5 Self-delegation and offload loop
   - 3.6 Native mesh resilience
   - 3.7 Shared DHT
   - 3.8 PubSub gossip (indexer registry)
   - 3.9 Application streams (node ↔ node)
4. [Summary table](#4-summary-table)
5. [Global risks and limitations](#5-global-risks-and-limitations)
6. [Improvement ideas](#6-improvement-ideas)

---
## 1. Overview

`oc-discovery` is a P2P discovery service for the OpenCloud network. It is built on
**libp2p** (TCP transport + private-network PSK) and a **Kademlia DHT** (prefix `oc`)
to index peers. The architecture is intentionally hierarchical: stable _natives_
act as authoritative hubs that _indexers_ register with, and ordinary _nodes_
discover indexers through those natives.

```
┌──────────────┐      heartbeat       ┌──────────────────┐
│     Node     │ ───────────────────► │     Indexer      │
│   (libp2p)   │ ◄─────────────────── │   (DHT server)   │
└──────────────┘  application stream  └────────┬─────────┘
                                               │ subscribe / heartbeat
                                               ▼
                                      ┌──────────────────┐
                                      │  Native Indexer  │◄──► other natives
                                      │ (authoritative   │     (mesh)
                                      │  hub)            │
                                      └──────────────────┘
```

All participants share a **pre-shared key (PSK)** that isolates the network
from unauthorized external libp2p connections.

---
## 2. Role hierarchy

| Role | Binary | Responsibility |
|---|---|---|
| **Node** | `node_mode=node` | Gets indexed, publishes/reads DHT records |
| **Indexer** | `node_mode=indexer` | Receives heartbeats, writes to the DHT, registers with natives |
| **Native Indexer** | `node_mode=native` | Hub: keeps the registry of live indexers, evaluates consensus, serves as fallback |

A single process can combine the node+indexer or indexer+native roles.

---
## 3. Core mechanisms
### 3.1 Long-lived heartbeat (node → indexer)

**How it works**

A **persistent** libp2p stream (`/opencloud/heartbeat/1.0`) is opened from the node
to each indexer in its pool (`StaticIndexers`). Every 20 seconds, the node sends a
JSON `Heartbeat` on this stream. The indexer responds by recording the peer in
`StreamRecords[ProtocolHeartbeat]` with a 2-minute expiry.

If `sendHeartbeat` fails (stream reset, EOF, timeout), the peer is removed from
`StaticIndexers` and `replenishIndexersFromNative` is triggered.
**Benefits**

- Fast disconnect detection (error on the next encode).
- A single stream per peer reduces pressure on TCP connections.
- The nudge channel (`indexerHeartbeatNudge`) allows an immediate reconnect without
  waiting for the 20 s ticker.

**Limitations / risks**

- ⚠️ A single persistent stream: if the TCP layer stays open but "frozen" (middlebox,
  silent NAT), the error may not surface for several minutes.
- ⚠️ `StaticIndexers` is a shared global map: if two goroutines call
  `replenishIndexersFromNative` simultaneously (multiple-loss case), concurrent
  unprotected writes can occur outside the critical sections.

---
### 3.2 Trust scoring

**How it works**

Before recording a heartbeat in `StreamRecords`, the indexer checks a **minimum
score** computed by `CheckHeartbeat`:

```
Score = (0.4 × uptime_ratio + 0.4 × bpms + 0.2 × diversity) × 100
```

- `uptime_ratio`: how long the peer has been present / how long the indexer has been running.
- `bpms`: throughput measured over a dedicated stream (`/opencloud/probe/1.0`), normalized by 50 Mbps.
- `diversity`: ratio of distinct /24 IP prefixes among the indexers the peer declares.

Two thresholds apply depending on the peer's state:

- **First heartbeat** (peer absent from `StreamRecords`, uptime = 0): threshold of **40**.
- **Subsequent heartbeats** (accumulated uptime): threshold of **75**.
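The formula and the dual threshold fit in a few lines (a sketch; the real `CheckHeartbeat` additionally measures `bpms` over the probe stream):

```go
package main

import "fmt"

// trustScore implements the documented formula; inputs are assumed to be
// pre-normalized to [0,1] and are clamped defensively.
func trustScore(uptimeRatio, bpms, diversity float64) float64 {
	clamp := func(x float64) float64 {
		switch {
		case x < 0:
			return 0
		case x > 1:
			return 1
		}
		return x
	}
	return (0.4*clamp(uptimeRatio) + 0.4*clamp(bpms) + 0.2*clamp(diversity)) * 100
}

// admit applies the dual threshold: 40 on the first heartbeat, 75 afterwards.
func admit(score float64, isFirstHeartbeat bool) bool {
	if isFirstHeartbeat {
		return score >= 40
	}
	return score >= 75
}

func main() {
	// A brand-new peer (uptime 0) can score at most ~60: admitted under the
	// first-heartbeat threshold, which is exactly the deadlock that was fixed.
	s := trustScore(0, 1.0, 1.0)
	fmt.Println(admit(s, true), admit(s, false)) // true false
}
```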
**Benefits**

- Discourages short-lived or slow peers from cluttering the registry.
- Network diversity reduces the risk of concentration on a single subnet.
- The dedicated probe stream keeps binary data out of the JSON heartbeat stream.
- The dual threshold lets new peers be admitted on their very first connection.

**Limitations / risks**

- ✅ **Startup logic deadlock fixed**: with uptime = 0 the maximum score was 60,
  below the threshold of 75. New peers were silently rejected forever.
  → Threshold lowered to **40** for the first heartbeat (`isFirstHeartbeat`), 75 afterwards.
- ⚠️ The thresholds (40 / 75) are still hard-coded, with no configuration option.
- ⚠️ The bandwidth probe sends between 512 and 2048 bytes per heartbeat: at a 20 s
  interval and up to 500 nodes, that is roughly 50 KB/s of continuous probe traffic.
- ⚠️ `diversity` is computed from the addresses the node *claims* to have: this field
  is self-reported and unverified, hence easy to forge.

---
### 3.3 Registration with natives (indexer → native)

**How it works**

Each (non-native) indexer periodically (every 60 s) sends an
`IndexerRegistration` JSON message on a one-shot stream (`/opencloud/native/subscribe/1.0`)
to each configured native. The native:

1. Stores the entry in its local cache with a TTL of **90 s** (`IndexerTTL`).
2. Gossips the `PeerID` on the `oc-indexer-registry` PubSub topic to the other natives.
3. Persists the entry to the DHT asynchronously (retrying until it succeeds).
**Benefits**

- Disposable stream: no long-lived resource on the native side for registrations.
- The local cache is immediately available to `handleNativeGetIndexers` without
  waiting for the DHT.
- PubSub dissemination lets other natives learn about the indexer without it
  having to register with them directly.

**Limitations / risks**

- ✅ **Overly tight TTL fixed**: the 66 s TTL was only 10 % above the 60 s interval;
  a slight network delay could expire a healthy indexer between two renewals.
  → `IndexerTTL` raised to **90 s** (+50 %).
- ⚠️ If the DHT `PutValue` fails permanently (partitioned network), the native holds
  the entry but natives that missed the PubSub message never learn about it:
  a silent inconsistency.
- ⚠️ `RegisterWithNative` skips `127.0.0.1` addresses but does not handle private
  (RFC 1918) addresses, which would be unroutable from other hosts.

---
### 3.4 Indexer pool: fetch + consensus

**How it works**

During `ConnectToNatives` (startup or replenish), the node/indexer:

1. **Fetch**: sends a `GetIndexersRequest` to the first native that answers
   (`/opencloud/native/indexers/1.0`) and receives a list of candidates.
2. **Consensus (round 1)**: queries **all** configured natives in parallel
   (`/opencloud/native/consensus/1.0`, 3 s timeout, 4 s collection window).
   An indexer is confirmed if **strictly more than 50 %** of the responding
   natives consider it alive.
3. **Consensus (round 2)**: if the pool is still too small, the natives'
   suggestions (indexers they know about that were not among the initial
   candidates) go through a second round.
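The round-1 rule reduces to a counting function (a sketch; vote collection over `/opencloud/native/consensus/1.0` is elided):

```go
package main

import (
	"fmt"
	"sort"
)

// confirmedByMajority keeps an indexer only if strictly more than half of
// the responding natives reported it alive (count*2 > responders).
func confirmedByMajority(votes map[string]int, responders int) []string {
	var confirmed []string
	for addr, count := range votes {
		if count*2 > responders {
			confirmed = append(confirmed, addr)
		}
	}
	sort.Strings(confirmed) // deterministic order for display
	return confirmed
}

func main() {
	// 3 natives answered: indexer A seen alive by 2 of them, B by only 1.
	fmt.Println(confirmedByMajority(map[string]int{"A": 2, "B": 1}, 3)) // [A]
	// With a single native, any vote is a trivial 100 % majority:
	// this is exactly the dev/test caveat noted in the limitations.
	fmt.Println(confirmedByMajority(map[string]int{"C": 1}, 1)) // [C]
}
```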
**Benefits**

- The absolute-majority rule prevents a compromised or out-of-sync native from
  injecting phantom indexers.
- The second round fills the pool with alternatives known to the natives
  without sacrificing verification.
- If the fetch returns a **fallback** (a native acting as indexer), consensus is
  skipped, which is consistent since there is only one source.

**Limitations / risks**

- ⚠️ With a **single configured native** (very common in dev/test), consensus is
  trivial (100 % of a single vote): the majority rule protects nothing in that case.
- ⚠️ `fetchIndexersFromNative` stops at the **first responding native** (sequentially):
  if that native has a stale or partial cache, the node gets a suboptimal pool
  without consulting the others.
- ⚠️ The overall collection timeout (4 s) is fixed: on a slow or geographically
  distributed network, valid natives can be dropped for not answering in time.
- ⚠️ `replaceStaticIndexers` only **adds**, never removing stale indexers:
  the pool can accumulate dead entries that only the heartbeat purges later.

---
### 3.5 Self-delegation and offload loop

**How it works**

If a native has no live indexer during `handleNativeGetIndexers`, it designates
itself as a temporary indexer (`selfDelegate`): it returns its own multiaddr and
adds the requester to `responsiblePeers`, up to a limit of `maxFallbackPeers` (50).
Beyond that, delegation is refused and an empty response is returned so the node
tries another native.

Every 30 s, `runOffloadLoop` checks whether real indexers are available again.
If so, for each responsible peer:

- **Stream present**: `Reset()` on the heartbeat stream; the peer receives an error,
  triggers `replenishIndexersFromNative`, and migrates to real indexers.
- **Stream absent** (peer never admitted by scoring): `ClosePeer()` on the network
  connection; the peer reconnects and asks the native for indexers again.
**Benefits**

- Service continuity: a node is never stuck during a temporary lack of indexers.
- The migration is automatic and transparent to the node.
- `Reset()` (vs `Close()`) tears down both directions of the stream, guaranteeing
  the peer actually receives an error.
- The limit of 50 keeps the native from being overwhelmed during prolonged shortages.

**Limitations / risks**

- ✅ **Offload without a stream fixed**: if the heartbeat had never been recorded in
  `StreamRecords` (score below threshold, a case amplified by the scoring bug), the
  offload failed silently and the peer stayed in `responsiblePeers` indefinitely.
  → `else` branch: `ClosePeer()` + removal from `responsiblePeers`.
- ✅ **Unbounded `responsiblePeers` fixed**: the native accepted an arbitrary number
  of peers in self-delegation, itself becoming an overloaded indexer.
  → `selfDelegate` checks `len(responsiblePeers) >= maxFallbackPeers` and returns
  `false` when saturated.
- ⚠️ Delegation is still uncoordinated across natives: a saturated native refuses
  (returns empty) but does not explicitly redirect to a neighboring native with
  spare capacity.

---
### 3.6 Native mesh resilience

**How it works**

When the heartbeat to a native fails, `replenishNativesFromPeers` tries to find
a replacement in this order:

1. `fetchNativeFromNatives`: asks each live native (`/opencloud/native/peers/1.0`)
   for an unknown native address.
2. `fetchNativeFromIndexers`: asks each known indexer
   (`/opencloud/indexer/natives/1.0`) for its configured natives.
3. If no replacement is found and `remaining ≤ 1`: `retryLostNative` starts a 30 s
   ticker that keeps retrying a direct connection to the lost native.

`EnsureNativePeers` maintains native-to-native heartbeats via `ProtocolHeartbeat`,
with a **single goroutine** covering the whole `StaticNatives` map.
**Benefits**

- Multi-hop gossip through indexers can recover a native even when no direct
  peer knows it.
- `retryLostNative` handles the single-native case (minimal deployment).
- Automatic reconnection (`retryLostNative`) also triggers `replenishIndexersIfNeeded`
  to restore the indexer pool.

**Limitations / risks**

- ✅ **Multiple heartbeat goroutines fixed**: `EnsureNativePeers` used to start one
  `SendHeartbeat` goroutine per native address (N natives → N goroutines → N²
  heartbeats per tick). → `nativeMeshHeartbeatOnce` ensures a single goroutine
  iterates over `StaticNatives`.
- ⚠️ `retryLostNative` runs forever with no stop condition tied to the process
  lifetime (no `context.Context`). On graceful shutdown, this goroutine can block.
- ⚠️ Transitive discovery (native → indexer → native) is one-way: an indexer only
  knows the natives from its own config, not natives that joined after it started.

---
### 3.7 Shared DHT

**How it works**

All indexers and natives participate in a Kademlia DHT (prefix `oc`, mode
`ModeServer`). Two namespaces are used:

- `/node/<DID>` → signed `PeerRecord` JSON (published by indexers on node heartbeat).
- `/indexer/<PeerID>` → `liveIndexerEntry` JSON with a TTL (published by natives).

Each native runs `refreshIndexersFromDHT` (every 30 s), which re-hydrates its local
cache from the DHT for known PeerIDs (`knownPeerIDs`) whose local entry has expired.
**Benefits**

- Decentralized persistence: a record survives the loss of a single native or indexer.
- Entry validation: `PeerRecordValidator` and `IndexerRecordValidator` reject
  malformed or expired records at `PutValue` time.
- The secondary index `/name/<name>` enables resolution by human-readable name.

**Limitations / risks**

- ⚠️ The Kademlia DHT works on a private (PSK) network, but bootstrap nodes are not
  configured explicitly: discovery depends on connections that already exist,
  which can slow convergence at startup.
- ⚠️ `PutValue` is retried in an infinite loop on `"failed to find any peer in table"`:
  a prolonged network outage accumulates blocked goroutines.
- ⚠️ If the PSK is compromised, an attacker can write to the DHT; indexer
  `liveIndexerEntry` values are not signed, unlike `PeerRecord`s.
- ⚠️ `refreshIndexersFromDHT` prunes `knownPeerIDs` when the DHT has no fresh entry,
  but does not prune `liveIndexers`: an expired entry stays in memory until the GC
  or the next refresh.

---
### 3.8 PubSub gossip (indexer registry)

**How it works**

When an indexer registers with a native, the native publishes the address on the
GossipSub topic `oc-indexer-registry`. Other subscribed natives update their
`knownPeerIDs` without waiting for the DHT.

The `TopicValidator` rejects any message whose payload is not a valid, parseable
multiaddr before it reaches the processing loop.
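A shape-only stand-in for the validator (the real code calls `pp.AddrInfoFromString`, which fully parses the multiaddr and extracts the peer ID; this sketch only checks the textual shape):

```go
package main

import (
	"fmt"
	"strings"
)

// validateRegistryMsg is a simplified TopicValidator: accept the gossiped
// payload only if it looks like a multiaddr ending in a /p2p/<PeerID>
// component. Like the real check, it is purely syntactic: it says nothing
// about whether the sender is a legitimate native.
func validateRegistryMsg(payload []byte) bool {
	s := strings.TrimSpace(string(payload))
	if !strings.HasPrefix(s, "/") {
		return false
	}
	i := strings.LastIndex(s, "/p2p/")
	return i > 0 && len(s) > i+len("/p2p/")
}

func main() {
	fmt.Println(validateRegistryMsg([]byte("/ip4/10.0.0.7/tcp/4001/p2p/QmPeer"))) // true
	fmt.Println(validateRegistryMsg([]byte("arbitrary gossip payload")))          // false
	fmt.Println(validateRegistryMsg([]byte("/p2p/")))                             // false
}
```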
**Benefits**

- Near-instant dissemination between connected natives.
- A useful complement to the DHT for recent registrations that have not yet
  been persisted.
- The syntactic filter blocks malformed messages before they propagate through the mesh.

**Limitations / risks**

- ✅ **No-op `TopicValidator` fixed**: the validator used to accept every message
  unconditionally (`return true`), letting a compromised native gossip arbitrary data.
  → The validator now checks that the message is a parseable multiaddr
  (`pp.AddrInfoFromString`).
- ⚠️ Validation remains purely syntactic: the message's origin (is the sender a
  legitimate native?) is not verified.
- ⚠️ If a native restarts, it loses its subscription and misses messages published
  while it was away. Re-hydration from the DHT compensates, but with a delay of
  up to 30 s.
- ⚠️ The gossip carries only the indexer's `Addr`, neither its TTL nor a signature.

---
### 3.9 Application streams (node ↔ node)

**How it works**

`StreamService` manages streams between partner nodes (`PARTNER` relationships stored
in the database) over dedicated protocols (`/opencloud/resource/*`). A partner
heartbeat (`ProtocolHeartbeatPartner`) keeps connections alive. Events are routed
through `handleEvent` and the NATS system in parallel.
**Benefits**

- Per-protocol TTL (`PersistantStream`, `WaitResponse`) adapts the behavior to the
  type of exchange (long-lived for the planner, short for CRUD operations).
- The GC (`gc()` every 8 s, started exactly once in `InitStream`) quickly frees
  expired streams.

**Limitations / risks**

- ✅ **GC goroutine leak fixed**: `HandlePartnerHeartbeat` called
  `go s.StartGC(30s)` on every received heartbeat (~20 s), spawning a new infinite
  ticker goroutine on each call.
  → Call removed; the GC started by `InitStream` is sufficient.
- ✅ **Infinite loop on EOF fixed**: `readLoop` did `s.Stream.Close(); continue`
  after a decode error, endlessly re-reading a closed stream.
  → Replaced by `return`; the defers (`Close`, `delete`) clean up correctly.
- ⚠️ Fetching partners from `conf.PeerIDS` is marked `TO REMOVE`:
  provisional code present in production.

---
## 4. Summary table

| Mechanism | Protocol | Main benefit | Risk status |
|---|---|---|---|
| Heartbeat node→indexer | `/opencloud/heartbeat/1.0` | Fast loss detection | ⚠️ Frozen TCP stream undetected |
| Trust scoring | (inline in heartbeat) | Filters unstable peers | ✅ Deadlock fixed (40/75 thresholds) |
| Native registration | `/opencloud/native/subscribe/1.0` | Ample TTL, immediate cache | ✅ TTL raised to 90 s |
| Indexer pool fetch | `/opencloud/native/indexers/1.0` | Uses the 1st responding native | ⚠️ Stale native cache possible |
| Consensus | `/opencloud/native/consensus/1.0` | Absolute majority | ⚠️ Trivial with a single native |
| Self-delegation + offload | (in-memory) | Availability without indexers | ✅ 50-peer cap + ClosePeer |
| Native mesh | `/opencloud/native/peers/1.0` | Multi-hop gossip | ✅ Goroutines deduplicated |
| DHT | `/oc/kad/1.0.0` | Decentralized persistence | ⚠️ Infinite retry, no bootstrap |
| PubSub registry | `oc-indexer-registry` | Fast dissemination | ✅ Multiaddr validation |
| Application streams | `/opencloud/resource/*` | Per-protocol TTL | ✅ GC leak + EOF fixed |

---
## 5. Global risks and limitations

### Security

- ⚠️ **Unverified self-reported addresses**: the `IndexersBinded` field in the
  heartbeat is self-declared by the node and feeds the diversity term. A malicious
  peer can inflate its score by declaring fake addresses.
- ⚠️ **PSK as the only entry barrier**: if the PSK is compromised (it is static and
  file-based), all network isolation is gone. There is no key rotation and no
  additional per-peer authentication.
- ⚠️ **DHT without ACLs on indexer entries**: `PeerRecord` signatures are verified
  on read, but `liveIndexerEntry` values are unsigned. PubSub validation blocks
  invalid multiaddrs but not spoofed addresses of legitimate indexers.

### Availability

- ⚠️ **Native single point of failure**: with a single native, losing it stops all
  indexer assignment. `retryLostNative` mitigates this, but without indexers,
  nodes cannot publish.
- ⚠️ **DHT bootstrap**: without explicit bootstrap nodes, the DHT is slow to
  converge when initial connections are few.

### Consistency

- ⚠️ **`replaceStaticIndexers` never removes**: dead indexers remain in
  `StaticIndexers` until their heartbeat fails. A node can carry an inflated pool
  containing unreachable entries.
- ⚠️ **Global `TimeWatcher`**: set once at `ConnectToIndexers` startup.
  If the indexer has been running for a long time, new nodes will have a durably
  low `uptime_ratio`. The threshold lowered to 40 for the first heartbeat softens
  the initial impact, but subsequent heartbeats still need to accumulate enough
  uptime.

---
## 6. Improvement ideas

Ideas already implemented are marked ✅. Open ideas remain to be addressed.

### ✅ Scoring: dual threshold for new peers

~~Replace the binary threshold~~ — **Implemented**: threshold of 40 for the first
heartbeat (peer absent from `StreamRecords`), 75 for subsequent ones. A peer can now
be admitted on its very first connection without being blocked by zero uptime.
_File: `common/common_stream.go`, `CheckHeartbeat`_

### ✅ Indexer TTL aligned with the renewal interval

~~66 s TTL too close to 60 s~~ — **Implemented**: `IndexerTTL` raised to **90 s**.
_File: `indexer/native.go`_

### ✅ Self-delegation cap

~~Unbounded `responsiblePeers`~~ — **Implemented**: `selfDelegate` returns `false`
when `len(responsiblePeers) >= maxFallbackPeers` (50). The call site returns an
empty response and logs a warning.
_File: `indexer/native.go`_

### ✅ PubSub validation of gossiped addresses

~~`TopicValidator` accepts everything~~ — **Implemented**: the validator checks that
the message is a parseable multiaddr via `pp.AddrInfoFromString`.
_File: `indexer/native.go`, `subscribeIndexerRegistry`_

### ✅ Heartbeat goroutines deduplicated in `EnsureNativePeers`

~~One goroutine per native address~~ — **Implemented**: `nativeMeshHeartbeatOnce`
guarantees a single `SendHeartbeat` goroutine covers the whole `StaticNatives` map.
_File: `common/native_stream.go`_

### ✅ GC goroutine leak in `HandlePartnerHeartbeat`

~~`go s.StartGC(30s)` on every heartbeat~~ — **Implemented**: call removed; the GC
from `InitStream` is sufficient.
_File: `stream/service.go`_

### ✅ Infinite loop on EOF in `readLoop`

~~`continue` after `Stream.Close()`~~ — **Implemented**: replaced by `return` to
let the defers clean up properly.
_File: `stream/service.go`_

---
### ⚠️ Pool fetch: query all natives in parallel

`fetchIndexersFromNative` stops at the first responding native. Querying all natives
in parallel and merging the lists (similar to `clientSideConsensus`) would prevent a
native with a stale cache from supplying a suboptimal pool.

### ⚠️ Configurable consensus quorum

The confirmation threshold (`count*2 > total`) is hard-coded. Making it configurable
(e.g. `consensus_quorum: 0.67`) would allow hardening the rule on deployments with
3+ natives without code changes.
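One possible shape for the configurable rule (the `consensus_quorum` key and this function are hypothetical; a quorum of 0.5 reproduces today's strict-majority behavior):

```go
package main

import "fmt"

// confirmWithQuorum generalizes count*2 > total: an indexer is confirmed
// when its vote share strictly exceeds the configured quorum fraction.
func confirmWithQuorum(count, responders int, quorum float64) bool {
	if responders == 0 {
		return false
	}
	return float64(count) > quorum*float64(responders)
}

func main() {
	// 2 of 3 natives agree: passes a 0.5 quorum, fails a hardened 0.67 quorum.
	fmt.Println(confirmWithQuorum(2, 3, 0.5))  // true
	fmt.Println(confirmWithQuorum(2, 3, 0.67)) // false
	// Exactly half never passes, matching the current strict rule.
	fmt.Println(confirmWithQuorum(2, 4, 0.5)) // false
}
```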
### ⚠️ Explicit deregistration

Add a `/opencloud/native/unsubscribe/1.0` protocol: when an indexer shuts down
cleanly, it notifies the natives to invalidate its TTL immediately instead of
waiting up to 90 s.

### ⚠️ Explicit DHT bootstrap

Configure the natives as DHT bootstrap nodes via `dht.BootstrapPeers` to speed up
Kademlia convergence at startup.

### ⚠️ Context propagation in long-lived goroutines

`retryLostNative`, `refreshIndexersFromDHT` and `runOffloadLoop` receive no
`context.Context`. Passing one down from `InitNative` would allow a clean stop on
process shutdown.

### ⚠️ Explicit redirection when self-delegation is refused

When a native refuses self-delegation (pool saturated), returning an empty response
forces the node to retry without telling it where to go. A list of alternative
natives in the response (`AlternativeNatives []string`) would let the node find a
less loaded native directly.
```diff
@@ -15,6 +15,9 @@ type Config struct {
 	PeerIDS string // TO REMOVE
 
 	NodeMode string
+
+	MinIndexer int
+	MaxIndexer int
 }
 
 var instance *Config
```
@@ -1,7 +1,6 @@
|
|||||||
package common
|
package common
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"bytes"
|
|
||||||
"context"
|
"context"
|
||||||
cr "crypto/rand"
|
cr "crypto/rand"
|
||||||
"encoding/json"
|
"encoding/json"
|
||||||
@@ -28,6 +27,12 @@ type LongLivedStreamRecordedService[T interface{}] struct {
|
|||||||
StreamRecords map[protocol.ID]map[pp.ID]*StreamRecord[T]
|
StreamRecords map[protocol.ID]map[pp.ID]*StreamRecord[T]
|
||||||
StreamMU sync.RWMutex
|
StreamMU sync.RWMutex
|
||||||
maxNodesConn int
|
maxNodesConn int
|
||||||
|
// AfterHeartbeat is an optional hook called after each successful heartbeat update.
|
||||||
|
// The indexer sets it to republish the embedded signed record to the DHT.
|
||||||
|
AfterHeartbeat func(pid pp.ID)
|
||||||
|
// AfterDelete is called after gc() evicts an expired peer, outside the lock.
|
||||||
|
// name and did may be empty if the HeartbeatStream had no metadata.
|
||||||
|
AfterDelete func(pid pp.ID, name string, did string)
|
||||||
}
|
}
|
||||||
|
|
||||||
func NewStreamRecordedService[T interface{}](h host.Host, maxNodesConn int) *LongLivedStreamRecordedService[T] {
|
func NewStreamRecordedService[T interface{}](h host.Host, maxNodesConn int) *LongLivedStreamRecordedService[T] {
|
||||||
@@ -54,16 +59,29 @@ func (ix *LongLivedStreamRecordedService[T]) StartGC(interval time.Duration) {
|
|||||||
|
|
||||||
func (ix *LongLivedStreamRecordedService[T]) gc() {
|
func (ix *LongLivedStreamRecordedService[T]) gc() {
|
||||||
ix.StreamMU.Lock()
|
ix.StreamMU.Lock()
|
||||||
defer ix.StreamMU.Unlock()
|
|
||||||
now := time.Now().UTC()
|
now := time.Now().UTC()
|
||||||
if ix.StreamRecords[ProtocolHeartbeat] == nil {
|
if ix.StreamRecords[ProtocolHeartbeat] == nil {
|
||||||
ix.StreamRecords[ProtocolHeartbeat] = map[pp.ID]*StreamRecord[T]{}
|
ix.StreamRecords[ProtocolHeartbeat] = map[pp.ID]*StreamRecord[T]{}
|
||||||
|
ix.StreamMU.Unlock()
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
streams := ix.StreamRecords[ProtocolHeartbeat]
|
streams := ix.StreamRecords[ProtocolHeartbeat]
|
||||||
|
fmt.Println(StaticNatives, StaticIndexers, streams)
|
||||||
|
|
||||||
|
type gcEntry struct {
|
||||||
|
pid pp.ID
|
||||||
|
name string
|
||||||
|
did string
|
||||||
|
}
|
||||||
|
var evicted []gcEntry
|
||||||
for pid, rec := range streams {
|
for pid, rec := range streams {
|
||||||
if now.After(rec.HeartbeatStream.Expiry) || now.Sub(rec.HeartbeatStream.UptimeTracker.LastSeen) > 2*rec.HeartbeatStream.Expiry.Sub(now) {
|
if now.After(rec.HeartbeatStream.Expiry) || now.Sub(rec.HeartbeatStream.UptimeTracker.LastSeen) > 2*rec.HeartbeatStream.Expiry.Sub(now) {
|
||||||
|
name, did := "", ""
|
||||||
|
if rec.HeartbeatStream != nil {
|
||||||
|
name = rec.HeartbeatStream.Name
|
||||||
|
did = rec.HeartbeatStream.DID
|
||||||
|
}
|
||||||
|
evicted = append(evicted, gcEntry{pid, name, did})
|
||||||
for _, sstreams := range ix.StreamRecords {
|
for _, sstreams := range ix.StreamRecords {
|
||||||
if sstreams[pid] != nil {
|
if sstreams[pid] != nil {
|
||||||
delete(sstreams, pid)
|
delete(sstreams, pid)
|
||||||
@@ -71,6 +89,13 @@ func (ix *LongLivedStreamRecordedService[T]) gc() {
 			}
 		}
 	}
+	ix.StreamMU.Unlock()
+
+	if ix.AfterDelete != nil {
+		for _, e := range evicted {
+			ix.AfterDelete(e.pid, e.name, e.did)
+		}
+	}
 }

 func (ix *LongLivedStreamRecordedService[T]) Snapshot(interval time.Duration) {
@@ -101,8 +126,10 @@ func (ix *LongLivedStreamRecordedService[T]) snapshot() []*StreamRecord[T] {
 	return out
 }

-func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Stream) {
+func (ix *LongLivedStreamRecordedService[T]) HandleHeartbeat(s network.Stream) {
+	logger := oclib.GetLogger()
 	defer s.Close()
+	dec := json.NewDecoder(s)
 	for {
 		ix.StreamMU.Lock()
 		if ix.StreamRecords[ProtocolHeartbeat] == nil {
@@ -114,17 +141,37 @@ func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Strea
 			streamsAnonym[k] = v
 		}
 		ix.StreamMU.Unlock()
-		pid, hb, err := CheckHeartbeat(ix.Host, s, streamsAnonym, &ix.StreamMU, ix.maxNodesConn)
+		pid, hb, err := CheckHeartbeat(ix.Host, s, dec, streamsAnonym, &ix.StreamMU, ix.maxNodesConn)
 		if err != nil {
+			// Stream-level errors (EOF, reset, closed) mean the connection is gone
+			// — exit so the goroutine doesn't spin forever on a dead stream.
+			// Metric/policy errors (score too low, too many connections) are transient
+			// — those are also stream-terminal since the stream carries one session.
+			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) ||
+				strings.Contains(err.Error(), "reset") ||
+				strings.Contains(err.Error(), "closed") ||
+				strings.Contains(err.Error(), "too many connections") {
+				logger.Info().Err(err).Msg("heartbeat stream terminated, closing handler")
+				return
+			}
+			logger.Warn().Err(err).Msg("heartbeat check failed, retrying on same stream")
 			continue
 		}
 		ix.StreamMU.Lock()
 		// if record already seen update last seen
 		if rec, ok := streams[*pid]; ok {
 			rec.DID = hb.DID
+			if rec.HeartbeatStream == nil {
 				rec.HeartbeatStream = hb.Stream
-			rec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
+			}
+			rec.HeartbeatStream = hb.Stream
+			if rec.HeartbeatStream.UptimeTracker == nil {
+				rec.HeartbeatStream.UptimeTracker = &UptimeTracker{
+					FirstSeen: time.Now().UTC(),
+					LastSeen:  time.Now().UTC(),
+				}
+			}
+			logger.Info().Msg("A new node is updated : " + pid.String())
 		} else {
 			hb.Stream.UptimeTracker = &UptimeTracker{
 				FirstSeen: time.Now().UTC(),
@@ -134,37 +181,51 @@ func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Strea
 				DID:             hb.DID,
 				HeartbeatStream: hb.Stream,
 			}
+			logger.Info().Msg("A new node is subscribed : " + pid.String())
 		}
 		ix.StreamMU.Unlock()
+		// Let the indexer republish the embedded signed record to the DHT.
+		if ix.AfterHeartbeat != nil {
+			ix.AfterHeartbeat(*pid)
+		}
 	}
 }

-func CheckHeartbeat(h host.Host, s network.Stream, streams map[pp.ID]HeartBeatStreamed, lock *sync.RWMutex, maxNodes int) (*pp.ID, *Heartbeat, error) {
+func CheckHeartbeat(h host.Host, s network.Stream, dec *json.Decoder, streams map[pp.ID]HeartBeatStreamed, lock *sync.RWMutex, maxNodes int) (*pp.ID, *Heartbeat, error) {
 	if len(h.Network().Peers()) >= maxNodes {
 		return nil, nil, fmt.Errorf("too many connections, try another indexer")
 	}
 	var hb Heartbeat
-	if err := json.NewDecoder(s).Decode(&hb); err != nil {
+	if err := dec.Decode(&hb); err != nil {
 		return nil, nil, err
 	}
-	if ok, bpms, err := getBandwidthChallengeRate(MinPayloadChallenge+int(rand.Float64()*(MaxPayloadChallenge-MinPayloadChallenge)), s); err != nil {
-		return nil, nil, err
-	} else if !ok {
-		return nil, nil, fmt.Errorf("Not a proper peer")
-	} else {
+	_, bpms, _ := getBandwidthChallengeRate(h, s.Conn().RemotePeer(), MinPayloadChallenge+int(rand.Float64()*(MaxPayloadChallenge-MinPayloadChallenge)))
+	{
 		pid, err := pp.Decode(hb.PeerID)
 		if err != nil {
 			return nil, nil, err
 		}
 		upTime := float64(0)
+		isFirstHeartbeat := true
 		lock.Lock()
 		if rec, ok := streams[pid]; ok && rec.GetUptimeTracker() != nil {
 			upTime = rec.GetUptimeTracker().Uptime().Hours() / float64(time.Since(TimeWatcher).Hours())
+			isFirstHeartbeat = false
 		}
 		lock.Unlock()
 		diversity := getDiversityRate(h, hb.IndexersBinded)
+		fmt.Println(upTime, bpms, diversity)
 		hb.ComputeIndexerScore(upTime, bpms, diversity)
-		if hb.Score < 75 {
+		// First heartbeat: uptime is always 0 so the score ceiling is 60, below the
+		// steady-state threshold of 75. Use a lower admission threshold so new peers
+		// can enter and start accumulating uptime. Subsequent heartbeats must meet
+		// the full threshold once uptime is tracked.
+		minScore := float64(50)
+		if isFirstHeartbeat {
+			minScore = 40
+		}
+		fmt.Println(hb.Score, minScore)
+		if hb.Score < minScore {
 			return nil, nil, errors.New("not enough trusting value")
 		}
 		hb.Stream = &Stream{
@@ -178,11 +239,13 @@ func CheckHeartbeat(h host.Host, s network.Stream, streams map[pp.ID]HeartBeatSt
 }

 func getDiversityRate(h host.Host, peers []string) float64 {

 	peers, _ = checkPeers(h, peers)
 	diverse := []string{}
 	for _, p := range peers {
 		ip, err := ExtractIP(p)
 		if err != nil {
+			fmt.Println("NO IP", p, err)
 			continue
 		}
 		div := ip.Mask(net.CIDRMask(24, 32)).String()
@@ -190,6 +253,9 @@ func getDiversityRate(h host.Host, peers []string) float64 {
 			diverse = append(diverse, div)
 		}
 	}
+	if len(diverse) == 0 || len(peers) == 0 {
+		return 1
+	}
 	return float64(len(diverse) / len(peers))
 }

@@ -211,35 +277,42 @@ func checkPeers(h host.Host, peers []string) ([]string, []string) {
 	return concretePeer, ips
 }

-const MaxExpectedMbps = 50.0
+const MaxExpectedMbps = 100.0
 const MinPayloadChallenge = 512
 const MaxPayloadChallenge = 2048
 const BaseRoundTrip = 400 * time.Millisecond

-func getBandwidthChallengeRate(payloadSize int, s network.Stream) (bool, float64, error) {
-	// Génération payload aléatoire
+// getBandwidthChallengeRate opens a dedicated ProtocolBandwidthProbe stream to
+// remotePeer, sends a random payload, reads the echo, and computes throughput.
+// Using a separate stream avoids mixing binary data on the JSON heartbeat stream
+// and ensures the echo handler is actually running on the remote side.
+func getBandwidthChallengeRate(h host.Host, remotePeer pp.ID, payloadSize int) (bool, float64, error) {
 	payload := make([]byte, payloadSize)
-	_, err := cr.Read(payload)
+	if _, err := cr.Read(payload); err != nil {
+		return false, 0, err
+	}
+
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	defer cancel()
+	s, err := h.NewStream(ctx, remotePeer, ProtocolBandwidthProbe)
 	if err != nil {
 		return false, 0, err
 	}
+	defer s.Reset()
+	s.SetDeadline(time.Now().Add(10 * time.Second))
 	start := time.Now()
-	// send on heartbeat stream the challenge
 	if _, err = s.Write(payload); err != nil {
 		return false, 0, err
 	}
-	// read back
+	s.CloseWrite()
+	// Half-close the write side so the handler's io.Copy sees EOF and stops.
+	// Read the echo.
 	response := make([]byte, payloadSize)
-	_, err = io.ReadFull(s, response)
-	if err != nil {
+	if _, err = io.ReadFull(s, response); err != nil {
 		return false, 0, err
 	}

 	duration := time.Since(start)
-	// Verify content
-	if !bytes.Equal(payload, response) {
-		return false, 0, nil // pb or a sadge peer.
-	}
 	maxRoundTrip := BaseRoundTrip + (time.Duration(payloadSize) * (100 * time.Millisecond))
 	mbps := float64(payloadSize*8) / duration.Seconds() / 1e6
 	if duration > maxRoundTrip || mbps < 5.0 {
@@ -345,13 +418,36 @@ var StaticIndexers map[string]*pp.AddrInfo = map[string]*pp.AddrInfo{}
 var StreamMuIndexes sync.RWMutex
 var StreamIndexers ProtocolStream = ProtocolStream{}

-func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID) error {
+// indexerHeartbeatNudge allows replenishIndexersFromNative to trigger an immediate
+// heartbeat tick after adding new entries to StaticIndexers, without waiting up
+// to 20s for the regular ticker. Buffered(1) so the sender never blocks.
+var indexerHeartbeatNudge = make(chan struct{}, 1)
+
+// NudgeIndexerHeartbeat signals the indexer heartbeat goroutine to fire immediately.
+func NudgeIndexerHeartbeat() {
+	select {
+	case indexerHeartbeatNudge <- struct{}{}:
+	default: // nudge already pending, skip
+	}
+}
+
+func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID, recordFn ...func() json.RawMessage) error {
 	TimeWatcher = time.Now().UTC()
 	logger := oclib.GetLogger()

-	// If native addresses are configured, bypass static indexer addresses
+	// If native addresses are configured, get the indexer pool from the native mesh,
+	// then start the long-lived heartbeat goroutine toward those indexers.
 	if conf.GetConfig().NativeIndexerAddresses != "" {
-		return ConnectToNatives(h, minIndexer, maxIndexer, myPID)
+		if err := ConnectToNatives(h, minIndexer, maxIndexer, myPID); err != nil {
+			return err
+		}
+		// Step 2: start the long-lived heartbeat goroutine toward the indexer pool.
+		// replaceStaticIndexers/replenishIndexersFromNative update the map in-place
+		// so this single goroutine follows all pool changes automatically.
+		logger.Info().Msg("[native] step 2 — starting long-lived heartbeat to indexer pool")
+		SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name,
+			h, StreamIndexers, StaticIndexers, &StreamMuIndexes, 20*time.Second, recordFn...)
+		return nil
 	}

 	addresses := strings.Split(conf.GetConfig().IndexerAddresses, ",")
@@ -360,8 +456,8 @@ func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
 		addresses = addresses[0:maxIndexer]
 	}

+	StreamMuIndexes.Lock()
 	for _, indexerAddr := range addresses {
-		fmt.Println("GENERATE ADDR", indexerAddr)
 		ad, err := pp.AddrInfoFromString(indexerAddr)
 		if err != nil {
 			logger.Err(err)
@@ -369,15 +465,18 @@ func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
 		}
 		StaticIndexers[indexerAddr] = ad
 	}
+	indexerCount := len(StaticIndexers)
+	StreamMuIndexes.Unlock()

-	SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name, h, StreamIndexers, StaticIndexers, 20*time.Second) // your indexer is just like a node for the next indexer.
-	if len(StaticIndexers) < minIndexer {
+	SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name, h, StreamIndexers, StaticIndexers, &StreamMuIndexes, 20*time.Second, recordFn...) // your indexer is just like a node for the next indexer.
+	if indexerCount < minIndexer {
 		return errors.New("you run a node without indexers... your gonna be isolated.")
 	}
 	return nil
 }

 func AddStreamProtocol(ctx *context.Context, protoS ProtocolStream, h host.Host, proto protocol.ID, id pp.ID, mypid pp.ID, force bool, onStreamCreated *func(network.Stream)) ProtocolStream {
+	logger := oclib.GetLogger()
 	if onStreamCreated == nil {
 		f := func(s network.Stream) {
 			protoS[proto][id] = &Stream{
@@ -400,7 +499,7 @@ func AddStreamProtocol(ctx *context.Context, protoS ProtocolStream, h host.Host,
 	if protoS[proto][id] != nil {
 		protoS[proto][id].Expiry = time.Now().Add(2 * time.Minute)
 	} else {
-		fmt.Println("NEW STREAM", proto, id)
+		logger.Info().Msg("NEW STREAM Generated" + fmt.Sprintf("%v", proto) + " " + id.String())
 		s, err := h.NewStream(*ctx, id, proto)
 		if err != nil {
 			panic(err.Error())
@@ -419,12 +518,16 @@ type Heartbeat struct {
 	Timestamp      int64    `json:"timestamp"`
 	IndexersBinded []string `json:"indexers_binded"`
 	Score          float64
+	// Record carries a fresh signed PeerRecord (JSON) so the receiving indexer
+	// can republish it to the DHT without an extra round-trip.
+	// Only set by nodes (not indexers heartbeating other indexers).
+	Record json.RawMessage `json:"record,omitempty"`
 }

 func (hb *Heartbeat) ComputeIndexerScore(uptimeHours float64, bpms float64, diversity float64) {
-	hb.Score = (0.4 * uptimeHours) +
-		(0.4 * bpms) +
-		(0.2 * diversity)
+	hb.Score = ((0.3 * uptimeHours) +
+		(0.3 * bpms) +
+		(0.4 * diversity)) * 100
 }

 type HeartbeatInfo []struct {
@@ -433,35 +536,214 @@ type HeartbeatInfo []struct {

 const ProtocolHeartbeat = "/opencloud/heartbeat/1.0"

-func SendHeartbeat(ctx context.Context, proto protocol.ID, name string, h host.Host, ps ProtocolStream, peers map[string]*pp.AddrInfo, interval time.Duration) {
-	peerID, err := oclib.GenerateNodeID()
-	if err == nil {
-		panic("can't heartbeat daemon failed to start")
+// ProtocolBandwidthProbe is a dedicated short-lived stream used exclusively
+// for bandwidth/latency measurement. The handler echoes any bytes it receives.
+// All nodes and indexers register this handler so peers can measure them.
+const ProtocolBandwidthProbe = "/opencloud/probe/1.0"
+
+// HandleBandwidthProbe echoes back everything written on the stream, then closes.
+// It is registered by all participants so the measuring side (the heartbeat receiver)
+// can open a dedicated probe stream and read the round-trip latency + throughput.
+func HandleBandwidthProbe(s network.Stream) {
+	defer s.Close()
+	s.SetDeadline(time.Now().Add(10 * time.Second))
+	io.Copy(s, s) // echo every byte back to the sender
+}
+
+// SendHeartbeat starts a goroutine that sends periodic heartbeats to peers.
+// recordFn, when provided, is called on each tick and its output is embedded in
+// the heartbeat as a fresh signed PeerRecord so the receiving indexer can
+// republish it to the DHT without an extra round-trip.
+// Pass no recordFn (or nil) for indexer→indexer / native heartbeats.
+func SendHeartbeat(ctx context.Context, proto protocol.ID, name string, h host.Host, ps ProtocolStream, peers map[string]*pp.AddrInfo, mu *sync.RWMutex, interval time.Duration, recordFn ...func() json.RawMessage) {
+	logger := oclib.GetLogger()
+	// isIndexerHB is true when this goroutine drives the indexer heartbeat.
+	// isNativeHB is true when it drives the native heartbeat.
+	isIndexerHB := mu == &StreamMuIndexes
+	isNativeHB := mu == &StreamNativeMu
+	var recFn func() json.RawMessage
+	if len(recordFn) > 0 {
+		recFn = recordFn[0]
 	}
 	go func() {
+		logger.Info().Str("proto", string(proto)).Int("peers", len(peers)).Msg("heartbeat started")
 		t := time.NewTicker(interval)
 		defer t.Stop()
-		for {
-			select {
-			case <-t.C:
-				addrs := []string{}
+		// doTick sends one round of heartbeats to the current peer snapshot.
+		doTick := func() {
+			// Build the heartbeat payload — snapshot current indexer addresses.
+			StreamMuIndexes.RLock()
+			addrs := make([]string, 0, len(StaticIndexers))
 			for addr := range StaticIndexers {
 				addrs = append(addrs, addr)
 			}
+			StreamMuIndexes.RUnlock()
 			hb := Heartbeat{
 				Name:           name,
-				DID:            peerID,
 				PeerID:         h.ID().String(),
 				Timestamp:      time.Now().UTC().Unix(),
 				IndexersBinded: addrs,
 			}
+			if recFn != nil {
+				hb.Record = recFn()
+			}
+
+			// Snapshot the peer list under a read lock so we don't hold the
+			// write lock during network I/O.
+			if mu != nil {
+				mu.RLock()
+			}
+			snapshot := make([]*pp.AddrInfo, 0, len(peers))
 			for _, ix := range peers {
-				if err = sendHeartbeat(ctx, h, proto, ix, hb, ps, interval*time.Second); err != nil {
+				snapshot = append(snapshot, ix)
+			}
+			if mu != nil {
+				mu.RUnlock()
+			}
+
+			for _, ix := range snapshot {
+				wasConnected := h.Network().Connectedness(ix.ID) == network.Connected
+				if err := sendHeartbeat(ctx, h, proto, ix, hb, ps, interval*time.Second); err != nil {
+					// Step 3: heartbeat failed — remove from pool and trigger replenish.
+					logger.Info().Str("peer", ix.ID.String()).Str("proto", string(proto)).Msg("[native] step 3 — heartbeat failed, removing peer from pool")

+					// Remove the dead peer and clean up its stream.
+					// mu already covers ps when isIndexerHB (same mutex), so one
+					// lock acquisition is sufficient — no re-entrant double-lock.
+					if mu != nil {
+						mu.Lock()
+					}
+					if ps[proto] != nil {
+						if s, ok := ps[proto][ix.ID]; ok {
+							if s.Stream != nil {
+								s.Stream.Close()
+							}
+							delete(ps[proto], ix.ID)
+						}
+					}
+					lostAddr := ""
+					for addr, ad := range peers {
+						if ad.ID == ix.ID {
+							lostAddr = addr
+							delete(peers, addr)
+							break
+						}
+					}
+					need := conf.GetConfig().MinIndexer - len(peers)
+					remaining := len(peers)
+					if mu != nil {
+						mu.Unlock()
+					}
+					logger.Info().Int("remaining", remaining).Int("min", conf.GetConfig().MinIndexer).Int("need", need).Msg("[native] step 3 — pool state after removal")

+					// Step 4: ask the native for the missing indexer count.
+					if isIndexerHB && conf.GetConfig().NativeIndexerAddresses != "" {
+						if need < 1 {
+							need = 1
+						}
+						logger.Info().Int("need", need).Msg("[native] step 3→4 — triggering replenish")
+						go replenishIndexersFromNative(h, need)
+					}

+					// Native heartbeat failed — find a replacement native.
+					// Case 1: if the dead native was also serving as an indexer, evict it
+					// from StaticIndexers immediately without waiting for the indexer HB tick.
+					if isNativeHB {
+						logger.Info().Str("addr", lostAddr).Msg("[native] step 3 — native heartbeat failed, triggering native replenish")
+						if lostAddr != "" && conf.GetConfig().NativeIndexerAddresses != "" {
 							StreamMuIndexes.Lock()
-					delete(StreamIndexers[proto], ix.ID)
+							if _, wasIndexer := StaticIndexers[lostAddr]; wasIndexer {
+								delete(StaticIndexers, lostAddr)
+								if s := StreamIndexers[ProtocolHeartbeat]; s != nil {
+									if stream, ok := s[ix.ID]; ok {
+										if stream.Stream != nil {
+											stream.Stream.Close()
+										}
+										delete(s, ix.ID)
+									}
+								}
+								idxNeed := conf.GetConfig().MinIndexer - len(StaticIndexers)
+								StreamMuIndexes.Unlock()
+								if idxNeed < 1 {
+									idxNeed = 1
+								}
+								logger.Info().Str("addr", lostAddr).Msg("[native] dead native evicted from indexer pool, triggering replenish")
+								go replenishIndexersFromNative(h, idxNeed)
+							} else {
 								StreamMuIndexes.Unlock()
 							}
 						}
+						go replenishNativesFromPeers(h, lostAddr, proto)
+					}
+				} else {
+					// Case 2: native-as-indexer reconnected after a restart.
+					// If the peer was disconnected before this tick and the heartbeat just
+					// succeeded (transparent reconnect), the native may have restarted with
+					// blank state (responsiblePeers empty). Evict it from StaticIndexers and
+					// re-request an assignment so the native re-tracks us properly and
+					// runOffloadLoop can eventually migrate us to real indexers.
+					if !wasConnected && isIndexerHB && conf.GetConfig().NativeIndexerAddresses != "" {
+						StreamNativeMu.RLock()
+						isNativeIndexer := false
+						for _, ad := range StaticNatives {
+							if ad.ID == ix.ID {
+								isNativeIndexer = true
+								break
+							}
+						}
+						StreamNativeMu.RUnlock()
+						if isNativeIndexer {
+							if mu != nil {
+								mu.Lock()
+							}
+							if ps[proto] != nil {
+								if s, ok := ps[proto][ix.ID]; ok {
+									if s.Stream != nil {
+										s.Stream.Close()
+									}
+									delete(ps[proto], ix.ID)
+								}
+							}
+							reconnectedAddr := ""
+							for addr, ad := range peers {
+								if ad.ID == ix.ID {
+									reconnectedAddr = addr
+									delete(peers, addr)
+									break
+								}
+							}
+							idxNeed := conf.GetConfig().MinIndexer - len(peers)
+							if mu != nil {
+								mu.Unlock()
+							}
+							if idxNeed < 1 {
+								idxNeed = 1
+							}
+							logger.Info().Str("addr", reconnectedAddr).Str("peer", ix.ID.String()).Msg(
+								"[native] native-as-indexer reconnected after restart — evicting and re-requesting assignment")
+							go replenishIndexersFromNative(h, idxNeed)
+						}
+					}
+					logger.Debug().Str("peer", ix.ID.String()).Str("proto", string(proto)).Msg("[native] step 2 — heartbeat sent ok")
+				}
+			}
+		}

+		for {
+			select {
+			case <-t.C:
+				doTick()
+			case <-indexerHeartbeatNudge:
+				if isIndexerHB {
+					logger.Info().Msg("[native] step 2 — nudge received, heartbeating new indexers immediately")
+					doTick()
+				}
+			case <-nativeHeartbeatNudge:
+				if isNativeHB {
+					logger.Info().Msg("[native] native nudge received, heartbeating replacement native immediately")
+					doTick()
+				}
 			case <-ctx.Done():
 				return
 			}
@@ -480,7 +762,7 @@ func TempStream(h host.Host, ad pp.AddrInfo, proto protocol.ID, did string, stre
 	if pts[proto] != nil {
 		expiry = pts[proto].TTL
 	}
-	if ctxTTL, err := context.WithTimeout(context.Background(), expiry); err == nil {
+	ctxTTL, _ := context.WithTimeout(context.Background(), expiry)
 	if h.Network().Connectedness(ad.ID) != network.Connected {
 		if err := h.Connect(ctxTTL, ad); err != nil {
 			return streams, err
@@ -496,10 +778,11 @@ func TempStream(h host.Host, ad pp.AddrInfo, proto protocol.ID, did string, stre
 	mu.Unlock()
 	time.AfterFunc(expiry, func() {
 		mu.Lock()
-		defer mu.Unlock()
 		delete(streams[proto], ad.ID)
+		mu.Unlock()
 	})
-	streams[ProtocolPublish][ad.ID] = &Stream{
+	mu.Lock()
+	streams[proto][ad.ID] = &Stream{
 		DID:    did,
 		Stream: s,
 		Expiry: time.Now().UTC().Add(expiry),
@@ -509,29 +792,32 @@ func TempStream(h host.Host, ad pp.AddrInfo, proto protocol.ID, did string, stre
 	} else {
 		return streams, err
 	}
-	}
-	return streams, errors.New("can't create a context")
 }

 func sendHeartbeat(ctx context.Context, h host.Host, proto protocol.ID, p *pp.AddrInfo,
 	hb Heartbeat, ps ProtocolStream, interval time.Duration) error {
-	streams := ps.Get(proto)
-	if len(streams) == 0 {
-		return errors.New("no stream for protocol heartbeat founded")
+	logger := oclib.GetLogger()
+	if ps[proto] == nil {
+		ps[proto] = map[pp.ID]*Stream{}
 	}
+	streams := ps[proto]
 	pss, exists := streams[p.ID]
-	ctxTTL, _ := context.WithTimeout(ctx, 3*interval)
+	ctxTTL, cancel := context.WithTimeout(ctx, 3*interval)
+	defer cancel()
 	// Connect si nécessaire
 	if h.Network().Connectedness(p.ID) != network.Connected {
 		if err := h.Connect(ctxTTL, *p); err != nil {
+			logger.Err(err)
 			return err
 		}
 		exists = false // on devra recréer le stream
 	}
 	// Crée le stream si inexistant ou fermé
 	if !exists || pss.Stream == nil {
+		logger.Info().Msg("New Stream engaged as Heartbeat " + fmt.Sprintf("%v", proto) + " " + p.ID.String())
 		s, err := h.NewStream(ctx, p.ID, proto)
 		if err != nil {
+			logger.Err(err)
 			return err
 		}
 		pss = &Stream{
@@ -13,6 +13,7 @@ import (
 	oclib "cloud.o-forge.io/core/oc-lib"
 	"github.com/libp2p/go-libp2p/core/host"
 	pp "github.com/libp2p/go-libp2p/core/peer"
+	"github.com/libp2p/go-libp2p/core/protocol"
 )

 const (
@@ -57,6 +58,7 @@ type IndexerRegistration struct {
 // GetIndexersRequest asks a native for a pool of live indexers.
 type GetIndexersRequest struct {
 	Count int    `json:"count"`
+	From  string `json:"from"`
 }

 // GetIndexersResponse is returned by the native with live indexer multiaddrs.
@@ -69,17 +71,26 @@ var StaticNatives = map[string]*pp.AddrInfo{}
|
|||||||
var StreamNativeMu sync.RWMutex
|
var StreamNativeMu sync.RWMutex
|
||||||
var StreamNatives ProtocolStream = ProtocolStream{}
|
var StreamNatives ProtocolStream = ProtocolStream{}
|
||||||
|
|
||||||
// ConnectToNatives is the client-side entry point for nodes/indexers that have
|
// nativeHeartbeatOnce ensures we start exactly one long-lived heartbeat goroutine
|
||||||
// NativeIndexerAddresses configured. It:
|
// toward the native mesh, even when ConnectToNatives is called from recovery paths.
|
||||||
// 1. Connects (long-lived heartbeat) to all configured natives.
|
var nativeHeartbeatOnce sync.Once
|
||||||
// 2. Fetches an initial indexer pool from the FIRST responsive native.
|
|
||||||
// 3. Challenges that pool to ALL natives (consensus round 1).
|
// nativeMeshHeartbeatOnce guards the native-to-native heartbeat goroutine started
|
||||||
// 4. If the confirmed list is short, samples native suggestions and re-challenges (round 2).
|
// by EnsureNativePeers so only one goroutine covers the whole StaticNatives map.
|
||||||
// 5. Populates StaticIndexers with majority-confirmed indexers.
|
var nativeMeshHeartbeatOnce sync.Once
|
||||||
|
|
||||||
|
// ConnectToNatives is the initial setup for nodes/indexers in native mode:
|
||||||
|
// 1. Parses native addresses → StaticNatives.
|
||||||
|
// 2. Starts a single long-lived heartbeat goroutine toward the native mesh.
|
||||||
|
// 3. Fetches an initial indexer pool from the first responsive native.
|
||||||
|
// 4. Runs consensus when real (non-fallback) indexers are returned.
|
||||||
|
// 5. Replaces StaticIndexers with the confirmed pool.
|
||||||
func ConnectToNatives(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID) error {
|
func ConnectToNatives(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID) error {
|
||||||
logger := oclib.GetLogger()
|
logger := oclib.GetLogger()
|
||||||
|
logger.Info().Msg("[native] step 1 — parsing native addresses")
|
||||||
|
|
||||||
// Parse in config order: the first entry is the primary pool source.
|
// Parse native addresses — safe to call multiple times.
|
||||||
|
StreamNativeMu.Lock()
|
||||||
orderedAddrs := []string{}
|
orderedAddrs := []string{}
|
||||||
for _, addr := range strings.Split(conf.GetConfig().NativeIndexerAddresses, ",") {
|
for _, addr := range strings.Split(conf.GetConfig().NativeIndexerAddresses, ",") {
|
||||||
addr = strings.TrimSpace(addr)
|
addr = strings.TrimSpace(addr)
|
||||||
@@ -88,106 +99,208 @@ func ConnectToNatives(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
|
|||||||
}
|
}
|
||||||
ad, err := pp.AddrInfoFromString(addr)
|
ad, err := pp.AddrInfoFromString(addr)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
logger.Err(err).Msg("ConnectToNatives: invalid addr")
|
logger.Err(err).Msg("[native] step 1 — invalid native addr")
|
||||||
continue
|
continue
|
||||||
}
|
}
|
||||||
StaticNatives[addr] = ad
|
StaticNatives[addr] = ad
|
||||||
orderedAddrs = append(orderedAddrs, addr)
|
orderedAddrs = append(orderedAddrs, addr)
|
||||||
|
logger.Info().Str("addr", addr).Msg("[native] step 1 — native registered")
|
||||||
}
|
}
|
||||||
if len(StaticNatives) == 0 {
|
if len(StaticNatives) == 0 {
|
||||||
|
StreamNativeMu.Unlock()
|
||||||
return errors.New("no valid native addresses configured")
|
return errors.New("no valid native addresses configured")
|
||||||
}
|
}
|
||||||
|
StreamNativeMu.Unlock()
|
||||||
|
logger.Info().Int("count", len(orderedAddrs)).Msg("[native] step 1 — natives parsed")
|
||||||
|
|
||||||
// Long-lived heartbeat connections to keep the native mesh active.
|
// Step 1: one long-lived heartbeat to each native.
|
||||||
|
nativeHeartbeatOnce.Do(func() {
|
||||||
|
logger.Info().Msg("[native] step 1 — starting long-lived heartbeat to native mesh")
|
||||||
SendHeartbeat(context.Background(), ProtocolHeartbeat,
|
SendHeartbeat(context.Background(), ProtocolHeartbeat,
|
||||||
conf.GetConfig().Name, h, StreamNatives, StaticNatives, 20*time.Second)
|
conf.GetConfig().Name, h, StreamNatives, StaticNatives, &StreamNativeMu, 20*time.Second)
|
||||||
|
})
|
||||||
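The `nativeHeartbeatOnce.Do(...)` call above carries the whole single-start guarantee: `sync.Once` runs its body at most once per process, so re-entering `ConnectToNatives` from recovery paths cannot spawn a second heartbeat goroutine. A minimal sketch of the pattern, with a hypothetical `started` counter standing in for launching the real goroutine:

```go
package main

import (
	"fmt"
	"sync"
)

var heartbeatOnce sync.Once
var started int

// startHeartbeat stands in for the long-lived heartbeat launcher;
// sync.Once guarantees the body runs at most once per process.
func startHeartbeat() {
	heartbeatOnce.Do(func() {
		started++ // in the real code: go sendHeartbeatLoop(...)
	})
}

func main() {
	// Simulate ConnectToNatives being re-entered concurrently from recovery paths.
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			startHeartbeat()
		}()
	}
	wg.Wait()
	fmt.Println(started) // 1
}
```

Note that `sync.Once` also blocks concurrent callers until the first call completes, so no caller can proceed while believing the heartbeat is running before it actually is.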
-	// Step 1: get an initial pool from the FIRST responsive native (in config order).
-	var candidates []string
-	var isFallback bool
-	for _, addr := range orderedAddrs {
-		ad := StaticNatives[addr]
-		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
-		if err := h.Connect(ctx, *ad); err != nil {
-			cancel()
-			continue
-		}
-		s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetIndexers)
-		cancel()
-		if err != nil {
-			continue
-		}
-		req := GetIndexersRequest{Count: maxIndexer}
-		if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
-			s.Close()
-			continue
-		}
-		var resp GetIndexersResponse
-		if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
-			s.Close()
-			continue
-		}
-		s.Close()
-		candidates = resp.Indexers
-		isFallback = resp.IsSelfFallback
-		break // first responsive native only
-	}
 
+	// Fetch initial pool from the first responsive native.
+	logger.Info().Int("want", maxIndexer).Msg("[native] step 1 — fetching indexer pool from native")
+	candidates, isFallback := fetchIndexersFromNative(h, orderedAddrs, maxIndexer)
 	if len(candidates) == 0 {
+		logger.Warn().Msg("[native] step 1 — no candidates returned by any native")
 		if minIndexer > 0 {
 			return errors.New("ConnectToNatives: no indexers available from any native")
 		}
 		return nil
 	}
+	logger.Info().Int("candidates", len(candidates)).Bool("fallback", isFallback).Msg("[native] step 1 — pool received")
 
-	// If the native is already the fallback indexer, use it directly — no consensus needed.
+	// Step 2: populate StaticIndexers — consensus for real indexers, direct for fallback.
+	pool := resolvePool(h, candidates, isFallback, maxIndexer)
+	replaceStaticIndexers(pool)
+
+	StreamMuIndexes.RLock()
+	indexerCount := len(StaticIndexers)
+	StreamMuIndexes.RUnlock()
+	logger.Info().Int("pool_size", indexerCount).Msg("[native] step 2 — StaticIndexers replaced")
+
+	if minIndexer > 0 && indexerCount < minIndexer {
+		return errors.New("not enough majority-confirmed indexers available")
+	}
+	return nil
+}
+
+// replenishIndexersFromNative is called when an indexer heartbeat fails (step 3→4).
+// It asks the native for exactly `need` replacement indexers, runs consensus when
+// real indexers are returned, and adds the results to StaticIndexers without
+// clearing the existing pool.
+func replenishIndexersFromNative(h host.Host, need int) {
+	if need <= 0 {
+		return
+	}
+	logger := oclib.GetLogger()
+	logger.Info().Int("need", need).Msg("[native] step 4 — replenishing indexer pool from native")
+
+	StreamNativeMu.RLock()
+	addrs := make([]string, 0, len(StaticNatives))
+	for addr := range StaticNatives {
+		addrs = append(addrs, addr)
+	}
+	StreamNativeMu.RUnlock()
+
+	candidates, isFallback := fetchIndexersFromNative(h, addrs, need)
+	if len(candidates) == 0 {
+		logger.Warn().Msg("[native] step 4 — no candidates returned by any native")
+		return
+	}
+	logger.Info().Int("candidates", len(candidates)).Bool("fallback", isFallback).Msg("[native] step 4 — candidates received")
+
+	pool := resolvePool(h, candidates, isFallback, need)
+	if len(pool) == 0 {
+		logger.Warn().Msg("[native] step 4 — consensus yielded no confirmed indexers")
+		return
+	}
+
+	// Add new indexers to the pool — do NOT clear existing ones.
+	StreamMuIndexes.Lock()
+	for addr, ad := range pool {
+		StaticIndexers[addr] = ad
+	}
+	total := len(StaticIndexers)
+	StreamMuIndexes.Unlock()
+	logger.Info().Int("added", len(pool)).Int("total", total).Msg("[native] step 4 — pool replenished")
+
+	// Nudge the heartbeat goroutine to connect immediately instead of waiting
+	// for the next 20s tick.
+	NudgeIndexerHeartbeat()
+	logger.Info().Msg("[native] step 4 — heartbeat goroutine nudged")
+}
+
+// fetchIndexersFromNative opens a ProtocolNativeGetIndexers stream to the first
+// responsive native and returns the candidate list and fallback flag.
+func fetchIndexersFromNative(h host.Host, nativeAddrs []string, count int) (candidates []string, isFallback bool) {
+	logger := oclib.GetLogger()
+	for _, addr := range nativeAddrs {
+		ad, err := pp.AddrInfoFromString(addr)
+		if err != nil {
+			logger.Warn().Str("addr", addr).Msg("[native] fetch — skipping invalid addr")
+			continue
+		}
+		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		if err := h.Connect(ctx, *ad); err != nil {
+			cancel()
+			logger.Warn().Str("addr", addr).Err(err).Msg("[native] fetch — connect failed")
+			continue
+		}
+		s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetIndexers)
+		cancel()
+		if err != nil {
+			logger.Warn().Str("addr", addr).Err(err).Msg("[native] fetch — stream open failed")
+			continue
+		}
+		req := GetIndexersRequest{Count: count, From: h.ID().String()}
+		if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
+			s.Close()
+			logger.Warn().Str("addr", addr).Err(encErr).Msg("[native] fetch — encode request failed")
+			continue
+		}
+		var resp GetIndexersResponse
+		if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
+			s.Close()
+			logger.Warn().Str("addr", addr).Err(decErr).Msg("[native] fetch — decode response failed")
+			continue
+		}
+		s.Close()
+		logger.Info().Str("native", addr).Int("indexers", len(resp.Indexers)).Bool("fallback", resp.IsSelfFallback).Msg("[native] fetch — response received")
+		return resp.Indexers, resp.IsSelfFallback
+	}
+	logger.Warn().Msg("[native] fetch — no native responded")
+	return nil, false
+}
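Every control exchange in this file follows the same framing: open a fresh stream, encode one JSON request, decode one JSON response, close. The sketch below reproduces that framing over an in-memory `net.Pipe`, which behaves like a stream for encode/decode purposes. The types mirror `GetIndexersRequest`/`GetIndexersResponse`, but the response field tags and the stand-in server handler are assumptions for illustration, not the real native-side code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// Hypothetical mirrors of the wire types; only the Count/From tags
// appear in the diff, the response tags are assumed.
type getIndexersRequest struct {
	Count int    `json:"count"`
	From  string `json:"from"`
}

type getIndexersResponse struct {
	Indexers       []string `json:"indexers"`
	IsSelfFallback bool     `json:"isSelfFallback"`
}

// exchange performs one request/response round trip over an in-memory pipe,
// following the encode-then-decode framing used by fetchIndexersFromNative.
func exchange(count int) ([]string, error) {
	client, server := net.Pipe()
	defer client.Close()

	// Stand-in native handler: decode one request, reply with a capped list.
	go func() {
		defer server.Close()
		var req getIndexersRequest
		if err := json.NewDecoder(server).Decode(&req); err != nil {
			return
		}
		known := []string{"/ip4/10.0.0.1/tcp/4001/p2p/QmA", "/ip4/10.0.0.2/tcp/4001/p2p/QmB"}
		if req.Count < len(known) {
			known = known[:req.Count]
		}
		json.NewEncoder(server).Encode(getIndexersResponse{Indexers: known})
	}()

	if err := json.NewEncoder(client).Encode(getIndexersRequest{Count: count, From: "QmClient"}); err != nil {
		return nil, err
	}
	var resp getIndexersResponse
	if err := json.NewDecoder(client).Decode(&resp); err != nil {
		return nil, err
	}
	return resp.Indexers, nil
}

func main() {
	got, err := exchange(1)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(got)) // 1
}
```

One JSON value per direction on a throwaway stream keeps the protocol stateless: there is no length prefix to manage, and `json.Decoder` naturally stops at the end of the first value.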
 
+// resolvePool converts a candidate list to a validated addr→AddrInfo map.
+// When isFallback is true the native itself is the indexer — no consensus needed.
+// When isFallback is false, consensus is run before accepting the candidates.
+func resolvePool(h host.Host, candidates []string, isFallback bool, maxIndexer int) map[string]*pp.AddrInfo {
+	logger := oclib.GetLogger()
 	if isFallback {
+		logger.Info().Strs("addrs", candidates).Msg("[native] resolve — fallback mode, skipping consensus")
+		pool := make(map[string]*pp.AddrInfo, len(candidates))
 		for _, addr := range candidates {
 			ad, err := pp.AddrInfoFromString(addr)
 			if err != nil {
 				continue
 			}
-			StaticIndexers[addr] = ad
+			pool[addr] = ad
 		}
-		return nil
+		return pool
 	}
 
-	// Step 2: challenge the pool to ALL configured natives and score by majority vote.
+	// Round 1.
+	logger.Info().Int("candidates", len(candidates)).Msg("[native] resolve — consensus round 1")
 	confirmed, suggestions := clientSideConsensus(h, candidates)
+	logger.Info().Int("confirmed", len(confirmed)).Int("suggestions", len(suggestions)).Msg("[native] resolve — consensus round 1 done")
 
-	// Step 3: if we still have gaps, sample from suggestions and re-challenge.
+	// Round 2: fill gaps from suggestions if below target.
 	if len(confirmed) < maxIndexer && len(suggestions) > 0 {
 		rand.Shuffle(len(suggestions), func(i, j int) { suggestions[i], suggestions[j] = suggestions[j], suggestions[i] })
 		gap := maxIndexer - len(confirmed)
 		if gap > len(suggestions) {
 			gap = len(suggestions)
 		}
+		logger.Info().Int("gap", gap).Msg("[native] resolve — consensus round 2 (filling gaps)")
 		confirmed2, _ := clientSideConsensus(h, append(confirmed, suggestions[:gap]...))
 		if len(confirmed2) > 0 {
 			confirmed = confirmed2
 		}
+		logger.Info().Int("confirmed", len(confirmed)).Msg("[native] resolve — consensus round 2 done")
 	}
 
-	// Step 4: populate StaticIndexers with confirmed addresses.
+	pool := make(map[string]*pp.AddrInfo, len(confirmed))
 	for _, addr := range confirmed {
 		ad, err := pp.AddrInfoFromString(addr)
 		if err != nil {
 			continue
 		}
+		pool[addr] = ad
+	}
+	logger.Info().Int("pool_size", len(pool)).Msg("[native] resolve — pool ready")
+	return pool
+}
+
+// replaceStaticIndexers atomically replaces the active indexer pool.
+// Peers no longer in next have their heartbeat streams closed so the SendHeartbeat
+// goroutine stops sending to them on the next tick.
+func replaceStaticIndexers(next map[string]*pp.AddrInfo) {
+	StreamMuIndexes.Lock()
+	defer StreamMuIndexes.Unlock()
+	for addr, ad := range next {
 		StaticIndexers[addr] = ad
 	}
 
-	if minIndexer > 0 && len(StaticIndexers) < minIndexer {
-		return errors.New("not enough majority-confirmed indexers available")
-	}
-	return nil
 }
 
 // clientSideConsensus challenges a candidate list to ALL configured native peers
 // in parallel. Each native replies with the candidates it trusts plus extras it
 // recommends. An indexer is confirmed when strictly more than 50% of responding
-// natives trust it. The remaining addresses from native suggestions are returned
-// as suggestions for a possible second round.
+// natives trust it.
 func clientSideConsensus(h host.Host, candidates []string) (confirmed []string, suggestions []string) {
 	if len(candidates) == 0 {
 		return nil, nil
@@ -201,7 +314,6 @@ func clientSideConsensus(h host.Host, candidates []string) (confirmed []string,
 	StreamNativeMu.RUnlock()
 
 	if len(peers) == 0 {
-		// No natives to challenge: trust candidates as-is.
 		return candidates, nil
 	}
 
@@ -239,13 +351,12 @@ func clientSideConsensus(h host.Host, candidates []string) (confirmed []string,
 		}(ad)
 	}
 
-	// Collect responses up to consensusCollectTimeout.
 	timer := time.NewTimer(consensusCollectTimeout)
 	defer timer.Stop()
 
 	trustedCounts := map[string]int{}
 	suggestionPool := map[string]struct{}{}
-	total := 0 // counts only natives that actually responded
+	total := 0
 	collected := 0
 
 collect:
@@ -254,7 +365,7 @@ collect:
 	case r := <-ch:
 		collected++
 		if !r.responded {
-			continue // timeout / error: skip, do not count as vote
+			continue
 		}
 		total++
 		seen := map[string]struct{}{}
@@ -273,13 +384,12 @@ collect:
 	}
 
 	if total == 0 {
-		// No native responded: fall back to trusting the candidates as-is.
 		return candidates, nil
 	}
 
 	confirmedSet := map[string]struct{}{}
 	for addr, count := range trustedCounts {
-		if count*2 > total { // strictly >50%
+		if count*2 > total {
 			confirmed = append(confirmed, addr)
 			confirmedSet[addr] = struct{}{}
 		}
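The `count*2 > total` test above is the entire consensus rule: an address is confirmed only when strictly more than half of the natives that actually responded trust it (non-responders are excluded from `total`, so they cannot dilute the vote). The tally isolated as a sketch, with a hypothetical `confirmMajority` helper:

```go
package main

import "fmt"

// confirmMajority mirrors the consensus tally: an address is confirmed when
// strictly more than half of the responding natives trust it (count*2 > total).
func confirmMajority(trustedCounts map[string]int, total int) []string {
	confirmed := []string{}
	for addr, count := range trustedCounts {
		if count*2 > total {
			confirmed = append(confirmed, addr)
		}
	}
	return confirmed
}

func main() {
	// 4 natives responded; addrA is trusted by 3 (strict majority),
	// addrB by exactly 2 (50%, not strictly more, so rejected).
	votes := map[string]int{"addrA": 3, "addrB": 2}
	fmt.Println(confirmMajority(votes, 4)) // [addrA]
}
```

Using integer arithmetic (`count*2 > total`) instead of floating-point division avoids any rounding ambiguity at exactly 50%.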
@@ -292,15 +402,17 @@ collect:
 	return
 }
 
-const ProtocolIndexerHeartbeat = "/opencloud/heartbeat/indexer/1.0"
-
 // RegisterWithNative sends a one-shot registration to each configured native indexer.
 // Should be called periodically every RecommendedHeartbeatInterval.
 func RegisterWithNative(h host.Host, nativeAddressesStr string) {
 	logger := oclib.GetLogger()
 	myAddr := ""
-	if len(h.Addrs()) > 0 {
-		myAddr = h.Addrs()[0].String() + "/p2p/" + h.ID().String()
+	if !strings.Contains(h.Addrs()[len(h.Addrs())-1].String(), "127.0.0.1") {
+		myAddr = h.Addrs()[len(h.Addrs())-1].String() + "/p2p/" + h.ID().String()
+	}
+	if myAddr == "" {
+		logger.Warn().Msg("RegisterWithNative: no routable address yet, skipping")
+		return
 	}
 	reg := IndexerRegistration{
 		PeerID: h.ID().String(),
@@ -334,16 +446,16 @@ func RegisterWithNative(h host.Host, nativeAddressesStr string) {
 	}
 }
 
-// EnsureNativePeers populates StaticNatives from config and starts heartbeat
-// connections to other natives. Safe to call multiple times; heartbeat is only
-// started once (when StaticNatives transitions from empty to non-empty).
+// EnsureNativePeers populates StaticNatives from config and starts a single
+// heartbeat goroutine toward the native mesh. Safe to call multiple times;
+// the heartbeat goroutine is started at most once (nativeMeshHeartbeatOnce).
 func EnsureNativePeers(h host.Host) {
+	logger := oclib.GetLogger()
 	nativeAddrs := conf.GetConfig().NativeIndexerAddresses
 	if nativeAddrs == "" {
 		return
 	}
 	StreamNativeMu.Lock()
-	wasEmpty := len(StaticNatives) == 0
 	for _, addr := range strings.Split(nativeAddrs, ",") {
 		addr = strings.TrimSpace(addr)
 		if addr == "" {
@@ -354,11 +466,312 @@ func EnsureNativePeers(h host.Host) {
 			continue
 		}
 		StaticNatives[addr] = ad
+		logger.Info().Str("addr", addr).Msg("native: registered peer in native mesh")
 	}
 	StreamNativeMu.Unlock()
+	// One heartbeat goroutine iterates over all of StaticNatives on each tick;
+	// starting one per address would multiply heartbeats by the native count.
+	nativeMeshHeartbeatOnce.Do(func() {
+		logger.Info().Msg("native: starting mesh heartbeat goroutine")
+		SendHeartbeat(context.Background(), ProtocolHeartbeat,
+			conf.GetConfig().Name, h, StreamNatives, StaticNatives, &StreamNativeMu, 20*time.Second)
+	})
+}
 
-	if wasEmpty && len(StaticNatives) > 0 {
-		SendHeartbeat(context.Background(), ProtocolIndexerHeartbeat,
-			conf.GetConfig().Name, h, StreamNatives, StaticNatives, 20*time.Second)
+func StartNativeRegistration(h host.Host, nativeAddressesStr string) {
+	go func() {
+		// Poll until a routable (non-loopback) address is available before the first
+		// registration attempt. libp2p may not have discovered external addresses yet
+		// at startup. Cap at 12 retries (~1 minute) so we don't spin indefinitely.
+		for i := 0; i < 12; i++ {
+			hasRoutable := false
+			if !strings.Contains(h.Addrs()[len(h.Addrs())-1].String(), "127.0.0.1") {
+				hasRoutable = true
+				break
+			}
+
+			if hasRoutable {
+				break
+			}
+			time.Sleep(5 * time.Second)
+		}
+		RegisterWithNative(h, nativeAddressesStr)
+		t := time.NewTicker(RecommendedHeartbeatInterval)
+		defer t.Stop()
+		for range t.C {
+			RegisterWithNative(h, nativeAddressesStr)
+		}
+	}()
+}
+
+// ── Lost-native replacement ───────────────────────────────────────────────────
+
+const (
+	// ProtocolNativeGetPeers lets a node/indexer ask a native for a random
+	// selection of that native's own native contacts (to replace a dead native).
+	ProtocolNativeGetPeers = "/opencloud/native/peers/1.0"
+	// ProtocolIndexerGetNatives lets nodes/indexers ask a connected indexer for
+	// its configured native addresses (fallback when no alive native responds).
+	ProtocolIndexerGetNatives = "/opencloud/indexer/natives/1.0"
+	// retryNativeInterval is how often retryLostNative polls a dead native.
+	retryNativeInterval = 30 * time.Second
+)
+
+// GetNativePeersRequest is sent to a native to ask for its known native contacts.
+type GetNativePeersRequest struct {
+	Exclude []string `json:"exclude"`
+	Count   int      `json:"count"`
+}
+
+// GetNativePeersResponse carries native addresses returned by a native's peer list.
+type GetNativePeersResponse struct {
+	Peers []string `json:"peers"`
+}
+
+// GetIndexerNativesRequest is sent to an indexer to ask for its configured native addresses.
+type GetIndexerNativesRequest struct {
+	Exclude []string `json:"exclude"`
+}
+
+// GetIndexerNativesResponse carries native addresses returned by an indexer.
+type GetIndexerNativesResponse struct {
+	Natives []string `json:"natives"`
+}
+
+// nativeHeartbeatNudge allows replenishNativesFromPeers to trigger an immediate
+// native heartbeat tick after adding a replacement native to the pool.
+var nativeHeartbeatNudge = make(chan struct{}, 1)
+
+// NudgeNativeHeartbeat signals the native heartbeat goroutine to fire immediately.
+func NudgeNativeHeartbeat() {
+	select {
+	case nativeHeartbeatNudge <- struct{}{}:
+	default: // nudge already pending, skip
+	}
+}
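`NudgeNativeHeartbeat` uses a channel of capacity 1 with a non-blocking send: any number of rapid nudges collapse into a single pending signal, and callers never block even when no goroutine is currently draining the channel. The same pattern in isolation (`nudgeHeartbeat` and `drainNudges` are illustrative names, not the real API):

```go
package main

import "fmt"

// nudge mirrors nativeHeartbeatNudge: a 1-buffered channel where a pending
// signal coalesces any further ones, so callers never block.
var nudge = make(chan struct{}, 1)

// nudgeHeartbeat is the non-blocking send from NudgeNativeHeartbeat.
func nudgeHeartbeat() {
	select {
	case nudge <- struct{}{}:
	default: // nudge already pending, skip
	}
}

// drainNudges counts how many signals are actually pending on the channel.
func drainNudges() int {
	n := 0
	for {
		select {
		case <-nudge:
			n++
		default:
			return n
		}
	}
}

func main() {
	// Three rapid nudges before the worker wakes coalesce into one signal.
	nudgeHeartbeat()
	nudgeHeartbeat()
	nudgeHeartbeat()
	fmt.Println(drainNudges()) // 1
}
```

The heartbeat goroutine would select on this channel alongside its ticker, so a nudge triggers an immediate tick instead of waiting out the 20s interval.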
|
|
||||||
|
// replenishIndexersIfNeeded checks if the indexer pool is below the configured
|
||||||
|
// minimum (or empty) and, if so, asks the native mesh for replacements.
|
||||||
|
// Called whenever a native is recovered so the indexer pool is restored.
|
||||||
|
func replenishIndexersIfNeeded(h host.Host) {
|
||||||
|
logger := oclib.GetLogger()
|
||||||
|
minIdx := conf.GetConfig().MinIndexer
|
||||||
|
if minIdx < 1 {
|
||||||
|
minIdx = 1
|
||||||
|
}
|
||||||
|
StreamMuIndexes.RLock()
|
||||||
|
indexerCount := len(StaticIndexers)
|
||||||
|
StreamMuIndexes.RUnlock()
|
||||||
|
if indexerCount < minIdx {
|
||||||
|
need := minIdx - indexerCount
|
||||||
|
logger.Info().Int("need", need).Int("current", indexerCount).Msg("[native] native recovered — replenishing indexer pool")
|
||||||
|
go replenishIndexersFromNative(h, need)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// replenishNativesFromPeers is called when the heartbeat to a native fails.
|
||||||
|
// Flow:
|
||||||
|
// 1. Ask other alive natives for one of their native contacts (ProtocolNativeGetPeers).
|
||||||
|
// 2. If none respond or return a new address, ask connected indexers (ProtocolIndexerGetNatives).
|
||||||
|
// 3. If no replacement found:
|
||||||
|
// - remaining > 1 → ignore (enough natives remain).
|
||||||
|
// - remaining ≤ 1 → start periodic retry (retryLostNative).
|
||||||
|
func replenishNativesFromPeers(h host.Host, lostAddr string, proto protocol.ID) {
|
||||||
|
if lostAddr == "" {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
logger := oclib.GetLogger()
|
||||||
|
logger.Info().Str("lost", lostAddr).Msg("[native] replenish natives — start")
|
||||||
|
|
||||||
|
// Build exclude list: the lost addr + all currently alive natives.
|
||||||
|
// lostAddr has already been removed from StaticNatives by doTick.
|
||||||
|
StreamNativeMu.RLock()
|
||||||
|
remaining := len(StaticNatives)
|
||||||
|
exclude := make([]string, 0, remaining+1)
|
||||||
|
exclude = append(exclude, lostAddr)
|
||||||
|
for addr := range StaticNatives {
|
||||||
|
exclude = append(exclude, addr)
|
||||||
|
}
|
||||||
|
StreamNativeMu.RUnlock()
|
||||||
|
|
||||||
|
logger.Info().Int("remaining", remaining).Msg("[native] replenish natives — step 1: ask alive natives for a peer")
|
||||||
|
|
||||||
|
// Step 1: ask other alive natives for a replacement.
|
||||||
|
newAddr := fetchNativeFromNatives(h, exclude)
|
||||||
|
|
||||||
|
// Step 2: fallback — ask connected indexers for their native addresses.
|
||||||
|
if newAddr == "" {
|
||||||
|
logger.Info().Msg("[native] replenish natives — step 2: ask indexers for their native addresses")
|
||||||
|
newAddr = fetchNativeFromIndexers(h, exclude)
|
||||||
|
}
|
||||||
|
|
||||||
|
if newAddr != "" {
|
||||||
|
ad, err := pp.AddrInfoFromString(newAddr)
|
||||||
|
if err == nil {
|
||||||
|
StreamNativeMu.Lock()
|
||||||
|
StaticNatives[newAddr] = ad
|
||||||
|
StreamNativeMu.Unlock()
|
||||||
|
logger.Info().Str("new", newAddr).Msg("[native] replenish natives — replacement added, nudging heartbeat")
|
||||||
|
NudgeNativeHeartbeat()
|
||||||
|
replenishIndexersIfNeeded(h)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Step 3: no replacement found.
|
||||||
|
logger.Warn().Int("remaining", remaining).Msg("[native] replenish natives — no replacement found")
|
||||||
|
if remaining > 1 {
|
||||||
|
logger.Info().Msg("[native] replenish natives — enough natives remain, ignoring loss")
|
||||||
|
return
|
||||||
|
}
|
||||||
|
// Last (or only) native — retry periodically.
|
||||||
|
logger.Info().Str("addr", lostAddr).Msg("[native] replenish natives — last native lost, starting periodic retry")
|
||||||
|
go retryLostNative(h, lostAddr, proto)
|
||||||
|
}
|
||||||
|
|
||||||
|
// fetchNativeFromNatives asks each alive native for one of its own native contacts
|
||||||
|
// not in exclude. Returns the first new address found or "" if none.
|
||||||
|
func fetchNativeFromNatives(h host.Host, exclude []string) string {
|
||||||
|
logger := oclib.GetLogger()
|
||||||
|
excludeSet := make(map[string]struct{}, len(exclude))
|
||||||
|
for _, e := range exclude {
|
||||||
|
excludeSet[e] = struct{}{}
|
||||||
|
}
|
||||||
|
|
||||||
|
StreamNativeMu.RLock()
|
||||||
|
natives := make([]*pp.AddrInfo, 0, len(StaticNatives))
|
||||||
|
for _, ad := range StaticNatives {
|
||||||
|
natives = append(natives, ad)
|
||||||
|
}
|
||||||
|
StreamNativeMu.RUnlock()
|
||||||
|
|
||||||
|
rand.Shuffle(len(natives), func(i, j int) { natives[i], natives[j] = natives[j], natives[i] })
|
||||||
|
|
||||||
|
for _, ad := range natives {
|
||||||
|
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||||
|
if err := h.Connect(ctx, *ad); err != nil {
|
||||||
|
cancel()
|
||||||
|
logger.Warn().Str("native", ad.ID.String()).Err(err).Msg("[native] fetch native peers — connect failed")
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetPeers)
|
||||||
|
cancel()
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn().Str("native", ad.ID.String()).Err(err).Msg("[native] fetch native peers — stream failed")
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
req := GetNativePeersRequest{Exclude: exclude, Count: 1}
|
||||||
|
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
|
||||||
|
s.Close()
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
var resp GetNativePeersResponse
|
||||||
|
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
|
||||||
|
s.Close()
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
s.Close()
|
||||||
|
for _, peer := range resp.Peers {
|
||||||
|
if _, excluded := excludeSet[peer]; !excluded && peer != "" {
|
||||||
|
logger.Info().Str("from", ad.ID.String()).Str("new", peer).Msg("[native] fetch native peers — got replacement")
|
||||||
|
return peer
|
||||||
|
}
|
||||||
|
}
|
||||||
|
logger.Debug().Str("native", ad.ID.String()).Msg("[native] fetch native peers — no new native from this peer")
|
||||||
|
}
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
|
||||||
|

// fetchNativeFromIndexers asks connected indexers for their configured native addresses,
// returning the first one not in exclude.
func fetchNativeFromIndexers(h host.Host, exclude []string) string {
	logger := oclib.GetLogger()
	excludeSet := make(map[string]struct{}, len(exclude))
	for _, e := range exclude {
		excludeSet[e] = struct{}{}
	}

	StreamMuIndexes.RLock()
	indexers := make([]*pp.AddrInfo, 0, len(StaticIndexers))
	for _, ad := range StaticIndexers {
		indexers = append(indexers, ad)
	}
	StreamMuIndexes.RUnlock()

	rand.Shuffle(len(indexers), func(i, j int) { indexers[i], indexers[j] = indexers[j], indexers[i] })

	for _, ad := range indexers {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		if err := h.Connect(ctx, *ad); err != nil {
			cancel()
			continue
		}
		s, err := h.NewStream(ctx, ad.ID, ProtocolIndexerGetNatives)
		cancel()
		if err != nil {
			logger.Warn().Str("indexer", ad.ID.String()).Err(err).Msg("[native] fetch indexer natives — stream failed")
			continue
		}
		req := GetIndexerNativesRequest{Exclude: exclude}
		if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
			s.Close()
			continue
		}
		var resp GetIndexerNativesResponse
		if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
			s.Close()
			continue
		}
		s.Close()
		for _, nativeAddr := range resp.Natives {
			if _, excluded := excludeSet[nativeAddr]; !excluded && nativeAddr != "" {
				logger.Info().Str("indexer", ad.ID.String()).Str("native", nativeAddr).Msg("[native] fetch indexer natives — got native")
				return nativeAddr
			}
		}
	}
	logger.Warn().Msg("[native] fetch indexer natives — no native found from indexers")
	return ""
}

// retryLostNative periodically retries connecting to a lost native address until
// it becomes reachable again or was already restored by another path.
func retryLostNative(h host.Host, addr string, nativeProto protocol.ID) {
	logger := oclib.GetLogger()
	logger.Info().Str("addr", addr).Msg("[native] retry — periodic retry for lost native started")
	t := time.NewTicker(retryNativeInterval)
	defer t.Stop()
	for range t.C {
		StreamNativeMu.RLock()
		_, alreadyRestored := StaticNatives[addr]
		StreamNativeMu.RUnlock()
		if alreadyRestored {
			logger.Info().Str("addr", addr).Msg("[native] retry — native already restored, stopping retry")
			return
		}

		ad, err := pp.AddrInfoFromString(addr)
		if err != nil {
			logger.Warn().Str("addr", addr).Msg("[native] retry — invalid addr, stopping retry")
			return
		}
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		err = h.Connect(ctx, *ad)
		cancel()
		if err != nil {
			logger.Warn().Str("addr", addr).Msg("[native] retry — still unreachable")
			continue
		}
		// Reachable again — add back to pool.
		StreamNativeMu.Lock()
		StaticNatives[addr] = ad
		StreamNativeMu.Unlock()
		logger.Info().Str("addr", addr).Msg("[native] retry — native reconnected and added back to pool")
		NudgeNativeHeartbeat()
		replenishIndexersIfNeeded(h)
		if nativeProto == ProtocolNativeGetIndexers {
			StartNativeRegistration(h, addr) // register back
		}
		return
	}
}
@@ -24,17 +24,16 @@ func ExtractIP(addr string) (net.IP, error) {
 	if err != nil {
 		return nil, err
 	}
-	ips, err := ma.ValueForProtocol(multiaddr.P_IP4) // or P_IP6
-	if err != nil {
-		return nil, err
-	}
-	host, _, err := net.SplitHostPort(ips)
-	if err != nil {
-		return nil, err
-	}
-	ip := net.ParseIP(host)
+	ipStr, err := ma.ValueForProtocol(multiaddr.P_IP4)
+	if err != nil {
+		ipStr, err = ma.ValueForProtocol(multiaddr.P_IP6)
+		if err != nil {
+			return nil, err
+		}
+	}
+	ip := net.ParseIP(ipStr)
 	if ip == nil {
-		return nil, fmt.Errorf("invalid IP: %s", host)
+		return nil, fmt.Errorf("invalid IP: %s", ipStr)
 	}
 	return ip, nil
 }
@@ -5,8 +5,9 @@ import (
 	"encoding/base64"
 	"encoding/json"
 	"errors"
-	"fmt"
+	"oc-discovery/conf"
 	"oc-discovery/daemons/node/common"
+	"strings"
 	"time"
 
 	oclib "cloud.o-forge.io/core/oc-lib"
@@ -18,17 +19,21 @@ import (
 	"github.com/libp2p/go-libp2p/core/peer"
 )
 
-type PeerRecord struct {
-	Name          string    `json:"name"`
-	DID           string    `json:"did"` // real PEER ID
-	PeerID        string    `json:"peer_id"`
-	PubKey        []byte    `json:"pub_key"`
+type PeerRecordPayload struct {
+	Name       string    `json:"name"`
+	DID        string    `json:"did"`
+	PubKey     []byte    `json:"pub_key"`
+	ExpiryDate time.Time `json:"expiry_date"`
+}
+
+type PeerRecord struct {
+	PeerRecordPayload
+	PeerID        string `json:"peer_id"`
 	APIUrl        string `json:"api_url"`
 	StreamAddress string `json:"stream_address"`
 	NATSAddress   string `json:"nats_address"`
 	WalletAddress string `json:"wallet_address"`
 	Signature     []byte `json:"signature"`
-	ExpiryDate    time.Time `json:"expiry_date"`
 }
 
 func (p *PeerRecord) Sign() error {
@@ -36,13 +41,7 @@ func (p *PeerRecord) Sign() error {
 	if err != nil {
 		return err
 	}
-	dht := PeerRecord{
-		Name:       p.Name,
-		DID:        p.DID,
-		PubKey:     p.PubKey,
-		ExpiryDate: p.ExpiryDate,
-	}
-	payload, _ := json.Marshal(dht)
+	payload, _ := json.Marshal(p.PeerRecordPayload)
 	b, err := common.Sign(priv, payload)
 	p.Signature = b
 	return err
@@ -51,19 +50,11 @@ func (p *PeerRecord) Sign() error {
 func (p *PeerRecord) Verify() (crypto.PubKey, error) {
 	pubKey, err := crypto.UnmarshalPublicKey(p.PubKey) // retrieve pub key in message
 	if err != nil {
-		fmt.Println("UnmarshalPublicKey")
 		return pubKey, err
 	}
-	dht := PeerRecord{
-		Name:       p.Name,
-		DID:        p.DID,
-		PubKey:     p.PubKey,
-		ExpiryDate: p.ExpiryDate,
-	}
-	payload, _ := json.Marshal(dht)
-
-	if ok, _ := common.Verify(pubKey, payload, p.Signature); !ok { // verify minimal message was sign per pubKey
-		fmt.Println("Verify")
+	payload, _ := json.Marshal(p.PeerRecordPayload)
+
+	if ok, _ := pubKey.Verify(payload, p.Signature); !ok { // verify minimal message was sign per pubKey
 		return pubKey, errors.New("invalid signature")
 	}
 	return pubKey, nil
@@ -114,6 +105,8 @@ func (pr *PeerRecord) ExtractPeer(ourkey string, key string, pubKey crypto.PubKe
 type GetValue struct {
 	Key    string  `json:"key"`
 	PeerID peer.ID `json:"peer_id"`
+	Name   string  `json:"name,omitempty"`
+	Search bool    `json:"search,omitempty"`
 }
 
 type GetResponse struct {
@@ -125,122 +118,233 @@ func (ix *IndexerService) genKey(did string) string {
 	return "/node/" + did
 }
 
+func (ix *IndexerService) genNameKey(name string) string {
+	return "/name/" + name
+}
+
+func (ix *IndexerService) genPIDKey(peerID string) string {
+	return "/pid/" + peerID
+}
+
 func (ix *IndexerService) initNodeHandler() {
-	ix.Host.SetStreamHandler(common.ProtocolHeartbeat, ix.HandleNodeHeartbeat)
+	logger := oclib.GetLogger()
+	logger.Info().Msg("Init Node Handler")
+	// Each heartbeat from a node carries a freshly signed PeerRecord.
+	// Republish it to the DHT so the record never expires as long as the node
+	// is alive — no separate publish stream needed from the node side.
+	ix.AfterHeartbeat = func(pid peer.ID) {
+		ctx1, cancel1 := context.WithTimeout(context.Background(), 10*time.Second)
+		defer cancel1()
+		res, err := ix.DHT.GetValue(ctx1, ix.genPIDKey(pid.String()))
+		if err != nil {
+			logger.Warn().Err(err)
+			return
+		}
+		did := string(res)
+		ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
+		defer cancel2()
+		res, err = ix.DHT.GetValue(ctx2, ix.genKey(did))
+		if err != nil {
+			logger.Warn().Err(err)
+			return
+		}
+		var rec PeerRecord
+		if err := json.Unmarshal(res, &rec); err != nil {
+			logger.Warn().Err(err).Str("peer", pid.String()).Msg("indexer: heartbeat record unmarshal failed")
+			return
+		}
+		if _, err := rec.Verify(); err != nil {
+			logger.Warn().Err(err).Str("peer", pid.String()).Msg("indexer: heartbeat record signature invalid")
+			return
+		}
+		data, err := json.Marshal(rec)
+		if err != nil {
+			return
+		}
+		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+		defer cancel()
+		logger.Info().Msg("REFRESH PutValue " + ix.genKey(rec.DID))
+		if err := ix.DHT.PutValue(ctx, ix.genKey(rec.DID), data); err != nil {
+			logger.Warn().Err(err).Str("did", rec.DID).Msg("indexer: DHT refresh failed")
+			return
+		}
+		if rec.Name != "" {
+			ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
+			ix.DHT.PutValue(ctx2, ix.genNameKey(rec.Name), []byte(rec.DID))
+			cancel2()
+		}
+		if rec.PeerID != "" {
+			ctx3, cancel3 := context.WithTimeout(context.Background(), 10*time.Second)
+			ix.DHT.PutValue(ctx3, ix.genPIDKey(rec.PeerID), []byte(rec.DID))
+			cancel3()
+		}
+	}
+	ix.Host.SetStreamHandler(common.ProtocolHeartbeat, ix.HandleHeartbeat)
 	ix.Host.SetStreamHandler(common.ProtocolPublish, ix.handleNodePublish)
 	ix.Host.SetStreamHandler(common.ProtocolGet, ix.handleNodeGet)
+	ix.Host.SetStreamHandler(common.ProtocolIndexerGetNatives, ix.handleGetNatives)
 }
 
 func (ix *IndexerService) handleNodePublish(s network.Stream) {
 	defer s.Close()
 	logger := oclib.GetLogger()
-	for {
-		var rec PeerRecord
-		if err := json.NewDecoder(s).Decode(&rec); err != nil {
-			logger.Err(err)
-			continue
-		}
-		rec2 := PeerRecord{
-			Name:   rec.Name,
-			DID:    rec.DID, // REAL PEER ID
-			PubKey: rec.PubKey,
-			PeerID: rec.PeerID,
-		}
-		if _, err := rec2.Verify(); err != nil {
-			logger.Err(err)
-			continue
-		}
-		if rec.PeerID == "" || rec.ExpiryDate.Before(time.Now().UTC()) { // already expired
-			logger.Err(errors.New(rec.PeerID + " is expired."))
-			continue
-		}
-		pid, err := peer.Decode(rec.PeerID)
-		if err != nil {
-			continue
-		}
-
-		ix.StreamMU.Lock()
-		if ix.StreamRecords[common.ProtocolHeartbeat] == nil {
-			ix.StreamRecords[common.ProtocolHeartbeat] = map[peer.ID]*common.StreamRecord[PeerRecord]{}
-		}
-		streams := ix.StreamRecords[common.ProtocolHeartbeat]
-
-		if srec, ok := streams[pid]; ok {
-			srec.DID = rec.DID
-			srec.Record = rec
-			srec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
-		} else {
-			ix.StreamMU.Unlock()
-			logger.Err(errors.New("no heartbeat"))
-			continue
-		}
-		ix.StreamMU.Unlock()
-
-		key := ix.genKey(rec.DID)
-		data, err := json.Marshal(rec)
-		if err != nil {
-			logger.Err(err)
-			continue
-		}
-		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
-		if err := ix.DHT.PutValue(ctx, key, data); err != nil {
-			logger.Err(err)
-			cancel()
-			continue
-		}
-		cancel()
-		break // response... so quit
+	var rec PeerRecord
+	if err := json.NewDecoder(s).Decode(&rec); err != nil {
+		logger.Err(err)
+		return
+	}
+	if _, err := rec.Verify(); err != nil {
+		logger.Err(err)
+		return
+	}
+	if rec.PeerID == "" || rec.ExpiryDate.Before(time.Now().UTC()) {
+		logger.Err(errors.New(rec.PeerID + " is expired."))
+		return
+	}
+	pid, err := peer.Decode(rec.PeerID)
+	if err != nil {
+		return
+	}
+
+	ix.StreamMU.Lock()
+	defer ix.StreamMU.Unlock()
+	if ix.StreamRecords[common.ProtocolHeartbeat] == nil {
+		ix.StreamRecords[common.ProtocolHeartbeat] = map[peer.ID]*common.StreamRecord[PeerRecord]{}
+	}
+	streams := ix.StreamRecords[common.ProtocolHeartbeat]
+
+	if srec, ok := streams[pid]; ok {
+		srec.DID = rec.DID
+		srec.Record = rec
+		srec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
+	}
+
+	key := ix.genKey(rec.DID)
+	data, err := json.Marshal(rec)
+	if err != nil {
+		logger.Err(err)
+		return
+	}
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	if err := ix.DHT.PutValue(ctx, key, data); err != nil {
+		logger.Err(err)
+		cancel()
+		return
+	}
+	cancel()
+	// Secondary index: /name/<name> → DID, so peers can resolve by human-readable name.
+	if rec.Name != "" {
+		ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
+		if err := ix.DHT.PutValue(ctx2, ix.genNameKey(rec.Name), []byte(rec.DID)); err != nil {
+			logger.Err(err).Str("name", rec.Name).Msg("indexer: failed to write name index")
+		}
+		cancel2()
+	}
+	// Secondary index: /pid/<peerID> → DID, so peers can resolve by libp2p PeerID.
+	if rec.PeerID != "" {
+		ctx3, cancel3 := context.WithTimeout(context.Background(), 10*time.Second)
+		if err := ix.DHT.PutValue(ctx3, ix.genPIDKey(rec.PeerID), []byte(rec.DID)); err != nil {
+			logger.Err(err).Str("pid", rec.PeerID).Msg("indexer: failed to write pid index")
+		}
+		cancel3()
 	}
 }
 
 func (ix *IndexerService) handleNodeGet(s network.Stream) {
 	defer s.Close()
 	logger := oclib.GetLogger()
-	for {
-		var req GetValue
-		if err := json.NewDecoder(s).Decode(&req); err != nil {
-			logger.Err(err)
-			continue
-		}
-		ix.StreamMU.Lock()
-		if ix.StreamRecords[common.ProtocolHeartbeat] == nil {
-			ix.StreamRecords[common.ProtocolHeartbeat] = map[peer.ID]*common.StreamRecord[PeerRecord]{}
-		}
-		resp := GetResponse{
-			Found:   false,
-			Records: map[string]PeerRecord{},
-		}
-		streams := ix.StreamRecords[common.ProtocolHeartbeat]
-
-		key := ix.genKey(req.Key)
-		// simple lookup by PeerID (or DID)
-		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
-		recBytes, err := ix.DHT.SearchValue(ctx, key)
-		if err != nil {
-			logger.Err(err).Msg("Failed to fetch PeerRecord from DHT")
-			cancel()
-		}
-		cancel()
-		for c := range recBytes {
-			var rec PeerRecord
-			if err := json.Unmarshal(c, &rec); err != nil || rec.PeerID != req.PeerID.String() {
-				continue
-			}
-			resp.Found = true
-			resp.Records[rec.PeerID] = rec
-			if srec, ok := streams[req.PeerID]; ok {
-				srec.DID = rec.DID
-				srec.Record = rec
-				srec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
-			}
-		}
-		// Not found
-		_ = json.NewEncoder(s).Encode(resp)
-		ix.StreamMU.Unlock()
-		break // response... so quit
+	var req GetValue
+	if err := json.NewDecoder(s).Decode(&req); err != nil {
+		logger.Err(err)
+		return
+	}
+
+	resp := GetResponse{Found: false, Records: map[string]PeerRecord{}}
+	keys := []string{}
+	// Name substring search — scan in-memory connected nodes first, then DHT exact match.
+	if req.Name != "" {
+		if req.Search {
+			for _, did := range ix.LookupNameIndex(strings.ToLower(req.Name)) {
+				keys = append(keys, did)
+			}
+		} else {
+			// 2. DHT exact-name lookup: covers nodes that published but aren't currently connected.
+			nameCtx, nameCancel := context.WithTimeout(context.Background(), 5*time.Second)
+			if ch, err := ix.DHT.SearchValue(nameCtx, ix.genNameKey(req.Name)); err == nil {
+				for did := range ch {
+					keys = append(keys, string(did))
+					break
+				}
+			}
+			nameCancel()
+		}
+	} else if req.PeerID != "" {
+		pidCtx, pidCancel := context.WithTimeout(context.Background(), 5*time.Second)
+		if did, err := ix.DHT.GetValue(pidCtx, ix.genPIDKey(req.PeerID.String())); err == nil {
+			keys = append(keys, string(did))
+		}
+		pidCancel()
+	} else {
+		keys = append(keys, req.Key)
+	}
+
+	// DHT record fetch by DID key (covers exact-name and PeerID paths).
+	if len(keys) > 0 {
+		for _, k := range keys {
+			ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+			c, err := ix.DHT.GetValue(ctx, ix.genKey(k))
+			cancel()
+			if err == nil {
+				var rec PeerRecord
+				if json.Unmarshal(c, &rec) == nil {
+					// Filter by PeerID only when one was explicitly specified.
+					if req.PeerID == "" || rec.PeerID == req.PeerID.String() {
+						resp.Records[rec.PeerID] = rec
+					}
+				}
+			} else if req.Name == "" && req.PeerID == "" {
+				logger.Err(err).Msg("Failed to fetch PeerRecord from DHT " + req.Key)
+			}
+		}
+	}
+
+	resp.Found = len(resp.Records) > 0
+	_ = json.NewEncoder(s).Encode(resp)
+}
+
+// handleGetNatives returns this indexer's configured native addresses,
+// excluding any in the request's Exclude list.
+func (ix *IndexerService) handleGetNatives(s network.Stream) {
+	defer s.Close()
+	logger := oclib.GetLogger()
+
+	var req common.GetIndexerNativesRequest
+	if err := json.NewDecoder(s).Decode(&req); err != nil {
+		logger.Err(err).Msg("indexer get natives: decode")
+		return
+	}
+
+	excludeSet := make(map[string]struct{}, len(req.Exclude))
+	for _, e := range req.Exclude {
+		excludeSet[e] = struct{}{}
+	}
+
+	resp := common.GetIndexerNativesResponse{}
+	for _, addr := range strings.Split(conf.GetConfig().NativeIndexerAddresses, ",") {
+		addr = strings.TrimSpace(addr)
+		if addr == "" {
+			continue
+		}
+		if _, excluded := excludeSet[addr]; !excluded {
+			resp.Natives = append(resp.Natives, addr)
+		}
+	}
+
+	if err := json.NewEncoder(s).Encode(resp); err != nil {
+		logger.Err(err).Msg("indexer get natives: encode response")
 	}
 }
168	daemons/node/indexer/nameindex.go	Normal file
@@ -0,0 +1,168 @@
package indexer

import (
	"context"
	"encoding/json"
	"strings"
	"sync"
	"time"

	"oc-discovery/daemons/node/common"

	oclib "cloud.o-forge.io/core/oc-lib"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
	pp "github.com/libp2p/go-libp2p/core/peer"
)

// TopicNameIndex is the GossipSub topic shared by regular indexers to exchange
// add/delete events for the distributed name→peerID mapping.
const TopicNameIndex = "oc-name-index"

// nameIndexDedupWindow suppresses re-emission of the same (action, name, peerID)
// tuple within this window, reducing duplicate events when a node is registered
// with multiple indexers simultaneously.
const nameIndexDedupWindow = 30 * time.Second

// NameIndexAction indicates whether a name mapping is being added or removed.
type NameIndexAction string

const (
	NameIndexAdd    NameIndexAction = "add"
	NameIndexDelete NameIndexAction = "delete"
)

// NameIndexEvent is published on TopicNameIndex by each indexer when a node
// registers (add) or is evicted by the GC (delete).
type NameIndexEvent struct {
	Action NameIndexAction `json:"action"`
	Name   string          `json:"name"`
	PeerID string          `json:"peer_id"`
	DID    string          `json:"did"`
}

// nameIndexState holds the local in-memory name index and the sender-side
// deduplication tracker.
type nameIndexState struct {
	// index: name → peerID → DID, built from events received from all indexers.
	index   map[string]map[string]string
	indexMu sync.RWMutex

	// emitted tracks the last emission time for each (action, name, peerID) key
	// to suppress duplicates within nameIndexDedupWindow.
	emitted   map[string]time.Time
	emittedMu sync.Mutex
}

// shouldEmit returns true if the (action, name, peerID) tuple has not been
// emitted within nameIndexDedupWindow, updating the tracker if so.
func (s *nameIndexState) shouldEmit(action NameIndexAction, name, peerID string) bool {
	key := string(action) + ":" + name + ":" + peerID
	s.emittedMu.Lock()
	defer s.emittedMu.Unlock()
	if t, ok := s.emitted[key]; ok && time.Since(t) < nameIndexDedupWindow {
		return false
	}
	s.emitted[key] = time.Now()
	return true
}

// onEvent applies a received NameIndexEvent to the local index.
// "add" inserts/updates the mapping; "delete" removes it.
// Operations are idempotent — duplicate events from multiple indexers are harmless.
func (s *nameIndexState) onEvent(evt NameIndexEvent) {
	if evt.Name == "" || evt.PeerID == "" {
		return
	}
	s.indexMu.Lock()
	defer s.indexMu.Unlock()
	switch evt.Action {
	case NameIndexAdd:
		if s.index[evt.Name] == nil {
			s.index[evt.Name] = map[string]string{}
		}
		s.index[evt.Name][evt.PeerID] = evt.DID
	case NameIndexDelete:
		if s.index[evt.Name] != nil {
			delete(s.index[evt.Name], evt.PeerID)
			if len(s.index[evt.Name]) == 0 {
				delete(s.index, evt.Name)
			}
		}
	}
}

// initNameIndex joins TopicNameIndex and starts consuming events.
// Must be called after ix.PS is ready.
func (ix *IndexerService) initNameIndex(ps *pubsub.PubSub) {
	logger := oclib.GetLogger()
	ix.nameIndex = &nameIndexState{
		index:   map[string]map[string]string{},
		emitted: map[string]time.Time{},
	}

	ps.RegisterTopicValidator(TopicNameIndex, func(_ context.Context, _ pp.ID, _ *pubsub.Message) bool {
		return true
	})
	topic, err := ps.Join(TopicNameIndex)
	if err != nil {
		logger.Err(err).Msg("name index: failed to join topic")
		return
	}
	ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Lock()
	ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicNameIndex] = topic
	ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Unlock()

	common.SubscribeEvents(
		ix.LongLivedStreamRecordedService.LongLivedPubSubService,
		context.Background(),
		TopicNameIndex,
		-1,
		func(_ context.Context, evt NameIndexEvent, _ string) {
			ix.nameIndex.onEvent(evt)
		},
	)
}

// publishNameEvent emits a NameIndexEvent on TopicNameIndex, subject to the
// sender-side deduplication window.
func (ix *IndexerService) publishNameEvent(action NameIndexAction, name, peerID, did string) {
	if ix.nameIndex == nil || name == "" || peerID == "" {
		return
	}
	if !ix.nameIndex.shouldEmit(action, name, peerID) {
		return
	}
	ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RLock()
	topic := ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicNameIndex]
	ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RUnlock()
	if topic == nil {
		return
	}
	evt := NameIndexEvent{Action: action, Name: name, PeerID: peerID, DID: did}
	b, err := json.Marshal(evt)
	if err != nil {
		return
	}
	_ = topic.Publish(context.Background(), b)
}

// LookupNameIndex searches the distributed name index for peers whose name
// contains needle (case-insensitive). Returns peerID → DID for matched peers.
// Returns nil if the name index is not initialised (e.g. native indexers).
func (ix *IndexerService) LookupNameIndex(needle string) map[string]string {
	if ix.nameIndex == nil {
		return nil
	}
	result := map[string]string{}
	needleLow := strings.ToLower(needle)
	ix.nameIndex.indexMu.RLock()
	defer ix.nameIndex.indexMu.RUnlock()
	for name, peers := range ix.nameIndex.index {
		if strings.Contains(strings.ToLower(name), needleLow) {
			for peerID, did := range peers {
				result[peerID] = did
			}
		}
	}
	return result
}
@@ -4,7 +4,10 @@ import (
 	"context"
 	"encoding/json"
 	"errors"
+	"fmt"
 	"math/rand"
+	"slices"
+	"strings"
 	"sync"
 	"time"
 
@@ -12,19 +15,24 @@ import (
|
|||||||
|
|
||||||
oclib "cloud.o-forge.io/core/oc-lib"
|
oclib "cloud.o-forge.io/core/oc-lib"
|
||||||
pubsub "github.com/libp2p/go-libp2p-pubsub"
|
pubsub "github.com/libp2p/go-libp2p-pubsub"
|
||||||
"github.com/libp2p/go-libp2p/core/host"
|
|
||||||
"github.com/libp2p/go-libp2p/core/network"
|
"github.com/libp2p/go-libp2p/core/network"
|
||||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||||
)
|
)
|
||||||
|
|
||||||
const (
|
const (
|
||||||
// IndexerTTL is 10% above the recommended 60s heartbeat interval.
|
// IndexerTTL is the lifetime of a live-indexer cache entry. Set to 50% above
|
||||||
IndexerTTL = 66 * time.Second
|
// the recommended 60s heartbeat interval so a single delayed renewal does not
|
||||||
|
// evict a healthy indexer from the native's cache.
|
||||||
|
IndexerTTL = 90 * time.Second
|
||||||
// offloadInterval is how often the native checks if it can release responsible peers.
|
// offloadInterval is how often the native checks if it can release responsible peers.
|
||||||
offloadInterval = 30 * time.Second
|
offloadInterval = 30 * time.Second
|
||||||
// dhtRefreshInterval is how often the background goroutine queries the DHT for
|
// dhtRefreshInterval is how often the background goroutine queries the DHT for
|
||||||
// known-but-expired indexer entries (written by neighbouring natives).
|
// known-but-expired indexer entries (written by neighbouring natives).
|
||||||
dhtRefreshInterval = 30 * time.Second
|
dhtRefreshInterval = 30 * time.Second
|
||||||
|
// maxFallbackPeers caps how many peers the native will accept in self-delegation
|
||||||
|
// mode. Beyond this limit the native refuses to act as a fallback indexer so it
|
||||||
|
// is not overwhelmed during prolonged indexer outages.
|
||||||
|
maxFallbackPeers = 50
|
||||||
)
|
)
|
||||||
|
|
||||||
// liveIndexerEntry tracks a registered indexer in the native's in-memory cache and DHT.
|
// liveIndexerEntry tracks a registered indexer in the native's in-memory cache and DHT.
|
||||||
@@ -43,7 +51,7 @@ type NativeState struct {
|
|||||||
// knownPeerIDs accumulates all indexer PeerIDs ever seen (local stream or gossip).
|
// knownPeerIDs accumulates all indexer PeerIDs ever seen (local stream or gossip).
|
||||||
// Used by refreshIndexersFromDHT to re-hydrate expired entries from the shared DHT,
|
// Used by refreshIndexersFromDHT to re-hydrate expired entries from the shared DHT,
|
||||||
// including entries written by other natives.
|
// including entries written by other natives.
|
||||||
knownPeerIDs map[string]struct{}
|
knownPeerIDs map[string]string
|
||||||
knownMu sync.RWMutex
|
knownMu sync.RWMutex
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -51,7 +59,7 @@ func newNativeState() *NativeState {
|
|||||||
return &NativeState{
|
return &NativeState{
|
||||||
liveIndexers: map[string]*liveIndexerEntry{},
|
liveIndexers: map[string]*liveIndexerEntry{},
|
||||||
responsiblePeers: map[pp.ID]struct{}{},
|
responsiblePeers: map[pp.ID]struct{}{},
|
||||||
knownPeerIDs: map[string]struct{}{},
|
knownPeerIDs: map[string]string{},
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
@@ -92,10 +100,12 @@ func (v IndexerRecordValidator) Select(_ string, values [][]byte) (int, error) {
 // Must be called after DHT is initialized.
 func (ix *IndexerService) InitNative() {
 	ix.Native = newNativeState()
-	ix.Host.SetStreamHandler(common.ProtocolIndexerHeartbeat, ix.HandleNodeHeartbeat) // specific heartbeat for Indexer.
+	ix.Host.SetStreamHandler(common.ProtocolHeartbeat, ix.HandleHeartbeat) // specific heartbeat for Indexer.
 	ix.Host.SetStreamHandler(common.ProtocolNativeSubscription, ix.handleNativeSubscription)
 	ix.Host.SetStreamHandler(common.ProtocolNativeGetIndexers, ix.handleNativeGetIndexers)
 	ix.Host.SetStreamHandler(common.ProtocolNativeConsensus, ix.handleNativeConsensus)
+	ix.Host.SetStreamHandler(common.ProtocolNativeGetPeers, ix.handleNativeGetPeers)
+	ix.Host.SetStreamHandler(common.ProtocolIndexerGetNatives, ix.handleGetNatives)
 	ix.subscribeIndexerRegistry()
 	// Ensure long connections to other configured natives (native-to-native mesh).
 	common.EnsureNativePeers(ix.Host)
@@ -107,8 +117,15 @@ func (ix *IndexerService) InitNative() {
 // registered indexer PeerIDs to one another, enabling cross-native DHT discovery.
 func (ix *IndexerService) subscribeIndexerRegistry() {
 	logger := oclib.GetLogger()
-	ix.PS.RegisterTopicValidator(common.TopicIndexerRegistry, func(_ context.Context, _ pp.ID, _ *pubsub.Message) bool {
-		return true
+	ix.PS.RegisterTopicValidator(common.TopicIndexerRegistry, func(_ context.Context, _ pp.ID, msg *pubsub.Message) bool {
+		// Reject empty or syntactically invalid multiaddrs before they reach the
+		// message loop. A compromised native could otherwise gossip arbitrary data.
+		addr := string(msg.Data)
+		if addr == "" {
+			return false
+		}
+		_, err := pp.AddrInfoFromString(addr)
+		return err == nil
 	})
 	topic, err := ix.PS.Join(common.TopicIndexerRegistry)
 	if err != nil {
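The topic validator added above rejects gossip payloads before they enter the message loop. The real code delegates parsing to libp2p's `peer.AddrInfoFromString`; the stand-alone sketch below keeps the same shape with a stdlib-only plausibility check, so `validRegistryPayload` and its string test are simplifications, not the library call.

```go
package main

import (
	"fmt"
	"strings"
)

// validRegistryPayload mirrors the validator's contract: empty data is
// rejected outright, and anything that cannot plausibly be a dialable
// multiaddr carrying a /p2p/<PeerID> component is dropped before the
// subscription loop ever sees it.
func validRegistryPayload(data []byte) bool {
	addr := string(data)
	if addr == "" {
		return false
	}
	return strings.HasPrefix(addr, "/") && strings.Contains(addr, "/p2p/")
}

func main() {
	fmt.Println(validRegistryPayload([]byte("/ip4/10.0.0.7/tcp/4001/p2p/12D3KooWExample")))
	fmt.Println(validRegistryPayload(nil))
}
```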
@@ -130,29 +147,38 @@ func (ix *IndexerService) subscribeIndexerRegistry() {
 			if err != nil {
 				return
 			}
-			peerID := string(msg.Data)
-			if peerID == "" {
+			addr := string(msg.Data)
+			if addr == "" {
 				continue
 			}
-			// A neighbouring native registered this PeerID; add to known set for DHT refresh.
-			ix.Native.knownMu.Lock()
-			ix.Native.knownPeerIDs[peerID] = struct{}{}
-			ix.Native.knownMu.Unlock()
+			if peer, err := pp.AddrInfoFromString(addr); err == nil {
+				ix.Native.knownMu.Lock()
+				ix.Native.knownPeerIDs[peer.ID.String()] = addr
+				ix.Native.knownMu.Unlock()
+			}
+			// A neighbouring native registered this PeerID; add to known set for DHT refresh.
 		}
 	}()
 }
 
-// handleNativeSubscription stores an indexer's alive registration in the DHT cache.
+// handleNativeSubscription stores an indexer's alive registration in the local cache
+// immediately, then persists it to the DHT asynchronously.
 // The stream is temporary: indexer sends one IndexerRegistration and closes.
 func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
 	defer s.Close()
 	logger := oclib.GetLogger()
+
+	logger.Info().Msg("Subscription")
+
 	var reg common.IndexerRegistration
 	if err := json.NewDecoder(s).Decode(&reg); err != nil {
 		logger.Err(err).Msg("native subscription: decode")
 		return
 	}
+	logger.Info().Msg("Subscription " + reg.Addr)
+
 	if reg.Addr == "" {
 		logger.Error().Msg("native subscription: missing addr")
 		return
@@ -166,30 +192,23 @@ func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
 		reg.PeerID = ad.ID.String()
 	}
 
-	expiry := time.Now().UTC().Add(IndexerTTL)
+	// Build entry with a fresh TTL — must happen before the cache write so the 66s
+	// window is not consumed by DHT retries.
 	entry := &liveIndexerEntry{
 		PeerID:    reg.PeerID,
 		Addr:      reg.Addr,
-		ExpiresAt: expiry,
+		ExpiresAt: time.Now().UTC().Add(IndexerTTL),
 	}
 
-	// Persist in DHT with 66s TTL.
-	key := ix.genIndexerKey(reg.PeerID)
-	if data, err := json.Marshal(entry); err == nil {
-		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
-		if err := ix.DHT.PutValue(ctx, key, data); err != nil {
-			logger.Err(err).Msg("native subscription: DHT put")
-		}
-		cancel()
-	}
-
-	// Update local cache and known set.
+	// Update local cache and known set immediately so concurrent GetIndexers calls
+	// can already see this indexer without waiting for the DHT write to complete.
 	ix.Native.liveIndexersMu.Lock()
+	_, isRenewal := ix.Native.liveIndexers[reg.PeerID]
 	ix.Native.liveIndexers[reg.PeerID] = entry
 	ix.Native.liveIndexersMu.Unlock()
 
 	ix.Native.knownMu.Lock()
-	ix.Native.knownPeerIDs[reg.PeerID] = struct{}{}
+	ix.Native.knownPeerIDs[reg.PeerID] = reg.Addr
 	ix.Native.knownMu.Unlock()
 
 	// Gossip PeerID to neighbouring natives so they discover it via DHT.
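The reordering in the hunk above (cache write first, DHT write moved off the handler) can be sketched as a generic cache-first pattern. Everything here is a hypothetical stand-in: `cache` and `register` are illustrative names, and `putDHT` stands in for the slow `ix.DHT.PutValue` call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cache holds the locally visible registrations, analogous to liveIndexers.
type cache struct {
	mu      sync.RWMutex
	entries map[string]string
}

// register makes the entry visible to concurrent readers immediately, then
// runs the slow persistence step in a goroutine off the hot path.
func (c *cache) register(peerID, addr string, putDHT func(string, string) error) {
	c.mu.Lock()
	c.entries[peerID] = addr // readers see the indexer right away
	c.mu.Unlock()
	go putDHT(peerID, addr) // persistence never delays the handler
}

func main() {
	c := &cache{entries: map[string]string{}}
	done := make(chan struct{})
	c.register("peerA", "/ip4/127.0.0.1/tcp/4001", func(string, string) error {
		time.Sleep(10 * time.Millisecond) // simulated slow DHT write
		close(done)
		return nil
	})
	c.mu.RLock()
	fmt.Println(c.entries["peerA"]) // visible before the DHT write finishes
	c.mu.RUnlock()
	<-done
}
```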
@@ -197,16 +216,46 @@ func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
 	topic := ix.LongLivedPubSubs[common.TopicIndexerRegistry]
 	ix.PubsubMu.RUnlock()
 	if topic != nil {
-		if err := topic.Publish(context.Background(), []byte(reg.PeerID)); err != nil {
+		if err := topic.Publish(context.Background(), []byte(reg.Addr)); err != nil {
 			logger.Err(err).Msg("native subscription: registry gossip publish")
 		}
 	}
 
-	logger.Info().Str("peer", reg.PeerID).Msg("native: indexer registered")
+	if isRenewal {
+		logger.Debug().Str("peer", reg.PeerID).Msg("native: indexer TTL renewed : " + fmt.Sprintf("%v", len(ix.Native.liveIndexers)))
+	} else {
+		logger.Info().Str("peer", reg.PeerID).Msg("native: indexer registered : " + fmt.Sprintf("%v", len(ix.Native.liveIndexers)))
+	}
+
+	// Persist in DHT asynchronously — retries must not block the handler or consume
+	// the local cache TTL.
+	key := ix.genIndexerKey(reg.PeerID)
+	data, err := json.Marshal(entry)
+	if err != nil {
+		logger.Err(err).Msg("native subscription: marshal entry")
+		return
+	}
+	go func() {
+		for {
+			ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+			if err := ix.DHT.PutValue(ctx, key, data); err != nil {
+				cancel()
+				logger.Err(err).Msg("native subscription: DHT put " + key)
+				if strings.Contains(err.Error(), "failed to find any peer in table") {
+					time.Sleep(10 * time.Second)
+					continue
+				}
+				return
+			}
+			cancel()
+			return
+		}
+	}()
 }
 
 // handleNativeGetIndexers returns this native's own list of reachable indexers.
-// If none are available, it self-delegates (becomes the fallback indexer for the caller).
+// Self-delegation (native acting as temporary fallback indexer) is only permitted
+// for nodes — never for peers that are themselves registered indexers in knownPeerIDs.
 // The consensus across natives is the responsibility of the requesting node/indexer.
 func (ix *IndexerService) handleNativeGetIndexers(s network.Stream) {
 	defer s.Close()
@@ -220,14 +269,20 @@ func (ix *IndexerService) handleNativeGetIndexers(s network.Stream) {
 	if req.Count <= 0 {
 		req.Count = 3
 	}
-	reachable := ix.reachableLiveIndexers()
+	callerPeerID := s.Conn().RemotePeer().String()
+	reachable := ix.reachableLiveIndexers(req.Count, callerPeerID)
 	var resp common.GetIndexersResponse
 
 	if len(reachable) == 0 {
-		// No indexers known: become temporary fallback for this caller.
-		ix.selfDelegate(s.Conn().RemotePeer(), &resp)
-		logger.Info().Str("peer", s.Conn().RemotePeer().String()).Msg("native: no indexers, acting as fallback")
+		// No live indexers reachable — try to self-delegate.
+		if ix.selfDelegate(s.Conn().RemotePeer(), &resp) {
+			logger.Info().Str("peer", callerPeerID).Msg("native: no indexers, acting as fallback for node")
+		} else {
+			// Fallback pool saturated: return empty so the caller retries another
+			// native instead of piling more load onto this one.
+			logger.Warn().Str("peer", callerPeerID).Int("pool", maxFallbackPeers).Msg(
+				"native: fallback pool saturated, refusing self-delegation")
+		}
 	} else {
 		rand.Shuffle(len(reachable), func(i, j int) { reachable[i], reachable[j] = reachable[j], reachable[i] })
 		if req.Count > len(reachable) {
@@ -255,7 +310,7 @@ func (ix *IndexerService) handleNativeConsensus(s network.Stream) {
 		return
 	}
 
-	myList := ix.reachableLiveIndexers()
+	myList := ix.reachableLiveIndexers(-1, s.Conn().RemotePeer().String())
 	mySet := make(map[string]struct{}, len(myList))
 	for _, addr := range myList {
 		mySet[addr] = struct{}{}
@@ -285,31 +340,56 @@ func (ix *IndexerService) handleNativeConsensus(s network.Stream) {
 }
 
 // selfDelegate marks the caller as a responsible peer and exposes this native's own
-// address as its temporary indexer.
-func (ix *IndexerService) selfDelegate(remotePeer pp.ID, resp *common.GetIndexersResponse) {
+// address as its temporary indexer. Returns false when the fallback pool is saturated
+// (maxFallbackPeers reached) — the caller must return an empty response so the node
+// retries later instead of pinning indefinitely to an overloaded native.
+func (ix *IndexerService) selfDelegate(remotePeer pp.ID, resp *common.GetIndexersResponse) bool {
 	ix.Native.responsibleMu.Lock()
-	ix.Native.responsiblePeers[remotePeer] = struct{}{}
-	ix.Native.responsibleMu.Unlock()
-	resp.IsSelfFallback = true
-	for _, a := range ix.Host.Addrs() {
-		resp.Indexers = []string{a.String() + "/p2p/" + ix.Host.ID().String()}
-		break
+	defer ix.Native.responsibleMu.Unlock()
+	if len(ix.Native.responsiblePeers) >= maxFallbackPeers {
+		return false
 	}
+	ix.Native.responsiblePeers[remotePeer] = struct{}{}
+	resp.IsSelfFallback = true
+	resp.Indexers = []string{ix.Host.Addrs()[len(ix.Host.Addrs())-1].String() + "/p2p/" + ix.Host.ID().String()}
+	return true
 }
 
 // reachableLiveIndexers returns the multiaddrs of non-expired, pingable indexers
 // from the local cache (kept fresh by refreshIndexersFromDHT in background).
-func (ix *IndexerService) reachableLiveIndexers() []string {
+func (ix *IndexerService) reachableLiveIndexers(count int, from ...string) []string {
 	ix.Native.liveIndexersMu.RLock()
 	now := time.Now().UTC()
 	candidates := []*liveIndexerEntry{}
 	for _, e := range ix.Native.liveIndexers {
-		if e.ExpiresAt.After(now) {
+		fmt.Println("liveIndexers", slices.Contains(from, e.PeerID), from, e.PeerID)
+		if e.ExpiresAt.After(now) && !slices.Contains(from, e.PeerID) {
 			candidates = append(candidates, e)
 		}
 	}
 	ix.Native.liveIndexersMu.RUnlock()
 
+	fmt.Println("midway...", candidates, from, ix.Native.knownPeerIDs)
+
+	if (count > 0 && len(candidates) < count) || count < 0 {
+		ix.Native.knownMu.RLock()
+		for k, v := range ix.Native.knownPeerIDs {
+			// Include peers whose liveIndexers entry is absent OR expired.
+			// A non-nil but expired entry means the peer was once known but
+			// has since timed out — PeerIsAlive below will decide if it's back.
+			fmt.Println("knownPeerIDs", slices.Contains(from, k), from, k)
+			if !slices.Contains(from, k) {
+				candidates = append(candidates, &liveIndexerEntry{
+					PeerID: k,
+					Addr:   v,
+				})
+			}
+		}
+		ix.Native.knownMu.RUnlock()
+	}
+
+	fmt.Println("midway...1", candidates)
+
 	reachable := []string{}
 	for _, e := range candidates {
 		ad, err := pp.AddrInfoFromString(e.Addr)
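The new `from ...string` parameter threads caller identities through `reachableLiveIndexers` so a peer is never handed itself (or, in the offload loop, a peer it is responsible for) as an indexer. The exclusion itself is just a variadic `slices.Contains` filter, sketched here with illustrative names:

```go
package main

import (
	"fmt"
	"slices"
)

// filterCandidates mirrors the added filter: every PeerID passed variadically
// in `from` is excluded from the candidate list. Requires Go 1.21+ for the
// standard-library slices package the hunk itself imports.
func filterCandidates(all []string, from ...string) []string {
	out := []string{}
	for _, id := range all {
		if !slices.Contains(from, id) {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	fmt.Println(filterCandidates([]string{"a", "b", "c"}, "b")) // prints "[a c]"
}
```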
@@ -371,6 +451,12 @@ func (ix *IndexerService) refreshIndexersFromDHT() {
 			ix.Native.liveIndexers[best.PeerID] = best
 			ix.Native.liveIndexersMu.Unlock()
 			logger.Info().Str("peer", best.PeerID).Msg("native: refreshed indexer from DHT")
+		} else {
+			// DHT has no fresh entry — peer is gone, prune from known set.
+			ix.Native.knownMu.Lock()
+			delete(ix.Native.knownPeerIDs, pid)
+			ix.Native.knownMu.Unlock()
+			logger.Info().Str("peer", pid).Msg("native: pruned stale peer from knownPeerIDs")
 		}
 	}
 }
@@ -387,30 +473,107 @@ func (ix *IndexerService) runOffloadLoop() {
 	defer t.Stop()
 	logger := oclib.GetLogger()
 	for range t.C {
+		fmt.Println("runOffloadLoop", ix.Native.responsiblePeers)
 		ix.Native.responsibleMu.RLock()
 		count := len(ix.Native.responsiblePeers)
 		ix.Native.responsibleMu.RUnlock()
 		if count == 0 {
 			continue
 		}
-		if len(ix.reachableLiveIndexers()) > 0 {
+		ix.Native.responsibleMu.RLock()
+		peerIDS := []string{}
+		for p := range ix.Native.responsiblePeers {
+			peerIDS = append(peerIDS, p.String())
+		}
+		fmt.Println("COUNT --> ", count, len(ix.reachableLiveIndexers(-1, peerIDS...)))
+		ix.Native.responsibleMu.RUnlock()
+		if len(ix.reachableLiveIndexers(-1, peerIDS...)) > 0 {
+			ix.Native.responsibleMu.RLock()
+			released := ix.Native.responsiblePeers
+			ix.Native.responsibleMu.RUnlock()
+
+			// Reset (not Close) heartbeat streams of released peers.
+			// Close() only half-closes the native's write direction — the peer's write
+			// direction stays open and sendHeartbeat never sees an error.
+			// Reset() abruptly terminates both directions, making the peer's next
+			// json.Encode return an error which triggers replenishIndexersFromNative.
+			ix.StreamMU.Lock()
+			if streams := ix.StreamRecords[common.ProtocolHeartbeat]; streams != nil {
+				for pid := range released {
+					if rec, ok := streams[pid]; ok {
+						if rec.HeartbeatStream != nil && rec.HeartbeatStream.Stream != nil {
+							rec.HeartbeatStream.Stream.Reset()
+						}
 						ix.Native.responsibleMu.Lock()
-			ix.Native.responsiblePeers = map[pp.ID]struct{}{}
+						delete(ix.Native.responsiblePeers, pid)
 						ix.Native.responsibleMu.Unlock()
+
+						delete(streams, pid)
+						logger.Info().Str("peer", pid.String()).Str("proto", string(common.ProtocolHeartbeat)).Msg(
+							"native: offload — stream reset, peer will reconnect to real indexer")
+					} else {
+						// No recorded heartbeat stream for this peer: either it never
+						// passed the score check (new peer, uptime=0 → score<75) or the
+						// stream was GC'd. We cannot send a Reset signal, so close the
+						// whole connection instead — this makes the peer's sendHeartbeat
+						// return an error, which triggers replenishIndexersFromNative and
+						// migrates it to a real indexer.
+						ix.Native.responsibleMu.Lock()
+						delete(ix.Native.responsiblePeers, pid)
+						ix.Native.responsibleMu.Unlock()
+						go ix.Host.Network().ClosePeer(pid)
+						logger.Info().Str("peer", pid.String()).Msg(
+							"native: offload — no heartbeat stream, closing connection so peer re-requests real indexers")
+					}
+				}
+			}
+			ix.StreamMU.Unlock()
+
 			logger.Info().Int("released", count).Msg("native: offloaded responsible peers to real indexers")
 		}
 	}
 }
+
+// handleNativeGetPeers returns a random selection of this native's known native
+// contacts, excluding any in the request's Exclude list.
+func (ix *IndexerService) handleNativeGetPeers(s network.Stream) {
+	defer s.Close()
+	logger := oclib.GetLogger()
+
+	var req common.GetNativePeersRequest
+	if err := json.NewDecoder(s).Decode(&req); err != nil {
+		logger.Err(err).Msg("native get peers: decode")
+		return
+	}
+	if req.Count <= 0 {
+		req.Count = 1
+	}
+
+	excludeSet := make(map[string]struct{}, len(req.Exclude))
+	for _, e := range req.Exclude {
+		excludeSet[e] = struct{}{}
+	}
+
+	common.StreamNativeMu.RLock()
+	candidates := make([]string, 0, len(common.StaticNatives))
+	for addr := range common.StaticNatives {
+		if _, excluded := excludeSet[addr]; !excluded {
+			candidates = append(candidates, addr)
+		}
+	}
+	common.StreamNativeMu.RUnlock()
+
+	rand.Shuffle(len(candidates), func(i, j int) { candidates[i], candidates[j] = candidates[j], candidates[i] })
+	if req.Count > len(candidates) {
+		req.Count = len(candidates)
+	}
+
+	resp := common.GetNativePeersResponse{Peers: candidates[:req.Count]}
+	if err := json.NewEncoder(s).Encode(resp); err != nil {
+		logger.Err(err).Msg("native get peers: encode response")
+	}
+}
 
 // StartNativeRegistration starts a goroutine that periodically registers this
 // indexer with all configured native indexers (every RecommendedHeartbeatInterval).
-func StartNativeRegistration(h host.Host, nativeAddressesStr string) {
-	go func() {
-		common.RegisterWithNative(h, nativeAddressesStr)
-		t := time.NewTicker(common.RecommendedHeartbeatInterval)
-		defer t.Stop()
-		for range t.C {
-			common.RegisterWithNative(h, nativeAddressesStr)
-		}
-	}()
-}
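The offload comments above hinge on the difference between half-closing a stream and hard-resetting it. A stdlib TCP analogue of the `Close()` half of that argument, under the assumption that `CloseWrite` stands in for libp2p's polite `Stream.Close` (there is no direct stdlib equivalent of `Stream.Reset` shown here), with `halfCloseDemo` as an illustrative name:

```go
package main

import (
	"fmt"
	"net"
)

// halfCloseDemo shows that after one side half-closes its write direction,
// the other side can still write without error — which is exactly why a
// polite close would leave the node's sendHeartbeat unaware that it was
// released, and a hard reset is needed instead.
func halfCloseDemo() (peerWriteOK bool, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()
	go func() {
		c, _ := ln.Accept()
		c.(*net.TCPConn).CloseWrite() // "native" half-closes its write side only
	}()
	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, err
	}
	defer client.Close()
	_, werr := client.Write([]byte("heartbeat")) // "node" write side unaffected
	return werr == nil, nil
}

func main() {
	ok, err := halfCloseDemo()
	fmt.Println(ok, err)
}
```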
@@ -11,6 +11,7 @@ import (
 	pubsub "github.com/libp2p/go-libp2p-pubsub"
 	record "github.com/libp2p/go-libp2p-record"
 	"github.com/libp2p/go-libp2p/core/host"
+	pp "github.com/libp2p/go-libp2p/core/peer"
 )
 
 // IndexerService manages the indexer node's state: stream records, DHT, pubsub.
@@ -22,6 +23,7 @@ type IndexerService struct {
 	mu        sync.RWMutex
 	IsNative  bool
 	Native    *NativeState // non-nil when IsNative == true
+	nameIndex *nameIndexState
 }
 
 // NewIndexerService creates an IndexerService.
@@ -43,22 +45,34 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
 	}
 	ix.PS = ps
 
-	if ix.isStrictIndexer {
+	if ix.isStrictIndexer && !isNative {
 		logger.Info().Msg("connect to indexers as strict indexer...")
-		common.ConnectToIndexers(h, 0, 5, ix.Host.ID())
+		common.ConnectToIndexers(h, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer, ix.Host.ID())
 		logger.Info().Msg("subscribe to decentralized search flow as strict indexer...")
-		ix.SubscribeToSearch(ix.PS, nil)
+		go ix.SubscribeToSearch(ix.PS, nil)
+	}
+
+	if !isNative {
+		logger.Info().Msg("init distributed name index...")
+		ix.initNameIndex(ps)
+		ix.LongLivedStreamRecordedService.AfterDelete = func(pid pp.ID, name, did string) {
+			ix.publishNameEvent(NameIndexDelete, name, pid.String(), did)
+		}
 	}
 
 	if ix.DHT, err = dht.New(
 		context.Background(),
 		ix.Host,
 		dht.Mode(dht.ModeServer),
+		dht.ProtocolPrefix("oc"), // 🔥 private network
 		dht.Validator(record.NamespacedValidator{
 			"node":    PeerRecordValidator{},
 			"indexer": IndexerRecordValidator{}, // for native indexer registry
+			"name":    DefaultValidator{},
+			"pid":     DefaultValidator{},
 		}),
 	); err != nil {
+		logger.Info().Msg(err.Error())
 		return nil
 	}
 
@@ -67,11 +81,10 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
 		ix.InitNative()
 	} else {
 		ix.initNodeHandler()
-	}
 
 	// Register with configured natives so this indexer appears in their cache
 	if nativeAddrs := conf.GetConfig().NativeIndexerAddresses; nativeAddrs != "" {
-		StartNativeRegistration(ix.Host, nativeAddrs)
+		common.StartNativeRegistration(ix.Host, nativeAddrs)
+		}
 	}
 	return ix
 }
@@ -79,6 +92,9 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
 func (ix *IndexerService) Close() {
 	ix.DHT.Close()
 	ix.PS.UnregisterTopicValidator(common.TopicPubSubSearch)
+	if ix.nameIndex != nil {
+		ix.PS.UnregisterTopicValidator(TopicNameIndex)
+	}
 	for _, s := range ix.StreamRecords {
 		for _, ss := range s {
 			ss.HeartbeatStream.Stream.Close()
@@ -6,6 +6,16 @@ import (
 	"time"
 )
 
+type DefaultValidator struct{}
+
+func (v DefaultValidator) Validate(key string, value []byte) error {
+	return nil
+}
+
+func (v DefaultValidator) Select(key string, values [][]byte) (int, error) {
+	return 0, nil
+}
+
 type PeerRecordValidator struct{}
 
 func (v PeerRecordValidator) Validate(key string, value []byte) error {
@@ -26,14 +36,7 @@ func (v PeerRecordValidator) Validate(key string, value []byte) error {
 	}
 
 	// Signature verification
-	rec2 := PeerRecord{
-		Name:   rec.Name,
-		DID:    rec.DID,
-		PubKey: rec.PubKey,
-		PeerID: rec.PeerID,
-	}
-
-	if _, err := rec2.Verify(); err != nil {
+	if _, err := rec.Verify(); err != nil {
 		return errors.New("invalid signature")
 	}
 
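The new `DefaultValidator` satisfies the two-method contract that go-libp2p-record's `Validator` interface expects (accept every record, always select the first value), which is appropriate only for low-stakes namespaces like `name` and `pid`. A stdlib-only sketch of that contract, with the local `validator` interface and `acceptAll` type as illustrative stand-ins for the library interface and `DefaultValidator`:

```go
package main

import "fmt"

// validator restates the shape of the go-libp2p-record Validator interface.
type validator interface {
	Validate(key string, value []byte) error
	Select(key string, values [][]byte) (int, error)
}

// acceptAll is a no-op validator: everything validates, and among competing
// DHT records the first value is always chosen.
type acceptAll struct{}

func (acceptAll) Validate(string, []byte) error        { return nil }
func (acceptAll) Select(string, [][]byte) (int, error) { return 0, nil }

func main() {
	var v validator = acceptAll{}
	idx, _ := v.Select("/name/foo", [][]byte{[]byte("a"), []byte("b")})
	fmt.Println(v.Validate("/name/foo", nil) == nil, idx) // prints "true 0"
}
```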
@@ -96,6 +96,7 @@ func ListenNATS(n *Node) {
 
 		},
 		tools.PROPALGATION_EVENT: func(resp tools.NATSResponse) {
+			fmt.Println("PROPALGATION")
 			if resp.FromApp == config.GetAppName() {
 				return
 			}
@@ -106,10 +107,10 @@ func ListenNATS(n *Node) {
 				dtt := tools.DataType(propalgation.DataType)
 				dt = &dtt
 			}
+			fmt.Println("PROPALGATION ACT", propalgation.Action, propalgation.Action == tools.PB_CREATE, err)
 			if err == nil {
 				switch propalgation.Action {
-				case tools.PB_ADMIRALTY_CONFIG:
-				case tools.PB_MINIO_CONFIG:
+				case tools.PB_ADMIRALTY_CONFIG, tools.PB_MINIO_CONFIG:
 					var m configPayload
 					var proto protocol.ID = stream.ProtocolAdmiraltyConfigResource
 					if propalgation.Action == tools.PB_MINIO_CONFIG {
@@ -122,20 +123,17 @@ func ListenNATS(n *Node) {
 							p.PeerID, proto, resp.Payload)
 					}
 				}
-			case tools.PB_CREATE:
-			case tools.PB_UPDATE:
-			case tools.PB_DELETE:
-				n.StreamService.ToPartnerPublishEvent(
+			case tools.PB_CREATE, tools.PB_UPDATE, tools.PB_DELETE:
+				fmt.Println(propalgation.Action, dt, resp.User, propalgation.Payload)
+				fmt.Println(n.StreamService.ToPartnerPublishEvent(
 					context.Background(),
 					propalgation.Action,
 					dt, resp.User,
 					propalgation.Payload,
-				)
+				))
 			case tools.PB_CONSIDERS:
 				switch resp.Datatype {
-				case tools.BOOKING:
-				case tools.PURCHASE_RESOURCE:
-				case tools.WORKFLOW_EXECUTION:
+				case tools.BOOKING, tools.PURCHASE_RESOURCE, tools.WORKFLOW_EXECUTION:
 					var m executionConsidersPayload
 					if err := json.Unmarshal(resp.Payload, &m); err == nil {
 						for _, p := range m.PeerIDs {
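The case-arm merges in the hunks above fix a real Go pitfall: each `case` has its own body and there is no implicit fallthrough, so stacking `case tools.PB_CREATE:` and `case tools.PB_UPDATE:` as empty arms above `case tools.PB_DELETE:` meant only PB_DELETE ever ran the body. Grouping the values in one arm runs it for all three. A minimal stand-alone demonstration with illustrative action names:

```go
package main

import "fmt"

// handle shows the corrected shape: one case arm listing several values runs
// its body for every listed value, unlike a stack of empty case arms.
func handle(action string) string {
	switch action {
	case "create", "update", "delete":
		return "published"
	default:
		return "ignored"
	}
}

func main() {
	fmt.Println(handle("create"), handle("update"), handle("other")) // prints "published published ignored"
}
```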
@@ -2,10 +2,10 @@ package node
 
 import (
 	"context"
-	"crypto/sha256"
 	"encoding/json"
 	"errors"
 	"fmt"
+	"maps"
 	"oc-discovery/conf"
 	"oc-discovery/daemons/node/common"
 	"oc-discovery/daemons/node/indexer"
@@ -15,6 +15,7 @@ import (
 	"time"
 
 	oclib "cloud.o-forge.io/core/oc-lib"
+	"cloud.o-forge.io/core/oc-lib/dbs"
 	"cloud.o-forge.io/core/oc-lib/models/peer"
 	"cloud.o-forge.io/core/oc-lib/tools"
 	"github.com/google/uuid"
@@ -33,6 +34,7 @@ type Node struct {
 	StreamService *stream.StreamService
 	PeerID        pp.ID
 	isIndexer     bool
+	peerRecord    *indexer.PeerRecord
 
 	Mu sync.RWMutex
 }
@@ -69,6 +71,9 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
 		isIndexer:                      isIndexer,
 		LongLivedStreamRecordedService: common.NewStreamRecordedService[interface{}](h, 1000),
 	}
+	// Register the bandwidth probe handler so any peer measuring this node's
+	// throughput can open a dedicated probe stream and read the echo.
+	h.SetStreamHandler(common.ProtocolBandwidthProbe, common.HandleBandwidthProbe)
 	var ps *pubsubs.PubSub
 	if isNode {
 		logger.Info().Msg("generate opencloud node...")
@@ -77,8 +82,30 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
 			panic(err) // can't run your node without a propalgation pubsub, of state of node.
 		}
 		node.PS = ps
+		// buildRecord returns a fresh signed PeerRecord as JSON, embedded in each
+		// heartbeat so the receiving indexer can republish it to the DHT directly.
+		// peerRecord is nil until claimInfo runs, so the first ~20s heartbeats carry
+		// no record — that's fine, claimInfo publishes once synchronously at startup.
+		buildRecord := func() json.RawMessage {
+			if node.peerRecord == nil {
+				return nil
+			}
+			priv, err := tools.LoadKeyFromFilePrivate()
+			if err != nil {
+				return nil
+			}
+			fresh := *node.peerRecord
+			fresh.PeerRecordPayload.ExpiryDate = time.Now().UTC().Add(2 * time.Minute)
+			payload, _ := json.Marshal(fresh.PeerRecordPayload)
+			fresh.Signature, err = priv.Sign(payload)
+			if err != nil {
+				return nil
+			}
+			b, _ := json.Marshal(fresh)
+			return json.RawMessage(b)
+		}
 		logger.Info().Msg("connect to indexers...")
-		common.ConnectToIndexers(node.Host, 0, 5, node.PeerID) // TODO : make var to change how many indexers are allowed.
+		common.ConnectToIndexers(node.Host, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer, node.PeerID, buildRecord)
 		logger.Info().Msg("claims my node...")
 		if _, err := node.claimInfo(conf.GetConfig().Name, conf.GetConfig().Hostname); err != nil {
|
if _, err := node.claimInfo(conf.GetConfig().Name, conf.GetConfig().Hostname); err != nil {
|
||||||
panic(err)
|
panic(err)
|
||||||
@@ -100,14 +127,14 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
node.SubscribeToSearch(node.PS, &f)
|
node.SubscribeToSearch(node.PS, &f)
|
||||||
|
logger.Info().Msg("connect to NATS")
|
||||||
|
go ListenNATS(node)
|
||||||
|
logger.Info().Msg("Node is actually running.")
|
||||||
}
|
}
|
||||||
if isIndexer {
|
if isIndexer {
|
||||||
logger.Info().Msg("generate opencloud indexer...")
|
logger.Info().Msg("generate opencloud indexer...")
|
||||||
node.IndexerService = indexer.NewIndexerService(node.Host, ps, 5, isNativeIndexer)
|
node.IndexerService = indexer.NewIndexerService(node.Host, ps, 500, isNativeIndexer)
|
||||||
}
|
}
|
||||||
logger.Info().Msg("connect to NATS")
|
|
||||||
ListenNATS(node)
|
|
||||||
logger.Info().Msg("Node is actually running.")
|
|
||||||
return node, nil
|
return node, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -127,24 +154,29 @@ func (d *Node) publishPeerRecord(
|
|||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
common.StreamMuIndexes.RLock()
|
||||||
|
indexerSnapshot := make([]*pp.AddrInfo, 0, len(common.StaticIndexers))
|
||||||
for _, ad := range common.StaticIndexers {
|
for _, ad := range common.StaticIndexers {
|
||||||
|
indexerSnapshot = append(indexerSnapshot, ad)
|
||||||
|
}
|
||||||
|
common.StreamMuIndexes.RUnlock()
|
||||||
|
|
||||||
|
for _, ad := range indexerSnapshot {
|
||||||
var err error
|
var err error
|
||||||
if common.StreamIndexers, err = common.TempStream(d.Host, *ad, common.ProtocolPublish, "", common.StreamIndexers, map[protocol.ID]*common.ProtocolInfo{},
|
if common.StreamIndexers, err = common.TempStream(d.Host, *ad, common.ProtocolPublish, "", common.StreamIndexers, map[protocol.ID]*common.ProtocolInfo{},
|
||||||
&common.StreamMuIndexes); err != nil {
|
&common.StreamMuIndexes); err != nil {
|
||||||
continue
|
continue
|
||||||
}
|
}
|
||||||
stream := common.StreamIndexers[common.ProtocolPublish][ad.ID]
|
stream := common.StreamIndexers[common.ProtocolPublish][ad.ID]
|
||||||
base := indexer.PeerRecord{
|
base := indexer.PeerRecordPayload{
|
||||||
Name: rec.Name,
|
Name: rec.Name,
|
||||||
DID: rec.DID,
|
DID: rec.DID,
|
||||||
PubKey: rec.PubKey,
|
PubKey: rec.PubKey,
|
||||||
ExpiryDate: time.Now().UTC().Add(2 * time.Minute),
|
ExpiryDate: time.Now().UTC().Add(2 * time.Minute),
|
||||||
}
|
}
|
||||||
payload, _ := json.Marshal(base)
|
payload, _ := json.Marshal(base)
|
||||||
hash := sha256.Sum256(payload)
|
rec.PeerRecordPayload = base
|
||||||
|
rec.Signature, err = priv.Sign(payload)
|
||||||
rec.ExpiryDate = base.ExpiryDate
|
|
||||||
rec.Signature, err = priv.Sign(hash[:])
|
|
||||||
if err := json.NewEncoder(stream.Stream).Encode(&rec); err != nil { // then publish on stream
|
if err := json.NewEncoder(stream.Stream).Encode(&rec); err != nil { // then publish on stream
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -156,38 +188,50 @@ func (d *Node) GetPeerRecord(
 	ctx context.Context,
 	pidOrdid string,
 ) ([]*peer.Peer, error) {
-	did := pidOrdid // if known pidOrdid is did
-	pid := pidOrdid // if not known pidOrdid is pid
-	access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
-	if data := access.Search(nil, did, true); len(data.Data) > 0 {
-		did = data.Data[0].GetID()
-		pid = data.Data[0].(*peer.Peer).PeerID
-	}
 	var err error
 	var info map[string]indexer.PeerRecord
+	common.StreamMuIndexes.RLock()
+	indexerSnapshot2 := make([]*pp.AddrInfo, 0, len(common.StaticIndexers))
 	for _, ad := range common.StaticIndexers {
+		indexerSnapshot2 = append(indexerSnapshot2, ad)
+	}
+	common.StreamMuIndexes.RUnlock()
+
+	// Build the GetValue request: if pidOrdid is neither a UUID DID nor a libp2p
+	// PeerID, treat it as a human-readable name and let the indexer resolve it.
+	getReq := indexer.GetValue{Key: pidOrdid}
+	isNameSearch := false
+	if pidR, pidErr := pp.Decode(pidOrdid); pidErr == nil {
+		getReq.PeerID = pidR
+	} else if _, uuidErr := uuid.Parse(pidOrdid); uuidErr != nil {
+		// Not a UUID DID → treat pidOrdid as a name substring search.
+		getReq.Name = pidOrdid
+		getReq.Key = ""
+		isNameSearch = true
+	}
+
+	for _, ad := range indexerSnapshot2 {
 		if common.StreamIndexers, err = common.TempStream(d.Host, *ad, common.ProtocolGet, "",
 			common.StreamIndexers, map[protocol.ID]*common.ProtocolInfo{}, &common.StreamMuIndexes); err != nil {
 			continue
 		}
-		pidR, err := pp.Decode(pid)
-		if err != nil {
+		stream := common.StreamIndexers[common.ProtocolGet][ad.ID]
+		if err := json.NewEncoder(stream.Stream).Encode(getReq); err != nil {
 			continue
 		}
-		stream := common.StreamIndexers[common.ProtocolGet][ad.ID]
-		if err := json.NewEncoder(stream.Stream).Encode(indexer.GetValue{
-			Key:    did,
-			PeerID: pidR,
-		}); err != nil {
-			return nil, err
-		}
-		for {
 		var resp indexer.GetResponse
 		if err := json.NewDecoder(stream.Stream).Decode(&resp); err != nil {
-			return nil, err
+			continue
 		}
 		if resp.Found {
+			if info == nil {
 				info = resp.Records
+			} else {
+				// Aggregate results from all indexers for name searches.
+				maps.Copy(info, resp.Records)
+			}
+			// For exact lookups (PeerID / DID) stop at the first hit.
+			if !isNameSearch {
 				break
 			}
 		}
@@ -196,7 +240,7 @@ func (d *Node) GetPeerRecord(
 	for _, pr := range info {
 		if pk, err := pr.Verify(); err != nil {
 			return nil, err
-		} else if ok, p, err := pr.ExtractPeer(d.PeerID.String(), did, pk); err != nil {
+		} else if ok, p, err := pr.ExtractPeer(d.PeerID.String(), pr.PeerID, pk); err != nil {
 			return nil, err
 		} else {
 			if ok {

@@ -218,7 +262,11 @@ func (d *Node) claimInfo(
 	}
 	did := uuid.New().String()

-	peers := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil).Search(nil, fmt.Sprintf("%v", peer.SELF), false)
+	peers := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil).Search(&dbs.Filters{
+		And: map[string][]dbs.Filter{ // search by name if no filters are provided
+			"peer_id": {{Operator: dbs.EQUAL.String(), Value: d.Host.ID().String()}},
+		},
+	}, "", false)
 	if len(peers.Data) > 0 {
 		did = peers.Data[0].GetID() // if already existing set up did as made
 	}

@@ -238,39 +286,38 @@ func (d *Node) claimInfo(
 	now := time.Now().UTC()
 	expiry := now.Add(150 * time.Second)

-	rec := &indexer.PeerRecord{
+	pRec := indexer.PeerRecordPayload{
 		Name:       name,
 		DID:        did, // REAL PEER ID
 		PubKey:     pubBytes,
+		ExpiryDate: expiry,
 	}

-	rec.PeerID = d.Host.ID().String()
 	d.PeerID = d.Host.ID()
+	payload, _ := json.Marshal(pRec)

-	payload, _ := json.Marshal(rec)
-	hash := sha256.Sum256(payload)
-	rec.Signature, err = priv.Sign(hash[:])
+	rec := &indexer.PeerRecord{
+		PeerRecordPayload: pRec,
+	}
+	rec.Signature, err = priv.Sign(payload)
 	if err != nil {
 		return nil, err
 	}
+	rec.PeerID = d.Host.ID().String()
 	rec.APIUrl = endPoint
 	rec.StreamAddress = "/ip4/" + conf.GetConfig().Hostname + "/tcp/" + fmt.Sprintf("%v", conf.GetConfig().NodeEndpointPort) + "/p2p/" + rec.PeerID
 	rec.NATSAddress = oclib.GetConfig().NATSUrl
 	rec.WalletAddress = "my-wallet"
-	rec.ExpiryDate = expiry

 	if err := d.publishPeerRecord(rec); err != nil {
 		return nil, err
 	}
-	/*if pk, err := rec.Verify(); err != nil {
-		fmt.Println("Verify")
+	d.peerRecord = rec
+	if _, err := rec.Verify(); err != nil {
 		return nil, err
-	} else {*/
+	} else {
 		_, p, err := rec.ExtractPeer(did, did, pub)
 		return p, err
-	//}
+	}
 }

 /*

@@ -4,47 +4,56 @@ import (
 	"context"
 	"encoding/json"
 	"errors"
+	"oc-discovery/daemons/node/stream"
 	"oc-discovery/models"

-	oclib "cloud.o-forge.io/core/oc-lib"
+	"cloud.o-forge.io/core/oc-lib/dbs"
+	"cloud.o-forge.io/core/oc-lib/models/peer"
 	"cloud.o-forge.io/core/oc-lib/tools"
 )

 func (ps *PubSubService) SearchPublishEvent(
 	ctx context.Context, dt *tools.DataType, typ string, user string, search string) error {
+	b, err := json.Marshal(map[string]string{"search": search})
+	if err != nil {
+		return err
+	}
 	switch typ {
 	case "known": // define Search Strategy
-		return ps.StreamService.SearchKnownPublishEvent(dt, user, search) //if partners focus only them*/
+		return ps.StreamService.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
+			And: map[string][]dbs.Filter{
+				"": {{Operator: dbs.NOT.String(), Value: dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
+					And: map[string][]dbs.Filter{
+						"relation": {{Operator: dbs.EQUAL.String(), Value: peer.BLACKLIST}},
+					},
+				}}},
+			},
+		}, b, stream.ProtocolSearchResource) //if partners focus only them*/
 	case "partner": // define Search Strategy
-		return ps.StreamService.SearchPartnersPublishEvent(dt, user, search) //if partners focus only them*/
+		return ps.StreamService.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
+			And: map[string][]dbs.Filter{
+				"relation": {{Operator: dbs.EQUAL.String(), Value: peer.PARTNER}},
+			},
+		}, b, stream.ProtocolSearchResource)
 	case "all": // Gossip PubSub
 		b, err := json.Marshal(map[string]string{"search": search})
 		if err != nil {
 			return err
 		}
-		return ps.searchPublishEvent(ctx, dt, user, b)
+		return ps.publishEvent(ctx, dt, tools.PB_SEARCH, user, b)
 	default:
 		return errors.New("no type of research found")
 	}
 }

-func (ps *PubSubService) searchPublishEvent(
-	ctx context.Context, dt *tools.DataType, user string, payload []byte) error {
-	return ps.publishEvent(ctx, dt, tools.PB_SEARCH, user, payload)
-}
-
 func (ps *PubSubService) publishEvent(
 	ctx context.Context, dt *tools.DataType, action tools.PubSubAction, user string, payload []byte,
 ) error {
-	from, err := oclib.GenerateNodeID()
-	if err != nil {
-		return err
-	}
 	priv, err := tools.LoadKeyFromFilePrivate()
 	if err != nil {
 		return err
 	}
-	msg, _ := json.Marshal(models.NewEvent(action.String(), from, dt, user, payload, priv))
+	msg, _ := json.Marshal(models.NewEvent(action.String(), ps.Host.ID().String(), dt, user, payload, priv))
 	topic, err := ps.PS.Join(action.String())
 	if err != nil {
 		return err

@@ -5,6 +5,7 @@ import (
 	"crypto/subtle"
 	"encoding/json"
 	"errors"
+	"fmt"
 	"oc-discovery/daemons/node/common"

 	oclib "cloud.o-forge.io/core/oc-lib"

@@ -19,6 +20,7 @@ type Verify struct {
 }

 func (ps *StreamService) handleEvent(protocol string, evt *common.Event) error {
+	fmt.Println("handleEvent")
 	ps.handleEventFromPartner(evt, protocol)
 	/*if protocol == ProtocolVerifyResource {
 		if evt.DataType == -1 {

@@ -148,14 +150,6 @@ func (abs *StreamService) pass(event *common.Event, action tools.PubSubAction) e
 }

 func (ps *StreamService) handleEventFromPartner(evt *common.Event, protocol string) error {
-	resource, err := resources.ToResource(int(evt.DataType), evt.Payload)
-	if err != nil {
-		return err
-	}
-	b, err := json.Marshal(resource)
-	if err != nil {
-		return err
-	}
 	switch protocol {
 	case ProtocolSearchResource:
 		if evt.DataType < 0 {

@@ -169,20 +163,20 @@ func (ps *StreamService) handleEventFromPartner(evt *common.Event, protocol stri
 				ps.SendResponse(p[0], evt)
 			}
 		}
-	case ProtocolCreateResource:
-	case ProtocolUpdateResource:
+	case ProtocolCreateResource, ProtocolUpdateResource:
+		fmt.Println("RECEIVED Protocol.Update")
 		go tools.NewNATSCaller().SetNATSPub(tools.CREATE_RESOURCE, tools.NATSResponse{
 			FromApp:  "oc-discovery",
 			Datatype: tools.DataType(evt.DataType),
 			Method:   int(tools.CREATE_RESOURCE),
-			Payload:  b,
+			Payload:  evt.Payload,
 		})
 	case ProtocolDeleteResource:
 		go tools.NewNATSCaller().SetNATSPub(tools.REMOVE_RESOURCE, tools.NATSResponse{
 			FromApp:  "oc-discovery",
 			Datatype: tools.DataType(evt.DataType),
 			Method:   int(tools.REMOVE_RESOURCE),
-			Payload:  b,
+			Payload:  evt.Payload,
 		})
 	default:
 		return errors.New("no action authorized available : " + protocol)

@@ -213,9 +207,9 @@ func (abs *StreamService) SendResponse(p *peer.Peer, event *common.Event) error
 	if j, err := json.Marshal(ss); err == nil {
 		if event.DataType != -1 {
 			ndt := tools.DataType(dt.EnumIndex())
-			abs.PublishResources(&ndt, event.User, peerID, j)
+			abs.PublishCommon(&ndt, event.User, peerID, ProtocolSearchResource, j)
 		} else {
-			abs.PublishResources(nil, event.User, peerID, j)
+			abs.PublishCommon(nil, event.User, peerID, ProtocolSearchResource, j)
 		}
 	}
 }

@@ -15,81 +15,45 @@ import (
 	"github.com/libp2p/go-libp2p/core/protocol"
 )

-func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, toPeerID string, proto protocol.ID, resource []byte) (*common.Stream, error) {
+func (ps *StreamService) PublishesCommon(dt *tools.DataType, user string, filter *dbs.Filters, resource []byte, protos ...protocol.ID) error {
 	access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
-	p := access.LoadOne(toPeerID)
-	if p.Err != "" {
-		return nil, errors.New(p.Err)
-	} else {
-		ad, err := pp.AddrInfoFromString(p.Data.(*peer.Peer).StreamAddress)
+	p := access.Search(filter, "", false)
+	for _, pes := range p.Data {
+		for _, proto := range protos {
+			if _, err := ps.PublishCommon(dt, user, pes.(*peer.Peer).PeerID, proto, resource); err != nil {
+				return err
+			}
+		}
+	}
+	return nil
+}
+
+func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, toPeerID string, proto protocol.ID, resource []byte) (*common.Stream, error) {
+	fmt.Println("PublishCommon")
+	if toPeerID == ps.Key.String() {
+		return nil, errors.New("Can't send to ourself !")
+	}
+
+	access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
+	p := access.Search(&dbs.Filters{
+		And: map[string][]dbs.Filter{ // search by name if no filters are provided
+			"peer_id": {{Operator: dbs.EQUAL.String(), Value: toPeerID}},
+		},
+	}, toPeerID, false)
+	var pe *peer.Peer
+	if len(p.Data) > 0 && p.Data[0].(*peer.Peer).Relation != peer.BLACKLIST {
+		pe = p.Data[0].(*peer.Peer)
+	} else if pps, err := ps.Node.GetPeerRecord(context.Background(), toPeerID); err == nil && len(pps) > 0 {
+		pe = pps[0]
+	}
+	if pe != nil {
+		ad, err := pp.AddrInfoFromString(p.Data[0].(*peer.Peer).StreamAddress)
 		if err != nil {
 			return nil, err
 		}
 		return ps.write(toPeerID, ad, dt, user, resource, proto)
 	}
-}
+	return nil, errors.New("peer unvalid " + toPeerID)

-func (ps *StreamService) PublishResources(dt *tools.DataType, user string, toPeerID string, resource []byte) error {
-	access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
-	p := access.LoadOne(toPeerID)
-	if p.Err != "" {
-		return errors.New(p.Err)
-	} else {
-		ad, err := pp.AddrInfoFromString(p.Data.(*peer.Peer).StreamAddress)
-		if err != nil {
-			return err
-		}
-		ps.write(toPeerID, ad, dt, user, resource, ProtocolSearchResource)
-	}
-	return nil
-}
-
-func (ps *StreamService) SearchKnownPublishEvent(dt *tools.DataType, user string, search string) error {
-	access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
-	peers := access.Search(&dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
-		And: map[string][]dbs.Filter{
-			"": {{Operator: dbs.NOT.String(), Value: dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
-				And: map[string][]dbs.Filter{
-					"relation": {{Operator: dbs.EQUAL.String(), Value: peer.BLACKLIST}},
-				},
-			}}},
-		},
-	}, search, false)
-	if peers.Err != "" {
-		return errors.New(peers.Err)
-	} else {
-		b, err := json.Marshal(map[string]string{"search": search})
-		if err != nil {
-			return err
-		}
-		for _, p := range peers.Data {
-			ad, err := pp.AddrInfoFromString(p.(*peer.Peer).StreamAddress)
-			if err != nil {
-				continue
-			}
-			ps.write(p.GetID(), ad, dt, user, b, ProtocolSearchResource)
-		}
-	}
-	return nil
-}
-
-func (ps *StreamService) SearchPartnersPublishEvent(dt *tools.DataType, user string, search string) error {
-	if peers, err := ps.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex())); err != nil {
-		return err
-	} else {
-		b, err := json.Marshal(map[string]string{"search": search})
-		if err != nil {
-			return err
-		}
-		for _, p := range peers {
-			ad, err := pp.AddrInfoFromString(p.StreamAddress)
-			if err != nil {
-				continue
-			}
-			ps.write(p.GetID(), ad, dt, user, b, ProtocolSearchResource)
-		}
-	}
-	return nil
 }

 func (ps *StreamService) ToPartnerPublishEvent(

@@ -103,12 +67,23 @@ func (ps *StreamService) ToPartnerPublishEvent(
 	if err != nil {
 		return err
 	}
-	ps.Mu.Lock()
-	defer ps.Mu.Unlock()
+	if pe, err := oclib.GetMySelf(); err != nil {
+		return err
+	} else if pe.GetID() == p.GetID() {
+		return fmt.Errorf("can't send to ourself")
+	} else {
+		pe.Relation = p.Relation
+		pe.Verify = false
+		if b2, err := json.Marshal(pe); err == nil {
+			if _, err := ps.PublishCommon(dt, user, p.PeerID, ProtocolUpdateResource, b2); err != nil {
+				return err
+			}
 	if p.Relation == peer.PARTNER {
 		if ps.Streams[ProtocolHeartbeatPartner] == nil {
 			ps.Streams[ProtocolHeartbeatPartner] = map[pp.ID]*common.Stream{}
 		}
+		fmt.Println("SHOULD CONNECT")
 		ps.ConnectToPartner(p.StreamAddress)
 	} else if ps.Streams[ProtocolHeartbeatPartner] != nil && ps.Streams[ProtocolHeartbeatPartner][pid] != nil {
 		for _, pids := range ps.Streams {

@@ -117,21 +92,19 @@ func (ps *StreamService) ToPartnerPublishEvent(
 			}
 		}
 	}
+	}
+	}
 	return nil
 }
-	if peers, err := ps.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex())); err != nil {
-		return err
-	} else {
-		for _, p := range peers {
-			for protocol := range protocolsPartners {
-				ad, err := pp.AddrInfoFromString(p.StreamAddress)
-				if err != nil {
-					continue
-				}
-				ps.write(p.GetID(), ad, dt, user, payload, protocol)
-			}
-		}
+	ks := []protocol.ID{}
+	for k := range protocolsPartners {
+		ks = append(ks, k)
 	}
+	ps.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
+		And: map[string][]dbs.Filter{
+			"relation": {{Operator: dbs.EQUAL.String(), Value: peer.PARTNER}},
+		},
+	}, payload, ks...)
 	return nil
 }

@@ -158,6 +131,7 @@ func (s *StreamService) write(
 	}
 	stream := s.Streams[proto][peerID.ID]
 	evt := common.NewEvent(string(proto), peerID.ID.String(), dt, user, payload)
+	fmt.Println("SEND EVENT ", evt.From, evt.DataType, evt.Timestamp)
 	if err := json.NewEncoder(stream.Stream).Encode(evt); err != nil {
 		stream.Stream.Close()
 		logger.Err(err)

@@ -116,7 +116,7 @@ func (s *StreamService) HandlePartnerHeartbeat(stream network.Stream) {
 		streamsAnonym[k] = v
 	}
 	s.Mu.Unlock()
-	pid, hb, err := common.CheckHeartbeat(s.Host, stream, streamsAnonym, &s.Mu, s.maxNodesConn)
+	pid, hb, err := common.CheckHeartbeat(s.Host, stream, json.NewDecoder(stream), streamsAnonym, &s.Mu, s.maxNodesConn)
 	if err != nil {
 		return
 	}

@@ -132,10 +132,12 @@ func (s *StreamService) HandlePartnerHeartbeat(stream network.Stream) {
 			s.ConnectToPartner(val)
 		}
 	}
-	go s.StartGC(30 * time.Second)
+	// GC is already running via InitStream — starting a new ticker goroutine on
+	// every heartbeat would leak an unbounded number of goroutines.
 }

 func (s *StreamService) connectToPartners() error {
+	logger := oclib.GetLogger()
 	for proto, info := range protocolsPartners {
 		f := func(ss network.Stream) {
 			if s.Streams[proto] == nil {
@@ -147,11 +149,12 @@ func (s *StreamService) connectToPartners() error {
 			}
 			go s.readLoop(s.Streams[proto][ss.Conn().RemotePeer()], ss.Conn().RemotePeer(), proto, info)
 		}
-		fmt.Println("SetStreamHandler", proto)
+		logger.Info().Msg("SetStreamHandler " + string(proto))
 		s.Host.SetStreamHandler(proto, f)
 	}
 	peers, err := s.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex()))
 	if err != nil {
+		logger.Err(err)
 		return err
 	}
 	for _, p := range peers {

@@ -161,19 +164,19 @@ func (s *StreamService) connectToPartners() error {
 }

 func (s *StreamService) ConnectToPartner(address string) {
+	logger := oclib.GetLogger()
 	if ad, err := pp.AddrInfoFromString(address); err == nil {
+		logger.Info().Msg("Connect to Partner " + ProtocolHeartbeatPartner + " " + address)
 		common.SendHeartbeat(context.Background(), ProtocolHeartbeatPartner, conf.GetConfig().Name,
-			s.Host, s.Streams, map[string]*pp.AddrInfo{address: ad}, 20*time.Second)
+			s.Host, s.Streams, map[string]*pp.AddrInfo{address: ad}, nil, 20*time.Second)
 	}
 }

 func (s *StreamService) searchPeer(search string) ([]*peer.Peer, error) {
-	/* TODO FOR TEST ONLY A VARS THAT DEFINE ADDRESS... deserialize */
 	ps := []*peer.Peer{}
 	if conf.GetConfig().PeerIDS != "" {
 		for _, peerID := range strings.Split(conf.GetConfig().PeerIDS, ",") {
 			ppID := strings.Split(peerID, "/")
-			fmt.Println(ppID, peerID)
 			ps = append(ps, &peer.Peer{
 				AbstractObject: utils.AbstractObject{
 					UUID: uuid.New().String(),

@@ -185,7 +188,6 @@ func (s *StreamService) searchPeer(search string) ([]*peer.Peer, error) {
 			})
 		}
 	}
-
 	access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
 	peers := access.Search(nil, search, false)
 	for _, p := range peers.Data {

@@ -252,8 +254,9 @@ func (ps *StreamService) readLoop(s *common.Stream, id pp.ID, proto protocol.ID,
 		}
 		var evt common.Event
 		if err := json.NewDecoder(s.Stream).Decode(&evt); err != nil {
-			s.Stream.Close()
-			continue
+			// Any decode error (EOF, reset, malformed JSON) terminates the loop;
+			// continuing on a dead/closed stream creates an infinite spin.
+			return
 		}
 		ps.handleEvent(evt.Type, &evt)
 		if protocolInfo.WaitResponse && !protocolInfo.PersistantStream {
|||||||
@@ -1,23 +1,33 @@
 #!/bin/bash

 IMAGE_BASE_NAME="oc-discovery"
 DOCKERFILE_PATH="."

-for i in {0..3}; do
+docker network create \
+    --subnet=172.40.0.0/24 \
+    discovery
+
+for i in $(seq ${1:-0} ${2:-3}); do
   NUM=$((i + 1))
   PORT=$((4000 + $NUM))

   IMAGE_NAME="${IMAGE_BASE_NAME}:${NUM}"


   echo "▶ Building image ${IMAGE_NAME} with CONF_NUM=${NUM}"
   docker build \
     --build-arg CONF_NUM=${NUM} \
-    -t ${IMAGE_NAME} \
+    -t "${IMAGE_BASE_NAME}_${NUM}" \
     ${DOCKERFILE_PATH}

+  docker kill "${IMAGE_BASE_NAME}_${NUM}" | true
+  docker rm "${IMAGE_BASE_NAME}_${NUM}" | true

   echo "▶ Running container ${IMAGE_NAME} on port ${PORT}:${PORT}"
   docker run -d \
+    --network="${3:-oc}" \
     -p ${PORT}:${PORT} \
     --name "${IMAGE_BASE_NAME}_${NUM}" \
-    ${IMAGE_NAME}
+    "${IMAGE_BASE_NAME}_${NUM}"
+
+  docker network connect --ip "172.40.0.${NUM}" discovery "${IMAGE_BASE_NAME}_${NUM}"
 done
10
docker_discovery10.json
Normal file
@@ -0,0 +1,10 @@
+{
+    "MONGO_URL":"mongodb://mongo:27017/",
+    "MONGO_DATABASE":"DC_myDC",
+    "NATS_URL": "nats://nats:4222",
+    "NODE_MODE": "node",
+    "NODE_ENDPOINT_PORT": 4010,
+    "NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
+    "MIN_INDEXER": 2,
+    "PEER_IDS": "/ip4/172.40.0.9/tcp/4009/p2p/12D3KooWGnQfKwX9E4umCPE8dUKZuig4vw5BndDowRLEbGmcZyta"
+}
@@ -4,5 +4,5 @@
     "NATS_URL": "nats://nats:4222",
     "NODE_MODE": "indexer",
     "NODE_ENDPOINT_PORT": 4002,
-    "INDEXER_ADDRESSES": "/ip4/172.19.0.2/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
+    "INDEXER_ADDRESSES": "/ip4/172.40.0.1/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
 }
@@ -4,5 +4,5 @@
     "NATS_URL": "nats://nats:4222",
     "NODE_MODE": "node",
     "NODE_ENDPOINT_PORT": 4003,
-    "INDEXER_ADDRESSES": "/ip4/172.19.0.3/tcp/4002/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
+    "INDEXER_ADDRESSES": "/ip4/172.40.0.2/tcp/4002/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
 }
@@ -4,6 +4,6 @@
     "NATS_URL": "nats://nats:4222",
     "NODE_MODE": "node",
     "NODE_ENDPOINT_PORT": 4004,
-    "INDEXER_ADDRESSES": "/ip4/172.19.0.2/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
+    "INDEXER_ADDRESSES": "/ip4/172.40.0.1/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
-    "PEER_IDS": "/ip4/172.19.0.4/tcp/4003/p2p/12D3KooWBh9kZrekBAE5G33q4jCLNRAzygem3gP1mMdK8mhoCTaw"
+    "PEER_IDS": "/ip4/172.40.0.3/tcp/4003/p2p/12D3KooWBh9kZrekBAE5G33q4jCLNRAzygem3gP1mMdK8mhoCTaw"
 }
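Each entry in `INDEXER_ADDRESSES`, `NATIVE_INDEXER_ADDRESSES`, and `PEER_IDS` is a libp2p multiaddr encoding the transport endpoint plus a trailing `/p2p/<peer-id>` component, and some of these variables hold several comma-separated entries. A stdlib-only sketch of how such an entry decomposes; `splitMultiaddr` is an illustrative helper, not the project's parser, which resolves these strings with libp2p's `pp.AddrInfoFromString`:

```go
package main

import (
	"fmt"
	"strings"
)

// splitMultiaddr separates the transport part of an entry such as
// "/ip4/172.40.0.1/tcp/4001/p2p/<peer-id>" from the trailing peer ID.
// Hypothetical helper for illustration only.
func splitMultiaddr(addr string) (transport, peerID string) {
	parts := strings.SplitN(addr, "/p2p/", 2)
	if len(parts) != 2 {
		return addr, "" // no /p2p/ component present
	}
	return parts[0], parts[1]
}

func main() {
	// NATIVE_INDEXER_ADDRESSES may carry several comma-separated entries,
	// as in docker_discovery9.json (peer IDs shortened here).
	raw := "/ip4/172.40.0.6/tcp/4006/p2p/idA,/ip4/172.40.0.5/tcp/4005/p2p/idB"
	for _, entry := range strings.Split(raw, ",") {
		transport, id := splitMultiaddr(entry)
		fmt.Printf("transport=%s peer=%s\n", transport, id)
	}
}
```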
7
docker_discovery5.json
Normal file
@@ -0,0 +1,7 @@
+{
+    "MONGO_URL":"mongodb://mongo:27017/",
+    "MONGO_DATABASE":"DC_myDC",
+    "NATS_URL": "nats://nats:4222",
+    "NODE_MODE": "native-indexer",
+    "NODE_ENDPOINT_PORT": 4005
+}
8
docker_discovery6.json
Normal file
@@ -0,0 +1,8 @@
+{
+    "MONGO_URL":"mongodb://mongo:27017/",
+    "MONGO_DATABASE":"DC_myDC",
+    "NATS_URL": "nats://nats:4222",
+    "NODE_MODE": "native-indexer",
+    "NODE_ENDPOINT_PORT": 4006,
+    "NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
+}
8
docker_discovery7.json
Normal file
@@ -0,0 +1,8 @@
+{
+    "MONGO_URL":"mongodb://mongo:27017/",
+    "MONGO_DATABASE":"DC_myDC",
+    "NATS_URL": "nats://nats:4222",
+    "NODE_MODE": "indexer",
+    "NODE_ENDPOINT_PORT": 4007,
+    "NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.6/tcp/4006/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
+}
8
docker_discovery8.json
Normal file
@@ -0,0 +1,8 @@
+{
+    "MONGO_URL":"mongodb://mongo:27017/",
+    "MONGO_DATABASE":"DC_myDC",
+    "NATS_URL": "nats://nats:4222",
+    "NODE_MODE": "indexer",
+    "NODE_ENDPOINT_PORT": 4008,
+    "NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
+}
8
docker_discovery9.json
Normal file
@@ -0,0 +1,8 @@
+{
+    "MONGO_URL":"mongodb://mongo:27017/",
+    "MONGO_DATABASE":"DC_myDC",
+    "NATS_URL": "nats://nats:4222",
+    "NODE_MODE": "node",
+    "NODE_ENDPOINT_PORT": 4009,
+    "NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.6/tcp/4006/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u,/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
+}
2
go.mod
@@ -3,7 +3,7 @@ module oc-discovery
 go 1.25.0

 require (
-	cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7
+	cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5
 	github.com/libp2p/go-libp2p v0.47.0
 	github.com/libp2p/go-libp2p-record v0.3.1
 	github.com/multiformats/go-multiaddr v0.16.1
8
go.sum
@@ -1,5 +1,13 @@
 cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7 h1:p9uJjMY+QkE4neA+xRmIRtAm9us94EKZqgajDdLOd0Y=
 cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226084851-959fce48ef6c h1:FTUu9tdEfib6J+fuc7e5wYTe++EIlB70bVNpOeFjnyU=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226084851-959fce48ef6c/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226085754-f4e2d8057df0 h1:lvrRF4ToIMl/5k1q4AiPEy6ycjwRtOaDhWnQ/LrW1ZA=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226085754-f4e2d8057df0/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226091217-cb3771c17a31 h1:hvkvJibS9NmImw73j79Ov5VpIYs4WbP4SYGlK/XO82Q=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260226091217-cb3771c17a31/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5 h1:h+Fkyj6cfwAirc0QGCBEkZSSrgcyThXswg7ytOLm948=
+cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
 github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
 github.com/Masterminds/semver/v3 v3.4.0 h1:Zog+i5UMtVoCU8oKka5P7i9q9HgrJeGzI9SA1Xbatp0=
 github.com/Masterminds/semver/v3 v3.4.0/go.mod h1:4V+yj/TJE1HU9XfppCwVMZq3I84lprf4nC11bSS5beM=
6
main.go
@@ -28,11 +28,15 @@ func main() {
 	conf.GetConfig().PSKPath = o.GetStringDefault("PSK_PATH", "./psk/psk.key")
 	conf.GetConfig().NodeEndpointPort = o.GetInt64Default("NODE_ENDPOINT_PORT", 4001)
 	conf.GetConfig().IndexerAddresses = o.GetStringDefault("INDEXER_ADDRESSES", "")
+	conf.GetConfig().NativeIndexerAddresses = o.GetStringDefault("NATIVE_INDEXER_ADDRESSES", "")
 	conf.GetConfig().PeerIDS = o.GetStringDefault("PEER_IDS", "")

 	conf.GetConfig().NodeMode = o.GetStringDefault("NODE_MODE", "node")

+	conf.GetConfig().MinIndexer = o.GetIntDefault("MIN_INDEXER", 1)
+	conf.GetConfig().MaxIndexer = o.GetIntDefault("MAX_INDEXER", 5)
+
 	ctx, stop := signal.NotifyContext(
 		context.Background(),
 		os.Interrupt,
@@ -47,7 +51,7 @@ func main() {
 	if n, err := node.InitNode(isNode, isIndexer, isNativeIndexer); err != nil {
 		panic(err)
 	} else {
-		<-ctx.Done() // 👈 the only blocking point
+		<-ctx.Done() // the only blocking point
 		log.Println("shutting down")
 		n.Close()
 	}
3
pem/private10.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PRIVATE KEY-----
+MC4CAQAwBQYDK2VwBCIEIPc7D3Mgb1U2Ipyb/85hA4Ew7dC8zHDEuQYSjqzzRgLK
+-----END PRIVATE KEY-----
3
pem/private5.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PRIVATE KEY-----
+MC4CAQAwBQYDK2VwBCIEIK2oBaOtGNchE09MBRtPd5oEOUcVUQG2ndym5wKExj7R
+-----END PRIVATE KEY-----
3
pem/private6.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PRIVATE KEY-----
+MC4CAQAwBQYDK2VwBCIEIE58GDazCyF1jp796ivSmHiCepbkC8TpzliIaQ7eGEpu
+-----END PRIVATE KEY-----
3
pem/private7.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PRIVATE KEY-----
+MC4CAQAwBQYDK2VwBCIEIAeX4O7ldwehRSnPkbzuE6csyo63vjvqAcNNujENOKUC
+-----END PRIVATE KEY-----
3
pem/private8.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PRIVATE KEY-----
+MC4CAQAwBQYDK2VwBCIEIEkgqINXDLnxIJZs2LEK9O4vdsqk43dwbULGUE25AWuR
+-----END PRIVATE KEY-----
3
pem/private9.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PRIVATE KEY-----
+MC4CAQAwBQYDK2VwBCIEIBcflxGlZYyUVJoExC94rHZbIyKMwZ+Oh7EDkb0qUlxd
+-----END PRIVATE KEY-----
3
pem/public10.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PUBLIC KEY-----
+MCowBQYDK2VwAyEAEomuEQGmGsYVw35C6DB5tfY8LI8jm359ceAxRX8eQ0o=
+-----END PUBLIC KEY-----
3
pem/public5.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PUBLIC KEY-----
+MCowBQYDK2VwAyEAZ2nLJBL8a5opfa8nFeVj0SZToW8pl4+zgcSUkeZFRO4=
+-----END PUBLIC KEY-----
3
pem/public6.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PUBLIC KEY-----
+MCowBQYDK2VwAyEAIQVeSGwsjPjyepPTnzzYqVxIxviSEjZXU7C7zuNTui4=
+-----END PUBLIC KEY-----
3
pem/public7.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PUBLIC KEY-----
+MCowBQYDK2VwAyEAG95Ettl3jTi41HM8le1A9WDmOEq0ANEqpLF7zTZrfXA=
+-----END PUBLIC KEY-----
3
pem/public8.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PUBLIC KEY-----
+MCowBQYDK2VwAyEA/ymOIb0sJ0qCWrf3mKz7ACCvsMXLog/EK533JfNXZTM=
+-----END PUBLIC KEY-----
3
pem/public9.pem
Normal file
@@ -0,0 +1,3 @@
+-----BEGIN PUBLIC KEY-----
+MCowBQYDK2VwAyEAZ4F3KqOp/5QrPdZGqqX6PYYEGd2snX4Q3AUt9XAG3v8=
+-----END PUBLIC KEY-----