demo test + Peer
495
ARCHITECTURE.md
Normal file
@@ -0,0 +1,495 @@
# oc-discovery: architecture and technical analysis

> **Reading convention**
> Items marked ✅ have been fixed in the code. Items marked ⚠️ remain open.

## Table of contents

1. [Overview](#1-overview)
2. [Role hierarchy](#2-role-hierarchy)
3. [Core mechanisms](#3-core-mechanisms)
   - 3.1 Long-lived heartbeat (node → indexer)
   - 3.2 Trust scoring
   - 3.3 Registration with natives (indexer → native)
   - 3.4 Indexer pool: fetch + consensus
   - 3.5 Self-delegation and offload loop
   - 3.6 Native mesh resilience
   - 3.7 Shared DHT
   - 3.8 PubSub gossip (indexer registry)
   - 3.9 Application streams (node ↔ node)
4. [Summary table](#4-summary-table)
5. [Global risks and limitations](#5-global-risks-and-limitations)
6. [Improvement directions](#6-improvement-directions)

---

## 1. Overview

`oc-discovery` is a P2P discovery service for the OpenCloud network. It is built on
**libp2p** (TCP transport + private-network PSK) and a **Kademlia DHT** (prefix `oc`),
which it uses to index peers. The architecture is deliberately hierarchical: stable _natives_
act as authoritative hubs with which _indexers_ register, and ordinary _nodes_
discover indexers through those natives.

```
┌──────────────┐      heartbeat       ┌──────────────────┐
│     Node     │ ───────────────────► │     Indexer      │
│   (libp2p)   │ ◄─────────────────── │   (DHT server)   │
└──────────────┘  application stream  └────────┬─────────┘
                                               │ subscribe / heartbeat
                                               ▼
                                      ┌─────────────────────┐
                                      │   Native Indexer    │◄──► other natives
                                      │ (authoritative hub) │     (mesh)
                                      └─────────────────────┘
```

All participants share a **pre-shared key (PSK)** that isolates the network
from unauthorized external libp2p connections.

---

## 2. Role hierarchy

| Role | Mode | Responsibility |
|---|---|---|
| **Node** | `node_mode=node` | Gets indexed; publishes and queries DHT records |
| **Indexer** | `node_mode=indexer` | Receives heartbeats, writes to the DHT, registers with natives |
| **Native Indexer** | `node_mode=native` | Hub: keeps the registry of live indexers, evaluates consensus, serves as fallback |

A single process can combine the node+indexer or indexer+native roles.

---

## 3. Core mechanisms

### 3.1 Long-lived heartbeat (node → indexer)

**How it works**

A **persistent** libp2p stream (`/opencloud/heartbeat/1.0`) is opened from the node
to each indexer in its pool (`StaticIndexers`). Every 20 seconds, the node
sends a JSON `Heartbeat` over this stream. The indexer handles it by recording the peer in
`StreamRecords[ProtocolHeartbeat]` with a 2-minute expiry.

If `sendHeartbeat` fails (stream reset, EOF, timeout), the peer is removed from
`StaticIndexers` and `replenishIndexersFromNative` is triggered.

**Advantages**
- Fast disconnection detection (the error surfaces on the next encode).
- A single stream per peer reduces pressure on TCP connections.
- The nudge channel (`indexerHeartbeatNudge`) allows an immediate reconnect without
  waiting for the 20 s ticker.

**Limitations / risks**
- ⚠️ A single persistent stream: if the TCP layer stays open but "frozen" (middlebox,
  silent NAT), the error may not surface for several minutes.
- ⚠️ `StaticIndexers` is a shared global map: if two goroutines call
  `replenishIndexersFromNative` at the same time (several indexers lost at once),
  writes made outside the critical sections can race.

---

### 3.2 Trust scoring

**How it works**

Before recording a heartbeat in `StreamRecords`, the indexer checks a **minimum
score** computed by `CheckHeartbeat`:

```
Score = (0.4 × uptime_ratio + 0.4 × bpms + 0.2 × diversity) × 100
```

- `uptime_ratio`: how long the peer has been present, divided by the time since the indexer started.
- `bpms`: throughput measured over a dedicated stream (`/opencloud/probe/1.0`), normalized against 50 Mbps.
- `diversity`: ratio of distinct /24 IP prefixes among the indexers the peer declares.

Two thresholds are applied depending on the peer's state:
- **First heartbeat** (peer absent from `StreamRecords`, uptime = 0): threshold of **40**.
- **Subsequent heartbeats** (accumulated uptime): threshold of **75**.
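
The formula and the two thresholds can be written out as a small sketch. Only the weights, the 50 Mbps reference and the 40/75 thresholds come from the text. The function names are hypothetical, and clamping each term to [0, 1] is an assumption about how the real `ComputeIndexerScore` bounds its inputs.

```go
package main

const (
	bpmsReference           = 50e6 // 50 Mbps, in bits per second
	firstHeartbeatThreshold = 40.0
	steadyStateThreshold    = 75.0
)

// clamp01 bounds a score term to [0, 1] (an assumption of this sketch).
func clamp01(x float64) float64 {
	if x < 0 {
		return 0
	}
	if x > 1 {
		return 1
	}
	return x
}

// normalizedBandwidth converts a probe of n bytes received in elapsedSeconds
// into the bpms term of the score.
func normalizedBandwidth(n int, elapsedSeconds float64) float64 {
	if elapsedSeconds <= 0 {
		return 0
	}
	bitsPerSecond := float64(n) * 8 / elapsedSeconds
	return clamp01(bitsPerSecond / bpmsReference)
}

// score implements (0.4 × uptime_ratio + 0.4 × bpms + 0.2 × diversity) × 100.
func score(uptimeRatio, bpms, diversity float64) float64 {
	return (0.4*clamp01(uptimeRatio) + 0.4*clamp01(bpms) + 0.2*clamp01(diversity)) * 100
}

// admitted applies the first-heartbeat or steady-state threshold.
func admitted(s float64, firstHeartbeat bool) bool {
	if firstHeartbeat {
		return s >= firstHeartbeatThreshold
	}
	return s >= steadyStateThreshold
}
```

Consider a brand-new peer with perfect bandwidth and diversity. It scores `score(0, 1, 1)` ≈ 60, which passes the first-heartbeat threshold but fails a single threshold of 75. That is the deadlock the dual threshold removes.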

**Advantages**
- Discourages ephemeral or slow peers from cluttering the registry.
- Network diversity reduces the risk of concentrating on a single subnet.
- The dedicated probe stream avoids polluting the JSON heartbeat stream with binary data.
- The dual threshold lets new peers be admitted on their very first connection.

**Limitations / risks**
- ✅ **Startup logic deadlock fixed**: with uptime = 0 the maximum score was 60,
  below the threshold of 75, so new peers were silently rejected forever.
  → Threshold lowered to **40** for the first heartbeat (`isFirstHeartbeat`), 75 afterwards.
- ⚠️ The thresholds (40 / 75) are still hard-coded and cannot be configured.
- ⚠️ The bandwidth probe sends between 512 and 2048 bytes per heartbeat. At a 20 s
  interval and a maximum of 500 nodes, that is up to ~50 KB/s of continuous probe traffic
  (2048 B × 500 / 20 s ≈ 51 KB/s; about 32 KB/s with the average 1280 B payload).
- ⚠️ `diversity` is computed from the addresses the node *declares*. This field is
  self-reported and unverified, and therefore easy to forge.
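
The `diversity` term can be sketched as the number of distinct /24 prefixes divided by the number of valid IPv4 addresses. This is a hypothetical reconstruction of what `getDiversityRate` measures, working on plain IP strings rather than multiaddrs. It does nothing about the forgery risk above, since it trusts whatever list it is given.

```go
package main

import "net"

// diversityRate returns (distinct /24 prefixes) / (valid IPv4 addresses).
// Non-IPv4 or unparseable entries are ignored; an empty input yields 0.
func diversityRate(addrs []string) float64 {
	mask := net.CIDRMask(24, 32)
	prefixes := map[string]struct{}{}
	valid := 0
	for _, a := range addrs {
		ip := net.ParseIP(a).To4()
		if ip == nil {
			continue
		}
		valid++
		prefixes[ip.Mask(mask).String()] = struct{}{}
	}
	if valid == 0 {
		return 0
	}
	return float64(len(prefixes)) / float64(valid)
}
```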

---

### 3.3 Registration with natives (indexer → native)

**How it works**

Every (non-native) indexer periodically (every 60 s) sends an
`IndexerRegistration` JSON message over a one-shot stream (`/opencloud/native/subscribe/1.0`)
to each configured native. The native:

1. Stores the entry in its local cache with a TTL of **90 s** (`IndexerTTL`).
2. Gossips the indexer's address (which carries its `PeerID`) on the PubSub topic `oc-indexer-registry` to the other natives.
3. Persists the entry in the DHT asynchronously (retrying until it succeeds).

**Advantages**
- Disposable stream: registrations hold no long-lived resource on the native side.
- The local cache is immediately available to `handleNativeGetIndexers` without
  waiting for the DHT.
- PubSub dissemination lets other natives learn about the indexer without
  it having to register with them directly.

**Limitations / risks**
- ✅ **Overly tight TTL fixed**: the 66 s TTL was only 10 % above the 60 s interval,
  so a slight network delay could expire a healthy indexer between two renewals.
  → `IndexerTTL` raised to **90 s** (+50 %).
- ⚠️ If the DHT `PutValue` never succeeds (partitioned network), the native holds
  the entry, but other natives that missed the PubSub message never learn about it.
  This is a silent inconsistency.
- ⚠️ `RegisterWithNative` ignores `127.0.0.1` addresses, but does not handle
  private (RFC 1918) addresses, which would not be routable from other hosts.
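
Filtering private addresses in addition to `127.0.0.1` could look like the following sketch. It relies on the standard library's `net.IP` predicates (`IsPrivate` requires Go 1.17 or later). `advertisable` is a hypothetical helper, not part of `RegisterWithNative`.

```go
package main

import "net"

// advertisable keeps only the addresses other hosts can plausibly reach:
// it drops loopback, RFC 1918 / RFC 4193 private, link-local and
// unspecified addresses, as well as strings that are not IPs at all.
func advertisable(addrs []string) []string {
	var out []string
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip == nil || ip.IsLoopback() || ip.IsPrivate() ||
			ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			continue
		}
		out = append(out, a)
	}
	return out
}
```

Such filtering would probably need to be optional. A deployment where every host sits on the same private network depends on exactly those addresses.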

---

### 3.4 Indexer pool: fetch + consensus

**How it works**

During `ConnectToNatives` (at startup or on replenish), the node/indexer:

1. **Fetch**: sends `GetIndexersRequest` to the first native that responds
   (`/opencloud/native/indexers/1.0`) and receives a list of candidates.
2. **Consensus (round 1)**: queries **all** configured natives in parallel
   (`/opencloud/native/consensus/1.0`, 3 s timeout, 4 s collection window).
   An indexer is confirmed if **strictly more than 50 %** of the responding natives
   consider it alive.
3. **Consensus (round 2)**: if the pool is still too small, the natives' suggestions
   (indexers they know about that were not among the initial candidates)
   are submitted to a second round.

**Advantages**
- The absolute-majority rule prevents a compromised or out-of-sync native from injecting
  phantom indexers.
- The second round tops up the pool with alternatives known to the natives
  without giving up verification.
- If the fetch returns a **fallback** (a native acting as indexer), consensus is skipped.
  This is consistent, since there is only one source.

**Limitations / risks**
- ⚠️ With **a single native** configured (very common in dev/test), consensus is trivial
  (100 % of a single vote), so the majority rule protects nothing in that case.
- ⚠️ `fetchIndexersFromNative` stops at the **first responding native** (natives are queried sequentially).
  If that native has a stale or partial cache, the node gets a suboptimal pool without
  consulting the others.
- ⚠️ The global collection timeout (4 s) is fixed. On a slow or geographically
  distributed network, valid natives can be left out because they did not answer in time.
- ⚠️ `replaceStaticIndexers` **adds** entries without ever removing stale ones,
  so the pool can accumulate dead entries that only the heartbeat later purges.
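
The confirmation rule of the consensus rounds can be sketched as a vote count. `confirm` is a hypothetical name. Each element of `votes` stands for the list of live indexers returned by one native that answered within the collection window. The `quorum` parameter is an assumption added to show where the configurable quorum proposed in section 6 would slot in; a value of 0.5 reproduces `count*2 > total`.

```go
package main

import "sort"

// confirm returns, sorted, the candidates reported alive by strictly more
// than quorum × (number of responding natives). Duplicate reports from the
// same native are counted once.
func confirm(candidates []string, votes [][]string, quorum float64) []string {
	total := len(votes)
	if total == 0 {
		return nil
	}
	count := map[string]int{}
	for _, alive := range votes {
		seen := map[string]bool{}
		for _, id := range alive {
			if !seen[id] {
				seen[id] = true
				count[id]++
			}
		}
	}
	var out []string
	for _, c := range candidates {
		if float64(count[c]) > quorum*float64(total) {
			out = append(out, c)
		}
	}
	sort.Strings(out)
	return out
}
```

The single-vote case in the checks below illustrates the single-native limitation: one vote out of one always clears the bar.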

---

### 3.5 Self-delegation and offload loop

**How it works**

If a native has no live indexer when handling `handleNativeGetIndexers`,
it designates itself as a temporary indexer (`selfDelegate`). It returns its own
multiaddr and adds the requester to `responsiblePeers`, up to
`maxFallbackPeers` (50). Beyond that limit, delegation is refused and an empty response is
returned so that the node tries another native.

Every 30 s, `runOffloadLoop` checks whether real indexers are available again.
If so, for each responsible peer:
- **Stream present**: `Reset()` the heartbeat stream. The peer receives an error,
  triggers `replenishIndexersFromNative` and migrates to real indexers.
- **Stream absent** (peer never admitted by scoring): `ClosePeer()` on the network
  connection. The peer reconnects and asks the native for indexers again.

**Advantages**
- Service continuity: a node is never blocked when indexers are temporarily unavailable.
- Migration is automatic and transparent for the node.
- `Reset()` (unlike `Close()`) aborts both directions of the stream, guaranteeing that the peer
  does receive an error.
- The limit of 50 keeps the native from being overloaded during prolonged shortages.

**Limitations / risks**
- ✅ **Offload without a stream fixed**: if the heartbeat had never been recorded in
  `StreamRecords` (score below threshold, a case made worse by the scoring bug), the offload
  failed silently and the peer stayed in `responsiblePeers` indefinitely.
  → `else` branch: `ClosePeer()` + removal from `responsiblePeers`.
- ✅ **Unbounded `responsiblePeers` fixed**: the native accepted an arbitrary number
  of peers in self-delegation and became an overloaded indexer itself.
  → `selfDelegate` checks `len(responsiblePeers) >= maxFallbackPeers` and returns
  `false` when saturated.
- ⚠️ Delegation remains uncoordinated between natives. An overloaded native refuses (returns
  an empty response) but does not explicitly redirect the node to a neighboring native with spare capacity.
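
The capacity check and the two offload branches can be sketched as follows. `fallbackPool` and its methods are hypothetical. `resetStream` and `closePeer` stand in for `Reset()` on the heartbeat stream and `ClosePeer()` on the connection. Treating a repeat request from an already-responsible peer as accepted is an assumption. The check that real indexers are available again is left to the caller.

```go
package main

import "sync"

const maxFallbackPeers = 50

type fallbackPool struct {
	mu          sync.Mutex
	responsible map[string]struct{}
}

// selfDelegate records pid as served by this native and reports whether the
// delegation was accepted. New peers are refused once the pool is full.
func (f *fallbackPool) selfDelegate(pid string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if _, ok := f.responsible[pid]; ok {
		return true // already served (assumption: idempotent)
	}
	if len(f.responsible) >= maxFallbackPeers {
		return false
	}
	f.responsible[pid] = struct{}{}
	return true
}

// offload releases every responsible peer: peers with a heartbeat stream get
// resetStream, peers without one get closePeer. The set is emptied first so
// that no peer remains in it after the call.
func (f *fallbackPool) offload(hasStream func(string) bool, resetStream, closePeer func(string)) {
	f.mu.Lock()
	peers := make([]string, 0, len(f.responsible))
	for p := range f.responsible {
		peers = append(peers, p)
	}
	f.responsible = map[string]struct{}{}
	f.mu.Unlock()
	for _, p := range peers {
		if hasStream(p) {
			resetStream(p)
		} else {
			closePeer(p)
		}
	}
}
```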

---

### 3.6 Native mesh resilience

**How it works**

When the heartbeat to a native fails, `replenishNativesFromPeers` tries to find
a replacement in this order:

1. `fetchNativeFromNatives`: asks each live native (`/opencloud/native/peers/1.0`)
   for the address of a native the requester does not know yet.
2. `fetchNativeFromIndexers`: asks each known indexer
   (`/opencloud/indexer/natives/1.0`) for its configured natives.
3. If no replacement is found and `remaining ≤ 1`: `retryLostNative` starts a 30 s ticker
   that retries the direct connection to the lost native.

`EnsureNativePeers` maintains native-to-native heartbeats over `ProtocolHeartbeat`,
with a **single goroutine** covering the whole `StaticNatives` map.

**Advantages**
- Multi-hop gossip via indexers makes it possible to find a native even when no direct
  peer knows it.
- `retryLostNative` covers the single-native case (minimal deployment).
- Automatic reconnection (`retryLostNative`) triggers `replenishIndexersIfNeeded`
  so that the indexer pool is restored as well.

**Limitations / risks**
- ✅ **Multiple heartbeat goroutines fixed**: `EnsureNativePeers` started one
  `SendHeartbeat` goroutine per native address (N natives → N goroutines → N² heartbeats
  per tick). → `nativeMeshHeartbeatOnce` is now used, so a single goroutine iterates over
  `StaticNatives`.
- ⚠️ `retryLostNative` runs forever, with no stop condition tied to the process lifetime
  (no `context.Context`). During a graceful shutdown of the binary, this goroutine
  can block.
- ⚠️ Transitive discovery (native → indexer → native) is one-way. An indexer
  only knows the natives from its own configuration, not new natives that
  joined after it started.

---

### 3.7 Shared DHT

**How it works**

All indexers and natives take part in a Kademlia DHT (prefix `oc`, mode
`ModeServer`). Two namespaces are used:

- `/node/<DID>` → signed `PeerRecord` JSON (published by indexers when they receive a node heartbeat).
- `/indexer/<PeerID>` → `liveIndexerEntry` JSON with a TTL (published by natives).

Each native runs `refreshIndexersFromDHT` every 30 s. It rehydrates the native's local cache
from the DHT for known PeerIDs (`knownPeerIDs`) whose local entry has expired.

**Advantages**
- Decentralized persistence: a record survives the loss of a single native or indexer.
- Entry validation: `PeerRecordValidator` and `IndexerRecordValidator` reject
  malformed or expired records at `PutValue` time.
- The secondary index `/name/<name>` allows resolution by human-readable name.

**Limitations / risks**
- ⚠️ The Kademlia DHT works on the private (PSK) network, but no bootstrap nodes
  are configured explicitly. Discovery depends on connections that already exist,
  which can slow convergence at startup.
- ⚠️ `PutValue` is retried in an infinite loop on `"failed to find any peer in table"`,
  so a prolonged network outage leaves goroutines stuck.
- ⚠️ If the PSK is compromised, an attacker can write to the DHT. Indexer
  `liveIndexerEntry` records are not signed, unlike `PeerRecord`s.
- ⚠️ `refreshIndexersFromDHT` prunes `knownPeerIDs` when the DHT has no fresh entry,
  but does not prune `liveIndexers`. An expired entry therefore stays in memory until GC
  or the next refresh.

---

### 3.8 PubSub gossip (indexer registry)

**How it works**

When an indexer registers with a native, the native publishes the indexer's address on the
GossipSub topic `oc-indexer-registry`. The other subscribed natives update their
`knownPeerIDs` without waiting for the DHT.

The `TopicValidator` rejects any message whose content is not a valid, parseable
multiaddr before it reaches the processing loop.

**Advantages**
- Near-instant dissemination between connected natives.
- A useful complement to the DHT for recent registrations that have not yet
  been persisted.
- The syntactic filter blocks malformed messages before they propagate through the mesh.

**Limitations / risks**
- ✅ **`TopicValidator` without validation fixed**: the validator accepted every message
  unconditionally (`return true`), which allowed a compromised native to gossip
  arbitrary data.
  → The validator now checks that the message is a parseable multiaddr
  (`pp.AddrInfoFromString`).
- ⚠️ Validation remains purely syntactic: the origin of the message (is the sender
  a legitimate native?) is not checked.
- ⚠️ If the native restarts, it loses its subscription and misses the messages published
  while it was down. Rehydration from the DHT compensates, but with a delay
  of up to 30 s.
- ⚠️ The gossip message carries only the indexer's `Addr`, not its TTL or signature.
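
Checking the origin as well as the syntax could take the shape of the sketch below. It is hypothetical and stdlib-only. The publisher is passed in as a plain peer-ID string; in a real GossipSub validator it would presumably come from the message's author (`From`) field, which message signing authenticates, rather than from the forwarding peer. The allowlist of natives would have to be populated from configuration or from the native mesh. `syntax` stands in for the existing `pp.AddrInfoFromString` check.

```go
package main

import "sync"

// originValidator accepts a gossiped address only if its publisher is a
// known native and the payload passes the syntactic check.
type originValidator struct {
	mu      sync.RWMutex
	natives map[string]bool // peer IDs of legitimate natives
	syntax  func(payload string) bool
}

// allow adds a native to the allowlist, e.g. when the mesh discovers it.
func (v *originValidator) allow(pid string) {
	v.mu.Lock()
	v.natives[pid] = true
	v.mu.Unlock()
}

// validate combines the origin check with the syntactic check.
func (v *originValidator) validate(publisher, payload string) bool {
	v.mu.RLock()
	known := v.natives[publisher]
	v.mu.RUnlock()
	return known && v.syntax(payload)
}
```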

---

### 3.9 Application streams (node ↔ node)

**How it works**

`StreamService` manages the streams between partner nodes (`PARTNER` relations stored
in the database) through dedicated protocols (`/opencloud/resource/*`). A partner heartbeat
(`ProtocolHeartbeatPartner`) keeps the connections alive. Events are routed
through `handleEvent` and, in parallel, through the NATS system.

**Advantages**
- A per-protocol TTL (`PersistantStream`, `WaitResponse`) adapts the behavior to the
  type of exchange (long-lived for the planner, short for CRUD operations).
- GC (`gc()` every 8 s, started only once in `InitStream`) quickly frees
  expired streams.

**Limitations / risks**
- ✅ **GC goroutine leak fixed**: `HandlePartnerHeartbeat` called
  `go s.StartGC(30s)` on every received heartbeat (~every 20 s), creating a new, never-ending
  ticker goroutine on each call.
  → Call removed; the GC started by `InitStream` is sufficient.
- ✅ **Infinite loop on EOF fixed**: after a decode error, `readLoop` executed
  `s.Stream.Close(); continue`, retrying forever to read from a closed
  stream.
  → Replaced with `return`; the defers (`Close`, `delete`) clean up correctly.
- ⚠️ Fetching partners from `conf.PeerIDS` is marked `TO REMOVE`, which means
  provisional code is running in production.

---

## 4. Summary table

| Mechanism | Protocol | Main advantage | Risk status |
|---|---|---|---|
| Heartbeat node→indexer | `/opencloud/heartbeat/1.0` | Fast loss detection | ⚠️ Frozen TCP stream not detected |
| Trust scoring | (inline in heartbeat; probe on `/opencloud/probe/1.0`) | Filters out unstable peers | ✅ Deadlock fixed (threshold 40/75) |
| Native registration | `/opencloud/native/subscribe/1.0` | Generous TTL, immediate cache | ✅ TTL raised to 90 s |
| Indexer pool fetch | `/opencloud/native/indexers/1.0` | Uses the first responding native | ⚠️ Native cache may be stale |
| Consensus | `/opencloud/native/consensus/1.0` | Absolute majority | ⚠️ Trivial with a single native |
| Self-delegation + offload | (in-memory) | Availability without indexers | ✅ 50-peer limit + ClosePeer |
| Native mesh | `/opencloud/native/peers/1.0` | Multi-hop gossip | ✅ Goroutines deduplicated |
| DHT | `/oc/kad/1.0.0` | Decentralized persistence | ⚠️ Infinite retry, no bootstrap |
| PubSub registry | `oc-indexer-registry` | Fast dissemination | ✅ Multiaddr validation |
| Application streams | `/opencloud/resource/*` | Per-protocol TTL | ✅ GC leak + EOF loop fixed |

---

## 5. Global risks and limitations

### Security

- ⚠️ **Unverified self-reported addresses**: the `IndexersBinded` field in the heartbeat
  is self-declared by the node and is used to compute diversity. A malicious peer can
  inflate its score by declaring fake addresses.
- ⚠️ **PSK as the only entry barrier**: if the PSK is compromised (it is static and
  file-based), all network isolation is lost. There is no key rotation and no
  additional per-peer authentication.
- ⚠️ **DHT without ACLs on indexer entries**: `PeerRecord` signatures are verified
  on read, but `liveIndexerEntry` records are not signed. PubSub validation
  blocks invalid multiaddrs, but not spoofed addresses of legitimate indexers.

### Availability

- ⚠️ **Native single point of failure**: with a single native, losing it stops
  all indexer assignment. `retryLostNative` mitigates this, but without indexers, nodes
  cannot publish.
- ⚠️ **DHT bootstrap**: without explicit bootstrap nodes, the DHT is slow to converge
  when there are few initial connections.

### Consistency

- ⚠️ **`replaceStaticIndexers` never removes entries**: old, dead indexers stay in
  `StaticIndexers` until their heartbeat fails. A node can therefore have an
  inflated pool containing unreachable entries.
- ⚠️ **Global `TimeWatcher`**: it is set only once, when `ConnectToIndexers` starts.
  If the indexer has been running for a long time, new nodes will have a persistently low
  `uptime_ratio`. For example, after the indexer has run for 30 days, a peer connected for one day
  has `uptime_ratio` ≈ 0.03, so its score cannot exceed about 61, below the steady-state threshold of 75.
  Lowering the first-heartbeat threshold to 40 softens the initial impact, but subsequent
  heartbeats still have to accumulate enough uptime.

---

## 6. Improvement directions

Directions that have already been implemented are marked ✅; the others are still open.

### ✅ Score: dual threshold for new peers
~~Replace the binary threshold~~ (**implemented**): threshold of 40 for the first heartbeat
(peer absent from `StreamRecords`), 75 for subsequent ones. A peer can now be admitted
on its very first connection instead of being blocked by zero uptime.
_File: `common/common_stream.go`, `CheckHeartbeat`_

### ✅ Indexer TTL aligned with the renewal interval
~~66 s TTL too close to 60 s~~ (**implemented**): `IndexerTTL` raised to **90 s**.
_File: `indexer/native.go`_

### ✅ Self-delegation limit
~~Unbounded `responsiblePeers`~~ (**implemented**): `selfDelegate` returns `false` when
`len(responsiblePeers) >= maxFallbackPeers` (50). The call site returns an empty
response and logs a warning.
_File: `indexer/native.go`_

### ✅ PubSub validation of gossiped addresses
~~`TopicValidator` accepts everything~~ (**implemented**): the validator checks that the message
is a parseable multiaddr via `pp.AddrInfoFromString`.
_File: `indexer/native.go`, `subscribeIndexerRegistry`_

### ✅ Deduplicated heartbeat goroutines in `EnsureNativePeers`
~~One goroutine per native address~~ (**implemented**): `nativeMeshHeartbeatOnce`
ensures that a single `SendHeartbeat` goroutine covers the whole `StaticNatives` map.
_File: `common/native_stream.go`_

### ✅ GC goroutine leak in `HandlePartnerHeartbeat`
~~`go s.StartGC(30s)` on every heartbeat~~ (**implemented**): call removed; the GC
started by `InitStream` is sufficient.
_File: `stream/service.go`_

### ✅ Infinite loop on EOF in `readLoop`
~~`continue` after `Stream.Close()`~~ (**implemented**): replaced with `return` so that
the defers can clean up properly.
_File: `stream/service.go`_

---

### ⚠️ Fetch pool: query all natives in parallel

`fetchIndexersFromNative` stops at the first responding native. Querying all natives
in parallel and merging their lists (as `clientSideConsensus` does) would prevent
a native with a stale cache from supplying a suboptimal pool.

### ⚠️ Consensus with a configurable quorum

The confirmation threshold (`count*2 > total`) is hard-coded. Making it configurable
(e.g. `consensus_quorum: 0.67`) would allow the rule to be tightened on deployments
with 3+ natives without changing the code.

### ⚠️ Explicit deregistration

Add a `/opencloud/native/unsubscribe/1.0` protocol: when an indexer shuts down
cleanly, it notifies the natives so that they invalidate its entry immediately instead of waiting
for the 90 s TTL to run out.

### ⚠️ Explicit DHT bootstrap

Configure the natives as DHT bootstrap nodes via `dht.BootstrapPeers` to speed up
Kademlia convergence at startup.

### ⚠️ Context propagated to long-running goroutines

`retryLostNative`, `refreshIndexersFromDHT` and `runOffloadLoop` receive no
`context.Context`. Passing one down from `InitNative` would allow them to stop cleanly when
the process shuts down.

### ⚠️ Explicit redirection when self-delegation is refused

When a native refuses self-delegation (saturated pool), its empty response forces the node to
retry without telling it where to turn. A list of alternative natives in the
response (`AlternativeNatives []string`) would let the node go straight to a
less loaded native.

@@ -15,6 +15,9 @@ type Config struct {
PeerIDS string // TO REMOVE

NodeMode string

MinIndexer int
MaxIndexer int
}

var instance *Config

@@ -1,7 +1,6 @@
package common

import (
"bytes"
"context"
cr "crypto/rand"
"encoding/json"
@@ -28,6 +27,12 @@ type LongLivedStreamRecordedService[T interface{}] struct {
StreamRecords map[protocol.ID]map[pp.ID]*StreamRecord[T]
StreamMU sync.RWMutex
maxNodesConn int
// AfterHeartbeat is an optional hook called after each successful heartbeat update.
// The indexer sets it to republish the embedded signed record to the DHT.
AfterHeartbeat func(pid pp.ID)
// AfterDelete is called after gc() evicts an expired peer, outside the lock.
// name and did may be empty if the HeartbeatStream had no metadata.
AfterDelete func(pid pp.ID, name string, did string)
}

func NewStreamRecordedService[T interface{}](h host.Host, maxNodesConn int) *LongLivedStreamRecordedService[T] {
@@ -54,16 +59,29 @@ func (ix *LongLivedStreamRecordedService[T]) StartGC(interval time.Duration) {

func (ix *LongLivedStreamRecordedService[T]) gc() {
ix.StreamMU.Lock()
defer ix.StreamMU.Unlock()
now := time.Now().UTC()
if ix.StreamRecords[ProtocolHeartbeat] == nil {
ix.StreamRecords[ProtocolHeartbeat] = map[pp.ID]*StreamRecord[T]{}
ix.StreamMU.Unlock()
return
}
streams := ix.StreamRecords[ProtocolHeartbeat]
fmt.Println(StaticNatives, StaticIndexers, streams)

type gcEntry struct {
pid pp.ID
name string
did string
}
var evicted []gcEntry
for pid, rec := range streams {
if now.After(rec.HeartbeatStream.Expiry) || now.Sub(rec.HeartbeatStream.UptimeTracker.LastSeen) > 2*rec.HeartbeatStream.Expiry.Sub(now) {
name, did := "", ""
if rec.HeartbeatStream != nil {
name = rec.HeartbeatStream.Name
did = rec.HeartbeatStream.DID
}
evicted = append(evicted, gcEntry{pid, name, did})
for _, sstreams := range ix.StreamRecords {
if sstreams[pid] != nil {
delete(sstreams, pid)
@@ -71,6 +89,13 @@ func (ix *LongLivedStreamRecordedService[T]) gc() {
}
}
}
ix.StreamMU.Unlock()

if ix.AfterDelete != nil {
for _, e := range evicted {
ix.AfterDelete(e.pid, e.name, e.did)
}
}
}

func (ix *LongLivedStreamRecordedService[T]) Snapshot(interval time.Duration) {
@@ -101,8 +126,10 @@ func (ix *LongLivedStreamRecordedService[T]) snapshot() []*StreamRecord[T] {
return out
}

func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Stream) {
func (ix *LongLivedStreamRecordedService[T]) HandleHeartbeat(s network.Stream) {
logger := oclib.GetLogger()
defer s.Close()
dec := json.NewDecoder(s)
for {
ix.StreamMU.Lock()
if ix.StreamRecords[ProtocolHeartbeat] == nil {
@@ -114,17 +141,37 @@ func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Strea
streamsAnonym[k] = v
}
ix.StreamMU.Unlock()

pid, hb, err := CheckHeartbeat(ix.Host, s, streamsAnonym, &ix.StreamMU, ix.maxNodesConn)
pid, hb, err := CheckHeartbeat(ix.Host, s, dec, streamsAnonym, &ix.StreamMU, ix.maxNodesConn)
if err != nil {
// Stream-level errors (EOF, reset, closed) mean the connection is gone
// — exit so the goroutine doesn't spin forever on a dead stream.
// Metric/policy errors (score too low, too many connections) are transient
// — those are also stream-terminal since the stream carries one session.
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) ||
strings.Contains(err.Error(), "reset") ||
strings.Contains(err.Error(), "closed") ||
strings.Contains(err.Error(), "too many connections") {
logger.Info().Err(err).Msg("heartbeat stream terminated, closing handler")
return
}
logger.Warn().Err(err).Msg("heartbeat check failed, retrying on same stream")
continue
}
ix.StreamMU.Lock()
// if record already seen update last seen
if rec, ok := streams[*pid]; ok {
rec.DID = hb.DID
if rec.HeartbeatStream == nil {
rec.HeartbeatStream = hb.Stream
rec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
}
rec.HeartbeatStream = hb.Stream
if rec.HeartbeatStream.UptimeTracker == nil {
rec.HeartbeatStream.UptimeTracker = &UptimeTracker{
FirstSeen: time.Now().UTC(),
LastSeen: time.Now().UTC(),
}
}
logger.Info().Msg("A new node is updated : " + pid.String())
} else {
hb.Stream.UptimeTracker = &UptimeTracker{
FirstSeen: time.Now().UTC(),
@@ -134,37 +181,51 @@ func (ix *LongLivedStreamRecordedService[T]) HandleNodeHeartbeat(s network.Strea
DID: hb.DID,
HeartbeatStream: hb.Stream,
}
logger.Info().Msg("A new node is subscribed : " + pid.String())
}
ix.StreamMU.Unlock()
// Let the indexer republish the embedded signed record to the DHT.
if ix.AfterHeartbeat != nil {
ix.AfterHeartbeat(*pid)
}
}
}
|
||||
|
||||
func CheckHeartbeat(h host.Host, s network.Stream, streams map[pp.ID]HeartBeatStreamed, lock *sync.RWMutex, maxNodes int) (*pp.ID, *Heartbeat, error) {
func CheckHeartbeat(h host.Host, s network.Stream, dec *json.Decoder, streams map[pp.ID]HeartBeatStreamed, lock *sync.RWMutex, maxNodes int) (*pp.ID, *Heartbeat, error) {
if len(h.Network().Peers()) >= maxNodes {
return nil, nil, fmt.Errorf("too many connections, try another indexer")
}
var hb Heartbeat
if err := json.NewDecoder(s).Decode(&hb); err != nil {
if err := dec.Decode(&hb); err != nil {
return nil, nil, err
}
if ok, bpms, err := getBandwidthChallengeRate(MinPayloadChallenge+int(rand.Float64()*(MaxPayloadChallenge-MinPayloadChallenge)), s); err != nil {
return nil, nil, err
} else if !ok {
return nil, nil, fmt.Errorf("Not a proper peer")
} else {
_, bpms, _ := getBandwidthChallengeRate(h, s.Conn().RemotePeer(), MinPayloadChallenge+int(rand.Float64()*(MaxPayloadChallenge-MinPayloadChallenge)))
{
pid, err := pp.Decode(hb.PeerID)
if err != nil {
return nil, nil, err
}
upTime := float64(0)
isFirstHeartbeat := true
lock.Lock()
if rec, ok := streams[pid]; ok && rec.GetUptimeTracker() != nil {
upTime = rec.GetUptimeTracker().Uptime().Hours() / float64(time.Since(TimeWatcher).Hours())
isFirstHeartbeat = false
}
lock.Unlock()
diversity := getDiversityRate(h, hb.IndexersBinded)
fmt.Println(upTime, bpms, diversity)
hb.ComputeIndexerScore(upTime, bpms, diversity)
if hb.Score < 75 {
// First heartbeat: uptime is always 0 so the score ceiling is 60, below the
// steady-state threshold of 75. Use a lower admission threshold so new peers
// can enter and start accumulating uptime. Subsequent heartbeats must meet
// the full threshold once uptime is tracked.
minScore := float64(50)
if isFirstHeartbeat {
minScore = 40
}
fmt.Println(hb.Score, minScore)
if hb.Score < minScore {
return nil, nil, errors.New("not enough trusting value")
}
hb.Stream = &Stream{
@@ -178,11 +239,13 @@ func CheckHeartbeat(h host.Host, s network.Stream, streams map[pp.ID]HeartBeatSt
}

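The new `CheckHeartbeat` signature takes a `dec *json.Decoder` and no longer calls `json.NewDecoder(s)` on every heartbeat. This matters because `json.Decoder` reads ahead into an internal buffer. A fresh decoder per message can swallow bytes that belong to the next heartbeat, and those bytes are lost when the decoder is discarded. The sketch below reproduces the effect. It assumes an in-memory `bytes.Reader` standing in for the libp2p stream and a trimmed-down `heartbeat` type standing in for the real `Heartbeat`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// heartbeat is a simplified stand-in for the real Heartbeat type.
type heartbeat struct {
	PeerID    string `json:"peer_id"`
	Timestamp int64  `json:"timestamp"`
}

// wire returns two heartbeats written back to back, as a sender would on one stream.
func wire() *bytes.Reader {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.Encode(heartbeat{PeerID: "A", Timestamp: 1})
	enc.Encode(heartbeat{PeerID: "A", Timestamp: 2})
	return bytes.NewReader(buf.Bytes())
}

// decodeShared reads n messages with ONE decoder reused across calls (new code).
func decodeShared(n int) ([]heartbeat, error) {
	dec := json.NewDecoder(wire())
	out := make([]heartbeat, 0, n)
	for i := 0; i < n; i++ {
		var hb heartbeat
		if err := dec.Decode(&hb); err != nil {
			return out, err
		}
		out = append(out, hb)
	}
	return out, nil
}

// decodeFresh creates a NEW decoder for every message (old code).
func decodeFresh(n int) ([]heartbeat, error) {
	r := wire()
	out := make([]heartbeat, 0, n)
	for i := 0; i < n; i++ {
		var hb heartbeat
		if err := json.NewDecoder(r).Decode(&hb); err != nil {
			return out, err
		}
		out = append(out, hb)
	}
	return out, nil
}

func main() {
	shared, err := decodeShared(2)
	fmt.Println("shared decoder:", len(shared), err)
	// The first decoder buffered both messages, so the second decoder sees EOF.
	fresh, err := decodeFresh(2)
	fmt.Println("fresh decoders:", len(fresh), err)
}
```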
func getDiversityRate(h host.Host, peers []string) float64 {

peers, _ = checkPeers(h, peers)
diverse := []string{}
for _, p := range peers {
ip, err := ExtractIP(p)
if err != nil {
fmt.Println("NO IP", p, err)
continue
}
div := ip.Mask(net.CIDRMask(24, 32)).String()
@@ -190,6 +253,9 @@ func getDiversityRate(h host.Host, peers []string) float64 {
diverse = append(diverse, div)
}
}
if len(diverse) == 0 || len(peers) == 0 {
return 1
}
return float64(len(diverse) / len(peers))
}

@@ -211,35 +277,42 @@ func checkPeers(h host.Host, peers []string) ([]string, []string) {
return concretePeer, ips
}

const MaxExpectedMbps = 50.0
const MaxExpectedMbps = 100.0
const MinPayloadChallenge = 512
const MaxPayloadChallenge = 2048
const BaseRoundTrip = 400 * time.Millisecond

func getBandwidthChallengeRate(payloadSize int, s network.Stream) (bool, float64, error) {
// Random payload generation
// getBandwidthChallengeRate opens a dedicated ProtocolBandwidthProbe stream to
// remotePeer, sends a random payload, reads the echo, and computes throughput.
// Using a separate stream avoids mixing binary data on the JSON heartbeat stream
// and ensures the echo handler is actually running on the remote side.
func getBandwidthChallengeRate(h host.Host, remotePeer pp.ID, payloadSize int) (bool, float64, error) {
payload := make([]byte, payloadSize)
_, err := cr.Read(payload)
if _, err := cr.Read(payload); err != nil {
return false, 0, err
}

ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
s, err := h.NewStream(ctx, remotePeer, ProtocolBandwidthProbe)
if err != nil {
return false, 0, err
}
defer s.Reset()
s.SetDeadline(time.Now().Add(10 * time.Second))
start := time.Now()
// send on heartbeat stream the challenge
if _, err = s.Write(payload); err != nil {
return false, 0, err
}
// read back
s.CloseWrite()
// Half-close the write side so the handler's io.Copy sees EOF and stops.
// Read the echo.
response := make([]byte, payloadSize)
_, err = io.ReadFull(s, response)
if err != nil {
if _, err = io.ReadFull(s, response); err != nil {
return false, 0, err
}

duration := time.Since(start)
// Verify content
if !bytes.Equal(payload, response) {
return false, 0, nil // problem, or a faulty peer.
}
maxRoundTrip := BaseRoundTrip + (time.Duration(payloadSize) * (100 * time.Millisecond))
mbps := float64(payloadSize*8) / duration.Seconds() / 1e6
if duration > maxRoundTrip || mbps < 5.0 {
@@ -345,13 +418,36 @@ var StaticIndexers map[string]*pp.AddrInfo = map[string]*pp.AddrInfo{}
var StreamMuIndexes sync.RWMutex
var StreamIndexers ProtocolStream = ProtocolStream{}

func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID) error {
// indexerHeartbeatNudge allows replenishIndexersFromNative to trigger an immediate
// heartbeat tick after adding new entries to StaticIndexers, without waiting up
// to 20s for the regular ticker. Buffered(1) so the sender never blocks.
var indexerHeartbeatNudge = make(chan struct{}, 1)

// NudgeIndexerHeartbeat signals the indexer heartbeat goroutine to fire immediately.
func NudgeIndexerHeartbeat() {
select {
case indexerHeartbeatNudge <- struct{}{}:
default: // nudge already pending, skip
}
}

func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID, recordFn ...func() json.RawMessage) error {
TimeWatcher = time.Now().UTC()
logger := oclib.GetLogger()

// If native addresses are configured, bypass static indexer addresses
// If native addresses are configured, get the indexer pool from the native mesh,
// then start the long-lived heartbeat goroutine toward those indexers.
if conf.GetConfig().NativeIndexerAddresses != "" {
return ConnectToNatives(h, minIndexer, maxIndexer, myPID)
if err := ConnectToNatives(h, minIndexer, maxIndexer, myPID); err != nil {
return err
}
// Step 2: start the long-lived heartbeat goroutine toward the indexer pool.
// replaceStaticIndexers/replenishIndexersFromNative update the map in-place
// so this single goroutine follows all pool changes automatically.
logger.Info().Msg("[native] step 2 — starting long-lived heartbeat to indexer pool")
SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name,
h, StreamIndexers, StaticIndexers, &StreamMuIndexes, 20*time.Second, recordFn...)
return nil
}

addresses := strings.Split(conf.GetConfig().IndexerAddresses, ",")
@@ -360,8 +456,8 @@ func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
addresses = addresses[0:maxIndexer]
}

StreamMuIndexes.Lock()
for _, indexerAddr := range addresses {
fmt.Println("GENERATE ADDR", indexerAddr)
ad, err := pp.AddrInfoFromString(indexerAddr)
if err != nil {
logger.Err(err)
@@ -369,15 +465,18 @@ func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
}
StaticIndexers[indexerAddr] = ad
}
indexerCount := len(StaticIndexers)
StreamMuIndexes.Unlock()

SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name, h, StreamIndexers, StaticIndexers, 20*time.Second) // your indexer is just like a node for the next indexer.
if len(StaticIndexers) < minIndexer {
SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name, h, StreamIndexers, StaticIndexers, &StreamMuIndexes, 20*time.Second, recordFn...) // your indexer is just like a node for the next indexer.
if indexerCount < minIndexer {
return errors.New("you run a node without indexers... your gonna be isolated.")
}
return nil
}

func AddStreamProtocol(ctx *context.Context, protoS ProtocolStream, h host.Host, proto protocol.ID, id pp.ID, mypid pp.ID, force bool, onStreamCreated *func(network.Stream)) ProtocolStream {
logger := oclib.GetLogger()
if onStreamCreated == nil {
f := func(s network.Stream) {
protoS[proto][id] = &Stream{
@@ -400,7 +499,7 @@ func AddStreamProtocol(ctx *context.Context, protoS ProtocolStream, h host.Host,
if protoS[proto][id] != nil {
protoS[proto][id].Expiry = time.Now().Add(2 * time.Minute)
} else {
fmt.Println("NEW STREAM", proto, id)
logger.Info().Msg("NEW STREAM Generated" + fmt.Sprintf("%v", proto) + " " + id.String())
s, err := h.NewStream(*ctx, id, proto)
if err != nil {
panic(err.Error())
@@ -419,12 +518,16 @@ type Heartbeat struct {
Timestamp int64 `json:"timestamp"`
IndexersBinded []string `json:"indexers_binded"`
Score float64
// Record carries a fresh signed PeerRecord (JSON) so the receiving indexer
// can republish it to the DHT without an extra round-trip.
// Only set by nodes (not indexers heartbeating other indexers).
Record json.RawMessage `json:"record,omitempty"`
}

func (hb *Heartbeat) ComputeIndexerScore(uptimeHours float64, bpms float64, diversity float64) {
hb.Score = (0.4 * uptimeHours) +
(0.4 * bpms) +
(0.2 * diversity)
hb.Score = ((0.3 * uptimeHours) +
(0.3 * bpms) +
(0.4 * diversity)) * 100
}

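The new `ComputeIndexerScore` weights uptime, bandwidth and diversity at 0.3 / 0.3 / 0.4 and scales the result to 0..100. `CheckHeartbeat` accepts any score of at least 75. Below 75, it accepts a score of at least 50, or at least 40 on a peer's first heartbeat. Assuming bandwidth and diversity are normalised to [0, 1], a first heartbeat (uptime 0) can reach at most 70 under the new weights. The sketch below reproduces the formula and the admission rule. `indexerScore` and `admitted` are hypothetical names.

```go
package main

import "fmt"

// indexerScore reproduces the new ComputeIndexerScore weighting, assuming all
// three inputs are already normalised to [0, 1].
func indexerScore(uptime, bpms, diversity float64) float64 {
	return ((0.3 * uptime) + (0.3 * bpms) + (0.4 * diversity)) * 100
}

// admitted applies CheckHeartbeat's thresholds: below 75 a peer is still
// accepted if it reaches 50, or 40 on its very first heartbeat.
func admitted(score float64, firstHeartbeat bool) bool {
	if score >= 75 {
		return true
	}
	minScore := 50.0
	if firstHeartbeat {
		minScore = 40
	}
	return score >= minScore
}

func main() {
	first := indexerScore(0, 0.5, 1) // no uptime yet: 15 + 40 = 55
	fmt.Printf("first heartbeat: score=%.1f admitted=%v\n", first, admitted(first, true))
	later := indexerScore(0.1, 0.2, 0.5) // 3 + 6 + 20 = 29
	fmt.Printf("later heartbeat: score=%.1f admitted=%v\n", later, admitted(later, false))
}
```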
type HeartbeatInfo []struct {
@@ -433,35 +536,214 @@ type HeartbeatInfo []struct {

const ProtocolHeartbeat = "/opencloud/heartbeat/1.0"

func SendHeartbeat(ctx context.Context, proto protocol.ID, name string, h host.Host, ps ProtocolStream, peers map[string]*pp.AddrInfo, interval time.Duration) {
peerID, err := oclib.GenerateNodeID()
if err == nil {
panic("can't heartbeat daemon failed to start")
// ProtocolBandwidthProbe is a dedicated short-lived stream used exclusively
// for bandwidth/latency measurement. The handler echoes any bytes it receives.
// All nodes and indexers register this handler so peers can measure them.
const ProtocolBandwidthProbe = "/opencloud/probe/1.0"

// HandleBandwidthProbe echoes back everything written on the stream, then closes.
// It is registered by all participants so the measuring side (the heartbeat receiver)
// can open a dedicated probe stream and read the round-trip latency + throughput.
func HandleBandwidthProbe(s network.Stream) {
defer s.Close()
s.SetDeadline(time.Now().Add(10 * time.Second))
io.Copy(s, s) // echo every byte back to the sender
}

// SendHeartbeat starts a goroutine that sends periodic heartbeats to peers.
// recordFn, when provided, is called on each tick and its output is embedded in
// the heartbeat as a fresh signed PeerRecord so the receiving indexer can
// republish it to the DHT without an extra round-trip.
// Pass no recordFn (or nil) for indexer→indexer / native heartbeats.
func SendHeartbeat(ctx context.Context, proto protocol.ID, name string, h host.Host, ps ProtocolStream, peers map[string]*pp.AddrInfo, mu *sync.RWMutex, interval time.Duration, recordFn ...func() json.RawMessage) {
logger := oclib.GetLogger()
// isIndexerHB is true when this goroutine drives the indexer heartbeat.
// isNativeHB is true when it drives the native heartbeat.
isIndexerHB := mu == &StreamMuIndexes
isNativeHB := mu == &StreamNativeMu
var recFn func() json.RawMessage
if len(recordFn) > 0 {
recFn = recordFn[0]
}
go func() {
logger.Info().Str("proto", string(proto)).Int("peers", len(peers)).Msg("heartbeat started")
t := time.NewTicker(interval)
defer t.Stop()
for {
select {
case <-t.C:
addrs := []string{}

// doTick sends one round of heartbeats to the current peer snapshot.
doTick := func() {
// Build the heartbeat payload — snapshot current indexer addresses.
StreamMuIndexes.RLock()
addrs := make([]string, 0, len(StaticIndexers))
for addr := range StaticIndexers {
addrs = append(addrs, addr)
}
StreamMuIndexes.RUnlock()
hb := Heartbeat{
Name: name,
DID: peerID,
PeerID: h.ID().String(),
Timestamp: time.Now().UTC().Unix(),
IndexersBinded: addrs,
}
if recFn != nil {
hb.Record = recFn()
}

// Snapshot the peer list under a read lock so we don't hold the
// write lock during network I/O.
if mu != nil {
mu.RLock()
}
snapshot := make([]*pp.AddrInfo, 0, len(peers))
for _, ix := range peers {
if err = sendHeartbeat(ctx, h, proto, ix, hb, ps, interval*time.Second); err != nil {
snapshot = append(snapshot, ix)
}
if mu != nil {
mu.RUnlock()
}

for _, ix := range snapshot {
wasConnected := h.Network().Connectedness(ix.ID) == network.Connected
if err := sendHeartbeat(ctx, h, proto, ix, hb, ps, interval*time.Second); err != nil {
// Step 3: heartbeat failed — remove from pool and trigger replenish.
logger.Info().Str("peer", ix.ID.String()).Str("proto", string(proto)).Msg("[native] step 3 — heartbeat failed, removing peer from pool")

// Remove the dead peer and clean up its stream.
// mu already covers ps when isIndexerHB (same mutex), so one
// lock acquisition is sufficient — no re-entrant double-lock.
if mu != nil {
mu.Lock()
}
if ps[proto] != nil {
if s, ok := ps[proto][ix.ID]; ok {
if s.Stream != nil {
s.Stream.Close()
}
delete(ps[proto], ix.ID)
}
}
lostAddr := ""
for addr, ad := range peers {
if ad.ID == ix.ID {
lostAddr = addr
delete(peers, addr)
break
}
}
need := conf.GetConfig().MinIndexer - len(peers)
remaining := len(peers)
if mu != nil {
mu.Unlock()
}
logger.Info().Int("remaining", remaining).Int("min", conf.GetConfig().MinIndexer).Int("need", need).Msg("[native] step 3 — pool state after removal")

// Step 4: ask the native for the missing indexer count.
if isIndexerHB && conf.GetConfig().NativeIndexerAddresses != "" {
if need < 1 {
need = 1
}
logger.Info().Int("need", need).Msg("[native] step 3→4 — triggering replenish")
go replenishIndexersFromNative(h, need)
}

// Native heartbeat failed — find a replacement native.
// Case 1: if the dead native was also serving as an indexer, evict it
// from StaticIndexers immediately without waiting for the indexer HB tick.
if isNativeHB {
logger.Info().Str("addr", lostAddr).Msg("[native] step 3 — native heartbeat failed, triggering native replenish")
if lostAddr != "" && conf.GetConfig().NativeIndexerAddresses != "" {
StreamMuIndexes.Lock()
delete(StreamIndexers[proto], ix.ID)
if _, wasIndexer := StaticIndexers[lostAddr]; wasIndexer {
delete(StaticIndexers, lostAddr)
if s := StreamIndexers[ProtocolHeartbeat]; s != nil {
if stream, ok := s[ix.ID]; ok {
if stream.Stream != nil {
stream.Stream.Close()
}
delete(s, ix.ID)
}
}
idxNeed := conf.GetConfig().MinIndexer - len(StaticIndexers)
StreamMuIndexes.Unlock()
if idxNeed < 1 {
idxNeed = 1
}
logger.Info().Str("addr", lostAddr).Msg("[native] dead native evicted from indexer pool, triggering replenish")
go replenishIndexersFromNative(h, idxNeed)
} else {
StreamMuIndexes.Unlock()
}
}
go replenishNativesFromPeers(h, lostAddr, proto)
}
} else {
// Case 2: native-as-indexer reconnected after a restart.
// If the peer was disconnected before this tick and the heartbeat just
// succeeded (transparent reconnect), the native may have restarted with
// blank state (responsiblePeers empty). Evict it from StaticIndexers and
// re-request an assignment so the native re-tracks us properly and
// runOffloadLoop can eventually migrate us to real indexers.
if !wasConnected && isIndexerHB && conf.GetConfig().NativeIndexerAddresses != "" {
StreamNativeMu.RLock()
isNativeIndexer := false
for _, ad := range StaticNatives {
if ad.ID == ix.ID {
isNativeIndexer = true
break
}
}
StreamNativeMu.RUnlock()
if isNativeIndexer {
if mu != nil {
mu.Lock()
}
if ps[proto] != nil {
if s, ok := ps[proto][ix.ID]; ok {
if s.Stream != nil {
s.Stream.Close()
}
delete(ps[proto], ix.ID)
}
}
reconnectedAddr := ""
for addr, ad := range peers {
if ad.ID == ix.ID {
reconnectedAddr = addr
delete(peers, addr)
break
}
}
idxNeed := conf.GetConfig().MinIndexer - len(peers)
if mu != nil {
mu.Unlock()
}
if idxNeed < 1 {
idxNeed = 1
}
logger.Info().Str("addr", reconnectedAddr).Str("peer", ix.ID.String()).Msg(
"[native] native-as-indexer reconnected after restart — evicting and re-requesting assignment")
go replenishIndexersFromNative(h, idxNeed)
}
}
logger.Debug().Str("peer", ix.ID.String()).Str("proto", string(proto)).Msg("[native] step 2 — heartbeat sent ok")
}
}
}

for {
select {
case <-t.C:
doTick()
case <-indexerHeartbeatNudge:
if isIndexerHB {
logger.Info().Msg("[native] step 2 — nudge received, heartbeating new indexers immediately")
doTick()
}
case <-nativeHeartbeatNudge:
if isNativeHB {
logger.Info().Msg("[native] native nudge received, heartbeating replacement native immediately")
doTick()
}
case <-ctx.Done():
return
}
@@ -480,7 +762,7 @@ func TempStream(h host.Host, ad pp.AddrInfo, proto protocol.ID, did string, stre
if pts[proto] != nil {
expiry = pts[proto].TTL
}
if ctxTTL, err := context.WithTimeout(context.Background(), expiry); err == nil {
ctxTTL, _ := context.WithTimeout(context.Background(), expiry)
if h.Network().Connectedness(ad.ID) != network.Connected {
if err := h.Connect(ctxTTL, ad); err != nil {
return streams, err
@@ -496,10 +778,11 @@ func TempStream(h host.Host, ad pp.AddrInfo, proto protocol.ID, did string, stre
mu.Unlock()
time.AfterFunc(expiry, func() {
mu.Lock()
defer mu.Unlock()
delete(streams[proto], ad.ID)
mu.Unlock()
})
streams[ProtocolPublish][ad.ID] = &Stream{
mu.Lock()
streams[proto][ad.ID] = &Stream{
DID: did,
Stream: s,
Expiry: time.Now().UTC().Add(expiry),
@@ -509,29 +792,32 @@ func TempStream(h host.Host, ad pp.AddrInfo, proto protocol.ID, did string, stre
} else {
return streams, err
}
}
return streams, errors.New("can't create a context")
}

func sendHeartbeat(ctx context.Context, h host.Host, proto protocol.ID, p *pp.AddrInfo,
hb Heartbeat, ps ProtocolStream, interval time.Duration) error {
streams := ps.Get(proto)
if len(streams) == 0 {
return errors.New("no stream for protocol heartbeat founded")
logger := oclib.GetLogger()
if ps[proto] == nil {
ps[proto] = map[pp.ID]*Stream{}
}
streams := ps[proto]
pss, exists := streams[p.ID]
ctxTTL, _ := context.WithTimeout(ctx, 3*interval)
ctxTTL, cancel := context.WithTimeout(ctx, 3*interval)
defer cancel()
// Connect if needed
if h.Network().Connectedness(p.ID) != network.Connected {
if err := h.Connect(ctxTTL, *p); err != nil {
logger.Err(err)
return err
}
exists = false // the stream will have to be recreated
}
// Create the stream if it does not exist or is closed
if !exists || pss.Stream == nil {
logger.Info().Msg("New Stream engaged as Heartbeat " + fmt.Sprintf("%v", proto) + " " + p.ID.String())
s, err := h.NewStream(ctx, p.ID, proto)
if err != nil {
logger.Err(err)
return err
}
pss = &Stream{

@@ -13,6 +13,7 @@ import (
oclib "cloud.o-forge.io/core/oc-lib"
"github.com/libp2p/go-libp2p/core/host"
pp "github.com/libp2p/go-libp2p/core/peer"
"github.com/libp2p/go-libp2p/core/protocol"
)

const (
@@ -57,6 +58,7 @@ type IndexerRegistration struct {
// GetIndexersRequest asks a native for a pool of live indexers.
type GetIndexersRequest struct {
Count int `json:"count"`
From string `json:"from"`
}

// GetIndexersResponse is returned by the native with live indexer multiaddrs.
@@ -69,17 +71,26 @@ var StaticNatives = map[string]*pp.AddrInfo{}
var StreamNativeMu sync.RWMutex
var StreamNatives ProtocolStream = ProtocolStream{}

// ConnectToNatives is the client-side entry point for nodes/indexers that have
// NativeIndexerAddresses configured. It:
// 1. Connects (long-lived heartbeat) to all configured natives.
// 2. Fetches an initial indexer pool from the FIRST responsive native.
// 3. Challenges that pool to ALL natives (consensus round 1).
// 4. If the confirmed list is short, samples native suggestions and re-challenges (round 2).
// 5. Populates StaticIndexers with majority-confirmed indexers.
// nativeHeartbeatOnce ensures we start exactly one long-lived heartbeat goroutine
// toward the native mesh, even when ConnectToNatives is called from recovery paths.
var nativeHeartbeatOnce sync.Once

// nativeMeshHeartbeatOnce guards the native-to-native heartbeat goroutine started
// by EnsureNativePeers so only one goroutine covers the whole StaticNatives map.
var nativeMeshHeartbeatOnce sync.Once

// ConnectToNatives is the initial setup for nodes/indexers in native mode:
// 1. Parses native addresses → StaticNatives.
// 2. Starts a single long-lived heartbeat goroutine toward the native mesh.
// 3. Fetches an initial indexer pool from the first responsive native.
// 4. Runs consensus when real (non-fallback) indexers are returned.
// 5. Replaces StaticIndexers with the confirmed pool.
func ConnectToNatives(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID) error {
logger := oclib.GetLogger()
logger.Info().Msg("[native] step 1 — parsing native addresses")

// Parse in config order: the first entry is the primary pool source.
// Parse native addresses — safe to call multiple times.
StreamNativeMu.Lock()
orderedAddrs := []string{}
for _, addr := range strings.Split(conf.GetConfig().NativeIndexerAddresses, ",") {
addr = strings.TrimSpace(addr)
@@ -88,106 +99,208 @@ func ConnectToNatives(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID)
}
ad, err := pp.AddrInfoFromString(addr)
if err != nil {
logger.Err(err).Msg("ConnectToNatives: invalid addr")
logger.Err(err).Msg("[native] step 1 — invalid native addr")
continue
}
StaticNatives[addr] = ad
orderedAddrs = append(orderedAddrs, addr)
logger.Info().Str("addr", addr).Msg("[native] step 1 — native registered")
}
if len(StaticNatives) == 0 {
StreamNativeMu.Unlock()
return errors.New("no valid native addresses configured")
}
StreamNativeMu.Unlock()
logger.Info().Int("count", len(orderedAddrs)).Msg("[native] step 1 — natives parsed")

// Long-lived heartbeat connections to keep the native mesh active.
// Step 1: one long-lived heartbeat to each native.
nativeHeartbeatOnce.Do(func() {
logger.Info().Msg("[native] step 1 — starting long-lived heartbeat to native mesh")
SendHeartbeat(context.Background(), ProtocolHeartbeat,
conf.GetConfig().Name, h, StreamNatives, StaticNatives, 20*time.Second)

// Step 1: get an initial pool from the FIRST responsive native (in config order).
var candidates []string
var isFallback bool
for _, addr := range orderedAddrs {
ad := StaticNatives[addr]
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
if err := h.Connect(ctx, *ad); err != nil {
cancel()
continue
}
s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetIndexers)
cancel()
if err != nil {
continue
}
req := GetIndexersRequest{Count: maxIndexer}
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
s.Close()
continue
}
var resp GetIndexersResponse
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
s.Close()
continue
}
s.Close()
candidates = resp.Indexers
isFallback = resp.IsSelfFallback
break // first responsive native only
}
conf.GetConfig().Name, h, StreamNatives, StaticNatives, &StreamNativeMu, 20*time.Second)
})

// Fetch initial pool from the first responsive native.
logger.Info().Int("want", maxIndexer).Msg("[native] step 1 — fetching indexer pool from native")
candidates, isFallback := fetchIndexersFromNative(h, orderedAddrs, maxIndexer)
if len(candidates) == 0 {
logger.Warn().Msg("[native] step 1 — no candidates returned by any native")
if minIndexer > 0 {
return errors.New("ConnectToNatives: no indexers available from any native")
}
return nil
}
logger.Info().Int("candidates", len(candidates)).Bool("fallback", isFallback).Msg("[native] step 1 — pool received")

// If the native is already the fallback indexer, use it directly — no consensus needed.
// Step 2: populate StaticIndexers — consensus for real indexers, direct for fallback.
pool := resolvePool(h, candidates, isFallback, maxIndexer)
replaceStaticIndexers(pool)

StreamMuIndexes.RLock()
indexerCount := len(StaticIndexers)
StreamMuIndexes.RUnlock()
logger.Info().Int("pool_size", indexerCount).Msg("[native] step 2 — StaticIndexers replaced")

if minIndexer > 0 && indexerCount < minIndexer {
return errors.New("not enough majority-confirmed indexers available")
}
return nil
}

// replenishIndexersFromNative is called when an indexer heartbeat fails (step 3→4).
// It asks the native for exactly `need` replacement indexers, runs consensus when
// real indexers are returned, and adds the results to StaticIndexers without
// clearing the existing pool.
func replenishIndexersFromNative(h host.Host, need int) {
if need <= 0 {
return
}
logger := oclib.GetLogger()
logger.Info().Int("need", need).Msg("[native] step 4 — replenishing indexer pool from native")

StreamNativeMu.RLock()
addrs := make([]string, 0, len(StaticNatives))
for addr := range StaticNatives {
addrs = append(addrs, addr)
}
StreamNativeMu.RUnlock()

candidates, isFallback := fetchIndexersFromNative(h, addrs, need)
if len(candidates) == 0 {
logger.Warn().Msg("[native] step 4 — no candidates returned by any native")
return
}
logger.Info().Int("candidates", len(candidates)).Bool("fallback", isFallback).Msg("[native] step 4 — candidates received")

pool := resolvePool(h, candidates, isFallback, need)
if len(pool) == 0 {
logger.Warn().Msg("[native] step 4 — consensus yielded no confirmed indexers")
return
}

// Add new indexers to the pool — do NOT clear existing ones.
StreamMuIndexes.Lock()
for addr, ad := range pool {
StaticIndexers[addr] = ad
}
total := len(StaticIndexers)

StreamMuIndexes.Unlock()
logger.Info().Int("added", len(pool)).Int("total", total).Msg("[native] step 4 — pool replenished")

// Nudge the heartbeat goroutine to connect immediately instead of waiting
// for the next 20s tick.
NudgeIndexerHeartbeat()
logger.Info().Msg("[native] step 4 — heartbeat goroutine nudged")
}

// fetchIndexersFromNative opens a ProtocolNativeGetIndexers stream to the first
// responsive native and returns the candidate list and fallback flag.
func fetchIndexersFromNative(h host.Host, nativeAddrs []string, count int) (candidates []string, isFallback bool) {
logger := oclib.GetLogger()
for _, addr := range nativeAddrs {
ad, err := pp.AddrInfoFromString(addr)
if err != nil {
logger.Warn().Str("addr", addr).Msg("[native] fetch — skipping invalid addr")
continue
}
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
if err := h.Connect(ctx, *ad); err != nil {
cancel()
logger.Warn().Str("addr", addr).Err(err).Msg("[native] fetch — connect failed")
continue
}
s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetIndexers)
cancel()
if err != nil {
logger.Warn().Str("addr", addr).Err(err).Msg("[native] fetch — stream open failed")
continue
}
req := GetIndexersRequest{Count: count, From: h.ID().String()}
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
s.Close()
logger.Warn().Str("addr", addr).Err(encErr).Msg("[native] fetch — encode request failed")
continue
}
var resp GetIndexersResponse
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
s.Close()
logger.Warn().Str("addr", addr).Err(decErr).Msg("[native] fetch — decode response failed")
continue
}
s.Close()
logger.Info().Str("native", addr).Int("indexers", len(resp.Indexers)).Bool("fallback", resp.IsSelfFallback).Msg("[native] fetch — response received")
return resp.Indexers, resp.IsSelfFallback
}
logger.Warn().Msg("[native] fetch — no native responded")
return nil, false
}

// resolvePool converts a candidate list to a validated addr→AddrInfo map.
// When isFallback is true the native itself is the indexer — no consensus needed.
// When isFallback is false, consensus is run before accepting the candidates.
func resolvePool(h host.Host, candidates []string, isFallback bool, maxIndexer int) map[string]*pp.AddrInfo {
logger := oclib.GetLogger()
if isFallback {
logger.Info().Strs("addrs", candidates).Msg("[native] resolve — fallback mode, skipping consensus")
pool := make(map[string]*pp.AddrInfo, len(candidates))
for _, addr := range candidates {
ad, err := pp.AddrInfoFromString(addr)
if err != nil {
continue
}
StaticIndexers[addr] = ad
pool[addr] = ad
}
return nil
return pool
}

// Step 2: challenge the pool to ALL configured natives and score by majority vote.
// Round 1.
logger.Info().Int("candidates", len(candidates)).Msg("[native] resolve — consensus round 1")
confirmed, suggestions := clientSideConsensus(h, candidates)
logger.Info().Int("confirmed", len(confirmed)).Int("suggestions", len(suggestions)).Msg("[native] resolve — consensus round 1 done")

// Step 3: if we still have gaps, sample from suggestions and re-challenge.
// Round 2: fill gaps from suggestions if below target.
if len(confirmed) < maxIndexer && len(suggestions) > 0 {
rand.Shuffle(len(suggestions), func(i, j int) { suggestions[i], suggestions[j] = suggestions[j], suggestions[i] })
gap := maxIndexer - len(confirmed)
if gap > len(suggestions) {
gap = len(suggestions)
}
logger.Info().Int("gap", gap).Msg("[native] resolve — consensus round 2 (filling gaps)")
confirmed2, _ := clientSideConsensus(h, append(confirmed, suggestions[:gap]...))
if len(confirmed2) > 0 {
confirmed = confirmed2
}
logger.Info().Int("confirmed", len(confirmed)).Msg("[native] resolve — consensus round 2 done")
}

// Step 4: populate StaticIndexers with confirmed addresses.
pool := make(map[string]*pp.AddrInfo, len(confirmed))
for _, addr := range confirmed {
ad, err := pp.AddrInfoFromString(addr)
if err != nil {
continue
}
pool[addr] = ad
|
||||
}
|
||||
logger.Info().Int("pool_size", len(pool)).Msg("[native] resolve — pool ready")
|
||||
return pool
|
||||
}
|
||||
|
||||
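Round 2 of resolvePool tops the pool up from the natives' suggestions. The helper below isolates that step as a pure function. `fillGap` is a hypothetical name. Unlike the excerpt, which shuffles `suggestions` in place and appends to `confirmed` directly, it copies both inputs.

```go
package main

import "math/rand"

// fillGap sketches round 2 of resolvePool: when fewer than maxIndexer
// candidates were confirmed, shuffle the suggestions and return the confirmed
// list extended with up to (maxIndexer - len(confirmed)) random suggestions,
// ready to be re-challenged. Both inputs are copied, so the caller's slices
// are neither reordered nor aliased.
func fillGap(confirmed, suggestions []string, maxIndexer int) []string {
	out := append([]string(nil), confirmed...)
	if len(confirmed) >= maxIndexer || len(suggestions) == 0 {
		return out
	}
	pool := append([]string(nil), suggestions...)
	rand.Shuffle(len(pool), func(i, j int) { pool[i], pool[j] = pool[j], pool[i] })
	gap := maxIndexer - len(confirmed)
	if gap > len(pool) {
		gap = len(pool)
	}
	return append(out, pool[:gap]...)
}
```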
// replaceStaticIndexers atomically replaces the active indexer pool.
// Peers no longer in next have their heartbeat streams closed so the SendHeartbeat
// goroutine stops sending to them on the next tick.
func replaceStaticIndexers(next map[string]*pp.AddrInfo) {
	StreamMuIndexes.Lock()
	defer StreamMuIndexes.Unlock()
	for addr, ad := range next {
		StaticIndexers[addr] = ad
	}
}

// clientSideConsensus challenges a candidate list to ALL configured native peers
// in parallel. Each native replies with the candidates it trusts plus extras it
// recommends. An indexer is confirmed when strictly more than 50% of responding
// natives trust it. The remaining addresses from native suggestions are returned
// as suggestions for a possible second round.
func clientSideConsensus(h host.Host, candidates []string) (confirmed []string, suggestions []string) {
	if len(candidates) == 0 {
		return nil, nil
@@ -201,7 +314,6 @@ func clientSideConsensus(h host.Host, candidates []string) (confirmed []string,
	StreamNativeMu.RUnlock()

	if len(peers) == 0 {
		// No natives to challenge: trust candidates as-is.
		return candidates, nil
	}

@@ -239,13 +351,12 @@ func clientSideConsensus(h host.Host, candidates []string) (confirmed []string,
		}(ad)
	}

	// Collect responses up to consensusCollectTimeout.
	timer := time.NewTimer(consensusCollectTimeout)
	defer timer.Stop()

	trustedCounts := map[string]int{}
	suggestionPool := map[string]struct{}{}
	total := 0 // counts only natives that actually responded
	collected := 0

collect:
@@ -254,7 +365,7 @@ collect:
		case r := <-ch:
			collected++
			if !r.responded {
				continue // timeout / error: skip, do not count as vote
			}
			total++
			seen := map[string]struct{}{}
@@ -273,13 +384,12 @@ collect:
	}

	if total == 0 {
		// No native responded: fall back to trusting the candidates as-is.
		return candidates, nil
	}

	confirmedSet := map[string]struct{}{}
	for addr, count := range trustedCounts {
		if count*2 > total { // strictly >50%
			confirmed = append(confirmed, addr)
			confirmedSet[addr] = struct{}{}
		}
@@ -292,15 +402,17 @@ collect:
	return
}

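The majority rule in clientSideConsensus counts each responding native once per candidate. A candidate is confirmed only when `count*2 > total`, so a tie is not a majority and silent natives do not dilute the vote. This sketch reproduces just the tally. `confirmByMajority` is hypothetical. Unlike clientSideConsensus, it returns nothing when no native responded, rather than the raw candidates.

```go
package main

import "sort"

// confirmByMajority mirrors the vote count in clientSideConsensus: each
// element of votes is the list of candidates one responding native trusts.
// A candidate is confirmed when strictly more than half of the responding
// natives trust it. Natives that did not respond are simply absent from votes.
func confirmByMajority(votes [][]string) []string {
	total := len(votes)
	counts := map[string]int{}
	for _, trusted := range votes {
		seen := map[string]struct{}{} // one vote per native per candidate
		for _, addr := range trusted {
			if _, dup := seen[addr]; dup {
				continue
			}
			seen[addr] = struct{}{}
			counts[addr]++
		}
	}
	var confirmed []string
	for addr, c := range counts {
		if c*2 > total { // strictly >50%
			confirmed = append(confirmed, addr)
		}
	}
	sort.Strings(confirmed) // map iteration order is random; sort for stable output
	return confirmed
}
```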
const ProtocolIndexerHeartbeat = "/opencloud/heartbeat/indexer/1.0"

// RegisterWithNative sends a one-shot registration to each configured native indexer.
// It should be called every RecommendedHeartbeatInterval.
func RegisterWithNative(h host.Host, nativeAddressesStr string) {
	logger := oclib.GetLogger()
	// Advertise the last listen address unless it is loopback. The length check
	// keeps the index expression from panicking when no address is known yet.
	myAddr := ""
	if addrs := h.Addrs(); len(addrs) > 0 && !strings.Contains(addrs[len(addrs)-1].String(), "127.0.0.1") {
		myAddr = addrs[len(addrs)-1].String() + "/p2p/" + h.ID().String()
	}
	if myAddr == "" {
		logger.Warn().Msg("RegisterWithNative: no routable address yet, skipping")
		return
	}
	reg := IndexerRegistration{
		PeerID: h.ID().String(),
@@ -334,16 +446,16 @@ func RegisterWithNative(h host.Host, nativeAddressesStr string) {
	}
}

// EnsureNativePeers populates StaticNatives from config and starts a single
// heartbeat goroutine toward the native mesh. Safe to call multiple times;
// the heartbeat goroutine is started at most once (nativeMeshHeartbeatOnce).
func EnsureNativePeers(h host.Host) {
	logger := oclib.GetLogger()
	nativeAddrs := conf.GetConfig().NativeIndexerAddresses
	if nativeAddrs == "" {
		return
	}
	StreamNativeMu.Lock()
	for _, addr := range strings.Split(nativeAddrs, ",") {
		addr = strings.TrimSpace(addr)
		if addr == "" {
@@ -354,11 +466,312 @@ func EnsureNativePeers(h host.Host) {
			continue
		}
		StaticNatives[addr] = ad
		logger.Info().Str("addr", addr).Msg("native: registered peer in native mesh")
	}
	StreamNativeMu.Unlock()
	// One heartbeat goroutine iterates over all of StaticNatives on each tick;
	// starting one per address would multiply heartbeats by the native count.
	nativeMeshHeartbeatOnce.Do(func() {
		logger.Info().Msg("native: starting mesh heartbeat goroutine")
		SendHeartbeat(context.Background(), ProtocolHeartbeat,
			conf.GetConfig().Name, h, StreamNatives, StaticNatives, &StreamNativeMu, 20*time.Second)
	})
}

// StartNativeRegistration registers this indexer with the natives listed in
// nativeAddressesStr once a routable address is available, then re-registers
// every RecommendedHeartbeatInterval from a background goroutine.
func StartNativeRegistration(h host.Host, nativeAddressesStr string) {
	go func() {
		// Poll until a routable (non-loopback) address is available before the first
		// registration attempt. libp2p may not have discovered external addresses yet
		// at startup. Cap at 12 retries (~1 minute) so we don't spin indefinitely.
		for i := 0; i < 12; i++ {
			if addrs := h.Addrs(); len(addrs) > 0 && !strings.Contains(addrs[len(addrs)-1].String(), "127.0.0.1") {
				break
			}
			time.Sleep(5 * time.Second)
		}
		RegisterWithNative(h, nativeAddressesStr)
		t := time.NewTicker(RecommendedHeartbeatInterval)
		defer t.Stop()
		for range t.C {
			RegisterWithNative(h, nativeAddressesStr)
		}
	}()
}

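RegisterWithNative and StartNativeRegistration both advertise the last listen address unless it is loopback. The hypothetical helper below applies the same rule to plain strings instead of multiaddrs, including the empty-list guard. It also shows a limit of the substring test: as written, the test only recognises `127.0.0.1`, not `::1` or other `127.0.0.0/8` addresses.

```go
package main

import "strings"

// pickAdvertisedAddr mirrors the address choice in RegisterWithNative: use the
// last listen address unless it is loopback, appending /p2p/<peerID>, and
// return "" when nothing usable exists (including an empty address list).
func pickAdvertisedAddr(addrs []string, peerID string) string {
	if len(addrs) == 0 {
		return ""
	}
	last := addrs[len(addrs)-1]
	if strings.Contains(last, "127.0.0.1") {
		return ""
	}
	return last + "/p2p/" + peerID
}
```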
// ── Lost-native replacement ───────────────────────────────────────────────────

const (
	// ProtocolNativeGetPeers lets a node/indexer ask a native for a random
	// selection of that native's own native contacts (to replace a dead native).
	ProtocolNativeGetPeers = "/opencloud/native/peers/1.0"
	// ProtocolIndexerGetNatives lets nodes/indexers ask a connected indexer for
	// its configured native addresses (fallback when no alive native responds).
	ProtocolIndexerGetNatives = "/opencloud/indexer/natives/1.0"
	// retryNativeInterval is how often retryLostNative polls a dead native.
	retryNativeInterval = 30 * time.Second
)

// GetNativePeersRequest is sent to a native to ask for its known native contacts.
type GetNativePeersRequest struct {
	Exclude []string `json:"exclude"`
	Count   int      `json:"count"`
}

// GetNativePeersResponse carries native addresses returned by a native's peer list.
type GetNativePeersResponse struct {
	Peers []string `json:"peers"`
}

// GetIndexerNativesRequest is sent to an indexer to ask for its configured native addresses.
type GetIndexerNativesRequest struct {
	Exclude []string `json:"exclude"`
}

// GetIndexerNativesResponse carries native addresses returned by an indexer.
type GetIndexerNativesResponse struct {
	Natives []string `json:"natives"`
}

// nativeHeartbeatNudge allows replenishNativesFromPeers to trigger an immediate
// native heartbeat tick after adding a replacement native to the pool.
var nativeHeartbeatNudge = make(chan struct{}, 1)

// NudgeNativeHeartbeat signals the native heartbeat goroutine to fire immediately.
func NudgeNativeHeartbeat() {
	select {
	case nativeHeartbeatNudge <- struct{}{}:
	default: // nudge already pending, skip
	}
}

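`nativeHeartbeatNudge` is a capacity-1 channel written with a non-blocking send. Any burst of nudges therefore collapses into one pending wake-up, and NudgeNativeHeartbeat never blocks. The consumer is not part of this excerpt. `heartbeatLoop` below is a hypothetical illustration of how a heartbeat goroutine could select on the nudge alongside its ticker.

```go
package main

import "time"

// nudge mirrors nativeHeartbeatNudge: capacity 1, so at most one wake-up can
// be pending at any time.
var nudge = make(chan struct{}, 1)

// nudgeNow mirrors NudgeNativeHeartbeat: a non-blocking send that is dropped
// when a nudge is already pending.
func nudgeNow() {
	select {
	case nudge <- struct{}{}:
	default:
	}
}

// heartbeatLoop is a hypothetical consumer: it runs tick on every ticker fire
// or pending nudge, and returns once done is closed.
func heartbeatLoop(ticks <-chan time.Time, done <-chan struct{}, tick func()) {
	for {
		select {
		case <-ticks:
			tick()
		case <-nudge:
			tick()
		case <-done:
			return
		}
	}
}
```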
// replenishIndexersIfNeeded checks whether the indexer pool is below the configured
// minimum (or empty) and, if so, asks the native mesh for replacements.
// Called whenever a native is recovered so the indexer pool is restored.
func replenishIndexersIfNeeded(h host.Host) {
	logger := oclib.GetLogger()
	minIdx := conf.GetConfig().MinIndexer
	if minIdx < 1 {
		minIdx = 1
	}
	StreamMuIndexes.RLock()
	indexerCount := len(StaticIndexers)
	StreamMuIndexes.RUnlock()
	if indexerCount < minIdx {
		need := minIdx - indexerCount
		logger.Info().Int("need", need).Int("current", indexerCount).Msg("[native] native recovered — replenishing indexer pool")
		go replenishIndexersFromNative(h, need)
	}
}

// replenishNativesFromPeers is called when the heartbeat to a native fails.
// Flow:
//  1. Ask other alive natives for one of their native contacts (ProtocolNativeGetPeers).
//  2. If none respond or return a new address, ask connected indexers (ProtocolIndexerGetNatives).
//  3. If no replacement is found:
//     - remaining > 1: ignore the loss (enough natives remain);
//     - remaining ≤ 1: start a periodic retry (retryLostNative).
func replenishNativesFromPeers(h host.Host, lostAddr string, proto protocol.ID) {
	if lostAddr == "" {
		return
	}
	logger := oclib.GetLogger()
	logger.Info().Str("lost", lostAddr).Msg("[native] replenish natives — start")

	// Build the exclude list: the lost addr plus all currently alive natives.
	// lostAddr has already been removed from StaticNatives by doTick.
	StreamNativeMu.RLock()
	remaining := len(StaticNatives)
	exclude := make([]string, 0, remaining+1)
	exclude = append(exclude, lostAddr)
	for addr := range StaticNatives {
		exclude = append(exclude, addr)
	}
	StreamNativeMu.RUnlock()

	logger.Info().Int("remaining", remaining).Msg("[native] replenish natives — step 1: ask alive natives for a peer")

	// Step 1: ask other alive natives for a replacement.
	newAddr := fetchNativeFromNatives(h, exclude)

	// Step 2 (fallback): ask connected indexers for their native addresses.
	if newAddr == "" {
		logger.Info().Msg("[native] replenish natives — step 2: ask indexers for their native addresses")
		newAddr = fetchNativeFromIndexers(h, exclude)
	}

	if newAddr != "" {
		ad, err := pp.AddrInfoFromString(newAddr)
		if err == nil {
			StreamNativeMu.Lock()
			StaticNatives[newAddr] = ad
			StreamNativeMu.Unlock()
			logger.Info().Str("new", newAddr).Msg("[native] replenish natives — replacement added, nudging heartbeat")
			NudgeNativeHeartbeat()
			replenishIndexersIfNeeded(h)
			return
		}
	}

	// Step 3: no replacement found.
	logger.Warn().Int("remaining", remaining).Msg("[native] replenish natives — no replacement found")
	if remaining > 1 {
		logger.Info().Msg("[native] replenish natives — enough natives remain, ignoring loss")
		return
	}
	// Last (or only) native: retry periodically.
	logger.Info().Str("addr", lostAddr).Msg("[native] replenish natives — last native lost, starting periodic retry")
	go retryLostNative(h, lostAddr, proto)
}

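The three-step flow of replenishNativesFromPeers reduces to a small decision function. In this sketch (all names hypothetical), the lookups are passed as closures and evaluated in order. As in the original, the indexer query therefore runs only when the native query came back empty. `remaining` is the number of natives left after the lost one was removed.

```go
package main

// replenishAction is the outcome of the replacement search.
type replenishAction int

const (
	actionAdd    replenishAction = iota // a replacement address was found
	actionIgnore                        // no replacement, but enough natives remain
	actionRetry                         // no replacement and at most one native left
)

// replenish tries each lookup in order and stops at the first non-empty
// address; otherwise it decides between ignoring the loss and retrying.
func replenish(remaining int, lookups ...func() string) (replenishAction, string) {
	for _, lookup := range lookups {
		if addr := lookup(); addr != "" {
			return actionAdd, addr
		}
	}
	if remaining > 1 {
		return actionIgnore, ""
	}
	return actionRetry, ""
}
```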
// fetchNativeFromNatives asks each alive native for one of its own native contacts
// not in exclude. Returns the first new address found or "" if none.
func fetchNativeFromNatives(h host.Host, exclude []string) string {
	logger := oclib.GetLogger()
	excludeSet := make(map[string]struct{}, len(exclude))
	for _, e := range exclude {
		excludeSet[e] = struct{}{}
	}

	StreamNativeMu.RLock()
	natives := make([]*pp.AddrInfo, 0, len(StaticNatives))
	for _, ad := range StaticNatives {
		natives = append(natives, ad)
	}
	StreamNativeMu.RUnlock()

	rand.Shuffle(len(natives), func(i, j int) { natives[i], natives[j] = natives[j], natives[i] })

	for _, ad := range natives {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		if err := h.Connect(ctx, *ad); err != nil {
			cancel()
			logger.Warn().Str("native", ad.ID.String()).Err(err).Msg("[native] fetch native peers — connect failed")
			continue
		}
		s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetPeers)
		cancel()
		if err != nil {
			logger.Warn().Str("native", ad.ID.String()).Err(err).Msg("[native] fetch native peers — stream failed")
			continue
		}
		req := GetNativePeersRequest{Exclude: exclude, Count: 1}
		if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
			s.Close()
			continue
		}
		var resp GetNativePeersResponse
		if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
			s.Close()
			continue
		}
		s.Close()
		for _, peer := range resp.Peers {
			if _, excluded := excludeSet[peer]; !excluded && peer != "" {
				logger.Info().Str("from", ad.ID.String()).Str("new", peer).Msg("[native] fetch native peers — got replacement")
				return peer
			}
		}
		logger.Debug().Str("native", ad.ID.String()).Msg("[native] fetch native peers — no new native from this peer")
	}
	return ""
}

// fetchNativeFromIndexers asks connected indexers for their configured native addresses,
// returning the first one not in exclude.
func fetchNativeFromIndexers(h host.Host, exclude []string) string {
	logger := oclib.GetLogger()
	excludeSet := make(map[string]struct{}, len(exclude))
	for _, e := range exclude {
		excludeSet[e] = struct{}{}
	}

	StreamMuIndexes.RLock()
	indexers := make([]*pp.AddrInfo, 0, len(StaticIndexers))
	for _, ad := range StaticIndexers {
		indexers = append(indexers, ad)
	}
	StreamMuIndexes.RUnlock()

	rand.Shuffle(len(indexers), func(i, j int) { indexers[i], indexers[j] = indexers[j], indexers[i] })

	for _, ad := range indexers {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		if err := h.Connect(ctx, *ad); err != nil {
			cancel()
			continue
		}
		s, err := h.NewStream(ctx, ad.ID, ProtocolIndexerGetNatives)
		cancel()
		if err != nil {
			logger.Warn().Str("indexer", ad.ID.String()).Err(err).Msg("[native] fetch indexer natives — stream failed")
			continue
		}
		req := GetIndexerNativesRequest{Exclude: exclude}
		if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
			s.Close()
			continue
		}
		var resp GetIndexerNativesResponse
		if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
			s.Close()
			continue
		}
		s.Close()
		for _, nativeAddr := range resp.Natives {
			if _, excluded := excludeSet[nativeAddr]; !excluded && nativeAddr != "" {
				logger.Info().Str("indexer", ad.ID.String()).Str("native", nativeAddr).Msg("[native] fetch indexer natives — got native")
				return nativeAddr
			}
		}
	}
	logger.Warn().Msg("[native] fetch indexer natives — no native found from indexers")
	return ""
}

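fetchNativeFromNatives, fetchNativeFromIndexers and handleGetNatives all build an exclude set and skip empty or excluded addresses. The helper below is a hypothetical factoring of that filter. The comparison is an exact string match. As a result, the same native written as two different multiaddrs (for instance with and without a `/p2p/...` suffix) would not be recognised as excluded.

```go
package main

// firstNotExcluded returns the first non-empty address in candidates that is
// not listed in exclude, or "" if there is none.
func firstNotExcluded(candidates, exclude []string) string {
	excludeSet := make(map[string]struct{}, len(exclude))
	for _, e := range exclude {
		excludeSet[e] = struct{}{}
	}
	for _, c := range candidates {
		if _, excluded := excludeSet[c]; !excluded && c != "" {
			return c
		}
	}
	return ""
}
```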
// retryLostNative periodically retries connecting to a lost native address until
// it becomes reachable again or was already restored by another path.
func retryLostNative(h host.Host, addr string, nativeProto protocol.ID) {
	logger := oclib.GetLogger()
	logger.Info().Str("addr", addr).Msg("[native] retry — periodic retry for lost native started")
	t := time.NewTicker(retryNativeInterval)
	defer t.Stop()
	for range t.C {
		StreamNativeMu.RLock()
		_, alreadyRestored := StaticNatives[addr]
		StreamNativeMu.RUnlock()
		if alreadyRestored {
			logger.Info().Str("addr", addr).Msg("[native] retry — native already restored, stopping retry")
			return
		}

		ad, err := pp.AddrInfoFromString(addr)
		if err != nil {
			logger.Warn().Str("addr", addr).Msg("[native] retry — invalid addr, stopping retry")
			return
		}
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		err = h.Connect(ctx, *ad)
		cancel()
		if err != nil {
			logger.Warn().Str("addr", addr).Msg("[native] retry — still unreachable")
			continue
		}
		// Reachable again: add it back to the pool.
		StreamNativeMu.Lock()
		StaticNatives[addr] = ad
		StreamNativeMu.Unlock()
		logger.Info().Str("addr", addr).Msg("[native] retry — native reconnected and added back to pool")
		NudgeNativeHeartbeat()
		replenishIndexersIfNeeded(h)
		if nativeProto == ProtocolNativeGetIndexers {
			StartNativeRegistration(h, addr) // register back
		}
		return
	}
}

@@ -24,17 +24,16 @@ func ExtractIP(addr string) (net.IP, error) {
	if err != nil {
		return nil, err
	}
	ipStr, err := ma.ValueForProtocol(multiaddr.P_IP4)
	if err != nil {
		ipStr, err = ma.ValueForProtocol(multiaddr.P_IP6)
		if err != nil {
			return nil, err
		}
	}
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return nil, fmt.Errorf("invalid IP: %s", ipStr)
	}
	return ip, nil
}

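ExtractIP prefers the `ip4` component of a multiaddr and falls back to `ip6`. `ValueForProtocol` already returns a bare address with no port, so the value can go straight to `net.ParseIP`. `net.SplitHostPort` would reject it with a "missing port in address" error. The sketch below is a stdlib-only stand-in that splits the textual multiaddr by hand instead of using go-multiaddr. It is an assumption that only covers the plain textual form, and `extractIP` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// extractIP scans a textual multiaddr such as "/ip4/10.0.0.5/tcp/4001" for an
// ip4 component, falls back to ip6, and validates the value with net.ParseIP.
func extractIP(addr string) (net.IP, error) {
	parts := strings.Split(strings.Trim(addr, "/"), "/")
	for _, proto := range []string{"ip4", "ip6"} { // prefer IPv4, as ExtractIP does
		for i := 0; i+1 < len(parts); i++ {
			if parts[i] != proto {
				continue
			}
			ip := net.ParseIP(parts[i+1])
			if ip == nil {
				return nil, fmt.Errorf("invalid IP: %s", parts[i+1])
			}
			return ip, nil
		}
	}
	return nil, fmt.Errorf("no ip4/ip6 component in %q", addr)
}
```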
@@ -5,8 +5,9 @@ import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"oc-discovery/conf"
	"oc-discovery/daemons/node/common"
	"strings"
	"time"

	oclib "cloud.o-forge.io/core/oc-lib"
@@ -18,17 +19,21 @@ import (
	"github.com/libp2p/go-libp2p/core/peer"
)

// PeerRecordPayload holds the fields covered by the record signature.
type PeerRecordPayload struct {
	Name       string    `json:"name"`
	DID        string    `json:"did"`
	PubKey     []byte    `json:"pub_key"`
	ExpiryDate time.Time `json:"expiry_date"`
}

type PeerRecord struct {
	PeerRecordPayload
	PeerID        string `json:"peer_id"`
	APIUrl        string `json:"api_url"`
	StreamAddress string `json:"stream_address"`
	NATSAddress   string `json:"nats_address"`
	WalletAddress string `json:"wallet_address"`
	Signature     []byte `json:"signature"`
}

func (p *PeerRecord) Sign() error {
@@ -36,13 +41,7 @@ func (p *PeerRecord) Sign() error {
	if err != nil {
		return err
	}
	payload, _ := json.Marshal(p.PeerRecordPayload)
	b, err := common.Sign(priv, payload)
	p.Signature = b
	return err
@@ -51,19 +50,11 @@ func (p *PeerRecord) Sign() error {
func (p *PeerRecord) Verify() (crypto.PubKey, error) {
	pubKey, err := crypto.UnmarshalPublicKey(p.PubKey) // retrieve the public key embedded in the record
	if err != nil {
		return pubKey, err
	}
	payload, _ := json.Marshal(p.PeerRecordPayload)

	if ok, _ := pubKey.Verify(payload, p.Signature); !ok { // check that the payload was signed by pubKey
		return pubKey, errors.New("invalid signature")
	}
	return pubKey, nil
@@ -114,6 +105,8 @@ func (pr *PeerRecord) ExtractPeer(ourkey string, key string, pubKey crypto.PubKe
type GetValue struct {
	Key    string  `json:"key"`
	PeerID peer.ID `json:"peer_id"`
	Name   string  `json:"name,omitempty"`
	Search bool    `json:"search,omitempty"`
}

type GetResponse struct {
@@ -125,122 +118,233 @@ func (ix *IndexerService) genKey(did string) string {
	return "/node/" + did
}

func (ix *IndexerService) genNameKey(name string) string {
	return "/name/" + name
}

func (ix *IndexerService) genPIDKey(peerID string) string {
	return "/pid/" + peerID
}

func (ix *IndexerService) initNodeHandler() {
	logger := oclib.GetLogger()
	logger.Info().Msg("Init Node Handler")
	// On each heartbeat from a node, re-read its PeerRecord from the DHT
	// (/pid/<peerID> → DID → /node/<DID>), verify the signature and republish it,
	// so the record does not expire from the DHT as long as the node is alive.
	// No separate publish stream is needed from the node side.
	ix.AfterHeartbeat = func(pid peer.ID) {
		ctx1, cancel1 := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel1()
		res, err := ix.DHT.GetValue(ctx1, ix.genPIDKey(pid.String()))
		if err != nil {
			logger.Warn().Err(err).Str("peer", pid.String()).Msg("indexer: heartbeat pid lookup failed")
			return
		}
		did := string(res)
		ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel2()
		res, err = ix.DHT.GetValue(ctx2, ix.genKey(did))
		if err != nil {
			logger.Warn().Err(err).Str("did", did).Msg("indexer: heartbeat record lookup failed")
			return
		}
		var rec PeerRecord
		if err := json.Unmarshal(res, &rec); err != nil {
			logger.Warn().Err(err).Str("peer", pid.String()).Msg("indexer: heartbeat record unmarshal failed")
			return
		}
		if _, err := rec.Verify(); err != nil {
			logger.Warn().Err(err).Str("peer", pid.String()).Msg("indexer: heartbeat record signature invalid")
			return
		}
		data, err := json.Marshal(rec)
		if err != nil {
			return
		}
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		logger.Info().Msg("REFRESH PutValue " + ix.genKey(rec.DID))
		if err := ix.DHT.PutValue(ctx, ix.genKey(rec.DID), data); err != nil {
			logger.Warn().Err(err).Str("did", rec.DID).Msg("indexer: DHT refresh failed")
			return
		}
		if rec.Name != "" {
			ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
			ix.DHT.PutValue(ctx2, ix.genNameKey(rec.Name), []byte(rec.DID))
			cancel2()
		}
		if rec.PeerID != "" {
			ctx3, cancel3 := context.WithTimeout(context.Background(), 10*time.Second)
			ix.DHT.PutValue(ctx3, ix.genPIDKey(rec.PeerID), []byte(rec.DID))
			cancel3()
		}
	}
	ix.Host.SetStreamHandler(common.ProtocolHeartbeat, ix.HandleHeartbeat)
	ix.Host.SetStreamHandler(common.ProtocolPublish, ix.handleNodePublish)
	ix.Host.SetStreamHandler(common.ProtocolGet, ix.handleNodeGet)
	ix.Host.SetStreamHandler(common.ProtocolIndexerGetNatives, ix.handleGetNatives)
}

func (ix *IndexerService) handleNodePublish(s network.Stream) {
	defer s.Close()
	logger := oclib.GetLogger()

	var rec PeerRecord
	if err := json.NewDecoder(s).Decode(&rec); err != nil {
		logger.Err(err).Msg("indexer: publish decode failed")
		return
	}
	if _, err := rec.Verify(); err != nil {
		logger.Err(err).Msg("indexer: publish record signature invalid")
		return
	}
	if rec.PeerID == "" || rec.ExpiryDate.Before(time.Now().UTC()) { // missing PeerID or already expired
		logger.Err(errors.New(rec.PeerID + " is expired.")).Msg("indexer: publish record rejected")
		return
	}
	pid, err := peer.Decode(rec.PeerID)
	if err != nil {
		return
	}

	ix.StreamMU.Lock()
	if ix.StreamRecords[common.ProtocolHeartbeat] == nil {
		ix.StreamRecords[common.ProtocolHeartbeat] = map[peer.ID]*common.StreamRecord[PeerRecord]{}
	}
	streams := ix.StreamRecords[common.ProtocolHeartbeat]
	if srec, ok := streams[pid]; ok {
		srec.DID = rec.DID
		srec.Record = rec
		srec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
	} else {
		ix.StreamMU.Unlock()
		logger.Err(errors.New("no heartbeat")).Msg("indexer: publish from a peer without an active heartbeat")
		return
	}
	ix.StreamMU.Unlock()

	key := ix.genKey(rec.DID)
	data, err := json.Marshal(rec)
	if err != nil {
		logger.Err(err).Msg("indexer: publish marshal failed")
		return
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	if err := ix.DHT.PutValue(ctx, key, data); err != nil {
		logger.Err(err).Msg("indexer: publish DHT write failed")
		cancel()
		return
	}
	cancel()

	// Secondary index: /name/<name> → DID, so peers can resolve by human-readable name.
	if rec.Name != "" {
		ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
		if err := ix.DHT.PutValue(ctx2, ix.genNameKey(rec.Name), []byte(rec.DID)); err != nil {
			logger.Err(err).Str("name", rec.Name).Msg("indexer: failed to write name index")
		}
		cancel2()
	}
	// Secondary index: /pid/<peerID> → DID, so peers can resolve by libp2p PeerID.
	if rec.PeerID != "" {
		ctx3, cancel3 := context.WithTimeout(context.Background(), 10*time.Second)
		if err := ix.DHT.PutValue(ctx3, ix.genPIDKey(rec.PeerID), []byte(rec.DID)); err != nil {
			logger.Err(err).Str("pid", rec.PeerID).Msg("indexer: failed to write pid index")
		}
		cancel3()
	}
}

func (ix *IndexerService) handleNodeGet(s network.Stream) {
|
||||
|
||||
defer s.Close()
|
||||
logger := oclib.GetLogger()
|
||||
for {
|
||||
|
||||
var req GetValue
|
||||
if err := json.NewDecoder(s).Decode(&req); err != nil {
|
||||
logger.Err(err)
|
||||
continue
|
||||
return
|
||||
}
|
||||
ix.StreamMU.Lock()
|
||||
|
||||
if ix.StreamRecords[common.ProtocolHeartbeat] == nil {
|
||||
ix.StreamRecords[common.ProtocolHeartbeat] = map[peer.ID]*common.StreamRecord[PeerRecord]{}
|
||||
}
|
||||
resp := GetResponse{
|
||||
Found: false,
|
||||
Records: map[string]PeerRecord{},
|
||||
}
|
||||
streams := ix.StreamRecords[common.ProtocolHeartbeat]
|
||||
resp := GetResponse{Found: false, Records: map[string]PeerRecord{}}
|
||||
|
||||
key := ix.genKey(req.Key)
|
||||
// simple lookup by PeerID (or DID)
|
||||
keys := []string{}
|
||||
 	// Name substring search — scan in-memory connected nodes first, then DHT exact match.
 	if req.Name != "" {
+		if req.Search {
+			for _, did := range ix.LookupNameIndex(strings.ToLower(req.Name)) {
+				keys = append(keys, did)
+			}
+		} else {
 			// 2. DHT exact-name lookup: covers nodes that published but aren't currently connected.
 			nameCtx, nameCancel := context.WithTimeout(context.Background(), 5*time.Second)
 			if ch, err := ix.DHT.SearchValue(nameCtx, ix.genNameKey(req.Name)); err == nil {
 				for did := range ch {
 					keys = append(keys, string(did))
 					break
 				}
 			}
 			nameCancel()
+		}
 	} else if req.PeerID != "" {
 		pidCtx, pidCancel := context.WithTimeout(context.Background(), 5*time.Second)
 		if did, err := ix.DHT.GetValue(pidCtx, ix.genPIDKey(req.PeerID.String())); err == nil {
 			keys = append(keys, string(did))
 		}
 		pidCancel()
 	} else {
 		keys = append(keys, req.Key)
 	}

 	// DHT record fetch by DID key (covers exact-name and PeerID paths).
 	if len(keys) > 0 {
 		for _, k := range keys {
 			ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
-			recBytes, err := ix.DHT.SearchValue(ctx, key)
-			if err != nil {
-				logger.Err(err).Msg("Failed to fetch PeerRecord from DHT")
+			c, err := ix.DHT.GetValue(ctx, ix.genKey(k))
 			cancel()
-			}
-			cancel()
-			for c := range recBytes {
+			if err == nil {
 				var rec PeerRecord
-				if err := json.Unmarshal(c, &rec); err != nil || rec.PeerID != req.PeerID.String() {
+				if json.Unmarshal(c, &rec) == nil {
+					// Filter by PeerID only when one was explicitly specified.
+					if req.PeerID == "" || rec.PeerID == req.PeerID.String() {
 						resp.Records[rec.PeerID] = rec
 					}
+				}
+			} else if req.Name == "" && req.PeerID == "" {
+				logger.Err(err).Msg("Failed to fetch PeerRecord from DHT " + req.Key)
 			}
 		}
 	}

+	resp.Found = len(resp.Records) > 0
 	_ = json.NewEncoder(s).Encode(resp)
 }

+// handleGetNatives returns this indexer's configured native addresses,
+// excluding any in the request's Exclude list.
+func (ix *IndexerService) handleGetNatives(s network.Stream) {
+	defer s.Close()
+	logger := oclib.GetLogger()
+
+	var req common.GetIndexerNativesRequest
+	if err := json.NewDecoder(s).Decode(&req); err != nil {
+		logger.Err(err).Msg("indexer get natives: decode")
+		return
+	}
+
+	excludeSet := make(map[string]struct{}, len(req.Exclude))
+	for _, e := range req.Exclude {
+		excludeSet[e] = struct{}{}
+	}
+
+	resp := common.GetIndexerNativesResponse{}
+	for _, addr := range strings.Split(conf.GetConfig().NativeIndexerAddresses, ",") {
+		addr = strings.TrimSpace(addr)
+		if addr == "" {
+			continue
+		}
-			resp.Found = true
-			resp.Records[rec.PeerID] = rec
-			if srec, ok := streams[req.PeerID]; ok {
-				srec.DID = rec.DID
-				srec.Record = rec
-				srec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
+		if _, excluded := excludeSet[addr]; !excluded {
+			resp.Natives = append(resp.Natives, addr)
 		}
 	}
-			// Not found
-			_ = json.NewEncoder(s).Encode(resp)
-			ix.StreamMU.Unlock()
-			break // response... so quit

+	if err := json.NewEncoder(s).Encode(resp); err != nil {
+		logger.Err(err).Msg("indexer get natives: encode response")
+	}
 }
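The address filtering in `handleGetNatives` can be exercised without a libp2p stream. Below is a minimal sketch. It assumes a hypothetical helper, `filterNatives`, that receives the comma-separated `NativeIndexerAddresses` value as a plain string; in the handler, that value comes from `conf.GetConfig()`. The addresses are placeholders rather than full `/p2p/<peer-id>` multiaddrs.

```go
package main

import (
	"fmt"
	"strings"
)

// filterNatives is a hypothetical helper mirroring the loop in handleGetNatives:
// split a comma-separated address list, trim whitespace, skip empty entries,
// and drop any address listed in exclude.
func filterNatives(csv string, exclude []string) []string {
	excludeSet := make(map[string]struct{}, len(exclude))
	for _, e := range exclude {
		excludeSet[e] = struct{}{}
	}
	var out []string
	for _, addr := range strings.Split(csv, ",") {
		addr = strings.TrimSpace(addr)
		if addr == "" {
			continue
		}
		if _, excluded := excludeSet[addr]; !excluded {
			out = append(out, addr)
		}
	}
	return out
}

func main() {
	// Placeholder addresses; real entries would be full multiaddrs ending in /p2p/<peer-id>.
	natives := "/ip4/10.0.0.1/tcp/4001, /ip4/10.0.0.2/tcp/4001,,"
	fmt.Println(filterNatives(natives, []string{"/ip4/10.0.0.1/tcp/4001"}))
	// prints [/ip4/10.0.0.2/tcp/4001]
}
```

Blank segments, such as the trailing `,,`, are skipped before the exclude check. A sloppy config value therefore never produces empty entries in `resp.Natives`.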

168
daemons/node/indexer/nameindex.go
Normal file
@@ -0,0 +1,168 @@
+package indexer
+
+import (
+	"context"
+	"encoding/json"
+	"strings"
+	"sync"
+	"time"
+
+	"oc-discovery/daemons/node/common"
+
+	oclib "cloud.o-forge.io/core/oc-lib"
+	pubsub "github.com/libp2p/go-libp2p-pubsub"
+	pp "github.com/libp2p/go-libp2p/core/peer"
+)
+
+// TopicNameIndex is the GossipSub topic shared by regular indexers to exchange
+// add/delete events for the distributed name→peerID mapping.
+const TopicNameIndex = "oc-name-index"
+
+// nameIndexDedupWindow suppresses re-emission of the same (action, name, peerID)
+// tuple within this window, reducing duplicate events when a node is registered
+// with multiple indexers simultaneously.
+const nameIndexDedupWindow = 30 * time.Second
+
+// NameIndexAction indicates whether a name mapping is being added or removed.
+type NameIndexAction string
+
+const (
+	NameIndexAdd    NameIndexAction = "add"
+	NameIndexDelete NameIndexAction = "delete"
+)
+
+// NameIndexEvent is published on TopicNameIndex by each indexer when a node
+// registers (add) or is evicted by the GC (delete).
+type NameIndexEvent struct {
+	Action NameIndexAction `json:"action"`
+	Name   string          `json:"name"`
+	PeerID string          `json:"peer_id"`
+	DID    string          `json:"did"`
+}
+
+// nameIndexState holds the local in-memory name index and the sender-side
+// deduplication tracker.
+type nameIndexState struct {
+	// index: name → peerID → DID, built from events received from all indexers.
+	index   map[string]map[string]string
+	indexMu sync.RWMutex
+
+	// emitted tracks the last emission time for each (action, name, peerID) key
+	// to suppress duplicates within nameIndexDedupWindow.
+	emitted   map[string]time.Time
+	emittedMu sync.Mutex
+}
+
+// shouldEmit returns true if the (action, name, peerID) tuple has not been
+// emitted within nameIndexDedupWindow, updating the tracker if so.
+func (s *nameIndexState) shouldEmit(action NameIndexAction, name, peerID string) bool {
+	key := string(action) + ":" + name + ":" + peerID
+	s.emittedMu.Lock()
+	defer s.emittedMu.Unlock()
+	if t, ok := s.emitted[key]; ok && time.Since(t) < nameIndexDedupWindow {
+		return false
+	}
+	s.emitted[key] = time.Now()
+	return true
+}
+
+// onEvent applies a received NameIndexEvent to the local index.
+// "add" inserts/updates the mapping; "delete" removes it.
+// Operations are idempotent — duplicate events from multiple indexers are harmless.
+func (s *nameIndexState) onEvent(evt NameIndexEvent) {
+	if evt.Name == "" || evt.PeerID == "" {
+		return
+	}
+	s.indexMu.Lock()
+	defer s.indexMu.Unlock()
+	switch evt.Action {
+	case NameIndexAdd:
+		if s.index[evt.Name] == nil {
+			s.index[evt.Name] = map[string]string{}
+		}
+		s.index[evt.Name][evt.PeerID] = evt.DID
+	case NameIndexDelete:
+		if s.index[evt.Name] != nil {
+			delete(s.index[evt.Name], evt.PeerID)
+			if len(s.index[evt.Name]) == 0 {
+				delete(s.index, evt.Name)
+			}
+		}
+	}
+}
+
+// initNameIndex joins TopicNameIndex and starts consuming events.
+// Must be called after ix.PS is ready.
+func (ix *IndexerService) initNameIndex(ps *pubsub.PubSub) {
+	logger := oclib.GetLogger()
+	ix.nameIndex = &nameIndexState{
+		index:   map[string]map[string]string{},
+		emitted: map[string]time.Time{},
+	}
+
+	ps.RegisterTopicValidator(TopicNameIndex, func(_ context.Context, _ pp.ID, _ *pubsub.Message) bool {
+		return true
+	})
+	topic, err := ps.Join(TopicNameIndex)
+	if err != nil {
+		logger.Err(err).Msg("name index: failed to join topic")
+		return
+	}
+	ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Lock()
+	ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicNameIndex] = topic
+	ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Unlock()
+
+	common.SubscribeEvents(
+		ix.LongLivedStreamRecordedService.LongLivedPubSubService,
+		context.Background(),
+		TopicNameIndex,
+		-1,
+		func(_ context.Context, evt NameIndexEvent, _ string) {
+			ix.nameIndex.onEvent(evt)
+		},
+	)
+}
+
+// publishNameEvent emits a NameIndexEvent on TopicNameIndex, subject to the
+// sender-side deduplication window.
+func (ix *IndexerService) publishNameEvent(action NameIndexAction, name, peerID, did string) {
+	if ix.nameIndex == nil || name == "" || peerID == "" {
+		return
+	}
+	if !ix.nameIndex.shouldEmit(action, name, peerID) {
+		return
+	}
+	ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RLock()
+	topic := ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicNameIndex]
+	ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RUnlock()
+	if topic == nil {
+		return
+	}
+	evt := NameIndexEvent{Action: action, Name: name, PeerID: peerID, DID: did}
+	b, err := json.Marshal(evt)
+	if err != nil {
+		return
+	}
+	_ = topic.Publish(context.Background(), b)
+}
+
+// LookupNameIndex searches the distributed name index for peers whose name
+// contains needle (case-insensitive). Returns peerID → DID for matched peers.
+// Returns nil if the name index is not initialised (e.g. native indexers).
+func (ix *IndexerService) LookupNameIndex(needle string) map[string]string {
+	if ix.nameIndex == nil {
+		return nil
+	}
+	result := map[string]string{}
+	needleLow := strings.ToLower(needle)
+	ix.nameIndex.indexMu.RLock()
+	defer ix.nameIndex.indexMu.RUnlock()
+	for name, peers := range ix.nameIndex.index {
+		if strings.Contains(strings.ToLower(name), needleLow) {
+			for peerID, did := range peers {
+				result[peerID] = did
+			}
+		}
+	}
+	return result
+}
@@ -4,7 +4,10 @@ import (
 	"context"
 	"encoding/json"
 	"errors"
+	"fmt"
 	"math/rand"
+	"slices"
+	"strings"
 	"sync"
 	"time"

@@ -12,19 +15,24 @@ import (

 	oclib "cloud.o-forge.io/core/oc-lib"
 	pubsub "github.com/libp2p/go-libp2p-pubsub"
-	"github.com/libp2p/go-libp2p/core/host"
 	"github.com/libp2p/go-libp2p/core/network"
 	pp "github.com/libp2p/go-libp2p/core/peer"
 )

 const (
-	// IndexerTTL is 10% above the recommended 60s heartbeat interval.
-	IndexerTTL = 66 * time.Second
+	// IndexerTTL is the lifetime of a live-indexer cache entry. Set to 50% above
+	// the recommended 60s heartbeat interval so a single delayed renewal does not
+	// evict a healthy indexer from the native's cache.
+	IndexerTTL = 90 * time.Second
 	// offloadInterval is how often the native checks if it can release responsible peers.
 	offloadInterval = 30 * time.Second
 	// dhtRefreshInterval is how often the background goroutine queries the DHT for
 	// known-but-expired indexer entries (written by neighbouring natives).
 	dhtRefreshInterval = 30 * time.Second
+	// maxFallbackPeers caps how many peers the native will accept in self-delegation
+	// mode. Beyond this limit the native refuses to act as a fallback indexer so it
+	// is not overwhelmed during prolonged indexer outages.
+	maxFallbackPeers = 50
 )

 // liveIndexerEntry tracks a registered indexer in the native's in-memory cache and DHT.
@@ -43,7 +51,7 @@ type NativeState struct {
 	// knownPeerIDs accumulates all indexer PeerIDs ever seen (local stream or gossip).
 	// Used by refreshIndexersFromDHT to re-hydrate expired entries from the shared DHT,
 	// including entries written by other natives.
-	knownPeerIDs map[string]struct{}
+	knownPeerIDs map[string]string
 	knownMu      sync.RWMutex
 }

@@ -51,7 +59,7 @@ func newNativeState() *NativeState {
 	return &NativeState{
 		liveIndexers:     map[string]*liveIndexerEntry{},
 		responsiblePeers: map[pp.ID]struct{}{},
-		knownPeerIDs:     map[string]struct{}{},
+		knownPeerIDs:     map[string]string{},
 	}
 }

@@ -92,10 +100,12 @@ func (v IndexerRecordValidator) Select(_ string, values [][]byte) (int, error) {
 // Must be called after DHT is initialized.
 func (ix *IndexerService) InitNative() {
 	ix.Native = newNativeState()
-	ix.Host.SetStreamHandler(common.ProtocolIndexerHeartbeat, ix.HandleNodeHeartbeat) // specific heartbeat for Indexer.
+	ix.Host.SetStreamHandler(common.ProtocolHeartbeat, ix.HandleHeartbeat) // specific heartbeat for Indexer.
 	ix.Host.SetStreamHandler(common.ProtocolNativeSubscription, ix.handleNativeSubscription)
 	ix.Host.SetStreamHandler(common.ProtocolNativeGetIndexers, ix.handleNativeGetIndexers)
 	ix.Host.SetStreamHandler(common.ProtocolNativeConsensus, ix.handleNativeConsensus)
+	ix.Host.SetStreamHandler(common.ProtocolNativeGetPeers, ix.handleNativeGetPeers)
+	ix.Host.SetStreamHandler(common.ProtocolIndexerGetNatives, ix.handleGetNatives)
 	ix.subscribeIndexerRegistry()
 	// Ensure long connections to other configured natives (native-to-native mesh).
 	common.EnsureNativePeers(ix.Host)
@@ -107,8 +117,15 @@ func (ix *IndexerService) InitNative() {
 // registered indexer PeerIDs to one another, enabling cross-native DHT discovery.
 func (ix *IndexerService) subscribeIndexerRegistry() {
 	logger := oclib.GetLogger()
-	ix.PS.RegisterTopicValidator(common.TopicIndexerRegistry, func(_ context.Context, _ pp.ID, _ *pubsub.Message) bool {
-		return true
+	ix.PS.RegisterTopicValidator(common.TopicIndexerRegistry, func(_ context.Context, _ pp.ID, msg *pubsub.Message) bool {
+		// Reject empty or syntactically invalid multiaddrs before they reach the
+		// message loop. A compromised native could otherwise gossip arbitrary data.
+		addr := string(msg.Data)
+		if addr == "" {
+			return false
+		}
+		_, err := pp.AddrInfoFromString(addr)
+		return err == nil
 	})
 	topic, err := ix.PS.Join(common.TopicIndexerRegistry)
 	if err != nil {
@@ -130,29 +147,38 @@ func (ix *IndexerService) subscribeIndexerRegistry() {
 			if err != nil {
 				return
 			}
-			peerID := string(msg.Data)
-			if peerID == "" {
+			addr := string(msg.Data)
+			if addr == "" {
 				continue
 			}
-			// A neighbouring native registered this PeerID; add to known set for DHT refresh.
+			if peer, err := pp.AddrInfoFromString(addr); err == nil {
 				ix.Native.knownMu.Lock()
-				ix.Native.knownPeerIDs[peerID] = struct{}{}
+				ix.Native.knownPeerIDs[peer.ID.String()] = addr
 				ix.Native.knownMu.Unlock()
+
+			}
+			// A neighbouring native registered this PeerID; add to known set for DHT refresh.
+
 		}
 	}()
 }

-// handleNativeSubscription stores an indexer's alive registration in the DHT cache.
+// handleNativeSubscription stores an indexer's alive registration in the local cache
+// immediately, then persists it to the DHT asynchronously.
 // The stream is temporary: indexer sends one IndexerRegistration and closes.
 func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
 	defer s.Close()
 	logger := oclib.GetLogger()

+	logger.Info().Msg("Subscription")
+
 	var reg common.IndexerRegistration
 	if err := json.NewDecoder(s).Decode(&reg); err != nil {
 		logger.Err(err).Msg("native subscription: decode")
 		return
 	}
+	logger.Info().Msg("Subscription " + reg.Addr)
+
 	if reg.Addr == "" {
 		logger.Error().Msg("native subscription: missing addr")
 		return
@@ -166,30 +192,23 @@ func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
 		reg.PeerID = ad.ID.String()
 	}

-	expiry := time.Now().UTC().Add(IndexerTTL)
+	// Build entry with a fresh TTL — must happen before the cache write so the 66s
+	// window is not consumed by DHT retries.
 	entry := &liveIndexerEntry{
 		PeerID:    reg.PeerID,
 		Addr:      reg.Addr,
-		ExpiresAt: expiry,
+		ExpiresAt: time.Now().UTC().Add(IndexerTTL),
 	}

-	// Persist in DHT with 66s TTL.
-	key := ix.genIndexerKey(reg.PeerID)
-	if data, err := json.Marshal(entry); err == nil {
-		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
-		if err := ix.DHT.PutValue(ctx, key, data); err != nil {
-			logger.Err(err).Msg("native subscription: DHT put")
-		}
-		cancel()
-	}
-
-	// Update local cache and known set.
+	// Update local cache and known set immediately so concurrent GetIndexers calls
+	// can already see this indexer without waiting for the DHT write to complete.
 	ix.Native.liveIndexersMu.Lock()
+	_, isRenewal := ix.Native.liveIndexers[reg.PeerID]
 	ix.Native.liveIndexers[reg.PeerID] = entry
 	ix.Native.liveIndexersMu.Unlock()

 	ix.Native.knownMu.Lock()
-	ix.Native.knownPeerIDs[reg.PeerID] = struct{}{}
+	ix.Native.knownPeerIDs[reg.PeerID] = reg.Addr
 	ix.Native.knownMu.Unlock()

 	// Gossip PeerID to neighbouring natives so they discover it via DHT.
@@ -197,16 +216,46 @@ func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
 	topic := ix.LongLivedPubSubs[common.TopicIndexerRegistry]
 	ix.PubsubMu.RUnlock()
 	if topic != nil {
-		if err := topic.Publish(context.Background(), []byte(reg.PeerID)); err != nil {
+		if err := topic.Publish(context.Background(), []byte(reg.Addr)); err != nil {
 			logger.Err(err).Msg("native subscription: registry gossip publish")
 		}
 	}

-	logger.Info().Str("peer", reg.PeerID).Msg("native: indexer registered")
+	if isRenewal {
+		logger.Debug().Str("peer", reg.PeerID).Msg("native: indexer TTL renewed : " + fmt.Sprintf("%v", len(ix.Native.liveIndexers)))
+	} else {
+		logger.Info().Str("peer", reg.PeerID).Msg("native: indexer registered : " + fmt.Sprintf("%v", len(ix.Native.liveIndexers)))
+	}
+
+	// Persist in DHT asynchronously — retries must not block the handler or consume
+	// the local cache TTL.
+	key := ix.genIndexerKey(reg.PeerID)
+	data, err := json.Marshal(entry)
+	if err != nil {
+		logger.Err(err).Msg("native subscription: marshal entry")
+		return
+	}
+	go func() {
+		for {
+			ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+			if err := ix.DHT.PutValue(ctx, key, data); err != nil {
+				cancel()
+				logger.Err(err).Msg("native subscription: DHT put " + key)
+				if strings.Contains(err.Error(), "failed to find any peer in table") {
+					time.Sleep(10 * time.Second)
+					continue
+				}
+				return
+			}
+			cancel()
+			return
+		}
+	}()
 }

 // handleNativeGetIndexers returns this native's own list of reachable indexers.
-// If none are available, it self-delegates (becomes the fallback indexer for the caller).
+// Self-delegation (native acting as temporary fallback indexer) is only permitted
+// for nodes — never for peers that are themselves registered indexers in knownPeerIDs.
 // The consensus across natives is the responsibility of the requesting node/indexer.
 func (ix *IndexerService) handleNativeGetIndexers(s network.Stream) {
 	defer s.Close()
@@ -220,14 +269,20 @@ func (ix *IndexerService) handleNativeGetIndexers(s network.Stream) {
 	if req.Count <= 0 {
 		req.Count = 3
 	}
-
-	reachable := ix.reachableLiveIndexers()
+	callerPeerID := s.Conn().RemotePeer().String()
+	reachable := ix.reachableLiveIndexers(req.Count, callerPeerID)
 	var resp common.GetIndexersResponse

 	if len(reachable) == 0 {
-		// No indexers known: become temporary fallback for this caller.
-		ix.selfDelegate(s.Conn().RemotePeer(), &resp)
-		logger.Info().Str("peer", s.Conn().RemotePeer().String()).Msg("native: no indexers, acting as fallback")
+		// No live indexers reachable — try to self-delegate.
+		if ix.selfDelegate(s.Conn().RemotePeer(), &resp) {
+			logger.Info().Str("peer", callerPeerID).Msg("native: no indexers, acting as fallback for node")
+		} else {
+			// Fallback pool saturated: return empty so the caller retries another
+			// native instead of piling more load onto this one.
+			logger.Warn().Str("peer", callerPeerID).Int("pool", maxFallbackPeers).Msg(
+				"native: fallback pool saturated, refusing self-delegation")
+		}
 	} else {
 		rand.Shuffle(len(reachable), func(i, j int) { reachable[i], reachable[j] = reachable[j], reachable[i] })
 		if req.Count > len(reachable) {
@@ -255,7 +310,7 @@ func (ix *IndexerService) handleNativeConsensus(s network.Stream) {
 		return
 	}

-	myList := ix.reachableLiveIndexers()
+	myList := ix.reachableLiveIndexers(-1, s.Conn().RemotePeer().String())
 	mySet := make(map[string]struct{}, len(myList))
 	for _, addr := range myList {
 		mySet[addr] = struct{}{}
@@ -285,31 +340,56 @@ func (ix *IndexerService) handleNativeConsensus(s network.Stream) {
 }

 // selfDelegate marks the caller as a responsible peer and exposes this native's own
-// address as its temporary indexer.
-func (ix *IndexerService) selfDelegate(remotePeer pp.ID, resp *common.GetIndexersResponse) {
+// address as its temporary indexer. Returns false when the fallback pool is saturated
+// (maxFallbackPeers reached) — the caller must return an empty response so the node
+// retries later instead of pinning indefinitely to an overloaded native.
+func (ix *IndexerService) selfDelegate(remotePeer pp.ID, resp *common.GetIndexersResponse) bool {
 	ix.Native.responsibleMu.Lock()
-	ix.Native.responsiblePeers[remotePeer] = struct{}{}
-	ix.Native.responsibleMu.Unlock()
-	resp.IsSelfFallback = true
-	for _, a := range ix.Host.Addrs() {
-		resp.Indexers = []string{a.String() + "/p2p/" + ix.Host.ID().String()}
-		break
+	defer ix.Native.responsibleMu.Unlock()
+	if len(ix.Native.responsiblePeers) >= maxFallbackPeers {
+		return false
 	}
+	ix.Native.responsiblePeers[remotePeer] = struct{}{}
+	resp.IsSelfFallback = true
+	resp.Indexers = []string{ix.Host.Addrs()[len(ix.Host.Addrs())-1].String() + "/p2p/" + ix.Host.ID().String()}
+	return true
 }

 // reachableLiveIndexers returns the multiaddrs of non-expired, pingable indexers
 // from the local cache (kept fresh by refreshIndexersFromDHT in background).
-func (ix *IndexerService) reachableLiveIndexers() []string {
+func (ix *IndexerService) reachableLiveIndexers(count int, from ...string) []string {
 	ix.Native.liveIndexersMu.RLock()
 	now := time.Now().UTC()
 	candidates := []*liveIndexerEntry{}
 	for _, e := range ix.Native.liveIndexers {
-		if e.ExpiresAt.After(now) {
+		fmt.Println("liveIndexers", slices.Contains(from, e.PeerID), from, e.PeerID)
+		if e.ExpiresAt.After(now) && !slices.Contains(from, e.PeerID) {
 			candidates = append(candidates, e)
 		}
 	}
 	ix.Native.liveIndexersMu.RUnlock()

+	fmt.Println("midway...", candidates, from, ix.Native.knownPeerIDs)
+
+	if (count > 0 && len(candidates) < count) || count < 0 {
+		ix.Native.knownMu.RLock()
+		for k, v := range ix.Native.knownPeerIDs {
+			// Include peers whose liveIndexers entry is absent OR expired.
+			// A non-nil but expired entry means the peer was once known but
+			// has since timed out — PeerIsAlive below will decide if it's back.
+			fmt.Println("knownPeerIDs", slices.Contains(from, k), from, k)
+			if !slices.Contains(from, k) {
+				candidates = append(candidates, &liveIndexerEntry{
+					PeerID: k,
+					Addr:   v,
+				})
+			}
+		}
+		ix.Native.knownMu.RUnlock()
+	}
+
+	fmt.Println("midway...1", candidates)
+
 	reachable := []string{}
 	for _, e := range candidates {
 		ad, err := pp.AddrInfoFromString(e.Addr)
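`selfDelegate` now refuses new fallback peers once `maxFallbackPeers` is reached. It also holds `responsibleMu` across both the size check and the insert. Below is a stdlib-only sketch of that admission rule. `fallbackPool`, `newFallbackPool` and `admit` are hypothetical names. Re-admitting an already-tracked peer when the pool is full is a design choice of this sketch, not something the diff does.

```go
package main

import (
	"fmt"
	"sync"
)

// fallbackPool is a hypothetical stand-in for NativeState.responsiblePeers,
// capped in the same spirit as maxFallbackPeers.
type fallbackPool struct {
	mu    sync.Mutex
	peers map[string]struct{}
	limit int
}

func newFallbackPool(limit int) *fallbackPool {
	return &fallbackPool{peers: map[string]struct{}{}, limit: limit}
}

// admit records peerID as self-delegated unless the pool is full. The check and
// the insert happen under one lock, so concurrent callers cannot overshoot limit.
// Assumption: a peer that is already tracked is accepted even when the pool is full.
func (p *fallbackPool) admit(peerID string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.peers[peerID]; ok {
		return true
	}
	if len(p.peers) >= p.limit {
		return false
	}
	p.peers[peerID] = struct{}{}
	return true
}

func main() {
	pool := newFallbackPool(2)
	fmt.Println(pool.admit("node-a"), pool.admit("node-b"), pool.admit("node-c")) // true true false
}
```

If the check and the insert used separate critical sections, two callers could both observe `limit-1` entries and both insert. Holding one lock across both steps closes that window.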
@@ -371,6 +451,12 @@ func (ix *IndexerService) refreshIndexersFromDHT() {
 			ix.Native.liveIndexers[best.PeerID] = best
 			ix.Native.liveIndexersMu.Unlock()
 			logger.Info().Str("peer", best.PeerID).Msg("native: refreshed indexer from DHT")
+		} else {
+			// DHT has no fresh entry — peer is gone, prune from known set.
+			ix.Native.knownMu.Lock()
+			delete(ix.Native.knownPeerIDs, pid)
+			ix.Native.knownMu.Unlock()
+			logger.Info().Str("peer", pid).Msg("native: pruned stale peer from knownPeerIDs")
 		}
 	}
 }
@@ -387,30 +473,107 @@ func (ix *IndexerService) runOffloadLoop() {
 	defer t.Stop()
 	logger := oclib.GetLogger()
 	for range t.C {
+		fmt.Println("runOffloadLoop", ix.Native.responsiblePeers)
 		ix.Native.responsibleMu.RLock()
 		count := len(ix.Native.responsiblePeers)
 		ix.Native.responsibleMu.RUnlock()
 		if count == 0 {
 			continue
 		}
-		if len(ix.reachableLiveIndexers()) > 0 {
+		ix.Native.responsibleMu.RLock()
+		peerIDS := []string{}
+		for p := range ix.Native.responsiblePeers {
+			peerIDS = append(peerIDS, p.String())
+		}
+		fmt.Println("COUNT --> ", count, len(ix.reachableLiveIndexers(-1, peerIDS...)))
+		ix.Native.responsibleMu.RUnlock()
+		if len(ix.reachableLiveIndexers(-1, peerIDS...)) > 0 {
+			ix.Native.responsibleMu.RLock()
+			released := ix.Native.responsiblePeers
+			ix.Native.responsibleMu.RUnlock()
+
+			// Reset (not Close) heartbeat streams of released peers.
+			// Close() only half-closes the native's write direction — the peer's write
+			// direction stays open and sendHeartbeat never sees an error.
+			// Reset() abruptly terminates both directions, making the peer's next
+			// json.Encode return an error which triggers replenishIndexersFromNative.
+			ix.StreamMU.Lock()
+			if streams := ix.StreamRecords[common.ProtocolHeartbeat]; streams != nil {
+				for pid := range released {
+					if rec, ok := streams[pid]; ok {
+						if rec.HeartbeatStream != nil && rec.HeartbeatStream.Stream != nil {
+							rec.HeartbeatStream.Stream.Reset()
+						}
 						ix.Native.responsibleMu.Lock()
 						ix.Native.responsiblePeers = map[pp.ID]struct{}{}
+						delete(ix.Native.responsiblePeers, pid)
 						ix.Native.responsibleMu.Unlock()
+
+						delete(streams, pid)
+						logger.Info().Str("peer", pid.String()).Str("proto", string(common.ProtocolHeartbeat)).Msg(
+							"native: offload — stream reset, peer will reconnect to real indexer")
+					} else {
+						// No recorded heartbeat stream for this peer: either it never
+						// passed the score check (new peer, uptime=0 → score<75) or the
+						// stream was GC'd. We cannot send a Reset signal, so close the
+						// whole connection instead — this makes the peer's sendHeartbeat
+						// return an error, which triggers replenishIndexersFromNative and
+						// migrates it to a real indexer.
+						ix.Native.responsibleMu.Lock()
+						delete(ix.Native.responsiblePeers, pid)
+						ix.Native.responsibleMu.Unlock()
+						go ix.Host.Network().ClosePeer(pid)
+						logger.Info().Str("peer", pid.String()).Msg(
+							"native: offload — no heartbeat stream, closing connection so peer re-requests real indexers")
+					}
+				}
+
+			}
+			ix.StreamMU.Unlock()
+
 			logger.Info().Int("released", count).Msg("native: offloaded responsible peers to real indexers")
 		}
 	}
 }

+// handleNativeGetPeers returns a random selection of this native's known native
+// contacts, excluding any in the request's Exclude list.
+func (ix *IndexerService) handleNativeGetPeers(s network.Stream) {
+	defer s.Close()
+	logger := oclib.GetLogger()
+
+	var req common.GetNativePeersRequest
+	if err := json.NewDecoder(s).Decode(&req); err != nil {
+		logger.Err(err).Msg("native get peers: decode")
+		return
+	}
+	if req.Count <= 0 {
+		req.Count = 1
+	}
+
+	excludeSet := make(map[string]struct{}, len(req.Exclude))
+	for _, e := range req.Exclude {
+		excludeSet[e] = struct{}{}
+	}
+
+	common.StreamNativeMu.RLock()
+	candidates := make([]string, 0, len(common.StaticNatives))
+	for addr := range common.StaticNatives {
+		if _, excluded := excludeSet[addr]; !excluded {
+			candidates = append(candidates, addr)
+		}
+	}
+	common.StreamNativeMu.RUnlock()
+
+	rand.Shuffle(len(candidates), func(i, j int) { candidates[i], candidates[j] = candidates[j], candidates[i] })
+	if req.Count > len(candidates) {
+		req.Count = len(candidates)
+	}
+
+	resp := common.GetNativePeersResponse{Peers: candidates[:req.Count]}
+	if err := json.NewEncoder(s).Encode(resp); err != nil {
+		logger.Err(err).Msg("native get peers: encode response")
+	}
+}
+
-// StartNativeRegistration starts a goroutine that periodically registers this
-// indexer with all configured native indexers (every RecommendedHeartbeatInterval).
-func StartNativeRegistration(h host.Host, nativeAddressesStr string) {
-	go func() {
-		common.RegisterWithNative(h, nativeAddressesStr)
-		t := time.NewTicker(common.RecommendedHeartbeatInterval)
-		defer t.Stop()
-		for range t.C {
-			common.RegisterWithNative(h, nativeAddressesStr)
-		}
-	}()
-}
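In the offload loop, `released := ix.Native.responsiblePeers` copies only the map reference. `released` and the field therefore point at the same map. Ranging over it after `RUnlock` can race with writers such as `selfDelegate`. Below is a sketch of a copy-then-release alternative, using hypothetical names (`responsibleSet`, `snapshot`, `release`) and string IDs in place of `pp.ID`. It snapshots the keys under the read lock, then deletes each released peer individually instead of replacing the whole map.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// responsibleSet is a hypothetical stand-in for the native's responsiblePeers map.
type responsibleSet struct {
	mu    sync.RWMutex
	peers map[string]struct{}
}

// snapshot copies the keys under the read lock; the returned slice does not
// alias the map, so ranging over it needs no lock.
func (r *responsibleSet) snapshot() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]string, 0, len(r.peers))
	for p := range r.peers {
		out = append(out, p)
	}
	sort.Strings(out) // deterministic order, handy for logs and tests
	return out
}

// release removes a single peer, leaving every other entry in place.
func (r *responsibleSet) release(p string) {
	r.mu.Lock()
	delete(r.peers, p)
	r.mu.Unlock()
}

func main() {
	rs := &responsibleSet{peers: map[string]struct{}{"a": {}, "b": {}, "c": {}}}
	for _, p := range rs.snapshot() {
		if p != "b" { // pretend "b" could not be offloaded this round
			rs.release(p)
		}
	}
	fmt.Println(rs.snapshot()) // [b]
}
```

Peers that could not be handed off in this round stay tracked and are retried on the next `offloadInterval` tick.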
|
||||
|
||||
@@ -11,6 +11,7 @@ import (
|
||||
pubsub "github.com/libp2p/go-libp2p-pubsub"
|
||||
record "github.com/libp2p/go-libp2p-record"
|
||||
"github.com/libp2p/go-libp2p/core/host"
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
)
|
||||
|
||||
// IndexerService manages the indexer node's state: stream records, DHT, pubsub.
|
||||
@@ -22,6 +23,7 @@ type IndexerService struct {
|
||||
mu sync.RWMutex
|
||||
IsNative bool
|
||||
Native *NativeState // non-nil when IsNative == true
|
||||
nameIndex *nameIndexState
|
||||
}
|
||||
|
||||
// NewIndexerService creates an IndexerService.
|
||||
@@ -43,22 +45,34 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
|
||||
}
|
||||
ix.PS = ps
|
||||
|
||||
if ix.isStrictIndexer {
|
||||
if ix.isStrictIndexer && !isNative {
|
||||
logger.Info().Msg("connect to indexers as strict indexer...")
|
||||
common.ConnectToIndexers(h, 0, 5, ix.Host.ID())
|
||||
common.ConnectToIndexers(h, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer, ix.Host.ID())
|
||||
logger.Info().Msg("subscribe to decentralized search flow as strict indexer...")
|
||||
ix.SubscribeToSearch(ix.PS, nil)
|
||||
go ix.SubscribeToSearch(ix.PS, nil)
|
||||
}
|
||||
|
||||
if !isNative {
|
||||
logger.Info().Msg("init distributed name index...")
|
||||
ix.initNameIndex(ps)
|
||||
ix.LongLivedStreamRecordedService.AfterDelete = func(pid pp.ID, name, did string) {
|
||||
ix.publishNameEvent(NameIndexDelete, name, pid.String(), did)
|
||||
}
|
||||
}
|
||||
|
||||
if ix.DHT, err = dht.New(
|
||||
context.Background(),
|
||||
ix.Host,
|
||||
dht.Mode(dht.ModeServer),
|
||||
dht.ProtocolPrefix("oc"), // 🔥 réseau privé
|
||||
dht.Validator(record.NamespacedValidator{
|
||||
"node": PeerRecordValidator{},
|
||||
"indexer": IndexerRecordValidator{}, // for native indexer registry
|
||||
"name": DefaultValidator{},
|
"pid": DefaultValidator{},
}),
); err != nil {
logger.Info().Msg(err.Error())
return nil
}

@@ -67,11 +81,10 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
ix.InitNative()
} else {
ix.initNodeHandler()
}

// Register with configured natives so this indexer appears in their cache
if nativeAddrs := conf.GetConfig().NativeIndexerAddresses; nativeAddrs != "" {
StartNativeRegistration(ix.Host, nativeAddrs)
common.StartNativeRegistration(ix.Host, nativeAddrs)
}
}
return ix
}
@@ -79,6 +92,9 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
func (ix *IndexerService) Close() {
ix.DHT.Close()
ix.PS.UnregisterTopicValidator(common.TopicPubSubSearch)
if ix.nameIndex != nil {
ix.PS.UnregisterTopicValidator(TopicNameIndex)
}
for _, s := range ix.StreamRecords {
for _, ss := range s {
ss.HeartbeatStream.Stream.Close()

@@ -6,6 +6,16 @@ import (
"time"
)

type DefaultValidator struct{}

func (v DefaultValidator) Validate(key string, value []byte) error {
return nil
}

func (v DefaultValidator) Select(key string, values [][]byte) (int, error) {
return 0, nil
}

type PeerRecordValidator struct{}

func (v PeerRecordValidator) Validate(key string, value []byte) error {
@@ -26,14 +36,7 @@ func (v PeerRecordValidator) Validate(key string, value []byte) error {
}

// Signature verification
rec2 := PeerRecord{
Name: rec.Name,
DID: rec.DID,
PubKey: rec.PubKey,
PeerID: rec.PeerID,
}

if _, err := rec2.Verify(); err != nil {
if _, err := rec.Verify(); err != nil {
return errors.New("invalid signature")
}
@@ -96,6 +96,7 @@ func ListenNATS(n *Node) {

},
tools.PROPALGATION_EVENT: func(resp tools.NATSResponse) {
fmt.Println("PROPALGATION")
if resp.FromApp == config.GetAppName() {
return
}
@@ -106,10 +107,10 @@ func ListenNATS(n *Node) {
dtt := tools.DataType(propalgation.DataType)
dt = &dtt
}
fmt.Println("PROPALGATION ACT", propalgation.Action, propalgation.Action == tools.PB_CREATE, err)
if err == nil {
switch propalgation.Action {
case tools.PB_ADMIRALTY_CONFIG:
case tools.PB_MINIO_CONFIG:
case tools.PB_ADMIRALTY_CONFIG, tools.PB_MINIO_CONFIG:
var m configPayload
var proto protocol.ID = stream.ProtocolAdmiraltyConfigResource
if propalgation.Action == tools.PB_MINIO_CONFIG {
@@ -122,20 +123,17 @@ func ListenNATS(n *Node) {
p.PeerID, proto, resp.Payload)
}
}
case tools.PB_CREATE:
case tools.PB_UPDATE:
case tools.PB_DELETE:
n.StreamService.ToPartnerPublishEvent(
case tools.PB_CREATE, tools.PB_UPDATE, tools.PB_DELETE:
fmt.Println(propalgation.Action, dt, resp.User, propalgation.Payload)
fmt.Println(n.StreamService.ToPartnerPublishEvent(
context.Background(),
propalgation.Action,
dt, resp.User,
propalgation.Payload,
)
))
case tools.PB_CONSIDERS:
switch resp.Datatype {
case tools.BOOKING:
case tools.PURCHASE_RESOURCE:
case tools.WORKFLOW_EXECUTION:
case tools.BOOKING, tools.PURCHASE_RESOURCE, tools.WORKFLOW_EXECUTION:
var m executionConsidersPayload
if err := json.Unmarshal(resp.Payload, &m); err == nil {
for _, p := range m.PeerIDs {

@@ -2,10 +2,10 @@ package node

import (
"context"
"crypto/sha256"
"encoding/json"
"errors"
"fmt"
"maps"
"oc-discovery/conf"
"oc-discovery/daemons/node/common"
"oc-discovery/daemons/node/indexer"
@@ -15,6 +15,7 @@ import (
"time"

oclib "cloud.o-forge.io/core/oc-lib"
"cloud.o-forge.io/core/oc-lib/dbs"
"cloud.o-forge.io/core/oc-lib/models/peer"
"cloud.o-forge.io/core/oc-lib/tools"
"github.com/google/uuid"
@@ -33,6 +34,7 @@ type Node struct {
StreamService *stream.StreamService
PeerID pp.ID
isIndexer bool
peerRecord *indexer.PeerRecord

Mu sync.RWMutex
}
@@ -69,6 +71,9 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
isIndexer: isIndexer,
LongLivedStreamRecordedService: common.NewStreamRecordedService[interface{}](h, 1000),
}
// Register the bandwidth probe handler so any peer measuring this node's
// throughput can open a dedicated probe stream and read the echo.
h.SetStreamHandler(common.ProtocolBandwidthProbe, common.HandleBandwidthProbe)
var ps *pubsubs.PubSub
if isNode {
logger.Info().Msg("generate opencloud node...")
@@ -77,8 +82,30 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
panic(err) // can't run your node without a propalgation pubsub, of state of node.
}
node.PS = ps
// buildRecord returns a fresh signed PeerRecord as JSON, embedded in each
// heartbeat so the receiving indexer can republish it to the DHT directly.
// peerRecord is nil until claimInfo runs, so the first ~20s heartbeats carry
// no record — that's fine, claimInfo publishes once synchronously at startup.
buildRecord := func() json.RawMessage {
if node.peerRecord == nil {
return nil
}
priv, err := tools.LoadKeyFromFilePrivate()
if err != nil {
return nil
}
fresh := *node.peerRecord
fresh.PeerRecordPayload.ExpiryDate = time.Now().UTC().Add(2 * time.Minute)
payload, _ := json.Marshal(fresh.PeerRecordPayload)
fresh.Signature, err = priv.Sign(payload)
if err != nil {
return nil
}
b, _ := json.Marshal(fresh)
return json.RawMessage(b)
}
logger.Info().Msg("connect to indexers...")
common.ConnectToIndexers(node.Host, 0, 5, node.PeerID) // TODO : make var to change how many indexers are allowed.
common.ConnectToIndexers(node.Host, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer, node.PeerID, buildRecord)
logger.Info().Msg("claims my node...")
if _, err := node.claimInfo(conf.GetConfig().Name, conf.GetConfig().Hostname); err != nil {
panic(err)
@@ -100,14 +127,14 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
}
}
node.SubscribeToSearch(node.PS, &f)
logger.Info().Msg("connect to NATS")
go ListenNATS(node)
logger.Info().Msg("Node is actually running.")
}
if isIndexer {
logger.Info().Msg("generate opencloud indexer...")
node.IndexerService = indexer.NewIndexerService(node.Host, ps, 5, isNativeIndexer)
node.IndexerService = indexer.NewIndexerService(node.Host, ps, 500, isNativeIndexer)
}
logger.Info().Msg("connect to NATS")
ListenNATS(node)
logger.Info().Msg("Node is actually running.")
return node, nil
}

@@ -127,24 +154,29 @@ func (d *Node) publishPeerRecord(
if err != nil {
return err
}
common.StreamMuIndexes.RLock()
indexerSnapshot := make([]*pp.AddrInfo, 0, len(common.StaticIndexers))
for _, ad := range common.StaticIndexers {
indexerSnapshot = append(indexerSnapshot, ad)
}
common.StreamMuIndexes.RUnlock()

for _, ad := range indexerSnapshot {
var err error
if common.StreamIndexers, err = common.TempStream(d.Host, *ad, common.ProtocolPublish, "", common.StreamIndexers, map[protocol.ID]*common.ProtocolInfo{},
&common.StreamMuIndexes); err != nil {
continue
}
stream := common.StreamIndexers[common.ProtocolPublish][ad.ID]
base := indexer.PeerRecord{
base := indexer.PeerRecordPayload{
Name: rec.Name,
DID: rec.DID,
PubKey: rec.PubKey,
ExpiryDate: time.Now().UTC().Add(2 * time.Minute),
}
payload, _ := json.Marshal(base)
hash := sha256.Sum256(payload)

rec.ExpiryDate = base.ExpiryDate
rec.Signature, err = priv.Sign(hash[:])
rec.PeerRecordPayload = base
rec.Signature, err = priv.Sign(payload)
if err := json.NewEncoder(stream.Stream).Encode(&rec); err != nil { // then publish on stream
return err
}
@@ -156,38 +188,50 @@ func (d *Node) GetPeerRecord(
ctx context.Context,
pidOrdid string,
) ([]*peer.Peer, error) {
did := pidOrdid // if known pidOrdid is did
pid := pidOrdid // if not known pidOrdid is pid
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
if data := access.Search(nil, did, true); len(data.Data) > 0 {
did = data.Data[0].GetID()
pid = data.Data[0].(*peer.Peer).PeerID
}
var err error
var info map[string]indexer.PeerRecord
common.StreamMuIndexes.RLock()
indexerSnapshot2 := make([]*pp.AddrInfo, 0, len(common.StaticIndexers))
for _, ad := range common.StaticIndexers {
indexerSnapshot2 = append(indexerSnapshot2, ad)
}
common.StreamMuIndexes.RUnlock()

// Build the GetValue request: if pidOrdid is neither a UUID DID nor a libp2p
// PeerID, treat it as a human-readable name and let the indexer resolve it.
getReq := indexer.GetValue{Key: pidOrdid}
isNameSearch := false
if pidR, pidErr := pp.Decode(pidOrdid); pidErr == nil {
getReq.PeerID = pidR
} else if _, uuidErr := uuid.Parse(pidOrdid); uuidErr != nil {
// Not a UUID DID → treat pidOrdid as a name substring search.
getReq.Name = pidOrdid
getReq.Key = ""
isNameSearch = true
}

for _, ad := range indexerSnapshot2 {
if common.StreamIndexers, err = common.TempStream(d.Host, *ad, common.ProtocolGet, "",
common.StreamIndexers, map[protocol.ID]*common.ProtocolInfo{}, &common.StreamMuIndexes); err != nil {
continue
}
pidR, err := pp.Decode(pid)
if err != nil {
stream := common.StreamIndexers[common.ProtocolGet][ad.ID]
if err := json.NewEncoder(stream.Stream).Encode(getReq); err != nil {
continue
}
stream := common.StreamIndexers[common.ProtocolGet][ad.ID]
if err := json.NewEncoder(stream.Stream).Encode(indexer.GetValue{
Key: did,
PeerID: pidR,
}); err != nil {
return nil, err
}
for {
var resp indexer.GetResponse
if err := json.NewDecoder(stream.Stream).Decode(&resp); err != nil {
return nil, err
continue
}
if resp.Found {
if info == nil {
info = resp.Records
} else {
// Aggregate results from all indexers for name searches.
maps.Copy(info, resp.Records)
}
// For exact lookups (PeerID / DID) stop at the first hit.
if !isNameSearch {
break
}
}
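`GetPeerRecord` now classifies its argument before querying: a decodable libp2p PeerID goes into `GetValue.PeerID`, a UUID is treated as a DID key, and anything else becomes a name search whose results are merged across indexers, while exact lookups stop at the first indexer that answers. A minimal sketch of that decision logic, assuming a simplified regular expression in place of `uuid.Parse` and an injected `isPeerID` predicate in place of `pp.Decode`; responses are modelled as plain `map[string]string`, and all names here are hypothetical.

```go
package main

import (
	"maps"
	"regexp"
)

// lookupKind mirrors the three cases GetPeerRecord distinguishes.
type lookupKind int

const (
	byPeerID lookupKind = iota
	byDID
	byName
)

// uuidRe is a simplified stand-in for uuid.Parse: canonical 8-4-4-4-12 hex
// form only (uuid.Parse also accepts braces, urn:uuid: and 32-hex forms).
var uuidRe = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// classify checks PeerID first, then DID, and falls back to a name search.
// isPeerID is injected because decoding a PeerID requires go-libp2p.
func classify(key string, isPeerID func(string) bool) lookupKind {
	switch {
	case isPeerID(key):
		return byPeerID
	case uuidRe.MatchString(key):
		return byDID
	default:
		return byName
	}
}

// collect walks indexer responses in order. Empty responses (Found=false)
// are skipped; exact lookups stop at the first hit, name searches merge all
// hits (a later indexer overwrites a duplicate key).
func collect(kind lookupKind, responses []map[string]string) map[string]string {
	var info map[string]string
	for _, recs := range responses {
		if len(recs) == 0 {
			continue
		}
		if info == nil {
			info = map[string]string{}
		}
		maps.Copy(info, recs)
		if kind != byName {
			break
		}
	}
	return info
}
```

Copying into a fresh map, rather than assigning the first response map and then mutating it as the diff does, avoids aliasing a map owned by a decoded response.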
@@ -196,7 +240,7 @@ func (d *Node) GetPeerRecord(
for _, pr := range info {
if pk, err := pr.Verify(); err != nil {
return nil, err
} else if ok, p, err := pr.ExtractPeer(d.PeerID.String(), did, pk); err != nil {
} else if ok, p, err := pr.ExtractPeer(d.PeerID.String(), pr.PeerID, pk); err != nil {
return nil, err
} else {
if ok {
@@ -218,7 +262,11 @@ func (d *Node) claimInfo(
}
did := uuid.New().String()

peers := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil).Search(nil, fmt.Sprintf("%v", peer.SELF), false)
peers := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil).Search(&dbs.Filters{
And: map[string][]dbs.Filter{ // search by name if no filters are provided
"peer_id": {{Operator: dbs.EQUAL.String(), Value: d.Host.ID().String()}},
},
}, "", false)
if len(peers.Data) > 0 {
did = peers.Data[0].GetID() // if already existing set up did as made
}
@@ -238,39 +286,38 @@ func (d *Node) claimInfo(
now := time.Now().UTC()
expiry := now.Add(150 * time.Second)

rec := &indexer.PeerRecord{
pRec := indexer.PeerRecordPayload{
Name: name,
DID: did, // REAL PEER ID
PubKey: pubBytes,
ExpiryDate: expiry,
}

rec.PeerID = d.Host.ID().String()
d.PeerID = d.Host.ID()
payload, _ := json.Marshal(pRec)

payload, _ := json.Marshal(rec)
hash := sha256.Sum256(payload)

rec.Signature, err = priv.Sign(hash[:])
rec := &indexer.PeerRecord{
PeerRecordPayload: pRec,
}
rec.Signature, err = priv.Sign(payload)
if err != nil {
return nil, err
}

rec.PeerID = d.Host.ID().String()
rec.APIUrl = endPoint
rec.StreamAddress = "/ip4/" + conf.GetConfig().Hostname + "/tcp/" + fmt.Sprintf("%v", conf.GetConfig().NodeEndpointPort) + "/p2p/" + rec.PeerID
rec.NATSAddress = oclib.GetConfig().NATSUrl
rec.WalletAddress = "my-wallet"
rec.ExpiryDate = expiry

if err := d.publishPeerRecord(rec); err != nil {
return nil, err
}
/*if pk, err := rec.Verify(); err != nil {
fmt.Println("Verify")
d.peerRecord = rec
if _, err := rec.Verify(); err != nil {
return nil, err
} else {*/
} else {
_, p, err := rec.ExtractPeer(did, did, pub)
return p, err
//}
}
}

/*
@@ -4,47 +4,56 @@ import (
"context"
"encoding/json"
"errors"
"oc-discovery/daemons/node/stream"
"oc-discovery/models"

oclib "cloud.o-forge.io/core/oc-lib"
"cloud.o-forge.io/core/oc-lib/dbs"
"cloud.o-forge.io/core/oc-lib/models/peer"
"cloud.o-forge.io/core/oc-lib/tools"
)

func (ps *PubSubService) SearchPublishEvent(
ctx context.Context, dt *tools.DataType, typ string, user string, search string) error {
b, err := json.Marshal(map[string]string{"search": search})
if err != nil {
return err
}
switch typ {
case "known": // define Search Strategy
return ps.StreamService.SearchKnownPublishEvent(dt, user, search) //if partners focus only them*/
return ps.StreamService.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"": {{Operator: dbs.NOT.String(), Value: dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.BLACKLIST}},
},
}}},
},
}, b, stream.ProtocolSearchResource) //if partners focus only them*/
case "partner": // define Search Strategy
return ps.StreamService.SearchPartnersPublishEvent(dt, user, search) //if partners focus only them*/
return ps.StreamService.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.PARTNER}},
},
}, b, stream.ProtocolSearchResource)
case "all": // Gossip PubSub
b, err := json.Marshal(map[string]string{"search": search})
if err != nil {
return err
}
return ps.searchPublishEvent(ctx, dt, user, b)
return ps.publishEvent(ctx, dt, tools.PB_SEARCH, user, b)
default:
return errors.New("no type of research found")
}
}

func (ps *PubSubService) searchPublishEvent(
ctx context.Context, dt *tools.DataType, user string, payload []byte) error {
return ps.publishEvent(ctx, dt, tools.PB_SEARCH, user, payload)
}

func (ps *PubSubService) publishEvent(
ctx context.Context, dt *tools.DataType, action tools.PubSubAction, user string, payload []byte,
) error {
from, err := oclib.GenerateNodeID()
if err != nil {
return err
}
priv, err := tools.LoadKeyFromFilePrivate()
if err != nil {
return err
}
msg, _ := json.Marshal(models.NewEvent(action.String(), from, dt, user, payload, priv))
msg, _ := json.Marshal(models.NewEvent(action.String(), ps.Host.ID().String(), dt, user, payload, priv))
topic, err := ps.PS.Join(action.String())
if err != nil {
return err
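`SearchPublishEvent` maps each search type to a peer selection: `known` targets every peer whose relation is not `BLACKLIST`, `partner` only `PARTNER` peers, and `all` bypasses direct streams and publishes on the gossip topic. The sketch below reproduces that selection over an in-memory slice instead of an oc-lib database query; the `relation` constants and the `knownPeer` type are hypothetical and do not reflect oc-lib's actual enum values.

```go
package main

import "errors"

// relation values are placeholders, not oc-lib's peer relation enum.
type relation int

const (
	unknown relation = iota
	partner
	blacklist
)

type knownPeer struct {
	PeerID   string
	Relation relation
}

// targets returns the peers a direct-stream search would be sent to, or
// gossip=true when the search should be published on the PubSub topic.
func targets(typ string, peers []knownPeer) (out []knownPeer, gossip bool, err error) {
	switch typ {
	case "known": // everyone except blacklisted peers
		for _, p := range peers {
			if p.Relation != blacklist {
				out = append(out, p)
			}
		}
	case "partner": // partners only
		for _, p := range peers {
			if p.Relation == partner {
				out = append(out, p)
			}
		}
	case "all": // no direct targets: broadcast via gossip
		return nil, true, nil
	default:
		return nil, false, errors.New("no type of research found")
	}
	return out, false, nil
}
```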
@@ -5,6 +5,7 @@ import (
"crypto/subtle"
"encoding/json"
"errors"
"fmt"
"oc-discovery/daemons/node/common"

oclib "cloud.o-forge.io/core/oc-lib"
@@ -19,6 +20,7 @@ type Verify struct {
}

func (ps *StreamService) handleEvent(protocol string, evt *common.Event) error {
fmt.Println("handleEvent")
ps.handleEventFromPartner(evt, protocol)
/*if protocol == ProtocolVerifyResource {
if evt.DataType == -1 {
@@ -148,14 +150,6 @@ func (abs *StreamService) pass(event *common.Event, action tools.PubSubAction) e
}

func (ps *StreamService) handleEventFromPartner(evt *common.Event, protocol string) error {
resource, err := resources.ToResource(int(evt.DataType), evt.Payload)
if err != nil {
return err
}
b, err := json.Marshal(resource)
if err != nil {
return err
}
switch protocol {
case ProtocolSearchResource:
if evt.DataType < 0 {
@@ -169,20 +163,20 @@ func (ps *StreamService) handleEventFromPartner(evt *common.Event, protocol stri
ps.SendResponse(p[0], evt)
}
}
case ProtocolCreateResource:
case ProtocolUpdateResource:
case ProtocolCreateResource, ProtocolUpdateResource:
fmt.Println("RECEIVED Protocol.Update")
go tools.NewNATSCaller().SetNATSPub(tools.CREATE_RESOURCE, tools.NATSResponse{
FromApp: "oc-discovery",
Datatype: tools.DataType(evt.DataType),
Method: int(tools.CREATE_RESOURCE),
Payload: b,
Payload: evt.Payload,
})
case ProtocolDeleteResource:
go tools.NewNATSCaller().SetNATSPub(tools.REMOVE_RESOURCE, tools.NATSResponse{
FromApp: "oc-discovery",
Datatype: tools.DataType(evt.DataType),
Method: int(tools.REMOVE_RESOURCE),
Payload: b,
Payload: evt.Payload,
})
default:
return errors.New("no action authorized available : " + protocol)
@@ -213,9 +207,9 @@ func (abs *StreamService) SendResponse(p *peer.Peer, event *common.Event) error
if j, err := json.Marshal(ss); err == nil {
if event.DataType != -1 {
ndt := tools.DataType(dt.EnumIndex())
abs.PublishResources(&ndt, event.User, peerID, j)
abs.PublishCommon(&ndt, event.User, peerID, ProtocolSearchResource, j)
} else {
abs.PublishResources(nil, event.User, peerID, j)
abs.PublishCommon(nil, event.User, peerID, ProtocolSearchResource, j)
}
}
}

@@ -15,81 +15,45 @@ import (
"github.com/libp2p/go-libp2p/core/protocol"
)

func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, toPeerID string, proto protocol.ID, resource []byte) (*common.Stream, error) {
func (ps *StreamService) PublishesCommon(dt *tools.DataType, user string, filter *dbs.Filters, resource []byte, protos ...protocol.ID) error {
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
p := access.LoadOne(toPeerID)
if p.Err != "" {
return nil, errors.New(p.Err)
} else {
ad, err := pp.AddrInfoFromString(p.Data.(*peer.Peer).StreamAddress)
p := access.Search(filter, "", false)
for _, pes := range p.Data {
for _, proto := range protos {
if _, err := ps.PublishCommon(dt, user, pes.(*peer.Peer).PeerID, proto, resource); err != nil {
return err
}
}
}
return nil
}

func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, toPeerID string, proto protocol.ID, resource []byte) (*common.Stream, error) {
fmt.Println("PublishCommon")
if toPeerID == ps.Key.String() {
return nil, errors.New("Can't send to ourself !")
}

access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
p := access.Search(&dbs.Filters{
And: map[string][]dbs.Filter{ // search by name if no filters are provided
"peer_id": {{Operator: dbs.EQUAL.String(), Value: toPeerID}},
},
}, toPeerID, false)
var pe *peer.Peer
if len(p.Data) > 0 && p.Data[0].(*peer.Peer).Relation != peer.BLACKLIST {
pe = p.Data[0].(*peer.Peer)
} else if pps, err := ps.Node.GetPeerRecord(context.Background(), toPeerID); err == nil && len(pps) > 0 {
pe = pps[0]
}
if pe != nil {
ad, err := pp.AddrInfoFromString(p.Data[0].(*peer.Peer).StreamAddress)
if err != nil {
return nil, err
}
return ps.write(toPeerID, ad, dt, user, resource, proto)
}
}

func (ps *StreamService) PublishResources(dt *tools.DataType, user string, toPeerID string, resource []byte) error {
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
p := access.LoadOne(toPeerID)
if p.Err != "" {
return errors.New(p.Err)
} else {
ad, err := pp.AddrInfoFromString(p.Data.(*peer.Peer).StreamAddress)
if err != nil {
return err
}
ps.write(toPeerID, ad, dt, user, resource, ProtocolSearchResource)
}
return nil
}

func (ps *StreamService) SearchKnownPublishEvent(dt *tools.DataType, user string, search string) error {
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
peers := access.Search(&dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"": {{Operator: dbs.NOT.String(), Value: dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.BLACKLIST}},
},
}}},
},
}, search, false)
if peers.Err != "" {
return errors.New(peers.Err)
} else {
b, err := json.Marshal(map[string]string{"search": search})
if err != nil {
return err
}
for _, p := range peers.Data {
ad, err := pp.AddrInfoFromString(p.(*peer.Peer).StreamAddress)
if err != nil {
continue
}
ps.write(p.GetID(), ad, dt, user, b, ProtocolSearchResource)
}
}
return nil
}

func (ps *StreamService) SearchPartnersPublishEvent(dt *tools.DataType, user string, search string) error {
if peers, err := ps.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex())); err != nil {
return err
} else {
b, err := json.Marshal(map[string]string{"search": search})
if err != nil {
return err
}
for _, p := range peers {
ad, err := pp.AddrInfoFromString(p.StreamAddress)
if err != nil {
continue
}
ps.write(p.GetID(), ad, dt, user, b, ProtocolSearchResource)
}
}
return nil
return nil, errors.New("peer unvalid " + toPeerID)
}

func (ps *StreamService) ToPartnerPublishEvent(
@@ -103,12 +67,23 @@ func (ps *StreamService) ToPartnerPublishEvent(
if err != nil {
return err
}
ps.Mu.Lock()
defer ps.Mu.Unlock()

if pe, err := oclib.GetMySelf(); err != nil {
return err
} else if pe.GetID() == p.GetID() {
return fmt.Errorf("can't send to ourself")
} else {
pe.Relation = p.Relation
pe.Verify = false
if b2, err := json.Marshal(pe); err == nil {
if _, err := ps.PublishCommon(dt, user, p.PeerID, ProtocolUpdateResource, b2); err != nil {
return err
}
if p.Relation == peer.PARTNER {
if ps.Streams[ProtocolHeartbeatPartner] == nil {
ps.Streams[ProtocolHeartbeatPartner] = map[pp.ID]*common.Stream{}
}
fmt.Println("SHOULD CONNECT")
ps.ConnectToPartner(p.StreamAddress)
} else if ps.Streams[ProtocolHeartbeatPartner] != nil && ps.Streams[ProtocolHeartbeatPartner][pid] != nil {
for _, pids := range ps.Streams {
@@ -117,21 +92,19 @@ func (ps *StreamService) ToPartnerPublishEvent(
}
}
}
}
}
return nil
}
if peers, err := ps.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex())); err != nil {
return err
} else {
for _, p := range peers {
for protocol := range protocolsPartners {
ad, err := pp.AddrInfoFromString(p.StreamAddress)
if err != nil {
continue
}
ps.write(p.GetID(), ad, dt, user, payload, protocol)
}
}
ks := []protocol.ID{}
for k := range protocolsPartners {
ks = append(ks, k)
}
ps.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
And: map[string][]dbs.Filter{
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.PARTNER}},
},
}, payload, ks...)
return nil
}

@@ -158,6 +131,7 @@ func (s *StreamService) write(
}
stream := s.Streams[proto][peerID.ID]
evt := common.NewEvent(string(proto), peerID.ID.String(), dt, user, payload)
fmt.Println("SEND EVENT ", evt.From, evt.DataType, evt.Timestamp)
if err := json.NewEncoder(stream.Stream).Encode(evt); err != nil {
stream.Stream.Close()
logger.Err(err)

@@ -116,7 +116,7 @@ func (s *StreamService) HandlePartnerHeartbeat(stream network.Stream) {
streamsAnonym[k] = v
}
s.Mu.Unlock()
pid, hb, err := common.CheckHeartbeat(s.Host, stream, streamsAnonym, &s.Mu, s.maxNodesConn)
pid, hb, err := common.CheckHeartbeat(s.Host, stream, json.NewDecoder(stream), streamsAnonym, &s.Mu, s.maxNodesConn)
if err != nil {
return
}
@@ -132,10 +132,12 @@ func (s *StreamService) HandlePartnerHeartbeat(stream network.Stream) {
s.ConnectToPartner(val)
}
}
go s.StartGC(30 * time.Second)
// GC is already running via InitStream — starting a new ticker goroutine on
// every heartbeat would leak an unbounded number of goroutines.
}

func (s *StreamService) connectToPartners() error {
logger := oclib.GetLogger()
for proto, info := range protocolsPartners {
f := func(ss network.Stream) {
if s.Streams[proto] == nil {
@@ -147,11 +149,12 @@ func (s *StreamService) connectToPartners() error {
}
go s.readLoop(s.Streams[proto][ss.Conn().RemotePeer()], ss.Conn().RemotePeer(), proto, info)
}
fmt.Println("SetStreamHandler", proto)
logger.Info().Msg("SetStreamHandler " + string(proto))
s.Host.SetStreamHandler(proto, f)
}
peers, err := s.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex()))
if err != nil {
logger.Err(err)
return err
}
for _, p := range peers {
@@ -161,19 +164,19 @@ func (s *StreamService) connectToPartners() error {
}

func (s *StreamService) ConnectToPartner(address string) {
logger := oclib.GetLogger()
if ad, err := pp.AddrInfoFromString(address); err == nil {
logger.Info().Msg("Connect to Partner " + ProtocolHeartbeatPartner + " " + address)
common.SendHeartbeat(context.Background(), ProtocolHeartbeatPartner, conf.GetConfig().Name,
s.Host, s.Streams, map[string]*pp.AddrInfo{address: ad}, 20*time.Second)
s.Host, s.Streams, map[string]*pp.AddrInfo{address: ad}, nil, 20*time.Second)
}
}

func (s *StreamService) searchPeer(search string) ([]*peer.Peer, error) {
/* TODO FOR TEST ONLY A VARS THAT DEFINE ADDRESS... deserialize */
ps := []*peer.Peer{}
if conf.GetConfig().PeerIDS != "" {
for _, peerID := range strings.Split(conf.GetConfig().PeerIDS, ",") {
ppID := strings.Split(peerID, "/")
fmt.Println(ppID, peerID)
ps = append(ps, &peer.Peer{
AbstractObject: utils.AbstractObject{
UUID: uuid.New().String(),
@@ -185,7 +188,6 @@ func (s *StreamService) searchPeer(search string) ([]*peer.Peer, error) {
})
}
}

access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
peers := access.Search(nil, search, false)
for _, p := range peers.Data {
@@ -252,8 +254,9 @@ func (ps *StreamService) readLoop(s *common.Stream, id pp.ID, proto protocol.ID,
}
var evt common.Event
if err := json.NewDecoder(s.Stream).Decode(&evt); err != nil {
s.Stream.Close()
continue
// Any decode error (EOF, reset, malformed JSON) terminates the loop;
// continuing on a dead/closed stream creates an infinite spin.
return
}
ps.handleEvent(evt.Type, &evt)
if protocolInfo.WaitResponse && !protocolInfo.PersistantStream {
@@ -1,23 +1,33 @@
#!/bin/bash

IMAGE_BASE_NAME="oc-discovery"
DOCKERFILE_PATH="."

for i in {0..3}; do
docker network create \
--subnet=172.40.0.0/24 \
discovery

for i in $(seq ${1:-0} ${2:-3}); do
NUM=$((i + 1))
PORT=$((4000 + $NUM))

IMAGE_NAME="${IMAGE_BASE_NAME}:${NUM}"


echo "▶ Building image ${IMAGE_NAME} with CONF_NUM=${NUM}"
docker build \
--build-arg CONF_NUM=${NUM} \
-t ${IMAGE_NAME} \
-t "${IMAGE_BASE_NAME}_${NUM}" \
${DOCKERFILE_PATH}

docker kill "${IMAGE_BASE_NAME}_${NUM}" | true
docker rm "${IMAGE_BASE_NAME}_${NUM}" | true

echo "▶ Running container ${IMAGE_NAME} on port ${PORT}:${PORT}"
docker run -d \
--network="${3:-oc}" \
-p ${PORT}:${PORT} \
--name "${IMAGE_BASE_NAME}_${NUM}" \
${IMAGE_NAME}
"${IMAGE_BASE_NAME}_${NUM}"

docker network connect --ip "172.40.0.${NUM}" discovery "${IMAGE_BASE_NAME}_${NUM}"
done
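The build script derives every per-container value from the loop index `i`: `NUM=i+1`, host port `4000+NUM`, image tag `oc-discovery:NUM`, container name `oc-discovery_NUM`, and the static address `172.40.0.NUM` on the `discovery` network. This Go sketch restates that arithmetic so the JSON configs can be cross-checked against it; for example, index 4 yields port 4005 and `172.40.0.5`, the native indexer address that `docker_discovery10.json` points to. `specFor` and `containerSpec` are hypothetical helpers, not part of the repository.

```go
package main

import "fmt"

// containerSpec holds the values the script computes for one container.
type containerSpec struct {
	Image, Name, IP string
	Port            int
}

// specFor mirrors the script for loop index i: NUM=i+1, PORT=4000+NUM,
// and the static IP 172.40.0.NUM on the 172.40.0.0/24 "discovery" network.
func specFor(i int) containerSpec {
	num := i + 1
	return containerSpec{
		Image: fmt.Sprintf("oc-discovery:%d", num),
		Name:  fmt.Sprintf("oc-discovery_%d", num),
		IP:    fmt.Sprintf("172.40.0.%d", num),
		Port:  4000 + num,
	}
}
```

Separately, the script ignores failures of `docker kill` and `docker rm` by piping into `true` (`| true`); `|| true` is the more common idiom for that intent.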
||||
10
docker_discovery10.json
Normal file
10
docker_discovery10.json
Normal file
@@ -0,0 +1,10 @@
|
||||
{
|
||||
"MONGO_URL":"mongodb://mongo:27017/",
|
||||
"MONGO_DATABASE":"DC_myDC",
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "node",
|
||||
"NODE_ENDPOINT_PORT": 4010,
|
||||
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
|
||||
"MIN_INDEXER": 2,
|
||||
"PEER_IDS": "/ip4/172.40.0.9/tcp/4009/p2p/12D3KooWGnQfKwX9E4umCPE8dUKZuig4vw5BndDowRLEbGmcZyta"
|
||||
}
|
||||
@@ -4,5 +4,5 @@
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "indexer",
|
||||
"NODE_ENDPOINT_PORT": 4002,
|
||||
"INDEXER_ADDRESSES": "/ip4/172.19.0.2/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
|
||||
"INDEXER_ADDRESSES": "/ip4/172.40.0.1/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
|
||||
}
|
||||
@@ -4,5 +4,5 @@
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "node",
|
||||
"NODE_ENDPOINT_PORT": 4003,
|
||||
"INDEXER_ADDRESSES": "/ip4/172.19.0.3/tcp/4002/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
|
||||
"INDEXER_ADDRESSES": "/ip4/172.40.0.2/tcp/4002/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
|
||||
}
|
||||
@@ -4,6 +4,6 @@
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "node",
|
||||
"NODE_ENDPOINT_PORT": 4004,
|
||||
"INDEXER_ADDRESSES": "/ip4/172.19.0.2/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
|
||||
"PEER_IDS": "/ip4/172.19.0.4/tcp/4003/p2p/12D3KooWBh9kZrekBAE5G33q4jCLNRAzygem3gP1mMdK8mhoCTaw"
|
||||
"INDEXER_ADDRESSES": "/ip4/172.40.0.1/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
|
||||
"PEER_IDS": "/ip4/172.40.0.3/tcp/4003/p2p/12D3KooWBh9kZrekBAE5G33q4jCLNRAzygem3gP1mMdK8mhoCTaw"
|
||||
}
|
||||
|
||||
7
docker_discovery5.json
Normal file
7
docker_discovery5.json
Normal file
@@ -0,0 +1,7 @@
|
||||
{
|
||||
"MONGO_URL":"mongodb://mongo:27017/",
|
||||
"MONGO_DATABASE":"DC_myDC",
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "native-indexer",
|
||||
"NODE_ENDPOINT_PORT": 4005
|
||||
}
|
||||
8
docker_discovery6.json
Normal file
8
docker_discovery6.json
Normal file
@@ -0,0 +1,8 @@
|
||||
{
|
||||
"MONGO_URL":"mongodb://mongo:27017/",
|
||||
"MONGO_DATABASE":"DC_myDC",
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "native-indexer",
|
||||
"NODE_ENDPOINT_PORT": 4006,
|
||||
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
|
||||
}
|
||||
8
docker_discovery7.json
Normal file
@@ -0,0 +1,8 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "indexer",
"NODE_ENDPOINT_PORT": 4007,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.6/tcp/4006/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
}

8
docker_discovery8.json
Normal file
@@ -0,0 +1,8 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "indexer",
"NODE_ENDPOINT_PORT": 4008,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
}

8
docker_discovery9.json
Normal file
@@ -0,0 +1,8 @@
{
"MONGO_URL":"mongodb://mongo:27017/",
"MONGO_DATABASE":"DC_myDC",
"NATS_URL": "nats://nats:4222",
"NODE_MODE": "node",
"NODE_ENDPOINT_PORT": 4009,
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.6/tcp/4006/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u,/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
}

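docker_discovery9.json passes two native indexers in one NATIVE_INDEXER_ADDRESSES string, separated by a comma. INDEXER_ADDRESSES and PEER_IDS use the same `/ip4/<ip>/tcp/<port>/p2p/<peer-id>` shape. Below is a minimal, stdlib-only sketch of splitting such a value. Several parts are assumptions:

- The comma separator is inferred from this config file. The actual parsing code is not part of this diff.
- `nativeAddr` and `parseNativeAddrs` are hypothetical names.
- Only the exact ip4/tcp/p2p form used in these files is accepted.

With go-multiaddr and go-libp2p (both listed in go.mod), the usual route is `multiaddr.NewMultiaddr` followed by `peer.AddrInfoFromP2pAddr`.

```go
package main

import (
	"fmt"
	"strings"
)

// nativeAddr is a hypothetical holder for one parsed entry of
// NATIVE_INDEXER_ADDRESSES; it is not a type from oc-discovery or libp2p.
type nativeAddr struct {
	IP     string
	Port   string
	PeerID string
}

// parseNativeAddrs splits a comma-separated list of
// /ip4/<ip>/tcp/<port>/p2p/<peer-id> strings. It accepts only that exact
// shape; real code would rely on go-multiaddr instead.
func parseNativeAddrs(raw string) ([]nativeAddr, error) {
	var out []nativeAddr
	for _, entry := range strings.Split(raw, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue // tolerate empty values such as the "" default in main.go
		}
		// The leading "/" yields an empty first element, hence 7 parts.
		parts := strings.Split(entry, "/")
		if len(parts) != 7 || parts[0] != "" || parts[1] != "ip4" ||
			parts[3] != "tcp" || parts[5] != "p2p" {
			return nil, fmt.Errorf("unsupported address %q", entry)
		}
		out = append(out, nativeAddr{IP: parts[2], Port: parts[4], PeerID: parts[6]})
	}
	return out, nil
}

func main() {
	// Value copied from docker_discovery9.json.
	raw := "/ip4/172.40.0.6/tcp/4006/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u," +
		"/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
	addrs, err := parseNativeAddrs(raw)
	if err != nil {
		panic(err)
	}
	for _, a := range addrs {
		fmt.Printf("%s:%s -> %s\n", a.IP, a.Port, a.PeerID)
	}
}
```

An empty string yields no addresses rather than an error. This matches the `""` defaults that main.go uses for these variables.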
2
go.mod
@@ -3,7 +3,7 @@ module oc-discovery
go 1.25.0

require (
cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7
cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5
github.com/libp2p/go-libp2p v0.47.0
github.com/libp2p/go-libp2p-record v0.3.1
github.com/multiformats/go-multiaddr v0.16.1

8
go.sum
@@ -1,5 +1,13 @@
cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7 h1:p9uJjMY+QkE4neA+xRmIRtAm9us94EKZqgajDdLOd0Y=
cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226084851-959fce48ef6c h1:FTUu9tdEfib6J+fuc7e5wYTe++EIlB70bVNpOeFjnyU=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226084851-959fce48ef6c/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226085754-f4e2d8057df0 h1:lvrRF4ToIMl/5k1q4AiPEy6ycjwRtOaDhWnQ/LrW1ZA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226085754-f4e2d8057df0/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226091217-cb3771c17a31 h1:hvkvJibS9NmImw73j79Ov5VpIYs4WbP4SYGlK/XO82Q=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226091217-cb3771c17a31/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5 h1:h+Fkyj6cfwAirc0QGCBEkZSSrgcyThXswg7ytOLm948=
cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/Masterminds/semver/v3 v3.4.0 h1:Zog+i5UMtVoCU8oKka5P7i9q9HgrJeGzI9SA1Xbatp0=
github.com/Masterminds/semver/v3 v3.4.0/go.mod h1:4V+yj/TJE1HU9XfppCwVMZq3I84lprf4nC11bSS5beM=

6
main.go
@@ -28,11 +28,15 @@ func main() {
conf.GetConfig().PSKPath = o.GetStringDefault("PSK_PATH", "./psk/psk.key")
conf.GetConfig().NodeEndpointPort = o.GetInt64Default("NODE_ENDPOINT_PORT", 4001)
conf.GetConfig().IndexerAddresses = o.GetStringDefault("INDEXER_ADDRESSES", "")
conf.GetConfig().NativeIndexerAddresses = o.GetStringDefault("NATIVE_INDEXER_ADDRESSES", "")

conf.GetConfig().PeerIDS = o.GetStringDefault("PEER_IDS", "")

conf.GetConfig().NodeMode = o.GetStringDefault("NODE_MODE", "node")

conf.GetConfig().MinIndexer = o.GetIntDefault("MIN_INDEXER", 1)
conf.GetConfig().MaxIndexer = o.GetIntDefault("MAX_INDEXER", 5)

ctx, stop := signal.NotifyContext(
context.Background(),
os.Interrupt,
@@ -47,7 +51,7 @@ func main() {
if n, err := node.InitNode(isNode, isIndexer, isNativeIndexer); err != nil {
panic(err)
} else {
<-ctx.Done() // 👈 the only blocking point
<-ctx.Done() // the only blocking point
log.Println("shutting down")
n.Close()
}

3
pem/private10.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIPc7D3Mgb1U2Ipyb/85hA4Ew7dC8zHDEuQYSjqzzRgLK
-----END PRIVATE KEY-----

3
pem/private5.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIK2oBaOtGNchE09MBRtPd5oEOUcVUQG2ndym5wKExj7R
-----END PRIVATE KEY-----

3
pem/private6.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIE58GDazCyF1jp796ivSmHiCepbkC8TpzliIaQ7eGEpu
-----END PRIVATE KEY-----

3
pem/private7.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIAeX4O7ldwehRSnPkbzuE6csyo63vjvqAcNNujENOKUC
-----END PRIVATE KEY-----

3
pem/private8.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIEkgqINXDLnxIJZs2LEK9O4vdsqk43dwbULGUE25AWuR
-----END PRIVATE KEY-----

3
pem/private9.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIBcflxGlZYyUVJoExC94rHZbIyKMwZ+Oh7EDkb0qUlxd
-----END PRIVATE KEY-----

3
pem/public10.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAEomuEQGmGsYVw35C6DB5tfY8LI8jm359ceAxRX8eQ0o=
-----END PUBLIC KEY-----

3
pem/public5.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAZ2nLJBL8a5opfa8nFeVj0SZToW8pl4+zgcSUkeZFRO4=
-----END PUBLIC KEY-----

3
pem/public6.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAIQVeSGwsjPjyepPTnzzYqVxIxviSEjZXU7C7zuNTui4=
-----END PUBLIC KEY-----

3
pem/public7.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAG95Ettl3jTi41HM8le1A9WDmOEq0ANEqpLF7zTZrfXA=
-----END PUBLIC KEY-----

3
pem/public8.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEA/ymOIb0sJ0qCWrf3mKz7ACCvsMXLog/EK533JfNXZTM=
-----END PUBLIC KEY-----

3
pem/public9.pem
Normal file
@@ -0,0 +1,3 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAZ4F3KqOp/5QrPdZGqqX6PYYEGd2snX4Q3AUt9XAG3v8=
-----END PUBLIC KEY-----
