The Technology Behind The AI Club
The AI Club combines intelligent model routing with a novel crowd memory system. Every message you send is analysed and routed to the best specialist model — while your conversations quietly build a collective intelligence that benefits the entire community.
The Routing Pipeline
Queries pass through a three-tier routing system. Simple queries are handled instantly by keyword matching. Complex queries check a fuzzy cache first, and only fall through to AI classification when no match is found.
flowchart TD
A["User sends a message"] --> B{"Short query or\nstrong keyword signal?"}
B -->|"Yes"| C["Keyword Router\n(instant)"]
B -->|"No"| D["Compute SimHash\nfingerprint"]
D --> E{"Cache hit on\nany band key?"}
E -->|"Yes"| F["Return cached route\n(~5ms)"]
E -->|"No"| G["Grok AI Classification\n(~500ms)"]
G --> H["Write to 4 band keys\n(30-day TTL)"]
H --> I["Return route"]
C --> J["Specialist Model"]
F --> J
I --> J
J --> K["Stream response to user"]
style A fill:#1a1a2e,stroke:#00a3ff,color:#fff
style J fill:#1a1a2e,stroke:#22c55e,color:#fff
style K fill:#1a1a2e,stroke:#22c55e,color:#fff
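The three-tier decision above can be sketched in a few lines of Python. Everything here is illustrative: the keyword table, the `classify` stub, and the plain-hash fingerprint (a stand-in for the SimHash fingerprinting covered in the next section) are hypothetical, not the production implementation.

```python
# Toy keyword table; the real router's table is presumably much larger.
KEYWORD_ROUTES = {"debug": "code", "translate": "language", "draw": "image"}
CACHE: dict = {}  # band key -> cached category

def band_keys(fingerprint: int) -> list:
    # A 64-bit fingerprint split into 4 bands of 16 bits each.
    return [(i, (fingerprint >> (16 * i)) & 0xFFFF) for i in range(4)]

def route(query: str, classify) -> str:
    q = query.lower()
    # Tier 1: instant keyword match for short/obvious queries.
    for word, category in KEYWORD_ROUTES.items():
        if word in q:
            return category
    # Tier 2: fuzzy cache lookup on any band key (~5 ms).
    # hash() is a stand-in here; the real system computes a SimHash.
    fp = hash(q) & (2**64 - 1)
    keys = band_keys(fp)
    for key in keys:
        if key in CACHE:
            return CACHE[key]
    # Tier 3: AI classification (~500 ms), then write all 4 band keys
    # (the real system attaches a 30-day TTL).
    category = classify(query)
    for key in keys:
        CACHE[key] = category
    return category
```

Note how the expensive tier only runs when both cheap tiers miss, and how every classification call seeds four cache keys at once, widening the net for future near-duplicate queries.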
SimHash Fuzzy Caching
Instead of exact-match caching, we use SimHash — a locality-sensitive hashing algorithm. Similar queries produce similar fingerprints, so "best ways to market a tattoo shop" and "how to market a tattoo shop" share cache entries, avoiding redundant AI classification calls.
flowchart LR
subgraph Query_A ["Query A: 'best ways to market a tattoo shop'"]
A1["Remove stop words"] --> A2["[market, tattoo, shop]"]
A2 --> A3["+ bigrams:\nmarket_tattoo,\ntattoo_shop, ..."]
A3 --> A4["SimHash\nA3F1 8B2C 47D0 E519"]
end
subgraph Query_B ["Query B: 'how to market a tattoo shop'"]
B1["Remove stop words"] --> B2["[market, tattoo, shop]"]
B2 --> B3["+ bigrams:\nmarket_tattoo,\ntattoo_shop, ..."]
B3 --> B4["SimHash\nA3F1 8B2C 45D0 E519"]
end
A4 --> M{"Compare\n4 bands"}
B4 --> M
M --> R["3 of 4 bands match\n= Cache HIT"]
style R fill:#1a1a2e,stroke:#22c55e,color:#fff
The 64-bit fingerprint is split into 4 bands of 16 bits each, and two queries sharing any single band are considered a match. By the pigeonhole principle, any pair of fingerprints differing in three or fewer bits is guaranteed to share at least one band; larger differences still match when the changed bits happen to cluster in the other bands. This keeps recall high for near-duplicate phrasings while maintaining high precision.
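A minimal sketch of the fingerprinting path, assuming a toy stop-word list (the production list is presumably larger) and `blake2b` as the per-feature hash; both choices are illustrative, not the actual implementation:

```python
import hashlib

# Illustrative stop-word list; the real one is assumed to be larger.
STOP_WORDS = {"a", "an", "the", "to", "of", "how", "best", "ways", "what", "is"}

def features(query: str) -> list:
    # Remove stop words, then append adjacent-word bigrams.
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    return words + [f"{a}_{b}" for a, b in zip(words, words[1:])]

def simhash64(query: str) -> int:
    # Classic SimHash: each feature's 64-bit hash votes per bit position;
    # the fingerprint keeps the bits with a positive vote total.
    votes = [0] * 64
    for feat in features(query):
        h = int.from_bytes(
            hashlib.blake2b(feat.encode(), digest_size=8).digest(), "big"
        )
        for i in range(64):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, v in enumerate(votes) if v > 0)

def bands(fp: int) -> list:
    # Four 16-bit bands; sharing any one band counts as a cache hit.
    return [(fp >> (16 * i)) & 0xFFFF for i in range(4)]

def fuzzy_match(q1: str, q2: str) -> bool:
    return any(a == b for a, b in zip(bands(simhash64(q1)), bands(simhash64(q2))))
```

With this stop-word list, both example queries reduce to the same feature set, so their fingerprints collide on every band, which is exactly the behaviour the cache exploits.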
Specialist Models
Each query category maps to a specialist model chosen for that task type. If the primary model is unavailable, the system falls back to an alternative.
flowchart TD
R["Router"] --> CODER["Code Specialist\ngrok-code-fast-1"]
R --> REASONER["Deep Thinker\ngrok-4-1-fast-reasoning"]
R --> CREATIVE["Creative Writer\ngrok-4-1-fast"]
R --> ANALYST["Research Analyst\ngrok-4-1-fast-reasoning"]
R --> IMAGE["Image Creator\nFlux 2 Klein 9B"]
R --> TEACHER["Tutor\ngrok-4-1-fast"]
R --> QUICK["Quick Responder\nGPT-OSS 120B"]
R --> POLY["Language Expert\ngrok-4-1-fast"]
R --> SUMM["Summarizer\ngrok-4-1-fast"]
R --> CHAT["Conversationalist\nGPT-OSS 120B"]
style R fill:#1a1a2e,stroke:#00a3ff,color:#fff
All specialist models have a Workers AI fallback (GPT-OSS 120B) that kicks in automatically if the primary provider is unreachable. Image generation uses Flux 2 Klein 9B running on Cloudflare's edge network.
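The routing table and fallback rule might look like the following sketch. The model IDs come from the diagram above; the table layout and function names are hypothetical.

```python
FALLBACK = "GPT-OSS 120B"  # Workers AI fallback for text specialists

# Category -> primary specialist model, mirroring the diagram above.
SPECIALISTS = {
    "code": "grok-code-fast-1",
    "reasoning": "grok-4-1-fast-reasoning",
    "creative": "grok-4-1-fast",
    "research": "grok-4-1-fast-reasoning",
    "image": "Flux 2 Klein 9B",
    "tutor": "grok-4-1-fast",
    "quick": "GPT-OSS 120B",
    "language": "grok-4-1-fast",
    "summary": "grok-4-1-fast",
    "chat": "GPT-OSS 120B",
}

def pick_model(category: str, primary_available: bool) -> str:
    model = SPECIALISTS.get(category, FALLBACK)
    # If the primary provider is unreachable, text categories fall back
    # to the Workers AI model automatically; image generation already
    # runs on Cloudflare's edge.
    if not primary_available and category != "image":
        return FALLBACK
    return model
```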
Crowd Memory
Every conversation generates structured memories — preferences, expertise, facts, and personality traits — extracted asynchronously by AI. These memories are stored with weighted confidence scores that account for relationship context and conversation depth, creating a rich understanding of each member over time.
The innovative part is what happens next: the system uses these memories to find knowledge gaps and surface recommendations from the wider community.
flowchart TD
A["User chats with Zeno"] --> B["AI extracts memories\n(preferences, facts, expertise)"]
B --> C["Assign weighted score\nconfidence × tier × depth"]
C --> D["Generate 768-dim\nembedding vector"]
D --> E["Store in memory graph"]
E --> F{"Cross-user\ngap analysis"}
F --> G["Compare embeddings\nvia cosine similarity"]
G --> H{"Similarity < 0.5?\n(knowledge gap)"}
H -->|"Yes"| I["Cluster gaps\nby category"]
H -->|"No"| J["Already known\n(skip)"]
I --> K{"2+ members\nexploring topic?"}
K -->|"Yes"| L["Surface as\nrecommendation"]
K -->|"No"| M["Too niche\n(skip)"]
L --> N["Show in Daily Summary\n& Crowd Memory"]
style A fill:#1a1a2e,stroke:#00a3ff,color:#fff
style L fill:#1a1a2e,stroke:#a78bfa,color:#fff
style N fill:#1a1a2e,stroke:#22c55e,color:#fff
Weighted Memory Scoring
Not all memories are equal. The system weights each memory by three factors, which together determine how prominently it shapes Zeno's responses and the community's recommendations.
flowchart LR
subgraph Factors ["Scoring Factors"]
C["Confidence\n(0–1)\nHow explicit was it?"]
T["Relationship Tier\nself: 1.0 · family: 0.9\nfriend: 0.7 · colleague: 0.5\nother: 0.3"]
D["Conversation Depth\nmin(1.0, messages / 10)\nDeeper = higher trust"]
end
C --> S["Weighted Score\n= confidence\n× tier\n× depth"]
T --> S
D --> S
S --> R["Top 75 memories\nloaded per query"]
style S fill:#1a1a2e,stroke:#a78bfa,color:#fff
style R fill:#1a1a2e,stroke:#22c55e,color:#fff
Relationship tiers allow the system to distinguish between facts about the user themselves (highest weight) versus people they've mentioned in passing (lowest weight). Conversation depth prevents single off-hand comments from becoming high-confidence memories — trust builds over multiple exchanges.
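The scoring formula can be written directly. The tier weights and the depth cap come from the diagram above; the function names and the dictionary-based memory shape are illustrative.

```python
# Relationship-tier weights from the scoring diagram.
TIER_WEIGHTS = {
    "self": 1.0, "family": 0.9, "friend": 0.7, "colleague": 0.5, "other": 0.3,
}

def memory_score(confidence: float, tier: str, message_count: int) -> float:
    # Weighted score = confidence x relationship tier x conversation depth,
    # where depth saturates after 10 messages.
    depth = min(1.0, message_count / 10)
    return confidence * TIER_WEIGHTS.get(tier, 0.3) * depth

def top_memories(scored: list, k: int = 75) -> list:
    # The highest-scoring memories are loaded into context per query.
    return sorted(scored, key=lambda m: m["score"], reverse=True)[:k]
```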
Cross-User Recommendations
The recommendation engine identifies topics that multiple community members are exploring but that a given user hasn't encountered yet. It works by comparing embedding vectors across the entire membership, finding semantic gaps, and clustering them into actionable recommendations.
Each user's memory embeddings (768-dimensional vectors from BGE-base-en-v1.5) are compared against a sample of other members' memories using cosine similarity. Topics scoring below 0.5 similarity represent genuine knowledge gaps — areas the community is active in but this user hasn't explored.
Gaps are clustered by category and ranked by a combined score of community size (how many distinct members are exploring it) and novelty (how different it is from the user's existing knowledge). Only clusters with two or more contributing members make the cut, preventing niche single-user topics from surfacing. The top four recommendations are generated and cached for 24 hours, refreshing as new conversations add to the collective memory.
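The clustering-and-ranking step might look like the sketch below. The combined score here is community size times mean novelty, which is one plausible reading of "combined score", not a confirmed formula; the gap-tuple shape is likewise an assumption.

```python
from collections import defaultdict

def recommend(gaps, top_n=4):
    """gaps: list of (category, member_id, novelty) tuples, where novelty
    is assumed to be 1 minus the user's best similarity to the topic."""
    clusters = defaultdict(list)
    for category, member, novelty in gaps:
        clusters[category].append((member, novelty))
    ranked = []
    for category, items in clusters.items():
        members = {m for m, _ in items}
        if len(members) < 2:  # niche single-member topics never surface
            continue
        mean_novelty = sum(n for _, n in items) / len(items)
        ranked.append((len(members) * mean_novelty, category))
    ranked.sort(reverse=True)
    # Top four recommendations; the real system caches these for 24 hours.
    return [category for _, category in ranked[:top_n]]
```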
Recommendations are privacy-preserving by design. The system identifies trending topics across the community without exposing individual conversations. Members see what topics are popular, not who said what.