The Technology Behind The AI Club
The AI Club combines intelligent model routing with a novel crowd memory system. Every message you send is analysed and routed to the best specialist model — while your conversations quietly build a collective intelligence that benefits the entire community.
The Routing Pipeline
Queries pass through a three-tier routing system. Simple queries are handled instantly by keyword matching. Complex queries check a fuzzy cache first, and only fall through to AI classification when no match is found.
flowchart TD
A["User sends a message"] --> B{"Short query or\nstrong keyword signal?"}
B -->|"Yes"| C["Keyword Router\n(instant)"]
B -->|"No"| D["Compute SimHash\nfingerprint"]
D --> E{"Cache hit on\nany band key?"}
E -->|"Yes"| F["Return cached route\n(~5ms)"]
E -->|"No"| G["Grok 4.1 Fast\nAI Classification (~500ms)"]
G --> H["Write to 4 band keys\n(30-day TTL)"]
H --> I["Return route"]
C --> J["Specialist Model"]
F --> J
I --> J
J --> K["Stream response to user"]
style A fill:#1a1a2e,stroke:#00a3ff,color:#fff
style J fill:#1a1a2e,stroke:#22c55e,color:#fff
style K fill:#1a1a2e,stroke:#22c55e,color:#fff
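The three tiers can be sketched in code. This is a minimal illustration, not the production implementation: the keyword table, the `bandKeys` helper, and the `classify` stub are assumed stand-ins for the real keyword router, SimHash band keys, and Grok 4.1 Fast classification call.

```typescript
type Route = { model: string; source: "keyword" | "cache" | "classifier" };

// Assumed keyword table; the real router's signals are richer than this.
const KEYWORD_ROUTES: Record<string, string> = {
  code: "code-specialist",
  translate: "language-expert",
  summarise: "summarizer",
};

const routeCache = new Map<string, string>(); // band key -> cached route

function classify(query: string): string {
  // Stand-in for the AI classification call (~500ms in the real system).
  return "conversationalist";
}

function bandKeys(query: string): string[] {
  // Stand-in for the 4 SimHash band keys described in the next section.
  return [query.toLowerCase()];
}

function route(query: string): Route {
  // Tier 1: strong keyword signal -> instant route.
  for (const [kw, model] of Object.entries(KEYWORD_ROUTES)) {
    if (query.toLowerCase().includes(kw)) return { model, source: "keyword" };
  }
  // Tier 2: fuzzy cache hit on any band key (~5ms).
  for (const key of bandKeys(query)) {
    const hit = routeCache.get(key);
    if (hit) return { model: hit, source: "cache" };
  }
  // Tier 3: fall through to AI classification, then write all band keys.
  const model = classify(query);
  for (const key of bandKeys(query)) routeCache.set(key, model);
  return { model, source: "classifier" };
}
```

The important property is the order: each tier is strictly cheaper than the one below it, so the expensive classifier only runs on genuinely novel queries.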
SimHash Fuzzy Caching
Instead of exact-match caching, we use SimHash — a locality-sensitive hashing algorithm. Similar queries produce similar fingerprints, so "best ways to market a tattoo shop" and "how to market a tattoo shop" share cache entries, avoiding redundant AI classification calls.
flowchart LR
subgraph Query_A ["Query A: 'best ways to market a tattoo shop'"]
A1["Remove stop words"] --> A2["[market, tattoo,\nshop]"]
A2 --> A3["+ bigrams:\nmarket_tattoo,\ntattoo_shop, ..."]
A3 --> A4["SimHash\nA3F1 8B2C 47D0 E519"]
end
subgraph Query_B ["Query B: 'how to market a tattoo shop'"]
B1["Remove stop words"] --> B2["[market, tattoo, shop]"]
B2 --> B3["+ bigrams:\nmarket_tattoo,\ntattoo_shop, ..."]
B3 --> B4["SimHash\nA3F1 8B2C 45D0 E519"]
end
A4 --> M{"Compare\n4 bands"}
B4 --> M
M --> R["3 of 4 bands match\n= Cache HIT"]
style R fill:#1a1a2e,stroke:#22c55e,color:#fff
The 64-bit fingerprint is split into 4 bands of 16 bits each, and two queries sharing any single band are considered a match. By the pigeonhole principle, fingerprints differing in 3 bits or fewer are guaranteed to share a band; in practice the scheme also catches most near-duplicate queries with larger differences while keeping false positives rare.
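The fingerprinting and banding can be sketched as follows. The hash function (FNV-1a) and the stop-word list are simplifying assumptions; the production pipeline also adds bigrams before hashing.

```typescript
// Assumed stop-word list for the sketch.
const STOP = new Set(["a", "an", "the", "to", "how", "best", "ways", "of", "for"]);

// 64-bit FNV-1a hash of a token (BigInt keeps the full 64 bits).
function fnv1a64(s: string): bigint {
  let h = 0xcbf29ce484222325n;
  for (const ch of s) {
    h ^= BigInt(ch.codePointAt(0)!);
    h = (h * 0x100000001b3n) & 0xffffffffffffffffn;
  }
  return h;
}

// Classic SimHash: sum per-bit votes across token hashes, keep the sign.
function simhash(query: string): bigint {
  const tokens = query.toLowerCase().split(/\s+/).filter(t => t && !STOP.has(t));
  const counts = new Array(64).fill(0);
  for (const tok of tokens) {
    const h = fnv1a64(tok);
    for (let i = 0; i < 64; i++) {
      counts[i] += (h >> BigInt(i)) & 1n ? 1 : -1;
    }
  }
  let fp = 0n;
  for (let i = 0; i < 64; i++) if (counts[i] > 0) fp |= 1n << BigInt(i);
  return fp;
}

// Split the 64-bit fingerprint into 4 bands of 16 bits each.
function bands(fp: bigint): bigint[] {
  return [0, 16, 32, 48].map(s => (fp >> BigInt(s)) & 0xffffn);
}

// Two fingerprints match if any one band is identical.
function fuzzyMatch(a: bigint, b: bigint): boolean {
  const bb = bands(b);
  return bands(a).some((band, i) => band === bb[i]);
}
```

After stop-word removal, the two example queries reduce to the same token set, so they collide on every band and share a cache entry.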
Specialist Models
Each query category maps to a specialist model chosen for that task type, sourced from multiple providers via OpenRouter. If the primary model is unavailable, the system falls back to an alternative.
flowchart TD
R["Router\nGrok 4.1 Fast"] --> CODER["Code Specialist\nMiniMax M2.7"]
R --> REASONER["Deep Thinker\nGrok 4.1 Fast"]
R --> DEEP_RES["Deep Researcher\nGrok 4.20 Multi-Agent"]
R --> CREATIVE["Creative Writer\nQwen 3.5 Flash"]
R --> ANALYST["Research Analyst\nQwen 3.5 Flash"]
R --> DESIGNER["Visual Designer\nGemini 3.1 Flash"]
R --> TEACHER["Tutor\nQwen 3.5 Flash"]
R --> QUICK["Quick Responder\nGPT OSS 120B"]
R --> POLY["Language Expert\nQwen 3.5 Flash"]
R --> SUMM["Summarizer\nGPT OSS 120B"]
R --> CHAT["Conversationalist\nGPT OSS 120B"]
style R fill:#1a1a2e,stroke:#00a3ff,color:#fff
All specialist models are accessed via OpenRouter with GPT OSS 120B as the universal fallback. The Code Specialist (MiniMax M2.7) now works directly without requiring boost mode. Image generation uses Flux 2 Klein 4B for text-to-image, Flux 2 Pro for transformations, and Flux 2 Flex for multi-image blending. The Visual Designer uses Gemini's native image generation capabilities.
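The fallback behaviour can be illustrated with a small sketch. The category names and the availability set are assumptions for the example; real availability comes from OpenRouter.

```typescript
// Illustrative subset of the specialist map from the diagram above.
const SPECIALISTS: Record<string, string> = {
  code: "MiniMax M2.7",
  deep_research: "Grok 4.20 Multi-Agent",
  creative: "Qwen 3.5 Flash",
  quick: "GPT OSS 120B",
};

// GPT OSS 120B is the universal fallback.
const FALLBACK = "GPT OSS 120B";

function pickModel(category: string, available: Set<string>): string {
  const primary = SPECIALISTS[category];
  return primary && available.has(primary) ? primary : FALLBACK;
}
```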
Tools & Capabilities
Specialist models don't just generate text — they have access to a suite of tools that extend their capabilities. The router selects a specialist, and that specialist can invoke tools as needed to fulfil the request.
flowchart LR
A["User request"] --> B["Router"]
B --> C["Specialist\nwith Tools"]
C --> D["Tool calls"]
D --> E["Result"]
style A fill:#1a1a2e,stroke:#00a3ff,color:#fff
style C fill:#1a1a2e,stroke:#a78bfa,color:#fff
style E fill:#1a1a2e,stroke:#22c55e,color:#fff
Available tools include:
- Web search & deep search — real-time information retrieval and multi-source research
- Image generation — Flux Klein for text-to-image generation, Flux Pro for image transformations, Flux Max for diagrams and 3D isometric illustrations
- Website builder — the build_website tool triggers MiniMax to generate complete HTML/CSS/JS websites
- Document creation — PDF and DOCX generation via a two-stage pipeline (content design, then HTML coding)
- Code execution — a run_javascript sandbox for running JavaScript code and returning results
- Diagram generation — 3D isometric diagrams rendered via Flux Max from text descriptions
Document Generation Pipeline
When you ask Zeno to create a document — a resume, report, proposal, or any downloadable file — it triggers a two-stage pipeline. The calling model provides a content brief, then two specialist models handle the heavy lifting.
flowchart TD
A["User: 'Create a resume for...'"] --> B["Router classifies as\ndocument creation"]
B --> C["Calling model sends\ncontent brief to tool"]
C --> D["Stage 1: Designer\nGemini 3.1 Flash"]
D --> E["Full document content\nheadings, copy, structure"]
E --> F["Stage 2: Coder\nMiniMax M2.7"]
F --> G["Clean semantic HTML\nwith professional styling"]
G --> H{"Format?"}
H -->|"PDF"| I["HTML viewer with\nprint-to-PDF"]
H -->|"DOCX"| J["Generated Word\ndocument"]
I --> K["Download link\nshown to user"]
J --> K
style A fill:#1a1a2e,stroke:#00a3ff,color:#fff
style D fill:#1a1a2e,stroke:#a78bfa,color:#fff
style F fill:#1a1a2e,stroke:#22c55e,color:#fff
style K fill:#1a1a2e,stroke:#22c55e,color:#fff
The two-stage approach separates content design from HTML coding. The designer focuses on writing compelling, well-structured copy while the coder produces clean, semantic markup — each model doing what it's best at.
Website Builder
When Zeno detects website-related keywords in your message, it triggers the build_website tool. MiniMax M2.7 generates a complete, self-contained HTML/CSS/JS file based on your request and your stored memories for personalisation.
Generated files are stored in D1 and served via a preview endpoint, giving you an instant live preview. The system is conversation-aware — within the same conversation, Zeno iterates on the same project, refining the design based on your feedback rather than starting from scratch each time.
flowchart TD
A["User: 'Build me a portfolio site'"] --> B{"Keyword match?\n(website, landing page, etc.)"}
B -->|"Yes"| C["Load user memories\nfor personalisation"]
C --> D["MiniMax M2.7 generates\ncomplete HTML/CSS/JS"]
D --> E["Store in D1\n(build project)"]
E --> F["Serve via\npreview endpoint"]
F --> G["User sees live preview\nand can iterate"]
G -->|"Feedback"| C
style A fill:#1a1a2e,stroke:#00a3ff,color:#fff
style D fill:#1a1a2e,stroke:#a78bfa,color:#fff
style G fill:#1a1a2e,stroke:#22c55e,color:#fff
The website builder produces single-file HTML with embedded CSS and JavaScript, requiring no external dependencies. User memories (business name, brand preferences, expertise) are woven into the generated content for a personalised result.
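The conversation-aware iteration can be sketched as a simple upsert keyed on the conversation. The type and function names here are illustrative assumptions; real storage is Cloudflare D1, not an in-memory map.

```typescript
type Project = { conversationId: string; html: string; revision: number };

// Stand-in for the D1 build-project table.
const projects = new Map<string, Project>();

function upsertBuild(conversationId: string, html: string): Project {
  const existing = projects.get(conversationId);
  const project = existing
    ? { ...existing, html, revision: existing.revision + 1 } // iterate on the same project
    : { conversationId, html, revision: 1 };                 // first build in this conversation
  projects.set(conversationId, project);
  return project;
}
```

Keying on the conversation is what lets feedback refine the same design instead of starting a fresh project on every message.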
Crowd Memory
Every conversation generates structured memories — preferences, expertise, facts, and personality traits — extracted asynchronously by AI. These memories are stored with weighted confidence scores that account for relationship context and conversation depth, creating a rich understanding of each member over time. Memories are only reinforced if the relevance score is 0.5 or above, preventing low-quality extractions from polluting the graph.
Stale connections fade naturally — edges between memories decay over 30 days if not reinforced by new conversations, keeping the graph fresh and relevant.
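A minimal sketch of the decay and reinforcement gate, assuming a linear fade and a fixed reinforcement increment — the source states only the 30-day window and the 0.5 relevance threshold, so both formulas here are illustrative.

```typescript
const DECAY_WINDOW_DAYS = 30;

// Assumed linear fade: an edge loses all weight 30 days after its last reinforcement.
function decayedWeight(weight: number, daysSinceReinforced: number): number {
  const remaining = Math.max(0, 1 - daysSinceReinforced / DECAY_WINDOW_DAYS);
  return weight * remaining;
}

// Memories are only reinforced when relevance >= 0.5; the 0.1x increment is an assumption.
function reinforce(weight: number, relevance: number): number {
  return relevance >= 0.5 ? Math.min(1, weight + 0.1 * relevance) : weight;
}
```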
The innovative part is what happens next: the system uses these memories to find knowledge gaps and surface recommendations from the wider community. Suggestion generation uses Grok 4.20 multi-agent with a first-principles thinking approach to produce genuinely useful, personalised recommendations.
flowchart TD
A["User chats with Zeno"] --> B["AI extracts memories\n(preferences, facts, expertise)"]
B --> C["Assign weighted score\nconfidence × tier × depth"]
C --> D["Generate 768-dim\nembedding vector"]
D --> E["Store in memory graph"]
E --> F{"Cross-user\ngap analysis"}
F --> G["Compare embeddings\nvia cosine similarity"]
G --> H{"Similarity < 0.5?\n(knowledge gap)"}
H -->|"Yes"| I["Cluster gaps\nby category"]
H -->|"No"| J["Already known\n(skip)"]
I --> K{"1+ members\nexploring topic?"}
K -->|"Yes"| L["Surface as\nrecommendation"]
K -->|"No"| M["Too niche\n(skip)"]
L --> N["Show in Daily Summary\n& Crowd Memory"]
style A fill:#1a1a2e,stroke:#00a3ff,color:#fff
style L fill:#1a1a2e,stroke:#a78bfa,color:#fff
style N fill:#1a1a2e,stroke:#22c55e,color:#fff
Weighted Memory Scoring
Not all memories are equal. The system weights each memory using three factors that determine how prominently it shapes Zeno's responses and the community recommendations.
flowchart LR
subgraph Factors ["Scoring Factors"]
C["Confidence\n(0–1)\nHow explicit was it?"]
T["Relationship Tier\nself: 1.0 · family: 0.9\nfriend: 0.7 · colleague: 0.5\nother: 0.3"]
D["Conversation Depth\nmin(1.0, messages / 10)\nDeeper = higher trust"]
end
C --> S["Weighted Score\n= confidence\n× tier\n× depth"]
T --> S
D --> S
S --> R["Top 75 memories\nloaded per query"]
style S fill:#1a1a2e,stroke:#a78bfa,color:#fff
style R fill:#1a1a2e,stroke:#22c55e,color:#fff
Relationship tiers allow the system to distinguish between facts about the user themselves (highest weight) versus people they've mentioned in passing (lowest weight). Conversation depth prevents single off-hand comments from becoming high-confidence memories — trust builds over multiple exchanges.
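The scoring formula above can be written directly. The tier values and the depth formula come from the diagram; the rest is illustrative glue.

```typescript
// Relationship tiers from the diagram: facts about the user weigh most.
const TIER: Record<string, number> = {
  self: 1.0,
  family: 0.9,
  friend: 0.7,
  colleague: 0.5,
  other: 0.3,
};

function memoryScore(confidence: number, tier: keyof typeof TIER, messages: number): number {
  // Conversation depth caps at 1.0 after 10 messages: trust builds over exchanges.
  const depth = Math.min(1.0, messages / 10);
  return confidence * TIER[tier] * depth;
}
```

At query time the top 75 memories by this score are loaded, so a single off-hand remark in a short conversation rarely outranks facts established over many exchanges.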
Cross-User Recommendations
The recommendation engine identifies topics that multiple community members are exploring but that a given user hasn't encountered yet. It works by comparing embedding vectors across the entire membership, finding semantic gaps, and clustering them into actionable recommendations.
Each user's memory embeddings (768-dimensional vectors from BGE-base-en-v1.5) are compared against a sample of other members' memories using cosine similarity. Topics scoring below 0.5 similarity represent genuine knowledge gaps — areas the community is active in but this user hasn't explored.
Gaps are clustered by category and ranked by a combined score of community size (how many distinct members are exploring it) and novelty (how different it is from the user's existing knowledge). Clusters with one or more contributing members can surface as recommendations. Up to twelve recommendations are generated using Grok 4.20 multi-agent and cached for 8 hours, refreshing as new conversations add to the collective memory.
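The gap test itself is a straightforward cosine comparison. The 0.5 threshold comes from the text; the toy 3-dimensional vectors below stand in for the real 768-dimensional BGE-base-en-v1.5 embeddings.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A community topic is a knowledge gap when it scores below 0.5
// against every one of the user's memory embeddings.
function isKnowledgeGap(userMemories: number[][], communityTopic: number[]): boolean {
  return userMemories.every(m => cosine(m, communityTopic) < 0.5);
}
```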
Recommendations are privacy-preserving by design. The system identifies trending topics across the community without exposing individual conversations. Members see what topics are popular, not who said what.