Cribble Whitepaper

The Filipino AI search engine: fast retrieval, grounded reasoning, and on-chain utility via $CRIB.

v1.0 · Last updated: September 2025

Vision

Build the most trusted AI search engine for the Philippines. Cribble understands English, Tagalog, Bisaya, Ilocano, and more. It answers with citations, local awareness, and real-time information for time-sensitive topics.

Problem

Global search engines work well for English at scale, but they often miss local nuance: regional languages, local context, and local sources. Filipino users bounce between tools to translate, fact-check, and verify, which costs time and lets errors creep in.

  • Local sources are under-indexed or low-ranked.
  • Answers lack citations and traceability.
  • Regional language support is inconsistent.
  • AI-generated answers may hallucinate facts or omit local context.

Solution

Cribble combines a focused web index with retrieval-augmented generation (RAG), multilingual embeddings, and a verification step we call Cross-Check. Together, these keep responses grounded, fast, and locally relevant.

Pipeline figure: Query → Multilingual embedding → Retrieval & ranking → Generator with Cross-Check.

Cross-Check runs targeted searches against multiple sources before finalizing an answer. The model must reconcile conflicts and provide citations. If uncertainty is high, Cribble shows “what we know” and prompts the user to refine.
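The reconciliation rule can be sketched as follows, assuming each source fetched by Cross-Check has already been reduced to a yes/no stance on a single claim. The `Evidence` type, the `cross_check` function, and the 0.7 agreement threshold are hypothetical illustrations, not Cribble's published logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One source's stance on a single claim (hypothetical representation)."""
    url: str
    supports: bool  # True if the source agrees with the claim


def cross_check(claim: str, evidence: list[Evidence], threshold: float = 0.7) -> dict:
    """Reconcile sources for one claim and return a status plus citations.

    `threshold` is an illustrative agreement cutoff, not a documented value.
    """
    if not evidence:
        return {"claim": claim, "status": "no_evidence", "citations": []}

    supporting = [e.url for e in evidence if e.supports]
    opposing = [e.url for e in evidence if not e.supports]
    agreement = len(supporting) / len(evidence)

    if agreement >= threshold:
        return {"claim": claim, "status": "supported", "citations": supporting}
    if agreement <= 1 - threshold:
        return {"claim": claim, "status": "contradicted", "citations": opposing}
    # Sources conflict: surface "what we know", cite everything, ask the user to refine.
    return {"claim": claim, "status": "uncertain", "citations": supporting + opposing}
```

For instance, with one supporting and one contradicting source the agreement ratio is 0.5, which falls between the two cutoffs, so `cross_check` returns `uncertain` and cites both URLs.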

Architecture

  • Indexer. Crawls prioritized Filipino sources: news, agencies, education, civic data.
  • Vector store. Chunked passages with language tags, recency, domain trust, and geo hints.
  • Retriever. Hybrid lexical + dense retrieval; reranking favors relevance and authority.
  • Generation. Constrained decoding with citation slots, translation when needed.
  • Cross-Check. Secondary lookups for sensitive claims; fallbacks when evidence conflicts.
  • Feedback. Votes and flags feed into a lightweight reinforcement loop.
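One way the feedback loop could work, assuming it reduces to a per-domain trust score updated by an exponential moving average (EMA). This is a simple stand-in, not necessarily the reinforcement method Cribble uses; the reward values, `alpha`, and the neutral prior of 0.5 are hypothetical.

```python
from collections import defaultdict

# Hypothetical rewards: flags are treated like downvotes in this sketch.
REWARDS = {"upvote": 1.0, "downvote": 0.0, "flag": 0.0}


class DomainTrust:
    """Per-domain trust scores in [0, 1], nudged toward user feedback by an EMA."""

    def __init__(self, alpha: float = 0.1, prior: float = 0.5) -> None:
        self.alpha = alpha  # weight given to each new feedback event
        self.scores = defaultdict(lambda: prior)  # unseen domains start neutral

    def record(self, domain: str, event: str) -> float:
        """Apply one vote or flag and return the domain's updated score."""
        reward = REWARDS[event]
        updated = (1 - self.alpha) * self.scores[domain] + self.alpha * reward
        self.scores[domain] = updated
        return updated

    def trust(self, domain: str) -> float:
        return self.scores[domain]
```

A small `alpha` means a burst of coordinated votes moves a domain's score only gradually, which limits manipulation while still letting sustained feedback shift the ranking signal.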

Data Pipeline

  1. Ingest. Fetch HTML, RSS, PDFs; normalize and strip boilerplate.
  2. Chunk. Split by semantic boundaries, add language and time features.
  3. Embed. Compute multilingual sentence embeddings and store them alongside BM25 terms.
  4. Validate. Apply heuristics for deduplication, spam, and mirrored posts.
  5. Schedule. Recrawl fast-moving topics frequently; refresh stable documents less often.
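The Chunk and Validate steps might look like the sketch below. It assumes paragraph breaks as a stand-in for semantic boundaries, takes the language tag as an input (for example from page metadata), and catches mirrored posts only when their text matches after case and whitespace normalization; near-duplicate detection (for example MinHash) would need more. All names and the 800-character budget are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Chunk:
    text: str
    lang: str             # language tag, assumed supplied by page metadata or a detector
    fetched_at: datetime  # time feature used later for freshness


def split_paragraphs(doc: str, max_chars: int = 800) -> list[str]:
    """Pack consecutive paragraphs into chunks of at most `max_chars`.

    Paragraph breaks stand in for semantic boundaries; a single paragraph
    longer than `max_chars` is kept whole in this sketch.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", doc) if p.strip()]
    chunks = []
    current = ""
    for p in paragraphs:
        if current and len(current) + 2 + len(p) > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks


def fingerprint(text: str) -> str:
    """SHA-256 of case- and whitespace-normalized text, to catch exact mirrors."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_chunks(doc: str, lang: str, fetched_at: datetime, seen: set[str]) -> list[Chunk]:
    """Chunk a document and drop chunks whose fingerprint was already seen."""
    out = []
    for text in split_paragraphs(doc):
        fp = fingerprint(text)
        if fp in seen:
            continue  # duplicate or mirrored passage
        seen.add(fp)
        out.append(Chunk(text, lang, fetched_at))
    return out
```

Sharing one `seen` set across a crawl batch lets a mirrored copy of an article be dropped even when it arrives from a different URL.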

Retrieval & Ranking

We use a hybrid stack. A narrow lexical filter reduces the candidate set. A dense retriever surfaces semantically related passages across languages. A reranker scores context windows by topicality, authority, and geographic relevance. Freshness acts as a tie-breaker for news.
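A compact sketch of the hybrid stack in the order described: lexical filter, dense similarity, then a weighted rerank. It assumes embeddings are precomputed (toy two-dimensional vectors stand in for a real multilingual model), and the weights `w_topic`, `w_auth`, and `w_geo` are hypothetical. For brevity it applies the freshness tie-break to every passage rather than only to news.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    vector: list[float]  # precomputed multilingual embedding
    authority: float     # domain trust in [0, 1]
    geo: float           # Philippine relevance in [0, 1]
    published_day: int   # larger means newer


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query: str, query_vec: list[float], passages: list[Passage],
                  w_topic: float = 0.6, w_auth: float = 0.25,
                  w_geo: float = 0.15) -> list[Passage]:
    # 1. Narrow lexical filter: keep passages that share at least one query term.
    q_terms = tokens(query)
    candidates = [p for p in passages if q_terms & tokens(p.text)]

    # 2-3. Dense similarity as topicality, blended with authority and geo relevance.
    def score(p: Passage) -> float:
        return w_topic * cosine(query_vec, p.vector) + w_auth * p.authority + w_geo * p.geo

    # Highest score first; on an exact tie, the newer passage wins.
    return sorted(candidates, key=lambda p: (score(p), p.published_day), reverse=True)
```

Putting recency second in the sort key means it can only reorder passages whose blended scores are identical, matching its stated role as a tie-breaker.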

Local preference: if two documents tie on rank, the system prefers Philippine domains, official sources, and reputable local media.
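The local-preference rule can be expressed as a secondary sort key that only matters when rank scores tie. This sketch assumes "Philippine domain" means the `.ph` country-code TLD and that official and local-media status come from hypothetical allowlist flags; it simply counts how many local signals a result has.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Ranked:
    url: str
    score: float          # reranker output
    is_official: bool     # government or agency source (hypothetical flag)
    is_local_media: bool  # on a reputable local-media allowlist (hypothetical flag)


def is_ph_domain(url: str) -> bool:
    """Approximate 'Philippine domain' by the .ph country-code TLD."""
    host = urlparse(url).hostname or ""
    return host == "ph" or host.endswith(".ph")


def local_signals(r: Ranked) -> int:
    return int(is_ph_domain(r.url)) + int(r.is_official) + int(r.is_local_media)


def order_results(results: list[Ranked]) -> list[Ranked]:
    # Rank score decides first; local signals only matter when scores tie.
    return sorted(results, key=lambda r: (r.score, local_signals(r)), reverse=True)
```

Many reputable Philippine outlets use `.com` domains, which is why the sketch keeps a separate local-media flag instead of relying on the TLD alone.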

Reasoning

  • Citations first. The generator plans citation slots before prose.
  • Language fit. Output matches the user’s chosen language while retaining named entities.
  • Ambiguity. If evidence conflicts, the answer marks uncertainty and lists sources.
  • Refine loop. The UI exposes follow-ups: “narrow to last 30 days” or “show gov sources only.”
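The two follow-ups quoted in the refine loop could map to simple post-retrieval filters. In this sketch, "gov sources" is assumed to mean hostnames under `gov.ph`, and the `Result` type and `REFINEMENTS` table are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable
from urllib.parse import urlparse


@dataclass
class Result:
    url: str
    published: date


def last_n_days(results: list[Result], today: date, n: int = 30) -> list[Result]:
    cutoff = today - timedelta(days=n)
    return [r for r in results if r.published >= cutoff]


def is_gov(url: str) -> bool:
    host = urlparse(url).hostname or ""
    return host == "gov.ph" or host.endswith(".gov.ph")


def gov_only(results: list[Result]) -> list[Result]:
    return [r for r in results if is_gov(r.url)]


# Maps the follow-up label shown in the UI to a filter over the current results.
REFINEMENTS: dict[str, Callable[[list[Result], date], list[Result]]] = {
    "narrow to last 30 days": lambda rs, today: last_n_days(rs, today, 30),
    "show gov sources only": lambda rs, today: gov_only(rs),
}


def refine(results: list[Result], choice: str, today: date) -> list[Result]:
    return REFINEMENTS[choice](results, today)
```

Keeping refinements as a table of filters makes it cheap to add new follow-up chips without touching retrieval itself.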

Safety

  • Explicit refusal patterns for harmful queries.
  • Source-level risk scoring and domain allowlists for sensitive topics.
  • Rate limits and abuse throttles for automation attempts.
  • Transparent warnings on uncertain or evolving information.
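Rate limits and automation throttles are commonly implemented with a token bucket. The sketch below is a standard per-client token bucket; the rates, capacities, and client keys are hypothetical, and a production limiter would typically keep its state in a shared store rather than in process memory.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second up to `capacity`;
    each request spends `cost` tokens and is rejected when too few remain."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock      # injectable so tests can control time
        self.tokens = capacity  # start full so a new client is not blocked
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class ClientLimiter:
    """One bucket per client key (for example an API key; hypothetical keying)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

`capacity` sets how large a burst a client may send, while `rate` caps its sustained request rate; scripted abuse exhausts the bucket quickly and is throttled to the refill rate.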

Token Utility ($CRIB)

$CRIB is a utility token on Solana used for premium features, developer API quotas, boosts for community curation, and access to advanced tools. It is not an investment.

Figure: $CRIB supply over time, from T0 onward. The emissions schedule smooths early usage while targeting long-term stability.

Premium

Pro queries, larger context windows, faster refresh, and early feature access.

Developers

API credits for search, embeddings, and retrieval; fair-use rate limits apply.

Curation

Community boosts for high-quality sources and language coverage.

Fees

Swaps and fees routed through Solana programs where applicable.

Governance

Governance begins off-chain with transparent proposals and snapshots, then graduates to on-chain voting. Priorities include new language models, data partnerships, and safety policy updates.

Privacy

  • No selling of personal data. Search logs are minimized and aggregated.
  • Data retention. Short default retention with opt-out controls.
  • Encryption. TLS in transit and encrypted storage for sensitive metadata.
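Log minimization and short retention can be combined as in the sketch below, assuming "minimized" means identifiers such as user IDs and IP addresses are dropped before storage and "aggregated" means only query counts above a minimum threshold are reported. The 30-day window, the `min_count` of 5, and treating opt-out as "do not log at all" are hypothetical choices.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class RawEvent:
    user_id: str
    ip: str
    query: str
    day: date
    opted_out: bool


def minimize(event: RawEvent) -> Optional[tuple[str, date]]:
    """Keep only the normalized query text and day; drop user ID and IP.
    Users who opted out are not logged at all (an assumption about opt-out)."""
    if event.opted_out:
        return None
    return " ".join(event.query.lower().split()), event.day


def ingest(events: list[RawEvent]) -> list[tuple[str, date]]:
    """Minimize at write time so identifiers are never stored."""
    return [m for m in map(minimize, events) if m is not None]


def purge(records: list[tuple[str, date]], today: date,
          retention_days: int = 30) -> list[tuple[str, date]]:
    """Delete records older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [r for r in records if r[1] >= cutoff]


def aggregate(records: list[tuple[str, date]], min_count: int = 5) -> dict[str, int]:
    """Count queries and suppress rare ones, which are the most identifying."""
    counts = Counter(query for query, _ in records)
    return {q: n for q, n in counts.items() if n >= min_count}
```

Dropping identifiers in `ingest`, before anything is written, means later steps never handle personal data, and the count threshold keeps one-off queries from appearing in aggregates.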

Roadmap

Q4 2025

Expanded local index, improved Tagalog/Bisaya ranking, Cross-Check v2.

Q1 2026

Developer API beta, citations UI upgrades, news freshness boosts.

Q2 2026

More regional languages, publisher program, governance pilot.

Glossary

RAG
Retrieval-augmented generation; models answer with retrieved context.
Embedding
Vector representation of text enabling semantic search across languages.
Cross-Check
Secondary verification step that re-queries and reconciles sources.

Disclaimer

This document is for information only. $CRIB is a utility token used inside the platform. Nothing here is financial advice. Availability of features, schedules, and specifications may change.