Vision
Build the most trusted AI search engine for the Philippines. Cribble understands English, Tagalog, Bisaya, Ilocano, and more. It answers with citations, local awareness, and real-time info where it matters.
Problem
Global search works well for English at scale, but it often misses local nuance: regional languages, context, and sources. Filipino users bounce between tools to translate, fact-check, and verify. Time is lost and errors creep in.
- Local sources are under-indexed or low-ranked.
- Answers lack citations and traceability.
- Regional language support is inconsistent.
- AI results may hallucinate or omit local context.
Solution
Cribble combines a focused web index with retrieval-augmented generation (RAG), multilingual embeddings, and a verification step we call Cross-Check. Together, these keep responses grounded, fast, and locally relevant.
Cross-Check runs targeted searches against multiple sources before finalizing an answer. The model must reconcile conflicts and provide citations. If uncertainty is high, Cribble shows “what we know” and prompts the user to refine the query.
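The reconciliation step can be illustrated with a minimal sketch. This is not Cribble's implementation: the `Evidence` type, the `cross_check` function, the majority-agreement rule, and the 0.6 threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str  # URL or domain where the claim was found
    claim: str   # normalized answer extracted from that source


def cross_check(evidence: list[Evidence], min_agreement: float = 0.6) -> dict:
    """Reconcile claims from several sources (hypothetical logic).

    Returns a confident answer with citations when enough sources agree,
    otherwise an "uncertain" result listing every distinct claim found.
    """
    if not evidence:
        return {"status": "uncertain", "answer": None,
                "citations": [], "what_we_know": []}

    counts = Counter(e.claim for e in evidence)
    top_claim, top_count = counts.most_common(1)[0]
    agreement = top_count / len(evidence)

    if agreement >= min_agreement:
        # Cite only the sources that support the winning claim.
        citations = [e.source for e in evidence if e.claim == top_claim]
        return {"status": "confident", "answer": top_claim,
                "citations": citations, "what_we_know": []}

    # Sources conflict: surface all claims and all sources instead of guessing.
    return {"status": "uncertain", "answer": None,
            "citations": [e.source for e in evidence],
            "what_we_know": sorted(counts)}
```

In this sketch an "uncertain" result would drive the “what we know” view, while a "confident" result carries only the citations that back the chosen claim.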
Architecture
- Indexer. Crawls prioritized Filipino sources: news, agencies, education, civic data.
- Vector store. Chunked passages with language tags, recency, domain trust, and geo hints.
- Retriever. Hybrid lexical + dense retrieval; reranking favors relevance and authority.
- Generation. Constrained decoding with citation slots, translation when needed.
- Cross-Check. Secondary lookups for sensitive claims; fallbacks when evidence conflicts.
- Feedback. Votes and flags feed into a lightweight reinforcement loop.
Data Pipeline
- Ingest. Fetch HTML, RSS, PDFs; normalize and strip boilerplate.
- Chunk. Split by semantic boundaries, add language and time features.
- Embed. Multilingual sentence embeddings stored alongside BM25 terms.
- Validate. Heuristics for deduplication, spam, mirrored posts.
- Schedule. Frequent recrawls for fast-moving topics; slower recrawls for stable docs.
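Two of the steps above, Validate and Schedule, lend themselves to a short sketch. The exact-match fingerprinting, the source categories, and the recrawl intervals below are illustrative assumptions, not values from Cribble's pipeline; a production system would likely use near-duplicate detection rather than exact hashes.

```python
import hashlib
import re
from datetime import timedelta


def fingerprint(text: str) -> str:
    """Hash a passage after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(passages: list[str]) -> list[str]:
    """Keep the first occurrence of each passage; drop mirrored copies."""
    seen: set[str] = set()
    unique: list[str] = []
    for passage in passages:
        fp = fingerprint(passage)
        if fp not in seen:
            seen.add(fp)
            unique.append(passage)
    return unique


# Hypothetical recrawl cadence per source type (assumed values).
RECRAWL_INTERVALS = {
    "news": timedelta(minutes=15),
    "agency": timedelta(hours=6),
    "reference": timedelta(days=7),
}


def recrawl_interval(source_type: str) -> timedelta:
    """Return how long to wait before refetching a source of this type."""
    return RECRAWL_INTERVALS.get(source_type, timedelta(days=1))
```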
Retrieval & Ranking
We use a hybrid stack. A narrow lexical filter reduces the candidate set. A dense retriever surfaces semantically related passages across languages. A reranker scores context windows by topicality, authority, and geographic relevance. Freshness acts as a tie-breaker for news.
Local preference: if two documents tie on rank, the system prefers Philippine domains, official sources, and reputable local media.
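The ordering described above can be sketched as a sort with tie-breakers. The `Candidate` fields, the weights, and the rounding used to detect ties are assumptions for illustration; the source does not specify how the reranker combines its signals.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    topicality: float     # 0..1, how well the passage matches the query
    authority: float      # 0..1, trust in the source
    geo_relevance: float  # 0..1, relevance to the user's region
    is_local: bool        # Philippine domain, official source, or local media
    age_hours: float      # time since publication


# Assumed weights; they only need to sum to 1 for this sketch.
W_TOPICALITY, W_AUTHORITY, W_GEO = 0.6, 0.25, 0.15


def rerank_score(c: Candidate) -> float:
    """Weighted score, rounded so near-identical scores count as ties."""
    raw = (W_TOPICALITY * c.topicality
           + W_AUTHORITY * c.authority
           + W_GEO * c.geo_relevance)
    return round(raw, 3)


def rerank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort by score, then prefer local sources, then fresher documents."""
    return sorted(
        candidates,
        key=lambda c: (-rerank_score(c), not c.is_local, c.age_hours),
    )
```

Keeping the local preference and freshness in the sort key, rather than folding them into the score, means they only take effect when scores tie, which matches the tie-breaker behavior described above.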
Reasoning
- Citations first. The generator plans citation slots before prose.
- Language fit. Output matches the user’s chosen language while retaining named entities.
- Ambiguity. If evidence conflicts, the answer marks uncertainty and lists sources.
- Refine loop. The UI exposes follow-ups: “narrow to last 30 days” or “show gov sources only.”
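The "citations first" idea can be illustrated by planning numbered slots up front and then checking a draft against them. The `[n]` marker format, the function names, and the sentence-splitting rule are assumptions made for this sketch.

```python
import re


def plan_citation_slots(sources: list[str]) -> dict[int, str]:
    """Assign each retrieved source a numbered slot before any prose is written."""
    return {i: url for i, url in enumerate(sources, start=1)}


def validate_answer(answer: str, slots: dict[int, str]) -> list[str]:
    """Return a list of problems: uncited sentences or unknown slot numbers."""
    problems: list[str] = []
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    for sentence in sentences:
        markers = [int(m) for m in re.findall(r"\[(\d+)\]", sentence)]
        if not markers:
            problems.append(f"uncited sentence: {sentence}")
        for marker in markers:
            if marker not in slots:
                problems.append(f"unknown citation slot [{marker}]")
    return problems
```

A draft that passes with an empty problem list has every sentence tied to a planned source; anything else could be regenerated or flagged as uncertain.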
Safety
- Explicit refusal patterns for harmful queries.
- Source-level risk scoring and domain allowlists for sensitive topics.
- Rate limits and abuse throttles for automation attempts.
- Transparent warnings on uncertain or evolving information.
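A domain allowlist for sensitive topics could look roughly like the following. The topic names and the `.example` domains are placeholders, not Cribble's actual lists, and the rule that non-sensitive topics pass through unfiltered is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical allowlists keyed by sensitive topic (placeholder domains).
SENSITIVE_ALLOWLIST: dict[str, set[str]] = {
    "health": {"health-agency.example"},
    "elections": {"election-body.example", "trusted-news.example"},
}


def is_allowed(url: str, allowed_domains: set[str]) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def allowed_sources(topic: str, urls: list[str]) -> list[str]:
    """Filter sources for sensitive topics; pass others through unchanged."""
    allowed_domains = SENSITIVE_ALLOWLIST.get(topic)
    if allowed_domains is None:
        return list(urls)
    return [u for u in urls if is_allowed(u, allowed_domains)]
```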
Token Utility ($CRIB)
$CRIB is a utility token on Solana used for premium features, developer API quotas, boosts for community curation, and access to advanced tools. It is not an investment.
Premium
Pro queries, larger context windows, faster refresh, and early feature access.
Developers
API credits for search, embeddings, and retrieval; fair-use rate limits apply.
Curation
Community boosts for high-quality sources and language coverage.
Fees
Swaps and fees routed through Solana programs where applicable.
Governance
Governance begins off-chain with transparent proposals and snapshots, then graduates to on-chain voting. Priorities include new language models, data partnerships, and safety policy updates.
Privacy
- No selling of personal data. Search logs are minimized and aggregated.
- Data retention. Short default retention with opt-out controls.
- Encryption. TLS in transit and encrypted storage for sensitive metadata.
Roadmap
Q4 2025
Expanded local index, improved Tagalog/Bisaya ranking, Cross-Check v2.
Q1 2026
Developer API beta, citations UI upgrades, news freshness boosts.
Q2 2026
More regional languages, publisher program, governance pilot.
Glossary
- RAG. Retrieval-augmented generation; models answer with retrieved context.
- Embedding. Vector representation of text enabling semantic search across languages.
- Cross-Check. Secondary verification step that re-queries and reconciles sources.
Disclaimer
This document is for information only. $CRIB is a utility token used inside the platform. Nothing here is financial advice. Availability of features, schedules, and specifications may change.