Theological Embedding Space

14,729 pieces of Christian content mapped into a shared semantic space. UMAP projection reveals natural theological clusters.

What You're Seeing

A map of meaning, not keywords

Every sermon, podcast, song, and scripture passage in our catalog has been converted into a mathematical representation of its meaning, called an embedding, using the BAAI/bge-small-en-v1.5 sentence transformer. Think of it as translating each piece of content into a point in 384-dimensional space, where items with similar theology end up near one another.

The interactive map below projects all 14,729 items from 384 dimensions down to two dimensions using UMAP (Uniform Manifold Approximation and Projection), a technique designed to preserve local neighborhood structure. Content with similar theological meaning tends to appear close together; content with different meaning tends to appear farther apart.

This is the foundation of how Rejoice searches: a sermon about Psalm 23 naturally clusters near a worship song about God as shepherd, even though they use completely different words.
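To make "near each other" concrete, here is a minimal sketch of ranking items by cosine similarity, a common way to compare embeddings. The three-dimensional vectors and titles are hypothetical stand-ins for real 384-dimensional embeddings, and the source does not state which similarity measure Rejoice actually uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors; real embeddings would have 384 dimensions.
catalog = {
    "Sermon: Psalm 23": [0.9, 0.1, 0.2],
    "Song: The Lord Is My Shepherd": [0.8, 0.2, 0.3],
    "Podcast: Budgeting for Couples": [0.1, 0.9, 0.1],
}

def rank_by_similarity(query_title, items):
    """Return (title, similarity) pairs for all other items, most similar first."""
    query = items[query_title]
    scored = [
        (title, cosine_similarity(query, vector))
        for title, vector in items.items()
        if title != query_title
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = rank_by_similarity("Sermon: Psalm 23", catalog)
for title, score in ranked:
    print(f"{score:.2f}  {title}")
```

With these toy values the shepherd song ranks well above the budgeting podcast, even though the titles share no words with the sermon.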

Map legend: Sermons · Music · Podcasts · Scripture
Key Findings
  • 6 distinct clusters emerged from unsupervised clustering — the model discovered theological groupings from content metadata alone, without any manual labeling
  • Content types intermingle within clusters: sermons and podcasts on the same theological topic appear side-by-side, validating cross-format retrieval
  • Music forms its own dedicated cluster (100% music), while the other 5 clusters are primarily spoken-word, reflecting a fundamental distinction between devotional and expository content
  • The clusters map to recognizable theological traditions: Reformed/evangelical, charismatic/prophetic, liturgical/Orthodox, scholarly/historical, and practical Christian living
14,729 items mapped · 384 dimensions · 6 clusters · 4 content types

Cluster Analysis

Switch to the "By Cluster" view above to see these groupings. Each cluster was identified through unsupervised clustering on the 384-dimensional embeddings, then manually examined to understand what theological themes the model discovered.

Cluster 0
Worship & Christian Music
1,526 items · 100% music
The only single-format cluster: every item is a music album or recording. Spans genres from contemporary Christian and praise & worship to hip hop, rock, metal, and Southern gospel. The model separates devotional music from spoken-word theology entirely from metadata: titles, artist names, and genre tags.
Contemporary Christian · Gospel · Hip Hop · Praise & Worship · Rock
Cluster 1
Practical Christian Living
2,414 items · 98.8% podcasts, 1.2% sermons
Applied Christianity focused on relationships, family, and personal growth. Dominated by marriage counseling, biblical parenting, wellness coaching, and mental health content. Features psychologists, life coaches, and counselors applying faith to everyday challenges. Represents the "how to live it out" side of Christianity.
Marriage · Parenting · Counseling · Mental Health · Relationships
Cluster 2
Evangelical & Reformed Teaching
4,905 items · 97.4% podcasts, 2.6% sermons · Largest cluster
The largest cluster centers on contemporary evangelical and Reformed theology. Features systematic teaching on Christology, Christian identity, community, and spiritual transformation from major churches (Elevation, Saddleback, Gateway, Passion City). Includes Reformed teaching networks and youth ministry content. Represents mainstream evangelical discourse.
Christology · Discipleship · Identity · Community · Reformed
Cluster 3
Historical & Scholarly Theology
644 items · 100% podcasts · Smallest cluster
Academic and historical theology — church history, papal encyclicals (Fides et Ratio), manuscript studies, and classical theological scholarship. Draws from Catholic magisterial teaching, church fathers, LibriVox audiobook recordings, and academic biblical studies. Represents the scholarly study of Christianity across traditions.
Church History · Patristics · Manuscript Studies · Catholic Teaching · Academic
Cluster 4
Charismatic, Prophetic & Scripture
4,388 items · 69.4% podcasts, 18.9% sermons, 11.7% scripture
The most mixed-format cluster, containing all 513 scripture passages alongside charismatic sermons and prophetic podcasts. Features spiritual warfare, divine guidance, and transformational spirituality from revival-oriented churches alongside raw Bible text. The co-location of scripture with charismatic teaching suggests the model detects shared theological language about God's direct action.
Spiritual Warfare · Prophecy · Scripture · Revival · Eschatology
Cluster 5
Liturgical & Global Christianity
852 items · 98.8% podcasts, 1.2% sermons
Eastern Orthodox theology, Catholic liturgy, and multilingual global Christianity (Chinese, Spanish, Filipino churches). Dominated by Orthodox Christian teaching, sacramental theology, and patristic commentary. Represents non-Western evangelical traditions and liturgical forms of worship — the most ecumenically diverse cluster.
Orthodox · Catholic · Liturgical · Multilingual · Sacramental

Who ranges, who stays put?

Using the cluster assignments above, we measure each creator's theological footprint — the number of clusters their content touches.

What You're Seeing

Cluster span as a theological-diversity signal

For every creator (artist or speaker) with five or more items in the catalog, we look up which of the six clusters each item belongs to and count how many distinct clusters that creator's output spans. A single-cluster creator has a focused theological lane — a three- or four-cluster creator ranges across multiple traditions.
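The counting rule described above can be sketched in a few lines. This is an illustration under assumptions, not the pipeline used for the analysis: the function name, the (creator, cluster) row format, and the sample creators are all hypothetical.

```python
from collections import defaultdict

MIN_ITEMS = 5  # threshold stated in the text: five or more items

def cluster_spans(rows, min_items=MIN_ITEMS):
    """Map each qualifying creator to the number of distinct clusters they touch.

    `rows` is an iterable of (creator, cluster_id) pairs, one per catalog item.
    """
    item_counts = defaultdict(int)
    clusters_seen = defaultdict(set)
    for creator, cluster_id in rows:
        item_counts[creator] += 1
        clusters_seen[creator].add(cluster_id)
    return {
        creator: len(clusters_seen[creator])
        for creator, count in item_counts.items()
        if count >= min_items
    }

# Hypothetical sample rows.
rows = (
    [("Creator A", 5)] * 6                            # six items, one cluster
    + [("Creator B", 2)] * 3 + [("Creator B", 4)] * 3  # six items, two clusters
    + [("Creator C", 0)] * 2                          # too few items, excluded
)
spans = cluster_spans(rows)
print(spans)
```

Creators below the five-item threshold are dropped before counting, so a creator with two items never appears as "single-cluster" by accident.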

Because our clustering was unsupervised, this number is not a value judgement about a creator's breadth or depth; it describes the shape of their output. Some creators (Reformed teaching networks, liturgical podcasts, worship-only artists) stay firmly in one lane because that's their calling. Others (large multi-site churches, academic programs, denominations with varied output) span several because they're genuinely covering more ground.

Key Findings
  • 977 creators have 5+ items in the catalog. Of those, 442 (45%) stay in one cluster, 421 (43%) span two, 109 (11%) span three, and only 5 creators touch four of the six clusters
  • The most diverse creators in the catalog — Discerning Hearts Catholic Podcasts, Center for the Study of New Testament Manuscripts, Pastor Lance Ralston — span four clusters, reflecting wide-ranging editorial portfolios that cover Scripture, history, and practical teaching in one feed
  • The most focused creators by volume — Tze-John Liu (308 items, all Liturgical & Global), Orthodox Christian Teaching (99% Liturgical), Thomas Babington Macaulay (66 items, all Historical & Scholarly) — illustrate how tradition-bound voices cluster cleanly
  • Large multi-site evangelical churches and their pastors, including Elevation, Gateway, Watermark, and Steven Furtick, all span exactly two clusters (typically Evangelical & Reformed Teaching plus Charismatic, Prophetic & Scripture), revealing a consistent bimodal pattern in the modern megachurch content portfolio

What the Clusters Reveal

The clustering reveals a meaningful theological spectrum that the model discovered without being told what any of these traditions are:

The content-type boundary is real but not absolute. Music forms a distinct cluster because its metadata (album titles, genre tags) differs fundamentally from spoken-word content. But within spoken-word content, the model groups by theological tradition rather than format — a Reformed sermon clusters with Reformed podcasts, not with charismatic sermons.

The largest clusters (2 & 4) represent the two dominant poles of American Protestantism: mainstream evangelical teaching and charismatic/prophetic ministry. Together they account for 63% of all content.
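The 63% figure can be reproduced from the cluster sizes listed in the Cluster Analysis section. A short worked check, using only those published counts:

```python
# Item counts per cluster, as listed in the Cluster Analysis section.
cluster_sizes = {0: 1526, 1: 2414, 2: 4905, 3: 644, 4: 4388, 5: 852}

total = sum(cluster_sizes.values())
share = (cluster_sizes[2] + cluster_sizes[4]) / total

print(f"{total} items; clusters 2 and 4 hold {share:.1%}")
# prints: 14729 items; clusters 2 and 4 hold 63.1%
```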

The smallest clusters (3 & 5) — historical/scholarly and liturgical/Orthodox — are underrepresented in the catalog but clearly distinct in embedding space, suggesting the model could support these traditions well with more data.

Implications for Cross-Format Discovery

These findings validate Rejoice's approach: because content clusters by theological meaning rather than format, a user listening to a Reformed podcast can be recommended a Reformed sermon on the same topic. The embedding space "knows" they belong together even though they're in different formats. This wouldn't work if content clustered purely by format (all sermons together, all podcasts together).
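As an illustration of the idea, a cross-format recommender could restrict candidates to formats other than the one being played and then rank by embedding similarity. The sketch below is an assumption about how such a step might look, not Rejoice's actual implementation; the `Item` type, function names, titles, and two-dimensional vectors are hypothetical.

```python
import math
from typing import NamedTuple

class Item(NamedTuple):
    title: str
    fmt: str       # e.g. "podcast", "sermon", "music", "scripture"
    vector: tuple  # embedding; toy 2-D values here instead of 384-D

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def recommend_other_formats(seed, items, k=2):
    """Top-k items closest to `seed` whose format differs from the seed's."""
    candidates = [item for item in items if item.fmt != seed.fmt]
    candidates.sort(key=lambda item: cosine(seed.vector, item.vector), reverse=True)
    return candidates[:k]

seed = Item("Reformed podcast on justification", "podcast", (0.9, 0.1))
catalog = [
    Item("Reformed sermon on justification", "sermon", (0.85, 0.2)),
    Item("Charismatic sermon on revival", "sermon", (0.2, 0.9)),
    Item("Reformed podcast on election", "podcast", (0.88, 0.12)),
]

recs = recommend_other_formats(seed, catalog, k=1)
print([item.title for item in recs])
```

In this toy catalog the Reformed sermon is recommended ahead of the charismatic sermon, and the second podcast is excluded because it shares the seed's format.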

Methodology & Limitations

Model: BAAI/bge-small-en-v1.5, a general-purpose sentence transformer (384 dimensions). This model was not fine-tuned on theological text, meaning the clusters emerge from general semantic understanding rather than domain-specific training. A theology-adapted model could yield different, potentially more nuanced groupings.

Clustering: Clusters were identified through unsupervised methods on the raw embeddings. No silhouette scores or formal cluster validation metrics are reported — the cluster labels were assigned through manual examination of member content. Future work should include quantitative cluster quality metrics.
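One candidate for such a metric is the mean silhouette coefficient, which compares each point's average distance to its own cluster with its average distance to the nearest other cluster. The pure-Python sketch below is illustrative only: it assumes Euclidean distance and toy 2-D points, and nothing here was computed on the actual catalog.

```python
import math
from collections import defaultdict

def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    For each point: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    score = (b - a) / max(a, b). Singleton clusters score 0 by convention.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels match the geometry
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the geometry
print(f"good labels: {good:.2f}, bad labels: {bad:.2f}")
```

Values near 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping or misassigned points. For embeddings, cosine distance may be a better fit than the Euclidean distance assumed here.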

Projection: UMAP reduces 384 dimensions to 2 for visualization. This necessarily distorts distances — points that appear nearby in 2D may not be nearest neighbors in the full space. The visualization is best understood as a qualitative overview, not a precise distance map.
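A simple way to quantify this distortion is to check how many of each point's nearest neighbours in the full space are still its nearest neighbours after projection. The sketch below uses a hypothetical projection that just drops one coordinate, standing in for UMAP, with toy 3-D points; the function names are illustrative.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean distance)."""
    others = (j for j in range(len(points)) if j != i)
    ordered = sorted(others, key=lambda j: math.dist(points[i], points[j]))
    return set(ordered[:k])

def neighbor_overlap(high_dim, low_dim, k=1):
    """Average fraction of each point's k nearest neighbours kept by the projection."""
    n = len(high_dim)
    kept = sum(len(knn(high_dim, i, k) & knn(low_dim, i, k)) for i in range(n))
    return kept / (n * k)

# Hypothetical toy data: the third point is far away along z only.
high_dim = [(0, 0, 0), (1, 0, 0), (0.1, 0, 9), (5, 0, 0)]
# Stand-in projection: discard the z coordinate.
low_dim = [(x, y) for x, y, _z in high_dim]

overlap = neighbor_overlap(high_dim, low_dim, k=1)
print(f"neighbour overlap after projection: {overlap:.2f}")
```

Here the third point looks like a close neighbour of the first two in 2D even though it is far from them in the full space, so only half of the nearest-neighbour relationships survive the projection.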

Data bias: The catalog is heavily weighted toward English-language evangelical content and toward a single format: podcasts make up 79% of items. Underrepresented traditions (Orthodox, Catholic, non-English) may cluster differently with more balanced data.