Cross-Modal Retrieval Study
Can a sermon about forgiveness surface a relevant worship song? Testing semantic retrieval across content formats.
Research Question
Christian audio content exists in silos: sermons on YouTube, worship music on streaming platforms, podcasts in separate apps, and scripture in Bible apps. Can a single semantic embedding space bridge these formats, enabling cross-modal discovery?
Methodology
We encoded 14,729 items (993 sermons, 11,697 podcasts, 1,526 music albums, and 513 scripture passages) using BAAI/bge-small-en-v1.5 into a shared 384-dimensional embedding space. For each content-type pair, we sampled 50 query items and measured cosine similarity against all items of the target type. Ground-truth relevance was established via shared scripture references and thematic overlap.
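The retrieval step above reduces to cosine similarity in the shared space. A minimal sketch: in practice the vectors would come from encoding transcripts and metadata with BAAI/bge-small-en-v1.5 (e.g. via the sentence-transformers library); here, small hand-made vectors stand in for the 384-dimensional embeddings so the sketch is self-contained.

```python
import numpy as np

def normalize(vectors: np.ndarray) -> np.ndarray:
    """L2-normalize rows so that dot products equal cosine similarity."""
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def cross_modal_scores(query: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Cosine similarity of one query embedding against all target-type items."""
    q = query / np.linalg.norm(query)
    return normalize(targets) @ q

# Toy stand-ins for 384-dim embeddings (illustrative only).
sermon = np.array([0.9, 0.1, 0.0])            # query: a sermon embedding
songs = np.array([[0.8, 0.2, 0.1],            # candidate worship songs
                  [0.0, 1.0, 0.0],
                  [0.7, 0.0, 0.3]])

scores = cross_modal_scores(sermon, songs)
best = int(np.argmax(scores))  # index of the most similar song
```

With real embeddings, the same dot product ranks every item of the target type against the query; the 50-query sampling from the methodology simply repeats this per content-type pair.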
Key Results
The embedding space successfully bridges content formats without any explicit cross-modal training.
Cross-Modal Heatmap
The heatmap below shows average cosine similarity between content type pairs. Higher values indicate stronger semantic bridging.
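The heatmap values can be reproduced from the embeddings alone. A sketch under stated assumptions: embeddings are grouped by content type, toy 8-dimensional random vectors stand in for the real catalog, and the per-pair query sampling is omitted for brevity.

```python
import numpy as np

def avg_cross_similarity(groups: dict) -> dict:
    """Mean cosine similarity between every pair of content-type groups."""
    unit = {k: v / np.linalg.norm(v, axis=1, keepdims=True)
            for k, v in groups.items()}
    return {(a, b): float((unit[a] @ unit[b].T).mean())
            for a in unit for b in unit}

rng = np.random.default_rng(0)
# Toy embeddings standing in for the real 384-dim bge-small vectors.
groups = {
    "sermons": rng.normal(size=(4, 8)),
    "music": rng.normal(size=(3, 8)),
}
heat = avg_cross_similarity(groups)  # one cell per ordered type pair
```

Each cell of the heatmap is one entry of `heat`; the matrix is symmetric since cosine similarity is.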
Example Showcase
The visualization below shows specific examples of successful cross-modal retrievals: queries from one content type that surfaced semantically relevant results in a different format.
Implications for Rejoice
These results validate the core technical hypothesis behind Rejoice: a single embedding model can understand theological content well enough to recommend across format boundaries. A user listening to a sermon about grace can be shown a worship song about the same theme, a relevant scripture passage, and a deeper-dive podcast episode — all discovered through semantic similarity rather than manual curation.
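The recommendation flow described above is a nearest-neighbor lookup in every other format. A hypothetical sketch (function and variable names are ours, not Rejoice's actual API), again with toy vectors in place of real 384-dim embeddings:

```python
import numpy as np

def recommend_per_format(query: np.ndarray, catalog: dict, exclude: str) -> dict:
    """Return the index of the most similar item in every other format."""
    q = query / np.linalg.norm(query)
    picks = {}
    for fmt, embs in catalog.items():
        if fmt == exclude:
            continue  # skip the format the user is already listening to
        unit = embs / np.linalg.norm(embs, axis=1, keepdims=True)
        picks[fmt] = int(np.argmax(unit @ q))
    return picks

# Toy catalog; a real one would hold bge-small embeddings per item.
catalog = {
    "sermons": np.array([[1.0, 0.0, 0.0]]),
    "music": np.array([[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]),
    "scripture": np.array([[0.8, 0.0, 0.6], [0.0, 0.0, 1.0]]),
}
picks = recommend_per_format(catalog["sermons"][0], catalog, exclude="sermons")
```

Here `picks` maps each remaining format to its closest item, giving one recommendation per format from a single query embedding.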
Limitations
Music metadata is sparser than sermon transcripts, which may bias similarity scores. The current model (bge-small, 384-dimensional) was not fine-tuned on theological text; a domain-adapted model could improve results. Future work will explore fine-tuning on the Rejoice catalog itself.