CuratedAI: Multilingual Legal Retrieval on Vespa
Vector-only search works well for early semantic discovery. It breaks down when users need precise retrieval of specific legal articles, rulings, and citations.
Searchplex helped CuratedAI migrate from vector-only search to hybrid multilingual retrieval on Vespa, enabling reliable legal search across English, French, and Dutch legislation while supporting sovereignty-aligned deployment in Europe.
Client
CuratedAI is a Belgium-based legal technology company. Its platform lets legal professionals search EU legislation, rulings, and regulatory materials in natural language, down to the relevant article or paragraph, for privacy and IT law research.
The situation
CuratedAI's search system was originally built on a vector-based architecture and served its initial use case well. Semantic search over English-language EU data protection legislation worked effectively for exploratory discovery.
As the product matured, however, new requirements emerged that the existing architecture was not designed to handle.
Known-item retrieval was unreliable. Broad topic exploration worked well - a user searching for GDPR controller obligations could surface relevant materials. But legal professionals frequently needed to retrieve a specific ruling or legal reference they already knew existed. Vector similarity handles thematic relationships well but is less reliable for exact lookup, identifiers, and legal citations. Without a lexical retrieval layer, these queries often failed.
Non-English legal content needed to be supported. CuratedAI needed to expand beyond English-language materials to include Belgian national legislation and rulings written in Dutch and French. Many of these documents do not have official English translations. Architectures that depend on translating source documents before indexing risk degrading the precision of legal terminology.
Deployment needed to align with European sovereignty requirements. The retrieval system needed to run on European infrastructure controlled by the customer rather than depend on managed platforms outside those constraints.
This pattern appears frequently in enterprise search systems: semantic search works well for exploratory discovery, but professional workflows - legal research, compliance analysis, technical documentation - often require retrieval systems that combine semantic recall with precise lexical targeting.
What we built
Searchplex migrated CuratedAI from vector-only search to a hybrid retrieval architecture on Vespa designed for multilingual legal research.
Hybrid retrieval. The core architectural change was combining lexical retrieval and semantic retrieval in a single ranking pipeline. Lexical retrieval improves precision for article numbers, legal references, and identifiers, which matters because in legal research retrieving the wrong article is often worse than retrieving nothing. Vector retrieval improves semantic recall and cross-lingual matching. Vespa's ranking framework combines both signals so the system can support exploratory search and precise legal lookup simultaneously.
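To make the idea of combining lexical and semantic signals concrete, here is a minimal sketch in Python using reciprocal rank fusion (RRF), one common way to merge two ranked lists. The case study does not say which combination method the CuratedAI ranking pipeline uses, so RRF is a stand-in here, and the document ids and the constant k=60 are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for one query (ids are made up for illustration).
lexical_hits = ["gdpr-art-28", "gdpr-art-24", "gdpr-art-4"]
semantic_hits = ["gdpr-art-24", "gdpr-art-26", "gdpr-art-28"]

fused = reciprocal_rank_fusion([lexical_hits, semantic_hits])
print(fused)
```

Documents that appear in both lists outrank those found by only one retriever, which is the behaviour a hybrid pipeline relies on for known-item queries.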
Native-language indexing. Legal documents are indexed in their original language with language-aware text processing. Cross-lingual retrieval is handled at query time rather than by translating source documents before indexing. This preserves the precision of Dutch and French legal terminology while still allowing multilingual search.
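As a rough illustration of query-time handling, the sketch below builds a Vespa query body that tags the query with its language and pairs keyword matching with a nearest-neighbor clause. It is not CuratedAI's actual configuration: the field name `embedding`, the rank profile name `hybrid`, the query tensor name `q`, and the presence of a configured multilingual embedder are all assumptions made for the example.

```python
SUPPORTED_LANGUAGES = {"en", "fr", "nl"}


def build_hybrid_query(user_query: str, language: str, hits: int = 10) -> dict:
    """Return a Vespa query body for a hybrid, language-tagged search.

    The schema names used here (embedding, q, hybrid) are hypothetical.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return {
        # Keyword match OR approximate nearest-neighbor match on the vector field.
        "yql": (
            "select * from sources * where userQuery() or "
            "({targetHits: 100}nearestNeighbor(embedding, q))"
        ),
        "query": user_query,
        # Tells Vespa which language to use when processing the query text.
        "model.language": language,
        # Assumed rank profile that blends lexical and vector scores.
        "ranking": "hybrid",
        # Assumes an embedder component is configured to embed the query text.
        "input.query(q)": "embed(@query)",
        "hits": hits,
    }


body = build_hybrid_query("verplichtingen van de verwerkingsverantwoordelijke", "nl")
print(body["model.language"], body["hits"])
```

The source documents stay in Dutch or French in the index. Only the query carries a language tag, and the vector clause is what would let a query in one language reach documents written in another, assuming a multilingual embedding model.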
Fine-grained retrieval. Legal documents contain meaningful internal structure. Articles, paragraphs, recitals, and annexes answer different types of questions. The retrieval design surfaces the most relevant sections of legal documents rather than treating each document as a single undifferentiated unit.
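One way to picture section-level retrieval is to split each document into article-sized units before indexing, so that a hit points at an article rather than a whole act. The sketch below is a simplified assumption about how such splitting could work, not the project's actual pipeline: the `Section` type, the heading pattern, and the sample text are all hypothetical, and real legal texts also need handling for recitals, annexes, and numbered paragraphs.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    doc_id: str
    label: str  # e.g. "Article 2"
    text: str


# "Article N" covers English and French headings, "Artikel N" covers Dutch.
ARTICLE_HEADING = re.compile(
    r"^(Article|Artikel)[ \t]+(\d+[a-z]?)[ \t]*$", re.MULTILINE
)


def split_into_articles(doc_id: str, text: str) -> list[Section]:
    """Split a legal text into one Section per article heading.

    Text before the first heading (title, preamble) is ignored in this sketch.
    """
    matches = list(ARTICLE_HEADING.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        # An article's body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        sections.append(Section(doc_id, f"{match.group(1)} {match.group(2)}", body))
    return sections


sample = """Example Act

Article 1
Subject matter and objectives.

Article 2
Material scope.
"""

sections = split_into_articles("example-act", sample)
for section in sections:
    print(section.label, "->", section.text)
```

Each resulting section can then be indexed and ranked on its own, which lets the search surface the relevant article rather than the whole document.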
Flexible deployment. The system runs on self-hosted European infrastructure aligned with CuratedAI's data residency requirements.
Architecture takeaway
This project highlights a retrieval pattern that appears frequently in enterprise search systems:
- Vector-only retrieval works well for early semantic exploration but struggles with exact lookup tasks such as legal references, identifiers, and known-item search.
- Hybrid retrieval becomes necessary as products mature, combining lexical precision with semantic recall.
- Multilingual legal search benefits from native-language indexing, since translating source documents before indexing can degrade legal terminology.
- Fine-grained retrieval improves professional workflows, where users need the relevant article or paragraph rather than just the correct document.
These constraints are common in legal technology, regulatory search, and other domains where precision matters.
Results
The resulting system gave CuratedAI a retrieval foundation better aligned with professional legal research workflows.
Queries in English, French, and Dutch now retrieve relevant legal materials across EU legislation and Belgian national legislation, including documents where no official English translation exists.
Known-item retrieval improved through the addition of lexical precision alongside vector retrieval.
Legal materials can now be searched in their original language while still supporting multilingual discovery.
The platform also runs reliably on European infrastructure aligned with sovereignty requirements.
"Searchplex delivered a hybrid, multilingual search solution that elevated our document retrieval system, resulting in faster and more precise searches across multiple languages. This capability was crucial for our legal tech needs."
Before: Vector-based semantic search
After: Exact legal lookup combined with multilingual semantic retrieval
Deployment: European self-hosted infrastructure