Conference Talks
Conference talks by Searchplex on production search, inverse hybrid retrieval, alerting at scale, and applied AI—sessions at events such as Berlin Buzzwords.
About Conference Talks
Conference sessions on production search, retrieval at scale, hybrid ranking, and applied AI—where correctness and economics meet operations.
Talks at industry conferences on how search behaves under real constraints—hybrid retrieval, alerting and inverse query patterns, scaling economics, and operational correctness.
The Three-Body Problem of Inverse Hybrid Search
Session details
When users expect alerts for new products matching an uploaded image, the problem becomes inverse hybrid search. Unlike top-K search, alerting must guarantee fetch-all semantics: zero missed matches across all saved searches, combining vector similarity, boolean filters, and lexical signals. We show why this breaks traditional scaling intuition.
Saved searches and alerts are common across e-commerce and marketplaces: price drops, availability notifications, and increasingly, visual alerts driven by images captured on mobile devices. While the user experience feels simple, the underlying system represents one of the most demanding forms of search.
This talk reframes alerting as a distinct retrieval discipline:
- Inverse: documents trigger queries, not the other way around
- Hybrid: vector similarity, boolean filters, and lexical constraints must all apply
- Fetch-All: every true match must be returned – no truncation, no approximation
We examine why traditional search assumptions fail under these constraints. In particular, we show how cost and instability are driven not by throughput (QPS), but by match cardinality – the number of alerts matched per incoming item – and how this interacts with scatter/gather execution, merge costs, and bursty ingestion patterns.
The talk focuses on:
- where inverse hybrid systems break silently
- why scaling infrastructure buys stability rather than throughput
- how correctness becomes an operational and economic concern
- why AI-driven recall often increases system pressure rather than reducing it
The talk gives the audience a concrete framework for reasoning about inverse hybrid search systems at scale.
Transcript
Welcome everyone to my talk. Today I will be talking about the three-body problem of inverse hybrid search.
I think it's a very interesting topic, and I hope you enjoy the talk.
Before I get into it, a bit of introduction.
So my name is Ravindra Harige. I am founder of Searchplex. I live in Amsterdam, Netherlands.
I have a background in artificial intelligence, machine learning and information retrieval.
I have been working on search problems for over 10 years now. My experience has mostly been in building search solutions for specific domains and verticals, solving problems across the layers of retrieval and ranking, and improving metrics. Because of this interest in solving search problems, I founded Searchplex.
So a bit about that as well. We help teams design, modernize, and scale production search solutions.
There are different entry points. Sometimes teams already know where the problems are, and then we help solve and optimize those.
But what we are seeing more often lately is teams that have spent a lot of time over-tuning their specific search stack and are held back from improving it further, especially for AI workloads.
So that's been a good chunk of work, where we also help teams modernize their search stack.
Because of the domain-specific search work, we have worked across multiple domains so far, including e-commerce, legal, regulatory, and health sciences, in three specific areas: audits, where we help diagnose issues; PoCs and building solutions; and modernization.
So do check out the website for case studies and use cases.
And yeah, happy to chat more afterwards about that.
So before moving into the specific use case, let's set the scene.
So, you're at home, you're watching a movie, and this is a Red Notice movie, and you see the dance on the screen, and you're like, "Oh, I really like the jacket he's wearing."
And you want to see if you can find a similar jacket to buy and wear yourself.
And then you pause the screen, you take a snap, and then you head over to your favorite clothes shopping app, and then see if you find something similar.
So in this case, you do see something similar, but it's not similar enough to what you like.
And then you do see a jacket in the fourth position which is closer to what you want, but it's out of stock.
So what do you do? You're like, okay, if not now, let me track this.
So you save the search, and you expect the system to understand your intent: you are looking for a specific jacket.
You're searching for it, and you want to be notified when new products similar to it enter the system.
So that's the saved search use case or alerts.
This is not new for us. Basically we are saying: this is the query, and if something matches it, let us know. The notifications can be email or push notifications, and we have seen this kind of alert or saved search in many situations: job alerts, or maybe you're looking for a house and you want to set up an alert if a rental opens up in your specific area, or travel alerts if you're tracking prices.
So alerts are familiar, and if you have been building search solutions and alert systems, you will be very familiar with how that's done with the Elasticsearch or OpenSearch percolator.
It's very common to use that.
But in this specific talk I'm going to focus more on saved searches for e-commerce. On quite a few apparel shopping websites you will see this kind of option: you run a search and, whether or not you find something now, you can always save it. When you save it you see the confirmation, and you'll keep receiving emails from them about new matches.
So technically speaking, alerts are an inverse search system. It's an inverse world because for us search is always like this: a user query comes in, and it is matched against the corpus of documents or the catalog. In inverse search, or alerts, it's the other way around: you store the user queries as the corpus, and the new products entering the system are the queries.
So take a moment to keep that flip in mind because that's what the whole talk is going to be about.
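The flip can be made concrete with a small Python sketch. This is a toy illustration, not the system described in the talk: the `SavedSearch` fields, the keyword matching, and the example data are all assumptions chosen to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedSearch:
    """A stored user query. Field names are illustrative assumptions."""
    search_id: str
    keyword: str
    max_price: float


def forward_search(query_keyword: str, catalog: list[dict]) -> list[dict]:
    """Forward world: one user query is matched against the stored catalog."""
    return [p for p in catalog if query_keyword in p["title"].lower()]


def inverse_search(product: dict, saved_searches: list[SavedSearch]) -> list[str]:
    """Inverse world: one incoming product is matched against stored queries."""
    title = product["title"].lower()
    return [
        s.search_id
        for s in saved_searches
        if s.keyword in title and product["price"] <= s.max_price
    ]


saved = [
    SavedSearch("s1", "jacket", 100.0),
    SavedSearch("s2", "jacket", 50.0),
    SavedSearch("s3", "boots", 200.0),
]
new_product = {"title": "Vintage Leather Jacket", "price": 80.0}
matched_ids = inverse_search(new_product, saved)
print(matched_ids)  # ['s1']
```

The two functions have the same shape; only the roles of "stored corpus" and "incoming query" are swapped.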
So those who have developed anything like this before with the percolator will know that it is mostly lexical queries that are used as alerts.
But what has changed in the last few years is the ability to use vector search. You can also use an image as the input, just like in the first use case I shared: if you can search with images, you also want to be alerted on images. So that's the change, and it captures richness which is tedious to describe in words, right?
Like the fabric, the style and things like that.
So take a moment and tell me what's your first intuition? How would you solve this at scale? What comes to your mind when we are talking about image embeddings and doing a search over that?
Any answers?
Cosine, right? Cosine? Cool. At least, this was also my intuition when I started with it.
I was like, hm, this sounds like an image search, a simple vector-to-vector cosine, and maybe if you want to do it at scale we might want to use ANN and things like that.
So let's call that top-K retrieval, where you are searching with a vector across many vectors and you want to find the nearest neighbors. In these kinds of systems there are multiple tricks that the engine employs to get you the best-matching image vectors.
But in the end you are really getting the top K.
But we talked about the alerts use case where every user has set up an alert with an intention to be notified whenever the match arrives.
So think of all the users who do not make it into the top K. That's not a good thing for them, right?
So that's why I call that use case fetch-all retrieval.
What I mean by fetch-all is that we need to evaluate every saved search, which together form the corpus, against the incoming query, which is a product. Wherever matches happen, however many there are, all of them have to be processed for alerts.
There is no top-K in this situation and when we talk about matching there are multiple aspects where it has to match.
So there is the image vector itself, there's a threshold for how closely it should be matching, there are different types of filters — price, size — and a number of other things.
So it's not a simple yeah-the-closest-is-a-match case — there are vectors and many other things that need to be evaluated.
So to summarize these are the two contrasting situations: on both sides you have the vector and you need to get the closest matching but there is one contract where you are just asking for the top K.
Let's call that the forward search use case. And in the inverse search, you are like get me everything that's matching.
So that's the difference in contract.
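A minimal sketch of the two contracts over the same vectors, in plain Python. The two-dimensional vectors, the ids, and the threshold of 0.95 are made up for illustration; a real system would use an engine rather than a brute-force scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], corpus: dict[str, list[float]], k: int) -> list[str]:
    """Forward contract: return only the k closest entries."""
    scored = sorted(corpus.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


def fetch_all(query: list[float], corpus: dict[str, list[float]], threshold: float) -> list[str]:
    """Inverse contract: return every entry at or above the similarity threshold."""
    return [doc_id for doc_id, vec in corpus.items() if cosine(query, vec) >= threshold]


# Saved-search embeddings (the corpus) and one incoming product (the query).
saved_vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.8, 0.2],
    "d": [0.0, 1.0],
}
product_vector = [1.0, 0.0]

print(top_k(product_vector, saved_vectors, k=2))                # ['a', 'b']
print(fetch_all(product_vector, saved_vectors, threshold=0.95))  # ['a', 'b', 'c']
```

With `k=2`, saved search `c` is a true match that never gets its alert, which is exactly the failure the fetch-all contract rules out.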
So what exactly is in play? When we talk about matching there can be lexical signals, keywords. We definitely talked about the image embeddings. The similarity threshold is very important; you can think of the threshold as being different for each saved search.
So maybe if you are into something very unique, you probably want to give a broader, lower threshold so that you get more items.
But if you are looking for something more common, you are probably going to give a higher threshold because you don't want to be spammed by many matches.
So think of it like every saved query has a different threshold.
Now we have to satisfy that along with the vector distance itself.
And of course there are the other constraints which were always there: size, price, whether it ships to a certain country or not.
And the scale. So scale is very interesting because here the scale is on these two sides.
On one side you have how many saved searches are there in your corpus and on the other side you have how many products are entering your system.
So on both sides you have the scale which needs to be managed by your system and the last part is the correctness.
Correctness means you really have to get this right for an alert to fire.
There is approximation involved: there is distance, there is vector similarity, and there are the other conditions.
You don't want to leave out any matches you are unsure about, right?
So you have to ensure correctness in the first place.
So that brings us to the three-body problem. I'll be sharing how this problem can be seen on these three axes.
One is the correctness. The second one is cardinality and the third one is the execution physics of this.
To ground this work I'm very happy to introduce my client here: GEM.
They are a marketplace aggregator with more than 170 million live listings with embeddings.
They already support image search for their customers. They are in the vintage clothing and secondhand market niche.
So most of their users are very loyal and they are always looking for something vintage, something specific.
So that's the use case, and that's where we are doing this work. They are already on Elasticsearch and they already have alerts in production based on the percolator, but as I said, the percolator is lexical as of now. The challenge was to let users save the image searches they can already run, and be notified of new matches in the catalog.
So let's dive into the technical pattern here. I told you already that they are on Elasticsearch percolator.
I think Elasticsearch has done a really great job here in the sense of abstracting the complexity.
It's really cool that you construct a forward search query in a DSL, and you can use the same DSL to store the query and be notified in the reverse direction when any matches come in.
So in terms of the function or capability itself, it's pretty cool what Elasticsearch did.
But as of today I haven't seen it working really well, or even supported, with vectors in the mix.
So it works well for keywords but we are dealing with vectors now.
Now to solve this challenge we looked at and started using Vespa's predicate fields and ranking.
So this is a capability that was not originally designed for percolation.
The history of this feature is that the intended use was for ads.
But it has the capability to evaluate incoming products, or anything else, against boolean constraints.
So by itself the predicate field is not a vector solution, but it works very seamlessly with the ranking functionality that Vespa provides.
Together, these two capabilities let you deliver a percolator-style solution with vectors in the mix.
So what the solution looks like, broadly speaking: you have incoming products and then you do the predicate matching which is the boolean constraints matching, then you do the ranking for vector similarity scoring, and in the end you have the clean set of alerts which you want to process.
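The two stages can be sketched in plain Python. This is only an illustration of the flow, not Vespa's predicate or ranking API: the `SavedSearch` and `Product` classes, their fields, and the example values are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedSearch:
    """Hypothetical saved-search record: boolean constraints plus an embedding."""
    search_id: str
    embedding: tuple[float, ...]
    min_similarity: float                    # per-search similarity threshold
    max_price: float = math.inf
    ships_to: frozenset[str] = frozenset()   # empty means no shipping constraint


@dataclass(frozen=True)
class Product:
    """Hypothetical incoming product, which plays the role of the query."""
    embedding: tuple[float, ...]
    price: float
    ships_to: frozenset[str]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def is_eligible(search: SavedSearch, product: Product) -> bool:
    """Stage 1: boolean constraints only, no vector math."""
    if product.price > search.max_price:
        return False
    if search.ships_to and not (search.ships_to & product.ships_to):
        return False
    return True


def alert_ids(product: Product, searches: list[SavedSearch]) -> list[str]:
    """Stage 2: score eligible searches against each search's own threshold."""
    eligible = [s for s in searches if is_eligible(s, product)]
    return [
        s.search_id
        for s in eligible
        if cosine(s.embedding, product.embedding) >= s.min_similarity
    ]


searches = [
    SavedSearch("broad", (0.8, 0.6), min_similarity=0.7),
    SavedSearch("strict", (0.8, 0.6), min_similarity=0.9),
    SavedSearch("cheap", (1.0, 0.0), min_similarity=0.9, max_price=50.0),
    SavedSearch("nl_only", (1.0, 0.0), min_similarity=0.9, ships_to=frozenset({"NL"})),
]
product = Product((1.0, 0.0), price=80.0, ships_to=frozenset({"NL", "DE"}))
print(alert_ids(product, searches))  # ['broad', 'nl_only']
```

Running the cheap boolean stage first keeps the similarity computation limited to saved searches that could fire at all.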
Let's focus on the first body of the problem — that's correctness.
So as I said, it's about arriving at the match set, and there is no room for leaving out anything that should be matching.
Since this solution is not a percolator by design, unlike what Elasticsearch did, it has different semantics for constructing a query as well as for constructing the saved search representation, let's call it that.
So there is one semantic for the saved search, and how you represent the query, which is a product, is a separate semantic.
So before you get to the matching side of things, the first thing is to get these two representations right.
So ensure your product-as-query representation is done correctly and saved searches are represented correctly with the predicate semantics.
Then the second step — I won't say second step — but another important part is creating a golden evaluation set.
So it's really important that you come up with enough examples of positives and negatives.
What should be matching? What shouldn't be matching for a given input product query?
What saved searches should match or shouldn't match.
So generating this golden evaluation set is itself an interesting task, because you can actually use the search engine where you already have forward search set up to come up with all types of permutations and combinations of filters and create this data set.
But it has to be exhaustive and it has to be a confidence-inducing set, because all your work is going to depend on it.
So once you have the query representation, the saved search representation, and the evaluation data set, it's a validation loop. Because those are two different semantics, you need to iterate until you have all the matches and non-matches in order and you have confidence in how correctly the solution is doing its job. As I said, if you don't solve this step correctly, there's no point in moving on to performance or building a solution around it.
So this is a really fundamental and important step to get it right.
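One way to picture the validation loop is a small checker that compares the expected match set with the actual one for each product in the golden set. This is a sketch under assumptions: the `evaluate` helper, the shape of the golden set, and the `fake_engine` stand-in are hypothetical, not the tooling used in the project.

```python
from typing import Callable, Iterable


def evaluate(
    golden: dict[str, set[str]],
    match_fn: Callable[[str], Iterable[str]],
) -> dict[str, dict[str, set[str]]]:
    """Compare actual match sets with expected ones, per product id.

    Returns only the products where the two sets differ.
    """
    failures: dict[str, dict[str, set[str]]] = {}
    for product_id, expected in golden.items():
        actual = set(match_fn(product_id))
        missed = expected - actual     # false negatives: alerts that never fire
        spurious = actual - expected   # false positives: alerts that should not fire
        if missed or spurious:
            failures[product_id] = {"missed": missed, "spurious": spurious}
    return failures


# Golden set: for each product, the saved searches that must match (and no others).
golden_set = {"p1": {"s1", "s2"}, "p2": set(), "p3": {"s4"}}

# Stand-in for the system under test.
fake_engine = {"p1": ["s1"], "p2": ["s3"], "p3": ["s4"]}

report = evaluate(golden_set, lambda pid: fake_engine[pid])
print(report)
```

An empty report is the exit condition of the loop; anything under `missed` is the case the fetch-all contract cannot tolerate.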
So let's move on to the second body of the problem: that's cardinality. We have looked at correctness, which you have to get right.
On cardinality, we talked about how there is no top-K: it has to match everything.
So we are familiar with the query-per-second kind of metrics.
In this case the product arrival rate is the QPS, because we're in an inverse world, and cardinality is how many saved searches a product matches. I'll explain how that decides a lot of things in the entire solution.
So the more matches there are, the more it is going to impact all the downstream tasks.
So what is the first thought? Cardinality could be just the corpus size, perhaps that's the intuition, but there are three axes to it. Corpus size definitely matters: it's the number of possible matches. Filter complexity is the second one. You can imagine that among the saved searches there will be users who are just being lazy: they take a photo and set up an alert, with no other conditions.
That's the broader kind of saved search. Then there can be saved searches that give a price range.
They give a number of possible countries it has to ship to, or size options, which are also multiple values.
So the more things you add, the more the evaluation complexity, the filter complexity, is going to matter.
The third part is, after match or no match has been determined, how many saved searches come out of that step.
That's the third important part. So cardinality is decided there, and each saved search plays two roles.
One is that it carries all the aspects we just discussed: lexical and boolean constraints, ranges, and everything. But all this logic also has to run on every incoming query, that is, every incoming product.
For each incoming product all this logic has to be executed.
So just to reiterate on top-K: throughout the work it took us multiple attempts to come to the realization that it's not a top-K problem, it is a fetch-all problem. No matter how much we think about it, we keep coming back to the intuition that maybe we just need X number of results, but that's not the case, and that has an effect on how you characterize the workload.
So we have discussed the different aspects that are involved.
What is the workload pressure of an inverse search system?
It's basically the number of products that are arriving in the system multiplied by all the work it has to do to get to the match set.
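As a back-of-the-envelope sketch of that multiplication (the function names and all numbers below are made up for illustration, and the evaluation count is a brute-force upper bound, since an engine may prune work with indexes):

```python
def evaluations_per_second(products_per_second: float, saved_searches: int) -> float:
    """Upper bound on match evaluations if every saved search is checked per product."""
    return products_per_second * saved_searches


def matched_ids_per_second(products_per_second: float, matches_per_product: float) -> float:
    """Alert ids that downstream layers must serialize, move, and process."""
    return products_per_second * matches_per_product


# Made-up numbers, for illustration only.
print(evaluations_per_second(50, 1_000_000))  # 50000000
print(matched_ids_per_second(50, 2_000))      # 100000
```

The arrival rate is the same in both lines; what changes the pressure is how much work and how many matches each product brings with it.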
And this workload pressure is going to matter in terms of performance, because the more saved searches you have, the more results, or cardinality, you're going to have for any incoming product. So that's the x-axis here.
The P95 is what is going to decide how much your system can handle, because at some point you're going to see performance bottlenecks start to rise if you have not sized for the P95 and beyond. Here we are talking about the P95 of the match set size, and we'll come to how this matters further. So cardinality decides the match set, and the match set decides the payload. So what is the payload here? The bare minimum information we need is the alert ids.
Once you have the ids you can do all types of processing for sending out the notification.
But you can see that more matches result in more saved search ids, which means larger JSON arrays.
Larger JSON arrays mean more bytes, and more bytes that you have to move across different layers.
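A quick way to see this growth is to serialize id lists of different sizes. The id format (`alert-` plus eight digits) and the `alert_ids` key are assumptions for illustration; real payloads will differ, but the linear growth with match count is the point.

```python
import json


def payload_bytes(match_count: int) -> int:
    """Size in bytes of a JSON payload carrying `match_count` alert ids."""
    ids = [f"alert-{i:08d}" for i in range(match_count)]  # hypothetical id format
    return len(json.dumps({"alert_ids": ids}).encode("utf-8"))


for count in (10, 1_000, 100_000):
    print(count, payload_bytes(count))
```

With top-K the first line is the whole story; with fetch-all, any of the three sizes can occur for a single incoming product.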
And that brings us to the third body of the problem. That's the execution physics.
So execution physics is not something specific to inverse search.
It's there also for forward search or any type of search system.
But this becomes a bit more interesting. I'll explain why.
So what happens in a search system, and this applies to both the forward and the inverse case, is that you receive the query. In Vespa the receiving node is called a container; in Elasticsearch it's a coordinating node, for instance. It takes the query and fans it out to all the data nodes, where the actual logic is performed on the specific data each one contains. Then the results are brought back to that node, which has to combine and merge them and return them to the service.
The service has to take that payload and deliver it to the client.
So that's the entire flow that happens.
So when we think of performance, quite often we think, hm, how long is the search engine taking to do its job.
But in this case it's not only engine time. There is also a variable amount of time taken by scatter/gather, the merge operations, JSON serialization and deserialization, and the wire transfer. These steps also apply in a forward search flow, but in inverse search there is no K. In a normal search you would do all these things for the 10 items in the results going back to the client, but here you need all the matching ids, so the more saved search ids there are, the more payload has to move across different layers, which increases your overall end-to-end latency.
And when we are building this kind of service for alerting, for saved searches, once you have gone past the correctness part you're like, okay, is it performing, is it doing well, why is it slow? These are very common questions you'll have once you start doing performance tests, and then you will see it's not up to your expectation and wonder why it's not fast enough.
And the reason is this. In order to improve the performance, you're like, let me add more workers, let me increase the concurrency, maybe it will improve the performance, maybe I will get a higher throughput.
But then we are again back to the execution physics: every system has a knee in the curve.
So the concurrency is only going to help you up to a certain point, beyond which it will have diminishing returns.
So no matter how much you increase the workers and the concurrency, you may not see any performance gains beyond that point.
So if you're sizing for this kind of service to work in production, your capacity planning has to be at the tail.
And what that means is: the best case is that an incoming product matches zero saved searches.
The worst case is that it matches all of them, which very likely means there is a bug in your system.
So you need to account realistically for the range across the kind of saved searches you have.
What's the minimum and what's the maximum? Whatever is on the P95 side is what is going to decide the sizing of your resources, of the service, and the performance.
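A small sketch of why the tail, not the average, drives sizing. The nearest-rank `percentile` helper and the distribution of match-set sizes are made up for illustration.

```python
def percentile(values: list[int], pct: int) -> int:
    """Nearest-rank percentile for an integer `pct` between 1 and 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct * n / 100) in integer math
    return ordered[max(rank, 1) - 1]


# Made-up match-set sizes for 100 incoming products:
# most match nothing, a few match very broadly.
match_set_sizes = [0] * 60 + [10] * 30 + [500] * 8 + [20_000] * 2

print(percentile(match_set_sizes, 50))              # 0
print(percentile(match_set_sizes, 95))              # 500
print(percentile(match_set_sizes, 99))              # 20000
print(sum(match_set_sizes) / len(match_set_sizes))  # 443.0
```

Sizing for the median here would plan for empty payloads, while the products in the tail are the ones that carry the cost.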
So this is the flow, a sort of perturbation is what I call it, and it has a cascading effect: broader eligibility brings more candidates, more candidates mean more matches and more payload size, and more payload moving around has an effect on the request times, which has an effect on the throughput. If you're not managing this well, your product arrival rate, or QPS, is going to start running into backlog risk.
So this is where the three bodies come together.
All three have to be taken into consideration while building a saved searches kind of solution.
In order to effectively diagnose and build a resilient system where you understand the performance, there's no exception: you need to have good observability across the different layers we just discussed, so that you know what the incoming load is, where the time is being spent in the evaluation, and where the time is going in building the payload, in which layers, and in the movement between them.
So that's a very important part.
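As a minimal sketch of per-stage timing, assuming nothing about the actual instrumentation used in the project (the `timed` helper, the stage names, and the fake workload are hypothetical; a production setup would more likely export these measurements to a metrics system):

```python
import json
import time
from collections import defaultdict
from contextlib import contextmanager

# Stage name -> list of observed durations in seconds.
timings: dict[str, list[float]] = defaultdict(list)


@contextmanager
def timed(stage: str):
    """Record the wall-clock duration of the wrapped block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage].append(time.perf_counter() - start)


# Fake workload standing in for one incoming product.
with timed("match"):
    ids = [f"alert-{i}" for i in range(10_000)]
with timed("serialize"):
    payload = json.dumps(ids)

for stage, durations in timings.items():
    print(stage, f"{max(durations) * 1000:.3f} ms")
```

Keeping matching and payload handling as separate stages is what lets you tell engine time apart from the cost of moving ids around.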
So in the end, how can we think of an inverse search solution?
This is a new operating model. If you have always built forward search solutions, always pay attention to the contract.
What's the contract? Is it top-K? In this case, it's fetch-all. What is the shape?
How broad or constrained are the saved searches? What is the workload pressure?
Because the more cardinality, the more work for your system in the end.
The physics of it: how much latency it takes to move the payload across the different layers. And to get all that understanding you need to have good instrumentation.
So, coming back to where we started: we started with a three-body problem.
The orbit only stabilizes when all three bodies are measured and you have good confidence that each of them is doing its job well and in cohesion.
So yeah that's it.
We have a question here. Thanks. That was a great talk. How does all this change depending on the frequency and timing of product updates?
So, in a lot of e-commerce shops, new products might arrive in a big lump at 3:00 in the morning, or they might arrive throughout the day at certain intervals.
If you suddenly have, I don't know, 3,000 new products appearing, you've got 3,000 inverse searches to carry out.
How do you balance that for performance and keeping your system up?
So there has to be some understanding of what kind of latency targets we are going to set to service that workload.
If you're dropping the entire load directly on the alerting service as it is, then, as I said, the execution physics kicks in.
So there is only so much you can push through the system, and if the requests already in flight have saturated it, all the new requests you send are going to be rejected, unless you have some queueing mechanism that gives you control: you received a barrage of requests, but the moment you start seeing rejections you pile them up and then send them again, you try again.
So those kinds of mechanisms need to be built around that.
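A toy simulation of that queue-and-retry idea. Everything here is an assumption for illustration: the `AlertingService` stand-in, its fixed capacity per tick, and the burst of 3,000 product ids. A real system would add backoff, timeouts, and persistence for the backlog.

```python
from collections import deque


class AlertingService:
    """Stand-in for the matching service: accepts at most `capacity` requests per tick."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.in_flight = 0

    def try_submit(self, product_id: str) -> bool:
        """Accept the request, or reject it when the service is saturated."""
        if self.in_flight >= self.capacity:
            return False
        self.in_flight += 1
        return True

    def tick(self) -> None:
        """Pretend all in-flight requests have finished."""
        self.in_flight = 0


def drain(products: list[str], service: AlertingService) -> int:
    """Feed a burst through the service, holding back rejected products.

    Returns the number of ticks needed to clear the backlog.
    """
    backlog = deque(products)
    ticks = 0
    while backlog:
        # Submit until the first rejection, then wait for capacity to free up.
        while backlog and service.try_submit(backlog[0]):
            backlog.popleft()
        service.tick()
        ticks += 1
    return ticks


burst = [f"product-{i}" for i in range(3_000)]
print(drain(burst, AlertingService(capacity=200)))  # 15
```

The burst is absorbed by the backlog instead of being dropped; the price is latency, which is why the latency target has to be agreed first.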
Yeah, thank you.
Maybe a very naive question, but at what point do we need inverse search instead of just running the saved queries every once in a while, like once per day?
Is it a question of delivering the alert that your product arrived within seconds, though I can't really think of a use case for that, or is it a question of whether this kind of request is much more efficient in terms of computational power than the naive approach of running these searches in the normal direction?
Yeah, that's a good question. That's actually a product question, because every company, every product for which this feature is important, will determine what the urgency is.
In some cases people are really let's say tracking in-demand products and they want to be the first to be notified.
They want to be able to buy it quickly. In those cases, it has to be quick when the product arrives.
But of course there are other cases where it doesn't have to be real time, doesn't have to be immediate.
But yeah, it varies. It's a product contract question.
Yeah, one last question here.
Okay, thank you for the presentation. Just one question about the physics problem.
If I understand correctly, in alerting you don't have the gather problem.
So you don't have to gather all your results into 10 best results.
So why can't you partition all your saved searches?
For instance, if you have 1 million saved searches, you can put them in 100 partitions and run them possibly in parallel to avoid exploding your context.
Is that something possible?
I mean you need to combine — you need to prepare the final list of alert ids that needs to be processed, right?
So if you are partitioning a large number of saved searches across different collections, then I think that's a way to manage the load. But in general the intuition is: all saved searches are in one place, the query enters, and then you need to bring all these things together. They are still separated across different data nodes, right, but somewhere it has to collect all the ids together to make the payload.
But yeah, that's probably possible as well if you split into separate systems because at the end the alerts are processed one by one and you don't have to gather them at one point.
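The partitioning idea from the question can be sketched as follows. This is a toy model, not a description of either engine: saved searches are reduced to `(id, max_price)` pairs, partitions are simple list slices, and threads stand in for separate systems.

```python
from concurrent.futures import ThreadPoolExecutor

SavedSearch = tuple[str, float]  # (search_id, max_price), a deliberately tiny model


def match_partition(product_price: float, part: list[SavedSearch]) -> list[str]:
    """Evaluate one partition of saved searches against the incoming product."""
    return [sid for sid, max_price in part if product_price <= max_price]


def match_all(product_price: float, searches: list[SavedSearch], partitions: int = 4) -> list[str]:
    """Scatter over partitions in parallel, then gather the matched ids."""
    parts = [searches[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(lambda p: match_partition(product_price, p), parts))
    # Gather step: the ids are concatenated into one payload here.
    return sorted(sid for part_ids in results for sid in part_ids)


all_searches = [(f"s{i}", float(i)) for i in range(10)]
print(match_all(6.0, all_searches))  # ['s6', 's7', 's8', 's9']
```

If each partition hands its own ids directly to the notification step, the final gather in `match_all` can be skipped, which is the point made in the answer above.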
Yeah. But okay. And just a final question — you showed GEM, so the use case used Elasticsearch in the first place.
So do they currently use both Elasticsearch and Vespa or did they switch from one to the other?
So that's a migration process that's underway.
We started with this very hairy problem because this was blocking them in production first.
The forward search use cases are known to be relatively easier.
But now that it's proven that this works, and works great, the rest of the work is underway.
Great. Thank you very much Ravindra. Yeah.