
    What Makes Search Hard

    Search gets hard when retrieval, filtering, ranking, and business rules must behave like one coherent system under real production pressure.

    Ravindra Harige

    Founder at Searchplex

April 8, 2026
Retrieval architecture

Similar-looking search systems can behave very differently in production. Take two search systems, each with an index of around 100k documents.

    One is a search system for construction and civil engineering documents. Queries look like latest approved floor plan for basement mechanical room.

The other is an ecommerce search system. Queries look like waterproof hiking shoes size 11 under 150, in stock, sorted by rating.

    Same broad feature set. Different systems.

Both support hybrid search, filters, and reranking. Yet they behave differently in production. What explains the difference is not the total number of documents. It is how six factors interact in practice inside the search engine: query shape, document shape, retrieval scope, execution shape, operating pressure, and product expectations.

    Search systems are more than retrieval

    At the interface boundary, search can look simple:

    query -> index lookup -> ranked results

    Once the system carries real product responsibility, retrieval has to coexist with filters, counts, sorting, paging, reranking, freshness, access control, and business rules.

    The hard part is getting those mechanisms to stop contradicting each other.
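A toy sketch of what "stop contradicting each other" means in code: filters, retrieval, counts, ranking, and paging all derived from the same eligible set, so counts and hits cannot drift apart. The documents, fields, and term-overlap "retrieval" below are hypothetical stand-ins, not a real engine.

```python
# Hypothetical mini-catalog; in a real system this is an index, not a list.
DOCS = [
    {"id": 1, "text": "waterproof hiking shoes", "price": 120, "in_stock": True,  "rating": 4.6},
    {"id": 2, "text": "waterproof hiking boots", "price": 180, "in_stock": True,  "rating": 4.8},
    {"id": 3, "text": "trail running shoes",     "price": 95,  "in_stock": False, "rating": 4.2},
    {"id": 4, "text": "hiking shoes for kids",   "price": 60,  "in_stock": True,  "rating": 3.9},
]

def search(query_terms, max_price, page=0, page_size=2):
    # 1. Eligibility: filters define the slice before anything else runs.
    eligible = [d for d in DOCS if d["in_stock"] and d["price"] <= max_price]
    # 2. Retrieval: naive term overlap as a stand-in for an index lookup.
    hits = [d for d in eligible if any(t in d["text"] for t in query_terms)]
    # 3. Ranking and paging over the SAME hit list the count comes from,
    #    with a deterministic tiebreak so page two behaves like page two.
    ranked = sorted(hits, key=lambda d: (-d["rating"], d["id"]))
    return {"count": len(ranked),
            "page": ranked[page * page_size:(page + 1) * page_size]}

result = search(["hiking", "shoes"], max_price=150)
```

Because the count and the page slice come from one ranked list, they agree by construction; most production contradictions come from computing them on different paths.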

    Query shape, document shape, and retrieval scope

    Query shape.
Queries do different jobs, and they come in different levels of complexity:

    • state-aware lookup, e.g. latest approved floor plan for basement mechanical room
    • constraint-heavy retrieval, e.g. waterproof hiking shoes size 11 under 100, in stock
    • known-item lookup, e.g. Nike Pegasus 41
    • scoped similarity search, e.g. find similar language across these custodians
    • comparison over a bounded set, e.g. compare concrete curing requirements across these three revisions

These are different retrieval jobs, and each dictates a different execution profile and resource footprint.

    Document shape.
Short product records, long contracts, revision-heavy drawing packages, email threads, multilingual content, and mixed-content PDFs do not create the same retrieval problem. The nature of a document defines how it is structured, stored, and matched against the user query.

The retrieval unit, e.g. the chunk, matters as much as the document itself. Teams often think they are debugging documents when they are really debugging retrieval units.

    Retrieval scope.
    The system rarely searches everything it stores. It searches the live slice for this request: one project, one tenant, one matter, one seller slice, a full catalog, a portfolio-wide index.

    That changes the system before engine choice becomes interesting. A narrow, well-structured scope can make a large stored collection cheap to search. A smaller collection with a broad or unstable live scope can be much harder to run.

    This is why similar corpus sizes still produce very different systems. The stored collection may look similar on paper. The active retrieval job does not.

    Execution shape

    Execution shape is how the system turns that retrieval job into results.

    Lexical, vector, and hybrid retrieval are only the headline labels. Production behavior usually depends on lower-level choices.

    Approximation.
    Exact search and ANN change latency, recall, memory use, filtering behavior, and update cost. They also change what kinds of surprises the system produces.
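The kind of surprise ANN produces can be shown in a few lines. This is a deliberately crude sketch: bucketing vectors by the sign of one coordinate stands in for an IVF-style partition, and the vectors and ids are made up. Probing only the query's own bucket is fast but can miss the true nearest neighbor sitting just across the partition boundary.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

VECS = {
    "d1": (1.0, 0.1),
    "d2": (0.9, 0.9),
    "d3": (-0.05, 1.0),  # true nearest to the query, but across the bucket boundary
}
QUERY = (0.05, 1.0)

# Exact search: score every vector.
exact_best = max(VECS, key=lambda k: cosine(QUERY, VECS[k]))

# Crude approximation: bucket by sign of x, probe only the query's bucket.
def bucket(v):
    return v[0] >= 0

candidates = [k for k, v in VECS.items() if bucket(v) == bucket(QUERY)]
approx_best = max(candidates, key=lambda k: cosine(QUERY, VECS[k]))

# exact_best is "d3"; approx_best is "d2" -- the approximation misses it.
```

Real ANN indexes (HNSW, IVF-PQ) are far smarter than this, but the failure mode is the same shape: recall lost to whatever structure makes the search fast.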

    Candidate generation.
    A reranker only improves what it gets to see. If the right items never enter the candidate set, the reranker has nothing to rescue. Many ranking problems start as candidate-generation problems and only show up later as "relevance" complaints.
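The candidate-generation bound is easy to demonstrate: even an oracle reranker that knows true relevance cannot promote an item the retriever never surfaced. The scores below are invented for illustration.

```python
# Hypothetical scores: the retriever badly underrates "e", the right answer.
TRUE_RELEVANCE  = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4, "e": 0.9}
RETRIEVER_SCORE = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.1}

def top_k(scores, k):
    return sorted(scores, key=scores.get, reverse=True)[:k]

def oracle_rerank(candidates):
    # A "perfect" reranker: orders whatever it is given by true relevance.
    return sorted(candidates, key=TRUE_RELEVANCE.get, reverse=True)

narrow = oracle_rerank(top_k(RETRIEVER_SCORE, k=3))  # "e" never enters the pool
wide   = oracle_rerank(top_k(RETRIEVER_SCORE, k=5))  # wider pool lets the reranker fix it
```

With `k=3` the reranker returns its best ordering of the wrong candidates; widening candidate generation, not improving the reranker, is what fixes the result.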

    Filter timing.
    Pre-search filtering defines eligibility before candidate generation. Post-search filtering trims what a broader search already surfaced. Those are not the same system. In vector and hybrid retrieval, that difference often decides recall, latency, and whether counts, hits, and ranking still agree.
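The pre- versus post-filter difference in one sketch, with hypothetical similarity scores standing in for vector search: post-filtering a top-k computed over everything can return far fewer than k results when the filter is restrictive, while pre-filtering keeps k eligible candidates.

```python
# Hypothetical doc id -> similarity to the query.
SCORES   = {"d1": 0.95, "d2": 0.93, "d3": 0.91, "d4": 0.60, "d5": 0.55}
IN_STOCK = {"d1": False, "d2": False, "d3": False, "d4": True, "d5": True}

def top_k(scores, k):
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Post-filter: retrieve top-3 over everything, then drop ineligible hits.
post = [d for d in top_k(SCORES, 3) if IN_STOCK[d]]

# Pre-filter: restrict to eligible docs first, then take top-3.
eligible = {d: s for d, s in SCORES.items() if IN_STOCK[d]}
pre = top_k(eligible, 3)

# post is empty; pre returns the two in-stock docs.
```

The post-filter path here returns zero results for a query that has perfectly good in-stock answers, which is exactly the recall-and-counts disagreement the text describes.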

    Seams.
    Responsibility is often split across retrieval, filtering, fusion, reranking, application logic, business rules, and sometimes a generative layer on top. Each boundary is another place where relevance shifts, latency accumulates, and debugging crosses ownership lines. When the same retrieval layer feeds RAG or multi-step agents, those shifts are carried forward into the answer layer.

Operating pressures and product contract

    Hardware pressure.
    CPU, RAM, storage, cache behavior, and headroom decide how much inefficiency the system can absorb before users notice.

    Update pressure.
    Static corpora, daily batch workloads, and continuously updated indexes are different systems. Refresh behavior, merge behavior, stale state, and indexing cost start to matter as soon as retrieval structures have to stay current.

    Traffic pressure.
    QPS and concurrency expose hidden cost fast. A system used by a few people can look fine with late filtering, over-retrieval, and expensive reranking. The same system under hundreds of concurrent users, while indexes are also updating, can become unstable quickly.

    Product contract.
    Low latency, stable paging, reliable counts, current state, explanation, auditability, and policy correctness are not the same kind of requirement, but they all define what "working" means. A system that is acceptable for internal search can be unusable for marketplace ranking or compliance review.

    This is why teams report opposite experiences with superficially similar systems. They are not operating under the same pressures, and they are not trying to satisfy the same contract.

[Table: construction/civil document search vs. ecommerce/marketplace search, compared across corpus, features, users, query shape, document shape, retrieval scope, execution shape, pressure, and product contract.]

    Where failures show up

    Most failures collapse into three buckets:

    • recall failure: the right thing never enters the candidate set
    • scope failure: the system searches or returns the wrong slice
    • ranking failure: the right candidates are present, but the ordering is weak or unstable

    One more term belongs here:

    • cracks: seams that become visible under pressure

Cracks show up when concurrent user queries run while the index is updating, counts stop matching hits, paging wobbles after reranking, lexical and vector paths disagree under filters, freshness diverges across paths, or one bad result requires tracing three different layers to explain.
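One of these cracks, counts that stop matching hits, takes only a few lines to reproduce: compute the count over the full eligible slice but take the hit list from a post-filtered top-k, and the two disagree. Ids, scores, and the visibility flag are all hypothetical.

```python
# Hypothetical scores and a visibility filter (e.g. permissions or stock).
SCORES  = {"d1": 0.9, "d2": 0.8, "d3": 0.7, "d4": 0.3, "d5": 0.2}
VISIBLE = {"d1": True, "d2": False, "d3": True, "d4": True, "d5": True}

# Count path: facet/count computed over every visible doc.
count = sum(VISIBLE[d] for d in SCORES)

# Hit path: top-3 by score, filtered afterwards.
top3 = sorted(SCORES, key=SCORES.get, reverse=True)[:3]
hits = [d for d in top3 if VISIBLE[d]]

mismatch = count != len(hits)  # "4 results" shown, 2 results rendered
```

The seam is that the count and the hits were computed by different paths with different filter timing; each path is locally correct, and the contradiction only appears at the product surface.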

    Case in point: hybrid search

    Hybrid search exposes these problems quickly because it combines more than one retrieval behavior inside one product surface.

On paper, hybrid search is lexical + vector retrieval, optionally with a reranker on top.

    In production, the system is retrieval, filters, aggregations, sorting, paging, reranking, freshness, explanation, and business rules all trying to behave like one result model.

    Each part wants something different. Filters want the right slice. Aggregations want counts over that slice. Sorting wants stable order. Paging wants page two to behave like page two. Reranking wants a rich candidate set. Vector retrieval wants semantic proximity. Lexical retrieval wants exact term sensitivity.
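The fusion step where those wants collide can be sketched with reciprocal rank fusion (RRF), a common way to merge a lexical ranking and a vector ranking into one list. The doc ids and both rankings below are illustrative, not from a real engine.

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1/(k + rank) per doc,
    so agreement between lists outweighs a single high rank."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["d1", "d2", "d3"]  # exact-term matches first
vector  = ["d4", "d1", "d5"]  # semantic neighbours first

fused = rrf([lexical, vector])
# "d1" wins because both paths rank it; "d4" beats "d2" on rank position.
```

Note what fusion does not solve: the fused order is a third ranking that agrees with neither input, which is precisely why counts, facets, and paging computed on the individual paths can disagree with what the user sees.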

This is where systems start to wobble in visible ways: page one looks plausible, counts drift, filters expose disagreements between the lexical and vector retrieval paths, and latency climbs as more work gets pushed into reranking or application logic.

In RAG systems there is an additional LLM-serving layer on top, and its answers are only as good as the hybrid retrieval underneath.

    Three systems with similar corpus size

    Construction and engineering search is typically revision-heavy and approval-sensitive.

    • latest approved floor plan for basement mechanical room
    • fire suppression specification for the east wing

    The dominant pressure is workflow-state correctness. What cracks first is usually scope, revision, or state coherence, especially around words like latest and approved.

    Ecommerce and marketplace search is typically filter-heavy, freshness-sensitive, and ranking-sensitive.

    • waterproof hiking shoes size 11 under 150, in stock
    • espresso machine with grinder sorted by rating

    The dominant pressure is keeping filters, ranking, paging, and current state coherent under interactive load. What cracks first is usually paging stability, facet coherence, freshness correctness, or tail latency. The result often looks plausible on page one and inconsistent by page two.

    Legal, litigation, and compliance retrieval is typically metadata-heavy, scope-heavy, and explanation-sensitive.

    • emails discussing pricing strategy between January and March
    • find similar language across these custodians

    The dominant pressure is defensibility under selective scope and permissions. What cracks first is usually scope correctness, family or thread coherence, or auditability. A system can look relevant and still be unusable if the scope is even slightly wrong.

    Same corpus size. Different retrieval job. Different execution shape. Different pressure. Different cracks.

    The takeaway

Search gets hard when multiple parts of the system have to agree on one result set under real operating pressure. Concurrent user queries, hardware under-resourced for load spikes, evolving indexes, latency and freshness requirements, LLM infrastructure dependencies: these factors make search hard when the system stops behaving like one system.

    Demos mislead for the same reason. Similar corpus sizes can produce very different systems, and the same architecture can feel acceptable in one product and broken in another.

    The same is true for production AI. When retrieval stops behaving like one coherent system, the answer layer inherits the failure.
