
Search with Neum AI

David de Matheu
November 29, 2023

At the core of Neum AI is the ability to search. Once we go through the process of extracting, processing, and ingesting data into vector databases, we can use that data to power search capabilities. This could be a search bar on a website or Retrieval Augmented Generation for an LLM chatbot. In this blog, we will explore the end-to-end capabilities provided by Neum AI for search.

A primer on Semantic Search

Today, keyword-based and full-text search are not enough. The software industry is moving towards a kind of search that goes beyond keywords and fuzzy matching: semantic search. Many vector databases have sprung up recently because they provide an easy way to store information and retrieve it with natural-language queries, returning the results that are closest in meaning, that is, semantically similar.

An example from Elastic:
“Consider “chocolate milk.” A semantic search engine will distinguish between “chocolate milk” and “milk chocolate.” Though the keywords in the query are the same, the order in which they are written affects the meaning. As humans, we understand that milk chocolate refers to a variety of chocolate, whereas chocolate milk is chocolate-flavored milk.”
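
To make the idea concrete, here is a minimal sketch of ranking by meaning rather than by shared keywords. The three-dimensional vectors are hand-picked for illustration and are not the output of any real embedding model; real embeddings have hundreds or thousands of dimensions. The helper names (cosine_similarity, rank) are hypothetical and not part of Neum AI.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": hand-picked numbers, NOT produced by a real model.
documents = {
    "chocolate-flavored milk drink": [0.9, 0.1, 0.2],
    "bar of milk chocolate": [0.2, 0.9, 0.1],
    "oat milk latte": [0.7, 0.0, 0.6],
}

def rank(query_vector, docs):
    """Return document texts ordered from most to least similar."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in docs.items()]
    return [text for _, text in sorted(scored, reverse=True)]

# Pretend this vector is the embedding of the query "chocolate milk".
query = [0.85, 0.15, 0.25]
ranking = rank(query, documents)
print(ranking)
```

Both "chocolate" documents share the query's keywords, but only the vector pointing in nearly the same direction as the query ranks first. A vector database performs this kind of comparison at scale.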

In this blog post, we will show how easily you can create embedding and indexing pipelines with Neum AI so that you can take advantage of semantic search over your data, whether for Retrieval Augmented Generation (think of a chatbot that needs constant context) or for search functionality within an application.

Getting started

We will start by installing the neumai Python SDK:

pip install neumai

For this example, we will use OpenAI embeddings and a Weaviate vector database as part of our pipeline configuration:

  • An OpenAI embeddings model, for which you will need an OpenAI API key. To get an API key, visit OpenAI. Make sure you have configured billing for the account.
  • A Weaviate vector database, for which you will need a Weaviate Cloud Service URL and API key. To get a URL and API key, visit Weaviate Cloud Service.
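
One way to keep these secrets out of your source code is to read them from environment variables. The sketch below is an assumption about how you might organize this, not something Neum AI requires; the variable names and the load_credentials helper are hypothetical.

```python
import os

def load_credentials(env=os.environ):
    """Collect the three secrets, failing early if any is missing."""
    # Hypothetical variable names; use whatever your deployment defines.
    names = ["OPENAI_API_KEY", "WEAVIATE_URL", "WEAVIATE_API_KEY"]
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}

# Demo with a stand-in mapping so the sketch runs without real secrets.
demo_env = {
    "OPENAI_API_KEY": "demo-openai-key",
    "WEAVIATE_URL": "https://example.invalid",
    "WEAVIATE_API_KEY": "demo-weaviate-key",
}
creds = load_credentials(demo_env)
print(sorted(creds))
```

In a real run you would call load_credentials() with no arguments and pass the returned values to the connectors configured in the next section.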

Configure a simple pipeline

We will start with a pipeline configured using the Neum AI framework. The pipeline will extract data from a website, process it and drop it into a vector database. We will configure it using the OpenAI and Weaviate credentials. You can further customize and configure your desired pipeline using our components.

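from neumai.DataConnectors import WebsiteConnector
from neumai.SinkConnectors import WeaviateSink
from neumai.EmbedConnectors import OpenAIEmbed
from neumai.Loaders.HTMLLoader import HTMLLoader
from neumai.Chunkers.RecursiveChunker import RecursiveChunker
from neumai.Sources import SourceConnector
from neumai.Shared import Selector

# Data connector: the website the pipeline pulls content from
website_connector = WebsiteConnector(
    url = '',
    selector = Selector(
        # choose which fields to embed and which to keep as metadata
    )
)

# Source: the connector plus the loader and chunker that pre-process its data
source = SourceConnector(
    data_connector = website_connector,
    loader = HTMLLoader(),
    chunker = RecursiveChunker()
)

# Embedding model used to turn chunks into vectors
openai_embed = OpenAIEmbed(
    api_key = 'open ai key',
)

# Sink: the vector database where the vectors are stored
weaviate_sink = WeaviateSink(
    url = 'your-weaviate-url',
    api_key = 'your-api-key',
    class_name = 'your-class-name',
)

from neumai.Pipelines import Pipeline

# Tie the source, embedding model and sink together
pipeline = Pipeline(
    sources = [source],
    embed = openai_embed,
    sink = weaviate_sink
)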

Once we have the pipeline, we will run it for the first time to populate our vector database. The pipeline can be triggered again whenever the data needs to be refreshed (e.g. when the website is updated or, if using document stores, when new documents are added).

print(f"Vectors stored: {pipeline.run()}")

We now have a populated vector database and a pipeline configuration. Let's now use that pipeline configuration to search the vector database and retrieve data.

Search data

Neum AI has built-in search methods on the pipeline object. They abstract the process of taking a text-based query, translating it into a vector, and running a query against the underlying vector database.

results = pipeline.search(query="User Query", number_of_results=3)

This will return a list of NeumSearchResult objects. Each NeumSearchResult contains an id for the retrieved vector, the score of the similarity search, and the metadata, which includes the contents that were embedded to generate the vector as well as any other metadata that was included.

{
  "id": "unique identifier for the vector",
  "score": 0.1231231132,
  "metadata": {
    "text": "text content that was embedded",
    "other_metadata_1": "some metadata"
  }
}

These results can be used to build the context fed to an LLM application, or presented to the user as search results. For example, to process them into a context string:

context = "\n".join(result.metadata['text'] for result in results)
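
As a slightly fuller sketch of that idea, the snippet below wraps the context string in a prompt and caps its size. StubResult is a hypothetical stand-in that only mimics the id, score and metadata fields described above, so the example runs without a pipeline. The build_prompt helper, its max_chars budget and the prompt wording are likewise assumptions, not part of Neum AI.

```python
from dataclasses import dataclass, field

@dataclass
class StubResult:
    """Stand-in with the three fields described above (id, score, metadata)."""
    id: str
    score: float
    metadata: dict = field(default_factory=dict)

def build_prompt(question, results, max_chars=2000):
    """Join retrieved texts into a context block and wrap it in a prompt."""
    pieces = []
    used = 0
    for result in results:
        text = result.metadata.get("text", "")
        if not text or used + len(text) > max_chars:
            continue  # skip empty chunks and anything that would exceed the budget
        pieces.append(text)
        used += len(text)
    context = "\n".join(pieces)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Made-up results for illustration only.
results = [
    StubResult("a", 0.12, {"text": "First retrieved chunk."}),
    StubResult("b", 0.34, {"text": "Second retrieved chunk."}),
    StubResult("c", 0.56, {}),  # no text field, so it is ignored
]
prompt = build_prompt("What does the page say?", results)
print(prompt)
```

With real results from the pipeline, you would pass the returned list straight to build_prompt and send the resulting string to your LLM of choice.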

Next, we will explore the additional search capabilities provided when we take a local pipeline and deploy it to Neum AI Cloud.

Deploy to Neum AI

Using the NeumClient, we can take the same pipeline configuration we used locally and deploy it to the managed cloud. To deploy it, you will need a Neum AI API key, which you can get by signing up for Neum AI and going to the settings page.

from neumai.Client.NeumClient import NeumClient
neumClient = NeumClient(api_key = 'neum api key')

Then using the NeumClient, we can deploy the pipeline using the create_pipeline method. Simply pass the Pipeline object.

from neumai.Pipelines import Pipeline
pipeline = Pipeline(sources=[...], embed=... , sink=...)
pipeline_id = neumClient.create_pipeline(pipeline=pipeline)
print(f"Pipeline ID: {pipeline_id}")

Save the pipeline_id provided, as you will need it to interact with the pipeline moving forward.

Once deployed, you can go to the pipeline's page (identified by <pipeline_id>) to check the status, or use the get_pipeline() method in the SDK.

Search Pipeline

Once deployed, you have access to the search_pipeline() method through the NeumClient to query the pipeline directly.

print(neumClient.search_pipeline(pipeline_id=pipeline_id, query="What is ....", num_of_results=3, track=False))

Alternatively, you can query pipelines from the UX or using REST APIs.

You can leverage the search results in the same way as shown earlier for either search bar capabilities or as context for an LLM application.

One key capability to highlight is the track property. It is currently available only through Neum AI Cloud and lets you track queries and retrieved responses through your pipeline, so you can see what your users are querying for and what data is returned. You can get a dump of all the captured retrievals through REST APIs.

Provide feedback on retrieval quality

One handy feature Neum AI Cloud provides out of the box is the ability to give feedback (good / bad) on retrieved data. This feedback can be captured either through the Neum Dashboard or through REST APIs. You can export this data for fine-tuning as well as to improve the pipeline.
