-
Supercharge Your Wisdom
Okay, folks, here’s the deal. This project shows how we can team up OpenAI with our knowledge base or other documents. And the cool part? We can run fancy ‘semantic searches’ and even whip up prompts that tweak the LLM’s response generation just the way we like. The project contains a Streamlit chat interface and a Luigi ETL pipeline that processes and stores documents in a Weaviate Vectorstore instance. GitHub Repository The ETL pipeline performs several tasks: converting Jupyter notebooks and Python scripts to Markdown format, cleaning the code blocks in the Markdown files, removing unnecessary files and directories, and uploading the processed…
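To give you a feel for how such a pipeline fits together, here’s a minimal Luigi sketch of the notebook-to-Markdown step plus a downstream upload task. The task names, file layout, and `notebooks/demo.ipynb` path are my own illustration, not the project’s actual code:

```python
import luigi
from nbconvert import MarkdownExporter


class ConvertNotebookToMarkdown(luigi.Task):
    """Convert one Jupyter notebook to a Markdown file."""

    notebook_path = luigi.Parameter()

    def output(self):
        # Write the Markdown next to the source notebook (illustrative layout)
        return luigi.LocalTarget(str(self.notebook_path).replace(".ipynb", ".md"))

    def run(self):
        body, _resources = MarkdownExporter().from_filename(str(self.notebook_path))
        with self.output().open("w") as f:
            f.write(body)


class UploadToVectorstore(luigi.Task):
    """Downstream task that consumes the converted Markdown."""

    notebook_path = luigi.Parameter()

    def requires(self):
        return ConvertNotebookToMarkdown(notebook_path=self.notebook_path)

    def output(self):
        # Marker file so Luigi knows the upload already happened
        return luigi.LocalTarget(str(self.notebook_path) + ".uploaded")

    def run(self):
        with self.input().open("r") as f:
            markdown = f.read()
        # ... clean the code blocks and push chunks into Weaviate here ...
        with self.output().open("w") as f:
            f.write("done")


if __name__ == "__main__":
    luigi.build(
        [UploadToVectorstore(notebook_path="notebooks/demo.ipynb")],
        local_scheduler=True,
    )
```

Luigi works out the dependency order from requires() and skips any task whose output target already exists, which is exactly what you want when re-running an ETL pipeline over a growing document set.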
-
Hybrid Search Enriches the Context of Search Queries
To improve the search experience, hybrid search integrates multiple algorithms, fusing keyword-based search strategies with vector search methods. Weaviate Hybrid Search — 🦜🔗 LangChain This approach is implemented by Weaviate, a vector database that employs sparse and dense vectors to enrich the context of search queries and documents. Hybrid search brings together the advantages of multiple search paradigms: it harnesses distinct algorithms such as BM25 and SPLADE to compute sparse vectors, and machine learning models like GloVe and Transformers for dense embeddings. A concrete example of the hybrid search approach is Weaviate’s, which relies predominantly on: BM25/BM25F…
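As a taste of the LangChain integration linked above, here’s a minimal sketch of querying Weaviate’s hybrid search through the WeaviateHybridSearchRetriever. It assumes a Weaviate instance running locally with a vectorizer module enabled and a pre-existing “Document” class in the schema; the index name and query are made up:

```python
import weaviate
from langchain.retrievers import WeaviateHybridSearchRetriever

# Assumes a local Weaviate instance with a vectorizer module
# (e.g. text2vec-openai) enabled and a "Document" class already defined.
client = weaviate.Client("http://localhost:8080")

retriever = WeaviateHybridSearchRetriever(
    client=client,
    index_name="Document",
    text_key="text",
    attributes=[],
    alpha=0.5,  # 0 = pure BM25 keyword search, 1 = pure vector search
)

docs = retriever.get_relevant_documents("how do sparse and dense vectors differ?")
for doc in docs:
    print(doc.page_content[:80])
```

The alpha parameter is the interesting knob here: it blends the BM25 keyword score with the vector similarity score, so you can slide between the two paradigms instead of picking just one.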
-
Avoid noise and preserve context
Essentially, it’s about breaking down large text content into manageable parts to optimize the relevance of the content we retrieve from a vector database with an LLM. This reminds me of semantic search. In this context, we index documents filled with topic-specific information. If our chunking is done just right, the search results align nicely with what the user is looking for. But if our chunks are too tiny or too gigantic, we might overlook important content or return less precise results. Hence, it’s crucial to find that sweet spot for chunk size to make sure search results are spot-on. OpenAIEmbeddings The OpenAIEmbeddings class is a wrapper around OpenAI’s API for…
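To make that concrete, here’s a minimal sketch that chunks a document with LangChain’s RecursiveCharacterTextSplitter and embeds the chunks with OpenAIEmbeddings. The chunk_size and chunk_overlap values are just illustrative starting points; the sweet spot depends on your documents:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

long_document_text = "..."  # your topic-specific document goes here

# Split into overlapping chunks so context isn't cut off at chunk boundaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(long_document_text)

# Embed each chunk via OpenAI's API (requires the OPENAI_API_KEY env var).
embeddings = OpenAIEmbeddings()
vectors = embeddings.embed_documents(chunks)
print(f"{len(chunks)} chunks, {len(vectors[0])} dimensions each")
```

The overlap is what preserves context across boundaries: a sentence that straddles two chunks still appears intact in at least one of them, which keeps the noise down at retrieval time.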