Couchbase Server / MB-60516

Need embedding and llm server access integrated in the UI


Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • backlog
    • 7.6.0
    • fts
    • None
    • 0

    Description

      WE NEED TO INTEGRATE WITH ONE OR MORE EMBEDDING/LLM SERVICES

I put together a prototype where no cut-and-paste is needed to demo the Vector capabilities in Couchbase on-prem server. I will outline the behaviour via a series of steps and later attach a short video.

Adding this "real" functional capability took essentially 12 hours over the weekend, NOT to access the embedding servers or LLMs, but rather to put it all together in our current UI so it could call a 55-line Python proxy that supplies real embeddings and accesses a real LLM at OpenAI.
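The proxy itself is not attached here, but a minimal sketch of the idea, assuming Flask and the pre-1.0 openai client (the /embed and /llm routes and the payload shapes are illustrative assumptions, not the prototype's actual interface), would look something like this:

# Minimal sketch of an embedding/LLM proxy.
# Assumptions: Flask, pre-1.0 openai client; routes and payloads are illustrative only.
import os
import openai
from flask import Flask, request, jsonify

openai.api_key = os.getenv("OPENAI_API_KEY")
app = Flask(__name__)

@app.route("/embed", methods=["POST"])
def embed():
    # Return a 1536-dimension embedding for the posted text.
    text = request.get_json()["text"]
    resp = openai.Embedding.create(input=[text], model="text-embedding-ada-002")
    return jsonify({"embedding": resp["data"][0]["embedding"]})

@app.route("/llm", methods=["POST"])
def llm():
    # Forward a question (plus any retrieval context) to the chat model.
    question = request.get_json()["question"]
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    return jsonify({"answer": resp["choices"][0]["message"]["content"]})

if __name__ == "__main__":
    app.run(port=8080)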

      STEPS TO SHOW UTILITY OF THIS PROPOSED CAPABILITY

Image 00 shows a blank query in the on-prem UI with some "hacked in" plugins.

Image 01 shows selecting a query template based on an indexed field (of type vector), where the template was made on the fly from the index we are querying. By selecting a template we get a working query, but with a nonsense vector.

Image 02 shows a template for a 1536-dimension vector. The query is now legal; it will return something when run (provided that data exists and was indexed).
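For context, the generated template corresponds roughly to a search request of this shape (a Python-dict sketch; the field name "vector_field", k=3, and the zero vector are placeholders, not necessarily what the prototype emits):

# Sketch of a Search (FTS) kNN request with a placeholder 1536-dimension vector.
query_template = {
    "fields": ["*"],
    "query": {"match_none": {}},
    "knn": [
        {
            "k": 3,
            "field": "vector_field",          # placeholder: the indexed vector field
            "vector": [0.0] * 1536,           # nonsense vector until a real embedding replaces it
        }
    ],
}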

Image 03 shows unchecking "[  ] show advanced query settings" so we can see the query results, and hitting the magnifier to run the query. THE USER DID NOT HAVE TO TYPE ANY SYNTAX.

Image 04 shows typing a textual query under "Text for Embedding". This text should be relevant to the corpus that is vectorized in the Couchbase database and indexed under the specific index field. I chose "what is a romantic color that help people fall in love".

Image 05 shows rechecking "[X] show advanced query settings" and hitting the cloud button; the "hacked in" embedding plugin contacts the service, fills in a vector, and shows its dimension (for fun).

Image 06 shows hitting [ + ] next to the Embedding to replace the search vector in the JSON query.

Image 07 shows us again unchecking "[  ] show advanced query settings" so we can see the query results, and hitting the magnifier to run the query. THE USER DID NOT HAVE TO TYPE ANY SYNTAX - only the question to apply against the corpus. We got three hits.

Image 08 verifies the result: looking at doc #FF007F, it looks good. The text associated with the color includes "... Rose is a delicate and romantic color that evokes feelings of love and tenderness ...", which matches the semantic search on the query text we made our embedding from back in Image 04.

Image 09 tests the last part of retrieval augmented generation (RAG) by hitting the lower cloud button next to "Question to LLM".

Image 10 shows mousing over the textarea under "Question to LLM": it consists of all stored fields from each of the three reported index hits (this is the retrieval context), along with the original question from Image 04 and some directions to the LLM.
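A sketch of how that prompt could be assembled (the helper name build_rag_prompt and the instruction wording are illustrative assumptions, not what the prototype literally sends):

def build_rag_prompt(question, hits):
    # hits: the stored fields from each of the three search hits (the retrieval context).
    context = "\n\n".join(str(hit) for hit in hits)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )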

Image 11 shows the final LLM answer.

      CONCLUSION

The use of an embedding server and an LLM is quite simple code-wise; below is an example of making an embedding call in Python:

import os
import openai

# Set the API key on the openai module (read from the environment).
openai.api_key = os.getenv("OPENAI_API_KEY")

text = "What is the meaning of the color that absorbs all light and conceals objects in darkness?"
# Request a 1536-dimension embedding from the text-embedding-ada-002 model.
response = openai.Embedding.create(input=[text], model="text-embedding-ada-002")
print(response.data[0].embedding)

The call to a large language model (LLM) is equally simple:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

user_input = "What color seems to minimize or hide all other colors?"
# Ask the chat model a single-turn question.
messages = [{"role": "user", "content": user_input}]
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0]["message"]["content"])

The key is integrating these calls into our UIs so that Couchbase can demonstrate the complex steps of RAG end to end within the UI and show the full power of our vector store.

      CLOSING THOUGHTS

I used a proxy, but we could also use Eventing or a UDF to do the same, or even allow the user/customer to bring their own key and provide the needed service.

If we provide the needed service, we need to store the user's "bearer" key in a safe, encrypted fashion. For example, OpenAI might issue a key like the following (bash/sh example; note this key will not work):

      OPENAI_API_KEY=sk-39c8I25P2MHP5wfuJMKQT3BlwkFJ64IywBSGRhvHVlGI3Miv
      export OPENAI_API_KEY
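
A minimal sketch of keeping such a key encrypted at rest, assuming the Python cryptography package's Fernet (how the master key itself is provisioned and where the ciphertext is stored are open questions, not decided here):

from cryptography.fernet import Fernet

# Assumption: a server-side master key exists; provisioning it safely is out of scope here.
master_key = Fernet.generate_key()
f = Fernet(master_key)

# Encrypt the user's bearer token before persisting it; decrypt only when calling out.
token = "sk-39c8I25P2MHP5wfuJMKQT3BlwkFJ64IywBSGRhvHVlGI3Miv"  # the dummy key from above
ciphertext = f.encrypt(token.encode())
plaintext = f.decrypt(ciphertext).decode()
assert plaintext == token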
      

In Eventing we encrypt keys and passwords in metakv; I am not sure if this is best practice.

We could even use our own Couchbase key and allow at most 5-6 embedding and/or LLM calls per minute. By this I mean just use the company's OPENAI_API_KEY and use Capella IQ, but it would be critical to rate limit each authenticated user to, say, 5 calls per minute so as not to provide a free embedding/LLM backdoor. This would let our UI just work and make kicking the tires a far better experience than our customers get today, since we could demo end-to-end retrieval augmented generation (RAG) entirely in our UI.
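
A sketch of the per-user rate-limiting idea (a simple sliding-window counter; the 5-calls-per-minute limit and the in-memory store are assumptions for illustration only):

import time
from collections import defaultdict

CALLS_PER_MINUTE = 5
_recent_calls = defaultdict(list)  # user -> timestamps of calls in the last minute

def allow_call(user: str) -> bool:
    # Drop timestamps older than 60 seconds, then check the per-user budget.
    now = time.time()
    _recent_calls[user] = [t for t in _recent_calls[user] if now - t < 60]
    if len(_recent_calls[user]) >= CALLS_PER_MINUTE:
        return False
    _recent_calls[user].append(now)
    return True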

      Attachments


        Activity

          People

Assignee: Abhi Dangeti (abhinav)
Reporter: Jon Strabala (jon.strabala)