Integration: OpenAI
Use OpenAI Models with Haystack
Haystack 2.0
You can use OpenAI models in your Haystack 2.0 pipelines with the Generators, Embedders, LocalWhisperTranscriber, and RemoteWhisperTranscriber.
Installation
pip install haystack-ai
Usage
You can use OpenAI models in various ways:
Embedding Models
You can leverage embedding models from OpenAI through two components: OpenAITextEmbedder and OpenAIDocumentEmbedder.
To create semantic embeddings for documents, use OpenAIDocumentEmbedder in your indexing pipeline. To generate embeddings for queries, use OpenAITextEmbedder. Once you’ve selected the suitable component for your use case, initialize it with the model name and OpenAI API key.
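Since the components are initialized with an API key, one common approach is to read the key from an environment variable instead of writing it into the script. The sketch below assumes the key is stored in a variable named OPENAI_API_KEY; `read_openai_api_key` is a hypothetical helper for illustration, not part of Haystack.

```python
import os


def read_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper for illustration; it is not a Haystack function.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of failing inside the pipeline.
        raise RuntimeError(f"Set the {env_var} environment variable before running the pipeline.")
    return api_key
```

The returned string can then be passed wherever the examples on this page use an API key placeholder.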
Below is an example indexing pipeline with InMemoryDocumentStore, OpenAIDocumentEmbedder, and DocumentWriter:
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.components.writers import DocumentWriter

# Store documents in memory and compare embeddings by cosine similarity
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

indexing_pipeline = Pipeline()
# "OPENAI_API_KEY" is a placeholder; pass your own key here
indexing_pipeline.add_component("embedder", OpenAIDocumentEmbedder(api_key="OPENAI_API_KEY", model="text-embedding-ada-002"))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
# The embedder's output documents (now carrying embeddings) flow into the writer
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})
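The document store above is configured with embedding_similarity_function="cosine". As a rough illustration of what that comparison means, the sketch below ranks the three example documents against a query using cosine similarity. The 3-dimensional vectors are made up for illustration (real OpenAI embeddings have far more dimensions), and `cosine_similarity` is a plain-Python helper, not the document store's internal code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumed values, for illustration only).
doc_vectors = {
    "My name is Wolfgang and I live in Berlin": [0.9, 0.1, 0.3],
    "I saw a black horse running": [0.1, 0.9, 0.0],
    "Germany has many big cities": [0.7, 0.0, 0.6],
}
query_vector = [0.8, 0.0, 0.5]

# Sort document texts from most to least similar to the query vector.
ranked = sorted(
    doc_vectors,
    key=lambda text: cosine_similarity(query_vector, doc_vectors[text]),
    reverse=True,
)
for text in ranked:
    print(round(cosine_similarity(query_vector, doc_vectors[text]), 3), text)
```

Cosine similarity depends only on the direction of the vectors, not their length, which is why it is a common choice for comparing text embeddings.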
Generative Models (LLMs)
You can leverage OpenAI models through two components: GPTGenerator and GPTChatGenerator.
To use OpenAI’s GPT models for text generation, initialize a GPTGenerator with the model name and OpenAI API key. You can then use the GPTGenerator instance in a question answering pipeline after the PromptBuilder.
Below is an example of a generative question answering pipeline using RAG with PromptBuilder and GPTGenerator:
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import GPTGenerator
template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: What's the official language of {{ country }}?
"""
pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", GPTGenerator(model="gpt-4", api_key="OPENAI_API_KEY"))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")
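# The retriever needs a query to fetch documents; the prompt builder needs the template variable
pipe.run({
    "retriever": {"query": "What's the official language of France?"},
    "prompt_builder": {"country": "France"}
})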
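To get a feel for the text that reaches the LLM, the sketch below rebuilds the prompt from the template with plain Python string formatting. It is only an approximation: PromptBuilder renders the template with Jinja, so whitespace may differ, and `render_prompt` is a hypothetical helper that is not part of Haystack.

```python
def render_prompt(document_contents, country):
    """Approximate the rendered template: one retrieved document per line, then the question."""
    context = "\n".join(document_contents)
    return (
        "Given the following information, answer the question.\n"
        "Context:\n"
        f"{context}\n"
        f"Question: What's the official language of {country}?"
    )


# Document contents as the retriever might return them (example values).
prompt = render_prompt(["Germany has many big cities"], "France")
print(prompt)
```

Printing the prompt this way can help when debugging a template, because it shows how the retrieved documents and the template variable end up in a single string.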
Transcriber Models
To use Whisper models from OpenAI, initialize a LocalWhisperTranscriber or RemoteWhisperTranscriber, depending on where the model is hosted. To use Whisper locally, install it following the instructions on the Whisper GitHub repo. To use the OpenAI API, provide an API key. You can then use the suitable component to transcribe audio files.
Below is an example indexing pipeline with LocalWhisperTranscriber. To run the Whisper model locally, you need to install two additional packages:
pip install transformers[torch]
pip install -U openai-whisper
from pathlib import Path
from haystack import Pipeline
from haystack.components.audio import LocalWhisperTranscriber
from haystack.components.preprocessors import DocumentSplitter, DocumentCleaner
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()
pipeline = Pipeline()
pipeline.add_component(instance=LocalWhisperTranscriber(model="small"), name="transcriber")
pipeline.add_component(instance=DocumentCleaner(), name="cleaner")
pipeline.add_component(instance=DocumentSplitter(), name="splitter")
pipeline.add_component(instance=DocumentWriter(document_store=document_store), name="writer")
pipeline.connect("transcriber.documents", "cleaner.documents")
pipeline.connect("cleaner.documents", "splitter.documents")
pipeline.connect("splitter.documents", "writer.documents")
pipeline.run({"transcriber": {"audio_files": list(Path("path/to/audio/folder").iterdir())}})
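The run call above passes every entry in the folder to the transcriber, including subfolders and non-audio files. A minimal sketch of filtering the folder first is shown below; `list_audio_files` and the extension set are assumptions made for illustration, so adjust them to the formats your Whisper setup can decode.

```python
from pathlib import Path

# Assumed set of audio extensions; not taken from Haystack or Whisper documentation.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}


def list_audio_files(folder):
    """Return the regular files in `folder` whose extension looks like audio, sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )
```

The resulting list can be passed to the transcriber in place of list(Path("path/to/audio/folder").iterdir()).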
Haystack 1.x
You can use OpenAI models in your Haystack 1.x pipelines with the EmbeddingRetriever, PromptNode, and WhisperTranscriber.
Installation (1.x)
pip install farm-haystack
Usage (1.x)
You can use OpenAI models in various ways:
Embedding Models
To use embedding models from OpenAI, initialize an EmbeddingRetriever with the model name and OpenAI API key. You can then use this EmbeddingRetriever in an indexing pipeline to create OpenAI embeddings for documents and index them to a document store.
Below is an example indexing pipeline with PreProcessor, InMemoryDocumentStore, and EmbeddingRetriever:
from haystack.nodes import EmbeddingRetriever, PreProcessor
from haystack.document_stores import InMemoryDocumentStore
from haystack.pipelines import Pipeline
from haystack.schema import Document

OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"  # placeholder; replace with your own key

# embedding_dim must match the model's output size (1536 for text-embedding-ada-002)
document_store = InMemoryDocumentStore(embedding_dim=1536)
preprocessor = PreProcessor()
retriever = EmbeddingRetriever(
    embedding_model="text-embedding-ada-002", document_store=document_store, api_key=OPENAI_API_KEY
)

indexing_pipeline = Pipeline()
indexing_pipeline.add_node(component=preprocessor, name="Preprocessor", inputs=["File"])
indexing_pipeline.add_node(component=retriever, name="Retriever", inputs=["Preprocessor"])
indexing_pipeline.add_node(component=document_store, name="document_store", inputs=["Retriever"])
indexing_pipeline.run(documents=[Document("This is my document")])
Generative Models (LLMs)
To use GPT models from OpenAI, initialize a PromptNode with the model name, OpenAI API key, and the prompt template. You can then use this PromptNode in a question answering pipeline to generate answers based on the given context.
Below is an example of a generative question answering pipeline using RAG with EmbeddingRetriever and PromptNode:
from haystack.nodes import PromptNode, EmbeddingRetriever
from haystack.pipelines import Pipeline

# Use the same embedding model that was used to index the documents
retriever = EmbeddingRetriever(
    embedding_model="text-embedding-ada-002", document_store=document_store, api_key=OPENAI_API_KEY
)
prompt_node = PromptNode(
    model_name_or_path="gpt-3.5-turbo",
    api_key=OPENAI_API_KEY,
    default_prompt_template="deepset/question-answering"
)

query_pipeline = Pipeline()
query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
query_pipeline.run(query="YOUR_QUERY")
Transcriber Models
To use Whisper models from OpenAI, initialize a WhisperTranscriber. To use Whisper locally, install it following the instructions on the Whisper GitHub repo. To use the API implementation, provide an API key. You can then use this WhisperTranscriber to transcribe audio files.
Below is an example of a summarization pipeline with WhisperTranscriber and PromptNode:
from haystack.nodes import WhisperTranscriber, PromptNode
from haystack.pipelines import Pipeline

api_key = "YOUR_OPENAI_API_KEY"  # placeholder; replace with your own key

# Passing an API key selects the API implementation of Whisper
whisper = WhisperTranscriber(api_key=api_key)
prompt_node = PromptNode(
    model_name_or_path="gpt-4",
    api_key=api_key,
    default_prompt_template="deepset/summarization"
)

pipeline = Pipeline()
pipeline.add_node(component=whisper, name="whisper", inputs=["File"])
pipeline.add_node(component=prompt_node, name="prompt", inputs=["whisper"])
output = pipeline.run(file_paths=["path/to/audio/file"])