TL;DR
- Install LlamaIndex in 2 minutes with `pip install llama-index` (Python) or `npm install llamaindex` (TypeScript) (Full Guide).
- Load documents from 200+ sources (PDFs, SQL, Notion) and parse them into structured nodes (Data Loaders).
- Build production-ready RAG pipelines with hybrid search (vector + keyword) and reranking (RAG Guide).
- Deploy with observability (Arize, LangSmith) and scale with LlamaCloud (Sign-up).
- Gotchas: large indexes need 16GB+ RAM; multi-modal RAG requires paid APIs (Multi-Modal Costs).
1. Installation and Quickstart
Python (LlamaIndex.Py)
```bash
# Install core package (minimal dependencies)
pip install llama-index-core

# Install full suite (includes multi-modal, agents, etc.)
pip install llama-index

# Verify installation
python -c "from llama_index.core import VectorStoreIndex; print('LlamaIndex v0.10.28 ready')"
# Expected output: LlamaIndex v0.10.28 ready
```
Version verified from GitHub Releases.
Gotchas:
- If you see `ImportError: No module named 'llama_index'`, ensure you're using Python 3.9+ and a virtual environment.
- For GPU acceleration, install PyTorch first: `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`.
TypeScript (LlamaIndex.TS)
```bash
npm install llamaindex
# or
yarn add llamaindex

# Verify installation
node -e "const { VectorStoreIndex } = require('llamaindex'); console.log('LlamaIndex.TS v0.3.12 ready')"
# Expected output: LlamaIndex.TS v0.3.12 ready
```
TypeScript support documented in TS Guide.
2. Document Loading and Parsing
Load a PDF and Parse into Nodes
```python
from llama_index.core import SimpleDirectoryReader

# Load documents from a directory (supports PDF, DOCX, CSV, etc.)
documents = SimpleDirectoryReader("data/").load_data()
print(f"Loaded {len(documents)} documents")
# Expected output: Loaded 3 documents

# Parse into nodes (chunks with metadata)
from llama_index.core.node_parser import SentenceSplitter

parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)
print(f"Created {len(nodes)} nodes")
# Expected output: Created 42 nodes
```
Documentation for data loaders available in Data Loaders.
Key Features:
- 200+ data connectors: Load from Notion, Slack, SQL, and more Data Loaders.
- LlamaParse: Paid API for parsing complex PDFs (tables, multi-column layouts). Free tier available (LlamaParse).

```python
from llama_parse import LlamaParse

parser = LlamaParse(api_key="llx-...", result_type="markdown")
documents = parser.load_data("data/report.pdf")
```
Gotchas:
- Large PDFs (>100 pages) may time out with `SimpleDirectoryReader`. Use `LlamaParse` for better results (LlamaParse).
- For SQL databases, use `SQLTableRetrieverQueryEngine` to auto-generate queries from natural language.
3. Index Types
Vector Index (Default for RAG)
```python
from llama_index.core import VectorStoreIndex

# Create a vector index (uses OpenAI embeddings by default)
index = VectorStoreIndex.from_documents(documents)

# Persist to disk
index.storage_context.persist("storage/")
```
Expected Output:
```
INFO:llama_index.core.storage.storage_context:Saved VectorStoreIndex to storage/
```
Keyword Index (Lexical Search)
```python
from llama_index.core import KeywordTableIndex

keyword_index = KeywordTableIndex.from_documents(documents)
```
Tree Index (Hierarchical Summarization)
```python
from llama_index.core import TreeIndex

tree_index = TreeIndex.from_documents(documents)

# Query with a child-branch traversal
query_engine = tree_index.as_query_engine(child_branch_factor=2)
response = query_engine.query("Summarize the key points")
print(response)
```
When to Use Which:
| Index Type | Use Case | Pros | Cons |
|---|---|---|---|
| Vector | Semantic search, RAG. | High accuracy, supports hybrid search. | Slower for large datasets. |
| Keyword | Lexical search (e.g., exact matches). | Fast, no embeddings needed. | No semantic understanding. |
| Tree | Hierarchical data (e.g., legal docs). | Preserves structure. | Complex queries. |
Gotchas:
- Vector indexes require an embedding model (default: OpenAI's `text-embedding-3-small`). For local embeddings, use `HuggingFaceEmbedding`:

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
```
4. Query Engine Setup
Basic Query Engine
```python
query_engine = index.as_query_engine()
response = query_engine.query("What are the risks of AI in 2026?")
print(response)
```
Expected Output:
```
The risks of AI in 2026 include:
1. Job displacement in creative industries.
2. Increased misinformation via deepfakes.
3. Regulatory gaps in multi-modal models.
Source: data/report.pdf (page 42)
```
Hybrid Search (Vector + Keyword)
```python
from llama_index.core import QueryBundle
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import BaseRetriever

class HybridRetriever(BaseRetriever):
    def __init__(self, vector_index, keyword_index):
        self.vector_retriever = vector_index.as_retriever()
        self.keyword_retriever = keyword_index.as_retriever()
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle):
        # Naive union; deduplicate by node ID before production use
        vector_nodes = self.vector_retriever.retrieve(query_bundle)
        keyword_nodes = self.keyword_retriever.retrieve(query_bundle)
        return vector_nodes + keyword_nodes

retriever = HybridRetriever(index, keyword_index)
query_engine = RetrieverQueryEngine.from_args(retriever)
```
Advanced retrieval techniques documented in RAG Guide.
Gotchas:
- Hybrid search adds ~200-500ms latency. Use `similarity_top_k=2` to limit results.
- For production, add a reranker (e.g., `CohereRerank`):

```python
from llama_index.postprocessor.cohere_rerank import CohereRerank

reranker = CohereRerank(api_key="...", top_n=3)
query_engine = index.as_query_engine(node_postprocessors=[reranker])
```
5. Custom Retrievers
Build a Time-Based Retriever
```python
from llama_index.core import QueryBundle
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore
from typing import List

class TimeBasedRetriever(BaseRetriever):
    def __init__(self, index, time_field="date"):
        self.index = index
        self.time_field = time_field
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Filter nodes by time (e.g., "documents from 2025")
        nodes = list(self.index.docstore.docs.values())
        filtered_nodes = [
            node for node in nodes
            if node.metadata.get(self.time_field, "").startswith("2025")
        ]
        return [NodeWithScore(node=node, score=1.0) for node in filtered_nodes]

retriever = TimeBasedRetriever(index)
query_engine = RetrieverQueryEngine.from_args(retriever)
```
Use Cases:
- ASSESS (AI Security Posture Framework™): Retrieve logs from a specific time window to evaluate exposure.
- COMPLY: Filter documents by compliance tags (e.g., "GDPR", "HIPAA").
6. Evaluation and Metrics
Run a RAG Evaluation
```python
from llama_index.core.evaluation import (
    RetrieverEvaluator,
    generate_question_context_pairs,
)

# Generate synthetic Q&A pairs from the parsed nodes (requires an LLM)
qa_dataset = generate_question_context_pairs(nodes, num_questions_per_chunk=2)

# Evaluate retriever
retriever = index.as_retriever(similarity_top_k=2)
evaluator = RetrieverEvaluator.from_metric_names(
    ["mrr", "hit_rate"], retriever=retriever
)
# Run inside an async context (e.g., asyncio.run(...) in a script)
eval_results = await evaluator.aevaluate_dataset(qa_dataset)
print(eval_results)
```
Expected Output:
```
{'mrr': 0.85, 'hit_rate': 0.92}
```
Key Metrics:
| Metric | Description | Target Value |
|---|---|---|
| MRR | Mean Reciprocal Rank. | >0.8 |
| Hit Rate | % of queries with relevant results. | >0.9 |
| Faithfulness | % of responses grounded in context. | >0.95 |
Gotchas:
- Synthetic Q&A generation requires an LLM (default: OpenAI). For local evaluation, use `LlamaCPP`:

```python
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(model_path="models/llama-2-7b.Q4_K_M.gguf")
qa_dataset = generate_question_context_pairs(nodes, llm=llm)
```
7. Production Deployment Tips
Deploy with FastAPI
```python
from fastapi import FastAPI
from llama_index.core import VectorStoreIndex
from pydantic import BaseModel

app = FastAPI()
index = VectorStoreIndex.from_documents(documents)  # `documents` loaded as in Section 2

class QueryRequest(BaseModel):
    query: str

@app.post("/query")
async def query_index(request: QueryRequest):
    query_engine = index.as_query_engine()
    response = query_engine.query(request.query)
    return {"response": str(response)}

# Run with: uvicorn app:app --reload
```
Expected Output (API):
```json
{
  "response": "The risks of AI in 2026 include job displacement and misinformation."
}
```
Observability with Arize
```python
from llama_index.core.callbacks import CallbackManager
from llama_index.callbacks.arize_ai import ArizeCallbackHandler

arize_callback = ArizeCallbackHandler(
    api_key="...",
    space_key="...",
)
callback_manager = CallbackManager([arize_callback])
index = VectorStoreIndex.from_documents(
    documents, callback_manager=callback_manager
)
```
Observability features documented in Observability.
Scale with LlamaCloud
```python
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

index = LlamaCloudIndex.from_documents(
    documents,
    name="my-production-index",
    project_name="my-project",
    api_key="llx-...",
)
```
LlamaCloud documentation available at Sign-up.
Gotchas:
- LlamaCloud indexes are eventually consistent (updates may take ~1 minute).
- For multi-modal RAG, use `LlamaCloudMultiModalIndex` (Multi-Modal).
Alternatives at a Glance
| Tool | Best For | Key Limitation |
|---|---|---|
| LlamaIndex | Enterprise RAG, multi-modal apps. | TS version less mature (TS Roadmap). |
