LlamaIndex backup (No.11) - .NET 開発基盤部会 Wiki


The ".NET 開発基盤部会 Wiki" is operated by "Open棟梁Project" and "OSSコンソーシアム .NET開発基盤部会" (OSS Consortium .NET Development Infrastructure Working Group).

Back
- ChatGPT
- OSSのLLM
- LLMのPE
- LLMのRAG
- LangChain
- LlamaIndex
- AutoGen


Overview

Stages

Loading

Load the text data.

Indexing

Create an index from the text data.

Storing

Persist the text data and the index.

Querying

Search the text data using the index.

Evaluation

Objectively evaluate the search requests and responses.
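As a rough illustration of what "objectively evaluating" retrieval can mean, here is a minimal pure-Python sketch of two common retrieval metrics, hit rate and MRR (mean reciprocal rank). It assumes each test question has exactly one known-correct chunk ID; the IDs are hypothetical, and the sketch does not use LlamaIndex's own evaluation modules.

```python
from typing import Sequence


def hit_rate(expected_id: str, retrieved_ids: Sequence[str]) -> float:
    """Return 1.0 if the expected chunk was retrieved at all, else 0.0."""
    return 1.0 if expected_id in retrieved_ids else 0.0


def reciprocal_rank(expected_id: str, retrieved_ids: Sequence[str]) -> float:
    """Return 1/rank of the expected chunk (ranks start at 1), or 0.0 if it is missing."""
    for rank, chunk_id in enumerate(retrieved_ids, start=1):
        if chunk_id == expected_id:
            return 1.0 / rank
    return 0.0


def evaluate(cases: Sequence[tuple[str, Sequence[str]]]) -> dict[str, float]:
    """Average both metrics over (expected_id, retrieved_ids) pairs."""
    n = len(cases)
    return {
        "hit_rate": sum(hit_rate(e, r) for e, r in cases) / n,
        "mrr": sum(reciprocal_rank(e, r) for e, r in cases) / n,
    }


if __name__ == "__main__":
    cases = [
        ("c1", ["c1", "c7", "c3"]),  # correct chunk ranked 1st
        ("c2", ["c5", "c2", "c9"]),  # correct chunk ranked 2nd
        ("c4", ["c6", "c8", "c0"]),  # correct chunk not retrieved
    ]
    print(evaluate(cases))  # hit_rate = 2/3, mrr = (1 + 1/2 + 0) / 3 = 0.5
```

Hit rate only asks whether the right chunk was found; MRR also rewards ranking it near the top, which matters when only the top few chunks are passed to the LLM.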

Features

Data loading

Besides raw text data, files (PDF, ePub, Word, PowerPoint, audio, etc.) and web services (Notion, Slack, Wikipedia, etc.) can be used as data sources.

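Index creation

Keyword search and vector search (vector search converts chunks of text data into vectors):
- Keyword search
- Vector search
  - VectorStoreIndex
  - SummaryIndex

The following kinds of index are offloaded to other providers:
- Full-text search: builds a full-text index.
- Graph search: extracts nodes and edges from chunks of text data.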
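To make the keyword / full-text index idea concrete, the sketch below builds a naive inverted index (keyword → chunk IDs) in plain Python and ranks chunks by how many query keywords they contain. It is a toy under stated assumptions, not LlamaIndex code: the tokenizer only splits on non-word characters, so it suits space-delimited languages such as English (Japanese text would first need a morphological analyzer), and real full-text engines add scoring such as BM25.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of non-word characters (naive; space-delimited languages only)."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def build_inverted_index(chunks: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the set of chunk IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for chunk_id, text in chunks.items():
        for tok in tokenize(text):
            index[tok].add(chunk_id)
    return dict(index)


def keyword_search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank chunk IDs by the number of distinct query keywords they contain (ties broken by ID)."""
    hits: dict[str, int] = defaultdict(int)
    for tok in set(tokenize(query)):
        for chunk_id in index.get(tok, ()):
            hits[chunk_id] += 1
    return sorted(hits, key=lambda cid: (-hits[cid], cid))


if __name__ == "__main__":
    chunks = {
        "c1": "LlamaIndex loads documents",
        "c2": "Vector search compares embeddings",
        "c3": "Keyword search uses an inverted index of documents",
    }
    index = build_inverted_index(chunks)
    print(keyword_search(index, "documents index"))  # ['c3', 'c1']
```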

Data storage

Data is saved to stores such as the Vector Store, Document Store, and Index Store.

Vector Store
Document Store
Index Store

Data retrieval

Data is searched using the index.

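Keyword search: looks up the documents by keyword to obtain results.
Vector search: the query is also converted into a vector and compared with the document vectors to obtain results.
Graph search: runs a full-text search over the graph DB, then follows nodes and edges to find related documents and obtain results.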
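The two-step graph search described above (a text match on the graph, then following nodes and edges to related documents) can be sketched as follows. The triplets and chunk IDs are hypothetical and hard-coded; in a real setup an LLM would extract them at indexing time and a graph DB would store them. Node matching here is a case-insensitive substring check standing in for full-text search.

```python
from collections import defaultdict, deque

# (subject, relation, object, source_chunk_id); sample data is hypothetical.
TRIPLETS = [
    ("LlamaIndex", "is a", "RAG framework", "c1"),
    ("LlamaIndex", "supports", "Neo4j", "c2"),
    ("Neo4j", "is a", "graph database", "c3"),
    ("FAISS", "is a", "vector library", "c4"),
]


def build_graph(triplets):
    """Undirected adjacency list: node -> list of (neighbor, source_chunk_id)."""
    graph = defaultdict(list)
    for subj, _relation, obj, chunk_id in triplets:
        graph[subj].append((obj, chunk_id))
        graph[obj].append((subj, chunk_id))
    return dict(graph)


def graph_search(graph, query: str, max_depth: int = 1) -> set[str]:
    """Step 1: find seed nodes named in the query (naive text match).
    Step 2: walk edges breadth-first up to max_depth hops, collecting the chunk behind each edge."""
    q = query.lower()
    seeds = [node for node in graph if node.lower() in q]
    visited = set(seeds)
    queue = deque((node, 0) for node in seeds)
    chunk_ids: set[str] = set()
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for neighbor, chunk_id in graph[node]:
            chunk_ids.add(chunk_id)
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return chunk_ids


if __name__ == "__main__":
    graph = build_graph(TRIPLETS)
    print(sorted(graph_search(graph, "What does LlamaIndex support?", max_depth=1)))  # ['c1', 'c2']
    print(sorted(graph_search(graph, "What does LlamaIndex support?", max_depth=2)))  # ['c1', 'c2', 'c3']
```

Raising `max_depth` pulls in documents that are only indirectly related to the query entity (here the Neo4j chunk `c3`), which is the main difference from a plain keyword lookup.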

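Providers

1st Party

Basic libraries that handle each stage.

3rd Party

Libraries that handle each stage.

Data loading: LlamaHub offers a wide range of data connectors.

Vectorization and storage
- NoSQL databases: NoSQL databases such as MongoDB and Elasticsearch can be used to store and search data.
- Cloud storage: cloud storage services such as AWS S3 and Cloudflare R2 can be used to store data.
- Vector stores: DeepLake, FAISS, and similar tools provide efficient storage and search of vectors.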

Details

For various reasons, reading the official documentation is recommended.

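Main features

Loading

Reader
- The general-purpose SimpleDirectoryReader can be used.
- Instead of using a Reader, you can also create Documents directly.
- Hundreds of additional data connectors can be downloaded from the LlamaHub registry.
- The LlamaCloud connectors presumably connect to LlamaCloud, LlamaIndex's own hosted service.
- With some storage backends the indexing work is offloaded to the storage side, in which case a separate Indexing step is unnecessary.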
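In LlamaIndex the directory loader is typically used as `SimpleDirectoryReader(input_dir="./data").load_data()`, and a document can also be created directly with `Document(text=...)`. To show what such a reader does conceptually, here is a stdlib-only sketch; the `Document` dataclass, the `load_directory` function, and the `file_name` metadata key are illustrative stand-ins, not LlamaIndex's own classes.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Illustrative stand-in for a loaded document: the text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def load_directory(input_dir: str, exts: tuple[str, ...] = (".txt", ".md"),
                   recursive: bool = False) -> list[Document]:
    """Read each file whose extension is in `exts` into a Document, recording its file name."""
    root = Path(input_dir)
    paths = root.rglob("*") if recursive else root.glob("*")
    docs = []
    for path in sorted(paths):
        if path.is_file() and path.suffix.lower() in exts:
            text = path.read_text(encoding="utf-8")
            docs.append(Document(text=text, metadata={"file_name": path.name}))
    return docs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.txt").write_text("hello", encoding="utf-8")
        Path(tmp, "b.md").write_text("# title", encoding="utf-8")
        Path(tmp, "c.csv").write_text("x,y", encoding="utf-8")  # skipped: extension not in exts
        for doc in load_directory(tmp):
            print(doc.metadata["file_name"], repr(doc.text))
    # A Document can also be built directly, without any reader.
    manual = Document(text="raw text", metadata={"source": "inline"})
    print(manual)
```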

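node_parser
- In terms of the API it runs at the same time as Indexing, but conceptually it comes after Loading (the Reader).
- A Splitter divides the documents into chunks (the API returns Nodes).
- A Splitter instance appears to serve as the node_parser.
- The node_parser can also be run on its own, with options such as show_progress.
- It can also be embedded in a pipeline (IngestionPipeline) to implement more complex parsing.
- IngestionPipeline() appears to accept transformations such as a Splitter, Extractors, and an embedding model.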
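The node_parser and IngestionPipeline ideas above can be pictured as a chain of transformations over nodes. In LlamaIndex this would look something like `SentenceSplitter(chunk_size=..., chunk_overlap=...)` passed in `IngestionPipeline(transformations=[...])`; the stdlib sketch below only mimics the idea. It splits by characters (LlamaIndex's SentenceSplitter respects sentence boundaries and counts tokens), and `Node`, `splitter`, `length_extractor`, and `run_pipeline` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Illustrative stand-in for a parsed node: a chunk of text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


Transform = Callable[[list[Node]], list[Node]]


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters, each overlapping the previous by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]


def splitter(chunk_size: int, chunk_overlap: int) -> Transform:
    """Hypothetical splitter transformation: one input node -> several chunk nodes."""
    def run(nodes: list[Node]) -> list[Node]:
        out = []
        for node in nodes:
            for i, chunk in enumerate(split_text(node.text, chunk_size, chunk_overlap)):
                out.append(Node(chunk, {**node.metadata, "chunk_index": i}))
        return out
    return run


def length_extractor(nodes: list[Node]) -> list[Node]:
    """Hypothetical extractor: enriches each node's metadata with its character count."""
    for node in nodes:
        node.metadata["num_chars"] = len(node.text)
    return nodes


def run_pipeline(nodes: list[Node], transformations: list[Transform]) -> list[Node]:
    """Apply each transformation in order, feeding the output of one into the next."""
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


if __name__ == "__main__":
    docs = [Node("abcdefghij", {"file_name": "a.txt"})]
    for n in run_pipeline(docs, [splitter(4, 1), length_extractor]):
        print(n.text, n.metadata)
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a boundary is still partly visible in both.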

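Indexing

VectorStoreIndex
- Vectorizes the meaning (semantics) of text, so that results can be ranked by a mathematical relationship (≈ inner-product computation).
- Techniques for improving search performance include approximate nearest neighbor (ANN) search, dimensionality reduction, clustering, and pre-filtering.
- It can be built from documents or from nodes, with options such as show_progress.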
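The ranking described for VectorStoreIndex boils down to comparing the query vector with stored vectors. Below is a minimal brute-force sketch using cosine similarity, i.e. the inner product of length-normalized vectors. The 3-dimensional vectors are hand-made placeholders for real embeddings, and a production vector store would replace the linear scan with an ANN index. In LlamaIndex itself you would instead call `VectorStoreIndex.from_documents(documents, show_progress=True)` or `VectorStoreIndex(nodes)`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the inner product of the two vectors after normalizing their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorIndex:
    """Exact (brute-force) nearest-neighbour search over stored embeddings."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, node_id: str, vector: list[float]) -> None:
        self.ids.append(node_id)
        self.vectors.append(vector)

    def query(self, vector: list[float], top_k: int = 2) -> list[tuple[str, float]]:
        """Score every stored vector against the query and return the top_k best."""
        scored = [(nid, cosine(vector, v)) for nid, v in zip(self.ids, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    index = TinyVectorIndex()
    index.add("cat", [1.0, 0.0, 0.0])  # toy embeddings, not produced by a real model
    index.add("dog", [0.9, 0.1, 0.0])
    index.add("car", [0.0, 0.0, 1.0])
    print(index.query([1.0, 0.0, 0.0], top_k=2))  # "cat" first, then "dog"
```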

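SummaryIndex
- Stores the nodes as a simple sequential list; at query time the nodes (all of them by default, or a subset chosen by embedding similarity or by the LLM) are passed to the LLM.
- This makes it better suited to summarizing whole documents than to pinpoint search.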

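KnowledgeGraphIndex
- Performs the Indexing step of GraphRAG.
- With graphs, an LLM is already used at the Indexing stage (what it does varies, e.g. extracting (subject, relation, object) triplets from the text).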

Storing

The Document Store, Vector Store, and Index Store are set on a StorageContext.

Document Store: as already mentioned under Loading, documents are read from the Document Store.

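During Indexing, data is written out (persisted) to the Vector Store and Index Store.

Because a DB usually implements both storage and search, the Vector Store and Index Store are normally configured in the StorageContext to use the same DB.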

Index	Characteristics	Suitable NoSQL databases
VectorStoreIndex	Vector data management (ANN)	Pinecone, Weaviate, Milvus, Qdrant, Redis
SummaryIndex	Document-type data and metadata management	MongoDB, Elasticsearch, DynamoDB, Cassandra, Firebase Firestore
KnowledgeGraphIndex	Graph data (nodes and edges) management	Neo4j, ArangoDB, Amazon Neptune, TigerGraph, JanusGraph
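In LlamaIndex, the default in-memory stores are persisted with `index.storage_context.persist(persist_dir="./storage")` and reloaded with `load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage"))`. The sketch below only illustrates the idea of three separate stores being written to and read back from one directory; the `Stores` class and the JSON file names are hypothetical and do not match LlamaIndex's on-disk format.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

STORE_NAMES = ("docstore", "vector_store", "index_store")


@dataclass
class Stores:
    """Hypothetical bundle of three stores, loosely mirroring a storage context."""
    docstore: dict = field(default_factory=dict)      # node_id -> text
    vector_store: dict = field(default_factory=dict)  # node_id -> embedding
    index_store: dict = field(default_factory=dict)   # index_id -> list of node_ids

    def persist(self, persist_dir: str) -> None:
        """Write each store to its own JSON file inside persist_dir."""
        path = Path(persist_dir)
        path.mkdir(parents=True, exist_ok=True)
        for name in STORE_NAMES:
            (path / f"{name}.json").write_text(json.dumps(getattr(self, name)), encoding="utf-8")

    @classmethod
    def load(cls, persist_dir: str) -> "Stores":
        """Read the three JSON files back into a new Stores object."""
        path = Path(persist_dir)
        return cls(**{name: json.loads((path / f"{name}.json").read_text(encoding="utf-8"))
                      for name in STORE_NAMES})


if __name__ == "__main__":
    stores = Stores(
        docstore={"n1": "LlamaIndex loads documents"},
        vector_store={"n1": [0.1, 0.2, 0.3]},
        index_store={"idx": ["n1"]},
    )
    with tempfile.TemporaryDirectory() as tmp:
        stores.persist(tmp)
        print(Stores.load(tmp) == stores)  # True: the round trip preserves all three stores
```

Keeping the stores separate is what lets one of them (typically the vector store) be swapped for an external DB while the others stay local.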

Querying

The index is used to search the vectors.

Top-K semantic search.
Customizing the default prompt template.
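Top-K retrieval and prompt templating fit together roughly as follows: keep the K best-scoring chunks, then fill them into a template together with the question. In LlamaIndex this corresponds to `index.as_query_engine(similarity_top_k=3)` plus a customized `PromptTemplate`. The sketch below is plain Python; the template wording is illustrative, and only the `{context_str}` / `{query_str}` placeholder names are assumed to follow LlamaIndex's default QA prompt.

```python
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using only the context above, answer the question.\n"
    "Question: {query_str}\n"
    "Answer: "
)


def build_prompt(query: str, retrieved: list[tuple[str, float]], top_k: int = 2,
                 template: str = QA_TEMPLATE) -> str:
    """Keep the top_k highest-scoring chunks and fill them, with the query, into the template."""
    best = sorted(retrieved, key=lambda pair: pair[1], reverse=True)[:top_k]
    context = "\n\n".join(text for text, _score in best)
    return template.format(context_str=context, query_str=query)


if __name__ == "__main__":
    retrieved = [("chunk-low", 0.2), ("chunk-high", 0.9), ("chunk-mid", 0.5)]  # (text, score) pairs
    print(build_prompt("What is LlamaIndex?", retrieved, top_k=2))
```

Customizing the prompt then amounts to passing a different `template` string, as long as it keeps the two placeholders.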

Evaluation

Other features

Building agents

Building workflows

Structured data extraction

Tracing and debugging

References

RAGフレームワーク LlamaIndex の概要を整理してみる (Organizing an overview of the RAG framework LlamaIndex)
https://zenn.dev/nomhiro/articles/llama-index-abstract

LlamaIndexを使ってローカル環境でRAGを実行する方法 - 電通総研テックブログ (How to run RAG locally with LlamaIndex - Dentsu Soken Tech Blog)
https://tech.dentsusoken.com/entry/2024/01/22/LlamaIndex%E3%82%92%E4%BD%BF%E3%81%A3%E3%81%A6%E3%83%AD%E3%83%BC%E3%82%AB%E3%83%AB%E7%92%B0%E5%A2%83%E3%81%A7RAG%E3%82%92%E5%AE%9F%E8%A1%8C%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95

Official

LlamaIndex - LlamaIndex
https://docs.llamaindex.ai/en/stable/

Home

https://docs.llamaindex.ai/en/stable/

High-Level Concepts
Installation and Setup
How to read these docs
Starter Examples
- Starter Tutorial (Local Models) - LlamaIndex
  https://docs.llamaindex.ai/en/stable/getting_started/starter_example_local/
Discover LlamaIndex Video Series
Frequently Asked Questions (FAQ)
Starter Tools

Learn

https://docs.llamaindex.ai/en/stable/understanding/

Using LLMs

Building a RAG pipeline
https://docs.llamaindex.ai/en/stable/understanding/rag/

Loading & Ingestion
- Loading Data (Ingestion)
  https://docs.llamaindex.ai/en/stable/understanding/loading/loading/
- LlamaHub
  https://docs.llamaindex.ai/en/stable/understanding/loading/llamahub/
- Loading from LlamaCloud
  https://docs.llamaindex.ai/en/stable/understanding/loading/llamacloud/

Indexing & Embedding
https://docs.llamaindex.ai/en/stable/understanding/indexing/indexing/
Storing
https://docs.llamaindex.ai/en/stable/understanding/storing/storing/
Querying
https://docs.llamaindex.ai/en/stable/understanding/querying/querying/

Building an agent
Building Workflows
Structured Data Extraction
Tracing and Debugging
Evaluating
Putting it all Together

Use Cases

https://docs.llamaindex.ai/en/stable/use_cases/

Prompting
Question-Answering (RAG)
Chatbots
Structured Data Extraction

Examples

https://docs.llamaindex.ai/en/stable/examples/

Agents
Chat Engines
Cookbooks
Customization

Component Guides

https://docs.llamaindex.ai/en/stable/module_guides/

Models
Prompts
Loading
Indexing

Unofficial

LlamaIndex クイックスタートガイド｜npaka (LlamaIndex quick start guide)

LlamaIndexのXについてやってみた(v0.10 対応)｜Aya* (Trying out X in LlamaIndex, for v0.10)

① Data Loading
https://note.com/rhe/n/ndf6d42efe273
② Indexing
https://note.com/rhe/n/n27b4ad617226
③ Storing
https://note.com/rhe/n/n852087f2d905
④ Querying
https://note.com/rhe/n/n665a9a24ae17

LlamaIndexを完全に理解するチュートリアル (Tutorial for fully understanding LlamaIndex)

Part 1: Basics for understanding the processing concepts and flow (v0.6.8)
https://dev.classmethod.jp/articles/llamaindex-tutorial-001-overview/
Part 1: Basics for understanding the processing concepts and flow (v0.7.9)
https://dev.classmethod.jp/articles/llamaindex-tutorial-001-overview-v0-7-9/
Part 2: Customizing text splitting
https://dev.classmethod.jp/articles/llamaindex-tutorial-002-text-splitter/
Part 3: Using CallbackManager to understand internal behavior and enable debugging
https://dev.classmethod.jp/articles/llamaindex-tutorial-003-callback-manager/
Part 4: How to use embedding vectors with ListIndex
https://dev.classmethod.jp/articles/llamaindex-tutorial-004-listindex-use-embedding-vector/
Part 5: Trying out TreeIndex to see how it behaves
https://dev.classmethod.jp/articles/llamaindex-tutorial-005-treeindex/