Web Scrape QnA


Last updated 1 year ago


Let's say you have a website (a store, an ecommerce site, a blog), and you want to scrape all of its relative links and have an LLM answer any question about the site. In this tutorial, we will go through how to achieve that.

You can find the example flow, called WebPage QnA, in the marketplace templates.

Upsert

We are going to use the Cheerio Web Scraper node to scrape links from a given URL, and the HtmlToMarkdown Text Splitter to split the scraped content into smaller pieces.

If you do not specify anything, only the given URL will be scraped by default. If you want to crawl the rest of the relative links as well, click Additional Parameters.

  • Get Relative Links Method - how to crawl all relative links: Web Crawl or Sitemap

  • Get Relative Links Limit - how many links to crawl; set 0 to crawl all
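The Web Crawl method can be pictured as a breadth-first traversal that follows only links on the same host, stopping once the limit is reached. Below is a minimal sketch of that idea, not the node's actual implementation; it runs against an in-memory map of pages instead of live HTTP requests:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_relative_links(pages, start_url, limit=0):
    """Breadth-first crawl over `pages` (url -> HTML), following only
    links on the same host. limit=0 means crawl everything, mirroring
    the 'Get Relative Links Limit' parameter."""
    host = urlparse(start_url).netloc
    seen, queue = [start_url], [start_url]
    while queue:
        url = queue.pop(0)
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc != host or absolute in seen:
                continue  # skip external links and pages already visited
            if limit and len(seen) >= limit:
                return seen
            seen.append(absolute)
            queue.append(absolute)
    return seen


pages = {
    "https://example.com/": '<a href="/about">About</a><a href="/blog">Blog</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog": '<a href="/blog/post-1">Post</a>',
    "https://example.com/blog/post-1": "",
}
print(crawl_relative_links(pages, "https://example.com/"))
```

With limit=0 the crawl discovers all four pages; with limit=2 it stops after the start URL plus one link.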

In the top-right corner, you will notice a green button:

A dialog will be shown that allows you to upsert data to Pinecone:

Under the hood, the following actions will be executed:

  1. All HTML data is scraped using the Cheerio Web Scraper

  2. The scraped data is converted from HTML to Markdown, then split

  3. The split chunks are looped over and converted to vector embeddings using OpenAI Embeddings

  4. The vector embeddings are upserted to Pinecone
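The steps above can be sketched as a plain pipeline. In this illustration, a crude regex conversion stands in for the HtmlToMarkdown Text Splitter, a deterministic fake `embed` function stands in for OpenAI Embeddings, and an in-memory list stands in for the Pinecone index:

```python
import hashlib
import re


def html_to_markdown(html):
    """Very rough HTML -> Markdown conversion (illustration only)."""
    text = re.sub(r"<h1>(.*?)</h1>", r"# \1\n", html)
    text = re.sub(r"<p>(.*?)</p>", r"\1\n", text)
    return re.sub(r"<[^>]+>", "", text).strip()


def split_markdown(markdown, chunk_size=50):
    """Split the Markdown into fixed-size chunks."""
    return [markdown[i:i + chunk_size] for i in range(0, len(markdown), chunk_size)]


def embed(chunk):
    """Stand-in for OpenAI Embeddings: a deterministic fake vector."""
    digest = hashlib.sha256(chunk.encode()).digest()
    return [b / 255 for b in digest[:4]]


index = []  # stand-in for the Pinecone index
scraped = {"https://example.com/": "<h1>Store</h1><p>We sell widgets and gadgets.</p>"}

for url, html in scraped.items():           # 1. scraped pages
    markdown = html_to_markdown(html)       # 2. HTML -> Markdown
    for chunk in split_markdown(markdown):  # 3. split and loop
        index.append({"id": url, "values": embed(chunk), "text": chunk})  # 4. upsert

print(len(index), "vectors upserted")
```

In the real flow, each record stored in Pinecone carries the embedding vector plus the chunk text as metadata, which is what makes retrieval possible at query time.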

Navigate to the Pinecone dashboard and you will see the new vectors being added.

Query

Querying is relatively straightforward. After you have verified that the data is upserted to the vector database, you can start asking questions in the chat:

It is recommended to specify a system message for the Conversational Retrieval QA Chain. For example, you can specify the name of the AI, the language to answer in, and the response to give when an answer is not found (to prevent hallucination).

I want you to act as a document that I am having a conversation with. Your name is "AI Assistant". You will provide me with answers from the given info. If the answer is not included, say exactly "Hmm, I am not sure." and stop after that. Refuse to answer any question not about the info. Only answer in English. Never break character.

You can also turn on the Return Source Documents option to return a list of the document chunks that the AI's response comes from.
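Besides chatting in the UI, a saved flow can also be queried over HTTP. The sketch below assumes a Flowise-style prediction endpoint of the form `/api/v1/prediction/{chatflow-id}`; verify the exact path and authentication against the API page of these docs before using it:

```python
import json
from urllib import request


def build_prediction_request(base_url, chatflow_id, question):
    """Build the URL and JSON body for a prediction call.

    The /api/v1/prediction/{id} endpoint path is an assumption here;
    check the API page of the docs for your deployment."""
    url = f"{base_url}/api/v1/prediction/{chatflow_id}"
    body = json.dumps({"question": question})
    return url, body


def ask(base_url, chatflow_id, question):
    """Send a question to the chatflow and return the parsed response."""
    url, body = build_prediction_request(base_url, chatflow_id, question)
    req = request.Request(
        url, data=body.encode(), headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


# Example against a local instance (requires LangFlux to be running):
# print(ask("http://localhost:3000", "<your-chatflow-id>", "What do you sell?"))
print(build_prediction_request("http://localhost:3000", "<id>", "Hi")[0])
```

If Return Source Documents is enabled, the response typically includes the source chunks alongside the answer text, so you can surface citations in your own application.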

The same logic applies to any document-based use case, not just web scraping.

If you have any suggestions on how to improve the performance, we'd love your contribution!