
🛰️ Build Your Own Perplexity Clone

🔀 Workflow: Build Your Own Perplexity Clone

This post walks through the Webhook + RAG + Internet Search workflow for n8n. It extends the 301 workflow by adding a web search fallback for AWS questions that are not about S3. It still uses a vector store (RAG) for S3 questions and politely refuses non-AWS topics.


✨ Overview

This workflow demonstrates tool-routed answering inside an n8n Agent:

  • 📦 S3 questions → RAG (vector store built from your S3 docs)
  • 🌐 Other AWS questions → Internet Search (up-to-date info)
  • 🚫 Non-AWS questions → respectful refusal

Learners see how an agent can classify, choose tools, and ground answers.
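
The three-way routing policy can be sketched in a few lines of Python. In the workflow, the LLM inside the AI Agent makes this decision. The keyword sets (S3_TERMS, AWS_TERMS) and the route function below are hypothetical stand-ins. They make the S3 / other-AWS / non-AWS split concrete and let you test it without an API key.

```python
import re

# Hypothetical keyword sets; in the real workflow the LLM decides, not a keyword list.
S3_TERMS = {"s3", "bucket", "buckets", "glacier"}
AWS_TERMS = {"aws", "amazon", "lambda", "ec2", "dynamodb", "iam", "cloudfront"}


def route(query: str) -> str:
    """Pick a tool for a query: S3 -> RAG, other AWS -> internet search, else refuse."""
    tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    if tokens & S3_TERMS:  # check S3 first so "Amazon S3" still goes to RAG
        return "s3_knowledge_base"
    if tokens & AWS_TERMS:
        return "internet_search"
    return "refuse"


for q in ["How do I enable S3 versioning?", "What is AWS Lambda?", "Tell me about Paris."]:
    print(f"{route(q):<18} {q}")
```

The three sample questions from the Try It section route to s3_knowledge_base, internet_search, and refuse respectively. A keyword list is brittle, which is exactly why the workflow delegates this step to the LLM.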


🔄 How It Works

Ingestion (one-time / as needed)

```mermaid
graph LR
  MT["🖱️ Manual Trigger"] --> GD["⬇️ Google Drive: Download File"]
  GD --> DL["📂 Data Loader"]
  DL --> TS["📄 Text Splitter"]
  TS --> EM["🔀 Embeddings"]
  EM --> VS["📚 Vector Store (S3 KB)"]
```
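
The ingestion chain (split → embed → store → search) can be sketched without any dependencies. This is not what the n8n nodes do internally. The Embeddings node calls a real embeddings model, while embed() here is a toy bag-of-words counter. The split_text function, the VectorStore class, and the chunk_size/overlap defaults are hypothetical choices for illustration, not the workflow's settings.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size character chunks; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts (an embeddings model would return dense floats)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory store of (chunk, vector) pairs, ranked by cosine similarity."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add_document(self, text: str, chunk_size: int = 200, overlap: int = 40) -> int:
        chunks = split_text(text, chunk_size, overlap)
        self.items.extend((chunk, embed(chunk)) for chunk in chunks)
        return len(chunks)

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


# Usage with a made-up snippet standing in for the S3 doc fetched from Google Drive.
doc = (
    "S3 Versioning keeps multiple variants of an object in the same bucket. "
    "Enable it in the bucket properties. "
    "S3 Glacier storage classes are designed for low-cost archival data."
)
store = VectorStore()
print(store.add_document(doc, chunk_size=80, overlap=20), "chunks stored")
print(store.search("How do I enable versioning?", k=1))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. Re-running ingestion simply rebuilds the store, which is why the runtime flow never changes.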

Runtime (per request)

```mermaid
graph LR
  WB["🌐 Webhook (POST)"] --> AG["🧠 AI Agent (Router)"]
  AG --> MEM["💾 Memory (sessionKey = username)"]
  AG --> LLM["🤖 OpenAI Chat Model"]
  AG --> S3["📦 s3_knowledge_base (RAG)"]
  AG --> NET["🌐 internet_search (HTTP Tool)"]
  AG --> RSP["↩️ Respond to Webhook"]
```

  1. Webhook receives { query, username }.
  2. AI Agent classifies the question:
      • If S3 → query s3_knowledge_base (RAG) and answer.
      • If AWS but not S3 → call internet_search and answer from the results.
      • If non-AWS → refuse.
  3. Memory keeps context per username.
  4. Respond to Webhook returns the final answer.
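
The memory step (sessionKey = username) can be pictured as a dictionary of short conversation windows keyed by username. The SessionMemory class and its default window of 10 turns are assumptions for illustration. The memory node in n8n has its own settings.

```python
from collections import defaultdict, deque


class SessionMemory:
    """Keeps the last `window` (role, text) turns for each session key (here, the username)."""

    def __init__(self, window: int = 10) -> None:
        self.window = window
        # Each new session key gets its own bounded deque; old turns fall off the left.
        self._turns: dict[str, deque] = defaultdict(lambda: deque(maxlen=self.window))

    def add(self, session_key: str, role: str, text: str) -> None:
        self._turns[session_key].append((role, text))

    def history(self, session_key: str) -> list[tuple[str, str]]:
        # .get avoids creating an empty session just by reading it.
        return list(self._turns.get(session_key, ()))


memory = SessionMemory(window=4)
memory.add("demo-user-1", "user", "How do I enable S3 versioning?")
memory.add("demo-user-2", "user", "What is AWS Lambda?")
memory.add("demo-user-1", "user", "Does it cost extra?")
print(memory.history("demo-user-1"))  # only demo-user-1's two turns
```

A follow-up such as "Does it cost extra?" only makes sense alongside the earlier turn. That is why a stable username matters: a new username starts an empty history.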

🏛️ Architecture

[Diagram: Your Own Perplexity Clone]

[Diagram: Perplexity Clone Ingestion]


🛂 Inputs (JSON Body)

  • query (string, required): the user's question.
  • username (string, recommended): a stable ID used as the memory session key.

Example

```json
{
  "query": "What's the difference between S3 Standard and S3 Glacier?",
  "username": "demo-user-1"
}
```
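
Before calling the webhook, for example in a client like the Colab notebook, you might check the body against the input rules above. parse_body is a hypothetical helper, not part of the workflow. It requires query to be a non-empty string and username, if present, to be a string.

```python
import json


def parse_body(raw: str) -> dict:
    """Validate a webhook body: `query` required (non-empty string), `username` optional string."""
    body = json.loads(raw)
    if not isinstance(body, dict):
        raise ValueError("body must be a JSON object")
    query = body.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("'query' is required and must be a non-empty string")
    username = body.get("username")
    if username is not None and not isinstance(username, str):
        raise ValueError("'username' must be a string when provided")
    # A missing username is allowed, but then memory has no stable session key.
    return {"query": query.strip(), "username": username}


example = json.dumps({
    "query": "What's the difference between S3 Standard and S3 Glacier?",
    "username": "demo-user-1",
})
print(parse_body(example))
```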

📤 Output

  • HTTP 200 with the agent's answer.
  • Each reply ends with a closing tag that indicates its source:
      • (Answer based on S3 knowledge base)
      • (Answer enriched with Internet Search results)
      • (Refusal: non-AWS topic)
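
A client can tell which path produced an answer by checking the end of the reply for one of these tags. The sketch assumes the agent reproduces the tag text exactly and places it last, which depends on the agent's prompt. The source_mode helper is hypothetical, and the sample replies are invented for illustration.

```python
# Closing tags as listed under Output; exact wording depends on the agent's prompt.
SOURCE_TAGS = {
    "(Answer based on S3 knowledge base)": "rag",
    "(Answer enriched with Internet Search results)": "search",
    "(Refusal: non-AWS topic)": "refusal",
}


def source_mode(reply: str) -> str:
    """Return 'rag', 'search', or 'refusal' from the reply's closing tag, else 'unknown'."""
    tail = reply.rstrip()
    for tag, mode in SOURCE_TAGS.items():
        if tail.endswith(tag):
            return mode
    return "unknown"


# Invented replies, for illustration only.
samples = [
    "Turn on Versioning in the bucket's Properties tab.\n(Answer based on S3 knowledge base)",
    "Lambda runs code without provisioning servers. (Answer enriched with Internet Search results)",
    "Sorry, I can only help with AWS questions. (Refusal: non-AWS topic)",
]
for reply in samples:
    print(source_mode(reply))
```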

⚙️ Setup

  1. Import perplexity-clone.json into n8n Cloud.
  2. Credentials:
      • 🔑 OpenAI (for the Agent's LLM)
      • 🔑 Google Drive (document download for the S3 KB)
      • 🔑 Internet Search tool (set the x-api-key header in the HTTP Request Tool)
  3. Activate the workflow and copy the Production Webhook URL.
  4. (Optional) Update Google Drive → fileId to point at your own S3 reference doc, then run the Manual Trigger to rebuild the vector store.

✅ Tip: Keep the temperature low (0.1–0.2) in the OpenAI node so the agent follows its tool-routing rules reliably.
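
The Internet Search credential comes down to an x-api-key header on the HTTP Request Tool. The tool body takes query as an array of strings (see Troubleshooting). The sketch below builds an equivalent request with Python's standard library. SEARCH_URL is a placeholder because the search provider is not named here, and build_search_request is a hypothetical helper. It builds the request but does not send it.

```python
import json
import urllib.request

# Placeholder: the search provider behind internet_search is not named in this post.
SEARCH_URL = "https://search.example.com/v1/search"


def build_search_request(queries: list[str], api_key: str) -> urllib.request.Request:
    """Build (without sending) a POST with an x-api-key header and a {"query": [...]} body."""
    if not queries or not all(isinstance(q, str) for q in queries):
        raise ValueError("query must be a non-empty list of strings")
    body = json.dumps({"query": queries}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


req = build_search_request(["What is AWS Lambda?"], api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
# Sending would be urllib.request.urlopen(req); omitted because the endpoint is a placeholder.
```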


🧪 Try It

Option A: Colab

  1. Open the instructor's Colab: Webhook Client (Colab)
  2. Click Copy to Drive to make it editable.
  3. In n8n, activate this 401 workflow and copy the Production Webhook URL (not the Test URL).
  4. In your Colab copy, replace the webhook variable (url or WEBHOOK_URL) with the Production URL.
  5. Run all cells. Try:
      • S3 (RAG expected): "How do I enable S3 versioning?"
      • AWS non-S3 (Search expected): "What is AWS Lambda?"
      • Non-AWS (Refusal): "Tell me about Paris."

💡 Use the same username to observe memory continuity.
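
If you prefer a plain script to the notebook, the same call can be made with Python's standard library. This is not the instructor's Colab code. build_request and ask are hypothetical helpers, and WEBHOOK_URL must be replaced with your Production URL. The response is read as plain text, which assumes the Respond to Webhook node returns the answer as the response body.

```python
import json
import urllib.request

WEBHOOK_URL = "https://<your-n8n>/webhook/<id>"  # replace with your Production URL


def build_request(query: str, username: str, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build the same POST the cURL example sends: a JSON body with query and username."""
    payload = json.dumps({"query": query, "username": username}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )


def ask(query: str, username: str, url: str = WEBHOOK_URL, timeout: float = 60.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(query, username, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__" and "<your-n8n>" not in WEBHOOK_URL:
    # Same username for every call, so memory can link the turns.
    for q in ["How do I enable S3 versioning?", "What is AWS Lambda?", "Tell me about Paris."]:
        print(q, "->", ask(q, "demo-user-1"))
```

The guard skips the network calls until the placeholder URL has been replaced.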

Option B: cURL

```shell
WEBHOOK_URL="https://<your-n8n>/webhook/<id>"  # Production URL
curl -X POST "$WEBHOOK_URL" \
  -H "Content-Type: application/json" \
  -d '{"query":"What is AWS Lambda?","username":"demo-user-1"}'
```

Option C: Postman

  • New POST → Production Webhook URL
  • Body → raw → JSON:

```json
{ "query": "How do I enable S3 versioning?", "username": "demo-user-1" }
```

  • Send → view the response.

🧠 Teaching Notes

  • Routing pattern: Students see S3 → RAG vs. other AWS → search.
  • Guardrails: Non-AWS questions are politely declined.
  • Grounding: Every answer names its source mode in a closing tag.
  • Maintainability: Docs can be refreshed by re-running the Manual Trigger ingestion, without changing the runtime flow.

🩹 Troubleshooting

  • Refuses an AWS question: Ensure the tool names in the Agent match the node names (s3_knowledge_base, internet_search) and that the Internet Search API key is set.
  • Schema errors: Internet Search expects {"query": ["..."]} (an array of strings); the S3 tool expects {"query": "..."} (a string).
  • No response / 404: The workflow may not be active. Activate it and call the Production webhook URL.
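
The two schemas above are easy to mix up when testing tools by hand. Below is a hypothetical normalize_tool_args helper that coerces a query into the shape each tool expects and rejects anything else. It is a debugging aid, not an n8n feature.

```python
def normalize_tool_args(tool: str, args: dict) -> dict:
    """Coerce `query` into the shape each tool expects, or raise ValueError."""
    query = args.get("query")
    if tool == "internet_search":
        # Expects an array of strings: {"query": ["..."]}
        if isinstance(query, str):
            query = [query]
        if not isinstance(query, list) or not query or not all(isinstance(q, str) for q in query):
            raise ValueError("internet_search expects {'query': [str, ...]}")
    elif tool == "s3_knowledge_base":
        # Expects a single string: {"query": "..."}
        if isinstance(query, list) and len(query) == 1 and isinstance(query[0], str):
            query = query[0]
        if not isinstance(query, str) or not query.strip():
            raise ValueError("s3_knowledge_base expects {'query': str}")
    else:
        raise ValueError(f"unknown tool: {tool}")
    return {"query": query}


print(normalize_tool_args("internet_search", {"query": "What is AWS Lambda?"}))
print(normalize_tool_args("s3_knowledge_base", {"query": ["How do I enable S3 versioning?"]}))
```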

📚 References

Don't forget to check out my Agentic AI System Design for PMs course on Maven if you're interested in being part of something bigger.

AI Bootcamp

👉 These resources expand on the workflows here and show how to apply AI + n8n in real projects.


🏠 Home - Agents in Action

➑️ Previous - Chatbot that Knows Your Documents

➑️ Next - Teach Your RAG Agent to Remember