Generated with sparks and insights from 5 sources

Introduction

  • Firecrawl is an open-source tool that converts websites into LLM-ready markdown or structured data.

  • To install Firecrawl locally, you need to clone the repository, set up environment variables, and run the necessary services.

  • Firecrawl can also be deployed on a Kubernetes cluster for more advanced setups.

  • The installation process involves setting up dependencies like Node.js, pnpm, and Redis.

  • Firecrawl offers Python and Node SDKs for calling its API from application code.

Local Installation [1]

  • Clone the repository: git clone https://github.com/mendableai/firecrawl.git.

  • Navigate to the project directory: cd firecrawl.

  • Copy the example environment file: cp ./apps/api/.env.example ./.env.

  • Edit the .env file and set USE_DB_AUTHENTICATION=false to run without database authentication.

  • Update the Redis URL in the .env file: REDIS_URL=redis://localhost:6379.

  • Run the local instance: pnpm install and then pnpm run dev.
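The two .env edits above can also be scripted. The following is a minimal Python sketch. It assumes the file uses plain KEY=value lines and that commented-out defaults start with #. The names apply_env_overrides and configure_env are hypothetical helpers, not part of Firecrawl.

```python
from pathlib import Path
from typing import Dict

# Values from the local-install steps: no DB auth, Redis on this machine (port 6379).
LOCAL_OVERRIDES: Dict[str, str] = {
    "USE_DB_AUTHENTICATION": "false",
    "REDIS_URL": "redis://localhost:6379",
}


def apply_env_overrides(text: str, overrides: Dict[str, str]) -> str:
    """Return .env text with every key in `overrides` set to its new value.

    Lines for a key are rewritten in place, including commented-out defaults
    such as "# REDIS_URL="; keys that never appear are appended at the end.
    """
    out_lines = []
    seen = set()
    for line in text.splitlines():
        # Strip surrounding whitespace and any leading "#" to find the key.
        candidate = line.strip().lstrip("#").strip()
        key, sep, _ = candidate.partition("=")
        key = key.strip()
        if sep and key in overrides:
            out_lines.append(f"{key}={overrides[key]}")
            seen.add(key)
        else:
            out_lines.append(line)
    # Append any override whose key was not found in the file.
    for key, value in overrides.items():
        if key not in seen:
            out_lines.append(f"{key}={value}")
    return "\n".join(out_lines) + "\n"


def configure_env(path: str = ".env") -> None:
    """Rewrite the .env file at `path` in place with LOCAL_OVERRIDES."""
    env_file = Path(path)
    env_file.write_text(apply_env_overrides(env_file.read_text(), LOCAL_OVERRIDES))
```

Running configure_env('.env') from the repository root rewrites the copied file in place. Any key missing from the template is appended at the end of the file.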

Kubernetes Installation [2]

  • Clone the repository: git clone https://github.com/mendableai/firecrawl.git.

  • Navigate to the project directory: cd firecrawl.

  • Copy the example environment file: cp ./apps/api/.env.example ./.env.

  • Edit the .env file to set USE_DB_AUTHENTICATION=false.

  • Update the Redis URL in the .env file: REDIS_URL=redis://redis:6379.

  • Follow the instructions in examples/kubernetes-cluster-install/README.md for Kubernetes setup.
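Compared with the local setup, the configuration differs only in the Redis host. Locally, Redis runs on the same machine, so localhost works. In a cluster, the API and Redis typically run in separate pods, so the API reaches Redis by its Service name through cluster DNS. The sketch below assumes that Service is called redis, matching the URL above. The names redis_url and check_redis_url are hypothetical helpers for building and sanity-checking the value.

```python
from typing import Tuple
from urllib.parse import urlparse

# Redis hostname per setup. "redis" assumes the cluster exposes Redis under a
# Service named "redis"; check the Kubernetes example manifests for the real name.
REDIS_HOSTS = {
    "local": "localhost",
    "kubernetes": "redis",
}


def redis_url(target: str, port: int = 6379) -> str:
    """Build the REDIS_URL value for a deployment target ("local" or "kubernetes")."""
    try:
        host = REDIS_HOSTS[target]
    except KeyError:
        raise ValueError(f"unknown target: {target!r}") from None
    return f"redis://{host}:{port}"


def check_redis_url(url: str) -> Tuple[str, int]:
    """Parse a REDIS_URL and return (host, port), rejecting non-Redis schemes."""
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"expected a redis:// URL, got {url!r}")
    if not parsed.hostname:
        raise ValueError(f"missing host in {url!r}")
    # Fall back to Redis's default port when the URL omits it.
    return parsed.hostname, parsed.port or 6379
```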

Setting Up Dependencies [3]

  • Install Node.js by following the instructions on the official Node.js website.

  • Install pnpm by following the instructions on the official pnpm website.

  • Install Redis by following the instructions on the official Redis website.

  • Set environment variables in a .env file in the apps/api/ directory of the repository.

  • Use apps/api/.env.example as the template for that file.
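Before starting the services, you may want to confirm that the dependencies are on PATH and that the .env file exists. The following is a minimal preflight sketch in Python. It assumes the standard executable names node, pnpm, and redis-server. The names missing_tools and env_file_status are hypothetical helpers.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# Executables the dependency steps above should make available on PATH.
REQUIRED_TOOLS = ("node", "pnpm", "redis-server")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def env_file_status(repo_root: str = ".") -> str:
    """Report whether apps/api/.env exists, or only the .env.example template."""
    api_dir = Path(repo_root) / "apps" / "api"
    if (api_dir / ".env").is_file():
        return "ok"
    if (api_dir / ".env.example").is_file():
        return "missing: copy apps/api/.env.example to apps/api/.env"
    return "missing: apps/api not found (run from the repository root)"


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"not found on PATH: {tool}")
    print(f"apps/api/.env: {env_file_status()}")
```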

Using the Python SDK [4]

  • Install the Firecrawl Python SDK: pip install firecrawl-py.

  • Get an API key from firecrawl.dev.

  • Set the API key as an environment variable named FIRECRAWL_API_KEY, or pass it as the api_key argument when creating a FirecrawlApp instance.

  • Scrape a single URL: app.scrape_url('https://example.com'), where app is a FirecrawlApp instance.

  • Crawl a website, keeping only the main content of each page: app.crawl_url('https://example.com', params={'pageOptions': {'onlyMainContent': True}}).
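The Python steps can be combined into one short script. The following is a minimal sketch that assumes a firecrawl-py release accepting the params dictionary shown above. Newer releases may expect different option names, so check the version you install. FirecrawlApp, scrape_url, and crawl_url come from the SDK as described above. The names resolve_api_key and main_content_params are hypothetical helpers.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else the FIRECRAWL_API_KEY variable."""
    key = api_key or os.environ.get("FIRECRAWL_API_KEY")
    if not key:
        raise RuntimeError("pass api_key or set FIRECRAWL_API_KEY")
    return key


def main_content_params() -> dict:
    """Crawl options asking for each page's main content only (params-dict style)."""
    return {"pageOptions": {"onlyMainContent": True}}


def main() -> None:
    # Imported here so the helpers above work even without firecrawl-py installed.
    from firecrawl import FirecrawlApp

    app = FirecrawlApp(api_key=resolve_api_key())

    # Scrape a single page.
    page = app.scrape_url("https://example.com")
    print(page)

    # Crawl the site, keeping only the main content of each page.
    pages = app.crawl_url("https://example.com", params=main_content_params())
    print(pages)
```

Calling main() sends the requests. The helpers work without the SDK installed because the import happens inside main().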

Using the Node SDK [5]

  • Install the Firecrawl Node SDK: npm install @mendable/firecrawl-js.

  • Get an API key from firecrawl.dev.

  • Set the API key as an environment variable named FIRECRAWL_API_KEY, or pass it in the constructor options: new FirecrawlApp({ apiKey: 'YOUR_API_KEY' }).

  • Scrape a URL: await app.scrapeUrl('https://example.com'). The SDK methods are asynchronous and return promises.

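  • Crawl a website: await app.crawlUrl('https://example.com', {crawlerOptions: {excludes: ['blog/'], limit: 1000}, pageOptions: {onlyMainContent: true}}, true, 5). JavaScript has no keyword arguments, so waitUntilDone (true) and timeout (5) are passed positionally as the third and fourth arguments.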
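When waitUntilDone is true, the crawl call does not return as soon as the job is submitted. Instead, it keeps checking the job's status until the crawl finishes. The sketch below shows that waiting pattern in Python. It is a generic illustration, not the Node SDK's actual code. get_status is a hypothetical stand-in for a status request, and the status strings "completed" and "failed" are assumptions.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], dict],
    poll_interval: float = 2.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status() until the job finishes, fails, or max_wait seconds pass.

    get_status is hypothetical: it must return a dict with a "status" key.
    sleep is injectable so the loop can be exercised without real delays.
    """
    waited = 0.0
    while True:
        job = get_status()
        if job.get("status") == "completed":
            return job
        if job.get("status") == "failed":
            raise RuntimeError(f"crawl failed: {job}")
        if waited >= max_wait:
            raise TimeoutError(f"crawl not finished after {max_wait} s")
        # Not finished yet: pause, then ask again.
        sleep(poll_interval)
        waited += poll_interval
```

Passing the status check and the sleep function in as arguments keeps the loop independent of any particular SDK and easy to exercise with canned responses.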