AI-First
Ruby Web Scraping Framework
Write web scrapers in Ruby using a clean, AI-assisted DSL. Define what to scrape once, then run crawls self-hosted with minimal ongoing cost.
```ruby
class HackernewsSpider < Kimurai::Base
  @start_urls = ["https://news.ycombinator.com/"]

  def parse(response, url:, data: {})
    # Vibe Scraping: let AI generate a reusable, cached extraction schema
    posts = extract(response) do
      array :posts do
        object do
          string :title
          string :url
          string :author
          string :points_count
          string :posted_at
        end
      end
    end

    save_to 'hackernews_results.json', posts, format: :json
  end
end
```
```ruby
class HackernewsSpider < Kimurai::Base
  @start_urls = ["https://news.ycombinator.com/"]

  def parse(response, url:, data: {})
    # Classic mode: manual CSS/XPath selectors
    posts = response.xpath("//tr[@id='bigbox']//tr[@id]").map do |tr|
      {
        title: tr.xpath(".//span[@class='titleline']").text,
        url: tr.xpath(".//span[@class='titleline']/a/@href").to_s,
        author: tr.xpath("./following-sibling::tr[1]//a[@class='hnuser']").text,
        points_count: tr.xpath("./following-sibling::tr[1]//span[@class='score']").text,
        posted_at: tr.xpath("./following-sibling::tr[1]//span[@class='age']").text
      }
    end

    save_to 'hackernews_results.json', posts, format: :json
  end
end
```
AI-Powered Extraction
Use Nukitori to extract structured data without selectors—just describe the schema.
JavaScript Support
Render complex JS sites out of the box with Headless Chrome or Firefox.
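In classic Kimurai this is a one-line engine switch on the spider class. A config sketch (spider name and URL are placeholders, not part of the framework):

```ruby
class ProductsSpider < Kimurai::Base
  # JS rendering via a real browser: :selenium_chrome or :selenium_firefox.
  # For static HTML, :mechanize skips the browser entirely.
  @engine = :selenium_chrome
  @start_urls = ["https://example.com/products"]
end
```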
Capybara DSL
Interact naturally: click_on, fill_in, and scroll using the powerful Capybara syntax.
Parallel Scraping
Scale horizontally with the simple `in_parallel` method for concurrent execution.
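Conceptually, parallel scraping partitions the URL list across worker threads. A framework-free sketch of that idea in plain Ruby (`scrape_in_parallel` and its block are illustrative only; the framework's `in_parallel` additionally manages one browser per thread):

```ruby
# Split the URL list into one slice per thread, process slices
# concurrently, and collect results via a thread-safe Queue.
def scrape_in_parallel(urls, threads: 3)
  results = Queue.new
  urls.each_slice((urls.size.to_f / threads).ceil).map do |batch|
    Thread.new { batch.each { |url| results << yield(url) } }
  end.each(&:join)
  Array.new(results.size) { results.pop }
end

# Stand-in for a real request; result order is nondeterministic.
titles = scrape_in_parallel(%w[/a /b /c /d /e], threads: 2) { |u| "fetched #{u}" }
```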
Smart Rotation
Built-in configuration for rotating proxies and user-agents automatically.
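The rotation itself is just per-request sampling from a pool. A minimal sketch of the idea; the pools below are made-up placeholders, not defaults shipped with the framework:

```ruby
# Hypothetical pools; in a real spider these live in your own config,
# and the framework evaluates the lambda before each request.
USER_AGENTS = [
  "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15"
].freeze

PROXIES = %w[10.0.0.1:3128 10.0.0.2:3128].freeze

rotate_user_agent = -> { USER_AGENTS.sample }
rotate_proxy      = -> { PROXIES.sample }
```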
Auto-Healing
Handles request errors and restarts browsers upon hitting memory limits.
Project Structure
Run single-file spiders or generate full Scrapy-like projects with CLI runners.
Built-in Helpers
Export to JSON/CSV, filter duplicates, and schedule jobs with zero boilerplate.
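Duplicate filtering, for instance, reduces to remembering which (scope, value) pairs a crawl has already seen. A framework-free sketch of that idea (`DuplicateFilter` and `seen_before?` are illustrative names, not the framework's API):

```ruby
require "set"

# Remember values per scope; the first sighting returns false and
# records the value, any repeat returns true.
class DuplicateFilter
  def initialize
    @seen = Hash.new { |hash, scope| hash[scope] = Set.new }
  end

  def seen_before?(scope, value)
    @seen[scope].add?(value).nil?
  end
end

filter = DuplicateFilter.new
urls  = ["https://a.example", "https://b.example", "https://a.example"]
fresh = urls.reject { |url| filter.seen_before?(:url, url) }
# fresh keeps each URL once: ["https://a.example", "https://b.example"]
```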
Quick Start Guide
Scrape the first 3 pages of Google search results
Google Search Spider
```ruby
class GoogleSpider < Kimurai::Base
  @start_urls = ['https://www.google.com/search?q=web+scraping+ai']
  @delay = 1

  def parse(response, url:, data: {})
    results = extract(response) do
      array :organic_results do
        object do
          string :title
          string :snippet
          string :url
        end
      end

      array :sponsored_results do
        object do
          string :title
          string :snippet
          string :url
        end
      end

      array :people_also_search_for, of: :string

      string :next_page_link
      number :current_page_number
    end

    save_to 'google_results.json', results, format: :json

    if results[:next_page_link] && results[:current_page_number] < 3
      request_to :parse, url: absolute_url(results[:next_page_link], base: url)
    end
  end
end

GoogleSpider.crawl!
```
The AI-generated extraction schema, cached and reused on later runs:

```json
{
  "parse": {
    "organic_results": {
      "type": "array",
      "container_xpath": "//div[@id='rso']//div[contains(@class,'MjjYud')]",
      "items": {
        "title": { "xpath": ".//h3", "type": "string" },
        "snippet": { "xpath": ".//div[contains(@class,'VwiC3b')]", "type": "string" },
        "url": { "xpath": ".//a/@href", "type": "string" }
      }
    },
    "sponsored_results": {
      "type": "array",
      "container_xpath": "//div[@id='tads']//a",
      "items": {
        "title": { "xpath": ".//div[@role='heading']", "type": "string" },
        "url": { "xpath": "@href", "type": "string" }
      }
    },
    "people_also_search_for": {
      "type": "array",
      "container_xpath": "//div[@id='bres']//span",
      "items": { "xpath": ".", "type": "string" }
    },
    "next_page_link": { "xpath": "//a[@id='pnnext']/@href", "type": "string" },
    "current_page_number": { "xpath": "//td[contains(@class,'YyVfkd')]", "type": "number" }
  }
}
```
Example output, saved to `google_results.json`:

```json
[
  {
    "organic_results": [
      {
        "title": "5 best AI web scraper tools I'm using in 2026",
        "snippet": "5 best AI web scraper tools to use in 2026...",
        "url": "https://www.gumloop.com/blog/best-ai-web-scrapers"
      },
      {
        "title": "Browse AI: Scrape and Monitor Data from Any Website",
        "snippet": "Browse AI is the most reliable AI-powered...",
        "url": "https://www.browse.ai/"
      }
    ],
    "sponsored_results": [
      {
        "title": "Scrape Data From Any Website",
        "url": "https://www.browse.ai/"
      }
    ],
    "people_also_search_for": [
      "Web scraping ai free",
      "Browse AI",
      "Web scraping ai online"
    ],
    "next_page_link": "/search?q=web+scraping+ai&start=10",
    "current_page_number": 1
  }
  // ...
]
```
How to Run
1. Configure AI Provider
```ruby
require 'kimurai'

Kimurai.configure do |config|
  config.default_model = 'gpt-5.2'
  config.openai_api_key = ENV['OPENAI_API_KEY']
end
```
2. Run Spider
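Single-file spiders call `crawl!` at the bottom, so a plain `ruby` invocation is enough (the file name here is an assumption):

```shell
# Save the spider above as google_spider.rb, then:
ruby google_spider.rb
```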
Want more examples? Check the docs on GitHub.