gem install kimurai

AI-First Ruby Web Scraping Framework

Trusted by 1.1k+ developers

Write web scrapers in Ruby using a clean, AI-assisted DSL. Define what to scrape once, then run crawls self-hosted with minimal ongoing cost.

class HackernewsSpider < Kimurai::Base
  @start_urls = ["https://news.ycombinator.com/"]

  def parse(response, url:, data: {})
    # Vibe Scraping: let AI generate a reusable, cached extraction schema
    posts = extract(response) do
      array :posts do
        object do
          string :title
          string :url
          string :author
          string :points_count
          string :posted_at
        end
      end
    end

    save_to 'hackernews_results.json', posts, format: :json
  end
end
AI-Powered Extraction

Use Nukitori to extract structured data without selectors—just describe the schema.

JavaScript Support

Render complex JS sites out of the box with Headless Chrome or Firefox.
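
For example, switching engines is a one-line class setting (a minimal sketch; the class name, URL, and engine choice are illustrative, with @engine following Kimurai's stock options):

class ProductsSpider < Kimurai::Base
  # Render pages with Headless Chrome; :selenium_firefox and :mechanize
  # (plain HTTP, no JS) are the other stock engines
  @engine = :selenium_chrome
  @start_urls = ["https://example.com/products"]

  def parse(response, url:, data: {})
    # response is the Nokogiri document of the fully rendered page
  end
end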

Capybara DSL

Interact naturally: click_on, fill_in, and scroll using the powerful Capybara syntax.
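
Inside parse, the browser object is a live Capybara session, so interaction reads like an integration test (a sketch against a hypothetical search form; class name, URL, and field names are illustrative):

class SearchSpider < Kimurai::Base
  @engine = :selenium_chrome
  @start_urls = ["https://example.com/search"]

  def parse(response, url:, data: {})
    # browser is the Capybara session behind the current engine
    browser.fill_in "q", with: "web scraping"
    browser.click_on "Search"

    # Re-parse the page after the interaction
    response = browser.current_response
    save_to "search_results.json", { page_title: response.css("title").text }, format: :json
  end
end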

Parallel Scraping

Scale horizontally with the simple `in_parallel` method for concurrent execution.
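
For instance, a listing page can fan its links out across several concurrent browsers (a sketch; in_parallel takes a handler method, a URL array, and a thread count, while the class name, URL, and selectors are illustrative):

class BlogSpider < Kimurai::Base
  @engine = :selenium_chrome
  @start_urls = ["https://example.com/blog"]

  def parse(response, url:, data: {})
    urls = response.css("a.post-link").map { |a| absolute_url(a[:href], base: url) }

    # Spawn 3 browser instances and split the URLs between them;
    # each URL is handed to parse_post in its own thread
    in_parallel(:parse_post, urls, threads: 3)
  end

  def parse_post(response, url:, data: {})
    save_to "posts.json", { url: url, title: response.css("h1").text }, format: :json
  end
end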

Smart Rotation

Built-in configuration for rotating proxies and user-agents automatically.
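
Rotation is configured with lambdas that are re-evaluated whenever a new browser session starts, so each one can come up with a fresh identity (a sketch; USER_AGENTS and PROXIES are hypothetical pools you would fill with your own values):

class RotatingSpider < Kimurai::Base
  @engine = :selenium_chrome

  # Hypothetical pools; supply your own entries
  USER_AGENTS = ["Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"]
  PROXIES     = ["12.34.56.78:3128:http"]

  @config = {
    # Sampled each time a new browser session starts
    user_agent: -> { USER_AGENTS.sample },
    # Proxy string format: "ip:port:protocol[:user:password]"
    proxy: -> { PROXIES.sample }
  }
end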

Auto-Healing

Handles request errors and restarts browsers upon hitting memory limits.
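
Restart thresholds live in the same @config hash (a sketch using Kimurai's restart_if options; the class name and limits are illustrative):

class LongRunningSpider < Kimurai::Base
  @engine = :selenium_chrome
  @config = {
    restart_if: {
      # Restart the browser once its process passes ~350 MB...
      memory_limit: 350_000,
      # ...or after 100 requests, whichever comes first
      requests_limit: 100
    }
  }
end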

Project Structure

Run single-file spiders or generate full Scrapy-like projects with CLI runners.
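
A typical project workflow looks like this (commands follow the stock Kimurai CLI; the project and spider names are illustrative):

$ kimurai generate project web_spiders
$ cd web_spiders
$ kimurai generate spider hackernews
$ kimurai crawl hackernews
$ kimurai runner --jobs 3    # run every spider in the project, 3 at a time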

Built-in Helpers

Export to JSON/CSV, filter duplicates, and schedule jobs with zero boilerplate.
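
For example, duplicate filtering and CSV export are single calls inside parse (a sketch; unique? and save_to follow Kimurai's standard helper signatures, while the class name, URL, and selectors are illustrative):

class ArticlesSpider < Kimurai::Base
  @start_urls = ["https://example.com/articles"]

  def parse(response, url:, data: {})
    response.css("article").each do |post|
      item = {
        title: post.css("h2").text.strip,
        url:   post.at_css("a")[:href]
      }

      # Skip URLs already seen in the :articles scope during this crawl
      next unless unique?(:articles, item[:url])

      # Appends a row; the CSV header is written on the first call
      save_to "articles.csv", item, format: :csv
    end
  end
end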

Quick Start Guide

Scrape the first 3 pages of Google search results

Google Search Spider

class GoogleSpider < Kimurai::Base
  @start_urls = ['https://www.google.com/search?q=web+scraping+ai']
  @delay = 1

  def parse(response, url:, data: {})
    results = extract(response) do
      array :organic_results do
        object do
          string :title
          string :snippet
          string :url
        end
      end

      array :sponsored_results do
        object do
          string :title
          string :snippet
          string :url
        end
      end

      array :people_also_search_for, of: :string

      string :next_page_link
      number :current_page_number
    end

    save_to 'google_results.json', results, format: :json

    # Follow pagination until the first 3 result pages are scraped
    if results[:next_page_link] && results[:current_page_number] < 3
      request_to :parse, url: absolute_url(results[:next_page_link], base: url)
    end
  end
end

GoogleSpider.crawl!
How to Run

1. Configure AI Provider

require 'kimurai'

Kimurai.configure do |config|
  config.default_model = 'gpt-5.2'
  config.openai_api_key = ENV['OPENAI_API_KEY']
end

2. Run Spider

$ ruby google_spider.rb

Want more examples? Check the docs on GitHub.