AI-First
Ruby Web Scraping Framework
Write web scrapers in Ruby using a clean, AI-assisted DSL. Define what to scrape once, then run crawls self-hosted with minimal ongoing cost.
```ruby
class HackernewsSpider < Kimurai::Base
  @start_urls = ["https://news.ycombinator.com/"]

  def parse(response, url:, data: {})
    # Vibe Scraping: let AI generate a reusable, cached extraction schema
    posts = extract(response) do
      array :posts do
        object do
          string :title
          string :url
          string :author
          string :points_count
          string :posted_at
        end
      end
    end

    save_to 'hackernews_results.json', posts, format: :json
  end
end
```
```ruby
class HackernewsSpider < Kimurai::Base
  @start_urls = ["https://news.ycombinator.com/"]

  def parse(response, url:, data: {})
    # Classic mode: manual CSS/XPath selectors
    posts = response.xpath("//tr[@id='bigbox']//tr[@id]").map do |tr|
      {
        title: tr.xpath(".//span[@class='titleline']").text,
        url: tr.xpath(".//span[@class='titleline']/a/@href").to_s,
        author: tr.xpath("./following-sibling::tr[1]//a[@class='hnuser']").text,
        points_count: tr.xpath("./following-sibling::tr[1]//span[@class='score']").text,
        posted_at: tr.xpath("./following-sibling::tr[1]//span[@class='age']").text
      }
    end

    save_to 'hackernews_results.json', posts, format: :json
  end
end
```
AI-Powered Extraction
Use Nukitori to extract structured data without selectors—just describe the schema.
JavaScript Support
Render complex JS sites out of the box with Headless Chrome or Firefox.
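In classic Kimurai this is a one-line engine switch on the spider class. A config sketch (spider name and URL are placeholders, not part of the framework):

```ruby
class ProductsSpider < Kimurai::Base
  # JS rendering via a real browser: :selenium_chrome or :selenium_firefox.
  # For static HTML, :mechanize skips the browser entirely.
  @engine = :selenium_chrome
  @start_urls = ["https://example.com/products"]
end
```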
Capybara DSL
Interact naturally: click_on, fill_in, and scroll using the powerful Capybara syntax.
Parallel Scraping
Scale horizontally with the simple `in_parallel` method for concurrent execution.
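Conceptually, parallel scraping partitions the URL list across worker threads. A framework-free sketch of that idea in plain Ruby (`scrape_in_parallel` and its block are illustrative only; the framework's `in_parallel` additionally manages one browser per thread):

```ruby
# Split the URL list into one slice per thread, process slices
# concurrently, and collect results via a thread-safe Queue.
def scrape_in_parallel(urls, threads: 3)
  results = Queue.new
  urls.each_slice((urls.size.to_f / threads).ceil).map do |batch|
    Thread.new { batch.each { |url| results << yield(url) } }
  end.each(&:join)
  Array.new(results.size) { results.pop }
end

# Stand-in for a real request; result order is nondeterministic.
titles = scrape_in_parallel(%w[/a /b /c /d /e], threads: 2) { |u| "fetched #{u}" }
```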
Smart Rotation
Built-in configuration for rotating proxies and user-agents automatically.
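The rotation itself is just per-request sampling from a pool. A minimal sketch of the idea; the pools below are made-up placeholders, not defaults shipped with the framework:

```ruby
# Hypothetical pools; in a real spider these live in your own config,
# and the framework evaluates the lambda before each request.
USER_AGENTS = [
  "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15"
].freeze

PROXIES = %w[10.0.0.1:3128 10.0.0.2:3128].freeze

rotate_user_agent = -> { USER_AGENTS.sample }
rotate_proxy      = -> { PROXIES.sample }
```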
Auto-Healing
Handles request errors and restarts browsers upon hitting memory limits.
Project Structure
Run single-file spiders or generate full Scrapy-like projects with CLI runners.
Built-in Helpers
Export to JSON/CSV, filter duplicates, and schedule jobs with zero boilerplate.
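Duplicate filtering, for instance, reduces to remembering which (scope, value) pairs a crawl has already seen. A framework-free sketch of that idea (`DuplicateFilter` and `seen_before?` are illustrative names, not the framework's API):

```ruby
require "set"

# Remember values per scope; the first sighting returns false and
# records the value, any repeat returns true.
class DuplicateFilter
  def initialize
    @seen = Hash.new { |hash, scope| hash[scope] = Set.new }
  end

  def seen_before?(scope, value)
    @seen[scope].add?(value).nil?
  end
end

filter = DuplicateFilter.new
urls  = ["https://a.example", "https://b.example", "https://a.example"]
fresh = urls.reject { |url| filter.seen_before?(:url, url) }
# fresh keeps each URL once: ["https://a.example", "https://b.example"]
```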
Quick Start Guide
Scrape the first 3 pages of Google search results
Google Search Spider
```ruby
class GoogleSpider < Kimurai::Base
  @start_urls = ['https://www.google.com/search?q=web+scraping+ai']
  @delay = 1

  def parse(response, url:, data: {})
    results = extract(response) do
      array :organic_results do
        object do
          string :title
          string :snippet
          string :url
        end
      end

      array :sponsored_results do
        object do
          string :title
          string :snippet
          string :url
        end
      end

      array :people_also_search_for, of: :string

      string :next_page_link
      number :current_page_number
    end

    save_to 'google_results.json', results, format: :json

    if results[:next_page_link] && results[:current_page_number] < 3
      request_to :parse, url: absolute_url(results[:next_page_link], base: url)
    end
  end
end

GoogleSpider.crawl!
```
The AI-generated extraction schema, cached and reused on later runs:

```json
{
  "parse": {
    "organic_results": {
      "type": "array",
      "container_xpath": "//div[@id='rso']//div[contains(@class,'MjjYud')]",
      "items": {
        "title": { "xpath": ".//h3", "type": "string" },
        "snippet": { "xpath": ".//div[contains(@class,'VwiC3b')]", "type": "string" },
        "url": { "xpath": ".//a/@href", "type": "string" }
      }
    },
    "sponsored_results": {
      "type": "array",
      "container_xpath": "//div[@id='tads']//a",
      "items": {
        "title": { "xpath": ".//div[@role='heading']", "type": "string" },
        "url": { "xpath": "@href", "type": "string" }
      }
    },
    "people_also_search_for": {
      "type": "array",
      "container_xpath": "//div[@id='bres']//span",
      "items": { "xpath": ".", "type": "string" }
    },
    "next_page_link": { "xpath": "//a[@id='pnnext']/@href", "type": "string" },
    "current_page_number": { "xpath": "//td[contains(@class,'YyVfkd')]", "type": "number" }
  }
}
```
Example output, saved to `google_results.json`:

```json
[
  {
    "organic_results": [
      {
        "title": "5 best AI web scraper tools I'm using in 2026",
        "snippet": "5 best AI web scraper tools to use in 2026...",
        "url": "https://www.gumloop.com/blog/best-ai-web-scrapers"
      },
      {
        "title": "Browse AI: Scrape and Monitor Data from Any Website",
        "snippet": "Browse AI is the most reliable AI-powered...",
        "url": "https://www.browse.ai/"
      }
    ],
    "sponsored_results": [
      {
        "title": "Scrape Data From Any Website",
        "url": "https://www.browse.ai/"
      }
    ],
    "people_also_search_for": [
      "Web scraping ai free",
      "Browse AI",
      "Web scraping ai online"
    ],
    "next_page_link": "/search?q=web+scraping+ai&start=10",
    "current_page_number": 1
  }
  // ...
]
```
How to Run
1. Configure AI Provider
```ruby
require 'kimurai'

Kimurai.configure do |config|
  config.default_model = 'gpt-5.2'
  config.openai_api_key = ENV['OPENAI_API_KEY']
end
```
2. Run Spider
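Single-file spiders call `crawl!` at the bottom, so a plain `ruby` invocation is enough (the file name here is an assumption):

```shell
# Save the spider above as google_spider.rb, then:
ruby google_spider.rb
```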
Want more examples? Check the docs on GitHub.