Scrape

Scrape a website and collect the resource data.

import { Website } from '@spider-rs/spider-rs'

// pass in the website url
const website = new Website('https://rsseau.fr')

await website.scrape()

// [ { url: "https://rsseau.fr/blog", html: "<html>...</html>"}, ...]
console.log(website.getPages())
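
Once the pages are collected, a small post-processing step is often useful. The following is a minimal sketch, assuming each page has the `{ url, html }` shape shown in the comment above. `summarizePages` is a hypothetical helper, not part of `@spider-rs/spider-rs`, and the sample pages are made up so the snippet runs without crawling anything.

```javascript
// Hypothetical helper (not part of @spider-rs/spider-rs): condenses scraped
// pages into one summary row each. Assumes every page is { url, html }.
function summarizePages(pages) {
  return pages.map(({ url, html }) => {
    // Take the text of the first <title> element, if there is one.
    const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html ?? '')
    return {
      url,
      title: match ? match[1].trim() : null,
      htmlLength: (html ?? '').length,
    }
  })
}

// Made-up sample data standing in for the result of website.getPages().
const samplePages = [
  { url: 'https://example.com/a', html: '<html><head><title> Blog </title></head></html>' },
  { url: 'https://example.com/b', html: '<html></html>' },
]

console.log(summarizePages(samplePages))
```

In real use, the same function could be called as `summarizePages(website.getPages())` after `scrape()` resolves, provided the page objects do have that shape.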

Headless Chrome

Headless Chrome rendering is enabled by passing true as the third parameter to crawl or scrape. If the CHROME_URL environment variable is set, it first attempts to connect to that remotely running Chrome instance, with launching Chrome as the fallback. Using a remote connection through CHROME_URL can drastically speed up runs.

import { Website } from '@spider-rs/spider-rs'

const website = new Website('https://rsseau.fr')

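// called for each page event; report an error instead of silently dropping it
const onPageEvent = (err, value) => {
  if (err) {
    console.error(err)
    return
  }
  console.log(value)
}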

// all params are optional. The third param determines headless rendering.
await website.scrape(onPageEvent, false, true)
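
Logging every event is fine for a demo, but it is often more practical to keep the results. The following is a minimal sketch of a callback with the same `(err, value)` shape that stores what it receives. `createPageCollector` is a hypothetical helper, not part of `@spider-rs/spider-rs`, and the events are simulated here because the exact shape of `value` is not specified above.

```javascript
// Hypothetical helper (not part of @spider-rs/spider-rs): builds a callback
// with the (err, value) shape used above and keeps whatever it receives.
function createPageCollector() {
  const pages = []
  const errors = []
  const onPageEvent = (err, value) => {
    if (err) {
      errors.push(err)
      return
    }
    pages.push(value)
  }
  return { onPageEvent, pages, errors }
}

// Simulated events with made-up values, so the snippet runs without Chrome.
const collector = createPageCollector()
collector.onPageEvent(null, { url: 'https://example.com/' })
collector.onPageEvent(new Error('simulated failure'), undefined)

console.log(collector.pages.length, collector.errors.length) // prints: 1 1
```

In real use, `collector.onPageEvent` would be passed as the first argument, for example `await website.scrape(collector.onPageEvent, false, true)`, and `collector.pages` inspected afterwards.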