Scrape

We will assume that you have installed the Spider package and exported your API key as an environment variable. If you haven't, please refer to the Getting Started guide.

Scrape a website and return the content.

import { Spider } from "@spider-cloud/spider-client";

const app = new Spider();
const url = "https://spider.cloud";
const scrapedData = await app.scrapeUrl(url);
console.log(scrapedData);

The scrapeUrl method returns the content of the website in markdown format by default. Next, we will see how to scrape with different parameters.

Scrape with different parameters

The scrapeUrl method has the following parameters:

  • url (string): The URL of the website to scrape.

The following optional parameters can be set in the params object:

  • request ("http", "chrome", "smart"): The type of request to make. Default is "http".
  • return_format ("raw", "markdown", "commonmark", "html2text", "text", "bytes"): The format in which to return the scraped data. Default is "markdown".
  • stealth, anti_bot, and many other parameters, which you can find in the documentation.

import { Spider } from "@spider-cloud/spider-client";

const app = new Spider();
const url = "https://spider.cloud";
const scrapedData = await app.scrapeUrl(url, {
  return_format: "raw",
  anti_bot: true,
});
console.log(scrapedData);
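The request parameter works the same way. For example, if a page needs JavaScript rendering, you can ask for a headless Chrome request instead of the default HTTP request. The sketch below only uses the request and return_format values from the list above.

import { Spider } from "@spider-cloud/spider-client";

const app = new Spider();
const url = "https://spider.cloud";

// Render the page in a headless browser instead of using a plain HTTP request.
const scrapedData = await app.scrapeUrl(url, {
  request: "chrome",
  return_format: "markdown",
});
console.log(scrapedData);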

If you have a lot of params, setting them inline in the scrapeUrl call can be cumbersome. Instead, you can define them in a separate params variable of type SpiderParams, which is also available in the spider package. You will need TypeScript if you want type annotations.

import { Spider } from "@spider-cloud/spider-client";
import type { SpiderParams } from "@spider-cloud/spider-client/dist/config";

const app = new Spider();
const url = "https://spider.cloud";
const params: SpiderParams = {
  return_format: "raw",
  anti_bot: true,
};
const scrapedData = await app.scrapeUrl(url, params);
console.log(scrapedData);
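If you are writing plain JavaScript instead of TypeScript, the same pattern still works; you just drop the type import and annotation and lose the editor checks on the parameter names. A minimal sketch:

import { Spider } from "@spider-cloud/spider-client";

const app = new Spider();
const url = "https://spider.cloud";

// Plain JavaScript: the same params object, just without the SpiderParams annotation.
const params = {
  return_format: "raw",
  anti_bot: true,
};

const scrapedData = await app.scrapeUrl(url, params);
console.log(scrapedData);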