Scrape
Scrape a website and collect the resource data.
```python
import asyncio
from spider_rs import Website

async def main():
    website = Website("https://choosealicense.com")
    website.scrape()
    print(website.get_pages())
    # [ { url: "https://rsseau.fr/blog", html: "<html>...</html>"}, ...]

asyncio.run(main())
```
Headless Chrome
Headless Chrome rendering can be enabled by setting the third parameter of `crawl` or `scrape` to `True`.
If the `CHROME_URL` environment variable is set, the library attempts to connect to that remote Chrome instance, falling back to launching Chrome locally. Using a remote connection via `CHROME_URL` drastically speeds up runs.
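The remote-or-launch strategy described above can be sketched in plain Python. Note that `resolve_chrome_target` is a hypothetical helper for illustration only, not part of the spider_rs API:

```python
import os

# Hypothetical helper illustrating the connection strategy described above:
# prefer a remote Chrome endpoint taken from CHROME_URL, otherwise fall
# back to launching a local browser.
def resolve_chrome_target(env=os.environ):
    remote = env.get("CHROME_URL")
    if remote:
        # connect to the already-running remote Chrome instance
        return ("remote", remote)
    # no CHROME_URL set: launch Chrome locally as a fallback
    return ("launch", None)

print(resolve_chrome_target({"CHROME_URL": "http://localhost:9222"}))
# ('remote', 'http://localhost:9222')
print(resolve_chrome_target({}))
# ('launch', None)
```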
```python
import asyncio
from spider_rs import Website

async def main():
    website = Website("https://choosealicense.com")
    website.scrape(None, None, True)
    print(website.get_pages())
    # [ { url: "https://rsseau.fr/blog", html: "<html>...</html>"}, ...]

asyncio.run(main())
```