Page

A single page on a website, useful if you need just the root url.

New Page

Get a new page with content.

The first param is the url, followed by if subdomains should be included, and last to include TLD's in links.

Calling page.fetch is needed to get the content.

import asyncio
from spider_rs import Page

async def main():
    page = Page("https://choosealicense.com")
    page.fetch()

asyncio.run(main())

Page Links

get all the links related to a page.

import asyncio
from spider_rs import Page

async def main():
    page = Page("https://choosealicense.com")
    page.fetch()
    links = page.get_links()
    print(links)
asyncio.run(main())

Page Html

Get the markup for the page or HTML.

import asyncio
from spider_rs import Page

async def main():
    page = Page("https://choosealicense.com")
    page.fetch()
    links = page.get_html()
    print(links)

asyncio.run(main())

Page Bytes

Get the raw bytes of a page to store the files in a database.

import asyncio
from spider_rs import Page

async def main():
    page = Page("https://choosealicense.com")
    page.fetch()
    links = page.get_bytes()
    print(links)

asyncio.run(main())

spider-py

Page

New Page

Page Links

Page Html

Page Bytes