2022-11-13

How to access page.getBy* functions inside Crawlee

I'm using Crawlee with PlaywrightCrawler. I get a new URL to crawl after clicking a few elements on the start page. I click those elements with page.getByRole().click(), which is what Playwright's codegen generated:

import { chromium } from "@playwright/test";

const browser = await chromium.launch({
  headless: true,
});
const context = await browser.newContext();
const page = await context.newPage();

// brandSection is defined elsewhere; each entry is split on a newline
// into a brand name and a count, which together form the button's name.
for (let i = 0; i < brandSection.length; i++) {
  const [brandName, brandCount] = brandSection[i].split("\n");
  await page
    .getByRole("button", { name: `${brandName} ${brandCount}` })
    .click();
}

This works without Crawlee, but when I use the same code inside a PlaywrightCrawler, it fails with an error saying that the page instance doesn't have a function called .getByRole().
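One way to narrow this down is to check at runtime whether the page object exposes getByRole at all, and fall back to an older locator call when it doesn't. The sketch below is only an illustration under assumptions: buttonName and brandButton are hypothetical helper names, not part of Crawlee or Playwright, and the fallback assumes the Playwright version that Crawlee resolves predates getByRole (as far as I can tell, the getBy* locators arrived in Playwright 1.27). The fallback matches on the button's text content rather than its accessible name, so it is not an exact equivalent.

```javascript
// Hypothetical helper: builds the button name from a "name\ncount" entry,
// the same way the loop in the snippet above does.
function buttonName(entry) {
  const [brandName, brandCount] = entry.split("\n");
  return `${brandName} ${brandCount}`;
}

// Hypothetical helper: returns a locator for the brand button.
// Prefers page.getByRole when the page object has it; otherwise falls
// back to page.locator with the hasText filter, which older Playwright
// versions also provide.
function brandButton(page, entry) {
  const name = buttonName(entry);
  if (typeof page.getByRole === "function") {
    return page.getByRole("button", { name });
  }
  // Assumption: the button is a <button> element whose text contains the name.
  return page.locator("button", { hasText: name });
}
```

Inside the request handler this would be used as `await brandButton(page, brandSection[i]).click();`. If the fallback branch is the one that runs, that suggests the page comes from an older Playwright install rather than from a difference between "playwright" and "@playwright/test".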

import { createPlaywrightRouter, enqueueLinks, Dataset } from "crawlee";
import { PlaywrightCrawler } from "crawlee";
....
....
router.addDefaultHandler(async ({ page, request, enqueueLinks }) => {
  const prodGridSel = ".catalog-grid a";
  // the same code as in the previous snippet goes here
  await enqueueLinks({
    ...
    },
  });
...
...
const crawler = new PlaywrightCrawler({
  requestHandler: router,
});

I haven't used Playwright for testing, only for crawling with Crawlee, so I'm guessing the getBy*() functions are only available when using "@playwright/test". I didn't find any information except this, which is related to Cypress and is probably caused by a faulty import.

So, can I have a page instance inside Crawlee that has these functions?


