I'm using crawlee with PlaywrightCrawler. I get a new URL to crawl after clicking a few elements on the starting page. I click those elements with page.getByRole().click(), which is what the Playwright codegen tool generated:
import { chromium } from "@playwright/test";

const browser = await chromium.launch({
  headless: true,
});
const context = await browser.newContext();
const page = await context.newPage();

for (let i = 0; i < brandSection.length; i++) {
  let [brandName, brandCount] = brandSection[i].split("\n");
  await page
    .getByRole("button", { name: `${brandName} ${brandCount}` })
    .click();
}
This works without crawlee, but when I use the same code inside a PlaywrightCrawler, it fails, saying that the page instance doesn't have a function called .getByRole().
import { createPlaywrightRouter, PlaywrightCrawler, Dataset } from "crawlee";
....
....
router.addDefaultHandler(async ({ page, request, enqueueLinks }) => {
  const prodGridSel = ".catalog-grid a";
  //**here goes same code as the previous snippet**//
  await enqueueLinks({
    ...
  });
});
...
...
const crawler = new PlaywrightCrawler({
  requestHandler: router,
});
I haven't used Playwright for testing, only for crawling with crawlee, so I'm guessing the getBy*() functions are only available when using "@playwright/test". I didn't find any information except this, which is related to Cypress and is probably a faulty import.
So, can I have a page instance inside crawlee that has these functions?
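One way to narrow this down, offered as a minimal sketch rather than a confirmed fix: check at runtime whether the page object handed to the handler exposes getByRole at all before running the click loop. The helper names supportsGetByRole, toButtonName and clickBrandButtons are hypothetical and not part of crawlee or Playwright. The sketch assumes the method is missing because the Playwright version that crawlee resolves is older than 1.27, the release where the getBy*() locators first appeared; that cause is a guess, not something verified here.

```javascript
// Hypothetical helper: true if the page-like object exposes getByRole().
function supportsGetByRole(page) {
  return page != null && typeof page.getByRole === "function";
}

// Hypothetical helper: turns a "name\ncount" entry (the shape used in the
// question's loop) into the accessible name passed to getByRole().
function toButtonName(entry) {
  const [brandName, brandCount] = entry.split("\n");
  return `${brandName} ${brandCount}`;
}

// Clicks one button per entry, failing early with a readable message when
// getByRole() is missing instead of a bare "not a function" TypeError.
async function clickBrandButtons(page, brandSection) {
  if (!supportsGetByRole(page)) {
    throw new Error(
      "page.getByRole is not available; the installed Playwright may be older than 1.27 (assumption)"
    );
  }
  for (const entry of brandSection) {
    await page.getByRole("button", { name: toButtonName(entry) }).click();
  }
}
```

Inside the default handler this would be called as await clickBrandButtons(page, brandSection), with brandSection built the same way as in the standalone script.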