I’m trying to scrape text data from an iframe on a webpage using Playwright with Python. Using page.query_selector_all directly doesn’t work because the elements are inside the iframe.
For example, this works:
inner_text = await page.frame_locator('iframe#mypage_frm').locator('//*[@id="List"]').inner_text()
print(inner_text)
But I want to use query_selector_all to loop through multiple repeating items:
elements = await page.frame_locator('iframe#mypage_frm').query_selector_all('//*[@id="List"]/li')
for el in elements:
    text = await el.inner_text()
    print(text)
The above doesn’t work. How can I correctly use query_selector_all (or query_selector) inside an iframe?
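In Playwright, FrameLocator has no query_selector_all (or query_selector) method, which is why your second snippet fails with an AttributeError. Get the actual Frame object first and call query_selector_all on that: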
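frame = page.frame(name="mypage_frm")  # matches the iframe's name attribute (or its id if name is empty); url= also works, a CSS selector does not
if frame is None:  # page.frame() is not awaited and returns None when nothing matches
    raise RuntimeError("iframe not found (not attached yet?)")
elements = await frame.query_selector_all('#List li')  # CSS selector works; the XPath from the question works too
for el in elements:
    text = await el.inner_text()
    print(text)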
I switched from frame_locator to frame() in one of my scraping scripts, and it let me use query_selector_all normally inside the iframe.
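page.frame() only looks frames up by name= or url=, so if all you have is a CSS selector for the iframe, one option is to take the iframe element's ElementHandle and call content_frame() on it. This is a minimal sketch assuming the iframe#mypage_frm / #List markup from the question; the helper names frame_from_selector, list_item_texts and main are hypothetical, and the srcdoc page loaded in main() is a stand-in, not the real site.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; no runtime dependency on Playwright here
    from playwright.async_api import Frame, Page


async def frame_from_selector(page: "Page", iframe_selector: str) -> "Frame":
    """Resolve the Frame rendered by the <iframe> matched by a CSS selector.

    page.frame() only matches by name= or url=, so for a CSS selector we take
    the iframe's ElementHandle and ask it for its content frame.
    """
    # state="attached": don't require the iframe element itself to be visible
    iframe_handle = await page.wait_for_selector(iframe_selector, state="attached")
    frame = await iframe_handle.content_frame()
    if frame is None:
        raise RuntimeError(f"{iframe_selector!r} did not resolve to a frame")
    return frame


async def list_item_texts(
    page: "Page",
    iframe_selector: str = "iframe#mypage_frm",
    item_selector: str = "#List > li",  # direct <li> children, like the question's XPath
) -> list[str]:
    """Collect inner_text() of every matching item inside the iframe."""
    frame = await frame_from_selector(page, iframe_selector)
    # The iframe's document may still be loading; wait until one item exists.
    await frame.wait_for_selector(item_selector, state="attached")
    texts = []
    for el in await frame.query_selector_all(item_selector):
        texts.append(await el.inner_text())
    return texts


async def main() -> None:
    # Imported here so the helpers above can be reused without launching a browser.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        # Stand-in page (hypothetical markup): an iframe whose srcdoc holds the list.
        await page.set_content(
            '<iframe id="mypage_frm" '
            'srcdoc="<ul id=List><li>first</li><li>second</li></ul>"></iframe>'
        )
        print(await list_item_texts(page))
        await browser.close()
```

Run the demo with asyncio.run(main()); for a real site, replace set_content() with page.goto() to your URL.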
If you just need all texts at once, you can stick with FrameLocator and locator():
texts = await page.frame_locator('iframe#mypage_frm').locator('#List li').all_inner_texts()
print(texts)
I found this simpler when I had multiple <li> items to extract; it avoids manually looping over each element.
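One caveat with all_inner_texts(): it returns whatever matches at the moment it is called instead of auto-waiting, so if the iframe content loads late you can get an empty list. A minimal sketch, assuming the same iframe#mypage_frm and #List selectors (iframe_item_texts is a hypothetical name), waits for the first item to become visible before reading them all:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only
    from playwright.async_api import Page


async def iframe_item_texts(
    page: "Page",
    iframe_selector: str = "iframe#mypage_frm",
    item_selector: str = "#List > li",
) -> list[str]:
    """Return the inner texts of all matching items inside the iframe."""
    items = page.frame_locator(iframe_selector).locator(item_selector)
    # all_inner_texts() does not wait, so block until the first item is visible;
    # otherwise a slow iframe can yield [].
    await items.first.wait_for()
    return await items.all_inner_texts()
```

If the items can be hidden, wait_for(state="attached") is the less strict choice.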
If you want to stick with your frame_locator style but iterate individually:
frame_locator = page.frame_locator('iframe#mypage_frm')
elements = await frame_locator.locator('#List li').element_handles()
for el in elements:
    text = await el.inner_text()
    print(text)
Using element_handles() here gives you the actual ElementHandle objects, which you can loop through just like query_selector_all. I use this approach when I need both text and other properties from each item.
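Playwright's documentation generally recommends locators over ElementHandles. If your Playwright version has Locator.all() (added in 1.29), you can get one Locator per item instead and read text and attributes from each. This is a sketch under that assumption; iframe_items is a hypothetical name, and the class attribute is only an example of an "other property":

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:  # type hints only
    from playwright.async_api import Page


async def iframe_items(
    page: "Page",
    iframe_selector: str = "iframe#mypage_frm",
    item_selector: str = "#List > li",
) -> list[dict[str, Optional[str]]]:
    """Return text plus one attribute per item, iterating Locators instead of handles."""
    items = page.frame_locator(iframe_selector).locator(item_selector)
    # all() returns the current matches immediately, so wait for the first one.
    await items.first.wait_for()
    results = []
    for item in await items.all():  # each entry is a Locator for one <li>
        results.append(
            {
                "text": await item.inner_text(),
                # "class" is just an example; get_attribute() returns None if absent.
                "class": await item.get_attribute("class"),
            }
        )
    return results
```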