In this post, I will highlight a few ways to save images while scraping the web through a headless browser. The simplest solution would be to extract the image URLs from the headless browser and then download them separately, but what if that’s not possible? Perhaps the images you need are generated dynamically, or you’re visiting a website that only serves images to logged-in users. Maybe you just don’t want to put unnecessary strain on the site’s servers by requesting each image multiple times. Whatever your motivation, there are plenty of options at your disposal.

The techniques covered in this post are roughly split into those that execute JavaScript on the page and those that try to extract a cached or in-memory version of the image. I will use Puppeteer (a JavaScript browser automation framework that uses the DevTools Protocol to drive a bundled version of Chromium), but you should be able to achieve similar results with other headless technologies, like Selenium.
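As a baseline, here is a rough sketch of the simplest approach mentioned above: collect the image URLs in the headless browser, then download each one separately. The names `saveImages` and `fileNameFromUrl` are hypothetical, made up for this illustration, and the sketch assumes Node 18+ (for the global `fetch`) with the `puppeteer` package installed.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Hypothetical helper: build a local file name from an image URL.
// The index prefix keeps two images with the same base name from colliding.
function fileNameFromUrl(imageUrl, index) {
  const base = path.posix.basename(new URL(imageUrl).pathname) || 'image';
  return `${index}-${base}`;
}

// Hypothetical function: gather every <img> src on a page, then fetch each URL separately.
async function saveImages(pageUrl, outDir) {
  // Loaded lazily so the helper above works even without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(pageUrl, { waitUntil: 'networkidle2' });

    // This callback runs inside the page; img.src is already an absolute URL there.
    const urls = await page.$$eval('img', (imgs) => imgs.map((img) => img.src));

    await fs.mkdir(outDir, { recursive: true });
    for (const [i, url] of urls.entries()) {
      if (!url.startsWith('http')) continue; // skip data: URLs and empty src values
      const res = await fetch(url); // a second request, made outside the browser
      if (!res.ok) continue;
      const buf = Buffer.from(await res.arrayBuffer());
      await fs.writeFile(path.join(outDir, fileNameFromUrl(url, i)), buf);
    }
  } finally {
    await browser.close();
  }
}
```

Calling `saveImages('https://example.com', './images')` (a placeholder URL) would write whatever images it can fetch into `./images`. Note that each download is a fresh request that does not share the browser’s session, which is exactly why this approach falls short for dynamically generated images or pages behind a login.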