Using Proxy with Puppeteer and Headless Chrome

How can I use a proxy with Puppeteer and headless Chrome? I’m trying to configure a proxy with Puppeteer, but my current approach isn’t working. Here’s the code I’m using:

const puppeteer = require('puppeteer');

(async () => {
  const argv = require('minimist')(process.argv.slice(2));

  const browser = await puppeteer.launch({
    args: ["--proxy-server=${argv.proxy}", "--no-sandbox", "--disable-setuid-sandbox"]
  });
  const page = await browser.newPage();

  await page.setJavaScriptEnabled(false);
  await page.setUserAgent(argv.agent);
  await page.setDefaultNavigationTimeout(20000);
  
  try {
    await page.goto(argv.page);
    const bodyHTML = await page.evaluate(() => new XMLSerializer().serializeToString(document));
    const body = bodyHTML.replace(/\r|\n/g, '');
    console.log(body);
  } catch (e) {
    console.log(e);
  }
  
  await browser.close();
})();

How do I properly configure a proxy with Puppeteer and headless Chrome?

Hey, great question! I’ve had some experience with this. The likely culprit is the quoting in your launch arguments. In JavaScript, ${...} is only interpolated inside a template literal (backticks). With double quotes, Chrome receives the literal text --proxy-server=${argv.proxy}, so no usable proxy is configured. Switch that argument to backticks, and keep the flag as a single token with no spaces around the equals sign. Your script would then look like this:

const puppeteer = require('puppeteer');

(async () => {
  const argv = require('minimist')(process.argv.slice(2));

  const browser = await puppeteer.launch({
    args: [`--proxy-server=${argv.proxy}`, "--no-sandbox", "--disable-setuid-sandbox"]
  });
  const page = await browser.newPage();

  await page.setJavaScriptEnabled(false);
  await page.setUserAgent(argv.agent);
  await page.setDefaultNavigationTimeout(20000);

  try {
    await page.goto(argv.page);

    const bodyHTML = await page.evaluate(() => new XMLSerializer().serializeToString(document));
    const body = bodyHTML.replace(/\r|\n/g, '');
    console.log(body);
  } catch (e) {
    console.log(e);
  }

  await browser.close();
})();

That should get you on the right path!
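
To see why the quoting matters, here is a minimal sketch that runs without a browser. The proxy address and the buildLaunchArgs helper are made up for illustration; only the --proxy-server flag and the two sandbox flags come from the script above.

```javascript
// Hypothetical proxy address, for illustration only.
const proxy = 'http://127.0.0.1:8080';

// Double quotes: ${proxy} is NOT interpolated, so Chrome would get this text verbatim.
const quoted = "--proxy-server=${proxy}";

// Backticks (template literal): ${proxy} is replaced with the variable's value.
const templated = `--proxy-server=${proxy}`;

// Hypothetical helper: build the launch args and fail early if no proxy was passed,
// instead of silently launching with "--proxy-server=undefined".
function buildLaunchArgs(proxyServer) {
  if (typeof proxyServer !== 'string' || proxyServer.length === 0) {
    throw new Error('Missing proxy, e.g. --proxy=http://127.0.0.1:8080');
  }
  return [`--proxy-server=${proxyServer}`, '--no-sandbox', '--disable-setuid-sandbox'];
}

console.log(quoted);
console.log(templated);
console.log(buildLaunchArgs(proxy));
```

The array returned by buildLaunchArgs(argv.proxy) could be passed as args to puppeteer.launch().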

In my experience working with puppeteer proxy configurations, there is one more step if your proxy requires authentication. Chrome generally does not accept a username and password embedded in the --proxy-server value, so keep the proxy address in the launch arguments and supply the credentials separately with page.authenticate() before you navigate. Here’s an example of how you can handle it:

const puppeteer = require('puppeteer');

(async () => {
  const argv = require('minimist')(process.argv.slice(2));

  const browser = await puppeteer.launch({
    // The proxy address (without credentials) still goes in the launch arguments
    args: [`--proxy-server=${argv.proxy}`, "--no-sandbox", "--disable-setuid-sandbox"]
  });
  const page = await browser.newPage();

  // Credentials used to answer the proxy's authentication challenge
  await page.authenticate({ username: 'your-username', password: 'your-password' });

  await page.setJavaScriptEnabled(false);
  await page.setUserAgent(argv.agent);
  await page.setDefaultNavigationTimeout(20000);

  try {
    await page.goto(argv.page);

    const bodyHTML = await page.evaluate(() => new XMLSerializer().serializeToString(document));
    const body = bodyHTML.replace(/\r|\n/g, '');
    console.log(body);
  } catch (e) {
    console.log(e);
  }

  await browser.close();
})();

This way, you have more flexibility when dealing with authenticated proxies.
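
If your provider hands you a single URL with the credentials embedded, a small helper can split it into the part for --proxy-server and the part for page.authenticate(). This is only a sketch: splitProxyUrl and the example URL are hypothetical, and it assumes the proxy URL parses with Node's built-in URL class.

```javascript
// Hypothetical helper: split "scheme://user:pass@host:port" into the pieces
// that --proxy-server and page.authenticate() each expect.
function splitProxyUrl(proxyUrl) {
  const parsed = new URL(proxyUrl); // WHATWG URL, built into Node.js
  const server = `${parsed.protocol}//${parsed.host}`;
  // URL keeps credentials percent-encoded, so decode them before use.
  const credentials = parsed.username
    ? {
        username: decodeURIComponent(parsed.username),
        password: decodeURIComponent(parsed.password),
      }
    : null;
  return { server, credentials };
}

// Made-up proxy URL, for illustration only.
const { server, credentials } = splitProxyUrl('http://alice:s3cret@proxy.example.com:8080');
console.log(server);
console.log(credentials);
```

You would then pass server to --proxy-server and, when credentials is not null, pass it to page.authenticate() before calling page.goto().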

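Both great suggestions so far! I’ve also worked a lot with puppeteer proxy setups, and for finer control over individual requests you can add request interception. One caveat: interception does not route traffic through a proxy by itself. The browser still sends each request through whatever --proxy-server it was launched with, or directly if none was set. What interception gives you is a hook to inspect every request and then continue it (optionally with modified headers), abort it, or answer it yourself. A skeleton looks like this: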

const puppeteer = require('puppeteer');

(async () => {
  const argv = require('minimist')(process.argv.slice(2));

  const browser = await puppeteer.launch({
    args: ["--no-sandbox", "--disable-setuid-sandbox"]
  });
  const page = await browser.newPage();

  // Set up request interception
  await page.setRequestInterception(true);
  page.on('request', request => {
    // Inspect request.url(), request.resourceType(), request.headers(), etc. here
    request.continue({
      // Optional overrides go here, e.g. headers, method, postData, or url
    });
  });

  await page.setJavaScriptEnabled(false);
  await page.setUserAgent(argv.agent);
  await page.setDefaultNavigationTimeout(20000);

  try {
    await page.goto(argv.page, { waitUntil: 'networkidle2' });

    const bodyHTML = await page.evaluate(() => new XMLSerializer().serializeToString(document));
    const body = bodyHTML.replace(/\r|\n/g, '');
    console.log(body);
  } catch (e) {
    console.log(e);
  }

  await browser.close();
})();

This gives you per-request control on top of whichever proxy the browser was launched with, which is handy if you want to adjust headers or block certain requests while the page loads.
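
As a rough illustration of what such a handler might do, the sketch below blocks a few resource types and adds a header to everything else. The blocked types, the header name, and the fakeRequest stand-in are all made up for the example. The handler itself only relies on resourceType(), headers(), abort(), and continue() from Puppeteer's request object.

```javascript
// Hypothetical policy: resource types to drop before they hit the network.
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

// Handler intended for page.on('request', handleRequest).
function handleRequest(request) {
  if (BLOCKED_TYPES.has(request.resourceType())) {
    return request.abort();
  }
  return request.continue({
    // Keep the original headers and add one hypothetical extra header.
    headers: { ...request.headers(), 'x-example-header': 'demo' },
  });
}

// Stand-in for a Puppeteer request so the logic can be tried without a browser.
function fakeRequest(type) {
  const calls = [];
  return {
    calls,
    resourceType: () => type,
    headers: () => ({ accept: '*/*' }),
    abort: () => calls.push(['abort']),
    continue: (overrides) => calls.push(['continue', overrides]),
  };
}

const img = fakeRequest('image');
handleRequest(img);
const doc = fakeRequest('document');
handleRequest(doc);
console.log(img.calls, doc.calls);
```

With a real page, you would call await page.setRequestInterception(true) and then register the handler with page.on('request', handleRequest).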