Axios is one of the most commonly used Node.js libraries for web scraping. It enables developers to download the contents of websites, which can later be parsed with a library like Cheerio.
By default, Axios exposes your IP address when you connect to a website. This can lead to repercussions such as an IP ban. To avoid that, web scrapers use proxies—servers that act as middlemen between the client and the server. They help you hide your web scraping activities and protect your IP address.
First, make sure you have Node.js installed on your device. If you don’t, you can use the official instructions to download and install it.
ℹ️ If you already have NVM installed, you can simply install a recent Node.js version:
nvm install 22.20
Then, create a new folder called axios_proxy and navigate into it. Run the npm init command to create a new Node.js project. Finally, create a file called index.js and open it in your favorite code editor.
mkdir axios_proxy
cd axios_proxy
npm init
touch index.js
After that, install Axios and Cheerio with the following terminal command. Axios will handle the web requests, while Cheerio will parse the downloaded HTML in the scraping examples.
npm i axios cheerio --save
You can find an example package.json file here.
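For reference, after running the commands above, your package.json might look roughly like this (the exact version numbers will differ depending on when you install):
{
  "name": "axios_proxy",
  "version": "1.0.0",
  "main": "index.js",
  "dependencies": {
    "axios": "^1.7.0",
    "cheerio": "^1.0.0"
  }
}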
Here’s how a basic HTTP request made with Axios looks.
const axios = require('axios');

axios.get('https://quotes.toscrape.com/')
  .then((r) => {
    // Print the raw HTML of the page.
    console.log(r.data);
  });
All the code above does is request the contents of a web page (in this case, Quotes to Scrape) and print it out.
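Keep in mind that the promise returned by axios.get() rejects if the request fails (for example, on a network error or a non-2xx status code). Adding a .catch handler keeps an unhandled rejection from crashing the script:
axios.get('https://quotes.toscrape.com/')
  .then((r) => {
    console.log(r.data);
  })
  .catch((err) => {
    // Log the failure instead of letting the rejection go unhandled.
    console.error('Request failed:', err.message);
  });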
With Cheerio, you can parse the response to extract the necessary information.
const axios = require('axios');
const cheerio = require('cheerio');

axios.get('https://quotes.toscrape.com/')
  .then((r) => {
    // Load the HTML into Cheerio and select every quote block on the page.
    const $ = cheerio.load(r.data);
    const quote_blocks = $('.quote');
    // Extract the text and author from each quote block.
    const quotes = quote_blocks.map((_, quote_block) => {
      const text = $(quote_block).find('.text').text();
      const author = $(quote_block).find('.author').text();
      return { text: text, author: author };
    }).toArray();
    console.log(quotes);
  });
A working example can be found here.
Quotes to Scrape is built for web scraping practice, so it won’t block you. But a website like Amazon will make you fill out CAPTCHAs or log in if it detects that a lot of requests have been made from the same address. For this reason, it’s useful to add a proxy to your Axios requests.
To use a proxy server with Axios, you need to create a new variable that holds the value for the protocol, IP address, and port of the proxy that you want to connect to.
const proxy = {
  protocol: 'http',
  host: '176.193.113.206',
  port: 8989
}
After that, you can use the proxy as an additional argument in the axios.get() request.
axios.get('https://quotes.toscrape.com/', { proxy: proxy })
Requests will now be funneled through the proxy that you provided.
A working example can be found here.
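A quick way to check that the proxy is actually in use is to request a service that echoes the caller’s IP address, such as httpbin.org (assuming it’s reachable through your proxy):
axios.get('https://httpbin.org/ip', { proxy: proxy })
  .then((r) => {
    // Should print the proxy's IP address, not your own.
    console.log(r.data);
  });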
There are two ways to find a proxy to connect to.
You can either scour the internet for lists of free proxy servers or pay for access to a professional proxy server.
In the first case, the proxy servers that you will find will most likely be slow, unsafe, unreliable, and also will provide you with just one IP address. These issues can be addressed, but it takes quite a bit of time and expertise.
In the second case, the proxy service provider will give you a secure endpoint to connect to. It will also provide proxy rotation by default. This means it can change your IP address on every request, masking both your identity and the fact that any scraping is being done on the page at all!
If you are looking for a reliable provider that offers hard-to-detect proxies ethically sourced from real devices all around the world, take a look at IPRoyal residential proxies. They are easy to set up and use, and the next section will show an example of how you can use them with Axios.
If you buy proxy services, the provider should supply you with all the available information: the host, port, username, and password you need to connect to the proxy.
For example, if you use IPRoyal residential proxies, you can find the necessary information in your dashboard.
Now, you can put all the information provided into a proxy variable.
const proxy = {
  protocol: 'http',
  host: 'geo.iproyal.com',
  port: 12321,
  auth: {
    username: 'cool username',
    password: 'cool password'
  }
}
After that, you can use the proxy as an argument in your axios.get() calls.
axios.get('https://quotes.toscrape.com/', { proxy: proxy })
  .then((r) => {
    // ... handle the response here
  });
Here’s an example of how a small web scraping script using an authenticated proxy and Axios might look:
const axios = require('axios');
const cheerio = require('cheerio');

const proxy = {
  protocol: 'http',
  host: 'geo.iproyal.com',
  port: 12321,
  auth: {
    username: 'cool username',
    password: 'cool password'
  }
}

axios.get('https://quotes.toscrape.com/', { proxy: proxy })
  .then((r) => {
    const $ = cheerio.load(r.data);
    const quote_blocks = $('.quote');
    const quotes = quote_blocks.map((_, quote_block) => {
      const text = $(quote_block).find('.text').text();
      const author = $(quote_block).find('.author').text();
      return { text: text, author: author };
    }).toArray();
    console.log(quotes);
  });
You can find the above example here.
Storing sensitive information, such as the username and password you use for a proxy, directly in code is insecure. If you accidentally share the file with another person or push it to a public GitHub repository, the credentials will be exposed.
To avoid that, this information is usually stored in environment variables: user-defined values that are accessible to programs running on the computer.
Using the terminal, you can define HTTP_PROXY and HTTPS_PROXY environment variables that include the link to your proxy, which includes host, port, and (optionally) authentication details.
If you’re using IPRoyal residential proxies, this link is accessible in your dashboard.
Copy the link and set it as an environment variable for both HTTP and HTTPS. If you’re using Windows (Command Prompt), use the following commands:
set HTTP_PROXY=http://username:password@host:port
set HTTPS_PROXY=http://username:password@host:port
If you’re using Linux or macOS, you need to use the export command instead of set:
export HTTP_PROXY=http://username:password@host:port
export HTTPS_PROXY=http://username:password@host:port
If you run your Axios web scraping script using this terminal, it will use the defined proxy by default. Axios automatically looks for HTTP_PROXY and HTTPS_PROXY environment variables and uses those for proxies if it finds any.
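Alternatively, if you’d rather configure the proxy per request than rely on the global environment proxy, you can store the credentials in environment variables of your own and build the proxy object from them. The variable names below are arbitrary examples, not a convention Axios recognizes:
const proxy = {
  protocol: 'http',
  host: process.env.PROXY_HOST,
  port: Number(process.env.PROXY_PORT),
  auth: {
    // These variable names are examples; define whichever names you prefer.
    username: process.env.PROXY_USERNAME,
    password: process.env.PROXY_PASSWORD
  }
}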
If you have a number of proxy servers at your disposal and you’re not using a professional proxy service that provides rotation by default, you can stitch together a working solution that picks a random proxy from a list for every request.
First, create an array with multiple proxies:
const proxies = [
  {
    protocol: 'http',
    host: '128.172.183.18',
    port: 8000
  },
  {
    protocol: 'http',
    host: '18.4.13.6',
    port: 8080
  },
  {
    protocol: 'http',
    host: '65.108.34.224',
    port: 8888
  }
]
Then, create a function that chooses one of the proxies at random:
function get_random_proxy(proxies) {
  // Pick a uniformly random entry from the proxies array.
  return proxies[Math.floor(Math.random() * proxies.length)];
}
Now, you can call it on the proxies array to route each request through a randomly chosen proxy:
axios.get('https://quotes.toscrape.com/', { proxy: get_random_proxy(proxies) })
  .then((r) => {
    // ... handle the response
  });
You can find the above example here.
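Free proxies fail often, so it’s also worth retrying a failed request through a different randomly chosen proxy. Here’s a minimal sketch of that idea (the function name and retry count are illustrative, not part of the examples above):
async function get_with_retry(url, proxies, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      // Each attempt goes through a freshly picked random proxy.
      return await axios.get(url, { proxy: get_random_proxy(proxies) });
    } catch (err) {
      // Rethrow only after the last attempt fails.
      if (i === attempts - 1) throw err;
    }
  }
}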
Using proxy services is a great way to enable large-scale web scraping, since they let you hide scraping activity from website administrators. If you’re using Axios, it’s quite easy to set up both unauthenticated and authenticated proxies for your web scraping projects.