---
Title: wget/curl
Date: 2022-07-25 13:45 CEST
Author: Fabrice
Category: cheat sheets
Tags: wget, curl, cli
Slug: wget-curl
Header_Cover: images/covers/speedboat.jpg
Summary: Some useful wget and curl commands, such as downloading a repository.
Lang: en
---
# wget or curl?
`wget` is a tool to download contents from the command line.
In its basic form, it allows downloading a file quite easily just by typing `wget <url>` in your favorite terminal.
However, a quick look at the [man](https://www.gnu.org/software/wget/manual/wget.html) page directly shows how powerful this tool is.
Similarly, `curl` is another tool to handle internet requests; however, a look at the [man](https://curl.haxx.se/docs/manpage.html) page shows that it supports many more protocols than `wget`, which only handles HTTP(S) and FTP requests.
On the other hand, `wget` can follow links (recursively), apply filters to your requests, transform relative links,…
Thus, they don't cover the same area of usage (even if the intersection is non-empty).
In short, `wget` proves useful whenever you have to download part of a website while following links, whereas `curl` is very handy to tweak single requests in an atomic fashion.
Moreover, if you want to analyze web requests, Firefox and Chromium (I didn't try other browsers) allow exporting requests directly as a `curl` command from the web inspector, which makes the job less painful than with [netcat](https://en.wikipedia.org/wiki/Netcat).
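For illustration, an exported request typically looks something like the following (the URL, headers, and cookie value are made up):
```sh
# Roughly what “Copy as cURL” gives you from the web inspector
# (hypothetical URL, headers and cookie, for illustration only)
curl 'https://example.com/api/search?q=wget' \
  -H 'User-Agent: Mozilla/5.0' \
  -H 'Accept: application/json' \
  -H 'Cookie: session=0123456789abcdef' \
  --compressed
```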
To conclude, I'm definitely not a `wget`/`curl` power user, so there may be very basic stuff here, as I'm not using those tools on a daily basis.
Anyway, as I said, this section is to help me remember these commands to [reduce my google requests](https://degooglisons-internet.org/en/).
# wget
## Download a full repository
Download a repository while selecting specific files:
```sh
wget --recursive --no-parent --no-host-directories --cut-dirs=<n> --accept <extension list> <url>
```
Where `<n>` denotes the number of subdirectories to omit from saving. For instance, to download the cover images from this blog at the address “<https://blog.epheme.re/images/covers/>”, you can run:
```sh
wget -r -np -nH --cut-dirs=2 -A jpg https://blog.epheme.re/images/covers/
```
Anyhow, a simpler method, if you don't need the directory structure (as in the above example), is to use the `--no-directories`/`-nd` option. However, `--cut-dirs` can be useful if you need to keep part of the hierarchy (e.g., if the files are sorted in directories by date or category).
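For instance, the cover-image download above, flattened into the current directory, could look like this (same example URL as before):
```sh
# Same download as above, but without recreating any directory structure
wget -r -np -nd -A jpg https://blog.epheme.re/images/covers/
```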
To reject some documents, you can also use the `--reject`/`-R` option, which accepts file name suffixes and wildcard patterns; for regular expressions, use `--reject-regex` (whose type can be specified using `--regex-type`).
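A sketch of both variants (the patterns are only examples):
```sh
# Reject files by suffix or wildcard pattern
wget -r -np -R "*.pdf,*.zip" <url>
# Reject URLs matching a regular expression (POSIX by default, PCRE also possible)
wget -r -np --reject-regex ".*/drafts/.*" --regex-type posix <url>
```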
## Mirror a website
Another common use of `wget` is to make a local copy of a website. To do this, the long version is:
```sh
wget --mirror --no-host-directories --convert-links --adjust-extension --page-requisites --no-parent <url>
```
The option names are quite straightforward, and the shortened version is: `wget -mkEp -np -nH <url>`
### Ignoring robots.txt
Sometimes, [robots.txt](https://en.wikipedia.org/wiki/Robots_exclusion_standard) forbids you access to some resources. You can easily bypass this with the option `-e robots=off`.
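For instance, combined with the mirroring command above:
```sh
# Mirror a website while ignoring robots.txt directives
wget -mkEp -np -nH -e robots=off <url>
```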
### Number of tries
Occasionally, when the server is slow to answer, `wget` will try again and again (20 times by default), which can slow down your mirroring quite a bit (especially if the timeout is long). You can lower this bound using the… `--tries`/`-t` option.
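For example, to give up faster while mirroring (the values below are arbitrary):
```sh
# Retry each URL at most 3 times, with a 10-second timeout per attempt
wget -mkEp -np -nH --tries=3 --timeout=10 <url>
```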
## Finding 404 on a website
With the `--spider` option, `wget` checks that pages exist without actually downloading them, so you can use it as a broken-link checker for your website, together with `--output-file`/`-o` to log the results to a file.
```sh
wget --spider -r -nd -o <logfile> <url>
```
The list of broken links is then summarized at the end of the log file.
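If you only want the offending URLs, one quick (and admittedly crude) way is to grep the log for the 404 responses, e.g.:
```sh
# Show 404 responses with a few lines of context to see which URL failed
# (the exact wording depends on the wget version and locale)
grep -B 5 "404 Not Found" <logfile>
```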
# curl
## Send a POST request
My most frequent use of `curl` is to send POST requests to different kinds of APIs. The syntax is quite simple using the `--form`/`-F` option:
```sh
curl -F <field1>=<content1> -F <field2>=<content2> <url>
```
Note that to send a file, precede the filename with an `@`:
```sh
curl -F picture=@face.jpg <url>
```
<!-- vim: spl=en
-->