Download web pages recursively under a URL [1]

 wget \
 --recursive \
 --no-clobber \
 --page-requisites \
 --adjust-extension \
 --convert-links \
 --restrict-file-names=windows \
 --domains \
 -nH --cut-dirs=some_subdir \
 -e robots=off \
 --random-wait \
 --wait 5 \
 --no-parent \

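  • Fill in the domain (the argument of --domains) and the starting URL (the last argument, after --no-parent) for your own site.
  • -r --recursive: follow links recursively to download the site (the default maximum depth is 5 levels; change it with -l).
  • -D --domains: only follow links within the given comma-separated list of domains.
  • -np --no-parent: never ascend to the parent directory; only files below the starting directory are retrieved.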
  • -p --page-requisites: get all the elements that compose the page (images, CSS and so on).
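  • -E --adjust-extension: append .html (or .css) to saved files whose content type is HTML (or CSS) but whose URL lacks that suffix, so they open correctly offline.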
  • -k --convert-links: convert links so that they work locally, off-line.
  • --restrict-file-names=windows: modify filenames so that they will work in Windows as well.
  • -nc --no-clobber: don't overwrite files that already exist (useful when an interrupted download is resumed). Note that recent wget versions ignore -nc when -k is also given, and print a warning saying so.
  • -e robots=off: force crawling regardless of robots.txt setting.
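  • -nH --cut-dirs=some_subdir: -nH (--no-host-directories) stops wget from creating a directory named after the host; --cut-dirs expects a number, the count of leading remote directories to drop (e.g. --cut-dirs=1 for a single subdirectory), so replace some_subdir with that number.
  • --random-wait: randomize the pause between requests to between 0.5 and 1.5 times the value given by --wait (with --wait 5, each pause lasts between 2.5 and 7.5 seconds).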
  • -w --wait=5: number of seconds to wait between requests. (See --random-wait.)
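If you mirror different sites often, the domain, the start URL and the --cut-dirs count can all be derived from the URL itself. Below is a minimal bash sketch, assuming bash 4+, a URL without port or user info, and directory URLs that end in "/". The helper names url_host, url_depth and mirror_args are hypothetical and not part of wget; the options themselves are the ones from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: derive the values the wget command above needs from one start URL.
# Assumptions (not from the original post): bash 4+, a URL without port or
# user info, and directory URLs that end in "/". Helper names are hypothetical.

# Host part of a URL: https://example.com/docs/ -> example.com
url_host() {
  local rest=${1#*://}          # drop the scheme, if any
  printf '%s\n' "${rest%%/*}"   # keep everything before the first "/"
}

# Number of directory components in the URL path, i.e. the --cut-dirs value
# that (together with -nH) saves files straight into the current directory:
# https://example.com/docs/guide/ -> 2, https://example.com/ -> 0
url_depth() {
  local rest=${1#*://}
  local path=""
  [[ $rest == */* ]] && path=${rest#*/}    # path without the leading "/"
  path=${path%%[?#]*}                      # drop query string and fragment
  if [[ $path == */* ]]; then
    path=${path%/*}                        # drop the trailing file name, if any
  else
    path=""                                # a bare file name sits at the root
  fi
  local -a parts=()
  IFS=/ read -ra parts <<< "$path"
  local part depth=0
  for part in "${parts[@]}"; do
    if [[ -n $part ]]; then depth=$((depth + 1)); fi   # skip empty "//" segments
  done
  printf '%s\n' "$depth"
}

# Print the options from the post, one argument per line, with the
# domain, --cut-dirs count and start URL filled in.
mirror_args() {
  local url=$1 host depth
  host=$(url_host "$url")
  depth=$(url_depth "$url")
  printf '%s\n' \
    --recursive --no-clobber --page-requisites --adjust-extension \
    --convert-links --restrict-file-names=windows \
    --domains "$host" -nH --cut-dirs="$depth" \
    -e robots=off --random-wait --wait 5 --no-parent "$url"
}

# Usage:
#   mapfile -t args < <(mirror_args "https://example.com/docs/")
#   wget "${args[@]}"
```

Printing the arguments one per line instead of calling wget directly makes the command easy to inspect before starting a long crawl, and reading them back with mapfile keeps each argument intact without relying on word splitting.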


  1. Downloading an Entire Web Site with wget. 2008.