How to Download & Install NLTK

Smart Summary

Download and Install NLTK on Windows, Mac, or Linux by installing Python first, then adding the Natural Language Toolkit through pip or Anaconda and downloading the corpus datasets.

  • Requirement: Install Python before adding NLTK.
  • Install: Use pip, easy_install, or Anaconda.
  • Datasets: Run nltk.download() to fetch corpora.
  • Verify: Run import nltk in the Python shell.
  • AI Use: Tokenization and tagging for NLP pipelines.

Installing NLTK in Windows

Learn how to set up NLTK on Windows from the command prompt. The instructions below assume Python is not installed yet, so the first step is to install Python.

Installing Python in Windows

Step 1) Open the link https://www.python.org/downloads/, and select the latest Windows release.

Note: For an older version, visit the Downloads tab to see all releases.

Step 2) Click the downloaded installer file.

Step 3) Select Customize Installation.

Step 4) Click NEXT.

Step 5) On the next screen:

  1. Select the advanced options.
  2. Provide a custom install location. In this example, a folder on the C drive is chosen for easier access.
  3. Click Install.

Step 6) Click the Close button once the install finishes.

Step 7) Copy the path of your Scripts folder.

Step 8) In the Windows command prompt:

  • Navigate to the Scripts folder copied in Step 7, which contains pip.
  • Enter the command to install NLTK:
    pip3 install nltk
  • The installation should complete successfully.

NOTE: For Python 2, use the command pip2 install nltk. Recent NLTK releases support Python 3 only.

Step 9) From the Windows Start menu, search for and open the Python Shell.

Step 10) Verify that the installation works by running the command below:

import nltk

If no error appears, the installation is complete.
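
The same check can be scripted. The sketch below is a minimal example and not part of the original steps: the helper name nltk_status is hypothetical, and it relies only on the standard library so that it runs even when NLTK is missing.

```python
import importlib.util
from importlib import metadata


def nltk_status():
    """Return the installed NLTK version string, or None if NLTK is not installed."""
    # find_spec looks the package up without importing it
    if importlib.util.find_spec("nltk") is None:
        return None
    try:
        return metadata.version("nltk")
    except metadata.PackageNotFoundError:
        # Importable but without package metadata (unusual setup)
        return "unknown"


version = nltk_status()
if version is None:
    print("NLTK is not installed for this interpreter")
else:
    print("NLTK version:", version)
```

If the script reports that NLTK is not installed, the pip command was probably run for a different Python installation than the one executing the script.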

Installing NLTK in Mac/Linux

Installing NLTK on Mac or Linux requires the Python package manager pip. If pip is not installed, follow the instructions below to complete the process. The apt commands shown apply to Debian-based Linux distributions such as Ubuntu.

Step 1) Update the package index by typing the command below:

sudo apt update

Step 2) Install pip for Python 3:

sudo apt install python3-pip

You can also install pip through easy_install:

sudo apt-get install python-setuptools python-dev build-essential

Once easy_install is installed, run the command below to install pip:

sudo easy_install pip

Step 3) Use one of the following commands to install NLTK, depending on whether your system provides pip or pip3:

sudo pip install -U nltk
sudo pip3 install -U nltk
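
On systems with several Python versions, pip and pip3 may point to different interpreters. One common way to avoid the ambiguity is to run pip as a module of the interpreter you intend to use. The sketch below only builds that command. The function name pip_install_command is hypothetical, and nothing is installed unless you pass the result to subprocess.run yourself.

```python
import sys


def pip_install_command(package, upgrade=True):
    """Build a pip command bound to the interpreter running this script."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("-U")  # same upgrade flag as in the commands above
    cmd.append(package)
    return cmd


command = pip_install_command("nltk")
print(" ".join(command))
# To actually install, one could run: subprocess.run(command, check=True)
```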

Installing NLTK through Anaconda

Step 1) Install Anaconda by visiting https://www.anaconda.com/products/individual and selecting the Python version you need.

Note: Refer to this tutorial for detailed steps to install Anaconda.

Step 2) In the Anaconda prompt:

  1. Enter the command:
    conda install -c anaconda nltk
  2. Review the package upgrade, downgrade, and install information, then enter yes.
  3. NLTK is downloaded and installed.

NLTK Dataset

The NLTK module ships with many datasets that you need to download before use. Technically, each dataset is called a corpus. Common examples include stopwords, gutenberg, framenet_v15, large_grammars, brown, and wordnet.

How to Download all packages of NLTK

Step 1) Run the Python interpreter in Windows or Linux.

Step 2)

  1. Enter the commands:
import nltk
nltk.download()
  2. The NLTK Downloader window opens. Click the Download button to fetch the dataset. The download time depends on your internet connection.

NOTE: You can change the download location by clicking File > Change Download Directory.
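
The graphical downloader is not the only option. nltk.download() also accepts a package identifier and a download_dir argument, which helps on servers without a display. The sketch below is one possible wrapper, not an official recipe: download_corpora and EXAMPLE_PACKAGES are illustrative names, and the function only contacts the network when you call it.

```python
def download_corpora(names, download_dir=None):
    """Download the named NLTK packages and report which ones succeeded.

    download_dir=None keeps NLTK's default data location.
    """
    import nltk  # imported here so this file loads even before NLTK is installed

    results = {}
    for name in names:
        # nltk.download returns True on success and False on failure
        results[name] = nltk.download(name, download_dir=download_dir, quiet=True)
    return results


# Illustrative selection taken from the corpora named in this tutorial
EXAMPLE_PACKAGES = ["stopwords", "brown", "wordnet"]
```

Calling download_corpora(EXAMPLE_PACKAGES) from the Python shell fetches only those three packages instead of the full collection.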

Step 3) To test the installed data, use the following code:

>>> from nltk.corpus import brown
>>> brown.words()

['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
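
If brown.words() raises a LookupError, the corpus has not been downloaded yet. A script can test for that up front with nltk.data.find(). The helper below is a minimal sketch. The name corpus_available is hypothetical, and it treats a missing NLTK install the same as a missing corpus.

```python
def corpus_available(name):
    """Return True if the corpus `name` is found in NLTK's data path."""
    try:
        import nltk.data
        nltk.data.find("corpora/" + name)
    except (ImportError, LookupError):
        # ImportError: NLTK itself is missing; LookupError: corpus not downloaded
        return False
    return True


print("brown corpus available:", corpus_available("brown"))
```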

Running the NLP Script

This section explains how an NLP script runs on a local PC. The right library choice depends on your requirements. See the official list of NLP libraries for alternatives such as spaCy, gensim, and TextBlob.

How to Run NLTK Script

Step 1) In your favorite code editor, copy the code and save the file as NLTKsample.py:

from nltk.tokenize import RegexpTokenizer

# \w+ matches runs of letters, digits, and underscores, so punctuation is dropped
tokenizer = RegexpTokenizer(r'\w+')
filterdText = tokenizer.tokenize('Hello Guru99, You have build a very good site and I love visiting your site.')
print(filterdText)

Code Explanation:

  1. The objective of this program is to remove punctuation from a given text. We import “RegexpTokenizer”, a class in the nltk.tokenize module that splits text into tokens using a regular expression you choose.
  2. The regular expression \w+ is passed to “RegexpTokenizer”. It matches runs of letters, digits, and underscores, so punctuation and whitespace are left out of the tokens.
  3. The text is tokenized using the “tokenize” method, and the output is stored in the “filterdText” variable.
  4. The result is printed using “print()”.
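
To see what the pattern does on its own, the same regular expression can be applied with Python's built-in re module. This is only an illustration of the \w+ pattern, not a replacement for RegexpTokenizer, and the variable names are made up for the example.

```python
import re

text = 'Hello Guru99, You have build a very good site and I love visiting your site.'

# \w+ matches each run of letters, digits, or underscores;
# commas, periods, and spaces fall between matches and are discarded
tokens = re.findall(r'\w+', text)
print(tokens)

# Digits count as word characters, so 'Guru99' stays in one piece
print('Guru99' in tokens)  # True
```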

Step 2) In the command prompt:

  • Navigate to the location where you saved the file.
  • Run the command python NLTKsample.py.

The output is:

['Hello', 'Guru99', 'You', 'have', 'build', 'a', 'very', 'good', 'site', 'and', 'I', 'love', 'visiting', 'your', 'site']

FAQs

The pip command installs the library itself, while nltk.download() fetches corpora and trained models such as stopwords, punkt, and wordnet. Both steps are needed before tokenizing or tagging text.

Yes. NLTK remains popular for preprocessing text that feeds LLMs, including tokenization, stop-word removal, stemming, and POS tagging. It is also widely used in teaching and research thanks to its clear API and classic corpora.

NLTK is best for learning NLP fundamentals. spaCy is faster for production, while Hugging Face Transformers offers pretrained deep-learning models. Many AI projects combine NLTK preprocessing with transformer inference.
