In this tutorial, you will learn:
- Installing NLTK in Windows
- Installing Python in Windows
- Installing NLTK in Mac/Linux
- Installing NLTK through Anaconda
- NLTK Dataset
- How to Download all packages of NLTK
- Running the NLP Script
- How to Run NLTK Script
In this part, we will learn how to set up NLTK from the terminal (Command Prompt in Windows).
The instructions below assume that you don't have Python installed, so the first step is to install Python.
Step 1) Go to https://www.python.org/downloads/ and select the latest version for Windows.
Note: If you don't want to download the latest version, you can visit the Downloads tab and see all releases.
Step 2) Click on the downloaded file
Step 3) Select Customize Installation
Step 4) Click NEXT
Step 5) In the next screen
- Select the advanced options
- Give a custom install location. In my case, a folder on the C drive is chosen for ease of operation
- Click Install
Step 6) Click the Close button once the install is done.
Step 7) Copy the path of your Scripts folder.
Step 8) In the Windows command prompt
- Navigate to the Scripts folder whose path you copied in Step 7 (this is where pip is located)
- Enter command to install NLTK
pip3 install nltk
- The installation should complete successfully
NOTE: For Python 2, use the command pip2 install nltk
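On a machine with several Python versions, the pip on the PATH may belong to a different interpreter than the one you plan to use. A minimal sketch of one way around this, assuming you want to install into whichever interpreter runs the script: build the command from sys.executable and run pip as a module. The helper name pip_install_command is hypothetical, and the sketch only prints the command rather than running it.

```python
import sys


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this script."""
    # "python -m pip" uses the pip that belongs to sys.executable,
    # instead of whichever pip/pip3 happens to be first on the PATH.
    return [sys.executable, "-m", "pip", "install", package]


command = pip_install_command("nltk")
print(" ".join(command))

# To actually perform the installation, uncomment the next two lines:
# import subprocess
# subprocess.run(command, check=True)
```

The printed line can also be pasted into the command prompt directly.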
Step 9) In the Windows Start Menu, search for and open the Python shell
Step 10) You can verify that the installation succeeded by entering a command such as import nltk
If you see no error, the installation is complete.
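The same check can be scripted. The sketch below assumes only the standard library; the helper name is_installed is made up for illustration. It asks the current interpreter whether it can locate the module, without importing it.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate the module."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if is_installed("nltk"):
    print("NLTK is available to this interpreter")
else:
    print("NLTK was not found; check that pip installed into this Python")
```

If the second message appears even though pip reported success, pip most likely installed NLTK into a different Python installation.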
Installing NLTK on Mac/Unix requires the Python package manager pip. If pip is not installed, follow the instructions below to complete the process. Note that the apt commands shown apply to Debian-based Linux distributions such as Ubuntu.
Step 1) Update the package index by typing the command below
sudo apt update
Step 2) Install pip for Python 3:
sudo apt install python3-pip
You can also install pip using easy_install.
sudo apt-get install python-setuptools python-dev build-essential
Now easy_install is installed. Run the command below to install pip
sudo easy_install pip
Step 3) Use the following command to install NLTK
sudo pip install -U nltk
If your system uses pip3 for Python 3, run this instead:
sudo pip3 install -U nltk
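Before working through the steps above, it can help to check whether pip is already present. A small sketch, assuming pip is exposed on the PATH as either pip3 or pip; the helper name find_first_command is hypothetical.

```python
import shutil


def find_first_command(candidates):
    """Return the path of the first candidate found on the PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


pip_path = find_first_command(["pip3", "pip"])
if pip_path is None:
    print("pip was not found on the PATH; install it first")
else:
    print("pip found at", pip_path)
```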
Step 1) Install Anaconda (which can also be used to install other packages) from https://www.anaconda.com/products/individual, selecting the version of Python you need.
Note: Refer to this tutorial for detailed steps to install Anaconda
Step 2) In the Anaconda prompt,
- Enter command
conda install -c anaconda nltk
- Review the package upgrade, downgrade, and install information and enter yes
- NLTK is downloaded and installed
The NLTK module has many datasets available, which you need to download before you can use them. More technically, such a dataset is called a corpus. Some examples are stopwords, gutenberg, framenet_v15, large_grammars, and so on.
Step 1) Run the Python interpreter in Windows or Linux
- Enter the commands
import nltk
nltk.download()
- The NLTK Downloader window opens. Click the Download button to download the dataset. This process will take time, depending on your internet connection
NOTE: You can change the download location by clicking File > Change Download Directory
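If you only need a few datasets, or are working on a machine without a graphical display, nltk.download can also be called with a package identifier and an optional download_dir. The sketch below is one way to wrap that; the helper name download_corpora and the choice of identifiers are assumptions for illustration, and the call itself is left commented out because it needs NLTK installed and network access.

```python
def download_corpora(names, download_dir=None):
    """Download the named NLTK data packages and report which succeeded."""
    import nltk  # imported here so the sketch can be loaded without NLTK

    results = {}
    for name in names:
        # nltk.download returns True on success and False otherwise.
        results[name] = nltk.download(name, download_dir=download_dir, quiet=True)
    return results


CORPORA = ["stopwords", "brown"]

# Needs NLTK installed and network access, so the call is left commented out:
# print(download_corpora(CORPORA))
```

Leaving download_dir as None lets NLTK pick its default data directory, which is where the corpus readers look first.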
Step 2) To test the installed data, use the following code
>>> from nltk.corpus import brown
>>> brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
We are going to discuss how an NLP script is executed on a local PC. There are many libraries for Natural Language Processing, so the choice of library depends on your requirements.
Step 1) In your favorite code editor, copy the code below and save the file as “NLTKsample.py”
from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'\w+')
filterdText = tokenizer.tokenize('Hello Guru99, You have build a very good site and I love visiting your site.')
print(filterdText)
- In this program, the objective was to remove all punctuation from the given text. We imported “RegexpTokenizer”, a class from NLTK's tokenize module. It splits text into tokens that match a regular expression, so anything the pattern does not match (symbols, punctuation, or whatever else you want to exclude) is dropped.
- We passed the regular expression \w+ to “RegexpTokenizer”. This pattern matches runs of letters, digits, and underscores.
- Next, we tokenized the sentence using the “tokenize” method. The output is stored in the “filterdText” variable.
- Finally, we printed the tokens using “print()”.
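To see what the \w+ pattern does on its own, the sketch below applies it with Python's standard re module instead of NLTK. This is not NLTK's code, only an illustration of the pattern; for this sentence it should yield the same tokens as the script above.

```python
import re

text = 'Hello Guru99, You have build a very good site and I love visiting your site.'

# \w+ matches each maximal run of letters, digits, and underscores,
# so spaces and punctuation never end up inside a token.
tokens = re.findall(r'\w+', text)
print(tokens)
```

The comma after Guru99 and the final period are dropped because neither is a word character.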
Step 2) In the command prompt
- Navigate to the location where you have saved the file
- Run the command python NLTKsample.py
This will show the output as:
['Hello', 'Guru99', 'You', 'have', 'build', 'a', 'very', 'good', 'site', 'and', 'I', 'love', 'visiting', 'your', 'site']