Pentaho Data Integration Tutorial: What Is the Pentaho ETL Tool?

What is Pentaho BI?

Pentaho is a Business Intelligence tool that provides a wide range of business intelligence solutions to customers. It is capable of reporting, data analysis, data integration, data mining, and more. Pentaho also offers a comprehensive set of BI features that help you improve business performance and efficiency.

Features of Pentaho

The following are important features of Pentaho:

  • ETL capabilities for business intelligence needs
  • Understanding Pentaho Report Designer
  • Product Expertise
  • Offers side-by-side subreports
  • Unlocking new capabilities
  • Professional support
  • Query and reporting
  • Offers enhanced functionality
  • Full runtime metadata support from data sources

Pentaho BI suite

Now, we will learn about the Pentaho BI suite in this Pentaho tutorial.

Pentaho BI Suite includes the following components:

Pentaho Reporting

Pentaho Reporting is based on the JFreeReport project. It helps you fulfill your business reporting needs. This component also offers both scheduled and on-demand report publishing in popular formats such as XLS, PDF, TXT, and HTML.


Analysis

It offers a wide range of analysis features, including a pivot table view. The tool provides enhanced GUI features (using Flash or SVG), integrated dashboard widgets, and portal and workflow integration.

Moreover, Pentaho Spreadsheet Services allows a user to browse, pivot, and chart data from within MS Excel.
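
To make the idea of a pivot table view concrete, here is a minimal Python sketch of what pivoting does: it groups records by a row field and a column field, then aggregates a value for each combination. This is not Pentaho code; the `SALES` records, the field names, and the `pivot` helper are hypothetical and only illustrate the concept.

```python
from collections import defaultdict

# Hypothetical sales records; the field names are illustrative only.
SALES = [
    {"region": "North", "quarter": "Q1", "revenue": 120},
    {"region": "North", "quarter": "Q2", "revenue": 80},
    {"region": "South", "quarter": "Q1", "revenue": 200},
    {"region": "North", "quarter": "Q1", "revenue": 30},
]


def pivot(records, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[row_key]][rec[col_key]] += rec[value_key]
    # Convert to plain dicts so the result is easy to print and compare.
    return {row: dict(cols) for row, cols in table.items()}


if __name__ == "__main__":
    for region, quarters in pivot(SALES, "region", "quarter", "revenue").items():
        print(region, quarters)
```

Swapping `row_key` and `col_key` turns the table on its side, which is the "pivot" a user performs interactively in an analysis view.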


Dashboards

Pentaho Reporting and Analysis contribute content to Pentaho Dashboards. The self-service dashboard designer includes extensive built-in dashboard templates and layouts. It allows business users to build personalized dashboards with little training.

Data Mining

The data mining tool discovers hidden patterns and indicators of future performance. It offers a comprehensive set of machine learning algorithms from the Weka project, including clustering, decision trees, random forests, principal component analysis, and neural networks.

It allows you to view data graphically, interact with it programmatically, or use multiple data sources for reports, further analysis, and other processes.

Pentaho Data Integration

This component is used to integrate data wherever it exists.

It offers a rich transformation library with over 150 out-of-the-box mapping objects.

It supports a wide range of data sources, including more than 30 open source and proprietary database platforms as well as flat files. It also supports Big Data analytics through integration and management of Hadoop data.
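
The extract, transform, and load pattern that this component implements can be sketched in a few lines of Python. This is only an illustration of the ETL idea under stated assumptions, not Pentaho Data Integration itself: the `RAW_CSV` data stands in for a flat-file source, an in-memory SQLite database stands in for the target, and all function and table names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a flat file.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Extract: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, fix types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages, then return total amount per customer."""
    load(transform(extract(text)), conn)
    query = (
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(run_etl(RAW_CSV, sqlite3.connect(":memory:")))
```

In a graphical ETL tool, each of these functions would roughly correspond to a step in a transformation, configured visually instead of written by hand.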

Who uses Pentaho BI?

Pentaho BI is widely used by many software professionals, such as:

  • Open source software programmers
  • Business analysts and researchers
  • College students
  • Business intelligence counselors

How to Install Pentaho in AWS

Following is a step-by-step process on how to install Pentaho in AWS.

Step 1) Click Continue to Subscribe
Go to and click Continue

Step 2) Accept Terms & Conditions
On the next page, accept the License Agreement.

Step 3) Click Continue to Configuration
Proceed to Configuration.

Step 4) Click Continue to Launch
Keep the default settings and click Launch.

Step 5) Wait about 5 minutes for the instance to launch
Check the usage instructions while you wait.

Step 6) Get Public IP
Copy the public IP of the instance.

Step 7) Use the public IP to log in
Paste the public IP of the instance into your browser to access Pentaho.
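
If you prefer to check from a script whether the freshly launched instance is already answering before you open the browser, a small Python sketch like the one below may help. The port and path defaults are assumptions, not values taken from this tutorial; check the instance's usage instructions for the real ones. The address `203.0.113.10` is a documentation placeholder, and `login_url` and `is_reachable` are hypothetical helper names.

```python
import http.client
import urllib.error
import urllib.request


def login_url(public_ip, port=80, path="/"):
    """Build the address to open in the browser.

    port and path are assumptions; the instance's usage
    instructions state the real values.
    """
    return f"http://{public_ip}:{port}{path}"


def is_reachable(url, timeout=5):
    """Return True if any HTTP response comes back from url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, DNS failure, or malformed reply.
        return False


if __name__ == "__main__":
    url = login_url("203.0.113.10")  # placeholder; use your instance's public IP
    print(url, "is up" if is_reachable(url) else "is not answering yet")
```

Calling `is_reachable` in a loop with a short sleep between attempts is one way to wait out the launch time mentioned in Step 5.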

Prerequisites of Pentaho

  • Hardware requirements
  • Software requirements
  • Downloading and installing the BI suite
  • Starting the BI suite
  • Administration of the BI suite

Hardware requirements

The Pentaho BI Suite software does not impose any fixed limits on computer or network hardware, as long as you meet the minimum software requirements. This Business Intelligence tool is easy to install. However, the following system specifications are recommended:

Component           Recommendation
RAM                 Minimum 2 GB
Hard drive space    Minimum 1 GB
Processor           Dual-core EM64T or AMD64
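
As a rough illustration, the RAM and disk minimums listed above can be turned into a small pre-installation check in Python. This is only a sketch: the helper names are hypothetical, the RAM figure has to be supplied by you because the Python standard library has no portable way to read it, and the processor recommendation is not checked.

```python
import shutil

# Minimums taken from the hardware recommendations above.
MIN_RAM_GB = 2
MIN_DISK_GB = 1


def free_disk_gb(path="."):
    """Free space, in GB, on the drive that holds path."""
    return shutil.disk_usage(path).free / 1024 ** 3


def unmet_requirements(ram_gb, disk_gb):
    """Return a list of unmet minimums (empty when all are met)."""
    problems = []
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb} GB available, {MIN_RAM_GB} GB required")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f} GB free, {MIN_DISK_GB} GB required")
    return problems


if __name__ == "__main__":
    # ram_gb is entered by hand here (assumed value for illustration).
    issues = unmet_requirements(ram_gb=4, disk_gb=free_disk_gb())
    print("All minimums met" if not issues else "\n".join(issues))
```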

Software requirements

  • Installation of Sun JRE 5.0
  • The environment can be either 32-bit or 64-bit
  • Supported Operating systems: Linux, Solaris, Windows, Mac
  • A workstation that has a modern web browser interface such as Chrome, Internet Explorer, Firefox

To start the BI server

  • On Windows, from the Start button, click the Start BI Server icon.
  • On Linux, run the start-pentaho script in the /biserver-ce/ directory.

To start the administrator server

  • On Windows, from the Start button, click Start BI Enterprise Server.
  • On Linux, open a command window and run the start-up script in the /biserver-ce/administration-console/ directory.

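To stop the administrator server

  • On Windows, click the Stop BI Server icon.
  • On Linux, open a terminal, go to the installation directory, and run the stop script there. Note that stop.bat is a Windows batch file; on Linux, use the shell-script counterpart shipped in the same directory.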

Pentaho Administration Console

Report Designer

It is an advanced report creation tool. This is an ideal tool if you want to build a complete data-driven report. It offers more flexibility and functionality than the ad hoc reporting capabilities of the Pentaho User Console.

Design Studio

It is an Eclipse-based tool. It allows you to hand-edit a report or analysis. It is widely used to add modifications to an existing report that cannot be added with Report Designer.

Aggregation Designer

This graphical tool allows you to improve Mondrian cube efficiency.

Metadata Editor

It is used to add a custom metadata layer to any existing data source.

Pentaho Data Integration

This is the Kettle extract, transform, and load (ETL) tool.

Pentaho Tool vs. BI stack

Pentaho Tool                  BI Stack
Data Integration (PDI)        ETL
Metadata Editor               Metadata management
Pentaho BA                    Analytics
Report Designer               Operational reporting
Saiku                         Ad-hoc reporting
CDE                           Dashboards
Pentaho User Console (PUC)    Governance/Monitoring

Advantages of Pentaho

Now in this Pentaho data integration tutorial, we will learn about some advantages of Pentaho Business Intelligence Tool:

  • Pentaho BI is a very intuitive tool. Once you know a few basic concepts, you can work with it.
  • Simple and easy-to-use Business Intelligence tool
  • Offers a wide range of BI capabilities, including reporting, dashboards, interactive analysis, data integration, and data mining
  • Comes with a user-friendly interface and provides various tools to retrieve data from multiple data sources
  • Offers a single package for working with data
  • Has a community edition with many contributors, alongside the Enterprise edition
  • Can run on a Hadoop cluster
  • JavaScript code written in the step components can be reused in other components.

Disadvantages of Pentaho

Here are the drawbacks of using the Pentaho BI tool:

  • The design of the interface can be weak, and there is no unified interface for all components.
  • Much slower tool evolution compared to other BI tools.
  • Pentaho Business analytics offers a limited number of components.
  • Poor community support: if a component does not work, you may need to wait until the next version is released.


Summary

  • Pentaho is a Business Intelligence tool that provides a wide range of business intelligence solutions to customers.
  • It offers ETL capabilities for business intelligence needs.
  • The Pentaho suite offers components such as Reporting, Analysis, Dashboards, and Data Mining.
  • Pentaho Business Intelligence is widely used by 1) business analysts, 2) open source software programmers, 3) researchers, and 4) college students.
  • The prerequisites for Pentaho cover: 1) hardware requirements, 2) software requirements, 3) downloading the BI suite, 4) starting the BI suite, and 5) administration of the BI suite.
  • Important components of the Pentaho Administration Console are 1) Report Designer, 2) Design Studio, 3) Aggregation Designer, 4) Metadata Editor, and 5) Pentaho Data Integration.
  • In the Pentaho tool vs. BI stack comparison, Pentaho Data Integration (PDI) covers the ETL layer of the BI stack.
  • The biggest advantage of Pentaho is that it is a simple and easy-to-use Business Intelligence tool.
  • The main drawback of Pentaho is that the tool evolves much more slowly than other BI tools.