Teradata is a massively parallel, open processing system for developing large-scale data warehousing applications. Because it is an open system, it runs on Unix, Linux, and Windows server platforms. It can serve multiple data warehouse operations for different clients at the same time.
Teradata Corporation is an American IT firm. It is a vendor of analytic data platforms, applications, and related services. The firm develops products that consolidate data from various sources and make the data available for analysis.
Teradata was a division of NCR Corporation. It was incorporated in 1979 and separated from NCR in October 2007. Michael Koehler became the first CEO of the independent Teradata.
Milestones of Teradata Corporation:
1979 - Teradata was incorporated
1984 - Release of first database computer DBC/1012
1986 - Fortune magazine named Teradata 'Product of the Year'
1999 - Largest database built using Teradata with 130 Terabytes
2002 - Release of Teradata V2R5 with compression and Partitioned Primary Index (PPI)
2006 - Launch of Teradata Master Data Management solution
2008 - Teradata 13.0 released with Active Data Warehousing
2011 - Acquires Aster (later Teradata Aster) and moves into the advanced analytics space
2012 - Teradata 14.0 introduced
2014 - Teradata 15.0 introduced
2015 - Teradata buys the app marketing platform Appoxee
2016 - Teradata embraces big data
2017 - Teradata acquires San Diego-based StackIQ
Teradata offers the following features:
Linear Scalability: Performance scales linearly as nodes are added, so the system keeps pace with growing volumes of data.
Unlimited Parallelism: Teradata is based on an MPP (Massively Parallel Processing) architecture, so it has been designed for parallelism from the beginning. It divides a large task into smaller tasks and runs them in parallel.
Mature Optimizer: The Teradata Optimizer can handle up to 64 joins in a query.
Low TCO: Teradata has a low total cost of ownership. It is easy to set up, maintain, and administer.
Load & Unload utilities: Teradata provides load & unload utilities to move data into/from Teradata System.
Connectivity: This MPP system can connect to channel-attached systems like a mainframe or network-attached systems.
SQL: Teradata supports SQL for interacting with the data stored in tables, and it adds its own extensions to the language.
Robust Utilities: Teradata provides robust utilities such as FastExport, FastLoad, MultiLoad, and TPT to import data into and export data from Teradata systems.
Automatic Distribution: Teradata can distribute the data to the disks automatically with no manual intervention.
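To make automatic distribution concrete, here is a minimal Python sketch that assigns rows to AMPs by hashing a distribution key (Teradata calls this key the primary index). It is an illustration under stated assumptions, not Teradata's actual algorithm: `zlib.crc32` stands in for Teradata's own row-hash function, a plain modulo stands in for its hash map, and `NUM_AMPS`, `target_amp`, and `distribute` are hypothetical names. What it shows is that rows with the same key always land on the same AMP while distinct keys spread across all AMPs, with no manual placement.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4  # hypothetical system size


def target_amp(primary_index_value: str, num_amps: int = NUM_AMPS) -> int:
    """Map a primary-index value to an AMP number.

    Assumption: crc32 replaces Teradata's row hash, and the modulo
    replaces its hash map. Both are simplifications for illustration.
    """
    row_hash = zlib.crc32(primary_index_value.encode("utf-8"))
    return row_hash % num_amps


def distribute(rows, key, num_amps=NUM_AMPS):
    """Group rows by the AMP that would own them."""
    amps = defaultdict(list)
    for row in rows:
        amps[target_amp(str(row[key]), num_amps)].append(row)
    return dict(amps)


rows = [{"customer_id": i, "name": f"customer-{i}"} for i in range(1000)]
placement = distribute(rows, key="customer_id")
for amp in sorted(placement):
    print(f"AMP {amp}: {len(placement[amp])} rows")
```

Because the target AMP is computed from the key alone, a later lookup by the same key can go straight to one AMP instead of searching all of them.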
Teradata architecture is a Massively Parallel Processing Architecture.
The three important components of Teradata are the Parsing Engine, the Message Passing Layer (BYNET), and the Access Module Processor (AMP).
Teradata Architecture Diagram:
Parsing Engine:
The Parsing Engine manages user sessions, parses incoming queries, and prepares an optimized execution plan. It dispatches the plan to the AMPs through BYNET and returns the results to the users.
When the client executes a query that inserts records, the Parsing Engine sends the records to the Message Passing Layer. The Message Passing Layer, or BYNET, is a combined hardware and software component that provides the networking between nodes. It receives the records and sends each row to its target AMP.
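The insert path described above (Parsing Engine, then BYNET, then the target AMP) can be modeled with three small classes. This is a hypothetical sketch, not Teradata code: `ParsingEngine`, `Bynet`, and `Amp` are illustrative names, AMP storage is an in-memory list, and `zlib.crc32` modulo the AMP count stands in for Teradata's real row-hash routing.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Amp:
    amp_id: int
    rows: list = field(default_factory=list)  # stands in for the AMP's disk storage

    def store(self, row):
        self.rows.append(row)


class Bynet:
    """Hypothetical message layer: routes each row to the AMP that owns its hash."""

    def __init__(self, amps):
        self.amps = amps

    def route(self, row, pi_column):
        # Assumption: crc32 modulo AMP count replaces Teradata's row-hash routing.
        row_hash = zlib.crc32(str(row[pi_column]).encode("utf-8"))
        self.amps[row_hash % len(self.amps)].store(row)


class ParsingEngine:
    """Hypothetical PE: accepts an insert request and hands each row to BYNET."""

    def __init__(self, bynet, pi_column):
        self.bynet = bynet
        self.pi_column = pi_column

    def insert(self, rows):
        for row in rows:
            self.bynet.route(row, self.pi_column)


amps = [Amp(i) for i in range(4)]
pe = ParsingEngine(Bynet(amps), pi_column="order_id")
pe.insert([{"order_id": n, "amount": n * 10} for n in range(20)])
print([len(a.rows) for a in amps])  # rows held by each AMP
```

Keeping routing inside `Bynet` mirrors the description: the Parsing Engine does not need to know which AMP owns a row.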
AMP:
AMP stands for Access Module Processor. Each AMP stores records on its own disks and performs the work on the rows it owns.
When the client runs a query to retrieve records, the Parsing Engine sends a request to BYNET, and BYNET forwards the retrieval request to the appropriate AMPs.
The AMPs search their disks in parallel, identify the required records, and send them to BYNET. BYNET sends the records to the Parsing Engine, which in turn sends them to the client.
MPP | SMP |
---|---|
MPP (Massively Parallel Processing): a computer system made up of many independent arithmetic units or entire microprocessors that run in parallel. | SMP (Symmetric Multi-Processing): the CPUs share the same memory, so code running on one CPU may affect the memory used by another. |
MPP databases can expand by adding new processing nodes. | SMP databases generally use one CPU to perform database searches. |
Performance improves because no resources are shared among the physical computers. | The workload for a parallel job is distributed across the processors in the system. |
Performance scales linearly: it increases in proportion to the number of nodes. | SMP databases can run on multiple servers, but those servers still share resources. |
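The linear scaling claimed for MPP can be written as an idealized formula. It assumes the work splits evenly across nodes and that nothing is shared, which real systems only approximate. If a job takes $T(1)$ on one node, then on $N$ nodes:

```latex
T(N) = \frac{T(1)}{N}, \qquad S(N) = \frac{T(1)}{T(N)} = N
```

The speedup $S(N)$ grows with a slope of one in $N$. For example, under these assumptions a job that takes 8 hours on one node would take $8/4 = 2$ hours on four nodes.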
Teradata offers a complete suite of products to meet the data warehousing and ETL needs of an organization. Important Teradata products are listed below:
Product suite name | Usage | Tools |
---|---|---|
Analytics | Teradata Analytics Platform | Analytics Platform, Analytics on Hadoop (Aster), Analytics Portfolio (Aster) |
Cloud | High-impact hybrid cloud solutions for businesses | Cloud, Hybrid Cloud, Managed (Hadoop), IntelliCloud |
Data Ingestion | Simplifies big data streaming | Listener, Data Mover |
Data Management | Data management tools for data protection and recoverability | Backup and Restore, Data Mover, Columnar |
Database | Real-time system analysis tools that help DBAs monitor and manage the system | Database (Teradata), Database (Teradata Express, a free download) |
Ecosystem Management | Ecosystem tools that help you manage your Teradata environment | Ecosystem Manager, Unity |
Workload Management | Workload management tools that help you keep pace with growing business and user demands | Active System Management (TASM), Workload Management |
SQL Query Engine | A SQL engine for Hadoop and more | Presto (free download) |
Load & Unload Utilities | Fast, fully parallel extract and load utilities, described as the only products that offer automatic checkpoint restart and one-step loading from mainframes | Parallel Transporter (TPT), FastLoad, MultiLoad |
UDA enabling software | Tools that allow processing across all workload engines | QueryGrid, Listener, Teradata AppCenter |
Teradata also provides applications for the following business areas:
Customer Data Management: Helps maintain long-lasting relationships with customers.
Master Data Management: Helps create an environment where master data can be used, synchronized, and stored.
Finance and Performance Management: Helps organizations improve the speed and quality of financial reporting, reduce finance infrastructure costs, and proactively manage enterprise performance.
Supply Chain Management: Improves supply chain operations, which leads to better customer service, shorter cycle times, and lower inventories.
Demand Chain Management: Helps increase customer service levels and sales. It also helps companies accurately predict the demand for their store items.
Feature | Teradata | Traditional RDBMS |
---|---|---|
Architecture | Follows a shared-nothing architecture. | Shared-everything; allows resource contention. |
Processes | MIPS (millions of instructions per second) | KIPS (thousands of instructions per second) |
Indexes | Used for both data distribution and retrieval | Used only for fast retrieval |
Parallelism | Supports unconditional parallelism. | Parallelism is conditional and unpredictable. |
Bulk Load | Supports bulk loading. | Supports only limited bulk loading. |
Scalability | Linear scalability with a slope of one | Scalability with diminishing returns |
Database buffer | The query controller ships functions to the UoPs (units of parallelism) that own the data. | A single database buffer and a single data store are shared by all UoPs (units of parallelism). |
Stores | Terabytes (billions of rows) | Gigabytes (millions of rows) |
Conclusion