SSIS Tutorial for Beginners: What is, Architecture, Packages
What is SSIS?
SQL Server Integration Service (SSIS) is a component of the Microsoft SQL Server database software that can be used to execute a wide range of data migration tasks. SSIS is a fast & flexible data warehousing tool used for data extraction, loading and transformation like cleaning, aggregating, merging data, etc.
It makes it easy to move data from one database to another database. SSIS can extract data from a wide variety of sources like SQL Server databases, Excel files, Oracle and DB2 databases, etc.
SSIS also includes graphical tools & wizards for performing workflow functions like sending email messages, FTP operations, data sources, and destinations.
Why we use SSIS?
Here, are key reasons for using SSIS tool:
- SSIS tool helps you to merge data from various data stores
- Automates Administrative Functions and Data Loading
- Populates Data Marts & Data Warehouses
- Helps you to clean and standardize data
- Building BI into a Data Transformation Process
- Automating Administrative Functions and Data Loading
- SIS contains a GUI that helps users to transform data easily rather than writing large programs
- It can load millions of rows from one data source to another in very few minutes
- Identifying, capturing, and processing data changes
- Coordinating data maintenance, processing, or analysis
- SSIS eliminates the need of hardcore programmers
- SSIS offers robust error and event handling
History of SSIS
Prior, to SSIS, SQL Server, Data Transformation Services (DTS) was used, which was part of SQL Server 7 and 2000
Version | Detail |
---|---|
SQL Server 2005 | The Microsoft team decided to revamp DTS. However, instead of update DTS, they decided to name the product Integration Services (SSIS). |
2008 SQL server version | Plenty of performance improvements were made to SSIS. New sources were also introduced. |
SQL Server 2012 | It was the biggest release for SSIS. With this version, the concept of the project deployment model introduced. It allows entire projects, and their packages are deployed to a server, in place of specific packages. |
SQL Server 2014 | In this version, not many changes are made for SSIS. But new sources or transformations were added which was done by separate downloads through CodePlex or the SQL Server Feature Pack. |
In SQL Server 2016 | The version allows you to deploy entire projects, instead, of individual packages. There are additional sources especially cloud, and big data sources and few changes were made to the catalog. |
SSIS Salient Features
Here, are some important SSIS basics features:
- Studio Environments
- Relevant data integration functions
- Effective implementation speed
- Tight integration with other Microsoft SQL family
- Data Mining Query Transformation
- Fuzzy Lookup and Grouping Transformations
- Term Extraction and Term Lookup Transformations
- Higher speed data connectivity components such as connectivity to SAP or Oracle
SSIS Architecture
Following are components of SSIS architecture:
- Control Flow (Stores containers and Tasks)
- Data Flow (Source, Destination, Transformations)
- Event Handler (sending of messages, Emails)
- Package Explorer (Offers a single view for all in package)
- Parameters (User Interaction)
Let’s understand each component in detail:
1. Control Flow
Control flow is a brain of SSIS package. It helps you to arranges the order of execution for all its components. The components contain containers and tasks which are managed by precedence constraints.
2. Precedence Constraints
Precedence constrain are package component which direct tasks to execute in a predefined order. It also defines the workflow of the entire SSIS package. It controls the execution of the two linked tasks by executing the destination tasks based on the result of the earlier task — business rules which are defined using special expressions.
3. Task
A ‘Task’ is an individual unit of work. It is the same as a method/function used in a programming language. However, in SSIS, you don’t use coding methods. Instead, you will use drag & drop technique to design surface and to configure them.
4. Containers
The container is units for grouping tasks together into units of work. Apart from offering visual consistency, it also allows you to declare variables and event handlers which should be in the scope of that specific container.
Four types of containers in SSIS are:
- A Sequence Container
- A For Loop Container
- Foreach Loop Container
Sequence Container: allows you to organize subsidiary tasks by grouping them, and allows you to you apply transactions or assign logging to the container.
For loop container:Provides the same functionality as the sequence Container except that it also lets you run the tasks multiple times. However, it is based on an evaluation condition, like a looping from 1 to 100.
For each Loop Container: It also allows looping. But the difference that instead of using a condition expression, loop s done over a set of objects, likes files in a folder.
5. Data Flow
The main use of the SSIS tool is to extract data into the server’s memory, transform it, and write it to another destination. If Control Flow is the brain, Data Flow is the heart of SSIS
6. SSIS Packages
Another core component of SSIS is the notion of a package. It is a collection of tasks which execute in an orderly fashion. Here, president constraints help manage the order in which the task will execute.
A package can help you to saves files onto a SQL Server, in the msdb or package catalog database. It can save as a .dtsx file, which is a structured file very similar to .rdl files are to Reporting Services.
7. Parameters
Parameters behave much like a variable but with a few main exceptions. It can be set outside the package easily. It can be designated as values that must be passed in for the package to start.
SSIS Tasks Types
In SSIS tool, you can add a task to control flow. There are different types of tasks which perform various kinds of works.
Some important SSIS tasks are mentioned below:
Task Name | Descriptions |
---|---|
Execute SQL Task | As its name suggests, it will execute a SQL statement against a relational database. |
Data Flow Task | This task can read data from one or more sources. Transform the data when it is in the memory and write it out against one or more destinations. |
Analysis Services Processing Task | Use this task to process objects of a Tabular model or as an SSAS cube. |
Execute Package Task | Use can use this SSIS task to execute other packages from within the same project. |
Execute Process Task | With the help of this task, you can specify command line parameters. |
File System Task | It performs manipulations in the file system. Like moving, renaming, deleting files, and creating directories. |
FTP Tasks | It allows you to perform basic FTP functionalities. |
Script Task | This is a blank task. You can write NET code which performs any task; you want to perform. |
Send Mail Task | You can send an email to notifying users that your package has is finished, or some error occurs. |
Bulk Insert Task | Use can loads data into a table by using the bulk insert command. |
Script Task | Runs a set of VB.NET or C# coding inside a Visual Studio environment. |
Web Service Task | It executes a method on a web service. |
WMI Event Watcher Task | This task allows the SSIS package to wait for and respond to certain WMI events. |
XML Task | This task helps you to merge, split, or reformat any XML file. |
Other Important ETL tools
- SAP Data Services
- SAS Data Management
- Oracle Warehouse Builder (OWB)
- PowerCenter Informatica
- IBM Infosphere Information Server
- Elixir Repertoire for Data ETL
- Sargent Data Flow
Advantages and Disadvantages of using SSIS
SSIS tool the offers the following advantages:
- Broad documentation and support
- Ease and speed of implementation
- Tight integration with SQL Server and visual studio
- Standardized data integration
- Offers real-time, message-based capabilities
- Support for distribution model
- Helps you to remove network as a bottleneck for insertion of data by SSIS into SQL
- SISS allows you to use the SQL Server Destination instead of OLE DB to load the data faster
Disadvantages of SSIS
Few drawbacks of using SSIS tools are as follows:
- Sometimes create issues in non-windows environments
- Unclear vision and strategy
- SSIS lacks support for alternative data integration styles
- Problematic integration with other products
SSIS Best Practices Example
- SSIS is an in-memory pipeline. That’s why it’s important to make sure that all transformations occur in memory
- Try to minimize logged operations
- Plan for capacity by understanding resource utilization
- Optimize the SQL lookup transformation, data source, and destination
- Schedule and distribute it correctly
Summary
- The full form of SSIS is SQL Server Integration Services
- SSIS tool helps you to merge data from various data stores
- Important versions of SQL Server Integration Service are 2005, 2008, 2012, 2014 and 216
- Studio Environments, Relevant data integration functions, and Effective implementation speed are some important features of SSIS
- Control Flow, Data Flow, Event Handler, Package Explorer, and Parameters are essential SSIS architecture components
- Execute SQL Task, Data Flow Task, Analysis Services Processing Task, Execute Package Task, Execute Process Task, File System Task, FTP Tasks, Send Mail Task, Web Service Task are some important
- Broad documentation and support
- The biggest drawback of SSIS is that it lacks support for alternative data integration styles
- SAP Data Services, SAS Data Management, Oracle Warehouse Builder (OWB), PowerCenter Informatica, IBM Infosphere Information Server
- SSIS is an in-memory pipeline. Therefore, it’s essential to make sure that all transformations occur in memory