The Informatica ETL tool consists of the following services and components:
- Repository Service – Maintains the Informatica metadata and provides access to it for the other services.
- Integration Service – Moves data from sources to targets.
- Reporting Service – Enables the generation of reports.
- Nodes – The computing platforms on which the above services run.
- Informatica Designer – Used to create mappings between sources and targets.
- Workflow Manager – Used to create workflows and other tasks and to execute them.
- Workflow Monitor – Used to monitor the execution of workflows.
- Repository Manager – Used to manage objects in the repository.
The overall architecture of Informatica is Service Oriented Architecture (SOA).
- The Informatica domain is the fundamental administrative unit of the Informatica tool.
- It is a collection of nodes and services. These nodes and services can be further organized into folders and sub-folders, depending on administration requirements.
For example, in the screenshot below, the domain window shows the folder "Domain_Rajesh", under which we have created a node named "node01_rajesh" and a service named "guru99 integration services".
A node is a logical representation of a machine inside the domain. A node is required to run Informatica services and processes.
You can have multiple nodes in a domain. A domain also contains a gateway node.
The gateway node is responsible for receiving requests from different client tools and routing those requests to different nodes and services.
There are two types of services in a domain:
- Service Manager: Manages domain operations such as authentication, authorization, and logging. It also runs the application services on the nodes and manages users and groups.
- Application Services: Represent the server-specific services such as the integration service, repository service, and reporting service. These services run on different nodes depending on the configuration.
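The relationship between a domain, its nodes, and the services running on them can be sketched as a small data model. This is an illustration only: the class names `Domain` and `Node`, the `route` method, and the second node `node02_example` are assumptions made for this example and are not part of any Informatica API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Logical representation of a machine inside the domain."""
    name: str
    is_gateway: bool = False
    services: list[str] = field(default_factory=list)


@dataclass
class Domain:
    """A collection of nodes and the application services running on them."""
    name: str
    nodes: list[Node] = field(default_factory=list)

    def gateway(self) -> Node:
        # The gateway node is the one that receives client requests.
        for node in self.nodes:
            if node.is_gateway:
                return node
        raise LookupError("domain has no gateway node")

    def route(self, service: str) -> Node:
        # Mimic the gateway forwarding a request to the node hosting a service.
        self.gateway()  # fail early if the domain has no gateway
        for node in self.nodes:
            if service in node.services:
                return node
        raise LookupError(f"no node runs {service!r}")


domain = Domain(
    name="Domain_Rajesh",
    nodes=[
        Node("node01_rajesh", is_gateway=True, services=["repository service"]),
        Node("node02_example", services=["integration service"]),  # hypothetical node
    ],
)
print(domain.route("integration service").name)
```

In a real domain the Service Manager handles this routing; the sketch only shows how nodes, the gateway role, and services relate to one another.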
The PowerCenter repository resides in a relational database such as Oracle, Sybase, or SQL Server, and is managed by the repository service. It consists of database tables that store metadata.
There are three Informatica client tools available in Informatica PowerCenter:
- Designer
- Workflow Monitor
- Workflow Manager
These clients can access the repository only through the repository service.
To manage a repository, Informatica provides a service called the repository service. A single repository service handles exactly one repository. A repository service can also run on multiple nodes to improve performance.
The repository service places locks on objects, so multiple users cannot modify the same object at the same time.
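The effect of object locking can be illustrated with a toy lock table. This is a minimal sketch of the general idea, not how the repository service is implemented; the class `RepositoryLocks`, its methods, and the object and user names are all hypothetical.

```python
class RepositoryLocks:
    """Toy lock table: at most one user may hold the lock on an object."""

    def __init__(self) -> None:
        # Maps an object name to the user currently holding its lock.
        self._owners = {}

    def acquire(self, obj: str, user: str) -> bool:
        owner = self._owners.get(obj)
        if owner is not None and owner != user:
            return False  # someone else is already editing this object
        self._owners[obj] = user
        return True

    def release(self, obj: str, user: str) -> None:
        # Only the lock holder can release the lock.
        if self._owners.get(obj) == user:
            del self._owners[obj]


locks = RepositoryLocks()
print(locks.acquire("m_load_customers", "alice"))  # first user gets the lock
print(locks.acquire("m_load_customers", "bob"))    # second user is refused
```

Once the first user releases the lock, the second user can acquire it and modify the object.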
You can enable version control in the repository. With the version control feature, you can maintain different versions of the same object.
Objects created in the repository can be in one of the following three states:
- Valid: Valid objects are those whose syntax is correct according to Informatica. These objects can be used in the execution of workflows.
- Invalid: Invalid objects are those that do not adhere to the specified standards or rules. When an object is saved in Informatica, its syntax and properties are checked, and the object is marked with the corresponding status.
- Impacted: Impacted objects are those whose child objects are invalid. For example, if a mapping uses a reusable transformation and that transformation becomes invalid, the mapping is marked as impacted.
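The three states described above can be expressed as a small rule. The sketch below is an assumption-laden illustration: the `Status` enum and the `object_status` function are invented for this example, and the real validation performed by Informatica is far more detailed.

```python
from __future__ import annotations

from enum import Enum


class Status(Enum):
    VALID = "valid"
    INVALID = "invalid"
    IMPACTED = "impacted"


def object_status(own_checks_pass: bool, child_statuses: list[Status]) -> Status:
    """Derive an object's status from its own checks and its child objects."""
    if not own_checks_pass:
        # The object itself breaks the syntax or property rules.
        return Status.INVALID
    if any(child is Status.INVALID for child in child_statuses):
        # The object is fine, but something it depends on is invalid.
        return Status.IMPACTED
    return Status.VALID


# A mapping whose reusable transformation has become invalid:
print(object_status(True, [Status.VALID, Status.INVALID]))
```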
As mentioned earlier, the domain is the basic administrative unit in Informatica. It is the parent entity that contains the services, such as the integration service and repository service, and the various nodes.
The domain is configured through the Informatica admin console, which is launched from a web browser.
Once opened in a web browser, the console prompts for the administrator login. The password is set during the Informatica installation.
After you log in to the Informatica domain, the home page looks something like this.
The left pane shows the existing nodes, repository services, and integration services under the domain.
The main window shows the status of those services, that is, whether they are up or down.
Click on the properties menu in the admin page to view the properties of the domain.
Key properties of the domain are:
- Resilience timeout – If an integration service or repository service goes down, the resilience timeout is the number of seconds an application service keeps trying to connect to it.
- Restart period – The maximum number of seconds the domain spends trying to restart a service.
- Dispatch mode – The policy the load balancer uses to dispatch tasks to the nodes.
- Database type – The type of database on which the domain is configured.
- Database host – The host name of the machine hosting the database on which the domain is configured.
- Database port & name – The database port and the database instance name for the domain.
These properties can be modified based upon requirement.
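To make the resilience timeout concrete, the following sketch retries a connection until it succeeds or the timeout elapses. It only illustrates the general retry-until-deadline idea; the function `connect_with_resilience`, its parameters, and the retry interval are assumptions for this example and do not reflect Informatica's internal behaviour.

```python
import time
from typing import Callable


def connect_with_resilience(
    try_connect: Callable[[], bool],
    resilience_timeout: float,
    retry_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry try_connect until it succeeds or resilience_timeout seconds pass."""
    deadline = clock() + resilience_timeout
    while True:
        if try_connect():
            return True
        # Give up if the next attempt would start after the deadline.
        if clock() + retry_interval > deadline:
            return False
        sleep(retry_interval)


# Hypothetical service that becomes reachable on the third attempt.
attempts = []


def flaky_service() -> bool:
    attempts.append(1)
    return len(attempts) >= 3


print(connect_with_resilience(flaky_service, resilience_timeout=5, retry_interval=0.01))
```

The `clock` and `sleep` parameters exist only so the function can be exercised with a fake clock instead of real waiting.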
PowerCenter client tools are development tools that are installed on the client machines. PowerCenter Designer, Workflow Manager, Repository Manager, and Workflow Monitor are the main client tools.
The mappings and objects that we create in these client tools are saved in the Informatica repository, which resides on the Informatica server. The client tools must therefore have network connectivity to the server.
In addition, the PowerCenter client connects to the sources and targets to import metadata and source/target structure definitions, so it must also have connectivity to the source/target systems.
- To connect to the integration service and repository service, the PowerCenter client uses TCP/IP protocols.
- To connect to the sources/targets, the PowerCenter client uses ODBC drivers.
The repository service maintains the connections from PowerCenter clients to the PowerCenter repository. It is a separate multi-threaded process that fetches, inserts, and updates the metadata inside the repository. It is also responsible for keeping the repository metadata consistent.
The integration service is the execution engine of Informatica; in other words, it is the entity that executes the tasks we create in Informatica. This is how it works:
- A user executes a workflow.
- Informatica instructs the integration service to execute the workflow.
- The integration service reads the workflow details from the repository.
- The integration service starts executing the tasks inside the workflow.
- Once execution is complete, the status of each task is updated to failed, succeeded, or aborted.
- After execution completes, the session log and workflow log are generated.
- The integration service is responsible for loading data into the target systems.
- The integration service can also combine data from different sources. For example, it can combine data from an Oracle table and a flat file source.
In summary, the Informatica integration service is a process that resides on the Informatica server and waits for tasks to be assigned for execution. When we execute a workflow, the integration service receives a notification to run it. The service then reads the workflow to learn which tasks, such as mappings, it has to execute and when. Finally, it reads the task details from the repository and proceeds with the execution.
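The execution flow summarized above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions: the in-memory `repository` dictionary, the `run_workflow` function, the workflow and task names, and the choice to stop at the first failed task are all invented for illustration and are not Informatica behaviour or API.

```python
from __future__ import annotations

from typing import Callable


def run_workflow(
    repository: dict[str, list[tuple[str, Callable[[], None]]]],
    workflow: str,
) -> tuple[dict[str, str], list[str]]:
    """Read a workflow's tasks from the repository, run them, and log the outcome."""
    statuses: dict[str, str] = {}
    log: list[str] = []
    for name, action in repository[workflow]:  # read the workflow details
        try:
            action()  # execute the task
        except Exception as exc:
            statuses[name] = "failed"
            log.append(f"{name}: {exc}")
            break  # assumption: stop the workflow at the first failure
        statuses[name] = "succeeded"
        log.append(f"{name}: ok")
    return statuses, log


def failing_load() -> None:
    raise ValueError("target table not found")


# Hypothetical repository content: one workflow with three tasks.
repository = {
    "wf_demo": [
        ("s_extract", lambda: None),
        ("s_load", failing_load),
        ("s_report", lambda: None),
    ]
}
statuses, log = run_workflow(repository, "wf_demo")
print(statuses)
print(log)
```

The returned dictionary stands in for the task statuses, and the list stands in for the session and workflow logs generated after execution.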
Because Informatica is an ETL and data integration tool, you are always handling and transforming some form of data. The input to a mapping in Informatica is called the source system. We import the source definitions from the source and then connect to it to fetch the source data in our mappings. Sources can be of different types and can reside in multiple locations. Depending on your requirements, the target system can be a relational system or a flat file system. Flat file targets are generated on the Informatica server machine and can be transferred later using FTP.
Relational – These sources are database system tables. The database systems are generally owned by other applications, which create and maintain the data, for example a Customer Relationship Management database or a Human Resources database. To use such sources in Informatica, we either get a replica of these datasets or are granted select privileges on these systems.
Flat Files – Flat files are the most common data sources after relational databases in Informatica. A flat file can be a comma-separated file, a tab-delimited file, or a fixed-width file. Informatica supports code pages such as ASCII and Unicode. To use a flat file in Informatica, its definition must be imported, just as for relational tables.
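The three flat file layouts mentioned above differ only in how fields are separated. The sketch below reads each layout with the Python standard library to show the difference; it is unrelated to how Informatica itself parses files, and the helper names, column names, and sample rows are made up for this example.

```python
from __future__ import annotations

import csv
import io


def read_delimited(text: str, delimiter: str) -> list[dict[str, str]]:
    """Parse comma-separated or tab-delimited text; the first row is the header."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_fixed_width(text: str, fields: list[tuple[str, int]]) -> list[dict[str, str]]:
    """Parse fixed-width text given (column name, width) pairs in column order."""
    rows = []
    for line in text.splitlines():
        row, position = {}, 0
        for name, width in fields:
            # Each field occupies a fixed slice of the line; padding is stripped.
            row[name] = line[position:position + width].strip()
            position += width
        rows.append(row)
    return rows


print(read_delimited("id,name\n1,Asha\n", ","))
print(read_delimited("id\tname\n2\tBen\n", "\t"))
print(read_fixed_width("003Chitra\n004Dev   ", [("id", 3), ("name", 6)]))
```

For a fixed-width file the column widths play the role that the imported flat file definition plays in Informatica: without them the reader cannot tell where one field ends and the next begins.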