Informatica Performance Tuning and Optimization Techniques
The purpose of this course is to walk you through the steps to identify performance bottlenecks in Informatica PowerCenter and the different optimization techniques to fix them. After completing this course, you will have a working understanding of the different performance bottlenecks and how to identify them using the iterative approach and thread statistics, as well as the best practices and optimization techniques to consider while designing new mappings or fine-tuning existing ones.
Welcome to the Informatica PowerCenter course!
What the goal of any Performance Tuning exercise should be is discussed in this lecture.
Performance Tuning is always an iterative approach. In this lecture, we see how the iterative process works and how to track the numbers.
Prerequisites for Performance Tuning
In this lecture, we understand the importance of the business process in resolving Performance Tuning issues.
In this lecture, we understand the importance of the technology landscape in resolving Performance Tuning issues.
In this lecture, we discuss the impact of the development and design process on the performance tuning exercise.
In this lecture, we see how the known issues, concerns, and expectations of multiple stakeholders help resolve many questions about the non-functional requirements and fix performance tuning issues.
What really happens once the workflow is triggered is discussed in this lecture.
How a Source Bottleneck occurs is discussed in this lecture.
How a Target Bottleneck occurs is discussed in this lecture.
How a Transformation Bottleneck occurs is discussed in this lecture.
How does a Session Bottleneck occur?
In this lecture, we discuss System Bottlenecks.
A quick review of all the types of bottlenecks.
How to identify different Bottlenecks?
Thread statistics give run-time information from the reader, transformation, and writer threads. The session log provides enough run-time thread statistics to help us understand and identify the performance bottleneck.
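For illustration, the thread statistics in a session log look roughly like the excerpt below; the exact wording varies by PowerCenter version, and the partition point names and numbers here are invented. A thread with a high Busy Percentage relative to the others points to the bottleneck: in this example the reader thread is busy about 97% of the time while the other threads mostly wait, which suggests a source bottleneck.

    Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_CUSTOMERS] has completed.
        Total Run Time = [310.2] secs
        Total Idle Time = [9.8] secs
        Busy Percentage = [96.8]
    Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_CUSTOMERS] has completed.
        Total Run Time = [308.5] secs
        Total Idle Time = [284.1] secs
        Busy Percentage = [7.9]
    Thread [WRITER_1_*_1] created for [the write stage] of partition point [TGT_CUSTOMERS] has completed.
        Total Run Time = [307.6] secs
        Total Idle Time = [280.3] secs
        Busy Percentage = [8.9]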
Partitions in Informatica
In this lecture, we talk about the different situations in which Partitions can be used.
By default, the Integration Service sets partition points at various transformations in the pipeline. Partition points mark thread boundaries and divide the pipeline into stages. A stage is a section of a pipeline between any two partition points.
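As a simple sketch (the transformation names are hypothetical), a pipeline with partition points at the Source Qualifier, an Aggregator, and the target instance is divided into three stages, each processed by its own thread:

    Source ==> [SQ_ORDERS] ==> EXP_CLEANUP ==> [AGG_TOTALS] ==> [TGT_ORDERS]
               (partition point)               (partition point) (partition point)

    Stage 1 (reader thread):         source     -> SQ_ORDERS
    Stage 2 (transformation thread): SQ_ORDERS  -> AGG_TOTALS
    Stage 3 (writer thread):         AGG_TOTALS -> TGT_ORDERS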
In this lecture, we talk about the different types of partitions available in Informatica.
In this lecture, the Database Partitions are discussed.
The difference between Database partitions and Informatica partitions is discussed in this lecture.
What happens at the back end of the database and how Informatica gets the database partition information is discussed in this lecture.
In this lecture, an example of the Database Partitions is shown.
This lecture details Pass-through partitions with an example.
The PowerCenter Integration Service distributes blocks of data to one or more partitions. Use round-robin partitioning when you want each partition to process approximately the same number of rows.
If the volume of data grows or you add more CPUs, you may need to adjust the number of partitions so that session run time does not increase. With dynamic partitioning, the Integration Service determines the number of partitions to create at run time, based on factors such as source database partitions or the number of nodes in a grid.
Types of Dynamic Partitioning:
1. Based on Source Partitioning - Determines the number of partitions using database partition information. The number of partitions is the maximum of the number of partitions at the source.
2. Based on number of CPUs - Sets the number of partitions equal to the number of CPUs on the node that prepares the session. If the session is configured to run on a grid, dynamic partitioning sets the number of partitions equal to the number of CPUs on the node that prepares the session multiplied by the number of nodes in the grid.
3. Based on number of Nodes in Grid - Sets the partitions to the number of nodes in the grid running the session. If you configure this option for sessions that do not run on a grid, the session runs in one partition and logs a message in the session log.
4. Based on No. of Partitions - Sets the partitions to a number that you define in the Number of Partitions attribute. Use the $DynamicPartitionCount session parameter, or enter a number greater than 1.
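As an illustration of option 4, you can set Dynamic Partitioning to 'Based on number of partitions' in the session properties, point the Number of Partitions attribute at $DynamicPartitionCount, and assign the value in a parameter file. A minimal sketch, with hypothetical folder, workflow, and session names:

    [MyFolder.WF:wf_load_orders.ST:s_m_load_orders]
    $DynamicPartitionCount=8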
In this lecture, we talk about Dynamic Partitions with the 'Based on Source Partitioning' option.
Push Down Optimization (PDO)
The pushdown optimization technique in Informatica pushes part or all of the transformation logic to the source or target database. The Integration Service translates the transformation logic into SQL queries and sends those SQL statements to the database; the source or target database then runs the SQL queries to process the transformation logic.
The amount of transformation logic you can push to the database depends on the database, the transformation logic, and the mapping and session configuration. The Integration Service analyzes which transformation logic it can push to the database, executes the generated SQL against the source or target tables, and itself processes any transformation logic that it cannot push to the database.
To preview the SQL statements and mapping logic that the Integration Service pushes to the source or target database, use the Pushdown Optimization Viewer.
There are three types of pushdown optimization techniques that you can configure in a session. They are:
- Source-side Pushdown Optimization
- Target-side Pushdown Optimization
- Full Pushdown Optimization
Source-Side Pushdown Optimization:
The Integration Service first analyzes the mapping from the source to the target, or until it reaches a downstream transformation that it cannot push to the source database. It then generates a SELECT statement, executes that SQL in the source database, reads the result rows, and processes the remaining transformations.
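As a hedged illustration, suppose the mapping reads a CUSTOMERS table, a Filter transformation keeps only active customers, and an Expression transformation concatenates the name columns. With source-side pushdown, the generated SQL could look roughly like the SELECT below; the table and column names are hypothetical, and the exact statement varies by database:

    SELECT CUST_ID,
           FIRST_NAME || ' ' || LAST_NAME AS FULL_NAME  -- Expression logic pushed to the source DB
    FROM   CUSTOMERS
    WHERE  STATUS = 'ACTIVE'                            -- Filter logic pushed to the source DB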
Target-Side Pushdown Optimization:
The Integration Service first analyzes the mapping from the target to the source, or until it reaches an upstream transformation that it cannot push to the target database. It then generates the INSERT, UPDATE, or DELETE statements. The Integration Service processes the transformation logic up to the point from which it can push the remaining logic to the database, and then executes the generated SQL statements on the target database.
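As a conceptual sketch only (not the literal statement PowerCenter emits, and with hypothetical table and column names), if an Expression transformation just upstream of the target converts a local amount to USD, target-side pushdown can fold that calculation into the load statement so the target database performs it:

    INSERT INTO ORDERS_TGT (ORDER_ID, AMOUNT_USD)
    VALUES (?, ? * 1.08)   -- the Expression's conversion runs inside the target database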
Full Pushdown Optimization:
You can use full pushdown optimization only when the source and target databases are in the same relational database management system.
The Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database. If the Integration Service cannot push all transformation logic to the target database, it tries to push all transformation logic to the source database.
If it cannot push all transformation logic to the source or target, the Integration Service pushes as much transformation logic as possible to the source database, processes the intermediate transformations that it cannot push to any database, and then pushes the remaining transformation logic to the target database. The Integration Service generates and executes an INSERT SELECT, DELETE, or UPDATE statement for each database to which it pushes transformation logic.
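When the source and target tables live in the same database, the whole mapping can collapse into a single statement. A minimal sketch, assuming hypothetical ORDERS_SRC and ORDERS_TGT tables with a Filter and an Aggregator between source and target:

    INSERT INTO ORDERS_TGT (CUST_ID, TOTAL_AMOUNT)
    SELECT   CUST_ID,
             SUM(AMOUNT)                -- Aggregator logic pushed down
    FROM     ORDERS_SRC
    WHERE    ORDER_STATUS = 'CLOSED'    -- Filter logic pushed down
    GROUP BY CUST_ID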
In this lecture, we discuss the different options available at session level for PDO.
An example of PDO is shown in this lecture.
In this lecture, we see the list of different databases that can be used for Push Down Optimization.
This lecture has the details of error handling while performing PDO.
Transformation Optimization Techniques and Best Practices
- Only use SQL Overrides if there is a substantial performance gain or reduction in complexity. SQL Overrides need to be maintained manually, and any change to the data structure will result in rewriting or modifying the SQL Override.
- Do use the WHERE condition and SORTED ports in the Source Qualifier if possible, rather than adding a Filter or a Sorter transformation (see the example after this list).
- Delete unused ports / only connect what is used. Reducing the amount of data carried through the mapping provides better performance by minimizing the data moved.
- Tune source qualifier queries to return only the data you need.
- Perform large lookups in the Source Qualifier instead of through a traditional lookup.
- When applicable, generate the default SQL in the Source Qualifier and use the ‘Validate’ option to verify that the resulting SQL is valid.
- Calculate once, use many times. Avoid calculating or testing the same value over and over. Calculate it once in an expression, and set a true/false flag. Within an expression, use variables to calculate a value used several times.
- Create an anchor expression transformation that will map the source table to an intermediary transformation using the source column names. Do simple processes (LTRIM/RTRIM, string/numeric conversions, testing for NULL, etc.) in this transformation. This will enable an easier transition if the source table changes in the future.
- Watch your data types. The engine automatically converts compatible types, but the conversion can be excessive and happen in every transformation, which slows the mapping.
- Expression names should begin with "EXP" followed by descriptive words
- Do not propagate ports out of an Expression transformation if they are not used in the mapping going forward.
- Group input and input/output ports first, followed by variable ports, and then output ports. Incorrectly ordering the ports in an Expression transformation can lead to errors and/or inaccurate results.
- Avoid using the $Source and $Target variables in the Lookup Connection Information. Connection names have been set up to be generic across environments such as Production and Test. If possible, set the Connection Information in the Lookup transformation to one of these non-environment-specific connection names.
- Set the connections in the session for ease of migration.
- Do not include more ports in the Lookup than necessary. Reducing the amount of data processed provides better performance.
- Avoid date/time comparisons in Lookups; replace them with string comparisons.
- Not all sources and targets treat strings with leading or trailing blanks the same. It may be necessary to RTRIM and LTRIM string data prior to using it in a Lookup.
- Lookups on small tables (<10,000 records) can be cached and use ODBC. Lookups on large tables should also be cached. As a general practice, do not use uncached lookups.
- In place of Lookups, tables can be joined in the Source Qualifier. However, this often necessitates left joins, which can complicate Source Qualifiers. Weigh performance against ease of maintenance when deciding between Source Qualifier joins and Lookups.
- Prefer a reusable Sequence Generator over separate Sequence Generators when generating unique primary keys.
- Do not use Aggregators for simple sorting; use the sorter transformation or the SORTED ports option of the Source Qualifier.
- Minimize aggregate function calls by using “group by”.
- Do place Aggregators as early in the mapping as possible, as they reduce the number of records being processed, thereby improving performance.
- Wherever possible, sort incoming data to an Aggregator and use the 'Sorted Input' option to improve performance.
- Place Filters as early in the mapping as possible, as they reduce the number of records being processed, thereby improving performance.
- Use a Filter to screen rows that would be rejected by an update strategy. (Rejected rows from an update strategy are logged to the bad file, decreasing performance.)
- If you have an Aggregator transformation in the mapping, use a Filter before the aggregation to avoid aggregating unnecessary rows.
- Routers may not be the best choice if load order of the target(s) is important since it is not possible to control the load order of the legs from a router.
- The target load method(s) must be carefully chosen when using routers, especially if the data is loading to the same target, in order to avoid table locks and ensure that the data is loaded in the correct order.
- Whenever possible, perform joins in the database, i.e., in the Source Qualifier itself.
- Whenever possible, sort incoming data to a Joiner transformation and use the ‘Sorted input’ option to improve performance.
- To improve the performance of an unsorted Joiner transformation, designate the source with fewer rows as the 'Master'.
- Do not code Update Strategies when all rows going to the target are inserts.
- Do include an Update Strategy when all rows going to the target are updates, unless a proof of concept shows that there is performance degradation. This adds clarity to the mapping for future developers.
- Rejected rows from an Update Strategy are logged to the bad file. Because this logging causes a performance hit, consider filtering such rows out upstream if retaining them is not critical.
- Avoid loading to the same target between different data flows where possible.
- Whenever possible, sort source data in the database, i.e., in the Source Qualifier itself, as shown in the example below.
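To tie several of the points above together (WHERE conditions, sorting, and joins in the Source Qualifier), a Source Qualifier SQL override along the lines below performs the filtering, joining, and sorting in the database instead of in separate Filter, Joiner, and Sorter transformations. The table and column names are hypothetical:

    SELECT   O.ORDER_ID,
             O.CUST_ID,
             C.CUST_NAME,
             O.AMOUNT
    FROM     ORDERS O
             INNER JOIN CUSTOMERS C ON C.CUST_ID = O.CUST_ID  -- join in the database, not in a Joiner
    WHERE    O.ORDER_STATUS = 'CLOSED'                        -- filter in the database, not in a Filter
    ORDER BY O.CUST_ID                                        -- sort in the database, not in a Sorter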
New topics and lectures are added every week.
This lecture will provide you with the coupons for other courses.