SAP Data Services is a leader in the data integration and transformation space and a critical component of the SAP HANA solution. With SAP Data Services, you can quickly discover, cleanse, and integrate data and make it available for real-time analysis. Discover all formats of data in your organization, including unstructured data, to understand the business better, discover profitable patterns, and make better decisions.
Key Learnings:
- Understand the Data Services big picture
- Work with heterogeneous Source and Target Connections
- Build ETL jobs using Data Services transforms and functions
- Trace, Validate, and Debug Data Services Jobs
- Work with global/local variables, scripts, and custom functions
- Implement Change Data Capture in Data Services
- Implement error handling
Engaging Teaching:
- Expressive and immersive teaching with demonstrations and learning tools
- Small, purposeful videos, packed with information, designed specifically for a virtual self-paced audience
- Exercises for hands-on practice
Introduction to the course, use case, tool, and basics
Hello and welcome!
In this video I will walk you through the course structure and layout, what to expect from each topic, and some tips for the best learning outcomes.
In this video we will understand the ETL process and its aspects from a 50,000-foot level and set the stage for the coming lectures.
We will understand the positioning of the tool, its core functionalities, its connectivity capabilities, and its technical and business benefits.
Architecture and GUI
Here we look into the several unique components Data Services relies on to accomplish the data integration and data quality activities required to manage corporate data.
Data Services Designer is a Windows client application used to create, test, and manually execute
jobs that transform data and populate a data warehouse. Using the Designer, create data management applications that consist of data mappings, transformations, and control logic.
Connect to databases, flat files, and XML data sources
You are responsible for extracting data into the company's Business Warehouse system and want to convert the data transfer process to use Data Services.
A datastore provides a connection, or multiple connections, to data sources such as a database. Using the datastore connection, Data Services can import the metadata that describes the data from the data source.
You are working as an ETL developer using SAP Data Services Designer. You will create datastores for the source, target, and staging databases.
Data Services determines and stores a specific set of metadata information for tables. Import
metadata by naming, searching, and browsing. After importing metadata, edit column names, descriptions, and data types.
Let's connect to an SAP ERP system from Data Services. For most Enterprise Resource Planning (ERP) applications, Data Services generates SQL that is optimized for the specific target database (for example, Oracle, DB2, SQL Server, and Informix).
We will learn to use template tables in early application development when you are designing and testing a project.
We will see how to work with flat files, create file formats, and work with different delimiters.
Here we learn how to read multiple flat files with identical formats using a single file format, by substituting a wildcard character or a list of file names.
We will learn how to import Excel workbooks, create formats, filter data, and more.
Data Services can interpret XML as well. Let's see how to create XML file formats and import data.
Concept, Working, and Scheduling of an ETL job
Let's understand the Data Services job structure, its different components, and how to sequence them in parallel and in series.
Let's see how to build data flows; they contain the source, transform, and target objects that represent the key activities in data integration and data quality processes.
Troubleshooting is a key skill for developers, and it starts with understanding where to find the error logs, traces, and job statistics.
A top customer concern is being able to schedule a job for automatic execution. Let's see how to do that.
A job is the only executable object in Data Services. When developing data flows, you can
manually execute and test jobs directly in Data Services.
Platform Transforms - They make developers' lives easy!
Here we will see that the Query transform is the most commonly used transform and is included in most data flows. It enables you to select data from a source and filter or reformat it as it moves to the target.
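For instance, output column mappings in a Query transform are written with built-in functions and operators. A small sketch, where the CUSTOMER source table and its columns are made up for illustration:
# Hypothetical mapping expressions for Query transform output columns
# Normalize case
upper(CUSTOMER.LAST_NAME)
# Derive a flag column
ifthenelse(CUSTOMER.CREDIT_LIMIT > 10000, 'HIGH', 'LOW')
# Concatenate first and last name
CUSTOMER.FIRST_NAME || ' ' || CUSTOMER.LAST_NAME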
Here we learn to use the Case transform to simplify branch logic in data flows by consolidating case or decision-making logic into one transform.
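As a sketch, the expressions you enter for each case label are ordinary Boolean conditions; the labels and the ORDERS table below are made up for illustration:
# Hypothetical Case transform labels and expressions
# EMEA:      ORDERS.REGION = 'EMEA'
# AMERICAS:  ORDERS.REGION = 'US' or ORDERS.REGION = 'CA'
# Remaining rows can be routed to the default output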
Here we learn to use the Merge transform to combine incoming data sets with the same schema structure.
The merge produces a single output data set with the same schema as the input data sets.
The Validation transform enables you to create validation rules and move data into target objects
based on whether they pass or fail the validations.
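A validation rule is essentially a Boolean expression evaluated for each row. A minimal sketch, where the EMPLOYEE table and its columns are made up for illustration:
# Hypothetical validation rule bound to column EMPLOYEE.SALARY
EMPLOYEE.SALARY > 0 and EMPLOYEE.HIRE_DATE <= sysdate()
# Rows for which the expression is false are routed to the Fail output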
Use the SQL transform to submit SQL commands that generate data to be moved into target objects.
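The SQL text placed in the transform is ordinary SQL for the underlying database. At the script level, a similar capability is available through the sql() function; a small sketch, where the DS_STAGE datastore, the STG_CUSTOMER table, and the $StagedRows variable are made up for illustration:
# Run a statement against a datastore from a script (illustrative names)
sql('DS_STAGE', 'TRUNCATE TABLE STG_CUSTOMER');
# sql() returns the first column of the first row, so it can also fetch a single value
$StagedRows = sql('DS_STAGE', 'SELECT COUNT(*) FROM STG_CUSTOMER');
print('Staged rows: [$StagedRows]');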
Using Variables, Parameters, and Scripts
Local variables are restricted to the job or work flow in which they are created. Use parameters to
pass local variables to the work flows and data flows in the object. A local variable is included as
part of the definition of the work flow or data flow, and so it is portable between jobs.
Global variables are also restricted to the job in which they are created. However, they do not
require parameters to be passed to work flows and data flows in that job. You can reference the
global variable directly in expressions for any object of the job.
A script is a single-use object that is used to call functions and assign values in a work flow.
Execute a script before data flows for initialization steps and use a script with conditionals to
determine execution paths. You may also use a script after work flows or data flows to record
execution information such as time, or to record a change in the number of rows in a data set.
Use a script to calculate values that are passed on to other parts of the work flow or to assign
values to variables and execute functions.
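As an illustration, a short initialization script might look like the following; the $G_LoadDate global variable and $L_RunMode local variable are made up for this sketch:
# Hypothetical initialization script placed at the start of a work flow
$G_LoadDate = sysdate();
$L_RunMode = 'FULL';
print('Load started on [$G_LoadDate] in [$L_RunMode] mode by [system_user_name()]');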
If the built-in functions that are provided by Data Services do not meet your requirements, you can create your own custom functions using the Data Services scripting language.
Create your own functions by writing script functions in the Data Services scripting language using the Smart Editor. Saved custom functions appear under the Custom Functions category in the Function Wizard and the Smart Editor. Custom functions are also displayed on the Custom Functions tab of the object library.
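As a sketch, the body of a custom function is ordinary Data Services script; the function name CF_FullName and its parameters are hypothetical:
# Hypothetical custom function CF_FullName($P_FirstName, $P_LastName), created in the Smart Editor
Return rtrim_blanks($P_FirstName) || ' ' || rtrim_blanks($P_LastName);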
Using Built-In Functions
Retrieve a value in a table or file based on the values in a different source table or file:
• Return multiple columns from a single lookup.
• Choose from additional operators to specify a lookup condition.
• Specify a return policy for your lookup.
• Perform multiple lookups.
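A minimal lookup_ext() sketch; the datastore, table, and column names are assumptions for illustration:
# Return CUSTOMER_NAME from DS_SOURCE.DBO.CUSTOMER for the matching CUSTOMER_ID,
# pre-loading the lookup table into cache (illustrative names)
lookup_ext([DS_SOURCE.DBO.CUSTOMER, 'PRE_LOAD_CACHE', 'MAX'],
           [CUSTOMER_NAME], ['NO MATCH'],
           [CUSTOMER_ID, '=', ORDERS.CUSTOMER_ID])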
The search_replace function performs a simple search and replace based on a string value, word value, or an entire field.
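For simple string-level replacement in a mapping or script, the related replace_substr() function can also be used; a small sketch:
# replace_substr(input_string, search_string, replace_string)
print(replace_substr('Mr Jones', 'Mr', 'Mister'));
# prints: Mister Jones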
# Print functions
print('Host Name: [host_name()]');
print('Repository Name: [repository_name()]');
print('Starting execution of Job: [job_name()] as user: [system_user_name()]');
print('Work Flow Name: [workflow_name()]');
# SYSTEM Info
print('System Date: [sysdate()]');
print('System Time: [systime()]');
print('System_user_name: [system_user_name()]');
$FileName_Pre = 'C:\Users\hana1\Desktop\CSV files\CSV.txt';
$FileName= $FileName_Pre;
print($FileName);
$NewFileName = 'C:\Users\hana1\Desktop\CSV files2\CDVNEW.txt';
If ( file_exists ( $FileName ) = 1 )
begin
print('File exists');
file_move($FileName,$NewFileName,1);
print('File has been moved');
#file_delete($FileName);
#print('Old File has been deleted');
end
else
begin
print( '**********************************');
print( '*****File does not exist**********');
print( '*****Waiting For The File*********');
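# wait_for_file arguments: file name, timeout, and polling interval in milliseconds
# (here, wait up to 120 seconds, checking every 60 seconds)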
wait_for_file($FileName, 120000, 60000);
end
# To Char functions
print(to_char(sysdate(), 'MM'));
print(to_char(sysdate(), 'yyyy.mm.dd'));
print(to_char(sysdate(), 'dd.mm.yyyy'));
#to_char(sysdate(),'MONTH')
#to_char(sysdate(),'DD')
#to_char(sysdate(),'YY')
#to_char(sysdate(),'YYYY')
#to_char(sysdate(),'HH')
#to_char(sysdate(),'MI')
#to_char(sysdate(),'SS')
#to_char(sysdate(),'FF')
print(To_date('jun 18,2012', 'MON DD,YYYY'));
print(CAST('13.54','INT'));
print(CAST('12.20','DECIMAL(3,1)'));
print( ltrim('Marilyn', 'Ma'));
print( ltrim('ABCABCD', 'ABC'));
print(ceil(12.12345));
print(ceil(-12.223));
Managing Slowly Changing Dimensions in BODS
Setting up a full CDC solution within Data Services may not be required. Many databases now have CDC support built into them, such as Oracle, SQL Server, DB2, and SAP Sybase. Alternatively, you can combine surrogate keys with the Map Operation transform to change all UPDATE row types to INSERT row types to capture changes.
Source-based Changed Data Capture (CDC) is the preferred method of updating data because it improves performance by extracting the fewest rows. Source-based CDC, also referred to as incremental extraction, extracts only the changed rows from the source.
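One common way to implement incremental extraction is to track a last-load timestamp in a control table, read it into a global variable in a script, and reference that variable in the source WHERE clause. A sketch, where the DS_STAGE datastore, JOB_CONTROL table, and $G_LastLoad variable are assumptions for illustration:
# Script before the data flow: fetch the high-water mark (illustrative names)
$G_LastLoad = sql('DS_STAGE', 'SELECT MAX(LAST_LOAD_DATE) FROM JOB_CONTROL');
print('Extracting rows changed after [$G_LastLoad]');
# The source Query WHERE clause in the data flow then references the variable:
# SOURCE.LAST_UPDATE_DATE > $G_LastLoad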
Target-based Change Data Capture (CDC) compares the source to the target to determine
which records have changed.
The History Preserving transform converts rows flagged as UPDATE to UPDATE plus INSERT, so that the original values are preserved in the target. Specify the column in which to look for updated data.
Use the Pivot transform to convert columns into rows, and the Reverse Pivot transform to convert rows back into columns.
Batch Job Troubleshooting
An annotation is a sticky note with a folded-down corner that you can use to document flows and diagrams in the workspace.
Tracing jobs - Use trace properties to select the information that Data Services monitors and writes to the trace log file during a job. Data Services writes trace messages to the trace log and error messages to the error log, both associated with the current Job Server.
Use the Interactive Debugger to examine what happens to data after each transform or object in
a flow and to troubleshoot any issues that arise when executing your jobs.
Learn to set up audit rules to ensure the correct data is loaded to the target when executing jobs.
Error Handling
Resolve issues if a Data Services job execution is not successful, for example, if a server failure prevents the completion of the job. Use recoverable work flows and try/catch blocks to recover data for sophisticated error handling.
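For instance, a short script inside a catch block can record the failure before the job ends; a sketch using the standard error functions:
# Hypothetical script placed inside a catch block
print('Job [job_name()] failed with error [error_number()]: [error_message()]');
# Re-raise so the job still finishes in an error state after logging
raise_exception('Load failed - see the error log for details');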