Most of the organizations deal with different types of data sources. They may get data in different formats from different databases. To store data in Data Warehouse it is required that we convert all the data in same format and make it usable. There are several concepts, I am going to talk about ETL process.
What is ETL?
Extract - to extract/read data from different sources
Transform - Perform the required tranformations on the data
Load - Load the data into the Target/Data Warehouse.
There are several tools in the market to perform ETL or Data Integration. Whenever we need to perform an ETL, we should first come up with a strategy. There are following suggestions for a good ETL strategy.
- Keep it simple.
- Try to break the task in pieces in order to accomplish the whole task.
- Use data driven methods.
- Avoid custom SQL codes.
- Use files, variables and parameters to avoid rewriting values and to achieve efficiency.
Here are some qualities of a good ETL architecture:
- The performance should be good.
- The ETL design should be scalable.
- Migratibility is another quality.
- Robustness and ability to recover are very important.
- Operable (completion-codes for phases, re-running from checkpoints, etc.)
- Auditable (in two dimensions: business requirements and technical troubleshooting)
No comments:
Post a Comment