Database and Business Intelligence : 2013

Wednesday, September 18, 2013

How to Delete duplicate records in Oracle table

Many times we get duplicate records in a table. Mostly we can truncate the whole table and reload it again from source, but if the table is big and takes long time to reload, we should probably not truncate the table and reload it again. We can manually run a script to delete all the duplicates from the table and make it clean for users.

There are various approaches through which we can delete duplicate records, but my favorite is by row_id

Here is a script to perform that

DELETE FROM
table_name A
WHERE
a.rowid >
ANY (
SELECT
B.rowid
FROM
table_name B
WHERE
A.col1 = B.col1
AND
A.col2 = B.col2
);

Please note that you must specify all of the columns that make the row a duplicate in the SQL where clause.
Hope this is helpful. :)

Tuesday, July 16, 2013

How to truncate a partition of a table in Oracle with Informatica Pre SQL

So, I only wanted to delete a set of data from a huge table and not wanted to truncate the whole table, as it would take long time to reload all the data.So I decided to partition the table by list on a column values. I created informatica mappings for each partition separately and put all the sessions in one workflow. Now here is the trick I applied. For each session's properties in under mapping tab, click on the target and scroll down, you will see a PRE SQL section where you write a truncate statement.

ALTER TABLE table_name TRUNCATE PARTITION partition_name;

Click OK and save the workflow. I did this for all the sessions, basically for each partition I had created.
This allows me to run the sessions manually if I want and run one at a time or all of them. Pre and Post sql options are awesome and gives you lots of flexibility to work with.

List all the table information with specified column_name

Say, we have a column_name and we want to find out which table it belongs to, or to know all the tables and views have this column. Here is my way

SELECT * FROM ALL_TAB_COLUMNS
WHERE COLUMN_NAME = 'ID'

OR

SELECT * FROM ALL_TAB_COLUMNS
WHERE COLUMN_NAME LIKE '%ID%'

Always use upper case for column_name value. It will list all the tables info along with the owner schema.

Tuesday, June 25, 2013

How to Create Index on a table in Oracle

Here is the syntax to create an index on a table in oracle:

CREATE INDEX INDEX_NAME ON TABLE_NAME(COLUMN_NAME);

If you want to specify tablespace you can do it as follows:

CREATE INDEX INDEX_NAME ON TABLE_NAME(COLUMN_NAME)
TABLESPACE TABLE_SPACE;

Isn't that easy?

Wednesday, June 19, 2013

How to add more columns to an existing table with a simple SQL in Oracle

Most of you may know that you can add a column easily with Toad by going through add column wizard or Alter table wizard. Let's say if you have to add lots of columns to a table, I would rather write a simple sql code than to add columns through wizard. Here is the simple sql you can write for the same

Alter table_name add

(column_a datatype,

column_b datatype,

column_c datatype,

column_d datatype,

....

)

You can add as many columns as you want in just one shot. Is' nt that useful??

Create a Table from another table/sql with/without copying data in Oracle

Sometimes we need to create a table from a specified pl/sql code or other table. Say we only want to create the table columns and not copy the data. Here is the sql code to create a table from another table along with data.

Create table table_name as Select * from table_a

You can substitute red part of the query to any sql query. This will create table with all the data in it. Now, I want to create the same table but without copying the data. It's actually very simple, you only have to add a where clause to restrict the row number to zero. This is how you can write it:

Create table table_name as Select * from table_a where 1=0

This could be useful when you have to create a table from different table columns. This automatically adapts datatypes of columns.

Wednesday, June 5, 2013

How to add ON DELETE CASCADE feature to a table/constraint when the foreign key constraint is already added?

Well, I figured out that we cannot add ON DELETE CASCADE to an existing constraint or already added foreign key constraint. You need to drop the constraint and recreate it with the ON DELETE CASCADE added to it. Here is how you can do it.

1.First drop the constraint.
2.Add the new constraint with ON DELETE CASCADE option.

Here is the example of syntax:

ALTER TABLE table_1
DROP CONSTRAINT constraint_1

This will drop the constraint.

ALTER TABLE table_1
DROP CONSTRAINT constraint_1 FOREIGN KEY(a) REFERENCES table_2(a)
ON DELETE CASCADE

This option will enable you to delete the child record automatically whenever the parent record is deleted.

Tuesday, June 4, 2013

Execute Stored Procedure in Informatica Pre SQL

There are times in practice when you may want to execute a stored procedure in pre sql of a session. For example: In my case, I wanted to truncate a table before loading data into it each time I run the workflow/mapping/session. I know you can truncate a table by selecting an option inside a session task(Target properties), but I was not able to do that because of limited privileges. The connection was not allowing me truncating a table. To resolve it, I got to call a stored procedure, which is written to truncate the related table. Now my job was to execute the stored procedure each time the session is run. There are 2 methods to do so

1) Create a dummy mapping and use the Stored Procedure transformation in it and run this mapping before my required mapping each time.

2) Write a pre sql to execute the stored procedure.

I chose option 2 as it takes less time and effort. So here is how it works:

Open the session you want to edit.
Click on the mappings tab.
Click on the target properties.
Go to the Pre-Sql.
Write following command

CALL SCHEMA_NAME.SP_NAME();
OR
CALL SP_NAME();

Click OK and Save.

I have to use schema name as I was using a different schema/role in connection. Do not use 'EXEC' , instead use CALL and remember to use braces in the end. Picture might help :)

You can also use Post SQL depending on the needs and requirements. Just remember the syntax to call stored procedure.

Friday, May 17, 2013

How to Resume a Failed Workflow in Informatica

I am pleased to write this article, as today I have figured out a very important feature of Informatica. I was in a Informatica user group meeting about a month ago. We discussed there that the other organizations are using a third party scheduling tool to schedule workflows of Informatica. I asked them why they do so and they replied that if something fails in the middle of the workflow, there is no way to resume the workflow from the point they want it. They need to run the whole workflow again. Well, that's not true. There is a way to resume the execution of a workflow.

Here is what we need to do in order to resume a workflow execution.

Open the workflow in workflow manager.
Right click on the task you want to resume execution from.
Click on the 'Start Workflow from Task' option.

That's it.

This will run all the tasks after it, that means all the way down. Note that if there is anything which do not have dependency on it or not in the downstream, will not be executed.

Wednesday, May 15, 2013

How to find the max column value from more than one tables

I was needed to find out the maximum value for a column from 16 tables. The column was ROW_LAST_MODIFIED_THIS and I was needed to compare it with a source table field value, which contains the row_last_modified value at source side. The idea behind was to compare both the fields and get only records which are new or updated later than the max modification date. Here is how I figured it out:

SELECT * FROM table WHERE
table.ROW_LAST_MODIFIED > NVL((select max(a) from(
(SELECT MAX(row_last_modified_this) as a
FROM table1) union all
(SELECT MAX(row_last_modified_this) as a
FROM table2) union all
(SELECT MAX(row_last_modified_this) as a
FROM table3) union all
(SELECT MAX(row_last_modified_this) as a
FROM table4) union all
(SELECT MAX(row_last_modified_this) as a
FROM table5) union all
(SELECT MAX(row_last_modified_this) as a
FROM table6) union all
(SELECT MAX(row_last_modified_this) as a
FROM table7) union all
(SELECT MAX(row_last_modified_this) as a
FROM table7) union all
(SELECT MAX(row_last_modified_this) as a
FROM table8) union all
(SELECT MAX(row_last_modified_this) as a
FROM table9) union all
(SELECT MAX(row_last_modified_this) as a
FROM table10) union all
(SELECT MAX(row_last_modified_this) as a
FROM table11) union all
(SELECT MAX(row_last_modified_this) as a
FROM table12) union all
(SELECT MAX(row_last_modified_this) as a
FROM table13) union all
(SELECT MAX(row_last_modified_this) as a
FROM table14)))
,SYSDATE-36500)

You can use it with any number of tables. The field you are comparing should contain the similar value, like in this example it is date. Hope it will help :)

Friday, March 15, 2013

WorkFlow Manager Problem(Repository Navigator invisible)

Few days back, I was working with Informatica and I faced a problem in the Workflow Manager.The repository navigator window in Workflow manager just went away. I was not able to bring it back anyhow. Probably it was invisible :) . I tried few things listed below.

Clicked on View menu option. Checked the Repository Navigator.
Restarted Whole application.
Rebooted System.

Nothing helped. I was still not able to see then Repository Navigator in Workflow Manager. Finally what worked is as below.

1.Exit from all Informatica applications.

2. From the Start menu, select Run.

3. Type REGEDIT in the Open textbox and click OK.

4. Look for the HKEY_CURRENT_USER folder and expand it.

5. Look for the Software folder and expand it.

6. Look for the Informatica folder and expand it.

7. Look for the PowerMart Client Tools folder and expand it.

8. Look for the version number(9.x in my case) folder and expand it.

9. Look for the Workflow folder and select it.

10. Export the Registry file. (From the Registry menu, select Export
Registry File…).

11. Save the Registry file on your Desktop.

12. Open the Registry file from your Desktop (Double click on the file, when
prompted click OK, click OK again).

13. Delete the Workflow folder in REGEDIT.

14. Close any applications currently running on your computer.

15. Restart your computer.

16. Log into the Informatica Client tool that displayed the corrupted output
window (i.e. Informatica Workflow).

17. The output window should now be fixed and displayed properly.

18. Delete the Registry file from your Desktop.

Hope this will be useful.

ETL

Most of the organizations deal with different types of data sources. They may get data in different formats from different databases. To store data in Data Warehouse it is required that we convert all the data in same format and make it usable. There are several concepts, I am going to talk about ETL process.

What is ETL?

Extract - to extract/read data from different sources

Transform - Perform the required tranformations on the data

Load - Load the data into the Target/Data Warehouse.

There are several tools in the market to perform ETL or Data Integration. Whenever we need to perform an ETL, we should first come up with a strategy. There are following suggestions for a good ETL strategy.

Keep it simple.
Try to break the task in pieces in order to accomplish the whole task.
Use data driven methods.
Avoid custom SQL codes.
Use files, variables and parameters to avoid rewriting values and to achieve efficiency.

Here are some qualities of a good ETL architecture:

The performance should be good.
The ETL design should be scalable.
Migratibility is another quality.
Robustness and ability to recover are very important.
Operable (completion-codes for phases, re-running from checkpoints, etc.)
Auditable (in two dimensions: business requirements and technical troubleshooting)

Thursday, March 14, 2013

How to Create DataBase Links in ORACLE

Many times we need to interact with different databases at same time. I assume many of you, like me has faced this situation. There is an easy way to do so. DB Links are really useful. I mostly use them while data comparison and validation, but they can be used for much more. Here is a way you can easily create and use a DB Link.

CREATE DATABASE LINK "Link_Name"
CONNECT TO Schema Name
IDENTIFIED BY Password
USING 'DB Alias';

This code will create the DB Link and then you can use it in following manner:

Select * from TableName@Link_Name

You can execute this script in other database connections.