Monday, February 20, 2012

Working with Oracle Sequence Objects in Talend Open Studio


In case you need a column that contains unique, sequentially generated numbers, you can realize autonumber fields (also known as "auto-increment" fields) in RDMBS engines like Oracle and PostgreSQL by using sequence objects. This is often the case of the primary surrogate keys in DWH environments.

In Oracle, a sequence object is a separate structure specifically created to generate sequential values; it can be efficiently and concurrently accessed by more than one process at time, and the same sequence can be used for one or more tables.

If you are using an ELT tool like Oracle Warehouse Builder, the access and increment of the sequence object into a mapping is pretty much easy: we simply drag and drop the sequence object itself, and connect the "NEXTVAL" field to the desired column of the target table.

Mapping of a sequence object in Oracle Warehouse Builder.


With our favourite Data Integration tool, however, things are a bit more complicated. Suppose we don' t wanna delegate the "autoincrement" logic to the ETL process, and that we simply wanna insert new rows into one table, letting the RDBMS manage the sequential logic by itself. In this case, we would obviously make use of a sequence object:


DROP  SEQUENCE seq_name;

CREATE  SEQUENCE seq_nameSTART WITH 1 INCREMENT BY 1 MINVALUE 1 NOCACHE NOCYCLE NOORDER;


The problem is that, with Talend Open Studio, we don't have a way to directly access the sequence object inside the ETL flow - and we can't therefore implement any incremental logic at ETL level; neverthless, the sequence object doesn't simply increment by itself - you have to access it in some way, at application level.

We can' t directly access an Oracle sequence object in Talend Open Studio.


The solution is to implement a trigger object: each row in the target table will trigger an increment of the sequence object, whose value will be stored in the "autoincrement" column itself: 


CREATE OR REPLACE TRIGGER  trg_nameBEFORE INSERT ON  table_name FOR EACH ROW WHEN     (new.autoincr_col IS NULL)  BEGIN    SELECT       seq_name.NEXTVAL    INTO         :new.autoincr_col    FROM        DUAL;  END;/

The trigger object makes slower the loading process - as a matter of fact each insert row triggers a different operation. However, it perfectly works in case of small amount of insert data.

Do not forget to add the backslash "/" at the end of the trigger creation script.


3 comments:

  1. Our Company http://www.sqiar.com/ Which Provide Services to Small and Medium Size of Businesses.the Sqair Provide Data warehousing (Data Mining,Data Modelling).in this Services We Present Company Data in Meaningful Way.so User Easily Understand it.and also the help of Data Warehousing its Reduce the Company Complexity.

    ReplyDelete
  2. I had such hopes when I saw this article.

    I am migrating some data using Talend. The DBAs designed two "kind-of" related child tables, but the required foreign keys to the table that has no relation to the parent table (EQ_Mode). And the "closer" child table (An_Mode), while having the key from the parent, requires the (valid) foreign key of the further out child's record, before I can insert ITs record.

    It's driving me crazy. Why would you suspend a child table from a table with no relationship to the parent?

    Anyway, I have a partial solution, as poor as it is:

    Use the parent table as the MAIN input to tMap.
    Create a view (or modify the SQL of a table - make a fake table) which just gets the NextVal of the poor relation table (EQ_Mode.PK_ID).
    This is a "Lookup" into the tMap. Link this to the parent table so that on every record of the parent, the sequence.NextVal for EQ_Mode is generated. Another input to this first tMap is the record source that EQ_Mode needs... and the output is the new EQ_Mode, complete with a referenced PK_ID.
    Pass the parent's PK_ID, the EQ_Mode.PK_ID, and some other stuff to the next tMap. This tMap also as the record source that will become AN_Mode, and with the addition of PK_ID -- which, at this stage in the workflow has been inserted into EQ_Mode, so the foreign key constraint is met.
    It's slow, it's kludgey, and it's currently broken, but I DID have it working... and so I updated other required workflow, and now my NextVal query won't run for every parent record. I think I hosed my join, and I haven't figured it back out yet. Discouraging, but I know it's in there somewhere.

    There's probably an easier way, but I don't know it, and I'm not finding it on the web.

    Cheers, and thanks for your article.

    ReplyDelete