
Showing posts with label Tech. Show all posts

Thursday, September 29, 2011

Talend Open Studio - The OpenSource ETL Tool

Talend Open Studio is used to perform various ETL processes. It is one of the best-known open source tools on the market. I have created this post so that people who are interested in using Talend Open Studio can look at the product from different perspectives without having to search multiple websites. I have consolidated my findings about TOS from different websites. Having used this tool for some time now, I am still exploring the components and still getting acquainted with the user interface.

Overview of Talend Open Studio: Talend offers a variety of products to meet an organisation's data needs. These fall into two categories:
1. Community – core functionality available under the open source GPL
2. Commercial – enhanced functionality under an open subscription license


A look into Open Source ETL Tools

Finding the right open source ETL tool is difficult. I searched different websites and blogs and have listed the following open source ETL tools. The tools were studied based on license requirements, history, open source nature and support for the product. I have singled out four tools, namely Talend, CloverETL, Pentaho and SpagoBI. A basic overview of these tools follows:

Talend – A startup of French origin that has positioned itself as a pure-play open source data integration vendor and offers its product, Open Studio. For vendors wishing to embed Open Studio capabilities in their products, Talend has an OEM license agreement. That is what JasperSoft has done, thus creating an open source BI stack to compete with Pentaho's Kettle. Talend is a commercial open source vendor which generates profit from support, training and consulting services. Open Studio offers a user-friendly graphical modeling environment. It provides a traditional approach to performance management as well as pushdown optimization (an architectural approach). The latter lets users avoid the cost of dedicated hardware for an ETL engine by using spare server capacity in both the source and target environments to power the transformations. Talend Website

CloverETL – CloverETL is a data transformation and data integration (ETL) tool distributed as commercial open source software. As the CloverETL framework is Java based, it is platform independent and resource efficient. CloverETL is used to cleanse, standardize, transform and distribute data to applications, databases and warehouses. It has been used not only on the widespread Windows platform but also on Linux, HP-UX, AIX, AS/400, Solaris and OS X. It can be used on low-cost PCs as well as on high-end multiprocessor servers. CloverETL is easy to buy: no complex subscriptions or confusing service tail, just the software license and maintenance, pure and simple. The license is priced by CPU and comes in easy-to-use packages, which allow you to grow and upgrade to higher levels within the year. CloverETL Website
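
To make the pushdown idea from the Talend overview a little more concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not Talend code. The table names (source_customers, target_customers), the columns and the sample rows are all made up for illustration. The first function pulls rows out and transforms them in the "engine" (here, the Python process); the second pushes the same transformation down to the database as a single SQL statement.

```python
import sqlite3


def demo_db():
    """Build a throwaway in-memory database (hypothetical tables and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_customers (id INTEGER, name TEXT);
        CREATE TABLE target_customers (id INTEGER, name TEXT);
        INSERT INTO source_customers VALUES (1, '  ann '), (2, 'bob');
    """)
    return conn


def transform_in_engine(conn):
    """Engine style: read rows out, clean them in Python, write them back."""
    rows = conn.execute("SELECT id, name FROM source_customers").fetchall()
    # Trim surrounding spaces and upper-case the name outside the database.
    cleaned = [(row_id, name.strip(" ").upper()) for row_id, name in rows]
    conn.executemany(
        "INSERT INTO target_customers (id, name) VALUES (?, ?)", cleaned
    )
    conn.commit()


def transform_pushdown(conn):
    """Pushdown style: one statement, the database does the transformation."""
    conn.execute(
        "INSERT INTO target_customers (id, name) "
        "SELECT id, UPPER(TRIM(name)) FROM source_customers"
    )
    conn.commit()


def target_rows(conn):
    """Return the loaded rows in a stable order for comparison."""
    return conn.execute(
        "SELECT id, name FROM target_customers ORDER BY id"
    ).fetchall()
```

Both functions should leave the same rows in target_customers. The difference is where the work happens: in the second case no rows travel through the client process, which is the point of pushdown.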

Talend Implementation to Load Staging tables

Working with ETL and finding a new tool in which to re-implement code that already exists in another tool is nice work. I am exploring the open source tool Talend Open Studio (TOS), and if I find it as powerful as a licensed tool, that means a lot of savings in license costs. The preliminary research was obviously on the internet, consolidating views and posts from different websites and blogs.

After doing some research on the open source nature of TOS, I explored the tool at the component level. Now I am trying to write my first piece of code, and I am stuck. For me TOS is not very intuitive (unlike the Ab Initio ETL tool). Given proper classroom training this tool could be easy, but learning it online with the help of documents posted on the Talend Forum may take time. Not sure though. I have just started.

I worked with Ab Initio ETL earlier and have now moved to Talend Open Studio. I am new to Talend ETL. I am trying to implement some simple logic; if anyone has a clue, please help me. It was pretty simple with Ab Initio but I am finding it difficult with Talend.

The Problem:

1. I have a table RUNDT which has Runkey, Date and a runflag as attributes. It is extracted using stored procedures in MSSQL.

2. If the runflag is "Y", I have to run the following process:
           2(a) Extract Project_ID from the Project_Master table. Project_Master holds a number of projects, meaning it has multiple records.

           2(b) For each project_id, I have to perform the following steps:
                        i) Read another table, Resource_table.
                       ii) Load the project_id, together with attributes from Resource_table, into a staging table.
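
The steps above can be sketched in plain Python to pin down the control flow before building it in any ETL tool. This is only an illustration under several assumptions: it uses an in-memory SQLite database instead of MSSQL, a plain SELECT stands in for the stored procedure, and the Staging table, the resource_name column and the project_id column on Resource_table are hypothetical names that the problem statement does not spell out.

```python
import sqlite3


def build_demo_db(runflag="Y"):
    """Create in-memory tables with made-up sample rows (all hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE RUNDT (runkey INTEGER, rundate TEXT, runflag TEXT);
        CREATE TABLE Project_Master (project_id INTEGER);
        CREATE TABLE Resource_table (project_id INTEGER, resource_name TEXT);
        CREATE TABLE Staging (project_id INTEGER, resource_name TEXT);
        INSERT INTO Project_Master VALUES (1), (2);
        INSERT INTO Resource_table VALUES (1, 'alice'), (1, 'bob'), (2, 'carol');
    """)
    conn.execute("INSERT INTO RUNDT VALUES (1, '2011-09-29', ?)", (runflag,))
    return conn


def load_staging(conn):
    """Run steps 1, 2(a) and 2(b); return the number of staging rows loaded."""
    cur = conn.cursor()

    # Step 1: read the run flag (assumption: the latest runkey decides).
    row = cur.execute(
        "SELECT runflag FROM RUNDT ORDER BY runkey DESC LIMIT 1"
    ).fetchone()
    if row is None or row[0] != "Y":
        return 0  # Step 2 only runs when the flag is "Y".

    # Step 2(a): collect every project_id from Project_Master.
    project_ids = [
        r[0]
        for r in cur.execute(
            "SELECT project_id FROM Project_Master ORDER BY project_id"
        )
    ]

    loaded = 0
    # Step 2(b): for each project, read its resources and load staging.
    for project_id in project_ids:
        resources = cur.execute(
            "SELECT resource_name FROM Resource_table WHERE project_id = ?",
            (project_id,),
        ).fetchall()
        for (resource_name,) in resources:
            cur.execute(
                "INSERT INTO Staging (project_id, resource_name) VALUES (?, ?)",
                (project_id, resource_name),
            )
            loaded += 1

    conn.commit()
    return loaded
```

Calling load_staging(build_demo_db("Y")) loads one staging row per resource, while a flag of "N" loads nothing. In Talend, the per-project loop is the kind of step an iterate-style component such as tFlowToIterate is typically used for, though the exact job layout would still need to be worked out.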

I have implemented the same logic in SSIS but I am finding it difficult to use the tForeach iteration in Talend Open Studio.

The snapshot of the SSIS application:


[Image: snapshot of the SSIS application]

All help appreciated!!

I have posted the same question in the Talend Community Forum.