Thursday, September 29, 2011

Talend Open Studio - The Open Source ETL Tool

Talend Open Studio (TOS) is used to perform various ETL processes. It is one of the best-known open source ETL tools on the market. I have written this post so that people interested in Talend Open Studio can look at the product from different perspectives without having to search multiple websites; it consolidates my findings about TOS from several sources. Having used the tool for some time now, I am still exploring its components and getting acquainted with its user interface.

Overview of Talend Open Studio: Talend offers a variety of products to meet an organisation's data needs. These fall into two categories:
1. Community – core functionality available under the open source GPL.
2. Commercial – enhanced functionality under an open subscription license.

Talend Open Studio and Open Profiler are the community versions and include the core base functionality. Talend Integration Suite has the same core functionality as Open Studio, with enhanced functions that depend on the edition. Functionality increases from the Team edition through Professional to Enterprise, and there are specific versions for real-time EAI and high data volumes. The data quality product is a commercial add-on to Integration Suite.

Talend claims to support more than 100 different source systems, including various DBMSs, files, web services and Subversion logs. CRM systems such as SugarCRM, CentricCRM, Salesforce.com and VtigerCRM are also supported. ETL jobs are specified in a GUI, and Talend Open Studio generates code for a stand-alone ETL application. The generated code is Java or Perl, and users can also write their own transformations in Java or Perl. Talend Open Studio also comes with a set of predefined transformations, including six for data quality (matching, replacing, etc.) and, from version 2.4, name and address parsing (but only when Perl code is generated). The generated code can execute parts of a job in parallel, but parallelism support in Talend is still being extended. Talend supports slowly changing dimensions and bulk loading, while incremental loading is done using look-ups, possibly followed by inserts or updates.
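To make the look-up approach to incremental loading concrete, here is a minimal Java sketch (Java being one of the languages TOS generates). It is my own illustration, not Talend-generated code: the class IncrementalLoad, the method load and the in-memory Map standing in for the target table are all hypothetical. Each incoming row is looked up by its business key; missing keys are inserted, changed rows are updated and identical rows are skipped.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an incremental load by look-up.
// The Map stands in for the target table (business key -> value);
// a real job would query and write to a database instead.
public class IncrementalLoad {

    /** Counts of what one load run did. */
    public record Stats(int inserted, int updated, int unchanged) {}

    /** Applies incoming rows of the form {key, value}; values are assumed non-null. */
    public static Stats load(Map<String, String> target, List<String[]> incoming) {
        int inserted = 0, updated = 0, unchanged = 0;
        for (String[] row : incoming) {
            String key = row[0];
            String value = row[1];
            String existing = target.get(key);   // the look-up
            if (existing == null) {
                target.put(key, value);          // key not found: insert
                inserted++;
            } else if (!existing.equals(value)) {
                target.put(key, value);          // found but changed: update
                updated++;
            } else {
                unchanged++;                     // found and identical: skip
            }
        }
        return new Stats(inserted, updated, unchanged);
    }

    public static void main(String[] args) {
        Map<String, String> customers = new HashMap<>();
        customers.put("C1", "Alice");
        customers.put("C2", "Bob");
        List<String[]> batch = List.of(
                new String[] {"C1", "Alice"},    // unchanged
                new String[] {"C2", "Robert"},   // update
                new String[] {"C3", "Carol"});   // insert
        System.out.println(load(customers, batch));
        // prints Stats[inserted=1, updated=1, unchanged=1]
    }
}
```

In a real job the Map would be replaced by a look-up against the target database and the two put calls by SQL inserts and updates; the counts are only there to show what the load did.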

Platform: Talend can be installed on Windows, Unix and Linux.

 Key Features
  • GUI: Drag and drop interface for components, connectors and relationships.
  • Code: Generates jobs as Perl, Java or native SQL code.
  • Source: The Java source code is available for download and customisation.
  • Support: A social online community with Talend's wiki, the Talend Forum and a bugtracker.
  • Connectivity: Native connectivity stages for Oracle, DB2, MySQL, Sybase and Postgres. ODBC connectivity for other databases.
  • Scalability: Supports grid processing and a combination of ETL and ELT for leveraging processing capability of the architecture.

Advantages Noticed
  1. A business-oriented process modelling tool is available.
  2. A job is built from a set of components.
  3. Jobs are converted to Java or Perl code for deployment.
  4. The generated code can run on any platform with a Java runtime or Perl interpreter.
  5. Rich component set: a variety of components for performing different tasks.
  6. Different components connect to databases from different vendors. Talend can connect to almost all of the popular databases, e.g. AS400, Oracle, MySQL, Teradata, Sybase, Netezza, MS SQL Server, Java DB, LDAP, PostgreSQL etc.
  7. New plugins can be installed at any time, since it is open source.
  8. Tools are available to create business models, so high-level design can be done in Talend Open Studio.
  9. The Talend community provides a place to get product support, and the forum offers free support. While the forum comes without any service guarantees, it is monitored by Talend staff to maintain its quality.

 Disadvantages Noticed
  1. The tool is not user friendly: it is difficult to understand and unintuitive.
  2. After installing the tool, the PC becomes very slow, possibly because of the Java runtime it runs on.
  3. Parallelism is very limited in the Open Studio version; advanced functionality is only in the paid versions (Integration Suite).

 Functionality
  1. Connection to MS SQL Server: TCP/IP networking must be enabled on the SQL Server instance for the connection to succeed.
  2. Built-in advanced components for ETL: string manipulation, slowly changing dimensions, automatic lookup handling, bulk load support, etc.
  3. Talend includes powerful testing, debugging and tuning features that allow real-time tracking of data flowing through the whole transformation process, including execution statistics and an advanced trace mode.
  4. Once the program has been generated, an administrator installs it on the target machine and schedules its execution using the UNIX cron service or the Windows Task Scheduler.
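Slowly changing dimensions are easier to grasp with an example. Below is a minimal Java sketch of the common Type 2 technique, assuming a single tracked attribute (city) and validity dates. It is my own illustration, not the code behind Talend's SCD components, and the names Scd2, DimRow and apply are hypothetical. When the tracked attribute changes, the current row is closed and a new current row is added, so the history is kept.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Type 2 slowly changing dimension held in memory.
public class Scd2 {

    /** One version of a dimension member. */
    public static final class DimRow {
        public final String businessKey;
        public final String city;           // the tracked attribute
        public final LocalDate validFrom;
        public LocalDate validTo;           // null while the row is current
        public boolean current;

        public DimRow(String businessKey, String city, LocalDate validFrom) {
            this.businessKey = businessKey;
            this.city = city;
            this.validFrom = validFrom;
            this.validTo = null;
            this.current = true;
        }

        @Override
        public String toString() {
            String to = (validTo == null) ? "open" : validTo.toString();
            return businessKey + " " + city + " " + validFrom + ".." + to
                    + (current ? " (current)" : "");
        }
    }

    /** Applies one incoming (key, city) record as of the given load date. */
    public static void apply(List<DimRow> dim, String key, String city, LocalDate loadDate) {
        for (DimRow row : dim) {
            if (row.current && row.businessKey.equals(key)) {
                if (row.city.equals(city)) {
                    return;                 // no change: nothing to do
                }
                row.validTo = loadDate;     // close the old version
                row.current = false;
                break;
            }
        }
        dim.add(new DimRow(key, city, loadDate)); // new key or new version
    }

    public static void main(String[] args) {
        List<DimRow> dim = new ArrayList<>();
        apply(dim, "C1", "Paris", LocalDate.of(2011, 1, 1));
        apply(dim, "C1", "Paris", LocalDate.of(2011, 6, 1)); // unchanged, ignored
        apply(dim, "C1", "Lyon", LocalDate.of(2011, 9, 1));  // customer moved
        dim.forEach(System.out::println);
        // C1 Paris 2011-01-01..2011-09-01
        // C1 Lyon 2011-09-01..open (current)
    }
}
```

A Type 1 dimension would simply overwrite the city in place; Type 2 trades extra rows for a full history of changes.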
