by Michael Schiff

The ETA on ETL

Feb 06, 2004, 11 mins
CSO and CISO, Data and Information Security

Extraction, transformation, transport, and load (ETL or ETT) includes the processes involved in moving data from its origin into the data warehouse. This consists of extracting the data from the source operational system, transforming or cleansing the data to achieve consistency, and transporting and loading it into the warehouse. While early data warehousing projects often utilized homegrown (usually COBOL) code in lieu of purchased software to perform the ETL process, even those organizations that were successful quickly realized that source data file formats or validation rules tended to change over time, thus requiring continued maintenance to the homegrown code. Eventually, the economics (and increased functionality and integration capabilities) of purchased ETL software won out, and more vendors have entered at a variety of price points.
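The extract, transform, and load steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the inline CSV text stands in for an operational source, an in-memory SQLite database stands in for the warehouse, and the table name `dim_customer` and the cleansing rules are hypothetical, not taken from any vendor's product.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real one would come from an operational system.
SOURCE_CSV = """customer_id,name,state
1, Alice Smith ,ma
2,BOB JONES,NY
3,carol white,
"""


def extract(text):
    """Extract: read raw rows from the source data."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse rows so that values are consistent."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            # Trim stray whitespace and normalize capitalization.
            "name": row["name"].strip().title(),
            # Upper-case state codes; flag missing ones explicitly.
            "state": (row["state"] or "").strip().upper() or "UNKNOWN",
        })
    return cleaned


def load(rows, conn):
    """Load: write the cleansed rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO dim_customer VALUES (:customer_id, :name, :state)", rows
    )
    conn.commit()


def run_etl(conn):
    """Run the three steps in order and return what landed in the table."""
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute(
        "SELECT customer_id, name, state FROM dim_customer ORDER BY customer_id"
    ).fetchall()
```

Even in a toy like this, the maintenance burden noted above is visible: any change to the source layout or the validation rules means editing `extract` or `transform` by hand.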

The ETL marketplace also includes products and services for augmenting the source data with outside information such as customer demographics and interests, geocoding, and purchasing behavior. Data quality and the requisite data cleansing and deduplication process are part of the overall ETL marketplace. With the trend towards more frequent, perhaps near-real-time updates, enterprise application integration (EAI) capabilities are becoming part of many ETL solutions.

Market Review

  • The Consolidation Trend Continues: Several vendors have augmented their data integration capabilities through technology acquisitions. These include Ascential’s acquisition of Mercator in August 2003, Informatica’s acquisition of Striva for its mainframe and replication technology in September, IBM’s acquisition of CrossAccess’s technology in October, and the acquisition of Actional’s adapter portfolio by Information Builders’ iWay unit in October. Other vendors are entering the data integration market, or significantly enhancing their capabilities, through acquisitions. These include Pervasive Software, which acquired Data Junction, and Progress Software, which acquired DataDirect and its driver technology; both acquisitions occurred in December 2003.
  • Metadata Matters: Recognizing that data integration involves the integration of the associated metadata as well, ETL vendors including Ascential and Informatica continue to enhance their offerings. Informatica, already a player since 1997 with its Metadata Exchange Architecture, released Web-based SuperGlue in August 2003, while Ascential, whose metadata technology derives from its 1998 acquisition of Dovetail, continues to enhance MetaStage, a component of its Enterprise Integration suite.
  • Group 1 Acquires Sagent Technology’s Assets: The acquisition (completed in October 2003) serves to strengthen the overall competitive position of Group 1 and should make it more of a competitive threat not only to its data cleansing rivals, but to ETL vendors as well. Ascential Software in particular will now have a new end-to-end data integration rival to deal with.
  • Analytic Applications May Not be Applicable: Although still having a BI presence due to its PowerAnalyzer offering, Informatica has backed away from its own analytic applications while offering pre-packaged mappings for others (or its consultants) to use instead. This may have the positive effect of reducing conflicts with potential analytical application partners while also allowing it to better focus on its heritage.
  • Enterprise Information Integration (EII) Emerges: Driven by the need for “quick and dirty,” short-lived project data marts, oftentimes requiring both historic and near-real-time data from both operational and data warehouse sources, EII emerges as a sub-segment of the ETL/DSS marketplace. Vendors such as Ascential, Group 1 (via Sagent), and IBM are aggressively moving to stake out a claim, while newcomers such as Avaki include EII as one of the applications for their data grid technology. Reporting specialist Actuate now has the potential to play in the EII market due to its summer 2003 acquisition of Nimble Technology.

Near-Term Market Drivers

  • Improved Data Access Drivers: Each new release of most ETL products seems to offer new drivers for accessing additional data sources and enterprise applications. For example, the ability to access and understand data in SAP R/3 systems in order to extract it and use it to populate a data warehouse was once a competitive differentiator; now it has become a checklist item. The ability to work with Extensible Markup Language (XML) tagged data is becoming increasingly important, in particular as an enabling technology for Web services, as is integration with message queuing technology for rapid, transaction-level message delivery. The ability to track and integrate Web site behavior is also quite important and is rapidly becoming less of a competitive differentiator.
  • Convergence of ETL and EAI: Although the traditional differentiation between ETL and EAI involved bulk transfers versus event-driven, transaction-level data exchange, and batch versus more immediate updates, these lines continue to blur as batch windows close and “active” data warehousing leads to near-real-time update requirements.
  • Data Profiling Capabilities: Data quality is becoming recognized as an up-front requirement, not an after-the-fact fix. As the benefits of data profiling become increasingly obvious, ETL and data cleansing vendors are either developing or acquiring this capability themselves or forming partnerships with established data profiling vendors (for example, Avellino and Evoke).
  • Data Augmentation Capabilities are Featured not Hidden: As a result of the September 11th terrorist attacks, privacy has become a secondary concern as the need for closer monitoring and scrutiny has come to the forefront. Data augmentation vendors that used to downplay the depth and breadth of the consumer data they collected are once again actively marketing their content.

    Long-Term Market Drivers

    • Continued Market Expansion and Growth: Although vendor consolidations will continue, the overall market will continue to grow as ETT vendors are now competing with each other, instead of with in-house programming staffs. In addition, data quality has been recognized as a major concern as the ramifications of “garbage in, garbage out” quickly become obvious in near real-time applications that require linking customer records across multiple, disparate database environments, and there will likely be even higher growth in this market segment. The demand for data marts to support analytic applications and the use of data augmentation services to enhance customer records with demographic, socio-economic, and lifestyle data will also contribute to this growth.
    • Alliances and/or Acquisitions for a More Complete Solution: While we have divided the ETL space into data extraction, data cleansing, data transformation, and data augmentation, vendors realize that the implemented customer solution requires components of each and have moved to expand the scope of their offerings through their own technology, acquisitions, or partnerships. As the lines between EAI and ETL continue to blur, due in no small part to the need for near-real-time information, traditional ETL vendors are targeting EAI as well (sometimes as a cooperative partner).
    • Data Augmentation and Data Enhancement: In addition to consolidating data from multiple sources within a data warehouse, it is often desirable to augment the internally collected data with external data such as customer demographics, which are available from a variety of third-party sources including Acxiom. This is especially useful when conducting “one-to-one” marketing campaigns or utilizing data mining for market segmentation and differentiation.
    • Access to Non-database Data: While ODBC, the open database connectivity standard, was once considered state-of-the-art with its ability to access data in relational databases, newer standards such as OLE DB, used for accessing relational, non-relational, and semi-structured data including documents and spreadsheets, as well as the accelerating adoption of XML, will enable data warehouses to be populated from additional data sources.
    • Database Vendors Bundle ETL Functionality: Microsoft’s bundling of Data Transformation Services (DTS) with SQL Server 7.0 in late 1998, the inclusion of Oracle Warehouse Builder with the Oracle Developer Suite in mid-2001, and the inclusion of IBM’s Information Integrator and Warehouse Manager within its DB2 UDB Data Warehouse Enterprise Edition software bundle were early examples of database vendors including ETL functionality within their core products. Not only will this continue to place price pressure on third-party data integration vendors, it will almost certainly place the database vendor’s own offerings on the long list, if not the short list, in competitive evaluations.
    • Expectations that Company Representatives are Familiar with Their Customers’ History: In general, consumers and organizations expect vendor representatives to be familiar with their prior interactions with the company, including purchase history and details of prior support center calls. This “360 degree” customer view has helped fuel the growth of both analytic and operational customer relationship management (CRM) systems and the need for data integration, data cleansing, and data augmentation products and services to populate the underlying database. Consumers are openly hostile about having to supply the same information twice, either over time or during the same phone call or Web site visit, and companies must recognize that asking consumers to do so because “the system requires it” has about the same negative effect as did telling them that “the computer made an error” two decades ago.

    Offensive vs. Defensive Responses

    Improved Data Access Drivers

    Offensive: Even in cases where it may have become a checklist item, vendors will continue to emphasize the wide variety of data sources and targets that they can respectively access and populate, as well as their ability to work with XML-tagged data and message queuing systems. Vendors whose data access capabilities have been certified by enterprise application (EA) vendors should definitely highlight this certification when selling into accounts utilizing the EA system.

    Defensive: Vendors lacking the ability to access a particular data source will focus on their overall access capabilities and stress any particular advantages they may possess in areas such as data transformation capabilities or data augmentation. They should downplay the data access weakness and propose that it can be easily resolved with a custom program that exports the selected data to a flat file that their product can then access. They should also consider partnering with specialists such as iWay, rather than developing this capability on their own.
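The “custom program” workaround mentioned above might look something like the following. This is a minimal sketch that assumes the unsupported source can at least be queried through some Python database connection; SQLite stands in for it here, and the function name `export_to_flat_file` is hypothetical.

```python
import csv
import sqlite3


def export_to_flat_file(conn, query, path):
    """Run a query against the source and write the result to a CSV flat file.

    Returns the number of data rows written (the header row is not counted).
    """
    cursor = conn.execute(query)
    # cursor.description holds one tuple per result column; item 0 is the name.
    header = [column[0] for column in cursor.description]
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for row in cursor:
            writer.writerow(row)
            count += 1
    return count
```

The resulting file can then be read by any tool that accepts delimited text. The trade-off is that the export program becomes one more piece of homegrown code to maintain.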

    Data Profiling Capabilities

    Offensive: ETL vendors with data profiling capabilities should highlight the need for data warehouse developers to understand the data contained in the organization’s source systems and point to horror stories and implementation failures resulting from erroneously assuming that documentation reflected actual content.

    Defensive: Vendors lacking data profiling capabilities will mention the capabilities of their partners; lacking such partners they will downplay the importance of data profiling, while actively seeking to form partnerships of their own.
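To make the idea of data profiling concrete, the sketch below summarizes what a column actually contains so that it can be compared with what the documentation claims. It is an illustration only; the function name `profile_column` and the choice of statistics are assumptions, and commercial profiling tools go much further.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize the actual content of one column across a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Treat both None and the empty string as missing.
    non_null = [value for value in values if value not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "top_values": counts.most_common(3),
    }
```

Run against a field documented as holding only “M” or “F”, a profile like this would quickly surface blanks or a lowercase “m” before they reach the warehouse.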

    EAI and ETL Convergence

    Offensive: While most vendors now position themselves as data integration vendors, those with both EAI and ETL capabilities should focus on the flexibility of their platform and its applicability to a wide variety of problems and the ability to work directly with a wide range of sources including traditional databases, Web logs, XML documents, message queues, and software application packages.

    Defensive: ETL vendors lacking some EAI capabilities should form partnerships with specialists that can fulfill these needs. They should claim that by focusing on their heritage capabilities, and partnering with others for capabilities they lack, they provide a best-of-breed solution to their customers.
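The difference between the two styles can be sketched against a single target table. In this illustration, which assumes an in-memory SQLite database and a hypothetical `account_balance` table, `batch_load` replaces the table contents from a bulk extract in the traditional ETL manner, while `apply_event` applies one transaction-level change as it arrives, in the manner associated with EAI.

```python
import sqlite3


def create_target(conn):
    """Create the hypothetical target table if it does not already exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS account_balance "
        "(account_id INTEGER PRIMARY KEY, balance REAL)"
    )


def batch_load(conn, rows):
    """ETL style: refresh the whole table from a bulk extract in one transaction."""
    with conn:
        conn.execute("DELETE FROM account_balance")
        conn.executemany("INSERT INTO account_balance VALUES (?, ?)", rows)


def apply_event(conn, event):
    """EAI style: apply a single change immediately, inserting or overwriting."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO account_balance VALUES (?, ?)",
            (event["account_id"], event["balance"]),
        )
```

A platform offering both capabilities would, in effect, let the nightly `batch_load` and the continuous stream of `apply_event` calls feed the same warehouse.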

    Data Augmentation

    Offensive: Vendors with strong data augmentation capabilities should focus on the value of, for example, enhancing customer records with demographic and psychographic information in order to better serve these customers and/or identify appropriate sales opportunities.

    Defensive: Vendors that do not directly offer data augmentation capabilities should be ready to point to third parties that do, and perhaps claim that “while concerns about individual privacy dissuaded them from directly entering this market, they formed partnerships to provide this option to their customers.”
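At its simplest, data augmentation is a lookup that adds externally sourced attributes to internal customer records. The sketch below assumes the third-party data is keyed by postal code; the function name `augment_customers` and the field names are hypothetical, and real services typically match on far richer keys.

```python
def augment_customers(customers, demographics, key="postal_code"):
    """Return copies of customer records enriched with matching external attributes."""
    # Index the external data by the shared key for quick lookup.
    lookup = {record[key]: record for record in demographics}
    enriched = []
    for customer in customers:
        merged = dict(customer)
        extra = lookup.get(customer.get(key), {})
        for field, value in extra.items():
            # Never overwrite internally collected values with external ones.
            if field != key:
                merged.setdefault(field, value)
        enriched.append(merged)
    return enriched
```

Customers with no match in the external data pass through unchanged, so the augmented set can be loaded without special handling.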

    Enterprise Information Integration (EII)

    Offensive: Vendors actively targeting the EII market should explain their definition of the concept (which seems to vary by vendor) and reference industry luminaries such as Bill Inmon, whose use of the term “Adaptive Project Mart” is evidence of its credibility. At the same time, these vendors should carefully point out where EII fits in the overall data warehouse spectrum and the risks of “quick and dirty” data integration in order to set appropriate expectations.

    Defensive: Vendors not actively targeting this market will dismiss it as a new buzzword and merely a reincarnation of the virtual data warehouse. They will rightly emphasize that quality decisions require quality data, which, if drawn from multiple sources, must be “refined” to ensure consistency.