by Michael Schiff

Market Assessment: Data Warehouse Administration

Jul 26, 2004

Data warehouse administration is concerned with the design, monitoring, and administrative aspects of data warehouse development and deployment. This includes data modeling and design, data profiling (that is, to determine the data, its values and distribution, and its relationship and redundancies with other data elements in the database), usage tracking and monitoring (that is, which users are accessing the data warehouse, and what data is being accessed), security and privacy, data cataloging, and metadata management.
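To make the data profiling idea concrete, the following is a minimal sketch in Python, not any vendor's method. It summarizes the values and distribution of one column and flags column pairs that always agree, which hints at redundancy. The function names (`profile_column`, `redundant_pairs`) and the sample `customers` records are hypothetical and exist only for illustration.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, nulls, distinct values, top values."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing values.
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "top_values": counts.most_common(3),
    }


def redundant_pairs(rows, columns):
    """Return column pairs whose values agree on every row."""
    pairs = []
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            if all(row.get(a) == row.get(b) for row in rows):
                pairs.append((a, b))
    return pairs


# Hypothetical sample records standing in for a source table.
customers = [
    {"id": 1, "state": "NY", "region": "NY", "status": "A"},
    {"id": 2, "state": "CA", "region": "CA", "status": "A"},
    {"id": 3, "state": "NY", "region": "NY", "status": None},
    {"id": 4, "state": "TX", "region": "TX", "status": "X"},
]

status_profile = profile_column(customers, "status")
duplicates = redundant_pairs(customers, ["id", "state", "region", "status"])
```

Here `status_profile` reports one missing value and two distinct codes, and `duplicates` identifies `state` and `region` as carrying the same data. A real profiling tool would go further, for example by checking the observed values against the source system's documentation.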

Market Review:

  • Metadata Standards Become Reality: The Common Warehouse Metamodel (CWM) V1.1 specification is now official, the Object Management Group (OMG) Board of Directors is reviewing V1.2, and work on V2.1 of the OMG’s XML Metadata Interchange specification is underway. Data warehouse vendors will need to respond to questions about their plans to adopt these specifications.
  • Data Profiling Partnerships Proliferate: As the benefits of profiling source data continue to be recognized, many ETL vendors have partnered with data profiling specialists in order to offer a more complete data integration solution. Ascential Software, as a result of its April 2002 acquisition of Metagenix’s MetaRecon, is now an established data profiling vendor. Several data quality vendors now offer data profiling, some through their own development efforts.
  • Financial Concerns: Some of the pioneering, but nonetheless niche, players are experiencing major financial difficulties, and it would not be surprising to see larger players acquire some of the well-established, but financially unsuccessful, smaller ones. While Teleran had previously told us it was now profitable, at least one other data warehouse data monitoring vendor is still experiencing serious financial difficulties.
  • Database Vendors Augment Their Toolsets: In addition to, and in many cases competing with, the tools provided by third-party vendors, database vendors continue to augment their own product suites by offering both included and add-on database monitoring and database administration tools. For example, IBM’s DB2 UDB V8 placed particular emphasis on increasing DBA productivity and lowering the degree of expertise required to administer the product. Several vendors have introduced versions of their database products that, while lacking some of the enterprise-scale functionality, are easier to deploy and administer.
  • The Database Market Has Its Giants, but Smaller Players Are Not Being Ignored by Database Tools Vendors: Although the market oftentimes fixates on the big three database vendors (that is, IBM, Microsoft, and Oracle), major database tools vendors (for example, BMC) have not forgotten the smaller players and continue to support, via updated and new product releases, smaller but still important vendors such as Sybase.
  • The Grid Battle Has Begun: By branding the next release of its Oracle Database with the 10g moniker, Oracle is clearly attempting to establish its grid presence and define the market in its terms. Oracle’s marketing resources will serve not only to benefit Oracle, but also to focus attention on grid technology and its role in non-academic settings. As grid technology proliferates, it will need to be monitored and administered.

Near-Term Market Drivers:

  • Ease of Administration and Automatic Error Recovery: As data marts and analytic applications continue to proliferate in user departments, end-users are suddenly finding that they are being delegated tasks previously done by the computer operations staff. In order to prevent business users from having to become full-time data warehouse administrators, vendors are offering “lights out” capabilities and features such as guided analysis and agents as well as automatic error recovery and restart procedures aimed at making administrators more productive.
  • Invalid Assumptions Regarding Data Content: The data warehousing industry has had more than its share of implementations that did not meet user expectations. One of the root causes has been that data contained in legacy file structures did not agree with the source system’s documentation (if documentation was even available!). This has led to the recognition of the savings (both in terms of dollars and careers) that can result from doing data profiling early in the design, if not the requirements, phase of the data warehouse project. It has also created a niche market for vendors such as Ascential, Evoke Software, and UK-based Avellino, while making the latter two potential acquisition targets.
  • Industry Metadata Integration Efforts: The OMG CWM specification has now been approved. As industry vendors adopt the specification, if only to be able to declare themselves as “open,” it will facilitate metadata integration efforts.
  • Accountability: There is a growing concern for accountability (that is, who has access to what, and who has accessed or forwarded it), especially with sensitive customer-centric financial and healthcare data. This has led to regulatory requirements for audit trails that can be examined to determine who has updated, or even accessed, which data elements, which in turn will lead to increased security and proof of identity. Government regulations such as HIPAA further reinforce this requirement.
  • Independence from IT: Users want to be able to search or browse their data warehouses and data marts without requiring IT intervention or help. Data warehouse directories will become of prime importance, as users will depend on them to determine what is stored in the data warehouse, what its lineage is, and what it represents.
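The audit trail described under Accountability can be sketched as an append-only log that answers the question of who has updated or accessed a given data element. This is an illustrative sketch only: the `AuditTrail` class, its method names, and the sample users and tables are assumptions, not part of any product or regulation mentioned above.

```python
from datetime import datetime, timezone


class AuditTrail:
    """Append-only record of who did what to which data elements."""

    def __init__(self):
        self._entries = []

    def record(self, user, action, table, columns, when=None):
        """Log one access; `action` is a label such as 'read' or 'update'."""
        self._entries.append({
            "when": when or datetime.now(timezone.utc),
            "user": user,
            "action": action,
            "table": table,
            "columns": tuple(columns),
        })

    def who_touched(self, table, column):
        """Return (user, action) pairs, in logged order, for one data element."""
        return [
            (entry["user"], entry["action"])
            for entry in self._entries
            if entry["table"] == table and column in entry["columns"]
        ]


# Hypothetical activity against a patient table.
trail = AuditTrail()
trail.record("alice", "read", "patients", ["name", "diagnosis"])
trail.record("bob", "update", "patients", ["address"])
trail.record("carol", "read", "patients", ["diagnosis"])

diagnosis_history = trail.who_touched("patients", "diagnosis")
```

In this example `diagnosis_history` lists the reads by `alice` and `carol` but not the address update by `bob`. A production audit trail would also need tamper resistance and reliable proof of user identity, which this sketch does not attempt.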

Long-Term Market Drivers:

  • Industry Consolidations: Database vendors will continue to move additional auditing and administration functionality into their products, and thus compete with vendors with whom they once partnered. This could result in earnings shortfalls for some of the less-established or emerging vendors, eventually leading to a market consolidation with some of the smaller players being acquired by stronger competitors or partners.
  • Identification of Useful Data: Instead of utilizing the data warehouse as a melting pot for all of an organization’s data, there is a recognized need to store what is useful and important to the business in order to reduce storage costs and/or increase performance. Vendors recognize the need to offer monitoring tools to determine access and usage patterns in order to ensure that the data warehouse content reflects the requirements of the enterprise and that data that is not being used is periodically purged, or at least moved to near-line, rather than online, devices.
  • Ability to Charge for Data Warehouse Access: Rather than fund the data warehouse as a “corporate lighthouse,” organizations have recognized the need to charge internal users based on actual usage. Monitoring tools with the ability to capture statistics for user or departmental charge-back and billing will be required.
  • Declining Storage Costs: Reduced storage costs have led to larger data warehouses in terms of both detail-level data and additional summaries. However, less expensive storage is only one piece of the puzzle. As data warehouses continue to grow in physical size, storage of vast quantities of data will negatively influence query performance. It will be of paramount importance that users are aware of, and can find, what is stored in the warehouse. Administrators must be able to monitor what is and is not being accessed in order to purge (or move to secondary or tertiary storage) data not being used or to create summary data if details are not required.
  • Access Control: As Web-enabled data warehouses continue to proliferate, cross-border data restrictions may come into play along with the ability to monitor or prohibit data access from select countries. Passwords may not be sufficient to establish user authorization, and biometric identification for access control may become the norm. In many situations, it will be necessary to go beyond access control and log the specific data a user is accessing. In the commercial sector, vendors that allow others to violate the privacy of their customers will be at a competitive disadvantage.
  • Enhanced Database Functionality: This continuing trend has increased the flexibility of data warehouse administrators and their ability to logically and physically partition data across dimensions such as time (for example, by year). Data warehouse usage monitoring will become even more important in order to determine optimal partitioning strategies.
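Several of the drivers above depend on usage monitoring: charging departments for actual usage, and finding data that is no longer accessed so it can be purged or moved to near-line storage. The sketch below shows one way such statistics might be derived from a query log. The log layout, the table names, the proportional-to-rows-scanned billing rule, and the 180-day idle threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical query log: (department, table, access date, rows scanned).
query_log = [
    ("finance", "sales_2004", date(2004, 7, 1), 120_000),
    ("marketing", "sales_2004", date(2004, 7, 3), 80_000),
    ("finance", "sales_2003", date(2004, 2, 10), 50_000),
    ("marketing", "sales_2001", date(2003, 1, 15), 10_000),
]


def chargeback(log, total_cost):
    """Split total_cost across departments in proportion to rows scanned."""
    usage = defaultdict(int)
    for dept, _table, _day, rows in log:
        usage[dept] += rows
    total = sum(usage.values())
    return {dept: round(total_cost * rows / total, 2)
            for dept, rows in usage.items()}


def cold_tables(log, all_tables, as_of, max_idle_days):
    """List tables never accessed, or idle for more than max_idle_days."""
    last_access = {}
    for _dept, table, day, _rows in log:
        if table not in last_access or day > last_access[table]:
            last_access[table] = day
    cold = []
    for table in all_tables:
        last = last_access.get(table)
        if last is None or (as_of - last).days > max_idle_days:
            cold.append(table)
    return sorted(cold)


bills = chargeback(query_log, total_cost=2600.0)
candidates = cold_tables(
    query_log,
    ["sales_2001", "sales_2002", "sales_2003", "sales_2004"],
    as_of=date(2004, 7, 26),
    max_idle_days=180,
)
```

With this sample log, finance is billed 1700.0 and marketing 900.0, and the 2001 and 2002 tables are flagged as candidates for purging or near-line storage. Because the tables here are partitioned by year, the same access statistics could also inform the partitioning decisions mentioned in the last bullet.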