Microsoft has released the first Community Technology Preview (CTP) of the next version of its SQL Server database suite. This will be an eagerly awaited release of SQL Server, notably for its data warehousing features. First, Microsoft will be adding a column-based storage engine, “Apollo”. Column storage is an alternative approach to database storage (most storage engines are row based) designed for large data warehouses. The simplest way to understand column storage is to think about aggregate queries (sums, averages or other summaries computed over many rows). Because the storage engine stores each column's values one after another, running statistics over a column needs far fewer disk seeks: the reads are sequential instead of random. For example, imagine a very basic row-based store of people's ages:
Key (INT), Name (BLOB), Age (INT)
1, Anthony, 24
2, Barry, 35
3, Charles, 50
If you were to run a query for the total age of those people, it would read from variable positions on disk, depending on the length of each name (in reality you might use fixed column lengths, but this could be variable-length data). In a column storage engine, the data would look like this:
1, 2, 3
Anthony, Barry, Charles
24, 35, 50
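The two layouts above can be sketched in a few lines of Python. This is only an illustration: the byte format (a 4-byte key, a 2-byte name length, the name bytes, then a 4-byte age) and the helper names `encode_row_store` and `encode_age_column` are assumptions made up for this example, not how SQL Server actually lays out data on disk.

```python
import struct

# The three example rows from the text: (key, name, age).
rows = [(1, "Anthony", 24), (2, "Barry", 35), (3, "Charles", 50)]


def encode_row_store(rows):
    """Pack whole rows one after another (hypothetical format)."""
    buf = bytearray()
    age_offsets = []
    for key, name, age in rows:
        raw = name.encode("utf-8")
        # key (4 bytes), name length (2 bytes), then the variable-length name
        buf += struct.pack("<iH", key, len(raw)) + raw
        age_offsets.append(len(buf))  # byte position where this row's age starts
        buf += struct.pack("<i", age)
    return bytes(buf), age_offsets


def encode_age_column(rows):
    """Pack only the ages, back to back, as a column store would."""
    return struct.pack(f"<{len(rows)}i", *(age for _, _, age in rows))


row_buf, age_offsets = encode_row_store(rows)
age_col = encode_age_column(rows)

# Row layout: jump to each age's own offset, skipping keys and names.
total_from_rows = sum(
    struct.unpack_from("<i", row_buf, off)[0] for off in age_offsets
)

# Column layout: read the whole age column in one contiguous pass.
total_from_column = sum(struct.unpack(f"<{len(rows)}i", age_col))

print("age offsets in row layout:", age_offsets)
print("total age:", total_from_rows, total_from_column)
```

Both totals come out the same, but in the row layout the ages sit at unevenly spaced offsets (because the names differ in length), while in the column layout they occupy one contiguous 12-byte run.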
So, to run the aggregate query, the engine would use sequential reads from the disk (much faster than random reads). Microsoft has reportedly claimed that in test environments on very large databases, this has cut the time for such queries from eight minutes down to two seconds! This also ties in with Microsoft’s purchase of DATAllegro, a data warehousing company. Microsoft and HP have announced an appliance built around SQL 2011 and the DATAllegro technology, called the “Parallel Data Warehouse”, as a supercomputer of data warehousing. When they say this is for large requirements, they mean it: orders are placed for either 11 or 22 racks of equipment, so these are some BIG databases.