Monday, October 18, 2004

I figure out a way about getting the distinct store count from the summary table, and here is the detail.
We have a summary table like it


period_key	item_key	store_key

20030102 1 1
20030102 1 2
20030102 1 5
20030102 1 7
20030102 1 9
..
20030302 1 2
20030302 1 5
20030302 1 6
20030302 1 8



We normally use the following Query to run against the summary table and get the
distinct store count




Select count(distinct store_key) from [summary table]
Where period_key between ? and ?
And item_key = ?



If the summary table has lots of rows, this query would be running long time.

We only need to know the position of each store in store lookup table when we are calculating the distinct store count, so we could come out the following structure



Bit Position 1 2 3 4 5 6 7
Store key 1 2 5 6 7 8 9

{1,2,5,7,9} 1 1 1 0 1 0 1
{2,5,6,8} 0 1 1 1 0 1 0



The store lookup table doesn’t have too many records, so we could just use a
column which type is varchar2 to store the resulting bit string for store key.



period_key item_key store_key_raw
20030102 1 1110101
20030302 1 0111010



Once we need to get the distinct count of store, we could just do BIT_OR operation
on this column, and get the distinct store count by counting the amount of 1 bit inside the final bit string for store key. there are an existing Oracle function UTL_RAW.BIT_OR that could do BIT_OR operation to two strings, and it is quite fast, there are some code snippets.




PROCEDURE test_raw_or IS
lvc_cursor ref_cursor;
lvn_counter BINARY_INTEGER;
lvs_a VARCHAR2(4000);
raw_a RAW(4000);
raw_b RAW(4000);
raw_c RAW(4000);
BEGIN
-- get the bit string for store key
OPEN lvc_cursor FOR
SELECT store_key_raw
FROM store_bitmap;

-- use loop to do the bit_or for all of rows
lvn_counter := 0;
lvs_a := '';
LOOP
EXIT WHEN lvc_cursor%NOTFOUND;
lvn_counter := lvn_counter + 1;

FETCH lvc_cursor
INTO lvs_a;
raw_a := utl_raw.cast_to_raw(lvs_a);
IF lvn_counter > 1 THEN
raw_c := utl_raw.bit_or(raw_a, raw_b);
raw_b := raw_c;
ELSE
raw_b := raw_a;
END IF;

END LOOP;
CLOSE lvc_cursor;

-- get the final bit string for store key
INSERT INTO calc_store_bitmap
VALUES
(utl_raw.cast_to_varchar2(raw_c));

-- get the distinct store count by counting the amount of 1 bit inside the
final bit string for store key
....

End;



There are another problem raised that UTL_RAW.BIT_OR could only do the BIT_OR on
two strings, it couldn’t do the BIT_OR on a column in a table. It is not convenient for us to write SQL,

Fortunately oracle 9i has a great feature named as User defined aggregate functions, you could do any aggregate operation with it, just like the regular oracle aggregate functions such as sum or count. First all, we need to create a new oracle type that includes a set of oracle pre-defined member function. such as ODCIAggregateInitialize, ODCIAggregateiterate. Then we could write the code to override these functions. The following is code snippets.




-- define a temp variable to store the middle value
total raw(4000),
--override the function that is doing the Iteration
member function ODCIAggregateIterate(self IN OUT str_agg_type,
value IN raw )
return number
is
begin
self.total := utl_raw.bit_or(self.total, value);
return ODCIConst.Success;
end;
--override the function that returns the final result
member function ODCIAggregateTerminate(self IN str_agg_type,
returnValue OUT raw,
flags IN number)
return number
is
begin
returnValue := self.total;
return ODCIConst.Success;
end;




And I created two functions to get the distinct store count, the first one is a User
defined aggregate function raw_bit_or_agg(input varchar2) that is used to do the BIT_OR on a column in the table, this function is just like other aggregate function. Another is count_bit_in_str that could get the bit count from a string, so the Query that getting count will be like this,




SELECT count_bit_in_str(raw_bit_or_agg(p.store_bitmap))
FROM [summary table] p
WHERE p.period_key BETWEEN ? AND ?
And p. item_key = ?
ORDER BY p.period_key




I wrote a unit test app to compare the difference, the method using BIT_OR is

500 times faster than the one using count(distinct store_key).

No comments: