Tuesday, December 31, 2013

Analytic dilemma: row_number vs rank vs dense_rank

Row_number() , rank(), dense_rank()

                                                                                             
So what are the differences between those functions? And for which purpose they are used for

create table bank_customers (
id number,
full_name varchar2(100),
birth_date date);

create table current_account (
customer_id number,
action varchar2(100),
amount number,
action_date date);
----------------------------------------
Table created.
Table created.

insert into bank_customers values (1, 'Tal Avinari', sysdate);
insert into bank_customers values (2, 'Daniel Kagan', sysdate);
insert into bank_customers values (3, 'Ben David', sysdate);
insert into bank_customers values (4, 'Alex Rubin', sysdate);
insert into current_account
values (1, 'deposit', 100, to_date('01/01/2013', 'dd/mm/yyyy'));
insert into current_account
values (1, 'withdrawal', 230, to_date('01/01/2013', 'dd/mm/yyyy'));
insert into current_account
values (1, 'withdrawal', 100, to_date('02/01/2012', 'dd/mm/yyyy'));
insert into current_account
values (2, 'withdrawal', 1000, to_date('03/01/2012', 'dd/mm/yyyy'));
insert into current_account
values (2, 'deposit', 1060, to_date('25/05/2012', 'dd/mm/yyyy'));
insert into current_account
values (3, 'withdrawal', 560, to_date('25/01/2013', 'dd/mm/yyyy'));
insert into current_account
values (1, 'deposit', 670, to_date('01/01/2012', 'dd/mm/yyyy'));


select b.full_name, a.action_date, a.amount, a.action
from current_account a,
bank_customers b
where a.customer_id = b.id;

FULL_NAM   ACTION_DATE                                                                         AMOUNT ACTION
-------------------- ------------------ ---------- --------------------
Tal Avinari     01-JAN-12                                                                                          100 deposit
Tal Avinari     01-JAN-13                                                                                          100 deposit
Tal Avinari     01-JAN-13                                                                                          230 withdrawal
Tal Avinari     02-JAN-12                                                                                          100 withdrawal
Daniel Kagan     03-JAN-12                                                                                     1000 withdrawal
Daniel Kagan     25-MAY-12                                                                                  1060 deposit
Ben David     25-JAN-13                                                                                           560 withdrawal
Tal Avinari     01-JAN-12                                                                                          670 deposit

8 rows selected.

Let’s start with a little motivation, assuming we want to show all the bank customers and their last account action.
The analytic function row_number() is a good solution, let’s see how it works:

 select row_number() over (partition by b.id order by a.action_date desc) as row_number,b.full_name, a.action_date,a.amountfrom current_account a,bank_customers bwhere a.customer_id = b.id; 
ROW_NUMBER FULL_NAME              ACTION_DATE        AMOUNT
---------- ------------------ ------------------ ----------
             1 Tal Avinari               01-JAN-13                      230
             2 Tal Avinari               01-JAN-13                      100
             3 Tal Avinari               02-JAN-12                      100
             4 Tal Avinari               01-JAN-12                      670
             5 Tal Avinari               01-JAN-12                      100
             1 Daniel Kagan           25-MAY-12                  1060

             2 Daniel Kagan           03-JAN-12                    1000


So the function actually partitions the table by the customer id and counts the rows one by one according to the order by clause. So the final step is to filter it to rownumber = 1 and we’ve finished!

Unfortunately, in our case we have 2 account actions for Tal at the same date, so if we take the first rownumber we’ll lose information.
Rank() is the answer for that !
Let’s have a look

select rank() over (partition by b.id order by a.action_date desc) as rank,
b.full_name, 
a.action_date,
a.amount
from current_account a,
bank_customers b
where a.customer_id = b.id;

 ------------------------------------------------------------------------------------

RANK         FULL_NAME                      ACTION_DATE        AMOUNT
---------- ------------  ------------------------  -------------------------
             1 Tal Avinari  01-JAN-13                      230
             1 Tal Avinari   01-JAN-13                      100
             3 Tal Avinari   02-JAN-12                      100
             4 Tal Avinari   01-JAN-12                      670
             4 Tal Avinari   01-JAN-12                      100
             1 Daniel Kagan 25-MAY-12               1060
             2 Daniel Kagan 03-JAN-12                 1000
             1 Ben David    25-JAN-13                      560

8 rows selected.

Notice that after the two rows of rank = 1 we have the next rank, =3.
If we don’t want this gap then we use the dense_rank() function.
  
select dense_rank() over (partition by b.id order by a.action_date desc) as dense_rank,
b.full_name, 
a.action_date,
a.amount
from current_account a,
bank_customers b
where a.customer_id = b.id;

DENSE_RANK FULL_NAME                 ACTION_DATE        AMOUNT
----------------------  ------------------------  ------------------------- --------------
             1 Tal Avinari  01-JAN-13                      230
             1 Tal Avinari   01-JAN-13                      100
             2 Tal Avinari   02-JAN-12                      100
             3 Tal Avinari   01-JAN-12                      670
             3 Tal Avinari   01-JAN-12                      100
             1 Daniel Kagan 25-MAY-12               1060
             2 Daniel Kagan 03-JAN-12                 1000
             1 Ben David    25-JAN-13                      560
8 rows selected.


Hope you enjoyed it.
Happy new year !

  

No comments:

Post a Comment