AI智能
改变未来

oracle分析函数 percent_rank, percentile_cont, percentile_disc

感谢以下博文
http://www.2cto.com/database/201210/162822.html

percent_rank()

percent_rank函数以0到1之间的分数形式返回某个值在数据分区中的排名。 建表:Sql代码  create table SMALL_CUSTOMERS(CUSTOMER_ID NUMBER,SUM_ORDERS  NUMBER);       insert into SMALL_CUSTOMERS (CUSTOMER_ID, SUM_ORDERS) values (1000, 10);       insert into SMALL_CUSTOMERS (CUSTOMER_ID, SUM_ORDERS) values (1000, 20);       insert into SMALL_CUSTOMERS (CUSTOMER_ID, SUM_ORDERS) values (1000, 30);       insert into SMALL_CUSTOMERS (CUSTOMER_ID, SUM_ORDERS) values (800, 5);       insert into SMALL_CUSTOMERS (CUSTOMER_ID, SUM_ORDERS) values (800, 10);       insert into SMALL_CUSTOMERS (CUSTOMER_ID, SUM_ORDERS) values (800, 1);   具体用法:Sql代码  select customer_id,         sum_orders,         percent_rank() over(order by sum_orders) percentRank    from small_customers t;   

percentile_cont()

官网文档地址:http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions127.htm#SQLRF00687

Purpose

PERCENTILE_CONT is an inverse distribution function that assumes a continuous distribution model. It takes a percentile value and a sort specification, and returns an interpolated value that would fall into that percentile value with respect to the sort specification. Nulls are ignored in the calculation.

This function takes as an argument any numeric data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.

The first expr must evaluate to a numeric value between 0 and 1, because it is a percentile value. This expr must be constant within each aggregation group. The ORDER BY clause takes a single expression that must be a numeric or datetime value, as these are the types over which Oracle can perform interpolation.

The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them. Using the percentile value (P) and the number of rows (N) in the aggregation group, you can compute the row number you are interested in after ordering the rows with respect to the sort specification. This row number (RN) is computed according to the formula RN = (1+(P*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).

The final result will be:

If (CRN = FRN = RN) then the result is(value of expression from row at RN)Otherwise the result is(CRN - RN) * (value of expression for row at FRN) +(RN - FRN) * (value of expression for row at CRN)

You can use the PERCENTILE_CONT function as an analytic function. You can specify only the query_partitioning_clause in its OVER clause. It returns, for each row, the value that would fall into the specified percentile among a set of values within each partition.

The MEDIAN function is a specific case of PERCENTILE_CONT where the percentile value defaults to 0.5. For more information, refer to MEDIAN.
percentile
percentile_cont:percentile_cont函数对于计算内插值是非常有用的。percentile_cont函数接收一个0到1之间的几率值并返回与声明了排序的percent_rank函数计算值相等的内插值百分比。

语法:percentile_con(expr) within group(sort-clause) over(partition-clause order-by-clause)具体用法:Sql代码  select customer_id,         sum_orders,         percent_rank() over(order by sum_orders) percentRank,         percentile_cont(0.4) within group(order by sum_orders) over(partition by customer_id) percentileCont    from small_customers t;  

取 p=0.4  可以看出n=3
rn=1.8  crn=2  frn=1 得出  (2-1.8)*1+(1.8-1)*5=4.2    (2-1.8)*10+(1.8-1)*20=18

percentile_disc()

PERCENTILE_DISC

这里就不做详细分析。

Syntax


Description of the illustration \’\’percentile_disc.gif\’\’

Purpose

PERCENTILE_DISC is an inverse distribution function that assumes a discrete distribution model. It takes a percentile value and a sort specification and returns an element from the set. Nulls are ignored in the calculation.

This function takes as an argument any numeric data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.

The first expr must evaluate to a numeric value between 0 and 1, because it is a percentile value. This expression must be constant within each aggregate group. The ORDER BY clause takes a single expression that can be of any type that can be sorted.

For a given percentile value P, PERCENTILE_DISC sorts the values of the expression in the ORDER BY clause and returns the value with the smallest CUME_DIST value (with respect to the same sort specification) that is greater than or equal to P.

Aggregate Example

See aggregate example for PERCENTILE_CONT.

Analytic Example

The following example calculates the median discrete percentile of the salary of each employee in the sample table hr.employees:

SELECT last_name, salary, department_id,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC)OVER (PARTITION BY department_id) \"Percentile_Disc\",CUME_DIST() OVER (PARTITION BY department_idORDER BY salary DESC) \"Cume_Dist\"FROM employeesWHERE department_id in (30, 60)ORDER BY last_name, salary, department_id;LAST_NAME                     SALARY DEPARTMENT_ID Percentile_Disc  Cume_Dist------------------------- ---------- ------------- --------------- ----------Austin                          4800            60            4800         .8Baida                           2900            30            2900         .5Colmenares                      2500            30            2900          1Ernst                           6000            60            4800         .4Himuro                          2600            30            2900 .833333333Hunold                          9000            60            4800         .2Khoo                            3100            30            2900 .333333333Lorentz                         4200            60            4800          1Pataballa                       4800            60            4800         .8Raphaely                       11000            30            2900 .166666667Tobias                          2800            30            2900 .666666667

The median value for Department 30 is 2900, which is the value whose corresponding percentile (Cume_Dist) is the smallest value greater than or equal to 0.5. The median value for Department 60 is 4800, which is the value whose corresponding percentile is the smallest value greater than or equal to 0.5.
percentile_disc:percentile_disc函数在功能上类似于percentile_cont函数,只是percentile_cont函数使用了连续分布模型,而percentile_disc函数使用了离期 分布模型。当没有值与指定的percent_rank精确匹配的时候,percentile_cont(0.5)会计算两个离得最近的值的平均值。相反,在升序排列的情况下,percentile_disc函数只取比所传递的参数percent_rank值更大的值。在降序排列的时候,percentile_disc函数只取比所传递的参数percent_rank值更小的值。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/29074224/viewspace-2123194/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/29074224/viewspace-2123194/

赞(0) 打赏
未经允许不得转载:爱站程序员基地 » oracle分析函数 percent_rank, percentile_cont, percentile_disc