Querying by Date
Suppose that we have the following table to record user registrations:Let's query for people who registered during the last day:create table users ( user_id integer primary key, first_names varchar(50), last_name varchar(50) not null, email varchar(100) not null unique, -- we encrypt passwords using operating system crypt function password varchar(30) not null, -- we only need precision to within one second registration_date timestamp(0) ); -- add some sample data insert into users (user_id, first_names, last_name, email, password, registration_date) values (1,'schlomo','mendelowitz','[email protected]','67xui2', to_timestamp('2003-06-13 09:15:00','YYYY-MM-DD HH24:MI:SS')); insert into users (user_id, first_names, last_name, email, password, registration_date) values (2,'George Herbert Walker','Bush','[email protected]','kl88q', to_timestamp('2003-06-13 15:18:22','YYYY-MM-DD HH24:MI:SS'));
Note how the registration date comes out in a non-standard format that won't sort lexicographically and that does not have a full four digits for the year. You should curse your database administrator at this point for not configuring Oracle with a more sensible default. You can fix the problem for yourself right now, however:column email format a35 column registration_date format a25 select email, registration_date from users where registration_date > current_date - interval '1' day; EMAIL REGISTRATION_DATE ----------------------------------- ------------------------- [email protected] 13-JUN-03 09.15.00 AM [email protected] 13-JUN-03 03.18.22 PM
You can query for shorter time intervals:alter session set nls_timestamp_format = 'YYYY-MM-DD HH24:MI:SS'; select email, registration_date from users where registration_date > current_date - interval '1' day; EMAIL REGISTRATION_DATE ----------------------------------- ---------------------- [email protected] 2003-06-13 09:15:00 [email protected] 2003-06-13 15:18:22
You can be explicit about how you'd like the timestamps formatted:select email, registration_date from users where registration_date > current_date - interval '1' hour; EMAIL REGISTRATION_DATE ----------------------------------- ------------------------- [email protected] 2003-06-13 15:18:22 select email, registration_date from users where registration_date > current_date - interval '1' minute; no rows selected select email, registration_date from users where registration_date > current_date - interval '1' second; no rows selected
Oops. Oracle pads some of these fields by default so that reports will be lined up and neat. We'll have to trim the strings ourselves:select email, to_char(registration_date,'Day, Month DD, YYYY') as reg_day from users order by registration_date; EMAIL REG_DAY ----------------------------------- ----------------------------- [email protected] Friday , June 13, 2003 [email protected] Friday , June 13, 2003
select email, trim(to_char(registration_date,'Day')) || ', ' || trim(to_char(registration_date,'Month')) || ' ' || trim(to_char(registration_date,'DD, YYYY')) as reg_day from users order by registration_date; EMAIL REG_DAY ----------------------------------- ---------------------------- [email protected] Friday, June 13, 2003 [email protected] Friday, June 13, 2003
Some Very Weird Things
One reason that Oracle may have resisted ANSI date-time datatypes and arithmetic is that they can make life very strange for the programmer.alter session set nls_date_format = 'YYYY-MM-DD'; -- old select add_months(to_date('2003-07-31','YYYY-MM-DD'),-1) from dual; ADD_MONTHS ---------- 2003-06-30 -- new select to_timestamp('2003-07-31','YYYY-MM-DD') - interval '1' month from dual; ERROR at line 1: ORA-01839: date not valid for month specified -- old select to_date('2003-07-31','YYYY-MM-DD') - 100 from dual; TO_DATE('2 ---------- 2003-04-22 -- new (broken) select to_timestamp('2003-07-31','YYYY-MM-DD') - interval '100' day from dual; ERROR at line 1: ORA-01873: the leading precision of the interval is too small -- new (note the extra "(3)") select to_timestamp('2003-07-31','YYYY-MM-DD') - interval '100' day(3) from dual; TO_TIMESTAMP('2003-07-31','YYYY-MM-DD')-INTERVAL'100'DAY(3) ------------------------------------------------------------- 2003-04-22 00:00:00
Some Profoundly Painful Things
Calculating time intervals between rows in a table can be very painful because there is no way in standard SQL to refer to "the value of this column from the previous row in the report". You can do this easily enough in an imperative computer language, e.g., C#, Java, or Visual Basic, that is reading rows from an SQL database but doing it purely in SQL is tough.Let's add a few more rows to our users table to see how this works.
Suppose that we're interested in the average length of time between registrations. With so few rows we could just query all the data out and eyeball it:insert into users (user_id, first_names, last_name, email, password, registration_date) values (3,'Osama','bin Laden','[email protected]','dieusa', to_timestamp('2003-06-13 17:56:03','YYYY-MM-DD HH24:MI:SS')); insert into users (user_id, first_names, last_name, email, password, registration_date) values (4,'Saddam','Hussein','[email protected]','wmd34', to_timestamp('2003-06-13 19:12:43','YYYY-MM-DD HH24:MI:SS'));
If we have a lot of data, however, we'll need to do a self-join.select registration_date from users order by registration_date; REGISTRATION_DATE ------------------------- 2003-06-13 09:15:00 2003-06-13 15:18:22 2003-06-13 17:56:03 2003-06-13 19:12:43
Notice that to find the "next row" for the pairing we are using thecolumn r1 format a21 column r2 format a21 select u1.registration_date as r1, u2.registration_date as r2 from users u1, users u2 where u2.user_id = (select min(user_id) from users where registration_date > u1.registration_date) order by r1; R1 R2 --------------------- --------------------- 2003-06-13 09:15:00 2003-06-13 15:18:22 2003-06-13 15:18:22 2003-06-13 17:56:03 2003-06-13 17:56:03 2003-06-13 19:12:43
user_id
column, which we know to be sequential and unique, rather than the registration_date column, which may not be unique because two users could register at exactly the same time.
Now that we have information from adjacent rows paired up in the same report we can begin to calculate intervals:
The interval for each row of the report has come back as days, hours, minutes, and seconds. At this point you'd expect to be able to average the intervals:column reg_gap format a21 select u1.registration_date as r1, u2.registration_date as r2, u2.registration_date-u1.registration_date as reg_gap from users u1, users u2 where u2.user_id = (select min(user_id) from users where registration_date > u1.registration_date) order by r1; R1 R2 REG_GAP --------------------- --------------------- --------------------- 2003-06-13 09:15:00 2003-06-13 15:18:22 +000000000 06:03:22 2003-06-13 15:18:22 2003-06-13 17:56:03 +000000000 02:37:41 2003-06-13 17:56:03 2003-06-13 19:12:43 +000000000 01:16:40
Oops. Oracle isn't smart enough to aggregate time intervals. And sadly there doesn't seem to be an easy way to turn a time interval into a number of seconds, for example, that would be amenable to averaging. If you figure how out to do it, please let me know!select avg(reg_gap) from (select u1.registration_date as r1, u2.registration_date as r2, u2.registration_date-u1.registration_date as reg_gap from users u1, users u2 where u2.user_id = (select min(user_id) from users where registration_date > u1.registration_date)) ERROR at line 1: ORA-00932: inconsistent datatypes: expected NUMBER got INTERVAL
Should we give up? If you have a strong stomach you can convert the timestamps to old-style Oracle dates through character strings before creating the intervals. This will give us a result as a fraction of a day:
If we're going to continue using this ugly query we ought to create a view:select avg(reg_gap) from (select u1.registration_date as r1, u2.registration_date as r2, to_date(to_char(u2.registration_date,'YYYY-MM-DD HH24:MI:SS'),'YYYY-MM-DD HH24:MI:SS') - to_date(to_char(u1.registration_date,'YYYY-MM-DD HH24:MI:SS'),'YYYY-MM-DD HH24:MI:SS') as reg_gap from users u1, users u2 where u2.user_id = (select min(user_id) from users where registration_date > u1.registration_date)) AVG(REG_GAP) ------------ .13836034
Now we can calculate the average time interval in minutes:create view registration_intervals as select u1.registration_date as r1, u2.registration_date as r2, to_date(to_char(u2.registration_date,'YYYY-MM-DD HH24:MI:SS'),'YYYY-MM-DD HH24:MI:SS') - to_date(to_char(u1.registration_date,'YYYY-MM-DD HH24:MI:SS'),'YYYY-MM-DD HH24:MI:SS') as reg_gap from users u1, users u2 where u2.user_id = (select min(user_id) from users where registration_date > u1.registration_date)
select 24*60*avg(reg_gap) as avg_gap_minutes from registration_intervals; AVG_GAP_MINUTES --------------- 199.238889
Reporting
Here's an example of using theto_char
function an GROUP BY to generate a report of sales by calendar quarter:
This is a hint that Oracle has all kinds of fancy date formats (covered in their online documentation). We're using the "Q" mask to get the calendar quarter. We can see that this product started shipping in Q2 1998 and that revenues trailed off in Q4 1998.select to_char(shipped_date,'YYYY') as shipped_year, to_char(shipped_date,'Q') as shipped_quarter, sum(price_charged) as revenue from sh_orders_reportable where product_id = 143 and shipped_date is not null group by to_char(shipped_date,'YYYY'), to_char(shipped_date,'Q') order by to_char(shipped_date,'YYYY'), to_char(shipped_date,'Q'); SHIPPED_YEAR SHIPPED_QUARTER REVENUE -------------------- -------------------- ---------- 1998 2 1280 1998 3 1150 1998 4 350 1999 1 210