![]() Let’s say we want to see all periods of inactivity longer than 2 days for each user. A good example for our dataset would be periods of inactivity between calls. Lead function: Periods of inactivityĪnytime we need to calculate values based on neighbouring rows, window functions should be the first thing that come to mind. Some other numbers took less than 1 percent of total call time. We can discover from the percentage column for example, that the first user spent almost half of his calling time by calling number 49508432274. Window functions produce an output row for each call, but similar to the previous example all rows with the same phone number contain the same aggregated information. The WHERE clause collects only rows with outgoing calls since we do not know the phone number of incoming calls.If the condition is false, there are only calls of zero length for the phone number, hence we return 0 as the total duration in this case. The CASE clause checks if the total duration is greater than 0 so that we do not divide by zero.After the division, the result is multiplied by 100 and rounded to get the percentage. The type of duration is INT, which would cause the division to give integer result, and therefore we need to cast the denominator to FLOAT before the division. To compute the percentage, we use the previously mentioned expression in the numerator and sum over all calls in the denominator.The same is applied to INTERVAL and INT type. The total duration over user and phone is computed by the window function SUM and partitioned by the mentioned fields.Let’s understand the query starting from the inner core section: Let’s also print the percentage of this sum relative to all user’s calls. In other words, we are interested in the total sum of call durations for the particular counterpart. Let’s find out who receives the longest calls. SUM function: Total duration over phone numbers Also note that to convert the number of seconds in INT to INTERVAL, we concatenate the number with ‘ secs’string and then cast to INTERVAL with a double colon. This expression is valid only for Redshift and equivalent in Postgres is DATE_TRUNC(‘day’, date_time). ![]() To extract a date from a timestamp, we use TRUNC function in the query. Using the traditional GROUP BY aggregation would require self-joining tables. The aggregate function MAX together with the partitioning by phone clause computes the longest call made by a user and puts this against all user’s calls so we have this repeating extra column available for various comparisons. See the comparison with the longest_today column. The duration time in bold marks the calls that were longest on a particular day. Notice that they share the same number_longest time. Rows with the same phone number are in the same colour.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |