关于数据分析:看看你离世界一流大厂有多远3道Google最新SQL面试题-⛵

💡 作者:韩信子@ShowMeAI
📘 数据分析◉技能晋升系列:https://www.showmeai.tech/tutorials/33
📘 AI 面试题库系列:https://www.showmeai.tech/tutorials/48
📘 本文地址:https://www.showmeai.tech/article-detail/297
📢 申明:版权所有,转载请分割平台与作者并注明出处
📢 珍藏ShowMeAI查看更多精彩内容

上面是最新的 3 道 Google SQL 面试题和参考答案。这些题目面向的 Google 职位包含:数据迷信 数据分析师商业智能 工程师数据工程师商业分析师

ShowMeAI 制作了快捷即查即用的 SQL 速查表手册,大家能够在下述地位取得:

  • 编程语言速查表 | SQL 速查表

💡 面试题 1:墨西哥和美国第三顶峰

问题: 请实现1个 SQL 来找出每个国家第三高的山名,并按 ASC 程序对国家/地区排序。

Table: mountains
+---------------------+------+-------------+
|name                 |height|country      |
+---------------------+------+-------------+
|Denalli              |20310 |United States|
|Saint Elias          |18008 |United States|
|Foraker              |17402 |United States|
|Pico de Orizab       |18491 |Mexico       |
|Popocatépetl         |17820 |Mexico       |
|Iztaccihuatl         |17160 |Mexico       |
+---------------------+------+-------------+

参考答案:

SELECT "country",
       "name"
FROM   (SELECT "country",
               "name",
               Rank()
                 OVER (
                   partition BY "country"
                   ORDER BY "height" DESC) AS "rank"
        FROM   mountains) AS m
WHERE  "rank" = 3
ORDER  BY country ASC 

💡 面试题 2:用 latest_event 查找以后关上的页数

问题: 给定下表,表中蕴含无关页面状态更改工夫的信息。实现 SQL 查找以后应用 latest_event 的页面数。 留神,表中 page_flag 列将用于辨认页面是『OFF』还是『ON』。

Table: pages_info
+-------+--------------------------------------+----------+
|page_id|event_time                            |page_flag |
+-------+--------------------------------------+----------+
|1      |current_timestamp - interval '6 hours'|ON        |
|1      |current_timestamp - interval '3 hours'|OFF       |
|1      |current_timestamp - interval '1 hours'|ON        |
|2      |current_timestamp - interval '3 hours'|ON        |
|2      |current_timestamp - interval '1 hours'|OFF       |
|3      |current_timestamp                     |ON        |
+-------+--------------------------------------+----------+

参考答案:

-- 首先,对于每个页面ID,让咱们抉择最新的记录(基于事件工夫列)。
SELECT page_id,
       Max(event_time) AS latest_event
FROM   pages_info
GROUP  BY page_id 

-- 接着,咱们将后面的查问与原表连接起来,并查看其中有多少人的标记页等于ON。
WITH latest_event
     AS (SELECT page_id,
                Max(event_time) AS latest_event
         FROM   pages_info
         GROUP  BY page_id)
SELECT Sum(CASE
             WHEN page_flag = 'ON' THEN 1
             ELSE 0
           END) AS result
FROM   pages_info pi
       JOIN latest_event le
         ON pi.page_id = le.page_id
            AND pi.event_time = le.latest_event; 

💡 面试题 3:回访用户

问题: 在如下的数据库表中,蕴含无关用户拜访网页的信息。 实现 SQL 返回间断拜访该页面最长的 3 个用户,按长短的倒序排列 3 个用户。

Table: visits
+--------+----------------------------+
|user_id |date                        | 
+--------+----------------------------+
|1       |current_timestamp::DATE - 0 |
|1       |current_timestamp::DATE - 1 |
|1       |current_timestamp::DATE - 2 |
|1       |current_timestamp::DATE - 3 |
|1       |current_timestamp::DATE - 4 |
|2       |current_timestamp::DATE - 1 |
|4       |current_timestamp::DATE - 0 |
|4       |current_timestamp::DATE - 1 |
|4       |current_timestamp::DATE - 3 |
|4       |current_timestamp::DATE - 4 |
|4       |current_timestamp::DATE - 62|   
+--------+----------------------------+

参考答案:

--首先,让咱们增加一个新的列,其值是每个用户的下一次拜访(与以后日期不同)。咱们将应用lead函数来实现:
SELECT DISTINCT user_id,
                date,
                Lead(date)
                  OVER (
                    partition BY user_id
                    ORDER BY date) AS next_date
FROM   (SELECT DISTINCT *
        FROM   visits) AS t; 
--接着,让咱们创立另一个列,其目标是让咱们晓得拜访的进行。这包含查看下一个日期是否与以后日期+1是否不同。
WITH next_dates
     AS (SELECT DISTINCT user_id,
                         date,
                         Lead(date)
                           OVER (
                             partition BY user_id
                             ORDER BY date) AS next_date
         FROM   (SELECT DISTINCT *
                 FROM   visits) AS t) --去重
SELECT user_id,
       date,
       next_date,
       CASE
         WHEN next_date IS NULL
               OR next_date = date + 1 THEN 1
         ELSE NULL
       END AS streak
FROM   next_dates; 
--接着,咱们将为每个用户创立一个分区,每个分区代表一个间断的拜访。从概念上讲,咱们要做的是,对于每个用户,取最近的记录(基于日期)并赋值为0,而后寻找上面的记录,如果拜访没有进行就赋值为0,如果拜访进行就赋值为1(如果连胜列为空),而后持续这样做,直到每个间断拜访被一个不同的分区所代表。执行这一逻辑的代码如下。
WITH next_dates
     AS (SELECT DISTINCT user_id,
                         date,
                         Lead(date)
                           OVER (
                             partition BY user_id
                             ORDER BY date) AS next_date
         FROM   (SELECT DISTINCT *
                 FROM   visits)),
     streaks
     AS (SELECT user_id,
                date,
                next_date,
                CASE
                  WHEN next_date IS NULL
                        OR next_date = date + 1 THEN 1
                  ELSE NULL
                END AS streak
         FROM   next_dates)
SELECT *,
       Sum(CASE
             WHEN streak IS NULL THEN 1
             ELSE 0
           END)
         OVER (
           partition BY user_id
           ORDER BY date) AS partition
FROM   streaks; 
--一旦咱们有了这个分区,问题就容易了,当初咱们只须要计算每个用户和分区的记录数,并找到计数最多的用户。残缺的查问如下
WITH next_dates AS
(
                SELECT DISTINCT user_id,
                                date,
                                Lead(date) OVER (partition BY user_id ORDER BY date) AS next_date
                FROM            visits ), streaks AS
(
       SELECT user_id,
              date,
              next_date,
              CASE
                     WHEN next_date IS NULL
                     OR     next_date = date + 1 THEN 1
                     ELSE NULL
              END AS streak
       FROM   next_dates ), partitions AS
(
         SELECT   *,
                  Sum(
                  CASE
                           WHEN streak IS NULL THEN 1
                           ELSE 0
                  END ) OVER (partition BY user_id ORDER BY date) AS partition
         FROM     streaks ), count_partitions AS
(
         SELECT   user_id,
                  partition,
                  Count(1) AS streak_days
         FROM     partitions
         GROUP BY user_id,
                  partition )
SELECT   user_id,
         Max(streak_days) AS longest_streak
FROM     count_partitions
GROUP BY user_id
ORDER BY 2 DESC limit 3;

参考资料

  • 📘 编程语言速查表 | SQL 速查表:https://www.showmeai.tech/article-detail/99

评论

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

这个站点使用 Akismet 来减少垃圾评论。了解你的评论数据如何被处理