일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | ||
6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 | 21 | 22 | 23 | 24 | 25 | 26 |
27 | 28 | 29 | 30 |
- TNE목배게
- 수면위내시경
- 수면대장내시경
- 이스탄불택시
- 위내시경
- 택시바가지
- SQL
- 캐리어보스턴백
- 건강검진
- 양재동맛집
- kmi
- 경량보스턴백
- 대장내시경
- 터키택시
- 신당동한우
- 튀르키예택시
- 중구사랑상품권
- Mode
- TDay
- 카보목배게
- 터키택시사기
- 페이북
- 터키택시미터기
- SKT
- 내돈내산
- 네이버머니포인트카드
- 튀르키예환전
- 부모님모시고오기좋은음식점
- 포켓몬고
- 이스탄불택시사기
- Today
- Total
집순이 놀이터
[Analysis] Mode 2. Understanding Search Functionality (Feature 분석) 본문
[Analysis] Mode 2. Understanding Search Functionality (Feature 분석)
방구석집순이 2022. 7. 14. 14:47▶
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM (SELECT bounds.*,
CASE WHEN last_event ISNULL THEN id
WHEN last_event > INTERVAL '10 MINUTE' THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY bounds.occurred_at) END AS session
FROM (SELECT user_id, event_type, event_name,occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event ISNULL
OR next_event ISNULL
ORDER BY user_id ) final
GROUP BY user_id, session
https://mode.com/sql-tutorial/understanding-search-functionality/
Understanding Search Functionality | SQL Analytics Training - Mode
In this lesson we'll cover: Before starting, be sure to read the overview to learn a bit about Yammer as a company. The product team is determining priorities for the next development cycle and they are considering improving the site's search functionality
mode.com
서비스
검색 기능 개선을 고려하고 있다.
웹페이지 모든 사용자에게 보이는 검색어 상자가 있다.
사용자가 검색창에 입력을 시작하면 가장 관련성이 높은 결과가 드롭다운 목록으로 나타난다.
결과는 범례별
로 구분되며, 모든결과를 볼수 있는 옵션도 있다. 엔터키를 누그러나 모든 결과 보기를 누르면, 그 결과는 범주별로 탭에 의해 분리된다. 각 탭은 관련성 및 연대순으로 wkehdr정렬된다 .
검색결과 페이지에는 사용자가 포함된 그룹 또는 날짜 범위내에서 다시 검색할수 있는 고급검색 상자 기능이 있다.
The problem
검색을 처리하기 전에 제품 팀은 엔지니어링 팀의 시간이 검색에 잘 사용되는지 확인하려고 합니다. 결국, 각각의 새로운 기능은 다른 잠재적 기능을 희생시키면서 옵니다. 제품 팀은 처음부터 검색 작업을 수행해야 하는지 여부와 검색 작업을 수행해야 하는 경우 수정 방법을 결정하는 데 가장 관심이 있습니다.
-> 검색 기능 개선의 필요성이 있는가 ?
있다면 어떤 부분을 먼저 수정해야 하는가 ?
Getting oriented
어떻게 정적으로 표현할 것인가 ?
The data
search_autocomplete: 사용자가 자동 완성에서 검색 옵션을 클릭할 때 기록됩니다.
search_run: 사용자가 검색을 실행하고 검색 결과 페이지를 볼 때 기록됩니다.
search_click_X: 사용자가 검색 결과를 클릭할 때 기록됩니다. 1에서 10 사이의 X는 클릭된 검색 결과를 나타냅니다.
테이블 이름과 열 정의가 아래에 나열되어 있습니다. 테이블 이름을 클릭하면 해당 테이블에 대한 정보를 볼 수 있습니다.
Table 1. User
user_id | |
created_at | first signed up |
state | active or pending |
activated_at | The time the user was activated, if they are active |
company_id | |
language |
Table. Events
user_id | |
occured_at | |
event_type | signup_flow,, engagement |
event_name | create_user: User is added to Yammer's database during signup process enter_email: User begins the signup process by entering her email address enter_info: User enters her name and personal information during signup process complete_signup: User completes the entire signup/authentication process home_page: User loads the home page like_message: User likes another user's message login: User logs into Yammer search_autocomplete: User selects a search result from the autocomplete list search_run: User runs a search query and is taken to the search results page search_click_result_X: User clicks search result X on the results page, where X is a number from 1 through 10. send_message: User posts a message view_inbox: User views messages in her inbox |
location | |
device |
Making a recommendation
--------------------------------------------------------------------------
Developing hypotheses
이 서비스의 목적을 찾아 보자
검색 서비스의 목적 : 빠르게 , 정확하게 최소한의 행동으로 원하는 것을 찾는 것.
- Search Use : 우리 서비스에 사용되는 기능인지 확인( 서비스 내에서 주요한 기능인지 확인 )
- Search frequency : 짧은 시간 내 검색 빈도 확인
- Repeated terms : 관련된 단어로 검색하는지 확인하는 것이 빈도를 확인하는 것 보다 정확하지만, 훨씬 더 어렵기 때문에 빈도를 확인하는것으로 대체
- Clickthroughts : 한 번의 검색에서 몇개의 결과를 클릭했는 지 확인 -> 원하는 결과를 찾기 어려웠다는 의미
- Autocomplete Clickthroughts: 자동완성 클릭수 확인
The state of search
1. Search Use : 우리 서비스에 사용되는 기능인지 확인( 서비스 내에서 주요한 기능인지 확인 )
-> 주별로 전체 세션 중, auto_complete 과 run 이벤트가 포함된 session의 비율 구하기
-> 분석 결과 : 전체 세션 중 약 25% 세션에서 자동완성 기능이, 약 8% 세션에서 run기능이 사용된다.
즉, 개선사항에 대해 주의 깊게 볼 가치가 있는 서비스이다.
step1. 유저별로 전/후 이벤트와의 시간 차 구하기
SELECT user_id, event_type, event_name,occurred_at,
#'유저별'로 통계구하기 > PARTITION BY user_id
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at
step2. 이벤트를 세션으로구분하기 (세션 = 전/후 이벤트와의 시간 차가 10분 이상인 이벤트 )
SELECT bounds.*,
CASE WHEN last_event ISNULL THEN id
WHEN last_event > INTERVAL '10 MINUTE' THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY bounds.occurred_at) END AS session
FROM (SELECT user_id, event_type, event_name,occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at) AS bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event ISNULL
OR next_event ISNULL
ORDER BY user_id
step 3. 유저별, 세션별 이벤트 시작시간/종료시간 구하기
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM (SELECT bounds.*,
CASE WHEN last_event ISNULL THEN id
WHEN last_event > INTERVAL '10 MINUTE' THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY bounds.occurred_at) END AS session
FROM (SELECT user_id, event_type, event_name,occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event ISNULL
OR next_event ISNULL
ORDER BY user_id ) final
GROUP BY user_id, session
step 4. tutorial.yammer.event 테이블과 위에서 구한 session 테이블을 left 조인하여
모든 event 들에 대해 session 번호와 , 세션시작시간 정보 찾기
=> 하나의 세션에 어떤 이벤트 행동이 있는지 확인할 수 있다.
SELECT e.*, session, session_start
FROM tutorial.yammer_events e
LEFT JOIN (SELECT user_id, session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (SELECT bounds.*,
CASE WHEN last_event ISNULL THEN id
WHEN last_event > INTERVAL '10 MINUTE' THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY bounds.occurred_at) END AS session
FROM (SELECT user_id, event_type, event_name,occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event ISNULL
OR next_event ISNULL
ORDER BY user_id ) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
and e.occurred_at <= session.session_end
WHERE event_type = 'engagement'
step5. 각 세션별, 유저별로 각 이벤트가 발생한 수 계산
SELECT x.session , x.session_start, x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click%' THEN x.user_id ELSE NULL END) AS clicks
FROM (SELECT e.*, session, session_start
FROM tutorial.yammer_events e
LEFT JOIN (SELECT user_id, session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (SELECT bounds.*,
CASE WHEN last_event ISNULL THEN id
WHEN last_event > INTERVAL '10 MINUTE' THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY bounds.occurred_at) END AS session
FROM (SELECT user_id, event_type, event_name,occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event ISNULL
OR next_event ISNULL
ORDER BY user_id ) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
and e.occurred_at <= session.session_end
WHERE event_type = 'engagement' ) x
GROUP BY x.session, x.session_start, x.user_id
step 6. 주 별로 , 세션에 autocomplete , run 이벤트가 포함된 세션의 비율 구하기
SELECT DATE_TRUNC('week',z.session_start) as week,
COUNT(CASE WHEN z.autocompletes > 0 THEN z.session ELSE NULL END)/COUNT(*)::FLOAT AS with_autocompletes,
COUNT(CASE WHEN z.runs >0 THEN z.session ELSE NULL END)/COUNT(*) :: float AS with_runs
FROM(SELECT x.session , x.session_start, x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click%' THEN x.user_id ELSE NULL END) AS clicks
FROM (SELECT e.*, session, session_start
FROM tutorial.yammer_events e
LEFT JOIN (SELECT user_id, session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (SELECT bounds.*,
CASE WHEN last_event ISNULL THEN id
WHEN last_event > INTERVAL '10 MINUTE' THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY bounds.occurred_at) END AS session
FROM (SELECT user_id, event_type, event_name,occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event ISNULL
OR next_event ISNULL
ORDER BY user_id ) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
and e.occurred_at <= session.session_end
WHERE event_type = 'engagement' ) x
GROUP BY x.session, x.session_start, x.user_id ) z
GROUP BY week
ORDER BY week
2. Search frequency : 짧은 시간 내 검색 빈도 확인
-> 1개의 세션 당 실행되는 autocomplete/run 이벤트 개수 확인하기
-> 구하는 방법 : step 5. 세션별,유저별 각 이벤트의 개수에서 autocomplete 과 run 이벤트를 세션단위로 집계하면 된다.
#세션당 auto_completes 이벤트 개수
SELECT z.autocompletes, COUNT(*) as sessions
FROM(SELECT x.session , x.session_start, x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click%' THEN x.user_id ELSE NULL END) AS clicks
FROM (SELECT e.*, session, session_start
FROM tutorial.yammer_events e
LEFT JOIN (SELECT user_id, session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (SELECT bounds.*,
CASE WHEN last_event ISNULL THEN id
WHEN last_event > INTERVAL '10 MINUTE' THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY bounds.occurred_at) END AS session
FROM (SELECT user_id, event_type, event_name,occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event ISNULL
OR next_event ISNULL
ORDER BY user_id ) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
and e.occurred_at <= session.session_end
WHERE event_type = 'engagement' ) x
GROUP BY x.session, x.session_start, x.user_id) z
WHERE z.autocompletes >0
GROUP BY z.autocompletes
ORDER BY z.autocompletes
3. Clickthroughts : 한 번의 검색에서 몇개의 결과를 클릭했는 지 확인
Clicks per Session With at Least One Search Run
-> (최소 1 번의 run 수행 후 ) click_result 수에 따른 세션 개수
-> 분석결과 : 한번도 클릭하지 않은 경우가 제일 많다 . 즉 , 검색결과가 적절하지 않은 경우가 제일 많다는 것.
SELECT z.clicks, COUNT(z.session)
FROM(SELECT x.session , x.session_start, x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click%' THEN x.user_id ELSE NULL END) AS clicks
FROM (SELECT e.*, session, session_start
FROM tutorial.yammer_events e
LEFT JOIN (SELECT user_id, session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (SELECT bounds.*,
CASE WHEN last_event ISNULL THEN id
WHEN last_event > INTERVAL '10 MINUTE' THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY bounds.occurred_at) END AS session
FROM (SELECT user_id, event_type, event_name,occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event ISNULL
OR next_event ISNULL
ORDER BY user_id ) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
and e.occurred_at <= session.session_end
WHERE event_type = 'engagement' ) x
GROUP BY x.session, x.session_start, x.user_id) z
WHERE z.runs >0
GROUP BY z.clicks
ORDER BY z.clicks
Average Clicks per Session by Searches per Session
-> 세션별로 runs 수 , click_result 수의 평균을 구해보자
runs 개수 보다 click_result 수가 적다면, 한번의 검색에 하나의 게시글도 누르지 않은 것 -> 검색 자체가 잘못 되었다고 판단.
-> 분석 방법 : step5 에서 runs 개수별 click 수 집계
-> 분석 결과 : run 의 경우, 평균 click_result의 이벤트 수 보다 낮다. 즉, 일반 검색의 경우 의미있는 검색이 아닐 확률이 더 높다.
SELECT z.runs, AVG(clicks)
FROM(SELECT x.session , x.session_start, x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click%' THEN x.user_id ELSE NULL END) AS clicks
FROM (SELECT e.*, session, session_start
FROM tutorial.yammer_events e
LEFT JOIN (SELECT user_id, session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (SELECT bounds.*,
CASE WHEN last_event ISNULL THEN id
WHEN last_event > INTERVAL '10 MINUTE' THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY bounds.occurred_at) END AS session
FROM (SELECT user_id, event_type, event_name,occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event ISNULL
OR next_event ISNULL
ORDER BY user_id ) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
and e.occurred_at <= session.session_end
WHERE event_type = 'engagement' ) x
GROUP BY x.session, x.session_start, x.user_id) z
WHERE z.runs >0
GROUP BY z.runs
ORDER BY z.runs
4. 검색 결과물 중 click 한 게시글의 순서
clicks by search result
-> click_result 이벤트 의 순서 분포확인
-> 분석 결과 : 고르게 분포되어있다고 판단한다 ( 순서에 따른 가중치가 잘 부여되어있다면 2-3 번째에 많이 몰려있어야한다.
SELECT TRIM(LEADING 'search_click_result' FROM event_name) ::INT AS number, count(*)
FROM tutorial.yammer_events
WHERE event_name LIKE 'search_click_result%'
GROUP BY 1
ORDER BY 1
5. 재사용률 확인
Sessions with Search Runs in Month after Users' First Search
'Study > SQL' 카테고리의 다른 글
[Analysis] Mode 1. Investigating a Drop in User Engagement (0) | 2022.07.06 |
---|---|
[Partition By] (0) | 2022.06.21 |
[SQL] With ( Common Table Expression)(작성중) (0) | 2022.06.17 |
[SQL zoo] Self Join 10 (0) | 2022.06.16 |
[프로그래머스] 입양 시각 구하기(2) (0) | 2022.06.16 |