Searching Wikipedia Articles from RStudio (tidywikidatar)
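I drop by Hadley Wickham's GitHub from time to time, and it hosts development work on many very useful packages.
One package worth a look is "tidywikidatar", which queries Wikidata, the structured-data sister project of Wikipedia, and returns the results as tidy data frames ready for analysis.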
https://github.com/hadley/tidywikidatar
This is a read-only mirror of the CRAN R package repository. tidywikidatar: Explore 'Wikidata' Through Tidy Data Frames. Homepage: https://edjnet.github.io/tidywikidatar/
Before GPT arrived, most information was found by googling, and Wikipedia in particular served as the go-to encyclopedia.
It will likely keep playing that role for some time to come.
1. Installing and loading the package
install.packages("tidywikidatar")
library(tidywikidatar)
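2. I searched on the topic of Greek mythology. To search in Korean, change the language code from "en" to "ko" (the ISO 639-1 code for Korean; "kr" is the country code, not the language code).
tw_enable_cache()
tw_set_cache_folder(path = fs::path(fs::path_home_r(), "R", "tw_data"))
tw_set_language(language = "ko") # "ko" = Korean
tw_create_cache_folder(ask = FALSE)
tw_search(search = "그리스 신화")
When I ran this with "kr", the labels came back in English: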
# A tibble: 3 × 3
id label description
<chr> <chr> <chr>
1 Q34726 Greek mythology myths of a…
2 Q516588 list of Greek gods and goddesses Wikimedia …
3 Q719488 Greek mythology in western art and liter… NA
3. Now let's look up President Moon Jae-in.
tw_search(search = "Moon Jae-in")
# A tibble: 3 × 3
id label description
<chr> <chr> <chr>
1 Q21001 Moon Jae-in 12th Presi…
2 Q31180800 Moon Jae-in Government NA
3 Q33020572 Moon Jae-in becomes President of South… Wikinews a…
tw_search(search = "Moon Jae-in") %>%
tw_filter_first(p = "P31", q = "Q5")
# A tibble: 1 × 3
id label description
<chr> <chr> <chr>
1 Q21001 Moon Jae-in 12th President of South Korea (2017–2022)
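The arguments passed to tw_filter_first() are Wikidata identifiers: P31 is the property "instance of" and Q5 is the item "human". As a small offline sketch, the lookup table below collects the identifiers used in this post. The names wikidata_ids and describe_id are hypothetical helpers for illustration only and are not part of tidywikidatar.

```r
# Wikidata identifiers that appear in this post, with their English labels.
# Properties start with "P", items start with "Q".
wikidata_ids <- c(
  P31  = "instance of",
  P19  = "place of birth",
  P17  = "country",
  P569 = "date of birth",
  P570 = "date of death",
  Q5   = "human"
)

# Hypothetical helper: return a readable label for each known id,
# and NA for any id that is not in the table above.
describe_id <- function(id) {
  unname(wikidata_ids[id])
}

describe_id(c("P31", "Q5"))
```

Read this way, tw_filter_first(p = "P31", q = "Q5") keeps the first search result whose "instance of" is "human".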
The following shows how to find his place of birth.
tw_search(search = "Moon Jae-in") %>% # search for Moon Jae-in
  tw_filter_first(p = "P31", q = "Q5") %>% # keep only the first result that is a human
  tw_get_property(p = "P19") %>% # ask for the place of birth
  dplyr::pull(value) %>% # take the resulting id
  tw_get_label() # ask what that id stands for
# To go on to the country of that place, query P17 on the id and pull(value) again before tw_get_label().
[1] "Geoje"
Next, let's use the get_bio() helper from the package documentation and see what he did and when.
get_bio <- function(id, language = "en") {
  tibble::tibble(
    label = tw_get_label(id = id, language = language),
    description = tw_get_description(id = id, language = language),
    year_of_birth = tw_get_property(id = id, p = "P569") %>%
      dplyr::pull(value) %>%
      head(1) %>%
      lubridate::ymd_hms() %>%
      lubridate::year(),
    year_of_death = tw_get_property(id = id, p = "P570") %>%
      dplyr::pull(value) %>%
      head(1) %>%
      lubridate::ymd_hms() %>%
      lubridate::year()
  )
}
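Inside get_bio(), the date properties P569 and P570 come back as timestamp strings, which lubridate::ymd_hms() and lubridate::year() reduce to a year. As a rough offline illustration of that step, the sketch below does the same with base R. The helper name year_from_wikidata_time is hypothetical, and the sample string (in the "+YYYY-MM-DDTHH:MM:SSZ" form Wikidata uses for dates) is an assumed input for the example, not output captured from the package.

```r
# Hypothetical helper: extract the four-digit year from a Wikidata-style
# timestamp such as "+1953-01-24T00:00:00Z".
# Returns NA when the input is NA or does not match the expected form.
year_from_wikidata_time <- function(x) {
  pattern <- "^\\+?([0-9]{4})-[0-9]{2}-[0-9]{2}T.*$"
  year <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
  as.integer(year)
}

# Sample input (assumed format), plus a missing value such as a living
# person's date of death.
year_from_wikidata_time(c("+1953-01-24T00:00:00Z", NA))
```

The helper only covers four-digit years in the common era; lubridate, as used in get_bio(), is the more general choice.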
tw_search(search = "Moon Jae-in") %>%
  tw_filter_first(p = "P31", q = "Q5") %>%
  get_bio()
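The table below lists Moon Jae-in's "position held" statements together with their qualifiers (start time, end time, predecessor, successor and so on). Note that get_bio() itself returns only label, description, year_of_birth and year_of_death.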
who | did | what | how | value | set |
Moon Jae-in | position held | Member of the National Assembly of South Korea | start time | 2012-05-30 | 1 |
Moon Jae-in | position held | Member of the National Assembly of South Korea | electoral district | NA | 1 |
Moon Jae-in | position held | Member of the National Assembly of South Korea | end time | 2016-05-29 | 1 |
Moon Jae-in | position held | Member of the National Assembly of South Korea | replaces | Chang Je-won | 1 |
Moon Jae-in | position held | Member of the National Assembly of South Korea | replaced by | Chang Je-won | 1 |
Moon Jae-in | position held | Member of the National Assembly of South Korea | elected in | 2012 South Korean legislative election | 1 |
Moon Jae-in | position held | Member of the National Assembly of South Korea | parliamentary term | 19th Legislative Assembly | 1 |
Moon Jae-in | position held | Member of the National Assembly of South Korea | parliamentary group | Democratic Party of Korea | 1 |
Moon Jae-in | position held | President of South Korea | start time | 2017-05-10 | 2 |
Moon Jae-in | position held | President of South Korea | end time | 2022-05-09 | 2 |
Moon Jae-in | position held | President of South Korea | replaces | Hwang Kyo-ahn | 2 |
Moon Jae-in | position held | President of South Korea | replaced by | Yoon Suk Yeol | 2 |
Moon Jae-in | position held | President of South Korea | elected in | 2017 South Korean presidential election | 2 |
Moon Jae-in | position held | President of South Korea | series ordinal | NA | 2 |
Moon Jae-in | position held | Chief of Staff to the President of South Korea | start time | 2007-03-12 | 3 |
Moon Jae-in | position held | Chief of Staff to the President of South Korea | end time | 2008-02-24 | 3 |
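A table of this shape can be sketched with tw_get_qualifiers(). The version below is only a sketch under stated assumptions: that "position held" is property P39, that tw_get_qualifiers() returns columns named id, property, qualifier_id, qualifier_property, qualifier_value and set, and that the tidywikidatar and dplyr packages are installed with network access to Wikidata. The function name position_qualifiers is hypothetical.

```r
# Hypothetical helper: list the qualifiers attached to a person's
# "position held" (P39) statements in a who/did/what/how/value/set layout.
# Assumes the column names returned by tw_get_qualifiers() noted above.
position_qualifiers <- function(person_id, language = "en") {
  qualifiers <- tidywikidatar::tw_get_qualifiers(id = person_id, p = "P39")

  dplyr::transmute(
    qualifiers,
    who = tidywikidatar::tw_get_label(id = id, language = language),
    did = tidywikidatar::tw_get_property_label(property = property, language = language),
    what = tidywikidatar::tw_get_label(id = qualifier_id, language = language),
    how = tidywikidatar::tw_get_property_label(property = qualifier_property, language = language),
    value = qualifier_value, # raw value: a Q-id or a timestamp, not yet a label
    set = set                # groups qualifiers belonging to the same statement
  )
}

# Usage (requires the packages and network access):
# position_qualifiers("Q21001") # Q21001 is the id found for Moon Jae-in above
```

Unlike the table above, this sketch leaves the value column raw, so item ids would still need a tw_get_label() pass and dates would still need formatting.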