Searching Wikipedia articles from RStudio (tidywikidatar)
Uploaded on May 30, 2023 at 17:03:57. Author: r-code-for-data-analysis
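Browsing Hadley Wickham's GitHub from time to time turns up a lot of very useful package development work.
"tidywikidatar" is a package that reads Wikidata, the structured-data sibling of Wikipedia, into tidy data frames for analysis. The repository linked below is a read-only mirror of the CRAN package.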
https://github.com/hadley/tidywikidatar
This is a read-only mirror of the CRAN R package repository. tidywikidatar: Explore 'Wikidata' Through Tidy Data Frames. Homepage: https://edjnet.github.io/tidywikidatar/
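Before GPT arrived, most information was found by Googling, with a great deal of help from Wikipedia, the great encyclopedia of the web.
It will keep playing that encyclopedia role for some time to come.
1. Install and load the package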
    install.packages("tidywikidatar")  # one-time installation from CRAN
    library(tidywikidatar)
2. Search for Greek mythology. To search in Korean, change the language code from en to ko, the ISO 639-1 code for Korean (kr is not the Korean language code).

    tw_enable_cache()  # cache responses locally
    tw_set_cache_folder(path = fs::path(fs::path_home_r(), "R", "tw_data"))
    tw_set_language(language = "ko")
    tw_create_cache_folder(ask = FALSE)
    tw_search(search = "그리스 신화")

The run recorded below was made with language = "kr" and returned English labels:

    > tw_search(search = "그리스 신화")
    # A tibble: 3 × 3
      id      label                                     description
      <chr>   <chr>                                     <chr>
    1 Q34726  Greek mythology                           myths of a…
    2 Q516588 list of Greek gods and goddesses          Wikimedia …
    3 Q719488 Greek mythology in western art and liter… NA
3. Next, look up President Moon Jae-in

    tw_search(search = "Moon Jae-in")

    # A tibble: 3 × 3
      id        label                                   description
      <chr>     <chr>                                   <chr>
    1 Q21001    Moon Jae-in                             12th Presi…
    2 Q31180800 Moon Jae-in Government                  NA
    3 Q33020572 Moon Jae-in becomes President of South… Wikinews a…
    tw_search(search = "Moon Jae-in") %>%
      tw_filter_first(p = "P31", q = "Q5")  # keep only the first result that is a human

    # A tibble: 1 × 3
      id     label       description
      <chr>  <chr>       <chr>
    1 Q21001 Moon Jae-in 12th President of South Korea (2017–2022)
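The arguments of tw_filter_first() are Wikidata identifiers: P31 is the property "instance of" and Q5 is the item "human". As a minimal sketch, the helper below looks up the labels behind such a pair so the codes do not have to be memorised. The name describe_filter is hypothetical and not part of the package, and the sketch assumes tw_get_property_label() and tw_get_label() accept the arguments shown.

```r
# Hypothetical helper (not part of tidywikidatar): returns the
# human-readable labels behind a property/item pair such as P31/Q5.
describe_filter <- function(p, q, language = "en") {
  c(
    property = tidywikidatar::tw_get_property_label(property = p, language = language),
    item = tidywikidatar::tw_get_label(id = q, language = language)
  )
}
```

Calling describe_filter("P31", "Q5") queries Wikidata, so it needs network access (or a populated cache).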
The following shows how to find the place of birth.

    tw_search(search = "Moon Jae-in") %>%        # search for Moon Jae-in
      tw_filter_first(p = "P31", q = "Q5") %>%   # keep only the first result that is a human
      tw_get_property(p = "P19") %>%             # ask for the place of birth
      dplyr::pull(value) %>%                     # take the resulting id and
      tw_get_label()                             # ask what that id stands for

    [1] "Geoje"
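The same lookups can also be wrapped in a reusable function. The sketch below follows the birthplace one step further, to the country (P17) in which it is located. The name birth_country is hypothetical, and the sketch assumes that tw_get_property() accepts the data frame returned by tw_filter_first(), as the pipeline above does.

```r
# Hypothetical helper (not part of tidywikidatar): label of the country (P17)
# of the place of birth (P19) of the first human (P31 = Q5) matching `name`.
birth_country <- function(name) {
  person <- tidywikidatar::tw_filter_first(
    tidywikidatar::tw_search(search = name),
    p = "P31", q = "Q5"
  )
  # The `value` column holds the Wikidata id of the birthplace.
  place_id <- tidywikidatar::tw_get_property(person, p = "P19")$value
  # Ask that place for its country, then translate the id into a label.
  country_id <- tidywikidatar::tw_get_property(place_id, p = "P17")$value
  tidywikidatar::tw_get_label(id = country_id)
}
```

birth_country("Moon Jae-in") would run the whole chain in one call; like the pipeline, it needs network access or a populated cache.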
Next, use a small helper function, get_bio(), to pull basic biographical details, and then look at when he held which position.
    get_bio <- function(id, language = "en") {
      tibble::tibble(
        label = tw_get_label(id = id, language = language),
        description = tw_get_description(id = id, language = language),
        year_of_birth = tw_get_property(id = id, p = "P569") %>%  # P569: date of birth
          dplyr::pull(value) %>%
          head(1) %>%
          lubridate::ymd_hms() %>%
          lubridate::year(),
        year_of_death = tw_get_property(id = id, p = "P570") %>%  # P570: date of death
          dplyr::pull(value) %>%
          head(1) %>%
          lubridate::ymd_hms() %>%
          lubridate::year()
      )
    }

    tw_search(search = "Moon Jae-in") %>%
      tw_filter_first(p = "P31", q = "Q5") %>%
      get_bio()
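The date handling in get_bio() boils down to reading a year out of a Wikidata time string. As a rough base R sketch of that step, without lubridate: the function name year_from_wikidata_time and the sample string are illustrative assumptions (the string is written by hand in Wikidata's time format, not fetched from the API), and the leading sign is simply dropped, so BCE dates are not handled.

```r
# Illustrative value in Wikidata's time format (written by hand, not fetched).
wikidata_time <- "+1953-01-24T00:00:00Z"

# Hypothetical helper: drop the leading sign, then read the first
# four characters as the year. NA input stays NA.
year_from_wikidata_time <- function(x) {
  as.integer(substr(sub("^[+-]", "", x), 1, 4))
}

year_from_wikidata_time(wikidata_time)
# [1] 1953
```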
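The result is shown below. The table lists the qualifiers of each "position held" (P39) statement in the columns who, did, what, how, value, and set; get_bio() itself returns a single row with label, description, year_of_birth, and year_of_death.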
| who | did | what | how | value | set |
|---|---|---|---|---|---|
| Moon Jae-in | position held | Member of the National Assembly of South Korea | start time | 2012-05-30 | 1 |
| Moon Jae-in | position held | Member of the National Assembly of South Korea | electoral district | NA | 1 |
| Moon Jae-in | position held | Member of the National Assembly of South Korea | end time | 2016-05-29 | 1 |
| Moon Jae-in | position held | Member of the National Assembly of South Korea | replaces | Chang Je-won | 1 |
| Moon Jae-in | position held | Member of the National Assembly of South Korea | replaced by | Chang Je-won | 1 |
| Moon Jae-in | position held | Member of the National Assembly of South Korea | elected in | 2012 South Korean legislative election | 1 |
| Moon Jae-in | position held | Member of the National Assembly of South Korea | parliamentary term | 19th Legislative Assembly | 1 |
| Moon Jae-in | position held | Member of the National Assembly of South Korea | parliamentary group | Democratic Party of Korea | 1 |
| Moon Jae-in | position held | President of South Korea | start time | 2017-05-10 | 2 |
| Moon Jae-in | position held | President of South Korea | end time | 2022-05-09 | 2 |
| Moon Jae-in | position held | President of South Korea | replaces | Hwang Kyo-ahn | 2 |
| Moon Jae-in | position held | President of South Korea | replaced by | Yoon Suk Yeol | 2 |
| Moon Jae-in | position held | President of South Korea | elected in | 2017 South Korean presidential election | 2 |
| Moon Jae-in | position held | President of South Korea | series ordinal | NA | 2 |
| Moon Jae-in | position held | Chief of Staff to the President of South Korea | start time | 2007-03-12 | 3 |
| Moon Jae-in | position held | Chief of Staff to the President of South Korea | end time | 2008-02-24 | 3 |
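A table with these columns can be built from the qualifiers attached to each "position held" (P39) statement. The sketch below is one possible way to do it, not necessarily the code that produced the table above. The name position_qualifiers is hypothetical, and the sketch assumes that tw_get_qualifiers() returns the columns id, property, qualifier_id, qualifier_property, qualifier_value, and set.

```r
# Hypothetical helper (not part of tidywikidatar): labelled qualifiers of
# the "position held" (P39) statements of one Wikidata item.
position_qualifiers <- function(id, language = "en") {
  qualifiers <- tidywikidatar::tw_get_qualifiers(id = id, p = "P39")

  # Qualifier values are either item ids (Q...) or literals such as dates.
  value_label <- vapply(
    qualifiers$qualifier_value,
    function(x) {
      if (!is.na(x) && grepl("^Q[0-9]+$", x)) {
        # Item id: translate it into a label.
        tidywikidatar::tw_get_label(id = x, language = language)
      } else {
        # Literal: keep only the YYYY-MM-DD part of a time string;
        # anything that does not match is returned unchanged.
        sub("^[+-]?([0-9]{4}-[0-9]{2}-[0-9]{2})T.*$", "\\1", x)
      }
    },
    character(1),
    USE.NAMES = FALSE
  )

  data.frame(
    who = tidywikidatar::tw_get_label(id = qualifiers$id, language = language),
    did = tidywikidatar::tw_get_property_label(property = qualifiers$property, language = language),
    what = tidywikidatar::tw_get_label(id = qualifiers$qualifier_id, language = language),
    how = tidywikidatar::tw_get_property_label(property = qualifiers$qualifier_property, language = language),
    value = value_label,
    set = qualifiers$set
  )
}
```

position_qualifiers("Q21001") would query the item found in step 3; it needs network access or a populated cache, and the rows returned depend on the current state of Wikidata.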