Data preprocessing

Searching Wikipedia articles in RStudio (tidywikidatar)

r-code-for-data-analysis 2023. 5. 30. 17:03

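If you drop by Hadley Wickham's GitHub from time to time, you will find plenty of very useful package development work.

"tidywikidatar" is a package for exploring Wikidata, the structured data behind Wikipedia, as tidy data frames.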

 

https://github.com/hadley/tidywikidatar

 

This is a read-only mirror of the CRAN R package repository. tidywikidatar: Explore 'Wikidata' Through Tidy Data Frames. Homepage: https://edjnet.github.io/tidywikidatar/

 

Before GPT arrived, most information was found by googling, and Wikipedia in particular served as a go-to encyclopedia.

It will likely keep playing that role for some time to come.

 

1. Install and load the package

install.packages("tidywikidatar")
library(tidywikidatar)

 

2. Searching for Greek mythology. To search in Korean, change en to ko (the language code for Korean is ko, not kr).

tw_enable_cache()
tw_set_cache_folder(path = fs::path(fs::path_home_r(), "R", "tw_data"))
tw_set_language(language = "ko")
tw_create_cache_folder(ask = FALSE)
tw_search(search = "그리스 신화")

 

> tw_search(search = "그리스 신화")
# A tibble: 3 × 3
  id      label                                     description
  <chr>   <chr>                                     <chr>      
1 Q34726  Greek mythology                           myths of a…
2 Q516588 list of Greek gods and goddesses          Wikimedia …
3 Q719488 Greek mythology in western art and liter… NA     

 

3. Next, let's look up President Moon Jae-in

 

tw_search(search = "Moon Jae-in")

# A tibble: 3 × 3
  id        label                                   description
  <chr>     <chr>                                   <chr>      
1 Q21001    Moon Jae-in                             12th Presi…
2 Q31180800 Moon Jae-in Government                  NA         
3 Q33020572 Moon Jae-in becomes President of South… Wikinews a…

tw_search(search = "Moon Jae-in") %>%
  tw_filter_first(p = "P31", q = "Q5")

 

# A tibble: 1 × 3
  id     label       description                              
  <chr>  <chr>       <chr>                                    
1 Q21001 Moon Jae-in 12th President of South Korea (2017–2022)
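
The identifiers in the filter are Wikidata codes: P31 is the property "instance of" and Q5 is the item "human", so the filter keeps the first search result that is a person. Here is a minimal sketch for checking such codes yourself. It assumes network access, and describe_filter() is a hypothetical helper name, not part of the package.

```r
# Hypothetical helper (not part of tidywikidatar): look up the labels
# behind a property ID and an item ID used in tw_filter_first().
describe_filter <- function(p = "P31", q = "Q5", language = "en") {
  c(
    property = tidywikidatar::tw_get_property_label(property = p, language = language),
    item = tidywikidatar::tw_get_label(id = q, language = language)
  )
}

# This queries Wikidata, so it only runs in an interactive session.
if (interactive()) {
  describe_filter()
}
```

Wrapping the lookup in a function keeps the codes next to their meaning, which helps when a filter uses less familiar properties.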

 

The following finds his place of birth.

tw_search(search = "Moon Jae-in") %>% # search for Moon Jae-in
  tw_filter_first(p = "P31", q = "Q5") %>% # keep only the first result that is a human
  tw_get_property(p = "P19") %>% # ask for the place of birth
  dplyr::pull(value) %>% # take the resulting item ID and
  tw_get_label() # ask what that ID stands for

[1] "Geoje"
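
Going one step further, the country of the birthplace can be looked up through P17 ("country"). The sketch below spells out each step with intermediate variables instead of one long pipe. birth_place_and_country() is a hypothetical helper, and it assumes tw_get_property() returns a data frame with a value column, as in the calls above.

```r
# Hypothetical helper (not part of tidywikidatar): return the place of
# birth (P19) and the country of that place (P17) for one Wikidata item.
birth_place_and_country <- function(id, language = "en") {
  # First value of "place of birth" for the person
  place_id <- tidywikidatar::tw_get_property(id = id, p = "P19")$value[1]
  if (is.na(place_id)) {
    return(c(place = NA_character_, country = NA_character_))
  }
  # First value of "country" for that place
  country_id <- tidywikidatar::tw_get_property(id = place_id, p = "P17")$value[1]
  c(
    place = tidywikidatar::tw_get_label(id = place_id, language = language),
    country = tidywikidatar::tw_get_label(id = country_id, language = language)
  )
}

# Q21001 is the ID returned for Moon Jae-in in the search above.
# This queries Wikidata, so it only runs in an interactive session.
if (interactive()) {
  birth_place_and_country("Q21001")
}
```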

 

 

Next, a small helper function, get_bio(), gathers the label, description, and years of birth and death for an item.

# Collect label, description, and years of birth and death for one item
get_bio <- function(id, language = "en") {
  tibble::tibble(
    label = tw_get_label(id = id, language = language),
    description = tw_get_description(id = id, language = language),
    # P569 = date of birth: keep the first value and extract the year
    year_of_birth = tw_get_property(id = id, p = "P569") %>%
      dplyr::pull(value) %>%
      head(1) %>%
      lubridate::ymd_hms() %>%
      lubridate::year(),
    # P570 = date of death: same steps
    year_of_death = tw_get_property(id = id, p = "P570") %>%
      dplyr::pull(value) %>%
      head(1) %>%
      lubridate::ymd_hms() %>%
      lubridate::year()
  )
}

tw_search(search = "Moon Jae-in") %>%
  tw_filter_first(p = "P31", q = "Q5") %>%
  get_bio()
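
get_bio() relies on lubridate to turn raw date values into years. As a dependency-free alternative, here is a base R sketch. It assumes the values follow the Wikidata time format, for example "+1953-01-24T00:00:00Z" (sign, year, month, day, time), and wikidata_year() is a hypothetical helper name.

```r
# Hypothetical helper: extract the (signed) year from Wikidata time strings.
# Values that do not start with a year come back as NA.
wikidata_year <- function(x) {
  year_part <- sub("^([+-]?[0-9]+)-.*$", "\\1", x)
  suppressWarnings(as.integer(year_part))
}

wikidata_year("+1953-01-24T00:00:00Z")  # 1953
wikidata_year("-0500-01-01T00:00:00Z")  # -500
wikidata_year(NA_character_)            # NA
```

Keeping the sign means years before the common era stay negative instead of being silently flipped.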

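The table below shows which positions he held and when: each row is one qualifier of a "position held" statement, and the set column groups the rows that belong to the same statement.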

 

who          did            what                                            how                  value                                    set

Moon Jae-in  position held  Member of the National Assembly of South Korea  start time           2012-05-30                               1
Moon Jae-in  position held  Member of the National Assembly of South Korea  electoral district   NA                                       1
Moon Jae-in  position held  Member of the National Assembly of South Korea  end time             2016-05-29                               1
Moon Jae-in  position held  Member of the National Assembly of South Korea  replaces             Chang Je-won                             1
Moon Jae-in  position held  Member of the National Assembly of South Korea  replaced by          Chang Je-won                             1
Moon Jae-in  position held  Member of the National Assembly of South Korea  elected in           2012 South Korean legislative election   1
Moon Jae-in  position held  Member of the National Assembly of South Korea  parliamentary term   19th Legislative Assembly                1
Moon Jae-in  position held  Member of the National Assembly of South Korea  parliamentary group  Democratic Party of Korea                1
Moon Jae-in  position held  President of South Korea                        start time           2017-05-10                               2
Moon Jae-in  position held  President of South Korea                        end time             2022-05-09                               2
Moon Jae-in  position held  President of South Korea                        replaces             Hwang Kyo-ahn                            2
Moon Jae-in  position held  President of South Korea                        replaced by          Yoon Suk Yeol                            2
Moon Jae-in  position held  President of South Korea                        elected in           2017 South Korean presidential election  2
Moon Jae-in  position held  President of South Korea                        series ordinal       NA                                       2
Moon Jae-in  position held  Chief of Staff to the President of South Korea  start time           2007-03-12                               3
Moon Jae-in  position held  Chief of Staff to the President of South Korea  end time             2008-02-24                               3
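
A table with these columns can be built from the qualifiers of the "position held" property (P39). The following is only a sketch of one way to do it, not necessarily the code that produced the table above. It assumes tw_get_qualifiers() returns the columns id, property, qualifier_id, qualifier_property, qualifier_value and set. extract_date(), readable_value() and positions_held() are hypothetical helper names.

```r
# Pure helper: pull a YYYY-MM-DD date out of a single Wikidata time string.
# Returns NA when the value is missing or contains no date.
extract_date <- function(x) {
  if (is.na(x)) return(NA_character_)
  m <- regmatches(x, regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", x))
  if (length(m) == 0) NA_character_ else m
}

# Turn one qualifier value into something readable:
# item IDs (Q...) become labels, anything else is treated as a date.
readable_value <- function(x, language = "en") {
  if (is.na(x)) return(NA_character_)
  if (grepl("^Q[0-9]+$", x)) {
    as.character(tidywikidatar::tw_get_label(id = x, language = language))
  } else {
    extract_date(x)
  }
}

# Build a who / did / what / how / value / set table for one item.
positions_held <- function(id, language = "en") {
  q <- tidywikidatar::tw_get_qualifiers(id = id, p = "P39")
  tibble::tibble(
    who = tidywikidatar::tw_get_label(id = q$id, language = language),
    did = tidywikidatar::tw_get_property_label(property = q$property, language = language),
    what = tidywikidatar::tw_get_label(id = q$qualifier_id, language = language),
    how = tidywikidatar::tw_get_property_label(property = q$qualifier_property, language = language),
    value = vapply(q$qualifier_value, readable_value, character(1),
                   language = language, USE.NAMES = FALSE),
    set = q$set
  )
}

# Q21001 is the ID returned for Moon Jae-in in the search above.
# This queries Wikidata, so it only runs in an interactive session.
if (interactive()) {
  positions_held("Q21001")
}
```

Values that are neither item IDs nor dates, such as a plain ordinal number, come back as NA in this sketch.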