Title: | Access to CAPES Data |
---|---|
Description: | Provides simplified access to the data from the Catalog of Theses and Dissertations of the Brazilian Coordination for the Improvement of Higher Education Personnel (CAPES, <https://catalogodeteses.capes.gov.br>) for the years 1987 through 2022. The dataset includes variables such as Higher Education Institution (institution), Area of Concentration (area), Graduate Program Name (program_name), Type of Work (type), Language of Work (language), Author Identification (author), Abstract (abstract), Advisor Identification (advisor), Development Region (region), State (state). |
Authors: | Hugo Vasconcelos Medeiros [aut, cre], Dalson Figueiredo Filho [aut], André Leite [aut] |
Maintainer: | Hugo Vasconcelos Medeiros <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.0 |
Built: | 2024-12-20 07:01:33 UTC |
Source: | https://github.com/hugoavmedeiros/capesr |
Aggregated data from the CAPES Catalog of Theses and Dissertations, containing summarized information by year, institution, area, program, type, region, and state (UF).
capes_synthetic_df
A data frame with the following columns:
Reference year of the data.
Higher Education Institution.
Area of Concentration.
Name of the Graduate Program.
Type of work (e.g., Master's, Doctorate).
Region of Brazil.
Federative Unit (state).
Total number of works.
Synthetic data created from the CAPES Catalog of Theses and Dissertations.
data(capes_synthetic_df)
head(capes_synthetic_df)
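Because the table is already aggregated, totals per year or per state only need a sum over the count column. The sketch below runs on a toy data frame rather than the real dataset; the column names `base_year`, `state`, and `n` are assumptions for illustration, so check `names(capes_synthetic_df)` before adapting it.

```r
# Toy stand-in for capes_synthetic_df.
# Column names (base_year, state, n) are assumed, not confirmed by the docs.
toy <- data.frame(
  base_year = c(1987, 1987, 1990, 1990),
  state     = c("SP", "PE", "SP", "SP"),
  n         = c(10, 4, 7, 3),
  stringsAsFactors = FALSE
)

# Sum the number of works within each reference year.
totals_by_year <- aggregate(n ~ base_year, data = toy, FUN = sum)
print(totals_by_year)
```

The same call with `n ~ state` or `n ~ base_year + state` gives totals along the other dimensions.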
Downloads CAPES theses and dissertations data files from OSF for selected years.
download_capes_data(years, destination = tempdir(), timeout = 120)
baixar_dados_capes(years, destination = tempdir(), timeout = 120)
years | A vector with the desired years. |
---|---|
destination | The directory where the files will be saved (default: temporary directory). |
timeout | The timeout in seconds for the download process (default: 120 seconds). |
A list of file paths for the downloaded or already existing files.
# Download data for the years 1987 and 1990
capes_files <- download_capes_data(c(1987, 1990))
This function combines data from multiple Parquet files and applies optional filters, including text-based searches.
read_capes_data(files, filters = list())
ler_dados_capes(files, filters = list())
files | A vector or list of paths to Parquet files. |
---|---|
filters | A list of filters to apply (e.g., list(base_year = 1987, state = "SP", title = "education")). |
A 'data.frame' containing the combined and filtered data.
# Download data for the years 1987 and 1990
capes_files <- download_capes_data(c(1987, 1990))
# Combine all selected data
combined_data <- read_capes_data(capes_files)
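To make the `filters` argument more concrete, here is a minimal base-R sketch of how a named list such as `list(base_year = 1987, state = "SP", title = "education")` could be applied to a data frame. `apply_filters` is a hypothetical helper, not part of the package, and its matching rules (exact match for non-character values, case-insensitive substring match for character values) are assumptions; the package may behave differently.

```r
# Hypothetical helper, NOT the package's implementation.
# Assumed rules: character filters match as case-insensitive substrings,
# all other filters match exactly.
apply_filters <- function(data, filters = list()) {
  keep <- rep(TRUE, nrow(data))
  for (col in names(filters)) {
    stopifnot(col %in% names(data))  # fail early on unknown columns
    value <- filters[[col]]
    if (is.character(value)) {
      hit <- grepl(tolower(value), tolower(data[[col]]), fixed = TRUE)
    } else {
      hit <- data[[col]] %in% value
    }
    keep <- keep & hit  # all filters must hold at once
  }
  data[keep, , drop = FALSE]
}

# Toy data standing in for the combined Parquet files.
toy <- data.frame(
  base_year = c(1987, 1987, 1990),
  state     = c("SP", "RJ", "SP"),
  title     = c("Rural education policy", "Soil chemistry", "Education and work"),
  stringsAsFactors = FALSE
)

filtered <- apply_filters(
  toy,
  list(base_year = 1987, state = "SP", title = "education")
)
print(filtered)
```

Combining the conditions with a logical AND means that each additional filter can only narrow the result.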
This function allows searching for specific terms in the text fields of a previously loaded 'data.frame'.
search_capes_text(data, term, field)
buscar_texto_capes(data, term, field)
data | A 'data.frame' containing the CAPES Catalog of Theses and Dissertations data. |
---|---|
term | A string, the term to search for. |
field | A string, the name of the field to search in (e.g., "resumo", "titulo"). |
A 'data.frame' with rows matching the search or a message indicating no results were found.
# Download data for the years 1987 and 1990
capes_files <- download_capes_data(c(1987, 1990))
# Combine all selected data
combined_data <- read_capes_data(capes_files)
# Search data
results <- search_capes_text(
  data = combined_data,
  term = "Educação",
  field = "titulo"
)
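For readers who want to see the idea without downloading anything, the following base-R sketch mimics the documented behaviour on a toy data frame. `search_text_sketch` is a hypothetical stand-in, not the package's source; the case-insensitive matching and the empty data frame returned when nothing matches are assumptions.

```r
# Hypothetical stand-in for a text search over one field.
# Assumptions: matching ignores case; no match emits a message and
# returns a zero-row data frame.
search_text_sketch <- function(data, term, field) {
  stopifnot(field %in% names(data))
  hits <- grepl(term, data[[field]], ignore.case = TRUE)
  if (!any(hits)) {
    message("No results found for term: ", term)
    return(data[0, , drop = FALSE])
  }
  data[hits, , drop = FALSE]
}

# Toy data; ASCII-only titles keep the example locale-independent.
toy <- data.frame(
  titulo = c("Educacao rural", "Quimica do solo", "Trabalho e educacao"),
  resumo = c("Estudo de escolas", "Analise de solos", "Mercado de trabalho"),
  stringsAsFactors = FALSE
)

found   <- search_text_sketch(toy, term = "educacao", field = "titulo")
nothing <- search_text_sketch(toy, term = "fisica", field = "titulo")
print(found)
```

Note that `grepl` treats `term` as a regular expression by default, so terms containing characters such as `.` or `(` may need escaping.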
A data frame containing the years and the corresponding IDs for downloading the files.
years_osf
A data frame with the following columns:
Year of the data (1987-2022).
OSF ID corresponding to the year.
data(years_osf)
head(years_osf)
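A lookup table like this is typically used to translate requested years into file identifiers. The sketch below shows that lookup on a toy table; the column names `year` and `osf_id` and the ID values are placeholders chosen for illustration, not the real contents of `years_osf`.

```r
# Toy stand-in for years_osf; column names and IDs are placeholders.
toy_years <- data.frame(
  year   = c(1987, 1988, 1989),
  osf_id = c("aaaaa", "bbbbb", "ccccc"),
  stringsAsFactors = FALSE
)

wanted <- c(1987, 1989)

# match() returns the row position of each requested year, or NA if absent.
ids <- toy_years$osf_id[match(wanted, toy_years$year)]

# Stop early if any requested year has no entry in the table.
if (anyNA(ids)) {
  stop("Some requested years are not in the table.")
}
print(ids)
```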