6.6.20

Learning various data operations in R, with references to SQL commands for easy understanding

Hello there, welcome to my blog post. The purpose of this post is to practice R and R Markdown myself and to share my work with the audience. I will make references to SQL/relational database commands while working with R. If you are not familiar with SQL, ignore the SQL parts; they are included only for easy understanding.

Prerequisites:
  1. Install R
  2. Install RStudio (optional, for ease of use; R by itself is sufficient)
Assumption: You know what your working directory is and where to keep the files you want to load.

Loading data

Before we learn how to get data, we should know that, unlike in relational databases, data in R resides in memory. All of the imported data is loaded into the memory of your computer, so before loading the data you must check that your computer has enough memory to hold the dataset.
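As a rough way to check sizes, here is a minimal sketch using a small made-up data frame written to a temporary file. The names sample_data and csv_path and the values in them are hypothetical and have nothing to do with the dataset used in this post. file.info() reports the size of a file on disk, and object.size() estimates how much memory an R object occupies.

```r
# Hypothetical data, created only so this sketch runs on its own.
sample_data <- data.frame(id = 1:1000, value = rnorm(1000))

# Write it to a temporary CSV file (csv_path is a made-up name).
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_data, csv_path, row.names = FALSE)

# Size of the file on disk, in bytes. This can be checked before loading a file.
size_on_disk <- file.info(csv_path)$size

# Estimated memory used by the data frame once it is in R.
size_in_memory <- object.size(sample_data)

print(size_on_disk)
print(format(size_in_memory, units = "Kb"))
```

The size on disk and the size in memory are usually not the same, so treat the file size only as a first hint of how much memory the data will need.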
Data can come from any source: a flat file, a database system or handwritten notes. Flat files are usually the most common source of data. In this section we will see how to load data from a csv file. The example csv file used here was downloaded from the data.gov.in website and can be accessed here: Master Data of Madhya Pradesh [https://data.gov.in/catalog/master-data-madhya-pradesh?filters%5Bfield_catalog_reference%5D=122638&format=json&offset=0&limit=6&sort%5Bcreated%5D=desc]
mdmp.Data <- read.csv("Master_Data_Madhya_Pradesh_1.csv")
The command above imports the contents of the Master_Data_Madhya_Pradesh_1.csv file into an object called a data frame. The object mdmp.Data is a data frame that contains all the data from that file. Similar to the relational database object "table", a data frame has rows and columns and presents data in a structured form.
SQL reference - insert into table1 or select * into table1 from table
The read.csv command takes several arguments besides the name of the file. Information on the additional arguments can be found here: read.csv [https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/read.table]
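To give a feel for a few of those arguments, here is a minimal sketch. The file, its columns and its values are made up for illustration (they are not from the Madhya Pradesh dataset), and the choice to treat "-" as a missing value is only an assumption for this example. header, sep, na.strings and stringsAsFactors are documented arguments of read.csv.

```r
# Hypothetical sample file, written to a temporary location so the example runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,count,score",
  "a,1,2.5",
  "b,NA,3.0",
  "c,3,-"
), csv_path)

sample_data <- read.csv(
  csv_path,
  header = TRUE,               # first line holds the column names
  sep = ",",                   # fields are separated by commas
  na.strings = c("NA", "-"),   # treat both "NA" and "-" as missing values
  stringsAsFactors = FALSE     # keep text columns as character, not factor
)

print(sample_data)
```

Setting stringsAsFactors explicitly keeps the result the same across R versions, since its default value changed in R 4.0.0.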

Selecting data from a data frame

Before we learn how to select data from the data frame mdmp.Data, I recommend learning the following commands, using mdmp.Data:
names(mdmp.Data)
str(mdmp.Data)
The first command, names(mdmp.Data), returns all the column names of the data frame. Checking the column names right after loading the data is useful because it makes you familiar with the data frame. The second command, str(mdmp.Data), shows the structure of the data frame: the number of rows and columns, and the type of each column along with its first few values.
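If you want to try both commands without the downloaded file, here is a minimal sketch on a small made-up data frame. sample_data and its columns are hypothetical and are not part of the Madhya Pradesh dataset.

```r
# Hypothetical data frame, used only to demonstrate names() and str().
sample_data <- data.frame(
  name = c("a", "b", "c"),
  count = c(1L, 2L, 3L),
  stringsAsFactors = FALSE
)

# names() returns the column names as a character vector.
column_names <- names(sample_data)
print(column_names)

# str() prints a compact summary: dimensions, column types and the first few values.
str(sample_data)
```

In SQL terms, this is roughly like looking at the column list and data types of a table before writing a select against it.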

to be continued...
