R Tutorial
Last updated on 13th Oct 2020
R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. CRAN is a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R.
What is R Programming?
R is an interpreted programming language created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. The R Development Core Team currently develops R. It is also a software environment used for statistical analysis, graphical representation, reporting, and data modeling. R is an implementation of the S programming language combined with lexical scoping semantics.
R supports branching and looping as well as modular programming with functions. To improve efficiency, R can also integrate with procedures written in C, C++, .NET, Python, and FORTRAN.
Today, R is one of the most important tools used by researchers, data analysts, statisticians, and marketers for retrieving, cleaning, analyzing, visualizing, and presenting data.
History of R Programming
The history of R goes back roughly 20 to 30 years. R was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and the R Development Core Team currently develops it. The name of the language is taken from the first letter of both developers' first names. The project was conceived in 1992. The initial version was released in 1995, and a stable version (1.0) was released in 2000.
The following table shows the version, release date, and description of major R releases:
Version | Release Date | Description |
---|---|---|
0.49 | 1997-04-23 | R's source code was released for the first time, and CRAN (Comprehensive R Archive Network) was started. |
0.60 | 1997-12-05 | R officially became part of the GNU Project. |
0.65.1 | 1999-10-07 | update.packages and install.packages were both included. |
1.0 | 2000-02-29 | The first production-ready version was released. |
1.4 | 2001-12-19 | The first version for Mac OS was made available. |
2.0 | 2004-10-04 | Lazy loading was introduced, which enables fast loading of data with minimal use of system memory. |
2.1 | 2005-04-18 | Added support for UTF-8 encoding, internationalization, localization, etc. |
2.11 | 2010-04-22 | Added support for Windows 64-bit systems. |
2.13 | 2011-04-14 | Added a compiler function that speeds up functions by converting them to byte code. |
2.14 | 2011-10-31 | Added some new packages, including the parallel package. |
2.15 | 2012-03-30 | Improved serialization speed for long vectors. |
3.0 | 2013-04-03 | Added support for numeric index values of 2^31 and larger on 64-bit systems. |
3.4 | 2017-04-21 | Just-in-time (JIT) compilation was enabled by default. |
3.5 | 2018-04-23 | Added new features such as a compact internal representation of integer sequences and a new serialization format. |
Features of R programming
R is a domain-specific programming language designed for data analysis. It has some distinctive features that make it very powerful, the most important arguably being the notion of vectors. Vectors allow us to perform a complex operation on a set of values in a single command. R programming has the following features:
- It is a simple and effective programming language which has been well developed.
- It is data analysis software.
- It is a well-designed, easy, and effective language that provides user-defined functions, loops, conditionals, and various I/O facilities.
- It has a consistent and integrated set of tools for data analysis.
- R contains a suite of operators for different types of calculation on arrays, lists, and vectors.
- It provides effective data handling and storage facility.
- It is an open-source, powerful, and highly extensible software.
- It provides highly extensible graphical techniques.
- It allows us to perform multiple calculations using vectors.
- R is an interpreted language.
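The vector feature mentioned above can be illustrated with a minimal sketch. The prices and quantities are made-up sample values, not data from this tutorial.

```r
# Hypothetical sample data: unit prices and quantities sold.
prices <- c(2.50, 4.00, 1.25)
quantities <- c(4, 1, 8)

# A single vectorized expression multiplies the vectors element by element;
# no explicit loop is needed.
revenue <- prices * quantities

# Aggregate functions also work on the whole vector at once.
total_revenue <- sum(revenue)

print(revenue)
print(total_revenue)
```

The same expression works unchanged whether the vectors hold three values or three million, which is why vectorized code is the usual style in R.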
Several tools are available for data analysis, and learning a new language takes time. Data scientists can choose between two excellent tools, R and Python, but may not have time to learn both when starting out in data science. Learning statistical modeling and algorithms is more important than learning a programming language; a programming language is used to compute and communicate our discoveries.
The most important task in data science is the way we deal with the data: importing, cleaning, feature engineering, and feature selection. This should be our primary focus. A data scientist's job is to understand the data, manipulate it, and find the best approach. For machine learning, the best algorithms can be implemented with R. Interfaces to Keras and TensorFlow allow us to build high-end machine learning models. R also has a package for XGBoost, one of the most successful algorithms in Kaggle competitions.
R communicates with other languages and can call Python, Java, and C++. The big data world is also accessible to R: we can connect R to platforms such as Spark or Hadoop.
In brief, R is a great tool to investigate and explore data. Elaborate analyses such as clustering, correlation, and data reduction can be done with R.
The R environment
R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes
- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either on-screen or on hardcopy, and
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
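The language facilities named in the last item (conditionals, loops, user-defined recursive functions, and output) can be sketched together in a few lines. The function name `factorial_recursive` is chosen here for illustration only; base R already provides `factorial()`.

```r
# User-defined recursive function: factorial of a non-negative integer.
factorial_recursive <- function(n) {
  if (n <= 1) {
    # Conditional: base case of the recursion.
    return(1)
  }
  n * factorial_recursive(n - 1)
}

# Loop: compute the factorials of 1 to 5.
results <- numeric(5)
for (i in seq_len(5)) {
  results[i] <- factorial_recursive(i)
}

# Output facility: write the results to the console.
cat("Factorials of 1 to 5:", results, "\n")
```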
The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.
For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.
R has its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both on-line in a number of formats and in hardcopy.
Why use R for statistical computing and graphics?
- R is open source and free.
- R is popular, and its popularity is increasing.
- R runs on all major platforms.
- Learning R will increase your chances of getting a job.
- R is used by major technology companies.
Companies using R
- Facebook – For behavior analysis related to status updates and profile pictures.
- Google – For advertising effectiveness and economic forecasting.
- Twitter – For data visualization and semantic clustering.
- Microsoft – Acquired Revolution Analytics and uses R for a variety of purposes.
- Uber – For statistical analysis.
- Airbnb – To scale data science.
- IBM – Joined the R Consortium.
- ANZ – For credit risk modeling.
Advantages of R Programming
Various benefits of R language are mentioned below, which will help you to grasp the concept:
- Open Source- R is an open-source programming language. This means that anyone can work with R without any need for a license or a fee. Furthermore, you can contribute towards the development of R by customizing its packages, developing new ones and resolving issues.
- Exemplary Support for Data Wrangling- R provides exemplary support for data wrangling. Packages such as dplyr and readr can transform messy data into a structured form.
- The Array of Packages- R has a vast array of packages. The CRAN repository holds over 10,000 packages, and the number is constantly growing. These packages serve all areas of industry.
- Quality Plotting and Graphing- R facilitates quality plotting and graphing. Popular libraries such as ggplot2 and plotly produce aesthetic, visually appealing graphs that set R apart from other programming languages.
- Highly Compatible- R is highly compatible and can be paired with many other programming languages like C, C++, Java, and Python. It can also be integrated with technologies like Hadoop and various other database management systems as well.
- Platform Independent- R is a platform-independent language. It is a cross-platform programming language, meaning that it can be run quite easily on Windows, Linux, and Mac.
- Eye-Catching Reports- With packages like Shiny and R Markdown, reporting the results of an analysis is extremely easy with R. You can make reports with the data, plots, and R scripts embedded in them. You can even make interactive web apps that allow the user to play with the results and the data.
- Machine Learning Operations- R provides various facilities for machine learning operations such as classification and regression, and also provides features for developing artificial neural networks.
- Statistics- R is widely known as the lingua franca of statistics. This is the main reason why R is dominant among programming languages for developing statistical tools.
- Continuously Growing- R is a constantly evolving programming language. It is a state-of-the-art technology that is updated regularly as new features are added.
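As a small illustration of the regression facilities mentioned in the list above, the following sketch fits a linear model with base R only. The choice of the built-in mtcars data set and of the variables mpg and wt is an assumption made for this example, not something prescribed by the tutorial.

```r
# mtcars is a data set that ships with R (package datasets).
data(mtcars)

# Fit a simple linear regression: fuel efficiency (mpg) explained by weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)

# Inspect the estimated intercept and slope.
print(coef(fit))

# Predict mpg for a hypothetical car with wt = 3 (weight in units of 1000 lbs).
predicted <- predict(fit, newdata = data.frame(wt = 3))
print(predicted)
```

Calling summary(fit) on the fitted model prints standard errors, p-values, and R-squared.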
Disadvantages of R Programming
- Weak Origin- R shares its origin with a much older programming language, S. This means that its base package does not have support for dynamic or 3D graphics. However, with common R packages like ggplot2 and plotly, it is possible to create dynamic, 3D, and animated graphics.
- Data Handling- In R, objects are stored in physical memory. This is in contrast to other languages like Python. Furthermore, R uses more memory than Python, and it requires the entire data set to be in one place, that is, in memory. Therefore, it is not an ideal option when dealing with Big Data. However, data management packages and integration with Hadoop can mitigate this.
- Basic Security- R lacks basic security features, which are an essential part of most programming languages like Python. Because of this, R has several restrictions, as it cannot easily be embedded into a web application.
- Complicated Language- R is not an easy language to learn. It has a steep learning curve. Due to this, people who do not have prior programming experience may find it difficult to learn R.
- Lesser Speed- R packages and the R programming language are generally slower than other languages like MATLAB and Python.
- Spread Across various Packages- The algorithms in R are spread across different packages. Programmers without prior knowledge of packages may find it difficult to implement algorithms.
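The data handling point above, that R keeps objects in memory, can be observed directly with base R. This is a minimal sketch; the vector length of one million is an arbitrary choice for illustration.

```r
# Create a numeric vector with one million elements; it lives entirely in memory.
x <- numeric(1e6)

# object.size() estimates the memory used by an R object, in bytes.
size_bytes <- as.numeric(object.size(x))
print(size_bytes)

# The same estimate in a human-readable unit.
print(format(object.size(x), units = "auto"))
```

Each double-precision value takes 8 bytes, so the memory needed grows in proportion to the number of values held.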
Installation of R
For Mac users:
Install R:
- Open an internet browser and go to www.r-project.org.
- Click the “download R” link in the middle of the page under “Getting Started.”
- Select a CRAN location (a mirror site) and click the corresponding link.
- Click on the “Download R for (Mac) OS X” link at the top of the page.
- Click on the file containing the latest version of R under “Files.”
- Save the .pkg file, double-click it to open, and follow the installation instructions.
- Now that R is installed, you need to download and install RStudio.
Install RStudio:
- Go to www.rstudio.com and click on the “Download RStudio” button.
- Click on “Download RStudio Desktop.”
- Click on the version recommended for your system, or the latest Mac version, save the .dmg file on your computer, double-click it to open, and then drag and drop it to your applications folder.
Install the SDSFoundations Package
- Download SDSFoundations to your desktop (make sure it has the “.tgz” extension).
- Open RStudio.
- Click on the Packages tab in the bottom right window.
- Click “Install.”
- Select install from “Package Archive File.”
- Select the SDSFoundations package file from your desktop.
- Click install. You are done! You can now delete the SDSFoundations package file from your desktop.
For Windows Users:
Install R:
- Open an internet browser and go to www.r-project.org.
- Click the “download R” link in the middle of the page under “Getting Started.”
- Select a CRAN location (a mirror site) and click the corresponding link.
- Click on the “Download R for Windows” link at the top of the page.
- Click on the “install R for the first time” link at the top of the page.
- Click “Download R for Windows” and save the executable file somewhere on your computer. Run the .exe file and follow the installation instructions.
- Now that R is installed, you need to download and install RStudio.
Install RStudio:
- Go to www.rstudio.com and click on the “Download RStudio” button.
- Click on “Download RStudio Desktop.”
- Click on the version recommended for your system, or the latest Windows version, and save the executable file. Run the .exe file and follow the installation instructions.
Install the SDSFoundations Package:
- Download SDSFoundations to your desktop (make sure it has the “.zip” extension).
- Open RStudio.
- Click on the Packages tab in the bottom right window.
- Click “Install.”
- Select install from “Package Archive File.”
- Select the SDSFoundations package file from your desktop.
- Click install. You are done! You can now delete the SDSFoundations package file from your desktop.
R Programming Language Job Positions
Which fields use R and make it such a sought-after skill? The following job positions can form a career in R programming:
- R programmer
- Data Scientist
- Data Analyst
- Data Architect
- Data Visualization Analyst
- Geostatistician
- Database Administrator
- Quantitative Analyst
Install R in Windows
The following steps are used to install R on Windows:
Step 1:
First, download the R setup from https://cloud.r-project.org/bin/windows/base/.
Step 2:
Click on Download R 3.6.1 for Windows to start downloading the R setup. Once the download has finished, run the setup as follows:
1) Select the path where we want to install R and proceed to Next.
2) Select all the components we want to install and proceed to Next.
3) Choose either a customized startup or accept the defaults, and then proceed to Next.
4) When we proceed to Next, the installation of R on our system will start.
5) Finally, click Finish to complete the installation of R.
Install R in Linux
There are only three steps to install R on a Debian-based Linux distribution such as Ubuntu.
Step 1:
First, update the package lists on our system by running the command sudo apt-get update.
Step 2:
Next, install R by running the command sudo apt-get install r-base.
Step 3:
Finally, type R and press Enter to start working in the R console.
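Once the R console is open, a quick check confirms that the installation works and shows which version is running. This is a minimal sketch using base R only; the variable name `r_version` is chosen here for illustration, and the values printed depend on the installed version and platform.

```r
# Print the full version string of the running R installation.
cat(R.version.string, "\n")

# getRversion() returns the version in a form that can be compared.
r_version <- getRversion()
cat("Running R", as.character(r_version), "\n")

# Show the platform R was built for.
cat("Platform:", R.version$platform, "\n")
```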
R Packages
R packages are collections of R functions, sample data, and compiled code. In the R environment, these packages are stored under a directory called "library." During installation, R installs a set of packages. We can add packages later when they are needed for some specific purpose. Only the default packages are available when we start the R console. Other packages that are already installed must be loaded explicitly before an R program can use them.
The following commands are used to check, verify, and use R packages.
Check Available R Packages
To check the available R packages, we have to find the library locations in which R packages are stored. R provides the .libPaths() function to find the library locations.
- .libPaths()
When the above code executes, it produces the following output, which may vary depending on the local settings of our PCs and laptops.
[1] "C:/Users/ajeet/OneDrive/Documents/R/win-library/3.6"
[2] "C:/Program Files/R/R-3.6.1/library"
Getting the list of all the packages installed
R provides the library() function, which, when called without arguments, lists all the installed packages.
- library()
When we execute the above function, it produces the following result, which may vary depending on the local settings of our PCs or laptops.
Packages in library ‘C:/Program Files/R/R-3.6.1/library’:
Similarly, R provides the search() function to list all the packages currently loaded in the R environment.
- search()
When we execute the above code, it produces the following result, which may vary depending on the local settings of our PCs and laptops:
[1] ".GlobalEnv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:utils" "package:datasets"
[7] "package:methods" "Autoloads" "package:base"
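The checks above can also be combined in a short script. This sketch uses installed.packages() rather than library() so that the package names are available as a character vector; the variable names are chosen here for illustration, and the printed values depend on the local installation.

```r
# Library locations in which R looks for packages.
lib_dirs <- .libPaths()
print(lib_dirs)

# Names of all installed packages, taken from the row names of the
# matrix returned by installed.packages().
installed <- rownames(installed.packages())
cat("Number of installed packages:", length(installed), "\n")

# Packages and environments currently attached to the search path.
attached <- search()
print(attached)
```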
Install a New Package
In R, there are two ways to add new packages. The first is to install a package directly from the CRAN repository, and the second is to install it manually after downloading the package to our local system.
Install directly from CRAN
The following command downloads a package directly from CRAN and installs it in the R environment. We may be prompted to choose the nearest mirror. Choose the one appropriate to our location.
- install.packages("Package Name")
For example, the command to install the XML package is:
- install.packages("XML")
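A common pattern is to install a package only when it is not already present. The following is a minimal sketch: `ensure_package` is a hypothetical helper written for this example, not a function provided by R, and the mirror URL is an assumption based on the CRAN cloud address used elsewhere in this tutorial.

```r
# Hypothetical helper (not part of R): install a package only when it is missing.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE when the package cannot be loaded.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # Report whether the package is available after the attempt.
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call does not trigger a download.
ok <- ensure_package("stats")
print(ok)
```

Calling ensure_package("XML") would install the XML package from the given mirror only if it is not installed yet.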
Install package manually
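To install a package manually, we first have to download it from https://cran.r-project.org/web/packages/available_packages_by_name.html. On Windows, the required package will be saved as a .zip file (a binary build) in a suitable location on the local system.
Once the download has finished, we use the following command. For a .zip binary, the type is "win.binary"; type = "source" applies to source archives (.tar.gz).
- install.packages(file_name_with_path, repos = NULL, type = "win.binary")
For example, to install the xml2 package from a downloaded file (note the forward slashes in the path, because a single backslash is an escape character in R strings):
- install.packages("C:/Users/ajeet/OneDrive/Desktop/graphics/xml2_1.2.2.zip", repos = NULL, type = "win.binary")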
Load Package to Library
We cannot use a package in our code until it has been loaded into the current R environment. This also applies to a package that was installed previously but is not available in the current environment.
The following command loads a package:
- library("package Name", lib.loc = "path to library")
Command to load the XML package:
- library("XML")
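The effect of loading a package can be checked with search(). This sketch uses the tools package as a stand-in because it ships with R but is not attached by default; the same steps apply to XML once it is installed.

```r
# Attach the tools package (distributed with R, not attached at startup).
library("tools")

# After library(), the package appears on the search path.
is_attached <- "package:tools" %in% search()
print(is_attached)

# Functions exported by the package can now be called directly.
ext <- file_ext("data.csv")
print(ext)
```

A single function can also be called without attaching the whole package by using the :: operator, for example tools::file_ext("data.csv").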
Conclusion
In a nutshell, R is a great tool to explore and investigate data. Elaborate analyses like clustering, correlation, and data reduction can be done with R. This is the most crucial part: without good feature engineering and a good model, deploying machine learning will not give meaningful results.