Updated on: 14th Oct 2020

Big Data Hadoop Interview Questions and Answers

Hadoop is an open-source, Java-based framework used for storing and processing big data. The data is stored on inexpensive commodity servers that run as clusters. Its distributed file system enables concurrent processing and fault tolerance. Developed by Doug Cutting and Michael J. Cafarella, Hadoop uses the MapReduce programming model for faster storage and […]

Updated on: 13th Oct 2020

How Facebook is Using Big Data?

Today it is hard to find anyone who does not use social media, because digital adoption is growing rapidly in every corner of the world. According to a report, from 2017 to 2019 the total number of social media users increased from 2.46 […]

Updated on: 13th Oct 2020

Apache Sqoop Tutorial

This Sqoop tutorial covers basic and advanced concepts of Sqoop and is designed for beginners and professionals. Sqoop is an open-source framework provided by Apache. It is a command-line interface application for transferring data between relational databases and Hadoop. Our Sqoop tutorial includes all topics of Apache Sqoop with Sqoop features, Sqoop Installation, […]

Updated on: 12th Oct 2020

Spark And RDD Cheat Sheet Tutorial

Apache Spark is an open-source cluster computing framework. Its primary purpose is to handle data generated in real time. Spark builds on the Hadoop MapReduce model. It is optimized to run in memory, whereas alternative approaches like Hadoop’s MapReduce write data to and from computer hard drives. So, Spark processes the data much quicker […]

Updated on: 12th Oct 2020

Apache Pig Tutorial

Apache Pig is the tool in which all sorts of programs can be pipelined in a desired order to work in Hadoop’s distributed environment. Oozie also provides a mechanism to run the job at a given schedule. This tutorial explains the scheduler system to run and manage Hadoop jobs called Apache Pig. It is tightly […]

Updated on: 12th Oct 2020

Talend

Talend is a software integration platform that provides solutions for data integration, data quality, data management, data preparation, and big data. The demand for ETL professionals with knowledge of Talend is high. Also, it is the only ETL tool with all the plugins to integrate with the Big Data ecosystem easily. Talend also offers Open Studio, which […]

Updated on: 12th Oct 2020

Cassandra Tutorial

Cassandra is a distributed database from Apache that is highly scalable and designed to manage very large amounts of structured data. It provides high availability with no single point of failure. The tutorial starts off with a basic introduction of Cassandra followed by its architecture, installation, and important classes and interfaces. Thereafter, it proceeds to […]

Updated on: 12th Oct 2020

Kafka Tutorial

Apache Kafka originated at LinkedIn, became an open-source Apache project in 2011, and then a first-class Apache project in 2012. Kafka is written in Scala and Java. Apache Kafka is a publish-subscribe based, fault-tolerant messaging system. It is fast, scalable, and distributed by design. This tutorial will explore the principles of Kafka, installation, […]

Updated on: 12th Oct 2020

HBase Tutorial

HBase is a data model similar to Google’s Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with the HBase shell. It also describes how to connect to […]

Updated on: 12th Oct 2020

Spark Java Tutorial

The Spark Java API exposes all the Spark features available in the Scala version to Java. To learn the basics of Spark, we recommend reading through the Scala programming guide first; it should be easy to follow even if you don’t know Scala. This guide will show how to use the Spark features described there […]

Updated on: 12th Oct 2020

ELK Stack Tutorial

ELK stands for Elasticsearch, Logstash, and Kibana. In the ELK stack, Logstash extracts the logging data or other events from different input sources. It processes the events and later stores them in Elasticsearch. Kibana is a web interface that accesses the logging data from Elasticsearch and visualizes it. Logstash and Elasticsearch Logstash provides input and […]

Updated on: 12th Oct 2020

NetBeans Tutorial

Wondering what NetBeans is? Well, this is the right place for you. NetBeans is an open-source integrated development environment for developing with Java, PHP, C++, and other programming languages. This NetBeans tutorial will walk you through the basic workflow and give complete insight into the installation of NetBeans. NetBeans is […]

Updated on: 12th Oct 2020

PySpark MLlib Tutorial

Apache Spark comes with a library named MLlib to perform Machine Learning tasks using the Spark framework. Since there is a Python API for Apache Spark, i.e., PySpark, you can also use this Spark ML library in PySpark. MLlib contains many algorithms and Machine Learning utilities. In this tutorial, you will learn how to use […]

Updated on: 12th Oct 2020

Spark RDD Optimization Techniques Tutorial

Apache Spark is a world-famous open-source cluster computing framework that is used for processing huge data sets in companies. Processing these huge data sets and distributing these among multiple systems is easy with Apache Spark. It offers simple APIs that make the lives of programmers and developers easy. Spark provides native bindings for programming languages, […]

Updated on: 12th Oct 2020

Apache Spark & Scala Tutorial

What is Apache Spark? Apache Spark is an open-source cluster computing framework that was initially developed at UC Berkeley in the AMPLab. Compared to the disk-based, two-stage MapReduce of Hadoop, Spark provides up to 100 times faster performance for certain applications with in-memory primitives. This makes it suitable for machine learning algorithms, as […]
