Introduction to Hadoop

  • Hadoop is an open-source software framework for storing and processing large volumes of data across clusters of machines.
  • It is designed to scale out by adding nodes and to keep working when individual machines fail.
  • The Hadoop Distributed File System (HDFS) is the storage layer of Hadoop; it stores large files across multiple machines.
  • MapReduce is the processing layer of Hadoop. It processes data in parallel across the cluster, which speeds up large jobs.
  • Yet Another Resource Negotiator (YARN) is the resource management and job scheduling layer of Hadoop. It allocates cluster resources and schedules job execution.
  • In addition to these core components, the Hadoop ecosystem includes various tools and frameworks that enhance its capabilities: HBase, Hive, Pig, Sqoop, Flume, and Oozie.
  • The Hadoop Streaming JAR is a utility in Hadoop’s MapReduce ecosystem that lets developers write MapReduce jobs in non-Java languages: any executable that reads from standard input and writes to standard output can serve as the mapper or reducer.
  • A data stream is a continuous flow of data elements that are generated and transmitted in real time.
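  • Queries on data streams fall into two types: standing queries, which are registered in advance and evaluated continuously as data arrives, and ad-hoc queries, which are asked once about the current state of the stream.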
  • A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It can report false positives but never false negatives.
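
To make the MapReduce and Hadoop Streaming points above concrete, here is a minimal word-count sketch in Python. It is not taken from the article: the function names `mapper`, `reducer`, and `run_local` are hypothetical, and the shuffle step that Hadoop performs between the two phases is only imitated here with an in-process sort.

```python
from itertools import groupby


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts for each word.

    Expects pairs sorted by key, which is how the shuffle step delivers them.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    shuffled = sorted(mapper(lines))  # stands in for Hadoop's shuffle and sort
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]
    print(run_local(sample))
```

On a real cluster, the two phases would typically live in separate scripts that read standard input and write tab-separated key/value lines to standard output, and would be handed to the streaming JAR through its `-mapper` and `-reducer` options. The location of the JAR depends on the installation.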

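The Bloom filter mentioned in the last point can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a tuned implementation: the class name `BloomFilter`, the default sizes, and the choice of salted SHA-256 as the hash family are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the code readable; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set every bit position that the item hashes to."""
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        """Return True only if all of the item's bit positions are set."""
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter()
    seen.add("hdfs")
    print("hdfs" in seen)  # an added item is always reported as present
```

Because an item is reported as present only when every one of its bits is set, an added item can never be missed. An item that was never added may still be reported as present if other items happened to set the same bits, which is the false-positive case.
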