Maven is a dependency management tool that helps programmers manage dependencies in multi-module projects. Let's discuss with an example ho…
Here are some very important questions that you will face as a Data Engineer, Big Data Developer, Hadoop Developer, etc. …
Hive compression is one of the optimization techniques available in Apache Hive. It is preferable for highly data-intensive workloads where network ban…
Reading CSV files is an important technique used frequently in data analytics programming. Here I am sharing my work, which may help you. H…
I am sharing my knowledge about Hive partitioning and will describe it with an example. I hope you all benefit from this. Partitioning is a …
1. What is Spark good for? Spark, a data processing engine used in a wide range of platforms, including Spark ETL batches, machine learning processes, …
A Spark Dataset is a distributed collection of typed objects. It was introduced in Spark 1.6. It consolidates features of both RDD and Data f…
Kafka was originally developed by LinkedIn as a real-time messaging system. LinkedIn and other companies use Apache Kafka for managing the stre…