access-modifiers

Access Modifiers in Scala

Access modifiers, also known as access specifiers, determine the accessibility and scope of classes, methods, and other members. Scala's access modifiers closely resemble those of Java, although they provide more...

airflow

How To Set SLA in Apache Airflow

Apache Airflow enables us to schedule tasks as code. In Airflow, an SLA determines the maximum completion time for a task or DAG. Note that SLAs are established based on...

algorithms

An Introduction to Algorithms and Data Structures

An algorithm is a series of instructions in a particular order for performing a specific task.

algorithms-and-data-structures

An Introduction to Algorithms and Data Structures

An algorithm is a series of instructions in a particular order for performing a specific task.

amazon-emr

Overview of Amazon EMR

Amazon EMR is a managed cluster platform that makes it easier to run big data frameworks like Apache Hadoop and Apache Spark on AWS to process and analyze huge amounts...

anti-pattern

Anti-Pattern

Anti-patterns seem quick and reasonable at first, but they typically have adverse effects in the future. They are design and code smells. They affect our software badly and add...

apache spark

apache-pinot

apache-spark

Let’s Know About the Parquet File

An open source file format for Hadoop that provides columnar storage and is built from the ground up with complex nested data structures in mind.

Partitions and Bucketing in Spark

Partitioning and bucketing are used to improve the reading of data by reducing the cost of shuffles, the need for serialization, and the amount of network traffic.

Need for Caching in Apache Spark

Caching is one of Spark's optimization strategies for reusing computations. It stores interim and partial results so they can be reused in subsequent computation stages.

apm

application-performance-monitoring

aws

Overview of Amazon EMR

Amazon EMR is a managed cluster platform that makes it easier to run big data frameworks like Apache Hadoop and Apache Spark on AWS to process and analyze huge amounts...

AWS Command Line Interface (AWS CLI)

AWS CLI is an open-source tool that allows us to interact with AWS services using command-line shell commands.

aws-cli

AWS Command Line Interface (AWS CLI)

AWS CLI is an open-source tool that allows us to interact with AWS services using command-line shell commands.

aws-glue

big-data

Data Governance

Data governance is the process of defining security guidelines and policies and making sure they are followed by having authority and control over how data assets are managed.

Data Catalog

A data catalog is an ordered inventory of an organization's data assets that makes it easy to find the most relevant data quickly.

Introduction to Data Engineering

Data engineering is the process of designing and building systems for gathering vast quantities of raw operational data from a variety of sources and formats, analyzing, converting, and storing it at scale....

bucketing

Partitions and Bucketing in Spark

Partitioning and bucketing are used to improve the reading of data by reducing the cost of shuffles, the need for serialization, and the amount of network traffic.

cache

Need for Caching in Apache Spark

Caching is one of Spark's optimization strategies for reusing computations. It stores interim and partial results so they can be reused in subsequent computation stages.

coding-principles

Singleton Pattern

A singleton pattern limits the number of instances of a class to one.

coding-problem

coding-problem-solving

columnar-format

Let’s Know About the Parquet File

An open source file format for Hadoop that provides columnar storage and is built from the ground up with complex nested data structures in mind.

columnar-storage

Let’s Know About the Parquet File

An open source file format for Hadoop that provides columnar storage and is built from the ground up with complex nested data structures in mind.

container-management

container-orchestration

data-as-a-product

Data Product vs. Data as a Product

A data product is not the same as data as a product. A data product aids the accomplishment of the product's goal by using the data, whereas in data as...

data-caching

Need for Caching in Apache Spark

Caching is one of Spark's optimization strategies for reusing computations. It stores interim and partial results so they can be reused in subsequent computation stages.

data-catalog

Data Catalog

A data catalog is an ordered inventory of an organization's data assets that makes it easy to find the most relevant data quickly.

data-engineering

Data Product vs. Data as a Product

A data product is not the same as data as a product. A data product aids the accomplishment of the product's goal by using the data, whereas in data as...

Data Governance

Data governance is the process of defining security guidelines and policies and making sure they are followed by having authority and control over how data assets are managed.

Data Catalog

A data catalog is an ordered inventory of an organization's data assets that makes it easy to find the most relevant data quickly.

Introduction to Data Engineering

Data engineering is the process of designing and building systems for gathering vast quantities of raw operational data from a variety of sources and formats, analyzing, converting, and storing it at scale....

Data Deluge

When the granularity of data increases, its complexity also increases. Eventually, we reach a point where we cannot handle the volume of fresh data being generated.

data-goverance

Data Governance

Data governance is the process of defining security guidelines and policies and making sure they are followed by having authority and control over how data assets are managed.

data-inventory

Data Catalog

A data catalog is an ordered inventory of an organization's data assets that makes it easy to find the most relevant data quickly.

data-key

Envelope Encryption

Envelope encryption is a way of encrypting plaintext data using a key and then encrypting that key using another key. This strategy is intended not just to make things...

data-lake

Data Catalog

A data catalog is an ordered inventory of an organization's data assets that makes it easy to find the most relevant data quickly.

data-management

Data Product vs. Data as a Product

A data product is not the same as data as a product. A data product aids the accomplishment of the product's goal by using the data, whereas in data as...

Data Deluge

When the granularity of data increases, its complexity also increases. Eventually, we reach a point where we cannot handle the volume of fresh data being generated.

data-mesh

data-pipeline

Introduction to Data Engineering

Data engineering is the process of designing and building systems for gathering vast quantities of raw operational data from a variety of sources and formats, analyzing, converting, and storing it at scale....

data-product

Data Product vs. Data as a Product

A data product is not the same as data as a product. A data product aids the accomplishment of the product's goal by using the data, whereas in data as...

data-protection

Envelope Encryption

Envelope encryption is a way of encrypting plaintext data using a key and then encrypting that key using another key. This strategy is intended not just to make things...

data-science

data-security

Data Governance

Data governance is the process of defining security guidelines and policies and making sure they are followed by having authority and control over how data assets are managed.

data-streaming

data-structures

An Introduction to Algorithms and Data Structures

An algorithm is a series of instructions in a particular order for performing a specific task.

database

database-indexing

delta lake

design-patterns

Singleton Pattern

A singleton pattern limits the number of instances of a class to one.

Anti-Pattern

Anti-patterns seem quick and reasonable at first, but they typically have adverse effects in the future. They are design and code smells. They affect our software badly and add...

elastic-apm

elasticsearch

envelope-encryption

Envelope Encryption

Envelope encryption is a way of encrypting plaintext data using a key and then encrypting that key using another key. This strategy is intended not just to make things...

etl

functional-programming

Defining Variables Using the `def` Keyword in Scala

Difference between `lazy val` and `def`.

grpc

hadoop

Let’s Know About the Parquet File

An open source file format for Hadoop that provides columnar storage and is built from the ground up with complex nested data structures in mind.

iac

Terraform Basics

Terraform is an open source infrastructure-as-code tool that allows us to programmatically provision the physical resources required for an application to run.

infrastructure-as-code

Terraform Basics

Terraform is an open source infrastructure-as-code tool that allows us to programmatically provision the physical resources required for an application to run.

inter-process-communication

kibana

kubernetes

lakefs

lakehouse

memory-management

Rust’s Ownership and Borrowing Enforce Memory Safety

Rust's ownership and borrowing features prevent us from experiencing memory-related problems. Rust is a great choice when performance matters and it solves pain points that bother many other languages.

object-oriented-programming

Case Class in Scala

The case class represents immutable data. It is a type of class that is often used for data storage.

olap

olap-datastore

oops

Access Modifiers in Scala

Access modifiers, also known as access specifiers, determine the accessibility and scope of classes, methods, and other members. Scala's access modifiers closely resemble those of Java, although they provide more...

Case Class in Scala

The case class represents immutable data. It is a type of class that is often used for data storage.

parquet

Let’s Know About the Parquet File

An open source file format for Hadoop that provides columnar storage and is built from the ground up with complex nested data structures in mind.

partition

Partitions and Bucketing in Spark

Partitioning and bucketing are used to improve the reading of data by reducing the cost of shuffles, the need for serialization, and the amount of network traffic.

pinot

postgres

postgresql

presto

prestodb

problem-solving

programming

Case Class in Scala

The case class represents immutable data. It is a type of class that is often used for data storage.

Defining Variables Using the `def` Keyword in Scala

Difference between `lazy val` and `def`.

Rust’s Ownership and Borrowing Enforce Memory Safety

Rust's ownership and borrowing features prevent us from experiencing memory-related problems. Rust is a great choice when performance matters and it solves pain points that bother many other languages.

remote-procedure-call

reverse-etl

root-key

Envelope Encryption

Envelope encryption is a way of encrypting plaintext data using a key and then encrypting that key using another key. This strategy is intended not just to make things...

rpc

rust

Rust’s Ownership and Borrowing Enforce Memory Safety

Rust's ownership and borrowing features prevent us from experiencing memory-related problems. Rust is a great choice when performance matters and it solves pain points that bother many other languages.

scala

Access Modifiers in Scala

Access modifiers, also known as access specifiers, determine the accessibility and scope of classes, methods, and other members. Scala's access modifiers closely resemble those of Java, although they provide more...

Case Class in Scala

The case class represents immutable data. It is a type of class that is often used for data storage.

Defining Variables Using the `def` Keyword in Scala

Difference between `lazy val` and `def`.

scala-collections

service-level-agreement

How To Set SLA in Apache Airflow

Apache Airflow enables us to schedule tasks as code. In Airflow, an SLA determines the maximum completion time for a task or DAG. Note that SLAs are established based on...

shuffling

singleton-pattern

Singleton Pattern

A singleton pattern limits the number of instances of a class to one.

sla

How To Set SLA in Apache Airflow

Apache Airflow enables us to schedule tasks as code. In Airflow, an SLA determines the maximum completion time for a task or DAG. Note that SLAs are established based on...

solid

sql

terraform

Terraform Basics

Terraform is an open source infrastructure-as-code tool that allows us to programmatically provision the physical resources required for an application to run.

workflow-engine

How To Set SLA in Apache Airflow

Apache Airflow enables us to schedule tasks as code. In Airflow, an SLA determines the maximum completion time for a task or DAG. Note that SLAs are established based on...