We can create a Makefile for a Python project to help automate tasks like running your Python scripts, testing, or building the project. A Makefile is a file used by the `make` build automation tool to define a set of tasks that should be executed. Each task is written as a target, and a target can depend on other targets or files. In general, the structure of a Makefile rule is:

```
target: dependencies
	commands
```

Ok, let's understand this with the help of an example.

Makefile (has no extension):

```makefile
# Define variables for the system Python and the virtual environment's Python/pip
SYSTEM_PYTHON = python3
VENV_PYTHON = myvenv/bin/python
VENV_PIP = myvenv/bin/pip

# Target: venv
# Creates a virtual environment in the 'myvenv' directory using the system Python.
venv:
	$(SYSTEM_PYTHON) -m venv myvenv

# Target: install
# Installs dependencies from 'requirements.txt' (depends on venv).
install: venv
	$(VENV_PIP) install -r requirements.txt

# Target: all
# Runs the venv and install targets.
all: venv install
```

In the example above, `venv` creates the virtual environment, `install` depends on `venv` and installs the packages listed in `requirements.txt`, and `all` runs both targets in order.
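To make the target/dependency/command structure concrete, here is a minimal sketch of a make-style task runner in plain Python. This is an illustration only, not a replacement for `make`: the `TASKS` table mirrors the Makefile above, and the "commands" are just recorded as strings rather than executed in a shell.

```python
# A minimal make-style task runner: each target has a list of dependencies
# and one command. Dependencies run first, and each target runs at most once.

TASKS = {
    # target: (dependencies, command)
    "venv":    ([],                  "python3 -m venv myvenv"),
    "install": (["venv"],            "myvenv/bin/pip install -r requirements.txt"),
    "all":     (["venv", "install"], None),
}

def run(target, done=None, log=None):
    """Run a target after its dependencies, skipping anything already done."""
    done = set() if done is None else done
    log = [] if log is None else log
    if target in done:
        return log
    for dep in TASKS[target][0]:
        run(dep, done, log)
    command = TASKS[target][1]
    if command is not None:
        log.append(command)  # a real runner would execute this in a shell
    done.add(target)
    return log

print(run("all"))
```

Note that `venv` runs only once even though both `install` and `all` depend on it; `make` deduplicates work the same way (and additionally skips targets whose files are already up to date).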
PySpark Transformations and Actions

RDD Transformation Examples

Some examples of RDD transformations are:

- flatMap()
- map()
- reduceByKey()
- filter()
- sortByKey()

RDD Action Examples

Some examples of RDD actions are:

- count()
- first()
- max()
- reduce()
- take()
- collect()

Let us understand through code how an RDD is fault tolerant and immutable, how lazy evaluation works, and what its distributed nature means.

Let's define an RDD. When data is converted to an RDD, it is not loaded into memory directly; first, a lineage graph is built for the data, so that even if a node crashes later, the data can be recreated. This lineage graph enables fault tolerance by allowing Spark to recreate lost data partitions after a node failure, whatever the cause. RDDs are also distributed across multiple nodes in a cluster, allowing parallel processing of data records.

```python
rdd = sc.parallelize([1, 2, 3, 4, 5])
```

The immutable nature of RDDs can be understood as follows: notice that we didn't change...
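To see lazy evaluation in action without needing a Spark cluster, here is a pure-Python analogy (this is not Spark's implementation; `LazyRDD` is a hypothetical class made up for illustration). Transformations like `map()` and `filter()` only record a step in a plan, much like a lineage graph; nothing is computed until an action such as `collect()` or `count()` is called.

```python
# A pure-Python sketch of lazy evaluation: transformations record steps,
# actions replay the recorded plan over the data.

class LazyRDD:
    def __init__(self, data, plan=None):
        self._data = list(data)
        self._plan = plan or []  # recorded transformations (a toy "lineage")

    # --- transformations: return a NEW LazyRDD, compute nothing yet ---
    def map(self, f):
        return LazyRDD(self._data, self._plan + [("map", f)])

    def filter(self, pred):
        return LazyRDD(self._data, self._plan + [("filter", pred)])

    # --- actions: replay the plan and return a concrete result ---
    def collect(self):
        out = self._data
        for kind, f in self._plan:
            if kind == "map":
                out = [f(x) for x in out]
            else:  # "filter"
                out = [x for x in out if f(x)]
        return out

    def count(self):
        return len(self.collect())

rdd = LazyRDD([1, 2, 3, 4, 5])
squares = rdd.map(lambda x: x * x)            # nothing computed yet
evens = squares.filter(lambda x: x % 2 == 0)  # still nothing computed
print(evens.collect())                        # action triggers the pipeline
```

This also mirrors immutability: each transformation returns a new object with a longer plan, and the original `rdd` is never modified.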