All About SQL (Structured Query Language) Part One

Introduction

SQL, i.e. Structured Query Language, is a query language for relational databases. SQL has many flavors, some of which are MSSQL, MySQL, SQLite, PostgreSQL, etc. All of these are for the manipulation and management of relational databases. SQL provides a set of commands that help us interact with the database, such as creating, modifying and querying data. Like any other language, SQL has levels of learning, i.e. beginner, intermediate and advanced. We will talk about this in brief later on.
With SQL we can manage our data in a more organized form, like in tables, making it easy for us to query the data whenever we need to, such as deleting, inserting and updating.

Well, SQL does have several sublanguages of its own, and each has its own purpose. They are:
  • Data Definition Language (DDL) --- CREATE, ALTER, DROP
  • Data Manipulation Language (DML) --- INSERT, UPDATE, DELETE
  • Data Control Language (DCL) --- GRANT, REVOKE
  • Transaction Control Language (TCL) --- COMMIT, ROLLBACK, SAVEPOINT
  • Data Query Language (DQL) --- SELECT
As I said, all of these have their purpose, but we won't go into detail on each; however, I have mentioned their respective commands, and a small sketch of each category follows below.
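
To make the categories a bit more concrete, here is a minimal sketch with one or two commands per category. The table and column names (staff, id, name) and the user name are hypothetical, just for illustration, and the exact syntax of some statements varies between SQL flavors:

-- DDL: define a table (hypothetical 'staff' table)
CREATE TABLE staff (
    id INT PRIMARY KEY,
    name VARCHAR(50)
);

-- DML: insert and modify rows
INSERT INTO staff (id, name) VALUES (1, 'Asha');
UPDATE staff SET name = 'Asha K.' WHERE id = 1;

-- DQL: query the data
SELECT id, name FROM staff;

-- DCL: manage permissions (user name is hypothetical; syntax varies by flavor)
GRANT SELECT ON staff TO some_user;

-- TCL: control transactions (some flavors use START TRANSACTION instead of BEGIN)
BEGIN;
DELETE FROM staff WHERE id = 1;
ROLLBACK;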

Learning Level of SQL

Basic

For beginners, SQL can be started by gaining theoretical knowledge, like what schemas, relational databases, tables, rows, columns, primary keys, foreign keys, etc. are.
Let's start with the relational database. A relational database (RDB) is a kind of database in which data is organized and stored in table form, i.e. in the form of rows and columns.

Here is a simple example of fruit data in tabular form; id and name are the columns, and the values such as (1, Apple) and (2, Banana) on each line are rows:

id | name
---|-------
 1 | Apple
 2 | Banana

To enter these data into an SQL database we need to prepare a database schema. A schema is like a guide that helps the user insert data into the table and also manipulate it. For the above data the schema is:

CREATE TABLE fruits (
    id INT,
    name VARCHAR(30)
);


Don't worry, we will go through these commands as well. The schema is like a blueprint for the database, which also helps us identify the data type of each field: id is an integer, whereas name is a string, which in SQL is denoted by VARCHAR.
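
With the schema in place, we can insert rows that match those data types. A minimal sketch, using the fruit values from the example above:

-- Insert the two example rows into the fruits table
INSERT INTO fruits (id, name) VALUES (1, 'Apple');
INSERT INTO fruits (id, name) VALUES (2, 'Banana');

-- Read the data back
SELECT * FROM fruits;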

Moving on to keys: a primary key is a column or set of columns that uniquely identifies each row or record of a table. This really helps in querying.


Here CustomerID is the primary key, and the command for this is:

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Email VARCHAR(100)
);

Moving on to the foreign key: it is a type of key that refers to the primary key of another table. The foreign key concept is generally used while working with joins.

As we can see, CustomerID acts as a foreign key in the Orders table, while it is the primary key in the Customers table.
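
The Orders schema itself isn't shown above, so here is a hedged sketch of what such a table could look like; only CustomerID comes from the text, while OrderID and OrderDate are columns I've assumed for illustration:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,    -- hypothetical column
    OrderDate DATE,             -- hypothetical column
    CustomerID INT,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- A typical join that follows the foreign key back to Customers
SELECT o.OrderID, c.FirstName, c.LastName
FROM Orders o
JOIN Customers c ON o.CustomerID = c.CustomerID;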
While we know SQL can be used to manipulate data in a database, we need some background on how SQL works. Data manipulation includes inserting, updating, deleting, filtering, applying aggregation functions, etc. We have different commands, or say SQL syntax, for each of these data manipulation tasks, which is a separate topic for the practical basics; a quick taste is sketched below.
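
As a brief preview of what the next post will cover, here is a sketch using the fruits table from above (the specific values in the WHERE clauses are made up for illustration):

-- Update a row
UPDATE fruits SET name = 'Green Apple' WHERE id = 1;

-- Delete a row
DELETE FROM fruits WHERE id = 2;

-- Filter rows
SELECT * FROM fruits WHERE name LIKE 'A%';

-- Aggregate: count the rows
SELECT COUNT(*) FROM fruits;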

Conclusion

Hence, these are some theoretical backgrounds for beginners to get started. There is much more to cover, including practically manipulating data with the basic set of commands, which I will continue in the next blog.


