Posts

Showing posts from May, 2023

Data Engineering and Tools

What is Data?

Before jumping into Data Engineering, let's start with what data actually is. In today's world, data is everything and everything is data. While we scroll social media, play games online, or surf the internet, terabytes of data are being processed and stored. Data is now an asset, so it needs to be managed well to get as much benefit from it as we can.

Data Lifecycle

When we look at the impact of data on every gadget and piece of tech we use, we rarely think about how the data reaches us in that form, or where it comes from. Data goes through many processes before it can be used in the real world. Some data is used as soon as it is created, as in real-time systems, whereas other data is better used at certain time intervals. First, data is collected from various sources, then it is ingested into the pipeline, and then it is stored. The data is then processed, i.e. converted into the form desired by the users, and used for analysis. The processing, storing and analyzing steps are ...

ETL using AWS Glue

Basic Idea

ETL is the core task of Data Engineers: extracting data from various sources, whether it is streaming data or historical data, then transforming it into a suitable form so that it can be used as per business requirements. Finally, the data is loaded into a suitable storage space. In this blog we will see the ETL pipeline using AWS Glue.

AWS Glue

AWS Glue is a fully managed, serverless ETL service. Serverless means that developers or users can build and run applications without having to manage servers. It is fully managed by AWS and follows a pay-as-you-use pricing model. AWS Glue is used to prepare and transform data for analytics and other processing tasks. It simplifies tasks such as data cleaning and transforming data into the desired format. Let's understand the architecture of AWS Glue.

AWS Glue Architecture

ref: AWS Glue concepts - AWS Glue (amazon.com)

Data stores can be anything depending upon the use case, e.g. S3, Redshift etc. We load ...
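To make the extract, transform and load steps concrete, here is a minimal local sketch in plain Python. It does not use AWS Glue or any AWS API; an in-memory SQLite table stands in for the target data store, and the CSV content, the column names (`order_id`, `customer`, `amount`) and the function names are all hypothetical, chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export standing in for a source system (not real data).
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, tidy names, cast types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract -> transform -> load and return what was stored."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    ).fetchall()
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` returns the two complete rows; the row with the missing amount is dropped during the transform step. In a managed service such as AWS Glue the same three stages exist, but the servers that run them are provisioned for you.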

Data Analysis and Manipulation of Dengue outbreak in Nepal in 2022 using Pandas

The dengue outbreak in Nepal in 2022 was really frightening. A lot of people were infected, including my friends and neighbours. In this blog we will analyze the dengue outbreak zones, the most infected zones and the most infected months, and try to find the cause of the outbreak.

Brief Background about Dengue

Dengue is a disease caused by the bite of infected Aedes mosquitoes. Common symptoms of dengue include mild to severe fever, severe headache, joint and muscle pain, vomiting, skin rash etc. Dengue-transmitting Aedes mosquitoes breed in stagnant water, so we should remove stagnant water around the neighbourhood and keep our surroundings clean to prevent such diseases.

Data source

The data for this project is taken from the Government of Nepal, Ministry of Health and Population, Department of Health Services, Epidemiology and Disease Control Division. Link: 63c63d4f8257e.pdf (edcd.gov.np)

Analysis based on District

According to the provided data, all the 76 districts of Nepal have reported the...
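As a rough sketch of the kind of district and month analysis described above, the snippet below groups case counts with Pandas. The column names (`district`, `month`, `cases`), the helper names and every number in `sample` are made-up placeholders for illustration, not figures from the EDCD report.

```python
import pandas as pd


def top_districts(df, n=3):
    """Total cases per district, highest first, limited to n rows."""
    totals = df.groupby("district")["cases"].sum()
    return totals.sort_values(ascending=False).head(n)


def peak_month(df):
    """Month with the highest total number of cases."""
    return df.groupby("month")["cases"].sum().idxmax()


# Placeholder rows only; replace with the table extracted from the real report.
sample = pd.DataFrame(
    {
        "district": ["A", "A", "B", "B", "C"],
        "month": ["Aug", "Sep", "Aug", "Sep", "Sep"],
        "cases": [10, 40, 5, 20, 1],
    }
)
```

With this shape of data, `top_districts(sample, 2)` ranks districts by total cases and `peak_month(sample)` returns the month with the most cases, which is the basis for the "most infected zones" and "most infected months" questions.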