Tuesday, 11 June 2019

Amazon Web Services

Amazon Web Services (AWS) is a cloud computing platform.
It provides IT resources on demand.
AWS offers the standard cloud service models:
1.      IaaS (Infrastructure as a Service)
2.      PaaS (Platform as a Service)
3.      SaaS (Software as a Service)

AWS groups its services into categories such as –
1.      Computing
2.      Storage and Content delivery
3.      Database
4.      Security
5.      Data Migration
6.      Networking
7.      Messaging
8.      Application Services
9.      Management tools


AWS Computing –

AWS EC2 -
EC2 provides a raw server, which can be resized as per your needs, whenever required.
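For illustration, below is a minimal sketch of launching an instance with the AWS SDK for Java (v1); the AMI id and key pair name are placeholders, not values from this tutorial.

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.InstanceType;
import com.amazonaws.services.ec2.model.RunInstancesRequest;
import com.amazonaws.services.ec2.model.RunInstancesResult;

public class LaunchEc2 {
    public static void main(String[] args) {
        // Client picks up credentials/region from the default provider chain
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

        // "ami-12345678" is a placeholder image id; use one valid in your region
        RunInstancesRequest request = new RunInstancesRequest()
                .withImageId("ami-12345678")
                .withInstanceType(InstanceType.T2Micro)
                .withMinCount(1)
                .withMaxCount(1)
                .withKeyName("my-key-pair");   // placeholder key pair name

        RunInstancesResult result = ec2.runInstances(request);
        System.out.println("Launched instance: "
                + result.getReservation().getInstances().get(0).getInstanceId());
    }
}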

AWS Lambda –
          Lambda does not run a whole application; it executes background tasks of an application in response to events, such as processing an uploaded image.
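As a sketch of the image-upload case, a Java Lambda handler might look like the following; it assumes the aws-lambda-java-core and aws-lambda-java-events libraries and a function wired to an S3 upload event.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;

// Hypothetical handler: fires when an object lands in an S3 bucket
public class ImageUploadHandler implements RequestHandler<S3Event, String> {
    @Override
    public String handleRequest(S3Event event, Context context) {
        String key = event.getRecords().get(0).getS3().getObject().getKey();
        context.getLogger().log("Processing uploaded image: " + key);
        // resize/convert the image here
        return "done";
    }
}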

AWS Elastic Beanstalk-
          It is used to host an application. It is an automated form of EC2: you create an environment and deploy your code. Most of the required software is already installed; you just select what your application needs (for example, select Java for a Java application), create the environment, and deploy your application. In EC2, by contrast, nothing is preinstalled; it is a completely raw server.

AWS Elastic load Balancing-
          Elastic Load Balancing is used to distribute the workload across the deployed instances.

AWS Autoscaling –
          Auto Scaling automatically handles scaling based on traffic; Auto Scaling and load balancing work in parallel.


AWS Storage Services:
1.            S3 - object storage service. Create a bucket, then files can be uploaded into it (see the sketch after this list).
2.            CloudFront - caches website content at locations near the user to reduce response time.
3.            Elastic Block Store (EBS) - block-level storage. One EC2 instance can attach multiple EBS volumes.
4.            Glacier - data archiving service.
5.            Snowball - a physical appliance for transferring data to/from the AWS infrastructure (physical data transfer).
6.            Storage Gateway - works between a data center and the AWS cloud, so data remains available even if the data center fails.
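A minimal sketch of the S3 flow described in item 1, using the AWS SDK for Java; the bucket name and file path are placeholders.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.io.File;

public class S3Upload {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Create the bucket, then upload a local file into it
        s3.createBucket("my-demo-bucket");
        s3.putObject("my-demo-bucket", "photos/cat.jpg", new File("cat.jpg"));
        System.out.println("Uploaded to: " + s3.getUrl("my-demo-bucket", "photos/cat.jpg"));
    }
}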


AWS Database –
1.            RDS - managed relational database service (automatic updates, security patches, etc.).
2.            Aurora - Amazon's MySQL-compatible database, which is faster than standard MySQL.
3.            DynamoDB - managed NoSQL database (see the sketch after this list).
4.            ElastiCache - distributed caching service.
5.            Redshift - data warehouse service used for analysis of data; an analytics tool.
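A minimal DynamoDB sketch using the AWS SDK for Java document API; it assumes a hypothetical "Users" table with partition key "userId" already exists.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;

public class DynamoPut {
    public static void main(String[] args) {
        DynamoDB dynamoDB = new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient());
        // "Users" table is an assumption for this sketch
        Table table = dynamoDB.getTable("Users");
        table.putItem(new Item()
                .withPrimaryKey("userId", "u-101")
                .withString("name", "Alice"));
        // Read the item back by its partition key
        System.out.println(table.getItem("userId", "u-101"));
    }
}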


AWS Networking –

1.            VPC - Virtual Private Cloud; a logically isolated private network for each application's cloud resources.
2.            Direct Connect - a dedicated network connection to AWS.
3.            Route 53 - Domain Name System (DNS) service.


AWS Security Service –
1.            IAM - Identity and Access Management
2.            KMS - Key Management Service (public/private key infrastructure)


AWS Application Service –
1.            SES - Simple Email Service; bulk emailing service
2.            SQS - Simple Queue Service (see the sketch after this list)
3.            SNS - Simple Notification Service
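A minimal SQS sketch with the AWS SDK for Java; the queue name is a placeholder.

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;

public class SqsDemo {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        // createQueue returns the queue URL used by all further calls
        String queueUrl = sqs.createQueue("demo-queue").getQueueUrl();
        sqs.sendMessage(queueUrl, "hello from SQS");
        for (Message m : sqs.receiveMessage(queueUrl).getMessages()) {
            System.out.println("Received: " + m.getBody());
            sqs.deleteMessage(queueUrl, m.getReceiptHandle()); // acknowledge
        }
    }
}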


AWS Management Service –
1.      CloudWatch - monitoring service
2.      CloudFormation - create a cloud template (snapshot) from an existing cloud setup so it can be recreated
3.      CloudTrail - logging service
4.      CLI - command line interface of AWS
5.      OpsWorks - configuration management (stacks, layers)
6.      Trusted Advisor - a personal advisor from AWS

Security Group for EC2
To create a private key pair for EC2, follow the steps below:
Navigate to the AWS console.
Navigate to the EC2 dashboard and click on Key Pairs.

Click on the Create Key Pair button and provide a key pair name.
Now click on Create; it will download the private key file.
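The same key pair can also be created programmatically; a sketch with the AWS SDK for Java (the key name is a placeholder).

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.CreateKeyPairRequest;
import com.amazonaws.services.ec2.model.KeyPair;

public class CreateKey {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        KeyPair keyPair = ec2.createKeyPair(
                new CreateKeyPairRequest().withKeyName("my-key-pair")).getKeyPair();
        // The private key is returned only once; save it, like the console's .pem download
        System.out.println(keyPair.getKeyMaterial());
    }
}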

AWS EC2 –
          Follow the steps below to create an EC2 instance:

Navigate to the AWS console.
Select EC2 and navigate to the EC2 dashboard.
Here you can see all running EC2 instances and the existing key pairs and security groups.
To create a new EC2 instance, click on Launch Instance as highlighted below.

Now it will navigate to the EC2 configuration window.
First select an appropriate OS image; here we select Windows Server.
Now it will ask for further configuration:
Select the instance type, choose t2.micro (free tier), and then select the number of instances as follows.

Now click on Add Storage and select the storage size; by default it is 30 GB.
Now we can click on Add Tags to add key/value pairs for this instance.
Then click on Review and Launch with the default settings.
Review screen:
Click on the Launch button and select an existing security group, or create one with the steps mentioned above in this tutorial.
Now launch the EC2 instance. After launching, navigate to the EC2 dashboard and select Running Instances; here you can see all running instances.

Once the instance is up and running, the status check will show 2/2 checks passed. Now select the instance, and below you can see its IP address; you can now connect to this EC2 instance with Remote Desktop, provided your security group allows it.
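The 2/2 status checks can also be read programmatically; a sketch with the AWS SDK for Java (the instance id is a placeholder).

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.DescribeInstanceStatusRequest;
import com.amazonaws.services.ec2.model.InstanceStatus;

public class CheckStatus {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        DescribeInstanceStatusRequest req = new DescribeInstanceStatusRequest()
                .withInstanceIds("i-0123456789abcdef0"); // placeholder id
        for (InstanceStatus s : ec2.describeInstanceStatus(req).getInstanceStatuses()) {
            System.out.println(s.getInstanceId()
                    + " system=" + s.getSystemStatus().getStatus()       // first of the 2/2 checks
                    + " instance=" + s.getInstanceStatus().getStatus()); // second check
        }
    }
}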


Monday, 10 June 2019

Apache Spark

Apache Spark is a fast cluster computing technology. It is based on Hadoop MapReduce and extends the MapReduce model. The main feature of Spark is its in-memory cluster computing, which increases the processing speed of an application.
Spark provides high-level APIs in Java, Scala, Python and R. It also provides interactive shells for Scala and Python:
the Scala shell can be accessed through ./bin/spark-shell, and the Python shell via ./bin/pyspark.

Spark can be up to 100 times faster than Hadoop; Spark achieves this via parallel, distributed, in-memory data processing using partitions.
Spark supports multiple data sources such as Parquet, JSON, Hive and Cassandra, apart from text files, CSV and RDBMS tables.
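To make the partition-based parallelism concrete, here is a minimal word-count sketch against Spark's Java API; the local master and the input path are assumptions for illustration.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;

public class WordCount {
    public static void main(String[] args) {
        // local[*] runs Spark on all local cores; on a cluster this would differ
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // "input.txt" is a placeholder path
        JavaRDD<String> lines = sc.textFile("input.txt");
        lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator())
             .mapToPair(word -> new Tuple2<>(word, 1))
             .reduceByKey(Integer::sum)          // combined in parallel per partition
             .collect()
             .forEach(t -> System.out.println(t._1 + ": " + t._2));
        sc.stop();
    }
}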


Setup Apache Spark-

Set the Java home; use the below command to set JAVA_HOME:
setx JAVA_HOME -m "Path". For "Path", paste in your Java installation path.

setx SCALA_HOME -m "Path of the Scala installation directory"

Download and unzip Apache Spark from https://spark.apache.org/downloads.html
setx SPARK_HOME -m "path of the Spark installation folder"

Download and unzip the Hadoop common library
setx HADOOP_HOME -m "path of the Hadoop common folder (the directory whose bin contains winutils.exe)"

Open a command prompt, navigate to the Spark bin folder, and run the spark-shell command.
(Screenshot: spark-shell startup output.)


Saturday, 8 June 2019

Apache Kafka

As mentioned at https://kafka.apache.org/intro, Apache Kafka is a distributed streaming platform.
Kafka works on the publisher/subscriber model.
Publishers publish data to a Kafka topic, and any consumer listening to that topic will consume the messages.
Kafka is used for real-time data streaming.


Kafka Internal Processing.

Kafka has four core APIs (a minimal producer sketch follows this list):
1.      Producer API –
Publish a stream of records to one or more Kafka topics.
2.      Consumer API –
Subscribe to one or more topics and process the stream of records.
3.      Streams API –
Transform input streams into output streams.
4.      Connector API –
Build reusable producers or consumers that connect Kafka topics to existing applications or data systems.
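As referenced above, a minimal Producer API sketch in Java; it assumes the kafka-clients library and a broker on localhost:9092, matching the setup section below.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "topic-name" matches the topic created in the setup section below
            producer.send(new ProducerRecord<>("topic-name", "key-1", "hello kafka"));
        }
    }
}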

Topic
A topic is a feed name to which records are published. Topics in Kafka are always multi-subscriber: one or more consumers can subscribe to a topic.

A topic can have multiple partitions; refer to the diagram below.

In the diagram above, the topic has two partitions.
Kafka uses partitions to scale a topic across many servers for producer writes. In addition, Kafka uses partitions to facilitate parallel consumers: consumers consume records in parallel, up to the number of partitions.

Kafka Topic Log Partitions: Ordering and Cardinality

Kafka maintains record order only within a single partition. A partition is an ordered, immutable sequence of records; Kafka continually appends to partitions, using each partition as a structured commit log. Records in a partition are assigned a sequential id number called the offset, which identifies each record's location within the partition.

Topic partitions allow a Kafka log to scale beyond a size that will fit on a single server: each partition must fit on the server that hosts it, but a topic can span many partitions hosted on many servers. In addition, topic partitions are a unit of parallelism - only one consumer in a consumer group can work on a partition at a time. Consumers can run in their own process or their own thread. If a consumer stops, Kafka spreads its partitions across the remaining consumers in the same consumer group.
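A minimal Consumer API sketch in Java showing the consumer-group behaviour described above; the group id is a placeholder, while the topic and broker match the setup section below.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group"); // consumers in the same group share partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("topic-name"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records)
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
            }
        }
    }
}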

Kafka can replicate partitions across a configurable number of Kafka servers for fault tolerance. Each partition has one leader server and zero or more follower servers. The leader handles all read and write requests for its partition.


Kafka Setup

1.      Set the Java home; use the below command to set JAVA_HOME:
setx JAVA_HOME -m "Path". For "Path", paste in your Java installation path.
2.      Install ZooKeeper; download the tar file from http://zookeeper.apache.org/releases.html
3.      Start the ZooKeeper server: zkServer.bat
4.      Start zkCli
5.      Download Apache Kafka - https://kafka.apache.org/quickstart
6.      Start the Kafka server with the below command:
kafka-server-start.bat ..\..\config\server.properties
7.      Now open a separate command prompt and run the below command to create a topic:
kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topic-name
8.      Use the below command to fetch the list of all topics:
kafka-topics --list --zookeeper localhost:2181
9.      To publish data from the command line, use the below command:
kafka-console-producer --broker-list localhost:9092 --topic topic-name
<enter message here>
10.  To consume messages from a Kafka topic, use the below command:
kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic <topic-name>