1. Client-server architecture

On one side we have a client: a web browser, mobile app, or any frontend application. On the other side we have a server, a machine that runs continuously, waiting for requests.

A client can send queries to store, retrieve, or modify data, and the server receives each request and responds accordingly. But the question is: how does the frontend locate the server?

2. IP address: here comes the IP address, like a postal address, but for servers. The client needs this address to locate the server and communicate with it. That's how computers find each other; an IP address is like a phone number for servers.

When we visit a website, we don't type its IP; we just enter the website's name. We can't expect users to memorize a random number for every service they connect to, and if we migrate our service to another server, its IP address may change, leading to connection failures. Here comes DNS.

3. DNS: instead of relying on hard-to-remember IP addresses, we use something much more human-friendly: domain names. But we need a map from each domain name to its corresponding IP address. This is where DNS (Domain Name System) comes in: it maps easy-to-remember domain names like google.com to their corresponding IP addresses.

When we type google.com, our computer asks a DNS server for the corresponding IP address. The DNS server responds with the IP, and the browser uses it to establish a connection to the server and make a request. The ping command can be used to find the IP of any domain name.
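As a minimal sketch, the same name-to-IP lookup can be done with Python's standard library, which asks the operating system's resolver (the resolver in turn queries DNS):

```python
import socket

# Ask the OS resolver for the IP address behind a hostname,
# the same lookup a browser performs before opening a connection.
def resolve(hostname: str) -> str:
    return socket.gethostbyname(hostname)

# "localhost" resolves locally without a network round trip,
# so it is a safe name to demonstrate with.
print(resolve("localhost"))  # → 127.0.0.1
```

In practice, `resolve("google.com")` returns one of Google's public IPs, the same mapping that `ping google.com` shows.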

4. Proxy and reverse proxy: when we visit a website, our request does not always go directly to the server; sometimes it passes through a proxy or a reverse proxy first. A proxy acts as a middleman between your device and the internet: it hides your IP while keeping the flow intact, keeping your identity private. A VPN is a familiar example of a proxy, often used to access blocked websites or bypass geographic restrictions.

A reverse proxy works the other way around: it intercepts client requests and forwards them to servers based on predefined rules. Allowing direct access to servers can pose security risks, such as DDoS attacks; a reverse proxy mitigates this by acting as a controlled entry point that regulates incoming traffic and hides server IPs. It can also act as a load balancer, distributing traffic across multiple servers to keep the load even and avoid a single point of failure. The most famous example of a reverse proxy is Cloudflare, which protects websites from brute-force and DDoS attacks and also hides server IPs; Nginx is another common example.

5. Latency: there is some delay whenever we connect to a server. The causes are obvious: the distance and the medium the signals travel through. For example, between India and New York a signal needs to travel that distance twice to get a response; this round trip is the latency. High latency makes an application feel slow and unresponsive. One way to reduce latency is to deploy our service across multiple data centres worldwide and, whenever a request comes in, redirect it to the nearest one. Routing techniques include:

    a. GeoDNS (location-based DNS): returns the IP of the closest data centre based on the user's location; used by AWS Route 53, Cloudflare, and Google Cloud DNS.

    b. Latency-based DNS routing: the DNS chooses the data centre with the lowest real latency, continuously measuring ping times from many regions.

    c. Health-aware DNS: if any server is down, traffic moves to the next one; the system automatically decides which server is nearest and healthy.

    d. BGP (Border Gateway Protocol), used by Cloudflare among others: BGP decides which path packets take; it is basically the GPS of the internet.
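Latency-based routing (option b) boils down to picking the region with the lowest measured round-trip time. A sketch, where the data-centre names and RTT numbers are made-up examples rather than real measurements:

```python
# Hypothetical round-trip times (ms) from the user's region to each
# data centre; in a real system these come from continuous probes.
measured_rtt_ms = {
    "mumbai": 12,
    "frankfurt": 110,
    "virginia": 210,
}

def pick_datacentre(rtt_by_dc: dict) -> str:
    # Route the request to the data centre with the lowest latency.
    return min(rtt_by_dc, key=rtt_by_dc.get)

print(pick_datacentre(measured_rtt_ms))  # → mumbai
```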

6. HTTP/HTTPS: websites communicate with the server using a set of rules called HTTP. A request includes headers, with cookies containing a session ID and additional parameters, and may carry extra information such as form data; the server answers with an HTTP response. HTTP has one problem: it sends data in plain text, so it can't be used for sensitive information. HTTPS solves this by encrypting all data using SSL/TLS, ensuring that even if someone intercepts the request they can't read or alter it. HTTP is just a protocol for transferring data; it does not define how requests should be structured, what format responses should be in, or how different clients should interact with the server. This is where APIs come in.

7. APIs: think of an API as a middleman that allows clients to talk to the server without worrying about low-level details. The workflow: the client sends a request to an API; the API, hosted on a server, processes the request, interacts with databases or other services, and prepares a response; the API then sends the response back in a structured format, usually JSON or XML, which the client understands and can display. Not all APIs are built the same; different API styles exist and serve different purposes. Two of the most popular are REST and GraphQL.

8. REST API: among the different API styles, REST (Representational State Transfer) is the most widely used. A REST API follows a set of rules that defines how client and server communicate over HTTP in a structured way. REST is stateless: every request is independent, and the server does not store client state. It is resource-based: everything (users, orders, products) is treated as a resource. Clients interact with resources using standard HTTP methods:

    GET retrieves data (e.g. fetching a user profile)

    POST creates new data

    PUT/PATCH updates existing data

    DELETE removes data
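The method-to-action mapping can be illustrated with a tiny in-memory sketch; the `users` resource and its sample data are hypothetical, and a real service would sit behind an HTTP framework rather than a plain function:

```python
# In-memory "resource" for illustration only.
users = {123: {"name": "Ada", "email": "ada@example.com"}}

def handle(method, user_id, body=None):
    # Map standard HTTP methods onto operations on the resource.
    if method == "GET":
        return users.get(user_id)                             # retrieve
    if method == "POST":
        users[user_id] = body                                 # create
        return users[user_id]
    if method in ("PUT", "PATCH"):
        users[user_id] = {**users.get(user_id, {}), **body}   # update
        return users[user_id]
    if method == "DELETE":
        return users.pop(user_id, None)                       # remove
    raise ValueError(f"unsupported method: {method}")

print(handle("GET", 123))
handle("PATCH", 123, {"name": "Ada L."})
print(handle("GET", 123)["name"])  # → Ada L.
```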

REST APIs are simple, scalable, and easy to cache, but they have limitations: they can return more data than needed, leading to inefficient network usage, and the client may need to make multiple requests to retrieve all the required information.

Here comes GraphQL, created by Facebook.

9. GraphQL: unlike REST, which returns a fixed set of fields per endpoint, GraphQL is more precise: the client asks for exactly what it needs, nothing more, nothing less. With REST you might make multiple requests to different endpoints:

    GET /api/users/123 → fetch user details

    GET /api/users/123/profile → fetch user profile

    GET /api/users/123/posts → fetch user’s posts

With GraphQL, you can combine those requests into one.
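An illustrative query covering the three calls above might look like this (the field names are assumed, since the actual schema would define them):

```graphql
query {
  user(id: 123) {
    name          # from /api/users/123
    profile {     # from /api/users/123/profile
      avatarUrl
    }
    posts {       # from /api/users/123/posts
      title
    }
  }
}
```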

The server responds with only the requested fields, avoiding unnecessary data transfer. The trade-off: GraphQL is harder to process on the server side and not as easy to cache as REST.

10. Databases: the main objective of most requests is to consume data, and today's world runs on big data. We can't handle that much data in memory; we need something purpose-built that can handle it efficiently. Databases are the backbone of any application, but not all databases are the same: there are more than 15 types, each with different scalability, performance, and consistency characteristics. Choosing the right type of database for your application is important, but we don't usually need to learn every type; we can talk about two here: SQL vs NoSQL.

11. SQL vs NoSQL: a SQL database stores data in tables with a strict, predefined schema and follows the ACID properties:

    1. Atomicity: all or nothing.

    2. Consistency: data always remains valid and follows the defined rules.

    3. Isolation: concurrent transactions do not interfere with each other.

    4. Durability: committed data is saved and never lost.

SQL is a must for applications that require strict consistency and structured relationships, such as banking systems. NoSQL databases, on the other hand, are designed for high scalability and performance and do not require a fixed schema. Common families: key-value stores (e.g. Redis), document stores (e.g. MongoDB), graph databases (best for highly connected data, such as social networks), and wide-column stores optimized for large-scale distributed data (e.g. Cassandra).
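Atomicity, the "all or nothing" property, can be demonstrated with Python's built-in sqlite3 module; the accounts and amounts below are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(frm, to, amount):
    # Both updates succeed together or neither is applied.
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, frm))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, to))
            # Enforce a rule (consistency): no negative balances.
            (bal,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                  (frm,)).fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")
    except ValueError:
        pass  # the rollback already undid the partial update

transfer("alice", "bob", 30)    # succeeds: alice 70, bob 80
transfer("alice", "bob", 500)   # fails: both balances stay unchanged
print(conn.execute("SELECT balance FROM accounts ORDER BY name").fetchall())
# → [(70,), (80,)]
```

Note that without the transaction, the failed transfer would have debited alice without crediting bob, exactly the partial state atomicity forbids.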

Which to use depends on the type of application and its requirements. Many modern systems use both: an e-commerce site might store customer orders in SQL, where strict consistency is required, and product recommendations in NoSQL, where flexible, fast lookups matter more.

12. Vertical scaling: as our user base grows, so does the number of requests hitting our application. A single server might not be enough to handle the load; as traffic increases, that single server becomes a bottleneck, slowing everything down.

One fix is to add more CPU, RAM, or storage to make the machine more powerful, i.e. upgrade the one existing server. But this has hardware limits, rising costs, and remains a single point of failure. Then comes horizontal scaling.

13. Horizontal scaling: add more servers in parallel to share the load, distributing it across multiple machines. More machines means more capacity and no single point of failure, but it comes with a new challenge: how do clients know which server to connect to?

14. Load balancers: a load balancer acts as a traffic manager between clients and servers, distributing requests across multiple servers. If one server fails, the load balancer automatically redirects traffic to another healthy server. Common load balancing algorithms:

    Round robin: requests are sent to servers sequentially, one by one, in a loop.

    Least connections: requests are sent to the server with the fewest active connections.

    IP hashing: requests coming from the same IP always go to the same server.
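The three algorithms can be sketched in a few lines of Python; the server names and active-connection counts are placeholders:

```python
import hashlib
import itertools

servers = ["server-a", "server-b", "server-c"]

# Round robin: cycle through the servers one by one, in a loop.
rr = itertools.cycle(servers)
def round_robin():
    return next(rr)

# Least connections: pick the server with the fewest active connections.
active = {"server-a": 4, "server-b": 1, "server-c": 7}
def least_connections():
    return min(active, key=active.get)

# IP hashing: the same client IP always maps to the same server.
def ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(round_robin(), round_robin(), round_robin(), round_robin())
print(least_connections())  # → server-b
print(ip_hash("203.0.113.9") == ip_hash("203.0.113.9"))  # → True (stable mapping)
```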

15. Database indexing: one of the quickest and most effective ways to speed up database read queries is indexing. Indexes are typically created on columns that are frequently queried, such as primary keys, foreign keys, and columns used in WHERE conditions. But indexes speed up reads while slowing down writes (INSERT, UPDATE, DELETE), since the index must be updated whenever the data changes. Indexing helps read performance, but what if our single database, even with indexing, can't handle that many requests?
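The read/write trade-off can be shown with a toy in-memory index; a real database index would be a B-tree inside the engine, and the `email` column here is just an assumed example:

```python
# A "table" as a list of rows, plus an index over the email column.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]
email_index = {row["email"]: row for row in rows}

def find_by_email_scan(email):
    # Without an index: full scan, O(n) over every row.
    return next((r for r in rows if r["email"] == email), None)

def find_by_email_indexed(email):
    # With an index: direct lookup, O(1) here (O(log n) for a B-tree).
    return email_index.get(email)

def insert(row):
    # Writes get slower: both the table AND the index must be updated.
    rows.append(row)
    email_index[row["email"]] = row

insert({"id": 3, "email": "c@example.com"})
print(find_by_email_indexed("c@example.com")["id"])  # → 3
```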

16. Replication: just as we scale servers for growing traffic, we can scale the database by creating replicas, copies of it across multiple servers. One primary database (also called the master) handles all write operations (INSERT, UPDATE, DELETE), while multiple read replicas (called slaves) handle read queries (SELECT). When data is written to the primary, it gets copied to the read replicas so they stay in sync; together this is called the master-slave architecture.

Replication improves performance, since reads are spread across multiple replicas, reducing load. It also improves availability: if the primary fails, a read replica can take over as the new primary. This is great for read-heavy applications, but what about huge amounts of data and write-heavy workloads? That's where sharding comes in.
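A toy sketch of the primary/replica flow; copying is synchronous here for simplicity, whereas real systems usually replicate asynchronously:

```python
class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):          # replicas serve SELECT-style reads
        return self.data.get(key)

class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        # All writes go to the primary...
        self.data[key] = value
        # ...and are copied to every read replica to keep them in sync.
        for replica in self.replicas:
            replica.data[key] = value

replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.write("user:1", "Ada")

# Reads can be served by any replica, spreading the load.
print(replicas[0].read("user:1"), replicas[1].read("user:1"))  # → Ada Ada
```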

17. Sharding: with millions of users, the database grows to terabytes of data, and a single database server will eventually fail. So we split the database into smaller, more manageable pieces and distribute them across multiple servers; this technique is called sharding. We divide the database into smaller pieces called shards, each containing a subset of the data, and the data is distributed based on a sharding key (e.g. user ID).

Sharding reduces the load on each database and speeds up both read and write performance. It is also called horizontal partitioning, since it splits the database by rows.
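Routing by sharding key can be as simple as a hash modulo the number of shards; four shards are assumed here for illustration:

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id):
    # Hash the sharding key (user ID) so users spread evenly across shards.
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same user always lands on the same shard.
print(shard_for(42) == shard_for(42))  # → True
print({shard_for(uid) for uid in range(1000)})  # every shard receives users
```

One caveat worth knowing: with plain modulo, changing NUM_SHARDS remaps most keys; consistent hashing is the usual fix for that.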

18. Vertical partitioning: imagine we have a user table that stores
    profile details (name, email, profile picture)

    login history (last_login, IP addresses)

    and billing information (billing address, payment details)

As this table grows, queries become slower: the database must scan many columns when a request only needs a few specific fields. To optimize this we use vertical partitioning, where we split the user table into smaller, more focused tables.

This sounds similar to normalization, but it's different: normalization happens at the logical level and is implemented at database design time, for data accuracy and correctness. Partitioning is about how much data the optimizer needs to scan, and we do it after observing the workload. After that, query optimization happens at run time, deciding which tables to look at and which hashes and indexes to touch.

    User_Profile → stores name, email, profile picture

    User_Login → stores login timestamps

    User_Billing → stores billing address, payment details

    This improves query performance since each request only scans relevant columns instead of the entire table.

In summary: normalization defines what the data looks like, partitioning decides where the data lives, and query optimization decides how to fetch it fastest.

But no matter how much we optimize, retrieving data from disk will always be slower than retrieving it from memory. This is where caching comes in.

19. Caching: caching optimizes the performance of a system by storing frequently accessed data in memory instead of repeatedly fetching it from the database.

The most famous approach is the cache-aside pattern: (1) a user requests data; (2) the application checks the cache, and if the data is there, returns it; (3) if not, it fetches the data from the database, returns it, and puts it in the cache for future requests. To prevent outdated or stale data from being served, we use a TTL (time to live), an expiration time set on cached data so it gets automatically refreshed after a certain period.
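The cache-aside pattern with a TTL, sketched with a plain dict as the cache and another dict standing in for the database:

```python
import time

database = {"user:1": "Ada"}   # stand-in for the real database
cache = {}                     # key -> (value, expires_at)
TTL_SECONDS = 60

def get(key):
    # 1. Check the cache first (respecting the TTL).
    entry = cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.monotonic() < expires_at:
            return value       # cache hit: served from memory
        del cache[key]         # expired: treat as a miss
    # 2. Cache miss: fetch from the database...
    value = database.get(key)
    # 3. ...and store it for future requests.
    cache[key] = (value, time.monotonic() + TTL_SECONDS)
    return value

print(get("user:1"))  # miss: fetched from the database, then cached
print(get("user:1"))  # hit: served from the cache
```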

20. Denormalization: as we saw, normalization is necessary for decreasing data redundancy and ensuring data correctness, but when we want data spread across multiple normalized tables we have to join them, and the joins create overhead and make queries slower.

Denormalization reduces the number of joins by combining related data into a single table, which means some data gets duplicated. Denormalization is used in read-heavy applications where speed is more critical; the downside is that it leads to increased storage.

21. CAP theorem: as we scale through servers, databases, and data centres, we enter the world of distributed systems. A fundamental principle of distributed systems is the CAP theorem: no distributed system can achieve all three of the following at the same time.

    Consistency: every node always returns the most recent data.

    Availability: the system always responds to requests, even if some nodes are down, but the data may not be the latest.

    Partition tolerance: the system continues operating even if there are network failures between nodes.

Network failures are inevitable, so in practice we must choose between: Consistency + Partition tolerance (CP), which ensures every request gets the latest data but may reject requests during failures (e.g. SQL databases like MySQL); and Availability + Partition tolerance (AP), which ensures the system always responds even if some data is stale (e.g. NoSQL databases like Cassandra and DynamoDB).

In distributed NoSQL databases, achieving instant consistency across all servers is too slow. Instead we use eventual consistency, which means not all nodes are updated instantly, but given enough time they will eventually sync and return the same data. This allows the system to remain highly available, even under extreme conditions.

When a user updates data in one replica, the system acknowledges the update immediately, ensuring high availability, and then propagates it asynchronously to the other replicas. After a short delay all replicas have the latest data, ensuring consistency over time.

22. Blob storage: modern apps do not just need to store text; they also need to handle images, videos, PDFs, and other large files. The problem: traditional databases are not built for unstructured data. So we use blob storage, a highly scalable and cost-effective way to store large unstructured files in the cloud. These blobs are stored inside logical containers, or buckets, and each file gets a unique URL, making it easy to retrieve and serve over the web. Advantages: scalability to petabytes of data, pay-as-you-go pricing, automatic replication, and easy access via REST APIs. A common use is streaming audio or video files to user applications in real time, but streaming directly from blob storage can be slow if it is stored in a distant location.

23. CDN: for example, watching Netflix from India when the content is hosted in California would cause buffering and slow load times due to the distance. A content delivery network (CDN) delivers content to users based on their location. A CDN is a global network of distributed servers that delivers web content (HTML pages, JS files, images, and videos) to users based on their geographic location. Instead of every request travelling all the way to the origin data centre, the CDN caches static content on multiple edge servers and serves it from there. Since content is served from the closest CDN node, users experience faster load times with minimal buffering.

24. WebSockets: most web applications use HTTP(S), which follows a request-response model:

    1. The client sends a request.

    2. The server processes the request and sends a response.

    3. If there is new data, the client must send another request to retrieve it.

This works fine, but it is too slow for real-time apps such as live chat, stock tickers, and online multiplayer games. With HTTP, the only way to get real-time updates is polling every few seconds, but polling is inefficient: most of the time it wastes server load and bandwidth, since most responses come back empty.

WebSockets allow two-way communication between server and client over a single persistent connection.

The client initiates a handshake with the server, and once established, the connection remains open: the server can push updates to the client at any time without waiting for a request, and the client can also send messages. WebSockets enable real-time communication. But what if, instead of a client, it is another server? How does my server get notified then? For example, when a payment completes on Stripe for Cursor, Stripe needs to notify Cursor that the payment succeeded; or whenever code is pushed to GitHub, a CI/CD system needs to trigger checks.

25. Webhooks: instead of constantly polling an API to check whether an event has occurred, webhooks allow a server to send an HTTP request to another server as soon as the event occurs. Flow: user makes a payment to Stripe → after completion, Stripe sends POST /api/webhooks/payments HTTP/1.1 to the registered webhook URL (e.g. https://myapp.com/api/webhooks/payments) → the app processes the incoming request (payment failed or succeeded) and updates its data accordingly.
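The receiving side of that flow can be sketched as a handler function; the payload fields (`payment_id`, `status`) are assumed for illustration and are not Stripe's actual event schema:

```python
import json

# Stand-in for the app's own payment records.
payments = {"pay_123": "pending"}

def handle_payment_webhook(raw_body):
    # The sender (e.g. Stripe) POSTs a JSON body describing the event.
    event = json.loads(raw_body)
    payment_id = event["payment_id"]
    # Update our data according to whether the payment passed or failed.
    payments[payment_id] = "paid" if event["status"] == "succeeded" else "failed"
    return payments[payment_id]

body = json.dumps({"payment_id": "pay_123", "status": "succeeded"}).encode()
print(handle_payment_webhook(body))  # → paid
```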

26. Microservices: traditionally, applications used a monolithic architecture, where all features (e.g. authentication, payments, orders) live inside one large codebase. If one part fails, other parts can fail too, and deployment is risky: one bad update can take down the entire app. Example: imagine an e-commerce app where the order, payment, inventory, and shipping modules are all tightly connected in a single codebase; if inventory goes down, the entire app could go down. This can work for small applications, but not for large ones.

The solution is to break the application into smaller independent blocks called microservices that work together. Each microservice handles a single responsibility, has its own database and logic, and scales independently, communicating with other microservices through message queues and APIs. However, when multiple microservices need to communicate, direct APIs are not always efficient.

27. Message queues: in a microservices-based system, if one service slows down, everything waits, and high traffic can overload a single service; synchronous communication, waiting for an immediate response, does not scale well. A message queue enables services to communicate asynchronously, allowing requests to be processed without blocking each other.

    A producer (e.g., checkout service) places a message in the queue (e.g., "Process Payment").

    The queue temporarily holds the message until a consumer (e.g., payment service) is ready to process it.

    The consumer retrieves the message and processes it.

    Examples include RabbitMQ, Apache Kafka, and Amazon SQS. Using message queues we can prevent overload on the internal services within our system.
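The producer/queue/consumer steps above can be sketched with Python's thread-safe `queue.Queue` standing in for a real broker; the order IDs are made up:

```python
import queue
import threading

orders = queue.Queue()   # the broker holds messages until a consumer is ready
processed = []

def payment_service():
    # Consumer: pull messages and process them at its own pace.
    while True:
        message = orders.get()
        if message is None:  # sentinel: no more work
            break
        processed.append(f"charged {message}")
        orders.task_done()

consumer = threading.Thread(target=payment_service)
consumer.start()

# Producer (e.g. the checkout service) enqueues without waiting for the result.
for order_id in ("order-1", "order-2"):
    orders.put(order_id)
orders.put(None)
consumer.join()
print(processed)  # → ['charged order-1', 'charged order-2']
```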

    But how do we prevent overload coming from the public APIs and services we expose? We use rate limiting.

28. Rate limiting: bots making thousands of requests to your website can crash it, increase cloud costs, and degrade performance. Rate limiting restricts the number of requests a client can send within a specific time frame.

For example, 100 requests per minute per IP: if a client exceeds the limit, additional requests are temporarily blocked and a "too many requests" error is returned. Common algorithms:

    1. Fixed window: limits requests within a fixed time window (e.g. 100 requests per minute).

    2. Sliding window: a more flexible version that dynamically adjusts the window to smooth out request bursts.

    3. Token bucket: each request spends a token, and tokens replenish over time at a fixed rate.
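The token bucket variant can be sketched as a small class; the capacity and refill rate below are arbitrary example values:

```python
import time

class TokenBucket:
    """Each request spends one token; tokens refill at a fixed rate."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Replenish tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: the caller should return 429

bucket = TokenBucket(capacity=3, refill_per_sec=1.0)
results = [bucket.allow() for _ in range(4)]
print(results)  # burst of 3 allowed, 4th rejected → [True, True, True, False]
```

In a real service there would be one bucket per client key (e.g. per IP), typically kept in a shared store like Redis.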

29. API gateways: an API gateway is a centralized service that handles authentication, rate limiting, logging and monitoring, and request routing. Imagine a microservices-based application with multiple services: instead of exposing each service directly, the gateway acts as a combination of reverse proxy and API management layer, also handling authorization, request validation, quotas, analytics, and API key validation. The client sends a request to the API gateway; the gateway validates the request (e.g. authentication, rate limits); it routes the request to the appropriate microservice; and the response is sent back through the gateway to the client.

Examples are Kong, Apigee, and AWS API Gateway.

30. Idempotency: in distributed systems, network failures and retries are common. If a user accidentally refreshes a payment page, the system might receive two payments instead of one. Idempotency makes sure a repeated request produces the same result as if it had been made only once. Here's how it works: each request is assigned a unique ID (e.g. request_1234); before processing, the system checks whether that request has already been handled; if yes, it ignores the duplicate; if no, it processes the request normally.
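Those steps can be sketched in a few lines; the in-memory dict stands in for the durable store a real system would use for seen request IDs:

```python
# Processed request IDs and their results (a real system would persist these).
processed = {}

def handle_payment(request_id, amount):
    # Duplicate request: return the stored result instead of charging again.
    if request_id in processed:
        return processed[request_id]
    result = f"charged {amount}"   # the actual side effect happens only once
    processed[request_id] = result
    return result

first = handle_payment("request_1234", 50)
retry = handle_payment("request_1234", 50)  # e.g. the user refreshed the page
print(first == retry, len(processed))  # → True 1
```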