Learning How the Internet Works, Part Two: A Personal Journey

Hey y’all, I’m back! Here I am, continuing my journey through the basics of how the internet works. A quick recap so we can pick up where we left off:

  • What is the Internet? A way for us humans to send information.
  • The internet has a physical side! There are enormous numbers of cables that transport our data.
  • Circuit switching and packet switching are two different ways our data can be moved from one machine to the next.
  • Our data knows where to go because each packet carries a destination IP address.
  • The client-server model is the process of a client requesting information from a server, and the server receiving and responding to that request.
A room full of servers
Photo by İsmail Enes Ayhan on Unsplash

Alright, let’s get back on track. We type a URL into our browser (the client), and the browser checks a cache to find the DNS record that matches the domain name in that URL. So what is this cache?

A cache is a storage location! It holds temporary copies of data so that everything runs faster. Typically, when you visit a website for the first time, your browser stores copies of its files (images, scripts, stylesheets) on your own machine in the browser cache. That way, the next time you visit the page, your browser (whether that be Chrome or Safari) doesn’t need to download all of it again because it already has it on hand. Think of it as your browser’s memory of sorts. Browsers and operating systems cache DNS records too, so when we make that initial request, your browser actually searches a few different caches:

  1. Your browser checks its own cache (duh)! Besides page files, it keeps DNS records for sites you’ve visited recently.
  2. Next, it searches your computer’s own cache (the operating system keeps one too).
  3. If it can’t find the correct DNS record there, it then checks your router’s cache.
  4. Finally, if all else fails, it reaches out to your Internet Service Provider’s (ISP’s) DNS server and its cache. If the DNS record isn’t in that cache, the ISP’s server performs its own search to find the appropriate IP address.¹
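
To make that lookup order concrete, here is a minimal Python sketch of the chain of caches. It’s a toy model, not how browsers or operating systems actually implement DNS: the caches are plain dictionaries, `resolve` and `fake_isp_resolver` are hypothetical names I made up, and the IP addresses are fake ones from ranges reserved for documentation.

```python
from typing import Callable, Dict, List, Tuple

# Each cache is a (name, {domain: ip}) pair, checked in order.
Cache = Tuple[str, Dict[str, str]]


def resolve(domain: str, caches: List[Cache],
            isp_resolver: Callable[[str], str]) -> Tuple[str, str]:
    """Return (ip_address, where_it_was_found) for a domain (toy model)."""
    for i, (name, cache) in enumerate(caches):
        ip = cache.get(domain)
        if ip is not None:
            # Copy the answer into the caches we already checked,
            # so the next lookup stops earlier.
            for _, earlier in caches[:i]:
                earlier[domain] = ip
            return ip, name
    # No local cache had it: ask the ISP, which does its own search.
    ip = isp_resolver(domain)
    for _, cache in caches:
        cache[domain] = ip
    return ip, "ISP resolver"


def fake_isp_resolver(domain: str) -> str:
    # Hypothetical stand-in for a real DNS lookup; always returns a fake IP.
    return "198.51.100.7"


browser_cache: Dict[str, str] = {}
os_cache: Dict[str, str] = {"example.com": "203.0.113.10"}
router_cache: Dict[str, str] = {}
caches = [("browser cache", browser_cache),
          ("OS cache", os_cache),
          ("router cache", router_cache)]

print(resolve("example.com", caches, fake_isp_resolver))  # ('203.0.113.10', 'OS cache')
print(resolve("amazon.com", caches, fake_isp_resolver))   # ('198.51.100.7', 'ISP resolver')
print(resolve("amazon.com", caches, fake_isp_resolver))   # ('198.51.100.7', 'browser cache')
```

The second `amazon.com` lookup never leaves the browser cache, which is the whole point of caching.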

Lots of steps! Lots of your information floating out in the world… perhaps that’s a topic for a different blog! Once the browser has the right IP address, it’s time to establish a TCP (Transmission Control Protocol) connection with the server!

TCP/IP

TCP + IP are besties

Before we dive into the next section, I wanna bring our attention to the “IP” in “IP address.” As discussed in Part One, IP stands for Internet Protocol. Why does the internet need a protocol? Because a protocol provides a set of standards governing how certain actions are performed and how data is formatted, so that devices can communicate with one another. You will often see TCP and IP together because they are part of a protocol suite, a collection of protocols designed to work together. TCP and IP are the two core protocols of the Internet Protocol suite, which is why the suite is often just called TCP/IP. IP is used to address and send data packets to the correct recipient, while TCP’s job is to establish a connection between the two devices, maintain it while the data is transmitted, and make sure the data arrives complete and in order.
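
Here’s a rough Python sketch of how the two divide the work: each chunk of data goes into a (very simplified) TCP segment, which is wrapped in an IP packet. The classes and their fields are my own simplification, not the real header layouts, and the ports and addresses are made up (the addresses come from documentation ranges). IP only cares about getting each packet to `dst_ip`; TCP’s sequence numbers let the receiver put the data back in order even if packets arrive shuffled.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class TCPSegment:
    src_port: int
    dst_port: int
    seq: int      # byte offset of this chunk (real TCP starts from a random initial number)
    data: bytes


@dataclass
class IPPacket:
    src_ip: str
    dst_ip: str   # IP's job: deliver the packet to this address
    payload: TCPSegment


def send(message: bytes, src_ip: str, dst_ip: str, chunk_size: int = 4) -> List[IPPacket]:
    """Split a message into TCP segments and wrap each one in an IP packet."""
    packets = []
    for offset in range(0, len(message), chunk_size):
        # Ports 50000 (client) and 80 (server) are hypothetical example values.
        segment = TCPSegment(50000, 80, offset, message[offset:offset + chunk_size])
        packets.append(IPPacket(src_ip, dst_ip, segment))
    return packets


def receive(packets: List[IPPacket]) -> bytes:
    """TCP's side: reorder segments by sequence number and rebuild the message."""
    ordered = sorted(packets, key=lambda p: p.payload.seq)
    return b"".join(p.payload.data for p in ordered)


packets = send(b"Hello, server!", "192.0.2.10", "203.0.113.10")
random.shuffle(packets)   # packets can take different routes and arrive out of order
print(receive(packets))   # b'Hello, server!'
```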

How exactly is the TCP connection established? Through a process called the three-way handshake. First, the client sends a SYN packet (short for synchronize) to the server, asking for a connection to be established. The server then agrees by sending back a SYN-ACK packet (ACK stands for acknowledgement). Last but not least, the client sends its own ACK back to the server, letting it know it received the SYN-ACK. At this point, the client can begin transferring data.²
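
The handshake can be sketched in Python by tracking the sequence and acknowledgement numbers each side sends. Each side picks a random starting sequence number, and each ACK number is the other side’s sequence number plus one. This is only a simulation with a made-up `Segment` class; in real programs the operating system performs the handshake for you when you call something like Python’s `socket.create_connection()`.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str                 # "SYN", "SYN-ACK" or "ACK"
    seq: int                   # sender's sequence number
    ack: Optional[int] = None  # next sequence number the sender expects


def three_way_handshake(rng: random.Random) -> List[Tuple[str, Segment]]:
    log = []

    # 1. Client -> server: SYN with the client's initial sequence number.
    client_isn = rng.randrange(MOD)
    syn = Segment("SYN", seq=client_isn)
    log.append(("client -> server", syn))

    # 2. Server -> client: SYN-ACK acknowledging the client's SYN.
    server_isn = rng.randrange(MOD)
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=(syn.seq + 1) % MOD)
    log.append(("server -> client", syn_ack))

    # The client checks that the server acknowledged the right number.
    if syn_ack.ack != (client_isn + 1) % MOD:
        raise ConnectionError("unexpected acknowledgement number")

    # 3. Client -> server: ACK acknowledging the server's SYN-ACK.
    ack = Segment("ACK", seq=(client_isn + 1) % MOD, ack=(syn_ack.seq + 1) % MOD)
    log.append(("client -> server", ack))
    return log


for direction, segment in three_way_handshake(random.Random(42)):
    print(direction, segment)
```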

Great summary of a TCP connection

Protocols after protocols after protocols….

Alright, we’ve got the IP address of Amazon.com, we’ve made a connection, and now we need to get the data so we can actually see the website. To get that data, the client (our side) needs to send an HTTP GET request to the server. What does HTTP stand for? Another protocol?! Oy vey!

It stands for Hypertext Transfer Protocol. It defines how requests and responses are structured and transmitted on the web. We can actually look at these requests in the Network tab of the browser’s developer tools! While browsing the huge list of HTTP requests sent out when I visited Amazon, I wondered: why so many? I thought it was a single GET request (like the ones I use when working on my own projects).
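
To see what one of those GET requests looks like on the wire, here’s a small Python sketch using only the standard library. Instead of talking to Amazon, it starts a throwaway web server on my own machine (127.0.0.1), opens a TCP connection to it, and sends a hand-written HTTP GET request. `HelloHandler` and `build_get_request` are names I made up for this example, and a real browser sends many more headers than this.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_get_request(host: str, path: str = "/") -> bytes:
    """Write a minimal HTTP/1.1 GET request by hand."""
    lines = [
        f"GET {path} HTTP/1.1",   # method, path, protocol version
        f"Host: {host}",          # required in HTTP/1.1
        "Connection: close",      # ask the server to close when done
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


class HelloHandler(BaseHTTPRequestHandler):
    """A tiny server that answers every GET with 'hello'."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 lets the OS pick any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.handle_request, daemon=True).start()

# create_connection() does the TCP handshake; then we send the request.
with socket.create_connection(("127.0.0.1", port)) as sock:
    sock.sendall(build_get_request(f"127.0.0.1:{port}"))
    response = b""
    while chunk := sock.recv(4096):
        response += chunk
server.server_close()

print(response.split(b"\r\n")[0])  # b'HTTP/1.0 200 OK'
```

The server answers with `HTTP/1.0` because that’s the default version Python’s `BaseHTTPRequestHandler` speaks; the structure (status line, headers, blank line, body) is the same either way.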

After some research, I found out why: websites today are far more than just HTML; they are filled with images, JavaScript, stylesheets, etc., and each of those files is usually fetched with its own request. On top of that, many elements of a webpage don’t reside on the same server! Many companies use a CDN (Content Delivery Network), a geographically distributed group of servers that share the work of delivering content so the internet keeps running optimally. CDNs also hold their own caches! Another benefit of using CDNs is helping to protect against DDoS attacks. I don’t want to go further down the rabbit hole than I already have, but a DDoS (distributed denial-of-service) attack aims to disrupt the normal traffic of a targeted server, service, or network by overwhelming it or its surrounding infrastructure with a flood of Internet traffic.³
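
Here’s a rough Python sketch of why one page turns into many requests. It scans a made-up HTML page for the images, scripts, and stylesheets it references and groups them by host. Each URL it finds would need its own GET request, and any host other than the page’s own (here, hypothetical CDN hostnames under the reserved `example.net` and `example.org` domains) means a connection to a different server. `ResourceFinder` is my own name, and a real browser discovers far more kinds of resources than these three.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceFinder(HTMLParser):
    """Collects URLs of images, scripts and stylesheets referenced by a page."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Resolve relative paths like /img/banner.jpg against the page URL.
            self.resources.append(urljoin(self.page_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(urljoin(self.page_url, attrs["href"]))


# A made-up page: some files live on the site's own server, some on CDNs.
page = """
<html><head>
  <link rel="stylesheet" href="/styles/main.css">
  <script src="https://cdn.example.net/lib/app.js"></script>
</head><body>
  <img src="https://images.example.org/logo.png">
  <img src="/img/banner.jpg">
</body></html>
"""

finder = ResourceFinder("https://www.example.com/")
finder.feed(page)

# Group the resource URLs by the server (host) they come from.
by_host = defaultdict(list)
for url in finder.resources:
    by_host[urlparse(url).netloc].append(url)

print(f"1 request for the HTML + {len(finder.resources)} more for its resources")
for host, urls in by_host.items():
    print(host, len(urls))
```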