Basics of DNS and Beyond

Published on September 12, 2024

Preface

The DNS (Domain Name System) has always been a fascinating concept for me, both in terms of its implementation and complexity. It is neither so simple that it seems trivial nor so complex that it becomes overwhelming. For me, it sits right in the sweet spot of the complexity spectrum. I hope you find it that way too.

But what is DNS?

DNS is often called the Phone book of the Internet. But if you are from my generation (or younger), you've most likely never seen one.
It is analogous to the Contacts App on our phones. The app translates our Contact Names to Phone Numbers so that we don't have to remember the number of each of our contacts.
Similarly, DNS translates the domain names we type into our browser's address bar (like youtube.com) into the IP addresses (like 173.231.205.196) of the computers on the Internet that serve that website. In other words, it does the heavy lifting of remembering (or finding out) the IP addresses of all the websites on the Internet.

How does it work?

DNS works on the concept of Domain Names. A Domain name is like a hierarchical contact name. It is divided into sections as follows:

Domain name components — Credits: bigrock.in

The rightmost part is called the Top Level Domain (TLD), the part before it is the Second Level Domain (SLD), and any parts before that are subdomains. The protocol (such as https://) belongs to the URL and is not part of the domain name.
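To make the hierarchy concrete, here is a minimal sketch that splits a hostname into the parts described above. The function name split_domain is my own, and the sketch assumes a single-label TLD, so suffixes made of more than one label (like co.uk) are not handled.

```python
def split_domain(hostname: str) -> dict:
    """Split a hostname into TLD, SLD and subdomain labels (naive, label-based)."""
    # Drop a trailing root dot ("example.com.") and normalise case,
    # since domain names are case-insensitive.
    labels = hostname.rstrip(".").lower().split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least two labels, got {hostname!r}")
    return {
        "tld": labels[-1],          # rightmost label
        "sld": labels[-2],          # label just before the TLD
        "subdomains": labels[:-2],  # everything further left, leftmost first
    }


print(split_domain("blog.mail.example.com"))
```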

How are Domain names resolved?

The domain name resolution process is also known as a DNS lookup and involves quite a few middlemen. But we (or rather our client, e.g., a browser) only have to interact with a piece of software called a DNS resolver. It's the resolver that deals with the middlemen and finds the IP address for us.

Middlemen involved

Root nameserver: The root server is the first step in the lookup process. It contains the addresses of all the TLD nameservers.
TLD nameserver: The TLD server contains the address of the authoritative nameservers of all domain names registered under it. For example, sample.com's authoritative nameserver address will be stored in .com TLD servers and sample.in's in .in TLD servers.
Authoritative nameserver: The authoritative nameserver is the last stop in the nameserver query. It contains all records related to the domain name and its subdomains. If the authoritative name server has access to the requested record, it will return the IP address for the requested hostname back to the DNS Resolver.

DNS Nameserver hierarchy — Credits: Cloudflare

Root Servers of the Internet

The root servers are grouped under 13 named identities (a through m). Each one has its own IP address, which is shared by many physical servers spread around the world, and is operated by one of the following organizations:

Root Server	IPv4 Address	Operated by
a.root-servers.net	198.41.0.4	Verisign, Inc.
b.root-servers.net	170.247.170.2	University of Southern California, Information Sciences Institute
c.root-servers.net	192.33.4.12	Cogent Communications
d.root-servers.net	199.7.91.13	University of Maryland
e.root-servers.net	192.203.230.10	NASA (Ames Research Center)
f.root-servers.net	192.5.5.241	Internet Systems Consortium, Inc.
g.root-servers.net	192.112.36.4	US Department of Defense (NIC)
h.root-servers.net	198.97.190.53	US Army (Research Lab)
i.root-servers.net	192.36.148.17	Netnod
j.root-servers.net	192.58.128.30	Verisign, Inc.
k.root-servers.net	193.0.14.129	RIPE NCC
l.root-servers.net	199.7.83.42	ICANN
m.root-servers.net	202.12.27.33	WIDE Project

How the resolver interacts with these servers is the lookup process.

DNS Lookup Process

A user types ‘example.com’ into a web browser and the query travels into the Internet and is received by a DNS recursive resolver.
The resolver then queries a DNS root nameserver (.).
The root server then responds to the resolver with the address of a Top Level Domain (TLD) DNS server (such as .com or .net), which stores the information for its domains. When searching for example.com, our request is pointed toward the .com TLD.
The resolver then makes a request to the .com TLD.
The TLD server then responds with the IP address of the domain’s nameserver, example.com.
Lastly, the recursive resolver sends a query to the domain’s nameserver.
The IP address for example.com is then returned to the resolver from the nameserver.
The DNS resolver then responds to the web browser with the IP address of the domain requested initially.

After receiving the IP address, the client makes the request directly to the desired web server.
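The steps above can be sketched as a toy simulation. Everything here is hypothetical: the three dictionaries stand in for the root, TLD and authoritative servers, the server names and the 192.0.2.10 address are made up for illustration, and a real resolver talks to these servers over the network rather than reading from memory.

```python
# Hypothetical in-memory stand-ins for the three kinds of nameserver.
ROOT = {"com": "tld-com"}  # TLD label -> TLD nameserver
TLD_SERVERS = {"tld-com": {"example.com": "ns1.example.com"}}  # domain -> authoritative nameserver
AUTHORITATIVE = {
    "ns1.example.com": {
        "example.com": "192.0.2.10",      # made-up documentation address
        "www.example.com": "192.0.2.10",
    }
}


def resolve(hostname: str) -> tuple[str, list[str]]:
    """Walk root -> TLD -> authoritative; return (ip, list of servers asked)."""
    name = hostname.rstrip(".").lower()
    labels = name.split(".")
    trace = []

    # Steps 2-3: ask the root which server handles the TLD.
    trace.append("root")
    tld_server = ROOT[labels[-1]]

    # Steps 4-5: ask the TLD server for the domain's authoritative nameserver.
    trace.append(tld_server)
    domain = ".".join(labels[-2:])
    auth_server = TLD_SERVERS[tld_server][domain]

    # Steps 6-7: ask the authoritative nameserver for the actual address.
    trace.append(auth_server)
    ip = AUTHORITATIVE[auth_server][name]
    return ip, trace


ip, trace = resolve("example.com")
print(ip, "via", " -> ".join(trace))
```

Unknown names simply raise a KeyError in this sketch; a real nameserver would answer with an error code instead.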

You can perform lookups from your browser and see the result by going to chrome://net-internals/#dns.

But carrying out this entire process every time I wish to visit a website seems excessive, right?

Speeding up DNS: Caching

DNS caching involves storing data closer to the requesting client so that the DNS query can be resolved earlier and additional queries further down the DNS lookup chain can be avoided. It improves load times and reduces bandwidth/CPU consumption.

DNS records can be cached in multiple places: your browser, your operating system, and the servers along the lookup chain. Each cached record is accompanied by a time-to-live (TTL) value, after which it expires and needs to be refetched.
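Here is a minimal sketch of a TTL-based cache, roughly the idea a browser or resolver could follow. The class name DnsCache and its methods are my own invention, and the clock is injectable purely so that expiry can be demonstrated without waiting.

```python
import time


class DnsCache:
    """A tiny hostname -> IP cache where every entry expires after its TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # hostname -> (ip, expires_at)

    def put(self, hostname: str, ip: str, ttl: float) -> None:
        # Remember when the record stops being valid.
        self._entries[hostname.lower()] = (ip, self._clock() + ttl)

    def get(self, hostname: str):
        key = hostname.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None  # never cached: a full lookup is needed
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and force a refetch
            return None
        return ip


# Usage with a fake clock so the expiry is easy to see.
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("example.com", "192.0.2.10", ttl=60)
print(cache.get("example.com"))  # still fresh
now[0] = 61.0
print(cache.get("example.com"))  # expired
```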

I hope by now you have a general idea of what DNS is and how it works. Now let's go beyond...

Beyond the basics

DNS primarily uses UDP (to prioritise efficiency), falling back to TCP when a response is too large for a single UDP message or for operations such as zone transfers. It typically runs on port 53 (not mandatory). The DNS Protocol is largely documented in the RFC 1035 Internet Standard titled DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION. We will refer to this documentation from here onwards.

Architecture

The typical Domain name system configuration looks like this:

                 Local Host                        |  Foreign
                                                   |
    +---------+               +----------+         |  +--------+
    |         | user queries  |          |queries  |  |        |
    |  User   |-------------->|          |---------|->|Foreign |
    | Program |               | Resolver |         |  |  Name  |
    |         |<--------------|          |<--------|--| Server |
    |         | user responses|          |responses|  |        |
    +---------+               +----------+         |  +--------+
                                |     A            |
                cache additions |     | references |
                                V     |            |
                              +----------+         |
                              |  cache   |         |
                              +----------+         |

The foreign name server here represents the above-mentioned DNS lookup chain.
The DNS zone details and records are stored in Master Files in the nameservers. The simplest nameserver looks something like this:

                 Local Host                        |  Foreign
                                                   |
      +---------+                                  |
     /         /|                                  |
    +---------+ |             +----------+         |  +--------+
    |         | |             |          |responses|  |        |
    |         | |             |   Name   |---------|->|Foreign |
    |  Master |-------------->|  Server  |         |  |Resolver|
    |  files  | |             |          |<--------|--|        |
    |         |/              |          | queries |  +--------+
    +---------+               +----------+         |

Record Storage

The DNS namespace is broken up into many different zones. A DNS zone is a portion of the namespace that is managed by a specific organization or administrator, which allows for finer-grained control.

For example, the .com zone is responsible for all the domains under it, including faizahm.com. The faizahm.com zone in turn manages all of its own subdomains. If a subdomain plays a big enough role, it can be split off into a zone of its own for more control. For instance, google.com is a zone, and maps.google.com is a subdomain of it that could be large enough to warrant its own zone, managed by the Maps team.
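One way to picture this is that the most specific zone containing a name is the one responsible for it. The sketch below illustrates that idea with a hypothetical find_zone helper and a made-up set of zones; it is not how any particular nameserver is implemented.

```python
def find_zone(hostname: str, zones: set) -> str:
    """Return the most specific zone that contains the hostname."""
    labels = hostname.rstrip(".").lower().split(".")
    # Try the full name first, then drop the leftmost label each time,
    # so the longest (most specific) matching zone wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return "."  # nothing matched: fall back to the root zone


# Hypothetical set of zones, mirroring the example in the text.
ZONES = {"com", "google.com", "maps.google.com"}

for name in ("api.maps.google.com", "mail.google.com", "faizahm.com"):
    print(name, "->", find_zone(name, ZONES))
```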

The details for each zone are stored in a zone file. An example zone file would look like this:

$TTL 86400    ; Default Time To Live (TTL) in seconds
@   IN  SOA   ns1.example.com. admin.example.com. (
            2024091201 ; Serial number (must increment with each change)
            3600       ; Refresh interval
            1800       ; Retry interval
            1209600    ; Expiry time
            86400 )    ; Negative caching TTL

    IN  NS    ns1.example.com.       ; Name server for example.com
    IN  NS    ns2.example.com.       ; Secondary name server

ns1 IN  A     192.168.0.1            ; IP address of ns1.example.com
ns2 IN  A     192.168.0.2            ; IP address of ns2.example.com

www IN  A     192.168.0.3            ; IP address of www.example.com
mail IN  A    192.168.0.4            ; IP address of mail server

The sample zone file contains the following:

Start of Authority (SOA) record: stores important information about a domain or zone such as the email address of the administrator, when the domain was last updated, and how long the server should wait between refreshes.
Resource Records (RRs): Address of nameservers and subdomains associated with the domain.
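To show how a nameserver might read such a file, here is a deliberately tiny sketch that pulls out only the single-line A records. It is an assumption-heavy simplification: the helper name parse_a_records is mine, the origin is passed in by hand, and directives, the multi-line SOA record, NS records and absolute owner names are all ignored. Real zone file parsing has many more rules.

```python
ZONE_TEXT = """
$TTL 86400
ns1 IN  A     192.168.0.1            ; IP address of ns1.example.com
ns2 IN  A     192.168.0.2            ; IP address of ns2.example.com

www IN  A     192.168.0.3            ; IP address of www.example.com
mail IN  A    192.168.0.4            ; IP address of mail server
"""


def parse_a_records(zone_text: str, origin: str) -> dict:
    """Collect '<owner> IN A <address>' lines into a {fqdn: address} mapping."""
    records = {}
    for raw_line in zone_text.splitlines():
        line = raw_line.split(";", 1)[0]  # drop comments
        fields = line.split()
        # Keep only simple single-line A records; skip everything else.
        if len(fields) == 4 and fields[1] == "IN" and fields[2] == "A":
            owner, _, _, address = fields
            fqdn = origin if owner == "@" else f"{owner}.{origin}"
            records[fqdn] = address
    return records


for name, address in parse_a_records(ZONE_TEXT, "example.com").items():
    print(name, address)
```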

Note
In this blog, I’ve focused on the most common type of DNS record, the A record. However, there are many other useful record types as well. You can check them out here.

Messages

All communications inside of the domain protocol are carried in a single format called a message. The top level format of the message is divided into 5 sections (some of which are empty in certain cases) shown below:

    +---------------------+
    |        Header       |
    +---------------------+
    |       Question      | the question for the name server
    +---------------------+
    |        Answer       | RRs answering the question
    +---------------------+
    |      Authority      | RRs pointing toward an authority
    +---------------------+
    |      Additional     | RRs holding additional information
    +---------------------+

Header section format

The header contains the following fields:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      ID                       |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    QDCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ANCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    NSCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ARCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

ID              A 16 bit identifier assigned by the program that
                generates any kind of query.  This identifier is copied
                into the corresponding reply and can be used by the
                requester to match up replies to outstanding queries.

QR              A one bit field that specifies whether this message is a
                query (0), or a response (1).

OPCODE          A four bit field that specifies kind of query in this
                message.  This value is set by the originator of a query
                and copied into the response.  The values are:

                0               a standard query (QUERY)

                1               an inverse query (IQUERY)

                2               a server status request (STATUS)

                3-15            reserved for future use

AA              Authoritative Answer - this bit is valid in responses,
                and specifies that the responding name server is an
                authority for the domain name in question section.

                Note that the contents of the answer section may have
                multiple owner names because of aliases.  The AA bit
                corresponds to the name which matches the query name, or
                the first owner name in the answer section.

TC              TrunCation - specifies that this message was truncated
                due to length greater than that permitted on the
                transmission channel.

RD              Recursion Desired - this bit may be set in a query and
                is copied into the response.  If RD is set, it directs
                the name server to pursue the query recursively.
                Recursive query support is optional.

RA              Recursion Available - this bit is set or cleared in a
                response, and denotes whether recursive query support is
                available in the name server.

Z               Reserved for future use.  Must be zero in all queries
                and responses.

RCODE           Response code - this 4 bit field is set as part of
                responses.  The values have the following
                interpretation:

                0               No error condition

                1               Format error - The name server was
                                unable to interpret the query.

                2               Server failure - The name server was
                                unable to process this query due to a
                                problem with the name server.

                3               Name Error - Meaningful only for
                                responses from an authoritative name
                                server, this code signifies that the
                                domain name referenced in the query does
                                not exist.

                4               Not Implemented - The name server does
                                not support the requested kind of query.

                5               Refused - The name server refuses to
                                perform the specified operation for
                                policy reasons.  For example, a name
                                server may not wish to provide the
                                information to the particular requester,
                                or a name server may not wish to perform
                                a particular operation (e.g., zone
                                transfer) for particular data.

                6-15            Reserved for future use.

QDCOUNT         an unsigned 16 bit integer specifying the number of
                entries in the question section.

ANCOUNT         an unsigned 16 bit integer specifying the number of
                resource records in the answer section.

NSCOUNT         an unsigned 16 bit integer specifying the number of name
                server resource records in the authority records
                section.

ARCOUNT         an unsigned 16 bit integer specifying the number of
                resource records in the additional records section.
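Since the header is just six 16-bit fields, it maps nicely onto Python's struct module. The sketch below packs and unpacks the 12-byte header following the bit layout shown above. The function names pack_header and unpack_header, and the default values, are my own choices for illustration.

```python
import struct


def pack_header(msg_id, *, qr=0, opcode=0, aa=0, tc=0, rd=1, ra=0, rcode=0,
                qdcount=1, ancount=0, nscount=0, arcount=0) -> bytes:
    """Build the 12-byte DNS header (Z is always left as zero)."""
    # Bit 0 in the diagram is the most significant bit of the 16-bit word.
    flags = ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9)
             | (rd << 8) | (ra << 7) | rcode)
    # "!" means network (big-endian) byte order, "6H" means six unsigned 16-bit ints.
    return struct.pack("!6H", msg_id, flags, qdcount, ancount, nscount, arcount)


def unpack_header(data: bytes) -> dict:
    """Decode the first 12 bytes of a DNS message into named fields."""
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", data[:12])
    return {
        "id": msg_id,
        "qr": (flags >> 15) & 0x1,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 0x1,
        "tc": (flags >> 9) & 0x1,
        "rd": (flags >> 8) & 0x1,
        "ra": (flags >> 7) & 0x1,
        "z": (flags >> 4) & 0x7,
        "rcode": flags & 0xF,
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }


header = pack_header(0x1234)  # a standard query with recursion desired
print(header.hex())
print(unpack_header(header))
```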

Question section format

The question section is used to carry the "question" in most queries, i.e., the parameters that define what is being asked. Format:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                     QNAME                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     QTYPE                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     QCLASS                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

QNAME           a domain name represented as a sequence of labels, where
                each label consists of a length octet followed by that
                number of octets.  The domain name terminates with the
                zero length octet for the null label of the root.  Note
                that this field may be an odd number of octets; no
                padding is used.

QTYPE           a two octet code which specifies the type of the query.
                The values for this field include all codes valid for a
                TYPE field, together with some more general codes which
                can match more than one type of RR.

QCLASS          a two octet code that specifies the class of the query.
                For example, the QCLASS field is IN for the Internet.
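The QNAME encoding is easier to understand with an example. This sketch encodes a domain name as length-prefixed labels and appends QTYPE and QCLASS. The helper names are hypothetical; the numeric values 1 for type A and 1 for class IN, and the 63-octet label limit, come from RFC 1035.

```python
import struct

QTYPE_A = 1    # RFC 1035 type code for an A record
QCLASS_IN = 1  # RFC 1035 class code for the Internet


def encode_qname(name: str) -> bytes:
    """Encode 'example.com' as length-prefixed labels ending in a zero octet."""
    name = name.rstrip(".")
    if not name:
        return b"\x00"  # the root itself is just the null label
    encoded = b""
    for label in name.split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label must be 1 to 63 octets long: {label!r}")
        encoded += bytes([len(raw)]) + raw  # one length octet, then the label
    return encoded + b"\x00"  # zero-length label of the root terminates the name


def build_question(name: str, qtype: int = QTYPE_A, qclass: int = QCLASS_IN) -> bytes:
    """Build a complete question section entry: QNAME + QTYPE + QCLASS."""
    return encode_qname(name) + struct.pack("!HH", qtype, qclass)


print(build_question("example.com"))
```

Note that "example.com" encodes to 13 octets, an odd number, which is exactly the no-padding case the RFC text mentions.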

Resource record format

The answer, authority, and additional sections all share the same format: a variable number of resource records, where the number of records is specified in the corresponding count field in the header. Each resource record has the following format:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                                               /
    /                      NAME                     /
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TYPE                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     CLASS                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TTL                      |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                   RDLENGTH                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
    /                     RDATA                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:


NAME            a domain name to which this resource record pertains.

TYPE            two octets containing one of the RR type codes.  This
                field specifies the meaning of the data in the RDATA
                field.

CLASS           two octets which specify the class of the data in the
                RDATA field.

TTL             a 32 bit unsigned integer that specifies the time
                interval (in seconds) that the resource record may be
                cached before it should be discarded.  Zero values are
                interpreted to mean that the RR can only be used for the
                transaction in progress, and should not be cached.

RDLENGTH        an unsigned 16 bit integer that specifies the length in
                octets of the RDATA field.

RDATA           a variable length string of octets that describes the
                resource.  The format of this information varies
                according to the TYPE and CLASS of the resource record.
                For example, if the TYPE is A and the CLASS is IN,
                the RDATA field is a 4 octet ARPA Internet address.
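Reading a resource record is the reverse of what we did for the question. Below is a minimal sketch that parses one record from raw bytes. The function names are my own, and it assumes the NAME is written out in full: real responses often compress names using pointers, which this sketch rejects rather than follows.

```python
import socket
import struct


def read_name(data: bytes, offset: int):
    """Read an uncompressed domain name; return (name, offset after the name)."""
    labels = []
    while True:
        length = data[offset]
        offset += 1
        if length == 0:
            break  # the zero-length root label ends the name
        if length & 0xC0:
            raise ValueError("compressed names are not handled in this sketch")
        labels.append(data[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), offset


def parse_rr(data: bytes, offset: int = 0):
    """Parse one resource record; return (record dict, offset after the record)."""
    name, offset = read_name(data, offset)
    # TYPE (16 bits), CLASS (16 bits), TTL (32 bits), RDLENGTH (16 bits) = 10 octets.
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", data, offset)
    offset += 10
    rdata = data[offset:offset + rdlength]
    record = {"name": name, "type": rtype, "class": rclass, "ttl": ttl, "rdata": rdata}
    if rtype == 1 and rclass == 1 and rdlength == 4:
        # TYPE A with CLASS IN: RDATA is a 4-octet IPv4 address.
        record["address"] = socket.inet_ntoa(rdata)
    return record, offset + rdlength


# A hand-built A record for www.example.com, reusing values from the sample zone file.
sample = (b"\x03www\x07example\x03com\x00"
          + struct.pack("!HHIH", 1, 1, 86400, 4)
          + bytes([192, 168, 0, 3]))
record, end = parse_rr(sample)
print(record)
```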

Now that we are well versed with the Domain Name System, let's look at ways it can be attacked or abused.

DNS Security Threats

Typosquatting

This is the most common threat concerning domain names. It is based on tricking the human operating the computer system, rather than hacking the system itself.

It is the practice of registering a domain name that is confusingly similar to an existing popular domain name and fooling the user into mistaking it for the real one.

Typosquatting examples

Any unaware user would end up mistaking the fake link as legitimate and falling victim to a phishing attack. So always double-check the links you encounter in the wild before you trust or click them.
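Software can help with the double-checking too. One possible approach, sketched below, is to compare a domain against a list of trusted names using edit distance and flag anything that is close but not identical. The function names, the trusted list and the threshold of 2 are all assumptions of mine, and this will not catch every trick (lookalike characters from other alphabets, for instance).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def looks_like_typosquat(domain: str, trusted: list, max_distance: int = 2):
    """Return the trusted domain that `domain` imitates, or None if none is close."""
    domain = domain.lower()
    if domain in trusted:
        return None  # an exact match is the real thing
    for known in trusted:
        if edit_distance(domain, known) <= max_distance:
            return known
    return None


TRUSTED = ["google.com", "youtube.com"]  # hypothetical allow-list
print(looks_like_typosquat("gooogle.com", TRUSTED))
print(looks_like_typosquat("google.com", TRUSTED))
```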

Caution
This type of attack is quite common, and even some of my friends have fallen for it. Always verify any link before clicking on it, especially those you receive in emails or text messages.

DNS Cache poisoning

This is an actual software vulnerability that takes advantage of how DNS records are cached. If the DNS server or cache is poorly configured, it can result in an attacker being able to change or inject malicious addresses into the cache, essentially “poisoning” it.

Cache poisoning

The attacker could change the address of a legitimate website to one controlled by the attacker. This would result in the user being unaware of visiting a fake website even after visiting the correct domain name. If the attacker's website looks similar enough to the legitimate website (which is quite easy to pull off), the user may end up submitting sensitive information such as login info to the attacker.

Security measures against this attack include properly configuring DNS servers and deploying DNSSEC, a set of extensions that adds digital signatures to DNS records. Resolvers that validate these signatures can check that the DNS information they receive is authentic and unmodified, which defeats most cache poisoning attacks.

Data exfiltration

This is not a threat to DNS, but rather a technique used by adversaries to siphon out data from a compromised system/organization. Attackers know that transferring data from an organization they have hacked by usual means such as HTTP or FTP messages might trigger detection systems, but DNS packets, usually considered harmless, could be left unchecked by detection systems. Hence, they use DNS messages to transfer data out of the organization and potentially steal sensitive records.

How such an attack might take place:

The attacker registers the domain name ZG5ZC2VJDXJPDHKK.COM, and sets up the name server NS1.ZG5ZC2VJDXJPDHKK.COM
The infected client encodes stolen information, in this case, the text “Pa$$w0rd”, into “UGEKJHCWCMQK”
The client makes the DNS query for the domain with the encoded password as a subdomain: UGEKJHCWCMQK.ZG5ZC2VJDXJPDHKK.COM
A recursive name server finds the authoritative name server NS1.ZG5ZC2VJDXJPDHKK.COM and sends the query there.
The attacker recognizes the subdomain value as the encoded password. The attacker decodes the information UGEKJHCWCMQK back to recover “Pa$$w0rd”.

A straightforward way to detect such data exfiltration is to monitor DNS queries as well, looking for suspicious activity such as unusually long or random-looking subdomains, or queries to domains of doubtful legitimacy.
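As a rough illustration of what such a check could look like, the sketch below flags queries whose subdomain labels are very long or look random, using Shannon entropy as a crude measure of randomness. The function names and all three thresholds are arbitrary assumptions of mine. A heuristic like this produces false positives on perfectly legitimate names, so treat it as a starting point for investigation, not a verdict.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Average bits of information per character; higher means more random-looking."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_suspicious_query(qname: str, max_label_len: int = 30,
                        min_checked_len: int = 10, min_entropy: float = 3.0) -> bool:
    """Flag queries whose subdomain labels are unusually long or random-looking."""
    labels = qname.rstrip(".").lower().split(".")
    # Assume the last two labels are the registered domain; inspect the rest.
    for label in labels[:-2]:
        if len(label) > max_label_len:
            return True
        if len(label) >= min_checked_len and shannon_entropy(label) >= min_entropy:
            return True
    return False


print(is_suspicious_query("UGEKJHCWCMQK.ZG5ZC2VJDXJPDHKK.COM"))
print(is_suspicious_query("www.example.com"))
```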

This is almost all the information you'll ever need on the Domain Name System, the protocol that made the modern web possible.

Tip
If you still wish to learn more about the topic, Cloudflare has excellently covered it in depth in their blog series, and most of my knowledge comes from there itself.

That's all for today. I know this was a lot of information to cover in a single blog post, but I hope you found at least some of it useful. I also hope you found the topic as fascinating as I do and that I was able to do justice to it.

Note
As a project, I have created an Authoritative DNS Name Server from scratch written in Python. You can check it out here if you wish.

See you soon, hopefully in some other blog 🙂.
