Basics of DNS and Beyond
Preface
The DNS (Domain Name System) has always been a fascinating concept for me, both in terms of its implementation and complexity. The concept in its entirety is neither too simple to seem trivial nor too complex to seem difficult and overly complicated. For me, it sits right in the sweet spot of the complexity spectrum. I hope you too find it that way.
But what is DNS?
DNS is often called the Phone book of the Internet. But if you are from my generation (or younger), you've most likely never seen one.
It is analogous to the Contacts App on our phones. The app translates our Contact Names to Phone Numbers so that we don't have to remember the number of each of our contacts.
Similarly, DNS translates the domain names we type into our browser's address bar (like youtube.com) into the IP addresses (like 173.231.205.196) of the computers that serve that website. It does the heavy lifting of remembering (or finding out) the IP addresses of all the websites on the Internet.
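To make the translation concrete, here is a minimal Python sketch that asks the operating system's resolver for the IPv4 addresses behind a name. The function name `resolve_ipv4` is my own, and the addresses you get back for a real domain depend on where and when you run it.

```python
import socket

def resolve_ipv4(hostname: str) -> list[str]:
    """Ask the operating system's resolver for the IPv4 addresses of a hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ipv4("youtube.com")` on a machine with network access returns a list of address strings. Passing an IP literal such as `"127.0.0.1"` returns it unchanged without any lookup.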
How does it work?
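DNS works on the concept of domain names. A domain name is like a hierarchical contact name. It is divided into dot-separated sections, read from right to left:
The rightmost part is called the Top Level Domain (TLD), the part before it is the Second Level Domain (SLD), and any parts before that are subdomains. The protocol prefix (such as https://) is not part of the domain name. For example, in maps.google.com, com is the TLD, google is the SLD, and maps is a subdomain.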
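The right-to-left structure can be sketched in a few lines of Python. This is a simplified illustration: the helper name `split_domain` is hypothetical, and it assumes a single-label TLD, so multi-label suffixes such as co.uk would need a public suffix list.

```python
def split_domain(name: str) -> dict:
    """Split a domain name into TLD, SLD and subdomains (single-label TLDs only)."""
    labels = name.rstrip(".").lower().split(".")
    return {
        "tld": labels[-1],
        "sld": labels[-2] if len(labels) >= 2 else None,
        "subdomains": labels[:-2],  # everything left of the SLD, in written order
    }
```

For `"maps.google.com"` this yields `com` as the TLD, `google` as the SLD, and `["maps"]` as the subdomains.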
How are Domain names resolved?
The domain name resolution process is also known as a DNS lookup, and it involves quite a few middlemen. But we (our client, e.g., the browser) only have to interact with a piece of software called the DNS resolver. It's the resolver that deals with the middlemen and finds the IP address for you.
Middlemen involved
- Root nameserver: The root server is the first step in the lookup process. It contains the addresses of all the TLD nameservers.
- TLD nameserver: The TLD server contains the address of the authoritative nameservers of all domain names registered under it. For example, sample.com's authoritative nameserver address will be stored in .com TLD servers and sample.in's in .in TLD servers.
- Authoritative nameserver: The authoritative nameserver is the last stop in the lookup. It holds all the records for the domain name and its subdomains. If it has the requested record, it returns the IP address for the requested hostname to the DNS resolver.
Root Servers of the Internet
The root zone is served by 13 named root servers (a through m). Each name is backed by many physical servers spread around the world that share the same IP address through anycast routing. The 13 root servers and the organizations that operate them are:
Root Server | IPv4 Address | Operated by |
---|---|---|
a.root-servers.net | 198.41.0.4 | Verisign, Inc. |
b.root-servers.net | 170.247.170.2 | University of Southern California, Information Sciences Institute |
c.root-servers.net | 192.33.4.12 | Cogent Communications |
d.root-servers.net | 199.7.91.13 | University of Maryland |
e.root-servers.net | 192.203.230.10 | NASA (Ames Research Center) |
f.root-servers.net | 192.5.5.241 | Internet Systems Consortium, Inc. |
g.root-servers.net | 192.112.36.4 | US Department of Defense (NIC) |
h.root-servers.net | 198.97.190.53 | US Army (Research Lab) |
i.root-servers.net | 192.36.148.17 | Netnod |
j.root-servers.net | 192.58.128.30 | Verisign, Inc. |
k.root-servers.net | 193.0.14.129 | RIPE NCC |
l.root-servers.net | 199.7.83.42 | ICANN |
m.root-servers.net | 202.12.27.33 | WIDE Project |
How the resolver interacts with these servers is the lookup process.
DNS Lookup Process
- A user types ‘example.com’ into a web browser and the query travels into the Internet and is received by a DNS recursive resolver.
- The resolver then queries a DNS root nameserver (.).
- The root server then responds to the resolver with the address of a Top Level Domain (TLD) DNS server (such as .com or .net), which stores the information for its domains. When searching for example.com, our request is pointed toward the .com TLD.
- The resolver then makes a request to the .com TLD.
- The TLD server then responds with the IP address of the authoritative nameserver for example.com.
- Lastly, the recursive resolver sends a query to the domain’s nameserver.
- The IP address for example.com is then returned to the resolver from the nameserver.
- The DNS resolver then responds to the web browser with the IP address of the domain requested initially.
After receiving the IP address, the client makes the request directly to the desired web server.
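The lookup steps above can be mimicked with a toy simulation. This is only a sketch: the three dictionaries stand in for the root, TLD, and authoritative nameservers, and every server name and IP address in them is made up (the IPs come from ranges reserved for documentation). No real DNS traffic is involved.

```python
# Toy stand-ins for the three kinds of nameservers. Every name and address here
# is made up for illustration (the IPs come from documentation-only ranges).
ROOT_SERVER = {"com": "tld-com"}  # TLD -> TLD nameserver
TLD_SERVERS = {"tld-com": {"example.com": "ns-example"}}  # domain -> authoritative nameserver
AUTHORITATIVE_SERVERS = {
    "ns-example": {"example.com": "192.0.2.10", "www.example.com": "192.0.2.11"},
}

def resolve(hostname):
    """Walk root -> TLD -> authoritative, returning (ip_or_None, steps_taken)."""
    name = hostname.lower().rstrip(".")
    labels = name.split(".")
    steps = []

    # Step 1: the root server points us at the TLD nameserver.
    tld_server = ROOT_SERVER.get(labels[-1])
    steps.append(("root", tld_server))
    if tld_server is None:
        return None, steps

    # Step 2: the TLD nameserver points us at the domain's authoritative nameserver.
    domain = ".".join(labels[-2:])
    auth_server = TLD_SERVERS[tld_server].get(domain)
    steps.append(("tld", auth_server))
    if auth_server is None:
        return None, steps

    # Step 3: the authoritative nameserver returns the record itself.
    ip = AUTHORITATIVE_SERVERS[auth_server].get(name)
    steps.append(("authoritative", ip))
    return ip, steps
```

Calling `resolve("www.example.com")` walks all three stages, while a name under a TLD the toy root does not know stops after the first step.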
You can perform lookups from your browser and see the result by going to chrome://net-internals/#dns.
But carrying out this entire process every time I wish to visit a website seems excessive, right?
Speeding up DNS: Caching
DNS caching involves storing data closer to the requesting client so that the DNS query can be resolved earlier and additional queries further down the DNS lookup chain can be avoided. It improves load times and reduces bandwidth/CPU consumption.
DNS records can be cached in multiple places: your browser, your operating system, and the servers along the lookup chain. Each cached record carries a time-to-live (TTL) value, after which it expires and needs to be fetched again.
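A TTL-based cache can be sketched as follows. The class name `DnsCache` and its methods are hypothetical, and real caches also handle eviction, negative answers, and multiple record types. The clock is injectable so expiry can be demonstrated without waiting.

```python
import time

class DnsCache:
    """A tiny name -> address cache whose entries expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # The TTL has run out: drop the entry so the caller looks it up again.
            del self._entries[name]
            return None
        return address
```

A resolver would call `get` first and only start the full lookup chain when it returns `None`.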
I hope by now you have a general idea of what DNS is and how it works. Now let's go beyond...
Beyond the basics
DNS mostly runs over UDP (to prioritise efficiency), falling back to TCP when a response is too large for a single UDP message or for zone transfers. It typically uses port 53 (though this is not mandatory). The DNS protocol is largely documented in RFC 1035, an Internet Standard titled DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION. We will refer to this document from here onwards.
Architecture
The typical Domain name system configuration looks like this:
                 Local Host                        |  Foreign
                                                   |
    +---------+               +----------+         |  +--------+
    |         | user queries  |          |queries  |  |        |
    |  User   |-------------->|          |---------|->|Foreign |
    | Program |               | Resolver |         |  |  Name  |
    |         |<--------------|          |<--------|--| Server |
    |         | user responses|          |responses|  |        |
    +---------+               +----------+         |  +--------+
                                |     A            |
                cache additions |     | references |
                                V     |            |
                              +----------+         |
                              |  cache   |         |
                              +----------+         |
The foreign name server here represents the above-mentioned DNS lookup chain.
The DNS zone details and records are stored in Master Files in the nameservers. The simplest nameserver looks something like this:
                 Local Host                        |  Foreign
                                                   |
      +---------+                                  |
     /         /|                                  |
    +---------+ |             +----------+         |  +--------+
    |         | |             |          |responses|  |        |
    |         | |             |   Name   |---------|->|Foreign |
    |  Master |-------------->|  Server  |         |  |Resolver|
    |  files  | |             |          |<--------|--|        |
    |         |/              |          | queries |  +--------+
    +---------+               +----------+         |
Record Storage
The DNS is broken up into many different zones, each covering a distinct area of the DNS namespace. A DNS zone is a portion of the namespace that is managed by a specific organization or administrator, which allows for finer-grained control.
For example, the .com zone is responsible for all the domains under it, and the faizahm.com zone sits within it. The faizahm.com zone in turn manages all the subdomains under it. If a subdomain plays a big enough role, it can be split off into its own zone for more control. For instance, google.com is a zone and maps.google.com is a subdomain of it, yet it is a large enough area that it could warrant its own zone managed by the GMaps team.
The details for each zone are stored in a zone file. An example zone file would look like this:
$TTL 86400 ; Default Time To Live (TTL) in seconds
@       IN  SOA ns1.example.com. admin.example.com. (
                2024091201 ; Serial number (must increment with each change)
                3600       ; Refresh interval
                1800       ; Retry interval
                1209600    ; Expiry time
                86400 )    ; Negative caching TTL
        IN  NS  ns1.example.com. ; Name server for example.com
        IN  NS  ns2.example.com. ; Secondary name server
ns1     IN  A   192.168.0.1 ; IP address of ns1.example.com
ns2     IN  A   192.168.0.2 ; IP address of ns2.example.com
www     IN  A   192.168.0.3 ; IP address of www.example.com
mail    IN  A   192.168.0.4 ; IP address of mail server
The sample zone file contains the following:
- Start of Authority (SOA) record: stores important information about a domain or zone such as the email address of the administrator, when the domain was last updated, and how long the server should wait between refreshes.
- Resource Records (RRs): the NS records naming the zone's nameservers and the A records giving the addresses of the hosts (subdomains) in the zone.
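To show how a nameserver might read such a file, here is a deliberately small Python sketch that pulls out only the single-line A records. The function `parse_a_records` is hypothetical and skips most of the zone file format: it ignores $TTL, the multi-line SOA record, NS records, and owner names that are already fully qualified.

```python
def parse_a_records(zone_text: str, origin: str) -> dict:
    """Return {fully qualified name: IPv4 address} for lines shaped like 'www IN A 1.2.3.4'."""
    records = {}
    for line in zone_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop the trailing comment, if any
        fields = line.split()
        if len(fields) == 4 and fields[1] == "IN" and fields[2] == "A":
            owner, address = fields[0], fields[3]
            # '@' is shorthand for the zone origin itself.
            fqdn = origin if owner == "@" else f"{owner}.{origin}"
            records[fqdn] = address
    return records
```

Feeding it the sample zone file with `origin="example.com"` gives a dictionary that maps `www.example.com` to `192.168.0.3`, along with entries for ns1, ns2, and mail.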
Note
In this blog, I’ve focused on the most common type of DNS record, the A record. However, there are many other useful record types as well. You can check them out here.
Messages
All communications inside of the domain protocol are carried in a single format called a message. The top level format of the message is divided into 5 sections (some of which are empty in certain cases) shown below:
    +---------------------+
    |        Header       |
    +---------------------+
    |       Question      | the question for the name server
    +---------------------+
    |        Answer       | RRs answering the question
    +---------------------+
    |      Authority      | RRs pointing toward an authority
    +---------------------+
    |      Additional     | RRs holding additional information
    +---------------------+
Header section format
The header contains the following fields:
                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      ID                       |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    QDCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ANCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    NSCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ARCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
where:
ID A 16 bit identifier assigned by the program that
generates any kind of query. This identifier is copied
into the corresponding reply and can be used by the requester
to match up replies to outstanding queries.
QR A one bit field that specifies whether this message is a
query (0), or a response (1).
OPCODE A four bit field that specifies the kind of query in this
message. This value is set by the originator of a query
and copied into the response. The values are:
0 a standard query (QUERY)
1 an inverse query (IQUERY)
2 a server status request (STATUS)
3-15 reserved for future use
AA Authoritative Answer - this bit is valid in responses,
and specifies that the responding name server is an
authority for the domain name in question section.
Note that the contents of the answer section may have
multiple owner names because of aliases. The AA bit
corresponds to the name which matches the query name, or
the first owner name in the answer section.
TC TrunCation - specifies that this message was truncated
due to length greater than that permitted on the
transmission channel.
RD Recursion Desired - this bit may be set in a query and
is copied into the response. If RD is set, it directs
the name server to pursue the query recursively.
Recursive query support is optional.
RA Recursion Available - this bit is set or cleared in a
response, and denotes whether recursive query support is
available in the name server.
Z Reserved for future use. Must be zero in all queries
and responses.
RCODE Response code - this 4 bit field is set as part of
responses. The values have the following
interpretation:
0 No error condition
1 Format error - The name server was
unable to interpret the query.
2 Server failure - The name server was
unable to process this query due to a
problem with the name server.
3 Name Error - Meaningful only for
responses from an authoritative name
server, this code signifies that the
domain name referenced in the query does
not exist.
4 Not Implemented - The name server does
not support the requested kind of query.
5 Refused - The name server refuses to
perform the specified operation for
policy reasons. For example, a name
server may not wish to provide the
information to the particular requester,
or a name server may not wish to perform
a particular operation (e.g., zone
transfer) for particular data.
6-15 Reserved for future use.
QDCOUNT an unsigned 16 bit integer specifying the number of
entries in the question section.
ANCOUNT an unsigned 16 bit integer specifying the number of
resource records in the answer section.
NSCOUNT an unsigned 16 bit integer specifying the number of name
server resource records in the authority records
section.
ARCOUNT an unsigned 16 bit integer specifying the number of
resource records in the additional records section.
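The 12-byte header maps neatly onto Python's `struct` module. The sketch below packs the six 16-bit fields in network byte order and squeezes the flag bits into the second field, following the bit layout shown above. The function names and default values are my own choices, not part of the RFC.

```python
import struct

def pack_header(msg_id, qr=0, opcode=0, aa=0, tc=0, rd=1, ra=0, rcode=0,
                qdcount=1, ancount=0, nscount=0, arcount=0):
    """Build the 12-byte DNS header; bit 0 in the diagram is the most significant bit."""
    flags = ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9)
             | (rd << 8) | (ra << 7) | rcode)  # the 3-bit Z field stays zero
    return struct.pack("!HHHHHH", msg_id, flags, qdcount, ancount, nscount, arcount)

def unpack_header(data):
    """Decode the first 12 bytes of a DNS message into a dictionary of fields."""
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack("!HHHHHH", data[:12])
    return {
        "id": msg_id,
        "qr": (flags >> 15) & 0x1,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 0x1,
        "tc": (flags >> 9) & 0x1,
        "rd": (flags >> 8) & 0x1,
        "ra": (flags >> 7) & 0x1,
        "rcode": flags & 0xF,
        "qdcount": qdcount, "ancount": ancount,
        "nscount": nscount, "arcount": arcount,
    }
```

A standard query with recursion desired and one question, for example, has the flags field `0x0100` and QDCOUNT set to 1.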
Question section format
The question section is used to carry the "question" in most queries, i.e., the parameters that define what is being asked. Format:
                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                     QNAME                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     QTYPE                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     QCLASS                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
where:
QNAME a domain name represented as a sequence of labels, where
each label consists of a length octet followed by that
number of octets. The domain name terminates with the
zero length octet for the null label of the root. Note
that this field may be an odd number of octets; no
padding is used.
QTYPE a two octet code which specifies the type of the query.
The values for this field include all codes valid for a
TYPE field, together with some more general codes which
can match more than one type of RR.
QCLASS a two octet code that specifies the class of the query.
For example, the QCLASS field is IN for the Internet.
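The QNAME encoding is easiest to understand by building one. In this sketch each label is prefixed with its length and the name ends with a zero octet. The helper names are hypothetical. The defaults use 1 for QTYPE (an A record) and 1 for QCLASS (IN), and the root name on its own is not handled.

```python
import struct

def encode_qname(name: str) -> bytes:
    """Encode 'example.com' as length-prefixed labels ending in a zero octet."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:  # RFC 1035 limits a label to 63 octets
            raise ValueError(f"invalid label length in {name!r}")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"

def build_question(name: str, qtype: int = 1, qclass: int = 1) -> bytes:
    """QNAME followed by the two-octet QTYPE and QCLASS fields."""
    return encode_qname(name) + struct.pack("!HH", qtype, qclass)
```

For `example.com` the QNAME is the byte 7, the letters of `example`, the byte 3, the letters of `com`, and a final zero byte.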
Resource record format
The answer, authority, and additional sections all share the same format: a variable number of resource records, where the number of records is specified in the corresponding count field in the header. Each resource record has the following format:
                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                                               /
    /                      NAME                     /
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TYPE                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     CLASS                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TTL                      |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                   RDLENGTH                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                     RDATA                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
where:
NAME a domain name to which this resource record pertains.
TYPE two octets containing one of the RR type codes. This
field specifies the meaning of the data in the RDATA
field.
CLASS two octets which specify the class of the data in the
RDATA field.
TTL a 32 bit unsigned integer that specifies the time
interval (in seconds) that the resource record may be
cached before it should be discarded. Zero values are
interpreted to mean that the RR can only be used for the
transaction in progress, and should not be cached.
RDLENGTH an unsigned 16 bit integer that specifies the length in
octets of the RDATA field.
RDATA a variable length string of octets that describes the
resource. The format of this information varies
according to the TYPE and CLASS of the resource record.
For example, if the TYPE is A and the CLASS is IN,
the RDATA field is a 4 octet ARPA Internet address.
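Reading a resource record back out of raw bytes follows the same layout in reverse. The sketch below is simplified on purpose: the function names are mine, and it only understands names written out in full. Real responses usually shorten names with the compression pointers defined in RFC 1035, and this code rejects those instead of following them.

```python
import socket
import struct

def read_name(data: bytes, offset: int):
    """Read an uncompressed, length-prefixed name; return (name, next offset)."""
    labels = []
    while True:
        length = data[offset]
        offset += 1
        if length == 0:  # the zero octet marks the end of the name
            break
        if length & 0xC0:  # top bits set: a compression pointer or reserved label type
            raise ValueError("compressed names are not handled in this sketch")
        labels.append(data[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), offset

def parse_rr(data: bytes, offset: int = 0):
    """Parse one resource record; return (record dict, offset just past it)."""
    name, offset = read_name(data, offset)
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", data, offset)
    offset += 10  # TYPE (2) + CLASS (2) + TTL (4) + RDLENGTH (2)
    rdata = data[offset:offset + rdlength]
    if rtype == 1 and rclass == 1 and rdlength == 4:
        rdata = socket.inet_ntoa(rdata)  # A record in class IN: a 4-octet IPv4 address
    record = {"name": name, "type": rtype, "class": rclass, "ttl": ttl, "rdata": rdata}
    return record, offset + rdlength
```

For any record type other than an IN A record, the RDATA is returned as raw bytes for the caller to interpret.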
Now that we are well versed in the Domain Name System, let's look at ways it can be hacked.
DNS Security Threats
Typosquatting
This is the most common threat concerning domain names. It is based on tricking the human operating the computer system, rather than hacking the system itself.
It is the practice of registering a domain name that is confusingly similar to an existing popular domain name and fooling the user into mistaking it for the real one.
Any unaware user would end up mistaking the fake link as legitimate and falling victim to a phishing attack. So always double-check the links you encounter in the wild before you trust or click them.
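One rough way to automate that double-check is to compare a domain against a list of names you trust and flag near misses. The sketch below uses edit distance for the comparison. The trusted list, the threshold of 2, and the function names are all assumptions for illustration, and real tooling also has to consider lookalike characters and different TLDs.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete a character
                            curr[j - 1] + 1,            # insert a character
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

# Hypothetical list of domains the user actually intends to visit.
TRUSTED_DOMAINS = ["youtube.com", "google.com"]

def lookalike_of(domain, trusted=TRUSTED_DOMAINS, max_distance=2):
    """Return the trusted domain that `domain` closely imitates, or None."""
    domain = domain.lower()
    for real in trusted:
        if domain != real and edit_distance(domain, real) <= max_distance:
            return real
    return None
```

With this list, `lookalike_of("youtubee.com")` points back to `youtube.com`, while the genuine `youtube.com` is not flagged.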
Caution
This type of attack is quite common, and even some of my friends have fallen for it. Always verify any link before clicking on it, especially those you receive in emails or text messages.
DNS Cache poisoning
This is an actual software vulnerability that takes advantage of how DNS records are cached. If the DNS server or cache is poorly configured, it can result in an attacker being able to change or inject malicious addresses into the cache, essentially “poisoning” it.
The attacker could change the address of a legitimate website to one controlled by the attacker. This would result in the user being unaware of visiting a fake website even after visiting the correct domain name. If the attacker's website looks similar enough to the legitimate website (which is quite easy to pull off), the user may end up submitting sensitive information such as login info to the attacker.
Security measures against this attack include properly configuring DNS servers and implementing a protocol known as DNSSEC, which adds digital signatures to DNS records. This lets resolvers that validate DNSSEC check that the DNS information they receive is authentic, which defeats most cache poisoning attacks against signed domains.
Data exfiltration
This is not a threat to DNS, but rather a technique used by adversaries to siphon out data from a compromised system/organization. Attackers know that transferring data from an organization they have hacked by usual means such as HTTP or FTP messages might trigger detection systems, but DNS packets, usually considered harmless, could be left unchecked by detection systems. Hence, they use DNS messages to transfer data out of the organization and potentially steal sensitive records.
How such an attack might take place:
- The attacker registers the domain name ZG5ZC2VJDXJPDHKK.COM, and sets up the name server NS1.ZG5ZC2VJDXJPDHKK.COM
- The infected client encodes stolen information, in this case, the text “Pa$$w0rd”, into “UGEKJHCWCMQK”
- The client makes the DNS query for the domain with the encoded password as a subdomain: UGEKJHCWCMQK.ZG5ZC2VJDXJPDHKK.COM
- A recursive name server finds the authoritative name server NS1.ZG5ZC2VJDXJPDHKK.COM and sends the query there.
- The attacker recognizes the subdomain value as the encoded password. The attacker decodes the information UGEKJHCWCMQK back to recover “Pa$$w0rd”.
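To see why this works (and what defenders should look for), here is a small sketch of the encoding and decoding steps. The steps above do not name the encoding, so this example assumes Base32, which uses only letters and digits and survives the case-insensitivity of domain names. The function names and the placeholder domain `attacker.example` are hypothetical. Nothing here sends a query.

```python
import base64

def encode_label(secret: str) -> str:
    """Encode text as a DNS-safe label (Base32 uses only A-Z and 2-7)."""
    return base64.b32encode(secret.encode()).decode().rstrip("=")

def decode_label(label: str) -> str:
    """Reverse encode_label, restoring the padding that was stripped."""
    padded = label.upper() + "=" * (-len(label) % 8)
    return base64.b32decode(padded).decode()

def exfil_query_name(secret: str, domain: str) -> str:
    """The name an infected client would look up: <encoded secret>.<domain>."""
    return f"{encode_label(secret)}.{domain}"
```

Because the label still decodes after being lowercased in transit, the nameserver at the far end can recover the original text from the query name alone.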
A straightforward defence against such data exfiltration is to monitor DNS queries as well, looking for suspicious activity such as lookups to unfamiliar or illegitimate-looking domains.
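As one illustration of such a check, the sketch below flags query names that contain a long, random-looking label, using Shannon entropy as a rough measure of randomness. The thresholds (a label of at least 10 characters with at least 3.0 bits of entropy per character) are arbitrary assumptions chosen for this example, and a real detector would combine several signals and tune them against actual traffic.

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Average bits of information per character in `text`."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def looks_suspicious(qname: str, min_length: int = 10, min_entropy: float = 3.0) -> bool:
    """Flag names that have at least one long, high-entropy label."""
    labels = qname.rstrip(".").lower().split(".")
    return any(
        len(label) >= min_length and shannon_entropy(label) >= min_entropy
        for label in labels
    )
```

The query from the example above, UGEKJHCWCMQK.ZG5ZC2VJDXJPDHKK.COM, is flagged, while an ordinary name like www.example.com is not.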
This is almost all the information you'll ever need on the Domain Name System, the protocol that made the modern web possible.
Tip
If you still wish to learn more about the topic, Cloudflare has excellently covered it in depth in their blog series, and most of my knowledge comes from there itself.
That's all for today. I know this was a lot of information to cover in a single blog post, but I hope you found at least some of it useful. I also hope you found the topic as fascinating as I do and that I was able to do justice to it.
Note
As a project, I have created an Authoritative DNS Name Server from scratch written in Python. You can check it out here if you wish.
See you soon, hopefully in some other blog 🙂.