MongoDB Clustering 101

Overview

MongoDB is a NoSQL database that focuses on collections and documents rather than the traditional tables and databases. There are several reasons you might be interested in running MongoDB “at scale”, as the kids say:

  • Given its JavaScript-style document notation it significantly reduces the overhead of the Object Relational Mapping (ORM) layer created at the language level, especially if you use a dynamic language such as Node.js or Python.
  • The native clustering and sharding of MongoDB are also pretty trivial to set up when compared to RDBMS systems, so if you need either of those behaviors MongoDB provides an easy way to accomplish that end.
  • The schema-less structure of MongoDB also gets programmers out of forcing their data into a relational representation, letting them store and query it using a model that is more intuitive for programmers (i.e. no table joins).
  • MongoDB performance is actually pretty good when compared to more traditional RDBMS systems.

For an example of how intuitive MongoDB can be from a dynamic language, take this snippet, which uses PyMongo:

from pymongo import MongoClient

connection = MongoClient('mongodb://joel:password@db01')
inventory = connection.joeldb.inventory

item = {"author": "Joel"}
newInsert = inventory.insert_one(item).inserted_id
print("New Insert: %s" % newInsert)

Which is pretty neat. With a tiny bit of thought given to object and variable names, your code could likely be read by someone with little to no actual Python or MongoDB knowledge.
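
For completeness, reading the document back is just as legible. A small sketch reusing the connection details from the snippet above:

from pymongo import MongoClient

connection = MongoClient('mongodb://joel:password@db01')
inventory = connection.joeldb.inventory

# Fetch the document we just inserted and print it.
for document in inventory.find({"author": "Joel"}):
    print(document)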

Just a few caveats to MongoDB clustering:

  • Due to the lack of Multi-Master cluster replication (such as exists with MySQL Galera clusters) there are several issues introduced:
    • Your application should be primarily read-heavy. This is because reads are the only operation all nodes in the cluster will be able to perform and therefore the only workload that can be safely spread across the cluster by a load balancer.
    • You need to have enough control over the application to ensure that read operations and write/update operations go to different servers or VIP’s.
    • You’ll have to reconfigure MongoDB priority for members so that they match the load balancer’s priorities. This could be problematic (but solvable) if you need to be able to spin up arbitrary numbers of MongoDB instances and then scale them down when the workload subsides.
  • In recent history MongoDB has had a flurry of security issues that has resulted in a somewhat earned reputation for lackluster security controls. This is still widely considered true even though most of the security flaws have been addressed in recent years (many people take a while to get over something like that).
  • RDBMSs have a much longer history and better ACID compliance (though this could be changing soon). If your application relies on many reads and has high standards for data consistency you may not be able to get away from an RDBMS, or at the very least MongoDB may not be the NoSQL DBMS for you.

Given the above, I leave it to the reader of this article to decide whether or not MongoDB makes sense in their environment and for their project.

Establishing The Cluster

Installation of MongoDB

OK, let’s assume we have three Ubuntu 18.04 VMs with 1GB of memory, two CPU cores and 20GB of storage. Not exactly killing it in the resource department, but that’s more than enough to run a three node cluster. All that’s required beyond those hardware requirements is that any hostnames you plan on using be resolvable on each node.

First things first, on each node trust MongoDB’s GPG key and configure the upstream MongoDB repository:

root@db01:~# apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
Executing: /tmp/apt-key-gpghome.Vuyi6iGPjw/gpg.1.sh --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
gpg: key 68818C72E52529D4: public key "MongoDB 4.0 Release Signing Key <packaging@mongodb.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1
root@db01:~# echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list
deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 multiverse
root@db01:~# apt-get update
[...snip...]
Ign:4 https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 InRelease                                                                      
Get:5 https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 Release [2,029 B]   
Get:6 https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 Release.gpg [801 B]            
[...snip...]
Get:17 https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0/multiverse amd64 Packages [2,129 B]             
[....snip...]
Fetched 1,457 kB in 5s (303 kB/s)                                                            
Reading package lists... Done

Next install the mongodb-org package:

apt-get install -y mongodb-org

We now have to change the contents of /etc/mongod.conf to something similar to:

storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true

systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

net:
  bindIp: 0.0.0.0

processManagement:
  timeZoneInfo: /usr/share/zoneinfo

replication:
  replSetName: testcluster

Pointing out two important bits from the configuration:

  • I’ve changed bindIp from its default of 127.0.0.1 so that it will listen on all available interfaces for TCP connections from the other nodes.
  • I’ve set replication.replSetName so that each node will see itself as belonging to a cluster called testcluster.

Once the configuration is updated, restart the database with systemctl restart mongod.

Replica Set Configured

Now that mongod is running on each node and each can access the others over the network, we need to bootstrap the actual cluster. We do this by going into the mongo shell (which has no authentication by default) and invoking rs.initiate(), enumerating all the members of the cluster:

root@db01:~# mongo
MongoDB shell version v4.0.1
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.1
Server has startup warnings:
2018-08-20T20:01:47.600+0000 I STORAGE  [initandlisten]
2018-08-20T20:01:47.600+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-08-20T20:01:47.600+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-08-20T20:01:50.531+0000 I CONTROL  [initandlisten]
2018-08-20T20:01:50.531+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-08-20T20:01:50.531+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-08-20T20:01:50.531+0000 I CONTROL  [initandlisten]
---
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).

The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.

To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---

> rs.initiate(
...   {
...     _id : "testcluster",
...     members: [
...       { _id : 0, host : "db01:27017" },
...       { _id : 1, host : "db02:27017" },
...       { _id : 2, host : "db03:27017" }
...     ]
...   }
... )
{
        "ok" : 1,
        "operationTime" : Timestamp(1534795606, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1534795606, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}

Judging from the ok field in the JSON output being set to 1, our cluster has bootstrapped successfully. After enough time for the election to take place, the mongo prompt should change from a simple > to one that shows both the cluster name and this node’s status within the cluster, such as testcluster:PRIMARY>.

Congratulations, you’re now the proud owner of a MongoDB cluster. You can verify by inserting a document on the primary node and then pulling it up on a secondary node:

testcluster:PRIMARY> db.inventory.insertOne(
... { item: "canvas", qty: 100, tags: ["cotton"], size: { h: 28, w: 35.5, uom: "cm" } }
... )
{
"acknowledged" : true,
"insertedId" : ObjectId("5b7d8bb38a7ce17ba8e7b3b2")
}

[...]

testcluster:SECONDARY> rs.slaveOk()
testcluster:SECONDARY> db.inventory.find( {} )
{ "_id" : ObjectId("5b7b372cfdbae13d4397081a"), "item" : "canvas1", "qty" : 100, "tags" : [ "cotton" ], "size" : { "h" : 28, "w" : 35.5, "uom" : "cm" } }
{ "_id" : ObjectId("5b7b3747a19159b98f1b15b3"), "item" : "canvas1", "qty" : 100, "tags" : [ "cotton" ], "size" : { "h" : 28, "w" : 35.5, "uom" : "cm" } }
{ "_id" : ObjectId("5b7d8bb38a7ce17ba8e7b3b2"), "item" : "canvas", "qty" : 100, "tags" : [ "cotton" ], "size" : { "h" : 28, "w" : 35.5, "uom" : "cm" } }

Please note that in the above I had to run rs.slaveOk() to allow this shell session to read from the secondary node (newer shell versions rename this to rs.secondaryOk()).

You can also enumerate the members of the cluster using db.serverStatus().repl.hosts:

testcluster:SECONDARY> db.serverStatus().repl.hosts
[ "db01:27017", "db02:27017", "db03:27017" ]

Sharding

Overview

Sharding is the means by which MongoDB attempts to solve the “scale out” problem with database capacity. There comes a time in the life of large applications when capacity needs to be added or removed dynamically without disrupting ongoing use of the product. Most database systems accomplish this by establishing a multi-master cluster where a load balancer such as haproxy or nginx distributes load amongst all the members of the cluster; adding or removing capacity is then as simple as adding or removing a node and updating the load balancer appropriately.

With MongoDB, though, the expectation is that additional capacity is added by creating new data partitions called “shards” that house separate tranches of data and can be queried in parallel with other shards as needed. The belief is that, once the work of replication is taken into consideration, this improved concurrency will yield better scalability than having multiple somewhat independent database servers that replicate changes to one another. This is the same general architectural concept as Oracle GRID computing.

A sharded MongoDB cluster requires three components:

  • The individual shard servers (the database servers that will house the actual data). These can be standalone servers or replica sets. In either case they’re identical to their non-sharded analogs above except that they have a clusterRole of shardsvr assigned to indicate that these servers/sets store shards.
  • A replica set that stores only the distribution of shards; this is referred to as the “configuration server” component.
  • One or more mongos components, which act as a gateway to the data stored on the shard servers, querying the configuration server described above as needed. If you need to use a load balancer as an ingress to a MongoDB cluster it’s probably best to treat mongos as the application server and let it hide the complexity of the cluster behind it. In this guide, though, for simplicity I’ll be putting mongos on the same server as the configuration server; just know that this is illustrative of the architecture and not what you would do in production.

Configuration Replica Set Established

On the node that will act as the configuration server (db01 in this walkthrough), the /etc/mongod.conf configuration file becomes:

storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true

systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

net:
  bindIp: 0.0.0.0

processManagement:
  timeZoneInfo: /usr/share/zoneinfo

sharding:
  clusterRole: configsvr

replication:
  replSetName: configset

Breaking down the new bits:

  • We introduced a sharding section. This section is always very simple and in this case just specifies that this mongod instance will be the configuration server, meaning it will store the sharding metadata that mongos clients later query to locate requested data.
  • We changed the replSetName to configset just for clarity. It really can be anything, as long as it’s unique and matches what the mongos clients will attempt to look for.

Since no explicit net.port is set, the change from regular database server to “configuration server” causes mongod to listen on the configuration server default port of 27019, therefore to begin bootstrapping the configuration replica set we enter the database shell via mongo --port 27019. Then we bootstrap the configuration replica set the same as any other:

> rs.initiate(
...   {
...     _id: "configset",
...     configsvr: true,
...     members: [
...       { _id : 0, host : "db01:27019" },
...     ]
...   }
... )
{
        "ok" : 1,
        "operationTime" : Timestamp(1534976343, 1),
        "$gleStats" : {
                "lastOpTime" : Timestamp(1534976343, 1),
                "electionId" : ObjectId("000000000000000000000000")
        },
        "lastCommittedOpTime" : Timestamp(0, 0),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1534976343, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
configset:SECONDARY>

You’ll notice that I had to specify the port in the host field, but otherwise I’ve just established a regular replica set called configset.

Shard Servers Configured

Now we need to configure a place for our data to actually live (aka the shard servers). You can store shards either in replica sets or on standalone database servers. The latter provides the simplest setup (no additional replica set) but has the drawback of losing the availability guarantees that a replica set provides. For example, you can establish a replica set spanning two data centers, with config replicas and shard replicas standing by should the primaries in the other data center fail. In general, if you’re at the point where you need to worry about optimizing I/O you should probably also be worried about availability, so standalone shard servers should be considered a minority use case.

We now need to establish an application replica set. To that end let’s use this for /etc/mongod.conf:

storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true

systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

processManagement:
  timeZoneInfo: /usr/share/zoneinfo

sharding:
  clusterRole: shardsvr

replication:
  replSetName: appcluster

net:
  bindIp: 0.0.0.0

The only new bit in that configuration should be the sharding section, which specifies the clusterRole as shardsvr. Since it’s a shard server (and again no explicit port is set) it moves from the default port of 27017 to the shard server default of 27018, so we’ll have to enter the mongo shell by executing mongo --port 27018. We then bootstrap the new replica set just like the last one:

> rs.initiate(
...   {
...     _id : "appcluster",
...     members: [
...       { _id : 0, host : "db02:27018" },
...       { _id : 1, host : "db03:27018" }
...     ]
...   }
... )
{
        "ok" : 1,
        "operationTime" : Timestamp(1535062208, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535062208, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
appcluster:SECONDARY>

At which point both servers will be ready to receive data from a mongos frontend.

Database Routing Configured

Now that we have the shard replica set available to receive data and a configuration replica set to coordinate the process, the only thing left is a mongos instance to tie the two together into a complete solution. As mentioned before, we’ll be putting mongos on the same server as our single-member configuration replica set (on db01 specifically), but in production these would likely be run separately, either on their own VMs or in their own containers.

The mongos executable should be available from our server installation, but there’s no systemd service available to start it on our db01 VM and there’s no configuration file present either.

Luckily the configuration for mongos is pretty straightforward: you just need to point it at the configuration server and make sure it’s listening on an IP where clients have access. An example configuration might be:

sharding:
  configDB: configset/localhost:27019

net:
  bindIp: 0.0.0.0

This just points it to the configuration server, which is where it will pull all other configuration data from. Next we add a systemd service since this will be running on a VM. I created a file at /etc/systemd/system/mongos.service:

[Unit]
Description=mongos Database Router
After=mongod.service
Documentation=https://docs.mongodb.org/manual

[Service]
User=mongodb
Group=mongodb
ExecStart=/usr/bin/mongos --config /etc/mongos.conf
PIDFile=/var/run/mongodb/mongos.pid

[Install]
WantedBy=multi-user.target

After that, issuing systemctl daemon-reload should make systemd aware of the new service, and we can then enable and start it:

root@db01:~# systemctl daemon-reload
root@db01:~# systemctl enable mongos
Created symlink /etc/systemd/system/multi-user.target.wants/mongos.service → /etc/systemd/system/mongos.service.
root@db01:~# systemctl start mongos
root@db01:~# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:27017           0.0.0.0:*               LISTEN      1348/mongos         
tcp        0      0 0.0.0.0:27019           0.0.0.0:*               LISTEN      606/mongod          
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      551/systemd-resolve 
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      698/sshd            
tcp6       0      0 :::22                   :::*                    LISTEN      698/sshd

From the above you can see that mongos is now listening on the default MongoDB port of 27017 and ready to accept connections.

Shards Added to Configuration Server

Now that all the components are installed and running, with mongos pointed at the configuration server, we need to update the configuration server to point at the shard servers, which will finally bring all this together. Connect to mongos with the plain mongo shell (it’s listening on the default port of 27017) and add the shard:

mongos> sh.addShard( "appcluster/db03:27018")
{
        "shardAdded" : "appcluster",
        "ok" : 1,
        "operationTime" : Timestamp(1535063301, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535063301, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}

From the above, even though db02 was the current primary in my cluster I gave it the db03 server to illustrate that it will automatically locate the primary for us. Now that we have a sharded cluster and a mongos gateway, let’s create a sharded database with some data to finally make this real:

mongos> sh.enableSharding("rascaldev")
{
        "ok" : 1,
        "operationTime" : Timestamp(1535063507, 3),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535063507, 3),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> use rascaldev
switched to db rascaldev
mongos> db.createCollection("testCollection")
{
        "ok" : 1,
        "operationTime" : Timestamp(1535063544, 2),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535063544, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.testCollection.insertOne({ _name: "something1" })
{
        "acknowledged" : true,
        "insertedId" : ObjectId("5b7f3614f57f15fdc41d8541")
}
mongos> db.testCollection.insertOne({ _name: "something2" })
{
        "acknowledged" : true,
        "insertedId" : ObjectId("5b7f3618f57f15fdc41d8542")
}
mongos> db.testCollection.insertOne({ _name: "something3" })
{
        "acknowledged" : true,
        "insertedId" : ObjectId("5b7f361af57f15fdc41d8543")
}
mongos> db.testCollection.ensureIndex({_name: "hashed"})
{
        "raw" : {
                "appcluster/db02:27018,db03:27018" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 1,
                        "numIndexesAfter" : 2,
                        "ok" : 1
                }
        },
        "ok" : 1,
        "operationTime" : Timestamp(1535064626, 2),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535064626, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> sh.shardCollection("rascaldev.testCollection", { "_name": "hashed" } )
{
        "collectionsharded" : "rascaldev.testCollection",
        "collectionUUID" : UUID("1acfe577-c0f1-4cd8-9c89-bd638a4997b2"),
        "ok" : 1,
        "operationTime" : Timestamp(1535064653, 11),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535064653, 11),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}

My actions above should be pretty obvious if you already know how to use MongoDB:

  1. I enabled sharding on a database called rascaldev and created a collection in it called testCollection.
  2. I inserted arbitrary data so we had something to shard.
  3. I then created an index on the one field the documents in the collection have (named _name); this index stores a hash for each document’s value of that field.
  4. I then used sh.shardCollection to shard this existing collection based on the hashes present in the _name index. When new shards are added these hashes will be assigned to a particular shard, with a record of the assignment kept on the configuration server.

So we now have a sharded database. The replica set on the shard servers will ensure database availability, and the mongos service will do the work of routing each query to whichever shard holds the relevant data. If we need the gateway itself to be highly available, we should be able to frontend several mongos instances with a standard load balancer.
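
Alternatively, most drivers can be pointed at several mongos routers directly and will spread requests across them without an external load balancer. A rough PyMongo sketch, where the second hostname (db04) is hypothetical since only db01 runs mongos in this walkthrough:

from pymongo import MongoClient

# Listing more than one mongos router in the URI lets the driver pick a
# healthy router and fail over if one becomes unreachable.
client = MongoClient("mongodb://db01:27017,db04:27017/")

# Queries are sent through whichever mongos the driver selected.
print(client.rascaldev.testCollection.count_documents({}))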

To view the shards configured in a particular cluster, you can use sh.status(). For example:

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("5b7de15bb1547fbbe057cfee")
  }
  shards:
        {  "_id" : "appcluster",  "host" : "appcluster/db02:27018,db03:27018",  "state" : 1 }
  active mongoses:
        "4.0.1" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                appcluster      1
                        { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : appcluster Timestamp(1, 0) 
        {  "_id" : "rascaldev",  "primary" : "appcluster",  "partitioned" : true,  "version" : {  "uuid" : UUID("19df1ed8-0248-4e37-b7fb-27925bdfe4ec"),  "lastMod" : 1 } }
                rascaldev.testCollection
                        shard key: { "_name" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                appcluster      1
                        { "_name" : { "$minKey" : 1 } } -->> { "_name" : { "$maxKey" : 1 } } on : appcluster Timestamp(1, 0)

Performance

Long Running Query Killed

Depending on application quality and unforeseeable events, you may run into a situation where a collection is locked or for some other reason you need to kill a long-running operation. To do that you can use db.currentOp().inprog:

testcluster:PRIMARY> db.currentOp().inprog.forEach(function(oper) {                                                                                                                            
...   if(oper.secs_running > 5)                                                                                                                                                                
...     print(oper.secs_running, oper.opid);                                                                                                                                                   
... })                                                                                                                                                                                         
NumberLong(363) 3040                                                                                                                                                                           
testcluster:PRIMARY> db.currentOp().inprog.forEach(function(oper) {                                                                                                                            
...   if(oper.opid == 3040)                                                                                                                                                                    
...     printjson(oper);                                                                                                                                                                       
... })                                                                                                                                                                                         
{                                                                                                                                                                                              
        "host" : "db01:27017",                                                                                                                                                                 
        "desc" : "conn37",                                                                                                                                                                     
        "connectionId" : 37,                                                                                                                                                                   
        "client" : "127.0.0.1:45510",                                                                                                                                                          
        "appName" : "MongoDB Shell",  
[...snip...]
testcluster:PRIMARY> db.killOp(3040)
{
        "info" : "attempting to kill op",
        "ok" : 1,
        "operationTime" : Timestamp(1534957455, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1534957455, 1),
                "signature" : {
                        "hash" : BinData(0,"+BQ3Qow4kri3n+7T0ykreJiBo8E="),
                        "keyId" : NumberLong("6591896989649076225")
                }
        }
}

The output above is abbreviated at the [...snip...] marker, but it shows the normal workflow for long-running operations (a PyMongo version of the same workflow is sketched after the list):

  1. Locate which operations seem to be lasting a long time
  2. Examine each operation to determine if it can be safely killed or if there’s an obvious problem preventing it from completing (either successfully or not)
  3. If it isn’t possible for the operation to ever complete, you then use db.killOp(<opid>) to end its long and tragic life.
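
Driving the same workflow from application code relies on the currentOp and killOp administrative commands. A hedged PyMongo sketch, where the five-second threshold and the opid are just the illustrative values from the shell session above:

from pymongo import MongoClient

client = MongoClient("mongodb://db01:27017")

# Step 1: list in-progress operations and flag any running suspiciously long.
for op in client.admin.command("currentOp")["inprog"]:
    if op.get("secs_running", 0) > 5:
        print(op["secs_running"], op["opid"])

# Step 3: once you decide an operation will never complete, kill it by opid.
client.admin.command("killOp", op=3040)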

Sharding Strategies

Previously we discussed adding additional compute and storage capacity by sharding your data across several nodes. Let’s dig a little deeper. In the sharding example I presented before, I just used the “hashed” method of sharding without really explaining what it was.

In this method, to satisfy the index MongoDB generates a hash of the given field for each document. Then with { _name: 'hashed' } I’m telling MongoDB that data will be sharded according to the value of this hash. When the collection is sharded (or a new shard is added to the cluster) each shard is allocated a uniformly large “chunk” of the collection data according to the numeric value of the hash. For example: “hashes between this value and that value cause the document to go into chunk 1, which is on the db02 shard”, and so on.

Since this process is deterministic and splits documents according to their hash in a known way, it’s actually pretty trivial for mongos to determine which mongod instance houses the data when you go to update or retrieve it. If you’re matching on the indexed field, it just hashes your comparison value, identifies the mongod instance that would house a document with that hash, and queries only that node. In so doing it splits the load across multiple nodes, which is why sharding adds additional capacity to your cluster.

This presents another problem, though: different documents are going to be queried at different rates, and if you’re directing traffic based on a field hash then you could very well end up with a cluster of five mongod instances where two of the instances are being absolutely pegged and the other three are more or less cold storage. Since it’s hard to tell which field values will generate hashes that just happen to be near each other, it would be helpful to have a means of sharding according to data that you can easily see. This is where range sharding comes into play.

The process for using ranged sharding is almost exactly the same, except that when you get to the shardCollection stage you execute something similar to:

mongos> sh.shardCollection("<database>.<collection>", { <shard key> : <direction> } )

Where <direction> is either 1 or -1 depending on which direction you want the documents ordered in.
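
For instance, here is a hedged sketch of range-sharding a hypothetical rascaldev.orders collection on a createdAt field, issued through mongos via PyMongo (the collection and field names are made up for illustration):

from pymongo import MongoClient

client = MongoClient("mongodb://db01:27017")  # connect through mongos

# Sharding has to be enabled on the database before any of its collections
# can be sharded (harmless if it was already enabled earlier).
client.admin.command("enableSharding", "rascaldev")

# Range-shard the collection on createdAt in ascending order. For a
# non-empty collection an index on the shard key must already exist.
client.admin.command("shardCollection", "rascaldev.orders",
                     key={"createdAt": 1})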

There’s obviously more that can go into sharding but in the interests of keeping this the “Clustering 101” (like the title states) I’ll just mention this possibility and move on.

Brief Note About Reading/Writing To a Replica Set

By its nature a MongoDB replica set is what’s referred to as an “active-passive” cluster, meaning there is one node actively servicing requests while the others sit on standby, ready to take over should the current primary node go out of commission. Per the official documentation the primary is supposed to receive all traffic, with no traffic going to the secondary nodes unless you have a particular reason (such as explicitly opting into secondary reads with a read preference).
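
If you do want to allow reads from secondaries, the usual mechanism is a read preference set by the client rather than anything configured on the servers. A hedged PyMongo sketch against the testcluster replica set from earlier (the document it fetches is the one inserted during the verification step, which lives in the shell’s default test database):

from pymongo import MongoClient

# secondaryPreferred sends reads to a secondary when one is available,
# while writes always go to the primary.
client = MongoClient(
    "mongodb://db01:27017,db02:27017,db03:27017"
    "/?replicaSet=testcluster&readPreference=secondaryPreferred"
)

print(client.test.inventory.find_one({"item": "canvas"}))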

Additional Security

Cluster Components Secured With Key Files

Before we worry about authenticating users we need to secure the cluster components themselves. Going purely by defaults, the only real requirement for an attacker to join a MongoDB cluster is knowing what the cluster name should be, and you can get that from just snooping on the packets. Enter keyfile-based authentication. With key files, every member of the cluster presents the same shared secret: a string of non-whitespace characters (from the base64-compatible character classes).

For example, on each node we can issue the following commands:

root@db02:~# echo praise | base64 > /etc/mongod.key 
root@db02:~# chown mongodb /etc/mongod.key 
root@db02:~# chmod 0400 /etc/mongod.key 
root@db02:~#

This creates a new key based on the base64 value of the word praise, deposits it at /etc/mongod.key, and then secures it so that it is readable only by the mongodb user with no other users or permissions allowed on the file. The same key file must be present on every member of the cluster. (A key derived from a single word is fine for a demo; for anything real, generate a longer random key, for example with openssl rand -base64 756 as the MongoDB documentation suggests.)

We can then configure MongoDB to use the key file by adding security.keyFile to the /etc/mongod.conf configuration file:

security:
  keyFile: /etc/mongod.key

After restarting each of these nodes, this key will then be used to form a valid response to a one-time challenge, thus proving it’s supposed to join the cluster it’s targeting.

User Security In A Sharded Cluster

Now that the components have been secured we can progress to user security. This may seem like counterintuitive ordering, but it’s required by the security model MongoDB follows: you control which hosts can join the cluster, and only then may you begin enabling user access controls. This is because in MongoDB “authorization” is for everyone, including the servers themselves.

If we enabled user access control at this point we’d be locked out of administrative functions, since we haven’t created an account to authenticate as and the unrestricted access we currently enjoy will go away once user access control is enabled. So, taking the cluster as it stands at the end of the last section, let’s first create a new administrative user:

mongos> db.createUser(                                                                                                                                                                         
...   {                                                                                                                                                                                        
...     user: "rascaldev",                                                                                                                                                                     
...     pwd: "password",                                                                                                                                                                       
...     roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]                                                                                                                               
...   }                                                                                                                                                                                        
... )                                                                                                                                                                                          
Successfully added user: {                                                                                                                                                                     
        "user" : "rascaldev",                                                                                                                                                                  
        "roles" : [                                                                                                                                                                            
                {                                                                                                                                                                              
                        "role" : "userAdminAnyDatabase",                                                                                                                                       
                        "db" : "admin"                                                                                                                                                         
                }                                                                                                                                                                              
        ]                                                                                                                                                                                      
}

Once this user exists in the cluster, we can enable access control by adding the following clause to /etc/mongod.conf for all mongod instances in our cluster:

security:
  keyFile: /etc/mongod.key
  authorization: enabled

This replaces any existing security section you have in place. Once every mongod has been restarted with the new configuration, let’s create two different users with access to two separate databases:

mongos> use rascaldev                                                                                                                                                                              
switched to db rascaldev 

mongos> db.createUser({
... user: "joel",
... pwd: "password",
... roles: [ "readWrite" ]
... })
Successfully added user: { "user" : "joel", "roles" : [ "dbAdmin" ] }

mongos> db.createCollection("testCollection")
{
        "ok" : 1,
        "operationTime" : Timestamp(1536000699, 8),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1536000699, 8),
                "signature" : {
                        "hash" : BinData(0,"sObI408ALwg8KjbsjiUSsNlf/kg="),
                        "keyId" : NumberLong("6597041394102042629")
                }
        }
}

mongos> use newDB                                                                                                                                                                              
switched to db newDB 

mongos> db.createUser({
... user: "otherUser",
... pwd: "password",
... roles: [ "readWrite" ]
... })
Successfully added user: { "user" : "otherUser", "roles" : [ "readWrite" ] }

The first user, joel, functions as a dbAdmin and has readWrite access to the rascaldev database, while the second, otherUser, merely has readWrite privileges on a separate newDB database. Both users hold their roles only for the database we were in when we created them.

A full description of MongoDB access controls is probably out of scope for this article (which is centered on clustering) but in general the built-in RBAC should suffice for most people.
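
From application code, authenticating as one of these users looks something like the following hedged PyMongo sketch. Note that authSource has to name the database the user was created in:

from pymongo import MongoClient

# The joel user was created while "use rascaldev" was active, so rascaldev
# is both the authentication database and the database we read and write.
client = MongoClient("mongodb://joel:password@db01:27017/?authSource=rascaldev")

db = client.rascaldev
db.testCollection.insert_one({"created_by": "joel"})
print(db.testCollection.count_documents({}))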

Client Connections Secured With x509 Certificates

OK, so now we’ve established some level of control over which servers can join the cluster and how much access each user has. We now need to put more controls in place to ensure the medium they’re communicating over (i.e. the network) can be trusted. To accomplish this we can use MongoDB’s native TLS/SSL support, and communication with mongos will be secured in the same manner as a web page is when you go to make a purchase or manage your bank account.

First, let’s secure the mongos instance frontending our database cluster by modifying the net section in /etc/mongos.conf to be the following:

net:
  ssl:
    mode: allowSSL
    PEMKeyFile: /etc/ssl/mongodb.pem

In the above we’re basically doing two things:

  1. Setting the SSL mode to allowSSL permits client connections to use SSL if they request it without requiring cluster components to use SSL. Since we haven’t configured the mongod instances yet we need this setting.
  2. We’re pointing to a PEM encoded certificate paired with PEM encoded private key at /etc/ssl/mongodb.pem

Before restarting mongos, though, we need to create the referenced .pem file. You can create this in a variety of ways, but one way is to generate a self-signed certificate:

root@db01:~# openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:4096 -keyout mongodb-cert.key -out mongodb-cert.crt \
> -subj '/C=US/ST=NorthCarolina/L=Durham/O=rascaldev/OU=FeverDream/CN=db01/emailAddress=rhjoel84@gmail.com'
Generating a 4096 bit RSA private key
...............................................................................................................................................................................................
.............................................................................................................................++
........................................++
writing new private key to 'mongodb-cert.key'
-----
root@db01:~# cat mongodb-cert.key mongodb-cert.crt > /etc/ssl/mongodb.pem 
root@db01:~#

This certificate will then be delivered to clients attempting to connect using SSL. Since it’s client visible you may choose to use an actual certificate authority so that clients are able to validate it. If you’re load balancing amongst several different mongos instances it may be desirable to make sure it’s the same certificate for each instance with the “common name” set to the VIP’s hostname.

Once all components in the cluster have an SSL certificate available, you’ll probably want to change the SSL mode from allowSSL to preferSSL which will cause internal communication to cut over to SSL as well.
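
From the client’s perspective, connecting over TLS is just a matter of a few extra connection options. A hedged PyMongo sketch; recent driver versions use the tls* option names shown here, older 3.x releases spell them ssl/ssl_ca_certs, and the allow-invalid flag is only appropriate while testing a self-signed certificate:

from pymongo import MongoClient

client = MongoClient(
    "mongodb://db01:27017/",
    tls=True,                          # ask for an encrypted connection
    tlsAllowInvalidCertificates=True,  # only because the certificate above is self-signed
)
# With a certificate signed by a public or private CA you would instead pass
# tlsCAFile="/path/to/ca.pem" and drop the allow-invalid flag.

print(client.admin.command("ping"))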

Cluster Components Secured With x509 Certificates

OK, so controlling who can join the cluster with a pre-shared key is nice and all, but it would be better if each component in the cluster didn’t rely on identical credentials. Identical credentials create two situations that are undesirable from a security standpoint:

  1. A pre-shared key can be used an arbitrary number of times, meaning that if one server is compromised, the credentials required for joining 1,000 other servers to the cluster have also been obtained. In that scenario, if an attacker is able to establish a mongod instance on your network, they can join the cluster and mine any data replicated to it.
  2. It’s impossible to revoke a particular node’s access without rotating the shared key for the entire cluster, which is an operation with a high likelihood of causing downtime.

Neither of which is good. Enter SSL-based authentication of cluster components. With MongoDB’s native SSL support you can not only require SSL for communication, like we did in the previous section, but also validate that your peers are indeed who they say they are. The way you do this is by operating a private certificate authority and issuing certificates to the individual cluster members.

Running a private CA is a bit out of scope for this article (and I’ve covered it elsewhere), but for each server in the cluster let’s assume you have the certificate files (CA certificate, public PEM certificate, and private PEM key) all saved under /etc/ssl/mongodbCerts, for example in this directory structure:

root@db01:/etc/ssl/mongodbCerts# find . -type f
./db01/db01.key
./db01/db01.pem
./ca/ca.key
./ca/ca.pem
./db02/db02.key
./db02/db02.pem
./db03/db03.key
./db03/db03.pem

For clarity let’s assume this directory structure exists on all nodes in my sharded cluster and that all .pem files have their respective private keys appended to the end. Once the certs are present on each system, you would first edit /etc/mongos.conf to be something similar to:

sharding:
  configDB: configset/db01:27019

security:
  clusterAuthMode: x509
  keyFile: /etc/mongod.key

net:
  ssl:
    mode: preferSSL
    PEMKeyFile: /etc/ssl/mongodb.pem
    CAFile: /etc/ssl/mongodbCerts/ca/ca.pem
    clusterFile: /etc/ssl/mongodbCerts/db01/db01.pem

Once the mongos on db01 is reconfigured, we can do the same for the configuration server on that VM by editing its /etc/mongod.conf with a similar configuration. For instance, the config server that’s also on db01 could be:

storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true

systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

sharding:
  clusterRole: configsvr

security:
  clusterAuthMode: x509

net:
  bindIp: 0.0.0.0
  ssl:
    mode: requireSSL
    PEMKeyFile: /etc/ssl/mongodb.pem
    clusterFile: /etc/ssl/mongodbCerts/db01/db01.pem
    CAFile: /etc/ssl/mongodbCerts/ca/ca.pem

replication:
  replSetName: configset

processManagement:
  timeZoneInfo: /usr/share/zoneinfo

Breaking down the changes:

  • In security we’ve gotten rid of the keyFile definition and have instead changed clusterAuthMode from its default of keyFile to x509, which instructs mongod to authenticate other cluster members by their x509 certificates rather than by the shared key.
  • Compared to the mongos configuration we’ve changed some stuff in net.ssl as well:
    • The SSL mode has been changed from preferSSL (where SSL is only required for server components) to requireSSL, which forces all connections, even those from supposed clients, to provide a valid SSL certificate so that each end of the connection successfully authenticates to the other. This appears to be done on a hostname basis, so testing client access to the mongos instance running on db01 should happen either from your host system or from a VM unrelated to the cluster.
    • We provide a PEMKeyFile certificate that will be used for encrypting transport traffic. This file should be the certificate followed by the PEM encoded private key.
    • We then provide a separate clusterFile certificate which is used for the initial authentication when a system is attempting to join the cluster. Without this line, MongoDB would just use PEMKeyFile for cluster authentication instead. This file should be the certificate followed by the PEM encoded private key.
    • For SSL authentication to work, we need to be able to verify that the certificate the other side presents was signed by an authority we trust; that’s what CAFile provides. Both of the certificates in my example are signed by the same CA, but this needn’t be the case.

Your actual shard servers will have similar net and security section changes, each pointing to its respective certificate files (which, you guessed it, should be the certificate followed by the PEM encoded private key).

At this point your cluster should be sharded and highly available, with the cluster components secured and client communication encrypted.

Further Reading