Overview
MongoDB is a NoSQL database that focuses on collections and documents rather than the tables and rows of traditional relational databases. There are several reasons you might be interested in running MongoDB "at-scale," as the kids say:
- Given its JavaScript-style document notation, it significantly reduces the overhead of the Object Relational Mapping (ORM) you'd otherwise build at the language level, especially if you use a dynamic language such as Node.js or Python.
- Native clustering and sharding in MongoDB are also pretty trivial to set up when compared to RDBMS systems, so if you need either of those behaviors MongoDB provides an easy way to accomplish that end.
- The schema-less structure of MongoDB also gets programmers out of forcing their data into a relational representation, letting them store and query it using a model that's more intuitive to them (i.e. no table joins).
- MongoDB performance is actually pretty good when compared to more traditional RDBMS systems.
For an example of how intuitive MongoDB can be from application code, take this snippet which uses PyMongo:
from pymongo import MongoClient

connection = MongoClient('mongodb://joel:password@db01')
inventory = connection.joeldb.inventory
item = {"author": "Joel"}
newInsert = inventory.insert_one(item).inserted_id
print("New Insert: %s" % newInsert)
Which is pretty neat. With a tiny bit of thought given to object and variable names, your code could likely be read by someone with little to no actual Python or MongoDB knowledge.
Just a few caveats to MongoDB clustering:
- Due to the lack of multi-master cluster replication (such as exists with MySQL Galera clusters), several issues are introduced:
- Your application should be primarily read-heavy. This is because reads are the only operation all nodes in the cluster will be able to perform and therefore the only workload that can be safely spread across the cluster by a load balancer.
- You need to have enough control over the application to ensure that read operations and write/update operations go to different servers or VIPs (a driver-level sketch of this follows the list).
- You’ll have to reconfigure MongoDB priority for members so that they match the load balancer’s priorities. This could be problematic (but solvable) if you need to be able to spin up arbitrary numbers of MongoDB instances and then scale them down when the workload subsides.
- In recent history MongoDB has had a flurry of security issues that has resulted in a somewhat earned reputation for lackluster security controls. This is still widely considered true even though most of the security flaws have been addressed in recent years (many people take a while to get over something like that).
- RDBMSs have a much longer history and better ACID compliance (though this could be changing soon). If your application relies on many reads and has high standards for data consistency, you may not be able to get around an RDBMS, or at the very least MongoDB may not be the NoSQL DBMS for you.
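To make the read/write routing point concrete, here's a minimal PyMongo sketch (the hostnames, credentials, and testcluster name are placeholders matching the cluster built later in this article) showing how a driver-level read preference can keep writes on the primary while letting secondaries serve reads, rather than leaning on a load balancer:

```python
from pymongo import MongoClient

# Placeholder hosts/credentials; secondaryPreferred sends reads to a
# secondary when one is available, while writes always hit the primary.
client = MongoClient(
    "mongodb://joel:password@db01:27017,db02:27017,db03:27017/"
    "?replicaSet=testcluster&readPreference=secondaryPreferred"
)

inventory = client.joeldb.inventory
inventory.insert_one({"author": "Joel"})       # routed to the primary
print(inventory.find_one({"author": "Joel"}))  # may be served by a secondary
```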
Given the above I leave the question to the reader of this article whether or not MongoDB makes sense in their environment and for their project.
Establishing The Cluster
Installation of MongoDB
OK, let's assume we have three Ubuntu 18.04 VMs, each with 1GB of memory, two CPU cores and 20GB of storage. Not exactly killing it in the resource department, but that's more than enough to run a three node cluster. All that's required beyond those hardware requirements is that any hostnames you plan on using be resolvable on each node.
First things first, on each node trust MongoDB’s GPG key and configure the upstream MongoDB repository:
root@db01:~# apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
Executing: /tmp/apt-key-gpghome.Vuyi6iGPjw/gpg.1.sh --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
gpg: key 68818C72E52529D4: public key "MongoDB 4.0 Release Signing Key <packaging@mongodb.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1
root@db01:~# echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list
deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 multiverse
root@db01:~# apt-get update
[...snip...]
Ign:4 https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 InRelease
Get:5 https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 Release [2,029 B]
Get:6 https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 Release.gpg [801 B]
[...snip...]
Get:17 https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0/multiverse amd64 Packages [2,129 B]
[...snip...]
Fetched 1,457 kB in 5s (303 kB/s)
Reading package lists... Done
Next install the mongodb-org package:
apt-get install -y mongodb-org
We now have to change the contents of /etc/mongod.conf to something similar to:
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
net:
  bindIp: 0.0.0.0
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
replication:
  replSetName: testcluster
Pointing out two important bits from the configuration:
- I've changed bindIp from its default of 127.0.0.1 so that it will listen on all available interfaces for TCP connections from the other nodes.
- I've set replication.replSetName so that each node will see itself as belonging to a cluster called testcluster.
Once the configuration is updated, restart the database with systemctl restart mongod.
Replica Set Configured
Now that mongod is running on each node and each node can reach the others over the network, we need to bootstrap the actual cluster. We do this by going into the mongo shell (which has no authentication by default) and invoking rs.initiate(), enumerating all the members of the cluster:
root@db01:~# mongo
MongoDB shell version v4.0.1
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.1
Server has startup warnings:
2018-08-20T20:01:47.600+0000 I STORAGE  [initandlisten]
2018-08-20T20:01:47.600+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-08-20T20:01:47.600+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-08-20T20:01:50.531+0000 I CONTROL  [initandlisten]
2018-08-20T20:01:50.531+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-08-20T20:01:50.531+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-08-20T20:01:50.531+0000 I CONTROL  [initandlisten]
---
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).

The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.

To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---
> rs.initiate(
...    {
...       _id : "testcluster",
...       members: [
...          { _id : 0, host : "db01:27017" },
...          { _id : 1, host : "db02:27017" },
...          { _id : 2, host : "db03:27017" }
...       ]
...    }
... )
{
        "ok" : 1,
        "operationTime" : Timestamp(1534795606, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1534795606, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
Judging from the ok field in the JSON output being set to 1, our cluster has bootstrapped successfully. After enough time for the election to take place, the mongo prompt should change from a simple > to one that reflects both the cluster name and this node's status within the cluster, such as testcluster:PRIMARY>.
Congratulations, you're now the proud owner of a MongoDB cluster. You can verify by inserting a document on the primary node and then pulling it up on a secondary node:
testcluster:PRIMARY> db.inventory.insertOne(
...    { item: "canvas", qty: 100, tags: ["cotton"], size: { h: 28, w: 35.5, uom: "cm" } }
... )
{
        "acknowledged" : true,
        "insertedId" : ObjectId("5b7d8bb38a7ce17ba8e7b3b2")
}
[...]
testcluster:SECONDARY> rs.slaveOk()
testcluster:SECONDARY> db.inventory.find( {} )
{ "_id" : ObjectId("5b7b372cfdbae13d4397081a"), "item" : "canvas1", "qty" : 100, "tags" : [ "cotton" ], "size" : { "h" : 28, "w" : 35.5, "uom" : "cm" } }
{ "_id" : ObjectId("5b7b3747a19159b98f1b15b3"), "item" : "canvas1", "qty" : 100, "tags" : [ "cotton" ], "size" : { "h" : 28, "w" : 35.5, "uom" : "cm" } }
{ "_id" : ObjectId("5b7d8bb38a7ce17ba8e7b3b2"), "item" : "canvas", "qty" : 100, "tags" : [ "cotton" ], "size" : { "h" : 28, "w" : 35.5, "uom" : "cm" } }
Please note that in the above I had to use rs.slaveOk() to enable the secondary node to read data.
You can also enumerate the members of the cluster using db.serverStatus().repl.hosts:
testcluster:SECONDARY> db.serverStatus().repl.hosts
[ "db01:27017", "db02:27017", "db03:27017" ]
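If you'd rather check the membership from application code than from the mongo shell, a quick PyMongo sketch (assuming the hostnames above and that authentication hasn't been enabled yet) can run the same sort of status query:

```python
from pymongo import MongoClient

# Connect using the replica set name we just initiated; the driver
# discovers the remaining members on its own.
client = MongoClient("mongodb://db01:27017/?replicaSet=testcluster")

status = client.admin.command("replSetGetStatus")
for member in status["members"]:
    print(member["name"], member["stateStr"])
```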
Sharding
Overview
Sharding is the means by which MongoDB attempts to solve the "scale out" problem with database capacity. There comes a time in the life of large applications when capacity needs to be added or removed dynamically without disrupting ongoing use of the product. Most database systems accomplish this by establishing a multi-master cluster where a load balancer such as haproxy or nginx distributes load amongst all the members of the cluster, and adding or removing capacity is as simple as adding or removing a node and updating the load balancer appropriately.
With MongoDB, though, the expectation is that additional capacity is added by creating new data partitions called "shards" that house separate tranches of data and can be queried in parallel with other shards as needed. The belief is that this improved concurrency will yield better scalability than multiple somewhat independent database servers replicating changes to each other, once the work of replication is taken into consideration. This is the same general architectural concept as Oracle GRID computing.
MongoDB sharding requires three components:
- The individual shard servers (the database servers that will house the actual data). These can be standalone servers or replica sets. In either case they're identical to their non-sharded analogs above, except they have a clusterRole of shardsvr assigned to indicate that these servers/sets store shards.
- A replica set storing only the distribution of shards; this is referred to as the "configuration server" component.
- One or more mongos components, which act as a gateway to the data stored on the shard servers, querying the configuration server as needed. If you need to use a load balancer as an ingress to a MongoDB cluster it's probably best to treat mongos as the application server and let it hide the complexity of the cluster behind it. In this guide, for simplicity, I'll be putting mongos on the same server as the configuration server; just know that this is illustrative of the architecture and not what you would do in production.
Configuration Replica Set Established
The /etc/mongod.conf configuration file:
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
net:
  bindIp: 0.0.0.0
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
sharding:
  clusterRole: configsvr
replication:
  replSetName: configset
Breaking down the new bits:
- We introduced a sharding section. This section is always very simple and in this case just specifies that this mongod instance will be the configuration server, meaning it will store the sharding metadata that mongos clients will later query to locate requested data.
- We changed the replSetName to configset just for clarity. It really can be anything, as long as it's unique and matches what the mongos clients will attempt to look for.
The change from regular database server to "configuration server" causes the port mongod listens on to change to 27019, so to begin bootstrapping the configuration replica set we enter the database shell via mongo --port 27019. Then we bootstrap the configuration replica set the same as any other:
> rs.initiate(
...   {
...     _id: "configset",
...     configsvr: true,
...     members: [
...       { _id : 0, host : "db01:27019" },
...     ]
...   }
... )
{
        "ok" : 1,
        "operationTime" : Timestamp(1534976343, 1),
        "$gleStats" : {
                "lastOpTime" : Timestamp(1534976343, 1),
                "electionId" : ObjectId("000000000000000000000000")
        },
        "lastCommittedOpTime" : Timestamp(0, 0),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1534976343, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
configset:SECONDARY>
You'll notice that I had to specify the port in the host field, but otherwise I've just established a regular replica set called configset.
Shard Servers Configured
Now we need to configure a place for our data to actually live (aka the shard servers). You can store shards either in replica sets or on standalone database servers. The latter provides the simplest setup (no additional replica set) but has the drawback of losing the availability guarantees a replica set provides. For example, you can establish a replica set spanning two data centers, with config replicas and shard replicas ready to take over should the primaries in the other data center fail. In general, if you're at the point where you need to worry about optimizing I/O you should probably also be worried about availability, so sharding onto standalone servers should be considered a minority use case.
We now need to establish an application replica set. To that end let's use this for /etc/mongod.conf:
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
sharding:
  clusterRole: shardsvr
replication:
  replSetName: appcluster
net:
  bindIp: 0.0.0.0
The only new bit in that configuration should be the sharding section, which specifies the clusterRole as shardsvr. Since it's a shard server it will move from listening on the default port of 27017 to port 27018, so we'll have to enter the mongo shell by executing mongo --port 27018. We then bootstrap the new replica set just like the last one:
> rs.initiate(
...   {
...     _id : "appcluster",
...     members: [
...       { _id : 0, host : "db02:27018" },
...       { _id : 1, host : "db03:27018" }
...     ]
...   }
... )
{
        "ok" : 1,
        "operationTime" : Timestamp(1535062208, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535062208, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
appcluster:SECONDARY>
At which point both servers will be ready to receive data from a mongos frontend.
Database Routing Configured
Now that we have a shard replica set available to receive data and a configuration replica set to coordinate the process, the only thing left is a mongos instance to tie the two together into a complete solution. As mentioned before, we'll be putting mongos on the same server as our single-member configuration replica set (on db01 specifically), but in production these would likely be run separately, either on their own VMs or in their own containers.
The mongos executable should be available from our server installation, but there's no systemd service available to start it on our db01 VM and there's no configuration file present either.
Luckily the configuration for mongos is pretty straightforward: you just need to point it to the configuration server and make sure it's listening on an IP where clients have access. An example configuration might be:
sharding:
  configDB: configset/localhost:27019
net:
  bindIp: 0.0.0.0
This just points mongos to the configuration server, which is where it will pull all other configuration data from. Next we add a systemd service since this will be running on a VM. I created a file at /etc/systemd/system/mongos.service:
[Unit]
Description=mongos Database Router
After=mongod.service
Documentation=https://docs.mongodb.org/manual

[Service]
User=mongodb
Group=mongodb
ExecStart=/usr/bin/mongos --config /etc/mongos.conf
PIDFile=/var/run/mongodb/mongos.pid

[Install]
WantedBy=multi-user.target
After that, issuing a systemctl daemon-reload should make systemd aware of the new service, and we can then enable the service and start it up:
root@db01:~# systemctl daemon-reload
root@db01:~# systemctl enable mongos
Created symlink /etc/systemd/system/multi-user.target.wants/mongos.service → /etc/systemd/system/mongos.service.
root@db01:~# systemctl start mongos
root@db01:~# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:27017           0.0.0.0:*               LISTEN      1348/mongos
tcp        0      0 0.0.0.0:27019           0.0.0.0:*               LISTEN      606/mongod
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      551/systemd-resolve
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      698/sshd
tcp6       0      0 :::22                   :::*                    LISTEN      698/sshd
From the above you can see that mongos is now listening on the default MongoDB port of 27017 and ready to accept connections.
Shards Added to Configuration Server
Now that all the components are installed and running, with mongos pointed at the configuration server, we need to update the configuration server to point to the shard server, which will finally bring all this together.
mongos> sh.addShard( "appcluster/db03:27018")
{
        "shardAdded" : "appcluster",
        "ok" : 1,
        "operationTime" : Timestamp(1535063301, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535063301, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
In the above, even though db02 was the current primary in my cluster, I gave it the db03 server to illustrate that it will automatically locate the primary for us. Now that we have a sharded cluster and a mongos gateway, let's create a sharded database with some data to finally make this real:
mongos> sh.enableSharding("rascaldev")
{
        "ok" : 1,
        "operationTime" : Timestamp(1535063507, 3),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535063507, 3),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> use rascaldev
switched to db rascaldev
mongos> db.createCollection("testCollection")
{
        "ok" : 1,
        "operationTime" : Timestamp(1535063544, 2),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535063544, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.testCollection.insertOne({ _name: "something1" })
{
        "acknowledged" : true,
        "insertedId" : ObjectId("5b7f3614f57f15fdc41d8541")
}
mongos> db.testCollection.insertOne({ _name: "something2" })
{
        "acknowledged" : true,
        "insertedId" : ObjectId("5b7f3618f57f15fdc41d8542")
}
mongos> db.testCollection.insertOne({ _name: "something3" })
{
        "acknowledged" : true,
        "insertedId" : ObjectId("5b7f361af57f15fdc41d8543")
}
mongos> db.testCollection.ensureIndex({_name: "hashed"})
{
        "raw" : {
                "appcluster/db02:27018,db03:27018" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 1,
                        "numIndexesAfter" : 2,
                        "ok" : 1
                }
        },
        "ok" : 1,
        "operationTime" : Timestamp(1535064626, 2),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535064626, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> sh.shardCollection("rascaldev.testCollection", { "_name": "hashed" } )
{
        "collectionsharded" : "rascaldev.testCollection",
        "collectionUUID" : UUID("1acfe577-c0f1-4cd8-9c89-bd638a4997b2"),
        "ok" : 1,
        "operationTime" : Timestamp(1535064653, 11),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1535064653, 11),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
My actions above should be pretty obvious if you already know how to use MongoDB:
- I created a database called rascaldev and a collection called testCollection, enabling sharding on the database.
- I inserted arbitrary data so we had something to shard.
- I then created an index on the one field the documents in the collection have (named _name); this index will generate a hash for each document.
- I then used sh.shardCollection to shard this existing collection based on the hash present in the _name index. When new shards are added, these hashes will be assigned to a particular shard, with a record of the assignment made in the configuration server.
So we now have a sharded database. The replica set on the shard servers will ensure database availability, and the mongos service will do the work of routing each query to whichever shard holds the relevant data. If we need the gateway to be highly available, we should be able to frontend several mongos instances with a standard load balancer (see the sketch below).
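As an aside on that last point, most drivers will also accept several mongos hosts in a single connection string and fail over between them, which can stand in for a dedicated load balancer. A rough PyMongo sketch (db04 is a hypothetical second mongos host; this guide only runs one, on db01):

```python
from pymongo import MongoClient

# db01 runs the mongos from this guide; db04 is a hypothetical second
# mongos added for availability. The driver fails over between them.
client = MongoClient("mongodb://db01:27017,db04:27017/")

print(client.rascaldev.testCollection.find_one({"_name": "something1"}))
```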
To view the shards configured in a particular cluster, you can use sh.status(). For example:
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("5b7de15bb1547fbbe057cfee")
  }
  shards:
        {  "_id" : "appcluster",  "host" : "appcluster/db02:27018,db03:27018",  "state" : 1 }
  active mongoses:
        "4.0.1" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                appcluster      1
                        { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : appcluster Timestamp(1, 0)
        {  "_id" : "rascaldev",  "primary" : "appcluster",  "partitioned" : true,  "version" : {  "uuid" : UUID("19df1ed8-0248-4e37-b7fb-27925bdfe4ec"),  "lastMod" : 1 } }
                rascaldev.testCollection
                        shard key: { "_name" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                appcluster      1
                        { "_name" : { "$minKey" : 1 } } -->> { "_name" : { "$maxKey" : 1 } } on : appcluster Timestamp(1, 0)
Performance
Long Running Query Killed
Depending on application quality and unforeseeable events, you may run into a situation where a collection is locked or for some other reason you need to kill a long-running operation. To do that you can use db.currentOp().inprog:
testcluster:PRIMARY> db.currentOp().inprog.forEach(function(oper) {
...   if(oper.secs_running > 5)
...     print(oper.secs_running, oper.opid);
... })
NumberLong(363) 3040
testcluster:PRIMARY> db.currentOp().inprog.forEach(function(oper) {
...   if(oper.opid == 3040)
...     printjson(oper);
... })
{
        "host" : "db01:27017",
        "desc" : "conn37",
        "connectionId" : 37,
        "client" : "127.0.0.1:45510",
        "appName" : "MongoDB Shell",
[...snip...]
testcluster:PRIMARY> db.killOp(3040)
{
        "info" : "attempting to kill op",
        "ok" : 1,
        "operationTime" : Timestamp(1534957455, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1534957455, 1),
                "signature" : {
                        "hash" : BinData(0,"+BQ3Qow4kri3n+7T0ykreJiBo8E="),
                        "keyId" : NumberLong("6591896989649076225")
                }
        }
}
The output above is abbreviated at the snip, but it shows the normal workflow for long-running operations:
- Locate which operations seem to be lasting a long time
- Examine each operation to determine if it can be safely killed or if there’s an obvious problem preventing it from completing (either successfully or not)
- If it isn't possible for the operation to ever complete, you then use db.killOp(<opid>) to end its long and tragic life.
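The same three-step workflow can also be driven from application code. Here's a rough PyMongo sketch (the 5-second threshold is arbitrary, and the killOp call is left commented out because it should only be run once you're sure the operation can never complete):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://db01:27017/")

# Step 1: find operations that have been running for a while.
inprog = client.admin.command("currentOp")["inprog"]
long_running = [op for op in inprog if op.get("secs_running", 0) > 5]

# Step 2: examine them before deciding anything.
for op in long_running:
    print(op["opid"], op.get("secs_running"), op.get("desc"))

# Step 3: kill a specific operation that will never complete.
# client.admin.command("killOp", 1, op=long_running[0]["opid"])
```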
Sharding Strategies
Previously we discussed adding additional compute and storage capacity by sharding your data across several nodes. Let’s dig a little deeper. In the sharding example I presented before, I just used the “hash” method of sharding without really explaining what it was.
In this method, to satisfy the index MongoDB generates a hash for the given field. Then with { _name: 'hashed' } I'm telling MongoDB that data will be sharded according to the value of this hash. When the collection is sharded (or a new shard is added to the cluster) each shard is allocated a uniformly large "chunk" of the collection data according to the numeric value of the hash. For example: "hashes between this and this value cause the document to go into chunk 1, which is on the db02 shard," etc.
Since this process is deterministic and splits documents according to their hash in a known way, it's actually pretty trivial for mongos to determine which mongod instance will house the data when you go to update or retrieve it. If you're matching on the indexed field, it just hashes your comparison value, identifies the mongod instance that would house a document with that hash, and queries only that node. In doing so it splits the load across multiple nodes, which is why sharding adds additional capacity to your cluster. A toy sketch of the idea follows.
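To make the routing idea concrete, here's a toy Python sketch. It is not MongoDB's actual hash function or chunk logic (MongoDB uses an md5-derived 64-bit hash and the config server's chunk map), just an illustration of why an equality match on the shard key can be sent to exactly one shard:

```python
import hashlib
from bisect import bisect_right

# Toy stand-in for a 64-bit shard-key hash; the exact function
# doesn't matter for the illustration.
def key_hash(value):
    return int(hashlib.md5(str(value).encode()).hexdigest()[:16], 16)

# Pretend chunk boundaries handed out by the config server: hashes
# below the first boundary live on shard0, and so on.
boundaries = [2**62, 2**63, 3 * 2**62]
shards = ["shard0", "shard1", "shard2", "shard3"]

def route(value):
    return shards[bisect_right(boundaries, key_hash(value))]

# An equality query on the shard key only ever touches one shard.
print(route("something1"))
print(route("something2"))
```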
Hash sharding presents another problem, though: different documents are going to be queried at different rates, and if you're directing traffic based on a field hash you could very well end up with a cluster of five mongod instances where two are being absolutely pegged and the other three are more or less cold storage. Since it's hard to tell which field values will generate hashes that happen to land near each other, it would be helpful to have a means of sharding according to data you can easily see. This is where range sharding comes into play.
The process for using ranged sharding is almost exactly the same, except when you get to the shardCollection stage you execute something similar to:
mongos> sh.shardCollection("<database>.<collection>", { <shard key> : <direction> } )
Where <direction> is either 1 or -1, depending on which direction you want the documents ordered in.
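For completeness, the same ranged shardCollection call can also be issued from a driver by running the underlying admin commands. A hedged PyMongo sketch (the rangedCollection name and qty shard key here are hypothetical):

```python
from pymongo import MongoClient

# Connect through mongos; sharding commands run against the admin DB.
client = MongoClient("mongodb://db01:27017/")

client.admin.command("enableSharding", "rascaldev")
# Range-shard a hypothetical collection on an ascending "qty" key.
client.admin.command(
    "shardCollection", "rascaldev.rangedCollection", key={"qty": 1}
)
```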
There’s obviously more that can go into sharding but in the interests of keeping this the “Clustering 101” (like the title states) I’ll just mention this possibility and move on.
Brief Note About Reading/Writing To a Replica Set
By its nature a MongoDB replica set is what's referred to as an "active-passive" cluster, meaning there is one node actively servicing requests while the others sit on standby, taking over should the current primary go out of commission. Per the official documentation, the primary is supposed to receive all traffic, with none going to the secondary nodes unless you're doing something particular.
Additional Security
Cluster Components Secured With Key Files
OK, so far anyone who can reach the cluster can talk to it: there's no user authentication yet and nothing stops an arbitrary server from joining. Let's start by securing the cluster components themselves. Going purely by defaults, the only real requirement for an attacker to join a MongoDB cluster is that they know what the cluster name should be, and you can get that just by snooping on the packets. Enter keyfile-based authentication. With key files, you supply a string of non-whitespace characters (from the base64-compatible character classes) that every member of the cluster must present in order to participate.
For example, on each node we can issue the following commands:
root@db02:~# echo praise | base64 > /etc/mongod.key
root@db02:~# chown mongodb /etc/mongod.key
root@db02:~# chmod 0400 /etc/mongod.key
root@db02:~#
This creates a new key based on the base64 value of the word praise, deposits it at /etc/mongod.key, and secures it to be read-only for the mongodb user, with no permissions granted to any other user.
We can then configure MongoDB to use the key file by adding security.keyFile to the /etc/mongod.conf configuration file:
security:
  keyFile: /etc/mongod.key
After restarting each of these nodes, this key will then be used to form a valid response to a one-time challenge, thus proving it’s supposed to join the cluster it’s targeting.
User Security In A Sharded Cluster
Now that the components have been secured we can progress to user security. This may seem like counterintuitive ordering, but it's required by the security model MongoDB follows: you control which hosts can join the cluster, and only then may you begin enabling user access controls. This is because in MongoDB "authorization" is for everyone, including the servers themselves.
If we enable user access control at this point, we'll be locked out of administrative functions, since we haven't created an account to authenticate with and our current unrestricted administrative access will go away once user access control is enabled. So, taking the cluster as it stands at the end of the last section, let's first create a new administrative user:
mongos> db.createUser(
...   {
...     user: "rascaldev",
...     pwd: "password",
...     roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
...   }
... )
Successfully added user: {
        "user" : "rascaldev",
        "roles" : [
                {
                        "role" : "userAdminAnyDatabase",
                        "db" : "admin"
                }
        ]
}
Once this user exists in the cluster, we can enable user access control by adding the following clause to /etc/mongod.conf for all mongod instances in our cluster:
security:
  keyFile: /etc/mongod.key
  authorization: enabled
This obviously replaces any existing security section you have in place. Now that authorization has been enabled, let's create two different users with access to two separate databases:
db.createUser(
  {
    user: "rascaldev",
    pwd: "password",
    roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
  }
)
mongos> use rascaldev
switched to db rascaldev
mongos> db.createUser({
...   user: "joel",
...   pwd: "password",
...   roles: [ "readWrite" ]
... })
Successfully added user: { "user" : "joel", "roles" : [ "dbAdmin" ] }
mongos> db.createCollection("testCollection")
{
        "ok" : 1,
        "operationTime" : Timestamp(1536000699, 8),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1536000699, 8),
                "signature" : {
                        "hash" : BinData(0,"sObI408ALwg8KjbsjiUSsNlf/kg="),
                        "keyId" : NumberLong("6597041394102042629")
                }
        }
}
mongos> use newDB
switched to db newDB
mongos> db.createUser({
...   user: "otherUser",
...   pwd: "password",
...   roles: [ "readWrite" ]
... })
Successfully added user: { "user" : "otherUser", "roles" : [ "readWrite" ] }
The first user is called joel and functions as a dbAdmin with readWrite access for the rascaldev database, while the second is otherUser, which merely has readWrite privileges on a separate newDB. Both of these users have readWrite only for the databases we were in when we created them.
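From the application side, each of these users then authenticates against the database they were created in (the authSource option below names that database). A quick PyMongo sketch using the joel account above:

```python
from pymongo import MongoClient

# joel was created in, and therefore authenticates against, rascaldev.
client = MongoClient(
    "mongodb://joel:password@db01:27017/rascaldev?authSource=rascaldev"
)

client.rascaldev.testCollection.insert_one({"_name": "something4"})
print(client.rascaldev.testCollection.count_documents({}))
```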
A full description of MongoDB access controls is probably out of scope for this article (which is centered on clustering) but in general the built-in RBAC should suffice for most people.
Client Connections Secured With x509 Certificates
OK, so now we've established some control over which servers can join the cluster and how much access each user has. We next need to put controls in place to ensure the medium they're communicating over (i.e. the network) can be trusted. To accomplish this we can use MongoDB's native TLS/SSL support, so that communication with mongos is secured in the same manner as a web page might be when you make a purchase or manage your bank account.
First, let's secure the mongos instance frontending our database cluster by modifying the net section in /etc/mongos.conf to be the following:
net:
  ssl:
    mode: allowSSL
    PEMKeyFile: /etc/ssl/mongodb.pem
In the above we’re basically doing two things:
- Setting the SSL mode to allowSSL permits client connections to use SSL if they request it, without requiring cluster components to use SSL. Since we haven't configured the mongod instances for SSL yet, we need this setting.
- We're pointing to a PEM-encoded certificate paired with a PEM-encoded private key at /etc/ssl/mongodb.pem.
Before restarting mongos, though, we need to create the referenced .pem file. You can create this a variety of ways, but one way would be generating a self-signed certificate:
root@db01:~# openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:4096 -keyout mongodb-cert.key -out mongodb-cert.crt \
> -subj '/C=US/ST=NorthCarolina/L=Durham/O=rascaldev/OU=FeverDream/CN=db01/emailAddress=rhjoel84@gmail.com'
Generating a 4096 bit RSA private key
...............................................................................................................................................................................................
.............................................................................................................................++
........................................++
writing new private key to 'mongodb-cert.key'
-----
root@db01:~# cat mongodb-cert.key mongodb-cert.crt > /etc/ssl/mongodb.pem
root@db01:~#
This certificate will be delivered to clients attempting to connect using SSL. Since it's client-visible, you may choose to use an actual certificate authority so that clients are able to validate it. If you're load balancing amongst several different mongos instances, it may be desirable to use the same certificate for each instance, with the "common name" set to the VIP's hostname.
Once all components in the cluster have an SSL certificate available, you'll probably want to change the SSL mode from allowSSL to preferSSL, which will cause internal communication to cut over to SSL as well.
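On the client side, drivers just need to be told to use TLS and what certificate to trust. A hedged PyMongo sketch using the 3.x-era option names (newer driver versions spell these tls/tlsCAFile), where the path to a local copy of the self-signed certificate is a placeholder:

```python
from pymongo import MongoClient

# ssl_ca_certs points at a copy of the self-signed certificate
# generated above (or your CA bundle if you used a real CA).
client = MongoClient(
    "mongodb://db01:27017/",
    ssl=True,
    ssl_ca_certs="/path/to/mongodb-cert.crt",
)

print(client.admin.command("ping"))
```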
Cluster Components Secured With x509 Certificates
OK, so controlling who can join the cluster with a pre-shared key is nice and all, but it would be better if each component in the cluster didn't require identical credentials. Identical credentials create two situations that are undesirable from a security standpoint:
- A pre-shared key can be used an arbitrary number of times, meaning that if one server is compromised, the credentials required to join 1,000 other servers to the cluster have also been obtained. In that scenario, if an attacker is able to establish a mongod instance on your network, they can join it to the cluster and mine any data replicated to it.
- It's impossible to revoke a particular node's access without updating the shared key for the entire cluster, which is an operation with a high likelihood of causing downtime.
Neither of which is good. Enter SSL-based authentication of cluster components. With MongoDB's native SSL support you can not only require SSL for communication, like we did in the previous section, but also validate that your peers are indeed who they say they are. The way you do this is by operating a private certificate authority and issuing certificates for the individual cluster members.
Running a private CA is a bit out of scope for this article (and I've covered it elsewhere), but for each server in the cluster let's assume you have the certificate files (CA, public PEM certificate, and private PEM key) all saved under /etc/ssl/mongodbCerts, in a directory structure like this:
root@db01:/etc/ssl/mongodbCerts# find . -type f
./db01/db01.key
./db01/db01.pem
./ca/ca.key
./ca/ca.pem
./db02/db02.key
./db02/db02.pem
./db03/db03.key
./db03/db03.pem
For clarity let's assume this directory structure exists on all nodes in my sharded cluster and that all .pem files have their respective private keys appended to the end. Once the certs are present on each system, you would first edit /etc/mongos.conf to be something similar to:
sharding:
  configDB: configset/db01:27019
security:
  clusterAuthMode: x509
  keyFile: /etc/mongod.key
net:
  ssl:
    mode: preferSSL
    PEMKeyFile: /etc/ssl/mongodb.pem
    CAFile: /etc/ssl/mongodbCerts/ca/ca.pem
    clusterFile: /etc/ssl/mongodbCerts/db01/db01.pem
Once the mongos on db01 is reconfigured, we can reconfigure the configuration server on the same VM by editing /etc/mongod.conf with a similar configuration. For instance, the config server that's also on db01 could be:
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
sharding:
  clusterRole: configsvr
security:
  clusterAuthMode: x509
net:
  bindIp: 0.0.0.0
  ssl:
    mode: requireSSL
    PEMKeyFile: /etc/ssl/mongodb.pem
    clusterFile: /etc/ssl/mongodbCerts/db01/db01.pem
    CAFile: /etc/ssl/mongodbCerts/ca/ca.pem
replication:
  replSetName: configset
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
Breaking down the changes:
- In security we've gotten rid of the keyFile definition and have instead set clusterAuthMode from its default of keyFile to x509, which instructs mongod to authenticate other cluster members by their x509 certificates.
- Compared to the mongos configuration we've changed some stuff in net.ssl as well:
  - The SSL mode has been changed from preferSSL (where SSL is only required for server components) to requireSSL, which forces all connections, even from supposed clients, to provide a valid SSL certificate so that each end of the connection successfully authenticates to the other. This appears to be done on a hostname basis, so testing client access to the mongos instance running on db01 should happen either from your host system or a VM unrelated to the cluster.
  - We provide a PEMKeyFile certificate that will be used for encrypting transport traffic. This file should be the certificate followed by the PEM-encoded private key.
  - We then provide a separate clusterFile certificate which is used for the initial authentication when a system is attempting to join the cluster. Without this line, MongoDB would just use PEMKeyFile for cluster authentication instead. This file should also be the certificate followed by the PEM-encoded private key.
  - For SSL authentication to work, we need to be able to verify the authenticity of the certificate the other side presents, which is what CAFile is for. Both certificates in my example are signed by the same CA, but this needn't be the case.
Your actual shard servers will have similar net and security section changes, each pointing to its respective certificate files (which, you guessed it, should be the certificate followed by the PEM-encoded private key).
At this point, your cluster should be sharded, highly available, with the cluster components secured and client communication successfully encrypted.