Code Exercise - Node.js, Couchbase, AWS

This is a coding exercise I did in February 2016 using Node.js, Couchbase and AWS. Here's the source code.

I chose Node because I'm familiar with it and it's fast and easy for simple web apps. I chose Couchbase because it, too, is fast, simple and very scalable, and because I did a similar project last year, so I could reuse a chunk of code, including the testing setup (mocha and istanbul).

Design issues:

Scalability and concurrency - I need a concurrency scheme to generate the reduced url (or Couchbase key). Technically, Couchbase 4.0 has transaction support, but it's new and I haven't used it, so I postponed it for now. I could use an additional SQL database, but that's a lot of extra complexity. So I chose a key domain subsetting scheme, which I first used twenty-five years ago for disconnected mobile clients. Each server generates keys only within its own subset of the key domain, so there are no locking/concurrency issues or key conflicts.

In this case, the first character of the key is assigned to a specific server as a key prefix, which guarantees that the server owns a subset of the key domain. I'm using one base-36 character, which means I could deploy up to 36 servers, each running independently, and I decided this was sufficient given the application. I decided to reserve the "T" space for test records, which removes a lot of testing issues (cleaning the database, etc).

The tradeoff is wasted key address space, but I have a ton of that since the Couchbase key length limit is 250 characters. The initial key assignment per server also has to be managed.
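The prefix scheme above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`makeKey`, `ownerOf`, `isTestKey`) are hypothetical, and the use of upper-case base-36 digits is an assumption made to match the "T" test prefix.

```javascript
// Hypothetical sketch: each server owns every key that starts with its prefix.
const TEST_PREFIX = 'T'; // reserved for test records, as described above

// Build a full key from the owning server's prefix and its local counter.
// Upper-case base-36 output is an assumption, not confirmed by the write-up.
function makeKey(prefix, counter) {
  if (!/^[0-9A-Z]$/.test(prefix)) {
    throw new Error('prefix must be a single base-36 character');
  }
  return prefix + counter.toString(36).toUpperCase();
}

// Report which key domain (i.e. server) a key belongs to.
function ownerOf(key) {
  return key.charAt(0);
}

// True when the key falls in the reserved test domain.
function isTestKey(key) {
  return ownerOf(key) === TEST_PREFIX;
}
```

Because two servers never share a prefix, `makeKey('A', n)` and `makeKey('B', n)` can never collide, whatever the counters are.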

I keep the last generated url in memory and write it to Couchbase as a backup in case the server crashes. I also separated the seed records into their own bucket to avoid naming conflicts, and the seed and its configuration are unique for each key domain (i.e. server).
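A rough sketch of the in-memory counter with a stored backup might look like this. The store below is an in-memory stand-in so the example runs on its own; the real code would read and write the seed bucket in Couchbase. All names (`createSeedStore`, `createKeyCounter`, the `seed::` id format) are hypothetical.

```javascript
// Stand-in for the seed bucket; the real code would call Couchbase instead.
function createSeedStore() {
  const docs = new Map();
  return {
    get: (id) => docs.get(id),
    upsert: (id, doc) => { docs.set(id, doc); },
  };
}

// Keeps the last generated value in memory and mirrors each change to the store.
function createKeyCounter(store, prefix) {
  const seedId = 'seed::' + prefix; // hypothetical id format, one seed per server
  // On startup, recover from the stored backup or fall back to the seed '0'.
  const saved = store.get(seedId);
  let last = saved ? saved.last : '0';
  return {
    next() {
      last = (parseInt(last, 36) + 1).toString(36).toUpperCase();
      store.upsert(seedId, { last: last }); // backup in case the server crashes
      return prefix + last;
    },
  };
}
```

Creating a second counter over the same store simulates a restart: it resumes from the last backed-up value rather than from the seed.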

Configuration - All configuration data is stored in a standard json file. I also stored the REST url strings there, so each one has a single point of definition, which avoids duplication bugs.
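As an illustration of the single point of definition, a configuration object could be shaped like the one below. The route strings are the ones named in this write-up; every other field name and value is a placeholder assumption, and `routeFor` is a hypothetical helper.

```javascript
// Hypothetical shape of the parsed JSON configuration; values are placeholders.
const config = {
  server: { port: 3000, keyPrefix: 'A' },
  couchbase: { urlBucket: 'urls', seedBucket: 'seeds' },
  routes: {
    seed: '/seed',
    reduce: '/reduce',
    findKeys: '/findKeys',
    config: '/config',
    redirect: '/url',
  },
};

// Server code and tests both look paths up here instead of repeating literals.
function routeFor(name) {
  const path = config.routes[name];
  if (!path) {
    throw new Error('unknown route: ' + name);
  }
  return path;
}
```

A typo in a route name then fails loudly in `routeFor` instead of silently producing a path that never matches.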

Reduced Url generation - I chose to increment the reduced url from a seed number ('0') using a base-36 scheme. I could have used base-62 (upper and lower case characters) to increase the number of possible urls, but as mentioned before, keys can already be up to 250 characters long, so I didn't see a need. It would be pretty simple to change.
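A base-36 increment can be sketched with JavaScript's built-in radix support. This is a minimal version under two assumptions: `nextKey` is a hypothetical name, and output is upper-case. It relies on `parseInt`, so it is only exact while the value stays within the safe integer range (roughly ten base-36 digits); longer keys would need a digit-by-digit increment.

```javascript
// Increment a base-36 string: '0' -> '1', '9' -> 'A', 'Z' -> '10'.
function nextKey(current) {
  if (!/^[0-9A-Za-z]+$/.test(current)) {
    throw new Error('not a base-36 string: ' + current);
  }
  const value = parseInt(current, 36);
  // Guard against silent precision loss on very long keys.
  if (!Number.isSafeInteger(value + 1)) {
    throw new Error('key too large for this simple sketch: ' + current);
  }
  return (value + 1).toString(36).toUpperCase();
}
```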

REST and js function - I separated the REST call and javascript code into two functions. This is easier for testing purposes, easier to debug and it's more flexible for re-use.
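The split described above might look something like this sketch. The names `reduceUrl` and `reduceHandler` are hypothetical, `store` is any object with an `upsert(key, doc)` method, and the `req.query` / `res.status().json()` shapes follow Express conventions, which is an assumption about the framework.

```javascript
// Plain function: no HTTP types, so it can be unit-tested directly.
function reduceUrl(longUrl, key, store) {
  if (typeof longUrl !== 'string' || longUrl.length === 0) {
    throw new Error('url is required');
  }
  store.upsert(key, { url: longUrl });
  return { key: key, url: longUrl };
}

// Thin REST wrapper: translates request and response, delegates the real work.
// keyGenerator is a hypothetical function returning the next reduced key.
function reduceHandler(store, keyGenerator) {
  return function (req, res) {
    try {
      const result = reduceUrl(req.query.url, keyGenerator(), store);
      res.status(200).json(result);
    } catch (err) {
      res.status(400).json({ error: err.message });
    }
  };
}
```

Tests can call `reduceUrl` with a fake store and never start a server, while the handler stays small enough to check with a stubbed `req` and `res`.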

Admin url considerations - I looked at three schemes for the application URLs.

1) Access the admin urls (/seed, /reduce, /findKeys, /config) and redirection url (/url) on different ports. This is probably what I'd do if I were managing the host servers, but heroku and nodejitsu only expose port 80, and I've had issues accessing admin ports.

2) Use a magic number - I control the initial seed url, so I could start the generated urls at '1' and reserve the current seed, '0', as the url for admin functions, so they'd look like this -

http://host/0/reduce/xyz
http://host/0/reduce/xyz&url=bing.com
etc

This guarantees uniqueness and avoids possible conflicts, but it may not be apparent to another programmer what I've done or why. In reality, I'd probably allocate 0-9 as reserved urls/keys (which drops my maximum concurrent server count from 36 to 26).

3) My chosen scheme was to explicitly list each url function and give the redirection url a prefix of "/url". It's not as elegant as solution #2 and requires the user to enter "/url/key" instead of just "/key", but it's simpler, easier to understand and more extensible.
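The chosen scheme (option 3) can be sketched as a small path classifier. The route strings come from the list above; `classifyPath` and its return shape are hypothetical and only meant to show why "/url/key" and the admin routes cannot collide.

```javascript
// Explicit admin routes; anything under /url/ is treated as a redirect lookup.
const ADMIN_ROUTES = ['/seed', '/reduce', '/findKeys', '/config'];
const REDIRECT_PREFIX = '/url/';

// Classify a request path without touching the network or the database.
function classifyPath(path) {
  const admin = ADMIN_ROUTES.find((r) => path === r || path.startsWith(r + '/'));
  if (admin) {
    return { type: 'admin', route: admin };
  }
  if (path.startsWith(REDIRECT_PREFIX) && path.length > REDIRECT_PREFIX.length) {
    return { type: 'redirect', key: path.slice(REDIRECT_PREFIX.length) };
  }
  return { type: 'unknown' };
}
```

Even a reduced key that happened to spell "seed" would only ever be reached as "/url/SEED", so it cannot shadow the admin route.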

Couchbase considerations - I like Couchbase a lot. It's simple, fast, easy to cluster and quite flexible, and I've encountered no bugs yet. I used version 4.0 because it has a SQL-like query language (N1QL). Initially I had problems getting N1QL to work, so I wrote the findKeys() method using map/reduce functions.
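For readers unfamiliar with the map/reduce approach, a Couchbase view map function takes `(doc, meta)` and calls `emit` for each row it wants in the index. The sketch below is not the project's actual view: the `url` field check is an assumption, and `emit` is defined locally as a stand-in (Couchbase supplies it inside the view engine) so that the example can run on its own.

```javascript
// Local stand-in for the emit() that the Couchbase view engine provides.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function in the shape Couchbase views expect: (doc, meta).
// Emits the document key for every document that has a url field.
function map(doc, meta) {
  if (doc.url) {
    emit(meta.id, null);
  }
}

// Simulate the view engine visiting two documents.
map({ url: 'http://bing.com' }, { id: 'A1' });
map({ last: 'Z' }, { id: 'seed::A' });
```

After the two simulated calls, `rows` holds a single entry for 'A1'; the seed document is skipped because it has no `url` field.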

Couchbase can function as a key/value store like Redis but can also store a json document, which I can easily extend to store a url owner, associated IP address, hit count, etc.
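A document along those lines could be shaped as follows. Only the stored url is implied by the project itself; the other field names, and the helpers `makeUrlDoc` and `recordHit`, are hypothetical illustrations of how the document might be extended.

```javascript
// Hypothetical document stored under a reduced-url key.
function makeUrlDoc(url, owner, ip) {
  return {
    url: url,
    owner: owner || null, // who created the reduced url
    ip: ip || null,       // associated IP address
    hits: 0,              // number of redirects served
  };
}

// Return a copy with the hit count incremented, e.g. on each redirect.
function recordHit(doc) {
  return Object.assign({}, doc, { hits: doc.hits + 1 });
}
```

With a plain key/value store the value would have to be serialized and parsed by hand; storing JSON lets new fields be added without changing the key scheme.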

I considered Redis (it's easier to find hosting for it), but it's more restrictive in schema/key design and I'm not sure it's enough faster to matter. Also, Couchbase has no RAM restriction, while the last Redis version I used had to run entirely in RAM, and the coding for Redis is more complicated.

Heroku considerations - I used Heroku to host node.js because I already have several node apps running on it and felt I could get it working without any issues.

AWS considerations - I used AWS because I wanted to use couchbase and it had a simple couchbase deployment. Other hosting services would have required me to create docker images, configure ports, etc, and probably would have taken many hours to get right. As it was, AWS setup already took the most time in this project because I had to debug two config issues.

Debugging - My biggest problem was getting the couchbase N1QL query to work. I used N1QL before and it ran on my local setup but not on AWS and I traced it down to two items -

1) N1QL needs a primary index and you can only create that index from a command-line tool. So it took me a little while to figure out how to ssh into AWS and execute it.

2) The couchbase deployment on AWS exposes default ports but the N1QL query executes on port 8093, which isn't part of the default. This took about an hour to figure out.
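For reference, the N1QL statements involved look roughly like the strings built below. The statement syntax is standard N1QL; the helper names and the bucket name are hypothetical, and the find-keys query is only a guess at what a N1QL version of findKeys() would run. The statements would be executed through the query service (port 8093), for example from the `cbq` shell.

```javascript
// Couchbase bucket names may contain letters, digits and the characters _ . % -
function checkBucketName(bucket) {
  if (!/^[A-Za-z0-9_.%-]+$/.test(bucket)) {
    throw new Error('invalid bucket name: ' + bucket);
  }
}

// N1QL needs a primary index before ad hoc queries on a bucket will run.
// Backticks quote the bucket name, which matters for names containing '-'.
function createPrimaryIndexStatement(bucket) {
  checkBucketName(bucket);
  return 'CREATE PRIMARY INDEX ON `' + bucket + '`;';
}

// Hypothetical N1QL equivalent of findKeys(): list every document key.
function findKeysStatement(bucket) {
  checkBucketName(bucket);
  return 'SELECT META(b).id FROM `' + bucket + '` AS b;';
}
```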

UI considerations - I started an html5/jquery UI which became a debugging issue as I haven't written much UI stuff lately. I had it mostly working but switched to basic form UI as I had back-end issues to debug, too.

Overall, I spent about 20 hours of real work on this, and about 6 to 7 hours on debugging or dead-end approaches.

Security - I spent no time on security. For the node app, I've previously used passport, which supports OAuth and SAML. The Couchbase API needs user/password authentication enabled. Access should be role-based and enforced on a per-url basis.

Testing - The app currently has 87.5% test coverage according to Istanbul; I'm mostly missing tests for error handling. I consolidated test execution behind a single command, "npm test", which was the standard at my last project and is easily added to the build/deploy of the app.

Improvements - For extreme predictability, I'd probably store the base configuration file in couchbase, manage configuration generation in a separate node module and record an association of IP address/dns name to each one-letter key, and increment/record that letter for each new server deployment.

I'd add more testing for error conditions.

I wrote a quick url vetting/fixing scheme to ensure my redirection code works, but it's not exhaustive. This is the most likely area for bugs, so I'd make it more extensive or use an existing library for url formatting.
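One way to lean on an existing parser is the WHATWG `URL` class, which is available as a global in current Node.js releases (it was not available when this exercise was written). The sketch below is a hypothetical replacement, not the vetting code from the project; `normalizeUrl` and its choice to accept only http and https are assumptions.

```javascript
// Normalize user input into an absolute http(s) url, or return null.
function normalizeUrl(input) {
  if (typeof input !== 'string') {
    return null;
  }
  let candidate = input.trim();
  if (candidate === '') {
    return null;
  }
  // Bare hosts such as "bing.com" get a default scheme so redirects work.
  if (!/^[a-z][a-z0-9+.-]*:\/\//i.test(candidate)) {
    candidate = 'http://' + candidate;
  }
  try {
    const parsed = new URL(candidate);
    // Only redirect to web urls; reject schemes such as ftp:.
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      return null;
    }
    return parsed.href;
  } catch (err) {
    return null; // the parser rejected the input
  }
}
```

Note that the parser canonicalizes its input, so "bing.com" comes back as "http://bing.com/" with a trailing slash.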

I'd add a format check on the input for custom keys.

Could use a more visual UI with colors, etc.

If it were an app with an indeterminate amount of future volume, I might rewrite the key scheme so it reads/writes an incremented key to Couchbase, and figure out the transaction code to lock/unlock each call.
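One alternative to explicit lock/unlock is optimistic compare-and-swap (CAS), where a write succeeds only if the value has not changed since it was read. This is a different technique from the transaction code mentioned above. The sketch below uses an in-memory stand-in to show the retry loop; its method names and shapes are simplified assumptions and not the Couchbase SDK's API.

```javascript
// In-memory stand-in that mimics optimistic locking with a CAS token.
function createCasStore(initial) {
  let value = initial;
  let cas = 1;
  return {
    get() {
      return { value: value, cas: cas };
    },
    // Succeeds only if nobody changed the value since expectedCas was read.
    replace(newValue, expectedCas) {
      if (expectedCas !== cas) {
        return false;
      }
      value = newValue;
      cas += 1;
      return true;
    },
  };
}

// Read, increment in base 36, and write back; retry if another writer won.
function nextSharedKey(store, maxRetries) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const current = store.get();
    const next = (parseInt(current.value, 36) + 1).toString(36).toUpperCase();
    if (store.replace(next, current.cas)) {
      return next;
    }
  }
  throw new Error('too many concurrent updates');
}
```

With this approach all servers could share one counter, at the cost of a database round trip (and occasional retries) on every key generation.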

Move hard-coded strings (error & status messages) into a separate file for localization purposes.

Add logging.