Skip to content

A RESTful API architected with MongoDB, Node.js, Express, and Redis.

Notifications You must be signed in to change notification settings

Pipes36/QuestAPI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

67 Commits
 
 
 
 

Repository files navigation

QuestAPI

QuestAPI is a REST API that provides Question and Answers data to the Atelier e-Commerce platform.

QuestAPI provides an automatic Extract, Transform and Load process for the required CSV data, which populates a Mongo instance without any need for manual user interaction.

That process, combined with tests covering 77% of the code base, and a simple straightforward API that interacts with Mongo and Redis instances which are containerized with Docker, means that the service is easily scalable both vertically and horizontally in the future.


Table of Contents

  1. Automated Extract, Transform, and Load Process
  2. API Performance
  3. Data Flow
  4. Endpoints

Automated Extract, Transform, and Load Process

QuestAPI includes an automated ETL process built-in. It uses the Node.js Fs module as well as the Mongoose ODM to populate a Mongo database.

  1. Initializing The Database <connection.js>

On initilization, QuestAPI will check the current connected Mongo instance for the required data, and invoke init() if it does not contain it.

const Promise = require('bluebird');
const mongoose = require('mongoose');
mongoose.promise = Promise;

const db = (async () => {
  try {
    await mongoose.connect(URI, { useNewUrlParser: true, useUnifiedTopology: true, useCreateIndex: true });
    console.log('Successfully Connected to DB'
    const connection = mongoose.connection;
    const collectionSearch = await connection.db.listCollections().toArray();
      if (isEmpty(collectionSearch)) {
       console.log('Initializing DB with Data');
       init();
      }
    return connection;
  } catch(err) {
    console.log(err);
    console.log('Error connecting to DB');
  }
})();
  1. Initializing Function <init.js>

Once it is verified that the data is not within the database, Init() is invoked, starting the ETL process.

const path = require('path');
const populateDB = require('../csvParse/populateDB.js');
const parseQuestion = require('../csvParse/parseQuestion.js');

const init = () => {
  populateDB.insertQuestions(path.join(__dirname, '../data/questions.csv'), parseQuestion);
}
  1. Populating The DB <populateDB.js>

This is first function of the ETL chain, but the next two functions follow the same pattern as this one.

PATTERN :

  1. Extract data from the CSV file using a readStream
  2. Transform the data to fit the Mongoose Schema
  3. Load the data into Mongo with Mongoose Queries
  4. Repeat
  • IMPORTANT : This implementation only moves as fast as Mongoose Queries can resolve, but it works within the constraints of the V8 engine's memory heap.

  • Each chunk received from the stream is about 530 rows from the CSV.

  • Each function also invokes timer() which is a HoF. The resulting function is invoked again at the end of the process, giving us the function duration.

  • For the transformations, we abstract that logic into separate functions that return the objects to be used by the Mongoose Query.

  • A log is also scheduled every 1000 queries for human sanity checks.

  • This implementation ignores insertion errors and continues on with { ordered: false }

{
  insertQuestions: async (filePath, transformer) => {
    const startTime = timer();
    console.log('STEP 1/3: INSERT QUESTIONS')
    let firstLineRead = false;
    let columns = '';
    let iteration = 0;
    const questionReadStream = fs.createReadStream(filePath, {encoding: 'utf8'});

    questionReadStream.on('data', async (res) => {

      questionReadStream.pause();
      let docs = [];

      if (!firstLineRead) {
        columns = determineColumns(res);
        docs = transformRowsIntoObjects('', res, transformer);
        firstLineRead = true;
      } else {
        docs = transformRowsIntoObjects(columns, res, transformer);
      }
      try {
        const insertionQuery = await Question.insertMany(docs, { ordered: false });
        if (++iteration % 1000 === 0) console.log(`Questions Iteration Count: ${iteration}`);
      } catch (err) {
        console.log('Insertion Error: continuing { ordered: false }');
      }
      
      questionReadStream.resume();
      
    });
    questionReadStream.on('error', (err) => {
      console.log(err);
    });
    questionReadStream.on('end', () => {
      console.log(`INSERT QUESTIONS ENDING. TIME ELAPSED: ${startTime()}`)
      populateDB.insertAnswers(path.join(__dirname, '../data/answers.csv'), parseAnswer) // Continue Population
    });
  },
}
  1. ETL Duration

Since this ETL process relies on the speed of Mongoose Queries resolving, optimizations are focused on the readStream and chunk-size to Query-speed comparisons.

This process clocked in at an average of 45 minutes (with a fastest completion time of 30 minutes, and a slowest completion of 60 minutes). This process was run on over 12 million rows of CSV data. The times are taken from 7 iterations of this ETL process.


API Performance

LAST UPDATED : 05/15/2021

FUTURE OPTIMIZATION : Horizontal scaling and load-balancing. This is easily accompishable with the arcitecture of the service being containerized.

The most significant optimization implemented after deployment was the addition of a Redis cache that operated with short expiration timers. I decided this in order to cache the most popular products. On page load, the Questions route would be the route that has a 100% chance of being called. Since this was the case, I wanted to focus on optimizing the most products that were the most popular, handling strong upticks in traffic.

The results of the tests indicate that that was successful, but now we need to scale horizontally in order to handle varied requests.

all stress tests done with Loader.io on an AWS EC2 T2 Micro Instance

The most expensive route is to GET a question since it must also be populated with all of a single product's questions and answers, denormalized.

These next two results are for repeated requests against the Questions at 500 clients per second.

Prior to Redis optimization image After Redis optimization image

As you can see, adding Redis caching decreased the response time to 2% of its original and in increase to 100% successful responses instead of the previous 21% success rate.

--

These next two results are for repeated requests against the Answer endpoint at 750 clients per second.

Prior to Redis optimization image After Redis optimization image

The post-Redis optimized results show that we have a response time 1% of the original and nearly 100% of the possible successful repsonses completed.


Data Flow

  1. Database Initializes with data
  2. Client sends GET request
  3. Server checks Redis cache which operates off of LRU timers
  4. If the requested information is within the Redis cache, return response to the Client
  5. If the requested information is not within the Redis cache, Query the Database
  6. Once the request returns from Mongo, add it to the Redis cache with an expiration timer
  7. Return the response to the Client

image


Endpoints

All example requests are made with SuperAgent

GET /api/qa/questions

Retrieves a list of questions for the requested product, not including reported questions.

Header Type Description
product_id String Specifies the product for which to receive questions
const res = await request.get('/api/qa/questions').set('product_id', '1');

Response: Status 200 OK

{
    "product_id": "1",
    "results": [
        {
            "_id": "609bec05262cc03dd6ef9ce7",
            "question_id": 1,
            "product_id": "1",
            "question_body": "What fabric is the top made of?",
            "question_date": "2020-07-27T16:18:34-05:00",
            "asker_name": "yankeelover",
            "asker_email": "first.last@gmail.com",
            "reported": false,
            "question_helpfulness": 1,
            "__v": 0,
            "answers": {
                "5": {
                    "photos": [
                        {
                            "photo_id": 1,
                            "url": "https://images.unsplash.com/photo-1530519729491-aea5b51d1ee1?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1651&q=80"
                        }
                    ],
                    "answer_id": 5,
                    "answer_body": "Something pretty soft but I can't be sure",
                    "answer_date": "2020-09-13T04:49:20-05:00",
                    "answerer_name": "metslover",
                    "reported": false,
                    "answer_helpfulness": 5,
                    "answerer_email": "first.last@gmail.com",
                    "__v": 0
                },
                "7": {
                    "photos": [],
                    "answer_id": 7,
                    "answer_body": "Its the best! Seriously magic fabric",
                    "answer_date": "2021-02-27T12:45:24-06:00",
                    "answerer_name": "metslover",
                    "reported": false,
                    "answer_helpfulness": 7,
                    "answerer_email": "first.last@gmail.com",
                    "__v": 0
                },
                "8": {
                    "photos": [],
                    "answer_id": 8,
                    "answer_body": "DONT BUY IT! It's bad for the environment",
                    "answer_date": "2020-09-19T16:49:22-05:00",
                    "answerer_name": "metslover",
                    "reported": false,
                    "answer_helpfulness": 8,
                    "answerer_email": "first.last@gmail.com",
                    "__v": 0
                },
            }
        }
    ]
}

GET /api/qa/questions/:question_id/answers

Returns answers for a given question, not including reported answers.

Parameter Type Description
question_id Integer Specifies the product for which to receive answers
const res = await request.get('/api/qa/questions/1/answers');

Response: Status 200 OK

{
  "question": "1",
  "page": 0,
  "count": 5,
  "results": [
      {
          "photos": [
              {
                  "photo_id": 1,
                  "url": "https://images.unsplash.com/photo-1530519729491-aea5b51d1ee1?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1651&q=80"
              },
              {
                  "photo_id": 2,
                  "url": "https://images.unsplash.com/photo-1511127088257-53ccfcc769fa?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1650&q=80"
              },
              {
                  "photo_id": 3,
                  "url": "https://images.unsplash.com/photo-1500603720222-eb7a1f997356?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1653&q=80"
              }
          ],
          "answer_id": 5,
          "answer_body": "Something pretty soft but I can't be sure",
          "answer_date": "2020-09-13T04:49:20-05:00",
          "answerer_name": "metslover",
          "reported": false,
          "answer_helpfulness": 5,
          "answerer_email": "first.last@gmail.com",
          "__v": 0
      },
      {
          "photos": [],
          "answer_id": 7,
          "answer_body": "Its the best! Seriously magic fabric",
          "answer_date": "2021-02-27T12:45:24-06:00",
          "answerer_name": "metslover",
          "reported": false,
          "answer_helpfulness": 7,
          "answerer_email": "first.last@gmail.com",
          "__v": 0
      },
  ]
}

POST /api/qa/questions

Creates a question for the specified product.

Body Parameters

Parameter Type Description
body String Text of the question being asked
name String name of the question asker
email String Email address of the question asker
product_id Integer Required: ID of the product for which the question is submitted
const res = await request.post('/api/qa/questions')
      .send({ product_id: '25' })
      .send({ body: 'Does this make me look hip?!' })
      .send({ name: 'CoolGuy221' })
      .send({ email: 'coolguy@email.com' });;

Response: Status 201 Created


POST /api/qa/questions/:question_id/answers

Creates an answer for the specified question.

Parameters

Parameter Type Description
question_id Integer Required: ID of the question being answered

Body Parameters

Parameter Type Description
body String Text of the question being askwer
name String name of the question asker
email String Email address of the question asker
photos [String] List of photo urls
  const res = await request.post('/api/qa/questions/30/answers')
    .send({ body: 'Hey I have an answer to your question!'})
    .send({ name: 'I\'m a question asker' })
    .send({ email: 'answer@email.com' })
    .send({ photos: [ 'url1', 'url2' ]});

Response: Status 201 Created


PUT /api/qa/questions/:question_id/helpful

Adds to the helpfulness count of a specified question.

Parameter Type Description
question_id Integer Required: ID of the question being marked as helpful
const res = await request.put('/api/qa/questions/1/helpful');

Response: Status 204 No Content


PUT /api/qa/questions/:question_id/report

Reports a question.

Parameter Type Description
question_id Integer Required: ID of the question being reported
const res = await request.put('/api/qa/questions/1/report');

Response: Status 204 No Content


PUT /api/qa/answers/:answer_id/helpful

Adds to the helpfulness count of a specified answer.

Parameter Type Description
answer_id Integer Required: ID of the question being marked as helpful
const res = await request.put('/api/qa/answers/2/helpful');

Response: Status 204 No Content


PUT /api/qa/answers/:answer_id/report

Reports an answer.

Parameter Type Description
answer_id Integer Required: ID of the question being reported
const res = await request.put('/api/qa/answers/2/helpful');

Response: Status 204 No Content