Lab: Prime Factor Competing Consumers

In this week’s lab, you will build an AWS app using Competing Consumers to compute the largest prime factor (LPF) of a number. Clients will be able to post a request for a particular number, the system will find its largest prime factor, and then clients can query the system to get results. Such computations might be used when encrypting or decrypting messages. This lab builds on your experience writing a lambda in the previous lab, but this time you will write three distinct lambdas:

[Diagram: Overview of the API computing the largest prime factor and storing results in DynamoDB]

This structure theoretically scales to a large number of clients, though in practice you will be limited to just two processing lambdas (the center element in the diagram) because of limits imposed by AWS Academy.

The pieces are as follows:

  1. A request-handling lambda behind an API gateway route, /request_lpfactor, which accepts a number from a client and places the request on a queue.
  2. A Simple Queuing Service (SQS) queue that holds the pending requests.
  3. One or more LPF-computing lambdas that compete for entries on the queue, compute the largest prime factor, and write the result to a DynamoDB table.
  4. A database-dumping lambda behind a second route, /lpfactors, which returns the contents of the table.

This means there will be two entry points for clients: /request_lpfactor and /lpfactors. In a full service you would create an additional lambda to look up results for specific numbers (and wait until they are available), but this lab already has enough complexity.

This is a pretty bare-bones implementation of this service. You might decide to add additional elements, say a more useful gateway that would check for answers in the database before posting a request to the SQS. But such additions are not required.

Important Notes

A. Limits

It is critical that you limit your Lambda functions in this lab. If you do not, you will likely exceed the limits for AWS Academy accounts and lock your account. Note there are specific steps below that will limit concurrency so you do not exceed limits; be sure to apply them.

AWS has a page discussing concurrency limits in case you would like additional background information. However, the directions below should be sufficient.

B. Project Structure

For the previous lab, you started with a working project. We are not giving you a project to start with for this lab because you already have a starting point: your solution from the previous lab. Start by copying the code, complete with its folders, into your new repository. This is important because Maven places strict requirements on a project’s structure, and failing to keep the same structure means your code will not build. You can use IntelliJ to fix that structure, but it is much harder than simply starting with the existing code. We will give you steps later in this document detailing how to bring the code over.

C. Showing your work

Commit changes to your Git repository as you work. Failing to do so risks losing progress. Just as importantly, we need to see your progress so we can be sure you did your own work. For this lab, commit messages must be informative: say what the commit accomplished or describe what portion you are working on. You might also make commits simply to capture what you had completed by the end of a lab session!

Detailed Requirements

Background material for setting up this lab

Debugging

This assignment is almost impossible to debug without using logging. Your code from the previous lab should already include a logger; use the following steps to review how to access its output:

  1. Open the lambda function you created in the previous lab
  2. Visit the Monitor tab
  3. Click on View CloudWatch logs
  4. Click on the latest log stream at the top of the window. You should see the log messages you generated. Each log entry includes the fully qualified name of the class writing the message followed by your text.

A tricky issue with services like AWS Lambda is that it is easy to accidentally execute the wrong version of the code. A good practice is to write a log entry at the beginning of your lambda that prints a test number (say “trial 1”) and to increment that number every time you upload new code. You can then check the log to make sure you executed the version of the code you intended. While you are at it, echo the JSON input to the log to confirm you are processing the data you think you are processing. It is a bad feeling to discover you spent an hour trying to get something to work only because you forgot to upload code or change a setting in Postman.
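
For example, a minimal sketch of that kind of logging at the top of a handler (the trial number is arbitrary, and input stands for whatever parameter your handleRequest receives):

        // Version marker: increment "trial N" every time you upload a new .jar, then check
        // CloudWatch to confirm the code that ran is the build you just deployed.
        logger.info("LPFRequestHandler trial 3");
        // Echo the raw input so the log shows exactly what this invocation received.
        logger.info("Input: " + input);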

Working with multiple lambdas

You could create a separate deployment package (.jar) for each lambda. However, that requires you to update multiple packages at the same time and to keep everything consistent. Just as a single Java program can have multiple classes with a main, you can embed multiple endpoints in a single .jar file; this is why you specify the lambda code by its full path: the package, class, and method name. This simplifies deployment, but it does mean the .jar file becomes large. You may get warnings from AWS that you should move your package to their storage service, S3, but you can ignore such warnings for this lab.

Note that you do not need to upload the .jar file to ALL the lambdas each time you make a change. If you are sure a lambda already has working code, you do not need to replace it with the refreshed version. However, you might want to replace them all one more time when you are done to make sure all code is consistent.

Whenever you create a lambda for this lab, set the maximum number of concurrent instances to two. This is the minimum. Attempting to use more than 10 concurrent lambdas with AWS Academy results in the account being deactivated; your instructor would then need to contact AWS support to ask that it be reactivated. Since this lab involves multiple lambdas, it is easy to exceed the budget if you allow more than 2 concurrent instances for any one lambda. You will do this by setting each lambda’s Reserve concurrency property to 2.

Steps

This section captures the steps that you need to do. They are in the intended order. As discussed above, commit your code to your repository as you complete steps (with good commit messages) so that you do not lose work and to give us confidence that you did your own work.

A. Set up the project

Steps to set up your repository for this assignment:

  1. Create your repository for this assignment.
  2. Copy the src and .idea folders and the .gitignore and pom.xml files from the repository for the previous lab into your new repository.
  3. Open the new project in IntelliJ.
  4. Open UnicornLocation.java.
  5. Right click on the class name, UnicornLocation, select Refactor, then select Rename.
  6. Enter the name LPFRequest and click on OK at least twice so that it is renamed everywhere in the project. This will rename the file as well.
  7. Now in LPFRequest.java, select location in the package declaration at the top of the file, right click, select Refactor, then select Move Package or Directory.
  8. In the prompt that comes up, change the location to ...\java\com\lpfactor. This will rename the package and move it in the directory tree.
  9. Open UnicornPostLocationHandler.java, and use refactoring to change the class name to LPFRequestHandler.
  10. Select location in the file and refactor this class to also be in the package com.lpfactor.
  11. In the Project window in IntelliJ, right click on unicorn.location and use Delete… to delete this now empty folder. Do the same for other project folders (like unicorn) that are now empty.
  12. Commit and push your project, because it is good practice to do that any time you make big changes to a project’s structure. If you do not, you run the risk of the project getting “stuck” (where you cannot pull and cannot commit) and making it hard to save future work.

B. Create a Java object for communicating via JSON

In last week’s lab, a JSON object was parsed into an instance of the class UnicornLocation. For this lab, you will need to define your own class. It will have the fully qualified name com.lpfactor.LPFRequest. You should already have the file from the Unicorn project (thanks to the steps above), but you could just create the file yourself. In any case, edit this file so it has a single field, number, with type BigInteger. Write a zero-argument constructor, and then write methods setNumber and getNumber to provide setters and getters for your field. Note that getNumber will return a BigInteger, but setNumber must take a String argument since JSON works with strings, not binary data. Because it may be useful during debugging, add a second constructor which takes a BigInteger argument and stores that argument as the number.

Warning: the name of your numeric field in LPFRequest must match the name of the field in the JSON object that the client will send; the JSON parser relies on this match (together with the Java Beans getter/setter convention) to map the JSON onto your class.
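
Putting those requirements together, a minimal sketch of the class might look like the following (the default value of 1 in the zero-argument constructor is just a convenient placeholder):

        package com.lpfactor;

        import java.math.BigInteger;

        public class LPFRequest {
            private BigInteger number;

            // Zero-argument constructor so the JSON parser can build instances.
            public LPFRequest() {
                this.number = BigInteger.ONE;
            }

            // Convenience constructor for testing and debugging.
            public LPFRequest(BigInteger number) {
                this.number = number;
            }

            public BigInteger getNumber() {
                return number;
            }

            // JSON carries the value as text, so the setter takes a String.
            public void setNumber(String number) {
                this.number = new BigInteger(number);
            }
        }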

If you have used IntelliJ’s refactoring to get to this point, your LPFRequestHandler code should build; use the mvn package command to test this. There will still be references to “unicorn” in LPFRequestHandler.java; use search to find them and replace them with something more specific to this project. Commit and push your changes.

C. Set up the Simple Queuing Service (SQS) Queue and the listener

The next step is to set up the SQS. Create your queue through the AWS web console (just search for SQS and poke around a little). Do not use a FIFO queue; the default (Standard) queue type and the default settings will work fine. Note the queue can have any name as long as you stick to standard identifier characters (letters, digits, and underscores, but no spaces or other special characters). Most students simply use the name numbers.

Note: where you use capital letters is important in this lab. Most of the tools used here are case-sensitive. Java requires capital letters for class names, so you do have to follow that convention when naming classes. But package names, queue names, API endpoints, and other items should use lower case wherever possible. This dramatically simplifies work when on a project with other people; that way no one has to remember the capitalization practices of different developers. Projects are complex enough without attempting to track everyone’s idiosyncrasies.

When setting up the queue, set the Visibility timeout to 3 minutes. If your processing lambda (which you will create in a bit) fails to process a request within this time, the message becomes visible again and will be delivered to another lambda. This setting will be critical when you write the lambda that processes queue entries.

Once the queue is set up, read the page on How to send and receive an SQS message for notes on writing code that stores entries into a queue. Before writing code, see also the following tips:

As an aside, whenever you add something to your pom.xml, you may need to use IntelliJ’s File | Repair IDE feature to force the indexer to recognize the new classes these provide. Clicking through the dialog that comes up will resolve the red text you may see for some names in IntelliJ.

Once the queue is set up, create a lambda, com.lpfactor.LPFRequestHandler. (Note the LPF, not LFP.) This lambda takes the role of the Listener in the above diagram. When you set up the Lambda in AWS, be sure to limit the concurrency by visiting the Configuration tab, clicking on Concurrency and recursion detection, clicking the Edit button for Concurrency, selecting Reserve concurrency, and setting it to 2. Then create the API gateway for the lambda, using an HTTP API, and give the route the name request_lpfactor.

Implement the lambda to write LPFRequest values to the queue. This is done through .sendMessage. If x is a request object (you should use better names!), you can use JSON.std.asString(x) to convert it to a string, make that string the body of the message (.withMessageBody), and send it to the queue with .sendMessage. This places the request on the queue for further processing.
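
For example, a minimal sketch of that call sequence, assuming the AWS SDK v1 SQS client and a queue named numbers (substitute your queue’s name), with x as the parsed request object:

        // Additional imports this sketch assumes:
        // import com.amazonaws.services.sqs.AmazonSQS;
        // import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
        // import com.amazonaws.services.sqs.model.SendMessageRequest;
        // import com.fasterxml.jackson.jr.ob.JSON;

        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = sqs.getQueueUrl("numbers").getQueueUrl();  // "numbers" = your queue name

        String body = JSON.std.asString(x);        // serialize the LPFRequest to a JSON string
        sqs.sendMessage(new SendMessageRequest()
                .withQueueUrl(queueUrl)
                .withMessageBody(body));

JSON.std.asString can throw an IOException, so keep this code inside whatever try/catch your handler already uses.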

Use Postman to send requests to your handler, and check the SQS in AWS to ensure that the requests are being stored in the queue. You can see the entries in the queue by opening it from the AWS console, clicking the Send and receive messages button in the upper right corner, clicking Poll for messages at the bottom of the window, and then clicking on individual messages. Check your logs as well. If messages seem to show up slowly, double-check that you did not configure a delivery delay when setting up the SQS.

Note that if you send enough requests quickly enough from different Postman windows, you will get errors back from your request handler. This happens because AWS attempts to start new instances of the lambda to handle the extra requests. Instances that are already running can likely process requests faster than you can submit them, but when no instance is running, the start-up delay creates a window of time in which the Lambda management decides no lambda is ready to process the next request and starts more lambdas. To actually see this happen, you would need to run Postman on several computers, sending multiple requests to the same endpoint simultaneously.

As an aside, note you can use Java code to create queues. This is true for many services. However, creating the queue and related services through the console is simpler.

D. Set up the DynamoDB to store results

In the AWS Console, search for DynamoDB. Use this to create a table called lpfactors. For the most part you will accept defaults, but set the “partition key name” to “Number”. The type for this field must be String so it can hold values too large for a standard integer type.

Add the following dependency to pom.xml:

        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-dynamodb</artifactId>
            <version>1.12.699</version>
        </dependency>

E. Writing a lambda processing requests on the SQS

Create an LPF-computing lambda (see the above diagram), com.lpfactor.ComputeLPF, likely based on the Unicorn lambda you wrote. Putting this code in the lpfactor package ensures you meet Maven’s requirements. Add one additional import:

        import com.amazonaws.services.lambda.runtime.events.SQSEvent;

Declare the class as

        public class ComputeLPF implements RequestHandler<SQSEvent, Void> {

with the method

        @Override
        public Void handleRequest(SQSEvent sqsEvent, Context context) {

Note the return type is Void (the wrapper class), which signals that the caller does not need any result from this handler; handleRequest will simply end with return null.

Write the body of the lambda using the following steps:

  1. Create a try block which catches Exception. Catching Exception is a bad practice for production code – you should always catch more specific exceptions since different ones should be handled differently – but catching Exception in this assignment will simplify debugging. If you do catch the exception, have the code log a message documenting the error:

        logger.error("Error while processing SQS entries: ", e);
    
  2. Use sqsEvent.getRecords to retrieve the messages and place them in a List<SQSEvent.SQSMessage>. (Note: the directions linked above show a handler that takes an APIGatewayProxyRequestEvent as its input; this lambda receives an SQSEvent from the queue instead.)

  3. Write code to check if the queue is empty and log a message if it is. Otherwise, process all of the messages that are currently in the queue. For each, log the message body and create an LPFRequest object:

        for (SQSEvent.SQSMessage m : messages) {
            logger.info("Message: " + m.getBody());
            LPFRequest lpfRequest = JSON.std.beanFrom(LPFRequest.class, m.getBody());
    
  4. Create the Lambda in AWS. When you set the Execution role, also set the Timeout (both are in the Basic settings box in General configuration) to 2 min. This will be long enough for any examples we use.

  5. It is critical that you limit the concurrency for this lambda. While on the Configuration tab, click on Concurrency and recursion detection, click the Edit button in the Concurrency box, select Reserve concurrency, and set it to 2.

  6. Configure AWS to invoke your lambda through the queue. In AWS Console, find the Lambda Triggers tab and click through the options to select the LPF Computing lambda you have just created. You can also go to the lambda, find the Triggers control, and select your queue as the trigger for the lambda; both have the same effect. If you get an error about the visibility timeout being too short for this lambda, go back to the instructions for setting up the SQS and fix the timeout.

  7. Next, use PostMan to confirm that if you send a number to /request_lpfactor, that number is displayed in the log.

  8. Write a method that creates a HashMap from String to AttributeValue and puts two entries in it, one for "Number" and the other for "Value". Note AttributeValue is in package com.amazonaws.services.dynamodbv2.model. Set the value for "Number" to the value in lpfRequest. For now, set the result to 0. Of course, this is incorrect, but you will implement the full algorithm later. We get the end-to-end processing of data working first, then worry about correctness. The method header needs to be

        private void createLPFRecord(LPFRequest lpfRequest, BigInteger lpf) {
    

    For example, if the map is named m (a terrible name!), the number is 12, and we are going to store 0 as its lpf, we’d write

        m.put("Number", new AttributeValue("12"));
        m.put("Value", new AttributeValue("0"));
    

    This means you will have two columns in your database, one for Number and the other for Value. You will write code to access the numbers and values later.

  9. Once you have built the hash map, use the following code to write it to the database. The putItem call in this code does the actual write:

        final AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();
    
        try {
            ddb.putItem(TABLE_NAME, [your hashmap]);
        } catch (ResourceNotFoundException e) {
            logger.error(String.format("Error: The table \"%s\" cannot be found.", TABLE_NAME));
            logger.error("Be sure that it exists and that you have typed its name correctly!");
        } catch (AmazonServiceException e) {
            logger.error("Error from AWS: " + e.getMessage());
        }
    

IntelliJ will likely prompt you to add imports when you write the above code, but if not then add them manually:

        import com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException;
        import com.amazonaws.AmazonServiceException;
        import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
        import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
  10. Use Postman with the /request_lpfactor endpoint to test that sending a request results in data being written to the database. Note the result field will be incorrect (zero) for now.

Something to watch for is that the Items summary shown in the web console can easily be out of date. Clicking Get live item count will show how many items are actually in the table at any moment. You can see the contents of the table by visiting it and clicking on the orange Explore table items button in the upper right corner. However, the goal is to present the data to client code, so you will write a lambda in the next section to retrieve the data and present it.

Note: once you have introduced the processing lambda, items will no longer stay in the SQS. Use the queue’s Monitoring tab to observe data being added to and removed from the queue.
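
Putting the steps in this section together, a rough sketch of ComputeLPF at this stage might look like the following. It assumes a Log4j2 logger (use whatever logger your previous lab’s code set up), stores the table name in a TABLE_NAME constant, and uses the Number and Value columns from step 8; the LPF itself is still hard-coded to zero:

        package com.lpfactor;

        import java.math.BigInteger;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        import com.amazonaws.AmazonServiceException;
        import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
        import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
        import com.amazonaws.services.dynamodbv2.model.AttributeValue;
        import com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException;
        import com.amazonaws.services.lambda.runtime.Context;
        import com.amazonaws.services.lambda.runtime.RequestHandler;
        import com.amazonaws.services.lambda.runtime.events.SQSEvent;
        import com.fasterxml.jackson.jr.ob.JSON;
        import org.apache.logging.log4j.LogManager;
        import org.apache.logging.log4j.Logger;

        public class ComputeLPF implements RequestHandler<SQSEvent, Void> {
            public static final String TABLE_NAME = "lpfactors";
            private static final Logger logger = LogManager.getLogger(ComputeLPF.class);

            @Override
            public Void handleRequest(SQSEvent sqsEvent, Context context) {
                try {
                    List<SQSEvent.SQSMessage> messages = sqsEvent.getRecords();
                    if (messages == null || messages.isEmpty()) {
                        logger.info("No messages to process");
                        return null;
                    }
                    for (SQSEvent.SQSMessage m : messages) {
                        logger.info("Message: " + m.getBody());
                        LPFRequest lpfRequest = JSON.std.beanFrom(LPFRequest.class, m.getBody());
                        // Placeholder result; the real computation is added in section G.
                        createLPFRecord(lpfRequest, BigInteger.ZERO);
                    }
                } catch (Exception e) {
                    logger.error("Error while processing SQS entries: ", e);
                }
                return null;
            }

            private void createLPFRecord(LPFRequest lpfRequest, BigInteger lpf) {
                Map<String, AttributeValue> item = new HashMap<>();
                item.put("Number", new AttributeValue(lpfRequest.getNumber().toString()));
                item.put("Value", new AttributeValue(lpf.toString()));

                final AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();
                try {
                    ddb.putItem(TABLE_NAME, item);
                } catch (ResourceNotFoundException e) {
                    logger.error(String.format("Error: The table \"%s\" cannot be found.", TABLE_NAME));
                } catch (AmazonServiceException e) {
                    logger.error("Error from AWS: " + e.getMessage());
                }
            }
        }

Treat this as a consolidation of the steps above rather than a required implementation; the real LPF computation replaces BigInteger.ZERO in section G.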

F. Dumping the database

Create a third lambda (again, starting with the Unicorn code or other working lambda) to dump the database. Name this com.lpfactor.DatabaseDumpingHandler and set its reserve concurrency to 2.

To access the entire contents of the database, use this method:

        public static void getAllRecords(Logger logger) {
            AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();

            ScanRequest scanRequest = new ScanRequest()
                .withTableName(ComputeLPF.TABLE_NAME);

            try {
                ScanResult result = client.scan(scanRequest);
                if (result.getItems().isEmpty()) {
                    logger.error("No items found in the table " + ComputeLPF.TABLE_NAME);
                    return;
                }
                // Log every attribute of every item returned by the scan.
                for (Map<String, AttributeValue> returnedItem : result.getItems()) {
                    for (String key : returnedItem.keySet()) {
                        logger.info(String.format("%s: %s", key,
                                    returnedItem.get(key).toString()));
                    }
                }
            } catch (AmazonServiceException e) {
                logger.error("AWS Service Exception from Database Dump: "
                            + e.getErrorMessage());
            }
        }

This code is described in the cloud computing notes (on slide 38, as of this writing). Adjust this code to print something readable (say, multiple lines with the requested number first on the line and its LPF second) and return it as text on a new route, /lpfactors. The result will be a simple string with multiple lines in it rather than JSON.

The above code requires these imports:

        import com.amazonaws.AmazonServiceException;
        import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
        import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
        import com.amazonaws.services.dynamodbv2.model.AttributeValue;
        import com.amazonaws.services.dynamodbv2.model.ScanRequest;
        import com.amazonaws.services.dynamodbv2.model.ScanResult;
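
When you change the method to return a String for the /lpfactors route, one way to build that string from the scan results is sketched below (the column names Number and Value match section E; the exact formatting is up to you):

        // Build one line per item: the requested number first, then its largest prime factor.
        StringBuilder sb = new StringBuilder();
        for (Map<String, AttributeValue> item : result.getItems()) {
            sb.append(item.get("Number").getS())
              .append(" ")
              .append(item.get("Value").getS())
              .append("\n");
        }
        return sb.toString();   // this string becomes the response body for /lpfactors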

Once your /lpfactors path successfully dumps the contents of the database, test this route using PostMan to confirm the requests are being stored in the database along with the incorrect LPF, zero.

G. Implement computing LPFs

Increasing the Lambda Timeout

If you did not increase the timeout when setting up the ComputeLPF lambda above, change it to two minutes now. If you do update the timeout, you will also need to delete the SQS trigger for this lambda and then recreate it. Do not forget to Save.

You may notice that early runs are slower than subsequent runs. This is the “cold start” issue discussed in class: it takes time for AWS to start a lambda if one is not already running.

Implementation

Use the following as pseudocode for computing LPFs:

        def largest_prime_factor(number)
          i = 2
          while number > 1
            if number % i == 0
              number /= i
            elsif i > Math.sqrt(number)
              i = number
            else
              i += 1
            end
          end
          return i
        end

This is actually Ruby code from Stack Overflow, but you should not need Ruby skills to read it as pseudocode. See the SO site if you are interested in a discussion of computing largest prime factors, but stay away from attempting the other optimizations on that page. Note this computation is to be done using BigInteger. Computing with BigInteger is much slower than arithmetic on fixed-width primitive types, which is why the Competing Consumers pattern provides value for this problem. See the Java documentation for BigInteger for the operations.

Testing code through lambdas is a very slow process. Write a standard main in com.lpfactor.ComputeLPF so you can check your code works locally. Remember that you will return the original number as the result if that number is prime.
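
One possible translation of that pseudocode into BigInteger operations, with a small main for local testing, is sketched below. This assumes Java 9 or later (for BigInteger.TWO and BigInteger.sqrt()), which holds for the Java 17/21 runtimes used in this lab.

        // Add to com.lpfactor.ComputeLPF (requires: import java.math.BigInteger;)
        public static BigInteger largestPrimeFactor(BigInteger number) {
            BigInteger i = BigInteger.TWO;
            while (number.compareTo(BigInteger.ONE) > 0) {
                if (number.mod(i).equals(BigInteger.ZERO)) {
                    number = number.divide(i);    // divide out the factor i
                } else if (i.compareTo(number.sqrt()) > 0) {
                    i = number;                   // no factor up to sqrt(number): it is prime
                } else {
                    i = i.add(BigInteger.ONE);
                }
            }
            return i;
        }

        // Quick local check before deploying.
        public static void main(String[] args) {
            System.out.println(largestPrimeFactor(new BigInteger("13195"))); // 5 * 7 * 13 * 29, prints 29
            System.out.println(largestPrimeFactor(new BigInteger("13")));    // prime, prints itself
        }

Note how the i = number branch handles the prime-number case mentioned above.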

H. Trying your solution on some large integers

Once all three lambdas are running and you have set the concurrency limit to 2, you are ready to test your solution. Use Postman to send the following values to /request_lpfactor:

Then, use Postman to “ping” (that is, send a request to) your /lpfactors route every few seconds. You will see the table get filled in over time; this is your application working! If you forgot to set the ComputeLPF timeout to two minutes earlier, you will want to revisit that step.

You might review the CloudWatch logs for the processing lambda. Note that it computes the values in batches, depending on how quickly you submitted the requests through Postman.

In case you’re curious, most of these numbers were picked (essentially at random) from Prime Curios!.

Submission

Be prepared to demo your lab when requested. Commit and push your code, and leave your services in place (on AWS) so your instructor can run additional tests later. See Canvas for any additional submission requirements.

Problems and fixes

The project does not build

A common reason for a build failure is not structuring your project appropriately. The source code should be in the folder src\main\java\com\lpfactor (though you may have a different name than lpfactor). Restructure your project to match this folder structure. An important feature of Maven is that projects must follow a set structure.

Data not appearing in SQS

This section discusses AWS settings to check when the program builds (look at the history to make sure there aren’t errors!) but data is not appearing in the queue.

  1. Check that your LPFRequest class has the right structure, including a setNumber method that takes a String, a getNumber method that returns a BigInteger, and a no-argument constructor that sets it to some default value like 1.

  2. Check that you created a (Standard) queue with no delivery delay and a visibility timeout of 3 minutes.

  3. Check that when you created your request handler lambda function, you used the runtime that matches the SDK in the project settings. Most students use Java 17, but some use Java 21. The architecture should be x86_64, the default.

  4. Open the Change default execution role tab and confirm you are using an existing role, the role LabRole.

  5. In the Code tab, you will likely see a message about the code editor not supporting the chosen runtime. That is simply because you cannot edit a .jar file; ignore this warning. Make sure the most recent .jar is uploaded; it should be the one whose name includes with-dependencies.

  6. Check the Runtime settings box, a bit below the Code source box, and make sure the Handler is set to something like the following (remembering lpfactor starts with the letter ell and that the request handler starts with LPF):

        com.lpfactor.LPFRequestHandler::handleRequest
    
  7. Review the Reserve concurrency setting and ensure it is 2.

  8. Check that you have an API Gateway set up with the API name request_lpfactor, that it has a Lambda integration, and that the Lambda function is set to the lambda you created earlier. Version should be set to 2.0. In configure routes, the resource path should be /request_lpfactor and the integration target should be the name of the lambda you created that puts data in the queue. The Stage name should be $default.

  9. Check that under the Deploy option for the API Gateway, you have clicked on Stages and set it to $default.

  10. Check that you have a Function URL set up. Auth type should be AWS_IAM (the default).

  11. Check that when you test with Postman, your request is going to the correct endpoint and that the data is something like the following (with no quotes around the digits):

        {
            "number": 51
        }
    
  12. Check the queue by going to the Monitoring tab and scrolling to the bottom to find Approximate Number Of Messages Visible. To see the values on the queue, click on the Send and receive messages button in the upper right, then click on the Poll for messages link.

Acknowledgements

This lab draws from many online sources including Stack Overflow and the AWS tutorial used in last week’s lab. That lab was developed by multiple people including Dr. Yoder, Prof. Lewis, Prof. Porcaro, D. Cofta, and Dr. Hasker.