Starry Wisdom

Entropic Words from Neilathotep

Tuesday, October 24, 2017

Putting It Together Part 1: Deploying AWS Chalice apps with Terraform.

Chalice

Chalice is the “Python Serverless Microframework for AWS”. It allows quick and simple development of REST APIs, and comes with a deploy tool that does all the work necessary to deploy your lambda, as well as create its policy and integrate it with the API Gateway. Let’s start out by creating an example app:

$ chalice new-project example-app

And then we can quickly create a simple app that reads in a couple of path parameters and returns a JSON response. Note the use of the decorator to declare the route and method:

from chalice import Chalice

app = Chalice(app_name='example-app')

@app.route('/customer/{customer_id}/order/{order_id}', methods=['PUT'])
def register_order(customer_id, order_id):
    # imagine inserting this into Dynamo, etc...
    return {'customer': customer_id,
            'order_id': order_id}

Deploying this is as simple as:

$ chalice deploy
Creating role: example-app-dev
Creating deployment package.
Creating lambda function: example-app-dev
Initiating first time deployment.
Deploying to API Gateway stage: api
https://id.execute-api.us-west-2.amazonaws.com/api/

At which point you can access it like this:

$ curl -X PUT https://id.execute-api.us-west-2.amazonaws.com/api/customer/customer01/order/123183123
{"customer": "customer01", "order_id": "123183123"}

See the tutorial, which is quite good, for more information on what you can do inside the apps (such as tying in other AWS services).

Enter Terraform

But what if you want to use Terraform to deploy your infrastructure? The first step is to create a deployment package:

$ chalice package .
Creating deployment package.
$ ls -l deployment.zip
-rw-r--r-- 1 nchazin staff 9022 Oct 18 22:24 deployment.zip

And now we’re ready to code up our terraform. We’ll begin by defining a few variables, whose values we can store in a terraform.tfvars file (an example follows the declarations).

variable "environment" {
  description = "AWS account environment environment for the lambda and api gateway)"
}
variable "region" {
  description = "AWS region"
  default     = "us-west-2"
}
variable "account_id" {
   description = "AWS account id of the environment"
}
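
For example, a terraform.tfvars for a dev environment might look something like this (the values here are just placeholders):

environment = "dev"
region      = "us-west-2"
account_id  = "123456789012"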

Next we’ll define the lambda itself, along with its associated role, and a policy which allows us to log and monitor with CloudWatch:

 

provider "aws" {
  profile  = "${var.environment}"
  region   = "${var.region}"
}
resource "aws_iam_role" "lambda_example_app_role" {
  name = "lambda_example_app_role"
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}
# Logging and metric policy
resource "aws_iam_role_policy" "lambda_example_app_role_policy" {
    name = "lambda_example_app_role_policy"
    role = "${aws_iam_role.lambda_example_app_role.id}"
    policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricData"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
EOF
}
resource "aws_lambda_function" "example_app" {
    function_name = "example_app"
    # This is the archive we created with chalice package
    filename = "deployment.zip"
    description = "An example app"
    role = "${aws_iam_role.lambda_example_app_role.arn}"
    handler = "app.app"
    timeout = 300
    runtime = "python3.6"
}

 

With our lambda set up, we can create our API Gateway:

 

# this declares the api gateway
resource "aws_api_gateway_rest_api" "example_api" {
    name = "CustomerOrderAPI"
    description = "API Gateway to register customer orders"
}
 
/*
 these four blocks declare the path for our api
-------------------------------------------------------------------------
*/
resource "aws_api_gateway_resource" "customer" {
    rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
    parent_id = "${aws_api_gateway_rest_api.example_api.root_resource_id}"
    path_part = "customer"
}
 
resource "aws_api_gateway_resource" "customer_id" {
    rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
    parent_id = "${aws_api_gateway_resource.customer.id}"
    path_part = "{customer_id}"
}
 
resource "aws_api_gateway_resource" "order" {
    rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
    parent_id = "${aws_api_gateway_resource.customer_id.id}"
    path_part = "order"
}
 
resource "aws_api_gateway_resource" "order_id" {
    rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
    parent_id = "${aws_api_gateway_resource.order.id}"
    path_part = "{order_id}"
}
 
/*
-------------------------------------------------------------------------
*/
 
 
# Declare a PUT method on our full path
resource "aws_api_gateway_method" "example_method" {
    rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
    resource_id = "${aws_api_gateway_resource.order_id.id}"
    http_method = "PUT"
    authorization = "NONE"
}
 
 
# Tie the API method into our lambda backend
# Note: the integration_http_method for a lambda is POST, regardless of the gateway method
resource "aws_api_gateway_integration" "example_api_integration" {
    rest_api_id             = "${aws_api_gateway_rest_api.example_api.id}"
    resource_id             = "${aws_api_gateway_resource.order_id.id}"
    http_method             = "${aws_api_gateway_method.example_method.http_method}"
    integration_http_method = "POST"
    type                    = "AWS_PROXY"
    uri                     = "${aws_lambda_function.example_app.invoke_arn}"
}
 
 
 
# API gateway uses stages for release control - we'll define dev and prod
resource "aws_api_gateway_deployment" "example_deployment_dev" {
  depends_on = [
    "aws_api_gateway_method.example_method",
    "aws_api_gateway_integration.example_api_integration",
  ]
  rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
  stage_name = "dev"
}
 
resource "aws_api_gateway_deployment" "example_deployment_prod" {
  depends_on = [
    "aws_api_gateway_method.example_method",
    "aws_api_gateway_integration.example_api_integration",
  ]
  rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
  stage_name = "api"
}
 
# these output variables will show the base of the endpoints we'll query
output "dev_url" {
  # the invoke URL is built from the API id, region, and stage name
  value = "https://${aws_api_gateway_rest_api.example_api.id}.execute-api.${var.region}.amazonaws.com/${aws_api_gateway_deployment.example_deployment_dev.stage_name}"
}

output "prod_url" {
  value = "https://${aws_api_gateway_rest_api.example_api.id}.execute-api.${var.region}.amazonaws.com/${aws_api_gateway_deployment.example_deployment_prod.stage_name}"
}

The final piece is the permission that allows our API Gateway to invoke the lambda. Note that you can be more specific and limit access to the specific API method and path if desired by adding them to the source_arn.
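
A minimal version of that permission resource might look something like this (the wildcards in the source_arn open it up to every stage, method, and path on our API):

# Allow API Gateway to invoke our lambda
resource "aws_lambda_permission" "example_app_permission" {
  statement_id  = "AllowExecutionFromApiGateway"
  action        = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.example_app.function_name}"
  principal     = "apigateway.amazonaws.com"
  # narrow this down to a specific stage, method, and path if desired
  source_arn    = "arn:aws:execute-api:${var.region}:${var.account_id}:${aws_api_gateway_rest_api.example_api.id}/*/*"
}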

Now we can install our API:


$ terraform apply
...
Apply complete! Resources: 13 added, 0 changed, 0 destroyed.
The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.
State path:
Outputs:
dev_url = https://id.execute-api.us-west-2.amazonaws.com/dev
prod_url = https://id.execute-api.us-west-2.amazonaws.com/api

And query it at both the dev and prod endpoints:


$ curl -X PUT  https://id.execute-api.us-west-2.amazonaws.com/dev/customer/aaa/order/bbb
{"customer""aaa""order_id""bbb"}
$ curl -X PUT  https://id.execute-api.us-west-2.amazonaws.com/api/customer/yoyodyne/order/100
{"customer""yoyodyne""order_id""100"}
posted by neil at 10:41 pm
under technology  

Thursday, March 19, 2015

Be careful how you use data structures!

Recently at work I came across a certain long-lived server process that was using an immense amount of memory. Every few days it would grow to something over 10GB of resident memory, at which point we would restart the process. This was clearly not an ideal solution, especially because this specific server process was running on something like eighty machines across forty data centers. So last week, during some relatively down time, I dug into where the memory was going. The first thing I did was spend some time with the heap profiler and heap checker from google’s perftools (https://code.google.com/p/gperftools/?redir=1) (side note – this is still on google code and not github!?). This showed nothing particularly useful. I then resorted to using valgrind on a test box, which, after an excruciatingly slow run, showed no leaks.

Next I spent a little time on code inspection, where I found that while there was one place a leak was possible, it wasn’t likely to contribute to gigabytes of leaked memory over days under the workloads that we have. The next thing I did was break out all the inputs to the server – client requests, plus the data streams going into it. I went through these one at a time, and finally found that our primary data stream could easily raise the memory usage from a baseline of around 2GB to the full 10GB when I replayed it through a test box at an accelerated rate.

Without going into too much detail, this data stream basically gives us a small number of data points for many pieces of data. A handful of data sources each contribute their own data point for each piece of data, and this data can change over time. So conceptually there is a two level data structure: an outer map keyed by the piece of data, with an inner map per entry keyed by the data source.

As it was written (disclaimer – I wrote much of the code in this server, but I don’t think I wrote this part… although I may have) this was two levels, each implemented by a C++11 std::unordered_map – which is essentially a hash table. The lower level map in particular was defined as std::unordered_map<std::string, Pod>. For the purposes of this article, the Pod type was this (and this is actually almost exactly what we were using):

struct Pod {
    uint32_t item1{0};
    uint32_t item2{0};
    time_t timestamp{0};
};

A word about the strings – these were the host ids of the several servers providing the data. So in effect, every “bucket” in the upper level map had a map inside of it with the same several keys. Which meant that we were spending a lot of space on the keys. So the quick thing to do was to create an ancillary std::unordered_map<std::string, uint16_t> to map the hostnames to integers, and therefore the lower level maps became std::unordered_map<uint16_t, Pod>. Testing this showed that memory usage went down by around 3GB, which was a big saving. The math for removing the strings didn’t quite ever add up, but I was fairly happy. A 3GB savings from 8GB is about a 38% improvement, at the cost of one additional O(1) lookup per insertion. But maybe there was more to do. Why use a map at all when *almost* every single lower level map holds the same 5 keys? What if we just used a std::vector<Pod>? For this run the memory used was 5.4GB below the original, an additional 2.4GB saved on top of the integer-key version. That was a savings of over 5GB, or over 60%!
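
Roughly speaking, the evolution of the lower level container looked something like this – a simplified sketch with made-up names, not the actual production code:

#include <cstdint>
#include <ctime>
#include <string>
#include <unordered_map>
#include <vector>

struct Pod {
    uint32_t item1{0};
    uint32_t item2{0};
    time_t timestamp{0};
};

// Original: every lower level map keyed by the full hostname string.
using InnerByName = std::unordered_map<std::string, Pod>;

// Step 1: intern each hostname into a small integer id once...
std::unordered_map<std::string, uint16_t> host_ids;
// ...and key the lower level maps by that id instead of the string.
using InnerById = std::unordered_map<uint16_t, Pod>;

// Step 2: with the same handful of hosts everywhere, drop the hash
// table entirely and index a vector by the interned host id.
using InnerByVector = std::vector<Pod>;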

It wasn’t obvious to me what was going on, so I wrote a little program to test a very simple scenario of allocating 10 million of the lower level “maps” with different strategies – std::unordered_map, std::map and std::vector. Here are the results for memory used, compiling with both clang and g++ – all memory is in KB, as reported by ps -ev on linux.
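
The test was along these lines – again just a sketch of the idea, not the actual test code:

#include <cstddef>
#include <cstdint>
#include <ctime>
#include <unistd.h>
#include <unordered_map>
#include <vector>

struct Pod {
    uint32_t item1{0};
    uint32_t item2{0};
    time_t timestamp{0};
};

// Swap this alias between std::unordered_map<uint16_t, Pod>,
// std::map<uint16_t, Pod>, and std::vector<Pod> (using push_back)
// to compare the three strategies.
using Inner = std::unordered_map<uint16_t, Pod>;

int main() {
    const size_t kOuter = 10000000;  // 10 million lower level "maps"
    const uint16_t kHosts = 5;       // the same few keys in each one

    std::vector<Inner> outer(kOuter);
    for (auto& inner : outer) {
        for (uint16_t h = 0; h < kHosts; ++h) {
            inner.emplace(h, Pod{});
        }
    }

    // Park here so resident memory can be read off with ps.
    sleep(60);
    return 0;
}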

This gives results similar to my results for our production code at work. And it was at this point that I realized what was going on. When you create a hash table, it is not sized to the actual data in it - there is a lot of extra space used in the table for buckets that are not full - wasted space. This is not a big deal for a fairly dense hash table. However the vector was sized for the exact number of elements inside of it - there is still some overhead for the vector itself, but compared to the empty space in the hash table, it's not much. In hindsight this is fairly obvious, but it was not clear to me or my coworkers when I first started looking into this memory issue.

By the way, my test code is available here.

posted by neil at 9:17 pm
under technology  

Saturday, March 16, 2013

A small technology post

I’ve been using Things to help keep track of tasks at work for the past 9 or so months. In the past couple of months I’ve had the need to try to export things so others can see what I am up to – which is a feature the program lacks. It does have an AppleScript API which is somewhat documented, but I had never really used AppleScript before. It took a few hours across a couple of days, but I was able to make a script to do more or less what I wanted. However, when I tried to turn it into an AppleScript application, I found that there were weird characters and strings in my output. A little bit of googling showed me that they were constants, but in neither the Things docs nor in any AppleScript documentation could I find a good way to deal with said constants – besides just comparing to them and converting to my own strings as necessary. Anyway, I put it into github, so maybe this silly program will help someone else out someday. It works more or less, but I still don’t know why you can’t overwrite an existing file :(.

posted by neil at 8:45 pm
under technology  

Thursday, February 3, 2011

Computer Networking

You’d think that someone who works in the computer networking industry would have an easier time with their home network. And maybe I would if it weren’t for pesky game consoles. Just a note, this is rambly and a bit technical, so be warned. Also note that so far I am completely failing at writing about the cool topics I mentioned at the end of January – but I am also still soliciting more topics to write about…

Ok, to be honest, I started typing this out because there was a problem getting Mackenzie’s Wii connected to my wireless network, but then I recalled that you can actually mess with the network settings by non-intuitively clicking around – and of course that the Wii only supports AES for WPA2, which is OK, I guess. Anyway, I got it worked out but that’s the least of my worries.

Because my more modern Xbox 360 also has problems with the wireless – in that it often refuses to believe the network exists, even though other devices have no problems. I can watch Netflix perfectly on my laptop, 10 feet further away from the WAP than the Xbox is. And the Xbox, when it can watch Netflix, often scales the movies down to the lowest quality. If I plug it in via a long ethernet cable, HD all the way.

So, I have a tentative plan to buy another WRT54GL, put dd-wrt on it (which is what I am running on the current router), and set it up as a wireless bridge/repeater to get more coverage in the flat – which isn’t all that big. I’m hoping that maybe it can keep a more solid connection to the main router than the silly Xbox. Plus the wireless signal in the bedroom (which is the farthest room from the office) is pretty weak anyway – so maybe having another signal on this side of the flat will help? Or at least I can run a shorter cable to the Xbox maybe. I don’t know, but a router is only $50, and if it’s useless, I can ebay it with the bonus of the aftermarket OS already installed!

posted by neil at 11:23 pm
under rambling,technology  

Friday, August 6, 2010

…and we’re back

While in the process of moving, and while I had no reliable internet at home, I used my shell at dreamhost, which you know, I pay for, to use IRC. Evidently they are of the belief that only hackers use IRC, so they disabled my web sites, thinking they were hacked because I had old installs of WordPress around (not active, not reachable via web…).

Anyway, I tarred up the old stuff to make it even less intrusive, updated my gallery to the current version (which really was a ‘problem’ on my end, I guess), and set it back up. So welcome back to me!

posted by neil at 10:44 am
under daily tribulations,technology  

Saturday, January 9, 2010

I Hate Computers

I’m not sure if I posted about this, in fact I’m quite positive I didn’t, but back in September my Windows computer died. It was nearly five years old so this wasn’t completely unexpected. It wasn’t the end of the world either, as I mostly use my laptop. But still, it served a function as my media server, so it needed to be replaced.

Out of laziness, instead of building my own, I ordered a mid tier Dell system, which came with Windows Vista, but also a free upgrade to Windows 7, which was due to be released within a couple of months. I was loath to run Vista, but the free upgrade cheered me up somewhat.

I finally got my upgrade DVD in December, but I dragged my feet in installing it. In hindsight, I really should have done it last week while I was off of work, but alas. Instead I just killed two weeknights dealing with the ‘upgrade’ and its fallout.

First of all, because I’m crazy, I decided I might as well see if the actual upgrade functionality would work, as opposed to doing a fresh install of Windows 7. I didn’t really have anything of value on the OS drive, so I could deal with a clean wipe, but I guess I wanted to see how elegant Microsoft could be. The answer is not at all, and I was forced to do a fresh install after wasting 2 hours doing the upgrade process. Pretty sad, since the install only took about 45 minutes. And now things were OK. Except they weren’t.

I alluded above to the fact that I had more than one drive in the computer. The second drive was the hard drive from my old, dead machine, which was filled with various media files (video and audio). Was is the key term here, since that drive perished in the upgrade. Actually, I’m pretty sure I know what happened to it, and it seems it’s half my fault for not keeping my drive’s firmware up to date (hah, clearly everyone needs to do that!). There is a bug on particular Seagate Barracudas, of which my drive is one, where upon bootup the drive can basically become a spinning brick. All your data is there, but since you cannot talk to the disk, you cannot retrieve it. The only solace to me in this is that I am 99% sure I have all the music on my old Ipod, which I can pull back off it. Oh, and that I can get a warranty replacement.

And now we come to the third woe: my wireless router is a piece of crap. I always knew that the Linksys WRT54G that I had was a less desirable version (v6 if you must know) but because it worked pretty well, I didn’t care. But now my new printer (bought to replace my more than nine year old deskjet that barely worked with Vista, and based on an experience at Erin’s with Windows 7, I figured I was best served with spending $100 to enter the modern age. Oh and I would be getting a scanner and a copier at the same time), which is fully wireless, taught me why that router might not be so great.

You see, I configured the printer to use my wireless network, and it seemed to be happy. But when I tried to find it on my laptop, no luck. And please remember, on this first night, I wasn’t done futzing around with the desktop, so I had no other way to test the printer. It just seemed like it wasn’t working. But on the second night, I found I was able to connect to the printer if I plugged it into the router via wired ethernet. And on the third night, I learned that my wired Windows computer could connect to the printer when it was connected wirelessly to the router. A bit more fooling around with devices and I determined:

Connection Type    Wired Device       Wireless Device
Wired Device       Can Communicate    Can Communicate
Wireless Device    Can Communicate    Can’t Communicate

So everything works except two wireless devices which try to talk to each other. I am mostly certain, but cannot be positive that this used to work. I decided to see what I could do to debug this, but the router’s web interface doesn’t really give you much to see. Then I looked up custom firmware to see if that would help me debug, and it turns out it probably would, but the dd-wrt website told me I should just sell the old router and get a better one. I don’t think I’ll be selling my neutered router, although I might see if someone at work wants it for free, but I did order a WRT54GL. The L stands for linux, and it’s basically Linksys admitting it was evil with the later revisions of the router. The first thing I’ll do is get the custom firmware on the new router, and after I get that working, I’ll unleash the old one on someone at work.

Anyway, there was a lot of rambling here, but I think you can see how this defends my thesis, I HATE COMPUTERS. At least some of the time, when I don’t really like them.

posted by neil at 12:38 pm
under daily tribulations,rambling,technology  

Friday, November 6, 2009

Another Product Review – Colgate MaxFresh Toothbrush

Contrary to what you might think after the last post, this is not a product review blog (well it’s not much of anything, as I am super lazy). However, I wanted to talk about this toothbrush I bought yesterday – the Colgate Max Fresh.

I was spending the night in Sunnyvale, since I had to take my friend Erin to get some surgery done at 6:30 AM this morning, and driving down to Sunnyvale, then up to Palo Alto from San Francisco would have meant leaving home at 5AM. However, I left my toothbrush sitting on my couch at home, so I bought a new one.

I can’t find a good picture of it online and don’t have a good camera ready and available to take a picture, so I’ll just point you to the official site. This is one of the fancy modern toothbrushes, with a rubber contoured handle and bristles that point off at various acute angles. The site lists some fantastic additional features, including a minty fresh handle to invigorate your brushing experience. I am not sure how much this adds to the experience when you already are most likely using mint scented dentifrice, but at least it has some humor value when you hear about it.

The other feature, the tongue freshener, is what I really want to talk about. I guess it’s a not uncommon feature to have these days – a bit of rubber on the back of the head which supposedly cleans off the tongue, but there is more than just a bit on this toothbrush. The rubber ‘stubble’ is also on the sides of the head, so when you brush you are constantly rubbing it against your cheeks. This is not what I call an enjoyable experience, and even a minty fresh handle can’t change it.

To sum it up, as far as fancy new toothbrushes go, the Max Fresh is a dud. I much prefer my current Oral B CrossAction Pro-Health Toothbrush.

posted by neil at 9:04 am
under rambling,technology  

Wednesday, July 1, 2009

Quick Update to the AV Post

Talking to Paul at work today, he surmised that the encoding might be off, so that instead of the receiver deciding the dialog should go to the center channel (in phase on the L and R inputs), it was sending it to the rear channel (out of phase on the L and R inputs). I guess that’s a bit simplified for how Pro Logic works, but the upshot of this talk is that I hooked up my surround speakers to see if the audio was going backwards. Well, no, now it is properly going to the CENTER CHANNEL. Which is great, it works now, but that means that Netflix had a bug the other night.

posted by neil at 7:25 pm
under daily tribulations,technology  

Monday, June 29, 2009

A/V equipment sucks

I’m pretty annoyed right now. For reasons that I really don’t understand, This American Life Season 2, as seen by Netflix Watch Instantly on Xbox 360 Live, was not working properly. The music was working fine, and some sound effects, but not the dialog.

Now, I think I should explain a bit how things are wired up. The component output goes into the TV, and the stereo audio outputs WERE going into the satellite inputs on my receiver. This has worked perfectly fine since I bought the 360 last fall – but not tonight. I tried two separate episodes of the series, and both “failed”. I tried some other instant watchable Netflix and it worked. I futzed around with the surround settings on the receiver, and occasionally was able to hear Ira Glass, sounding like he was at the bottom of a well. I turned off the 360 in disgust.

On a whim, I tried moving the audio outputs to the VCR inputs on the receiver, and lo-and-behold it just works. I really am not sure why the receiver input matters in this case, but there’s bound to be an explanation. Maybe one of my 5 readers knows. If not it will just be a mystery to me.

posted by neil at 9:55 pm
under daily tribulations,technology  

Thursday, April 23, 2009

New Feed

This should be transparent, but I thought I would mention that I decided to check out FeedBurner, so I added a plugin that supposedly redirects my feed through it.

The feed itself is: http://feeds2.feedburner.com/StarryWisdom

posted by neil at 12:43 pm
under meta,technology  
