New blog
I now have a new blog. It is located at https://jimmyislive.dev
All future posts will be made there.
Thanks.
@jimmyislive
Trap in bash scripts
You may be familiar with the “set -e” you typically see in bash scripts. It means that the script should exit on error (default behaviour of bash is to keep going)
Recently I needed a script that exits on error but still performs some cleanup before it does. A use case might be that if something fails, then before exiting, the script ships log files to an S3 bucket for troubleshooting.
I used “trap” for this. It essentially traps signals of interest and then performs cleanup actions specified by the user before exiting. Here is an example:
#!/usr/bin/env bash
set -e

function finish {
    aws s3 sync /tmp/logs/ s3://some-bucket-name
}
trap finish EXIT

script1 arg1 &> /tmp/logs/script1.log
script2 arg2 &> /tmp/logs/script2.log
script3 arg3 &> /tmp/logs/script3.log
By using trap, even if any of the script1/script2/script3 fails, the logs will be shipped to S3 so you can troubleshoot it.
Adding multiple certs to an ALB
If you are using AWS and ALBs, you have the ability to add multiple certs to the ALB and terminate SSL there.
While it is easy to do via the AWS console, their documentation is not that clear as to how to do it in an automated way. The following is the code snippet, written with troposphere, to show you how to do it.
First create an HTTPS listener with a certificate:
def create_lb_listener_https(alb, default_target_group, param_cert_one):
    return Listener(
        'LoadBalancerListenerHTTPS',
        Port='443',
        Protocol='HTTPS',
        LoadBalancerArn=Ref(alb),
        DefaultActions=[
            Action(Type='forward', TargetGroupArn=Ref(default_target_group)),
        ],
        # Note, only one cert ARN can be specified here, else you will get an error
        Certificates=[
            Certificate(CertificateArn=Ref(param_cert_one)),
        ],
    )
Now we have one listener, with a cert, attached to the ALB:
listener_arn = create_lb_listener_https(alb, default_target_group, param_cert_primary)
We can now add more certs via a ListenerCertificate:
# `condition` is assumed to be defined elsewhere in your template code.
# Each resource needs its own title, hence the distinct names below.
def make_listener_certificate_two(listener_arn, param_cert_two):
    return ListenerCertificate(
        'ListenerCertificateTwo',
        Certificates=[
            Certificate(CertificateArn=Ref(param_cert_two)),
        ],
        ListenerArn=Ref(listener_arn),
        Condition=condition,
    )

def make_listener_certificate_three(listener_arn, param_cert_three):
    return ListenerCertificate(
        'ListenerCertificateThree',
        Certificates=[
            Certificate(CertificateArn=Ref(param_cert_three)),
        ],
        ListenerArn=Ref(listener_arn),
        Condition=condition,
    )
The above is not described clearly in the AWS docs. Hopefully this saves you some time if you run into this.
Docker image tagging in ECR
AWS provides a docker repository called ECR. This is similar to Docker Hub, Artifactory etc. and provides a convenient place to store docker images for later use, e.g. deployment.
ECR supports Docker Image Manifest V2, Schema 2, providing the ability to add multiple tags per image. I recently ran into an issue wherein I tagged an image with multiple tags at build time and then just pushed one tag, thinking all tags would appear on ECR. That did not work and my image in ECR always had only one tag. To ensure that your image has all the tags on ECR, push each tag individually.
docker tag imageName tag1
docker tag imageName tag2
docker tag imageName tag3
docker push tag1
docker push tag2
docker push tag3
Checklist Manifesto for Microservices
I recently read the Checklist Manifesto by Atul Gawande. It was a great read and I highly recommend it. It essentially lays out a case for using checklists as an aid to improving efficiency and outcomes of processes / procedures. He goes into several case studies in the Health, Construction, Aviation and Finance industries describing how they take advantage of checklists.
That got me wondering about such a checklist for software. Chances are that you are working with Microservices. What would such a checklist look like if you were deploying a freshly minted stateless microservice to production? Here’s my take:
1. Is your service HA (Highly Available) ?
This can take several forms e.g. when deploying, are you deploying it within an auto scaling group of some kind. This should allow you to easily scale up/down when needed. In order to use these machines, you would probably need a load balancer of some kind distributing load across this group based on some policy e.g. round robin, CPU utilization, request velocity etc
2. Does it have a Health Check defined ?
This is important for the load balancer to know whether your service is healthy or not, i.e. it gives the infrastructure an opportunity to self-heal if needed. Health checks can be simple HTTP ping/pong type checks or use other protocols such as TCP. You can also come up with more complex health checks that check other dependent services before returning an OK/NOK.
3. Does it have an internal dashboard ?
Having an internal dashboard for a service is very useful during troubleshooting. This dashboard can expose any kind of information deemed important to the service, e.g. if the service is reading off kafka, it can publish the current lag. If it is reading/writing to a cache such as redis, it can offer a form to query that cache.
4. Does it have operational metrics ?
This goes without saying, but you’ll be surprised how many people choose to fly blind. This includes application metrics such as request rate, throughput, latency etc. It also includes host metrics such as CPU utilization, memory consumption etc (look at the stats generated by collectd for examples). Getting metrics is only half the battle. You should alert on them as well. Measure everything, alert on some.
5. Are logs collected, indexed, available for analysis / troubleshooting ?
Collect all your logs and index them. Choose any tool that you are comfortable with e.g. Splunk, ElasticSearch, SumoLogic etc. These prove invaluable during troubleshooting.
6. Is it versioned ?
It’s almost always a good idea to version your services. This helps in backwards compatibility in working with older clients. In case your service has other dependencies e.g. requirements.txt (Python) or third party libs in vendor/ (GoLang), it’s a good idea to pin those dependencies to a specific version too.
7. Do you have Continuous Integration (CI) set up ?
Writing unit tests and having them continually run against your branch before merging helps in retaining everyone’s sanity. It also makes you more confident that the big refactor you are going to push out will not break the world !
8. Is the deployment automated ?
Everything about the deployment should be automated, i.e. reproducible at the push of a button. One-time scripts and other shortcuts will just cause you grief later down the road.
The above are the most important things I could come up with. And I hope as an industry we start using checklists much more !
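To make item 2 a little more concrete, here is a minimal sketch of a health check that also probes dependent services before answering OK/NOK. The function name health_status, the check names and the response shape are all made up for illustration; wire it into whatever HTTP framework you use.

```python
import json


def health_status(checks):
    """Run each dependency check and summarise the result.

    `checks` maps a name to a zero-argument callable that returns True
    when that dependency is reachable. Names and response shape are
    illustrative only, not any kind of standard.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False

    healthy = all(results.values())
    status = 200 if healthy else 503
    body = json.dumps({"status": "OK" if healthy else "NOK", "checks": results})
    return status, body
```

A load balancer only needs the status code; the JSON body is there for the human who is troubleshooting.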
Using NewRelic with Go
NewRelic (NR) is a tool that you can use for monitoring your applications. Once your code is instrumented with NewRelic, you can see your transactions on their dashboard and gain valuable insights like latency, throughput etc for your APIs and also things like DB calls, calls to external subsystems etc.
If your system is written using Go, you would need to use their Go Agent. Here are some ways in which you can use it effectively.
Using NR to instrument calls to your API
Before you can leverage NR, you will have to create a NR Transaction (txn). The best place to create this txn will be in your middleware i.e. when the request is coming in. So on startup of your server, first create a NR app:
func createNRApp(licenseKey string) (newrelic.Application, error) {
    cfg := newrelic.NewConfig("YourNRAppName", licenseKey)
    app, err := newrelic.NewApplication(cfg)
    if err != nil {
        return nil, err
    }
    return app, nil
}
Now that we have a NR app, we can pass it into our middleware to create a txn for every request that comes through:
type NewRelicContextKey string

var NRKey = NewRelicContextKey("NewRelicTxn")

func newRelicMiddleware(app newrelic.Application) negroni.Handler {
    return negroni.HandlerFunc(func(w http.ResponseWriter, req *http.Request, next http.HandlerFunc) {
        txn := app.StartTransaction("txnName", w, req)
        defer txn.End()

        // stash the txn in the request context so code further down can use it
        ctx := context.WithValue(req.Context(), NRKey, txn)
        req = req.WithContext(ctx)

        // txn also acts as the http.ResponseWriter, so pass it on in place of w
        next(txn, req)
    })
}
Now every request context will have NR txn embedded within it. Strictly speaking we did not have to embed it into the context, if all we wanted to do was instrument calls to our APIs. However, by placing the txn into the context and propagating the request context down your call chain, you can use NR to instrument other aspects of your code, as you will see in the next section.
Using NR to instrument calls to your DB
NR has support to instrument calls to your DB, which they call Datastore Segments. Let’s assume you have propagated the request context down to the models which access your DB. If you have followed along thus far, you now also have a NR txn embedded in your request context. We will now write a helper function to start a NR Datastore Segment:
func NRDS(txn newrelic.Transaction, tableName, operation string) newrelic.DatastoreSegment {
    s := newrelic.DatastoreSegment{
        Product:    newrelic.DatastoreMySQL,
        Collection: tableName,
        Operation:  operation,
    }
    s.StartTime = newrelic.StartSegmentNow(txn)
    return s
}
With the above helper function at hand, adding instrumentation in your models becomes trivial:
func myModelFunc(ctx context.Context, {remaining args of your model}) {
    txn, _ := ctx.Value(NRKey).(newrelic.Transaction)
    s := NRDS(txn, "MyTableName", "SELECT")
    defer s.End()

    // your model code...
}
You will see your DB call times in the “Databases” tab of the NR panel.
Using NR to instrument external calls
NR can also be used for instrumenting calls to external sub-systems. Let’s assume you have propagated the request context to the function making the external calls. We will first write a helper function to make our lives easier:
func NRES(txn newrelic.Transaction, url string) newrelic.ExternalSegment {
    return newrelic.ExternalSegment{
        StartTime: newrelic.StartSegmentNow(txn),
        URL:       url,
    }
}
With the above in place, instrumenting external calls becomes trivial:
func myExternalFunc(ctx context.Context, url string) {
    txn, _ := ctx.Value(NRKey).(newrelic.Transaction)
    seg := NRES(txn, url)
    defer seg.End()

    // your external call code here
}
Using NR to instrument background workers
Many times we don’t have a request/response type scenario. e.g. maybe we have a backend worker that is just polling or performs a workflow only on an event. In those cases we would need to create a new context and a new txn. Embed that txn into the context and propagate this context down the worker call chain. Here is a helper function that lets you do this:
func NRMakeCtxAndTxn(app *newrelic.Application, txnName string) (newrelic.Transaction, context.Context) {
    if app == nil {
        return nil, context.Background()
    }
    txn := (*app).StartTransaction(txnName, nil, nil)
    ctx := context.WithValue(context.Background(), NRKey, txn)
    return txn, ctx
}

func NREndTxn(txn newrelic.Transaction) {
    if txn != nil {
        txn.End()
    }
}
Then in your backend workers, before kicking off a task do:
func myWorkerTask(nrApp *newrelic.Application) {
    txn, _ := NRMakeCtxAndTxn(nrApp, "taskName")
    defer NREndTxn(txn)

    // task code goes here...
}
Hopefully these code snippets help you.
Working with (possibly) Null values from the DB in GoLang
Many times when we interact with a DB schema the default type could be NULL. In those cases if we are reading in those values via GoLang, what should the datatype of the variable be? It cannot be string, as string does not support NULL values.
A good way to deal with them is to use sql.NullString. It is essentially a struct with two fields: String and Valid. If Valid is true, it means that String is not NULL.
The only problem with the above data type is that if you are rendering it into json, it seems kinda clunky, as the json would have to show the values of both Valid and String.
One package I found recently that helps is the null package. Using this, you can declare your variables of type null.String and it should take care of all cases, e.g. if we had a variable called Description:
var Description null.String
In case you want to initialize it with a string you can do:
Description = null.NewString("Hello World", true)
You could also populate this variable from the DB and even if Description is a NULL value it will be fine. To extract the string value stored in a null.String:
fmt.Println(Description.String)
CA 2013 STAR Test Result Graphs
I recently completed a Data Science course at coursera. It was a great course and I enjoyed it. Pandas is really a powerful tool. To put it to some use, I decided to play around with some public data sets and see if I could arrive at some conclusions. I downloaded the California state results for the STAR (California Standardized Testing and Reporting) program for the year 2013 from the California Department of Education. I sliced and diced the data and came up with these graphs.
All the code for these graphs is available as a Jupyter notebook on my Github.
The number of schools by county in CA:
Percentage of students deemed proficient in Math in grades 9 / 10 / 11/ 12 based on the CST (California Standards Tests)
Top 5 elementary schools with the best median scores in the CST:
   County Name    District Name          School Name
0  San Francisco  San Francisco Unified  Chin (John Yehall) Elementary
1  Orange         Orange Unified         Villa Park Elementary
2  Alameda        Fremont Unified        Mission San Jose Elementary
3  Santa Clara    Palo Alto Unified      Hoover (Herbert) Elem
4  Alameda        Fremont Unified        Chadbourne (Joshua) Elementary
Top 5 middle schools with the best median scores in CST (grades 6 through 8)
   County Name  District Name    School Name
0  Alameda      Fremont Unified  Hopkins (William) Junior High
1  Los Angeles  ABC Unified      Whitney (Gretchen) High
2  Alameda      Fremont Unified  Weibel (Fred E.) Elementary
3  Alameda      Fremont Unified  Mission San Jose Elementary
4  Orange       Orange Unified   Villa Park Elementary
Top 5 high schools with the best median scores in CST (grades 9 through 12)
   County Name  District Name       School Name
0  Los Angeles  ABC Unified         Whitney (Gretchen) High
1  Orange       Anaheim Union High  Oxford Academy
2  Alameda      Fremont Unified     Mission San Jose High
3  Santa Clara  Fremont Union High  Monta Vista High
4  Santa Clara  Fremont Union High  Lynbrook High
I hope to add more of these as I play around with these data sets more. Have fun and hopefully this will encourage you to play around with data science as well !
Terraform script to automate creation of AWS VPC setup
A common way to protect your infrastructure is via a layered security approach. AWS makes it easy by providing you several tools like VPC, security groups etc that allow you to incorporate a layered security structure throughout your infrastructure. Here is a very common setup: A VPC with three subnets. 1 subnet is public which houses a public EC2 instance. The other two subnets are private. One of the private subnets could house databases such as RDS / ElastiCache. The other subnet could probably house other things such as lambda functions.
Here is a diagrammatic representation of something similar:
Reference
NOTE: In the above diagram it shows only one private subnet, my example will show two private subnets.
Ideally, you don’t want to be building the above via the UI or manually. You want to automate its creation. (Infrastructure as code !). One great utility to do this is Terraform.
Here is a terraform script that automates the creation of the entire setup. You can get the entire source code on Github. Please refer to the readme in the repo for more details. Here I’m just going to describe the top level main.tf
This is the top level script that kicks off the terraform run.
It starts out by specifying the provider as AWS
It then declares a bunch of variables (whose values you can customize by populating the terraform.tfvars)
Each component of the VPC setup is implemented as a module, which is imported by main.tf
The script outputs various values for ease of use e.g. elastic ip to connect to EC2 etc
Please refer to the README.md file in the repo for further instructions on how to build, use and tear down the infrastructure.
CSRF in Single Page Apps (SPA)
Single Page Apps are becoming popular these days. Typically this involves downloading all the js on the client side which then interacts with an API server to get and render all the data.
CSRF concerns exist here as well. In a typical application you could use the Synchronizer Token pattern to mitigate CSRF, i.e. when the server renders the form it inserts a nonce which is then checked on the server when the form is submitted. But in a SPA the rendering is done on the client side, so there is no server-rendered form through which to hand the token over securely.
One easy way in such situations is to check the Origin header and ensure that the request is coming from a domain we host. The Origin header is set by the browser itself and page scripts are not allowed to override it, so a malicious page cannot forge it from the victim's browser.
Here is a decorator that you can use to ensure that the origin is what you want it to be and thus prevent CSRF type vulnerabilities in your SPA.
https://gist.github.com/jimmyislive/a00869b596b19482811a1f78568104f9
NOTE: Sometimes, Firefox does not send the Origin header, so you'll probably have to deal with that accordingly.
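As a rough idea of what such a decorator can look like (this is a framework-free sketch, not the code from the gist), the example below assumes the handler receives the request headers as a plain dict as its first argument. ALLOWED_ORIGINS, Forbidden, require_same_origin and update_profile are hypothetical names, and falling back to the Referer header when Origin is missing is just one possible way of dealing with the Firefox case.

```python
from functools import wraps
from urllib.parse import urlparse

# Hypothetical: the origin(s) your SPA is served from.
ALLOWED_ORIGINS = {"https://example.com"}


class Forbidden(Exception):
    """Raised when a request does not come from an allowed origin."""


def _origin_of(url):
    """Reduce a full URL to scheme://host[:port]."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}"


def require_same_origin(handler):
    """Reject requests whose Origin (or Referer) is not one of ours."""
    @wraps(handler)
    def wrapper(headers, *args, **kwargs):
        origin = headers.get("Origin")
        if origin is None and headers.get("Referer"):
            # Assumption: fall back to Referer when Origin is absent.
            origin = _origin_of(headers["Referer"])
        if origin not in ALLOWED_ORIGINS:
            raise Forbidden(f"origin {origin!r} is not allowed")
        return handler(headers, *args, **kwargs)
    return wrapper


@require_same_origin
def update_profile(headers, payload):
    # Stand-in for a state-changing API endpoint.
    return {"updated": payload}
```

Requests that carry neither header are rejected here; whether that is acceptable depends on the clients you need to support.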
Revenue Funded Startups
With all the fuss about valuations, unicorns and how much each one has raised, I thought it would be a refreshing change to see startups which are bootstrapped and revenue funded. I’ve started a list here. Send me a pull request and I’ll add your startup in.
Using the AWS KMS infrastructure for encrypting on-disk data
Sometimes we come across situations where data in the db needs to be encrypted on-disk, e.g. identifying information like email ids. (Arguably you now have a different problem of securing the encryption key, but that's another discussion. Sometimes you cannot avoid encryption, e.g. because of compliance or regulatory requirements.)
Note that we are not talking about hashing. Passwords are (and always should be) hashed and stored in the db using something like bcrypt. We are talking about encryption so that if anyone runs away with your db, it is useless unless they have the encryption key.
AWS KMS provides infrastructure that can be used to store keys securely. AWS will create a root key for you which never leaves the AWS infrastructure. You can then create a bunch of derived keys (KMS calls them data keys) which you use. When you ask AWS for a derived key, it returns both the Plaintext and the Ciphertext of that key. Use the Plaintext key to encrypt your payload. Then store the encrypted payload and the encrypted key in your local db and delete the Plaintext key from memory. For decryption, send AWS the encrypted key and get back the Plaintext key. Decrypt your payload and then delete the Plaintext key from memory.
Here is some code that can help you do this:
https://gist.github.com/jimmyislive/c5d12acc128dab0df8b9
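The flow described above can be sketched roughly as follows. This is not the code from the gist. It assumes `kms` is something like a boto3 KMS client (only its generate_data_key and decrypt calls are used), while `encrypt` and `decrypt` are placeholders for whatever symmetric cipher you choose (an authenticated one such as AES-GCM would be a sensible pick). The function names and the record layout are made up for illustration.

```python
def encrypt_record(kms, key_id, payload, encrypt):
    """Envelope-encrypt `payload` and return what would be stored in the db."""
    # Ask KMS for a fresh data key. It comes back both in the clear and
    # encrypted under the root key that never leaves AWS.
    data_key = kms.generate_data_key(KeyId=key_id, KeySpec="AES_256")
    ciphertext = encrypt(data_key["Plaintext"], payload)
    record = {
        "encrypted_key": data_key["CiphertextBlob"],
        "encrypted_payload": ciphertext,
    }
    # Drop our reference to the plaintext key; only the encrypted copy is kept.
    del data_key
    return record


def decrypt_record(kms, record, decrypt):
    """Recover the payload from a record produced by encrypt_record."""
    # KMS turns the encrypted data key back into the plaintext key.
    plaintext_key = kms.decrypt(CiphertextBlob=record["encrypted_key"])["Plaintext"]
    try:
        return decrypt(plaintext_key, record["encrypted_payload"])
    finally:
        del plaintext_key
```

Note that in Python `del` only drops the reference; it does not wipe the bytes from memory, so treat it as best effort.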
Sending a git repo to remote user
I recently had to send a git repo to a remote user. For some reason we could not pull from the same remote, so I had to send him my repo. Rather than using a tar ball, git has a very useful utility called git-bundle that lets you send the whole git repo (say via email) so the other person can just extract it and get the full history etc. Here’s what I did, in the git root:
git bundle create myrepo.bundle master
git tag -f 07282015 master
The above creates a file called myrepo.bundle. I then took an md5 hash of it and emailed the bundle to him (the body of the email had the md5 for verification):
md5 myrepo.bundle
MD5 (myrepo.bundle) = <some hash>
On the remote end he would extract it like:
git clone -b master /tmp/myrepo.bundle
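On the receiving side, the md5 check can be done with any tool. As a small sketch, here is one way to do it in Python; the function names are made up and the expected hash is whatever was pasted into the email body.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_matches(path, expected_md5):
    """True if the file at `path` has the md5 quoted by the sender."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

md5 is fine for catching a corrupted attachment, which is all it was used for here; it is not a defence against deliberate tampering.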
Partitioning tables in Postgres
Sometimes, we use a table wherein we know that as time goes by, the relevance of the data diminishes, e.g. user alerts. If they are stored in a postgres table, then we probably want to retain only, say, the last 90 days' worth and discard the rest. One option would be to have a cron job that periodically finds unwanted rows and deletes/archives them. A much better option, from a design perspective, would be to partition that table into many child tables, e.g. we could partition child tables by month and when we don't want the data belonging to a month, we can just drop that table.
Postgres supports this via something called inheritance. There is a master table and several child tables. When inserts are done into the master, they actually go into the child table.
CREATE OR REPLACE FUNCTION insert_trigger() RETURNS TRIGGER AS $insert_trigger$
DECLARE
    year text;
    month text;
    table_name text;
BEGIN
    -- ensure the table and indexes exist
    year = date_part('year', NEW.TS);
    month = date_part('month', NEW.TS);
    table_name := 'alerts_y' || year || 'm' || month;

    PERFORM 1
    FROM pg_catalog.pg_class c
    JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
      AND c.relname = table_name
      AND n.nspname = TG_TABLE_SCHEMA;

    IF NOT FOUND THEN
        EXECUTE 'CREATE TABLE ' || table_name || ' (CHECK ( TS >= TIMESTAMP ''' || year || '-' || month || '-01'' AND TS < TIMESTAMP ''' || year || '-' || month || '-01 '' + interval ''1 month'')) INHERITS (alerts)';
        EXECUTE 'CREATE INDEX alerts_idx_y' || year || 'm' || month || '_1 ON ' || table_name || ' (ts, urn_id)';
        EXECUTE 'CREATE INDEX alerts_idx_y' || year || 'm' || month || '_2 ON ' || table_name || ' (activity_id, urn_id)';
    END IF;

    EXECUTE 'INSERT INTO alerts_y' || year || 'm' || month || ' VALUES ($1.*)' USING NEW;

    RETURN NULL;
END;
$insert_trigger$ LANGUAGE plpgsql;

CREATE TRIGGER insert_trigger BEFORE INSERT OR UPDATE ON alerts FOR EACH ROW EXECUTE PROCEDURE insert_trigger();
In the above snippet, we create child partitions named alerts_y<year>m<month>, e.g. alerts_y2015m7 (the month is not zero-padded). The trigger is executed every time we insert into or update the master table, and it inserts the row into the appropriate child table.
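Dropping old data then comes down to working out which child tables have aged out. Here is a small sketch of that bookkeeping; the helper names are made up, and it assumes the naming used by the trigger above, where year and month are concatenated without zero padding.

```python
from datetime import date


def partition_name(year, month):
    # Mirrors the trigger: 'alerts_y' || year || 'm' || month
    return f"alerts_y{year}m{month}"


def expired_partitions(today, keep_months, oldest):
    """Names of monthly partitions older than the last `keep_months` months.

    `oldest` is the (year, month) of the first partition that exists.
    The current month counts as one of the months that are kept.
    """
    current = today.year * 12 + (today.month - 1)
    first = oldest[0] * 12 + (oldest[1] - 1)
    cutoff = current - keep_months + 1  # index of the first month we keep
    return [partition_name(i // 12, i % 12 + 1) for i in range(first, cutoff)]
```

Each returned name can then be passed to a DROP TABLE statement by whatever job does the housekeeping.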
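You can continue to query the master table the same as you were doing before. It will automatically query the appropriate child tables. The planner can only skip child tables when it can compare the WHERE clause against their CHECK constraints, so filter on TS directly with constants rather than wrapping it in a function such as EXTRACT. e.g. if you run something like:
explain select count(*) from alerts where TS >= TIMESTAMP '2015-07-01' AND TS < TIMESTAMP '2015-08-01';
You will notice that only the correct child table (and not all child tables) is queried, along with the empty master table. This relies on the constraint_exclusion setting, which is on for partitioned queries by default.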
NOTE: In case you are doing an INSERT statement and using the RETURNING clause, it WILL NOT work. That is because the RETURNING clause does not work with tables partitioned this way. For details, refer to this thread.
There is a workaround/hack though. Modify the first trigger to 'RETURN NEW'. This will insert a row into the master table. Then create a new trigger post insert that goes and DELETEs the master row. It would look something like:
CREATE OR REPLACE FUNCTION delete_after_insert_trigger() RETURNS TRIGGER AS $delete_after_insert_trigger$
DECLARE
    purged alerts%rowtype;
BEGIN
    DELETE FROM ONLY alerts WHERE id = NEW.id RETURNING * INTO purged;

    RETURN purged;
END;
$delete_after_insert_trigger$ LANGUAGE plpgsql;

CREATE TRIGGER delete_after_insert_trigger AFTER INSERT OR UPDATE ON alerts FOR EACH ROW EXECUTE PROCEDURE delete_after_insert_trigger();
It means that you are doing two writes and a delete for every write, but that's the only way right now, until postgres supports it intrinsically.
Exporting data from AWS Redis Elasticache
Recently, I had to clone/backup a running redis elasticache instance on AWS. It turned out to be harder than expected. This post describes how I finally got my data out.
I have AOF turned on and fsync every second. This ensures that I will lose at most 1s worth of data in case of a catastrophe.
Elasticache provides a snapshotting feature. However, you cannot download that snapshot. So if you want to clone onto, say localhost, it's pretty much useless.
Then I tried SAVE/BGSAVE. The commands seem to have been disabled. Actually, even if they worked it would be useless because you cannot log onto the instance to scp the generated *.rdb file.
redis-cli has an interesting option called --rdb, that dumps the contents of redis onto a localfile. So I tried it.
The above worked, however I noticed that it was not an accurate reflection of my data. I could very easily confirm that some keys in prod were missing from the generated *.rdb file. Maybe because I had AOF/fsync turned on, and hence all writes get recorded to an aof file, the generated rdb file was not entirely accurate? I'm not sure how the --rdb flag works internally, but I was back at square one.
The suggestion from AWS was to spin up a new instance as a slave and dump the data on it. More here.
So the only reasonable way I found was to iterate through the entire keyspace, dump it out to a file and then replay it on localhost. Dumb, but if you know of a better option, let me know!
When dumping out content, the best way is to dump the keys/values out in redis protocol. This will make doing a mass import into the localhost redis really fast (in case you have millions of keys in your keyspace). Here is the script to export the data out of redis into a flat file in the redis protocol:
https://gist.github.com/jimmyislive/0efd7a6a1c7f7afd73e8
NOTE: The script uses the KEYS command, which is an O(N) operation and hence blocking. (Thanks to @itamarhaber for pointing it out) The docs say to be careful using the KEYS command in production owing to its blocking nature. Now you know the pros/cons of using this script !
Once we have the above file, ship it to localhost and then use the shell script to mass import this into redis.
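For reference, the redis protocol format mentioned above is simple enough to generate by hand. The sketch below is not the script from the gist; it only shows how a single command is encoded, and the function name is made up.

```python
def to_redis_protocol(*args):
    """Encode one command (e.g. SET key value) in the redis protocol.

    Each command is an array header (*<number of args>) followed by every
    argument as a bulk string ($<length in bytes>, then the bytes), with
    \r\n terminating each piece.
    """
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)
```

Writing one such chunk per key into a file gives you something that can be fed to redis-cli --pipe on the target. If the blocking nature of KEYS is a concern, iterating with SCAN (for example scan_iter in the redis-py client) is the usual alternative.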
Seems more work than I needed to do just to get a simple backup. Maybe AWS should provide an option of placing the auto snapshots it generates onto an S3 folder or something.
Honeypot
Most websites have forms for users to submit data. Often times, you will find that data being submitted is junk i.e. via bots. Depending on the popularity of your site, this can be a major / minor / non-existent problem.
However, there are several simple techniques to mitigate this. Some are intrusive, e.g. CAPTCHAs (but very effective). Others are very simple to implement (and catch the vast majority of bots). Building a totally foolproof way to deter bots can get very complicated.
A simple technique is using 'honeypots'. The main idea is to have a hidden text field (via css) in your form that users cannot see and hence will never populate. But bots will see it and will go ahead and populate it. On the server side, check if the field is populated. If it is, reject the request via a 403 error code.
Here is the html and server code you can use as a sample:
https://gist.github.com/jimmyislive/5b0f4a239443a3cb95e4
To see a sample of this implementation, see here
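The server-side half of the idea fits in a few lines. This is a sketch rather than the code from the gist: it assumes the submitted form arrives as a dict of strings, and HONEYPOT_FIELD is a made-up name for the input that your css hides.

```python
# Hypothetical name of the hidden input; pick something a bot will want to fill.
HONEYPOT_FIELD = "website"


def check_submission(form):
    """Return the HTTP status to respond with for a submitted form.

    Real users never see the honeypot field, so any non-blank value in it
    marks the submission as coming from a bot.
    """
    if form.get(HONEYPOT_FIELD, "").strip():
        return 403
    return 200
```

A browser will usually still submit the hidden field as an empty string, which is why the check looks at the value and not at whether the key is present.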
AWS CloudSearch Gotcha
AWS CloudSearch allows you to search indexed data. It's pretty simple to use. First use the console to create the index fields, their types etc. Then upload your data (in XML / JSON / CSV etc). Head over to their documentation for more details.
The purpose of this post is to point out one gotcha that is not mentioned in the docs. When you create the index fields say field1, field2, field3, the uploaded document will typically contain data corresponding to those indexed fields. They need not contain all the fields mentioned in the index. i.e. the JSON doc may contain only field1, field2 for one record, field1, field2, field3 for the next and so on.
However, the JSON doc *cannot* contain any field that is not in the list of indexes you created. If you do, you will get a very cryptic error message. i.e. your JSON record cannot contain a field4 as it is not part of the indexes you created on the AWS console.
My uploaded document had one extra field which was not in my indexes and it kept giving me some weird error message. When I got rid of it, it worked !
Hopefully this helps someone else in the same situation.
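One way to guard against this before uploading is to drop (or at least report) any field that is not in your index. A minimal sketch, assuming each record's fields are a plain dict; INDEX_FIELDS and strip_unindexed are made-up names and the field list is whatever you configured in the console.

```python
# Hypothetical: the index fields created in the AWS console.
INDEX_FIELDS = {"field1", "field2", "field3"}


def strip_unindexed(fields, index_fields=INDEX_FIELDS):
    """Split a record's fields into those that are indexed and those that are not.

    Returns (kept, dropped): `kept` is safe to upload, `dropped` lists the
    offending field names so they can be logged.
    """
    kept = {k: v for k, v in fields.items() if k in index_fields}
    dropped = sorted(set(fields) - set(index_fields))
    return kept, dropped
```

Records with missing fields pass through untouched, since only extra fields cause the error.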