Monday, September 30, 2013

How to Install ElasticSearch on EC2

Search is not easy. There are a lot of things you need to consider.

At the software level:

Can a search query contain spelling mistakes?
Should stop words (e.g. "a", "the") be filtered out?
How should a phrase search behave when the query doesn't match the phrase exactly?

At the operations level:

Should search be decoupled from the app machines?
Should search be distributed? If so, how many shards and replicas should there be?

A quick search will tell you that Apache Lucene is the de facto standard library for full-text search. Two popular search servers are built on top of Lucene: Solr and ElasticSearch (ES).

There is a lot of debate about which one to use. I chose ES because:
  • it's distributed by design
  • it's easier to integrate with AWS EC2

The rest of this post covers how to install ElasticSearch on a Linux machine (I like to use the Ubuntu 12.04 image on EC2).
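
A fresh instance may not have everything the steps below rely on: ES runs on the JVM, and the download uses wget and unzip. Here is a small sketch for checking that up front. The helper name missing_cmds is hypothetical, and the apt-get package names in the comment are my assumption for Ubuntu 12.04; adjust them for your distribution.

```shell
# Print every command from the argument list that is not found on PATH.
# (missing_cmds is a hypothetical helper, not part of ES.)
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Example: missing_cmds java wget unzip
#
# If anything is printed, install it first. On Ubuntu 12.04 that is
# probably something like (assumed package names):
#   sudo apt-get update
#   sudo apt-get install -y openjdk-7-jre-headless wget unzip
```

If missing_cmds java wget unzip prints nothing, you are good to go.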

Download elasticsearch from elasticsearch.org. Extract the files and put them in a folder of your choice (e.g. /opt/tools):
cd /opt/tools
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.zip
unzip elasticsearch-0.90.5.zip
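From inside the extracted elasticsearch-0.90.5 directory, you can start elasticsearch in the foreground (that is what -f does) with: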
bin/elasticsearch -f
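You may want to tweak the JVM heap settings: Xmx is the maximum size the heap can reach and Xms is the initial heap size. Setting both to the same value keeps the JVM from resizing the heap at runtime. Other settings can be passed on the same command line as -Des.* system properties: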
bin/elasticsearch -f -Xmx2g -Xms2g -Des.index.storage.type=memory -Des.max-open-files=true
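
The 2g above is just an example value. If you would rather derive the heap size from the machine, here is a minimal sketch. It assumes a rule of thumb of giving the heap half of the RAM, which is a common suggestion but my assumption, not something from this post; the helper names heap_mb_for and total_mem_kb are hypothetical.

```shell
# Turn total memory in kB into a heap size in MB, using half of RAM.
# The 50% ratio is an assumption; pick whatever fits your workload.
heap_mb_for() {
  echo $(( $1 / 1024 / 2 ))
}

# Read the machine's total memory in kB (Linux only, from /proc/meminfo).
total_mem_kb() {
  awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Usage (not executed here):
#   heap="$(heap_mb_for "$(total_mem_kb)")m"
#   bin/elasticsearch -f -Xmx"$heap" -Xms"$heap"
```

For a machine reporting 4194304 kB (4 GB), heap_mb_for gives 2048, which matches the 2g used above.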
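You can also run it as a service using the script under bin/service (for example, bin/service/elasticsearch start). Note that this script is not part of the zip above; it comes from the separate elasticsearch-servicewrapper project and has to be copied into the bin directory first.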

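Once the node has started, visit "http://localhost:9200" in a browser (on a headless EC2 instance, curl http://localhost:9200 works just as well). You should see something like the following; the node name is picked at random, so yours will differ: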

{
  "ok" : true,
  "status" : 200,
  "name" : "Solitaire",
  "version" : {
    "number" : "0.90.5",
    "build_hash" : "c8714e8e0620b62638f660f6144831792b9dedee",
    "build_timestamp" : "2013-09-17T12:50:20Z",
    "build_snapshot" : false,
    "lucene_version" : "4.4"
  },
  "tagline" : "You Know, for Search"
}
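
If you want to check this from a script instead of reading it by eye, the sketch below pulls the version number out of that response. The helper name es_version is hypothetical, and the sed pattern is a deliberately crude match on the "number" field rather than a real JSON parser.

```shell
# Read the root endpoint's JSON on stdin and print the value of the
# first "number" field (es_version is a hypothetical helper).
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (assumes a node is listening on localhost:9200; not executed here):
#   curl -s http://localhost:9200 | es_version
```

Against the response shown above, this should print 0.90.5.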
