Last week I wrote about installing ElasticSearch on your local machine. Today I will focus on using this search engine – indexing and searching for data.
BTW, if you find something suspicious in my post, let me know, because I’m still discovering ElasticSearch and there are probably lots of facts and tricks I don’t know.
Some help – Sense plugin
First of all, I recommend installing Sense for Chrome. It works like a charm and lets you focus on indexing and only indexing. Very helpful for a beginner! Once installed, Sense is opened via the green, tree-like icon marked in the screenshot below.
Here we start
Before we start playing, remember to fire up ElasticSearch. Otherwise, you will get the following error:
‘Request failed to get to the server (status code: 0):’
Starting ES is described in my previous post.
Let’s index something!
Let’s assume we have a huge collection of companies and we want to search it quickly (wow, it’s just like in my pet project, ReMaster! What a coincidence!).
This is how we build our index:
PUT /companies/company/1
{
  "name" : "John's Grocery",
  "city" : "Warsaw",
  "street" : "3 Maja 15",
  "owner" : "John Bravo"
}
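Sense is just a convenient front end for ElasticSearch’s HTTP API, so the same call can be made from any HTTP client. Below is a minimal Python sketch using only the standard library. It assumes a node listening on http://localhost:9200 (the default port) and only builds the request; actually sending it is left commented out. The helper name build_index_request is hypothetical, written for this post, and is not part of any ElasticSearch client.

```python
import json
import urllib.request

# Assumption: a local ElasticSearch node on the default HTTP port.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (without sending) the HTTP equivalent of `PUT /index/type/id {json}`.

    build_index_request is a hypothetical helper, not an ElasticSearch API.
    """
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


company = {
    "name": "John's Grocery",
    "city": "Warsaw",
    "street": "3 Maja 15",
    "owner": "John Bravo",
}
request = build_index_request("companies", "company", 1, company)
print(request.get_method(), request.full_url)

# Sending it requires a running node:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the example testable without a running cluster.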
When we click the green arrow, the result appears in the right pane.
Let me explain a little what the result JSON means:
“_index”
Ok, so what happened? We just put the first document into the ‘companies’ index. The index didn’t exist, so ElasticSearch simply created it for us. It’s a helpful guy, don’t you think?
So the string following “_index” is just the index name.
“_type”
Within this index, we created a type and named it company. We will have a lot of company objects in our companies index. We can also have other types in this index, for example branch_office or employee.
“_id”
We specified the id of the document ourselves. Our index was empty, so the record was simply saved, but guess what would happen if we tried to save some other data with the same id in the index? For example, let’s fire the following block:
PUT /companies/company/1
{
  "name" : "Gardella",
  "city" : "London",
  "street" : "Oxford str.",
  "owner" : "Billinda Gates"
}
Don’t read further for a moment; try to guess what will happen!
Well…
The document in the index is replaced. Yep, overwritten as a whole. So keep it in mind :).
“_version”
ES versions its documents, so if you update one (as described above, simply by PUT-ing new data under the same id), its version number increases.
“result”
Obvious: if indexing went OK, you get ‘created’ (a new id) or ‘updated’ (an existing id was overwritten) as the result.
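The overwrite and versioning rules above can be mimicked with a toy in-memory model. This is only an illustration under my own simplifications (one plain dict per index, a hypothetical ToyIndex class, documents trimmed to a few fields); it is not how ElasticSearch stores documents internally, it just reproduces the _version and result values you see in the response.

```python
class ToyIndex:
    """Toy stand-in for one index: id -> (version, source). Illustration only."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, source):
        # PUT to an existing id replaces the whole document and bumps the version.
        if doc_id in self._docs:
            version = self._docs[doc_id][0] + 1
            result = "updated"
        else:
            version = 1
            result = "created"
        # Store a copy: the new source fully replaces the old one (no field merge).
        self._docs[doc_id] = (version, dict(source))
        return {"_id": str(doc_id), "_version": version, "result": result}

    def get_source(self, doc_id):
        return self._docs[doc_id][1]


companies = ToyIndex()
first = companies.put(1, {"name": "John's Grocery", "city": "Warsaw", "street": "3 Maja 15"})
second = companies.put(1, {"name": "Gardella", "city": "London"})
print(first)
print(second)
```

Note that the second document has no “street” field, and after the second PUT the stored source has none either: the old document is replaced, not merged.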
“_shards”
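Shards are the low-level pieces an index is split into. Each shard holds a subset of the index’s documents, and every document is stored in exactly one primary shard. The “_shards” part of the response tells you on how many shard copies the operation was performed and how many of them succeeded or failed.
There are 2 types of shards: primary shards (each document is indexed into its primary shard first) and replica shards (copies of the primaries that act as a backup and can also serve search requests, which improves search performance). For now, this amount of knowledge seems to be enough.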
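How does ES decide which primary shard a document goes to? The documented rule is roughly shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document’s _id. The sketch below mimics that rule under stated assumptions: ES hashes with Murmur3, and I swapped in zlib.crc32 purely for illustration, so the shard numbers will not match a real cluster. The function name pick_primary_shard is hypothetical. The 5 primary shards match the default of ES 5.x (ES 7.0 changed the default to 1).

```python
import zlib

# ES 5.x created 5 primary shards per index by default (ES 7.0+ defaults to 1).
NUMBER_OF_PRIMARY_SHARDS = 5


def pick_primary_shard(doc_id, num_primary_shards=NUMBER_OF_PRIMARY_SHARDS):
    """Return the primary shard for a document: hash(routing) % num_primary_shards.

    Stand-in only: ES hashes the routing value with Murmur3, not CRC32.
    """
    routing = str(doc_id)  # by default the routing value is the document _id
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


placement = {doc_id: pick_primary_shard(doc_id) for doc_id in ["1", "2", "3", "4"]}
print(placement)  # each document maps to exactly one primary shard
```

Because the modulo depends on the number of primary shards, changing that number would send existing ids to different shards, which is one reason it is fixed when the index is created.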
BTW, did you notice that there are two blocks of Sense commands in the screenshot above? That’s right, you can have as many blocks as you want and run (with the green arrow marked on the screen) only the one you need. You see, I told you Sense is a great tool!
It’s time to check the index
Ok, we indexed one company, but how do we retrieve the data?
Very simply! Just paste the following code into the left pane of Sense and click the green arrow (_search without an index name searches all indices):
GET _search
{
  "query": {
    "match_all": {}
  }
}
In the result we get some new keywords.
“_score”
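ElasticSearch analyses the data against the query and ranks the matching documents by giving each of them a score. The higher the score, the more likely it is that this is the data you are looking for. With match_all every document gets the same score of 1, but in general scores are not capped at 1, so don’t treat 1 as a maximum ;).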
“_source”
Just the content of the found record, exactly as it was indexed.
Show me something!
Ok, let’s do something more ‘sophisticated’. Let’s search for a company with a given name! What a challenge! 😉
I modified the data in my index, so a match_all search now returns this:
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {
        "_index": "companies",
        "_type": "company",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "Gardella",
          "city": "London",
          "street": "Oxford str.",
          "owner": "Billinda Gates"
        }
      },
      {
        "_index": "companies",
        "_type": "company",
        "_id": "4",
        "_score": 1,
        "_source": {
          "name": "Nails world",
          "city": "Berlin",
          "street": "Brown Strasse 4A",
          "owner": "Hilda Himmel"
        }
      },
      {
        "_index": "companies",
        "_type": "company",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "Makeup & nails",
          "city": "Warsaw",
          "street": "3 Maja 15",
          "owner": "John Bravo"
        }
      },
      {
        "_index": "companies",
        "_type": "company",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "Gardella Co",
          "city": "Paris",
          "street": "Other Av.",
          "owner": "Pierre Newman"
        }
      }
    ]
  }
}
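If you call ES from code instead of Sense, you get the same JSON back. Here is a short Python sketch that walks the hits of the response above (trimmed to the fields it reads) and pulls out _id, _score and the name from _source. It assumes the response body has already been received as a string; summarize_hits is a hypothetical helper name.

```python
import json

# The response above, trimmed to the fields this sketch reads.
raw_response = """
{
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {"_id": "2", "_score": 1, "_source": {"name": "Gardella", "city": "London"}},
      {"_id": "4", "_score": 1, "_source": {"name": "Nails world", "city": "Berlin"}},
      {"_id": "1", "_score": 1, "_source": {"name": "Makeup & nails", "city": "Warsaw"}},
      {"_id": "3", "_score": 1, "_source": {"name": "Gardella Co", "city": "Paris"}}
    ]
  }
}
"""


def summarize_hits(response):
    """Return (id, score, name) tuples in the order ES returned the hits."""
    return [
        (hit["_id"], hit["_score"], hit["_source"]["name"])
        for hit in response["hits"]["hits"]
    ]


response = json.loads(raw_response)
for doc_id, score, name in summarize_hits(response):
    print(doc_id, score, name)
```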
Let’s search for a company with the word “nails” in its name!
We fire the following code:
GET _search
{
  "query": {
    "match": {
      "name": "nails"
    }
  }
}
As you can see, ES found 2 companies with “nails” in the name. This time the scores differ from each other and are quite far from the value 1 that match_all gave every document. So it really works, I guess ;)!
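Two things here may look surprising: “Nails world” was found although the query was lowercase, and the scores are no longer all equal. The first is analysis: for a text field the default standard analyzer lowercases the text and splits it into terms, and match analyzes the query the same way. For the second, ES 5.x and later score with BM25 by default (k1 = 1.2, b = 0.75). The Python sketch below is a simplification under my own assumptions: analyze() only approximates the standard analyzer, and bm25_score() is the textbook BM25 formula, not Lucene’s exact implementation (which, for example, stores field lengths lossily). All function names are hypothetical.

```python
import math
import re

names = {"1": "Makeup & nails", "2": "Gardella", "3": "Gardella Co", "4": "Nails world"}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return re.findall(r"\w+", text.lower())


def match(field_values, query):
    """Ids of documents sharing at least one term with the query (match's default OR)."""
    query_terms = set(analyze(query))
    return sorted(
        doc_id
        for doc_id, value in field_values.items()
        if query_terms & set(analyze(value))
    )


def bm25_score(term, doc_terms, corpus, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document."""
    n_docs = len(corpus)
    doc_freq = sum(1 for terms in corpus if term in terms)
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    avg_len = sum(len(terms) for terms in corpus) / n_docs
    freq = doc_terms.count(term)
    norm = k1 * (1 - b + b * len(doc_terms) / avg_len)
    return idf * freq * (k1 + 1) / (freq + norm)


corpus = {doc_id: analyze(name) for doc_id, name in names.items()}
for doc_id in match(names, "nails"):
    score = bm25_score("nails", corpus[doc_id], list(corpus.values()))
    print(doc_id, names[doc_id], round(score, 4))
```

In this toy, computed over one combined corpus, both hits get the same score because both names have two terms. On a real cluster the scores can differ partly because, by default, ES computes term statistics per shard, and our four documents are spread over five shards.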