Last time we talked about creating indexes, and we also ran a few simple searches against our index. Today I would like to focus on another kind of search: bool queries.
BTW, if you can’t wait to write some “real” code in C#, stay tuned: we will get to it in the near future.
Let’s start
For the purpose of this post, we will need an index. Again, we will work with the companies data. Here is a short snippet you can copy and paste into the Sense add-on for Chrome, so that we all work on the same index:
PUT /companies/company/1
{
  "name" : "Gardella",
  "city" : "London",
  "street" : "Oxford str.",
  "owner" : "Billinda Gates",
  "employees": 10
}

PUT /companies/company/2
{
  "name" : "New world",
  "city" : "London",
  "street" : "Oxford str.",
  "owner" : "Ana Novak",
  "employees": 2
}

PUT /companies/company/3
{
  "name" : "Gardens & Houses",
  "city" : "Paris",
  "street" : "Abc str.",
  "owner" : "Pierre Pain",
  "employees": 890
}

PUT /companies/company/4
{
  "name" : "Sone",
  "city" : "New York",
  "street" : "Silent Av.",
  "owner" : "John Bravo",
  "employees": 14
}

PUT /companies/company/5
{
  "name" : "Paris style",
  "city" : "Warsaw",
  "street" : "Handlowa",
  "owner" : "Jan Kowalski",
  "employees": 4978
}
Bool queries – some practice
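In ElasticSearch 5.x, bool queries are the successors of the old filters: the separate filtered query was removed in 5.0, and filtering is now done with the 'filter' clause of a bool query.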
But what do we use them for?
For more advanced searching, of course!
For example, let’s say we want to search for all the companies that have between 10 and 200 workers (inclusive)… Not a problem!
POST /companies/company/_search
{
  "query": {
    "bool" : {
      "must" : {
        "range" : {
          "employees" : { "gte" : 10, "lte" : 200 }
        }
      }
    }
  }
}
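The above query returned 2 companies, Sone and Gardella. Note that Gardella, with exactly 10 employees, is included: gte and lte mean “greater than or equal” and “less than or equal”.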
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "companies", "_type": "company", "_id": "4", "_score": 1,
        "_source": { "name": "Sone", "city": "New York", "street": "Silent Av.", "owner": "John Bravo", "employees": 14 }
      },
      {
        "_index": "companies", "_type": "company", "_id": "1", "_score": 1,
        "_source": { "name": "Gardella", "city": "London", "street": "Oxford str.", "owner": "Billinda Gates", "employees": 10 }
      }
    ]
  }
}
Wow, it works!
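Why did Gardella, with exactly 10 employees, make the cut? Because gte and lte are inclusive bounds. Here is a tiny Python sketch of that check, using the employee counts from our sample data. The helper `in_range` is made up for illustration. It is not part of any ElasticSearch client and not how ElasticSearch evaluates range queries internally.

```python
# Employee counts copied from the five sample companies indexed above.
EMPLOYEES = {
    "Gardella": 10,
    "New world": 2,
    "Gardens & Houses": 890,
    "Sone": 14,
    "Paris style": 4978,
}


def in_range(value, gte=None, lte=None):
    """Hypothetical helper mimicking inclusive range bounds: gte means >=, lte means <=."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True


# The same bounds as the range query above: 10 <= employees <= 200.
matches = [name for name, count in EMPLOYEES.items() if in_range(count, gte=10, lte=200)]
print(matches)  # ['Gardella', 'Sone']
```

If you swap gte for gt, the exclusive variant that the range query also accepts, Gardella drops out. The order of the names here is just dictionary order. It does not reflect ElasticSearch's ranking.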
OK, but let’s say we want only companies with 10-200 workers from any city except London. ElasticSearch can help here too!
POST /companies/company/_search
{
  "query": {
    "bool" : {
      "must" : {
        "range" : {
          "employees" : { "gte" : 10, "lte" : 200 }
        }
      },
      "must_not" : {
        "match" : { "city" : "London" }
      }
    }
  }
}
This time, we received only one company:
{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "companies", "_type": "company", "_id": "4", "_score": 1,
        "_source": { "name": "Sone", "city": "New York", "street": "Silent Av.", "owner": "John Bravo", "employees": 14 }
      }
    ]
  }
}
Perfect!
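If you prefer to think in code, here is a toy Python model of what this bool query does. A document is kept only if every 'must' condition holds and no 'must_not' condition holds. This is a sketch over a hand-copied subset of the sample data, not how ElasticSearch evaluates queries. The helper name is hypothetical, and the lowercase city comparison is only a rough stand-in for a match query on a one-word value.

```python
# Hand-copied subset of the sample companies (only the fields this query uses).
COMPANIES = [
    {"name": "Gardella", "city": "London", "employees": 10},
    {"name": "New world", "city": "London", "employees": 2},
    {"name": "Gardens & Houses", "city": "Paris", "employees": 890},
    {"name": "Sone", "city": "New York", "employees": 14},
    {"name": "Paris style", "city": "Warsaw", "employees": 4978},
]


def bool_must_must_not(docs, must, must_not):
    """Hypothetical helper: keep docs where all 'must' predicates hold and no 'must_not' predicate holds."""
    return [
        d for d in docs
        if all(p(d) for p in must) and not any(p(d) for p in must_not)
    ]


result = bool_must_must_not(
    COMPANIES,
    must=[lambda d: 10 <= d["employees"] <= 200],
    # Case-insensitive comparison: a rough approximation of match on an analyzed field.
    must_not=[lambda d: d["city"].lower() == "london"],
)
print([d["name"] for d in result])  # ['Sone']
```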
Aaaand what about getting companies from London OR Paris? That’s easy too!
POST /companies/company/_search
{
  "query": {
    "bool" : {
      "should" : [
        { "match" : { "city" : "London" } },
        { "match" : { "city" : "Paris" } }
      ]
    }
  }
}
The result:
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 0.80259144,
    "hits": [
      {
        "_index": "companies", "_type": "company", "_id": "2", "_score": 0.80259144,
        "_source": { "name": "New world", "city": "London", "street": "Oxford str.", "owner": "Ana Novak", "employees": 2 }
      },
      {
        "_index": "companies", "_type": "company", "_id": "1", "_score": 0.2876821,
        "_source": { "name": "Gardella", "city": "London", "street": "Oxford str.", "owner": "Billinda Gates", "employees": 10 }
      },
      {
        "_index": "companies", "_type": "company", "_id": "3", "_score": 0.2876821,
        "_source": { "name": "Gardens & Houses", "city": "Paris", "street": "Abc str.", "owner": "Pierre Pain", "employees": 890 }
      }
    ]
  }
}
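Here is the same toy approach applied to a query with only 'should' clauses. A document is kept if at least one condition holds, just like OR. Again, this is a simplified Python model with a hypothetical helper `bool_should_only`. It only decides which documents match and says nothing about scores.

```python
# Hand-copied subset of the sample companies (only name and city are needed here).
COMPANIES = [
    {"name": "Gardella", "city": "London"},
    {"name": "New world", "city": "London"},
    {"name": "Gardens & Houses", "city": "Paris"},
    {"name": "Sone", "city": "New York"},
    {"name": "Paris style", "city": "Warsaw"},
]


def bool_should_only(docs, should):
    """Hypothetical helper: with only 'should' clauses, keep a doc if at least one predicate holds."""
    return [d for d in docs if any(p(d) for p in should)]


result = bool_should_only(
    COMPANIES,
    should=[
        lambda d: d["city"].lower() == "london",
        lambda d: d["city"].lower() == "paris",
    ],
)
print([d["name"] for d in result])  # ['Gardella', 'New world', 'Gardens & Houses']
```

Paris style is not returned because only the city is checked, not the company name.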
Time for some knowledge
Bool queries can be built with 4 different occurrence descriptors. By the serious-sounding term ‘occurrence descriptor’ we mean a keyword that describes how a clause filters our index. You can combine these keywords in one query to get exactly the data you are looking for. Let’s focus on these magical words.
must
Every returned document is guaranteed to match the clause that follows in this node, and the clause contributes to the score.
filter
Just like ‘must’, but the clause does not contribute to the score. If the bool query contains only ‘filter’ clauses, all the scores in the result will be 0.
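Here is the range query from the beginning of this post rewritten with ‘filter’ instead of ‘must’. It is built as a plain Python dict and printed as JSON, so you can paste the body into Sense after POST /companies/company/_search. Building it in Python is only for illustration. You can just as well type the JSON by hand.

```python
import json

# The earlier range query with "must" swapped for "filter":
# the same documents should match, but the clause no longer adds to _score.
body = {
    "query": {
        "bool": {
            "filter": {
                "range": {"employees": {"gte": 10, "lte": 200}}
            }
        }
    }
}

# Print the request body to paste into Sense.
print(json.dumps(body, indent=2))
```

Against our sample index, this should return the same two companies as before (Sone and Gardella), each with a _score of 0, because the bool query has no scoring clause.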
should
That’s a funny one! If it is used in a query without a ‘must’ or ‘filter’ clause, it acts just like an ‘OR’ in an ‘if’ statement. Assuming we filter our index with e.g. 3 conditions in the ‘should’ clause, a record will be returned if it fulfils at least one of the 3 conditions.
Buuut if the query contains a ‘filter’ or ‘must’ clause, then ‘should’ doesn’t influence the number of returned records! A record may fulfil none of the ‘should’ conditions and it will be returned anyway! What’s interesting, though, is that ‘should’ can increase the score (if the record matches at least one of the ‘should’ clauses). Example? Here we go!
Let’s look for the companies with between 100 and 9900 workers.
POST /companies/company/_search
{
  "query": {
    "bool" : {
      "must" : {
        "range" : {
          "employees" : { "gte" : 100, "lte" : 9900 }
        }
      },
      "should" : [
        { "term" : { "city" : "New York" } },
        { "term" : { "city" : "Paris" } }
      ]
    }
  }
}
The result will look like this (pay special attention to the scores). BTW, have you noticed that the result contains a company from Warsaw although this city is not listed in the ‘should’ clause? Yeah, it really works!
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "companies", "_type": "company", "_id": "5", "_score": 1,
        "_source": { "name": "Paris style", "city": "Warsaw", "street": "Handlowa", "owner": "Jan Kowalski", "employees": 4978 }
      },
      {
        "_index": "companies", "_type": "company", "_id": "3", "_score": 1,
        "_source": { "name": "Gardens & Houses", "city": "Paris", "street": "Abc str.", "owner": "Pierre Pain", "employees": 890 }
      }
    ]
  }
}
Now, let’s try the same query without the ‘should’ clause:
POST /companies/company/_search
{
  "query": {
    "bool" : {
      "must" : {
        "range" : {
          "employees" : { "gte" : 100, "lte" : 9900 }
        }
      }
    }
  }
}
The result:
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "companies", "_type": "company", "_id": "5", "_score": 1,
        "_source": { "name": "Paris style", "city": "Warsaw", "street": "Handlowa", "owner": "Jan Kowalski", "employees": 4978 }
      },
      {
        "_index": "companies", "_type": "company", "_id": "3", "_score": 1,
        "_source": { "name": "Gardens & Houses", "city": "Paris", "street": "Abc str.", "owner": "Pierre Pain", "employees": 890 }
      }
    ]
  }
}
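Oh! Did you see this? The scores are exactly the same as before! That’s because the ‘should’ clause above uses “term” queries. A term query looks for the exact, unanalyzed value (“Paris”, “New York”). With the default dynamic mapping, however, the city field was indexed as lowercase word tokens (“paris”, “new”, “york”), so neither ‘should’ condition matched anything. It’s the same trap as the one described in the must_not section below. With “match” instead of “term” in the ‘should’ clause, Gardens & Houses (Paris) should get a higher score than Paris style (Warsaw), and both companies would still be returned.
Sooo, once ‘term’ is swapped for ‘match’, this example demonstrates (my favourite sentence from the university times) that the ‘should’ keyword can be used to influence the ranking of the results. This is useful when we want to favour records with some features (in the above example, companies from New York and Paris) without excluding the others.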
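To sum up the ‘should’ rules, here is a toy Python model with a hypothetical `bool_query` helper. It is not ElasticSearch code. It only records whether a document matched at least one ‘should’ condition (“boosted”), because real scoring is far more involved. Since the model ignores scores, ‘filter’ would behave exactly like ‘must’ here. The city checks are case-insensitive, so they behave like match queries rather than the term queries used in the example above.

```python
# Hand-copied subset of the sample companies (only the fields used below).
COMPANIES = [
    {"name": "Gardella", "city": "London", "employees": 10},
    {"name": "New world", "city": "London", "employees": 2},
    {"name": "Gardens & Houses", "city": "Paris", "employees": 890},
    {"name": "Sone", "city": "New York", "employees": 14},
    {"name": "Paris style", "city": "Warsaw", "employees": 4978},
]


def bool_query(docs, must=(), should=()):
    """Toy model of bool matching in query context (not ElasticSearch's implementation).

    Returns (name, boosted) pairs; boosted means at least one 'should' condition held.
    """
    results = []
    for d in docs:
        # Every 'must' condition has to hold.
        if not all(p(d) for p in must):
            continue
        should_hits = sum(1 for p in should if p(d))
        # Without 'must', at least one 'should' condition has to hold (the OR case).
        if not must and should and should_hits == 0:
            continue
        # With 'must', 'should' never removes a document; it only marks it as boosted.
        results.append((d["name"], should_hits > 0))
    return results


# Case-insensitive city checks, i.e. match-like rather than term-like.
in_ny_or_paris = [
    lambda d: d["city"].lower() == "new york",
    lambda d: d["city"].lower() == "paris",
]
between_100_and_9900 = [lambda d: 100 <= d["employees"] <= 9900]

print(bool_query(COMPANIES, should=in_ny_or_paris))
# [('Gardens & Houses', True), ('Sone', True)]
print(bool_query(COMPANIES, must=between_100_and_9900, should=in_ny_or_paris))
# [('Gardens & Houses', True), ('Paris style', False)]
```

Note how Paris style survives the second query even though it matches no ‘should’ condition. It is simply not boosted.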
must_not
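It’s just like a NOT or ‘!’ operator: documents matching the clause under must_not are left out of the results. Tricky part here! Inside must_not, use “match” instead of “term” (like in one of the previous examples). With the default dynamic mapping, the city field is analyzed, so “London” is indexed as the lowercase token “london”. A term query is not analyzed, so term “London” matches nothing, and must_not then excludes nothing. If you really need an exact match, a term query on the city.keyword sub-field (which ElasticSearch 5.x adds to dynamically mapped strings by default) does the job too.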
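Why does “term” trip us up? Here is a toy Python model, assuming the city field was mapped dynamically as an analyzed text field (so the standard analyzer lowercases it and splits it into words). The `analyze` function below is only a rough approximation of the real analyzer. All the helper names are made up for illustration.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split into word tokens."""
    return re.findall(r"\w+", text.lower())


def match_query(field_value, query_text):
    """match: the query text is analyzed too; with the default OR operator, any shared token is a hit."""
    return bool(set(analyze(query_text)) & set(analyze(field_value)))


def term_query(field_value, exact_term):
    """term: the query value is NOT analyzed, so it must equal one indexed token exactly."""
    return exact_term in analyze(field_value)


def keyword_term(field_value, exact_term):
    """term on a keyword sub-field (e.g. city.keyword): the whole, unanalyzed value must match."""
    return field_value == exact_term


print(match_query("London", "London"))        # True
print(term_query("London", "London"))         # False: indexed token is "london"
print(term_query("London", "london"))         # True
print(term_query("New York", "New York"))     # False: tokens are "new" and "york"
print(keyword_term("New York", "New York"))   # True
```

So inside must_not, a term query for “London” matches no document, nothing is excluded, and the London companies come back in the results.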
That’s all for today. I hope you didn’t fall asleep in the middle.
Btw, if you haven’t tried the above examples in the Sense add-on yet, I do recommend it. Playing with ElasticSearch can be great fun, really!