Learnt a bit more about Elasticsearch: an architecture overview, how to use it, and situations where you’d consider using it.
Source
Architecture
An Elasticsearch cluster has the following node types:
- Coordinating node
- Master node
- Data node
- Ingest node
- Machine learning node
- Elasticsearch provides a distributed system on top of Lucene.
- It handles replication, shards, indexes, the HTTP interface, and coordinating distributed queries.
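To make the replication and shard bullet concrete, here's a minimal sketch of creating an index over the HTTP interface. It assumes a local cluster at localhost:9200 with security disabled; the index name and settings values are illustrative only.

```python
# Minimal sketch: create an index and let Elasticsearch distribute it.
import requests

ES = "http://localhost:9200"  # assumed local single-node cluster

# Shard and replica counts are index-level settings; Elasticsearch spreads the
# shards (and their replica copies) across the data nodes in the cluster.
resp = requests.put(
    f"{ES}/products",
    json={
        "settings": {
            "number_of_shards": 3,    # how the index is partitioned
            "number_of_replicas": 1,  # copies kept for redundancy and read throughput
        }
    },
)
print(resp.json())
```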
A few examples of where it works well
- Read-heavy workloads: my understanding so far is that this is where Elasticsearch works best
- Fuzzy matching
- Faceted search, which is common on e-commerce sites (sketches of both query types follow this list)
- Full text, geo-spatial, and structured search
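Here's a rough sketch of what the fuzzy and faceted queries above look like against the search API. The `products` index and its fields are made up for illustration.

```python
# Sketch of two search patterns mentioned above, using the REST API directly.
import requests

ES = "http://localhost:9200"

# Fuzzy matching: tolerate typos like "runnig shoes".
fuzzy = requests.post(
    f"{ES}/products/_search",
    json={"query": {"match": {"title": {"query": "runnig shoes", "fuzziness": "AUTO"}}}},
).json()

# Faceted search: a terms aggregation gives the counts behind the
# "filter by brand" sidebars you see on e-commerce sites.
facets = requests.post(
    f"{ES}/products/_search",
    json={
        "query": {"match": {"title": "shoes"}},
        "aggs": {"by_brand": {"terms": {"field": "brand.keyword"}}},
        "size": 0,  # only the facet counts, no hits
    },
).json()
```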
How it keeps in sync with your DB
There are a few ways to achieve this.
- Change Data Capture tools like Debezium
- Dual writes: more complex application code, but less infrastructure overhead. There’s a risk of inconsistencies here though; I wouldn’t try this on a greenfield project.
- Batch sync & cron jobs: usable when you don’t need real-time sync.
- Message queues or event streaming
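As a sketch of the queue-driven approach, a worker consumes change events (however they're produced: CDC, application events, etc.) and mirrors them into Elasticsearch. The `consume_change_events` helper, event shape, and index name are all assumptions, not a real client.

```python
# Sketch of a sync worker: read change events from a queue, mirror them into Elasticsearch.
import requests

ES = "http://localhost:9200"

def consume_change_events():
    # Placeholder: in practice, yield dicts from Kafka, RabbitMQ, a CDC topic, etc.
    yield {"op": "upsert", "id": "42", "doc": {"title": "Trail shoes", "brand": "Acme"}}
    yield {"op": "delete", "id": "17"}

for event in consume_change_events():
    if event["op"] == "delete":
        # Remove the document so search stops returning stale rows.
        requests.delete(f"{ES}/products/_doc/{event['id']}")
    else:
        # Indexing (create or overwrite) keeps Elasticsearch eventually
        # consistent with the source-of-truth database.
        requests.put(f"{ES}/products/_doc/{event['id']}", json=event["doc"])
```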
Not a database
- It’s optimised for search, not efficient storage
- Durability & availability aren’t as bulletproof as a traditional db
- No ACID transactions
You get the point; the list goes on.
Misc
- Denormalization is key with Elasticsearch. You shouldn’t need to do complex joins or nested queries when using it (see the sketch below).
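A small illustration of that denormalization point: the indexed document already carries the customer and product fields it will be searched on, copied in at write time, rather than foreign keys to join later. The index and field names are made up.

```python
# Sketch of a denormalized document: no joins at query time.
import requests

ES = "http://localhost:9200"

order = {
    "order_id": "A-1001",
    "placed_at": "2024-05-01T10:15:00Z",
    # Copied from the customers table at write time, so no join is needed at search time.
    "customer_name": "Jane Doe",
    "customer_city": "Leeds",
    # Copied from the products table, so a search for "waterproof jacket" hits directly.
    "items": [
        {"sku": "JKT-9", "title": "Waterproof jacket", "price": 120.0},
    ],
}

requests.put(f"{ES}/orders/_doc/{order['order_id']}", json=order)
```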