Everything You Need to Know About Migrating from Solr to Elasticsearch
Written by Manpreet Kaur
Content Writer
November 30, 2022
The release of HCL Commerce V9.1 introduced Elasticsearch as an advanced search and indexing platform. The Elasticsearch-based search uses the Ingest Service, which has a new data-loading architecture that enables easy customization and incremental index updates. In addition, it is backward compatible with the HCL Commerce Transaction Server, externalized customizations, and storefronts. Read on to learn what Elasticsearch is and why migrating from Solr to Elasticsearch is essential for delivering powerful search experiences.
What is Elasticsearch?
Elasticsearch is a free, open-source, distributed search engine built on Apache Lucene. It is an enterprise-level search engine popular for its speed, flexibility, scalability, and REST APIs, and it comes with a set of free tools for data ingestion, storage, enrichment, analysis, and visualization.
Elasticsearch searches and indexes many kinds of data, including textual, numerical, structured, unstructured, and geospatial data. Client libraries are available for PHP, Perl, Go, Java, JavaScript, Python, Ruby, and .NET. It returns results quickly because it searches an index rather than the text itself. In simple terms, this is like finding a keyword through a book’s index instead of reading every word of the book. Elasticsearch can scale across multiple servers storing petabytes of data, which makes it suitable for the use cases listed below:
- Security Analytics: Analyzes access logs to track what’s happening across systems in real time
- Business Analytics: Helps organizations with multiple data sources perform analytics
- Website Search: Delivers precise results on websites with large volumes of content
- Application Search: Lets applications retrieve and report data through a search platform
- Enterprise Search: Facilitates extensive searching of information across different platforms
- Infrastructure Metrics and Container Monitoring: Collects and analyzes metrics for performance parameters
- Logging and Log Analytics: Ingests and analyzes log data in real time, providing the log metrics needed to take action
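The book-index analogy above can be made concrete with a small sketch. The snippet below is a minimal illustration of an inverted index in plain Python, not how Elasticsearch or Lucene is actually implemented; the sample documents and function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, term):
    """Look the term up in the index instead of scanning every document."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample documents, for illustration only.
docs = {
    1: "red running shoes",
    2: "blue running jacket",
    3: "red winter jacket",
}
index = build_inverted_index(docs)
print(search(index, "running"))  # IDs of documents containing "running"
print(search(index, "jacket"))   # IDs of documents containing "jacket"
```

The lookup cost depends on the number of distinct terms rather than on the total amount of text, which is the reason an index-based search stays fast as content grows.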
Why Migrate to Elasticsearch?
Multiple Search Options
Elasticsearch supports various search methods, including full-text search, instant search, fuzzy search, faceted search, autocompletion, customized stemming, and tokenization (splitting text into words). Fuzzy search returns relevant results even when the query is misspelled. Autocompletion speeds up searching by predicting the query from the user’s input.
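As a rough illustration of fuzzy search, the sketch below builds the JSON body of a match query with the fuzziness option, which would be sent to an index's _search endpoint. The field name "name" and the misspelled query text are assumptions made for the example, and the snippet only prints the request body rather than contacting a cluster.

```python
import json


def fuzzy_match_query(field, text):
    """Build a match query body that tolerates small misspellings."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    # "AUTO" lets Elasticsearch choose the allowed edit
                    # distance based on the length of each term.
                    "fuzziness": "AUTO",
                }
            }
        }
    }


# "name" is a hypothetical field; "sneekers" is deliberately misspelled.
body = fuzzy_match_query("name", "sneekers")
print(json.dumps(body, indent=2))
```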
Distributed Nature
An Elasticsearch index is divided into multiple containers known as shards, which are duplicated as replicas that take over in the event of hardware failure. When a document is added, Elasticsearch routes it to a shard automatically, and it rebalances shards across nodes as the cluster changes.
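The routing step can be pictured as hashing a routing value (by default the document ID) and taking the result modulo the number of primary shards. The sketch below is a simplified illustration of that idea only: it uses zlib.crc32 as a stand-in hash, and the document IDs and shard count are hypothetical.

```python
import zlib


def pick_shard(routing_key, num_primary_shards):
    """Return the shard number for a routing key: hash(key) mod shard count.

    zlib.crc32 is a stand-in chosen so the example is deterministic;
    it is not the hash function Elasticsearch uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Hypothetical document IDs spread over three primary shards.
num_shards = 3
doc_ids = ["sku-1001", "sku-1002", "sku-1003", "sku-1004"]
placement = {doc_id: pick_shard(doc_id, num_shards) for doc_id in doc_ids}
for doc_id, shard in placement.items():
    print(f"{doc_id} -> shard {shard}")
```

Because the same key always hashes to the same shard, a later lookup by ID can go straight to the shard that holds the document.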
Quick
Elasticsearch is fast and well suited to full-text search. The time between indexing a document and seeing it in search results is very low, usually about one second. It also caches the results of structured filter queries, so a filter runs only once and later requests that use the same filter are served from the cache. This makes it a good fit for time-sensitive use cases such as security analytics and infrastructure monitoring.
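The caching behavior described above can be sketched in a few lines. This is a toy model under stated assumptions, not Elasticsearch's actual cache, which also has to handle eviction and invalidation; the class name, the "status" field, and the sample documents are all hypothetical.

```python
class CachedFilter:
    """Cache the set of matching document IDs for each filter value."""

    def __init__(self, docs, field):
        self.docs = docs
        self.field = field
        self.cache = {}
        self.scans = 0  # how many times the documents were actually scanned

    def matching_ids(self, value):
        if value not in self.cache:
            # First request for this value: scan once and remember the result.
            self.scans += 1
            self.cache[value] = frozenset(
                doc_id
                for doc_id, doc in self.docs.items()
                if doc.get(self.field) == value
            )
        return self.cache[value]


# Hypothetical documents with a "status" field.
docs = {
    1: {"status": "active"},
    2: {"status": "archived"},
    3: {"status": "active"},
}
status_filter = CachedFilter(docs, "status")
first = status_filter.matching_ids("active")   # scans the documents
second = status_filter.matching_ids("active")  # served from the cache
print(sorted(first), status_filter.scans)
```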
Easy Data Ingestion, Visualization, and Reporting
Elasticsearch integrates easily with Logstash and Beats, which simplify data processing ahead of indexing. Logstash is a server-side data processing tool that collects data from multiple sources, processes it, and sends it to Elasticsearch. Beats is a free family of single-purpose data shipping agents that transfer data from systems to Logstash or Elasticsearch. In addition, Kibana offers real-time visualization of Elasticsearch data through line graphs, pie charts, and histograms.
Scalability
The distributed approach makes it easy to scale Elasticsearch horizontally by adding nodes and balancing the load across the cluster. It also has robust built-in features that facilitate efficient data storage and search.
Document-Oriented
All entities are stored as structured JSON documents, and every field is indexed by default, which supports fast searching. The JSON-based query DSL enables developers to create intricate queries that retrieve accurate search results.
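To give a feel for the document model and the query DSL, the sketch below shows a sample JSON document and a bool query that combines a full-text match with exact filters. Every field name and value is hypothetical, and the snippet only prints the JSON that would be sent to Elasticsearch.

```python
import json

# Hypothetical product document; every field name here is illustrative.
product = {
    "name": "Trail running shoes",
    "brand": "Acme",
    "price": 89.99,
    "in_stock": True,
}

# Bool query: a full-text match on the name, plus exact filters that
# narrow the results without affecting relevance scoring.
search_body = {
    "query": {
        "bool": {
            "must": [{"match": {"name": "running shoes"}}],
            "filter": [
                {"term": {"in_stock": True}},
                {"range": {"price": {"lte": 100}}},
            ],
        }
    }
}

print(json.dumps(product))
print(json.dumps(search_body, indent=2))
```

Keeping the exact conditions in the filter clause separates "must be true" constraints from the relevance-ranked text match.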
Business Cases of Elasticsearch Deployment
The Guardian
The Guardian, a British daily newspaper, wanted to ensure its web content was displayed and visible to its large reader base of over 5 million users. Its internal analytics system allowed the team to track user interaction with content in real time and use this information to push the right content at the right time across social media channels.
Built on Elasticsearch, The Guardian’s in-house analytics system processes over 40 million documents daily, providing real-time visibility into website traffic and insights about content consumption. It helps track content impressions, the kind of content that drives higher traffic, traffic sources, the right time to promote content, and much more. The developers also use Elasticsearch to identify website performance issues by searching through events, responding to modifications, and leveraging analytics in real time.
Dell
Dell uses Elasticsearch to power eCommerce search worldwide. Previously, its commerce search platform didn’t meet needs such as multi-tenancy and cloud-readiness, so Dell switched to Elasticsearch for its open-source model and scalability. Presently, Dell has two Elasticsearch clusters: a search cluster that drives the search experience on Dell.com and an analytics cluster that tracks search-related user activity.
The search cluster indexes all documents on Dell.com, while the analytics cluster indexes all the clicks on Dell.com. Dell also built extensive linguistic pipelines that use Elasticsearch’s language analyzers, spell check, stemming, and stop word removal to give accurate search results. In addition, Dell has a virtual assistant that helps buyers refine their search before clicking the search button by showing a preview of results. With the adoption of Elasticsearch, Dell has seen significant growth in conversions, revenue, and customer satisfaction.
Guidelines to Follow while Migrating from Solr to Elasticsearch
- Define all the index fields from the Solr schema.xml in the data specification of each object type, which is an element of the NiFi Ingest Service Connector.
- Define the ETL processes in the NiFi Ingest Service Connector. The Ingest Service manages ingestion, processes data from varied sources, and transforms data in memory in the Apache NiFi cluster.
- Update the required fields in the existing indexed document with the help of incremental index updates. There is no need for extension indexes.
- Create a single ingest connector for every document source. This simplifies customizing extensions that ingest data and load it to the search component indexes.
- Create a custom connector and pipeline to load data. The site crawler collects page URLs; however, it doesn’t load data into the search index automatically.
- Use ZooKeeper to store custom configurations that override default behaviors, for example, query responses.
- Assess Solr-based search profiles, defined expression providers, and custom query pre-processors and post-processors in the profiles.
- Evaluate custom expression providers associated with search relevancy and ranking concerning Natural Language Processing (NLP).
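As a possible starting point for the first guideline, a short script can inventory the fields declared in a Solr schema.xml so that each one can be reviewed when it is defined in the data specification. The sketch below is only an illustration: the schema excerpt and field names are hypothetical, and it does not produce the HCL Commerce data specification format, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a Solr schema.xml; real schemas are much larger.
SAMPLE_SCHEMA = """
<schema name="example" version="1.6">
  <field name="product_id" type="string" indexed="true" stored="true"/>
  <field name="name" type="text_general" indexed="true" stored="true"/>
  <field name="price" type="pfloat" indexed="true" stored="false"/>
</schema>
"""


def list_fields(schema_xml):
    """Return one dict per <field> element declared in the schema."""
    root = ET.fromstring(schema_xml.strip())
    return [
        {
            "name": field.get("name"),
            "type": field.get("type"),
            "indexed": field.get("indexed") == "true",
            "stored": field.get("stored") == "true",
        }
        for field in root.iter("field")
    ]


fields = list_fields(SAMPLE_SCHEMA)
for f in fields:
    print(f)
```

A list like this makes it easier to confirm that no indexed field is overlooked during the migration.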
Conclusion
Are you looking to migrate from Solr to Elasticsearch to improve your search systems? Royal Cyber’s certified HCL Commerce experts offer consultation to help you understand the process and perform a non-disruptive migration. We help you scale your business by deploying Elasticsearch for dynamic search and analytics. Reach out to us to learn how you can enjoy the benefits of Elasticsearch.