
RediSearch in Action

Learn how to ingest tweets in real-time and query them flexibly using RediSearch with Azure Cache for Redis.




Redis has a versatile set of data structures ranging from simple Strings all the way to powerful abstractions such as Redis Streams. The native data types can take you a long way, but there are certain use cases that may require a workaround. One example is the requirement to use secondary indexes in Redis in order to go beyond the key-based search/lookup for richer query capabilities. Though you can use Sorted Sets, Lists, and so on to get the job done, you’ll need to factor in some trade-offs. 

Enter RediSearch! Available as a Redis module, RediSearch provides flexible search capabilities, thanks to a first-class secondary indexing engine. It offers powerful features such as full-text search, auto-completion, geographical indexing, and many more.

To demonstrate the power of RediSearch, this blog post offers a practical example of how to use RediSearch with Azure Cache for Redis with the help of a Go service built using the RediSearch Go client. It’s designed to give you a set of applications that let you ingest tweets in real-time and query them flexibly using RediSearch. 

Specifically, you will learn how to: 

  • Work with RediSearch indexes 
  • Use different RediSearch data types, such as TEXT, NUMERIC, TAG, and others 
  • Build an application to show RediSearch capabilities
  • Deploy the service components to Azure with just a few commands
  • Analyze tweet data by querying RediSearch 

Application overview 

As mentioned, the example service lets you consume tweets in real-time and makes them available for querying via RediSearch. 

It has two components: 

  1. Consumer/Indexer: Reads from the Twitter Streaming API, creates the index, and continuously adds tweet data (in Redis HASHes) as they arrive. 
  2. Search service: A REST API that allows you to search tweets using the RediSearch query syntax

At this point, I am going to dive into how to get the solution up and running so that you can see it in action. However, if you’re interested in understanding how the individual components work, please refer to the Code walk through section below, and the GitHub repo for this blog: https://github.com/abhirockzz/redisearch-tweet-analysis

Prerequisites 

  1. To begin with, you will need a Microsoft Azure account: get one for free here!
  2. The service components listed above will be deployed to Azure Container Instances using native Docker CLI commands. This capability is enabled by the integration between Docker and Azure.
  3. You will need Docker Desktop version 2.3.0.5 or later for Windows or macOS, or the Docker ACI Integration CLI for Linux.

To use the Twitter Streaming API, you will also need a Twitter developer account. If you don’t have one already, please follow these instructions.

RediSearch in action!

Start off by using this quick-start tutorial to set up a Redis Enterprise tier cache on Azure. Once you finish the setup, ensure that you have the Redis host name and access key handy.

Both the components of our service are available as Docker containers: the Tweet indexing service and the Search API service. (If you need to build your own Docker images, please use the respective Dockerfile available on the GitHub repo.) 

You will now see how convenient it is to deploy these to Azure Container Instances, which allows you to run Docker containers on-demand in a managed, serverless Azure environment.

Deploy to Azure 

A docker-compose.yml file defines the individual components (tweets-search and tweets-indexer). All you need to do is update it to replace the values for your Azure Redis instance as well as your Twitter developer account credentials. Here is the file in its entirety

Create an Azure context

Clone the GitHub repo: 

Deploy both the service components as part of a container group

(Note that Docker Compose commands currently available in an ACI context start with docker compose. That is NOT the same as docker-compose with a hyphen.)

You will see an output similar to this: 

Wait for the services to start (you can also check their status in the Azure portal). Once both services are up and running, you can check their respective logs:

If all goes well, the tweet-consumer service should have kicked off. It will read a stream of tweets and persist them to Redis.

The moment of truth!

It’s time to query the tweet data. To do so, you can access the REST API in Azure Container Instances with an IP address and a fully qualified domain name (FQDN) (read more in Container Access). To find the IP, run docker ps and check the PORTS section in the output (as shown below):

You can now run all sorts of queries! Before diving in, here is a quick idea of the indexed attributes that you can use in your search queries: 

(Note: I use curl in the examples below, but I would highly recommend the “REST Client” extension for VS Code.)

Set the base URL for the search service API: 

Start simple and query all the documents (using * ):

You will see an output similar to this: 

Notice the headers Page-Size and Search-Hits: these are custom headers passed from the application, mainly to demonstrate pagination and limits. In response to our “get me all the documents” query, we found 12 results in Redis, but the JSON body returned 10 entries. This is because of the default behavior of the RediSearch Go API, which you can change with the offset_limit query parameter (described in the code walk through section).

Or, for example, search for tweets sent from an iPhone:

You may not always want all the attributes in the query result. For example, this is how to just get back the user (Twitter screen name) and the tweet text:

How about a query on the user name (e.g. starting with jo):

You can also use a combination of attributes in the query:

How about we look for tweets with specific hashtags? It is possible to use multiple hashtags (separated by |):

Want to find out how many tweets with the biden hashtag were created recently? Use a range query:

If you were lucky enough to grab some coordinates info on the tweets, you can try extracting them and then query on the coordinates attribute:
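A geo filter in RediSearch query syntax takes the form @field:[lon lat radius unit], where unit is one of m, km, mi, or ft. As a small sketch (geoFilter is a hypothetical helper, and the sample coordinates are made up), such a clause can be built like this:

```go
package main

import (
	"fmt"
	"strconv"
)

// num formats a float without exponent notation so the query stays readable.
func num(v float64) string {
	return strconv.FormatFloat(v, 'f', -1, 64)
}

// geoFilter builds a GEO clause. Note the order: longitude first, then latitude.
func geoFilter(field string, lon, lat, radius float64, unit string) string {
	return fmt.Sprintf("@%s:[%s %s %s %s]", field, num(lon), num(lat), num(radius), unit)
}

func main() {
	// Tweets within 10 km of a sample point.
	fmt.Println(geoFilter("coordinates", -122.41, 37.77, 10, "km"))
}
```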

These are just a few examples. Feel free to experiment further and try out other queries. This section in the RediSearch documentation might come in handy!

Important: After you finish, don’t forget to stop the services and the respective containers in Azure Container Instances: 

Use the Azure Portal to delete the Azure Redis instance that you had created.

Code walk through

This section provides a high-level overview of the code for the individual components. This should make it easier to navigate the source code in the GitHub repo.

Tweets consumer/indexer:

The go-twitter library is used to interact with Twitter.

It authenticates to the Twitter Streaming API:

And listens to a stream of tweets in a separate goroutine:
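The pattern can be illustrated without any Twitter dependency. The following is a simplified sketch, not the repo's code: Tweet is a hypothetical, trimmed-down stand-in for the go-twitter tweet type, a channel stands in for the stream, and index stands in for the indexing call. Each tweet is flattened with tweetToMap and handed to the indexer in its own goroutine.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// Tweet is a hypothetical stand-in for the go-twitter tweet type.
type Tweet struct {
	ID        int64
	User      string
	Text      string
	Source    string
	Hashtags  []string
	CreatedAt int64 // Unix seconds (assumed representation)
}

// tweetToMap flattens a tweet into field/value pairs suitable for a HASH.
// Hashtags are joined with commas, the default separator for TAG fields.
func tweetToMap(t Tweet) map[string]interface{} {
	return map[string]interface{}{
		"id":       strconv.FormatInt(t.ID, 10),
		"user":     t.User,
		"text":     t.Text,
		"source":   t.Source,
		"hashtags": strings.Join(t.Hashtags, ","),
		"created":  t.CreatedAt,
	}
}

// consume reads tweets in a background goroutine and indexes each one in its
// own goroutine. The returned channel is closed once the input channel is
// closed and all pending index calls have finished.
func consume(tweets <-chan Tweet, index func(map[string]interface{})) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		var wg sync.WaitGroup
		for t := range tweets {
			wg.Add(1)
			go func(doc map[string]interface{}) {
				defer wg.Done()
				index(doc)
			}(tweetToMap(t))
		}
		wg.Wait()
	}()
	return done
}

func main() {
	tweets := make(chan Tweet)
	done := consume(tweets, func(doc map[string]interface{}) {
		fmt.Println("indexing tweet", doc["id"], "with hashtags", doc["hashtags"])
	})
	tweets <- Tweet{ID: 1, User: "jo", Text: "hello #redis", Hashtags: []string{"redis"}}
	close(tweets)
	<-done
}
```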

Notice the go index.AddData(tweetToMap(tweet))—this is where the indexing component is invoked. It connects to Azure Cache for Redis:

It then drops the index (and the existing documents as well) before re-creating it:

The index and its associated documents are dropped to allow you to start with a clean state, which makes it easier to experiment/demo. You can choose to comment out this part if you wish.

Information for each tweet is stored in a HASH (named tweet:<tweet ID>) using the HSET operation: 
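As an illustration of the key naming and the HSET call, the sketch below flattens a tweet document into the argument list of a single HSET command. tweetKey and hsetArgs are hypothetical helpers, and the sample field names are assumptions. Assuming the index was created over the tweet: key prefix, a hash written this way is picked up by RediSearch 2.x without a separate indexing call.

```go
package main

import (
	"fmt"
	"sort"
)

// tweetKey returns the HASH key for a tweet, following the tweet:<tweet ID> naming.
func tweetKey(id string) string {
	return "tweet:" + id
}

// hsetArgs flattens a document into HSET key field value [field value ...].
// Field names are sorted only to make the output deterministic.
func hsetArgs(id string, doc map[string]interface{}) []interface{} {
	names := make([]string, 0, len(doc))
	for name := range doc {
		names = append(names, name)
	}
	sort.Strings(names)

	args := []interface{}{"HSET", tweetKey(id)}
	for _, name := range names {
		args = append(args, name, doc[name])
	}
	return args
}

func main() {
	doc := map[string]interface{}{
		"user":    "jo",
		"text":    "hello",
		"created": 1610000000,
	}
	fmt.Println(hsetArgs("1234", doc)...)
}
```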

The tweets search service exposes a REST API to query RediSearch. All the options (the query itself, the fields to return, and so on) are passed as query parameters. For example, http://localhost:8080/search?q=@source:iphone. It extracts the required query parameters:

The q parameter is mandatory. However, you can also use the following parameters for search: 

  • fields : to specify which attributes you want to return in the result
  • offset_limit : to specify the offset from which to start and the number of documents to include in the result (by default, the offset is 0 and the limit is 10, as per the RediSearch Go client)

For example:

Finally, the results are iterated over and passed back as JSON (array of documents):

That’s all for this section!

Redis Enterprise tiers on Azure Cache for Redis

Redis Enterprise is available as a native service on Azure in the form of two new tiers for Azure Cache for Redis, which are operated and supported by Microsoft and Redis Labs. This service gives developers access to a rich set of Redis Enterprise features, including modules like RediSearch. For more information, see these resources:

Conclusion

This end-to-end application demonstrates how to work with indexes, ingest real-time data to create documents (tweet information) that are indexed by the RediSearch engine, and then use the versatile query syntax to extract insights from those tweets.

Want to understand what happens behind the scenes when you search for a topic on the Redis Labs documentation? Check out this blog post to learn how the Redis Labs site incorporated full-text search with RediSearch! Or, perhaps you’re interested in exploring how to use RediSearch in a serverless application?

If you’re still getting started, visit the RediSearch Quick Start page.

If you want to learn more about the enterprise capabilities in Azure Cache for Redis, you can check out the following resources: