Deploy ELK Stack on a Remote Ubuntu VM

Ziming Wang | Feb 8, 2024

Purpose

  • Deploy ElasticSearch (8.X), Kibana (8.X), Logstash (8.X), and PostgreSQL 14 (optional) as native services on an AWS EC2 Ubuntu VM
  • Allow outside access to ES and Kibana.
  • Set up data syncing between ES and PostgreSQL, using Logstash + the PostgreSQL JDBC driver.

!!!! Note that this is a very simple test deployment, not a production-ready one!

I only want to:

  1. test the availability of the ELK stack
  2. !!!! create some deployed search APIs for my frontend colleagues to call (through a backend server acting as a proxy), making their development easier


Key Ideas

  • Expose port 9200 for ElasticSearch’s programmatic interface. Your backend server can call ElasticSearch’s REST APIs here.
  • Expose port 5601 for Kibana’s GUI admin interface. You can access it from your local machine’s browser.
  • Deploy Logstash for data syncing between ES and PostgreSQL.
  • I also included an extra section for deploying PostgreSQL. It is optional if you already have a running and reachable DB. I added this section because, for my testing purposes, it is easier to set up on the VM than on AWS RDS.


Why this Guide?

  • I initially followed [this guide]. However, it uses version 7.X, which has different security settings than the latest 8.X, so additional settings need to be configured for 8.X. Besides, that guide uses Nginx as a reverse proxy to handle all the traffic, while I want to skip this step and expose ES and Kibana directly to the public.
  • ElasticSearch’s official guide can be found [here], but it is somewhat incomplete and contains redundant information.


My EC2 VM

  • System: Ubuntu 22.04.3 LTS (GNU/Linux 6.2.0-1018-aws x86_64)
  • Machine: t3.large, 2 vCPUs, 8 GB memory. I tried 4 GB and it was pretty laggy.
  • Firewall: ports 9200 and 5601 open. Also open PostgreSQL’s default port (5432) if you want.


ES and Kibana

Install Java

  • sudo apt update
  • sudo apt install default-jdk
  • check for success:
    • java -version
    • javac -version

Install ElasticSearch (8.11.3)

Install the signing key:

  • curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic.gpg
  • echo "deb [signed-by=/usr/share/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

Install ES

  • sudo apt update
  • sudo apt install elasticsearch=8.11.3

Now, start ES as a service. It is going to take a while:

  • sudo systemctl start elasticsearch

Health check

  • sudo systemctl status elasticsearch

Then, we reset the password. It is easier to reset it than to find the default one in logs.

  • sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

The password will be printed to the terminal. Copy it, then store it in an env variable:

  • export ES_PWD='<THE PASSWORD VALUE>'

Test the connection locally; a JSON response should be returned:

  • curl -k -X GET -u elastic:$ES_PWD "https://localhost:9200"

Now, we check whether this port has been exposed to outside traffic successfully. Open your local machine’s terminal and run:

  • curl -k -X GET -u elastic:<THE PASSWORD VALUE> "https://<Remote VM IP>:9200"

The same JSON as in the last step should be returned.

Very important: We use -k to bypass certificate validation, so that the request can be sent even though our ES certificate is self-signed. Otherwise, the request will be rejected by curl. Using a self-signed certificate poses a significant security threat (e.g. man-in-the-middle attacks). It is highly recommended to add a valid certificate as soon as possible.

If you failed somewhere, troubleshoot:

  • check your VM firewall. Is 9200 open?
  • Have you messed up ES’s config file? Open it with sudo nano /etc/elasticsearch/elasticsearch.yml
    • check that http.host is 0.0.0.0. This is the default setting.
    • network.host should be commented out or set to localhost
    • the relevant lines should look like the snippet below
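
For reference, this is roughly what the relevant part of /etc/elasticsearch/elasticsearch.yml should look like (an excerpt of my file; the surrounding auto-generated security settings stay untouched):

    # /etc/elasticsearch/elasticsearch.yml (excerpt)
    #network.host: localhost      <- leave commented out, or set to localhost
    # Allow HTTP API connections from anywhere
    http.host: 0.0.0.0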

Install Kibana (8.11.3)

  • Install Kibana sudo apt-get update && sudo apt-get install kibana=8.11.3
  • Start Kibana sudo systemctl start kibana
  • Health check sudo systemctl status kibana

Now we edit the config to allow outside traffic. Open the config file:

  • sudo nano /etc/kibana/kibana.yml

UNCOMMENT and/or EDIT the following configs (a snippet of the finished result follows this list). When you are done, don’t forget to press CTRL+S to save!

  • !!! Make sure you uncomment them by removing the #
  • !!! You can press CTRL+W to search. It is faster than scrolling down.
  • server.host: 0.0.0.0
  • server.port: 5601
  • elasticsearch.hosts: ["http://localhost:9200"]
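
For reference, the finished lines in /etc/kibana/kibana.yml should read roughly like this (an excerpt; everything else in the file stays as shipped):

    # /etc/kibana/kibana.yml (excerpt)
    server.port: 5601
    server.host: 0.0.0.0                            # note: no leading '#' anymore
    elasticsearch.hosts: ["http://localhost:9200"]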

Restart Kibana sudo systemctl restart kibana

Go to <YOUR VM IP>:5601 in your local machine’s browser. You will be prompted to enter a token. If nothing shows up, wait for a while, because it takes some time for Kibana to start.

Now, we generate a fresh enrollment token for Kibana to use:

  • sudo /usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana

Enter this token.

Then, you will be prompted to enter a verification code:

  • run journalctl -u kibana
  • then keep scrolling down by pressing the down arrow on your keyboard. On the last page, you will find the code.

Finally, you can log in with username/password. We generated the password for the user elastic in earlier steps and stored it in an env variable; if you can’t find it, run echo $ES_PWD. If you have lost it, no worries, just follow the earlier steps to generate a new one.

Uninstall

During my setup, I frequently uninstalled and reinstalled ES and Kibana. apt remove and purge will not fully remove all of ES’s configuration files. Follow [this guide] to manually rm -rf the relevant folders for a full clean-up.



PostgreSQL

Skip this section if you already have a deployed, reachable PostgreSQL available. Otherwise, you can set it up on the VM, which I personally find easier.

We are also going to expose it to outside traffic so that you can connect to it from your local machine and easily insert data when testing Logstash.

Install

  • sudo apt update
  • sudo apt install postgresql postgresql-contrib
  • start service sudo systemctl start postgresql
  • health check sudo systemctl status postgresql

Configure

  • connect to the shell: sudo -u postgres psql
  • Set up your DB, DB user, DB user password, grant permissions, etc.; see the sketch after this list.
  • Now, we edit the configs under /etc/postgresql/<version>/main/ (replace <version> with the actual version number; find it with sudo ls /etc/postgresql). Then, EDIT:
    • open up remote connections: in postgresql.conf, set listen_addresses = '*'
    • add your IP to the trust list: in pg_hba.conf (same directory, not postgresql.conf), add the line host all <DB USERNAME> <YOUR IP ADDRESS>/32 md5. It is likely that you don’t have a static public IP from your ISP; just google any “find my IP” tool and use that IP until it changes (which will likely happen often, e.g. when you turn off your PC). If everything worked but a day later PostgreSQL produces IP-blocked errors, update this line with your new public IP.
    • restart to apply: sudo systemctl restart postgresql
  • [Optional] Test the connection from your local machine’s shell
    • psql -h <remote machine ip> -p <port> -U <DB USER> -d <DB NAME> to make sure you can connect.
    • Then, you can use any DB admin GUI on your local PC, e.g. DBeaver, to play around and insert data when testing Logstash.
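
As an illustration, the setup inside the psql shell might look like this (a minimal sketch; testdb, testuser, and the password are placeholders I made up, use your own):

    -- inside the psql shell (sudo -u postgres psql)
    CREATE DATABASE testdb;
    CREATE USER testuser WITH ENCRYPTED PASSWORD 'change-me';
    GRANT ALL PRIVILEGES ON DATABASE testdb TO testuser;
    -- connect to the new DB, then let the user use the public schema
    -- (required on PostgreSQL 15+, harmless on 14)
    \c testdb
    GRANT ALL ON SCHEMA public TO testuser;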


Logstash

Install Logstash

  • sudo apt install logstash

Download the PostgreSQL JDBC driver

I am connecting PostgreSQL to ES, so I need the PostgreSQL JDBC driver for Logstash’s jdbc input. If you are using MySQL, download the MySQL JDBC driver instead.

  • download the JDBC jar to your local machine. [link]
  • now, upload this jar to the VM. I used scp here:
    • scp -i <path to your private key on your local machine> <path to the jdbc jar file on your local machine> <VM username>@<VM ip>:<VM directory path to store the file>
  • now, go to your VM; you should be able to see the file

Insert dummy data into PostgreSQL

  • insert some dummy data into PostgreSQL so that Logstash has something to sync; see the sketch below
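
For example (a minimal sketch; the products table and its columns are made up for illustration, use whatever schema you are actually testing with):

    -- run in psql or your DB GUI, connected to your test DB
    CREATE TABLE IF NOT EXISTS products (
        id         SERIAL PRIMARY KEY,
        name       TEXT NOT NULL,
        updated_at TIMESTAMP NOT NULL DEFAULT now()
    );

    INSERT INTO products (name) VALUES ('apple'), ('banana'), ('cherry');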

Create an API key in Kibana

  • go to Kibana and create an API key in the “Logstash” format. Copy it and save it in your notepad.

Create the Logstash configuration files

Now, we create two configuration files for Logstash to use. logstash.conf is the main configuration file. I also created a statement.sql and let Logstash refer to it for the data-syncing SQL. Alternatively, you could put the SQL inside logstash.conf directly, which also works.

  • create statement.sql with sudo nano ~/statement.sql (or anywhere you like); a sketch follows this list
  • create logstash.conf with sudo nano ~/logstash.conf (or anywhere you like)
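
A minimal statement.sql could look like this (assuming the made-up products table from above; :sql_last_value is a placeholder that Logstash’s jdbc input substitutes with the last value it tracked):

    -- ~/statement.sql: fetch rows changed since the last sync
    SELECT id, name, updated_at
    FROM products
    WHERE updated_at > :sql_last_value
    ORDER BY updated_at ASC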

Now, we create logstash.conf. I assume that before deploying to the VM, you have already tried running Logstash locally, so you have a working config. A couple of things in that file need to change for the VM, and the file will vary from one person to another. Here is a checklist of what I changed (an example sketch follows this list); make sure you thoroughly go through yours.

  • modify jdbc_driver_library to the absolute path where you stored the JDBC jar on the VM
  • modify jdbc_connection_string if needed; make sure it points to your PostgreSQL
  • modify api_key to the one you just created
  • since I used a separate SQL file, I need to set statement_filepath to the absolute path of that file. I placed statement.sql in home, so it is /home/<User>/statement.sql. !!! Make sure you use an absolute path here, not a relative path like ./statement.sql !!!
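
As an illustration, here is a minimal logstash.conf along those lines (a sketch under my assumptions: the made-up products table, the placeholder testdb/testuser credentials from earlier, and a guessed postgresql-42.7.1.jar filename; swap in your own values):

    input {
      jdbc {
        jdbc_driver_library => "/home/ubuntu/postgresql-42.7.1.jar"  # absolute path to the uploaded jar
        jdbc_driver_class => "org.postgresql.Driver"
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/testdb"
        jdbc_user => "testuser"
        jdbc_password => "change-me"
        schedule => "* * * * *"                             # poll every minute
        statement_filepath => "/home/ubuntu/statement.sql"  # absolute path!
        use_column_value => true
        tracking_column => "updated_at"
        tracking_column_type => "timestamp"
      }
    }

    output {
      elasticsearch {
        hosts => ["https://localhost:9200"]
        api_key => "<id:api_key from Kibana>"   # the "Logstash" format key
        ssl_verification_mode => "none"         # self-signed cert; fix for production
        index => "products"
        document_id => "%{id}"                  # keyed by primary key, so re-syncs don't duplicate
      }
    }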

Run Logstash

Here, I run Logstash directly in the terminal instead of as a service, because it is easier to specify extra arguments (I am using the aggregate filter, so I need to limit the number of worker threads to 1 with -w 1). If you are not using the aggregate filter, never mind.

  • Normal: sudo /usr/share/logstash/bin/logstash -f ~/logstash.conf
  • For me (aggregate filter) sudo /usr/share/logstash/bin/logstash -f ~/logstash.conf -w 1

The process should be running.

Since we are not running it as a service, we need to make sure the process keeps running after we close the shell. To achieve this, stop the process above and instead do:

  • nohup sudo /usr/share/logstash/bin/logstash -f ~/logstash.conf > /dev/null 2>&1 &
  • For me (aggregate filter) nohup sudo /usr/share/logstash/bin/logstash -f ~/logstash.conf -w 1 > /dev/null 2>&1 &

(I also redirected stdout to /dev/null to discard it.)
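
To confirm the process survives after you log out and back in, a plain process check is enough:

    # list any running Logstash processes with their full command line
    pgrep -af logstash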



GOOD JOOOOOB

GOOOD JOOOOOOB.



What’s Next for a Better Setup?

Some of my thoughts.

  • !!! Add more ES nodes and form a cluster to handle high concurrency
  • Have a valid SSL certificate
  • Use Nginx as reverse proxy and handle all SSL. Hide ES and Kibana.
  • Optimize security settings
  • etc…


Extra: Why a backend as a proxy?

Why do we need a backend server as a proxy bridging the frontend and ElasticSearch? Can’t we let the frontend access ES directly, to reduce network overhead, by:

  • setting up a frontend-only API key with limited query permissions (see the sketch after this list)
  • letting ElasticSearch do the input validation/sanitization
  • pre-constructing the search query associated with the API key in ElasticSearch
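
For concreteness, such a restricted key could be created through ES’s security API, roughly like this (a sketch; frontend_search and products are made-up names):

    # create a search-only API key, scoped to one index
    curl -k -u elastic:$ES_PWD -X POST "https://localhost:9200/_security/api_key" \
      -H 'Content-Type: application/json' -d'
    {
      "name": "frontend-search-key",
      "role_descriptors": {
        "frontend_search": {
          "indices": [ { "names": ["products"], "privileges": ["read"] } ]
        }
      }
    }'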

Here are some of the reasons why we can’t (there may be more):

  • it is less secure than a dedicated backend server.
  • some search engines natively support this, e.g. TypeSense. ElasticSearch recently added support through “Search Applications”. However,
    • “Search Application” is still in beta.
    • “Search Application” is for premium accounts only.
    • [READ MORE]

End

Thanks for reading!