Purpose
- Deploy ElasticSearch (8.X), Kibana (8.X), Logstash (8.X), and PostgreSQL 14 (optional) as native services on an AWS EC2 Ubuntu VM
- Allow outside access for ES and Kibana.
- Set up data syncing between ES and PostgreSQL, using Logstash + PostgreSQL JDBC.
!!!! Notice that this is a very simple test deployment, not a production-ready deployment!
I only want to:
- test the availability of the ELK stack
- !!!! expose some deployed search APIs for my frontend colleagues to call (through a backend server as a proxy), making their development easier
Key Ideas
- Expose port 9200 for ElasticSearch’s programmatic interface. Your backend server can use ElasticSearch’s REST APIs here.
- Expose port 5601 for Kibana’s GUI admin interface. You can access it in your local machine’s browser.
- Deploy Logstash for data syncing between ES and PostgreSQL.
- I also included an extra section for deploying PostgreSQL. It is optional if you already have a running, reachable DB. I added this section because, for my testing purposes, it is easier to set PostgreSQL up on the VM than on AWS RDS.
Why this Guide?
- I initially followed [this guide]. However, it uses version 7.X, which has different security settings than the latest 8.X, so additional settings need to be configured for 8.X. Besides, that guide uses Nginx as a reverse proxy to handle all the traffic, but I want to skip this step and expose ES and Kibana directly to the public.
- ElasticSearch's official guide can be found [here], but it is somewhat incomplete and contains redundant information.
My EC2 VM
- System: Ubuntu 22.04.3 LTS (GNU/Linux 6.2.0-1018-aws x86_64)
- Machine: t3.large, 2 vCPUs, 8GB memory. I tried 4GB and it was pretty laggy.
- Firewall: ports 9200 and 5601 open. Also open PostgreSQL's default port (5432) if you want to reach the DB from outside.
ES and Kibana
Install Java
sudo apt update
sudo apt install default-jdk
- check for success:
java -version
javac -version
Install ElasticSearch (8.11.3)
Install signing key.
curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
Install ES
sudo apt update
sudo apt install elasticsearch=8.11.3
Now, start ES as a service. It is going to take a while:
sudo systemctl start elasticsearch
Health check
sudo systemctl status elasticsearch
Then, we reset the password. It is easier to reset it than to find the default one in the logs.
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
The password will be printed to the terminal. Copy it, then store it in an env variable:
export ES_PWD='<THE PASSWORD VALUE>'
Test the connection locally; a JSON response should be returned:
curl -k -X GET -u elastic:$ES_PWD "https://localhost:9200"
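If everything works, the response looks roughly like this (exact values will differ on your machine):
{
  "name" : "<your node name>",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "...",
  "version" : {
    "number" : "8.11.3",
    ...
  },
  "tagline" : "You Know, for Search"
}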
Now, we check whether this port has been exposed to outside traffic successfully. Open your local machine's terminal and run:
curl -k -X GET -u elastic:<THE PASSWORD VALUE> "https://<Remote VM IP>:9200"
The same JSON as in the last step should be returned.
Very important: we use -k to bypass certificate validation, so that the request can be sent even though our ES certificate is self-signed. Otherwise, the request will be rejected by curl. Using a self-signed certificate poses a significant security threat (e.g. man-in-the-middle attacks). It is highly recommended to add a valid certificate as soon as possible.
If you failed somewhere, troubleshooting:
- check your VM firewall. Is 9200 open?
- Have you messed up ES’s config file? Open
sudo nano /etc/elasticsearch/elasticsearch.yml
- check that http.host is 0.0.0.0. This is the default setting.
- check that network.host is commented out or set to localhost
Install Kibana (8.11.3)
- Install Kibana
sudo apt-get update && sudo apt-get install kibana=8.11.3
- Start Kibana
sudo systemctl start kibana
- Health check
sudo systemctl status kibana
Now, we edit the config to allow outside traffic. Open the config file:
sudo nano /etc/kibana/kibana.yml
UNCOMMENT and/or EDIT the following configs. When you are done, don't forget to CTRL+S to save!
- !!! Make sure you uncomment them by removing the #
- !!! You can press CTRL+W to search. It is faster than scrolling down.
server.host: 0.0.0.0
server.port: 5601
elasticsearch.hosts: ["http://localhost:9200"]
Restart Kibana:
sudo systemctl restart kibana
Go to <YOUR VM IP>:5601
in your local machine's browser; you will be prompted to enter a token. If nothing shows up, wait a while, because it takes some time for Kibana to start.
Now, we generate an enrollment token for Kibana to use:
sudo /usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana
Enter this token.
Then, you will be prompted to enter a verification code
- run
journalctl -u kibana
- then keep scrolling down by pressing the down arrow on your keyboard. On the last page, you will find the code.
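Alternatively, filtering the log is faster than scrolling. A sketch (the exact log wording may vary across Kibana versions):
journalctl -u kibana --no-pager | grep -i 'verification'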
Finally, you can use username/password to log in. We generated the password for the user elastic in earlier steps and stored it in an env variable; if you can't find it, run echo $ES_PWD. If you have lost it, no worries, follow the earlier steps to generate a new one.
Uninstall
During my setup, I frequently uninstalled and reinstalled ES and Kibana. Using apt remove or purge will not fully remove all configuration files for ES. Follow [this guide] to manually rm -rf the relevant folders for a full clean-up.
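A sketch of what that clean-up looks like, assuming the usual deb-package install locations; double-check the paths against [this guide] before deleting anything:
sudo apt purge elasticsearch kibana
sudo rm -rf /etc/elasticsearch /var/lib/elasticsearch /var/log/elasticsearch /usr/share/elasticsearch
sudo rm -rf /etc/kibana /var/lib/kibana /var/log/kibana /usr/share/kibana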
PostgreSQL
Skip this section if you already have a deployed, reachable PostgreSQL. Otherwise, you can set it up on the VM, which I personally find easier.
We will also expose it to outside traffic so that you can connect to it from your local machine and easily insert data when testing Logstash.
Install
sudo apt update
sudo apt install postgresql postgresql-contrib
- start service
sudo systemctl start postgresql
- health check
sudo systemctl status postgresql
Configure
- connect to shell
sudo -u postgres psql
- Set up your DB, DB user, DB user password, grant permissions, etc., as in the sketch below.
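A minimal sketch of that setup; the database, user, and password names are placeholders:
-- run inside the psql shell
CREATE DATABASE testdb;
CREATE USER testuser WITH PASSWORD 'changeme';
GRANT ALL PRIVILEGES ON DATABASE testdb TO testuser;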
Now, we edit the config files.
- Open
sudo nano /etc/postgresql/<version>/main/postgresql.conf
(make sure you replace <version> with the actual version number; to find it, run sudo ls /etc/postgresql). To open the DB to remote connections, EDIT:
listen_addresses = '*'
- Then, add your IP to the trust list. Open
sudo nano /etc/postgresql/<version>/main/pg_hba.conf
and add this line:
host all <DB USERNAME> <YOUR IP ADDRESS>/32 md5
It is likely that you don't have a static public IP from your ISP. Just google any "find my IP" tool and use that IP until it changes (which will likely happen often, e.g. after you turn off your PC). If everything works but a day later PostgreSQL produces IP-blocked errors, change this line to your new public IP.
- Restart PostgreSQL so the changes take effect:
sudo systemctl restart postgresql
- [Optional] Test the connection from your local machine's shell:
psql -h <remote machine ip> -p <port> -U <DB USER> -d <DB NAME>
to make sure you can reach it from outside.
- Then, you can use any DB admin GUI on your local PC (e.g. DBeaver) to play around and insert data when testing Logstash.
Logstash
Install Logstash
sudo apt install logstash
Download PostgreSQL jdbc
I am connecting PostgreSQL to ES, so I need the PostgreSQL JDBC driver for Logstash's jdbc input plugin. If you are using MySQL, please download the MySQL JDBC driver instead.
- download the jdbc jar to your local machine. [link]
- now, we upload this jar to the VM. I used scp here.
scp -i <path to your private key on your local machine> <path to the jdbc jar file on your local machine> <VM username>@<VM ip>:<VM directory path to store the file>
- now, go to your VM; you should be able to see this file
Insert dummy data in PostgreSQL
- insert some dummy data into PostgreSQL so that Logstash has something to sync, as in the sketch below
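A sketch with a hypothetical products table; adjust it to whatever schema you actually sync:
-- run inside the psql shell, connected to your DB
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  price NUMERIC(10,2),
  updated_at TIMESTAMP DEFAULT now()
);
INSERT INTO products (name, price) VALUES
  ('keyboard', 49.99),
  ('mouse', 19.99),
  ('monitor', 179.00);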
Create an API key in Kibana
- go to Kibana and create an API key in the “Logstash” format. Copy it and save it in your notepad.
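If you prefer the terminal, you can also create one through ES's security API; a sketch (the key name is arbitrary). Logstash wants the id and api_key fields from the response joined as id:api_key, which is exactly what Kibana's “Logstash” format gives you:
curl -k -u elastic:$ES_PWD -X POST "https://localhost:9200/_security/api_key" -H 'Content-Type: application/json' -d '{"name": "logstash_sync"}'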
Create Logstash configuration files
Now, we create two configuration files for Logstash to use. logstash.conf is the main configuration file. I also created a statement.sql and let Logstash refer to this file for the data-syncing SQL. Alternatively, you could put the SQL code inside logstash.conf, which also works.
- place statement.sql in your home directory (or anywhere you like):
sudo nano ~/statement.sql
- place logstash.conf likewise:
sudo nano ~/logstash.conf
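For statement.sql, here is a sketch matching the hypothetical products table from earlier. :sql_last_value is substituted by the Logstash jdbc input (by default, the timestamp of the previous run), so only rows changed since the last poll are synced:
SELECT id, name, price, updated_at
FROM products
WHERE updated_at > :sql_last_value
ORDER BY updated_at ASC;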
Now, we fill in logstash.conf. Before deploying to the VM, you have presumably already tried running Logstash locally; a couple of things in this file need to be changed accordingly, and the file will vary from person to person. Here is a checklist of what I changed; make sure you thoroughly go through yours.
- modify jdbc_driver_library to the absolute path where you stored the JDBC jar on the VM
- modify jdbc_connection_string if needed; make sure it points to your PostgreSQL
- modify api_key to the one you just created earlier
- since I use a separate SQL file, I need to modify statement_filepath to the absolute path of the file. I placed statement.sql in home, so it is /home/<User>/statement.sql. !!! Make sure you use an absolute path here, not a relative path like ./statement.sql ❌
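Putting it together, here is a sketch of a minimal logstash.conf, assuming the hypothetical products table, the placeholder DB credentials, the paths from this guide, and the self-signed ES certificate from earlier (hence SSL with verification disabled):
input {
  jdbc {
    # path and jar version are assumptions; use your own
    jdbc_driver_library => "/home/ubuntu/postgresql-42.7.1.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/testdb"
    jdbc_user => "testuser"
    jdbc_password => "changeme"
    # poll for changes every minute
    schedule => "* * * * *"
    statement_filepath => "/home/ubuntu/statement.sql"
  }
}
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    ssl => true
    ssl_certificate_verification => false
    # the id:api_key value you saved from Kibana
    api_key => "<YOUR API KEY>"
    index => "products"
    # reuse the row id so re-synced rows update instead of duplicating
    document_id => "%{id}"
  }
}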
Run logstash
Here, I run Logstash directly in the terminal instead of as a service, because it is easier to specify extra arguments (I am using the aggregate filter, so I need to limit the number of worker threads to 1 with -w 1). If you are not using the aggregate filter, never mind.
- Normal:
sudo /usr/share/logstash/bin/logstash -f ~/logstash.conf
- For me (aggregate filter)
sudo /usr/share/logstash/bin/logstash -f ~/logstash.conf -w 1
The process should be running.
Since we are not running it as a service, we need to make sure the process keeps running after we close the shell. To achieve this, stop the process above and instead run:
nohup sudo /usr/share/logstash/bin/logstash -f ~/logstash.conf > /dev/null 2>&1 &
- For me (aggregate filter)
nohup sudo /usr/share/logstash/bin/logstash -f ~/logstash.conf -w 1 > /dev/null 2>&1 &
(I also manually discarded stdout output)
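Later, to check on the background process or stop it, something like this should do (pkill -f matches against the full command line):
ps aux | grep -v grep | grep logstash
sudo pkill -f logstash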
GOOD JOOOOOB!
What's Next for a Better Setup?
Some of my thoughts.
- !!! Add more ES nodes and form a cluster to handle high concurrency
- Have a valid SSL certificate
- Use Nginx as reverse proxy and handle all SSL. Hide ES and Kibana.
- Optimize security settings
- etc…
Extra: Why backend as a proxy?
Why do we need a backend server as a proxy to bridge the frontend and ElasticSearch? Can't we let the frontend access ES directly to reduce network overhead, by:
- setting up a frontend-only API key with limited query permissions
- letting ElasticSearch do the input validation/sanitization
- pre-constructing the search query associated with the API key in ElasticSearch
Here are some of the reasons why we can't do it; there may be more:
- it is less secure than a dedicated backend server.
- some search engines natively support this, e.g. TypeSense. ElasticSearch recently added support for this, with the help of “Search Applications”. However,
- “Search Application” is still in beta.
- “Search Application” is premium-account only.
- [READ MORE]
End
Thanks for reading!