Hello everyone!
depesz has just released a new tool: pg_terminator.
I believe it will be a useful addition to any Pg DBA's toolbox.
It requires just Ruby and the ruby-pg gem.
You run it like this:
$ pg_terminator etc/pg_terminator/webdb.yaml
and the YAML config looks like this:
log: /tmp/pg_terminator.webdb.log
min_sleep: 1
database:
  dbname: webdb
  host: webdbsrv1
rules:
  - name: long-webapp-txn
    action: transaction
    threshold: 180
    type: transaction
    condition: application_name ~ 'webapp' and state <> 'idle'
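The condition uses pg_stat_activity columns, so rule #1 boils down to roughly the following hand-written query (a sketch of mine, not pg_terminator's actual internal SQL):

-- Roughly what rule #1 above matches: webapp backends whose current
-- transaction has been open for longer than the 180-second threshold.
-- (My own sketch; pg_terminator's real query may look different.)
SELECT pid, application_name, state,
       now() - xact_start AS xact_age
  FROM pg_stat_activity
 WHERE application_name ~ 'webapp'
   AND state <> 'idle'
   AND now() - xact_start > interval '180 seconds';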
Here is an example of pg_terminator in action:
2018-09-25 13:01:27 Sleeping for 20 seconds
2018-09-25 13:01:47 Sleeping for 20 seconds
2018-09-25 13:02:07 Sleeping for 15.0 seconds
2018-09-25 13:02:22 Terminated a query (pid: 2242) because of rule #1: long-txn
2018-09-25 13:02:22 Sleeping for 37.987898 seconds
Tuesday, September 25, 2018
Sunday, September 23, 2018
Neo4j LOAD CSV performance testing
Hello everyone!
Since I got my hands on the Neo4j graph database, I have been delving into its performance. I'll start by benchmarking basic operations, including data ingestion.
So, let's copy some tabular data into Neo4j. The first step is, obviously, reading and parsing the input files.
In case you don't know it: the standard Cypher command for reading flat files is LOAD CSV. For example, to load data from a tab-separated file and print only the 1st and 7th fields, you run:
LOAD CSV FROM 'file:///input.txt' AS line FIELDTERMINATOR '\t'
RETURN line[0], line[6];
I decided to test this basic load operation in several variants, checking for factors that *could* have an impact on load performance:
- Whether the input file is gzip-compressed or not
- Whether the WITH HEADERS clause is used (each row returned as a map / dictionary instead of a list)
- Whether the load is followed by a filter (a WHERE clause)
- Whether the load is followed by a simple MATCH clause (one that does not actually match any existing paths)
Here are the raw results:
Gzipped | Headers | Filter | Match | Runtime #1 (s) | Runtime #2 (s) | Runtime #3 (s) | Runtime avg (s) | Cypher command |
False | False | False | False | 13,99 | 14,70 | 14,13 | 14,28 | LOAD CSV FROM 'file:///locations.csv' AS csv RETURN count(*) |
False | False | False | True | 14,26 | 14,57 | 14,60 | 14,48 | LOAD CSV FROM 'file:///locations.csv' AS csv MATCH (x:X {x: csv[5]}) RETURN count(*) |
False | False | True | False | 16,59 | 16,92 | 17,06 | 16,86 | LOAD CSV FROM 'file:///locations.csv' AS csv WITH csv WHERE csv[5] IS NOT NULL RETURN count(*) |
False | False | True | True | 17,81 | 16,85 | 18,46 | 17,71 | LOAD CSV FROM 'file:///locations.csv' AS csv WITH csv WHERE csv[5] IS NOT NULL MATCH (x:X {x: csv[5]}) RETURN count(*) |
False | True | False | False | 22,22 | 20,53 | 20,74 | 21,16 | LOAD CSV WITH HEADERS FROM 'file:///locations.csv' AS csv RETURN count(*) |
False | True | False | True | 20,62 | 21,08 | 21,30 | 21,00 | LOAD CSV WITH HEADERS FROM 'file:///locations.csv' AS csv MATCH (x:X {x: csv.city}) RETURN count(*) |
False | True | True | False | 22,21 | 22,25 | 22,69 | 22,38 | LOAD CSV WITH HEADERS FROM 'file:///locations.csv' AS csv WITH csv WHERE csv.city IS NOT NULL RETURN count(*) |
False | True | True | True | 22,92 | 22,89 | 23,27 | 23,02 | LOAD CSV WITH HEADERS FROM 'file:///locations.csv' AS csv WITH csv WHERE csv.city IS NOT NULL MATCH (x:X {x: csv.city}) RETURN count(*) |
True | False | False | False | 13,85 | 14,17 | 14,74 | 14,25 | LOAD CSV FROM 'file:///locations.csv.gz' AS csv RETURN count(*) |
True | False | False | True | 14,06 | 14,51 | 14,87 | 14,48 | LOAD CSV FROM 'file:///locations.csv.gz' AS csv MATCH (x:X {x: csv[5]}) RETURN count(*) |
True | False | True | False | 16,62 | 16,66 | 16,97 | 16,75 | LOAD CSV FROM 'file:///locations.csv.gz' AS csv WITH csv WHERE csv[5] IS NOT NULL RETURN count(*) |
True | False | True | True | 17,08 | 16,90 | 17,41 | 17,13 | LOAD CSV FROM 'file:///locations.csv.gz' AS csv WITH csv WHERE csv[5] IS NOT NULL MATCH (x:X {x: csv[5]}) RETURN count(*) |
True | True | False | False | 20,31 | 20,07 | 20,76 | 20,38 | LOAD CSV WITH HEADERS FROM 'file:///locations.csv.gz' AS csv RETURN count(*) |
True | True | False | True | 20,78 | 20,40 | 21,52 | 20,90 | LOAD CSV WITH HEADERS FROM 'file:///locations.csv.gz' AS csv MATCH (x:X {x: csv.city}) RETURN count(*) |
True | True | True | False | 22,03 | 22,03 | 22,79 | 22,28 | LOAD CSV WITH HEADERS FROM 'file:///locations.csv.gz' AS csv WITH csv WHERE csv.city IS NOT NULL RETURN count(*) |
True | True | True | True | 22,55 | 22,65 | 23,34 | 22,85 | LOAD CSV WITH HEADERS FROM 'file:///locations.csv.gz' AS csv WITH csv WHERE csv.city IS NOT NULL MATCH (x:X {x: csv.city}) RETURN count(*) |
Conclusions?
- Gzip-compressed input does not hurt LOAD performance. Apparently gzip decompression is fast enough to disappear in the noise; gzipped input even seemed marginally faster in these runs.
- The WITH HEADERS option adds significant overhead (40-50%) to the LOAD operation.
- The WHERE clause adds a noticeable overhead, but much smaller than WITH HEADERS.
- A non-matching MATCH after LOAD CSV adds negligible overhead.
Executive summary:
- Gzip adds ~0% to the cost
- A non-matching MATCH adds ~1.4% to the cost
- A simple WHERE adds 18-20% to the cost
- The WITH HEADERS option adds 43-48% to the cost
Test conditions:
- default Neo4j configuration
- OS: Ubuntu 18.04
- Linux kernel 4.15
- Oracle Java 8
- Dell E6440 laptop
- Intel i5-4200M CPU
- 4 GB RAM: SODIMM DDR3 Synchronous 1600 MHz (0.6 ns)
- SSD: PLEXTOR PX-128M5
[23:51:25][filip@lap2:~]$ uname -a
Linux lap2 4.15.0-34-generic #37-Ubuntu SMP Mon Aug 27 15:21:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[23:51:38][filip@lap2:~]$ java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
[23:51:42][filip@lap2:~]$ LC_ALL=C lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz
Stepping: 3
CPU MHz: 1313.826
CPU max MHz: 3100.0000
CPU min MHz: 800.0000
BogoMIPS: 4988.50
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts flush_l1d
Saturday, September 22, 2018
PostgreSQL projection from INT4 x INT4 to INT8, and back again.
From time to time I need to cast an (integer, integer) pair to bigint, and back.
The rule is simple - the two 32-bit values are simply concatenated into one 64-bit value (shown here in hex):
0001000f
0001000f
================
0001000f0001000f
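The same rule can be spelled out in runnable SQL with bit-string (hex) literals - a small illustration of my own, not part of the original post:

-- x'...' is a PostgreSQL bit-string literal: 8 hex digits make a bit(32),
-- 16 hex digits make a bit(64); bit(32) casts to int4 and bit(64) to int8.
SELECT x'0001000f'::int            AS high_int4,
       x'0001000f'::int            AS low_int4,
       x'0001000f0001000f'::bigint AS combined_int8;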
SQL functions:
CREATE OR REPLACE FUNCTION
intpair2bigint(integer, integer)
RETURNS bigint
IMMUTABLE STRICT
LANGUAGE sql
AS '
SELECT (
$1::bit(32)::bit(64)
| ($2::bit(32)::bit(64)>>32)
)::int8
';
CREATE OR REPLACE FUNCTION
bigint2intpair(bigint)
RETURNS TABLE(high integer, low integer)
IMMUTABLE STRICT
LANGUAGE sql
AS '
SELECT
$1::bit(64)::bit(32)::int4,
$1::bit(32)::int4
';
Some tests:
SELECT intpair2bigint( 0, 0 );
0
SELECT intpair2bigint( 0, (2^31-1)::int4 );
2147483647
SELECT intpair2bigint( 0, (-2^31)::int4 );
-2147483648
SELECT intpair2bigint( (2^31-1)::int4, 0 );
140737488289792
SELECT intpair2bigint( (2^31-1)::int4, (2^31-1)::int4 );
140739635773439
SELECT intpair2bigint( (2^31-1)::int4, (-2^31)::int4 );
140735340806144
SELECT intpair2bigint( (-2^31)::int4, 0 );
-140737488355328
SELECT intpair2bigint( (-2^31)::int4, (2^31-1)::int4 );
-140735340871681
SELECT intpair2bigint( (-2^31)::int4, (-2^31)::int4 );
-140739635838976
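And since bigint2intpair() returns a table, it is called in the FROM clause - a usage sketch of my own, not one of the original tests:

-- Round-trip through both functions; the high/low column names
-- come from the RETURNS TABLE clause above.
SELECT high, low
  FROM bigint2intpair( intpair2bigint( 1, 2 ) );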