My name is Jingquan XIE [dʒǐ:n tʃuá:n ʃi:è] and XIE is the surname. I am a database enthusiast, knowledge engineer and low-level system developer. Most of the time I work with command line interfaces (CLI) and enjoy it very much. I prefer the CLI since most (if not all) computable problems can be solved with just one method - character manipulation. In fact, this is the only task a computer can do; all the rest is just interpretation by different hardware components like graphics cards, monitors, network adapters, etc.
Currently I am working at Fraunhofer and living in Germany. At the moment, my work focuses on the research and development of a next-generation collaborative pan-European crisis management system. In my spare time, I like reading, travelling, photography and hacking software, in particular the Linux kernel and its subsystems like graphics, network stacks and file systems.
Data Stream Management System
Stream data plays a central role in modern society. Applications like
real-time air traffic monitoring and control,
electronic stock exchange, algorithmic trading,
credit card fraud detection, etc. need to process a large amount of streaming
data in real-time.
Transactional RDBMSs are, however, not designed for that purpose, though they can be
used to facilitate the implementation of these systems by providing a reliable persistence layer.
Research on modern data stream management systems (DSMS) started at
the beginning of the 21st century in the database community:
researchers noticed that databases were too slow for real-time data analysis, mainly
because of transactions and other internal management overheads.
They started with continuous queries over streams of incoming data, using
sliding windows to incrementally update the query results.
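As a rough illustration (the ticks relation and its symbol, ts and price columns are made up here), a 30-second sliding average can be expressed with a standard SQL window frame; a DSMS evaluates the equivalent query continuously as new events arrive instead of re-scanning a stored table:

```sql
-- Hypothetical ticks(symbol, ts, price) relation: 30-second sliding average.
SELECT symbol,
       ts,
       AVG(price) OVER (
           PARTITION BY symbol
           ORDER BY ts
           RANGE BETWEEN INTERVAL '30' SECOND PRECEDING AND CURRENT ROW
       ) AS avg_price_30s
FROM ticks;
```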
Since then, different DSMS have been proposed, focusing on
high throughput, real-time capability, and scalability.
This research has led to what is today called Event Stream Processing (ESP). The focus of ESP is
the real-time processing of huge amounts of events.
Compared to ESP,
Complex Event Processing (CEP) is a technology originally developed in the 1990s to analyse discrete
event simulations. Later on, researchers noticed that the principles and
techniques of CEP can be applied to other kinds of event-driven systems.
The topic was particularly popular around 2005 and 2006.
Compared to ESP, CEP originally focused on offline analysis based on simulation
protocols or logs.
CEP is more focused on extracting information from a cloud of events, as found in
enterprise IT or business activity monitoring systems. It derives higher-level
information from the received events to support enterprise decision making.
One of the differences between ESP and CEP is the distinction between event streams and event clouds: streams
are ordered while clouds are not. (In reality, events in a stream are also not
strictly ordered because of network delays, etc.; global clock synchronisation, e.g. with
GPS, is one solution.)
Moreover, ESP can be regarded as a subset of CEP with real-time requirements
and less complex event patterns: only logical OR, logical AND, etc., without
considering the causality between events.
Bridging ESP, CEP and (in-memory) DBMSs with weak transactional support is one
of the areas I am currently interested in. Compared to traditional
application areas like
banking, insurance, etc. that require strong ACID transactions, weakly
transactional systems provide higher concurrency and better performance by
sacrificing strict ACID properties.
Esper is my first choice for developing event-driven applications with its standard
Event Processing Language (EPL).
I am planning to implement the
MATCH_RECOGNIZE clause specified by
the event processing community.
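As a sketch of what row pattern matching looks like, the classic V-shape query detects a price dip followed by a recovery; Oracle 12c supports this syntax and Esper's EPL offers a very similar match_recognize construct. The Ticker relation and its symbol, tstamp and price columns are hypothetical here:

```sql
-- Ticker(symbol, tstamp, price) is a hypothetical stream/relation.
SELECT *
FROM Ticker
MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY tstamp
    MEASURES STRT.tstamp       AS start_ts,
             LAST(DOWN.tstamp) AS bottom_ts,
             LAST(UP.tstamp)   AS end_ts
    ONE ROW PER MATCH
    AFTER MATCH SKIP TO LAST UP
    PATTERN (STRT DOWN+ UP+)   -- any row, then falling, then rising prices
    DEFINE
        DOWN AS DOWN.price < PREV(DOWN.price),
        UP   AS UP.price   > PREV(UP.price)
);
```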
Dealing with large rule bases (Event-Condition-Action rules and production
rules), uncertainty and out-of-order events is a challenging task in both ESP and CEP.
I do not like the hype around
NoSQL databases and
Big Data frameworks.
I'm interested in the research and application of spatial-temporal databases. I
have been working with
PostgreSQL and Oracle for several years. Both
are great Relational Database Management Systems with comprehensive spatial-
temporal support. For data-centric applications, to ensure portability, I
prefer standard SQL with minimal DBMS-specific syntax. If available, it
is always my first choice to use in-database features to simplify data
management at the application level. Modern DBMSs provide plenty of
advanced features to achieve this, like
Recursive Common Table Expressions (CTEs),
User-Defined Functions (UDFs),
Foreign Data Wrapper (PostgreSQL),
Materialised Views (MV),
Full Text Search (FTS), etc.
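For example, a recursive CTE can traverse hierarchical data directly in the database; the employees table and its columns below are made up for illustration:

```sql
-- Walk the reporting chain upwards from one employee (hypothetical schema).
WITH RECURSIVE chain AS (
    SELECT id, name, manager_id
    FROM employees
    WHERE id = 42
    UNION ALL
    SELECT e.id, e.name, e.manager_id
    FROM employees e
    JOIN chain c ON e.id = c.manager_id
)
SELECT * FROM chain;
```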
Abstract data types can also be used to optimise data access and achieve the highest system
performance.
The query plan is one of the most important tools for optimising SQL queries and
database schema design. Via
EXPLAIN, the details of query execution are
provided. This applies not only to DML but also to DDL.
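A minimal example in PostgreSQL syntax (the orders table is hypothetical): EXPLAIN ANALYZE executes the statement and reports the chosen plan together with estimated and actual row counts and timings.

```sql
-- Hypothetical orders(customer_id, order_date, total) table.
EXPLAIN ANALYZE
SELECT customer_id, SUM(total) AS revenue
FROM orders
WHERE order_date >= DATE '2016-01-01'
GROUP BY customer_id;
```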
psql and sqlplus are very powerful
client-side tools (if you like the CLI) to interact with the database backend.
In general, RDBMSs do not scale very well horizontally. Different DB vendors
provide different replication technologies based on
shared-memory (NUMA) or shared-nothing architectures.
PostgreSQL provides the Write-Ahead Log (WAL) for Point-In-Time Recovery (PITR).
Modern DBMSs have sophisticated built-in
bi-temporal support: transaction time
and valid time. The
Flashback feature and
Temporal Validity support provided by Oracle are typical examples for that purpose.
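A small sketch of a transaction-time query using Oracle Flashback Query (the orders table and its columns are hypothetical):

```sql
-- Read the (hypothetical) orders table as it was one hour ago.
SELECT *
FROM orders AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR)
WHERE order_id = 4711;
```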
For systems requiring extremely low latency and high throughput, in-memory
data stores like Redis, which have no query optimisation overhead, would be a better choice.
On the other hand, most modern RDBMSs also provide
prepared queries that are
parsed and analysed beforehand, so the overhead of query planning can also be
minimised at runtime based on RDBMS-specific configurations.
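In PostgreSQL, for instance, this looks as follows (the statement name and the orders table are made up):

```sql
-- Parsed and planned once, executed many times with different parameters.
PREPARE recent_orders (INT) AS
    SELECT *
    FROM orders
    WHERE customer_id = $1
    ORDER BY order_date DESC
    LIMIT 10;

EXECUTE recent_orders(42);
```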
Furthermore, most DB vendors provide intelligent operations inside the DBMS,
like calculating string similarities based on the
Levenshtein distance or similar algorithms.
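In PostgreSQL the fuzzystrmatch extension ships such a function; a small example against a hypothetical customers table:

```sql
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Find names that are at most three edits away from the search term
-- (customers is a hypothetical table).
SELECT name
FROM customers
WHERE levenshtein(lower(name), 'jingquan xie') <= 3;
```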
Besides databases I am also working in Knowledge Engineering, in
particular the construction of (non-tacit) knowledge bases with
Description Logics (DL) and the
Web Ontology Language (OWL).
Knowledge of certain domains plays a central role in many information systems
to provide intelligent capabilities. Knowledge is, however, intrinsically vague
and ambiguous.
DL provides a powerful formalism to model and
reason over such knowledge. For tooling I use extensively
Protégé Frames and
Protégé OWL. Protégé provides great APIs to integrate and manipulate the knowledge
base in customised software applications.
The Resource Description Framework (RDF) is quite popular for supporting the
envisioned Web of linked data, or Semantic Web. I have tried to marry it
with the relational DBMS
using the anti-pattern
Entity-Attribute-Value (EAV). This
anti-pattern is not totally useless, but it should be used with caution.
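A minimal sketch of such a triple table and the kind of self-join-heavy query it leads to (the table name and vocabulary terms are chosen for illustration only):

```sql
-- Hypothetical triple table: every RDF statement becomes one row.
CREATE TABLE triples (
    subject   TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object    TEXT NOT NULL,
    PRIMARY KEY (subject, predicate, object)
);

-- Names of all persons: already a self-join for a trivial question,
-- which is why EAV is usually considered an anti-pattern.
SELECT name.object AS name
FROM triples AS typ
JOIN triples AS name
  ON name.subject = typ.subject
 AND name.predicate = 'foaf:name'
WHERE typ.predicate = 'rdf:type'
  AND typ.object = 'foaf:Person';
```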
Structured knowledge bases like
YAGO, etc. are
good candidates if accessing knowledge from
Wikipedia is needed.
They are also some of the most prominent works in the area of Linked Data.
RDF serialisation techniques
are used to materialise these data with well-accepted common vocabularies.
Several techniques have been developed to semantically annotate the knowledge stored
in web pages. Among them,
JSON-LD is a prominent
proposal. Using the ontology defined by
schema.org is recommended and
supported by annotation consumers like Google's search engine.
Currently I am working on the application of
DL to formally support Crisis
Management (CM) and Critical Infrastructure Protection (CIP) based on
modelling and simulation technology.
The focus of this research is the (inter)dependency analysis of Critical
Infrastructures (CI), risk analysis, and training and decision making in crisis situations.
In my spare time, I also read materials about
Cellular Automata, which are
another way of organising knowledge.
The Game of Life from John Conway and
Rechnender Raum from Konrad Zuse are
mind-opening ideas in this area.
Automatic Program Generation
Combining knowledge bases and formal specifications to automatically generate
computer programs is a challenging task.
It is one of the core questions in Artificial Intelligence (AI) - whether
a computer can write programs by itself.
Some preliminary work has been conducted and formal specification systems like
TLA+, etc. have also been developed.
They are mainly used in model checking, i.e. automatic program
verification - to find potential bugs and vulnerabilities.
Automatic program generation, which is a step further, is still far from being practical.
My favourite text editor is
Vim. It is highly configurable. I use it for
virtually everything: developing, blogging, E-Mail writing, etc. I enjoy
writing programs in
C and server-side
JavaScript (depending on the requirements of the software).
It is enjoyable to write a Text User Interface (TUI) with
ncurses - if a Graphical User Interface (GUI) is not a hard requirement.
Using Automake to detect the target build environment and
generate the needed Makefiles is also quite convenient.
Lex and YACC are solid tools for creating your own language with a
context-free grammar, producing C-based parsers for your Domain Specific
Languages (DSL). Similar tools exist for other languages, like ANTLR for Java.
Combining Vim with a completion engine
provides a comfortable environment for writing C and JS programs.
For runtime tracing and debugging I prefer dedicated command-line tools.
For data-intensive tasks, however, I prefer tools
that provide extremely high productivity and efficiency.
I was a fan of Java
and have been working with
JavaEE for over six years. It is a great
high-level imperative language.
Regarding IDEs, I started with Eclipse (SWT) and after three years switched to
NetBeans (Swing). In the Java ecosystem, for both the client side and the
server side, there exists a huge number of frameworks, like
CDI, etc., to help developers build comprehensive
applications with less effort.
Java is, however, now too heavy for me to develop agile and highly scalable
web applications. These days I prefer NodeJS for backends and
HTML5 (with Bootstrap) for client-side development.
Socket.IO, which uses
WebSocket, long polling, etc. as transports, is my
favourite for real-time Web-based applications.
Isomorphic design is one of the challenges I am currently facing.
Shell and Unix utilities like
awk, etc. are my first choice for automating repetitive tasks.
bash is my daily shell with customised auto-completion support.
I use make quite a lot for dependency management and build automation, not
restricted to C development. For NodeJS-based development, I use extensively
Gulp for build automation.
For text file revision control currently I prefer
To ensure software quality and make life easier I use extensively
Test-Driven Development (TDD) and
Behaviour-Driven Development (BDD) for
both unit and integration testing.
Agile methods like Scrum and Extreme Programming (XP) sound very interesting;
for practical reasons, however, I have not yet had the chance to really apply them
in our project development process.
To depict software design in a unified and consistent way, various UML
diagrams are used: use case, deployment,
component, class, activity, state machine, sequence, etc.
Design methods like Convention over Configuration (CoC) and
Dependency Injection (DI) are used extensively when I design a software
system to ensure flexibility, extensibility, and maintainability.
My favourite E-Mail client (Mail User Agent - MUA) is
Mutt. For sending
E-Mails I use the classic
sendmail program as the Mail Transfer Agent (MTA).
To write technical documents like user manuals of software systems, a
lightweight markup syntax is my first choice. It is a replacement for
DocBook, which is a great
XML format but difficult to write.
For serious writing like scientific publications, I use
LaTeX with DVI or PostScript as the
output (well, most people prefer PDF). I use
dvifb for previewing the generated
DVI file on the Linux virtual console. For scientific plotting,
Gnuplot is my first choice.
Sometimes I use
AngularJS to develop small
HTML5-based mobile apps. Recently
Ionic has been emerging, which uses AngularJS under the hood.
D3.js is a good choice for browser-based visualisation.
I use RESTful web services extensively for their simplicity and resource-oriented,
entity-centric system design methodology. The four major HTTP request types
(GET, POST, PUT, DELETE) are in most cases sufficient to manage different kinds of resources.
To document a RESTful API, I prefer the extended Markdown syntax of
API Blueprint. The output can then be further processed by
aglio, which can render it into a single HTML document.
Recently I tried
React Native. It is a great idea to maintain one code base
for multiple platforms without losing the benefits provided by native UI components.
For spatial data I usually work with
OpenStreetMap by importing the complete
dataset into a PostGIS database.
It has a huge amount of information that is usually
hidden and can be useful for various application scenarios.
CartoCSS is among my favourite tools for rendering static maps.
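Once the data is imported into PostGIS, this usually hidden information can be queried directly with spatial SQL. A small sketch, assuming the default osm2pgsql schema (a planet_osm_point table with a way geometry column in EPSG:3857); the coordinates and tag values are just examples:

```sql
-- Count drinking-water amenities within roughly 1 km of a point near Bonn
-- (assumes the default osm2pgsql schema and Web Mercator geometries).
SELECT COUNT(*)
FROM planet_osm_point
WHERE amenity = 'drinking_water'
  AND ST_DWithin(
        way,
        ST_Transform(ST_SetSRID(ST_MakePoint(7.10, 50.73), 4326), 3857),
        1000);
```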
For exchanging spatial data with flat files, there are also great format choices. I use the
FastCGI interface provided by my map server of choice to publish
WFS services. It also has excellent support for
serving temporal data (both vector and raster), including a dedicated
feature for serving temporal raster datasets.
To serve a small amount of static
tiles, my solution is
MBTiles (an SQLite database) together with the related tooling
from MapBox. For large-scale dynamic tiles, I prefer the standard setup
for OpenStreetMap.
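Since MBTiles is just SQLite with a well-known schema, a single tile can be fetched with a plain query (the coordinates below are example values; note that rows follow the TMS y-axis convention, so they may need flipping for XYZ tile URLs):

```sql
-- tiles(zoom_level, tile_column, tile_row, tile_data) per the MBTiles spec.
SELECT tile_data
FROM tiles
WHERE zoom_level  = 14
  AND tile_column = 8570
  AND tile_row    = 10905;
```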
To build scalable web applications I use
Nginx with reverse proxying, load
balancing and HTTPS termination enabled. It provides great performance and an
easy-to-understand, semi-declarative configuration syntax. For applications
with extreme caching requirements I prefer to use
Varnish, which provides
great caching functionality for HTTP GET requests.
Created on 2005-09-27 and last modified on 2016-07-18
This website is built with MkDocs and all textual contents are written in Markdown syntax with Vim.