About Me

My name is Jingquan XIE [dʒǐ:n tʃuá:n ʃi:è], where XIE is the surname. I am a database enthusiast, knowledge engineer and low-level system developer. Most of the time I work with command-line interfaces (CLI) and enjoy it very much. I prefer the CLI because most (if not all) computable problems can be solved with just one method: character manipulation. In fact, this is the only task a computer can do; everything else is interpretation by different hardware components such as graphics cards, monitors and network adapters.

Currently I am working at Fraunhofer and living in Germany. At the moment my work focuses on the research and development of a next-generation collaborative pan-European crisis management system. In my spare time I like reading, travelling, photography and hacking software, in particular the Linux kernel and its subsystems such as graphics, the network stack and file systems.


Data Stream Management System

Stream data plays a central role in modern society. Applications like real-time air traffic monitoring and control, electronic stock exchanges, algorithmic trading and credit card fraud detection need to process large amounts of streaming data in real time. Transactional RDBMSs are, however, not designed for that purpose, though they can facilitate the implementation of such systems by providing a reliable storage layer. Research on modern data stream management systems (DSMS) started in the database community at the beginning of the 21st century: researchers noticed that databases were too slow for real-time data analysis, mainly because of the overhead of transactions and other internal management. They started with continuous queries over streams of incoming data, using sliding windows to update the query results incrementally. Since then, different DSMSs have been proposed focusing on high throughput, real-time capability and scalability. This line of research leads to today's Event Stream Processing (ESP), whose focus is the real-time processing of huge amounts of events. Complex Event Processing (CEP), by comparison, is a technology originally developed in the 1990s to analyse discrete event simulations. Later on, researchers noticed that the principles and techniques of CEP can be applied to other kinds of event-driven systems; the topic was quite popular around 2005 and 2006. Compared to ESP, CEP mainly focused on offline analysis based on simulation protocols or logs. CEP concentrates on extracting information from a cloud of events, for example in enterprise IT or business activity monitoring systems: it detects event patterns and derives information from the received events to support enterprise decision making. One of the differences between ESP and CEP is the event stream versus the event cloud: streams have an order while clouds do not (in reality, events in a stream are also not strictly ordered, because of network delays, etc.).
Global synchronisation with GPS is one solution. Moreover, ESP can be regarded as a subset of CEP with real-time requirements and less complex event patterns: only logical OR, logical AND, etc., without considering the causality between events. Bridging ESP, CEP and (in-memory) DBMSs with weak transactional support is one of the areas I am currently interested in. Compared to traditional application areas like banking and insurance that require strong ACID transactions, weakly transactional systems provide higher concurrency and better performance by sacrificing strict ACID properties. Esper is my first choice for developing event-driven applications with the standard Event Processing Language (EPL). I am planning to implement the MATCH_RECOGNIZE clause specified by the event processing community as EPL in PostgreSQL and Redis. Dealing with large rule bases (Event-Condition-Action rules and production rules), uncertainty and out-of-order events is a challenging task in both ESP and CEP. I do not like hyping NoSQL databases and Big Data frameworks.
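The sliding-window idea behind continuous queries can be sketched in a few lines of plain JavaScript. This is a toy illustration only, not Esper's actual API; the count-based window, the event shape and the average-price aggregate are my own assumptions:

```javascript
// Toy continuous query: keep a sliding window of the last N events
// and recompute an aggregate (here: the average price) on every arrival.
class SlidingWindowAvg {
  constructor(size) {
    this.size = size;   // maximum number of events in the window
    this.events = [];   // the window itself, oldest event first
  }

  // Called for every incoming event; returns the updated aggregate.
  push(event) {
    this.events.push(event);
    if (this.events.length > this.size) {
      this.events.shift(); // expire the oldest event
    }
    const sum = this.events.reduce((acc, e) => acc + e.price, 0);
    return sum / this.events.length;
  }
}

// Usage: a stream of ticks flows through a window of size 3.
const win = new SlidingWindowAvg(3);
[10, 20, 30, 40].forEach((price) => {
  console.log(win.push({ price }));
});
// The final result averages only the three most recent ticks: (20+30+40)/3 = 30
```

A real DSMS adds time-based windows, incremental aggregate maintenance and query optimisation on top of this basic pattern.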

Spatial-Temporal Databases

I'm interested in the research and application of Spatial-Temporal Databases. I have been working with PostgreSQL/PostGIS and Oracle for several years. Both are great Relational Database Management Systems with comprehensive spatial-temporal support. For data-centric applications, to ensure portability, I prefer standard SQL with a minimum of DBMS-specific syntax. When available, it is always my first choice to use in-database features to simplify data management on the application level. Modern DBMSs provide plenty of advanced features to achieve this, like Recursive Common Table Expressions (CTE), composite indexes, inverted indexes, User-Defined Functions (UDFs), Foreign Data Wrappers (PostgreSQL), Materialised Views (MV), view updating, Full Text Search (FTS), etc. Abstract data types like ranges, trees, JSON or JSONB, spatial types, etc. can also be used to optimise data access and achieve the highest system performance. The query plan is one of the most important tools for optimising SQL queries and database schema design: via EXPLAIN, details of query execution are provided, not only for DML but also for DDL. Besides, both psql and sqlplus are very powerful client-side tools (if you like the CLI) for interacting with the database backend. In general, RDBMSs do not scale very well horizontally; different DB vendors provide different replication technologies with shared-memory (NUMA) or shared-nothing architectures. PostgreSQL provides the Write-Ahead Log (WAL) for Point-In-Time Recovery (PITR). Modern DBMSs have sophisticated built-in bi-temporal support: transaction time and valid time. The Total Recall or Flashback feature in Oracle and Temporal Validity are typical examples for that purpose. For systems requiring extremely low latency and high throughput, in-memory solutions like Redis without query optimisation would be a better choice.
On the other side, most modern RDBMSs also provide prepared statements that are parsed and analysed beforehand, so the overhead of query planning can be minimised at runtime based on RDBMS-specific configuration. Furthermore, most DB vendors provide intelligent operations inside the DBMS, like calculating string similarities based on the Levenshtein distance (edit distance).
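As a sketch of what such a similarity operation computes (PostgreSQL, for instance, exposes it through the levenshtein() function of the fuzzystrmatch extension), here is the classic dynamic-programming edit distance in plain JavaScript:

```javascript
// Levenshtein (edit) distance: the minimum number of single-character
// insertions, deletions and substitutions that turn string a into string b.
function levenshtein(a, b) {
  // dp[j] holds the distance between the current prefix of a and b.slice(0, j).
  const dp = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0]; // value of dp[i-1][j-1] from the previous row
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];
      dp[j] = Math.min(
        dp[j] + 1,                             // deletion
        dp[j - 1] + 1,                         // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution (or match)
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

console.log(levenshtein('kitten', 'sitting')); // → 3
```

Doing this inside the database avoids shipping whole tables to the application just to rank candidate strings.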

Knowledge Engineering

Besides databases, I am also working in Knowledge Engineering, in particular the construction of (non-tacit) knowledge bases with Bayesian Networks (BN), Description Logics (DL) and the Web Ontology Language (OWL). Formalising the knowledge of certain domains plays a central role in many information systems that provide intelligent capabilities. Knowledge is, however, intrinsically vague and ambiguous. Combining BN and DL provides a powerful formalism to model and reason over such knowledge. For tooling I use Prolog, Protégé Frames and Protégé OWL extensively. Protégé provides great APIs to integrate and manipulate the knowledge base in customised software applications. The Resource Description Framework (RDF) is quite popular for supporting the envisioned Web of linked data, the Semantic Web. I have tried to marry it with relational DBMSs, where the anti-pattern Entity-Attribute-Value (EAV) is used. This anti-pattern is not totally useless, but use it with caution. Structured knowledge bases like DBpedia, Freebase and YAGO are good candidates if access to the knowledge in Wikipedia is needed; they are also some of the most prominent works in the area of Linked Data. Serialisation techniques like RDF/XML, Turtle and N3 are used to materialise these data with well-accepted common vocabularies like RDFS, FOAF, etc. Different techniques have been developed to semantically annotate the knowledge stored in web pages; among them, RDFa, Microformats and JSON-LD are prominent proposals. Using the ontology defined by schema.org is recommended and supported by annotation consumers like Google's search engine. Currently I am working on the application of DL to formally support Crisis Management (CM) and Critical Infrastructure Protection (CIP) based on modelling and simulation technology. The focus of this research is the (inter)dependency analysis of Critical Infrastructures (CI), risk analysis, and training and decision making in crisis management.
In my spare time I also read about Cellular Automata, which are another way of organising knowledge. The Game of Life from John Conway and Rechnender Raum from Konrad Zuse are mind-opening ideas in this area.
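Conway's rules are compact enough to state directly in code. A minimal step function over a fixed boolean grid might look like this (treating cells outside the grid as dead is a simplifying assumption; the classic formulation uses an infinite plane):

```javascript
// One generation of Conway's Game of Life on a 2-D boolean grid.
// A live cell survives with 2 or 3 live neighbours; a dead cell
// becomes alive with exactly 3. Cells outside the grid count as dead.
function step(grid) {
  const rows = grid.length, cols = grid[0].length;
  return grid.map((row, r) =>
    row.map((alive, c) => {
      let n = 0; // count live neighbours in the 8 surrounding cells
      for (let dr = -1; dr <= 1; dr++) {
        for (let dc = -1; dc <= 1; dc++) {
          if (dr === 0 && dc === 0) continue;
          const rr = r + dr, cc = c + dc;
          if (rr >= 0 && rr < rows && cc >= 0 && cc < cols && grid[rr][cc]) n++;
        }
      }
      return alive ? n === 2 || n === 3 : n === 3;
    })
  );
}

// A "blinker" oscillates between a horizontal and a vertical bar.
const blinker = [
  [false, false, false],
  [true,  true,  true ],
  [false, false, false],
];
```

Applying step twice to the blinker returns the original grid, the simplest example of a period-2 oscillator.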

Automatic Program Generation

Combining knowledge bases and formal specifications to automatically generate computer programs is a challenging task. It touches one of the core questions in Artificial Intelligence (AI): whether a computer can write programs by itself. Some preliminary work has been conducted, and formal specification systems like B and Z, TLA and TLA+, etc. have been developed. They are mainly used in model checking, i.e. automatic program verification, to find potential bugs and vulnerabilities. Automatic program generation, which is a step further, is still far from practical use.

Technical Profiles

My favourite text editor is Vim. It is highly configurable, and I use it for virtually everything: developing, blogging, e-mail writing, etc. I enjoy writing programs in C and server-side JavaScript (depending on the requirements of the software). It is enjoyable to write Text User Interfaces (TUI) with the new curses, ncurses, if a Graphical User Interface (GUI) is not a hard requirement. Using the Autoconf and Automake tools to detect the target build environment and generate the needed Makefiles is also quite convenient. Both LEX and YACC are solid tools for creating your own language with a context-free grammar, producing a C-based parser for your Domain Specific Language (DSL). Similar tools exist for other languages, like ANTLR for Java (which I used for my Master's thesis) and PEG.js for JavaScript. Combining Vim with completion engines like YouCompleteMe and TernJS provides a comfortable environment for writing C and JS programs. For runtime tracing and debugging I prefer DTrace. For data-intensive tasks, however, I prefer SQL, which provides extremely high productivity and efficiency. I was a fan of Java and have been working with Java SE and Java EE for over six years; it is a great high-level imperative language. Regarding IDEs, I started with Eclipse (SWT) and after three years switched to NetBeans (Swing). In the Java ecosystem there exists a huge number of frameworks for both client side and server side, like AWT, Swing, JavaFX, EJB, JPA, JSF, CDI, etc., which help developers build comprehensive applications with less effort. Java is now, however, too heavy for me to develop agile and highly scalable applications. I use NodeJS (libev/libuv)-based server-side JavaScript/TypeScript and native C for backends, and AngularJS (Bower, Browserify, Bootstrap) for client-side HTML5 applications. Socket.IO, which uses WebSocket, long polling, etc. as transports, is my favourite for real-time Web-based applications. Isomorphic design is one of the challenges I am currently facing.
Shell and Unix utilities like grep, sed, awk, etc. are my first choice for automating repetitive tasks. Normally bash is my daily shell, with customised auto-completion support. I use make quite a lot for dependency management and build automation, not restricted to C development. For NodeJS-based development, I use Grunt and Gulp extensively for build automation.

For text file revision control I currently prefer Git over SVN. To ensure software quality and make life easier, I use Test-Driven Development (TDD) and Behaviour-Driven Development (BDD) extensively for both unit and integration testing (JUnit, JasmineJS, Mocha, Chai, Karma, Selenium and PhantomJS). Agile methods like Scrum and Extreme Programming (XP) sound very interesting; for practical reasons, however, I have not yet had the chance to really apply them in our project development process. To depict software design in a unified and consistent way, various UML diagrams are used: use case, deployment, component, class, activity, state machine, sequence, etc. Design methods like Convention over Configuration (CoC) and Dependency Injection (DI) are used extensively when I design a software system, to ensure flexibility, extensibility and maintainability. My favourite e-mail client (Mail User Agent, MUA) is Mutt. For sending e-mails I use the classic sendmail program as the Mail Transfer Agent (MTA). To write technical documents like user manuals of software systems, AsciiDoc syntax is my first choice; it is a replacement for DocBook, which is a great XML format but difficult to write by hand. For serious writing like scientific publications, I use TeX and LaTeX with DVI or PostScript as the output (well, most people like PDF). I use dvifb for previewing the generated DVI file on the Linux virtual console. For scientific plotting, Gnuplot is my favourite.

Sometimes I use Cordova/PhoneGap and AngularJS to develop small HTML5-based mobile apps. Recently Ionic has been emerging, which uses Cordova and AngularJS underneath. D3.js is a good choice for browser-based visualisation. I use RESTful web services extensively for their simplicity and resource-oriented, entity-centric system design methodology. The four major HTTP request types are in most cases sufficient to manage different kinds of resources. To prepare the documentation of a RESTful API, I prefer the extended Markdown syntax provided by API Blueprint. The output can then be further processed by renderers like aglio, which can produce single HTML documents with Bootstrap styles. Recently I tried React Native. It is a great idea to maintain one code base for multiple platforms without losing the benefits provided by the native operating system.
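The mapping of the four major verbs onto CRUD operations over a resource can be sketched in plain JavaScript. This is a toy in-memory dispatcher of my own invention, not any framework's API; in a real service the same table would sit behind Node's http module or a framework such as Express:

```javascript
// Toy entity-centric resource handler: GET/POST/PUT/DELETE mapped onto
// read/create/replace/remove over an in-memory store keyed by id.
const store = new Map();
let nextId = 1;

const resource = {
  GET:    (id)       => (id ? store.get(id) : [...store.values()]),
  POST:   (_, body)  => { const id = nextId++; store.set(id, { ...body, id }); return store.get(id); },
  PUT:    (id, body) => { store.set(id, { ...body, id }); return store.get(id); },
  DELETE: (id)       => store.delete(id),
};

// Dispatch one request: method, optional id, optional body.
function handle(method, id, body) {
  return resource[method](id, body);
}
```

For example, handle('POST', null, { name: 'map' }) creates an entity, handle('GET', 1) reads it back, and handle('GET') lists the whole collection.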

For spatial data I usually work with OpenStreetMap by importing the complete dataset into a PostGIS database. It contains a huge amount of information that is usually hidden and can be useful in various application scenarios. Both Mapnik and CartoCSS are my favourite tools for rendering static maps. Shapefiles are a great choice for exchanging spatial data as flat files. I use MapServer (via the FastCGI interface provided by Nginx) to build WMS and WFS services. It also has excellent support for serving temporal data (both vector and raster) with WMS-T (and the TileIndex feature for serving temporal raster datasets). To serve a small number of static tiles, my solution is MBTiles (a SpatiaLite database) and TileStream, both from Mapbox. For large-scale dynamic tiles, I prefer the standard setup for OpenStreetMap, i.e. Apache, mod_tile, Mapnik and PostGIS. To build scalable web applications I use Nginx with reverse proxying, load balancing and HTTPS termination enabled. It provides great performance and an easy-to-understand, semi-declarative configuration syntax. For applications with extreme caching requirements I prefer Varnish, which provides great caching functionality for HTTP GET requests.
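A minimal sketch of such an Nginx front end, combining the three features mentioned above; the upstream addresses, server name and certificate paths are placeholders:

```nginx
# Reverse proxy with load balancing and HTTPS termination.
# Backend addresses and certificate paths below are placeholders.
upstream app_backend {
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}

server {
    listen 443 ssl;
    server_name example.org;

    ssl_certificate     /etc/nginx/ssl/example.org.crt;
    ssl_certificate_key /etc/nginx/ssl/example.org.key;

    location / {
        # TLS ends here; the backends speak plain HTTP.
        proxy_pass http://app_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Requests are distributed round-robin over the two backends by default; adding more capacity is a one-line change in the upstream block.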

My PGP key | My public key | My Blog

Created on 2005-09-27 and last modified on 2016-07-18

This website is built with MkDocs and all textual contents are written in Markdown syntax with Vim.