XSM : eXtended Sort/Merge utility
by Henri Henault
 


A Fast Multi-platform Sort/Merge Program for Huge Files
Time is Money: Speed up your data processing!

XSM can sort 1 Gigabyte in 40 seconds on a machine costing less than $500!
Who does better? See benchmarks

Your challenges:

  • Sort / Merge / Split / Filter / Deduplicate 100 gigabyte files on affordable machines
  • Increase your production capacity by cutting processing time from 20 hours down to 4 hours
  • Speed up your data warehouse ETL/EDI data exchanges
  • Reduce your software costs by moving away from a much more expensive competing product

The solution:

  • XSM, already chosen by 57 clients in 11 countries

    XSM delivers outstanding performance through its proprietary multi-threading technology, which makes the most of today's multi-processor architectures, both on high-end IBM, SUN and HP machines and on affordable PCs with multi-core processors.

    XSM has powerful features covering the classic Sort / Merge / Split / Filter operations needed for data warehousing and data mining.

    With the evolution of information technology (storage capacity, CPU power), data volumes have exploded over the last 10 years. We cannot rely on CPU power alone: software performance is essential!



But why use an external sort?

  1. External sort to speed up database loading
  2. Merge / Split / Filter / Selective copy


1. External sorting for database loading

Suppose you have to load heavy data files every night into your favorite database: Oracle, DB2, MySQL, SQL Server, Informix, Sybase, ...

In this example, we use MySQL, which is pretty fast at data loading.

Suppose you have a heavily indexed table. Each night you DELETE the table's content and then reload new data into the empty table:

  • if your input data is not sorted, your database server has to do the sorting work itself
  • if your input data is pre-sorted, your database server only has to load it, with no extra work to build indexes.
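
To make the idea of pre-sorting outside the database concrete, here is a minimal sketch of a generic external merge sort in Python. It is not XSM's proprietary algorithm, only the textbook technique: sort chunks that fit in memory, write each one as a temporary run, then merge the runs. The function names, the `max_lines` parameter and the choice of the first tab-separated integer column as the sort key are all assumptions made for illustration.

```python
import heapq
import itertools
import os
import tempfile


def sort_key(line):
    # Assumption: the indexed column is the first tab-separated field, an integer.
    return int(line.split("\t", 1)[0])


def external_sort(src, dst, max_lines=100_000):
    """Sort the records of src into dst, holding at most max_lines in memory."""
    run_paths = []
    try:
        # Phase 1: cut the input into sorted runs stored in temporary files.
        with open(src) as infile:
            while True:
                chunk = list(itertools.islice(infile, max_lines))
                if not chunk:
                    break
                # Make sure the very last record ends with a newline.
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"
                chunk.sort(key=sort_key)
                fd, path = tempfile.mkstemp(suffix=".run")
                with os.fdopen(fd, "w") as run:
                    run.writelines(chunk)
                run_paths.append(path)

        # Phase 2: stream-merge all sorted runs into the output file.
        runs = [open(p) for p in run_paths]
        try:
            with open(dst, "w") as out:
                out.writelines(heapq.merge(*runs, key=sort_key))
        finally:
            for r in runs:
                r.close()
    finally:
        # Remove the temporary runs whether or not the sort succeeded.
        for p in run_paths:
            os.remove(p)
```

The sorted output file can then be handed to the database's bulk loader. A dedicated tool would add multi-threading and tuned I/O on top of this basic scheme.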

Now, let's have a look at our benchmark:

  • Input is an ASCII text file, 100 MegaBytes, 1023009 records
  • Records are variable length text, tab separated, 5 columns : 2 integers, 3 strings
  • SQL engine is MySQL Server 4.0.10-gamma running Linux 2.4.18 on Pentium II/550 MHz, 512 MB RAM (the phenomenon is the same whatever the RDBMS)
  • The chart shows the total elapsed time of the process in seconds:

DB loading unsorted:

  • data loading : 10200 secs
  • total : 10200 secs = 2 hours 50 minutes.

DB loading pre-sorted:

  • pre-sorting : 315 secs. (using XSM V5.08)
  • data loading : 213 secs
  • total : 528 secs. = 8 minutes 48 secs. About 19 times faster!
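
The arithmetic behind these totals can be checked with a few lines of Python. This is only a worked calculation on the figures quoted above; the helper name `hms` and the variable names are made up for the example.

```python
def hms(seconds):
    """Split a duration in seconds into (hours, minutes, seconds)."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, secs


unsorted_total = 10200        # data loading only, unsorted input
presorted_total = 315 + 213   # pre-sorting + data loading, pre-sorted input

# Ratio of the two elapsed times.
speedup = unsorted_total / presorted_total

print("unsorted  : %d s = %dh %dm %ds" % ((unsorted_total,) + hms(unsorted_total)))
print("pre-sorted: %d s = %dh %dm %ds" % ((presorted_total,) + hms(presorted_total)))
print("speed-up  : %.1fx" % speedup)
```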

Now you can clearly see the point of external sorting: pre-sorting is necessary to speed up the processing of huge data volumes.

Don't let your integrated "I can do everything!" database engine do it: that is not its job!

  • Read some interesting articles here and there on the web about pre-sorting ...
  • XSM does the job, but 5 times faster than its challengers!
  • Try your standard sort ... just to get an idea of its performance ...
  • and if you're tired of waiting, try XSM fast sort


2. Merge / Split / Filter / Selective copy

You need to merge / split / filter / copy data according to given criteria.

Let's take a trivial example: every day you receive a Sales Report made up of 50 files, and you wish to split the data by Zip code, creating one distinct file per Zip code.

Two solutions :

  1. Using your RDBMS: most folks would go for this option, but it's not the right one!
    • Drop / Create table : 30 seconds
    • Load 50 files into table : 1 hour
    • Run a deduplicate SQL job : 1 hour
    • Run a hundred UNLOAD jobs, one per Zip code : 2 hours

    Total time (estimated) : 4 hours

  2. Using XSM as a batch external sort/merge : The right option!
    • In a single operation, XSM merges, sorts, deduplicates and selectively splits the data

    Figure: Selective split

    Total time (estimated) : 5 minutes
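
To illustrate what "merge / sort / deduplicate / selective split" means in practice, here is a small Python sketch of the same pipeline. It does not use XSM and says nothing about XSM's own syntax or speed. It assumes tab-separated records with the Zip code in a known column, uses an in-memory sort as a stand-in for a real external sort, and the output naming scheme (`<zip>.txt`) is hypothetical.

```python
import itertools
import os


def zip_of(record, zip_col=0):
    # Assumption: tab-separated records with the Zip code in column zip_col.
    return record.split("\t")[zip_col]


def split_by_zip(input_paths, out_dir, zip_col=0):
    """Merge the inputs, sort, drop duplicate records, write one file per Zip code."""
    records = []
    for path in input_paths:
        with open(path) as f:
            # Merge step: collect every non-empty record from every input file.
            records.extend(line.rstrip("\n") for line in f if line.strip())

    # Sort by (Zip code, full record) so that records sharing a Zip code,
    # and exact duplicates, end up next to each other.
    records.sort(key=lambda r: (zip_of(r, zip_col), r))

    os.makedirs(out_dir, exist_ok=True)
    written = {}
    by_zip = itertools.groupby(records, key=lambda r: zip_of(r, zip_col))
    for zip_code, group in by_zip:
        # Grouping on the whole record collapses adjacent duplicates.
        unique = [rec for rec, _ in itertools.groupby(group)]
        with open(os.path.join(out_dir, zip_code + ".txt"), "w") as out:
            out.writelines(rec + "\n" for rec in unique)
        written[zip_code] = len(unique)
    return written
```

Calling `split_by_zip(["sales_01.tsv", "sales_02.tsv"], "by_zip")` (file names invented for the example) returns a dictionary mapping each Zip code to the number of distinct records written for it.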

Rather than listen to a long marketing speech, just read our clients' feedback, then download XSM and evaluate it yourself for free!