OK – it's time to complete what we started last month... and, of course, it's nice to see you back! :)
First, let's remember the challenge: let's do some plain SQL text compression (in CUBRID, of course...):
- You don't have an issue with the database response speed, but you do have an issue with the big data(base) size
- The solution you came up with is, obviously, to compress its content using some text compression algorithm
- You can't use stored procedures! (as they are not standard plain SQL)
- You must achieve compression only by using SQL statements, tables, and whatever else a "standard" DBMS offers!
- You can use as many SQL statements as you want!
- Once again – you can use only standard SQL statements (including any well-known functions and data manipulation commands)
Just in case you forgot where we arrived in the first part, please make sure you remember...
The last thing we did was to create a “compressor” table, with a row number, an input word and the “compressed” value for the input word, using the following SQL to calculate the “compressed” value:
UPDATE compressor
SET output_word = CONCAT(
    CHR(32 + CAST(SUBSTR(TO_CHAR(id, '0000'), 1, 2) AS INT)),
    CHR(32 + CAST(SUBSTR(TO_CHAR(id, '0000'), 3, 2) AS INT))
);
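In plain words: each row id is encoded as two printable characters, each character packing two decimal digits (offset by 32, so we stay in the printable ASCII range). And just in case you don't have part one at hand, here is a minimal sketch of what such a table could look like (an illustration only – the column types and sample words here are illustrative, not the exact definitions from part one):

CREATE TABLE compressor (
    id          INT PRIMARY KEY,
    input_word  VARCHAR(64),   -- the word to compress
    output_word VARCHAR(8)     -- its 2-character "compressed" code
);

-- the most frequent words should get the lowest ids
INSERT INTO compressor (id, input_word) VALUES (1, 'the');
INSERT INTO compressor (id, input_word) VALUES (2, 'of');
INSERT INTO compressor (id, input_word) VALUES (3, 'and');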
Let's continue from there!
3. Next, I created a temporary table – “words” – to hold the words of the text to be compressed, and wrote the SQL statement to populate it:
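Something along these lines does the job – take it as a sketch only (the column names and the word-by-word inserts are illustrative; in practice you would split your own input text):

CREATE TABLE words (
    pos  INT,           -- position of the word inside the text
    word VARCHAR(64)    -- the word itself
);

-- one row per word of the input text, e.g. for 'John Doe is the man':
INSERT INTO words (pos, word) VALUES (1, 'John');
INSERT INTO words (pos, word) VALUES (2, 'Doe');
INSERT INTO words (pos, word) VALUES (3, 'is');
INSERT INTO words (pos, word) VALUES (4, 'the');
INSERT INTO words (pos, word) VALUES (5, 'man');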
4. Then, I replaced each “known” word from the “words” table with its compressed equivalent:
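Assuming the tables sketched above, the replacement boils down to a single correlated UPDATE (again, a sketch of the idea, not necessarily the original statement):

UPDATE words
   SET word = (SELECT output_word
                 FROM compressor
                WHERE input_word = words.word)
 WHERE word IN (SELECT input_word FROM compressor);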
That's all, folks! ...I will leave the “revert” (= “uncompress”) algorithm steps to you! :)
Of course, my final question (and challenge for you) is: What is your solution? Have you found a better one...? :)
See you next time –
Bye-bye!
/Dee
P.S. I almost forgot to tell you about a much simpler, no-brainer solution! A solution which works, but is quite ugly... Here it is: you simply execute a chained REPLACE (using multiple SQL statements or just one big-big-big one), hard-coding as many of the most commonly used English words as you like.
For example, you could do:
SELECT 'John Doe is the man. He is better, taller and stronger!'
INTO :input_txt;
SELECT :input_txt INTO :output_txt;
SELECT REPLACE(:output_txt, '. ', '^0') INTO :output_txt;
SELECT REPLACE(:output_txt, ' of ', '^1') INTO :output_txt;
SELECT REPLACE(:output_txt, ' to ', '^2') INTO :output_txt;
SELECT REPLACE(:output_txt, ' is ', '^3') INTO :output_txt;
… and so on ...
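And reverting is simply the same chain executed in reverse order, with the REPLACE arguments swapped:

SELECT REPLACE(:output_txt, '^3', ' is ') INTO :output_txt;
SELECT REPLACE(:output_txt, '^2', ' to ') INTO :output_txt;
SELECT REPLACE(:output_txt, '^1', ' of ') INTO :output_txt;
SELECT REPLACE(:output_txt, '^0', '. ') INTO :output_txt;

(One caveat: this only reverts cleanly if the marker character – '^' here – never occurs in the original text.)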
This would obviously ensure (some) compression – and the more words you choose, the better the compression gets... ...but yes, it's far, far, far from looking good or optimal. Still, if you stick strictly to the problem as stated – achieving some level of text compression – it is probably the simplest solution after all...