Archive forJune, 2006

Open Document Format

Open Document Format (ODF), short name for OASIS Open Document Format for Office Applications (OpenDocument). ODF is an open document file format that can be used for creating various document types like text documents, spreadsheets, presentations, databases, charts, tables and formulas. It was initially developed by Openoffice.org and later on developed by OASIS International Consortium.

By open document file format, I mean the XML-based format specifications are accessible by everyone and anyone can implement it freely to create editable office documents. As ODF is based upon XML, it defines a schema for representing all types of data. The complete specifications are available online.

It provides an alternative solution to vendor-specific (Microsoft Office, being the single major one) proprietary formats, so that organizations and individuals may adopt an open format and can use other software in case they don’t want to depend on a proprietary format & software and want to remain flexible.

ODF Specifications

An OpenDocument file can be either a single XML file, or as a collection of several subdocuments within a package (ZIP archive mainly) each of which represents part of the complete document. The latter is used almost exclusively, for it can embed binary content and tends to be significantly smaller. In OpenDocument format, each structural component is represented by an element with associated attributes. It applies to all document types, so there is no difference between a text document, a spreadsheet or a presentation, apart from content.

There are four types of subdocuments, each with different root elements. Additionally, the single XML document has its own root element. The root elements are given below:

Root Element Subdocument Content Name in Package
<office:document> Complete document in a single XML Document. n/a
<office:document-content> Document content & automatic style used in the content. content.xml
<office:document-styles> Styles used in document content & automatic style used in the styles. styles.xml
<office:document-meta> Document meta information, like author name, last time of save action. meta.xml
<office:document-settings> Application-specific settings such as zoom factor, cursor position etc. settings.xml

OpenDocument format is also supported for the Accessibility features. Specification of OpenDocument is going through an extensive review, and many sub-components of ODF have gone through W3C’s Web Accessibility Initiative processes. ODF is supported as it’s got an advantage over proprietary formats, which generally don’t undergo peer reviews for accessibility issues. Open specifications of OpenDocument allows anyone to create software for those suffering from disabilities unlike proprietary formats which cannot be implemented easily by everyone.

Microsoft Office Open XML

Microsoft Open XML format is again Microsoft-specific file format to be implemented in upcoming Microsoft Office 2007 release of Office Suite. It’s the only competitor considered to the Open Document Format. Now, the difference between the two is ODF is an open format, but Microsoft’s Open XML has licensing requirements and it prevent some competitors from using it.

Now, we need to know what it means when Microsoft has created a separate format for itself. Certain software companies have criticized this move as Microsoft’s plan to counter attack the ODF standard, as currently Microsoft enjoys almost complete monopoly in the Office-suite market. So, if ODF becomes a popular, de-facto standard for creating, exchanging office documents, that means Microsoft loses its monopoly over the market.

But Microsoft maintains its stand for a different XML format so as to allow for backwards compatibility for existing documents and full support of all the features available, which are not supported by ODF, according to Microsoft. So, that is Microsoft’s decision.

But the good point is, Industry is slowly progressing from conventional binary formats (.ppt, .doc etc.) to open XML based formats which would open many other options for users to create office documents. The new format allows for smaller sized documents and allows for lower level of content manipulations for document recovery, modular structure by separating content, layout, styles & better accessibility to all types of users.

Comments

Single File Format

Recently, I read this article, where in the Author has imagined interoperability of all files some way; where all possible file formats could be represented in a standard/single format, so as to ease the process of file editing/creating and parsing.

Read it here

Comments

Oracle Joins

Definition / Concept

If we need to retrieve data from more than one table we use JOINs. A join is a query, used to retrieve data from more than one table based on JOIN condition (Note: Same table can referenced more than once).

Conditions

In SQL terms, Join occurs when we specify more than one table in the FROM clause of the SELECT statement. Additionally, we also specify a condition in the WHERE clause to specify JOIN condition in the SELECT statement. Now, JOINs are conditions to join/compare two tables’ data retrieved from the selected tables’ columns based on JOIN Condition. Oracle combines the rows’ data to compare with other tables’ rows and evaulate the JOIN condition for TRUE/FALSE.

We need to qualify columns having same name with table names/aliases to avoid ambiguity errors, for joins can be extended to multiple tables. In case of multiple joins extending to more than 2 tables, let’s say 3 for example, we create a recordset based on 1st join between 1st two tables & join the resulting recordset with 3rd table using 2nd Join.

Table 1 + 
        |
      (JOIN)--->Recordset I +
        |                   |
Table 2 +                   |
                          (JOIN)--- Final Recordset
                            |
                            |
                            +
                          Table 3

Now, we’ve understood what are joins, their need & application, let’s see what are the different types of JOINs
available.

SIMPLE JOIN/INNER JOIN/EQUALITY JOIN/EQUIJOIN

All the above terms refer the same JOIN type. Let’s call them INNER JOIN for our reference. Inner Join have equality operator as the JOIN condition. So, INNER JOIN returns only those rows from two tables that have same data for the specified columns.

Example:

+--------+              +---------+
|   ID   |              |   ID    |
|  NAME  |              | SALARY  |
|  AGE   |              |         |
|--------|              |---------|
|EMPLOYEE|              | PAYROLL |
+--------+              +---------+

We need to extract the salary of an employee from the payroll table. So, a INNER JOIN shall be created between EMPLOYEE
& PAYROLL table, using ID columns from the two tables.

SELECT employee.id, employee.name, payroll.salary
FROM employee, payroll
WHERE employee.id = payroll.id;

SELF JOIN

If we compare the data of a table with itself, we create a self join. I’ll give here a classical example of employees & managers. Suppose, we need to know the name of employees’ manager from the employee table, we’ll create a self join. For example purpose, I modify the EMPLOYEE table.

+--------+
|   ID   |
|  NAME  |
|  AGE   |
| MGR_ID |
|--------|
|EMPLOYEE|
+--------+

SELECT e1.name EMPLOYEE, e2.name MANAGER
FROM employee e1, employee e2
WHERE e1.mgr_id = e2.id;

CARTESIAN PRODUCT

If there is no comparison condition specified in a JOIN condition, a cartesian product is returned. In such a case, every row in a table is joined with every row in the other table of the JOIN. Let’s say, if there are 3 rows in Ist table & 4 rows in the IInd table, 12 rows (3×4) would be produced.

+--------+              +---------+
|   ID   |              |   ID    |
|  NAME  |              | SALARY  |
|  AGE   |              |         |
|--------|              |---------|
|EMPLOYEE|              | PAYROLL |
+--------+              +---------+

SELECT employee.id, payroll.salary
FROM employee, payroll
WHERE employee.name LIKE 'A%';

NATURAL JOIN

In a NATURAL JOIN all columns with same names & datatype are joined from both tables. Here, we need to qualify the column name by table name or alias. All rows matching with the same column data are retrieved.

+--------+              +---------+
|   ID   |              |   ID    |
|  NAME  |              | SALARY  |
|  AGE   |              |         |
|--------|              |---------|
|EMPLOYEE|              | PAYROLL |
+--------+              +---------+

SELECT id, name, salary
FROM employee NATURAL JOIN payroll;

OUTER JOIN

Till now, all joins we’ve discussed retrieve all matching records from both tables. But sometimes, there is a situation when we also need to retrieve non-matching records from either one or both tables. For retrieving such non-matching records, we create OUTER JOINs. There are 3 type of OUTER JOINs based on from which side of the JOIN Condition we extract the non-matching records. We assume the following structure for our examples:

+--------+              +---------+ 
|   ID   |              |         | 
|  NAME  |              |   ID    | 
|  AGE   |              |  SALARY | 
| MGR_ID |              |         | 
|--------|              |---------| 
|EMPLOYEE|              | PAYROLL | 
+--------+              +---------+

RIGHT OUTER JOIN

As the name suggests, we extract non-matching records from the table mentioned on the right hand side of the JOIN condition. So, all the matching records from two tables, as well as non-matching records from the 2nd table (on right hand) are retrieved. NULL values are replaced for the non-existing values for columns extracted from table on the right hand side in non-matching rows.

SELECT e.id, p.salary
FROM employee e, payroll p
WHERE e.id = p.id (+);

LEFT OUTER JOIN

In this JOIN, we extract non-matching records from the table mentioned on the left hand side of the JOIN condition in addition to the matching records. NULL value is displayed for the columns extracted from table on the left hand side in the non-matching rows.

SELECT e.id, p.salary
FROM employee e, payroll p
WHERE e.id (+) = p.id;

FULL OUTER JOIN

In case of a full outer join, we extract matching records from both tables specified in JOIN condition, as well as non-matching records from both tables. Actually to create FULL OUTER JOIN, a UNION set is generated from both LEFT OUTER JOIN & RIGHT OUTER JOIN. NULL values are displayed for non-existing records of the other table in each non-matching row of either table.

SELECT e.id, p.salary
FROM employee e, payroll p
WHERE e.id = p.id (+)
UNION
SELECT e.id, p.salary
FROM employee e, payroll p
WHERE e.id (+) = p.id;

Note:

  • SQL statements provided in the post were tested on Oracle 8 version only. Natural Joins don’t exist on Oracle 8
  • ANSI syntax has not been covered here.

Comments

Google Spreadsheets

Google is launching web-based spreadsheet application called Google Spreadsheets.

Still under beta test, requiring invites to try out this new application; nothing much can be said about the application & its future as of now. But, as I was going through various news articles, people are already claiming that Google is seriously competing with Microsoft by launching this spreadsheet program. In my opinion, it’s just small view of a bigger vision.

For, I believe it’s not just about Google taking on Microsoft or any other XYZ Co., but what Google is trying to achieve is reducing the dependance of the users on an operating system largely.

When we observe the recent moves made by Google, we observe that Google launched Email, Instant Messaging, Calendar etc and now Google Spreadsheets, a spreadsheet application. The straight & obvious advantage of all such applications is availability on the web, so no matter where you are, you’ve access to your information everytime; no more dependancy on the personal desktop PC, that is. Whenever you need to search, share files; you can use Google Search, email/invite people & share the files.

The only skepticism would be the usability of such applications, as they are web based & security of the information. Usability I would define, how responsive the application is while the user is working on it. Well, a web based application is not that responsive in comparison to a desktop based application, but largely all such applications are usable as most of them are javascript based, which gets downloaded once the application is accessed through the Web Browser.

Regarding the security of the data, it’s as secure/insecure that your email account is. Since, it’s password protected, we can say the data is safe/secure till the time you’ve chosen a medium-to-high strength password. But then, most of the desktop pc’s are also not completely safe. It’s a debatable topic, for people also doubt the security/integrity of the provider of such applications.

So, I observe that Google is not competing with Microsoft by launching another Operating System, but they’re eliminating/diminishing the role of the OS itself for the users by offering web based applications & I agree that such approach is right & smart.

It’s hitting the market perfectly with every application it has launched. Now, all these developments could be labelled as Google killing Microsoft or killing similar desktop applications viz. OpenOffice Calc alike.

Comments (1)