Fixing OG Forum (Organic Groups forums)

Organic Groups is a module -- a broad suite of modules actually -- allowing you to create and maintain private groups inside of the Drupal content management system.  While Drupal has a powerful roles and taxonomy system with lots of third-party node access tools, Organic Groups (OG) is an out-of-the-box solution that doesn't require much knowledge of Drupal internals to get yourself up and running.

What is OG?  Think "Yahoo Groups".  It's a way to partition a Drupal site to support multiple private, semi-priviate and/or public groups, each with their own features.

Naturally, one of those features is forums so Organic Forum was born.  The problem is that Organic Forum is a third-party module, not something in Drupal core.   Third-party modules are independent projects with a dependency on a volunteer group of developers (which is often just one person) staying on top of bug reports and updates to other modules which could have an impact on their own.  Sometimes those people drop the ball.

Drupal.org in turn stays on top of projects which aren't being maintained in a timely manner, which is where things went wrong with Organic Forum.  Its developer may have become overtaxed with real-world work or simply lost interest in the project.  Whatever the case, Drupal deprecated Organic Forum and marked it "abandoned".   Worse, there were several fairly serious security issues with it so Drupal also recommended that nobody use it.

The problem is that that's not an option if you already have forums running inside of OG or you have a project that requires them.

That was the situation I found myself in.  A client in the nonprofit education industry demanded forums for its private groups and Organic Forum was the only practical solution besides ditching Drupal entirely for something like Acquia Commons, which has Organic Groups built in with a nice third-party forum system.

The client wasn't overly concerned about unauthorized people seeing content in those private forums but there was a bigger problem.  The client had a landing page where it wanted a list of teasers for all forums for which the user had access, both public and private.  The "Active Forum Topics" block in Drupal's stock forum.module doesn't know about OG's privacy and the "Active Forum Topics in group" block in Organic Forum only displays on Organic Group pages, not from the Community landing page the client wanted.

One of the nice things about Drupal is that you can override themes, or how data is displayed.  However, this function isn't a theme but a hook that generates data and hooks can't be overridden.  The only option was to write a new Active Forum Topics hook incorporating the client's requirements, which is what I did.   By creating it as a small module, it also meant that I didn't have to hack Drupal core or a third party module to do it.

The module is attached.  There's not much to it except a couple of slightly dense SQL queries incorporating a UNION to extract both public discussions and authorized private discussions into one result set.  Otherwise it's a standard Drupal hook_block(). Just unpack it in your modules directory and enable it in admin/build/modules/list.  Two new blocks will appear under admin/build/block/list: OG Forum Fixes: Active forum topics and OG Forum Fixes: New forum topics.  Use these as you would the blocks that come standard with Drupal's forum module.

Database Meets Revision Control


Any developer who has worked with HIPAA compliancy knows that the law is murky at best and the fed doesn't publish a programmers' guide to make your life any easier.  However, one of the cardinal rules is the requirement to keep track of who sees HIPAA data, who modifies it and when this was done.  Another is that if you delete/update patient data you need to log what was deleted/updated in order to provide an audit trail, if only for the lawyers.  Failure to do so can subject a company to some pretty draconian penalties.

This creates a challenge on the database side because SQL UPDATE obliterates a record's history.  There are a few potential solutions, such as maintaining logs which are written to based on table triggers.   I've done this but the log of atomic changes grew to immense size.  It was also difficult to reconstruct a large record based on potentially dozens or even hundreds of changes to records which must be retained for up to six years.

In a headbanging session with the CTO of Children's Health Fund we determined that what we needed was a hybrid of a relational database and a resource control system, where a SQL UPDATE would maintain a copy of the pre-updated record and freeze it from further changes.  RCS does its work by storing just the changes, or diffs, made to a document.  While it would be technically possible to do this with a database record -- for instance, using a BLOB in a sibling table -- there's a simpler and more practical method that also maintains relational integrity.

Finding duplicate records in a database: the SQL HAVING clause

One issue I run across occasionally is a table with duplicate entries such as two entries for the same company in an accounts payable system.  This can create embarrassing problems with billing if ACME Inc #1 is 90 days overdue because someone posted a payment, and now a credit, to ACME Inc #2.

I ran across this problem recently on a health records system which was missing a UNIQUE constraint on a secondary patient ID field: student ID.   The application was originally built without that student ID but  the client decided later that it wanted it added to the patient record.  That was done but without any database-bound constraints because it was an optional field that most clinics probably wouldn't use.  Or so the client thought.

A few months later, the client was so happy with the software that it decided to move all its student records over to the application so that the New Patient flow could import data to a data warehouse as a preload, saving the operator time and errors when the patient was entered into the core system.  The problem was that during those interim months, operators had been manually putting bad data into those student IDs resulting in New Patient importing patients from the warehouse which already existed in the patient table.  Voila, two identical patients with differing student IDs as well as two very different patients with the same student ID.

The only correctlive option in this case is to merge the two patients to one, but before you can do that you have to decide which patient record is the "authority" and what data you want to merge from the other record(s).  And before you can do that you have to identify those duplicates.

This query will accomplish that.  It employs the HAVING filter companion of the GROUP BY result set aggregator.   I don't see many queries employing HAVING, which is a shame because it's a powerful way to filter grouped data, almost like a WHERE clause.

Here's an example of a query on a patient table returning all duplicate patients with the same first and last name and student ID:
 
SELECT
    COUNT(*),
    first_name,
    last_name,
    student_id
FROM patients
GROUP BY first_name, last_name, student_id
HAVING COUNT(*) > 1;

Result:

  count | first_name  |   last_name    | student_id
--------+-------------+----------------+------------
     2  | KENNETH     | RIVERA         | 268098363
     2  | ERIC        | SALLES         | 217257892
     2  | MICHAEL     | LAWRENCE       | 274295927
     2  | LOUISE      | ROSARIO        |
     2  | KENDRA      | PRICE          | 224434348    

You can, of course, add other conditons to that query to find, for instance, all patients with the same last_name and student_id.  But that just gives you a list of the duplicated fields in the table.   Most of these results do indeed look like dupes but in the case of LOUISE ROSARIO, is she a dupe or is she two different patients without a student ID?  You need to eyeball their records to make sure.  What would be nicer is a pointer to the actual records where those dupes exist.  That's a tougher challenge because you can't add patient_id to this query without breaking the GROUP BY logic.

That can be accomplished with a nested, or sub, query with the above query serving us that search data.

SELECT
    PD.person_id,
    PD.first_name,
    PD.last_name,
    PD.dob,
    PD.student_id
FROM patients AS PD,
(SELECT
        first_name,
        last_name,
        student_id
    FROM patients
    GROUP BY first_name, last_name, student_id
    HAVING COUNT(*) > 1) AS SUB1
WHERE
    PD.first_name = SUB1.first_name
AND PD.last_name = SUB1.last_name
AND PD.student_id = SUB1.student_id
ORDER BY PD.student_id, PD.last_name, PD.first_name;

    

Results:

patient_id | first_name |   last_name    |    dob     | student_id
-----------+------------+----------------+------------+-----------
     27914 | VICTOR     | TORREDO        | 1995-02-14 | 432019685
     27915 | VICTOR     | TORREDO        | 1995-02-14 | 432019685
      8191 | ESTHER     | WILLIAMS       | 1995-09-23 | 206567040
      4519 | ESTHER     | WILLIAMS       | 1995-09-23 | 206567040
     17236 | THOMAS     | ASHCROFT       | 1995-02-13 | 333279517
      1288 | THOMAS     | ASHCROFT       | 1995-02-13 | 333279517

     17888 | LOUISE     | ROSARIO        | 1987-06-17 |
     11368 |
LOUISE     | ROSARIO        | 1972-01-30 |

Aha, they're obviously two different LOUISE ROSARIOs.

By the way, SQL, being as flexible as it is, would let you write this another way, such as with an explicit JOIN.  I leave that as an exercise to the interested user <g>.
 

Using Google Maps with the Garmin Zumo 660

One of the best things about motorcycling is spending a nice day exploring twisty back roads far from home.  One of the worst things is getting lost while running low on fuel and being clueless not only about where you are but where to find gas.  Fortunately, technology comes to the rescue again with a variety of water-resistant GPS units built especially for motorcycles.  Last week, I became the owner of a Garmin Zumo 660 GPS.  I also purchased a bluetooth headset kit for my helmet so I could hear the unit.

After a rocky start, which included trying to charge the Sena SMH10 bluetooth device through the MP3 port (doh!), I spent a couple of rainy nights familiarizing myself with the Garmin.  It's an impressive device, as it should be with a retail price over $700.  But it had one very disappointing handicap: it can only import route sheets from Garmin's MapSource software.  Not only do Google, MapQuest and Bing Maps make MapSource look like a relic from the Windows 3 era, MapSource is only available for Windows, not my Mac.

Yes, the Garmin will let you import destination points from the more popular online map vendors but the Garmin will construct the route for you.  That's fine if you want to get to Point B as quickly as possible but I usually don't want the fastest route.  I usually want one of the least efficient paths -- one which will take me through small villages and across covered bridges with a stop at my favorite hot dog stand which may 20 miles off the optimum route the Garmin creates for me.  (For instance, my favorite route from Washington, DC to NYC is through West Virginia's Monongahela Forest).

Garmin recognizes this so it allows you to construct a route based on waypoints, or mini-destinations within the larger route.   You can do this in MapSource or in the Garmin itself.  But you can't do this using Google Maps because it doesn't export the GPX-formatted file that the Garmin wants to construct a route based on waypoints.

Ironically, or perhaps justifiably, Google gave me the solution to fix this: a handy browser plug-in for Firefox called GMapToGPX.  What this does is read your current Google Maps route and create a GPX list of ordered waypoints in GPX format which can be imported into MapSource, from which it can then be exported to your Garmin.   Here's how it's done.

Tip for restoring a deleted Applications folder from the Dock


A lot of my blog posts are Post-Its for my own brain.  So is this one.  I've managed to accidentally drag my Applications folder to the trash twice now and both times I had to wade through a half dozen non-optimal solutions on the web to find the one that actually restored it to what I had before.

In OSX 10.6 it's actually pretty easy, if not very intuitive.

Open Finder and select Macintosh HD.  Then right-click on Applications in the second column and select Get Info.  That will give you a popup like this.

Under Preview, you'll see a folder with an "A" on it.  Click and drag it just to the right of your Dock.  Because it's a folder OSX won't let you actually place it in the Dock.

Then right click on the folder to set how you would like it displayed, i.e. List, Grid, etc.





Returning latest/highest record from a set of records

A common reporting requirement in database applications is selecting the latest purchase, or latest job, or oldest chlld for each person in a table.  Any beginning student of SQL knows how to return the most recent date in a table:

SELECT MAX(rec_date) FROM sales;

              max
-------------------------------
 2011-03-15 18:04:45.178057-04
(1 row)                                                           

But that doesn't tell us which person owns that record.  We can do this using a sub, or nested, query on the same table:

SELECT cust_name FROM sales WHERE rec_date = (SELECT MAX(rec_date) FROM sales);

  cust_name |          rec_date
------------+-------------------------------
 Dwight Schrute       | 2011-03-15 18:04:45.178057-04
(1 row)                                                 

This could return multiple records with the same rec_date.  But let's say we're not interested in who owns the most recent record.  Instead, we want to know the latest record for everyone in the table, such as the last sales date for every customer. Here's one of the possible solutions for that: