Um blog sobre nada

Um conjunto de inutilidades que podem vir a ser úteis

Archive for the ‘BI’ Category

Actian Matrix – list user privileges

Posted by Diego em Janeiro 22, 2015


The query bellow lists all tables on a database and whether or not a particular user has privileges on it.

The “privileges” are hard coded as SELECT, UPDATE, INSERT and DELETE (I’m sure that there’s a better way of implementing this part but since I was only interested on these 4, this is a simpler solution). 

The user is an input when you run the query (I’m using Aginity to run the query)





select n.nspname as schema, c.relname as table

, array(

       select CASE WHEN has_table_privilege($user,n.nspname ||‘.’|| c.relname,‘SELECT’) THEN ‘SELECT’ ELSE END UNION 

       select CASE WHEN has_table_privilege($user,n.nspname ||‘.’|| c.relname,‘UPDATE’) THEN ‘UPDATE’ ELSE END UNION

       select CASE WHEN has_table_privilege($user,n.nspname ||‘.’|| c.relname,‘INSERT’) THEN ‘INSERT’ ELSE END UNION

       select CASE WHEN has_table_privilege($user,n.nspname ||‘.’|| c.relname,‘DELETE’) THEN ‘DELETE’ ELSE END

) as privileges

FROM pg_class c JOIN pg_namespace n on c.relnamespace=n.oid

where n.nspname not in (‘information_schema’,‘pg_catalog’,‘sys’) and c.relkind=‘r’

order by 1,2






FYI, DO NOT run this code on versions bellow than 5.1.2. It will crash your database.

Posted in AWS Redshitft \ Actian Matrix, BI, Big Data | Leave a Comment »

How to point a Tableau dashboard to an excel file on a share location

Posted by Diego em Setembro 30, 2014

This post shows how you can deploy a dashboard that connects to a excel spread sheet without using extracts or shipping the spread sheet to the server, thus making it available to users to modify the data and being able to see the changes in real time. Of course, if someone messes with the “metadata” (column names for example) or rename\move the spread sheet, the dashboard will break.

I imagine there may be other ways to accomplish this. but that’s the easiest one I could find.

1)      Place the excel file on a network share

2)      Create workbook

3)      Connect to data -> Microsoft excel (on the network share)

4)      Right click the data source and select “Publish to server”


5)      Uncheck the option “Include External Files” (this is the most important part. if you select this option, tableau will ship a copy of your file to the server and use it to feed the dashboard)


6)      The data source can be seen at
Note that this is an isolated action, the fact that you published the data source didn’t affect the current dashboard you are working on.
If you want to test that, just publish the workbook and update the excel sheet on the share; the published workbook wont reflect the changes but tableau desktop will.
And the reason is that when you published the dashboard, a copy of the source data was published together, so it is not looking at the data on the share anymore.


So how to read the data from the published data source (that reads from the share):


8)      On a new workbook:

a.       Connect to data  -> Tableau Server

b.      Select the data source published before

c.       Pay attention to the icon:


d.      Publish the dashboard and it will read the data from the share

9)      On the workbook we initially were working on:

a.       Do the same as the previous item like if it was a new dashboard

b.      Right click the new data source

c.       Select “replace data source”

d.      Close the old datasource

e.      Publish the workbook

Posted in BI, I.T., Tableau | Leave a Comment »

Identifying Watermarks problems on Paraccel (Actian Matrix)

Posted by Diego em Julho 8, 2014

Watermarks control the next available value on an identity column. Every time you insert a value on a table that contains an identity, paraccel queries the identity highwater table to check which the next available value on a particular slice is. It’s important to notice that once that value has been retrieved, paraccel, when inserting on the destination table, does not check if it clashes with an existing one (remember that constraints are not enforced).

That being said, if the watermarks get “out of sync” with the actual table, you may end up with duplicate value on an identity field, which is never a good thing.

Now, I won’t get into the details of how it can happen (believe me, it can), instead, I’ll focus on how to identify these clashes. 

So, the watermarks are controlled per table\col\slice and to check it, first you will need to get the table id, which is hidden on the pg_class table (you have to specify the oid field, otherwise it will not show with “*”):


select oid, * from pg_class where relname = ‘<table>’ –1006571

You can also check the max value of you key:

select max(key) from <table> –160 016

Now, let’s check the highwater values by querying the stv_identity_highwater using the tableid:


select * FROM stv_identity_highwater where tbl = 1006571


That’s nice, it tells us for example that the next row inserted on slice 0 will get a value above 159871 (next next value is dependent on the watermark and the number of slices on the cluster), but we can’t really identity any problems just using this table.

There is a second table we can use, called stv_blocklist, which contains (among other things – of course) the maximum key value contained on each slice (It also contains the “num_values” (number of rows) on that slice so I guess here is the explanation why counts are extremely fast. Just a guess, I may be completely wrong here).

So if we query this table and compare its data with the previous one, we can easily identify problems.

select *
where tbl = 1006571 and col=0 and slice in (0,5,9) order by col, slice

 imageThe idea is always have the watermark bigger than any value on the table (so they won’t ever clash).

In this example, slice 0 is OK because the watermark is 159871 and the max value on the slice is 159683, but slice 5 has a problem because the watermark is 159840 and the max value on the slice is 159868.

We can also write a query that does this verification for us, and it would look something like this:


select * from (

       select  h.tbl as tbl_id,h.col as col,h.slice as slice

              , max(h.highwater) as highwater, max(b.maxvalue) as maxvalue

              , case when max(b.maxvalue) > max(h.highwater) then 1 else 0 end AS invalid

       from stv_identity_highwater h

       left join stv_blocklist b on

                     h.tbl = b.tbl

              and h.col = b.col

              and h.slice = b.slice

              where h.tbl = 1006571                   

       group by h.tbl,h.col,h.slice

 ) X where invalid = 1



Once the invalid water marks have been identified, a few steps must be taken to fix them, which I’ll focus on another post.

Posted in AWS Redshitft \ Actian Matrix, BI, Big Data | Leave a Comment »

Paraccel (Actian Matrix) security

Posted by Diego em Junho 24, 2014


Couple of quick notes \ tricks when it comes to working with security on a ParAccel database. All this was done on version 5 of the tool. I understand that Amazon Redshift uses ParAccel technology but I can’t tell which of the statements bellow are true or not on it.


Network Security


·         Edit the pg_hba.conf located at  /home/paraccel/padb/rel/etc

·         PADB reads the pg_hba.conf file on startup. After any edits, the cluster must be restarted with  xstop and cqi xstart or use the pg_ctl utility to reload the file:

o   $ pg_ctl reload

·         A valid userid and password with rights granted to access the database is not sufficient to access a PADB database. The pga_hba.conf file must also allow access for the user.


DB Security


If you want to expose your data trough views, hiding the tables, I suggest creating all the tables on schemas different than public and the views on the public schema (everything owned by a super user). Then you will need to create a “read only user” (let’s call it user1) and grant select permissions on all the views and usage permission on all the schemas the tables belong to.

I also suggest:

·         Prevent users from creating objects on the public schema by running:

  revoke create on schema public from public; 


·         Explicitly REVOKE execute permission on all functions on a particular schema since that by GRANTING usage on the schema you allowed the users to execute the function.

o    FYI, I believe there is a minor bug on paraccel here where the function:

select has_function_privilege(‘user’, ‘schema.function’,‘execute’)

o    returns true even though the user doesn’t have permissions to execute it;


Here follows a few tips\tricks\”things I’ve learned” while trying to accomplish that:


·         It’s not enough to grant select permission on a view, you also need to select USAGE permission on all the schemas that contain tables the view reads from


·         PUBLIC represents a group that always includes all users. An individual users privileges consist of the sum of privileges granted to PUBLIC, privileges granted to any groups that the user belongs to, and any privileges granted to the user individually;


·         PUBLIC is also a schema;


·         Database user’s accounts are global across a PADB instance (not per individual database).


·         By default, all users have CREATE and USAGE privileges on the PUBLIC schema. To disallow users from creating objects in the PUBLIC schema of a database, use the REVOKE command to remove that privilege.

o    revoke create on schema public from public;


·         The only privilege that is directly relevant to functions is EXECUTE. When you create a new UDF, the default behaviour is that EXECUTE privilege accrues to all users (PUBLIC). PUBLIC permissions are granted in addition to individual user or group permissions; therefore, if you want to revoke function privileges for any user or group, you need to revoke from PUBLIC, then grant privileges individually to those users and groups. Revoking privileges from a specific user or group is not sufficient to prevent users from running UDFs.

o    revoke execute on function schema.function() from public;


·         By default, all system tables are granted PUBLIC SELECT access, but system views are not granted PUBLIC SELECT access.


·         Having ownership of a view, or having privileges granted on a view, does not imply access to the underlying tables. You need to grant access to the underlying tables explicitly.


·         By default, users are granted permission to create temporary tables by their automatic membership in the PUBLIC group. To remove the privilege for any users to create temporary tables, revoke the TEMP permission from the PUBLIC group and then explicitly grant the permission to create temporary tables to specific users or groups of users.

o    revoke temporary on database <db> from public;


·         Warnings:

o    Nothing stops you from dropping a group that contains users;

o    It’s possible to drop a user that owns object. If you do so, the object will be left without an owner;

·         Useful Commands:

\du: show users      (select * from pg_user)
\dg: show groups
\dn: show schemas
\d: show all tables (
SELECT * from pg_class c WHERE relkind=‘r’😉

Print – Grant Select on All views:

SELECT ‘GRANT SELECT ON ‘ || relname || ‘ TO group readonlygroup;’

from pg_class c

                 JOIN pg_namespace n ON n.oid = c.relnamespace

WHERE nspname = ‘public’ and relkind=‘v’;


Print – Grant USAGE on all schemas:

SELECT ‘GRANT USAGE ON schema ‘ || nspname || ‘ TO group readonlygroup;’

from pg_namespace


Functions to test privileges:


select has_function_privilege(‘user’, schema.function()’,‘execute’)


select has_schema_privilege(‘user’,‘schema’,‘create’)


select has_table_privilege(‘user’, ‘schema.table’, ‘select’)

Posted in AWS Redshitft \ Actian Matrix, BI, Big Data, I.T. | Leave a Comment »

Tableau driver problem querying SQL 2012

Posted by Diego em Fevereiro 25, 2014

Just sharing a situation I’ve been through these past few days.


After updating a SQL Server Instance from 2008R2 to 2012, a dashboard that connected to a DB on that instance – and was working perfectly fine – started to produce a “driver not found ” error.

After some digging I realized that the problem was due to a particular GROUP BY clause. If the group by was there, I got the error, otherwise not.


Checking the XML, I realized that, on the presence of the GROUP BY, tableau was adding the highlighted clause bellow to the connection tag:


<connection authentication=’sqlserver’ class=’sqlserver’ dbname=’XXXXX’ minimum-driver-version=’SQL Server Native Client 10.0′ odbc-native-protocol=’yes’ server=’XXXX’ username=’XXXXX’ workgroup-auth-mode=’prompt’>


It’s  clearly telling you that the minimum driver necessary to that operation is the Native client for SQL 2008.

First thing I wondered was: why didn’t tableau put this clause before when the DB was SQL 2008? – I don’t really have an answer to this question.



To fix the problem I had to install the Microsoft SQL Server 2012 SP1 Native Client x64 (version 11.0.2100.60) driver (found here, which felt a little strange because:

1) I didn’t have to install the above driver to query SQL 2012 in the first place (any queries without the GROUP BY clause where working)

2) The dependency clearly says native client version 10 (which was installed) – not sure why only v11 fixed the problem.


Anyway, just sharing the information.

Posted in SQl Server, Tableau | Leave a Comment »

Understanding the similarity between sets used by the Fuzzy lookup component in excel

Posted by Diego em Julho 17, 2013

Use the set (A, B, C) and (A, C, D)


To make sure they are applied the correct weights I SET the “CustonTokenWeightsRowSetName” to an excel table. The values are the same as the ones on the pdf that comes with the Fuzzy lookup download.



So, if you read the explanation on the pdf, you will see that the Similarity is calculated by the value of the intersection divided by the value of the union of the sets, which in this case would be:

(A+C) / (A + B + C + D) 

5 / 17



But if you run it, you will get the value of 0.4588.

The reason is that the pdf leaves out the explanation on the concept of the ContainmentBias.


The actual formula to determine how similar 2 sets are is the following:


B*C + (1-B)*J



B: ContainmentBias
C:  Jaccard Containment
(size of the intersection divided by the size of left set)
Jaccard similarity (size of the intersection divided by the size of the union of the two sets) – the 0.294 we saw before


Not that, if you apply a bias of 0, you will annul the (B * C) and since (1 – B) will result in 1, the final result will be the value of J:



By setting the default value of 0.8, we would have the following:

B: 0.8
C: (A + C) / (A + B + C)    
à 5/10 à 0.5
J:  (A + C) / (A + B + C + D)
à 5/17 à


B*C + (1-B)*J

0.8 * 0.5 + 0.2 * 0.294

0.4 + 0.0588






<ComparisonResult similarity="0.458823529411765">


    <Token domain="Default" id="1" weight="2">a</Token>

    <Token domain="Default" id="4" weight="5">b</Token>

    <Token domain="Default" id="2" weight="3">c</Token>



    <Token domain="Default" id="1" weight="2">a</Token>

    <Token domain="Default" id="2" weight="3">c</Token>

    <Token domain="Default" id="3" weight="7">d</Token>





 If we run the comparison with ContainmentBias = 1, we get a result of 0.5

You can easily see that as close to 1 the ContainmentBias is, more similar the sets look like. That’s because the ContainmentBias is actually a penalty for tokens in the right set who are not present in the left set where 1 means no penalty (more similarity) and 0, full penalty (no similarity).










Posted in BI, DQS | Leave a Comment »

SQL Server 2012 – How To change a Analysis Service Instance Type 2

Posted by Diego em Junho 5, 2013

This is a FYI regarding this post where I was talking about changing a SSAS instance from multidimensional to tabular.

If you do that and try to create a SSAS Tabular project on SSDT with that SSAS instance as the “default workspace server” you will get this error the moment you click the .bim file:




Error Message:


This file cannot be opened. You can only open files that are compatible with Microsoft SQL Server 2012.


An error occurred while opening the model on the workspace database. Reason: This file cannot be opened. You can only open files that are compatible with Microsoft SQL Server 2012.


So yeah, even though you can change a SSAS instance from multidimensional to tabular, I wouldn’t do that. In my case, I had to uninstall the multidimensional one that I changed to tabular and install a new tabular one

Posted in BI, I.T., SSAS | Leave a Comment »

Error validating version

Posted by Diego em Maio 14, 2013

Error trying to validate or commit versions:


Server Error in ‘/’ Application

An error occurred while processing message request type ‘ValidationGetRequest’. See exception details for more information.



This happens because the account which the IIS Application Pool is running does not have sufficient privileges on the database.


In my case I made it db_owner on the DB, which is not enough. Microsoft support says that:

 “Sometimes, Windows accounts may have that permission in SQL Server. However, by default, new accounts are not granted that permission.”


To solve that you can either make the login a sys.admin (which I did just to test if it works – wouldn’t recommend) on the server or explicitly grant the view state permission:





Posted in BI, I.T., MDS | Leave a Comment »

SQL Server 2012 – How To change a Analysis Service Instance Type

Posted by Diego em Março 27, 2013


1)      You will “loose” all your databases. They can’t simple be migrated from one type to another so delete or detach them all

2)      Stop the SSAS Service

3)      Go to the SSAS config folder (In my case C:\Program Files\Microsoft SQL Server\MSAS11.MSSQLSERVER\OLAP\Config)

4)      Open the msmdsrv file and change the value of the DeploymentMode tag from 0 to 1


5)      Restart your server


If you are wondering why it is DeploymentMode=2 for tabular and not 1, DeploymentMode=1 is for PowerPivot for SharePoint instances.

Posted in BI, SSAS | Leave a Comment »

SSRS – Execution Log

Posted by Diego em Março 26, 2013

Few details on the SSRS Execution log table and views

1)      Log Table: ExecutionLogStorage: Contains the raw log data. Should use one of the views that sits on top of the table.


2)      Views:

·        dbo.ExecutionLog:  for backwards compatibility

·        dbo.ExecutionLog2:  for SQL Server 2008

·        dbo.ExecutionLog3:  for SQL Server 2008 R2  (same as ExecutionLog2, with 2 fields renamed:  ReportPath is now ItemPath, and ReportAction is now ItemAction)


3)      Main Columns:

a.     Request Type
0:  Interactive
1:  Subscription
2:  Refresh Cache

b.    Report Action

        1:  Render
2:  Bookmark Navigation
3:  Document Map Navigation
4:  Drill Through
5:  Find String
6:  Get Document Map
7:  Toggle
8:  Sort
9:  Execute


c.     Source

      1:  Live
2:  Cache
3:  Snapshot
4:  History
5:  Ad Hoc (i.e., Report Builder) 
6:  Session (i.e., another request within existing session) 
7:  RDCE (i.e., Report Definition Customization Extension)


Other blogs:

·       Good tips, especially on the “times”:

·       MSDN:

·       MSDN – Views:


Posted in BI, I.T., SSRS 2008 | Leave a Comment »