SjakalAx: 2010

Thursday, November 11, 2010

the noOfDecimals property

The noOfDecimals property caused some sensation over the last couple of weeks on the projects we're working on. A little diggin' resulted in this post:

The noOfDecimals property can be set on an EDT of type real and (overridden) on a form/reportRealControl. The help on the noOfDecimals property tells us 'number of decimals displayed'.

Does this mean it only shows a rounded value when being displayed and is the actual value stored in the DB much more precise?
Or does it actually round off and is the rounded value also what's in the DB?

Well … both are true. Kind of anyway … check out the example below.

First of all, I've set the noOfDecimals to 4 on the SalesPrice EDT.
Then I created a 3 custom of EDT's (noOfDecimals = 0, 2 and 4, all extending EDT 'price'), a table and a form.

A little explanation on how the form works:
- The 'auto calc fields' checkbox indicates whether the fields in between 'quantity' and 'code calculated' are calculated in code on insert/update based on the sales price and the quantity (salesprice * quantity).
- In the third record, I have manually entered '10/3' on all fields, which is translated to 3.33333… by Ax
- For the two columns to the left of the 'code calculated' columns, the noOfDecimals property is overridden on the realFormControl :
- EDT set to 4, formcontrol set to 2
- ETD set to 2, formcontrol set to 4
- The last column shows a value (hardcoded in insert/update of the table) of 0.12345678901234567890, but with noOfDecimals set to zero for the table field.

The result looks like this:

It really does seem like the noOfDecimals property does what's to be expected: values are rounded to the number of decimals as set up on the EDT or form control.

Let's have a look in the DB:

- Only the record where autoCalc = false match the values as shown in the Ax form.
- The first two records completely ignore the noOfDecimals property.

Values that are calculated in X++ code are stored in the DB with precision of 12 decimals. All fields of type real are defined in SQL as 'numeric(28,12)' where 12 is the number of decimal places (and 16 places are left for the integer value).
Values that are entered on the form are rounded based on the noOfDecimals property.

Have a look at the values recorded during debugging of the update-event on the table:

All variables of base type real - regardless of the EDT and the noOfDecimals - in X++ code are treated as a 'real' with a precision of 16 decimals. Once written to the DB they are rounded to 12 decimals.
I don't get why my 10/3 has a precision of 15 decimals, while the hardcoded value goes up to 16 decimal places ...

Anyway, the statement 'number of decimals displayed' is correct … but you can't be sure if the value in the DB is more precise or not.

Or can you? Almost all calculations go through logic (mostly static methods on the currency table) applying rounding set up in the company parameters, currency parameters, tax codes, … so yes, actually you can.

Also see http://community.dynamics.com/product/ax/f/33/p/42031/76024.aspx if you're thinking about changing the number of decimals throughout Ax.

Another thing I noticed: whatever numeric value you pass to infolog or print, only 2 decimals are shown (rounded off).

Download the Ax EDT's, table and form I used above here.

Friday, October 22, 2010

AOS and clustering

hi,

There seems to be quite some vagueness when it comes to Ax and AOS clustering.
A few statements to (hopefully) clear things up a bit:

Single AOS

The simplest configuration is an Ax implementation with one single AOS server.

Extra AOS servers can be added in scenarios where you want to increase availability or scale up the Ax environment to handle a larger number of users/connections.
Several possibilities from that point on:

AOS side by side

By default multiple AOS's in a single Ax environment will exist side by side. They do know about each others existence, sync their caches, try not to hamper each other too much, …
A user connects to a specific AOS as defined in his client configuration. In case there is more than one AOS defined in the configuration the first available will be used.

Adding all available AOS's to one client config file would probably be a smart thing to do. If you only add one node of the cluster to your config file, and this one node is down, you won't be able to connect. If you add all - let's say - 4 nodes, the client config will connect to the first available.

Unless you want to control they way users connect and have a specific department or region connect to a specific AOS of course. I'm thinking latency, bandwidth, ...

Clustered AOS
You can create a load balancing cluster (administration - setup - cluster configuration) and link AOS instances to it. Again two options:

Cluster without load balancer

The naming here is a bit tricky, even if there is no dedicated load balancer set up in a cluster, there will still be load balancing: all AOS's in a cluster implicitly act as load balancers
A user connects to a any of the AOS's as defined in his client configuration. He cannot predict which AOS he will eventually host his session.

The only criteria maintained to balance the load is the number of clients (including business connectors, worker threads, ...) connected to an AOS. If a cluster has two nodes: AOS A and AOS B and A has 3 client connections and B has 4. The next connection will most likely be hosted by A, even if the connecting user's client config only has AOS B in the list of available AOS instances. Most likely, because if AOS A does not respond (in a fashionable time), it will try to connect to B.
In this same example it is possible the load (CPU, memory usage, ...) created by those 3 connections on AOS A is much higher than the load generated by the 4 connections on AOS B. Still the next connection will be hosted by A.

Therefore the client config should have all AOS's listed. You're not sure which AOS will host your session, but you do want all entry point available.

Cluster with load balancer

If you set up an AOS instance to be a load balancer (checkbox 'load balancer' in server configuration) you dedicate this instance to spread incoming user connection among the non-load balancing AOS's. Ideally the one with the lowest number of active connections in the cluster will host the first incoming connection. That's all the dedicated load balancer will do for you.

Frankly I don't see the point of setting up a dedicated load balancing server. Maybe when you're dealing with a huge number of concurrent users, logging on and off all the time and you don't want a 'real' AOS to waste time handling those? On the other side: a dedicated load balancer does not consume an AOS license. So it's actually free.

The client config would only contain the dedicated load balancing AOS instance(s), because that's the only valid entry point here. But then again, if your load balancer is down …
Therefore:
- Is it possible to directly connect to a non-load balancer AOS in an AOS cluster with dedicated load balancer? Yes it is.
- Does a non-load balancer AOS in an AOS cluster with a dedicated load balancer also balances load?
Yes it does.

So even if you've got a dedicated load balancer … setting the non-dedicated AOS's instances from the cluster in the client config may be a satisfying plan.

You can go all fancy an set up difference clusters on one Ax environment, for example to split the online and batch processing. Most likely you want client config files per AOS cluster to control who's connecting to which cluster.

Failover

A word on failover might also be in place. An AOS cluster does not mean failover capabilities. It just indicates two or more AOS instances are grouped. That's about it. No automatic moving of sessions from AOS A to AOS B if server A crashes, users will lose their connection to the AOS, manual restart of Ax client by user is required, …
All bad news? No ma'am: an AOS cluster makes sure the load is equally spread over the different nodes. You can add or remove nodes to the cluster online. But in fact … that's about it … and besides the load spreading, a cluster doesn't do more than a well considered config file does.

Other things you might not yet know regarding AOS's:
- Each active not-dedicated loadbalancing AOS instance requires one AOS license. You can install as many AOS instances as you wish, but the number of active instances is limited by the license. You cannot have more active AOS instances than your license allows on the same Ax environment.
- In the online users form you only get to see the SPID's of the users that are on the same AOS as you are
- You can prevent users to connect to an AOS using the 'reject new clients' button on the 'server instances' tab of the online users form. Hit 'accept new clients' to undo this.

Sunday, October 10, 2010

How to detect the cause of a deadlock

hi,

This is the situation I was in last week:

Four Ax processes (all batch threads) executing the same logic (executing the same class).
Different processes were blocking each other (deadlock situation).

All steps below were taken during the deadlock situation.

The quick and drastic solution would be to kill some processes in SQL server ... but it would be a matter of time before the same would happen again.
The better solution implied some processes were hanging in the production system during the research, but we did get the time to sort this out decently.

Overview of the SPID's, based on the online users form:
Ax session 14 -> SPID's 59, 66, 66, 6
Ax session 13 -> SPID's 67, 68, 69
Ax session 12 -> SPID's 57, 58, 54
Ax session 11 -> SPID's 56, 60, 65

A dump of the blocking sessions on SQL server (based on SP_WHO2, only the relevant SPID's - the ones blocked + the source SPID for the blocking - are shown here):

A quick look would point to SPID 57 as process causing the blocking.
I repeatedly ran 'SP_WHO2 57' over about 30 seconds, and found the CPUtime nor the diskIO changing over time.
On top SPID 57 has status 'sleeping'.

Another view on this situation, including the Ax sessions:

Ax session 14 is blocked by Ax session 12 (because SPID 66 is blocked by 57)
Ax sessoin 13 is blocked by Ax session 11 (because SPID 69 is blocked by 65)
Ax sessoin 12 is blocked by Ax session 11 (because SPID 54 is blocked by 65)
Ax sessoin 11 is blocked by Ax session 12 and 14 (because SPID 56 is locked by 57 and 65 is blocked by 66)

This very much looks like a deadlock situation : 12 blocked by 11, 11 blocked by 12
And even more: 11 is blocked by 14, and 14 is blocked by 12

Still: Ax session 12 with SPID 57 seems to be the cause of the problem
but SPID 57 is sleeping ... and not moving at all ... (remember CPUTime, DiskIO)

Since SPID 57 is sleeping, I'm guessing we need to look a the other SPID's from Ax session 12: SPID 54 is blocked by SPID 65, SPID 65 is blocked by SPID 66.
SPID 66 from Ax session 14 seems to be the one to focus on, because
- this is the one blocked by SPID 57 in Ax session 12.
- if we manage to unblock Ax session 14, Ax session 11 will most likely be able to continue (because Ax sessoin 11 is only blocked by SPID 66 from Ax session 14)
... which will resolve (or even better, if we manage to trace down the problem to it's origin: prevent the possibility on) the deadlock situation.

In the Queyr Analyser 'sp_lock 66' gives me:

So the one with status 'wait' is the interesting one.
This means SPID 66 wants to update (U) data in object 2062018477, but it's blocked by SPID 57 (SPID 66 blocked by SPID 57)

'sp_lock 57' in the Query Analyser gives us a whole lot more, but the only locks in common with SPID 66 are the ones on objectID 20062018477.

To find the locks that SPID's 57 and 66 have in common, I used the statement below :
(based on what's described at http://weblogs.sqlteam.com/mladenp/archive/2008/04/29/SQL-Server-2005-Get-full-information-about-transaction-locks.aspx)

SELECT locksForSPID57.*,
'<->' as '<->',
locksForSPID66.*
FROM
(SELECT L.request_session_id AS SPID,
   DB_NAME(L.resource_database_id) AS DatabaseName,
   O.Name AS LockedObjectName,
   P.object_id AS LockedObjectId,
   L.resource_type AS LockedResource,
   L.request_mode AS LockType,
   L.request_type,
   L.request_status
FROM sys.dm_tran_locks L
   JOIN sys.partitions P ON P.hobt_id = L.resource_associated_entity_id
   JOIN sys.objects O ON O.object_id = P.object_id
   JOIN sys.dm_exec_sessions ES ON ES.session_id = L.request_session_id
   JOIN sys.dm_tran_session_transactions TST ON ES.session_id = TST.session_id
   JOIN sys.dm_tran_active_transactions AT ON TST.transaction_id = AT.transaction_id
   JOIN sys.dm_exec_connections CN ON CN.session_id = ES.session_id
   CROSS APPLY sys.dm_exec_sql_text(CN.most_recent_sql_handle) AS ST
WHERE resource_database_id = db_id()
AND L.request_session_id = '57') as locksForSPID57,
(SELECT L.request_session_id AS SPID,
   DB_NAME(L.resource_database_id) AS DatabaseName,
   O.Name AS LockedObjectName,
   P.object_id AS LockedObjectId,
   L.resource_type AS LockedResource,
   L.request_mode AS LockType,
   L.request_type,
   L.request_status

FROM sys.dm_tran_locks L
   JOIN sys.partitions P ON P.hobt_id = L.resource_associated_entity_id
   JOIN sys.objects O ON O.object_id = P.object_id
   JOIN sys.dm_exec_sessions ES ON ES.session_id = L.request_session_id
   JOIN sys.dm_tran_session_transactions TST ON ES.session_id = TST.session_id
   JOIN sys.dm_tran_active_transactions AT ON TST.transaction_id = AT.transaction_id
   JOIN sys.dm_exec_connections CN ON CN.session_id = ES.session_id
   CROSS APPLY sys.dm_exec_sql_text(CN.most_recent_sql_handle) AS ST
WHERE resource_database_id = db_id()
AND L.request_session_id = '66') as locksForSPID66
WHERE locksForSPID57.DatabaseName = locksForSPID66.DatabaseName
AND locksForSPID57.LockedObjectId = locksForSPID66.LockedObjectId

I'll probably get jeered by the SQL experts for this kind of statement, but it does the job.

The next step then is to find out which table is behind object 2062018477.
'SELECT object_name(2062018477)' tells us this is the numberSequenceTTS table

This table is used by Ax for numbersequences defined as 'continuous'.

We would like to see the actual data that is being locked, but there seems to be no way to see the specific record(s) (KEY 6a018d6c5a90 I'd guess).
Therefor we're looking into the page (PAG 1:76393) that is not actually part of the problem (only KEY (ec0110e62862) with status WAIT is), but since SQL Server has this lock hierarchy (http://msdn.microsoft.com/en-us/library/ms189849.aspx) the page will include the data from KEY (ec0110e62862).
Again, if there's anyone out there who does know a way to identify wich specific data (resource type KEY or RID instead of complete pages) is held locked based on the sp_lock output ... please let me know!
For now, up to the data page:

DBCC traceon(3604) -- enables a specific trace flag to get the output in the console
DBCC PAGE(5, 1, 76393, 3) -- 5 = the ID of the DB we're working in, 1 is the datafile, 76393 the page, we get this from the SP_LOCK output
DBCC traceoff(3604) -- disables the specific trace flag (cover up your tracks)

A huge dump is created by 'DBCC PAGE' but we do recognize some things:
a compare between the 'TRANSID = ' lines from the dump and the records currently in the numberSequenceTTS table

SELECT *
FROM NumberSequenceTTS WITH (READUNCOMMITTED)

shows us 5652282410 and 5652282411 are the transID values to dig in to.
These are the onces tha appear in both the NumberSequenceTTS output and the page dump.
Or in other words: these are the records in use by SPID 57 while SPID 66 wants to use them as well.

This statement:
SELECT *
FROM NumberSequenceList WITH (READUNCOMMITTED)
WHERE EXISTS (SELECT RecId FROM NumberSequenceTTS WITH (READUNCOMMITTED) where NumberSequenceTTS.TRANSID = NumberSequenceList.TRANSID)
AND TRANSID != 0

tells us the lock is related to the TOB number sequence ... which is set to continuous .. but shouldn't be (or it is not strictly required anyway).

So the problem here is a number sequence that was set to be continuous and several processes fetching a new number from this number sequence simultaniously.
I adviced to make this specific number sequence not continuous and review all other continuous number sequences.

The solution is being applied ... fingers crossed ...

Monday, September 6, 2010

writing jobs

hi,

Also writing ad-hoc jobs to correct data or straighten faulty situations? These one-time jobs that will be used for weeks, months or even years? Put a stop to these jobs: refuse to write them!
Instead make decent and expandable user-proof solutions based on runBaseBatch for example. Put a little dialog on, make it configurable, useable by non-technical-skilled users, ....

Why going to all that trouble? We all know it is not just a one-time job ... it will start a life after you created it. Give your coding an honourable life, even the lousy jobs!

If you do stick to writing a job, at least make it a respectable one:
- don't put hardcoded variables in there, use a dialog (which is as simple as in the code example below),
- add an info to inform what's been done (how many records, how much time, ...)
- use try-catch when appropriate
- give your variables meaningfull names,
- some comment never hurts,
- ...

static void LAX_Job_Dialog(Args _args)
{
   Dialog dialog;
   DialogField dlgCustName;
   CustName custName;
   CustTable custTable;
   Counter cnt;
   ;

   dialog = new Dialog('find customer named ...');

   dlgCustName = dialog.addField(typeId(CustName));

   if (dialog.run())
   {
   custName = dlgCustName.value();
   if (custName)
   {
   // search for any customer with the entered value in the name
   custName = '*' + custName + '*';
   while select accountNum, Name
   from custTable
   where custTable.Name like custName
   {
   info(strFmt('%1 - %2', custTable.AccountNum, custTable.Name));
   cnt++;
   }
   }
   info(strFmt('%1 customers found', cnt));
   }
}

P.S. Jobs are executed on the client-side. Drag your job into a menu-item, set the 'runon' property to 'server' and its speed will most likely boost. Especially jobs with while select loops in them.

Wednesday, September 1, 2010

debugging tip

hi,

With some errors in Ax, I somethimes just don't know where to start debugging.
In those situations I use the call stack as a starting point to give me an idea on where to place breakpoints.

Here's what I do:

1 - create a new static method LAXGetCallStack in the Global class:
// this method returns the X++ call stack, the _offset is to skip entries at the top of the stack

client server static str LAXGetCallStack(int _offset = 0)
{
   #xppTexts
   container callStack = xsession::xppCallStack();
   int k;
   int lineNumber;
   str entry;
   str path;
   ;

   for (k = 1 + _offset ; k <= conlen(callSTack) ; k = k + 2)
   {
   path = conpeek(callStack,k);
   lineNumber = conpeek(callStack,k+1);
   entry += (entry ? #newline :'') + strfmt('-> %1 Line - %2', path, lineNumber);
   }
   return entry;
}

2 - modify the infolog.add method

a - add this to the declaration section

boolean stackTraceActive;

str strInfoTxtToDebug;

#xppTexts

b - add this at the beginning of the method (before switch(loglevel))

stackTraceActive = true;

strInfoTxtToDebug = 'debug';

strInfoTxtToDebug = '*' + strInfoTxtToDebug + '*';

if (stackTraceActive)

{

if (_txt like strInfoTxtToDebug)

{

_txt += #newLine + 'STACKTRACE for debugging:';

_txt += #newLine + LAXGetCallStack(4);

}

If I get to investigate situations where the only clue is some strange info/warning/error, I use (a specific part of) the text of the infolog as a condition to in the infolog.add method to get the stacktrace an thus the calling objects.

This can be improved by making the stackTraceActive variable a user settable parameter.

The same goes for the strInfoTxtToDebug variable.

list table properties

hi,

Ever got the question to list all tables that have a certain property set (caching, tableGroup, configurationKey, securityKey, ...), but don't feel like browsing throug the AOT manually? I do. This script enables you to export the table properties you want to study into an xls (well, a tab-delimited) file. The result then looks like this.

The same concept can be used to list properties of other objects.

Friday, August 27, 2010

datawarehousing tips

hi,

If you're implementing Ax and need to assist the guys in charge of the datawarehouse/BI, here are some tips:

- do insist on setting up a separate datawarehouse/BI database. "But the data needed for the DW is not that much, the queries are simple and not blocking, they will not obstruct the live Ax DB in any way." Uhu, and then on an unpleasant day it is blocking and obstructing. Don't give in to this temptation. Keep it clean and simple: the Ax and any other DB need to be separated DB's.

- since Ax 5.0 and the UTCDateTime datatype, the date/time you see in Ax is not necessarily the date/time stored in the DB. The Ax kernel handles the timezone offset on forms and reports (not in code!). If the datawarehouse is to produce reports, do take the timezones into consideration (for instance if totals are calculated per day: end of day may very well be 10PM in the DB, depending on the timezone)

- if tables are shared in virtual companies, the Ax kernel handles this automatically. In the DB on the other hand it's a hell of a job to link the correct dataAreaId's. This job gives an overview of which table is included in which table collection as a result the translation dataAreaId to virtual dataAreaId.

- security: Ax handles a lot of security (RLS, AOS aothorisation, user group permission, ...). This is (if you don't anticipate) not the case if the data from the Ax DB is transferred (on DB level) to a reporting/DB/BI database. Make sure sensitive/confidential information is not made public in that way.

Tuesday, August 24, 2010

creating recId indexes on SQL 2005

hi,

The Dynamics Ax Performance Team Blog gives a pretty good description why you want to add a recID-index for tables having created/modifiedDateTime property activated (on a SQL 2005 DB backend).

But you don't want to do this manually, do you?

Thought so. You can find the script I used to add these indexes here.

In short, here's what the script does:

- on each table in the AOT it checks whether the created/modifiedDateTime fields are active and the createRecIdIndex property is set not set

- for tables that meet these conditions: it creates a brand new index on recId

When a new index is created I'm also adding the modified/createdDateTime field. The Performance team blog states

When you Insert into a table with CreatedDateTime field on SQL 2005, the following SQL statement will be issued immediately following the Insert:

SELECT CreatedDateTime From %table% WHERE RecID=%recId% AND DataAreaId=%dataAreaId%

The same goes for the modifiedDateTime field with an update statement.

So my reasoning is to include the created/modifiedDateTime field right away in the newly created index.

This should enable SQL to fetch all required data using just this single index.

More and wider indexes slow down insert/update/delete operations: true ... but if this can speed up each and every insert/update, this may be a well-considered decision.