December 15, 2013

Northern Ireland’s Tech Bubble

When I moved to Northern Ireland in 2009 I was told by many people, both within the industry and outside, how fortunate I had been to find a job in NI within days of starting to look. The job market that I have known, across all industries and verticals, has always been fairly depressed in NI, but in the last number of years, probably starting around, or a little before, the time that I moved to Belfast, “tech” has started to be seen as the saviour of Northern Ireland’s ailing economy.

I’m assuming you know a little about the economy here. It has been well documented elsewhere: from the reliance on the “block grant” to our disproportionately large public sector, many have put in their two cents.

My fear is that, in recent years, the public sector body that provides inward investment into NI, with the goal of building a more vibrant and sustainable private sector, has stumbled upon “tech” as a quick win to lift the ailing economy. The rationale goes that if we incentivise foreign companies to open offices in Belfast, the jobs they create will start to relieve unemployment rates and provide much needed tax revenue. This would certainly be ideal were the jobs distributed evenly across many verticals. However, I believe that, because tangible savings are so easily demonstrated to “tech” companies, a disproportionate number of “tech” jobs is being created in the economy.

The ease with which these jobs are being brought to NI is built on two fundamental pillars:
1) government support – through its agencies the government supports organisations, both foreign and domestic, to grow in NI. This usually comes in the form of a “per job” lump sum, but can also be bolstered through a variety of grants and awards. That these are often nebulous and hard for local SMEs to access or understand is a topic for another time.
2) resource cost – be under no illusion: the pitch to the companies moving in is heavily based on the low cost of operating from NI, that is, salaries are cheaper.
I’m sure, like me, you can see the issues with basing the growth of an entire economy on two such factors. Both of these are things that can easily change.

Since moving here I have observed an, albeit anecdotal, 100-150% increase in base salaries for “tech” jobs, especially at the more skilled end of the market. You may wonder why I persist in putting “tech” in quotation marks. The reason is this: although the term is bandied about euphemistically, it doesn’t relate to a rounded spread of “tech” professions. Sure, there are hangers-on, but where “tech” jobs are announced, read “programmers”. That’s where the majority of these jobs are, and where there isn’t proportional growth across the market there is the risk of further damage. You see, what is happening is that skilled developers are so highly in demand that salary negotiations have become heavily weighted in the developer’s favour. Not only that, but there has been a dramatic rise in interim resourcing. When I dipped my toe in the contract market in 2011, prior to starting a business, there were no full-time contract recruiters in Belfast; there are now 7 that I know of, and, no doubt, many more coming onto the scene soon. Now, I have no problem with interim resourcing – in fact, it’s a great way for small businesses to scale up and de-risk – however, the market has become one of opportunism rather than permanence, and it will be hit hard if the bubble bursts. On top of that, because of the supply-and-demand-driven salary growth in the programmers’ market, other related markets naturally rise, whether or not they have a supply issue.

Opportunistic agents have an economic incentive to place people in jobs they won’t be happy in, or in a place they know they can encourage a move on from shortly. After all, churn, rather than permanence, is what pays their bills in this market. Who can blame them?

So my fundamental fear, then, comes down to this: as the cost base rises, and once subsidies run dry – and they will run dry – who will stay? Add into that equation a politically volatile landscape that certainly worries outside investors, and I believe we may be on the precipice of seeing that bubble burst.

To me, the bubble bursting may not look significant at first: a small closure, a branch of an SI, or a division of a financial player. But as the market floods with candidates – even if it is a small flood – rates will lower, and any economy that has been built around the artificial inflation in this market will start to falter. The “tech” world is small, and failure carries faster and farther than (moderate) success. A lowering in rates doesn’t mean we will get another chance at this.

In some ways I feel this is inevitable now, and that I’m observing from the outside, with no way to touch the inner core that is driving this machine towards that end. However, I feel there are certainly things that can be done, individually and collectively, to try to mitigate some of this:

1) productize / find your niche – find what you are good at and build expertise around it, whether that is a product or an exceptional service offering. Don’t aim to be the best in NI, aim to be the best in the world. It’s only with an attitude of wanting to be the best that we will keep business in NI. If we become an interchangeable part, able to be outsourced to the next cheap location, we lose.

2) work together / support local – collaborate, help each other out. Back-scratching doesn’t only help you, it helps entire communities, and this can be a differentiator for the tech community in NI. I’m talking to myself, in the main, here. There are many people doing excellent things, from user groups, to dojos, to collectives. Get involved, and if there’s no one championing your niche, start something.

3) upskill the young – from the junior developer in the office, to the 13 year old who wants to learn to code, try to help. The more skill we have in this area, the less likely we are to see a saturated market. This is a market that, regardless of outside interest, can grow and grow. But it will only grow while the skills are there to make it grow.

The more ways we differentiate our “tech” community, the more value there will be for those on the outside. But lastly, I think the answer to an ailing economy can only come from within. So I hope that there is a shift, and that more is done to support local people who want to start something; and equally that people who don’t necessarily want to be the starters and the entrepreneurs can gather around them and cheer them on, because, at the end of the day, that is what will save the economy they are in.

November 13, 2013

Operation ShareLove – Help Typhoon Haiyan victims and SharePoint Experts will help you

Dux Raymond Sy, an MVP in the SharePoint community, has come up with a unique and brilliant idea to encourage people to help support the Typhoon Haiyan victims at this time. You can read more about his response to this tragedy at http://meetdux.com/2013/11/12/operation-sharelove-help-typhoon-haiyan-victims-sharepoint-experts-will-help-you-rescueph/

If you’re a SharePoint expert please think about giving your time to support his efforts, and if you are a company/individual, please do give to the cause.

November 4, 2013

Continuous deploy of multiple Azure Cloud Services within a single solution using TFS Online

Team Foundation Server is a fantastic configuration management system that integrates with the full Microsoft development stack.

With Windows Azure and its TFS Online integration, continuous deploy can easily be set up and customized. However, one frustration I came across is that I have a solution containing a worker role and a web role. I want these deployed to separate Cloud Services within my Azure subscription, but the AzureContinuousDeployment build process only supports deploying the first Azure project it finds in the solution.

In order to get around this, we can modify the AzureContinuousDeployment process to allow the developer to dictate which Cloud Service project to deploy. The first step to do this is to duplicate the AzureContinuousDeployment.11.xaml workflow. You can do this by opening the Source Control Explorer in Visual Studio, and navigating to the BuildProcessTemplates folder.

You will see, in the right hand pane, a list of the Build Process Templates that are available to you.

(Screenshot: the Build Process Templates folder)

You can copy AzureContinuousDeployment.11.xaml and use the copied Build Process Template to create the alternate template that will allow you to dictate which Cloud Service is deployed.

When you open the copied xaml file you will see the, hopefully familiar, Windows Workflow designer. What we are going to do is modify the process flow to allow the definition of the Cloud Service to deploy.

In the workflow, find the block that defines the Cloud Service to deploy; it will look like this:

(Screenshot: the “Find the Azure Project” activity)

You’ll notice that the action that finds the Azure Project in the solution has no inputs; this is because it simply looks for the first Azure Project in the solution. In order to target a different one, we need a way to supply the MSBuild target name of the Azure Project we want.

To do this we first need to add an argument to the build process template that can then be configured in the build definition.

Open the argument editor by clicking “Arguments” at the bottom of the content pane and add a new String argument – I’ve called mine AlternateMSBuildTarget.

Next, we can change the block that defines the Azure Project to check whether AlternateMSBuildTarget has been set and, if it has, set ccProjName – the variable that captures the Azure project name – to that value.
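
Purely as an illustration of the shape of this change (your template version and exact activity layout will differ; everything here other than AlternateMSBuildTarget and ccProjName is just an example), the argument declaration and the override logic might look something like this in the XAML source:

<!-- Declared alongside the template's existing arguments -->
<x:Property Name="AlternateMSBuildTarget" Type="InArgument(x:String)" />

<!-- Inside the block that resolves the Azure project: only override ccProjName
     when the build definition supplies a value -->
<If Condition="[Not String.IsNullOrEmpty(AlternateMSBuildTarget)]">
  <If.Then>
    <Assign DisplayName="Use alternate MSBuild target">
      <Assign.To>
        <OutArgument x:TypeArguments="x:String">[ccProjName]</OutArgument>
      </Assign.To>
      <Assign.Value>
        <InArgument x:TypeArguments="x:String">[AlternateMSBuildTarget]</InArgument>
      </Assign.Value>
    </Assign>
  </If.Then>
</If>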

We can now save this build process template, and select it in the build definition.

Finally, we need to choose the project that we want to deploy from the build definition.

We do this by setting the AlternateMSBuildTarget argument to the target name that we want to deploy. This is usually the name of the project with _ (underscore) replacing . (period). So, for example, if your project name was “Ebenezer.Web.Azure”, you would use “Ebenezer_Web_Azure”.

June 6, 2011

NullReferenceException when using a custom WorkflowTask derived content type

If you are encountering an “Unknown Error” or a NullReferenceException when calling Update() on a task item that uses a custom content type derived from WorkflowTask, it’s worth checking that your ContentType definition includes the FieldRefs tag, even if it is just a self-closing tag.
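
For illustration only – the content type ID and names below are made up, and yours will differ – a minimal definition derived from WorkflowTask (0x010801) that includes the (empty) FieldRefs element might look like this:

<Elements xmlns="http://schemas.microsoft.com/sharepoint/">
  <ContentType ID="0x010801004F9458AB25D6421E8DD2A433B0F7E2A1"
               Name="MyWorkflowTask"
               Group="Custom Content Types"
               Description="Task content type used by the custom workflow"
               Version="0">
    <!-- Even if you have no field references, include the tag -->
    <FieldRefs />
  </ContentType>
</Elements>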

I encountered this problem recently, and it manifested itself when calling Update() on the list item from the custom Edit form.

May 10, 2011

Creating a lazy loading UIImageView for iOS

The UIImageView in the iOS SDK is a great class that allows developers to quickly and easily handle the display of an image. The UIImageView is simply a container view with a single UIImage displayed within it. The nice part is that it handles all the display and sizing of the UIImage, so that developers don’t need to spend a long time playing around with ratios to get the sizing they require.

We can initialise the UIImageView directly with an image, and a UIImage can itself be initialised from NSData. It would seem we could simply get NSData from an NSURL and, bingo, we could load images easily from remote addresses. However, as you will find out – although this works, it essentially blocks the UI display thread until the image has loaded. This means that the display does not update until the image has been fetched from the web, and will present the user with what seems like an unusable pause.
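
As a quick sketch of that naive, blocking approach (the URL is just a placeholder):

NSURL *imageURL = [NSURL URLWithString:@"http://example.com/picture.png"];

//dataWithContentsOfURL: is synchronous, so this blocks the calling (UI) thread until the download completes
NSData *imageData = [NSData dataWithContentsOfURL:imageURL];

UIImageView *imageView = [[[UIImageView alloc] initWithImage:[UIImage imageWithData:imageData]] autorelease];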

To get round this we can implement lazy loading, where the request to retrieve the image is fired off asynchronously, allowing the UI display thread to continue until the image is received. Apple has an example in their dev knowledge base that allows you to do this directly on models. In this article we are going to look at doing this directly on a UIImageView.

In order to do this, we first create a header file for a class that inherits from UIImageView. We can call this UILazyImageView. It will, at first, look like the below:

#import <UIKit/UIKit.h>


@interface UILazyImageView : UIImageView {
}

@end

In order to implement the delegate for NSURLConnection, which we will use to do the asynchronous fetch of the image (as in last week’s post on creating an asynchronous XML parser for iOS applications), we will need an NSMutableData variable to store the data retrieved from the URL. We use NSMutableData rather than NSData because, being mutable, it lets us append data as it arrives.

We will also need a new initialisation method, to initialise with a URL; and this will call off to a method to load from a URL.

Having input these the header file will now look like this:

#import <UIKit/UIKit.h>


@interface UILazyImageView : UIImageView {
    NSMutableData *receivedData;    
}

- (id)initWithURL:(NSURL *)url;
- (void)loadWithURL:(NSURL *)url;

@end

Next, in our code (.m) file for UILazyImageView we can add the init method, which calls the load method, as follows:

- (id)initWithURL:(NSURL *)url
{
    self = [self init];
    
    if (self)
    {
	    receivedData = [[NSMutableData alloc] init];
        [self loadWithURL:url];
    }

    return self;
}
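
Since this is pre-ARC code and receivedData is owned by the instance, it’s also worth releasing it when the view goes away; a minimal dealloc might look like:

- (void)dealloc
{
    [receivedData release];
    [super dealloc];
}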

Next, we need to implement the loadWithURL method. This needs to fire off an asynchronous connection; the work of dealing with the image will only happen once the connection has completed.

This will look like:

- (void)loadWithURL:(NSURL *)url    
{
    NSURLConnection *connection = [NSURLConnection connectionWithRequest:[NSURLRequest requestWithURL:url] delegate:self];
    [connection start];
}

We now need to implement the delegate for the connection (which has been set to the UILazyImageView). For this we need to implement at least didReceiveResponse, didReceiveData and connectionDidFinishLoading.

We will implement didReceiveResponse to clear the data (as this may be called when 302 redirects occur); and didReceiveData to append to the receivedData item.

Finally in connectionDidFinishLoading we can display the image, from the receivedData.

This will look like:

- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response
{
    [receivedData setLength:0];
}

-(void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data
{
    [receivedData appendData:data];
}

- (void)connectionDidFinishLoading:(NSURLConnection *)connection
{
    self.image = [[[UIImage alloc] initWithData:receivedData] autorelease];
}

If we run this, we will notice it works as planned, but the image seems to appear very suddenly. We can make this a gentler introduction by setting the alpha to 0 before the connection begins, and then animating it to 1 (effectively creating a fade-in effect).

We would do this with code like:

- (void)loadWithURL:(NSURL *)url    
{
	self.alpha = 0;
    NSURLConnection *connection = [NSURLConnection connectionWithRequest:[NSURLRequest requestWithURL:url] delegate:self];
    [connection start];
}

- (void)connectionDidFinishLoading:(NSURLConnection *)connection
{
    self.image = [[[UIImage alloc] initWithData:receivedData] autorelease];
    [UIView beginAnimations:@"fadeIn" context:NULL];
    [UIView setAnimationDuration:0.5];
    self.alpha = 1.0;
    [UIView commitAnimations];
}
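
Using the class from a view controller is then as simple as something like the following (the frame and URL are placeholders):

UILazyImageView *lazyImageView = [[UILazyImageView alloc] initWithURL:[NSURL URLWithString:@"http://example.com/picture.png"]];
lazyImageView.frame = CGRectMake(20, 20, 280, 180);
[self.view addSubview:lazyImageView];
[lazyImageView release];
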
April 28, 2011

Creating an asynchronous XML parser for iOS applications

If you are anything like me, and are a developer who has been working primarily with the .NET platform for the last little while, you may have a bit of a culture shock when coming to iOS development. I have worked with Objective C in the past, but primarily for OS X, and although the paradigms for programming are similar, the APIs available to iOS are slightly more limited. Subjectively, I have to say that Visual Studio is a far better IDE than Xcode – though Xcode has improved massively with version 4.

In this article I am going to explore how to write an XML parser that parses XML from a given URL asynchronously, by implementing the NSXMLParserDelegate protocol and the informal NSURLConnection delegate methods, and using the NSXMLParser and NSURLConnection objects.

If you do not have a good grounding in Objective C, please first go and have a read of a few tutorials; if you have a decent grounding in an object oriented language, you should pick it up fairly quickly – though the memory management side of things may be confusing (and will probably be for a while!), unless you come from a non-GC language background.

For the purpose of this exercise we are going to assume we are tackling a standard RSS XML feed (the SlashDot RSS feed) that we want to represent as a collection of objects to be used in our application.

The first thing we need is an object that we will use for the representation of each item in the RSS.

We can create this as a new object, deriving from NSObject, with instance variables and @property declarations for each of the RSS item fields.

The object’s header file would look something like:

#import <Foundation/Foundation.h>

@interface RSSItem : NSObject {
    NSString *title;
    NSString *description;
}

@property (nonatomic, retain) NSString *title;
@property (nonatomic, retain) NSString *description;

@end

And its simple implementation file would look something like:

#import "RSSItem.h"

@implementation RSSItem

@synthesize title, description;

- (void)dealloc
{
    [title release];
    [description release];
    
    [super dealloc];
}

@end

The above obviously only captures two of our RSS elements, but as a proof-of-concept will be fine, and can be extended to capture more elements.

Now that we have this defined, we can define our RSSParser. Our RSSParser can inherit directly from NSObject, but we will need to declare that it conforms to the NSXMLParserDelegate protocol, which allows us to set it as the delegate object for an NSXMLParser object, so that it can deal with the elements therein.

Our header file for RSSParser would, therefore, look something like:

#import <Foundation/Foundation.h>
#import "RSSItem.h"

@interface RSSParser : NSObject <NSXMLParserDelegate> {
	//This variable will eventually (once the asynchronous event has completed) hold all the RSSItems in the feed
    NSMutableArray *allItems;
    
    //This variable will be used to map properties in the XML to properties in the RSSItem object
    NSMutableArray *propertyMap;
    
    //This variable will be used to build up the data coming back from NSURLConnection
    NSMutableData *receivedData;
    
    //This item will be declared and created each time a new RSS item is encountered in the XML
    RSSItem *currentItem;
    
    //This stores the value of the XML element that is currently being processed
    NSMutableString *currentValue;
    
    //This allows the creating object to know when parsing has completed
    BOOL parsing;
    
    //This internal variable allows the object to know if the current property is inside an item element
    BOOL inItemElement;
}

@property (nonatomic, readonly) NSMutableArray *allItems;
@property (nonatomic, retain) NSMutableArray *propertyMap;
@property (nonatomic, retain) NSMutableData *receivedData;
@property (nonatomic, retain) RSSItem *currentItem;
@property (nonatomic, retain) NSMutableString *currentValue;
@property BOOL parsing;

//This method kicks off a parse of a URL at a specified string
- (void)startParse:(NSString*)url;

@end

You will notice we have NOT specified a protocol for NSURLConnection on this object. The reason for this is that the NSURLConnection delegate methods form an informal protocol on NSObject, so we can simply add the methods we need and set this object as the delegate for the NSURLConnection object too.

So, when we first kick off a parse using the startParse: method, we will need to create an NSURLConnection object, which will kick off the retrieval of the data from the specified URL. Once this data is all retrieved we can then actually parse the XML.

We can use code similar to the following to achieve this:

- (void)startParse:(NSString *)url
{
    //Set the status to parsing
    parsing = true;

	//Initialise the receivedData object
    receivedData = [[NSMutableData alloc] init];
    
    //Create the connection with the string URL and kick it off
    NSURLConnection *urlConnection = [NSURLConnection connectionWithRequest:[NSURLRequest requestWithURL:[NSURL URLWithString:url]] delegate:self];
    [urlConnection start];
}

As can be seen in our creation of the connection object, we have set the delegate to be the current object.

We can then define the methods we need to append the data from the request as it comes back and, once it has all finished, to start the XML parser.

- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response
{
	//Reset the data as this could be fired if a redirect or other response occurs
    [receivedData setLength:0];
}

- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data
{
	//Append the received data each time this is called
    [receivedData appendData:data];
}

- (void)connectionDidFinishLoading:(NSURLConnection *)connection
{
	//Start the XML parser with the delegate pointing at the current object
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:receivedData];
    [parser setDelegate:self];
    [parser parse];
    [parser release];
    
    parsing = false;
}

As can be seen, the implementation of this is very simple. In addition we should implement the method

- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error

to handle the failure of the connection request.
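
A minimal sketch of that method might look like the following (how you surface the error is up to you; logging is simply the least intrusive option):

- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error
{
    //Log the failure and mark parsing as finished so callers are not left waiting
    NSLog(@"Connection failed: %@", [error localizedDescription]);
    parsing = false;
}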

Finally, once we have done this we can define the methods for the NSXMLParserDelegate protocol, so that the item is built and added to allItems correctly.

We can do this with code similar to the following:

- (void)parserDidStartDocument:(NSXMLParser *)parser
{
	//Create the property map that will be used to check and populate from elements
    propertyMap = [[NSMutableArray alloc] initWithObjects:@"title", @"description", nil];
    //Create (or clear) allItems each time we kick off a new parse
    if (!allItems)
    {
        allItems = [[NSMutableArray alloc] init];
    }
    [allItems removeAllObjects];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
	//If we find an item element, then ensure that the object knows we are inside it, and that the new item is allocated
    if ([elementName isEqualToString:@"item"])
    {
        currentItem = [[RSSItem alloc] init];
        inItemElement = true;
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
	//When we reach the end of an item element, we should add the RSSItem to the allItems array
    if ([elementName isEqualToString:@"item"])
    {
        [allItems addObject:currentItem];
        [currentItem release];
        currentItem = nil;
        inItemElement = false;
    }
    //If we are in an item element and we reach the end of an element in the propertyMap, we can trim its value and set it using the setValue method on RSSItem
    if (inItemElement)
    {
        if ([propertyMap containsObject:elementName])
        {
            [currentItem setValue:[currentValue stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] forKey:elementName];
        }
    }
    
    //If we've reached the end of an element then we should scrap the value regardless of whether we've used it
    [currentValue release];
    currentValue = nil;
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
	//When we find characters inside an element, we can add it to the current value, which is created if it is not initialized at present
    if (!currentValue)
    {
        currentValue = [[NSMutableString alloc] init];
    }
    [currentValue appendString:string];
}

As can be seen again, this implementation is very simple, but it requires a slight degree of lateral thinking compared to using a DOM based parser, which is what most programmers from a .NET background will be familiar with.
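
To give an idea of how calling code might drive this (the feed URL is a placeholder, and polling the parsing flag is shown only for brevity – in a real application you would more likely add a delegate callback or notification when parsing completes):

RSSParser *rssParser = [[RSSParser alloc] init];
[rssParser startParse:@"http://example.com/feed/rss.xml"];

//Later, once rssParser.parsing is false, the parsed items are available
for (RSSItem *item in rssParser.allItems)
{
    NSLog(@"%@", item.title);
}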

April 14, 2011

Access Denied error for pages with SummaryLinkWebPart

If you are getting an Access Denied error on WSS 3 or MOSS 2007 for pages containing a (potentially empty) Summary Link Web Part, do check that the permissions on the Style Library for the current site collection allow read access for the group that the affected user is in.

You may see list GUIDs appended to the end of the Access Denied page URL, but these MAY NOT be directly at fault, and it’s worth a quick check to ensure that the Style Library has the correct permissions, as the XSL style sheets for the Summary Link Web Part are loaded directly from this library.

April 8, 2011

OSX Cisco VPN Client: Error 39: Unable to import certificate

If you’re having problems with the Cisco VPN client on Mac rejecting certificates, and giving the unhelpful error:

Error 39: Unable to import certificate

Check that you have the correct password for the certificate. Be assured that the Mac client can read .pfx files; this does not require you to import the certificate into Keychain Access and export it as a .p12 file, as reported elsewhere.

February 8, 2011

Inbox Zero with Google Mail

I’ve started using the inbox zero techniques recently, and have seen my productivity increase massively as a result.

If you don’t know what inbox zero is – you should read the articles at 43 folders first.

Google Mail gives us some additional tools that can really help over and above what the 43 folders articles offer. In this article I’m going to briefly outline some of the tools you can use with Google Mail to improve your productivity around email.

Much of what is referenced below uses the Google Labs features. If you are not familiar with them, look for the green conical flask icon between your email address and the Settings menu in the top right of your Google Mail screen.

Using inbox zero, I went from 21,000 emails in my inbox to just actionable email (currently 5).

1. Filters and Labels

Create filters! Lots of them! Make this an active, ongoing process. The Gmail filters are very powerful, so I’d recommend using them to clear your inbox. I don’t quite agree with the 43 folders motto of “Delete, delete, delete”. Instead, with Gmail, archive, archive, archive is the way forward. Make certain email types auto-archive (for me this is Twitter, Facebook, LinkedIn etc.); and also make sure you label all emails that come in – this not only increases the findability of information in the future, but also ensures that you can deal with only the emails you want to at any moment. Archived emails are not visible in your inbox, but you can use the powerful search functionality of Gmail to find old information at any time.
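
As an example (the address is purely illustrative), a filter for social network notifications might look like:

From:      notification@example-social-network.com
Do this:   Skip the Inbox (Archive it), Apply the label "Social"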

2. Nested Labels

There is a great lab feature available to enable nested labels. Nested labels allow you to create “folders” of labels, and this is great for managing large numbers of labels (I have 60 labels, separated into 3 main categories, some of which have sub-categories).

3. Hide Read Labels

Another lab feature that is incredibly useful is “Hide Read Labels”. This hides any label which does not have any unread messages in it. Pairing this with auto-archive means that you can quickly come back to unread (though unimportant) messages at any time.

4. Send and Archive

Send and archive is a lab feature that allows you to archive a thread as soon as you send the response. Enable this to ensure that your inbox remains clean.

5. Tasks

Enable the Google Tasks lab feature if you haven’t already, and then use the More Actions menu to create tasks from emails that contain actions. You can then archive the email and use Google Tasks to manage your to-do list. In addition, if you add a deadline to the task, it will appear in your Google Calendar.

January 12, 2011

Capacity planning for FAST Search Server 2010 for SharePoint

Information on FAST Search Server 2010, though better than it has been, is still scarce. In particular, its application in large environments is largely undocumented, though there is evidence that, at least internally, it has been proven. FAST ESP, of course, is well proven on large corpora of data and has been used in very large enterprises.

When it came to my latest project – architecting the FAST search component of a system with a highly active user base of 140,000 and a reasonably sized corpus of data (240 terabytes), with FAST Search Server 2010 for SharePoint chosen as the search server – I started to discover that there were some holes, or at least some disjointedness, in the data presented by Microsoft.

The purpose of this article is to draw together some of this knowledge into a more coherent form, to help future architects design the topology and, in particular, ensure that their design is performant and scalable.

This article assumes a reasonable knowledge of SharePoint and FAST search architecture.

FAST farms consist of two main components:

  • Service servers: one or more servers holding one or more of the core roles (see the TechNet article).
  • Search cluster matrix: one or more servers in a row/column structure to handle the indexing and query matching components. See the above article for detailed information on the performance of these services.

There are two primary drivers when considering the topology of our FAST farm: the size of the corpus, and the performance needed. In my case the performance I was aiming for was 350 queries per second (qps). The general rule of thumb with FAST is that you can reasonably expect 1 qps of performance for each 1 GHz core available to the query matching component of FAST. This assumes that you are using the search cluster model topology, and not a single server deployment.

Since we are looking for a performance of 350 qps, we can assume we need 350 cores at 1 GHz, or 175 cores at 2 GHz. Since our target servers are running dual quad-core CPUs at 2.93 GHz, we essentially have the ability to handle 23.44 qps per query matching server (1 × 2.93 × 8). Dividing 350 by 23.44 and rounding up, we get 15. So we need 15 of our servers running the query matching service to achieve 350 qps throughput.

Once we know we need 15 query matching servers (i.e. servers in the search row component of the cluster), we need to ensure they are arranged correctly in the server matrix so that we can handle the corpus of data we need. In my case we have a 30 million item corpus; however, allowing for reasonable growth, I am planning for 60 million items. Since we can have a performant maximum of 15 million items per index column, this means we are looking at 4 index columns.

The query matching servers MUST be distributed evenly between the index columns, so the closest we can get to 15 query matching servers is having 4 index columns with 4 query matching servers in each; a total of 16 query matching servers.
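
Pulling the working together, using the figures above:

350 qps ÷ (1 × 2.93 GHz × 8 cores ≈ 23.44 qps per server) ≈ 14.93 → 15 query matching servers
60,000,000 items ÷ 15,000,000 items per index column = 4 index columns
15 query matching servers ÷ 4 index columns = 3.75 → 4 per column, i.e. 4 × 4 = 16 query matching servers in total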

Each index column should also have at least one server running the indexer service. An additional index server can be used, but is present purely for failover.

On the storage sizing side of things, there are two types of storage needed:

  • Local directly attached storage on the indexer boxes. This is used for the storage of the indices, which are held in a flat file format. A good rule of thumb is to allow approximately 20% of the total corpus size for the indices.
  • SQL Database storage for the Crawl Databases. These are used by the Content SSA in SharePoint to store metadata on the corpus crawled. A general rule of thumb is to allow 5% of the total corpus size (in my case 240 terabytes, so a crawl DB allowance of 12 terabytes). These can be scaled out across multiple crawl databases, and the best advice would be to host them on a separate box from the SharePoint databases, as they will be extremely intense on disk I/O, especially with a large, fast-changing corpus.