Archive for August, 2009

Faster document management content search with Lucene

Thursday, August 27th, 2009

 

We have recently replaced the full text search engine in our product, moving from a commercial engine used under license to the open source search engine Lucene.

 

Lucene is an open source indexing and search technology that builds an index of key words and their positions within the text of documents. It provides a very rapid search of that index to retrieve relevant documents, including the positions within those documents where the search terms appear. Unlike the commercial engine we used previously, which can index a wide range of document types, Lucene works purely on text, so we also had to provide the text filtering to extract the raw text content from files like MS Office documents. The Lucene .Net project is available from http://incubator.apache.org/lucene.net/
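To make the idea of an index of key words and positions a little more concrete, here is a minimal sketch in Python. It is a toy illustration only, not Lucene's actual data structures or API, and the function names and sample documents are made up for the example. It assumes the text has already been extracted from the original files by a filtering step.

```python
from collections import defaultdict
import re


def build_index(docs):
    """Map each lower-cased word to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(position)
    return index


def search(index, term):
    """Return {doc_id: [positions]} for a single term (empty dict if absent)."""
    return index.get(term.lower(), {})


# Hypothetical sample documents, already reduced to raw text.
docs = {
    "invoice-001": "Tax invoice for annual tax return",
    "letter-002": "Thank you for your letter",
}
index = build_index(docs)
hits = search(index, "tax")  # documents containing 'tax' and where it occurs
```

Looking a word up is a single dictionary access rather than a scan of every document, which is where the speed of this kind of index comes from.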

 

We have also changed the algorithm we use for content searching so we can now combine structured (index field) and unstructured (content) searches into a single search. This gives far better performance, especially where the content is common to many of the documents being searched. For example, ‘Tax’ would be very common in documents for an accountant, resulting in a large number of hits in the content part of the search. By combining the structured indexes into the content search database we can reduce the number of hits and thereby improve performance.
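The effect of combining the two kinds of search can be sketched roughly as follows. This is a simplified Python illustration under my own assumptions, not the algorithm used in the product or in Lucene: the index layouts, the ‘client’ field and the client names are all hypothetical.

```python
def combined_search(content_index, field_index, term, field, value):
    """Intersect content hits with field hits, scanning the smaller set."""
    content_hits = content_index.get(term.lower(), set())
    field_hits = field_index.get((field, value), set())
    small, large = sorted((content_hits, field_hits), key=len)
    return {doc_id for doc_id in small if doc_id in large}


# Hypothetical data: 'tax' appears in every document for an accountant,
# but only a few documents belong to any one client.
content_index = {"tax": {1, 2, 3, 4, 5, 6}, "payroll": {2, 6}}
field_index = {
    ("client", "Smith Ltd"): {2, 5},
    ("client", "Jones plc"): {1, 3, 4, 6},
}

result = combined_search(content_index, field_index, "Tax", "client", "Smith Ltd")
```

Because the structured field narrows the candidates to two documents, the six content hits for ‘tax’ never have to be returned and filtered afterwards.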

 

Tim

Tesseract OCR for document management with Business Edition?

Monday, August 24th, 2009

 

I previously alluded to a couple of the new features we’d managed to sneak into the next cut of Business Edition. Well, it’s all down to us playing with Tesseract-ocr, an open source OCR engine developed by HP Labs between 1985 and 1995 and now hosted by Google at http://code.google.com/p/tesseract-ocr/. It requires a significant amount of training, but once that is done it delivers amazingly good results, comparable in speed and accuracy with many of the commercial engines we have also evaluated.

 

Our key requirement was to use it in the OCR process to extract keywords for full text searching. It also works really well reading coloured text on coloured backgrounds, so we tried it for reading screen grabs with the objective of extracting screen content to find related documents.

 

Check out the Google site: the support and user base are considerable and there is a wealth of downloads, including support for different languages. The source code is in C++ but there are wrappers and interfaces for other environments including .Net.

Document Management for dummies

Tuesday, August 18th, 2009

 

It’s not often that I get sent a link to a YouTube video at work and can click on it without feeling a little bit like a naughty schoolboy. But today was one of those rare days.

 

A while ago the marketing people here started trying to explain their vision for a dummies guide to document management for accounts payable. Something anyone could use to sell and explain the software. The paperless automated office was, they explained, a complex concept that lots of people in the real world would find tricky to understand.

 

What they wanted from me was help in creating a YouTube film that would effectively be a canned sales pitch for Infonic’s document management software. I obliged.

 

I didn’t think much more about it until I was sent the link a week or so ago, but I’ve only just got around to having a really good look at the film. I have to say I really think it works. It makes the quite complex business benefits of DM software so clear that anyone should be able to understand them.

 

So if you have 5 minutes to spare and you want to see how Infonic software makes accounts payable processes paperless – I highly recommend you watch the film at http://www.youtube.com/watch?v=uwJ60eaR2vE

Go Green with Document Management!

Monday, August 17th, 2009

 
Our Antipodean marketing guy has been looking at how Document Manager helps promote a more environmentally friendly workplace. He’s actually created a pretty good summary and highlighted things I hadn’t considered. We are all focused (as we should be) on the tangible financial benefits of Document Management but forget the intangibles gained by reducing our reliance on paper, like saving the planet! His summary was as follows:
 

Create a greener, paperless office with Infonic Document Management.

 

Dramatically reduce paper usage.

Deploying document management software dramatically reduces a business’ reliance on paper. All incoming paper mail, invoices and other documents are scanned and the digital images are archived along with your digital documents such as emails. Users can then file, search, annotate, share and process these documents without ever reverting to physical paper. Typical paper based processes result in documents getting copied 2 or 3 times during their process lifecycle.

 

Eliminate printing and photocopying.

Printing and photocopying are expensive and environmentally unfriendly processes. They have no place in the modern paperless office. Document management software cuts the amount of printing and photocopying required to run any business. DM software enables users to view, archive, annotate and share documents with colleagues in an entirely digital format. All of the tasks users once printed or photocopied documents for are taken care of in the digital form. There is not just the cost of the paper and toner to consider, but also the environmental cost of disposing of used consumables and their packaging.

 

Avoid using courier vans.

The courier and mail services that ferry documents between businesses and sites are an often overlooked environmental impact of paper based processes. Adopting a truly paperless office reduces the use of polluting couriers and postal vans.

 

Eliminate the internal mail system.

It may sound obvious, but internal mail trolley staff are major users of elevators in many large businesses. Moving to a digital document management system will eliminate a very large amount of lift usage over time as documents no longer need to be ferried about the building.

 

Reduce headcount.

By digitising your business processes you reduce the number of people needed to perform the same administrative tasks. Eliminating menial data entry, archiving roles and internal mail functions allows you to reduce office space, heating, lighting and elevator costs. Making your business less labour intensive has a positive environmental impact.

 

Eliminate filing cabinets and archive rooms.

By moving to a paperless office with Infonic document management, you will get rid of your archive rooms and filing cabinets. That means you can cut the office space that you have to heat and provide lighting for.

 

Reduce consumption of envelopes, folders and archive boxes.

It’s not just the paper you use that has an environmental impact. When you send documents around via snail mail you use envelopes. As you accumulate paper documents you use ring binders and other folders, and when you physically archive your business documents you use archive boxes. All that consumption is eliminated when you migrate your business to an Infonic document management software solution.

 

Home working

With information and documents available digitally there is no need to have all your staff come to the office just because that is where the paper is. Staff can work from home more, thereby reducing the need for office heating, lighting and eco-unfriendly means of transport like cars.

 

 

As a footnote… Infonic offer a bureau document scanning service to our document management software clients, for scanning their archives and converting all their old paper records to digital format. Once their paper documents are scanned into their new DM system the originals are disposed of, by the truckload! It’s all done securely and the paper is recycled to create new paper. In addition, one tree is planted in a local forestry centre for every 50 sacks of paper we recycle.

 

So all in all DM is actually a pretty good story from a carbon footprint reduction point of view. It makes me feel a lot better about the massive hours going into building our next version, knowing I’m saving the planet.

 

Busy time with Business Edition coming up!

Thursday, August 13th, 2009

Quick blog posting before I dash off for a much needed week with the Missus, the Harley and a tent.

 

Looks like we’re on target for the second release of our Business Edition product next month. Development has completed and we’re into QA now. The major objective was to add ‘full text search’ to the product, which we previously disabled as we considered it too ‘complex’ for the Business Edition market. Anyway, it’s now much improved both in terms of ease of use and ease of installation, so it should satisfy all those reseller demands we’ve had for it to feature in the product.

 

Whilst doing this we also had a couple of additional quick wins as feature enhancements, which will give our Business Edition a bigger edge over its competitors. I’m not going to say what they are yet, just that it’s the ‘wow’ factor that my namesake Simon (Cowell) is always looking for.

 

And it looks like the rest of the year is going to be busy. Not only do we have to finish the Silverlight product, but the sales team are predicting some excellent numbers for the next two quarters and Kim (who heads our USA operations) has just signed a distribution deal with DELL! Things are really taking off here.

August update

Monday, August 3rd, 2009

No major announcements this time, just some updates on progress made with various projects.

 

Our ‘partner portal’ for the distribution channel is about to go live, allowing resellers to sign up and create ‘Not for Resale’ licenses for themselves and their prospects. This whole site started as a concept from Tech Data, our USA distributor, just a few weeks ago and will make managing resellers and opportunities far easier. The success of this is largely down to one developer, and they know who they are, so thank you.

 

We have just implemented a project for capturing, reading and rendering Electronic Proof of Delivery documents from the UK’s largest retailer, for not one but two of their suppliers. So if you also need to manage EPOD documents from you know who, then call us.

 

The Silverlight development for our DotNet product is progressing nicely, although the official Silverlight 3 release omitted a feature (proxy generated code), which has caused our developers some heartache.

 

Good progress has also been made on our new full text search engine, based on the Tesseract OCR engine and the Lucene text search engine, which are currently being integrated into the next release of our Business Edition and Enterprise products, hopefully to be released sometime in September.

 

Bye for now

 

Tim