Architecture
Currently changes to one translation of a taxonomy will not change the translation of the term in another taxonomy as they are separate objects.
There are two different versions of DataFacet for SharePoint: DataFacet for SharePoint 2010 Server and Enterprise, and DataFacet for SharePoint 2010 Foundation Server and SharePoint 2007.
DataFacet for SharePoint 2010 is fully integrated with SharePoint and requires the Enterprise Keywords feature and either SharePoint Enterprise Search or FAST for SharePoint 2010 to be installed.
FAST for SharePoint requires SharePoint 2010 Enterprise edition.
Another search engine option is DataFacet RebelSearch, which provides a high-performance search engine with complete deep facet navigation that is native to SharePoint, yet has a small footprint and is easy to configure.
For example: DEV, PROD, and TEST and the subsequent synchronization of terms across the various environments?
You can export DataFacet taxonomies to the industry-standard SKOS format and import them into another SharePoint farm.
We have a user story for a more automated process, but for now, this approach can be automated with PowerShell.
For example: a term that has been edited or a rule added; is this audited and accessible from the Central Admin Server App user interface?
We do have logging, but it has not been designed specifically for auditing purposes. Better logging and auditing support is planned for a future release.
The rules are stored as properties on the term store objects. Ultimately they are stored in the SharePoint database, but this is an opaque data store.
Business and Pricing
Not yet. We are looking into options for this.
Our general Business taxonomy is available as a free download and is provided in CSV format.
Please visit http://www.datafacet.com/ for details.
The XML version (in encrypted SKOS format) of the DataFacet General Business taxonomy is shipped with DataFacet. You can find it in the 14 hive, commonly:
C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\WebServices\TaxonomyService\StaticData\DataFacet_GeneralBusiness_SKOS.tax
This role is usually overseen by content administrators or similar business roles. IT roles and SharePoint architects are not necessarily the best choices for this task. Someone who is in charge of Master Data Management can typically transfer that knowledge to creating and managing taxonomies. In addition, company librarians can do an excellent job managing taxonomies; however, they are typically only present in large corporate environments.
Depending on the size of the company, anywhere from 4 to 8 hours a week. For larger companies this is often a full-time job and can even involve multiple people managing the taxonomies.
Import/Export and File Formats
ASPX documents are the same as HTML documents. DataFacet does not process them any differently from any other content type; they are handled by the protocol handlers and iFilters in SharePoint.
OWL is not supported natively, but it is very easy to convert from OWL to SKOS. We do support SKOS natively. We can help you with that. Just ask.
The Term Store only imports .CSV files, and special characters do not import correctly into it, even though they can be entered and used in the .CSV file. Both UTF-7 and UTF-8 encodings produce the same result. Terms with special characters have to be adjusted by hand in the Term Store after import.
One workaround is to identify terms with special characters in advance within the .CSV, note any duplicates, and copy or reuse those terms once they have been adjusted within the Term Store. This way, the term with the special character only has to be edited once and then copied where necessary, rather than importing the same term many times and having to edit every copy within the Term Store. Not an elegant solution, for sure, but it works around a shortcoming of the Term Store.
DataFacet for SharePoint does handle UTF-8 encoded XML files, so there is no problem importing taxonomy files in any of the XML formats that we support (e.g., SKOS and ARTX).
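The first step of that workaround, finding the terms with special characters and spotting which ones repeat, can be sketched in a few lines of Python. This is only an illustration: the function name `find_special_terms` and the sample column layout are hypothetical, "special character" is assumed here to mean any non-ASCII character, and the real file should follow the SharePoint CSV import format.

```python
import csv
import io
from collections import Counter


def find_special_terms(csv_text):
    """Count every non-empty cell that contains a non-ASCII character."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            term = cell.strip()
            if term and not term.isascii():
                counts[term] += 1
    return counts


# Hypothetical sample data; the columns are illustrative only.
sample = (
    "Term Set,Level 1,Level 2\n"
    "Food,Café,Crème brûlée\n"
    "Travel,Café,\n"
)

for term, occurrences in find_special_terms(sample).most_common():
    print(f"{term}: appears {occurrences} time(s)")
```

Terms reported more than once are the ones worth fixing a single time in the Term Store and then copying or reusing.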
Yes, taxonomies and the term store operate the same way in SharePoint Online as they do in SharePoint 2010. However, the DataFacet Automatic Tagging Engine is currently not compatible with SharePoint Online.
SharePoint 2010 itself does not support importing taxonomies in XML format. It only supports a specific variant of CSV. However, if you have taxonomies in XML which you wish to import into SharePoint, we can assist with transforming the data into the appropriate format on a services basis.
See http://technet.microsoft.com/en-us/library/ee424396.aspx for the SharePoint CSV import format.
DataFacet supports two XML schemas for import:
- SKOS - Simple Knowledge Organization System http://www.w3.org/2009/08/skos-reference/skos.html
- ARTX - Applied Relevance TaXonomy. This is a proprietary format that is easy to generate from hierarchical data.
DataFacet also supports a limited subset of RDF/XML documents with a constantly evolving fidelity.
Operational Features
There are two distinct events. One is On Document Update, the other is On Term Store Update.
For the On Document Update event, we add a delegate to the built-in SharePoint API et voila, all documents checked into the library are annotated.
The On Term Store Update event is a different story. Since each document must be read in order to annotate it, it is an I/O-intensive process, so it is inherently best done in batch mode in the background. Currently, we have a PowerShell script that can be scheduled to re-annotate documents in a given library.
DataFacet integrates well with records management systems out of the box by virtue of the support for the SharePoint Managed Keywords feature. We do provide audit logs through the normal SharePoint logging system, but there is no specific integration with records management at this time.
The terms are written to the index entry for the document that is stored in SharePoint, but this does not change the contents of the document itself. The original document is never updated, only the SharePoint metadata.
We prefer FAST for a variety of reasons, not just the crawling capabilities. The primary one for us on the technical side is the ability to use the "deep facet" navigation that is available in FAST search but not in standard SharePoint search. FAST is a much more scalable search engine than the SharePoint native search engine, but the trade-off is increased complexity. FAST is quite resource hungry. However, the gain in search result accuracy is substantial, especially for our taxonomy navigation facets. Another advantage is the more extensible pipeline. SharePoint search crawlers are not extensible like FAST crawlers are, so there is no interface for us to hook into in order to use BCD. We support SharePoint search only for web sites, file systems, and native SharePoint data sources.
Yes. DataFacet is a SharePoint Application Service. It requires farm administrator rights to install and run.
Performance
DataFacet can tag documents one at a time when checked into SharePoint. DataFacet can also tag multiple documents checked in or can tag entire collections of thousands of documents. DataFacet is fast and scalable even on commodity hardware for SharePoint 2010.
Very little impact. Our annotator is very fast, with the ability to process 200,000 queries per second on commodity hardware. In a general sense, the annotator does not contribute any perceptible overhead to the ingestion process, except possibly for the first document which populates the caches.
That is somewhat configurable. By default, we create a fairly large index by storing all document text in the Lucene index file. The reason we do this is to allow preview of result documents directly from the stored text in the index. So, if you find your Lucene index is getting too big, we can configure it to store less information and show truncated document previews in the taxonomy manager.
In general terms, DataFacet is much lighter-weight than SharePoint. Chances are if you meet the minimum requirements to run SharePoint, you automatically have the minimum requirements to run DataFacet. The biggest variable is disk space for the intermediate Lucene index. If you have a lot of documents, you will need a fairly large disk drive to handle the full text index. Remember, however, that document text is usually much smaller than its container. The actual text in a 1MB PDF file is often just a few KB. So if your documents contain a lot of formatting overhead (PDF, graphics, multimedia), then the ratio of stored text to source document size can be quite small.
DataFacet does not use a SQL database, so there is no reason to allocate storage on a database server. We do maintain a local full-text index that is used for testing. RebelSearch uses a similar index as well, but both are self-contained data stores.
As a rule of thumb, the internal data storage will depend heavily on the character of the incoming documents. It will be some percentage of the documents in the repository depending on the ratio of non-text formatting to actual text. We only store the text part of a document, so markup and images are completely discarded. There is no way to pre-calculate the percentage - but an estimate can be made based on the length and type of documents.
PDF images from scanned sources will have a very low text/format ratio. A 100MB file could easily have only 1KB of text. Word documents are often mostly text, so they would have a fairly high text/format ratio. A 100KB Word file might have 80KB of text if there are no images in the document. You can extrapolate to other document formats.
A safe rule-of-thumb would be to have 100% of the document source size available on indexes. So, if you have 100GB of data, you should have at least 100GB of local storage for the index.
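The arithmetic behind these estimates can be sketched as follows. This is a rough illustration rather than a sizing tool: the function name `estimate_index_bytes` and the 60GB/40GB split are hypothetical, the text fractions are taken from the Word and scanned-PDF examples above, and the result covers stored text only, not any other overhead in the index.

```python
GB = 1024 ** 3


def estimate_index_bytes(doc_groups):
    """Sum source size times an assumed text fraction for each document group."""
    return sum(size * text_fraction for size, text_fraction in doc_groups)


# Hypothetical 100GB repository split into two groups.
groups = [
    (60 * GB, 0.80),     # Word documents without images: 100KB file, about 80KB text
    (40 * GB, 0.00001),  # scanned PDFs: 100MB file, about 1KB text
]

estimated = estimate_index_bytes(groups)
safe_allocation = sum(size for size, _ in groups)  # the 100% rule of thumb

print(f"Estimated stored text: {estimated / GB:.1f} GB")
print(f"Rule-of-thumb allocation: {safe_allocation / GB:.0f} GB")
```

The gap between the two numbers shows why the 100% figure is a safe upper bound rather than a prediction.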
Semantic Features
Currently changes to one translation of a taxonomy will not change the translation of the term in another taxonomy as they are separate objects.
EXAMPLE: What if one of their medical clients wants to tag head and neck neoplasms? They want "head" and "neck" to be in the same sentence and "neoplasm" to be in the title.
At the very least, can the tag be designated as title specific? Proximity searches are available to some extent (not so much "same sentence" as within x characters). Both proximity ("head neoplasm"~10 OR "neck neoplasm"~10) and field (title: "head neoplasm"~10) rules are supported.
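As a small illustration, the two rule forms quoted above can be assembled with plain string helpers. The helper names `proximity` and `any_of` are hypothetical and are not part of DataFacet; the sketch only reproduces the query text shown in the answer.

```python
def proximity(phrase, distance, field=None):
    """Build a proximity clause, optionally restricted to a field."""
    query = f'"{phrase}"~{distance}'
    return f"{field}: {query}" if field else query


def any_of(*queries):
    """Join clauses with an uppercase OR operator."""
    return "(" + " OR ".join(queries) + ")"


body_rule = any_of(
    proximity("head neoplasm", 10),
    proximity("neck neoplasm", 10),
)
title_rule = proximity("head neoplasm", 10, field="title")

print(body_rule)   # ("head neoplasm"~10 OR "neck neoplasm"~10)
print(title_rule)  # title: "head neoplasm"~10
```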
Currently no.
We do not model the "Reuse" option, because that is essentially a "reference" term that we do not currently support in our UI. Each term is unique, even if it shares a name with another term. For example, /Animals/Bears is completely different from /Football Teams/Bears. Even though the name "Bears" is the same, they are completely different objects.
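The distinction between a term's name and its identity can be pictured with a tiny model. This is an illustrative sketch only: the `Term` class is hypothetical, and the full path is used here as a stand-in for whatever identifier the term store actually assigns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A term identified by its full path (an assumption for illustration)."""
    path: str

    @property
    def name(self):
        # The display name is just the last path segment.
        return self.path.rsplit("/", 1)[-1]


animal = Term("/Animals/Bears")
team = Term("/Football Teams/Bears")

print(animal.name == team.name)  # True: both are called "Bears"
print(animal == team)            # False: they are different objects
```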
Troubleshooting
"To restore the DataFacet Term group:" 1. Open your term store management page. (Central Administration -> Manage Service Applications -> Managed Metadata Service)
- Expand Managed Metadata Service context menu, click New Group
- Enter """"Data Facet"""" without double quotes & hit Enter
- Ensure that Group Managers has at least this accounts: account used to run Central Admin app pool, account used to run DataFacet taxonomy service app pool, your account.
- run iisreset
- Check the """"manage taxonomies"""" page to ensure that you could see an empty list of taxonomies now without any errors """
Check to make sure your operator is all uppercase. Otherwise it will be treated as a stop word.
The Adobe web site seems to suggest that the stand-alone iFilter is not required if you install the latest Acrobat Reader. This is true for desktop search, but for server products like SharePoint, you must still install the Adobe iFilter for 64-bit platforms.
http://www.adobe.com/support/downloads/thankyou.jsp?ftpID=4025&fileID=3941
If not, the application pool will fail to start and there will be an error entry in the Event Log, something like: ...this pool failed to start because login credentials are invalid...
It depends. First, if DataFacet uses a different account to run its service application, there is no effect. Second, if the password was updated through the SharePoint interface, there are no issues either. The only problem arises if the password was changed outside of SharePoint (and IIS); in that case the IIS application pool will fail to start.
User Interface
Yes. We have a stand-alone HTML5 application that can be hosted in Adobe AIR or in any modern browser that supports HTML5. It has the same features as the Central Admin version, and is based on the same JavaScript code base.
