The crawler_surface_config.xml file contains the following XML sections.
Specifies the information about the objects and attributes that the Crawler Surface exposes for an object. The objects section describes the layout of a detail page for each object type that is exposed to a crawler. This section does not control the selection of individual records. The <objects> section is a collection of <object> sections.
Each object is defined in an <object> section. The default specifications for these objects are provided as:
Indicates Knowledge Documents.
Indicates Change Orders.
Indicates Issues.
Indicates Incidents.
Indicates Problems.
Indicates Requests.
Note: For more information about the object definitions, see the Technical Reference Guide.
The XML file contains the following sections that create the <head> section of a detail page in CA SDM:
Indicates the Majic object name of the exposed object.
Indicates a place for a short description of the object. This element is only for documentation purposes and the Crawler Surface ignores this element.
Indicates the attribute name that stores the Last Modified Date and Time. This timestamp is exposed to the search engine crawler so that the search engine can determine whether the record was updated. Many crawlers use this timestamp during an incremental crawl. An updated timestamp signals that the record changed after it was last crawled. The search engine crawler skips the record when it has not been updated since the last crawl.
Indicates the attribute that is used for the title of the detail page. The search engines use this element as the title of the document that is returned in search results. This element entry generates an HTML <title> tag in the <head> of the detail page. For Knowledge Documents, the title defaults to the Title of the Knowledge Document. For Incidents, Problems, Requests, Change Orders, and Issues, the summary is used for the title.
Indicates one or more properties that are exposed as metadata. Metadata allows a search engine to store extra characteristics of the document in its index. Metadata is not searched directly but is instead used to filter search results. This section generates HTML <meta> tags in the <head> of the detail page.
Each entry in the <meta_data> section contains one or more <property> entries. Each <property> element consists of a <name> element and a <content> element.
Indicates the name of the metadata property.
Indicates the attribute of the object that will be used as the value for the metadata.
Together, the <name> and <content> elements of each <property> generate one HTML <meta> tag. The search engine crawlers use the following two metadata properties by default:
Indicates the metadata property of a search engine that stores a short summary of the document.
Indicates the author of the document.
The CASDMTENANT metadata property is also configured by default for each object. This property is a CA SDM specific metadata property. When CA SDM is configured for multi-tenancy, the Crawler Surface uses this property to expose the Tenant name of the object to the crawler of the search engine. Later, during a Federated Search, the results returned from the search engine are filtered based on this metadata property. A result is returned to the user only if it has no CASDMTENANT metadata property or if its CASDMTENANT metadata property matches the Tenant field of the Contact record of the user.
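As an illustration of the structure described above, a <meta_data> section might look like the following sketch. The nesting of <property>, <name>, and <content> follows the description in this topic. The property names description and author, the use of summary as a content value, and the placeholder HYPOTHETICAL_AUTHOR_ATTRIBUTE are assumptions for illustration only and are not taken from the shipped file.

```xml
<!-- Sketch only: property names and attribute values are illustrative assumptions. -->
<meta_data>
  <property>
    <!-- Name of the metadata property as the search engine sees it -->
    <name>description</name>
    <!-- Attribute of the object whose value becomes the metadata value -->
    <content>summary</content>
  </property>
  <property>
    <name>author</name>
    <!-- Placeholder: replace with the real attribute name for your object -->
    <content>HYPOTHETICAL_AUTHOR_ATTRIBUTE</content>
  </property>
</meta_data>
<!-- Each property would yield one tag in the head of the detail page, for example:
     <meta name="description" content="(value of the summary attribute)"> -->
```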
The XML file contains the following sections that create the <body> section of a detail page in CA SDM:
Indicates a list of attributes from the object that the Crawler Surface exposes. Separate multiple entries with a comma and a space. For example, PROBLEM, RESOLUTION, SD_ASSET_ID.name.
Indicates information that is exposed by the Crawler Surface from Activity Logs for objects that have Activity Logs. The <activity_logs> section contains the <object>, <select_criteria>, <rel_attr>, and <attributes> elements.
Specifies the object name that contains the Activity Log entries for the object. For example, the Activity Log object for:
Allows you to filter the Activity Log entries that are exposed. This element helps increase the relevancy of your search results by reducing the number of frequently occurring words that are indexed. For example, the <select_criteria> for chgalg contains the following Majic where clause:
"type IN ('ST', 'UPD_RISK', 'CB', 'RS', 'LOG', 'TR', 'ESC' ,'NF', 'UPD_SCHED')"
These criteria include only Activity Log entries that allow a user to enter comments and exclude Activity Log entries with fixed text like Initial or Attach Document.
Specifies how an Activity Log entry relates to its parent object. The <rel_attr> subsection contains <parent_obj_attr> and <join_attr> elements.
Indicates an attribute of an Activity Log that contains an SREL (or foreign key pointer) to the parent object. For example, the change_id is the attribute of chgalg.
Indicates the Relational Attribute (Rel Attr) of the parent object that is stored in <parent_obj_attr>. For example, the <join_attr> for chgalg is id. You can verify both of these values by using the following command:
bop_sinfo -df chgalg
The output must show that the value for change_id is SREL -> chg.id and ISS is SREL -> iss.persistent_id.
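Putting the elements above together, an <activity_logs> section for Change Orders might look like the following sketch. The object name, where clause, change_id, and id come from the examples in this topic. The description value in <attributes> is an assumption for illustration, and whether the where clause is wrapped in quotation marks inside the element is not confirmed here.

```xml
<!-- Sketch only: the <attributes> value is an illustrative assumption. -->
<activity_logs>
  <!-- Object that holds the Activity Log entries for Change Orders -->
  <object>chgalg</object>
  <!-- Expose only entry types that can carry user comments -->
  <select_criteria>type IN ('ST', 'UPD_RISK', 'CB', 'RS', 'LOG', 'TR', 'ESC', 'NF', 'UPD_SCHED')</select_criteria>
  <rel_attr>
    <!-- Attribute of chgalg that holds the SREL to the parent Change Order -->
    <parent_obj_attr>change_id</parent_obj_attr>
    <!-- Attribute of the parent object that the SREL points to -->
    <join_attr>id</join_attr>
  </rel_attr>
  <!-- Activity Log attributes to expose (assumed value) -->
  <attributes>description</attributes>
</activity_logs>
```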
This subsection allows you to expose Attachments to the crawler of the search engine so that their content can be indexed in conjunction with the parent object. The <attachments> section is only allowed for objects that have Attachments.
Attachments are handled in a special manner by the Crawler Surface. Rather than sending the content of each Attachment to the crawler from the Crawler Surface, the Crawler Surface instead exposes a hyperlink that the crawler can follow to download the Attachment from CA SDM. Later during a Federated Search, if an Attachment is included in the search results, clicking on the hyperlink will take the user to the parent object instead of directly to the Attachment.
The <attachments> section contains <object>, <rel_attr>, <attmnt_id> and <is_parent_updated> elements.
This element specifies the Majic object that links the Attachment to its parent object.
This subsection works the same as it does in Activity Logs. It specifies how the linking object, which connects the parent object to the attachment, relates to the parent object.
This element specifies the attribute of this linking object that points to the attachment.
This is a special flag that tells the Crawler Surface how to expose the last-modified date for the object. For some objects like Knowledge Documents (KDs) when an attachment is added, the Knowledge Document’s last-modified date is not updated. The last-modified date is important when the search engine is doing an incremental crawl. Usually if the crawler sees that an object has not been updated since it was last encountered, it skips it and does not update its index by re-indexing the object. When you specify No for <is_parent_updated>, the Crawler Surface checks the last-modified date of all the Attachments. If it finds any date that is later than the parent object, it will use the later date when it exposes the parent object. This will usually cause the search engine to reindex the object and include the new attachments during indexing.
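The <attachments> elements described above could be combined as in the following sketch. Only the element names and the No value come from this topic. Every value written in upper case is a hypothetical placeholder, not a real Majic object or attribute name, and must be replaced with values from your own schema.

```xml
<!-- Sketch only: all upper-case values are hypothetical placeholders. -->
<attachments>
  <!-- Majic object that links the Attachment to its parent object -->
  <object>HYPOTHETICAL_LINK_OBJECT</object>
  <rel_attr>
    <!-- Attribute of the linking object that points to the parent object -->
    <parent_obj_attr>HYPOTHETICAL_PARENT_POINTER</parent_obj_attr>
    <!-- Attribute of the parent object that the pointer refers to -->
    <join_attr>HYPOTHETICAL_PARENT_KEY</join_attr>
  </rel_attr>
  <!-- Attribute of the linking object that points to the attachment -->
  <attmnt_id>HYPOTHETICAL_ATTACHMENT_POINTER</attmnt_id>
  <!-- No: adding an attachment does not update the parent's last-modified date,
       so the Crawler Surface also checks the dates of the Attachments -->
  <is_parent_updated>No</is_parent_updated>
</attachments>
```

With No specified, a parent last modified on one date with an attachment added on a later date would be exposed with the later date, which usually prompts the crawler to reindex it.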
This section is used for objects that contain a list of Configuration Items. This section contains the <object>, <rel_attr> and <attributes> elements.
Works the same as it does in Activity Logs and Attachments.
Work the same as they do in Activity Logs and Attachments.
This element works the same as it does in Attachments.
After the <objects> section is the <multi-farm_datasets> section. While the <objects> section defines the CA SDM objects and attributes that can be exposed by the Crawler Surface, the <multi-farm_datasets> specifies which objects are exposed and how their records are selected. The <multi-farm_datasets> section is a collection of <farm> sections.
Each <farm> section controls the CA SDM information that is exposed to a crawler. When a crawler is configured, the <farm> section is specified in the URL. Only the information specified in the <farm> section is exposed to the crawler. Each <farm> section contains <name>, <data_sets> and <sdm_user> elements.
<name>
This element specifies the name of the <farm> section. This name is specified in the URL that is used to configure the crawler. This value is case-sensitive.
This subsection specifies the objects that will be exposed and how their records are selected. This subsection contains one or more <object> elements. Each object element contains a <name> and a <select_criteria> element.
This element references the <object> which was defined in <objects> section.
This element specifies a Majic where clause that is used to select the records of the object.
This element specifies the CA SDM user ID that must be used when accessing this farm. This user ID must have Access Type=crawler and Role=crawler.
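As a sketch of how the farm elements fit together, a <multi-farm_datasets> section with a single farm might look like the following. The element names and nesting follow the descriptions in this topic. The farm name, the where clause, and the user ID are hypothetical values chosen for illustration, and chg is assumed to be the name of an <object> defined in the <objects> section.

```xml
<!-- Sketch only: farm name, where clause, and user ID are hypothetical. -->
<multi-farm_datasets>
  <farm>
    <!-- Case-sensitive name that is used in the crawler URL -->
    <name>example_farm</name>
    <data_sets>
      <object>
        <!-- Must match an <object> defined in the <objects> section -->
        <name>chg</name>
        <!-- Majic where clause that selects the records to expose (assumed) -->
        <select_criteria>active = 1</select_criteria>
      </object>
    </data_sets>
    <!-- CA SDM user ID with Access Type=crawler and Role=crawler -->
    <sdm_user>example_crawler_user</sdm_user>
  </farm>
</multi-farm_datasets>
```

Defining one farm per crawler keeps the exposed record set explicit, because only the objects listed in that farm's <data_sets> are visible through its URL.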
Copyright © 2013 CA.
All rights reserved.