`

Solr添加IKAnalysis中文分词

    博客分类:
  • Solr
阅读更多

1.下载中文分词器IKAnalyzer

地址:http://code.google.com/p/ik-analyzer/downloads/list

 

2.修改schema.xml文件,加入以下配置:

 <fieldType name="textik" class="solr.TextField" >
               <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>  
       
               <analyzer type="index">  
                   <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>  
                   <filter class="solr.StopFilterFactory"  
                           ignoreCase="true" words="stopwords.txt"/>  
                   <filter class="solr.WordDelimiterFilterFactory"  
                           generateWordParts="1"  
                           generateNumberParts="1"  
                           catenateWords="1"  
                           catenateNumbers="1"  
                           catenateAll="0"  
                           splitOnCaseChange="1"/>  
                   <filter class="solr.LowerCaseFilterFactory"/>  
                   <filter class="solr.EnglishPorterFilterFactory"  
                       protected="protwords.txt"/>  
                   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>  
               </analyzer>  
     			<analyzer type="query">  
                   <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>  
                   <filter class="solr.StopFilterFactory"  
                           ignoreCase="true" words="stopwords.txt"/>  
                   <filter class="solr.WordDelimiterFilterFactory"  
                           generateWordParts="1"  
                           generateNumberParts="1"  
                           catenateWords="1"  
                           catenateNumbers="1"  
                           catenateAll="0"  
                           splitOnCaseChange="1"/>  
                   <filter class="solr.LowerCaseFilterFactory"/>  
                   <filter class="solr.EnglishPorterFilterFactory"  
                       protected="protwords.txt"/>  
                   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>  
               </analyzer>  
       
</fieldType>

 然后定义需要使用中文分词功能的字段,比如我这里是title,代码如下:

 

 <fields>
  <field name="title" type="textik" indexed="true" stored="true" required="true" /> 
 </fields>

 

 

3. 将下载的IKAnalyzer目录下的IKAnalyzer3.2.8.jar放入 TOMCAT/webapps/该solr工程/WEB-INFO/lib 目录下

 

4. 将下载的IKAnalyzer目录下的IKAnalyzer.cfg.xml和ext_stopword.dic文件放入 TOMCAT/webapps/该solr工程/classes 目录下,你也可以自己定义停用词字典,然后在IKAnalyzer.cfg.xml中进行配置,多个停用词字典之间用逗号隔开

 

5. 重启tomcat,输入http://域名:端口号/该solr工程/admin/analysis.jsp,效果如下:



 

 

 

  • 大小: 53.6 KB
分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics