weeyum/language-analyzer-service


Description

Given an input string, the service returns a list of tokens produced by the following transformations:

  1. tokenize
  2. filter stopwords
  3. stem

The API also accepts an optional locale parameter that selects which language analyzer to use.
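The tokenize → stopword-filter → stem chain described above corresponds to an Elasticsearch custom analyzer. A minimal sketch of such a definition follows; the filter types ("stop", "stemmer") are Elasticsearch built-ins, but the names `locale_analyzer`, `locale_stop`, and `locale_stemmer` are made up for illustration — the README does not show the service's actual index settings.

```ruby
require "json"

# Hypothetical sketch of the three-stage pipeline as an
# Elasticsearch custom-analyzer definition. All names here are
# illustrative; only the filter types are Elasticsearch built-ins.
def analyzer_settings(language = "english")
  {
    "analysis" => {
      "filter" => {
        "locale_stop"    => { "type" => "stop",    "stopwords" => "_#{language}_" },
        "locale_stemmer" => { "type" => "stemmer", "language"  => language }
      },
      "analyzer" => {
        "locale_analyzer" => {
          "type"      => "custom",
          "tokenizer" => "standard",                       # 1. tokenize
          "filter"    => ["locale_stop", "locale_stemmer"] # 2. filter stopwords, 3. stem
        }
      }
    }
  }
end

puts JSON.generate(analyzer_settings("english"))
```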

Deploy

Setup (an Elasticsearch process must be running first):

git clone https://github.com/weeyum/language-analyzer-service.git
cd language-analyzer-service
bundle install
rake
rackup

API Specification

GET /analyze
POST /analyze

GET /analyze

parameters:

  • locale String
  • text String

supported locales:

["ar", "hy", "eu", "pt-br", "bg", "ca", "zh", "cs", "da", "nl", "en", "fi", "fr", "gl", "de", "hi", "hu", "id", "ga", "it", "ja", "ko", "ku", "no", "fa", "pt-pt", "ro", "ru", "es", "sv", "tr", "th"]
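A caller may want to validate the locale parameter against the list above before issuing a request. A small client-side helper (not part of the service) could look like this:

```ruby
# Hypothetical client-side check, using the supported-locales
# list from the API specification above.
SUPPORTED_LOCALES = %w[
  ar hy eu pt-br bg ca zh cs da nl en fi fr gl de hi hu id ga it
  ja ko ku no fa pt-pt ro ru es sv tr th
].freeze

def supported_locale?(locale)
  SUPPORTED_LOCALES.include?(locale)
end
```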

example:

curl "https://language-analyzer-service.herokuapp.com/analyze?text=this%20is%20hello%20world."

response:

{
    "tokens": [
        {
            "token": "hello",
            "start_offset": 8,
            "end_offset": 13,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "world",
            "start_offset": 14,
            "end_offset": 19,
            "type": "<ALPHANUM>",
            "position": 4
        }
    ]
}
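The curl example above URL-encodes the text parameter by hand. The same GET URL can be built with Ruby's standard library; this helper is illustrative only, not part of the service:

```ruby
require "uri"

# Illustrative helper that builds the GET /analyze URL, encoding
# the query parameters as in the curl example above.
def analyze_url(text, locale: nil)
  base  = "https://language-analyzer-service.herokuapp.com/analyze"
  query = { "text" => text }
  query["locale"] = locale if locale
  "#{base}?#{URI.encode_www_form(query)}"
end

puts analyze_url("this is hello world.", locale: "en")
```

Note that `URI.encode_www_form` encodes spaces as `+` rather than `%20`; both forms are accepted in a query string.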

POST /analyze

parameters:

  • locale String
  • text String

supported locales:

["ar", "hy", "eu", "pt-br", "bg", "ca", "zh", "cs", "da", "nl", "en", "fi", "fr", "gl", "de", "hi", "hu", "id", "ga", "it", "ja", "ko", "ku", "no", "fa", "pt-pt", "ro", "ru", "es", "sv", "tr", "th"]

example:

curl -X POST https://language-analyzer-service.herokuapp.com/analyze \
  --header "Content-Type: application/json" \
  --data '{"locale": "en", "text": "this is hello world."}'

response:

{
    "tokens": [
        {
            "token": "hello",
            "start_offset": 8,
            "end_offset": 13,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "world",
            "start_offset": 14,
            "end_offset": 19,
            "type": "<ALPHANUM>",
            "position": 4
        }
    ]
}
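The POST example above can also be expressed with Ruby's standard library. The endpoint URL and parameter names come from the curl example; the request is built but not sent here:

```ruby
require "json"
require "net/http"
require "uri"

# Sketch of the POST /analyze call using Ruby's net/http.
# Builds the request; sending it is left commented out.
def build_analyze_request(text, locale: "en")
  uri = URI("https://language-analyzer-service.herokuapp.com/analyze")
  req = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  req.body = JSON.generate({ "locale" => locale, "text" => text })
  [uri, req]
end

uri, req = build_analyze_request("this is hello world.")
puts req.body
# To actually send it:
# res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
```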

About

text tokenizer microservice
