How to get generative answers

The /ask endpoint allows you to get generative answers from a Knowledge Box.

For example, if you store information about Hedy Lamarr in your Knowledge Box, you can ask questions like:

Who is Hedy Lamarr?

You will get a generative answer like:

Hedy Lamarr was an actress and inventor known for her contributions to the development of wireless communication technology.

Then, you can continue chatting with the Knowledge Box, based on the context of the previous question:

What did she do during the war?

Here, "she" is understood as "Hedy Lamarr", because it refers to the first question.
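
A follow-up question only makes sense if the earlier turns travel with it. The request body has a context list (empty in the curl example in the Usage section), and the sketch below shows one way to fill it. The helper name `build_ask_payload` is hypothetical, and the `{"author", "text"}` shape of each context entry is an assumption that is not stated on this page, so check the API reference before relying on it.

```python
import json

def build_ask_payload(query, history=()):
    """Build an /ask request body that carries earlier turns as context.

    `history` is a sequence of (question, answer) pairs from previous turns.
    ASSUMPTION: each context entry is {"author": ..., "text": ...}; this
    shape is not documented on this page.
    """
    context = []
    for question, answer in history:
        context.append({"author": "USER", "text": question})
        context.append({"author": "NUCLIA", "text": answer})
    return {"query": query, "context": context}

# First turn: no context yet, same body as the curl example.
first = build_ask_payload("Who is Hedy Lamarr?")

# Second turn: the first exchange is passed along so "she" can be resolved.
follow_up = build_ask_payload(
    "What did she do during the war?",
    history=[(
        "Who is Hedy Lamarr?",
        "Hedy Lamarr was an actress and inventor known for her contributions "
        "to the development of wireless communication technology.",
    )],
)
print(json.dumps(follow_up, indent=2))
```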

Data structure

Because answer generation is slow, the /ask endpoint returns a readable HTTP stream.

The stream is newline-delimited JSON, following the NDJSON format.

Each line is a JSON object containing an item of the response:

{ item: AskResponseItem }

The possible item types are:

- retrieval: The search results matching the query (same as the /find endpoint). These are the paragraphs passed to the generative model.
- answer: The generative answer.
- metadata: The number of tokens consumed by the query and the answer generation, and the time taken to produce the response.
- citations: The paragraphs actually used to generate the answer (among the search results initially passed to the generative model) and the positions of the corresponding parts of the answer.
- status: The status of the response when complete. It can be success or error.
- error: The error message when the status is error.
- relations: The relations of the entities mentioned in the query.
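
As an illustration of the item types above, here is a minimal sketch that folds a sequence of NDJSON lines into one summary. It assumes each item carries a `type` field naming its kind, that answer items hold their text in a `text` field, and that status and error items use `status` and `error` fields. Those field names are assumptions, and the sample lines are hand-written for the example, not captured from a real response.

```python
import json

def decode_ask_lines(lines):
    """Fold NDJSON lines from /ask into one summary dict."""
    result = {"answer": "", "status": None, "error": None, "other": {}}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between objects
        item = json.loads(line)["item"]
        kind = item.get("type")  # ASSUMPTION: discriminator field name
        if kind == "answer":
            # ASSUMPTION: the answer may arrive in pieces under "text"
            result["answer"] += item.get("text", "")
        elif kind == "status":
            result["status"] = item.get("status")
        elif kind == "error":
            result["error"] = item.get("error")
        else:
            # retrieval, metadata, citations, relations: keep the raw item
            result["other"][kind] = item
    return result

# Hand-written sample lines, for illustration only.
sample = [
    '{"item": {"type": "answer", "text": "Hedy Lamarr was "}}',
    '{"item": {"type": "answer", "text": "an actress and inventor."}}',
    '{"item": {"type": "status", "status": "success"}}',
]
summary = decode_ask_lines(sample)
print(summary["answer"], summary["status"])
```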

Usage

You can get a fully decoded response directly using the Nuclia Python CLI/SDK.

To get generative answers in the Agentic RAG search widget, you need to enable the answers feature:

<script src="https://cdn.rag.progress.cloud/nuclia-widget.umd.js"></script>
<nuclia-search-bar
  knowledgebox="YOUR-KB"
  zone="ZONE"
  features="answers"
></nuclia-search-bar>
<nuclia-search-results></nuclia-search-results>

For testing, you can use it with curl:
```
curl 'https://<ZONE>.rag.progress.cloud/api/v1/kb/<YOUR-KB>/ask' -H 'content-type: application/json' --data-raw '{"query":"Who is Hedy Lamarr?","context":[]}' -H "x-synchronous: true"
```
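
The same request can be sketched in Python with the standard library only. The helper name `build_ask_request` is hypothetical; the URL, body, and headers mirror the curl command above, and, like that command, the sketch sends no authentication header, so it assumes a Knowledge Box that accepts such a request.

```python
import json
import urllib.request

def build_ask_request(zone, kb_id, query, synchronous=False):
    """Build (but do not send) a POST request for the /ask endpoint."""
    url = f"https://{zone}.rag.progress.cloud/api/v1/kb/{kb_id}/ask"
    body = json.dumps({"query": query, "context": []}).encode("utf-8")
    headers = {"content-type": "application/json"}
    if synchronous:
        # Ask for a regular HTTP response instead of a stream.
        headers["x-synchronous"] = "true"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# "ZONE" and "YOUR-KB" are placeholders, as in the widget snippet.
request = build_ask_request("ZONE", "YOUR-KB", "Who is Hedy Lamarr?", synchronous=True)
print(request.full_url)

# Sending it requires real zone and Knowledge Box values:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```
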
Note: The x-synchronous header on the /ask endpoint is mostly meant for testing purposes. Without this header, the default behavior is to return a readable stream, which lets you display the beginning of the answer without waiting for the end of the generation. The x-synchronous header turns the response into a regular HTTP response, which makes the query slower, because it waits for the end of the generation before returning the answer.

To implement your own chat widget, you can take inspiration from the Agentic RAG search widget implementation:
- Reading a readable HTTP stream (check the getStream method)
- Decoding the result
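
When reading the stream yourself, keep in mind that network chunks do not necessarily end on line boundaries, so a JSON object may be split across two chunks. The sketch below is not the widget's implementation; it is a minimal Python illustration of buffering raw byte chunks and emitting one decoded object per complete line. The function name `iter_ndjson` and the sample chunks are made up for the example.

```python
import json

def iter_ndjson(chunks):
    """Yield one decoded JSON object per complete line in a byte stream."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line; keep the unfinished tail in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # The last object may not be followed by a newline.
    if buffer.strip():
        yield json.loads(buffer)

# Hand-written chunks: the first object is cut in the middle of a key.
chunks = [
    b'{"item": {"type": "answer", "te',
    b'xt": "Hedy "}}\n{"item": {"type": "answer", "text": "Lamarr"}}\n',
    b'{"item": {"type": "status", "status": "success"}}',
]
items = list(iter_ndjson(chunks))
print(len(items))
```

Splitting on the raw newline byte before decoding is safe for UTF-8, because that byte never occurs inside a multi-byte character.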

Citations

By default, the /ask endpoint makes a /find query to retrieve relevant paragraphs, and the 20 best ones are passed to the generative model to produce the answer. The retrieval item contains the list of paragraphs used to generate the answer. So you know what was provided to the generative model as input, but you do not have any information about the output:

- you do not know which paragraphs were used to produce which part of the answer,
- you do not know whether some of the paragraphs were used at all.

The /ask endpoint accepts a citations parameter. When enabled, the response includes additional information that links parts of the generated answer back to the specific source paragraphs.

Two citation modes are supported:

LLM footnote citations (`citations: "llm_footnotes"`)

This is the new citation format, currently in BETA.

- Inline footnotes are injected into the generated answer using Markdown footnote syntax.
- Each inline footnote refers to a block identifier such as block-AA, block-CB, etc.
- A footnote reference section is appended at the end of the answer, showing the correspondence between citation number and block id.
- A mapping from each block-* id to the original paragraph (context) id is provided:
  - In a synchronous (non-streaming) call: via the footnote_to_context field.
  - In a streaming call: as a response with type: "footnote_citations".
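
To make the two-step lookup concrete (footnote number to block id, then block id to paragraph id), here is a minimal sketch. It assumes the reference section uses lines of the form `[^1]: block-AA`, which is standard Markdown footnote syntax but is not confirmed by this page, and the paragraph ids in the sample mapping are invented.

```python
import re

# ASSUMPTION: each reference line looks like "[^1]: block-AA".
FOOTNOTE_DEF = re.compile(r"^\[\^(\w+)\]:\s*(block-\w+)\s*$", re.MULTILINE)

def resolve_footnotes(answer, footnote_to_context):
    """Map each footnote label in the answer to its source paragraph id."""
    resolved = {}
    for label, block_id in FOOTNOTE_DEF.findall(answer):
        # None marks a block id that is missing from the mapping.
        resolved[label] = footnote_to_context.get(block_id)
    return resolved

# Hand-written sample answer and mapping, for illustration only.
answer = (
    "Hedy Lamarr was an actress[^1] and inventor[^2].\n"
    "\n"
    "[^1]: block-AA\n"
    "[^2]: block-CB\n"
)
footnote_to_context = {"block-AA": "paragraph-id-1", "block-CB": "paragraph-id-2"}
sources = resolve_footnotes(answer, footnote_to_context)
print(sources)
```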

Default citations (`citations: "default"`)

You receive a mapping of paragraph ids to the relevant spans of the answer.
- In a synchronous call: this mapping is included under the citations field.
- In a streaming call: as a response with type: "citations".
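
A short sketch of how such a mapping could be consumed follows. It assumes each span is a `[start, end]` pair of character offsets into the answer, with an exclusive end. That span format is an assumption, and the paragraph ids and offsets are invented for the example.

```python
def cited_spans(answer, citations):
    """Return the answer text covered by each paragraph's citation spans."""
    return {
        paragraph_id: [answer[start:end] for start, end in spans]
        for paragraph_id, spans in citations.items()
    }

# Hand-written sample data, for illustration only.
answer = "Hedy Lamarr was an actress and inventor."
citations = {
    "paragraph-1": [[0, 11]],   # ASSUMPTION: [start, end) character offsets
    "paragraph-2": [[19, 39]],
}
quoted = cited_spans(answer, citations)
print(quoted)
```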

Filters and other parameters

The /ask endpoint accepts broadly the same parameters as the /find endpoint.

For example, filtering works exactly the same way as in the /find endpoint, and you can refer to the filtering documentation for more details. By filtering on the /ask endpoint, you can control the sources used to generate the answer.
