Genmini for data scientists

Notice

Recent Posts

Recent Comments

Link

« 2025/04 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Tags more

Archives

Today

Total

관리 메뉴

CS log

Genmini for data scientists 본문

AI/NLP

Genmini for data scientists

sj.cath 2024. 10. 7. 17:27

Use Colab Enterprise Python notebooks inside BigQuery Studio.
Use BigQuery DataFrames inside of BigQuery Studio.
Use Gemini to generate code from natural language prompts.
Build a K-means clustering model.
Generate a visualization of the clusters.
Use the text-bison model to develop the next steps for a marketing campaign.
Clean up project resources.

dataset 생성

python notebook 생성

이후 기본적인 라이브러리를 불러오는 등 설정을 해준 후

from google.cloud import bigquery
from google.cloud import aiplatform
import bigframes.pandas as bpd
import pandas as pd
from vertexai.language_models._language_models import TextGenerationModel
from bigframes.ml.cluster import KMeans
from bigframes.ml.model_selection import train_test_split

project_id = 'qwiklabs-gcp-00-8ea7d7dd7cbd'
dataset_name = "ecommerce"
model_name = "customer_segmentation_model"
table_name = "customer_stats"
location = "us-central1"
client = bigquery.Client(project=project_id)
aiplatform.init(project=project_id, location=location)

Create and import the ecommerce.customer_stats table

Next you will store data from thelook_ecommerce BigQuery public dataset into a new table entitled customer_status in your ecommerce dataset.

%%bigquery
CREATE OR REPLACE TABLE ecommerce.customer_stats AS
SELECT
  user_id,
  DATE_DIFF(CURRENT_DATE(), CAST(MAX(order_created_date) AS DATE), day) AS days_since_last_order, ---RECENCY
  COUNT(order_id) AS count_orders, --FREQUENCY
  AVG(sale_price) AS average_spend --MONETARY
  FROM (
      SELECT
        user_id,
        order_id,
        sale_price,
        created_at AS order_created_date
        FROM `bigquery-public-data.thelook_ecommerce.order_items`
        WHERE
        created_at
            BETWEEN '2022-01-01' AND '2023-01-01'
  )
GROUP BY user_id;

You will see the BigQuery DataFrame output with the first 10 rows of the dataset displayed.

Generate the K-means clustering model

Now that you have your customer data in a BigQuery DataFrame, you will create a K-means clustering model to split the customer data into clusters based on fields like order recency, order count, and spend, and you will then visualize these as groups within a chart directly within the notebook.

model이 생성되었다.

# prompt: 1. Using predictions_df, and matplotlib, generate a scatterplot. 2. On the x-axis of the scatterplot, display days_since_last_order and on the y-axis, display average_spend from predictions_df. 3. Color by cluster. 4. The chart should be titled "Attribute grouped by K-means cluster."

import matplotlib.pyplot as plt

# Create the scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(predictions_df['days_since_last_order'], predictions_df['average_spend'], c=predictions_df['CENTROID_ID'], cmap='viridis')
plt.xlabel('Days Since Last Order')
plt.ylabel('Average Spend')
plt.title('Attribute grouped by K-means cluster')
plt.colorbar(label='Cluster ID')
plt.show()

Task 5. Generate insights from the results of the model

In this task, you generate insights from the results of the model by performing the following steps:

Summarize each cluster generated from the K-means model
Define a prompt for the marketing campaign
Generate the marketing campaign using the text-bison model

Summarize each cluster generated from the K-means model

query = """
SELECT
 CONCAT('cluster ', CAST(centroid_id as STRING)) as centroid,
 average_spend,
 count_orders,
 days_since_last_order
FROM (
 SELECT centroid_id, feature, ROUND(numerical_value, 2) as value
 FROM ML.CENTROIDS(MODEL `{0}.{1}`)
)
PIVOT (
 SUM(value)
 FOR feature IN ('average_spend',  'count_orders', 'days_since_last_order')
)
ORDER BY centroid_id
""".format(dataset_name, model_name)

df_centroid = client.query(query).to_dataframe()
df_centroid.head()

df_query = client.query(query).to_dataframe()
df_query.to_string(header=False, index=False)

cluster_info = []
for i, row in df_query.iterrows():
 cluster_info.append("{0}, average spend ${2}, count of orders per person {1}, days since last order {3}"
  .format(row["centroid"], row["count_orders"], row["average_spend"], row["days_since_last_order"]) )

cluster_info = (str.join("\n", cluster_info))
print(cluster_info)

# result
cluster 1, average spend $58.99, count of orders per person 3.79, days since last order 776.36
cluster 2, average spend $295.69, count of orders per person 1.23, days since last order 817.32
cluster 3, average spend $52.48, count of orders per person 1.3, days since last order 932.69
cluster 4, average spend $51.86, count of orders per person 1.32, days since last order 742.85
cluster 5, average spend $53.1, count of orders per person 1.3, days since last order 752.05

'AI > NLP' 카테고리의 다른 글

[프로젝트] Leveraging LLM Reasoning Enhances Personalized Recommender Systems (18)	2024.11.14
Gemini for Data Scientists and Analysts (1)	2024.10.02
Integrating Applications with Gemini 1.0 Pro on Google Cloud (4)	2024.09.30
Speech to Text Transcription with the Cloud Speech API (2)	2024.09.25
Entity and Sentiment Analysis with the Natural Language API (2)	2024.09.23

'AI/NLP' Related Articles

CS log

Genmini for data scientists 본문

Genmini for data scientists

Create and import the ecommerce.customer_stats table

Generate the K-means clustering model

Task 5. Generate insights from the results of the model

Summarize each cluster generated from the K-means model

'AI > NLP' 카테고리의 다른 글

티스토리툴바